Blog posts tagged with kl3m

KL3M Data Gallery

ALEA on Tue Feb 25 2025

Exploring Clean Data with the KL3M Data Gallery

KL3M Data Project: Copyright-Clean AI Training Resources

ALEA on Tue Apr 15 2025

Introducing the KL3M Data Project: a comprehensive collection of legally sound training resources for large language models spanning 132+ million documents.

KL3M on HuggingFace

ALEA on Sun Nov 10 2024

Our first Fairly Trained L-certified models are now publicly available.

Domain-Specific Tokenizers: Enhancing Efficiency for Legal and Financial NLP

ALEA on Fri Mar 21 2025

Our research shows that specialized tokenizers can achieve efficiency gains of up to 83% on domain-specific terminology while maintaining semantic coherence.