Blog posts tagged with kl3m

KL3M Data Gallery (ALEA, Tue Feb 25 2025): Exploring Clean Data with the KL3M Data Gallery.
KL3M Data Project: Copyright-Clean AI Training Resources (ALEA, Tue Apr 15 2025): Introducing the KL3M Data Project: a comprehensive collection of legally sound training resources for large language models spanning 132+ million documents.
KL3M on HuggingFace (ALEA, Sun Nov 10 2024): Our first Fairly Trained L-certified models are now publicly available.
Domain-Specific Tokenizers: Enhancing Efficiency for Legal and Financial NLP (ALEA, Fri Mar 21 2025): Our research demonstrates how specialized tokenizers can achieve up to 83% efficiency gains for domain-specific terminology while maintaining semantic coherence.

KL3M Data Gallery