Our blog

What we're reading, thinking, and doing.

KL3M Data Project: Copyright-Clean AI Training Resources

ALEA on Tue Apr 15 2025

Introducing the KL3M Data Project: a comprehensive collection of legally sound training resources for large language models spanning 132+ million documents.

Improving Legal Text Analysis with Precise Sentence Boundary Detection

ALEA on Tue Apr 08 2025

Introducing NUPunkt and CharBoundary: two specialized libraries that dramatically improve sentence boundary detection in legal documents.

Domain-Specific Tokenizers: Enhancing Efficiency for Legal and Financial NLP

ALEA on Fri Mar 21 2025

Our research demonstrates how specialized tokenizers can achieve up to 83% efficiency gains for domain-specific terminology while maintaining semantic coherence.

KL3M Data Gallery

ALEA on Tue Feb 25 2025

Exploring clean data with the KL3M Data Gallery.

Introducing the Federal Bill Statistics Project

ALEA on Mon Dec 23 2024

Announcing the launch of usbills.ai: an open platform for analyzing US federal legislation.

KL3M on HuggingFace

ALEA on Sun Nov 10 2024

Our first Fairly Trained L-certified models are now publicly available.

Announcing the ALEA Institute

ALEA on Sun Aug 04 2024

Artificial intelligence is changing our world. But will it be for the better?

Contact us

Want to talk or collaborate?

Don't be shy. We'd love to hear from you.

News and Updates from the ALEA Institute.