What we're reading, thinking, and doing.
Introducing the KL3M Data Project: a comprehensive collection of legally sound training resources for large language models spanning 132+ million documents.
Introducing NUPunkt and CharBoundary: two specialized libraries that dramatically improve sentence boundary detection in legal documents.
Our research demonstrates how specialized tokenizers can achieve up to 83% efficiency gains for domain-specific terminology while maintaining semantic coherence.
Exploring Clean Data with the KL3M Data Gallery
Announcing the launch of usbills.ai, an open platform for analyzing US federal legislation.
Our first Fairly Trained L-certified models are now publicly available.
Artificial intelligence is changing our world. But will it be for the better?
Don't be shy. We'd love to hear from you.