Collecting, enriching, and open-sourcing data to support the legal and ethical development and use of AI systems.
Conducting technical research related to the legal and ethical use of AI systems.
Conducting empirical policy research related to the legal and ethical use of AI systems.
Providing educational resources and programs related to the legal and ethical use of AI systems.
Supporting physical and digital communities related to the legal and ethical use of AI systems.
KL3M Data Project
Copyright-clean training resources for large language models across legal, regulatory, and government domains.
Legal Sentence Boundary Detection
Precision tools for legal text analysis with NUPunkt and CharBoundary libraries.
KL3M Tokenizers
Domain-specific tokenizers achieving up to 83% efficiency gains for legal and financial NLP.
FOLIO - Federated Open Legal Information Ontology
Creating and supporting open knowledge graphs for the legal domain.
All the Patents
Generating and publishing obvious inventions to improve the patent system.
KL3M Toxicity
Toxicity analysis
Let's build a better future together.
We are always looking for new opportunities to collaborate with organizations and individuals who share our vision for a better future.
Here are some examples of how we can work together:
Don't be shy. We'd love to hear from you.
Introducing the KL3M Data Project: a comprehensive collection of legally sound training resources for large language models spanning 132+ million documents.
Introducing NUPunkt and CharBoundary: two specialized libraries that dramatically improve sentence boundary detection in legal documents.
Our research demonstrates how specialized tokenizers can achieve up to 83% efficiency gains for domain-specific terminology while maintaining semantic coherence.
Exploring Clean Data with the KL3M Data Gallery