The Problem

AI will bring change, and this change will create complex challenges that we must address together as a society.

Successfully navigating these challenges requires that we maintain trust between all stakeholders, including the public, policymakers, and users and developers of AI systems.

Critically, we believe that this trust can be maintained only if AI systems are developed within our existing legal systems and ethical frameworks.

If we want to realize the social, economic, and scientific benefits of AI, we must support this shared trust within our shared rules and norms.

Announcing the ALEA Institute

We believe so strongly in this goal that we have founded the Institute for the Advancement of Legal and Ethical AI (ALEA), a non-profit organization dedicated to supporting socially, economically, and environmentally sustainable futures through open research and education on AI.

Founding Contributions

Thankfully, we are not starting from scratch.

273 Ventures LLC, the for-profit company that our team helped start in 2022, has generously donated the following assets:

Kelvin Legal Data Pack: one of the world’s largest legally collected and documented datasets, which we will be open-sourcing and maintaining.
Kelvin Legal Large Language Models (KL3M): the only Fairly Trained L-certified family of embedding and generative models, which we will be open-sourcing and maintaining.
All the Patents Models: data, software, and fine-tuned KL3M models related to the All the Patents project, which we will be open-sourcing and maintaining.

Michael Bommarito, our President, has also contributed the following assets as an individual:

leeky: a Python library for training data contamination testing, which we will continue to develop and maintain as an open-source project.
Linux-as-a-Model: a model trained from scratch on the Linux 1.0 kernel to demonstrate copyright and memorization risks in large language models.

Our Activities

Our first priority is the open-source release of the Kelvin Legal Data Pack and KL3M models, along with education around them.

Our goal is to make these resources accessible to the public and other researchers as soon as possible, and to support their sustainable development and use as the first generally available Fairly Trained models.

Going forward, we will focus on the following activities:

Open Data: Collecting, enriching, and open-sourcing data to support the legal and ethical development and use of AI systems.
Open Models: Training and open-sourcing clean, efficient base models like KL3M.
Technical Research: Conducting technical research related to AI systems, especially as related to their legal and ethical use.
Policy Research: Conducting empirical policy research related to the legal and ethical use of AI systems.
Education and Awareness: Providing educational resources and programs related to the legal and ethical use of AI systems.
Community Building: Supporting physical and digital communities related to the legal and ethical use of AI systems.

Reach Out

As we begin this journey, we are excited to collaborate with other individuals and organizations who share our vision for a better, more sustainable future with AI.

Please reach out to us at hello@aleainstitute.ai if you are interested in learning more or collaborating with us.