Open dataset
Mexican legal tasks annotated by legal professionals. Public access for research and evaluation.
An open, reproducible and public standard for evaluating AI systems applied to Mexican law.
BELMA is an open initiative to build the first Mexican legal dataset annotated by the legal community. It convenes professors, researchers, practicing lawyers, in-house legal teams and students to define methodology, curate the corpus and annotate the tasks. The dataset, methodology, evaluation code and results will be public.
Rubrics and criteria documented so any team can run the benchmark and verify results independently.
Multiple areas of Mexican law and a mix of tasks that prevents the benchmark from favoring any specific system.
Decisions are made by a Technical Committee drawn from academia, the bar and legal practice. Temis holds neither majority nor veto.
Each provider presents its own metrics. Each firm tests in its own way. Buyers lack a standard for distinguishing tools that work from tools that simply communicate well.
“A public dataset, annotated by Mexican legal professionals, against which any system can be evaluated transparently and reproducibly.”
Academia lacks a common basis for studying how language models behave on legal tasks in Mexican legal Spanish.
BELMA fills that gap: a public dataset, annotated by Mexican legal professionals, with methodology validated by an independent committee.
Open registration. Formation of the Technical Advisory Committee with plural representation.
Task taxonomy, annotation schema, evaluation rubrics, dataset access policy. Public comment before close.
Corpus sourced from public records. Distributed annotation with double review and adjudication; inter-annotator agreement reported.
Reference-model evaluation. Release of dataset, technical paper and open leaderboard.
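The annotation phase above reports inter-annotator agreement. A common chance-corrected measure for two annotators is Cohen's kappa; the sketch below is illustrative only (BELMA's actual agreement statistic is not specified here), assuming categorical labels and two annotators per item.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labeled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independence, from each annotator's label marginals.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:  # both annotators used a single identical label throughout
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

Values near 1 indicate strong agreement; values near 0 indicate agreement no better than chance. Multi-annotator setups typically use Fleiss' kappa or Krippendorff's alpha instead.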
Methodological decisions, task selection and dataset curation are the Committee's responsibility. Temis holds neither majority nor veto.
Dataset, methodology, code and results are public. Temis publishes its own results regardless of leaderboard position.
Double review, adjudication and inter-annotator agreement reporting. External review before the 1.0 release.
Professors, researchers and graduate students in law or computer science interested in evaluation methodology.
Lawyers with active practice in any area of Mexican law, contributing professional judgment on task difficulty and relevance.
Corporate legal areas that evaluate or use legal AI tools, with perspective on real-world utility criteria.
Late-stage law students interested in research, legal tech or evaluation methodology.
Free registration, open to individuals and organizations at belma.org.mx.