Domain-specific tokenizers achieving up to 83% efficiency gains for legal and financial NLP.
leeky is an open-source Python library for testing black-box models for training data contamination.