Analysis
How-To
To analyze the trained tokenizers (and inspect the evaluation metrics), take the following steps:
- start a Jupyter notebook server:
  cd notebook; jupyter notebook
- open the notebook tokenizer_analysis.ipynb in your browser
Results
The notebook lets you examine, for example:
- tokenized examples
- evaluation metrics
- vocabulary and performance comparison across languages
- effect of the vocabulary size
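
As a rough illustration of the kind of metric such an analysis might compare, the sketch below computes the average number of tokens per whitespace-separated word. This is an assumption for illustration only: the helper `tokens_per_word` and the two stand-in tokenizers are hypothetical and are not taken from the notebook, whose actual metrics may differ.

```python
def tokens_per_word(texts, tokenize):
    """Average number of tokens produced per whitespace-separated word.

    `tokenize` is any callable mapping a string to a list of tokens
    (hypothetical interface, not the notebook's API).
    """
    n_words = sum(len(t.split()) for t in texts)
    n_tokens = sum(len(tokenize(t)) for t in texts)
    return n_tokens / n_words if n_words else 0.0


# Hypothetical stand-ins for trained tokenizers with a large vs. tiny vocabulary.
def word_tokenizer(text):
    return text.split()


def char_tokenizer(text):
    return [c for c in text if not c.isspace()]


samples = ["the cat sat", "tokenizers split text"]
print("word-level:", tokens_per_word(samples, word_tokenizer))
print("char-level:", tokens_per_word(samples, char_tokenizer))
```

A smaller vocabulary generally splits words into more pieces, so a value like this tends to rise as the vocabulary shrinks; the character-level stand-in shows the extreme case.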