
Analysis


How-To

To analyze the trained tokenizers (and inspect the evaluation metrics), take the following steps:

  • start a Jupyter notebook server

    cd notebook; jupyter notebook
    
  • open the notebook tokenizer_analysis.ipynb in your browser


Results

The notebook lets you examine, for example:

  • tokenized examples
  • evaluation metrics
  • vocabulary and performance comparison across languages
  • effect of the vocabulary size
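
As a rough illustration of the kind of evaluation metric one might inspect, the sketch below computes fertility (average number of tokens per whitespace-separated word), a commonly used tokenizer metric. It is not taken from tokenizer_analysis.ipynb: the function names `fertility` and `toy_char_pair_tokenizer` are hypothetical, and the toy tokenizer only stands in for a trained one.

```python
from typing import Callable, List


def fertility(texts: List[str], tokenize: Callable[[str], List[str]]) -> float:
    """Average number of tokens per whitespace-separated word."""
    n_words = sum(len(text.split()) for text in texts)
    n_tokens = sum(len(tokenize(text)) for text in texts)
    if n_words == 0:
        raise ValueError("texts contain no words")
    return n_tokens / n_words


def toy_char_pair_tokenizer(text: str) -> List[str]:
    """Hypothetical stand-in for a trained tokenizer.

    Splits every word into chunks of at most two characters.
    """
    tokens: List[str] = []
    for word in text.split():
        tokens.extend(word[i:i + 2] for i in range(0, len(word), 2))
    return tokens


examples = ["tokenizers split text", "into subword units"]
print(f"fertility: {fertility(examples, toy_char_pair_tokenizer):.2f}")
```

To compare tokenizers or vocabulary sizes, one could call `fertility` with each tokenizer's own tokenize function on the same texts; lower values mean words are split into fewer pieces.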