nerblackbox - a high-level library for named entity recognition in Python

latest version: 1.0.0


Installation

pip install nerblackbox

About

Take a dataset from one of many available sources. Then train, evaluate and apply a language model in a few simple steps.

Data

  • Choose a dataset from HuggingFace (HF), from the Local Filesystem (LF), or from the Built-in (BI) datasets

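    from nerblackbox import Dataset

    # the three lines below are alternatives; use the one that matches your source
    dataset = Dataset("conll2003",  source="HF")  # HuggingFace
    dataset = Dataset("my_dataset", source="LF")  # Local Filesystem
    dataset = Dataset("swe_nerc",   source="BI")  # Built-in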
    

  • Set up the dataset

    dataset.set_up()
    
     

Datasets from an Annotation Tool (AT) server can also be used. See Data for more details.

Training

  • Define the training by choosing a pretrained model and a dataset

    from nerblackbox import Training
    training = Training("my_training", model="bert-base-cased", dataset="conll2003")
    

  • Run the training and get the performance of the fine-tuned model

    training.run()
    training.get_result(metric="f1", level="entity", phase="test")
    # 0.9045
    
     

See Training for more details.

Evaluation

  • Load the model

    from nerblackbox import Model
    model = Model.from_training("my_training")
    

  • Evaluate the model

    results = model.evaluate_on_dataset("conll2003", phase="test")
    results["micro"]["entity"]["f1"]
    # 0.9045
    
     

See Evaluation for more details.
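
To illustrate what an entity-level F1 score measures, the following minimal sketch computes it from sets of (tag, start, end) spans. It assumes the common definition in which a prediction counts as correct only if both the span and the tag match exactly. This is not taken from nerblackbox's implementation, and the helper name `entity_f1` and the example spans are hypothetical.

```python
def entity_f1(gold, predicted):
    """Entity-level F1 over (tag, start, end) spans, using exact matching."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the second span is found, but with the wrong tag.
gold = [("ORG", 4, 18), ("LOC", 40, 47)]
predicted = [("ORG", 4, 18), ("PER", 40, 47)]

score = entity_f1(gold, predicted)
print(score)
```

With one of two gold entities matched and one of two predictions correct, precision and recall are both 0.5, so the F1 score is 0.5 as well.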

Inference

  • Load the model

    model = Model.from_training("my_training")
    

  • Let the model predict

    model.predict("The United Nations has never recognised Jakarta's move.")  
    # [[
    #  {'char_start': '4', 'char_end': '18', 'token': 'United Nations', 'tag': 'ORG'},
    #  {'char_start': '40', 'char_end': '47', 'token': 'Jakarta', 'tag': 'LOC'}
    # ]]
    

See Inference for more details.
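
The prediction shown above is a list with one entry per input text, each holding a list of entity dictionaries whose character offsets are strings. As a minimal sketch of how such output could be post-processed, the snippet below works on a literal copy of that example (so it runs without a trained model), converts the offsets to integers, and checks them against the input text. The helper name `entities_with_spans` is hypothetical and not part of nerblackbox.

```python
text = "The United Nations has never recognised Jakarta's move."

# Literal copy of the example prediction; note that the offsets are strings.
predictions = [[
    {"char_start": "4", "char_end": "18", "token": "United Nations", "tag": "ORG"},
    {"char_start": "40", "char_end": "47", "token": "Jakarta", "tag": "LOC"},
]]


def entities_with_spans(text, prediction):
    """Return (tag, start, end, surface) tuples, verifying offsets against the text."""
    spans = []
    for entity in prediction:
        start, end = int(entity["char_start"]), int(entity["char_end"])
        surface = text[start:end]
        if surface != entity["token"]:
            raise ValueError(f"offsets {start}:{end} do not match {entity['token']!r}")
        spans.append((entity["tag"], start, end, surface))
    return spans


spans = entities_with_spans(text, predictions[0])
for tag, start, end, surface in spans:
    print(f"{tag}\t{start}-{end}\t{surface}")
```

The check confirms that, in this example, `char_end` is exclusive: slicing the text with `text[start:end]` reproduces the `token` value.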


Get Started

To get familiar with nerblackbox, we recommend that you

  1. read the documentation sections Preparation, Data, Training, Evaluation and Inference

  2. go through one of the example notebooks

  3. check out the Python API documentation


Features

Data

  • Integration of Datasets from Multiple Sources (HuggingFace, Annotation Tools, ...)
  • Support for Multiple Dataset Types (Standard, Pretokenized)
  • Support for Multiple Annotation Schemes (IO, BIO, BILOU)
  • Text Encoding
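
The annotation schemes listed above differ only in how they mark entity boundaries. As a rough illustration, the sketch below converts a BIO tag sequence to BILOU and to IO using the conventional definitions of these schemes. The function names `bio_to_bilou` and `bio_to_io` are hypothetical; this is not nerblackbox's own conversion code, and it assumes a well-formed BIO input.

```python
def bio_to_bilou(tags):
    """Convert well-formed BIO tags to BILOU (B=begin, I=inside, L=last, O=outside, U=unit)."""
    bilou = []
    for i, tag in enumerate(tags):
        if tag == "O":
            bilou.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = next_tag == f"I-{label}"  # does the entity go on after this token?
        if prefix == "B":
            bilou.append(f"B-{label}" if continues else f"U-{label}")
        else:  # prefix == "I"
            bilou.append(f"I-{label}" if continues else f"L-{label}")
    return bilou


def bio_to_io(tags):
    """Convert BIO tags to IO by dropping the begin marker."""
    return [tag if tag == "O" else "I-" + tag.split("-", 1)[1] for tag in tags]


# Tokens: The / United / Nations / visited / Jakarta
bio = ["O", "B-ORG", "I-ORG", "O", "B-LOC"]
bilou = bio_to_bilou(bio)
io = bio_to_io(bio)
print(bilou)
print(io)
```

IO is the most compact scheme but cannot separate two adjacent entities of the same type, while BILOU makes single-token entities and entity ends explicit.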

Training

  • Adaptive Fine-tuning
  • Hyperparameter Search
  • Multiple Runs with Different Random Seeds
  • Detailed Analysis of Training Results

Evaluation

  • Evaluation of Any Model on Any Dataset

Inference

  • Versatile Model Inference (Entity/Word Level, Probabilities, ...)

Other

  • Full Compatibility with HuggingFace
  • GPU Support
  • Language Agnosticism

Citation

@misc{nerblackbox,
  author = {Stollenwerk, Felix},
  title  = {nerblackbox: a high-level library for named entity recognition in python},
  year   = {2021},
  url    = {https://github.com/flxst/nerblackbox},
}