Start

nerblackbox - a high-level library for named entity recognition in python

latest version: 1.0.0

Resources

source code: https://github.com/flxst/nerblackbox
documentation: https://flxst.github.io/nerblackbox
PyPI: https://pypi.org/project/nerblackbox

Installation

```
pip install nerblackbox
```

About


Take a dataset from one of many available sources. Then train, evaluate and apply a language model in a few simple steps.

Data, Training, Evaluation, Inference

Choose a dataset from HuggingFace (HF), the Local Filesystem (LF), or a Built-in (BI) dataset

```
from nerblackbox import Dataset

# pick ONE of the following sources
dataset = Dataset("conll2003",  source="HF")  # HuggingFace
dataset = Dataset("my_dataset", source="LF")  # Local Filesystem
dataset = Dataset("swe_nerc",   source="BI")  # Built-in
```

Set up the dataset
```
dataset.set_up()
```

Datasets from an Annotation Tool (AT) server can also be used. See Data for more details.

Define the training by choosing a pretrained model and a dataset

```
from nerblackbox import Training

training = Training("my_training", model="bert-base-cased", dataset="conll2003")
```

Run the training and get the performance of the fine-tuned model

```
training.run()
training.get_result(metric="f1", level="entity", phase="test")
# 0.9045
```

See Training for more details.

Load the model

```
from nerblackbox import Model

model = Model.from_training("my_training")
```

Evaluate the model

```
results = model.evaluate_on_dataset("conll2003", phase="test")
results["micro"]["entity"]["f1"]
# 0.9045
```
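
To illustrate what an entity-level, micro-averaged F1 score measures, here is a small sketch that does not depend on nerblackbox. The helper names `extract_entities` and `micro_f1` and the toy tag sequences are hypothetical. The sketch assumes BIO tags and counts a prediction as correct only if both the span and the entity type match exactly; the library's internal computation may differ in detail.

```python
from typing import List, Set, Tuple

# (sentence index, start token, end token (exclusive), entity type)
Entity = Tuple[int, int, int, str]


def extract_entities(tags: List[str], sent_idx: int = 0) -> Set[Entity]:
    """Collect entity spans from one BIO-tagged sentence (hypothetical helper)."""
    entities: Set[Entity] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        # An open entity continues only on an "I-" tag of the same type.
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            entities.add((sent_idx, start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return entities


def micro_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1 over all sentences, based on exact span matches."""
    gold_entities: Set[Entity] = set()
    pred_entities: Set[Entity] = set()
    for idx, (g, p) in enumerate(zip(gold, pred)):
        gold_entities |= extract_entities(g, idx)
        pred_entities |= extract_entities(p, idx)
    true_positives = len(gold_entities & pred_entities)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_entities)
    recall = true_positives / len(gold_entities)
    return 2 * precision * recall / (precision + recall)


# Toy data (made up for illustration): the prediction misses one LOC entity.
gold = [["B-ORG", "I-ORG", "O", "B-LOC"], ["O", "B-PER"]]
pred = [["B-ORG", "I-ORG", "O", "O"], ["O", "B-PER"]]
print(round(micro_f1(gold, pred), 2))  # prints 0.8
```

In the toy data, two of the three gold entities are found and both predictions are correct, which gives precision 1.0, recall 2/3 and F1 0.8.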

See Evaluation for more details.

Load the model

```
model = Model.from_training("my_training")
```

Let the model predict

```
model.predict("The United Nations has never recognised Jakarta's move.")
# [[
#  {'char_start': '4', 'char_end': '18', 'token': 'United Nations', 'tag': 'ORG'},
#  {'char_start': '40', 'char_end': '47', 'token': 'Jakarta', 'tag': 'LOC'}
# ]]
```
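
As a sketch of how this output can be consumed: the `char_start` and `char_end` values in the example are strings, so they have to be converted to integers before they can serve as character offsets into the input text. The snippet below works on a hard-coded copy of the output shown above and therefore runs without a trained model. The helper name `entity_spans` is hypothetical and not part of nerblackbox, and the outer list is assumed to hold one inner list per input text.

```python
from typing import Dict, List, Tuple

text = "The United Nations has never recognised Jakarta's move."

# Hard-coded copy of the prediction shown above.
predictions: List[List[Dict[str, str]]] = [[
    {"char_start": "4", "char_end": "18", "token": "United Nations", "tag": "ORG"},
    {"char_start": "40", "char_end": "47", "token": "Jakarta", "tag": "LOC"},
]]


def entity_spans(text: str, entities: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Return (surface form, tag) pairs by slicing the text with the offsets."""
    spans = []
    for entity in entities:
        # Offsets arrive as strings in the example output, so cast them first.
        start, end = int(entity["char_start"]), int(entity["char_end"])
        spans.append((text[start:end], entity["tag"]))
    return spans


print(entity_spans(text, predictions[0]))
# [('United Nations', 'ORG'), ('Jakarta', 'LOC')]
```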

See Inference for more details.

Get Started

To get familiar with nerblackbox, it is recommended to

read the doc sections Preparation, Data, Training, Evaluation and Inference
go through one of the example notebooks
check out the Python API documentation

Features

Data

Integration of Datasets from Multiple Sources (HuggingFace, Annotation Tools, ..)
Support for Multiple Dataset Types (Standard, Pretokenized)
Support for Multiple Annotation Schemes (IO, BIO, BILOU)
Text Encoding
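
The three annotation schemes differ only in how much positional information each tag carries. The following sketch is independent of nerblackbox and uses the hypothetical helper names `bio_to_io` and `bio_to_bilou`. It assumes a well-formed BIO sequence as input and converts it into the other two schemes.

```python
from typing import List


def bio_to_io(tags: List[str]) -> List[str]:
    """IO drops the begin marker: every entity token is tagged I-<type>."""
    return ["I-" + tag[2:] if tag != "O" else "O" for tag in tags]


def bio_to_bilou(tags: List[str]) -> List[str]:
    """BILOU also marks the Last token and single-token (Unit) entities."""
    result = []
    for i, tag in enumerate(tags):
        if tag == "O":
            result.append(tag)
            continue
        label = tag[2:]
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        entity_continues = next_tag == "I-" + label
        if tag.startswith("B-"):
            result.append(("B-" if entity_continues else "U-") + label)
        else:  # "I-" tag
            result.append(("I-" if entity_continues else "L-") + label)
    return result


# Made-up tag sequence: a two-token ORG entity and a one-token LOC entity.
bio = ["O", "B-ORG", "I-ORG", "O", "O", "O", "B-LOC", "O", "O"]
print(bio_to_io(bio))
# ['O', 'I-ORG', 'I-ORG', 'O', 'O', 'O', 'I-LOC', 'O', 'O']
print(bio_to_bilou(bio))
# ['O', 'B-ORG', 'L-ORG', 'O', 'O', 'O', 'U-LOC', 'O', 'O']
```

Note that IO cannot separate two directly adjacent entities of the same type, which is the information the begin marker in BIO and BILOU preserves.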

Training

Adaptive Fine-tuning
Hyperparameter Search
Multiple Runs with Different Random Seeds
Detailed Analysis of Training Results

Evaluation

Evaluation of Any Model on Any Dataset

Inference

Versatile Model Inference (Entity/Word Level, Probabilities, ..)

Other

Full Compatibility with HuggingFace
GPU Support
Language Agnosticism

Citation

```
@misc{nerblackbox,
  author = {Stollenwerk, Felix},
  title  = {nerblackbox: a high-level library for named entity recognition in python},
  year   = {2021},
  url    = {https://github.com/flxst/nerblackbox},
}
```