Start
nerblackbox - a high-level library for named entity recognition in Python
latest version: 1.0.0
Resources
- source code: https://github.com/flxst/nerblackbox
- documentation: https://flxst.github.io/nerblackbox
- PyPI: https://pypi.org/project/nerblackbox
Installation
pip install nerblackbox
About
Take a dataset from one of many available sources. Then train, evaluate and apply a language model in a few simple steps.
- Choose a dataset from HuggingFace (HF), the Local Filesystem (LF), or a Built-in (BI) dataset
dataset = Dataset("conll2003", source="HF")  # HuggingFace
dataset = Dataset("my_dataset", source="LF")  # Local Filesystem
dataset = Dataset("swe_nerc", source="BI")  # Built-in
- Set up the dataset
dataset.set_up()
Datasets from an Annotation Tool (AT) server can also be used. See Data for more details.
- Define the training by choosing a pretrained model and a dataset
training = Training("my_training", model="bert-base-cased", dataset="conll2003")
- Run the training and get the performance of the fine-tuned model
training.run()
training.get_result(metric="f1", level="entity", phase="test")  # 0.9045
See Training for more details.
- Load the model
model = Model.from_training("my_training")
- Evaluate the model
results = model.evaluate_on_dataset("conll2003", phase="test")
results["micro"]["entity"]["f1"]  # 0.9045
See Evaluation for more details.
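The value read from results["micro"]["entity"]["f1"] is an entity-level, micro-averaged F1 score. As a rough illustration of what such a number usually measures, the sketch below counts a predicted entity as correct only if its span and tag both match a gold entity exactly. This is a common convention and an assumption here, not necessarily how nerblackbox computes the score; the function name `entity_f1` and the span tuples are hypothetical.

```python
from typing import Set, Tuple

# Hypothetical span representation: (sentence index, char_start, char_end, tag)
Span = Tuple[int, int, int, str]


def entity_f1(gold: Set[Span], pred: Set[Span]) -> float:
    """Micro-averaged F1 over exact (span, tag) matches."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative data: the second prediction has the right span but the wrong tag,
# so it counts as both a false positive and a false negative.
gold = {(0, 4, 18, "ORG"), (0, 40, 47, "LOC")}
pred = {(0, 4, 18, "ORG"), (0, 40, 47, "ORG")}

print(entity_f1(gold, pred))  # 0.5
```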
- Let the model predict
model.predict("The United Nations has never recognised Jakarta's move.")
# [[
#  {'char_start': '4', 'char_end': '18', 'token': 'United Nations', 'tag': 'ORG'},
#  {'char_start': '40', 'char_end': '47', 'token': 'Jakarta', 'tag': 'LOC'}
# ]]
See Inference for more details.
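The prediction output shown above reports character offsets as strings. A minimal sketch of how such output might be post-processed, using only the values from the example: it converts the offsets to integers and slices the input text to check that each span matches its reported token. The helper `extract_entities` is hypothetical and not part of nerblackbox, and treating `char_end` as exclusive is an assumption that happens to be consistent with the example values.

```python
from typing import Dict, List, Tuple

text = "The United Nations has never recognised Jakarta's move."

# Output copied from the prediction example; one inner list per input text.
predictions: List[List[Dict[str, str]]] = [[
    {"char_start": "4", "char_end": "18", "token": "United Nations", "tag": "ORG"},
    {"char_start": "40", "char_end": "47", "token": "Jakarta", "tag": "LOC"},
]]


def extract_entities(
    text: str, entities: List[Dict[str, str]]
) -> List[Tuple[str, str, bool]]:
    """Return (surface form, tag, span matches token) for each predicted entity."""
    extracted = []
    for entity in entities:
        # Offsets arrive as strings, so convert before slicing.
        start, end = int(entity["char_start"]), int(entity["char_end"])
        surface = text[start:end]
        extracted.append((surface, entity["tag"], surface == entity["token"]))
    return extracted


entities = extract_entities(text, predictions[0])
print(entities)
# [('United Nations', 'ORG', True), ('Jakarta', 'LOC', True)]
```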
Get Started
To get familiar with nerblackbox, it is recommended to
- read the doc sections Preparation, Data, Training, Evaluation and Inference
- go through one of the example notebooks
- check out the Python API documentation
Features
Data
- Integration of Datasets from Multiple Sources (HuggingFace, Annotation Tools, ...)
- Support for Multiple Dataset Types (Standard, Pretokenized)
- Support for Multiple Annotation Schemes (IO, BIO, BILOU)
- Text Encoding
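To make the three annotation schemes named above concrete, here is a small sketch that converts a BIO tag sequence to IO and to BILOU. It follows the usual definitions of these schemes (B = beginning, I = inside, L = last, U = unit-length entity, O = outside) and assumes well-formed BIO input. The function names are hypothetical; this is not nerblackbox's own conversion code.

```python
from typing import List


def bio_to_io(tags: List[str]) -> List[str]:
    """IO drops the beginning marker: every entity token becomes I-<label>."""
    return ["O" if tag == "O" else "I-" + tag.split("-", 1)[1] for tag in tags]


def bio_to_bilou(tags: List[str]) -> List[str]:
    """BILOU additionally marks the last token (L) and single-token entities (U)."""
    converted = []
    for i, tag in enumerate(tags):
        if tag == "O":
            converted.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        entity_continues = next_tag == "I-" + label
        if prefix == "B":
            converted.append(("B-" if entity_continues else "U-") + label)
        else:  # prefix == "I"
            converted.append(("I-" if entity_continues else "L-") + label)
    return converted


# Tokens: The United Nations has never recognised Jakarta 's move .
bio = ["O", "B-ORG", "I-ORG", "O", "O", "O", "B-LOC", "O", "O", "O"]

print(bio_to_io(bio))
# ['O', 'I-ORG', 'I-ORG', 'O', 'O', 'O', 'I-LOC', 'O', 'O', 'O']
print(bio_to_bilou(bio))
# ['O', 'B-ORG', 'L-ORG', 'O', 'O', 'O', 'U-LOC', 'O', 'O', 'O']
```

Note that IO cannot separate two adjacent entities of the same type, which is one reason BIO and BILOU exist.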
Training
- Adaptive Fine-tuning
- Hyperparameter Search
- Multiple Runs with Different Random Seeds
- Detailed Analysis of Training Results
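Running the same training with several random seeds is typically done so that results can be reported as a mean with a spread rather than a single number. The sketch below shows that aggregation in plain Python. The scores are hypothetical placeholder values, not results from nerblackbox, and `summarize_runs` is an illustrative helper rather than a library function.

```python
from statistics import mean, stdev
from typing import Dict, Tuple


def summarize_runs(scores_by_seed: Dict[int, float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of a metric across seeds."""
    scores = list(scores_by_seed.values())
    # The sample standard deviation is undefined for a single run; report 0.0.
    spread = stdev(scores) if len(scores) > 1 else 0.0
    return mean(scores), spread


# Hypothetical entity-level f1 scores, keyed by random seed.
f1_by_seed = {1: 0.90, 2: 0.91, 3: 0.92}

avg, spread = summarize_runs(f1_by_seed)
print(f"f1 = {avg:.4f} +/- {spread:.4f} over {len(f1_by_seed)} runs")
```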
Evaluation
- Evaluation of Any Model on Any Dataset
Inference
- Versatile Model Inference (Entity/Word Level, Probabilities, ...)
Other
- Full Compatibility with HuggingFace
- GPU Support
- Language Agnosticism
Citation
@misc{nerblackbox,
author = {Stollenwerk, Felix},
title = {nerblackbox: a high-level library for named entity recognition in python},
year = {2021},
url = {https://github.com/flxst/nerblackbox},
}