Compatibility with HuggingFace
General
nerblackbox is heavily based on HuggingFace Transformers. Moreover, HuggingFace Datasets and HuggingFace Evaluate are well integrated; see Data and Evaluation, respectively.
Therefore, compatibility with HuggingFace is generally ensured. In particular:
- nerblackbox's model checkpoints (and tokenizer files) are identical to the ones from HuggingFace.

model checkpoint directory

```bash
ls <checkpoint_directory>
# config.json
# pytorch_model.bin
# special_tokens_map.json
# tokenizer.json
# tokenizer_config.json
# vocab.txt
```
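Since these files follow the standard HuggingFace layout, such a checkpoint directory should also be loadable with the transformers library directly. The following is a minimal sketch; <checkpoint_directory> is the placeholder from above:

```python
# minimal sketch: load a nerblackbox checkpoint directly with transformers
from transformers import AutoModelForTokenClassification, AutoTokenizer

checkpoint_directory = "<checkpoint_directory>"  # placeholder, as above
hf_model = AutoModelForTokenClassification.from_pretrained(checkpoint_directory)
hf_tokenizer = AutoTokenizer.from_pretrained(checkpoint_directory)
```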
- After a Model instance is created from a checkpoint, it contains a HuggingFace model and tokenizer as attributes:

model attributes

```python
model = Model(<checkpoint_directory>)
print(type(model.model))
# <class 'transformers.models.bert.modeling_bert.BertForTokenClassification'>
print(type(model.tokenizer))
# <class 'transformers.models.bert.tokenization_bert_fast.BertTokenizerFast'>
```
Hence, model.model and model.tokenizer can be used like any other transformers model and tokenizer, respectively.
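For illustration, the sketch below continues the snippet above and runs a forward pass through the wrapped objects; the example sentence is arbitrary and torch is only needed for this demonstration:

```python
# minimal sketch: use the wrapped HuggingFace objects directly
import torch

# model = Model(<checkpoint_directory>) was created as shown above
inputs = model.tokenizer("She lives in Stockholm.", return_tensors="pt")
with torch.no_grad():
    outputs = model.model(**inputs)  # a transformers TokenClassifierOutput
predicted_label_ids = outputs.logits.argmax(dim=-1)  # one label id per token
print(predicted_label_ids.shape)  # (1, number_of_tokens)
```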
Models
- Model architectures that have successfully been tested with nerblackbox (see Reproduction of Results) are the following (a DistilBERT-based example is sketched below):
  - BERT
  - DistilBERT
  - RoBERTa
  - DeBERTa
  - ELECTRA
- Model architectures that are currently known not to work with nerblackbox are:
  - XLM-RoBERTa
  - ALBERT
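For the supported architectures, the attributes described under General expose the corresponding transformers classes. A hedged sketch, assuming a hypothetical checkpoint fine-tuned from a DistilBERT base model (the directory name and the top-level import are assumptions for illustration):

```python
from nerblackbox import Model  # import path assumed for illustration

# hypothetical DistilBERT-based checkpoint directory
model = Model("<distilbert_checkpoint_directory>")
print(type(model.model))
# <class 'transformers.models.distilbert.modeling_distilbert.DistilBertForTokenClassification'>
print(type(model.tokenizer))
# <class 'transformers.models.distilbert.tokenization_distilbert_fast.DistilBertTokenizerFast'>
```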