Model
Model that predicts tags for a given input text.
`__init__(checkpoint_directory, batch_size=16, max_seq_length=None, dataset=None)`

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `checkpoint_directory` | `str` | path to the checkpoint directory | required |
| `batch_size` | `int` | batch size used by the dataloader | `16` |
| `max_seq_length` | `Optional[int]` | maximum sequence length; loaded from the checkpoint if not specified | `None` |
| `dataset` | `Optional[str]` | should be provided in case the model config is missing `id2label` information | `None` |
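The `dataset` argument exists because some checkpoints lack an `id2label` mapping in their config, which is needed to translate class indices into tag names. A minimal sketch of what such a mapping looks like (the concrete tag set here is illustrative, not taken from any specific model):

```python
# Illustrative id2label mapping as stored in a model config;
# the actual tag set depends on the dataset the model was trained on.
id2label = {0: "O", 1: "B-ORG", 2: "I-ORG", 3: "B-LOC", 4: "I-LOC"}

# The inverse mapping, label2id, is typically stored alongside it.
label2id = {tag: idx for idx, tag in id2label.items()}
print(label2id["B-LOC"])  # index of the B-LOC tag
```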
`evaluate_on_dataset(dataset_name, dataset_format='infer', phase='test', class_mapping=None, number=None, derived_from_jsonl=False, rounded_decimals=3)`

Evaluate the model on a dataset from Hugging Face, or on a local dataset in jsonl or csv format.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset_name` | `str` | e.g. `'conll2003'` | required |
| `dataset_format` | `str` | `'huggingface'`, `'jsonl'` or `'csv'` | `'infer'` |
| `phase` | `str` | e.g. `'test'` | `'test'` |
| `class_mapping` | `Optional[Dict[str, str]]` | e.g. `{"PER": "PI", "ORG": "PI"}` | `None` |
| `number` | `Optional[int]` | e.g. `100` | `None` |
| `derived_from_jsonl` | `bool` | | `False` |
| `rounded_decimals` | `Optional[int]` | if not `None`, results are rounded to the given number of decimals | `3` |
Returns:

| Name | Type | Description |
|---|---|---|
| `evaluation_dict` | `EVALUATION_DICT` | dict with keys `[label][level][metric]`, where label in `['micro', 'macro']`, level in `['entity', 'token']`, metric in `['precision', 'recall', 'f1', 'precision_seqeval', 'recall_seqeval', 'f1_seqeval']`, and values are floats between 0 and 1 |
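The nested key structure can be navigated with plain dict access. A sketch that builds a skeleton with the documented keys and reads off a single metric (the value assigned here is invented for illustration):

```python
# Hypothetical EVALUATION_DICT skeleton with the documented key structure.
METRICS = ["precision", "recall", "f1",
           "precision_seqeval", "recall_seqeval", "f1_seqeval"]
evaluation_dict = {
    label: {level: {metric: 0.0 for metric in METRICS}
            for level in ["entity", "token"]}
    for label in ["micro", "macro"]
}
evaluation_dict["micro"]["entity"]["f1"] = 0.897  # invented value

# Read a single metric off the nested dict.
print(evaluation_dict["micro"]["entity"]["f1"])  # 0.897
```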
`from_checkpoint(checkpoint_directory)`

classmethod

Load a model from a checkpoint directory.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `checkpoint_directory` | `str` | path to the checkpoint directory | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `model` | `Optional[Model]` | best model from training |
`from_huggingface(repo_id, dataset=None)`

classmethod

Load a model from the Hugging Face Hub.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `repo_id` | `str` | Hugging Face Hub repo id, e.g. `'KB/bert-base-swedish-cased-ner'` | required |
| `dataset` | `Optional[str]` | should be provided in case the model config is missing `id2label` information | `None` |

Returns:

| Name | Type | Description |
|---|---|---|
| `model` | `Optional[Model]` | model |
`from_training(training_name)`

classmethod

Load the best model from a training run.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `training_name` | `str` | name of the training, e.g. `"training0"` | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `model` | `Optional[Model]` | best model from training |
`predict(input_texts, level='entity', autocorrect=False, is_pretokenized=False)`

Predict tags for input texts. Output is on the entity or word level.
Examples:

```python
predict(["arbetsförmedlingen finns i stockholm"], level="word", autocorrect=False)
# [[
#     {"char_start": "0", "char_end": "18", "token": "arbetsförmedlingen", "tag": "I-ORG"},
#     {"char_start": "19", "char_end": "24", "token": "finns", "tag": "O"},
#     {"char_start": "25", "char_end": "26", "token": "i", "tag": "O"},
#     {"char_start": "27", "char_end": "36", "token": "stockholm", "tag": "B-LOC"},
# ]]

predict(["arbetsförmedlingen finns i stockholm"], level="word", autocorrect=True)
# [[
#     {"char_start": "0", "char_end": "18", "token": "arbetsförmedlingen", "tag": "B-ORG"},
#     {"char_start": "19", "char_end": "24", "token": "finns", "tag": "O"},
#     {"char_start": "25", "char_end": "26", "token": "i", "tag": "O"},
#     {"char_start": "27", "char_end": "36", "token": "stockholm", "tag": "B-LOC"},
# ]]

predict(["arbetsförmedlingen finns i stockholm"], level="entity", autocorrect=False)
# [[
#     {"char_start": "27", "char_end": "36", "token": "stockholm", "tag": "LOC"},
# ]]

predict(["arbetsförmedlingen finns i stockholm"], level="entity", autocorrect=True)
# [[
#     {"char_start": "0", "char_end": "18", "token": "arbetsförmedlingen", "tag": "ORG"},
#     {"char_start": "27", "char_end": "36", "token": "stockholm", "tag": "LOC"},
# ]]
```
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `input_texts` | `Union[str, List[str]]` | e.g. `["example 1", "example 2"]` | required |
| `level` | `str` | `"entity"` or `"word"` | `'entity'` |
| `autocorrect` | `bool` | if `True`, the annotation scheme (e.g. `B-` and `I-` tags) is autocorrected | `False` |
| `is_pretokenized` | `bool` | `True` if `input_texts` are pretokenized | `False` |
Returns:

| Name | Type | Description |
|---|---|---|
| `predictions` | `PREDICTIONS` | list of predictions for the different examples; each contains a list of dicts with keys `char_start`, `char_end`, `token`, `tag` |
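Given this format, downstream code can filter or group entities with plain dict access. A sketch using the entity-level output from the examples above:

```python
# Entity-level predictions, copied from the documented example output.
predictions = [[
    {"char_start": "0", "char_end": "18", "token": "arbetsförmedlingen", "tag": "ORG"},
    {"char_start": "27", "char_end": "36", "token": "stockholm", "tag": "LOC"},
]]

# Collect all tokens tagged as locations in the first example.
locations = [p["token"] for p in predictions[0] if p["tag"] == "LOC"]
print(locations)  # ['stockholm']
```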
`predict_on_file(input_file, output_file)`

Predict tags for all input texts in the input file and write the results to the output file.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `input_file` | `str` | e.g. `strangnas/test.jsonl` | required |
| `output_file` | `str` | e.g. `strangnas/test_anonymized.jsonl` | required |
`predict_proba(input_texts, is_pretokenized=False)`

Predict probability distributions for input texts. Output is on the word level.

Examples:

```python
predict_proba(["arbetsförmedlingen finns i stockholm"])
# [[
#     {"char_start": "0", "char_end": "18", "token": "arbetsförmedlingen", "proba_dist": {"O": 0.21, "B-ORG": 0.56, ..}},
#     {"char_start": "19", "char_end": "24", "token": "finns", "proba_dist": {"O": 0.87, "B-ORG": 0.02, ..}},
#     {"char_start": "25", "char_end": "26", "token": "i", "proba_dist": {"O": 0.95, "B-ORG": 0.01, ..}},
#     {"char_start": "27", "char_end": "36", "token": "stockholm", "proba_dist": {"O": 0.14, "B-ORG": 0.22, ..}},
# ]]
```
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `input_texts` | `Union[str, List[str]]` | e.g. `["example 1", "example 2"]` | required |
| `is_pretokenized` | `bool` | `True` if `input_texts` are pretokenized | `False` |
Returns:

| Name | Type | Description |
|---|---|---|
| `predictions` | `PREDICTIONS` | list of probability predictions for the different examples; each contains a list of dicts with keys `char_start`, `char_end`, `token`, `proba_dist`, where `proba_dist` is a dict mapping `self.annotation.classes` to probabilities |
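The probability output can be reduced to hard tags by taking the argmax over each `proba_dist`. A sketch with distributions truncated to two classes (the values are invented for illustration):

```python
# Word-level probability predictions, truncated to two classes for illustration.
predictions = [[
    {"char_start": "0", "char_end": "18", "token": "arbetsförmedlingen",
     "proba_dist": {"O": 0.21, "B-ORG": 0.56}},
    {"char_start": "19", "char_end": "24", "token": "finns",
     "proba_dist": {"O": 0.87, "B-ORG": 0.02}},
]]

# Pick the most probable tag for each token.
hard_tags = [max(p["proba_dist"], key=p["proba_dist"].get) for p in predictions[0]]
print(hard_tags)  # ['B-ORG', 'O']
```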