TextEncoder

__init__(encoding, model_special_tokens=None)

Examples:

TextEncoder(
    encoding={"\n": "[NEWLINE]", "\t": "[TAB]"},
    model_special_tokens=["[NEWLINE]", "[TAB]"],
)

Parameters:

Name | Type | Description | Default
encoding | Dict[str, str] | mapping from original substrings to the special tokens that replace them | required
model_special_tokens | Optional[List[str]] | special tokens that the model was trained on | None

decode(text_encoded_list, encode_decode_mappings_list, predictions_encoded_list)

Decodes a list of encoded texts and their encoded predictions using the corresponding encode-decode mappings, restoring the original text and character offsets.

Examples:

text_list, predictions_list = decode(
    text_encoded_list=["an[NEWLINE] example"],
    encode_decode_mappings_list=[[(2, "\n", "[NEWLINE]")]],
    predictions_encoded_list=[[{"char_start": "12", "char_end": "19", "token": "example", "tag": "TAG"}]]
)
# text_list = ["an\n example"]
# predictions_list = [[{"char_start": "4", "char_end": "11", "token": "example", "tag": "TAG"}]]

Parameters:

Name | Type | Description | Default
text_encoded_list | List[str] | encoded text | required
encode_decode_mappings_list | List[EncodeDecodeMappings] | mappings (char_start, original token, encoded token) | required
predictions_encoded_list | List[Predictions] | encoded predictions | required

Returns:

Name | Type | Description
text_list | List[str] | original / decoded text
predictions_list | List[Predictions] | original / decoded predictions

encode(text_list)

Encodes a list of texts by replacing the keys of self.encoding with their special tokens, and records the mappings needed to decode later.

Examples:

text_encoded_list, encode_decode_mappings_list = encode(text_list=["an\n example"])
# text_encoded_list           = ["an[NEWLINE] example"]
# encode_decode_mappings_list = [[(2, "\n", "[NEWLINE]")]]

Parameters:

Name | Type | Description | Default
text_list | List[str] | original text | required

Returns:

Name | Type | Description
text_encoded_list | List[str] | encoded text
encode_decode_mappings_list | List[EncodeDecodeMappings] | mappings (char_start, original token, encoded token)
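
One way the encoding step could produce these mappings is sketched below. This is an illustration under stated assumptions rather than the library's code: `encode_sketch` is a hypothetical name, `char_start` is assumed to be the position of the special token in the encoded text, and longer keys are assumed to take precedence when several keys match at the same position.

```python
import re
from typing import Dict, List, Tuple

Mapping = Tuple[int, str, str]  # (char_start in encoded text, original, encoded)


def encode_sketch(
    text_list: List[str], encoding: Dict[str, str]
) -> Tuple[List[str], List[List[Mapping]]]:
    if not encoding:
        # Nothing to replace: texts are unchanged and the mappings are empty.
        return list(text_list), [[] for _ in text_list]

    # One alternation over all keys, longest first so longer keys win.
    keys = sorted(encoding, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))

    text_encoded_list: List[str] = []
    mappings_list: List[List[Mapping]] = []
    for text in text_list:
        pieces: List[str] = []
        mappings: List[Mapping] = []
        cursor = 0   # position in the original text
        out_len = 0  # length of the encoded text built so far
        for match in pattern.finditer(text):
            # Copy the untouched text before the match.
            pieces.append(text[cursor:match.start()])
            out_len += match.start() - cursor
            original = match.group(0)
            encoded = encoding[original]
            # Record where the special token starts in the encoded text.
            mappings.append((out_len, original, encoded))
            pieces.append(encoded)
            out_len += len(encoded)
            cursor = match.end()
        pieces.append(text[cursor:])
        text_encoded_list.append("".join(pieces))
        mappings_list.append(mappings)
    return text_encoded_list, mappings_list
```

With a single replacement, as in the example above, the position is the same in the original and the encoded text, so the example alone does not reveal which convention the library uses; the sketch picks the encoded-text convention for definiteness.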