TextEncoder
__init__(encoding, model_special_tokens=None)
Examples:
TextEncoder(
    encoding={"\n": "[NEWLINE]", "\t": "[TAB]"},
    model_special_tokens=["[NEWLINE]", "[TAB]"],
)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
encoding | Dict[str, str] | mapping to special tokens | required |
model_special_tokens | Optional[List[str]] | special tokens that the model was trained on | None |
decode(text_encoded_list, encode_decode_mappings_list, predictions_encoded_list)
Decodes a list of encoded texts and their encoded predictions back to the original texts and character offsets, using the encode_decode_mappings.
Examples:
text_list, predictions_list = decode(
    text_encoded_list=["an[NEWLINE] example"],
    encode_decode_mappings_list=[[(2, "\n", "[NEWLINE]")]],
    predictions_encoded_list=[[{"char_start": "12", "char_end": "19", "token": "example", "tag": "TAG"}]],
)
# text_list = ["an\n example"]
# predictions_list = [[{"char_start": "4", "char_end": "11", "token": "example", "tag": "TAG"}]]
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text_encoded_list | List[str] | encoded text | required |
encode_decode_mappings_list | List[EncodeDecodeMappings] | mappings (char_start, original token, encoded token) | required |
predictions_encoded_list | List[Predictions] | encoded predictions | required |
Returns:
Name | Type | Description |
---|---|---|
text_list | List[str] | original / decoded text |
predictions_list | List[Predictions] | original / decoded predictions |
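The offset arithmetic in the decode example (12 becomes 4, 19 becomes 11) can be sketched as follows. This is a minimal standalone illustration, not the library's implementation: `decode_sketch`, `Mapping`, and `Prediction` are hypothetical names, and the sketch assumes that `char_start` in each mapping refers to the position in the original text and that no prediction boundary falls inside an encoded token.

```python
from typing import Dict, List, Tuple

# Hypothetical aliases mirroring the documented tuple and dict shapes.
Mapping = Tuple[int, str, str]  # (char_start, original token, encoded token)
Prediction = Dict[str, str]


def decode_sketch(
    text_encoded_list: List[str],
    encode_decode_mappings_list: List[List[Mapping]],
    predictions_encoded_list: List[List[Prediction]],
) -> Tuple[List[str], List[List[Prediction]]]:
    text_list, predictions_list = [], []
    for text, mappings, predictions in zip(
        text_encoded_list, encode_decode_mappings_list, predictions_encoded_list
    ):
        shift = 0  # how many characters longer the encoded text is so far
        boundaries = []  # (end of encoded token in encoded text, shift after it)
        for char_start, original, encoded in sorted(mappings):
            # Assumption: char_start is an offset in the ORIGINAL text. Earlier
            # mappings are already decoded, so the encoded token sits there.
            text = text[:char_start] + original + text[char_start + len(encoded):]
            encoded_end = char_start + shift + len(encoded)
            shift += len(encoded) - len(original)
            boundaries.append((encoded_end, shift))

        def to_original(position: int) -> int:
            # Subtract the shift accumulated by all encoded tokens that end
            # at or before this position.
            offset = 0
            for encoded_end, cumulative_shift in boundaries:
                if position >= encoded_end:
                    offset = cumulative_shift
            return position - offset

        predictions_list.append(
            [
                {
                    **prediction,
                    "char_start": str(to_original(int(prediction["char_start"]))),
                    "char_end": str(to_original(int(prediction["char_end"]))),
                }
                for prediction in predictions
            ]
        )
        text_list.append(text)
    return text_list, predictions_list
```

In the documented example, "[NEWLINE]" is 8 characters longer than "\n", so every offset after it moves back by 8.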
encode(text_list)
Encodes a list of texts by replacing each key of self.encoding with its special token.
Examples:
text_encoded_list, encode_decode_mappings_list = encode(text_list=["an\n example"])
# text_encoded_list = ["an[NEWLINE] example"]
# encode_decode_mappings_list = [[(2, "\n", "[NEWLINE]")]]
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text_list | List[str] | original text | required |
Returns:
Name | Type | Description |
---|---|---|
text_encoded_list | List[str] | encoded text |
encode_decode_mappings_list | List[EncodeDecodeMappings] | mappings (char_start, original token, encoded token) |
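To make the relationship between the encoded text and the recorded mappings concrete, here is a minimal standalone sketch of the encoding step. It is an illustration under assumptions, not the library's code: `encode_sketch` and `Mapping` are hypothetical names, `char_start` is assumed to be the offset in the original text, and empty keys in `encoding` are ignored.

```python
from typing import Dict, List, Tuple

# Hypothetical alias mirroring the documented mapping tuples.
Mapping = Tuple[int, str, str]  # (char_start, original token, encoded token)


def encode_sketch(
    text_list: List[str], encoding: Dict[str, str]
) -> Tuple[List[str], List[List[Mapping]]]:
    text_encoded_list, mappings_list = [], []
    for text in text_list:
        pieces: List[str] = []
        mappings: List[Mapping] = []
        i = 0
        while i < len(text):
            for original, encoded in encoding.items():
                # Skip empty keys so the scan always advances.
                if original and text.startswith(original, i):
                    # Assumption: record the offset in the ORIGINAL text.
                    mappings.append((i, original, encoded))
                    pieces.append(encoded)
                    i += len(original)
                    break
            else:
                # No key matches at this position: copy the character as-is.
                pieces.append(text[i])
                i += 1
        text_encoded_list.append("".join(pieces))
        mappings_list.append(mappings)
    return text_encoded_list, mappings_list


text_encoded_list, encode_decode_mappings_list = encode_sketch(
    ["an\n example"], {"\n": "[NEWLINE]", "\t": "[TAB]"}
)
print(text_encoded_list)
print(encode_decode_mappings_list)
```

Scanning left to right in a single pass keeps every recorded offset tied to the original text, so the replacements can later be undone one by one in ascending order.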