langchain.text_splitter.Tokenizer
class langchain.text_splitter.Tokenizer(chunk_overlap: int, tokens_per_chunk: int, decode: Callable[[List[int]], str], encode: Callable[[str], List[int]])
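Tokenizer data class. It bundles the encode and decode functions with the chunk size and overlap used for token-based splitting.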
Attributes
chunk_overlap
Overlap in tokens between chunks
tokens_per_chunk
Maximum number of tokens per chunk
decode
Function to decode a list of token ids to a string
encode
Function to encode a string to a list of token ids
Methods
__init__(chunk_overlap, tokens_per_chunk, ...)
__init__(chunk_overlap: int, tokens_per_chunk: int, decode: Callable[[List[int]], str], encode: Callable[[str], List[int]]) -> None
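The four fields above can be illustrated with a minimal, self-contained sketch. It does not import langchain: the `Tokenizer` class below is a local stand-in that only mirrors the documented signature, and `toy_encode`, `toy_decode`, and `split_by_tokens` are hypothetical names introduced for illustration. The sliding-window logic is an assumption about how the fields might be used together, not the library's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


# Local stand-in mirroring the documented fields; NOT imported from langchain.
@dataclass(frozen=True)
class Tokenizer:
    chunk_overlap: int
    tokens_per_chunk: int
    decode: Callable[[List[int]], str]
    encode: Callable[[str], List[int]]


# Hypothetical toy codec: one token id per Unicode code point.
def toy_encode(text: str) -> List[int]:
    return [ord(ch) for ch in text]


def toy_decode(ids: List[int]) -> str:
    return "".join(chr(i) for i in ids)


def split_by_tokens(text: str, tokenizer: Tokenizer) -> List[str]:
    """Hypothetical helper: slide a window of tokens_per_chunk ids over the
    encoded text, advancing by tokens_per_chunk - chunk_overlap each step."""
    step = tokenizer.tokens_per_chunk - tokenizer.chunk_overlap
    if step <= 0:
        raise ValueError("chunk_overlap must be smaller than tokens_per_chunk")
    ids = tokenizer.encode(text)
    chunks: List[str] = []
    start = 0
    while start < len(ids):
        window = ids[start:start + tokenizer.tokens_per_chunk]
        chunks.append(tokenizer.decode(window))
        # Stop once the current window has reached the end of the input.
        if start + tokenizer.tokens_per_chunk >= len(ids):
            break
        start += step
    return chunks


tok = Tokenizer(chunk_overlap=2, tokens_per_chunk=5,
                decode=toy_decode, encode=toy_encode)
chunks = split_by_tokens("abcdefghij", tok)
print(chunks)  # ['abcde', 'defgh', 'ghij']
```

Each chunk shares its last two tokens with the start of the next one, which is what `chunk_overlap=2` expresses. In practice `encode` and `decode` would come from a real tokenizer rather than the character-level toy used here.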