langchain_text_splitters.base.Tokenizer
class langchain_text_splitters.base.Tokenizer(chunk_overlap: int, tokens_per_chunk: int, decode: Callable[[List[int]], str], encode: Callable[[str], List[int]])
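Tokenizer data class. It bundles the encode and decode functions of a tokenizer with the chunk size and chunk overlap, both measured in tokens.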
Attributes
chunk_overlap – Overlap in tokens between chunks
tokens_per_chunk – Maximum number of tokens per chunk
decode – Function to decode a list of token ids to a string
encode – Function to encode a string to a list of token ids
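To make the relationship between `tokens_per_chunk` and `chunk_overlap` concrete, the following minimal sketch computes the token-index windows that a sliding-window splitter could produce. The helper `window_bounds` is hypothetical and written only for illustration; it is not part of `langchain_text_splitters`, and the library's actual splitting logic may differ.

```python
from typing import List, Tuple


def window_bounds(
    n_tokens: int, tokens_per_chunk: int, chunk_overlap: int
) -> List[Tuple[int, int]]:
    """Return (start, end) token-index pairs for a sliding window.

    Hypothetical helper for illustration only.
    """
    if chunk_overlap >= tokens_per_chunk:
        # Otherwise the window would never advance.
        raise ValueError("chunk_overlap must be smaller than tokens_per_chunk")

    # Each new chunk starts this many tokens after the previous one.
    step = tokens_per_chunk - chunk_overlap

    bounds: List[Tuple[int, int]] = []
    start = 0
    while start < n_tokens:
        end = min(start + tokens_per_chunk, n_tokens)
        bounds.append((start, end))
        if end == n_tokens:
            break  # the last window reached the end of the input
        start += step
    return bounds


print(window_bounds(10, 4, 1))  # [(0, 4), (3, 7), (6, 10)]
```

With 10 tokens, at most 4 tokens per chunk, and an overlap of 1, consecutive windows share exactly one token index (3 and 6).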
Methods
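__init__(chunk_overlap, tokens_per_chunk, ...)
Parameters
chunk_overlap (int) – Overlap in tokens between chunks
tokens_per_chunk (int) – Maximum number of tokens per chunk
decode (Callable[[List[int]], str]) – Function to decode a list of token ids to a string
encode (Callable[[str], List[int]]) – Function to encode a string to a list of token ids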
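__init__(chunk_overlap: int, tokens_per_chunk: int, decode: Callable[[List[int]], str], encode: Callable[[str], List[int]]) -> None
Parameters
chunk_overlap (int) – Overlap in tokens between chunks
tokens_per_chunk (int) – Maximum number of tokens per chunk
decode (Callable[[List[int]], str]) – Function to decode a list of token ids to a string
encode (Callable[[str], List[int]]) – Function to encode a string to a list of token ids
Return type
None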
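As a rough illustration of how the four constructor arguments fit together, the sketch below builds a tokenizer from a toy character-level `encode`/`decode` pair. To stay runnable without the library installed, it declares a local stand-in data class with the same four fields as documented above; this stand-in and the `char_encode`/`char_decode` functions are assumptions for the example. With the library available, the class would be imported from `langchain_text_splitters.base` instead.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Tokenizer:
    """Local stand-in mirroring the four documented fields (assumption)."""

    chunk_overlap: int
    tokens_per_chunk: int
    decode: Callable[[List[int]], str]
    encode: Callable[[str], List[int]]


def char_encode(text: str) -> List[int]:
    # Toy tokenizer: one token id per character (its Unicode code point).
    return [ord(ch) for ch in text]


def char_decode(ids: List[int]) -> str:
    # Inverse of char_encode.
    return "".join(chr(i) for i in ids)


tokenizer = Tokenizer(
    chunk_overlap=2,
    tokens_per_chunk=8,
    decode=char_decode,
    encode=char_encode,
)

ids = tokenizer.encode("hello")
print(ids)                    # [104, 101, 108, 108, 111]
print(tokenizer.decode(ids))  # hello
```

In practice `encode` and `decode` would usually come from a real tokenizer; the only requirement visible in the signature is that `encode` maps a string to a list of integer ids and `decode` maps such a list back to a string.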