langchain_text_splitters.base.Tokenizer
- class langchain_text_splitters.base.Tokenizer(chunk_overlap: int, tokens_per_chunk: int, decode: Callable[[List[int]], str], encode: Callable[[str], List[int]])
- Tokenizer data class.
- Attributes
  - chunk_overlap – Overlap in tokens between chunks
  - tokens_per_chunk – Maximum number of tokens per chunk
  - decode – Function to decode a list of token ids to a string
  - encode – Function to encode a string to a list of token ids
- Methods
  - __init__(chunk_overlap, tokens_per_chunk, decode, encode)
- Parameters
  - chunk_overlap (int) – Overlap in tokens between chunks
  - tokens_per_chunk (int) – Maximum number of tokens per chunk
  - decode (Callable[[List[int]], str]) – Function to decode a list of token ids to a string
  - encode (Callable[[str], List[int]]) – Function to encode a string to a list of token ids
- __init__(chunk_overlap: int, tokens_per_chunk: int, decode: Callable[[List[int]], str], encode: Callable[[str], List[int]]) → None
- Parameters
  - chunk_overlap (int) – Overlap in tokens between chunks
  - tokens_per_chunk (int) – Maximum number of tokens per chunk
  - decode (Callable[[List[int]], str]) – Function to decode a list of token ids to a string
  - encode (Callable[[str], List[int]]) – Function to encode a string to a list of token ids
- Return type
  - None
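A minimal sketch of how a Tokenizer is constructed and used for sliding-window chunking. To stay self-contained it defines a stand-in dataclass with the same fields as langchain_text_splitters.base.Tokenizer (in real use, import Tokenizer from that module) and uses a hypothetical character-level codec where each token id is a Unicode code point; the `chunk` helper is an illustrative sketch in the style of the library's token splitting, not its exact implementation:

```python
from dataclasses import dataclass
from typing import Callable, List

# Stand-in mirroring langchain_text_splitters.base.Tokenizer's fields;
# in real use: from langchain_text_splitters.base import Tokenizer
@dataclass
class Tokenizer:
    chunk_overlap: int       # Overlap in tokens between chunks
    tokens_per_chunk: int    # Maximum number of tokens per chunk
    decode: Callable[[List[int]], str]  # token ids -> string
    encode: Callable[[str], List[int]]  # string -> token ids

# Hypothetical character-level codec: one token id per code point.
tokenizer = Tokenizer(
    chunk_overlap=2,
    tokens_per_chunk=10,
    decode=lambda ids: "".join(chr(i) for i in ids),
    encode=lambda text: [ord(c) for c in text],
)

def chunk(text: str, tok: Tokenizer) -> List[str]:
    """Illustrative sliding-window split: each window holds at most
    tokens_per_chunk ids and repeats chunk_overlap ids from the
    previous window."""
    ids = tok.encode(text)
    step = tok.tokens_per_chunk - tok.chunk_overlap
    return [
        tok.decode(ids[i : i + tok.tokens_per_chunk])
        for i in range(0, len(ids), step)
    ]

print(chunk("hello world, hello again", tokenizer))
# → ['hello worl', 'rld, hello', 'lo again']
```

Note that each chunk starts with the last `chunk_overlap` tokens of the previous one, which is what preserves context across chunk boundaries when the pieces are later embedded or processed independently.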