langchain_text_splitters 0.0.1

langchain_text_splitters.base

Classes

base.Language(value[, names, module, ...])

Enum of the supported programming languages.

base.TextSplitter(chunk_size, chunk_overlap, ...)

Interface for splitting text into chunks.

base.TokenTextSplitter([encoding_name, ...])

Splits text into tokens using a model tokenizer.

base.Tokenizer(chunk_overlap, ...)

Tokenizer data class.

Functions

base.split_text_on_tokens(*, text, tokenizer)

Split incoming text and return chunks using the given tokenizer.
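
A minimal sketch of the idea behind token-based chunking, written with the standard library only. It is not the library's implementation: the `SimpleTokenizer` class, its field names, and the whitespace tokenizer are assumptions chosen for illustration. The sketch slides a fixed-size window over the token list and steps forward by the window size minus the overlap.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class SimpleTokenizer:
    """Hypothetical stand-in for a tokenizer data class (field names are assumptions)."""

    chunk_overlap: int
    tokens_per_chunk: int
    encode: Callable[[str], List[str]]
    decode: Callable[[List[str]], str]


def split_on_tokens(*, text: str, tokenizer: SimpleTokenizer) -> List[str]:
    """Slide a fixed-size token window over the text with the configured overlap."""
    if tokenizer.chunk_overlap >= tokenizer.tokens_per_chunk:
        raise ValueError("chunk_overlap must be smaller than tokens_per_chunk")
    tokens = tokenizer.encode(text)
    step = tokenizer.tokens_per_chunk - tokenizer.chunk_overlap
    chunks: List[str] = []
    start = 0
    while start < len(tokens):
        window = tokens[start:start + tokenizer.tokens_per_chunk]
        chunks.append(tokenizer.decode(window))
        if start + tokenizer.tokens_per_chunk >= len(tokens):
            break  # the window already reached the end of the text
        start += step
    return chunks


# Whitespace "tokenizer" used only for illustration; a real model tokenizer
# would map text to integer ids instead.
whitespace = SimpleTokenizer(
    chunk_overlap=1,
    tokens_per_chunk=3,
    encode=str.split,
    decode=" ".join,
)
chunks = split_on_tokens(text="a b c d e f g", tokenizer=whitespace)
print(chunks)
```

With an overlap of one token, each chunk repeats the last token of the previous chunk, which is the usual reason for configuring an overlap: context at chunk boundaries is not lost.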

langchain_text_splitters.character

Classes

character.CharacterTextSplitter([separator, ...])

Splits text by looking at characters.

character.RecursiveCharacterTextSplitter([...])

Splits text by recursively looking at characters.
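
The recursive strategy can be sketched in plain Python as follows. This is an illustrative approximation, not the library's code: the function name `recursive_split` and the separator list are assumptions, and features such as chunk overlap or keeping separators are left out. The text is split on the coarsest separator first, and any piece that is still too long is split again with the next, finer separator.

```python
from typing import List, Sequence


def recursive_split(
    text: str,
    chunk_size: int,
    separators: Sequence[str] = ("\n\n", "\n", " ", ""),
) -> List[str]:
    """Split on the coarsest separator first; recurse on oversized pieces."""
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, rest = separators[0], separators[1:]
    if sep == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces into one chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Still too large: retry with the next, finer separator.
            chunks.extend(recursive_split(piece, chunk_size, rest or ("",)))
    if current:
        chunks.append(current)
    return chunks


result = recursive_split("alpha beta\n\ngamma delta epsilon", chunk_size=12)
print(result)
```

Trying paragraph breaks before line breaks before spaces tends to keep related text together, and the hard character cut is only used when no separator produces a small enough piece.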

langchain_text_splitters.html

Classes

html.ElementType

Element type as typed dict.

html.HTMLHeaderTextSplitter(headers_to_split_on)

Splits HTML files based on specified headers.
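
As a rough illustration of header-based HTML splitting, the sketch below uses only the standard library's `html.parser`. It does not reproduce the library's implementation; the `HeaderSectionParser` class, the `split_html_by_headers` helper, and the dictionary output shape are assumptions. It assumes `headers_to_split_on` is a list of `(tag, metadata_name)` pairs restricted to `h1` through `h6`, and it attaches the currently active headers to each block of text as metadata.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple


class HeaderSectionParser(HTMLParser):
    """Collect text between header tags, tagging it with the active headers."""

    def __init__(self, headers_to_split_on: List[Tuple[str, str]]) -> None:
        super().__init__()
        self.names: Dict[str, str] = dict(headers_to_split_on)
        self.active: Dict[str, str] = {}  # header tag -> current header text
        self.sections: List[Dict[str, object]] = []
        self._buffer: List[str] = []
        self._open_header: Optional[str] = None

    def flush(self) -> None:
        # Normalize whitespace and emit the buffered text as one section.
        content = " ".join(" ".join(self._buffer).split())
        if content:
            metadata = {self.names[tag]: text for tag, text in self.active.items()}
            self.sections.append({"content": content, "metadata": metadata})
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.names:
            self.flush()
            # "h1" < "h2" < ... as strings, so this drops same-or-deeper headers.
            self.active = {t: v for t, v in self.active.items() if t < tag}
            self.active[tag] = ""
            self._open_header = tag

    def handle_endtag(self, tag):
        if tag == self._open_header:
            self.active[tag] = self.active[tag].strip()
            self._open_header = None

    def handle_data(self, data):
        if self._open_header is not None:
            self.active[self._open_header] += data
        else:
            self._buffer.append(data)


def split_html_by_headers(
    html: str, headers_to_split_on: List[Tuple[str, str]]
) -> List[Dict[str, object]]:
    parser = HeaderSectionParser(headers_to_split_on)
    parser.feed(html)
    parser.close()
    parser.flush()  # emit whatever text followed the last header
    return parser.sections


page = "<h1>Guide</h1><p>Intro text.</p><h2>Install</h2><p>Run the installer.</p>"
sections = split_html_by_headers(page, [("h1", "Header 1"), ("h2", "Header 2")])
print(sections)
```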

langchain_text_splitters.json

Classes

json.RecursiveJsonSplitter([max_chunk_size, ...])

langchain_text_splitters.konlpy

Classes

konlpy.KonlpyTextSplitter([separator])

Splits text using the Konlpy package.

langchain_text_splitters.latex

Classes

latex.LatexTextSplitter(**kwargs)

Attempts to split the text along LaTeX-formatted layout elements.

langchain_text_splitters.markdown

Classes

markdown.HeaderType

Header type as typed dict.

markdown.LineType

Line type as typed dict.

markdown.MarkdownHeaderTextSplitter(...[, ...])

Splits Markdown files based on specified headers.
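
Header-based Markdown splitting can be approximated with a short standard-library sketch. This is not the library's code: the function name `split_markdown_by_headers` and the dictionary output are assumptions, and fenced code blocks are not treated specially. It assumes `headers_to_split_on` is a list of `(marker, metadata_name)` pairs such as `("#", "h1")`, and it groups lines under whichever headers are active at that point.

```python
from typing import Dict, List, Tuple


def split_markdown_by_headers(
    text: str, headers_to_split_on: List[Tuple[str, str]]
) -> List[Dict[str, object]]:
    """Group lines under the active headers; return content plus header metadata."""
    # Check longer markers first so "##" is not mistaken for "#".
    markers = sorted(headers_to_split_on, key=lambda h: len(h[0]), reverse=True)
    levels: Dict[str, int] = {name: len(marker) for marker, name in headers_to_split_on}
    active: Dict[str, str] = {}
    sections: List[Dict[str, object]] = []
    buffer: List[str] = []

    def flush() -> None:
        content = "\n".join(buffer).strip()
        if content:
            sections.append({"content": content, "metadata": dict(active)})
        buffer.clear()

    for line in text.splitlines():
        stripped = line.strip()
        for marker, name in markers:
            if stripped.startswith(marker + " "):
                flush()
                # A new header closes any header at the same or a deeper level.
                for other in [n for n in active if levels[n] >= len(marker)]:
                    del active[other]
                active[name] = stripped[len(marker):].strip()
                break
        else:
            buffer.append(line)  # ordinary content line
    flush()
    return sections


doc = "# Intro\nHello.\n## Setup\nInstall it.\n# Usage\nRun it."
md_sections = split_markdown_by_headers(doc, [("#", "h1"), ("##", "h2")])
print(md_sections)
```

Carrying the header trail as metadata lets each chunk be traced back to its place in the document even after the chunks are stored separately.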

markdown.MarkdownTextSplitter(**kwargs)

Attempts to split the text along Markdown-formatted headings.

langchain_text_splitters.nltk

Classes

nltk.NLTKTextSplitter([separator, language])

Splits text using the NLTK package.

langchain_text_splitters.python

Classes

python.PythonCodeTextSplitter(**kwargs)

Attempts to split the text along Python syntax.

langchain_text_splitters.sentence_transformers

Classes

sentence_transformers.SentenceTransformersTokenTextSplitter([...])

Splits text into tokens using a sentence model tokenizer.

langchain_text_splitters.spacy

Classes

spacy.SpacyTextSplitter([separator, ...])

Splits text using the spaCy package.