langchain.text_splitter.MarkdownHeaderTextSplitter
- class langchain.text_splitter.MarkdownHeaderTextSplitter(headers_to_split_on: List[Tuple[str, str]], return_each_line: bool = False, strip_headers: bool = True)
Splits Markdown text into chunks based on the specified headers.
Create a new MarkdownHeaderTextSplitter.
- Parameters
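headers_to_split_on (List[Tuple[str, str]]) – Headers to track, given as (header marker, metadata key) tuples such as ("#", "Header 1")
return_each_line (bool, default False) – If True, return each line with its associated headers instead of combined chunks
strip_headers (bool, default True) – If True, strip the split headers from the content of each chunk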
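To make the parameters above concrete, here is a minimal, library-free sketch of header-based splitting. It is an illustration under stated assumptions, not LangChain's implementation: the function name `split_on_headers` and the plain-dict result shape are hypothetical, and only `#`-style markers followed by a space are recognised.

```python
from typing import Dict, List, Tuple


def split_on_headers(
    text: str,
    headers_to_split_on: List[Tuple[str, str]],
    strip_headers: bool = True,
) -> List[Dict[str, object]]:
    """Hypothetical helper: group Markdown lines under the tracked headers."""
    # Check longer markers first so "##" is never mistaken for "#".
    markers = sorted(headers_to_split_on, key=lambda pair: len(pair[0]), reverse=True)
    chunks: List[Dict[str, object]] = []
    active: Dict[str, str] = {}  # metadata key -> current header title
    levels: Dict[str, int] = {}  # metadata key -> header depth
    buffer: List[str] = []

    def flush() -> None:
        # Emit the buffered lines as one chunk with the headers active so far.
        content = "\n".join(buffer).strip()
        if content:
            chunks.append({"content": content, "metadata": dict(active)})
        buffer.clear()

    for line in text.splitlines():
        stripped = line.strip()
        match = next(
            ((m, key) for m, key in markers if stripped.startswith(m + " ")),
            None,
        )
        if match is None:
            buffer.append(line)
            continue
        flush()
        marker, key = match
        depth = len(marker)
        # A new header closes every tracked header at the same or a deeper level.
        for other in [k for k, d in levels.items() if d >= depth]:
            del active[other]
            del levels[other]
        active[key] = stripped[len(marker):].strip()
        levels[key] = depth
        if not strip_headers:
            buffer.append(line)  # keep the header line inside the chunk
    flush()
    return chunks


doc = "# Intro\nHello\n## Details\nMore text\n# Other\nBye"
chunks = split_on_headers(doc, [("#", "Header 1"), ("##", "Header 2")])
```

In this sketch, each chunk carries the titles of all headers it sits under, so the text below `## Details` is tagged with both `Header 1` and `Header 2`.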
Methods
__init__(headers_to_split_on[, ...]) – Create a new MarkdownHeaderTextSplitter.
aggregate_lines_to_chunks(lines) – Combine lines with common metadata into chunks. lines: lines of text with their associated header metadata.
split_text(text) – Split a Markdown file. text: the Markdown file contents.
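The aggregation step, combining lines that share the same header metadata into chunks, can be sketched as follows. This is a simplified illustration rather than the library's code: the `Line` type, the `aggregate_lines` name, and the plain newline joiner are assumptions made for the example.

```python
from typing import Dict, List, TypedDict


class Line(TypedDict):
    """Hypothetical record: one line of text plus its header metadata."""
    content: str
    metadata: Dict[str, str]


def aggregate_lines(lines: List[Line]) -> List[Line]:
    """Merge consecutive lines whose metadata is identical into one chunk."""
    chunks: List[Line] = []
    for line in lines:
        if chunks and chunks[-1]["metadata"] == line["metadata"]:
            # Same headers as the previous chunk: append to it.
            chunks[-1]["content"] += "\n" + line["content"]
        else:
            # Headers changed: start a new chunk (copy to avoid aliasing the input).
            chunks.append(
                {"content": line["content"], "metadata": dict(line["metadata"])}
            )
    return chunks


sample: List[Line] = [
    {"content": "a", "metadata": {"Header 1": "X"}},
    {"content": "b", "metadata": {"Header 1": "X"}},
    {"content": "c", "metadata": {"Header 1": "Y"}},
]
merged = aggregate_lines(sample)
```

Only adjacent lines are merged here, so a later line that returns to earlier metadata starts a new chunk rather than joining the old one.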
- __init__(headers_to_split_on: List[Tuple[str, str]], return_each_line: bool = False, strip_headers: bool = True)
Create a new MarkdownHeaderTextSplitter.
- Parameters
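headers_to_split_on (List[Tuple[str, str]]) – Headers to track, given as (header marker, metadata key) tuples such as ("#", "Header 1")
return_each_line (bool, default False) – If True, return each line with its associated headers instead of combined chunks
strip_headers (bool, default True) – If True, strip the split headers from the content of each chunk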