langchain.text_splitter.MarkdownHeaderTextSplitter

class langchain.text_splitter.MarkdownHeaderTextSplitter(headers_to_split_on: List[Tuple[str, str]], return_each_line: bool = False, strip_headers: bool = True)

Splits Markdown files based on specified headers.

Create a new MarkdownHeaderTextSplitter.

Parameters
  • headers_to_split_on – Headers to track, given as (header prefix, metadata key) tuples

  • return_each_line – Return each line with its associated headers

  • strip_headers – Strip the split headers from the content of each chunk

Methods

__init__(headers_to_split_on[, ...])

Create a new MarkdownHeaderTextSplitter.

aggregate_lines_to_chunks(lines)

Combine lines with common metadata into chunks.

split_text(text)

Split a Markdown file.

__init__(headers_to_split_on: List[Tuple[str, str]], return_each_line: bool = False, strip_headers: bool = True)

Create a new MarkdownHeaderTextSplitter.

Parameters
  • headers_to_split_on – Headers to track, given as (header prefix, metadata key) tuples

  • return_each_line – Return each line with its associated headers

  • strip_headers – Strip the split headers from the content of each chunk

aggregate_lines_to_chunks(lines: List[LineType]) → List[Document]

Combine lines with common metadata into chunks.

Parameters
  • lines – Lines of text with their associated header metadata
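The idea of combining lines that share metadata can be sketched in plain Python. This is a simplified illustration, not the library's implementation: the `Line` type and the `merge_lines_by_metadata` function are hypothetical stand-ins, the plain-newline separator is an assumption made for the example, and the result is a list of dictionaries rather than `Document` objects.

```python
from typing import Dict, List, TypedDict


class Line(TypedDict):
    """Hypothetical stand-in for the library's LineType."""

    content: str
    metadata: Dict[str, str]


def merge_lines_by_metadata(lines: List[Line]) -> List[Line]:
    """Merge consecutive lines whose header metadata is identical."""
    chunks: List[Line] = []
    for line in lines:
        if chunks and chunks[-1]["metadata"] == line["metadata"]:
            # Same headers as the previous chunk: append to it.
            # The "\n" separator is an assumption for this sketch.
            chunks[-1]["content"] += "\n" + line["content"]
        else:
            # Headers changed: start a new chunk (copy metadata to avoid aliasing).
            chunks.append(
                {"content": line["content"], "metadata": dict(line["metadata"])}
            )
    return chunks


# Illustrative input: two lines under the same header, one under a sub-header.
lines: List[Line] = [
    {"content": "First paragraph.", "metadata": {"Header 1": "Intro"}},
    {"content": "Second paragraph.", "metadata": {"Header 1": "Intro"}},
    {
        "content": "Details here.",
        "metadata": {"Header 1": "Intro", "Header 2": "Details"},
    },
]

chunks = merge_lines_by_metadata(lines)
for chunk in chunks:
    print(chunk["metadata"], repr(chunk["content"]))
```

Only adjacent lines are merged in this sketch, so two sections with identical headers that are separated by a differently labelled section remain separate chunks.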

split_text(text: str) → List[Document]

Split a Markdown file.

Parameters
  • text – Markdown file

Examples using MarkdownHeaderTextSplitter