langchain_community.document_loaders.html_bs.BSHTMLLoader
- class langchain_community.document_loaders.html_bs.BSHTMLLoader(file_path: str, open_encoding: Optional[str] = None, bs_kwargs: Optional[dict] = None, get_text_separator: str = '')
- Load HTML files and parse them with BeautifulSoup.
- Initialise with the file path and, optionally, the file encoding to use and any keyword arguments to pass to the BeautifulSoup object.
- Parameters
- file_path – The path to the file to load. 
- open_encoding – The encoding to use when opening the file. 
- bs_kwargs – Any kwargs to pass to the BeautifulSoup object. 
- get_text_separator – The separator to use when calling get_text on the soup. 
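
The parameters above can be exercised with a minimal sketch like the one below. The helper names (`write_sample`, `build_loader`, `demo`) and the sample HTML are hypothetical and exist only for illustration. The sketch assumes that `langchain_community` and `beautifulsoup4` are installed and that `BSHTMLLoader` can be imported from `langchain_community.document_loaders`. The parser name `"html.parser"` is passed through `bs_kwargs` as one possible choice, not a requirement.

```python
import tempfile
from pathlib import Path

# Hypothetical sample page used only for this illustration.
SAMPLE_HTML = """<html><head><title>Demo page</title></head>
<body><h1>Heading</h1><p>First paragraph.</p></body></html>"""


def write_sample(directory: str) -> Path:
    """Write a small UTF-8 HTML file and return its path (illustrative helper)."""
    path = Path(directory) / "demo.html"
    path.write_text(SAMPLE_HTML, encoding="utf-8")
    return path


def build_loader(path: Path):
    """Construct a BSHTMLLoader using each documented parameter."""
    # Imported lazily so write_sample() stays usable without the optional packages.
    from langchain_community.document_loaders import BSHTMLLoader

    return BSHTMLLoader(
        str(path),                              # file_path
        open_encoding="utf-8",                  # encoding used when opening the file
        bs_kwargs={"features": "html.parser"},  # forwarded to BeautifulSoup(...)
        get_text_separator="\n",                # forwarded to soup.get_text(...)
    )


def demo() -> None:
    """Load the sample file and print what the loader returns."""
    with tempfile.TemporaryDirectory() as tmp:
        loader = build_loader(write_sample(tmp))
        for doc in loader.load():
            print(doc.metadata)
            print(doc.page_content)
```

Calling `demo()` prints the metadata and extracted text of each returned Document. A non-empty `get_text_separator` such as `"\n"` keeps text from adjacent tags from running together.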
 
 - Methods
 - __init__(file_path[, open_encoding, ...]) - Initialise with the file path and, optionally, the file encoding to use and any keyword arguments to pass to the BeautifulSoup object.
 - lazy_load() - A lazy loader for Documents.
 - load() - Load the HTML document into Document objects.
 - load_and_split([text_splitter]) - Load Documents and split into chunks.
 - __init__(file_path: str, open_encoding: Optional[str] = None, bs_kwargs: Optional[dict] = None, get_text_separator: str = '') → None
- Initialise with the file path and, optionally, the file encoding to use and any keyword arguments to pass to the BeautifulSoup object.
- Parameters
- file_path – The path to the file to load. 
- open_encoding – The encoding to use when opening the file. 
- bs_kwargs – Any kwargs to pass to the BeautifulSoup object. 
- get_text_separator – The separator to use when calling get_text on the soup. 
 
 
 - load_and_split(text_splitter: Optional[TextSplitter] = None) → List[Document]
- Load Documents and split into chunks. Chunks are returned as Documents.
- Parameters
- text_splitter – TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter. 
- Returns
- List of Documents.
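
As a rough illustration of `load_and_split`, the sketch below passes an explicit splitter instead of relying on the default. The helper names (`write_long_html`, `load_chunks`), the sample content, and the chunk sizes are hypothetical. The import path `langchain_text_splitters` is an assumption about the installed packages; older installations may expose `RecursiveCharacterTextSplitter` from a different module.

```python
from pathlib import Path


def write_long_html(directory: str, paragraphs: int = 30) -> Path:
    """Write an HTML file with many short paragraphs (illustrative helper)."""
    body = "\n".join(
        f"<p>Paragraph {i} of the sample page.</p>" for i in range(paragraphs)
    )
    html = f"<html><head><title>Long page</title></head><body>{body}</body></html>"
    path = Path(directory) / "long.html"
    path.write_text(html, encoding="utf-8")
    return path


def load_chunks(path: Path, chunk_size: int = 200, chunk_overlap: int = 20):
    """Load the file and split it into chunks, returned as Documents."""
    # Lazy imports: both packages are optional dependencies of this sketch.
    from langchain_community.document_loaders import BSHTMLLoader
    from langchain_text_splitters import RecursiveCharacterTextSplitter  # assumed path

    loader = BSHTMLLoader(
        str(path),
        open_encoding="utf-8",
        bs_kwargs={"features": "html.parser"},
        get_text_separator="\n",
    )
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size, chunk_overlap=chunk_overlap
    )
    # Passing text_splitter explicitly makes the chunking parameters visible.
    return loader.load_and_split(text_splitter=splitter)
```

Calling `load_chunks(write_long_html(some_dir))` returns a list of Documents whose `page_content` should be no longer than roughly `chunk_size` characters. Omitting `text_splitter` falls back to the default named above.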