langchain_community.document_loaders.parsers.html.bs4.BS4HTMLParser
- class langchain_community.document_loaders.parsers.html.bs4.BS4HTMLParser(*, features: str = 'lxml', get_text_separator: str = '', **kwargs: Any)
Parse HTML files using Beautiful Soup.
Initialize a bs4-based HTML parser.
Methods
- __init__(*[, features, get_text_separator]): Initialize a bs4-based HTML parser.
- lazy_parse(blob): Load HTML document into document objects.
- parse(blob): Eagerly parse the blob into a document or documents.
- __init__(*, features: str = 'lxml', get_text_separator: str = '', **kwargs: Any) → None
Initialize a bs4-based HTML parser.
- parse(blob: Blob) → List[Document]
Eagerly parse the blob into a document or documents.
This is a convenience method for interactive development environments.
Production applications should favor the lazy_parse method instead.
Subclasses should generally not override this parse method.
- Parameters
blob – Blob instance
- Returns
List of documents
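The relationship between parse and lazy_parse described above can be illustrated with a minimal, standard-library-only sketch. It is not the BS4HTMLParser implementation: SketchBlob, SketchDocument, and SketchHTMLParser are hypothetical stand-ins, and Python's built-in html.parser replaces Beautiful Soup. The sketch assumes that get_text_separator is the string used to join the extracted text nodes and that the eager method simply materializes the lazy generator into a list.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import Dict, Iterator, List


@dataclass
class SketchBlob:
    """Hypothetical stand-in for a Blob: raw HTML plus a source label."""
    data: str
    source: str = ""


@dataclass
class SketchDocument:
    """Hypothetical stand-in for a Document: text content plus metadata."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


class _TextCollector(HTMLParser):
    """Collects every text node encountered while parsing."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


class SketchHTMLParser:
    """Illustrative parser only; names and behavior are assumptions."""

    def __init__(self, *, get_text_separator: str = "") -> None:
        self.get_text_separator = get_text_separator

    def lazy_parse(self, blob: SketchBlob) -> Iterator[SketchDocument]:
        # Generator: nothing is parsed until the caller starts iterating.
        collector = _TextCollector()
        collector.feed(blob.data)
        collector.close()
        text = self.get_text_separator.join(collector.chunks)
        yield SketchDocument(page_content=text, metadata={"source": blob.source})

    def parse(self, blob: SketchBlob) -> List[SketchDocument]:
        # Eager convenience wrapper: consume the generator into a list.
        return list(self.lazy_parse(blob))


blob = SketchBlob(
    data="<html><body><h1>Title</h1><p>Body text</p></body></html>",
    source="example.html",
)
parser = SketchHTMLParser(get_text_separator=" ")
docs = parser.parse(blob)            # list, built immediately
lazy_docs = parser.lazy_parse(blob)  # generator, evaluated on demand
```

In this sketch parse returns a fully built list, while lazy_parse returns a generator that does its work only when iterated. That difference is why the lazy form tends to suit production pipelines that stream many blobs.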