langchain_community.document_loaders.parsers.html.bs4.BS4HTMLParser
- class langchain_community.document_loaders.parsers.html.bs4.BS4HTMLParser(*, features: str = 'lxml', get_text_separator: str = '', **kwargs: Any)
Parse HTML files using Beautiful Soup.
Initialize a bs4 based HTML parser.
Methods
__init__(*[, features, get_text_separator]) – Initialize a bs4 based HTML parser.
lazy_parse(blob) – Load HTML document into document objects.
parse(blob) – Eagerly parse the blob into a document or documents.
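The difference between lazy_parse and parse can be sketched as follows. This is a minimal illustration, not part of the library: the helper name compare_parse_modes and the path "page.html" are hypothetical, and features="html.parser" is chosen only so that the sketch does not depend on lxml being installed.

```python
def compare_parse_modes(html: str):
    """Return (first lazily parsed document, eagerly parsed list) for one blob.

    Hypothetical helper for illustration only.
    """
    # Imported inside the function so this module still loads when the
    # optional packages (langchain-community, beautifulsoup4) are missing.
    from langchain_community.document_loaders.blob_loaders import Blob
    from langchain_community.document_loaders.parsers.html.bs4 import BS4HTMLParser

    # Assumption: use Python's built-in parser instead of the 'lxml' default.
    parser = BS4HTMLParser(features="html.parser")
    blob = Blob.from_data(html, path="page.html")  # hypothetical path

    lazy_docs = parser.lazy_parse(blob)  # an iterator; work happens on iteration
    eager_docs = parser.parse(blob)      # a list, fully materialized
    return next(iter(lazy_docs)), eager_docs
```

Both calls should produce the same documents; lazy_parse is the more natural fit when blobs are streamed one at a time, while parse is convenient for a single small blob.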
- Parameters
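features (str) – Name of the Beautiful Soup parser backend to use. Defaults to 'lxml'.
get_text_separator (str) – String placed between text fragments when the document text is extracted. Defaults to ''.
kwargs (Any) – Additional keyword arguments passed to the Beautiful Soup constructor.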
- __init__(*, features: str = 'lxml', get_text_separator: str = '', **kwargs: Any) → None
Initialize a bs4 based HTML parser.
- Parameters
features (str)
get_text_separator (str)
kwargs (Any)
- Return type
None
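As a rough sketch of how the constructor arguments affect the result, the hypothetical helper below parses the same HTML string with a caller-supplied get_text_separator. The helper name extract_text and the path "snippet.html" are assumptions made for illustration, and features="html.parser" is used instead of the 'lxml' default so that only beautifulsoup4 is required.

```python
def extract_text(html: str, separator: str = "") -> str:
    """Return the page_content produced for `html` with the given separator.

    Hypothetical helper for illustration only.
    """
    # Imported inside the function so this module still loads when the
    # optional packages (langchain-community, beautifulsoup4) are missing.
    from langchain_community.document_loaders.blob_loaders import Blob
    from langchain_community.document_loaders.parsers.html.bs4 import BS4HTMLParser

    parser = BS4HTMLParser(
        features="html.parser",        # assumption: built-in parser, not 'lxml'
        get_text_separator=separator,  # inserted between extracted text fragments
    )
    blob = Blob.from_data(html, path="snippet.html")  # hypothetical path
    docs = parser.parse(blob)
    return docs[0].page_content
```

With the default empty separator, text from adjacent elements runs together, so passing a space or newline is often preferable when the output feeds a text splitter.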