langchain_community.document_loaders.chromium.AsyncChromiumLoader
- class langchain_community.document_loaders.chromium.AsyncChromiumLoader(urls: List[str])
Scrape HTML pages from URLs using a headless instance of Chromium.
Initialize the loader with a list of URLs.
- Parameters
urls (List[str]) – A list of URLs to scrape content from.
- Raises
ImportError – If the required ‘playwright’ package is not installed.
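As a minimal sketch of constructing the loader while handling the documented ImportError: the helper name make_loader and the URL below are illustrative assumptions, not part of the library. The import is placed inside the function so that a missing langchain_community or playwright package is reported instead of aborting the script.

```python
def make_loader(urls):
    """Return an AsyncChromiumLoader for `urls`, or None if a dependency is missing.

    `make_loader` is a hypothetical helper written for this example.
    """
    try:
        # Imported lazily so that a missing package is caught below.
        from langchain_community.document_loaders.chromium import AsyncChromiumLoader

        # Documented to raise ImportError when 'playwright' is not installed.
        return AsyncChromiumLoader(urls)
    except ImportError as exc:
        print(f"AsyncChromiumLoader is unavailable: {exc}")
        return None


# Example URL chosen for illustration only; construction does not fetch it.
loader = make_loader(["https://example.com"])
```

When a loader is returned, calling loader.load() performs the actual scraping. Playwright typically also needs its browser binaries installed separately (for example with the playwright install command) before a scrape can succeed.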
Methods
__init__(urls) – Initialize the loader with a list of URLs.
ascrape_playwright(url) – Asynchronously scrape the content of a given URL using Playwright’s async API.
lazy_load() – Lazily load text content from the provided URLs.
load() – Load and return all Documents from the provided URLs.
load_and_split([text_splitter]) – Load Documents and split into chunks.
- __init__(urls: List[str])
Initialize the loader with a list of URLs.
- Parameters
urls (List[str]) – A list of URLs to scrape content from.
- Raises
ImportError – If the required ‘playwright’ package is not installed.
- async ascrape_playwright(url: str) → str
Asynchronously scrape the content of a given URL using Playwright’s async API.
- Parameters
url (str) – The URL to scrape.
- Returns
The scraped HTML content, or an error message if an exception occurs. In the error case the message is returned as the string rather than raised.
- Return type
str
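Because ascrape_playwright is a coroutine, it has to be awaited or driven by an event loop. The sketch below shows the calling pattern only. FakeLoader is a hypothetical stand-in that mimics the method signature so the example runs without a browser or network access; it is not part of the library, and its return value is made up.

```python
import asyncio


class FakeLoader:
    """Hypothetical stand-in for AsyncChromiumLoader (assumption for this example)."""

    async def ascrape_playwright(self, url: str) -> str:
        # A real loader would drive headless Chromium here.
        return f"<html><body>{url}</body></html>"


async def fetch_html(loader, url: str) -> str:
    # ascrape_playwright is a coroutine function, so its result must be awaited.
    return await loader.ascrape_playwright(url)


# asyncio.run starts an event loop, runs the coroutine to completion, and closes the loop.
html = asyncio.run(fetch_html(FakeLoader(), "https://example.com"))
print(html)
```

With the real class, pass an AsyncChromiumLoader instance in place of FakeLoader(). Inside an already running event loop, such as a notebook, await fetch_html(...) directly instead of calling asyncio.run. Since failures come back as a string, the caller should inspect the returned value rather than rely on an exception.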
- lazy_load() → Iterator[Document]
Lazily load text content from the provided URLs.
This method yields Documents one at a time as they’re scraped, instead of waiting to scrape all URLs before returning.
- Yields
Document – The scraped content encapsulated within a Document object.
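One practical consequence of lazy loading is that a caller can stop early without scraping the remaining URLs. The sketch below illustrates this with itertools.islice. FakeLoader and FakeDocument are hypothetical stand-ins written for this example so that it runs offline; the page_content and metadata fields mirror LangChain's Document, and the "source" metadata key is an assumption about what the real loader records.

```python
from dataclasses import dataclass, field
from itertools import islice
from typing import Iterator, List


@dataclass
class FakeDocument:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class FakeLoader:
    """Hypothetical stand-in that yields one document per URL, on demand."""

    def __init__(self, urls: List[str]):
        self.urls = urls
        self.scraped: List[str] = []  # records which URLs were actually visited

    def lazy_load(self) -> Iterator[FakeDocument]:
        for url in self.urls:
            self.scraped.append(url)  # the "scrape" happens only when requested
            yield FakeDocument(page_content=f"<html>{url}</html>",
                               metadata={"source": url})


def first_documents(loader, limit: int) -> list:
    # islice stops pulling from the generator once `limit` items are taken.
    return list(islice(loader.lazy_load(), limit))


loader = FakeLoader([
    "https://example.com/1",
    "https://example.com/2",
    "https://example.com/3",
])
docs = first_documents(loader, 2)
print(len(docs), loader.scraped)
```

Only the first two URLs are visited here; the third is never requested. The same first_documents helper should work with a real AsyncChromiumLoader, since it relies only on the lazy_load() iterator.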
- load() → List[Document]
Load and return all Documents from the provided URLs.
- Returns
A list of Document objects containing the scraped content from each URL.
- Return type
List[Document]
- load_and_split(text_splitter: Optional[TextSplitter] = None) → List[Document]
Load Documents and split into chunks. Chunks are returned as Documents.
- Parameters
text_splitter (Optional[TextSplitter]) – TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.
- Returns
List of Documents.