`langchain_community.document_loaders.chromium`.AsyncChromiumLoader

class langchain_community.document_loaders.chromium.AsyncChromiumLoader(urls: List[str])

Scrape HTML pages from URLs using a headless instance of Chromium.

Initialize the loader with a list of URL paths.

Parameters: urls (List[str]) – A list of URLs to scrape content from.
Raises: ImportError – If the required ‘playwright’ package is not installed.

Methods

`__init__`(urls)	Initialize the loader with a list of URL paths.
`ascrape_playwright`(url)	Asynchronously scrape the content of a given URL using Playwright's async API.
`lazy_load`()	Lazily load HTML content from the provided URLs.
`load`()	Load and return all Documents from the provided URLs.
`load_and_split`([text_splitter])	Load Documents and split into chunks.

__init__(urls: List[str])

Initialize the loader with a list of URL paths.

Parameters: urls (List[str]) – A list of URLs to scrape content from.
Raises: ImportError – If the required ‘playwright’ package is not installed.
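
The documented ImportError can be handled at construction time. The following is a minimal sketch, not the library's own code: `build_loader` and its `loader_cls` parameter are hypothetical names introduced here so that the failure path can be exercised without Playwright installed.

```python
from typing import List, Optional


def build_loader(urls: List[str], loader_cls: Optional[type] = None):
    """Create a loader for `urls`, or return None if a dependency is missing.

    `loader_cls` is a hypothetical hook (not part of the library) that lets
    callers and tests substitute another class for AsyncChromiumLoader.
    """
    try:
        if loader_cls is None:
            # Imported lazily so this module still imports without the package.
            from langchain_community.document_loaders.chromium import (
                AsyncChromiumLoader,
            )

            loader_cls = AsyncChromiumLoader
        return loader_cls(urls)
    except ImportError as exc:
        # Documented failure mode: the 'playwright' package is not installed.
        print(f"Could not create the loader: {exc}")
        return None
```

A call such as `build_loader(["https://example.com"])` would then return either a ready loader or None. Installing Playwright typically involves `pip install playwright` followed by `playwright install` to download the browser binaries.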

async ascrape_playwright(url: str) → str

Asynchronously scrape the content of a given URL using Playwright’s async API.

Parameters: url (str) – The URL to scrape.
Returns: The scraped HTML content or an error message if an exception occurs.
Return type: str
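
Because `ascrape_playwright` is a coroutine, several URLs could be scraped concurrently from caller code. The sketch below assumes only the signature documented above; `scrape_all` is a hypothetical helper, and `loader` is any object exposing an async `ascrape_playwright(url)` method.

```python
import asyncio
from typing import Dict, List


async def scrape_all(loader, urls: List[str]) -> Dict[str, str]:
    """Scrape several URLs concurrently and map each URL to its result.

    Per the documentation, each result is either the page HTML or an
    error message, so callers may want to inspect the returned strings.
    """
    # gather() preserves the order of its inputs, so zip() pairs correctly.
    results = await asyncio.gather(
        *(loader.ascrape_playwright(url) for url in urls)
    )
    return dict(zip(urls, results))
```

From synchronous code this could be driven with `asyncio.run(scrape_all(loader, ["https://example.com"]))`. Each call may start its own browser session, so very long URL lists are probably better processed in batches.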

lazy_load() → Iterator[Document]

Lazily load HTML content from the provided URLs.

This method yields Documents one at a time as they’re scraped, instead of waiting to scrape all URLs before returning.

Yields: Document – The scraped content encapsulated within a Document object.
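
Since Documents are yielded one at a time, a caller can stop early and avoid scraping the remaining URLs. The following sketch illustrates this with a hypothetical helper, `take_documents`; it relies only on `lazy_load()` returning an iterator, as documented above.

```python
from itertools import islice
from typing import Any, List


def take_documents(loader, limit: int) -> List[Any]:
    """Return at most `limit` Documents from the loader.

    islice() stops pulling from the iterator once `limit` items have been
    produced, so URLs after that point are never scraped.
    """
    return list(islice(loader.lazy_load(), limit))
```

For example, `take_documents(loader, 2)` on a loader configured with ten URLs would be expected to scrape only the first two.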

load() → List[Document]

Load and return all Documents from the provided URLs.

Returns: A list of Document objects containing the scraped content from each URL.
Return type: List[Document]
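
The returned list can be post-processed like any other list of Documents. The sketch below indexes the scraped HTML by origin; `html_by_source` is a hypothetical helper, and the presence of a `"source"` key in each Document's metadata is an assumption that this page does not state, hence the fallback key.

```python
from typing import Dict


def html_by_source(loader) -> Dict[str, str]:
    """Map each Document's source to its scraped HTML."""
    pages: Dict[str, str] = {}
    for index, doc in enumerate(loader.load()):
        # Assumption: metadata carries a "source" entry holding the URL.
        # If it does not, fall back to a positional key.
        key = doc.metadata.get("source", f"document-{index}")
        pages[key] = doc.page_content
    return pages
```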

load_and_split(text_splitter: Optional[TextSplitter] = None) → List[Document]

Load Documents and split into chunks. Chunks are returned as Documents.

Parameters: text_splitter (Optional[TextSplitter]) – TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.
Returns: List of Documents.
Return type: List[Document]
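
Passing an explicit splitter gives control over chunk size. The sketch below is one possible way to do that, not a prescribed pattern: `load_chunks` and its `splitter_factory` parameter are hypothetical, and the default import assumes the `langchain_text_splitters` package is installed.

```python
from typing import Any, Callable, List, Optional


def load_chunks(
    loader,
    chunk_size: int = 1000,
    chunk_overlap: int = 100,
    splitter_factory: Optional[Callable[..., Any]] = None,
) -> List[Any]:
    """Load all pages and split them into chunks of roughly `chunk_size`."""
    if splitter_factory is None:
        # Assumption: langchain_text_splitters is available in the environment.
        from langchain_text_splitters import RecursiveCharacterTextSplitter

        splitter_factory = RecursiveCharacterTextSplitter
    splitter = splitter_factory(
        chunk_size=chunk_size, chunk_overlap=chunk_overlap
    )
    return loader.load_and_split(text_splitter=splitter)
```

Raw HTML tends to split poorly on character boundaries, so in practice it may be worth converting the scraped pages to text before chunking.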

Examples using AsyncChromiumLoader