`langchain_community.document_loaders.chromium`.AsyncChromiumLoader

class langchain_community.document_loaders.chromium.AsyncChromiumLoader(urls: List[str])

Scrape HTML pages from URLs using a headless instance of Chromium.

Initialize the loader with a list of URL paths.

Parameters: urls (List[str]) – A list of URLs to scrape content from.
Raises: ImportError – If the required ‘playwright’ package is not installed.

Methods

`__init__`(urls)	Initialize the loader with a list of URL paths.
`alazy_load`()	A lazy loader for Documents.
`ascrape_playwright`(url)	Asynchronously scrape the content of a given URL using Playwright's async API.
`lazy_load`()	Lazily load text content from the provided URLs.
`load`()	Load data into Document objects.
`load_and_split`([text_splitter])	Load Documents and split into chunks.

__init__(urls: List[str])

Initialize the loader with a list of URL paths.

Parameters: urls (List[str]) – A list of URLs to scrape content from.
Raises: ImportError – If the required ‘playwright’ package is not installed.

async alazy_load() → AsyncIterator[Document]

A lazy loader for Documents.
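Because `alazy_load` returns an async iterator, it is consumed with `async for` inside a coroutine. The following is a minimal sketch, not the library's own code: `collect_documents` is a hypothetical helper, and `loader` is assumed to be any object exposing `alazy_load()` as declared above.

```python
import asyncio
from typing import Any, List


async def collect_documents(loader: Any) -> List[Any]:
    """Drain loader.alazy_load() into a list (hypothetical helper)."""
    docs: List[Any] = []
    # alazy_load() is an async iterator, so it is iterated with `async for`
    # rather than awaited directly.
    async for doc in loader.alazy_load():
        docs.append(doc)
    return docs
```

From synchronous code, the coroutine could be driven with `asyncio.run(collect_documents(loader))`.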

async ascrape_playwright(url: str) → str

Asynchronously scrape the content of a given URL using Playwright’s async API.

Parameters: url (str) – The URL to scrape.
Returns: The scraped HTML content, or an error message if an exception occurs.
Return type: str
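Since failures are returned as a string rather than raised, callers may want to tell the two outcomes apart. The sketch below is illustrative only: `scrape_or_none` is a hypothetical helper, and the `"Error:"` prefix it checks is an assumption about the message format that should be verified against the installed version.

```python
import asyncio
from typing import Any, Optional


async def scrape_or_none(loader: Any, url: str) -> Optional[str]:
    """Return scraped HTML, or None if the loader reported a failure."""
    html = await loader.ascrape_playwright(url)
    # Assumption: error messages start with "Error:". This prefix is not
    # stated in the documentation above; confirm it before relying on it.
    if html.startswith("Error:"):
        return None
    return html
```

A real page whose HTML happens to begin with that prefix would be misclassified, so this check is a heuristic rather than a guarantee.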

lazy_load() → Iterator[Document]

Lazily load text content from the provided URLs.

This method yields Documents one at a time as they’re scraped, instead of waiting to scrape all URLs before returning.

Yields: Document – The scraped content encapsulated within a Document object.
Return type: Iterator[Document]
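One practical consequence of lazy loading is that iteration can stop early, so URLs past the stopping point are never scraped. A minimal sketch, assuming `loader` is any object with a `lazy_load()` generator; `take_documents` is a hypothetical helper name.

```python
from itertools import islice
from typing import Any, List


def take_documents(loader: Any, limit: int) -> List[Any]:
    """Return at most `limit` Documents from loader.lazy_load()."""
    # islice stops pulling from the generator once `limit` items have been
    # produced, so the remaining URLs are not processed.
    return list(islice(loader.lazy_load(), limit))
```

By contrast, `load()` returns a complete list, so every URL is processed before anything is returned.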

load() → List[Document]

Load data into Document objects.

load_and_split(text_splitter: Optional[TextSplitter] = None) → List[Document]

Load Documents and split into chunks. Chunks are returned as Documents.

Do not override this method; treat it as deprecated.

Parameters: text_splitter (Optional[TextSplitter]) – TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.
Returns: List of Documents.
Return type: List[Document]
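Given that this method should be considered deprecated, one alternative is to call `load()` and pass the result to a text splitter's `split_documents` method. The sketch below assumes `splitter` is a `TextSplitter` instance (for example a `RecursiveCharacterTextSplitter`); `load_then_split` is a hypothetical helper, not part of the library.

```python
from typing import Any, List


def load_then_split(loader: Any, splitter: Any) -> List[Any]:
    """Load all Documents, then split them into chunk Documents."""
    docs = loader.load()
    # TextSplitter.split_documents takes a list of Documents and returns
    # the resulting chunks, also as Documents.
    return splitter.split_documents(docs)
```

Keeping the two steps separate also makes it possible to transform the raw HTML (for instance, to plain text) before splitting.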

Examples using AsyncChromiumLoader
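A minimal end-to-end sketch, assuming `playwright` is installed (`pip install playwright`) and its browser binaries have been downloaded (`playwright install`). `load_rendered_html` is a hypothetical wrapper written for illustration; only the loader and its `load()` method come from the reference above.

```python
from typing import List


def load_rendered_html(urls: List[str]) -> List[str]:
    """Return the HTML of each URL as rendered by headless Chromium."""
    # Imported inside the function so that defining it does not require the
    # optional dependencies; the loader raises ImportError if playwright
    # is not installed.
    from langchain_community.document_loaders import AsyncChromiumLoader

    loader = AsyncChromiumLoader(urls)
    docs = loader.load()
    # Each Document wraps the scraped content in page_content.
    return [doc.page_content for doc in docs]
```

It could be called as `load_rendered_html(["https://example.com"])`, where the URL is a placeholder.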