langchain_community.document_loaders.chromium.AsyncChromiumLoader

class langchain_community.document_loaders.chromium.AsyncChromiumLoader(urls: List[str])

Scrape HTML pages from URLs using a headless instance of Chromium.

Initialize the loader with a list of URL paths.

Parameters

urls (List[str]) – A list of URLs to scrape content from.

Raises

ImportError – If the required ‘playwright’ package is not installed.
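A minimal usage sketch, assuming the `langchain-community` and `playwright` packages are installed and that a Chromium binary has been downloaded for Playwright (typically with `playwright install chromium`). The helper name `load_pages` is illustrative and not part of the library.

```python
from typing import List


def load_pages(urls: List[str]) -> list:
    """Scrape each URL with headless Chromium and return the Documents."""
    # Imported inside the function so this module can still be imported
    # when the optional dependencies are absent.
    from langchain_community.document_loaders import AsyncChromiumLoader

    # Per the reference above, the constructor raises ImportError
    # if the playwright package is not installed.
    loader = AsyncChromiumLoader(urls)
    return loader.load()
```

Calling `load_pages(["https://example.com"])` (a placeholder URL) should return one Document per URL, with the scraped HTML in `page_content`.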

Methods

__init__(urls)

Initialize the loader with a list of URL paths.

alazy_load()

A lazy loader for Documents.

ascrape_playwright(url)

Asynchronously scrape the content of a given URL using Playwright's async API.

lazy_load()

Lazily load text content from the provided URLs.

load()

Load data into Document objects.

load_and_split([text_splitter])

Load Documents and split into chunks.

__init__(urls: List[str])

Initialize the loader with a list of URL paths.

Parameters

urls (List[str]) – A list of URLs to scrape content from.

Raises

ImportError – If the required ‘playwright’ package is not installed.

async alazy_load() → AsyncIterator[Document]

A lazy loader for Documents.

Return type

AsyncIterator[Document]
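As a rough sketch of how an async iterator of Documents might be consumed, the hypothetical helper below (`collect_documents` is not a library function) accepts any object that exposes `alazy_load()`, such as an `AsyncChromiumLoader`, and stops after a fixed number of Documents.

```python
import asyncio
from typing import Any, List


async def collect_documents(loader: Any, limit: int) -> List[Any]:
    """Consume loader.alazy_load() and stop after `limit` Documents."""
    docs: List[Any] = []
    if limit <= 0:
        return docs
    async for doc in loader.alazy_load():
        docs.append(doc)
        if len(docs) >= limit:
            break  # stop iterating once enough Documents were collected
    return docs


# Example call (requires langchain-community, playwright, and network access):
# asyncio.run(collect_documents(AsyncChromiumLoader(["https://example.com"]), limit=1))
```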

async ascrape_playwright(url: str) → str

Asynchronously scrape the content of a given URL using Playwright’s async API.

Parameters

url (str) – The URL to scrape.

Returns

The scraped HTML content or an error message if an exception occurs.

Return type

str
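Because `ascrape_playwright` is a coroutine, several URLs can in principle be scraped concurrently. The sketch below uses `asyncio.gather` for that; this is an illustrative choice, not how the loader itself schedules work, and `scrape_concurrently` is a hypothetical helper name. Since the method is documented to return an error message when scraping fails, callers should inspect each returned string instead of relying on an exception.

```python
import asyncio
from typing import Dict, List


async def scrape_concurrently(urls: List[str]) -> Dict[str, str]:
    """Return a mapping of URL -> scraped HTML (or the returned error text)."""
    # Imported lazily so the module imports without the optional dependencies.
    from langchain_community.document_loaders import AsyncChromiumLoader

    loader = AsyncChromiumLoader(urls)
    # Schedule one ascrape_playwright coroutine per URL and await them together.
    results = await asyncio.gather(
        *(loader.ascrape_playwright(url) for url in urls)
    )
    return dict(zip(urls, results))
```

A call such as `asyncio.run(scrape_concurrently(["https://example.com"]))` needs Playwright, a Chromium binary, and network access. Each concurrent call may start its own browser instance, so a long URL list is probably best processed in small batches.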

lazy_load() → Iterator[Document]

Lazily load text content from the provided URLs.

This method yields Documents one at a time as they’re scraped, instead of waiting to scrape all URLs before returning.

Yields

Document – The scraped content encapsulated within a Document object.

Return type

Iterator[Document]
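One consequence of lazy loading is that a caller can stop early without scraping every URL. A minimal sketch, assuming `loader` is any object with a `lazy_load()` method (for example an `AsyncChromiumLoader`); the helper name `take_documents` is hypothetical.

```python
from itertools import islice
from typing import Any, List


def take_documents(loader: Any, n: int) -> List[Any]:
    """Return at most the first n Documents yielded by loader.lazy_load()."""
    # islice stops pulling from the iterator after n items, so a lazy
    # loader does no work for the remaining URLs.
    return list(islice(loader.lazy_load(), n))
```

For instance, `take_documents(AsyncChromiumLoader(urls), 2)` would be expected to scrape only the first two URLs, whereas `load()` processes the whole list before returning.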

load() → List[Document]

Load data into Document objects.

Return type

List[Document]

load_and_split(text_splitter: Optional[TextSplitter] = None) → List[Document]

Load Documents and split into chunks. Chunks are returned as Documents.

Do not override this method. It should be considered deprecated.

Parameters

text_splitter (Optional[TextSplitter]) – TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.

Returns

List of Documents.

Return type

List[Document]
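Since `load_and_split` is flagged as effectively deprecated, one alternative is to load first and split explicitly. The sketch below assumes the separate `langchain-text-splitters` package is installed; the helper name `load_in_chunks` and the default chunk sizes are illustrative choices, not library defaults.

```python
from typing import List


def load_in_chunks(
    urls: List[str], chunk_size: int = 1000, chunk_overlap: int = 100
) -> list:
    """Scrape the URLs, then split the resulting Documents into chunks."""
    # Imported lazily so the module imports without the optional dependencies.
    from langchain_community.document_loaders import AsyncChromiumLoader
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    loader = AsyncChromiumLoader(urls)
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size, chunk_overlap=chunk_overlap
    )
    # Split explicitly instead of relying on load_and_split.
    return splitter.split_documents(loader.load())
```

Note that the loader returns raw HTML, so the chunks will contain markup unless the Documents are converted to text before splitting.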

Examples using AsyncChromiumLoader