langchain_community.document_loaders.url_playwright.PlaywrightURLLoader
- class langchain_community.document_loaders.url_playwright.PlaywrightURLLoader(urls: List[str], continue_on_failure: bool = True, headless: bool = True, remove_selectors: Optional[List[str]] = None, evaluator: Optional[PlaywrightEvaluator] = None)
Load HTML pages with Playwright and parse them with Unstructured.
This is useful for loading pages that require JavaScript to render.
- urls
List of URLs to load.
- Type
List[str]
- continue_on_failure
If True, continue loading other URLs on failure.
- Type
bool
- headless
If True, the browser will run in headless mode.
- Type
bool
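The effect of `continue_on_failure` can be modelled in plain Python. This is a simplified sketch of the documented behaviour, not the loader's source: `load_with_policy` and `flaky_fetch` are hypothetical names, and `fetch` stands in for the "open the page in a browser and extract its text" step.

```python
import logging
from typing import Callable, Dict, Iterable

logger = logging.getLogger(__name__)


def load_with_policy(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    continue_on_failure: bool = True,
) -> Dict[str, str]:
    """Fetch each URL in order, mimicking the continue_on_failure flag.

    `fetch` is a hypothetical stand-in for browser navigation plus parsing.
    """
    results: Dict[str, str] = {}
    for url in urls:
        try:
            results[url] = fetch(url)
        except Exception as exc:
            if not continue_on_failure:
                # Stop at the first failure and surface the original error.
                raise
            # Log the failure and move on to the next URL.
            logger.error("Error fetching or processing %s: %s", url, exc)
    return results


def flaky_fetch(url: str) -> str:
    # Hypothetical fetcher: fails for any URL containing "broken".
    if "broken" in url:
        raise RuntimeError("navigation failed")
    return f"text of {url}"
```

With `continue_on_failure=True` a failing URL simply has no entry in the result; with `False`, the first exception propagates to the caller.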
Methods
- __init__(urls[, continue_on_failure, ...]) – Load a list of URLs using Playwright.
- aload() – Load the specified URLs with Playwright and create Documents asynchronously.
- lazy_load() – A lazy loader for Documents.
- load() – Load the specified URLs using Playwright and create Document instances.
- load_and_split([text_splitter]) – Load Documents and split into chunks.
- __init__(urls: List[str], continue_on_failure: bool = True, headless: bool = True, remove_selectors: Optional[List[str]] = None, evaluator: Optional[PlaywrightEvaluator] = None)
Load a list of URLs using Playwright.
- async aload() → List[Document]
Load the specified URLs with Playwright and create Documents asynchronously. Use this method when running in a Jupyter notebook environment.
- Returns
A list of Document instances with loaded content.
- Return type
List[Document]
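A Jupyter kernel already runs an asyncio event loop, and Playwright's synchronous API (which `load()` uses) refuses to run inside a running loop; that is the likely reason notebook users are pointed to `aload()`. The sketch below assumes `langchain-community`, `unstructured` and `playwright` (with browsers installed) are available; `in_running_loop`, `load_pages_async` and `load_pages_blocking` are hypothetical helper names.

```python
import asyncio
from typing import Any, List


def in_running_loop() -> bool:
    """Return True when called from inside a running asyncio event loop (e.g. a Jupyter cell)."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


async def load_pages_async(urls: List[str]) -> List[Any]:
    """Load pages with aload(); needs langchain-community, unstructured and playwright."""
    # Imported lazily so that creating the coroutine object needs no extra packages.
    from langchain_community.document_loaders import PlaywrightURLLoader

    loader = PlaywrightURLLoader(urls=urls)
    return await loader.aload()


def load_pages_blocking(urls: List[str]) -> List[Any]:
    """Run load_pages_async from ordinary synchronous code (no loop running yet)."""
    if in_running_loop():
        # asyncio.run() cannot be nested inside a running loop, as in Jupyter.
        raise RuntimeError("an event loop is running; use `await load_pages_async(urls)` instead")
    return asyncio.run(load_pages_async(urls))
```

In a notebook cell, top-level `await` works directly: `docs = await load_pages_async(["https://example.com"])`. In a plain script, call `docs = load_pages_blocking(["https://example.com"])`.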
- load() → List[Document]
Load the specified URLs using Playwright and create Document instances.
- Returns
A list of Document instances with loaded content.
- Return type
List[Document]
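A minimal synchronous sketch, assuming `langchain-community`, `unstructured` and `playwright` are installed and Playwright's browsers have been downloaded (`playwright install`). It assumes `remove_selectors` takes selectors for page elements to strip before parsing, and that each returned Document records its URL in `metadata["source"]`, which lets the caller see which URLs were silently skipped under `continue_on_failure=True`. The helper names `missing_sources` and `load_pages` are illustrative.

```python
from typing import Any, Iterable, List, Optional, Sequence


def missing_sources(urls: Sequence[str], loaded_sources: Iterable[Optional[str]]) -> List[str]:
    """Return the requested URLs that do not appear among the loaded sources."""
    loaded = set(loaded_sources)
    return [url for url in urls if url not in loaded]


def load_pages(urls: List[str], remove_selectors: Optional[List[str]] = None) -> List[Any]:
    """Load pages synchronously; needs langchain-community, unstructured and playwright."""
    # Imported lazily so that defining this helper does not require the dependencies.
    from langchain_community.document_loaders import PlaywrightURLLoader

    loader = PlaywrightURLLoader(
        urls=urls,
        continue_on_failure=True,  # skip failing URLs instead of raising
        headless=True,
        remove_selectors=remove_selectors,  # e.g. ["header", "footer"]
    )
    docs = loader.load()
    # Assumes each Document stores its URL under metadata["source"].
    skipped = missing_sources(urls, (doc.metadata.get("source") for doc in docs))
    if skipped:
        print(f"Skipped {len(skipped)} URL(s): {skipped}")
    return docs
```

For example, `docs = load_pages(["https://example.com"], remove_selectors=["header", "footer"])`. Because `load()` drives Playwright's synchronous API, call it from ordinary scripts rather than from inside a running event loop.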
- load_and_split(text_splitter: Optional[TextSplitter] = None) → List[Document]
Load Documents and split into chunks. Chunks are returned as Documents.
- Parameters
text_splitter – TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.
- Returns
List of Documents.
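Passing an explicit splitter replaces the default `RecursiveCharacterTextSplitter` and makes the chunk size and overlap explicit. This sketch assumes a recent LangChain release where the splitter is importable from `langchain_text_splitters` (older releases exposed it as `langchain.text_splitter`); `load_chunks` is a hypothetical helper, and its up-front parameter check is an addition of this example, not part of the loader.

```python
from typing import Any, List


def load_chunks(urls: List[str], chunk_size: int = 1000, chunk_overlap: int = 100) -> List[Any]:
    """Load pages and split them into overlapping chunks.

    Needs langchain-community, langchain-text-splitters, unstructured and playwright.
    """
    # Fail fast on an invalid configuration before importing anything heavy.
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")

    from langchain_community.document_loaders import PlaywrightURLLoader
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    loader = PlaywrightURLLoader(urls=urls)
    # Each chunk is returned as its own Document.
    return loader.load_and_split(text_splitter=splitter)
```

Splitting copies each parent Document's metadata onto its chunks, so every chunk should still carry the URL it came from.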