langchain_community.document_loaders.url_playwright.UnstructuredHtmlEvaluator

class langchain_community.document_loaders.url_playwright.UnstructuredHtmlEvaluator(remove_selectors: Optional[List[str]] = None)

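Evaluates the page HTML content using the unstructured library. Elements matching remove_selectors are removed from the page first, and the remaining HTML is then partitioned into text.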

Initialize UnstructuredHtmlEvaluator.

Methods

__init__([remove_selectors])

Initialize UnstructuredHtmlEvaluator.

evaluate(page, browser, response)

Synchronously process the HTML content of the page.

evaluate_async(page, browser, response)

Asynchronously process the HTML content of the page.

Parameters

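remove_selectors (Optional[List[str]]) – CSS selectors for elements to remove from the page before its text is extracted. Defaults to None, in which case nothing is removed.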

__init__(remove_selectors: Optional[List[str]] = None)

Initialize UnstructuredHtmlEvaluator.

Parameters

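remove_selectors (Optional[List[str]]) – CSS selectors for elements to remove from the page before its text is extracted. Defaults to None, in which case nothing is removed.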
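
A minimal construction sketch, assuming langchain_community and its optional unstructured dependency are installed. The selector values and the helper name make_evaluator are illustrative and not part of the library; the import is deferred so the snippet loads even without those packages.

```python
from typing import List, Optional

# Illustrative CSS selectors; choose ones that match the pages you load.
REMOVE_SELECTORS: List[str] = ["header", "footer", "nav"]


def make_evaluator(remove_selectors: Optional[List[str]] = None):
    """Build an UnstructuredHtmlEvaluator (hypothetical helper).

    The import happens inside the function so that this module can be
    imported even when the optional dependencies are missing.
    """
    from langchain_community.document_loaders.url_playwright import (
        UnstructuredHtmlEvaluator,
    )

    return UnstructuredHtmlEvaluator(remove_selectors=remove_selectors)
```

The object returned by make_evaluator(REMOVE_SELECTORS) can, for example, be passed to PlaywrightURLLoader through its evaluator argument.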

evaluate(page: Page, browser: Browser, response: Response) → str

Synchronously process the HTML content of the page.

Parameters
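  • page (Page) – The Playwright page to process.

  • browser (Browser) – The browser instance that opened the page.

  • response (Response) – The response returned by page.goto().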

Return type

str
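
The synchronous call sequence can be sketched as follows. This is an illustration under assumptions, not the loader's actual implementation: evaluate_url and run_with_playwright are hypothetical helper names, the selectors are examples, and run_with_playwright needs playwright (with a Chromium build installed), unstructured, and langchain_community.

```python
from typing import Any


def evaluate_url(evaluator: Any, browser: Any, url: str) -> str:
    """Open url in a new page and return the evaluator's extracted text."""
    page = browser.new_page()
    response = page.goto(url)
    if response is None:
        # page.goto() can return None, for example when navigating to about:blank.
        raise ValueError(f"page.goto() returned None for url {url}")
    return evaluator.evaluate(page, browser, response)


def run_with_playwright(url: str) -> str:
    """Hypothetical driver that wires a real browser to the evaluator."""
    # Deferred imports: these packages are optional dependencies.
    from langchain_community.document_loaders.url_playwright import (
        UnstructuredHtmlEvaluator,
    )
    from playwright.sync_api import sync_playwright

    evaluator = UnstructuredHtmlEvaluator(remove_selectors=["header", "footer"])
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            return evaluate_url(evaluator, browser, url)
        finally:
            browser.close()
```

Keeping evaluate_url free of Playwright imports means it accepts any objects with the same methods, which makes the call order easy to exercise with stand-ins.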

async evaluate_async(page: AsyncPage, browser: AsyncBrowser, response: AsyncResponse) → str

Asynchronously process the HTML content of the page.

Parameters
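  • page (AsyncPage) – The Playwright async page to process.

  • browser (AsyncBrowser) – The async browser instance that opened the page.

  • response (AsyncResponse) – The response returned by awaiting page.goto().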

Return type

str
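
An asynchronous counterpart, again as a hedged sketch: evaluate_url_async, run_with_playwright_async, and fetch_text are hypothetical names, and the driver assumes playwright (with Chromium), unstructured, and langchain_community are installed.

```python
import asyncio
from typing import Any


async def evaluate_url_async(evaluator: Any, browser: Any, url: str) -> str:
    """Open url in a new page and return the evaluator's extracted text."""
    page = await browser.new_page()
    response = await page.goto(url)
    if response is None:
        # page.goto() can resolve to None, for example for about:blank.
        raise ValueError(f"page.goto() returned None for url {url}")
    return await evaluator.evaluate_async(page, browser, response)


async def run_with_playwright_async(url: str) -> str:
    """Hypothetical driver that wires a real async browser to the evaluator."""
    # Deferred imports: these packages are optional dependencies.
    from langchain_community.document_loaders.url_playwright import (
        UnstructuredHtmlEvaluator,
    )
    from playwright.async_api import async_playwright

    evaluator = UnstructuredHtmlEvaluator()
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        try:
            return await evaluate_url_async(evaluator, browser, url)
        finally:
            await browser.close()


def fetch_text(url: str) -> str:
    """Blocking convenience wrapper for callers without a running event loop."""
    return asyncio.run(run_with_playwright_async(url))
```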