
LangChain 0.0.349


langchain_core.utils.html.extract_sub_links

langchain_core.utils.html.extract_sub_links(raw_html: str, url: str, *, base_url: Optional[str] = None, pattern: Optional[Union[str, Pattern]] = None, prevent_outside: bool = True, exclude_prefixes: Sequence[str] = ()) → List[str]

Extract all links from a raw HTML string and convert them into absolute URLs.

Parameters
  • raw_html – The original HTML.

  • url – The URL of the page the HTML came from.

  • base_url – The base URL against which outside links are checked.

  • pattern – Regex used to extract links from the raw HTML.

  • prevent_outside – If True, ignore external links which are not children of the base url.

  • exclude_prefixes – Exclude any URLs that start with one of these prefixes.

Returns

The extracted sub-links.

Return type

List[str]
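Example

The sketch below illustrates how the parameters described above interact. It is a simplified stand-in built on the standard library, not the langchain_core implementation: the name `sketch_extract_sub_links`, the `HREF_PATTERN` regex, the fallback of `base_url` to `url`, and the rule for skipping anchor-only links are all assumptions made for illustration. The real function may differ in ordering, deduplication, and edge cases.

```python
import re
from typing import List, Optional, Sequence
from urllib.parse import urljoin

# Hypothetical, deliberately simple pattern; the library's default regex may differ.
HREF_PATTERN = re.compile(r'href=["\'](.*?)["\']')


def sketch_extract_sub_links(
    raw_html: str,
    url: str,
    *,
    base_url: Optional[str] = None,
    prevent_outside: bool = True,
    exclude_prefixes: Sequence[str] = (),
) -> List[str]:
    """Illustrative stand-in, NOT the langchain_core implementation."""
    # Assumption: when no base_url is given, the page URL itself is the base.
    base = base_url if base_url is not None else url
    seen = set()
    results = []
    for link in HREF_PATTERN.findall(raw_html):
        # Assumption: skip empty links, in-page anchors, and non-navigational schemes.
        if not link or link.startswith(("#", "mailto:", "javascript:")):
            continue
        # Resolve relative links against the page URL.
        absolute = urljoin(url, link)
        # Drop anything matching an excluded prefix.
        if any(absolute.startswith(prefix) for prefix in exclude_prefixes):
            continue
        # With prevent_outside, keep only links under the base URL.
        if prevent_outside and not absolute.startswith(base):
            continue
        if absolute not in seen:
            seen.add(absolute)
            results.append(absolute)
    return results


html = (
    '<a href="/docs/intro">Intro</a>'
    '<a href="guide.html">Guide</a>'
    '<a href="https://other.example.org/page">External</a>'
    '<a href="/private/admin">Admin</a>'
)
links = sketch_extract_sub_links(
    html,
    "https://example.com/docs/index.html",
    base_url="https://example.com",
    exclude_prefixes=("https://example.com/private",),
)
print(links)
```

In this sketch the two relative links are resolved against `url`, the link to `other.example.org` is dropped because `prevent_outside` is True, and the `/private/admin` link is dropped by `exclude_prefixes`.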

© 2023, Harrison Chase. Last updated on Dec 14, 2023.