langchain_core.utils.html.extract_sub_links

langchain_core.utils.html.extract_sub_links(raw_html: str, url: str, *, base_url: Optional[str] = None, pattern: Optional[Union[str, Pattern]] = None, prevent_outside: bool = True, exclude_prefixes: Sequence[str] = ()) → List[str]

Extract all links from a raw HTML string and convert them into absolute paths.

Parameters
  • raw_html – The original HTML.

  • url – The URL the HTML was retrieved from.

  • base_url – The base URL to check outside links against.

  • pattern – Regex to use for extracting links from the raw HTML.

  • prevent_outside – If True, ignore external links that are not children of the base URL.

  • exclude_prefixes – Exclude any URLs that start with one of these prefixes.

Returns

The extracted sub links, as absolute paths.

Return type

List[str]
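
Example

To make the interaction between the parameters concrete, here is a minimal standard-library sketch of the behaviour described above. It is not the library's implementation: the function name `sketch_extract_sub_links`, the default `href` regex, and the use of plain string-prefix comparison for the "outside link" check are all assumptions made for illustration, and the real function may differ in details such as ordering, de-duplication, or how it compares URLs.

```python
import re
from typing import List, Optional, Sequence, Union
from urllib.parse import urljoin

# Assumed default pattern for this sketch only: capture quoted href values.
_HREF_PATTERN = re.compile(r"""href=["']([^"']+)["']""")


def sketch_extract_sub_links(
    raw_html: str,
    url: str,
    *,
    base_url: Optional[str] = None,
    pattern: Optional[Union[str, re.Pattern]] = None,
    prevent_outside: bool = True,
    exclude_prefixes: Sequence[str] = (),
) -> List[str]:
    """Illustrative stand-in that mirrors the documented signature."""
    # Assumption: fall back to the page URL when no base URL is given.
    base = base_url if base_url is not None else url

    # Accept either a regex string or a compiled pattern.
    if pattern is None:
        regex = _HREF_PATTERN
    elif isinstance(pattern, str):
        regex = re.compile(pattern)
    else:
        regex = pattern

    results: List[str] = []
    for link in regex.findall(raw_html):
        # Resolve relative links against the URL of the page.
        absolute = urljoin(url, link)
        # Drop anything that starts with an excluded prefix.
        if any(absolute.startswith(prefix) for prefix in exclude_prefixes):
            continue
        # Assumption: "outside" means "does not start with the base URL".
        if prevent_outside and not absolute.startswith(base):
            continue
        # Keep first occurrences only, preserving document order.
        if absolute not in results:
            results.append(absolute)
    return results


html = """
<a href="/docs/intro">Intro</a>
<a href="guide.html">Guide</a>
<a href="https://other.example.org/page">Other</a>
<a href="/private/admin">Admin</a>
"""

links = sketch_extract_sub_links(
    html,
    "https://example.com/docs/index.html",
    base_url="https://example.com",
    exclude_prefixes=("https://example.com/private",),
)
print(links)
# ['https://example.com/docs/intro', 'https://example.com/docs/guide.html']
```

In the sketch, the link to `other.example.org` is dropped because `prevent_outside` is True, and the `/private/admin` link is dropped by `exclude_prefixes`. With the library installed, the corresponding call would take the same arguments: `extract_sub_links(html, url, base_url=..., exclude_prefixes=...)`.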

© 2023, LangChain, Inc. Last updated on Jan 29, 2024.