langchain_community.document_loaders.parsers.vsdx.VsdxParser
- class langchain_community.document_loaders.parsers.vsdx.VsdxParser
Parser for vsdx files.
Methods
__init__()
get_pages_content(zfile, source) – Get the content of the pages of a vsdx file.
get_relationships(page, zfile, filelist, ...) – Get the relationships of a page and the relationships of its relationships, etc.
lazy_parse(blob) – Retrieve the contents of pages from a .vsdx file and insert them into documents, one document per page.
parse(blob) – Parse a vsdx file.
- __init__()
- get_pages_content(zfile: ZipFile, source: str) → List[Tuple[int, str, str]]
Get the content of the pages of a vsdx file.
- Parameters
zfile (zipfile.ZipFile) – The vsdx file, opened as a ZIP archive.
source (str) – The path of the vsdx file.
- Returns
A list of tuples, one per page of the vsdx file, each containing the page number, the page name, and the page content.
- Return type
list[tuple[int, str, str]]
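Because zfile is a zipfile.ZipFile, a .vsdx file can be inspected with the standard library alone. The following is a minimal sketch, not the parser's implementation: it builds a tiny in-memory archive and lists the parts that look like page parts. The member names (visio/pages/page1.xml and so on) and the helper list_page_parts are assumptions made for illustration.

```python
import io
import zipfile

# Build a tiny in-memory archive that mimics the layout of a .vsdx package.
# The member names below are illustrative assumptions, not read from a real file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("visio/pages/pages.xml", "<Pages/>")
    archive.writestr("visio/pages/page1.xml", "<PageContents/>")
    archive.writestr("visio/pages/page2.xml", "<PageContents/>")
    archive.writestr("docProps/app.xml", "<Properties/>")


def list_page_parts(zfile: zipfile.ZipFile) -> list:
    """Return the archive members that look like individual page parts."""
    return sorted(
        name
        for name in zfile.namelist()
        if name.startswith("visio/pages/page") and name != "visio/pages/pages.xml"
    )


# Re-open the same bytes for reading, as one would open a file on disk.
with zipfile.ZipFile(buffer) as zfile:
    page_parts = list_page_parts(zfile)

print(page_parts)
```

With a real file, zipfile.ZipFile("diagram.vsdx") (a hypothetical path) would take the place of the in-memory buffer.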
- get_relationships(page: str, zfile: ZipFile, filelist: List[str], pagexml_rels: List[dict]) → Set[str]
Recursively get the relationships of a page, the relationships of those relationships, and so on. Pages can be based on other pages (for example, a background page), so every relationship must be followed to collect all the content of a single page.
- Parameters
page (str) –
zfile (ZipFile) –
filelist (List[str]) –
pagexml_rels (List[dict]) –
- Return type
Set[str]
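The recursive traversal described above can be sketched with the standard library. This is an illustration of the idea, not the library's code: the helpers rels_path, collect_related, and rels_xml are hypothetical, the archive contents are made up, and the sketch assumes the usual package convention that the relationships of a part live in a sibling _rels/<name>.rels file.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by package relationship (.rels) files.
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def rels_path(part: str) -> str:
    """Return the name of the .rels member that belongs to `part`."""
    folder, name = posixpath.split(part)
    return posixpath.join(folder, "_rels", name + ".rels")


def collect_related(part: str, zfile: zipfile.ZipFile, seen: set = None) -> set:
    """Recursively collect every part reachable from `part` through .rels files."""
    seen = set() if seen is None else seen
    rels = rels_path(part)
    if rels not in zfile.namelist():
        return seen  # this part has no relationships of its own
    root = ET.fromstring(zfile.read(rels))
    for rel in root.iter(REL_NS + "Relationship"):
        # Targets are relative to the folder of the part that owns them.
        target = posixpath.normpath(
            posixpath.join(posixpath.dirname(part), rel.get("Target"))
        )
        if target not in seen:  # the visited set also guards against cycles
            seen.add(target)
            collect_related(target, zfile, seen)
    return seen


def rels_xml(*targets: str) -> str:
    """Build a minimal .rels document pointing at the given targets."""
    items = "".join(
        f'<Relationship Id="rId{i}" Type="urn:example" Target="{t}"/>'
        for i, t in enumerate(targets, start=1)
    )
    ns = "http://schemas.openxmlformats.org/package/2006/relationships"
    return f'<Relationships xmlns="{ns}">{items}</Relationships>'


# Made-up package: page1 is based on page2 (a background), which uses a master.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("visio/pages/page1.xml", "<PageContents/>")
    archive.writestr("visio/pages/page2.xml", "<PageContents/>")
    archive.writestr("visio/masters/master1.xml", "<MasterContents/>")
    archive.writestr("visio/pages/_rels/page1.xml.rels", rels_xml("page2.xml"))
    archive.writestr(
        "visio/pages/_rels/page2.xml.rels", rels_xml("../masters/master1.xml")
    )

with zipfile.ZipFile(buffer) as zfile:
    related = collect_related("visio/pages/page1.xml", zfile)

print(sorted(related))
```

Starting from page1, the sketch reaches the master only through page2, which is why a single level of lookup would miss part of the page's content.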