langchain_community.document_loaders.doc_intelligence.AzureAIDocumentIntelligenceLoader
- class langchain_community.document_loaders.doc_intelligence.AzureAIDocumentIntelligenceLoader(api_endpoint: str, api_key: str, file_path: Optional[str] = None, url_path: Optional[str] = None, api_version: Optional[str] = None, api_model: str = 'prebuilt-layout', mode: str = 'markdown')
Loads a PDF with Azure Document Intelligence.
Initialize the object for file processing with Azure Document Intelligence (formerly Form Recognizer).
This constructor initializes an AzureAIDocumentIntelligenceParser object, which parses files using the Azure Document Intelligence API. The load method generates Documents whose content representation is determined by the mode parameter.
Parameters:
- api_endpoint: str
The API endpoint to use for DocumentIntelligenceClient construction.
- api_key: str
The API key to use for DocumentIntelligenceClient construction.
- file_path: Optional[str]
The path to the file that needs to be loaded. Either file_path or url_path must be specified.
- url_path: Optional[str]
The URL to the file that needs to be loaded. Either file_path or url_path must be specified.
- api_version: Optional[str]
The API version for DocumentIntelligenceClient. Set to None to use the SDK's default value.
- api_model: str
The model name or ID to be used for form recognition in Azure. Defaults to 'prebuilt-layout'.
- mode: str
The type of content representation of the generated Documents. Defaults to 'markdown'.
Examples:
>>> obj = AzureAIDocumentIntelligenceLoader(
...     file_path="path/to/file",
...     api_endpoint="https://endpoint.azure.com",
...     api_key="APIKEY",
...     api_version="2023-10-31-preview",
...     api_model="prebuilt-document",
...     mode="markdown"
... )
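The constructor arguments described above can be collected and checked before the loader is created. The following is a minimal sketch, not part of the library: `build_loader_kwargs` and `make_loader` are hypothetical helper names, and the check that file_path or url_path is present simply mirrors the parameter descriptions.

```python
from typing import Any, Dict, Optional


def build_loader_kwargs(
    api_endpoint: str,
    api_key: str,
    file_path: Optional[str] = None,
    url_path: Optional[str] = None,
    api_model: str = "prebuilt-layout",
    mode: str = "markdown",
) -> Dict[str, Any]:
    """Hypothetical helper: collect keyword arguments for the loader."""
    # Mirrors the documented rule that file_path or url_path must be given.
    if file_path is None and url_path is None:
        raise ValueError("Either file_path or url_path must be specified.")
    kwargs: Dict[str, Any] = {
        "api_endpoint": api_endpoint,
        "api_key": api_key,
        "api_model": api_model,
        "mode": mode,
    }
    # Pass along only the source that was actually provided.
    if file_path is not None:
        kwargs["file_path"] = file_path
    if url_path is not None:
        kwargs["url_path"] = url_path
    return kwargs


def make_loader(**kwargs: Any):
    """Hypothetical helper: build the loader from the collected arguments."""
    # Imported lazily so build_loader_kwargs works without langchain_community.
    from langchain_community.document_loaders.doc_intelligence import (
        AzureAIDocumentIntelligenceLoader,
    )

    return AzureAIDocumentIntelligenceLoader(**kwargs)
```

Calling `make_loader(**build_loader_kwargs(...))` requires langchain_community and valid Azure credentials; the validation step alone does not.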
Methods
__init__(api_endpoint, api_key[, file_path, ...])
Initialize the object for file processing with Azure Document Intelligence (formerly Form Recognizer).
alazy_load()
A lazy loader for Documents.
lazy_load()
Lazy load given path as pages.
load()
Load data into Document objects.
load_and_split([text_splitter])
Load Documents and split into chunks.
- __init__(api_endpoint: str, api_key: str, file_path: Optional[str] = None, url_path: Optional[str] = None, api_version: Optional[str] = None, api_model: str = 'prebuilt-layout', mode: str = 'markdown') -> None
Initialize the object for file processing with Azure Document Intelligence (formerly Form Recognizer).
- Parameters
api_endpoint (str)
api_key (str)
file_path (Optional[str])
url_path (Optional[str])
api_version (Optional[str])
api_model (str)
mode (str)
- Return type
None
- async alazy_load() -> AsyncIterator[Document]
A lazy loader for Documents.
- Return type
AsyncIterator[Document]
- lazy_load() -> Iterator[Document]
Lazy load given path as pages.
- Return type
Iterator[Document]
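Because lazy_load returns an iterator, a caller can stop after the first few items without consuming the rest. The sketch below illustrates that pattern only: `fake_lazy_load` is a hypothetical stand-in that yields plain strings rather than Documents, and `take` is an illustrative helper, not a library function.

```python
from itertools import islice
from typing import Iterator, List


def fake_lazy_load() -> Iterator[str]:
    """Hypothetical stand-in for loader.lazy_load(); yields strings, not Documents."""
    for i in range(1, 6):
        yield f"page {i}"


def take(source: Iterator[str], n: int) -> List[str]:
    """Pull at most n items from the iterator and leave the rest unconsumed."""
    return list(islice(source, n))


first_two = take(fake_lazy_load(), 2)
```

With a real loader, `take(loader.lazy_load(), 2)` would follow the same shape, assuming the loader was constructed with valid credentials.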
- load_and_split(text_splitter: Optional[TextSplitter] = None) -> List[Document]
Load Documents and split into chunks. Chunks are returned as Documents.
Do not override this method; it should be considered deprecated.
- Parameters
text_splitter (Optional[TextSplitter]) – TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.
- Returns
List of Documents.
- Return type
List[Document]
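Since load_and_split should be considered deprecated, one alternative is to load and split in two explicit steps. This is a minimal sketch under assumptions: `load_then_split` is a hypothetical helper, and the stub classes stand in for a real loader and a TextSplitter (which exposes `split_documents`) so the example runs without Azure credentials.

```python
from typing import Any, List


def load_then_split(loader: Any, text_splitter: Any) -> List[Any]:
    """Load all Documents, then split them into chunks as a separate step."""
    docs = loader.load()
    return text_splitter.split_documents(docs)


class StubLoader:
    """Hypothetical stand-in for a document loader; returns strings, not Documents."""

    def load(self) -> List[str]:
        return ["alpha beta", "gamma"]


class StubSplitter:
    """Hypothetical stand-in for a TextSplitter; splits on whitespace."""

    def split_documents(self, documents: List[str]) -> List[str]:
        return [word for doc in documents for word in doc.split()]


chunks = load_then_split(StubLoader(), StubSplitter())
```

In real use, the loader would be an AzureAIDocumentIntelligenceLoader and the splitter a TextSplitter such as RecursiveCharacterTextSplitter.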