langchain_community.document_loaders.parsers.generic.MimeTypeBasedParser
- class langchain_community.document_loaders.parsers.generic.MimeTypeBasedParser(handlers: Mapping[str, BaseBlobParser], *, fallback_parser: Optional[BaseBlobParser] = None)
Parser that uses mime-types to parse a blob.
This parser is useful for simple pipelines where the mime-type is sufficient to determine how to parse a blob.
To use, configure handlers based on mime-types and pass them to the initializer.
Example
    from langchain_community.document_loaders.parsers.generic import MimeTypeBasedParser

    parser = MimeTypeBasedParser(
        handlers={
            "application/pdf": ...,
        },
        fallback_parser=...,
    )
Define a parser that uses mime-types to determine how to parse a blob.
- Parameters
handlers – A mapping from mime-types to parsers (BaseBlobParser instances), each of which takes a blob, parses it, and returns documents.
fallback_parser – A parser to use when the blob's mime-type is not found in the handlers. If provided, it is used to parse blobs of every mime-type that has no handler. If not provided, a ValueError is raised when the mime-type is not found in the handlers.
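The dispatch rule these parameters describe can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: SketchBlob, UpperCaseParser, EchoParser and dispatch are hypothetical stand-ins invented here so that the example runs without LangChain installed, and "documents" are plain strings.

```python
from dataclasses import dataclass
from typing import Iterator, Mapping, Optional, Protocol


@dataclass
class SketchBlob:
    """Hypothetical stand-in for a blob: raw text plus an optional mime-type."""
    data: str
    mimetype: Optional[str] = None


class SketchParser(Protocol):
    """Hypothetical parser interface: anything with a lazy_parse method."""
    def lazy_parse(self, blob: SketchBlob) -> Iterator[str]: ...


class UpperCaseParser:
    """Toy handler that 'parses' a blob by upper-casing its text."""
    def lazy_parse(self, blob: SketchBlob) -> Iterator[str]:
        yield blob.data.upper()


class EchoParser:
    """Toy fallback that returns the text unchanged."""
    def lazy_parse(self, blob: SketchBlob) -> Iterator[str]:
        yield blob.data


def dispatch(
    blob: SketchBlob,
    handlers: Mapping[str, SketchParser],
    fallback_parser: Optional[SketchParser] = None,
) -> Iterator[str]:
    """Pick a parser by mime-type, else use the fallback, else raise."""
    handler = handlers.get(blob.mimetype)
    if handler is not None:
        yield from handler.lazy_parse(blob)
    elif fallback_parser is not None:
        yield from fallback_parser.lazy_parse(blob)
    else:
        raise ValueError(f"Unsupported mime type: {blob.mimetype}")


handlers = {"text/plain": UpperCaseParser()}
handled = list(dispatch(SketchBlob("hi", "text/plain"), handlers))
fell_back = list(
    dispatch(SketchBlob("hi", "text/csv"), handlers, fallback_parser=EchoParser())
)
```

Because dispatch is a generator in this sketch, the ValueError for an unhandled mime-type surfaces only when the result is iterated, not when dispatch is called.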
Methods
__init__(handlers, *[, fallback_parser]) – Define a parser that uses mime-types to determine how to parse a blob.
lazy_parse(blob) – Load documents from a blob.
parse(blob) – Eagerly parse the blob into a document or documents.
- __init__(handlers: Mapping[str, BaseBlobParser], *, fallback_parser: Optional[BaseBlobParser] = None) -> None
Define a parser that uses mime-types to determine how to parse a blob.
- Parameters
handlers – A mapping from mime-types to parsers (BaseBlobParser instances), each of which takes a blob, parses it, and returns documents.
fallback_parser – A parser to use when the blob's mime-type is not found in the handlers. If provided, it is used to parse blobs of every mime-type that has no handler. If not provided, a ValueError is raised when the mime-type is not found in the handlers.
- parse(blob: Blob) -> List[Document]
Eagerly parse the blob into a document or documents.
This is a convenience method for interactive development environments.
Production applications should favor the lazy_parse method instead.
Subclasses should generally not override this parse method.
- Parameters
blob – Blob instance
- Returns
List of documents
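The difference between eager and lazy parsing can be illustrated with a small stand-alone sketch. LineParser below is a hypothetical class written for this example (it takes a plain string rather than a Blob and yields plain strings rather than Document objects); it only assumes the general pattern in which the eager method materializes whatever the lazy method yields.

```python
from typing import Iterator, List


class LineParser:
    """Hypothetical parser: one 'document' (a plain string here) per line."""

    def lazy_parse(self, text: str) -> Iterator[str]:
        # Yield documents one at a time; nothing is produced until requested.
        for line in text.splitlines():
            yield line

    def parse(self, text: str) -> List[str]:
        # Eager variant: collect everything the lazy variant yields.
        return list(self.lazy_parse(text))


parser = LineParser()
text = "first\nsecond\nthird"

all_docs = parser.parse(text)               # the whole list is held in memory
first_doc = next(parser.lazy_parse(text))   # only the first item is produced
```

With large inputs, the lazy form lets a pipeline process or discard each document before the next one is produced, which is one reason to prefer it outside interactive sessions.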