langchain_community.document_loaders.parsers.grobid.GrobidParser
- class langchain_community.document_loaders.parsers.grobid.GrobidParser(segment_sentences: bool, grobid_server: str = 'http://localhost:8070/api/processFulltextDocument')
Load article PDF files using Grobid.
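The class is typically used as the parser of a blob loader rather than called on its own. The sketch below shows one way to wire it up, assuming a Grobid server is already running at the default address from the signature above and that `langchain_community` is installed. The helper name `load_papers` and the constant `GROBID_URL` are illustrative and not part of the library.

```python
from typing import List

# Default endpoint, copied from the class signature above.
GROBID_URL = "http://localhost:8070/api/processFulltextDocument"


def load_papers(pdf_dir: str, segment_sentences: bool = False) -> List:
    """Parse every PDF under pdf_dir with Grobid (hypothetical helper)."""
    # Imported inside the function so that defining the helper does not
    # require langchain_community to be installed.
    from langchain_community.document_loaders.generic import GenericLoader
    from langchain_community.document_loaders.parsers import GrobidParser

    loader = GenericLoader.from_filesystem(
        pdf_dir,
        glob="*",
        suffixes=[".pdf"],
        parser=GrobidParser(
            segment_sentences=segment_sentences,
            grobid_server=GROBID_URL,
        ),
    )
    # load() is eager; loader.lazy_load() yields documents one at a time.
    return loader.load()
```

Calling `load_papers("path/to/papers")` sends each PDF to the Grobid server, so it fails if the server is not reachable.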
Methods
- __init__(segment_sentences[, grobid_server])
- lazy_parse(blob): Lazy parsing interface.
- parse(blob): Eagerly parse the blob into a document or documents.
- process_xml(file_path, xml_data, ...): Process the XML file from Grobid.
- __init__(segment_sentences: bool, grobid_server: str = 'http://localhost:8070/api/processFulltextDocument') -> None
- lazy_parse(blob: Blob) -> Iterator[Document]
Lazy parsing interface.
Subclasses are required to implement this method.
- Parameters
blob – Blob instance
- Returns
Generator of documents
- parse(blob: Blob) -> List[Document]
Eagerly parse the blob into a document or documents.
This is a convenience method for interactive development environments.
Production applications should favor the lazy_parse method instead.
Subclasses should generally not override this parse method.
- Parameters
blob – Blob instance
- Returns
List of documents
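The relationship between the two methods can be illustrated without a Grobid server. The following is a minimal sketch, assuming that parse simply collects the output of lazy_parse into a list. `SketchParser` is a hypothetical stand-in that treats plain strings as blobs and documents; it is not the real GrobidParser.

```python
from typing import Iterator, List


class SketchParser:
    """Hypothetical stand-in for a blob parser (not the real GrobidParser)."""

    def lazy_parse(self, blob: str) -> Iterator[str]:
        # Yield one "document" per non-empty line, only when requested.
        for line in blob.splitlines():
            if line.strip():
                yield line.strip()

    def parse(self, blob: str) -> List[str]:
        # Eager variant: drain the generator into a list in one go.
        return list(self.lazy_parse(blob))


parser = SketchParser()
text = "Introduction\n\nMethods\nResults"

docs_iter = parser.lazy_parse(text)  # nothing is processed yet
first = next(docs_iter)              # processes only the first line
all_docs = parser.parse(text)        # processes everything up front
```

With the lazy form, a caller can stop after the first few documents without paying for the rest, which is why it is the better fit for large inputs.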