
class langchain_community.document_transformers.openai_functions.OpenAIMetadataTagger

Bases: BaseDocumentTransformer, BaseModel

Extract metadata tags from document contents using OpenAI functions.

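# create_tagging_chain is provided by the langchain package
from langchain.chains import create_tagging_chain
from langchain_community.chat_models import ChatOpenAI
from langchain_community.document_transformers import OpenAIMetadataTagger
from langchain_core.documents import Document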

schema = {
    "properties": {
        "movie_title": {"type": "string"},
        "critic": {"type": "string"},
        "tone": {
            "type": "string",
            "enum": ["positive", "negative"]
        },
        "rating": {
            "type": "integer",
            "description": "The number of stars the critic rated the movie"
        }
    },
    "required": ["movie_title", "critic", "tone"]
}

# Must be an OpenAI model that supports functions
llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613")
tagging_chain = create_tagging_chain(schema, llm)
document_transformer = OpenAIMetadataTagger(tagging_chain=tagging_chain)
original_documents = [
    Document(
        page_content="Review of The Bee Movie\nBy Roger Ebert\n\nThis is the greatest movie ever made. 4 out of 5 stars."
    ),
    Document(
        page_content="Review of The Godfather\nBy Anonymous\n\nThis movie was super boring. 1 out of 5 stars.",
        metadata={"reliable": False},
    ),
]

enhanced_documents = document_transformer.transform_documents(original_documents)

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

param tagging_chain: Any = None

The chain used to extract metadata from each document.
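
To illustrate how the output of a tagging chain might end up in document metadata, here is a minimal sketch that makes no OpenAI calls. `SimpleDocument`, `FakeTaggingChain`, and `tag_documents` are hypothetical stand-ins and not part of LangChain. The merge order (keys already on the document win over extracted keys) is an assumption that should be checked against the installed version.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Sequence


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class FakeTaggingChain:
    """Hypothetical chain: returns canned tags instead of calling an LLM."""

    def run(self, text: str) -> Dict[str, Any]:
        tone = "negative" if "boring" in text else "positive"
        return {"tone": tone, "reliable": True}


def tag_documents(
    chain: FakeTaggingChain, documents: Sequence[SimpleDocument]
) -> List[SimpleDocument]:
    """Return new documents whose metadata includes the extracted tags."""
    tagged = []
    for doc in documents:
        extracted = chain.run(doc.page_content)
        # Assumption: keys already present on the document take precedence.
        merged = {**extracted, **doc.metadata}
        tagged.append(SimpleDocument(doc.page_content, merged))
    return tagged


docs = [
    SimpleDocument("This is the greatest movie ever made. 4 out of 5 stars."),
    SimpleDocument("This movie was super boring. 1 out of 5 stars.", {"reliable": False}),
]
result = tag_documents(FakeTaggingChain(), docs)
```

In this sketch the input documents are left unmodified and new documents are returned, so the second review keeps its original `reliable: False` value even though the fake chain proposes a different one.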

async atransform_documents(documents: Sequence[Document], **kwargs: Any) → Sequence[Document]

Asynchronously transform a list of documents.


Parameters:
documents – A sequence of Documents to be transformed.

Returns:
A list of transformed Documents.
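
Async support for a given transformer may vary between releases; this page does not say, so it is worth verifying against the installed version. If only the synchronous `transform_documents` is usable, one possible fallback is to offload it to a worker thread. The sketch below assumes Python 3.9+ for `asyncio.to_thread`; `UppercaseTransformer` and `atransform_in_thread` are hypothetical names used only for illustration.

```python
import asyncio
from typing import Any, List, Sequence


class UppercaseTransformer:
    """Hypothetical synchronous transformer used only for this sketch."""

    def transform_documents(self, documents: Sequence[str], **kwargs: Any) -> List[str]:
        return [d.upper() for d in documents]


async def atransform_in_thread(
    transformer: Any, documents: Sequence[Any], **kwargs: Any
) -> Sequence[Any]:
    """Run a blocking transform_documents call without blocking the event loop."""
    return await asyncio.to_thread(transformer.transform_documents, documents, **kwargs)


async def main() -> List[str]:
    transformer = UppercaseTransformer()
    return list(await atransform_in_thread(transformer, ["a review", "another review"]))


transformed = asyncio.run(main())
```

Any object exposing a synchronous `transform_documents` method could be passed in place of the stand-in transformer.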

