Get Started¶
This page demonstrates how to combine Graph Traversal and Vector Search using langchain-graph-retriever with langchain.
Pre-requisites¶
We assume you already have a working langchain installation, including an LLM, an embedding model, and a supported vector store. In that case, you only need to install langchain-graph-retriever:
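For example, with pip (the package is published on PyPI; use the equivalent command for your package manager):

pip install langchain-graph-retriever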
Preparing Data¶
Loading data is exactly the same as it would be for whichever vector store you use. The main thing to consider is which structured information you want to include in the metadata to support traversal.
For this guide, I have a JSON file with information about animals. Several example entries are shown below. The actual file has one entry per line, making it easy to load into Documents.
{
  "id": "alpaca",
  "text": "alpacas are domesticated mammals valued for their soft wool and friendly demeanor.",
  "metadata": {
    "type": "mammal",
    "number_of_legs": 4,
    "keywords": ["wool", "domesticated", "friendly"],
    "origin": "south america"
  }
}
{
  "id": "caribou",
  "text": "caribou, also known as reindeer, are migratory mammals found in arctic regions.",
  "metadata": {
    "type": "mammal",
    "number_of_legs": 4,
    "keywords": ["migratory", "arctic", "herbivore", "tundra"],
    "diet": "herbivorous"
  }
}
{
  "id": "cassowary",
  "text": "cassowaries are flightless birds known for their colorful necks and powerful legs.",
  "metadata": {
    "type": "bird",
    "number_of_legs": 2,
    "keywords": ["flightless", "colorful", "powerful"],
    "habitat": "rainforest"
  }
}
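If you have a local copy of such a file, one straightforward way to turn each line into a Document is sketched below (a minimal sketch; the animals.jsonl path is a hypothetical local filename, not part of this guide's assets):

import json
from langchain_core.documents import Document

documents = []
with open("animals.jsonl") as f:  # hypothetical local copy of the file
    for line in f:
        entry = json.loads(line)  # one JSON entry per line
        documents.append(
            Document(
                id=entry["id"],
                page_content=entry["text"],
                metadata=entry["metadata"],
            )
        )

This guide instead uses the bundled helper, which fetches the same dataset: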
from graph_rag_example_helpers.datasets.animals import fetch_documents
animals = fetch_documents()
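A quick inspection confirms the Documents carry the text and metadata shown above:

print(len(animals))
print(animals[0].page_content)
print(animals[0].metadata)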
Populating the Vector Store¶
The following shows how to populate a variety of vector stores with the animal data.
from langchain_community.vectorstores.cassandra import Cassandra
from langchain_openai import OpenAIEmbeddings
from langchain_graph_retriever.transformers import ShreddingTransformer
shredder = ShreddingTransformer() # (1)!
vector_store = Cassandra.from_documents(
    documents=list(shredder.transform_documents(animals)),
    embedding=OpenAIEmbeddings(),
    table_name="animals",
)
- Since Cassandra doesn't index items in lists for querying, it is necessary to shred metadata containing lists that will be queried. By default, the ShreddingTransformer shreds all keys. It may be configured to only shred those metadata keys used as edge targets.
from langchain_community.vectorstores import OpenSearchVectorSearch
from langchain_openai import OpenAIEmbeddings
vector_store = OpenSearchVectorSearch.from_documents(
    opensearch_url=OPEN_SEARCH_URL,
    index_name="animals",
    embedding=OpenAIEmbeddings(),
    engine="faiss",
    documents=animals,
    bulk_size=500,  # (1)!
)
- There is currently a bug in the OpenSearchVectorSearch implementation that requires this extra parameter.
from langchain_chroma.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_graph_retriever.transformers import ShreddingTransformer
shredder = ShreddingTransformer() # (1)!
vector_store = Chroma.from_documents(
    documents=list(shredder.transform_documents(animals)),
    embedding=OpenAIEmbeddings(),
    collection_name="animals",
)
- Since Chroma doesn't index items in lists for querying, it is necessary to shred metadata containing lists that will be queried. By default, the ShreddingTransformer shreds all keys. It may be configured to only shred those metadata keys used as edge targets.
Simple Traversal¶
For our first retrieval and graph traversal, we're going to start with the single animal best matching the query, and then traverse to other animals sharing the same habitat, origin, and/or keywords.
from graph_retriever.strategies import Eager
from langchain_graph_retriever import GraphRetriever
from langchain_graph_retriever.adapters.cassandra import CassandraAdapter
simple = GraphRetriever(
    store = CassandraAdapter(vector_store, shredder, {"keywords"}),
    edges = [("habitat", "habitat"), ("origin", "origin"), ("keywords", "keywords")],
    strategy = Eager(k=10, start_k=1, depth=2),
)
from graph_retriever.strategies import Eager
from langchain_graph_retriever import GraphRetriever
from langchain_graph_retriever.adapters.chroma import ChromaAdapter
simple = GraphRetriever(
    store = ChromaAdapter(vector_store, shredder, {"keywords"}),
    edges = [("habitat", "habitat"), ("origin", "origin"), ("keywords", "keywords")],
    strategy = Eager(k=10, start_k=1, depth=2),
)
Shredding
The above code is exactly the same for all stores; however, adapters for shredded stores (Apache Cassandra and Chroma) require configuration specifying which metadata fields need to be rewritten when issuing queries.
The above creates a graph-traversing retriever that starts with the nearest animal (start_k=1), retrieves up to 10 documents (k=10), and limits the search to documents at most 2 steps away from the first animal (depth=2).
The edges define how metadata values are used for traversal. In this case, every animal is connected to other animals with the same habitat, the same origin, and/or shared keywords.
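To make the edge definitions concrete, here is a standalone sketch (an illustration of the matching rule, not the library's implementation). Two documents are considered connected when the values of an edge's source field in one document overlap the values of the target field in the other. The alpaca and caribou metadata are taken from the example data above; the guanaco entry is hypothetical, included only to show a positive match:

edges = [("habitat", "habitat"), ("origin", "origin"), ("keywords", "keywords")]

def as_set(value):
    # Treat a single value and a list of values uniformly.
    if value is None:
        return set()
    if isinstance(value, (list, set, tuple)):
        return set(value)
    return {value}

def connected(meta_a: dict, meta_b: dict) -> bool:
    # Linked if, for any edge, the source values in one document
    # overlap the target values in the other.
    return any(
        as_set(meta_a.get(source)) & as_set(meta_b.get(target))
        for source, target in edges
    )

alpaca = {"type": "mammal", "keywords": ["wool", "domesticated", "friendly"], "origin": "south america"}
caribou = {"type": "mammal", "keywords": ["migratory", "arctic", "herbivore", "tundra"], "diet": "herbivorous"}
guanaco = {"type": "mammal", "keywords": ["wild", "wool"], "origin": "south america"}  # hypothetical entry

print(connected(alpaca, caribou))  # False: no shared habitat, origin, or keywords
print(connected(alpaca, guanaco))  # True: shared origin and the keyword "wool"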
simple_results = simple.invoke("what mammals could be found near a capybara")

for doc in simple_results:
    print(f"{doc.id}: {doc.page_content}")
Visualizing¶
langchain-graph-retriever includes code for converting the document graph into a networkx graph, for rendering and other analysis.
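A minimal sketch of that workflow is below. It assumes the create_graph helper in langchain_graph_retriever.document_graph (check the package's API reference for the exact name and signature) and uses networkx with matplotlib for a quick rendering:

import matplotlib.pyplot as plt
import networkx as nx

from langchain_graph_retriever.document_graph import create_graph  # assumed helper location

# Build a networkx graph over the retrieved documents, reusing the retriever's edge definitions.
document_graph = create_graph(simple_results, edges=simple.edges)

nx.draw(document_graph, with_labels=True)
plt.show()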