graph_rag_example_helpers¶
datasets ¶
animals ¶
fetch_documents ¶
fetch_documents() -> list[Document]
Download and parse a list of Documents for use with Graph Retriever.
This is a small example dataset with useful links.
This method downloads the dataset each time -- generally it is preferable to invoke this only once and store the documents in memory or a vector store.
RETURNS | DESCRIPTION |
---|---|
list[Document] | The fetched animal documents. |
Source code in packages/graph-rag-example-helpers/src/graph_rag_example_helpers/datasets/animals/fetch.py
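Because `fetch_documents` downloads the dataset on every call, one option is to wrap it so the download happens at most once per process. The following is a minimal sketch and not part of the library: `fetch_once` is a hypothetical helper, and the commented usage lines assume the import path `graph_rag_example_helpers.datasets.animals`, inferred from the module layout on this page.

```python
from functools import lru_cache
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_once(fetch: Callable[[], list[T]]) -> Callable[[], list[T]]:
    """Wrap a zero-argument fetch function so it runs at most once.

    `fetch_once` is a hypothetical helper for illustration only.
    """

    @lru_cache(maxsize=1)
    def _cached() -> tuple[T, ...]:
        # Store an immutable tuple so callers cannot mutate the cache.
        return tuple(fetch())

    def get() -> list[T]:
        # Hand out a fresh list on every call.
        return list(_cached())

    return get


# Usage (assumed import path, requires network access):
# from graph_rag_example_helpers.datasets.animals import fetch_documents
# get_animals = fetch_once(fetch_documents)
# animals = get_animals()  # downloads
# animals = get_animals()  # served from memory
```

For longer-lived use, writing the documents to a vector store once, as the description suggests, is likely the better fit than an in-memory cache.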
fetch ¶
fetch_documents ¶
fetch_documents() -> list[Document]
Download and parse a list of Documents for use with Graph Retriever.
This is a small example dataset with useful links.
This method downloads the dataset each time -- generally it is preferable to invoke this only once and store the documents in memory or a vector store.
RETURNS | DESCRIPTION |
---|---|
list[Document] | The fetched animal documents. |
Source code in packages/graph-rag-example-helpers/src/graph_rag_example_helpers/datasets/animals/fetch.py
astrapy ¶
fetch_documents ¶
fetch_documents() -> list[Document]
Download and parse a list of Documents for use with Graph Retriever.
This dataset contains the documentation for the AstraPy project as of version 1.5.2.
This method downloads the dataset each time -- generally it is preferable to invoke this only once and store the documents in memory or a vector store.
RETURNS | DESCRIPTION |
---|---|
list[Document] | The fetched astra-py documentation Documents. |
Notes
- The dataset is set up so that the path of the item is the `id`, the pydoc description is the `page_content`, and the item's other attributes are stored in the `metadata`.
- There are many documents that contain an id and metadata, but no page_content.
Source code in packages/graph-rag-example-helpers/src/graph_rag_example_helpers/datasets/astrapy/fetch.py
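Since many of these documents have no `page_content`, it can help to separate them before embedding. The sketch below is illustrative only: `SimpleDoc` is a hypothetical stand-in with the three fields named in the notes (`id`, `page_content`, `metadata`), and `partition_by_content` is a hypothetical helper that should work on any object exposing a `page_content` attribute.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SimpleDoc:
    """Hypothetical stand-in with the three fields described in the notes."""

    id: str
    page_content: str = ""
    metadata: dict[str, Any] = field(default_factory=dict)


def partition_by_content(docs):
    """Split documents into (with_text, without_text) by `page_content`."""
    with_text, without_text = [], []
    for doc in docs:
        # Treat missing, empty, and whitespace-only content the same way.
        has_text = bool(doc.page_content and doc.page_content.strip())
        (with_text if has_text else without_text).append(doc)
    return with_text, without_text
```

Documents without text still carry an id and metadata, so whether to drop them or keep them for link traversal depends on the use case.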
fetch ¶
fetch_documents ¶
fetch_documents() -> list[Document]
Download and parse a list of Documents for use with Graph Retriever.
This dataset contains the documentation for the AstraPy project as of version 1.5.2.
This method downloads the dataset each time -- generally it is preferable to invoke this only once and store the documents in memory or a vector store.
RETURNS | DESCRIPTION |
---|---|
list[Document] | The fetched astra-py documentation Documents. |
Notes
- The dataset is set up so that the path of the item is the `id`, the pydoc description is the `page_content`, and the item's other attributes are stored in the `metadata`.
- There are many documents that contain an id and metadata, but no page_content.
Source code in packages/graph-rag-example-helpers/src/graph_rag_example_helpers/datasets/astrapy/fetch.py
wikimultihop ¶
BatchPreparer
module-attribute
¶
Function to apply to batches of lines to produce the document.
aload_2wikimultihop
async
¶
aload_2wikimultihop(
limit: int | None,
*,
full_para_with_hyperlink_zip_path: str,
store: VectorStore,
batch_prepare: BatchPreparer,
) -> None
Load 2wikimultihop data into the given `VectorStore`.
PARAMETER | DESCRIPTION |
---|---|
limit
|
Maximum number of lines to load.
If a number less than one thousand, limits loading to the given number of lines.
If
TYPE:
|
full_para_with_hyperlink_zip_path
|
Path to
TYPE:
|
store
|
The VectorStore to populate.
TYPE:
|
batch_prepare
|
Function to apply to batches of lines to produce the document.
TYPE:
|
Source code in packages/graph-rag-example-helpers/src/graph_rag_example_helpers/datasets/wikimultihop/load.py
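The parameters describe two ideas: capping the number of lines read (`limit`) and handing lines to `batch_prepare` in batches. The sketch below shows one generic way to combine them. It is not the loader's implementation: `batched_lines` and `batch_size` are hypothetical names, and the exact shape of a `BatchPreparer` is not documented on this page.

```python
from collections.abc import Iterable, Iterator
from itertools import islice
from typing import Optional


def batched_lines(
    lines: Iterable[str], batch_size: int, limit: Optional[int] = None
) -> Iterator[list[str]]:
    """Yield lists of up to `batch_size` lines, reading at most `limit` lines."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # With no limit, read everything; otherwise stop after `limit` lines.
    source = iter(lines) if limit is None else islice(lines, limit)
    # islice on the shared iterator takes the next chunk each time round.
    while batch := list(islice(source, batch_size)):
        yield batch
```

Each yielded batch could then be passed to whatever function turns lines into documents.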
load ¶
BatchPreparer
module-attribute
¶
Function to apply to batches of lines to produce the document.
aload_2wikimultihop
async
¶
aload_2wikimultihop(
limit: int | None,
*,
full_para_with_hyperlink_zip_path: str,
store: VectorStore,
batch_prepare: BatchPreparer,
) -> None
Load 2wikimultihop data into the given `VectorStore`.
PARAMETER | DESCRIPTION |
---|---|
limit
|
Maximum number of lines to load.
If a number less than one thousand, limits loading to the given number of lines.
If
TYPE:
|
full_para_with_hyperlink_zip_path
|
Path to
TYPE:
|
store
|
The VectorStore to populate.
TYPE:
|
batch_prepare
|
Function to apply to batches of lines to produce the document.
TYPE:
|
Source code in packages/graph-rag-example-helpers/src/graph_rag_example_helpers/datasets/wikimultihop/load.py
wikipedia_lines ¶
Return iterable of lines from the wikipedia file.
PARAMETER | DESCRIPTION |
---|---|
para_with_hyperlink_zip_path
|
Path to
TYPE:
|
YIELDS | DESCRIPTION |
---|---|
str | Lines from the Wikipedia file. |
Source code in packages/graph-rag-example-helpers/src/graph_rag_example_helpers/datasets/wikimultihop/load.py
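Reading lines lazily from a file inside a zip archive can be sketched with the standard library alone. This is an assumption-laden illustration, not the source of `wikipedia_lines`: `iter_zip_lines` is a hypothetical helper, and the `member` argument is a placeholder because the archive's internal file name is not given on this page.

```python
import io
import zipfile
from collections.abc import Iterator
from typing import BinaryIO, Union


def iter_zip_lines(archive: Union[str, BinaryIO], member: str) -> Iterator[str]:
    """Yield decoded lines from one member of a zip archive, one at a time."""
    with zipfile.ZipFile(archive) as zf:
        with zf.open(member) as raw:
            # Decode on the fly so the whole file never sits in memory.
            for line in io.TextIOWrapper(raw, encoding="utf-8"):
                yield line.rstrip("\n")
```

Because this is a generator, the archive stays open only while lines are being consumed.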
env ¶
NON_SECRETS
module-attribute
¶
Environment variables that can use `input` instead of `getpass`.
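The distinction matters because `getpass` hides what the user types, while `input` echoes it. A minimal sketch of choosing between the two is shown below. `prompt_for` and the prompt text are hypothetical; only the idea of a non-secret allowlist comes from the description above.

```python
import getpass
from collections.abc import Callable, Collection


def prompt_for(
    name: str,
    non_secrets: Collection[str],
    ask_plain: Callable[[str], str] = input,
    ask_secret: Callable[[str], str] = getpass.getpass,
) -> str:
    """Prompt for a variable, echoing the input only for non-secret names."""
    ask = ask_plain if name in non_secrets else ask_secret
    return ask(f"Enter {name}: ")
```

The two prompt functions are parameters so the choice can be exercised without a terminal.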
Environment ¶
Bases: Enum
Enumeration of supported environments for examples.
ASTRAPY
class-attribute
instance-attribute
¶
ASTRAPY = auto()
Environment variables for connecting to AstraDB via AstraPy
CASSIO
class-attribute
instance-attribute
¶
CASSIO = auto()
Environment variables for connecting to AstraDB via CassIO
required_envvars ¶
Return the required environment variables for this environment.
RETURNS | DESCRIPTION |
---|---|
list[str] | The environment variables required in this environment. |
RAISES | DESCRIPTION |
---|---|
ValueError | If the environment isn't recognized. |
Source code in packages/graph-rag-example-helpers/src/graph_rag_example_helpers/env.py
initialize_environment ¶
initialize_environment(env: Environment = CASSIO)
Initialize the environment variables.
PARAMETER | DESCRIPTION |
---|---|
env | The environment to initialize. TYPE: Environment |
Notes
This uses the following:
1. If a `.env` file is found, load environment variables from it.
2. If not, and running in Colab, set the necessary environment variables from secrets.
3. If any necessary variables are still unset after the above, prompt the user.
Source code in packages/graph-rag-example-helpers/src/graph_rag_example_helpers/env.py
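The fallback order in the notes can be sketched as a small function. This is one reading of those three steps, not the library's code: `fill_environment` and its parameters are hypothetical, a missing `.env` file or a non-Colab session is represented by `None`, and existing values are assumed to take precedence over loaded ones.

```python
from collections.abc import Callable, Iterable, Mapping, MutableMapping
from typing import Optional


def fill_environment(
    required: Iterable[str],
    environ: MutableMapping[str, str],
    dotenv_values: Optional[Mapping[str, str]],
    colab_secrets: Optional[Mapping[str, str]],
    prompt: Callable[[str], str],
) -> None:
    """Populate `environ` following the .env, Colab secrets, prompt order."""
    required = list(required)
    # Steps 1 and 2: prefer the .env values; fall back to Colab secrets
    # only when no .env file was found (represented here by None).
    source = dotenv_values if dotenv_values is not None else colab_secrets
    if source is not None:
        for name in required:
            if name in source:
                # Assumption: values already in the environment win.
                environ.setdefault(name, source[name])
    # Step 3: prompt for anything still missing.
    for name in required:
        if name not in environ:
            environ[name] = prompt(name)
```

Passing `os.environ` as `environ` would apply the result to the running process; a plain dict keeps the sketch free of side effects.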
initialize_from_colab_userdata ¶
initialize_from_colab_userdata(env: Environment = CASSIO)
Try to initialize the environment from Colab `userdata`.
Source code in packages/graph-rag-example-helpers/src/graph_rag_example_helpers/env.py
initialize_from_prompts ¶
initialize_from_prompts(env: Environment = CASSIO)
Initialize the environment by prompting the user.
Source code in packages/graph-rag-example-helpers/src/graph_rag_example_helpers/env.py
verify_environment ¶
verify_environment(env: Environment = CASSIO)
Verify the necessary environment variables are set.
Source code in packages/graph-rag-example-helpers/src/graph_rag_example_helpers/env.py
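A verification step of this kind usually amounts to checking that each required name has a non-empty value. The sketch below is a guess at the general shape only: `missing_envvars` and `verify` are hypothetical helpers, and raising `ValueError` on failure is an assumption, since this page does not say how `verify_environment` reports a missing variable.

```python
import os
from collections.abc import Iterable, Mapping
from typing import Optional


def missing_envvars(
    required: Iterable[str], environ: Optional[Mapping[str, str]] = None
) -> list[str]:
    """Return the required names that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]


def verify(
    required: Iterable[str], environ: Optional[Mapping[str, str]] = None
) -> None:
    """Raise ValueError (an assumed choice) if any required name is missing."""
    missing = missing_envvars(required, environ)
    if missing:
        raise ValueError(f"Missing environment variables: {', '.join(missing)}")
```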
examples ¶
code_generation ¶
format_docs ¶
Format documents as documentation for including as context in an LLM query.
Source code in packages/graph-rag-example-helpers/src/graph_rag_example_helpers/examples/code_generation/format.py
format_document ¶
Format a document as documentation for including as context in an LLM query.
Source code in packages/graph-rag-example-helpers/src/graph_rag_example_helpers/examples/code_generation/format.py
converter ¶
convert ¶
convert(
package_name: str,
search_paths: list[str],
docstring_parser: DocstringStyle,
output_path: str,
) -> None
Load and convert a package's objects and documentation into a JSONL file.
This method converts the internal documentation of modules, classes, functions, and attributes of a package into a format that is better suited for RAG (and GraphRAG in particular).
The code uses the `griffe` library, a Python code analysis tool that extracts information from Python code and docstrings.
The JSONL file contains one JSON object per line, with the following structure:
- id: the path to the object in the package
- text: the description of the object (if any, can be empty)
- metadata: always includes the `name`, `path`, and `kind` keys. The remaining keys below are included when available.
  - name: the name of the object
  - path: the path to the object in the package
  - kind: either `module`, `class`, `function`, or `attribute`
  - parameters: the parameters for a class or function. Includes type information, default values, and descriptions
  - attributes: the attributes on a class or module. Includes type information and descriptions
  - gathered_types: list of non-standard types in the parameters and attributes
  - imports: list of non-standard types imported by the class or module
  - exports: list of non-standard types exported by the module
  - properties: list of boolean properties about the module
  - example: any code examples for the class, function, or module
  - references: list of any non-standard types used in the example code
  - returns: the return type and description
  - yields: the yield type and description
  - bases: list of base types inherited by the class
  - implemented_by: list of types that implement a base class
PARAMETER | DESCRIPTION |
---|---|
package_name | The name of the package to convert. TYPE: str |
search_paths | The paths to search for the package. TYPE: list[str] |
docstring_parser | The docstring parser to use. TYPE: DocstringStyle |
output_path | The path to save the JSONL file. TYPE: str |
Examples:
from graph_rag_example_helpers.examples.code_generation.converter import convert
convert("astrapy", [".venv/lib/python3.12/site-packages"], "google", "data")
Notes
- This code was written for the `code-generation` example and `astrapy==1.5.2`. It will probably need tweaking for use with other Python packages. Use at your own risk.
Source code in packages/graph-rag-example-helpers/src/graph_rag_example_helpers/examples/code_generation/converter.py
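A consumer of the JSONL output might read it back as follows. This is a minimal sketch based only on the structure described above (`id`, `text`, and a `metadata` object that always carries `name`, `path`, and `kind`). `parse_records` is a hypothetical helper, and the sample values in the usage note are invented for illustration.

```python
import json
from collections.abc import Iterable
from typing import Any

# Keys the description above says are always present in `metadata`.
REQUIRED_METADATA_KEYS = {"name", "path", "kind"}


def parse_records(lines: Iterable[str]) -> list[dict[str, Any]]:
    """Parse JSONL lines and check each record's required metadata keys."""
    records = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        missing = REQUIRED_METADATA_KEYS - record.get("metadata", {}).keys()
        if missing:
            raise ValueError(f"line {line_number}: missing {sorted(missing)}")
        records.append(record)
    return records
```

For example, `parse_records(open("output.jsonl", encoding="utf-8"))` would load a file written to a hypothetical `output.jsonl`; the actual file name produced under `output_path` is not stated here.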
format ¶
format_docs ¶
Format documents as documentation for including as context in an LLM query.
Source code in packages/graph-rag-example-helpers/src/graph_rag_example_helpers/examples/code_generation/format.py
format_document ¶
Format a document as documentation for including as context in an LLM query.
Source code in packages/graph-rag-example-helpers/src/graph_rag_example_helpers/examples/code_generation/format.py
persistent_iteration ¶
PersistentIteration ¶
Bases: Generic[T]
Create a persistent iteration.
This creates a journal file with the name `journal_name` containing the indices of completed items. When resuming iteration, the already processed indices will be skipped.
PARAMETER | DESCRIPTION |
---|---|
journal_name
|
Name of the journal file to use. If it doesn't exist it will be created. The indices of completed items will be written to the journal.
TYPE:
|
iterator
|
The iterator to process persistently. It must be deterministic -- elements should always be returned in the same order on restarts.
TYPE:
|
Source code in packages/graph-rag-example-helpers/src/graph_rag_example_helpers/persistent_iteration.py
__iter__ ¶
__next__ ¶
Return the next offset and item.
RETURNS | DESCRIPTION |
---|---|
offset
|
The offset of the next item. Should be acknowledged after the item has finished processing.
TYPE:
|
item
|
The next item.
TYPE:
|
Source code in packages/graph-rag-example-helpers/src/graph_rag_example_helpers/persistent_iteration.py
ack ¶
Acknowledge the given offset.
This should only be called after the elements in that offset have been persisted.
PARAMETER | DESCRIPTION |
---|---|
offset
|
The offset to acknowledge.
TYPE:
|
RETURNS | DESCRIPTION |
---|---|
int | The number of pending elements. |
Source code in packages/graph-rag-example-helpers/src/graph_rag_example_helpers/persistent_iteration.py
completed_count ¶
completed_count() -> int
Return the number of completed elements.
RETURNS | DESCRIPTION |
---|---|
int | The number of completed elements. |
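The journal idea behind `PersistentIteration` can be illustrated with a few standalone functions. This is a sketch of the general technique under stated assumptions, not the class's implementation: the journal format (one offset per line) and the helper names `read_journal`, `ack`, `pending`, and `demo` are all hypothetical.

```python
import os
import tempfile
from collections.abc import Iterable, Iterator
from typing import TypeVar

T = TypeVar("T")


def read_journal(journal_path: str) -> set[int]:
    """Return the offsets recorded as completed, or an empty set."""
    if not os.path.exists(journal_path):
        return set()
    with open(journal_path, encoding="utf-8") as journal:
        return {int(line) for line in journal if line.strip()}


def ack(journal_path: str, offset: int) -> None:
    """Record one offset as completed by appending it to the journal."""
    with open(journal_path, "a", encoding="utf-8") as journal:
        journal.write(f"{offset}\n")


def pending(journal_path: str, items: Iterable[T]) -> Iterator[tuple[int, T]]:
    """Yield (offset, item) pairs whose offsets are not in the journal."""
    completed = read_journal(journal_path)
    for offset, item in enumerate(items):
        if offset not in completed:
            yield offset, item


def demo() -> list[tuple[int, str]]:
    """Process two of three items, then 'restart' and list what remains."""
    items = ["a", "b", "c"]
    with tempfile.TemporaryDirectory() as workdir:
        journal_path = os.path.join(workdir, "journal.txt")
        for offset, _item in pending(journal_path, items):
            if offset == 2:
                break  # simulate a crash before the last item
            ack(journal_path, offset)  # acknowledge only after the work is done
        return list(pending(journal_path, items))
```

As with the documented class, this only works if the underlying iterator is deterministic, since offsets are the sole record of what has been done.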