graph_rag_example_helpers¶
datasets ¶
animals ¶
fetch_documents ¶
fetch_documents() -> list[Document]
Download and parse a list of Documents for use with Graph Retriever.
This is a small example dataset with useful links.
This method downloads the dataset each time -- generally it is preferable to invoke this only once and store the documents in memory or a vector store.
RETURNS | DESCRIPTION |
---|---|
list[Document] | The fetched animal documents. |
Source code in packages/graph-rag-example-helpers/src/graph_rag_example_helpers/datasets/animals/fetch.py
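Because the dataset is downloaded on every call, it can help to wrap the fetch in a small cache. The following is a minimal sketch of that idea, not part of the library: `load_once` and `_cache` are hypothetical names, and the fetch function is passed in as a parameter so the sketch runs without network access.

```python
from collections.abc import Callable
from typing import Any

# Hypothetical in-memory cache keyed by dataset name (not part of the library).
_cache: dict[str, list[Any]] = {}


def load_once(name: str, fetch: Callable[[], list[Any]]) -> list[Any]:
    """Return cached documents for `name`, calling `fetch` only on first use."""
    if name not in _cache:
        _cache[name] = fetch()
    return _cache[name]
```

With the package installed, one would presumably pass the documented function, for example `load_once("animals", fetch_documents)` after importing `fetch_documents` from `graph_rag_example_helpers.datasets.animals`.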
wikimultihop ¶
BatchPreparer module-attribute ¶
Function to apply to batches of lines to produce the document.
aload_2wikimultihop async ¶
aload_2wikimultihop(
limit: int | None,
*,
full_para_with_hyperlink_zip_path: str,
store: VectorStore,
batch_prepare: BatchPreparer,
) -> None
Load 2wikimultihop data into the given VectorStore.
PARAMETER | DESCRIPTION |
---|---|
limit | Maximum number of lines to load. If a number less than one thousand, limits loading to the given number of lines. If … TYPE: int \| None |
full_para_with_hyperlink_zip_path | Path to … TYPE: str |
store | The VectorStore to populate. TYPE: VectorStore |
batch_prepare | Function to apply to batches of lines to produce the document. TYPE: BatchPreparer |
Source code in packages/graph-rag-example-helpers/src/graph_rag_example_helpers/datasets/wikimultihop/load.py
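The callable type behind `BatchPreparer` is not spelled out in this reference, so the following is only a sketch of what a batch-preparation function might look like. It assumes each line is a JSON object with hypothetical `id`, `title`, and `text` fields, and it uses a stand-in `ExampleDocument` dataclass rather than the real `Document` type.

```python
import json
from collections.abc import Iterable, Iterator
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ExampleDocument:
    """Hypothetical stand-in for the Document type a real preparer would emit."""

    id: str
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def prepare_batch(lines: Iterable[str]) -> Iterator[ExampleDocument]:
    """Turn a batch of JSON lines into documents, skipping blank lines."""
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        # The field names below are assumptions made for this illustration.
        yield ExampleDocument(
            id=str(record["id"]),
            page_content=record["text"],
            metadata={"title": record["title"]},
        )
```

A function of this shape would be passed as `batch_prepare`, while `store` receives the vector store to populate and `limit` caps the number of lines read.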
load ¶
wikipedia_lines ¶
Return iterable of lines from the wikipedia file.
PARAMETER | DESCRIPTION |
---|---|
para_with_hyperlink_zip_path | Path to … |
YIELDS | DESCRIPTION |
---|---|
str | Lines from the Wikipedia file. |
Source code in packages/graph-rag-example-helpers/src/graph_rag_example_helpers/datasets/wikimultihop/load.py
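Streaming lines out of a zip archive without extracting it can be done with the standard library. The sketch below illustrates the general technique only; it is not the library's implementation, and `iter_zip_lines` and its `member` parameter are hypothetical names.

```python
import io
import zipfile
from collections.abc import Iterator


def iter_zip_lines(zip_path: str, member: str) -> Iterator[str]:
    """Yield decoded lines from one member of a zip archive, one at a time."""
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            # Wrap the binary stream so lines are decoded lazily as UTF-8.
            for line in io.TextIOWrapper(raw, encoding="utf-8"):
                yield line.rstrip("\n")
```

Because this is a generator, a large file is never loaded into memory at once, which suits batch-wise loading into a vector store.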
env ¶
NON_SECRETS module-attribute ¶
Environment variables that can use input instead of getpass.
Environment ¶
Bases: Enum
Enumeration of supported environments for examples.
ASTRAPY class-attribute instance-attribute ¶
ASTRAPY = auto()
Environment variables for connecting to AstraDB via AstraPy.
CASSIO class-attribute instance-attribute ¶
CASSIO = auto()
Environment variables for connecting to AstraDB via CassIO.
required_envvars ¶
Return the required environment variables for this environment.
RETURNS | DESCRIPTION |
---|---|
list[str] | The environment variables required in this environment. |
RAISES | DESCRIPTION |
---|---|
ValueError | If the environment isn't recognized. |
Source code in packages/graph-rag-example-helpers/src/graph_rag_example_helpers/env.py
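An enumeration whose members each map to a list of required variables can be sketched as follows. This is an illustration of the pattern only: `ExampleEnvironment` is a hypothetical class, and the variable names are placeholders rather than the names the library actually requires.

```python
from enum import Enum, auto


class ExampleEnvironment(Enum):
    """Hypothetical mirror of the documented enum, for illustration only."""

    ASTRAPY = auto()
    CASSIO = auto()

    def required_envvars(self) -> list[str]:
        """Return placeholder variable names for this environment."""
        if self is ExampleEnvironment.ASTRAPY:
            return ["EXAMPLE_API_ENDPOINT", "EXAMPLE_TOKEN"]
        if self is ExampleEnvironment.CASSIO:
            return ["EXAMPLE_DATABASE_ID", "EXAMPLE_TOKEN"]
        # Guards against a member being added without a matching branch.
        raise ValueError(f"Unrecognized environment: {self!r}")
```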
initialize_environment ¶
initialize_environment(env: Environment = CASSIO)
Initialize the environment variables.
PARAMETER | DESCRIPTION |
---|---|
env | The environment to initialize. TYPE: Environment |
Notes
Environment variables are resolved in the following order:
1. If a `.env` file is found, load environment variables from it.
2. If not, and running in Colab, set the necessary environment variables from secrets.
3. If necessary variables are still unset after the above, prompt the user.
Source code in packages/graph-rag-example-helpers/src/graph_rag_example_helpers/env.py
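The resolution order in the notes above can be sketched with the standard library alone. This is a hedged illustration, not the library's code: `parse_dotenv` and `resolve_envvars` are hypothetical helpers, the `.env` parsing is deliberately simplistic, and Colab secrets are modeled as a plain mapping passed in by the caller.

```python
import os
from collections.abc import Callable, Mapping, MutableMapping
from typing import Optional


def parse_dotenv(path: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blanks and # comments."""
    values: dict[str, str] = {}
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"')
    return values


def resolve_envvars(
    required: list[str],
    dotenv_path: str,
    colab_secrets: Optional[Mapping[str, str]],
    prompt: Callable[[str], str],
    environ: MutableMapping[str, str],
) -> None:
    """Fill `environ` using a .env file, then secrets, then prompts."""
    if os.path.exists(dotenv_path):
        # Step 1: prefer a .env file when one exists.
        environ.update(parse_dotenv(dotenv_path))
    elif colab_secrets is not None:
        # Step 2: otherwise copy any required names found in the secrets.
        for name in required:
            if name in colab_secrets:
                environ[name] = colab_secrets[name]
    # Step 3: prompt for anything that is still unset or empty.
    for name in required:
        if not environ.get(name):
            environ[name] = prompt(name)
```

In interactive use, `environ` would typically be `os.environ`, and the `prompt` callable could choose between `input` and `getpass.getpass` depending on whether the variable is a secret.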
initialize_from_colab_userdata ¶
initialize_from_colab_userdata(env: Environment = CASSIO)
Try to initialize environment from colab userdata.
Source code in packages/graph-rag-example-helpers/src/graph_rag_example_helpers/env.py
initialize_from_prompts ¶
initialize_from_prompts(env: Environment = CASSIO)
Initialize the environment by prompting the user.
Source code in packages/graph-rag-example-helpers/src/graph_rag_example_helpers/env.py
verify_environment ¶
verify_environment(env: Environment = CASSIO)
Verify the necessary environment variables are set.
Source code in packages/graph-rag-example-helpers/src/graph_rag_example_helpers/env.py
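A verification step of this kind usually amounts to checking that each required name is present and non-empty. The sketch below shows one way to do that; `missing_envvars` and `verify_envvars` are hypothetical names, and raising `RuntimeError` is a choice made for this example, since the reference does not say how the real function reports a failure.

```python
import os
from collections.abc import Mapping


def missing_envvars(
    required: list[str], environ: Mapping[str, str] = os.environ
) -> list[str]:
    """Return the required names that are unset or empty, in the order given."""
    return [name for name in required if not environ.get(name)]


def verify_envvars(
    required: list[str], environ: Mapping[str, str] = os.environ
) -> None:
    """Raise if any required variable is missing (error type is an assumption)."""
    missing = missing_envvars(required, environ)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```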
persistent_iteration ¶
PersistentIteration ¶
Bases: Generic[T]
Create a persistent iteration.
This creates a journal file with the name journal_name containing the indices of completed items. When resuming iteration, the already processed indices will be skipped.
PARAMETER | DESCRIPTION |
---|---|
journal_name | Name of the journal file to use. If it doesn't exist it will be created. The indices of completed items will be written to the journal. |
iterator | The iterator to process persistently. It must be deterministic -- elements should always be returned in the same order on restarts. |
Source code in packages/graph-rag-example-helpers/src/graph_rag_example_helpers/persistent_iteration.py
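The journal mechanism described above can be sketched with a plain text file holding one completed index per line. This is an illustration of the idea under that assumed file format, not the class's actual implementation; `read_journal`, `record_completed`, and `iterate_pending` are hypothetical helpers.

```python
import os
from collections.abc import Iterable, Iterator
from typing import TypeVar

T = TypeVar("T")


def read_journal(journal_name: str) -> set[int]:
    """Return the set of indices already recorded as completed."""
    if not os.path.exists(journal_name):
        return set()
    with open(journal_name, encoding="utf-8") as journal:
        return {int(line) for line in journal if line.strip()}


def record_completed(journal_name: str, offset: int) -> None:
    """Append one completed index to the journal, creating it if needed."""
    with open(journal_name, "a", encoding="utf-8") as journal:
        journal.write(f"{offset}\n")


def iterate_pending(journal_name: str, items: Iterable[T]) -> Iterator[tuple[int, T]]:
    """Yield (offset, item) pairs, skipping offsets found in the journal."""
    completed = read_journal(journal_name)
    for offset, item in enumerate(items):
        if offset not in completed:
            yield offset, item
```

Skipping by index is only correct when the underlying iterator yields items in the same order on every run, which is why the `iterator` parameter is required to be deterministic.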
__iter__ ¶
__next__ ¶
Return the next offset and item.
RETURNS | DESCRIPTION |
---|---|
offset | The offset of the next item. Should be acknowledged after the item is finished processing. |
item | The next item. |
Source code in packages/graph-rag-example-helpers/src/graph_rag_example_helpers/persistent_iteration.py
ack ¶
Acknowledge the given offset.
This should only be called after the elements in that offset have been persisted.
PARAMETER | DESCRIPTION |
---|---|
offset | The offset to acknowledge. |
RETURNS | DESCRIPTION |
---|---|
int | The number of pending elements. |
Source code in packages/graph-rag-example-helpers/src/graph_rag_example_helpers/persistent_iteration.py
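Acknowledging an offset only after its item has been persisted implies tracking which offsets are still in flight. A minimal sketch of that bookkeeping follows; `AckTracker` and its `issue` method are hypothetical names, and the real class may track pending work differently.

```python
class AckTracker:
    """Hypothetical tracker for offsets handed out but not yet acknowledged."""

    def __init__(self) -> None:
        self._pending: set[int] = set()
        self._completed = 0

    def issue(self, offset: int) -> None:
        """Mark an offset as handed out for processing."""
        self._pending.add(offset)

    def ack(self, offset: int) -> int:
        """Mark an offset as done and return how many are still pending."""
        self._pending.remove(offset)  # raises KeyError for an unknown offset
        self._completed += 1
        return len(self._pending)

    def completed_count(self) -> int:
        """Return how many offsets have been acknowledged so far."""
        return self._completed
```

Offsets can be acknowledged out of order with this approach, which matters when items are persisted concurrently.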
completed_count ¶
completed_count() -> int
Return the number of completed elements.
RETURNS | DESCRIPTION |
---|---|
int | The number of completed elements. |