langchain_community.utilities.spark_sql.SparkSQL¶
- class langchain_community.utilities.spark_sql.SparkSQL(spark_session: Optional[SparkSession] = None, catalog: Optional[str] = None, schema: Optional[str] = None, ignore_tables: Optional[List[str]] = None, include_tables: Optional[List[str]] = None, sample_rows_in_table_info: int = 3)[source]¶
SparkSQL is a utility class for interacting with Spark SQL.
Initialize a SparkSQL object.
- Parameters
spark_session (Optional[SparkSession]) – A SparkSession object. If not provided, one will be created.
catalog (Optional[str]) – The catalog to use. If not provided, the default catalog will be used.
schema (Optional[str]) – The schema to use. If not provided, the default schema will be used.
ignore_tables (Optional[List[str]]) – A list of tables to ignore. If not provided, all tables will be used.
include_tables (Optional[List[str]]) – A list of tables to include. If not provided, all tables will be used.
sample_rows_in_table_info (int) – The number of rows to include in the table info. Defaults to 3.
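The interaction between include_tables and ignore_tables above can be sketched in plain Python. This is a hypothetical reimplementation for illustration (the function name and exact precedence are assumptions, not the library's actual code):

```python
from typing import Iterable, List, Optional

def usable_table_names(
    all_tables: Iterable[str],
    include_tables: Optional[List[str]] = None,
    ignore_tables: Optional[List[str]] = None,
) -> List[str]:
    """Sketch of the include/ignore table filtering described above."""
    if include_tables:
        # Only the explicitly included tables are exposed.
        return [t for t in all_tables if t in include_tables]
    if ignore_tables:
        # All tables except the ignored ones are exposed.
        return [t for t in all_tables if t not in ignore_tables]
    # Neither list provided: all tables are used.
    return list(all_tables)

print(usable_table_names(["orders", "users", "tmp"], ignore_tables=["tmp"]))
# → ['orders', 'users']
```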
Methods
- __init__([spark_session, catalog, schema, ...]) – Initialize a SparkSQL object.
- from_uri(database_uri[, engine_args]) – Create a remote Spark session via Spark Connect.
- get_table_info([table_names]) – Get information about specified tables.
- get_table_info_no_throw([table_names]) – Get information about specified tables.
- get_usable_table_names() – Get names of tables available.
- run(command[, fetch]) – Execute a SQL command and return a string representing the results.
- run_no_throw(command[, fetch]) – Execute a SQL command and return a string representing the results.
- __init__(spark_session: Optional[SparkSession] = None, catalog: Optional[str] = None, schema: Optional[str] = None, ignore_tables: Optional[List[str]] = None, include_tables: Optional[List[str]] = None, sample_rows_in_table_info: int = 3)[source]¶
Initialize a SparkSQL object.
- Parameters
spark_session (Optional[SparkSession]) – A SparkSession object. If not provided, one will be created.
catalog (Optional[str]) – The catalog to use. If not provided, the default catalog will be used.
schema (Optional[str]) – The schema to use. If not provided, the default schema will be used.
ignore_tables (Optional[List[str]]) – A list of tables to ignore. If not provided, all tables will be used.
include_tables (Optional[List[str]]) – A list of tables to include. If not provided, all tables will be used.
sample_rows_in_table_info (int) – The number of rows to include in the table info. Defaults to 3.
- classmethod from_uri(database_uri: str, engine_args: Optional[dict] = None, **kwargs: Any) SparkSQL [source]¶
Create a remote Spark session via Spark Connect. For example: SparkSQL.from_uri("sc://localhost:15002")
- Parameters
database_uri (str) –
engine_args (Optional[dict]) –
kwargs (Any) –
- Return type
SparkSQL
- get_table_info(table_names: Optional[List[str]] = None) str [source]¶
Get information about specified tables.
- Parameters
table_names (Optional[List[str]]) –
- Return type
str
- get_table_info_no_throw(table_names: Optional[List[str]] = None) str [source]¶
Get information about specified tables.
Follows best practices as specified in: Rajkumar et al, 2022 (https://arxiv.org/abs/2204.00498)
If sample_rows_in_table_info is set, the specified number of sample rows will be appended to each table description. This can increase performance as demonstrated in the paper.
- Parameters
table_names (Optional[List[str]]) –
- Return type
str
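The table-info format the docstring describes (a CREATE TABLE statement followed by a commented block of sample rows, per Rajkumar et al. 2022) can be sketched as follows. This is a minimal illustration under assumed formatting details, not the library's code; the function name is hypothetical:

```python
from typing import List, Sequence

def format_table_info(
    name: str,
    columns: List[str],
    rows: Sequence[Sequence[object]],
    sample_rows_in_table_info: int = 3,
) -> str:
    """Sketch: DDL for the table, then up to N sample rows in a comment."""
    ddl = f"CREATE TABLE {name} ({', '.join(columns)})"
    sample = rows[:sample_rows_in_table_info]
    body = "\n".join("\t".join(str(v) for v in row) for row in sample)
    return (
        f"{ddl}\n/*\n{len(sample)} rows from {name} table:\n"
        + "\t".join(columns) + "\n" + body + "\n*/"
    )

print(format_table_info("users", ["id", "name"],
                        [(1, "ada"), (2, "bob"), (3, "eve"), (4, "zed")]))
```

Only the first sample_rows_in_table_info rows appear in the output; the rest are dropped.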
- get_usable_table_names() Iterable[str] [source]¶
Get names of tables available.
- Return type
Iterable[str]
- run(command: str, fetch: str = 'all') str [source]¶
Execute a SQL command and return a string representing the results.
- Parameters
command (str) –
fetch (str) –
- Return type
str
- run_no_throw(command: str, fetch: str = 'all') str [source]¶
Execute a SQL command and return a string representing the results.
If the statement returns rows, a string of the results is returned. If the statement returns no rows, an empty string is returned.
If the statement throws an error, the error message is returned.
- Parameters
command (str) –
fetch (str) –
- Return type
str
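The "no_throw" behavior described above (return the error message as the result string instead of raising) can be sketched generically. The function below is a hypothetical illustration of the pattern, not the library's implementation, and the exact error-string format is an assumption:

```python
def run_no_throw_sketch(run_fn, command: str) -> str:
    """Run a command via run_fn; on failure, return the error text."""
    try:
        return run_fn(command)
    except Exception as e:
        # Instead of propagating the exception, surface it as the result,
        # so a calling agent can read the error and retry.
        return f"Error: {e}"

print(run_no_throw_sketch(lambda sql: "[('1',)]", "SELECT 1"))  # → [('1',)]
```

This shape is useful in agent loops, where an exception would abort the chain but an error string can be fed back to the model as an observation.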