langchain_community.utilities.spark_sql.SparkSQL¶
- class langchain_community.utilities.spark_sql.SparkSQL(spark_session: Optional[SparkSession] = None, catalog: Optional[str] = None, schema: Optional[str] = None, ignore_tables: Optional[List[str]] = None, include_tables: Optional[List[str]] = None, sample_rows_in_table_info: int = 3)[source]¶
SparkSQL is a utility class for interacting with Spark SQL.
Initialize a SparkSQL object.
- Parameters
spark_session (Optional[SparkSession]) – A SparkSession object. If not provided, one will be created.
catalog (Optional[str]) – The catalog to use. If not provided, the default catalog will be used.
schema (Optional[str]) – The schema to use. If not provided, the default schema will be used.
ignore_tables (Optional[List[str]]) – A list of tables to ignore. If not provided, all tables will be used.
include_tables (Optional[List[str]]) – A list of tables to include. If not provided, all tables will be used.
sample_rows_in_table_info (int) – The number of rows to include in the table info. Defaults to 3.
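The interaction between include_tables and ignore_tables above can be sketched in plain Python. This is a hypothetical reimplementation for illustration (the function name and exact precedence are assumptions, not the library's actual code):

```python
from typing import Iterable, List, Optional

def usable_table_names(
    all_tables: Iterable[str],
    include_tables: Optional[List[str]] = None,
    ignore_tables: Optional[List[str]] = None,
) -> List[str]:
    """Sketch of the include/ignore table filtering described above."""
    if include_tables:
        # Only the explicitly included tables are exposed.
        return [t for t in all_tables if t in include_tables]
    if ignore_tables:
        # All tables except the ignored ones are exposed.
        return [t for t in all_tables if t not in ignore_tables]
    # Neither list provided: all tables are used.
    return list(all_tables)

print(usable_table_names(["orders", "users", "tmp"], ignore_tables=["tmp"]))
# → ['orders', 'users']
```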
Methods
- __init__([spark_session, catalog, schema, ...]) – Initialize a SparkSQL object.
- from_uri(database_uri[, engine_args]) – Create a remote Spark session via Spark Connect.
- get_table_info([table_names]) – Get information about specified tables.
- get_table_info_no_throw([table_names]) – Get information about specified tables.
- get_usable_table_names() – Get names of tables available.
- run(command[, fetch]) – Execute a SQL command and return a string representing the results.
- run_no_throw(command[, fetch]) – Execute a SQL command and return a string representing the results.
- __init__(spark_session: Optional[SparkSession] = None, catalog: Optional[str] = None, schema: Optional[str] = None, ignore_tables: Optional[List[str]] = None, include_tables: Optional[List[str]] = None, sample_rows_in_table_info: int = 3)[source]¶
Initialize a SparkSQL object.
- Parameters
spark_session (Optional[SparkSession]) – A SparkSession object. If not provided, one will be created.
catalog (Optional[str]) – The catalog to use. If not provided, the default catalog will be used.
schema (Optional[str]) – The schema to use. If not provided, the default schema will be used.
ignore_tables (Optional[List[str]]) – A list of tables to ignore. If not provided, all tables will be used.
include_tables (Optional[List[str]]) – A list of tables to include. If not provided, all tables will be used.
sample_rows_in_table_info (int) – The number of rows to include in the table info. Defaults to 3.
- classmethod from_uri(database_uri: str, engine_args: Optional[dict] = None, **kwargs: Any) SparkSQL [source]¶
Create a remote Spark session via Spark Connect. For example: SparkSQL.from_uri("sc://localhost:15002")
- Parameters
database_uri (str) –
engine_args (Optional[dict]) –
kwargs (Any) –
- Return type
SparkSQL
- get_table_info(table_names: Optional[List[str]] = None) str [source]¶
Get information about specified tables.
- Parameters
table_names (Optional[List[str]]) –
- Return type
str
- get_table_info_no_throw(table_names: Optional[List[str]] = None) str [source]¶
Get information about specified tables.
Follows best practices as specified in: Rajkumar et al, 2022 (https://arxiv.org/abs/2204.00498)
If sample_rows_in_table_info is set, the specified number of sample rows will be appended to each table description. This can increase performance as demonstrated in the paper.
- Parameters
table_names (Optional[List[str]]) –
- Return type
str
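The table-info format the docstring describes (a CREATE TABLE statement followed by a commented block of sample rows, per Rajkumar et al. 2022) can be sketched as follows. This is a minimal illustration under assumed formatting details, not the library's code; the function name is hypothetical:

```python
from typing import List, Sequence

def format_table_info(
    name: str,
    columns: List[str],
    rows: Sequence[Sequence[object]],
    sample_rows_in_table_info: int = 3,
) -> str:
    """Sketch: DDL for the table, then up to N sample rows in a comment."""
    ddl = f"CREATE TABLE {name} ({', '.join(columns)})"
    sample = rows[:sample_rows_in_table_info]
    body = "\n".join("\t".join(str(v) for v in row) for row in sample)
    return (
        f"{ddl}\n/*\n{len(sample)} rows from {name} table:\n"
        + "\t".join(columns) + "\n" + body + "\n*/"
    )

print(format_table_info("users", ["id", "name"],
                        [(1, "ada"), (2, "bob"), (3, "eve"), (4, "zed")]))
```

Only the first sample_rows_in_table_info rows appear in the output; the rest are dropped.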
- get_usable_table_names() Iterable[str] [source]¶
Get names of tables available.
- Return type
Iterable[str]
- run(command: str, fetch: str = 'all') str [source]¶
Execute a SQL command and return a string representing the results.
- Parameters
command (str) –
fetch (str) –
- Return type
str
- run_no_throw(command: str, fetch: str = 'all') str [source]¶
Execute a SQL command and return a string representing the results.
If the statement returns rows, a string of the results is returned. If the statement returns no rows, an empty string is returned.
If the statement throws an error, the error message is returned.
- Parameters
command (str) –
fetch (str) –
- Return type
str
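The "no_throw" behavior described above (return the error message as the result string instead of raising) can be sketched generically. The function below is a hypothetical illustration of the pattern, not the library's implementation, and the exact error-string format is an assumption:

```python
def run_no_throw_sketch(run_fn, command: str) -> str:
    """Run a command via run_fn; on failure, return the error text."""
    try:
        return run_fn(command)
    except Exception as e:
        # Instead of propagating the exception, surface it as the result,
        # so a calling agent can read the error and retry.
        return f"Error: {e}"

print(run_no_throw_sketch(lambda sql: "[('1',)]", "SELECT 1"))  # → [('1',)]
```

This shape is useful in agent loops, where an exception would abort the chain but an error string can be fed back to the model as an observation.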