class RDDFunctions[T] extends WritableToCassandra[T] with Serializable
Provides Cassandra-specific methods on RDD
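These methods are not called on RDDFunctions directly; they become available on any RDD once the connector's implicit conversions are in scope. A minimal sketch, assuming a local Cassandra node and a hypothetical keyspace "test" with table "words":

```scala
import com.datastax.spark.connector._   // brings the RDD -> RDDFunctions implicit into scope
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("connector-example")
  .set("spark.cassandra.connection.host", "127.0.0.1")  // assumption: Cassandra on localhost
val sc = new SparkContext(conf)

// After the import, saveToCassandra, joinWithCassandraTable, etc. are available on plain RDDs:
sc.parallelize(Seq((1, "one"), (2, "two")))
  .saveToCassandra("test", "words")     // hypothetical keyspace and table
```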
Linear Supertypes: WritableToCassandra[T], Serializable, AnyRef, Any
Instance Constructors
- new RDDFunctions(rdd: RDD[T])
Value Members
- final def !=(arg0: Any): Boolean
  Definition Classes: AnyRef → Any
- final def ##(): Int
  Definition Classes: AnyRef → Any
- final def ==(arg0: Any): Boolean
  Definition Classes: AnyRef → Any
- final def asInstanceOf[T0]: T0
  Definition Classes: Any
- def clone(): AnyRef
  Attributes: protected[lang]
  Definition Classes: AnyRef
  Annotations: @throws( ... ) @native()
- def deleteFromCassandra(keyspaceName: String, tableName: String, deleteColumns: ColumnSelector = SomeColumns(), keyColumns: ColumnSelector = PrimaryKeyColumns, writeConf: WriteConf = ...)(implicit connector: CassandraConnector = CassandraConnector(sparkContext), rwf: RowWriterFactory[T]): Unit
  Deletes data from a Cassandra table, using data from the RDD as primary keys. Uses the specified column names.
  - keyspaceName: the name of the keyspace to use
  - tableName: the name of the table to use
  - deleteColumns: the list of column names to delete; an empty ColumnSelector means the full row
  - keyColumns: primary key column selector, optional; all RDD primary key columns are checked by default
  - writeConf: additional configuration object allowing to set consistency level, batch size, etc.
  Definition Classes: RDDFunctions → WritableToCassandra
  See also
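As a sketch of this deletion API (keyspace, table, and `keysRdd` are hypothetical), deleting whole rows versus individual columns looks like:

```scala
import com.datastax.spark.connector._

// keysRdd: an RDD whose elements convert to the primary key of test.words (hypothetical)
// Delete entire rows whose primary keys match the RDD's elements:
keysRdd.deleteFromCassandra("test", "words")

// Delete only the "count" column from matching rows, keeping the rest of each row:
keysRdd.deleteFromCassandra("test", "words", SomeColumns("count"))
```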
- final def eq(arg0: AnyRef): Boolean
  Definition Classes: AnyRef
- def equals(arg0: Any): Boolean
  Definition Classes: AnyRef → Any
- def finalize(): Unit
  Attributes: protected[lang]
  Definition Classes: AnyRef
  Annotations: @throws( classOf[java.lang.Throwable] )
- final def getClass(): Class[_]
  Definition Classes: AnyRef → Any
  Annotations: @native()
- def hashCode(): Int
  Definition Classes: AnyRef → Any
  Annotations: @native()
- final def isInstanceOf[T0]: Boolean
  Definition Classes: Any
- def joinWithCassandraTable[R](keyspaceName: String, tableName: String, selectedColumns: ColumnSelector = AllColumns, joinColumns: ColumnSelector = PartitionKeyColumns, readConf: ReadConf = ...)(implicit connector: CassandraConnector = CassandraConnector(sparkContext), newType: ClassTag[R], rrf: RowReaderFactory[R], ev: ValidRDDType[R], currentType: ClassTag[T], rwf: RowWriterFactory[T]): CassandraJoinRDD[T, R]
  Uses the data from the RDD to join with a Cassandra table without retrieving the entire table. Any RDD which can be used with saveToCassandra can also be used with joinWithCassandraTable, as can any RDD which specifies only the partition key of a Cassandra table. This method executes single-partition requests against the Cassandra table and accepts the functional modifiers that a normal com.datastax.spark.connector.rdd.CassandraTableScanRDD takes.
  By default this method uses only the partition key for joining, but any combination of columns acceptable to Cassandra can be used in the join. Specify columns using the joinColumns parameter or the on() method.
  Example with prior repartitioning:
    val source = sc.parallelize(keys).map(x => new KVRow(x))
    val repart = source.repartitionByCassandraReplica(keyspace, tableName, 10)
    val someCass = repart.joinWithCassandraTable(keyspace, tableName)
  Example joining on clustering columns:
    val source = sc.parallelize(keys).map(x => (x, x * 100))
    val someCass = source.joinWithCassandraTable(keyspace, wideTable).on(SomeColumns("key", "group"))
- def keyByCassandraReplica(keyspaceName: String, tableName: String, partitionKeyMapper: ColumnSelector = PartitionKeyColumns)(implicit connector: CassandraConnector = CassandraConnector(sparkContext), currentType: ClassTag[T], rwf: RowWriterFactory[T]): RDD[(Set[InetAddress], T)]
  Keys every row in the RDD with the IP addresses of all Cassandra nodes which contain a replica of the data specified by that row. The calling RDD must have rows that can be converted into the partition key of the given Cassandra table.
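A minimal sketch of the keying described above (keyspace, table, and `keysRdd` are hypothetical):

```scala
import java.net.InetAddress
import org.apache.spark.rdd.RDD
import com.datastax.spark.connector._

// keysRdd: RDD[(Int, String)] whose first field maps to the partition key of test.words (hypothetical)
// Each element is keyed by the set of replica addresses owning its partition key:
val keyed: RDD[(Set[InetAddress], (Int, String))] =
  keysRdd.keyByCassandraReplica("test", "words")
```

The resulting pair RDD can then be grouped or partitioned by replica set, which is what repartitionByCassandraReplica does internally.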
- def leftJoinWithCassandraTable[R](keyspaceName: String, tableName: String, selectedColumns: ColumnSelector = AllColumns, joinColumns: ColumnSelector = PartitionKeyColumns, readConf: ReadConf = ...)(implicit connector: CassandraConnector = CassandraConnector(sparkContext), newType: ClassTag[R], rrf: RowReaderFactory[R], ev: ValidRDDType[R], currentType: ClassTag[T], rwf: RowWriterFactory[T]): CassandraLeftJoinRDD[T, R]
  Uses the data from the RDD to left join with a Cassandra table without retrieving the entire table. Any RDD which can be used with saveToCassandra can also be used with leftJoinWithCassandraTable, as can any RDD which specifies only the partition key of a Cassandra table. This method executes single-partition requests against the Cassandra table and accepts the functional modifiers that a normal com.datastax.spark.connector.rdd.CassandraTableScanRDD takes.
  By default this method uses only the partition key for joining, but any combination of columns acceptable to Cassandra can be used in the join. Specify columns using the joinColumns parameter or the on() method.
  Example with prior repartitioning:
    val source = sc.parallelize(keys).map(x => new KVRow(x))
    val repart = source.repartitionByCassandraReplica(keyspace, tableName, 10)
    val someCass = repart.leftJoinWithCassandraTable(keyspace, tableName)
  Example joining on clustering columns:
    val source = sc.parallelize(keys).map(x => (x, x * 100))
    val someCass = source.leftJoinWithCassandraTable(keyspace, wideTable).on(SomeColumns("key", "group"))
- final def ne(arg0: AnyRef): Boolean
  Definition Classes: AnyRef
- final def notify(): Unit
  Definition Classes: AnyRef
  Annotations: @native()
- final def notifyAll(): Unit
  Definition Classes: AnyRef
  Annotations: @native()
- def repartitionByCassandraReplica(keyspaceName: String, tableName: String, partitionsPerHost: Int = 10, partitionKeyMapper: ColumnSelector = PartitionKeyColumns)(implicit connector: CassandraConnector = CassandraConnector(sparkContext), currentType: ClassTag[T], rwf: RowWriterFactory[T]): CassandraPartitionedRDD[T]
  Repartitions the data (via a shuffle) based upon the replication of the given keyspaceName and tableName. Calling this method before using joinWithCassandraTable ensures that requests are coordinator-local. partitionsPerHost controls the number of Spark partitions created in this repartitioning event. The calling RDD must have rows that can be converted into the partition key of the given Cassandra table.
- def saveAsCassandraTable(keyspaceName: String, tableName: String, columns: ColumnSelector = AllColumns, writeConf: WriteConf = ...)(implicit connector: CassandraConnector = CassandraConnector(sparkContext), rwf: RowWriterFactory[T], columnMapper: ColumnMapper[T]): Unit
  Saves the data from the RDD to a new table with a definition taken from the ColumnMapper for this class.
  - keyspaceName: keyspace in which to create the new table
  - tableName: name of the table to create; the table must not exist
  - columns: selects the columns to save data to. Uses only the unique column names, and you must select at least all primary key columns. All other fields are discarded. Non-selected property/column names are left unchanged. This parameter does not affect table creation.
  - writeConf: additional configuration object allowing to set consistency level, batch size, etc.
  - connector: optional, implicit connector to Cassandra
  - rwf: factory for obtaining the row writer used to extract column values from items of the RDD
  - columnMapper: a column mapper determining the definition of the table
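A sketch of table creation from a case class, whose fields determine the new table's columns via the default ColumnMapper (keyspace and table names are hypothetical):

```scala
import com.datastax.spark.connector._

case class WordCount(word: String, count: Long)

// Creates test.words_new from WordCount's inferred column mapping, then writes the rows.
// The table must not already exist.
sc.parallelize(Seq(WordCount("foo", 5L), WordCount("bar", 1L)))
  .saveAsCassandraTable("test", "words_new")
```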
- def saveAsCassandraTableEx(table: TableDef, columns: ColumnSelector = AllColumns, writeConf: WriteConf = ...)(implicit connector: CassandraConnector = CassandraConnector(sparkContext), rwf: RowWriterFactory[T]): Unit
  Saves the data from the RDD to a new table defined by the given TableDef. First it creates a new table with all columns from the TableDef, and then it saves the RDD content in the same way as saveToCassandra. The table must not exist prior to this call.
  - table: table definition used to create the new table
  - columns: selects the columns to save data to. Uses only the unique column names, and you must select at least all primary key columns. All other fields are discarded. Non-selected property/column names are left unchanged. This parameter does not affect table creation.
  - writeConf: additional configuration object allowing to set consistency level, batch size, etc.
  - connector: optional, implicit connector to Cassandra
  - rwf: factory for obtaining the row writer used to extract column values from items of the RDD
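A sketch of a hand-built table definition, assuming the TableDef/ColumnDef constructors from the connector's cql package and types from its types package (all names here are hypothetical):

```scala
import com.datastax.spark.connector._
import com.datastax.spark.connector.cql.{ColumnDef, TableDef, PartitionKeyColumn, RegularColumn}
import com.datastax.spark.connector.types.{IntType, TextType}

// Explicit schema: text partition key "word", int regular column "count".
val table = TableDef("test", "words_custom",
  partitionKey = Seq(ColumnDef("word", PartitionKeyColumn, TextType)),
  clusteringColumns = Seq.empty,
  regularColumns = Seq(ColumnDef("count", RegularColumn, IntType)))

// Creates test.words_custom from the definition, then writes the tuples into it:
sc.parallelize(Seq(("foo", 5), ("bar", 1)))
  .saveAsCassandraTableEx(table, SomeColumns("word", "count"))
```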
- def saveToCassandra(keyspaceName: String, tableName: String, columns: ColumnSelector = AllColumns, writeConf: WriteConf = ...)(implicit connector: CassandraConnector = CassandraConnector(sparkContext), rwf: RowWriterFactory[T]): Unit
  Saves the data from the RDD to a Cassandra table. Uses the specified column names.
  - keyspaceName: the name of the keyspace to use
  - tableName: the name of the table to use
  - writeConf: additional configuration object allowing to set consistency level, batch size, etc.
  Definition Classes: RDDFunctions → WritableToCassandra
  See also
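A minimal write sketch (keyspace "test" and table "words" are hypothetical); tuple fields are matched positionally to the named columns:

```scala
import com.datastax.spark.connector._

// Writes two rows into test.words, mapping _1 -> word and _2 -> count:
sc.parallelize(Seq(("cat", 30), ("fox", 40)))
  .saveToCassandra("test", "words", SomeColumns("word", "count"))
```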
- def spanBy[U](f: (T) ⇒ U): RDD[(U, Iterable[T])]
  Applies a function to each item and groups consecutive items having the same value together. Contrary to groupBy, items from the same group must already be next to each other in the original collection. Works locally on each partition, so items from different partitions will never be placed in the same group.
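A sketch of the consecutive-grouping behavior on a single partition (data is illustrative):

```scala
import org.apache.spark.rdd.RDD
import com.datastax.spark.connector._

// One partition so all items are consecutive; spanBy groups runs of equal keys,
// so the two runs of key 1 stay separate, unlike groupBy (and with no shuffle).
val rdd = sc.parallelize(Seq((1, "a"), (1, "b"), (2, "c"), (1, "d")), numSlices = 1)
val spans: RDD[(Int, Iterable[(Int, String)])] = rdd.spanBy(_._1)
// Conceptually: (1, [(1,"a"), (1,"b")]), (2, [(2,"c")]), (1, [(1,"d")])
```

This is typically used after reading a Cassandra table, where rows of one partition key arrive adjacent and in clustering order.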
- val sparkContext: SparkContext
  Definition Classes: RDDFunctions → WritableToCassandra
- final def synchronized[T0](arg0: ⇒ T0): T0
  Definition Classes: AnyRef
- def toString(): String
  Definition Classes: AnyRef → Any
- final def wait(): Unit
  Definition Classes: AnyRef
  Annotations: @throws( ... )
- final def wait(arg0: Long, arg1: Int): Unit
  Definition Classes: AnyRef
  Annotations: @throws( ... )
- final def wait(arg0: Long): Unit
  Definition Classes: AnyRef
  Annotations: @throws( ... ) @native()