com.datastax.spark.connector

RDDFunctions

class RDDFunctions[T] extends WritableToCassandra[T] with Serializable

Provides Cassandra-specific methods on RDD

Linear Supertypes
Serializable, Serializable, WritableToCassandra[T], AnyRef, Any

Instance Constructors

  1. new RDDFunctions(rdd: RDD[T])

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. def +(other: String): String

    Implicit information
    This member is added by an implicit conversion from RDDFunctions[T] to StringAdd performed by method any2stringadd in scala.Predef.
    Definition Classes
    StringAdd
  5. def ->[B](y: B): (RDDFunctions[T], B)

    Implicit information
    This member is added by an implicit conversion from RDDFunctions[T] to ArrowAssoc[RDDFunctions[T]] performed by method any2ArrowAssoc in scala.Predef.
    Definition Classes
    ArrowAssoc
    Annotations
    @inline()
  6. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  7. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  8. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  9. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  10. def ensuring(cond: (RDDFunctions[T]) ⇒ Boolean, msg: ⇒ Any): RDDFunctions[T]

    Implicit information
    This member is added by an implicit conversion from RDDFunctions[T] to Ensuring[RDDFunctions[T]] performed by method any2Ensuring in scala.Predef.
    Definition Classes
    Ensuring
  11. def ensuring(cond: (RDDFunctions[T]) ⇒ Boolean): RDDFunctions[T]

    Implicit information
    This member is added by an implicit conversion from RDDFunctions[T] to Ensuring[RDDFunctions[T]] performed by method any2Ensuring in scala.Predef.
    Definition Classes
    Ensuring
  12. def ensuring(cond: Boolean, msg: ⇒ Any): RDDFunctions[T]

    Implicit information
    This member is added by an implicit conversion from RDDFunctions[T] to Ensuring[RDDFunctions[T]] performed by method any2Ensuring in scala.Predef.
    Definition Classes
    Ensuring
  13. def ensuring(cond: Boolean): RDDFunctions[T]

    Implicit information
    This member is added by an implicit conversion from RDDFunctions[T] to Ensuring[RDDFunctions[T]] performed by method any2Ensuring in scala.Predef.
    Definition Classes
    Ensuring
  14. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  15. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  16. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  17. def formatted(fmtstr: String): String

    Implicit information
    This member is added by an implicit conversion from RDDFunctions[T] to StringFormat performed by method any2stringfmt in scala.Predef.
    Definition Classes
    StringFormat
    Annotations
    @inline()
  18. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  19. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  20. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  21. def joinWithCassandraTable[R](keyspaceName: String, tableName: String, selectedColumns: ColumnSelector = AllColumns, joinColumns: ColumnSelector = PartitionKeyColumns)(implicit connector: CassandraConnector = ..., newType: ClassTag[R], rrf: RowReaderFactory[R], ev: ValidRDDType[R], currentType: ClassTag[T], rwf: RowWriterFactory[T]): CassandraJoinRDD[T, R]

    Uses the data from RDD to join with a Cassandra table without retrieving the entire table.

    Uses the data from RDD to join with a Cassandra table without retrieving the entire table. Any RDD that can be used with saveToCassandra can also be used with joinWithCassandraTable, as can any RDD that specifies only the partition key of a Cassandra table. This method executes single-partition requests against the Cassandra table and accepts the functional modifiers that a normal com.datastax.spark.connector.rdd.CassandraTableScanRDD takes.

    By default this method uses only the partition key for joining, but any combination of columns acceptable to Cassandra can be used in the join. Specify the join columns with the joinColumns parameter or the on() method, as in the last example below.

    Example With Prior Repartitioning:

    val source = sc.parallelize(keys).map(x => new KVRow(x))
    val repart = source.repartitionByCassandraReplica(keyspace, tableName, 10)
    val someCass = repart.joinWithCassandraTable(keyspace, tableName)

    Example Joining on Clustering Columns:

    val source = sc.parallelize(keys).map(x => (x, x * 100))
    val someCass = source.joinWithCassandraTable(keyspace, wideTable).on(SomeColumns("key", "group"))
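
    The same join expressed through the joinColumns parameter instead of on() (a sketch, reusing the hypothetical keyspace and wideTable from the example above):

    val someCass = source.joinWithCassandraTable(keyspace, wideTable,
      joinColumns = SomeColumns("key", "group"))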
  22. def keyByCassandraReplica(keyspaceName: String, tableName: String, partitionKeyMapper: ColumnSelector = PartitionKeyColumns)(implicit connector: CassandraConnector = ..., currentType: ClassTag[T], rwf: RowWriterFactory[T]): RDD[(Set[InetAddress], T)]

    Keys every row in the RDD with the IP addresses of all of the Cassandra nodes that contain a replica of the data specified by that row.

    Keys every row in the RDD with the IP addresses of all of the Cassandra nodes that contain a replica of the data specified by that row. The calling RDD must have rows that can be converted into the partition key of the given Cassandra table.
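
    Example (a sketch, reusing the hypothetical keys, KVRow, keyspace and tableName from the joinWithCassandraTable example above):

    val source = sc.parallelize(keys).map(x => new KVRow(x))
    // pairs each row with the set of replica addresses that own its partition key
    val byReplica = source.keyByCassandraReplica(keyspace, tableName)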

  23. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  24. final def notify(): Unit

    Definition Classes
    AnyRef
  25. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  26. def repartitionByCassandraReplica(keyspaceName: String, tableName: String, partitionsPerHost: Int = 10, partitionKeyMapper: ColumnSelector = PartitionKeyColumns)(implicit connector: CassandraConnector = ..., currentType: ClassTag[T], rwf: RowWriterFactory[T]): CassandraPartitionedRDD[T]

    Repartitions the data (via a shuffle) based upon the replication of the given keyspaceName and tableName.

    Repartitions the data (via a shuffle) based upon the replication of the given keyspaceName and tableName. Calling this method before using joinWithCassandraTable ensures that the requests will be coordinator-local. partitionsPerHost controls the number of Spark partitions that will be created by this repartitioning event. The calling RDD must have rows that can be converted into the partition key of the given Cassandra table.
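
    Example (a sketch, reusing the hypothetical keys, KVRow, keyspace and tableName from the joinWithCassandraTable example above):

    val source = sc.parallelize(keys).map(x => new KVRow(x))
    // ten Spark partitions per Cassandra host; the subsequent join is coordinator-local
    val repart = source.repartitionByCassandraReplica(keyspace, tableName, partitionsPerHost = 10)
    val joined = repart.joinWithCassandraTable(keyspace, tableName)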

  27. def saveAsCassandraTable(keyspaceName: String, tableName: String, columns: ColumnSelector = AllColumns, writeConf: WriteConf = ...)(implicit connector: CassandraConnector = ..., rwf: RowWriterFactory[T], columnMapper: ColumnMapper[T]): Unit

    Saves the data from RDD to a new table with definition taken from the ColumnMapper for this class.

    Saves the data from RDD to a new table with definition taken from the ColumnMapper for this class.
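
    Example (a sketch; the WordCount class, the test keyspace and the new_words table are hypothetical, and new_words must not exist before the call):

    case class WordCount(word: String, count: Long)
    val rdd = sc.parallelize(Seq(WordCount("foo", 5L), WordCount("bar", 3L)))
    // creates test.new_words from the definition derived by the default ColumnMapper
    // for WordCount, then writes the rows
    rdd.saveAsCassandraTable("test", "new_words")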

    keyspaceName

    keyspace where to create a new table

    tableName

    name of the table to create; the table must not exist

    columns

    Selects the columns to save data to. Uses only the unique column names, and you must select at least all primary key columns. All other fields are discarded. Non-selected property/column names are left unchanged. This parameter does not affect table creation.

    writeConf

    additional configuration object allowing to set consistency level, batch size, etc.

    connector

    optional, implicit connector to Cassandra

    rwf

    factory for obtaining the row writer to be used to extract column values from items of the RDD

    columnMapper

    a column mapper determining the definition of the table

  28. def saveAsCassandraTableEx(table: TableDef, columns: ColumnSelector = AllColumns, writeConf: WriteConf = ...)(implicit connector: CassandraConnector = ..., rwf: RowWriterFactory[T]): Unit

    Saves the data from RDD to a new table defined by the given TableDef.

    Saves the data from RDD to a new table defined by the given TableDef.

    First it creates a new table with all columns from the TableDef and then it saves RDD content in the same way as saveToCassandra. The table must not exist prior to this call.
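
    Example (a sketch only; the exact TableDef and ColumnDef constructors vary between connector versions, so the column definitions below are an assumption modelled on the connector's saving documentation):

    import com.datastax.spark.connector.cql.{ColumnDef, PartitionKeyColumn, RegularColumn, TableDef}
    import com.datastax.spark.connector.types.{BigIntType, TextType}

    case class WordCount(word: String, count: Long)
    val table = TableDef("test", "new_words",
      Seq(ColumnDef("word", PartitionKeyColumn, TextType)),   // partition key
      Seq.empty,                                               // clustering columns
      Seq(ColumnDef("count", RegularColumn, BigIntType)))      // regular columns
    val rdd = sc.parallelize(Seq(WordCount("foo", 5L), WordCount("bar", 3L)))
    rdd.saveAsCassandraTableEx(table, SomeColumns("word", "count"))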

    table

    table definition used to create a new table

    columns

    Selects the columns to save data to. Uses only the unique column names, and you must select at least all primary key columns. All other fields are discarded. Non-selected property/column names are left unchanged. This parameter does not affect table creation.

    writeConf

    additional configuration object allowing to set consistency level, batch size, etc.

    connector

    optional, implicit connector to Cassandra

    rwf

    factory for obtaining the row writer to be used to extract column values from items of the RDD

  29. def saveToCassandra(keyspaceName: String, tableName: String, columns: ColumnSelector = AllColumns, writeConf: WriteConf = ...)(implicit connector: CassandraConnector = ..., rwf: RowWriterFactory[T]): Unit

    Saves the data from RDD to a Cassandra table.

    Saves the data from RDD to a Cassandra table. Uses the specified column names.
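
    Example (a sketch, assuming a pre-existing table test.words with columns word text and count int):

    val collection = sc.parallelize(Seq(("cat", 30), ("fox", 40)))
    collection.saveToCassandra("test", "words", SomeColumns("word", "count"))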

    keyspaceName

    the name of the Keyspace to use

    tableName

    the name of the Table to use

    writeConf

    additional configuration object allowing to set consistency level, batch size, etc.

    Definition Classes
    RDDFunctions → WritableToCassandra
    See also

    com.datastax.spark.connector.writer.WritableToCassandra

  30. def spanBy[U](f: (T) ⇒ U): RDD[(U, Iterable[T])]

    Applies a function to each item, and groups consecutive items having the same value together.

    Applies a function to each item and groups consecutive items having the same value together. Unlike groupBy, items from the same group must already be adjacent to each other in the original collection. Works locally on each partition, so items from different partitions will never be placed in the same group.
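
    Example (a minimal sketch on a plain RDD with a single partition, so the span boundaries are easy to see):

    val rdd = sc.parallelize(Seq(1, 1, 2, 2, 2, 3, 1), numSlices = 1)
    // consecutive runs of equal values become one group each:
    // (1, [1, 1]), (2, [2, 2, 2]), (3, [3]), (1, [1])
    val spans = rdd.spanBy(x => x)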

  31. val sparkContext: SparkContext

    Definition Classes
    RDDFunctions → WritableToCassandra
  32. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  33. def toString(): String

    Definition Classes
    AnyRef → Any
  34. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  35. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  36. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  37. def →[B](y: B): (RDDFunctions[T], B)

    Implicit information
    This member is added by an implicit conversion from RDDFunctions[T] to ArrowAssoc[RDDFunctions[T]] performed by method any2ArrowAssoc in scala.Predef.
    Definition Classes
    ArrowAssoc

Shadowed Implicit Value Members

  1. val self: Any

    Implicit information
    This member is added by an implicit conversion from RDDFunctions[T] to StringAdd performed by method any2stringadd in scala.Predef.
    Shadowing
    This implicitly inherited member is ambiguous. One or more implicitly inherited members have similar signatures, so calling this member may produce an ambiguous implicit conversion compiler error.
    To access this member you can use a type ascription:
    (rDDFunctions: StringAdd).self
    Definition Classes
    StringAdd
  2. val self: Any

    Implicit information
    This member is added by an implicit conversion from RDDFunctions[T] to StringFormat performed by method any2stringfmt in scala.Predef.
    Shadowing
    This implicitly inherited member is ambiguous. One or more implicitly inherited members have similar signatures, so calling this member may produce an ambiguous implicit conversion compiler error.
    To access this member you can use a type ascription:
    (rDDFunctions: StringFormat).self
    Definition Classes
    StringFormat

Deprecated Value Members

  1. def x: RDDFunctions[T]

    Implicit information
    This member is added by an implicit conversion from RDDFunctions[T] to ArrowAssoc[RDDFunctions[T]] performed by method any2ArrowAssoc in scala.Predef.
    Shadowing
    This implicitly inherited member is ambiguous. One or more implicitly inherited members have similar signatures, so calling this member may produce an ambiguous implicit conversion compiler error.
    To access this member you can use a type ascription:
    (rDDFunctions: ArrowAssoc[RDDFunctions[T]]).x
    Definition Classes
    ArrowAssoc
    Annotations
    @deprecated
    Deprecated

    (Since version 2.10.0) Use leftOfArrow instead

  2. def x: RDDFunctions[T]

    Implicit information
    This member is added by an implicit conversion from RDDFunctions[T] to Ensuring[RDDFunctions[T]] performed by method any2Ensuring in scala.Predef.
    Shadowing
    This implicitly inherited member is ambiguous. One or more implicitly inherited members have similar signatures, so calling this member may produce an ambiguous implicit conversion compiler error.
    To access this member you can use a type ascription:
    (rDDFunctions: Ensuring[RDDFunctions[T]]).x
    Definition Classes
    Ensuring
    Annotations
    @deprecated
    Deprecated

    (Since version 2.10.0) Use resultOfEnsuring instead

Inherited from Serializable

Inherited from Serializable

Inherited from WritableToCassandra[T]

Inherited from AnyRef

Inherited from Any

Inherited by implicit conversion any2stringadd from RDDFunctions[T] to StringAdd

Inherited by implicit conversion any2stringfmt from RDDFunctions[T] to StringFormat

Inherited by implicit conversion any2ArrowAssoc from RDDFunctions[T] to ArrowAssoc[RDDFunctions[T]]

Inherited by implicit conversion any2Ensuring from RDDFunctions[T] to Ensuring[RDDFunctions[T]]
