class CassandraTableScanRDD[R] extends CassandraRDD[R] with CassandraTableRowReaderProvider[R]
RDD representing a table scan of a Cassandra table.
This class is the main entry point for analyzing data in a Cassandra database with Spark. Obtain objects of this class by calling com.datastax.spark.connector.SparkContextFunctions.cassandraTable.
Configuration properties should be passed in the SparkConf
configuration of SparkContext.
CassandraRDD
needs to open connection to Cassandra, therefore it requires appropriate
connection property values to be present in SparkConf.
For the list of required and available properties, see
CassandraConnector.
CassandraRDD
divides the data set into smaller partitions, processed locally on every
cluster node. A data partition consists of one or more contiguous token ranges.
To reduce the number of roundtrips to Cassandra, every partition is fetched in batches.
The following properties control the number of partitions and the fetch size:
- spark.cassandra.input.split.sizeInMB: approximate amount of data to be fetched into a single Spark partition; default 512 MB
- spark.cassandra.input.fetch.sizeInRows: number of CQL rows fetched per roundtrip; default 1000
A CassandraRDD
object gets serialized and sent to every Spark Executor, which then
calls the compute
method to fetch the data on every node. The getPreferredLocations
method tells Spark the preferred nodes to fetch a partition from, so that the data for
the partition is on the same node the task was sent to. If Cassandra nodes are co-located
with Spark nodes, the queries are always sent to the Cassandra process running on the same
node as the Spark Executor process, so data is not transferred between nodes.
If a Cassandra node fails or gets overloaded during a read, the queries are retried
against a different node.
By default, reads are performed at ConsistencyLevel.LOCAL_ONE in order to leverage data-locality and minimize network traffic. This read consistency level is controlled by the spark.cassandra.input.consistency.level property.
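For illustration, a minimal read putting these properties together might look like the following sketch (the contact point, keyspace, and table names are hypothetical example values):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._ // adds cassandraTable to SparkContext

// Connection and tuning properties are passed via SparkConf;
// the values below are examples, not required settings.
val conf = new SparkConf()
  .setAppName("cassandra-scan-example")
  .set("spark.cassandra.connection.host", "127.0.0.1")
  .set("spark.cassandra.input.split.sizeInMB", "512")
  .set("spark.cassandra.input.fetch.sizeInRows", "1000")

val sc = new SparkContext(conf)

// Returns a CassandraTableScanRDD[CassandraRow]
val rdd = sc.cassandraTable("test_ks", "words")
rdd.foreach(row => println(row.getString("word")))
```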
Linear Supertypes: CassandraTableRowReaderProvider[R], CassandraRDD[R], RDD[R], Logging, Serializable, Serializable, AnyRef, Any
Type Members
-
type
Self = CassandraTableScanRDD[R]
This is slightly different from Scala's this.type. this.type is the unique singleton type of an object, which is not compatible with other instances of the same type, so returning anything other than
this
is not really possible without lying to the compiler with explicit casts. Here Self is used to return a copy of the object: a different instance of the same type.
- Definition Classes
- CassandraTableScanRDD → CassandraRDD
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
def
++(other: RDD[R]): RDD[R]
- Definition Classes
- RDD
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
aggregate[U](zeroValue: U)(seqOp: (U, R) ⇒ U, combOp: (U, U) ⇒ U)(implicit arg0: ClassTag[U]): U
- Definition Classes
- RDD
-
def
as[B, A0, A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11](f: (A0, A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11) ⇒ B)(implicit arg0: ClassTag[B], arg1: TypeConverter[A0], arg2: TypeConverter[A1], arg3: TypeConverter[A2], arg4: TypeConverter[A3], arg5: TypeConverter[A4], arg6: TypeConverter[A5], arg7: TypeConverter[A6], arg8: TypeConverter[A7], arg9: TypeConverter[A8], arg10: TypeConverter[A9], arg11: TypeConverter[A10], arg12: TypeConverter[A11]): CassandraRDD[B]
- Definition Classes
- CassandraRDD
-
def
as[B, A0, A1, A2, A3, A4, A5, A6, A7, A8, A9, A10](f: (A0, A1, A2, A3, A4, A5, A6, A7, A8, A9, A10) ⇒ B)(implicit arg0: ClassTag[B], arg1: TypeConverter[A0], arg2: TypeConverter[A1], arg3: TypeConverter[A2], arg4: TypeConverter[A3], arg5: TypeConverter[A4], arg6: TypeConverter[A5], arg7: TypeConverter[A6], arg8: TypeConverter[A7], arg9: TypeConverter[A8], arg10: TypeConverter[A9], arg11: TypeConverter[A10]): CassandraRDD[B]
- Definition Classes
- CassandraRDD
-
def
as[B, A0, A1, A2, A3, A4, A5, A6, A7, A8, A9](f: (A0, A1, A2, A3, A4, A5, A6, A7, A8, A9) ⇒ B)(implicit arg0: ClassTag[B], arg1: TypeConverter[A0], arg2: TypeConverter[A1], arg3: TypeConverter[A2], arg4: TypeConverter[A3], arg5: TypeConverter[A4], arg6: TypeConverter[A5], arg7: TypeConverter[A6], arg8: TypeConverter[A7], arg9: TypeConverter[A8], arg10: TypeConverter[A9]): CassandraRDD[B]
- Definition Classes
- CassandraRDD
-
def
as[B, A0, A1, A2, A3, A4, A5, A6, A7, A8](f: (A0, A1, A2, A3, A4, A5, A6, A7, A8) ⇒ B)(implicit arg0: ClassTag[B], arg1: TypeConverter[A0], arg2: TypeConverter[A1], arg3: TypeConverter[A2], arg4: TypeConverter[A3], arg5: TypeConverter[A4], arg6: TypeConverter[A5], arg7: TypeConverter[A6], arg8: TypeConverter[A7], arg9: TypeConverter[A8]): CassandraRDD[B]
- Definition Classes
- CassandraRDD
-
def
as[B, A0, A1, A2, A3, A4, A5, A6, A7](f: (A0, A1, A2, A3, A4, A5, A6, A7) ⇒ B)(implicit arg0: ClassTag[B], arg1: TypeConverter[A0], arg2: TypeConverter[A1], arg3: TypeConverter[A2], arg4: TypeConverter[A3], arg5: TypeConverter[A4], arg6: TypeConverter[A5], arg7: TypeConverter[A6], arg8: TypeConverter[A7]): CassandraRDD[B]
- Definition Classes
- CassandraRDD
-
def
as[B, A0, A1, A2, A3, A4, A5, A6](f: (A0, A1, A2, A3, A4, A5, A6) ⇒ B)(implicit arg0: ClassTag[B], arg1: TypeConverter[A0], arg2: TypeConverter[A1], arg3: TypeConverter[A2], arg4: TypeConverter[A3], arg5: TypeConverter[A4], arg6: TypeConverter[A5], arg7: TypeConverter[A6]): CassandraRDD[B]
- Definition Classes
- CassandraRDD
-
def
as[B, A0, A1, A2, A3, A4, A5](f: (A0, A1, A2, A3, A4, A5) ⇒ B)(implicit arg0: ClassTag[B], arg1: TypeConverter[A0], arg2: TypeConverter[A1], arg3: TypeConverter[A2], arg4: TypeConverter[A3], arg5: TypeConverter[A4], arg6: TypeConverter[A5]): CassandraRDD[B]
- Definition Classes
- CassandraRDD
-
def
as[B, A0, A1, A2, A3, A4](f: (A0, A1, A2, A3, A4) ⇒ B)(implicit arg0: ClassTag[B], arg1: TypeConverter[A0], arg2: TypeConverter[A1], arg3: TypeConverter[A2], arg4: TypeConverter[A3], arg5: TypeConverter[A4]): CassandraRDD[B]
- Definition Classes
- CassandraRDD
-
def
as[B, A0, A1, A2, A3](f: (A0, A1, A2, A3) ⇒ B)(implicit arg0: ClassTag[B], arg1: TypeConverter[A0], arg2: TypeConverter[A1], arg3: TypeConverter[A2], arg4: TypeConverter[A3]): CassandraRDD[B]
- Definition Classes
- CassandraRDD
-
def
as[B, A0, A1, A2](f: (A0, A1, A2) ⇒ B)(implicit arg0: ClassTag[B], arg1: TypeConverter[A0], arg2: TypeConverter[A1], arg3: TypeConverter[A2]): CassandraRDD[B]
- Definition Classes
- CassandraRDD
-
def
as[B, A0, A1](f: (A0, A1) ⇒ B)(implicit arg0: ClassTag[B], arg1: TypeConverter[A0], arg2: TypeConverter[A1]): CassandraRDD[B]
- Definition Classes
- CassandraRDD
-
def
as[B, A0](f: (A0) ⇒ B)(implicit arg0: ClassTag[B], arg1: TypeConverter[A0]): CassandraRDD[B]
Maps each row into an object of a different type using the provided function, taking column value(s) as argument(s). Can be used to convert each row to a tuple or a case class object:

  sc.cassandraTable("ks", "table")
    .select("column1")
    .as((s: String) => s)         // yields CassandraRDD[String]

  sc.cassandraTable("ks", "table")
    .select("column1", "column2")
    .as((_: String, _: Long))     // yields CassandraRDD[(String, Long)]

  case class MyRow(key: String, value: Long)
  sc.cassandraTable("ks", "table")
    .select("column1", "column2")
    .as(MyRow)                    // yields CassandraRDD[MyRow]

- Definition Classes
- CassandraRDD
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
barrier(): RDDBarrier[R]
- Definition Classes
- RDD
- Annotations
- @Experimental() @Since( "2.4.0" )
-
def
cache(): CassandraTableScanRDD.this.type
- Definition Classes
- RDD
-
def
cartesian[U](other: RDD[U])(implicit arg0: ClassTag[U]): RDD[(R, U)]
- Definition Classes
- RDD
-
def
cassandraCount(): Long
Counts the number of items in this RDD by selecting count(*) on the Cassandra table.
- Definition Classes
- CassandraTableScanRDD → CassandraRDD
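A sketch of how this differs from the plain RDD.count (keyspace and table names are hypothetical): cassandraCount pushes the counting down to Cassandra as count(*) queries, so row data never travels to Spark:

```scala
val rdd = sc.cassandraTable("test_ks", "words")

// Counts on the Spark side, after fetching all rows:
val sparkSideCount: Long = rdd.count()

// Counts on the Cassandra side via SELECT COUNT(*); only counts travel:
val serverSideCount: Long = rdd.cassandraCount()
```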
-
lazy val
cassandraPartitionerClassName: String
- Attributes
- protected
- Definition Classes
- CassandraTableRowReaderProvider
-
def
checkpoint(): Unit
- Definition Classes
- RDD
-
implicit
val
classTag: ClassTag[R]
- Definition Classes
- CassandraTableScanRDD → CassandraTableRowReaderProvider
-
def
clearDependencies(): Unit
- Attributes
- protected
- Definition Classes
- RDD
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
def
clusteringOrder(order: ClusteringOrder): Self
Adds a CQL
ORDER BY
clause to the query. It can be applied only when there are clustering columns and a primary key predicate is pushed down in
where
. It is useful when the default direction of ordering rows within a single Cassandra partition needs to be changed.
- Definition Classes
- CassandraRDD
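As a hedged sketch, assuming a hypothetical events table with partition key user and clustering column event_time:

```scala
import com.datastax.spark.connector.rdd.ClusteringOrder

// Read one user's events newest-first. The partition key must be
// constrained in where() for the ORDER BY to be valid CQL.
val latestEvents = sc.cassandraTable("test_ks", "events")
  .where("user = ?", "alice")
  .clusteringOrder(ClusteringOrder.Descending)
```

The withAscOrder and withDescOrder methods listed below are shorthand for the two ClusteringOrder values.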
-
val
clusteringOrder: Option[ClusteringOrder]
- Definition Classes
- CassandraTableScanRDD → CassandraRDD
-
def
coalesce(numPartitions: Int, shuffle: Boolean = false, partitionCoalescer: Option[PartitionCoalescer])(implicit ord: Ordering[R] = null): RDD[R]
This method overrides the default Spark behavior and will not create a CoalesceRDD. Instead, it reduces the number of partitions by adjusting the partitioning of Cassandra data on read. Using this method will override spark.cassandra.input.split.sizeInMB. The method is useful together with a where() call, when the actual size of the data is smaller than the table size. It has no effect if a partition key is used in the where clause.
- numPartitions
number of partitions
- shuffle
whether to call shuffle after
- partitionCoalescer
ignored if there is no shuffle, otherwise passed to the shuffled CoalesceRDD
- returns
a new CassandraTableScanRDD with the given number of partitions
- Definition Classes
- CassandraTableScanRDD → RDD
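A sketch of reducing read partitions for a small filtered result (table and predicate are hypothetical):

```scala
// Without coalesce, the scan creates partitions based on
// spark.cassandra.input.split.sizeInMB; a filtered scan may only
// need a couple of them.
val small = sc.cassandraTable("test_ks", "events")
  .where("event_time > ?", "2020-01-01")
  .coalesce(2, shuffle = false)
```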
-
def
collect[U](f: PartialFunction[R, U])(implicit arg0: ClassTag[U]): RDD[U]
- Definition Classes
- RDD
-
def
collect(): Array[R]
- Definition Classes
- RDD
-
val
columnNames: ColumnSelector
- Definition Classes
- CassandraTableScanRDD → CassandraTableRowReaderProvider → CassandraRDD
-
def
compute(split: Partition, context: TaskContext): Iterator[R]
- Definition Classes
- CassandraTableScanRDD → RDD
-
val
connector: CassandraConnector
- Definition Classes
- CassandraTableScanRDD → CassandraTableRowReaderProvider → CassandraRDD
-
def
consistencyLevel: ConsistencyLevel
- Attributes
- protected
- Definition Classes
- CassandraTableRowReaderProvider
-
def
context: SparkContext
- Definition Classes
- RDD
-
def
convertTo[B](implicit arg0: ClassTag[B], arg1: RowReaderFactory[B]): CassandraTableScanRDD[B]
- Attributes
- protected
- Definition Classes
- CassandraTableScanRDD → CassandraRDD
-
def
copy(columnNames: ColumnSelector = columnNames, where: CqlWhereClause = where, limit: Option[CassandraLimit] = limit, clusteringOrder: Option[ClusteringOrder] = None, readConf: ReadConf = readConf, connector: CassandraConnector = connector): Self
Returns a copy of this RDD with some of the properties changed.
- Attributes
- protected
- Definition Classes
- CassandraTableScanRDD → CassandraRDD
-
def
count(): Long
- Definition Classes
- RDD
-
def
countApprox(timeout: Long, confidence: Double): PartialResult[BoundedDouble]
- Definition Classes
- RDD
-
def
countApproxDistinct(relativeSD: Double): Long
- Definition Classes
- RDD
-
def
countApproxDistinct(p: Int, sp: Int): Long
- Definition Classes
- RDD
-
def
countByValue()(implicit ord: Ordering[R]): Map[R, Long]
- Definition Classes
- RDD
-
def
countByValueApprox(timeout: Long, confidence: Double)(implicit ord: Ordering[R]): PartialResult[Map[R, BoundedDouble]]
- Definition Classes
- RDD
-
final
def
dependencies: Seq[Dependency[_]]
- Definition Classes
- RDD
-
def
distinct(): RDD[R]
- Definition Classes
- RDD
-
def
distinct(numPartitions: Int)(implicit ord: Ordering[R]): RDD[R]
- Definition Classes
- RDD
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
fetchSize: Int
- Attributes
- protected
- Definition Classes
- CassandraTableRowReaderProvider
-
def
filter(f: (R) ⇒ Boolean): RDD[R]
- Definition Classes
- RDD
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
def
first(): R
- Definition Classes
- RDD
-
def
firstParent[U](implicit arg0: ClassTag[U]): RDD[U]
- Attributes
- protected[org.apache.spark]
- Definition Classes
- RDD
-
def
flatMap[U](f: (R) ⇒ TraversableOnce[U])(implicit arg0: ClassTag[U]): RDD[U]
- Definition Classes
- RDD
-
def
fold(zeroValue: R)(op: (R, R) ⇒ R): R
- Definition Classes
- RDD
-
def
foreach(f: (R) ⇒ Unit): Unit
- Definition Classes
- RDD
-
def
foreachPartition(f: (Iterator[R]) ⇒ Unit): Unit
- Definition Classes
- RDD
-
def
getCheckpointFile: Option[String]
- Definition Classes
- RDD
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
getDependencies: Seq[Dependency[_]]
- Attributes
- protected
- Definition Classes
- RDD
-
final
def
getNumPartitions: Int
- Definition Classes
- RDD
- Annotations
- @Since( "1.6.0" )
-
def
getOutputDeterministicLevel: org.apache.spark.rdd.DeterministicLevel.Value
- Attributes
- protected
- Definition Classes
- RDD
- Annotations
- @DeveloperApi()
-
def
getPartitions: Array[Partition]
- Definition Classes
- CassandraTableScanRDD → RDD
-
def
getPreferredLocations(split: Partition): Seq[String]
- Definition Classes
- CassandraTableScanRDD → RDD
-
def
getStorageLevel: StorageLevel
- Definition Classes
- RDD
-
def
glom(): RDD[Array[R]]
- Definition Classes
- RDD
-
def
groupBy[K](f: (R) ⇒ K, p: Partitioner)(implicit kt: ClassTag[K], ord: Ordering[K]): RDD[(K, Iterable[R])]
- Definition Classes
- RDD
-
def
groupBy[K](f: (R) ⇒ K, numPartitions: Int)(implicit kt: ClassTag[K]): RDD[(K, Iterable[R])]
- Definition Classes
- RDD
-
def
groupBy[K](f: (R) ⇒ K)(implicit kt: ClassTag[K]): RDD[(K, Iterable[R])]
- Definition Classes
- RDD
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
val
id: Int
- Definition Classes
- RDD
-
def
initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
-
def
initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
intersection(other: RDD[R], numPartitions: Int): RDD[R]
- Definition Classes
- RDD
-
def
intersection(other: RDD[R], partitioner: Partitioner)(implicit ord: Ordering[R]): RDD[R]
- Definition Classes
- RDD
-
def
intersection(other: RDD[R]): RDD[R]
- Definition Classes
- RDD
-
lazy val
isBarrier_: Boolean
- Attributes
- protected
- Definition Classes
- RDD
- Annotations
- @transient()
-
def
isCheckpointed: Boolean
- Definition Classes
- RDD
-
def
isEmpty(): Boolean
- Definition Classes
- RDD
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
def
isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
-
final
def
iterator(split: Partition, context: TaskContext): Iterator[R]
- Definition Classes
- RDD
-
def
keyBy[K]()(implicit classtag: ClassTag[K], rrf: RowReaderFactory[K], rwf: RowWriterFactory[K]): CassandraTableScanRDD[(K, R)]
Extracts a key of the given class from all the available columns.
- See also
keyBy(ColumnSelector)
-
def
keyBy[K](columns: ColumnRef*)(implicit classtag: ClassTag[K], rrf: RowReaderFactory[K], rwf: RowWriterFactory[K]): CassandraTableScanRDD[(K, R)]
Extracts a key of the given class from the given columns.
- See also
keyBy(ColumnSelector)
-
def
keyBy[K](columns: ColumnSelector)(implicit classtag: ClassTag[K], rrf: RowReaderFactory[K], rwf: RowWriterFactory[K]): CassandraTableScanRDD[(K, R)]
Selects a subset of columns mapped to the key and returns an RDD of pairs. Similar to the built-in Spark keyBy method, but this one uses an implicit RowReaderFactory to construct the key objects. The selected columns must be available in the CassandraRDD.
If the selected columns contain the complete partition key, a
CassandraPartitioner
will also be created.
- columns
column selector passed to the rrf to create the row reader; useful when the key is mapped to a tuple or a single value
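A hedged sketch, assuming a hypothetical words table whose partition key is the single column word:

```scala
import com.datastax.spark.connector._

// Key each row by the "word" column, wrapped in a Tuple1 so the
// implicit row reader/writer factories resolve. Because "word" is
// the complete partition key here, the resulting RDD also carries
// a CassandraPartitioner.
val byWord = sc.cassandraTable("test_ks", "words")
  .keyBy[Tuple1[String]]("word")
```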
-
def
keyBy[K](f: (R) ⇒ K): RDD[(K, R)]
- Definition Classes
- RDD
-
val
keyspaceName: String
- Definition Classes
- CassandraTableScanRDD → CassandraTableRowReaderProvider → CassandraRDD
-
def
limit(rowLimit: Long): Self
Adds the limit clause to the CQL select statement. The limit will be applied to each created Spark partition. In other words, unless the data is fetched from a single Cassandra partition, the number of results is unpredictable.
The main purpose of passing a limit clause is to fetch the top n rows from a single Cassandra partition, when the table is designed so that it uses clustering keys and a partition key predicate is passed to the where clause.
- Definition Classes
- CassandraRDD
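A minimal sketch of the intended single-partition use (keyspace, table, and predicate are hypothetical):

```scala
// With a partition key predicate in where(), limit(n) fetches at
// most n rows of that single Cassandra partition, deterministically.
val firstRows = sc.cassandraTable("test_ks", "events")
  .where("user = ?", "alice")
  .limit(5)
```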
-
val
limit: Option[CassandraLimit]
- Definition Classes
- CassandraTableScanRDD → CassandraRDD
-
def
localCheckpoint(): CassandraTableScanRDD.this.type
- Definition Classes
- RDD
-
def
log: Logger
- Attributes
- protected
- Definition Classes
- Logging
-
def
logDebug(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logDebug(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logError(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logError(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logInfo(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logInfo(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logName: String
- Attributes
- protected
- Definition Classes
- Logging
-
def
logTrace(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logTrace(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logWarning(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logWarning(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
map[U](f: (R) ⇒ U)(implicit arg0: ClassTag[U]): RDD[U]
- Definition Classes
- RDD
-
def
mapPartitions[U](f: (Iterator[R]) ⇒ Iterator[U], preservesPartitioning: Boolean)(implicit arg0: ClassTag[U]): RDD[U]
- Definition Classes
- RDD
-
def
mapPartitionsWithIndex[U](f: (Int, Iterator[R]) ⇒ Iterator[U], preservesPartitioning: Boolean)(implicit arg0: ClassTag[U]): RDD[U]
- Definition Classes
- RDD
-
def
max()(implicit ord: Ordering[R]): R
- Definition Classes
- RDD
-
def
min()(implicit ord: Ordering[R]): R
- Definition Classes
- RDD
-
def
minimalSplitCount: Int
-
var
name: String
- Definition Classes
- RDD
-
def
narrowColumnSelection(columns: Seq[ColumnRef]): Seq[ColumnRef]
Filters the currently selected set of columns with a new set of columns.
- Definition Classes
- CassandraTableRowReaderProvider
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
def
parent[U](j: Int)(implicit arg0: ClassTag[U]): RDD[U]
- Attributes
- protected[org.apache.spark]
- Definition Classes
- RDD
-
lazy val
partitionGenerator: CassandraPartitionGenerator[V, T]
- Annotations
- @transient()
-
val
partitioner: Option[Partitioner]
- Definition Classes
- CassandraTableScanRDD → RDD
-
final
def
partitions: Array[Partition]
- Definition Classes
- RDD
-
def
perPartitionLimit(rowLimit: Long): Self
Adds the PER PARTITION LIMIT clause to the CQL select statement. The limit will be applied to every Cassandra partition. Valid only for Cassandra 3.6+.
- Definition Classes
- CassandraRDD
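A short sketch of the contrast with limit (table name hypothetical):

```scala
// limit(n) caps each *Spark* partition; perPartitionLimit(n) caps
// each *Cassandra* partition. Here: at most 3 rows per user,
// assuming user is the partition key. Requires Cassandra 3.6+.
val samplePerUser = sc.cassandraTable("test_ks", "events")
  .perPartitionLimit(3)
```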
-
def
persist(): CassandraTableScanRDD.this.type
- Definition Classes
- RDD
-
def
persist(newLevel: StorageLevel): CassandraTableScanRDD.this.type
- Definition Classes
- RDD
-
def
pipe(command: Seq[String], env: Map[String, String], printPipeContext: ((String) ⇒ Unit) ⇒ Unit, printRDDElement: (R, (String) ⇒ Unit) ⇒ Unit, separateWorkingDir: Boolean, bufferSize: Int, encoding: String): RDD[String]
- Definition Classes
- RDD
-
def
pipe(command: String, env: Map[String, String]): RDD[String]
- Definition Classes
- RDD
-
def
pipe(command: String): RDD[String]
- Definition Classes
- RDD
-
final
def
preferredLocations(split: Partition): Seq[String]
- Definition Classes
- RDD
-
def
randomSplit(weights: Array[Double], seed: Long): Array[RDD[R]]
- Definition Classes
- RDD
-
val
readConf: ReadConf
- Definition Classes
- CassandraTableScanRDD → CassandraTableRowReaderProvider → CassandraRDD
-
def
reduce(f: (R, R) ⇒ R): R
- Definition Classes
- RDD
-
def
repartition(numPartitions: Int)(implicit ord: Ordering[R]): RDD[R]
- Definition Classes
- RDD
-
lazy val
rowReader: RowReader[R]
- Definition Classes
- CassandraTableRowReaderProvider
-
implicit
val
rowReaderFactory: RowReaderFactory[R]
The RowReaderFactory and ClassTag should be provided from implicit parameters in the constructor of the class implementing this trait.
- Definition Classes
- CassandraTableScanRDD → CassandraTableRowReaderProvider
- See also
CassandraTableScanRDD
-
def
sample(withReplacement: Boolean, fraction: Double, seed: Long): RDD[R]
- Definition Classes
- RDD
-
def
saveAsObjectFile(path: String): Unit
- Definition Classes
- RDD
-
def
saveAsTextFile(path: String, codec: Class[_ <: CompressionCodec]): Unit
- Definition Classes
- RDD
-
def
saveAsTextFile(path: String): Unit
- Definition Classes
- RDD
-
val
sc: SparkContext
-
def
select(columns: ColumnRef*): Self
Narrows down the selected set of columns. Use this for better performance when you don't need all the columns in the result RDD. When called multiple times, it selects a subset of the already selected columns, so after a column was removed by a previous
select
call, it is not possible to add it back.
The selected columns are ColumnRef instances. This type allows specifying columns for straightforward retrieval, as well as reading the TTL or write time of regular columns. Implicit conversions included in the com.datastax.spark.connector package make it possible to provide just column names (which is also backward compatible) and optionally add a
.ttl
or
.writeTime
suffix in order to create an appropriate ColumnRef instance.
- Definition Classes
- CassandraRDD
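A sketch of both forms, assuming a hypothetical words table with columns word and count:

```scala
import com.datastax.spark.connector._

// Plain column selection by name:
val words = sc.cassandraTable("test_ks", "words")
  .select("word", "count")

// TTL and write time of a regular column, via the implicit
// conversions that turn strings into ColumnRef instances:
val withMeta = sc.cassandraTable("test_ks", "words")
  .select("word", "count".ttl, "count".writeTime)
```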
-
def
selectedColumnNames: Seq[String]
- Definition Classes
- CassandraRDD
-
lazy val
selectedColumnRefs: IndexedSeq[ColumnRef]
Returns the columns to be selected from the table.
- Definition Classes
- CassandraTableRowReaderProvider
-
def
setName(_name: String): CassandraTableScanRDD.this.type
- Definition Classes
- RDD
-
def
sortBy[K](f: (R) ⇒ K, ascending: Boolean, numPartitions: Int)(implicit ord: Ordering[K], ctag: ClassTag[K]): RDD[R]
- Definition Classes
- RDD
-
def
sparkContext: SparkContext
- Definition Classes
- RDD
-
def
splitCount: Option[Int]
- Attributes
- protected
- Definition Classes
- CassandraTableRowReaderProvider
-
def
splitSize: Long
- Attributes
- protected[spark.connector]
- Definition Classes
- CassandraTableRowReaderProvider
-
def
subtract(other: RDD[R], p: Partitioner)(implicit ord: Ordering[R]): RDD[R]
- Definition Classes
- RDD
-
def
subtract(other: RDD[R], numPartitions: Int): RDD[R]
- Definition Classes
- RDD
-
def
subtract(other: RDD[R]): RDD[R]
- Definition Classes
- RDD
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
lazy val
tableDef: TableDef
- Definition Classes
- CassandraTableRowReaderProvider
-
val
tableName: String
- Definition Classes
- CassandraTableScanRDD → CassandraTableRowReaderProvider → CassandraRDD
-
def
take(num: Int): Array[R]
- Definition Classes
- CassandraRDD → RDD
-
def
takeOrdered(num: Int)(implicit ord: Ordering[R]): Array[R]
- Definition Classes
- RDD
-
def
takeSample(withReplacement: Boolean, num: Int, seed: Long): Array[R]
- Definition Classes
- RDD
-
def
toDebugString: String
- Definition Classes
- RDD
-
def
toEmptyCassandraRDD: EmptyCassandraRDD[R]
- Definition Classes
- CassandraTableScanRDD → CassandraRDD
-
def
toJavaRDD(): JavaRDD[R]
- Definition Classes
- RDD
-
def
toLocalIterator: Iterator[R]
- Definition Classes
- RDD
-
def
toString(): String
- Definition Classes
- RDD → AnyRef → Any
-
def
top(num: Int)(implicit ord: Ordering[R]): Array[R]
- Definition Classes
- RDD
-
def
treeAggregate[U](zeroValue: U)(seqOp: (U, R) ⇒ U, combOp: (U, U) ⇒ U, depth: Int)(implicit arg0: ClassTag[U]): U
- Definition Classes
- RDD
-
def
treeReduce(f: (R, R) ⇒ R, depth: Int): R
- Definition Classes
- RDD
-
def
union(other: RDD[R]): RDD[R]
- Definition Classes
- RDD
-
def
unpersist(blocking: Boolean): CassandraTableScanRDD.this.type
- Definition Classes
- RDD
-
def
verify(): RowReader[R]
Checks for the existence of the keyspace and table.
- Definition Classes
- CassandraTableRowReaderProvider
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
def
where(cql: String, values: Any*): Self
Adds CQL
WHERE
predicate(s) to the query. Useful for leveraging secondary indexes in Cassandra. Implicitly adds an
ALLOW FILTERING
clause to the WHERE clause; however, beware that some predicates might be rejected by Cassandra, particularly when they filter on an unindexed, non-clustering column.
- Definition Classes
- CassandraRDD
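A minimal sketch (keyspace, table, and column names are hypothetical):

```scala
// Push a predicate down to Cassandra. The connector appends
// ALLOW FILTERING, but Cassandra may still reject predicates on
// unindexed, non-clustering columns.
val recent = sc.cassandraTable("test_ks", "events")
  .where("event_time > ?", "2020-01-01")
```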
-
val
where: CqlWhereClause
- Definition Classes
- CassandraTableScanRDD → CassandraRDD
-
def
withAscOrder: Self
- Definition Classes
- CassandraRDD
-
def
withConnector(connector: CassandraConnector): Self
Returns a copy of this Cassandra RDD with the specified connector.
- Definition Classes
- CassandraRDD
-
def
withDescOrder: Self
- Definition Classes
- CassandraRDD
-
def
withReadConf(readConf: ReadConf): Self
Allows setting a custom read configuration, e.g. consistency level or fetch size.
- Definition Classes
- CassandraRDD
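A hedged sketch of per-RDD tuning (the ReadConf field name below follows the case class as documented, but may differ between connector versions):

```scala
import com.datastax.spark.connector.rdd.ReadConf

// Shrink the fetch size for this one RDD only; other ReadConf
// fields keep their defaults, and the global SparkConf settings
// for other RDDs are untouched.
val tuned = sc.cassandraTable("test_ks", "words")
  .withReadConf(ReadConf(fetchSizeInRows = 500))
```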
-
def
zip[U](other: RDD[U])(implicit arg0: ClassTag[U]): RDD[(R, U)]
- Definition Classes
- RDD
-
def
zipPartitions[B, C, D, V](rdd2: RDD[B], rdd3: RDD[C], rdd4: RDD[D])(f: (Iterator[R], Iterator[B], Iterator[C], Iterator[D]) ⇒ Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[C], arg2: ClassTag[D], arg3: ClassTag[V]): RDD[V]
- Definition Classes
- RDD
-
def
zipPartitions[B, C, D, V](rdd2: RDD[B], rdd3: RDD[C], rdd4: RDD[D], preservesPartitioning: Boolean)(f: (Iterator[R], Iterator[B], Iterator[C], Iterator[D]) ⇒ Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[C], arg2: ClassTag[D], arg3: ClassTag[V]): RDD[V]
- Definition Classes
- RDD
-
def
zipPartitions[B, C, V](rdd2: RDD[B], rdd3: RDD[C])(f: (Iterator[R], Iterator[B], Iterator[C]) ⇒ Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[C], arg2: ClassTag[V]): RDD[V]
- Definition Classes
- RDD
-
def
zipPartitions[B, C, V](rdd2: RDD[B], rdd3: RDD[C], preservesPartitioning: Boolean)(f: (Iterator[R], Iterator[B], Iterator[C]) ⇒ Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[C], arg2: ClassTag[V]): RDD[V]
- Definition Classes
- RDD
-
def
zipPartitions[B, V](rdd2: RDD[B])(f: (Iterator[R], Iterator[B]) ⇒ Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[V]): RDD[V]
- Definition Classes
- RDD
-
def
zipPartitions[B, V](rdd2: RDD[B], preservesPartitioning: Boolean)(f: (Iterator[R], Iterator[B]) ⇒ Iterator[V])(implicit arg0: ClassTag[B], arg1: ClassTag[V]): RDD[V]
- Definition Classes
- RDD
-
def
zipWithIndex(): RDD[(R, Long)]
- Definition Classes
- RDD
-
def
zipWithUniqueId(): RDD[(R, Long)]
- Definition Classes
- RDD