Packages

  • package root
    Definition Classes
    root
  • package com
    Definition Classes
    root
  • package datastax
    Definition Classes
    com
  • package spark
    Definition Classes
    datastax
  • package connector

    The root package of Cassandra connector for Apache Spark. Offers handy implicit conversions that add Cassandra-specific methods to SparkContext and RDD.

    Call the cassandraTable method on the SparkContext object to create a CassandraRDD that exposes a Cassandra table as a Spark RDD.

    Call the saveToCassandra method (provided by RDDFunctions) on any RDD to save a distributed collection to a Cassandra table.

    Example:

    CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1 };
    CREATE TABLE test.words (word text PRIMARY KEY, count int);
    INSERT INTO test.words(word, count) VALUES ('and', 50);

    import org.apache.spark.{SparkConf, SparkContext}
    import com.datastax.spark.connector._
    
    val sparkMasterHost = "127.0.0.1"
    val cassandraHost = "127.0.0.1"
    val keyspace = "test"
    val table = "words"
    
    // Tell Spark the address of one Cassandra node:
    val conf = new SparkConf(true).set("spark.cassandra.connection.host", cassandraHost)
    
    // Connect to the Spark cluster:
    val sc = new SparkContext("spark://" + sparkMasterHost + ":7077", "example", conf)
    
    // Read the table and print its contents:
    val rdd = sc.cassandraTable(keyspace, table)
    rdd.collect().foreach(println)
    
    // Write two rows to the table (count is an int column, so both values are Ints):
    val col = sc.parallelize(Seq(("of", 1200), ("the", 863)))
    col.saveToCassandra(keyspace, table)
    
    sc.stop()
    Definition Classes
    spark
  • package rdd

    Contains com.datastax.spark.connector.rdd.CassandraTableScanRDD class that is the main entry point for analyzing Cassandra data from Spark.

    Definition Classes
    connector
  • package partitioner

    Provides components for partitioning a Cassandra table into smaller parts of appropriate size. Each partition can be processed locally on at least one cluster node.

    Definition Classes
    rdd
  • package dht
  • BucketingRangeIndex
  • CassandraPartition
  • CassandraPartitionGenerator
  • CassandraPartitionedRDD
  • CqlTokenRange
  • DataSizeEstimates
  • EndpointPartition
  • MonotonicBucketing
  • NodeAddresses
  • RangeBounds
  • ReplicaPartition
  • ReplicaPartitioner
  • TokenRangeClusterer
  • TokenRangeSplitter
  • TokenRangeWithPartitionIndex
  • package reader

    Provides components for reading data rows from Cassandra and converting them to objects of desired type. Additionally provides a generic CassandraRow class which can represent any row.

    Definition Classes
    rdd

package partitioner

Provides components for partitioning a Cassandra table into smaller parts of appropriate size. Each partition can be processed locally on at least one cluster node.

Linear Supertypes
AnyRef, Any

Type Members

  1. class BucketingRangeIndex[R, T] extends AnyRef

    A special structure for fast lookup of ranges containing a given point.

  2. case class CassandraPartition[V, T <: Token[V]](index: Int, endpoints: Array[String], tokenRanges: Iterable[CqlTokenRange[V, T]], dataSize: Long) extends EndpointPartition with InputPartition with Product with Serializable

    Metadata describing a Cassandra table partition processed by a single Spark task. Beware: the term "partition" is overloaded. Here, in the context of Spark, it means an arbitrary collection of rows that can be processed locally on a single Cassandra cluster node. A CassandraPartition typically contains multiple CQL partitions, i.e. rows identified by different values of the CQL partitioning key.

    index

    identifier of the partition, used internally by Spark

    endpoints

    which nodes the data partition is located on

    tokenRanges

    token ranges determining the row set to be fetched

    dataSize

    estimated amount of data in the partition

  3. class CassandraPartitionedRDD[T] extends RDD[T]

    RDD created by repartitionByCassandraReplica with preferred locations mapping to the CassandraReplicas each partition was created for.

  4. case class CqlTokenRange[V, T <: Token[V]](range: TokenRange[V, T])(implicit tf: TokenFactory[V, T]) extends Product with Serializable

    Stores a CQL WHERE predicate matching a range of tokens.
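
    As a rough illustration of what such a predicate might look like, the sketch below builds a WHERE fragment for a non-wrapping range of Long tokens. The class name TokenRangePredicateSketch, its fields, and the exact text of the generated CQL are assumptions made for this example, not the connector's implementation. It also assumes the usual Cassandra convention that a token range excludes its start and includes its end.

```scala
// Hypothetical sketch; names and CQL shape are assumptions, not the connector's API.
final case class TokenRangePredicateSketch(partitionKey: Seq[String], start: Long, end: Long) {
  require(partitionKey.nonEmpty, "at least one partition key column is required")
  require(start < end, "this sketch only handles ranges that do not wrap")

  // token("col1", "col2", ...) over all partition key columns, with quoted names.
  private val tokenExpr: String =
    partitionKey.map(c => "\"" + c + "\"").mkString("token(", ", ", ")")

  // Predicate with bind markers: start is exclusive, end is inclusive.
  val cql: String = s"$tokenExpr > ? AND $tokenExpr <= ?"

  // Values to bind to the two markers, in order.
  val values: Seq[Any] = Seq(start, end)
}

object TokenRangePredicateDemo {
  def main(args: Array[String]): Unit = {
    val predicate = TokenRangePredicateSketch(Seq("word"), -100L, 100L)
    println(predicate.cql)
    println(predicate.values)
  }
}
```

    Keeping the predicate text separate from the bound values lets the same statement be prepared once and reused for every token range.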

  5. class DataSizeEstimates[V, T <: Token[V]] extends Logging

    Estimates the amount of data in the Cassandra table. Takes token range size estimates from the system.size_estimates table, available since Cassandra 2.1.5.

  6. trait EndpointPartition extends Partition
  7. trait MonotonicBucketing[-T] extends AnyRef

    A mapping from T values to an integer range [0, n), such that for any (t1: T) > (t2: T), bucket(t1) >= bucket(t2).
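
    To make the monotonicity property concrete, here is a minimal sketch of such a mapping for Long values. The trait MonotonicBucketingSketch, its bucket(n) method, and the LongBucketing object are hypothetical stand-ins written for this example; the real trait's members may differ.

```scala
// Hypothetical stand-in for the trait described above; not the connector's source.
trait MonotonicBucketingSketch[-T] {
  /** Returns a function mapping a value to a bucket index in [0, n). */
  def bucket(n: Int): T => Int
}

object LongBucketing extends MonotonicBucketingSketch[Long] {
  // Shift the signed Long range onto [0, 2^64) using BigInt to avoid overflow,
  // then scale it down to [0, n). Integer division preserves ordering, so a
  // larger input never lands in a smaller bucket.
  private val rangeSize = BigInt(2).pow(64)
  private val offset = BigInt(Long.MinValue)

  def bucket(n: Int): Long => Int = {
    require(n > 0, "n must be positive")
    value => (((BigInt(value) - offset) * n) / rangeSize).toInt
  }
}

object MonotonicBucketingDemo {
  def main(args: Array[String]): Unit = {
    val bucketOf = LongBucketing.bucket(4)
    val samples = Seq(Long.MinValue, -1L, 0L, Long.MaxValue)
    // prints List(0, 1, 2, 3)
    println(samples.map(bucketOf))
  }
}
```

    Because the mapping never reorders values, all points of a non-wrapping range fall into a contiguous run of buckets, which is what makes bucket-based range lookup possible.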

  8. class NodeAddresses extends Serializable

    Looks up the listen address of a cluster node given its Native Transport address. Uses the system.peers table as the source of information. If that information is missing for a node, the listen address is assumed to equal the RPC address.

  9. trait RangeBounds[-R, T] extends AnyRef

    Extracts the bounds of a range R. This allows working with any representation of ranges. The range must not wrap, that is, end >= start.

  10. case class ReplicaPartition(index: Int, endpoints: Array[String]) extends EndpointPartition with Product with Serializable
  11. class ReplicaPartitioner[T] extends Partitioner

    The replica partitioner works on an RDD that is keyed on sets of InetAddresses representing Cassandra hosts. It groups keys that share a common IP address into partitionsPerReplicaSet partitions.

  12. class TokenRangeClusterer[V, T <: Token[V]] extends AnyRef

    Groups a set of token ranges into groupCount groups containing not more than maxGroupSize token ranges. Each group will form a single CassandraRDDPartition.

    The algorithm is as follows:

      1. Sort token ranges by endpoints lexicographically.
      2. Take the highest possible number of token ranges from the beginning of the list, such that the sum of their ringFraction does not exceed ringFractionPerGroup and they all share at least one common endpoint. If that is not possible, take at least one item. Those token ranges make up a group.
      3. Repeat the previous step until no token ranges are left.
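
    The steps above can be sketched as a greedy pass over the sorted list. This is an illustrative reading of the description, not the connector's code: RangeSketch, TokenRangeClustererSketch, and the group signature are hypothetical, and sorting by the joined endpoint names only approximates a lexicographic ordering.

```scala
import scala.annotation.tailrec

// Hypothetical, simplified model of a token range (not the connector's TokenRange).
final case class RangeSketch(id: Int, endpoints: Set[String], ringFraction: Double)

object TokenRangeClustererSketch {

  def group(
      ranges: Seq[RangeSketch],
      ringFractionPerGroup: Double,
      maxGroupSize: Int): Vector[Vector[RangeSketch]] = {
    require(maxGroupSize >= 1, "maxGroupSize must be at least 1")

    // Step 2: grow the current group while the next range fits every constraint:
    // group size, a shared endpoint, and the ring fraction budget.
    @tailrec
    def extend(
        current: Vector[RangeSketch],
        common: Set[String],
        fraction: Double,
        rest: List[RangeSketch]): (Vector[RangeSketch], List[RangeSketch]) =
      rest match {
        case next :: tail
            if current.size < maxGroupSize &&
              (common intersect next.endpoints).nonEmpty &&
              fraction + next.ringFraction <= ringFractionPerGroup =>
          extend(current :+ next, common intersect next.endpoints, fraction + next.ringFraction, tail)
        case _ => (current, rest)
      }

    // Step 3: repeat until no ranges are left. The head is always taken,
    // which guarantees at least one range per group.
    @tailrec
    def loop(rest: List[RangeSketch], acc: Vector[Vector[RangeSketch]]): Vector[Vector[RangeSketch]] =
      rest match {
        case Nil => acc
        case head :: tail =>
          val (grp, remaining) = extend(Vector(head), head.endpoints, head.ringFraction, tail)
          loop(remaining, acc :+ grp)
      }

    // Step 1: sort by endpoints (joined sorted names approximate lexicographic order).
    val sorted = ranges.sortBy(_.endpoints.toSeq.sorted.mkString(",")).toList
    loop(sorted, Vector.empty)
  }
}

object TokenRangeClustererDemo {
  def main(args: Array[String]): Unit = {
    val ranges = Seq(
      RangeSketch(1, Set("a", "b"), 0.2),
      RangeSketch(2, Set("a", "c"), 0.2),
      RangeSketch(3, Set("b", "c"), 0.2),
      RangeSketch(4, Set("c", "d"), 0.2))
    val groups = TokenRangeClustererSketch.group(ranges, ringFractionPerGroup = 0.5, maxGroupSize = 10)
    groups.foreach(g => println(g.map(_.id).mkString(", ")))
  }
}
```

    In the demo, ranges 1 and 2 share endpoint "a" and together stay under the 0.5 budget, while range 3 has no endpoint in common with that group and therefore starts a new one.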

  13. case class TokenRangeWithPartitionIndex[V, T <: Token[V]](range: TokenRange[V, T], partitionIndex: Int) extends Product with Serializable

    Holds a token range together with the index of the partition this token range belongs to.
