Default Load Balancing Policy

By default, the Ruby driver uses a combination of token-aware and datacenter-aware round-robin policies for load balancing.

This combination proved to be the most performant of all built-in load balancing policies.
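To make the combination concrete, here is a minimal pure-Ruby sketch of how a token-aware policy wrapped around a datacenter-aware round-robin policy could order hosts. It is a toy model, not the driver's implementation: the `Host` struct, the `ToyDCAwareRoundRobin` and `ToyTokenAware` classes, the host-to-datacenter layout, and the choice of replicas are all assumptions made for illustration.

```ruby
# Toy model only: illustrates the idea, not the driver's real classes.
Host = Struct.new(:ip, :datacenter)

# Rotates through hosts in the local datacenter; remote hosts are never returned.
class ToyDCAwareRoundRobin
  def initialize(local_dc)
    @local_dc = local_dc
    @offset   = 0
  end

  def plan(hosts)
    local = hosts.select { |h| h.datacenter == @local_dc }
    return [] if local.empty?

    rotated = local.rotate(@offset % local.size)
    @offset += 1
    rotated
  end
end

# Moves replicas of the requested partition to the front of the wrapped plan.
class ToyTokenAware
  def initialize(wrapped)
    @wrapped = wrapped
  end

  def plan(hosts, replicas)
    fallback  = @wrapped.plan(hosts)
    preferred = fallback.select { |h| replicas.include?(h) }
    preferred + (fallback - preferred)
  end
end

# Assumed layout: two nodes per datacenter.
hosts = [
  Host.new('127.0.0.1', 'dc1'), Host.new('127.0.0.2', 'dc1'),
  Host.new('127.0.0.3', 'dc2'), Host.new('127.0.0.4', 'dc2')
]
# Hypothetical primary replicas for one partition key, one per datacenter.
replicas = [hosts[1], hosts[3]]

policy        = ToyTokenAware.new(ToyDCAwareRoundRobin.new('dc1'))
first_choices = 4.times.map { policy.plan(hosts, replicas).first.ip }

puts first_choices.uniq
```

In this model the round-robin rotation changes only the order of the fallback hosts. The local replica stays first in every plan, so repeated requests for the same key keep reaching the same coordinator.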

When the name of the local datacenter is not specified explicitly in the call to Cassandra.cluster, the first datacenter seen by the load balancing policy is considered local. Therefore, either list only the addresses of nodes in the same datacenter as the application in the :hosts option to Cassandra.cluster, or provide the :datacenter option explicitly.
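The rule above can be sketched in a few lines of plain Ruby. This is a toy model under stated assumptions, not driver code: `resolve_local_datacenter` is a hypothetical helper, and the `DATACENTER_OF` mapping assumes that 127.0.0.1 and 127.0.0.2 belong to dc1 while 127.0.0.3 and 127.0.0.4 belong to dc2.

```ruby
# Toy model: an explicit name wins; otherwise the first datacenter seen becomes local.
def resolve_local_datacenter(explicit, seen_datacenters)
  explicit || seen_datacenters.first
end

# Assumed mapping of node addresses to datacenters (hypothetical).
DATACENTER_OF = {
  '127.0.0.1' => 'dc1', '127.0.0.2' => 'dc1',
  '127.0.0.3' => 'dc2', '127.0.0.4' => 'dc2'
}.freeze

contact_points = ['127.0.0.3', '127.0.0.4']
seen           = contact_points.map { |ip| DATACENTER_OF.fetch(ip) }

implicit_dc = resolve_local_datacenter(nil, seen)    # no :datacenter given
explicit_dc = resolve_local_datacenter('dc1', seen)  # :datacenter given

puts implicit_dc
puts explicit_dc
```

With contact points from dc2 and no explicit name, the model treats dc2 as local. Passing a name overrides whatever the contact points suggest, which is why mixing contact points from several datacenters is only safe when the datacenter is named.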

Background

Given
a running Cassandra cluster in 2 datacenters with 2 nodes in each
And
the following schema:
CREATE KEYSPACE simplex WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': '2', 'dc2': '2'};
CREATE TABLE simplex.songs (
  id uuid PRIMARY KEY,
  title text,
  album text,
  artist text,
  tags set<text>,
  data blob
);
INSERT INTO simplex.songs (id, title, album, artist, tags)
VALUES (
   756716f7-2e54-4715-9f00-91dcbea6cf50,
   'La Petite Tonkinoise',
   'Bye Bye Blackbird',
   'Joséphine Baker',
   {'jazz', '2013'}
);
INSERT INTO simplex.songs (id, title, album, artist, tags)
VALUES (
   f6071e72-48ec-4fcb-bf3e-379c8a696488,
   'Die Mösch',
   'In Gold',
   'Willi Ostermann',
   {'kölsch', '1996', 'birds'}
);
INSERT INTO simplex.songs (id, title, album, artist, tags)
VALUES (
   fbdf82ed-0063-4796-9c7c-a3d4f47b4b25,
   'Memo From Turner',
   'Performance',
   'Mick Jagger',
   {'soundtrack', '1991'}
);

Default load balancing policy always routes to primary replicas when possible

Given
the following example:
require 'cassandra'

cluster   = Cassandra.cluster(hosts: ['127.0.0.1', '127.0.0.2'])
session   = cluster.connect('simplex')
statement = session.prepare("SELECT token(id) FROM songs WHERE id = ?")

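# Run the same query four times and record which host fulfilled each request
coordinator_ips = 4.times.map do
  info = session.execute(statement, arguments: [Cassandra::Uuid.new('756716f7-2e54-4715-9f00-91dcbea6cf50')]).execution_info
  info.hosts.last.ip # the last host tried is the one that fulfilled the request
end

# Print the distinct coordinator addresses
puts coordinator_ips.sort.uniq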
When
it is executed
Then
its output should contain:
127.0.0.2

Default load balancing policy always uses primary replicas from the local datacenter

Given
the following example:
require 'cassandra'

cluster   = Cassandra.cluster(hosts: ['127.0.0.3', '127.0.0.4'])
session   = cluster.connect('simplex')
statement = session.prepare("SELECT token(id) FROM songs WHERE id = ?")

coordinator_ips = 4.times.map do
  info = session.execute(statement, arguments: [Cassandra::Uuid.new('756716f7-2e54-4715-9f00-91dcbea6cf50')]).execution_info
  info.hosts.last.ip
end

puts coordinator_ips.sort.uniq
When
it is executed
Then
its output should contain:
127.0.0.4

Default load balancing policy allows specifying the datacenter explicitly

Given
the following example:
require 'cassandra'

cluster   = Cassandra.cluster(
              datacenter: 'dc1',
              hosts: ['127.0.0.3', '127.0.0.4']
            )
session   = cluster.connect('simplex')
statement = session.prepare("SELECT token(id) FROM songs WHERE id = ?")

coordinator_ips = 4.times.map do
  info = session.execute(statement, arguments: [Cassandra::Uuid.new('756716f7-2e54-4715-9f00-91dcbea6cf50')]).execution_info
  info.hosts.last.ip
end

puts coordinator_ips.sort.uniq
When
it is executed
Then
its output should contain:
127.0.0.2

Default load balancing policy prevents requests to remote datacenters

Given
the following example:
require 'cassandra'

cluster   = Cassandra.cluster(
              hosts: ['127.0.0.1', '127.0.0.2']
            )
session   = cluster.connect('simplex')
statement = "SELECT token(id) FROM songs"

$stdout.puts("=== START ===")
$stdout.flush

$stdin.gets # ready, block on stdin

begin
  execution_info = session.execute(statement).execution_info
  $stdout.puts("Statement #{statement.inspect} fulfilled by #{execution_info.hosts.last.ip}")
rescue => e
  $stdout.puts "#{e.class.name}: #{e.message}"
end
$stdout.flush
$stdout.puts("=== STOP ===")
$stdout.flush
And
it is running interactively
And
I wait for its output to contain "START"
When
node 1 stops
And
node 2 stops
And
I close the stdin stream
Then
its output should contain:
Cassandra::Errors::NoHostsAvailable: All hosts down