There are several different parts of cstar_perf that can be used
either as a whole, or as individual components (See
Architecture.) This guide will walk through the installation of
cstar_perf.tool, which will be the core of what you need to start
benchmarking Cassandra. The next chapter of this guide will focus on
the setup of cstar_perf.frontend
which sets up a full web-based
interface for scheduling tests, archiving results, and monitoring
multiple clusters.
Setup cstar_perf.tool
cstar_perf.tool is the core module of cstar_perf. It is what bootstraps Cassandra and runs performance tests. You should install it on a machine within the same network as your Cassandra cluster. It’s best to dedicate a machine to it, as it will be what runs cassandra-stress and ideally should not have any resource contention on it. If you don’t have an extra machine, you can install it on the same machine as one of your Cassandra nodes, just be aware of any performance penalty you’re introducing by doing so.
In this example, we have four computers:
            +------> cnode1
            |        10.0.0.101
            |
stress1 ----+------> cnode2
10.0.0.100  |        10.0.0.102
            |
            +------> cnode3
                     10.0.0.103
stress1 is the node hosting cstar_perf.tool. cnode1, cnode2, and cnode3 are Cassandra nodes. These nodes have 4 SSDs for data storage, mounted at /mnt/d1, /mnt/d2, /mnt/d3, and /mnt/d4.
Setting up your cluster
Key-based SSH access
The machine hosting cstar_perf.tool should have key-based SSH access to the Cassandra cluster for both your regular user account and root.
In terms of our example, from your user account on stress1 you should be able to run ssh your_username@cnode1 as well as ssh root@cnode1 without any password prompts.
When generating SSH keys, it works best if you don’t specify a passphrase. You can use an SSH agent if you are uncomfortable doing this, but be aware that things will stop working whenever that agent isn’t running (system reboots, not being logged in, etc.).
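For example, one way to set this up from stress1, using the hostnames and username from this guide (adjust them to your environment), might look like this:
# Generate a key with no passphrase
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
# Copy the public key to every node, for both your user and root
for node in cnode1 cnode2 cnode3; do
    ssh-copy-id your_username@$node
    ssh-copy-id root@$node
done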
Software requirements
The machine running cstar_perf.tool needs to have the following packages installed:
- Python 2.7
- Python 2.7 development packages - (python-dev on Debian)
- pip - (python-pip on Debian)
- git
The Cassandra nodes also need to have the following:
- Python 2.7
- git
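For example, on a Debian-based system these might be installed with the following (package names can differ on other distributions):
# On stress1:
sudo apt-get install python2.7 python-dev python-pip git
# On each Cassandra node:
sudo apt-get install python2.7 git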
In addition, you need to prepare a ~/fab directory to install on each of your nodes. This will contain the JDK as well as a copy of ant. Prepare this directory on the controller node (stress1 in our example) and then rsync it to the others. Here’s an example to set this up on 64-bit Linux with Java 7u67 and ant 1.9.4 (links may change, so modify accordingly):
mkdir ~/fab
cd ~/fab
wget --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie;" http://download.oracle.com/otn-pub/java/jdk/7u67-b01/jdk-7u67-linux-x64.tar.gz
tar xfv jdk-7u67-linux-x64.tar.gz
rm jdk-7u67-linux-x64.tar.gz
ln -s jdk1.7.0_67 java
wget http://archive.apache.org/dist/ant/binaries/apache-ant-1.9.4-bin.tar.bz2
tar xfv apache-ant-1.9.4-bin.tar.bz2
rm apache-ant-1.9.4-bin.tar.bz2
ln -s apache-ant-1.9.4 ant
The end result is that we can invoke java from ~/fab/java/bin/java and ant from ~/fab/ant/bin/ant.
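You can sanity-check both binaries locally before copying anything, for example:
~/fab/java/bin/java -version
~/fab/ant/bin/ant -version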
Copy this directory to each of your cassandra nodes:
rsync -av ~/fab cnode1:
rsync -av ~/fab cnode2:
rsync -av ~/fab cnode3:
You’ll know you got your SSH keys sorted out if copying those files didn’t require you to enter any passwords.
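If you also want to confirm that the files landed where cstar_perf expects them, one quick check from stress1 is to run java on each node over SSH, for example:
for node in cnode1 cnode2 cnode3; do
    ssh $node '~/fab/java/bin/java -version'
done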
Cassandra Stress
Additionally, on the node hosting cstar_perf.tool (stress1 in our example) you need to download and build cassandra-stress. This only needs to be done on the controller node (stress1):
mkdir ~/fab/stress
cd ~/fab/stress
git clone http://git-wip-us.apache.org/repos/asf/cassandra.git
cd cassandra
git checkout cassandra-2.1
JAVA_HOME=~/fab/java ~/fab/ant/bin/ant clean jar
cd ..
mv cassandra cassandra-2.1
ln -s cassandra-2.1 default
The end result is that you find cassandra-stress in ~/fab/stress/default/tools/bin/cassandra-stress. You’ll know you have java and ant installed correctly if this build was successful.
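As an extra sanity check, you can invoke the freshly built tool directly; assuming the build above succeeded, it should print its usage text:
~/fab/stress/default/tools/bin/cassandra-stress help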
Install cstar_perf.tool
Finally, you should install cstar_perf.tool onto your designated machine (stress1 in our example):
pip install cstar_perf.tool
Depending on your environment, this may need to be run as root.
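To confirm the installation, one simple check is to ask pip about the package and verify that the bootstrap command is on your PATH:
pip show cstar_perf.tool
which cstar_perf_bootstrap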
Configuration
cstar_perf.tool needs to know about your cluster. For this you need to create a JSON file located at ~/.cstar_perf/cluster_config.json. Here’s the config for our example cluster:
{
    "commitlog_directory": "/mnt/d1/commitlog",
    "data_file_directories": [
        "/mnt/d2/data",
        "/mnt/d3/data",
        "/mnt/d4/data"
    ],
    "block_devices": [
        "/dev/sdb",
        "/dev/sdc",
        "/dev/sdd",
        "/dev/sde"
    ],
    "blockdev_readahead": "256",
    "hosts": {
        "cnode1": {
            "internal_ip": "10.0.0.101",
            "hostname": "cnode1",
            "seed": true
        },
        "cnode2": {
            "internal_ip": "10.0.0.102",
            "hostname": "cnode2",
            "seed": true
        },
        "cnode3": {
            "internal_ip": "10.0.0.103",
            "hostname": "cnode3",
            "seed": true
        }
    },
    "user": "your_username",
    "name": "example1",
    "saved_caches_directory": "/mnt/d2/saved_caches"
}
If you want to use DSE and install it from a tarball, you can add the following keys:
"dse_url": "http://my-dse-repo/tar/",
"dse_username": "XXXX",
"dse_password": "YYYY"
If you want to use DSE and install it from a source branch, you can add the following keys:
"dse_source_build_artifactory_url": "https://dse-artifactory-url.com"
"dse_source_build_artifactory_username" = "dse-artifactory-username"
"dse_source_build_artifactory_password" = "dse-artifactory-password"
"dse_source_build_oauth_token" = "dse-oauth-token-for-github-access"
The required settings:
- hosts - all of your Cassandra nodes need to be listed here, including hostname and IP address.
- name - the name you want to give to this cluster.
- block_devices - The physical block devices that Cassandra is using to store data and commitlogs.
- blockdev_readahead - The default block device readahead setting for your drives (get it by running blockdev --getra /dev/DEVICE; see the example after this list)
- user - The user account that you use on the Cassandra nodes.
- dse_** - Only if you want DSE support.
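To find the value for blockdev_readahead, you can query each of your data drives; using the devices from our example cluster:
for dev in /dev/sdb /dev/sdc /dev/sdd /dev/sde; do
    sudo blockdev --getra $dev
done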
If you’re familiar with Cassandra’s cassandra.yaml, you’ll recognize the rest of these settings because they come from there. You can actually put more cassandra.yaml settings here if you know you’ll always need them, but it’s usually better to rely on the defaults and introduce different settings in your test scenarios, which you’ll define later.
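For example, if you already know every test on this cluster should use a non-default concurrency, you could (purely as an illustration) add standard cassandra.yaml keys alongside the required settings:
"concurrent_reads": 64,
"concurrent_writes": 64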
Test cstar_perf_bootstrap
Now that cstar_perf.tool is installed and configured, you can bring up a test cluster to test that everything is working:
cstar_perf_bootstrap -v apache/cassandra-2.1
If you want to install DSE instead of pure Cassandra, use the following command to bring up the cluster and install the specified version from the dse_url specified in your ~/.cstar_perf/cluster_config.json:
cstar_perf_bootstrap -v 4.8.1
The first command tells all of the Cassandra nodes to download the latest development version of Cassandra 2.1 from git, build it, and create a cluster. You’ll see a lot of text output showing you what the script is doing, but at the end of it all, you should see something like:
[10.0.0.101] All nodes available!
INFO:benchmark:Started cassandra on 3 nodes with git SHA: bd396ec8acb74436fd84a9cf48542c49e08a17a6
Assuming that worked, your cluster is now fully automated via cstar_perf. Next steps include creating some test definitions, or setting up the web frontend.
Flamegraph
It is possible to generate flamegraphs when running tests. Follow these instructions to enable the feature:
Install system dependencies on all workers of the cluster:
sudo apt-get install cmake dtach linux-tools-`uname -r`
sudo pip install sh
Ensure your kernel has performance profiling support:
$ sudo perf record -F 99 -g -p <a_running_process_pid> -- sleep 10
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.098 MB perf.data (~4278 samples) ]
Add NOPASSWD sudo configuration for the cstar/automaton user:
echo "cstar ALL = (root) NOPASSWD: ALL" | sudo tee /etc/sudoers.d/perf
Enable flamegraph feature in your cluster configuration:
"flamegraph": true,
"flamegraph_directory": "/mnt/data/cstar_perf/flamegraph"
# The flamegraph working directory defaults to /tmp/flamegraph if not specified.
In case you update your kernel, you might also need to install the matching version of linux-tools
as described above.
Yourkit Profiler
It is possible to enable the yourkit profiler when running tests. The snapshot will be available as an artifact at the end of the test. Some details:
- The yourkit agent has to be uploaded to the nodes manually due to the license
- The telemetry window is 1 hour
- The yourkit profiler options used are: “onexit=memory,onexit=snapshot”
Enable yourkit feature in your cluster configuration:
"yourkit_profiler": true,
"yourkit_agentpath": "/path/to/yjp-2014-build-14112/bin/linux-x86-64/libyjpagent.so",
"yourkit_directory": "/path/to/Snapshots/",
Ctool Command
It is possible to run a ctool command on the cstar_perf cluster when running tests. This has been mainly implemented to use ctool metrics with cstar_perf. Follow these instructions to enable the feature:
Install automaton:
git clone https://github.com/riptano/automaton.git
Configure the cluster using ctool setup_existing. Create a JSON config file (e.g. ctool_cluster.json):
{
    "cluster_name": "cstar_perf",
    "private_key_path": "/home/cstar/.ssh/id_rsa",
    "ssh_user": "cstar",
    "hosts": [
        {
            "host_name": "172.17.0.2",
            "ip_address": "172.17.0.2",
            "private_host_name": "172.17.0.2",
            "private_ip_address": "172.17.0.2"
        }
    ]
}
Then setup the existing cluster:
cd automaton
PYTHONPATH=. ./bin/ctool setup_existing ctool_cluster.json
Add the following configuration in your ~/.automaton.conf file:
[ssh]
user = cstar
force_user = true
Enable ctool feature by adding the automaton path in your cluster configuration:
"automaton_path": "/home/cstar/automaton/"
Test the ctool feature using the frontend by selecting the ‘ctool’ operation and using “info cstar_perf” as the command.
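If you prefer to verify from the command line first, the same check can be run directly from the automaton checkout (assuming the setup above), for example:
cd automaton
PYTHONPATH=. ./bin/ctool info cstar_perf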