Cassandra Operations and Performance Tuning

Welcome to Dieudonne Madishon NGAYA's Blog

When do you need to tune performance?

Optimizing:

When things work but could be better, and we want to get better performance.

Troubleshooting:

Fixing a problem that impacts performance.
Something could actually be broken, or it could just be slow; in clusters, something broken can manifest as slow performance.

What are some examples of performance-related complaints an admin might receive regarding Cassandra?

Performance-related complaints:
It’s slow
Certain queries are slow
Program X that uses the cluster is slow
A node went down

Latency, Throughput and the U.S.E. Method:

Bad methodology
How not to approach performance-related problems:
Streetlight anti-method
Random change anti-method
Blame-someone-else anti-method

In performance tuning, what are we trying to improve?

Latency – how long a cluster, node, server or I/O subsystem takes to respond to a request

Throughput – how many transactions of a given size (or range) a cluster, node or I/O subsystem can complete in…
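
To make the two terms concrete, here is a minimal sketch in Python that computes a latency percentile and a throughput figure from a list of measurements. The sample values, the function names and the nearest-rank percentile method are all assumptions chosen for illustration; they do not come from the original post.

```python
import math

def percentile(latencies_ms, pct):
    """Nearest-rank percentile: the smallest sample that is >= pct% of all samples."""
    if not latencies_ms:
        raise ValueError("no samples")
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def throughput(completed_ops, elapsed_seconds):
    """Operations completed per second over the measurement window."""
    return completed_ops / elapsed_seconds

# Hypothetical request latencies in milliseconds; one slow outlier.
samples = [2.0, 3.0, 2.5, 40.0, 2.2, 2.8, 3.1, 2.4, 2.9, 2.6]

print("median latency:", percentile(samples, 50), "ms")
print("p99 latency:", percentile(samples, 99), "ms")
print("throughput:", throughput(len(samples), 2.0), "ops/sec")
```

The single slow request barely moves the median but dominates the 99th percentile, which is why tail latency is usually reported separately from the average.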


Git clone branch example

Most of the time, we need to interact with GitHub, a remote hosting service for Git repositories. Git offers several ways to work with the different branches of a repository on GitHub.

You can list all branches, local and remote, with git branch -a.

You can clone a repository and check out a specific branch with git clone -b branchname git_link.

You can switch to a specific branch with git checkout branchname.

Check out my new article on Git clone branch example.

Create Git repository example

This time I’m starting over and explaining how to create a Git repository.

There are three options:

  • Create a Git repository from scratch in an empty folder
  • Create a Git repository from an existing project
  • Create a Git repository from a remote repository on GitHub

Check out my latest article on Create Git repo example.

Use JMX/Jconsole to monitor Cassandra on EC2

In my previous post, we used DataStax OpsCenter to monitor the performance of a Cassandra cluster.

In this article, I’ll show you how to use JMX/JConsole to monitor the metrics in Cassandra. Again, the whole cluster is deployed on Amazon EC2.

There are three main steps:

  • Change the settings in cassandra-env.sh
  • Start the Cassandra server
  • Use JConsole to remotely monitor Cassandra

1. Change the settings in cassandra-env.sh:

Go to cassandra_dir/conf/cassandra-env.sh, find the following part, uncomment the hostname line and change the hostname to the Amazon EC2 address:

# jmx: metrics and administration interface
#
# add this if you're having trouble connecting:
JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=ec2-xx-xx-xx-xx.us-west-1.compute.amazonaws.com"

Note that the hostname should use the Amazon EC2 public DNS format shown above.

2. Start the Cassandra server.
3. On a remote machine, open a terminal and run jconsole. The following GUI appears:

Screen Shot 2016-05-10 at 15.38.57

Then, in the Remote Process field, enter:

service:jmx:rmi:///jndi/rmi://ec2-xx-xx-xx-xx.us-west-1.compute.amazonaws.com:7199/jmxrmi

Note that the port here is 7199, so you need to open this port in the Amazon EC2 security group.
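
As a small aid, the sketch below builds the same service URL from a host name and checks whether the JMX port accepts TCP connections before you start JConsole. It is only an illustration: the function names are made up for this example, and the host is the placeholder from this post, so the reachability check is defined but not called here.

```python
import socket

def jmx_service_url(host, port=7199):
    """Build the JMX service URL in the format used in the Remote Process field."""
    return "service:jmx:rmi:///jndi/rmi://{}:{}/jmxrmi".format(host, port)

def port_is_open(host, port=7199, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Placeholder host from the post; replace it with your instance's address.
host = "ec2-xx-xx-xx-xx.us-west-1.compute.amazonaws.com"
print(jmx_service_url(host))
```

If port_is_open returns False for your real host, the security group rule for port 7199 is a likely place to look first.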

Then click Connect, and it will show the following:

Screen Shot 2016-05-10 at 15.37.12

The figure above shows the overall metrics for the server.

You can also go to the MBeans tab and check out the metrics for Cassandra. For example, you can check the requests in different stages, like below:

Screen Shot 2016-05-10 at 15.37.54

Note that if you want to get the latest metrics immediately, you should click Refresh. Otherwise, they update automatically according to the settings for each metric.

If you want to know more about the meaning of the different metrics, go to the wiki page and check them out.

If you have problems connecting, the settings are probably wrong. Go to the Oracle webpage for the right settings.

Git summary: how to use basic git commands

Check out my new article: How does Git work? Git Tutorial for Beginners

This time, I’m trying to explain how to use Git in more detail. Starting from installation, I then cover the architecture of Git, followed by the most frequently used Git commands, explained with examples.

This way, I hope everyone will have a basic idea of how to use Git. Again, the Git flow below is very important:

gitflow

Git operation flow

Git undo commit example

Every human being makes mistakes, especially software developers :) Writing code reflects how we think and how we solve specific problems. In real life, an irrecoverable mistake can be a disaster. Fortunately, this rarely happens when writing code, especially with the help of Git.

In this article, I show how to undo a commit in Git, which may well save your life :)

Basically, git reset --hard HEAD discards your uncommitted changes and returns the files to the state of the most recent commit; to undo the last commit itself, use git reset --hard HEAD~1. For a more detailed explanation, check out my new article on git undo commit example.

Git diff example

This time, I’ve finished an article on the git diff command. While reading some blogs, I found a very useful figure for Git. It comes from the link below the title.

Screen Shot 2016-03-08 at 16.42.56

Basically, the git diff command shows the differences between the files in the working directory, the index and the most recent commit. The three most often used git diff commands are:

  • git diff: Show the differences between the working directory and the index.
  • git diff --cached: Show the differences between the index and the most recent commit.
  • git diff HEAD: Show the differences between the working directory and the most recent commit.

Check out my new article about git diff command here.

Simulation of C3 algorithm in Cassandra

The paper C3: Cutting Tail Latency in Cloud Data Stores via Adaptive Replica Selection is a research paper on how to dynamically select replicas in order to reduce tail latency. Two techniques are used to achieve this:

  • Replica ranking: rank the replicas
  • Distributed rate control: adaptively control the input rate to different servers.
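
To give a feel for what ranking replicas means, here is a toy sketch in Python. It is not the scoring function from the paper: the score formula, the field names and the sample numbers are all invented for illustration. It only shows the general idea of ordering replicas by observed latency plus a penalty that grows quickly with the number of queued requests. Rate control is not covered.

```python
def score(replica, exponent=3):
    """Toy score, lower is better: observed latency plus a queue-size penalty.

    The exponent makes long queues much more expensive than short ones;
    both the formula and the default value are assumptions for this example.
    """
    return replica["latency_ms"] + replica["service_ms"] * (replica["queue"] ** exponent)

def rank_replicas(replicas):
    """Return the replicas ordered from most to least preferred."""
    return sorted(replicas, key=score)

# Hypothetical measurements a client might hold for three replicas.
replicas = [
    {"name": "node-a", "latency_ms": 2.0, "service_ms": 0.5, "queue": 4},
    {"name": "node-b", "latency_ms": 3.0, "service_ms": 0.5, "queue": 1},
    {"name": "node-c", "latency_ms": 2.5, "service_ms": 0.5, "queue": 2},
]

for replica in rank_replicas(replicas):
    print(replica["name"], score(replica))
```

In this made-up data, node-a has the lowest latency but the longest queue, so it ends up ranked last.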

Running the C3 algorithm on Amazon EC2 with YCSB results in up to a 3 times improvement in 99.9th percentile latency. The C3 algorithm is implemented on the Cassandra server side in Java, in less than 400 lines of code. The code can be checked here on GitHub.

Over the following days, I’ll read the paper and try to repeat the experiments, to see what exciting things happen.

1. Prerequisites and Implementation:

To make the C3 algorithm work, Java 1.7 or higher and Python 2.7 or higher are needed.

To run Cassandra in an IDE such as IntelliJ, try this link: https://wiki.apache.org/cassandra/RunningCassandraInIDEA

Ant is also needed; install it with sudo yum install ant. (If Ant is not installed, this error occurs: Could not find or load main class org.apache.cassandra.service.CassandraDaemon)

Then clone the cassandra-c3 code with:

git clone https://github.com/lalithsuresh/cassandra-c3.git

Go to the cassandra-c3 folder and run ant release.

Everything builds successfully.

2. Experiments:

Six nodes are in the same cluster, and the replication factor is 3. Another Amazon instance works as the YCSB client. Then 300000 records are inserted, and three kinds of workload are tested: read-heavy, update-heavy and read-only.

The results are shown below:

Screen Shot 2016-03-18 at 12.37.53

Throughput comparison

 

Screen Shot 2016-03-18 at 12.59.14

Read latency comparison

 

Screen Shot 2016-03-18 at 13.04.47

Write latency comparison

As we can see from the figures above, the C3 algorithm improves not only the read latency but also the throughput. However, the improvement in write latency is not as large as the improvement in read latency.

Change the default Java path to the right path

Today, I created a new Amazon EC2 instance to work as a YCSB client machine. After running mvn clean package, an error I had seen before showed up again, which is very annoying:

[ERROR] COMPILATION ERROR :

[INFO] -------------------------------------------------------------
[ERROR] No compiler is provided in this environment. Perhaps you are running on a JRE rather than a JDK?
[INFO] 1 error
[INFO] -------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1.424s
[INFO] Finished at: Mon Mar 07 14:21:01 IST 2016
[INFO] Final Memory: 5M/15M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project TEST-WEB: Compilation failure
[ERROR] No compiler is provided in this environment. Perhaps you are running on a JRE rather than a JDK?

This error is related to the JAVA_HOME setting.

Running echo $JAVA_HOME returns /usr/lib/jvm/jre, which points to a JRE rather than a JDK.

Then execute the following commands to change it to the right path:

export JAVA_HOME=/usr/java/jdk1.7.0_79
export PATH=$JAVA_HOME/bin:$PATH

After this, everything works.
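
Before running Maven again, it can help to check whether JAVA_HOME really points to a JDK. The sketch below is one simple way to do that in Python, assuming that the presence of bin/javac under JAVA_HOME is a good enough sign of a JDK; the function name is made up for this example.

```python
import os

def has_java_compiler(java_home):
    """Return True if java_home contains bin/javac (or javac.exe on Windows)."""
    if not java_home:
        return False
    bin_dir = os.path.join(java_home, "bin")
    return any(
        os.path.isfile(os.path.join(bin_dir, name))
        for name in ("javac", "javac.exe")
    )

# Report on the current environment.
home = os.environ.get("JAVA_HOME", "")
if has_java_compiler(home):
    print(home, "looks like a JDK")
else:
    print(home or "(JAVA_HOME not set)", "has no javac; Maven cannot compile with it")
```

A path such as /usr/lib/jvm/jre would normally fail this check, which matches the Maven error above.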

Also, this post should be helpful for setting the Java path permanently.

 

Benchmarking Cassandra in Amazon EC2 with YCSB

YCSB has proved to be one of the most efficient tools for testing the performance of different databases, such as HBase, MySQL, Apache Cassandra, MongoDB etc.

Recently I’ve been running my own benchmarking experiment.

First, we need to install Apache Cassandra; for the basic settings, refer to my previous posts.

Second, we need to install YCSB. The three prerequisites for the installation are:

  • Python 2.7 or higher
  • Maven 3.0 or higher
  • Java 1.7 or higher

Then download YCSB. Two methods can be used to download it:

  1. Git:
    git clone https://github.com/brianfrankcooper/YCSB.git
  2. Curl:
    curl -O --location https://github.com/brianfrankcooper/YCSB/releases/download/0.7.0/ycsb-0.7.0.tar.gz
    tar xfvz ycsb-0.7.0.tar.gz

After this, go into the YCSB folder and run mvn clean package.

Everything is DONE!

************************************************

After the installation, we can use YCSB to benchmark Cassandra.

  1. We need to start the Cassandra server. Then, on one of the Cassandra nodes, use cqlsh to create the keyspace ycsb and the table usertable with the following commands:

create keyspace ycsb WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor': 3};

use ycsb;

create table usertable (
y_id varchar primary key,
field0 varchar,
field1 varchar,
field2 varchar,
field3 varchar,
field4 varchar,
field5 varchar,
field6 varchar,
field7 varchar,
field8 varchar,
field9 varchar);
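
Typing ten nearly identical column lines by hand is error-prone, so here is a small sketch in Python that generates the same create table statement. The function name and its parameters are made up for this example; the output is meant to be pasted into cqlsh.

```python
def usertable_cql(field_count=10, table="usertable"):
    """Build the CREATE TABLE statement with columns field0..field{N-1}."""
    columns = ["y_id varchar primary key"]
    columns += ["field{} varchar".format(i) for i in range(field_count)]
    return "create table {} (\n{});".format(table, ",\n".join(columns))

print(usertable_cql())
```

With the default arguments, this prints the statement shown above, with one varchar column per field.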

2. To load the data, we use the default workloadf in YCSB below. You can also define your own workload by following the post here.

bin/ycsb load cassandra2-cql -p hosts=172.31.19.145 -P workloads/workloadf -s > workloadf_res.txt

The output after running the command above is:

Loading workload…

Starting test.

2016-02-29 05:19:58:333 0 sec: 0 operations; est completion in 0 seconds

Connected to cluster: Test Cluster

DBWrapper: report latency for each error is false and specific error codes to track for latency are: []

2016-02-29 05:20:02:996 4 sec: 1000 operations; 212.27 current ops/sec; [CLEANUP: Count=1, Max=17583, Min=17568, Avg=17576, 90=17583, 99=17583, 99.9=17583, 99.99=17583] [INSERT: Count=1000, Max=113279, Min=1503, Avg=3560.21, 90=4695, 99=22655, 99.9=53343, 99.99=113279]

3. To run the workload:

bin/ycsb run cassandra2-cql -p hosts=172.31.19.145 -P workloads/workloadf -s > workloadf_trans.txt

The output for this command is:

Loading workload…

Starting test.

2016-02-29 05:20:48:156 0 sec: 0 operations; est completion in 0 seconds

Connected to cluster: Test Cluster

DBWrapper: report latency for each error is false and specific error codes to track for latency are: []

2016-02-29 05:20:52:219 4 sec: 1000 operations; 243.37 current ops/sec; [CLEANUP: Count=1, Max=17183, Min=17168, Avg=17176, 90=17183, 99=17183, 99.9=17183, 99.99=17183] [READ: Count=1000, Max=24879, Min=1374, Avg=2295.7, 90=2821, 99=8155, 99.9=12967, 99.99=24879] [READ-MODIFY-WRITE: Count=486, Max=16527, Min=2728, Avg=4257.43, 90=5835, 99=11631, 99.9=16527, 99.99=16527] [UPDATE: Count=486, Max=14991, Min=1273, Avg=1896.02, 90=2321, 99=6135, 99.9=14991, 99.99=14991]
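
The status lines are compact but hard to read by eye. The sketch below shows one way to split such a line into per-operation numbers with Python. The regular expression and the function name are my own assumptions based on the format seen in the output above; other YCSB versions may print a different layout.

```python
import re

# Matches blocks such as "[READ: Count=1000, Max=24879, ...]".
BLOCK = re.compile(r"\[([A-Z-]+): ([^\]]*)\]")

def parse_status_line(line):
    """Return {operation: {metric: value}} for every bracketed block in the line."""
    stats = {}
    for operation, body in BLOCK.findall(line):
        fields = {}
        for pair in body.split(", "):
            key, value = pair.split("=")
            fields[key] = float(value)
        stats[operation] = fields
    return stats

# Shortened copy of the status line from the run above.
line = (
    "2016-02-29 05:20:52:219 4 sec: 1000 operations; 243.37 current ops/sec; "
    "[READ: Count=1000, Max=24879, Min=1374, Avg=2295.7, 90=2821, 99=8155, "
    "99.9=12967, 99.99=24879] "
    "[UPDATE: Count=486, Max=14991, Min=1273, Avg=1896.02, 90=2321, 99=6135, "
    "99.9=14991, 99.99=14991]"
)

for operation, fields in parse_status_line(line).items():
    print(operation, "avg:", fields["Avg"], "p99:", fields["99"])
```

The values are kept exactly as reported; the summary file labels its latencies in microseconds.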

After this, the performance results have been written to a text file. Its content is:

[CLEANUP], MaxLatency(us), 17183.0

[CLEANUP], 95thPercentileLatency(us), 17183.0

[CLEANUP], 99thPercentileLatency(us), 17183.0

[READ], Operations, 1000.0

[READ], AverageLatency(us), 2295.703

[READ], MinLatency(us), 1374.0

[READ], MaxLatency(us), 24879.0

[READ], 95thPercentileLatency(us), 4167.0

[READ], 99thPercentileLatency(us), 8155.0

[READ], Return=OK, 1000

[READ-MODIFY-WRITE], Operations, 486.0

[READ-MODIFY-WRITE], AverageLatency(us), 4257.427983539094

[READ-MODIFY-WRITE], MinLatency(us), 2728.0

[READ-MODIFY-WRITE], MaxLatency(us), 16527.0

[READ-MODIFY-WRITE], 95thPercentileLatency(us), 7823.0

[READ-MODIFY-WRITE], 99thPercentileLatency(us), 11631.0

[UPDATE], Operations, 486.0

[UPDATE], AverageLatency(us), 1896.022633744856

[UPDATE], MinLatency(us), 1273.0

[UPDATE], MaxLatency(us), 14991.0

[UPDATE], 95thPercentileLatency(us), 3515.0

[UPDATE], 99thPercentileLatency(us), 6135.0

[UPDATE], Return=OK, 486

It shows all the latency and throughput information we want.
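
To compare several runs, it is handy to load this file into a dictionary. The sketch below assumes every result line has the three-part form "[SECTION], metric, value" seen above; the function name and the shortened sample text are made up for this example.

```python
def parse_summary(text):
    """Parse lines like "[READ], AverageLatency(us), 2295.703" into nested dicts."""
    results = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("["):
            continue  # skip blank lines and anything that is not a result line
        parts = [part.strip() for part in line.split(",")]
        if len(parts) != 3:
            continue
        section, metric, value = parts
        results.setdefault(section.strip("[]"), {})[metric] = float(value)
    return results

# A few lines copied from the result file above.
sample = """\
[READ], Operations, 1000.0
[READ], AverageLatency(us), 2295.703
[READ], 99thPercentileLatency(us), 8155.0
[UPDATE], Operations, 486.0
[UPDATE], 99thPercentileLatency(us), 6135.0
"""

summary = parse_summary(sample)
for operation, metrics in summary.items():
    p99_ms = metrics["99thPercentileLatency(us)"] / 1000.0
    print(operation, "p99 =", p99_ms, "ms")
```

For a real run, read workloadf_trans.txt from disk and pass its content to parse_summary instead of the sample text.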

Troubleshooting

When I first ran the command to load the workload, I used the following:

bin/ycsb load cassandra2-cql -p hosts=“172.31.19.145” -P workloads/workloadf -s > workloadf_res.txt

It led to the error below:

Loading workload…
Starting test.
SLF4J: Failed to load class “org.slf4j.impl.StaticLoggerBinder”.
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
2016-02-29 04:45:41:695 0 sec: 0 operations; est completion in 0 seconds
com.yahoo.ycsb.DBException: java.lang.IllegalArgumentException: “172.31.19.145”: Name or service not known
at com.yahoo.ycsb.db.CassandraCQLClient.init(CassandraCQLClient.java:193)
at com.yahoo.ycsb.DBWrapper.init(DBWrapper.java:77)
at com.yahoo.ycsb.ClientThread.run(Client.java:289)
Caused by: java.lang.IllegalArgumentException: “172.31.19.145”: Name or service not known
at com.datastax.driver.core.Cluster$Builder.addContactPoint(Cluster.java:843)
at com.datastax.driver.core.Cluster$Builder.addContactPoints(Cluster.java:865)
at com.yahoo.ycsb.db.CassandraCQLClient.init(CassandraCQLClient.java:146)
… 2 more
2016-02-29 04:45:41:734 0 sec: 0 operations; est completion in 106751991167300 days 15 hours

As for the first warning, according to the post here, it can be eliminated by adding the SLF4J package to the YCSB folder.

The error after that is caused by the quotes around the host. After deleting the quotes around the IP address in the command above, everything works. The stack trace shows that the quote characters were passed to the driver as part of the host name, so it tried to resolve “172.31.19.145”, quotes included, as a hostname. Since hosts should be the IP of the Cassandra node to connect to, and it is a string, I still have no idea why the quotes are not stripped. This error is really annoying and it’s absolutely not funny.
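
One possible explanation, offered only as a guess: if the quotes in the command were typographic (curly) quotes, for example pasted from a web page, the shell does not treat them as quoting characters and passes them through as part of the argument. The sketch below uses Python's shlex module, which approximates POSIX shell word splitting, to show the difference. Whether this is what actually happened in my case is an assumption.

```python
import shlex

# Straight double quotes are shell quoting characters and are removed.
straight = shlex.split('bin/ycsb load cassandra2-cql -p hosts="172.31.19.145"')

# Curly quotes (U+201C and U+201D) are ordinary characters and stay in the argument.
curly = shlex.split("bin/ycsb load cassandra2-cql -p hosts=\u201c172.31.19.145\u201d")

print(straight[-1])
print(curly[-1])
```

In the second case, the host value would still carry the quote characters, which would fit the "Name or service not known" message above.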

More benchmark results will be added later.