How to Install Apache Cassandra Cluster on Linux

Arvid Larson


Apache Cassandra is a free and open-source NoSQL distributed database management system created by Facebook (now Meta). Cassandra’s distributed nature allows for high availability and high performance with no single point of failure.

Because of its scalability, Cassandra is well suited to massive, active, and critical data sets. Large, well-known organizations such as Apple, Bloomberg, BestBuy, eBay, Netflix, Spotify, and many more use it. If you’re interested in getting to know Apache Cassandra, you’re in the right place.

In this article, you’re going to learn how to set up and configure an Apache Cassandra Cluster on Linux systems. You’ll also learn how to interact with Cassandra using its command-line tools.

Prerequisites

To follow along with the examples in this tutorial, be sure to have the following requirements in place.

  • You’ll need two Linux servers on the same network. This tutorial uses two Rocky Linux (v8.5) servers with the following details.
Hostname       IP Address
cassandra01    172.16.1.10
cassandra02    172.16.1.15

The Apache Cassandra documentation does not provide a prescriptive list of compatible Linux distros but mentions that Cassandra may run on CentOS, RHEL, Debian, and SUSE Enterprise Linux.

  • You must have sudo privileges or access to the root account.
  • Nano text editor or any Linux-based text editor.

Installing Java OpenJDK and Python

Before jumping into the Apache Cassandra install, first install the software dependencies. Cassandra is a Java-based application, and the latest version (v4.0 as of this writing) requires Java OpenJDK 1.8 and Python 3.6.

This tutorial uses the DNF package manager for RPM-based Linux distros. You may also use Yum on older RPM-based distros, or Apt on DEB-based distros like Ubuntu and Debian. Refer to your distro’s documentation to determine which package manager to use.
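
If you are unsure which package manager a server provides, a small check can help. The following is a minimal sketch, not part of the original steps: the function name detect_pkg_manager is made up for illustration, and it only reports which of dnf, yum, or apt-get is found on the PATH. Package names differ between distro families, so the dnf commands in this tutorial may need adjusting on other distros.

```shell
#!/usr/bin/env bash
# Print the first package manager found on PATH, or "unknown".
# detect_pkg_manager is a hypothetical helper name used only in this sketch.
detect_pkg_manager() {
    local candidate
    for candidate in dnf yum apt-get; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    echo "unknown"
    return 1
}

# Example call:
#   detect_pkg_manager
```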

Follow the steps below to install Java OpenJDK 1.8 and Python 3.6 on each server.

1. Open your SSH client, connect to your server, and run the sudo su command to become root.

ssh username@server_name_or_IP
sudo su

2. Next, run the dnf command below to install the Java OpenJDK 1.8 and Python 3.6 packages. Wait for the installation to complete.

dnf install java-1.8.0-openjdk python36 -y

3. Now, verify the Java version by running the command below.

java -version

Below you can see the current version of Java OpenJDK is 1.8.0_312.

Checking the Java version

4. Next, set the default Python interpreter on your servers to Python 3.6. To do so, run the alternatives command below.

alternatives --config python

Type the number corresponding to your Python version at the command selection prompt. The example below shows that Python3 is option 2.

Choosing the default Python interpreter

5. Lastly, execute the following command to verify the Python version.

python --version

You should see that Python 3.x.x is the default, similar to the screenshot below.

Checking the Python version
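
The two version checks above can also be scripted, which is handy when preparing several servers. The sketch below is an illustration under stated assumptions: the function names is_java8 and is_python36_plus are hypothetical, and the patterns assume version banners shaped like openjdk version "1.8.0_312" and Python 3.6.8.

```shell
#!/usr/bin/env bash
# Succeeds when the text in $1 looks like a Java 8 (1.8.x) version banner.
is_java8() {
    printf '%s\n' "$1" | grep -q 'version "1\.8\.'
}

# Succeeds when the text in $1 reports Python 3 at minor version 6 or newer.
is_python36_plus() {
    local ver major minor
    # Reduce "Python 3.6.8" to "3 6"; prints nothing if the banner does not match.
    ver=$(printf '%s\n' "$1" | sed -n 's/^Python \([0-9]*\)\.\([0-9]*\).*/\1 \2/p')
    [ -n "$ver" ] || return 1
    major=${ver% *}
    minor=${ver#* }
    [ "$major" -eq 3 ] && [ "$minor" -ge 6 ]
}

# Example calls (java -version writes its banner to stderr, hence 2>&1):
#   is_java8 "$(java -version 2>&1)" && echo "Java 8 found"
#   is_python36_plus "$(python --version 2>&1)" && echo "Python 3.6+ found"
```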

Installing Apache Cassandra NoSQL Database

You’ve installed the dependencies and made sure they are suitable versions. Now it’s time to install Apache Cassandra!

While there are many ways to install Cassandra, the most convenient is through the official repository, which you need to add to your system first. To install Cassandra on Linux systems, proceed as follows.

1. Run the following command to create a new repository file for Cassandra.

nano /etc/yum.repos.d/cassandra.repo

2. Copy the following Cassandra repository configuration into the file. This repository works on most Red Hat-based distributions, including Rocky Linux.

[cassandra]
name=Apache Cassandra
baseurl=https://downloads.apache.org/cassandra/redhat/40x/
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://downloads.apache.org/cassandra/KEYS

3. After editing, save and close the file by pressing Ctrl+X, Y, and Enter.

4. Next, execute the dnf command below to verify all available repositories on your system.

dnf repolist

You should see the Apache Cassandra repository in the repo list, as shown below.

Checking repositories for Apache Cassandra

5. Now, install the Cassandra NoSQL Database by running the following command.

dnf install cassandra -y

You should see a confirmation message after installing Apache Cassandra, similar to the screenshot below.

Installing Apache Cassandra
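
When you repeat the installation on the second server, steps 1 to 3 above can be replaced with a here-document instead of an interactive editor. The following is a minimal sketch: the function name write_cassandra_repo is hypothetical, and the file content is copied verbatim from step 2.

```shell
#!/usr/bin/env bash
# Write the repository definition from step 2 to the path given in $1.
# write_cassandra_repo is a hypothetical helper name used only in this sketch.
write_cassandra_repo() {
    local target="$1"
    cat > "$target" <<'EOF'
[cassandra]
name=Apache Cassandra
baseurl=https://downloads.apache.org/cassandra/redhat/40x/
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://downloads.apache.org/cassandra/KEYS
EOF
}

# Example call (as root):
#   write_cassandra_repo /etc/yum.repos.d/cassandra.repo
```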

Configuring the Apache Cassandra Cluster

Once you have installed Cassandra, you’ll need to edit the configuration file /etc/cassandra/conf/cassandra.yaml and set up the Cassandra cluster.

To make the Cassandra cluster work, you’ll need to change the default Cassandra configuration on all servers, such as:

  • Change the default cluster_name.
  • Add server IP addresses to the seeds option.
  • Change the default listen_address to the local IP address.
  • Enable the rpc_address for client connections.
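
The four changes listed above can also be applied without an editor. The sketch below is one possible approach using GNU sed, not the tutorial’s own method. It assumes the stock cassandra.yaml layout, where cluster_name, listen_address, and rpc_address are uncommented at the start of a line and seeds appears as an indented "- seeds:" entry. The function name configure_cassandra_node is hypothetical, and values containing a slash or an ampersand would need escaping first.

```shell
#!/usr/bin/env bash
# Usage: configure_cassandra_node YAML_FILE NODE_IP SEEDS CLUSTER_NAME
# Rewrites the four settings in place and keeps a .bak copy of the original.
# Assumes GNU sed (for -i) and the stock cassandra.yaml layout.
configure_cassandra_node() {
    local yaml="$1" node_ip="$2" seeds="$3" cluster="$4"
    cp "$yaml" "$yaml.bak"
    sed -i \
        -e "s/^cluster_name:.*/cluster_name: '$cluster'/" \
        -e "s/^\([[:space:]]*- seeds:\).*/\1 \"$seeds\"/" \
        -e "s/^listen_address:.*/listen_address: $node_ip/" \
        -e "s/^rpc_address:.*/rpc_address: $node_ip/" \
        "$yaml"
}

# Example call on cassandra01 (as root):
#   configure_cassandra_node /etc/cassandra/conf/cassandra.yaml \
#       172.16.1.10 "172.16.1.10:7000,172.16.1.15:7000" "ATA Cluster"
```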

Now, proceed with the following steps to set up the Cassandra cluster.

1. On each server, starting with cassandra01, run the following command to open the Cassandra configuration file cassandra.yaml for editing.

nano /etc/cassandra/conf/cassandra.yaml

2. Replace the default cluster_name value with a new name. This tutorial uses the cluster name ATA Cluster.

cluster_name: 'ATA Cluster'
Specify the cluster_name for the Apache Cassandra cluster

3. Now, add each server’s IP address with the default Cassandra inter-node TCP port 7000 to the seeds option, which sits under seed_provider in the file. The format follows the pattern IP:Port,IP:Port.

seeds: "172.16.1.10:7000,172.16.1.15:7000"
Add node/server to the Apache Cassandra cluster

4. Next, change the default listen_address from localhost to the server’s own IP address. The listen_address option defines the IP address Cassandra binds to for communication with other nodes.

# for cassandra01
listen_address: 172.16.1.10

# for cassandra02
listen_address: 172.16.1.15
Setting the listen_address for Apache Cassandra

5. Next, set the rpc_address option to the server’s IP address, the same value as the listen_address option. In a Cassandra cluster, clients connect to this address on the default TCP port 9042.

# for cassandra01
rpc_address: 172.16.1.10

# for cassandra02
rpc_address: 172.16.1.15
Setup rpc_address for client connections

6. Save and close the configuration file by pressing Ctrl+X, Y, and Enter.

7. After editing the Cassandra configuration on both servers, run the following command on each server to start the Cassandra service. On startup, each node contacts the servers listed in the seeds option and joins the cluster.

service cassandra start

8. Now, confirm the Cassandra service status by running the command below.

service cassandra status

You will get an output similar to the screenshot below. As you can see, the Cassandra service is active (running).

Verify Apache Cassandra service status
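
An active service does not necessarily mean Cassandra is already accepting connections, since the process can take a while to finish starting. A simple way to wait for it is to poll the client port. The sketch below is an assumption-laden illustration: the function name wait_for_port is hypothetical, and it relies on bash’s /dev/tcp feature and the coreutils timeout command being available.

```shell
#!/usr/bin/env bash
# Usage: wait_for_port HOST PORT MAX_SECONDS
# Polls about once per second and succeeds as soon as a TCP connection opens.
# Returns non-zero if the port is still closed after MAX_SECONDS attempts.
wait_for_port() {
    local host="$1" port="$2" remaining="$3"
    while [ "$remaining" -gt 0 ]; do
        # The child bash opens (and immediately drops) a TCP connection.
        if timeout 1 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
            return 0
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 1
}

# Example call on cassandra01:
#   wait_for_port 172.16.1.10 9042 120 && echo "Cassandra accepts client connections"
```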

Securing the Apache Cassandra Cluster with Firewall

Setting up a firewall for securing services is an essential task in the production environment. Doing so allows you to limit access to the Cassandra cluster only from specific IP addresses or network ranges.

On Red Hat-based Linux distributions, firewalld is the default firewall software.

By default, Cassandra needs two TCP ports to be open. Port 7000 is the default port for communication between cluster nodes, and port 9042 is the default native transport port for client connections.

Follow these steps to secure Cassandra cluster deployment with a firewall.

1. First, confirm whether firewalld is already installed on your servers by running the command below.

dnf list installed firewalld

If firewalld is not installed, follow steps #2 and #3. But if firewalld is already installed on the server, skip to step #4 instead.

Checking for firewalld installation

2. If you don’t have firewalld on your system, run the following command to install it.

dnf install firewalld -y

3. Now, start the firewalld service by running the command below. This command starts firewalld with its default rules, which allow essential services such as SSH and the DHCP client.

systemctl start firewalld

By default, firewalld provides a command-line interface firewall-cmd for managing and maintaining firewall rules.

4. Run the following firewall-cmd command to create a new zone for the Cassandra cluster and reload the firewalld rules.

# add firewalld zone cassandra-cluster
firewall-cmd --new-zone=cassandra-cluster --permanent

# reload firewalld
firewall-cmd --reload

You will see the output message success, which means the operation succeeded. The --permanent option writes the rule to the permanent configuration, so it survives reloads and reboots.

Add new zone and reload firewalld

5. Next, add your server network CIDR to the cassandra-cluster zone. This rule allows any server or client in the CIDR 172.16.1.0/24 to connect. To allow a single host instead, pass its IP address, for example 172.16.1.20, to the --add-source option.

firewall-cmd --zone=cassandra-cluster --add-source=172.16.1.0/24 --permanent

6. Now, run the command below to add Cassandra service ports 7000 and 9042 to the cassandra-cluster zone.

# add the Apache Cassandra storage_port to the zone cassandra-cluster
firewall-cmd --zone=cassandra-cluster --add-port=7000/tcp --permanent

# add Apache Cassandra port for client connections
firewall-cmd --zone=cassandra-cluster --add-port=9042/tcp --permanent
Add source IP address and Apache Cassandra ports to firewalld

7. Lastly, reload the firewalld rules to apply the new configuration by running the command below.

firewall-cmd --reload

The Cassandra ports are now open only to sources in the 172.16.1.0/24 network. Traffic from other networks falls through to the default zone, where ports 7000 and 9042 stay closed unless you have opened them there.
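
Because the same firewall rules are needed on every node, it can be convenient to collect steps 4 to 7 in one function. The sketch below is an illustration only: the helper names run and open_cassandra_zone and the DRY_RUN variable are made up for this example, while the firewall-cmd options are the ones used in the steps above. With DRY_RUN=1 the commands are printed for review instead of being executed.

```shell
#!/usr/bin/env bash
# Run a command, or only print it when DRY_RUN=1 is set.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: open_cassandra_zone ZONE SOURCE_CIDR PORT [PORT ...]
# Creates the zone, adds the source network, and opens each TCP port.
open_cassandra_zone() {
    local zone="$1" source="$2" port
    shift 2
    run firewall-cmd --permanent --new-zone="$zone"
    run firewall-cmd --reload
    run firewall-cmd --permanent --zone="$zone" --add-source="$source"
    for port in "$@"; do
        run firewall-cmd --permanent --zone="$zone" --add-port="$port/tcp"
    done
    run firewall-cmd --reload
}

# Example calls (as root):
#   DRY_RUN=1 open_cassandra_zone cassandra-cluster 172.16.1.0/24 7000 9042
#   open_cassandra_zone cassandra-cluster 172.16.1.0/24 7000 9042
#   firewall-cmd --zone=cassandra-cluster --list-all   # inspect the result
```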

Checking the Apache Cassandra Cluster Status

Nodetool is a native command-line utility for managing and monitoring the Cassandra cluster. This tool lets you view the cluster’s status and metrics, such as table and keyspace statistics, server metrics, and client connection metrics.

In general, administrators run the nodetool command directly on an operational Cassandra server to perform routine database maintenance and monitoring.

Follow the steps below to learn the basics of monitoring the Cassandra cluster using the nodetool utility.

1. Check the Cassandra cluster status by running the following command.

nodetool status

You will get an output similar to the screenshot below.

  • U means the node is UP or running.
  • N means the node is NORMAL.
  • The Address is the node’s IP address.
  • Load is the size of files in the Cassandra data directory. This value refreshes every 90 seconds.
  • Tokens is the number of tokens assigned to the node.
  • The Host ID is the node’s unique identifier. Each node has a different ID.
Checking Apache Cassandra cluster status with Nodetool

2. Now, run the command below to get detailed information about a single node.

nodetool info

Below, you can see detailed information about the node such as:

  • Uptime
  • Heap memory info
  • Load
  • Key cache and Counter cache
  • Datacenter location
Checking detail of node with Nodetool

3. Next, display the Cassandra cluster details by running the command below.

nodetool describecluster

Below, you can see the details of the Cassandra cluster.

  • Cluster Information contains basic information about the Cassandra cluster, including name, default Cassandra partitioner, and schema version.
  • Stats for all nodes indicate the current status of all nodes on the Cassandra cluster.
  • If you’ve built the Cassandra cluster across multiple data centers, you will see all of them in the Data centers section.
  • The Database versions section shows the Cassandra version on each cluster node.
  • The list of all available keyspaces or databases on the Cassandra cluster is available under the Keyspaces section.
Checking details Apache Cassandra cluster with Nodetool
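
For scripted health checks, the UN (Up and Normal) state described in step 1 can be counted directly from the nodetool status output. The sketch below is one way to do that with awk. The function names count_up_normal and cluster_has_nodes are hypothetical, and the parsing assumes each node line starts with its two-letter state, as in the output described above.

```shell
#!/usr/bin/env bash
# Read `nodetool status` output on stdin and print how many nodes are in state UN.
count_up_normal() {
    awk '$1 == "UN" { n++ } END { print n + 0 }'
}

# Usage: nodetool status | cluster_has_nodes EXPECTED
# Succeeds only when at least EXPECTED nodes report UN.
cluster_has_nodes() {
    local expected="$1" actual
    actual=$(count_up_normal)
    [ "$actual" -ge "$expected" ]
}

# Example call for the two-node cluster in this tutorial:
#   nodetool status | cluster_has_nodes 2 && echo "both nodes are up and normal"
```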

Connecting to the Apache Cassandra Cluster

Installing the Apache Cassandra package on the server also installs the Cassandra Query Language Shell (CQLSH). This tool allows admins to connect to Apache Cassandra and manage databases or keyspaces and users.

Follow the steps below to connect to the Cassandra cluster using the cqlsh command-line tool.

1. Run the cqlsh command below to connect to the Cassandra cluster. Specify the IP address of a Cassandra node followed by the port for client connections, which is 9042 by default.

cqlsh 172.16.1.10 9042

Once you connect to the Cassandra cluster, you will see output similar to the screenshot below. This example uses the cluster name ATA Cluster on the server IP address 172.16.1.10.

Connect to Cassandra with cqlsh

2. Now, run the following cqlsh commands to check which server you are connected to, the cluster name, and all available keyspaces on the Cassandra cluster.

-- show the host you are connected to
SHOW HOST;

-- show cluster name
DESCRIBE CLUSTER;

-- list all available keyspaces (databases)
DESCRIBE KEYSPACES;

You will see output similar to the screenshot below. The SHOW HOST command shows you where you’re connected, the DESCRIBE CLUSTER command shows you the Cassandra cluster name, and the DESCRIBE KEYSPACES command shows you the list of keyspaces on your Cassandra cluster.

Connecting to Apache Cassandra using cqlsh commands

3. Finally, type exit to log out of the cqlsh environment.
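
The same commands can also be run without opening an interactive session, because cqlsh accepts a statement through its -e option. The sketch below uses that to check that every node answers on the client port. The function name check_nodes and the CQLSH override variable are hypothetical conveniences for this example, and the check assumes cqlsh exits with a non-zero status when it cannot connect.

```shell
#!/usr/bin/env bash
# Usage: check_nodes PORT HOST [HOST ...]
# Runs SHOW HOST against every host and reports which ones answer.
# Set CQLSH to use a cqlsh binary that is not on PATH.
check_nodes() {
    local port="$1" host failed=0
    shift
    for host in "$@"; do
        if "${CQLSH:-cqlsh}" "$host" "$port" -e "SHOW HOST" >/dev/null 2>&1; then
            echo "$host: ok"
        else
            echo "$host: unreachable"
            failed=1
        fi
    done
    return "$failed"
}

# Example call for the two nodes in this tutorial:
#   check_nodes 9042 172.16.1.10 172.16.1.15
```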

Conclusion

Throughout this tutorial, you’ve learned how to install and configure Apache Cassandra on Linux. You’ve also configured an Apache Cassandra cluster using two Linux servers and secured the deployment using firewalld.

At this point, you’re ready to add more servers and scale your deployments, providing high availability, consistency, and redundancy for your data.

What’s next for you? Perhaps begin with setting up the authentication and authorization on your Cassandra cluster, then set up keyspace/database replication for your applications. And while you’re at it, why not learn how to maintain the Apache Cassandra cluster with nodetool?
