Cassandra interview questions and answers 👇

  1. Cassandra Interview Questions
  2. Advanced Cassandra Interview Questions


Cassandra Interview Questions

What is Key-Value Store DB?

A key-value database is a type of nonrelational database that uses a simple key-value method to store data. A key-value database stores data as a collection of key-value pairs in which a key serves as a unique identifier. Both keys and values can be anything, ranging from simple objects to complex compound objects.

What is Column Store DB?

A columnar (column-oriented) database stores the values of each column together and is optimized for fast retrieval of columns of data, typically in analytical applications. Column-oriented storage is an important factor in analytic query performance because it drastically reduces disk I/O: a query loads only the columns it needs. Cassandra itself is usually described as a wide-column store, which partitions data by row key rather than storing it column by column.

What is Apache Cassandra?

Apache Cassandra is a distributed database management system that is built to handle large amounts of data across multiple data centers and the cloud. Key features include:

  • Highly scalable
  • Offers high availability
  • Has no single point of failure

What is CQLSH?

cqlsh is a command-line interface for interacting with Cassandra using CQL (the Cassandra Query Language). It is shipped with every Cassandra package, and can be found in the bin/ directory alongside the cassandra executable.

What are Clusters in Cassandra?

A cluster is the outermost container in Cassandra: a set of nodes, usually on separate machines, that operate together and hold one or more keyspaces. The nodes are arranged logically in a ring, and data is distributed across them. Each piece of data is also replicated to other nodes, which keep serving it if one node fails.

What is a Keyspace in Cassandra?

In a Cassandra cluster, a keyspace is the outermost data object (comparable to a database or schema in an RDBMS) and determines how data is replicated across nodes. A keyspace contains column families (now called tables, and similar to tables in an RDBMS), whose rows are indexed by keys, along with data types. Its definition specifies the replication strategy, the replication factor, and, where relevant, datacenter awareness.

What are durable writes?

Writes in Cassandra are durable. All writes to a replica node are recorded both in memory and in a commit log on disk before they are acknowledged as a success. If a crash or server failure occurs before the memtables are flushed to disk, the commit log is replayed on restart to recover any lost writes.
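
The recovery behavior described above can be illustrated with a minimal Python sketch. It is a toy model, not Cassandra's implementation: the `ReplicaNode` class is hypothetical, and a plain list stands in for the on-disk commit log.

```python
class ReplicaNode:
    """Toy replica: a list stands in for the on-disk commit log (assumption)."""

    def __init__(self, commit_log=None):
        # The commit log survives restarts; the memtable does not.
        self.commit_log = commit_log if commit_log is not None else []
        self.memtable = {}
        # On startup, replay the log to rebuild writes lost from memory.
        for key, value in self.commit_log:
            self.memtable[key] = value

    def write(self, key, value):
        self.commit_log.append((key, value))  # record on "disk" first
        self.memtable[key] = value            # then record in memory
        return "ack"                          # only now acknowledge


node = ReplicaNode()
node.write("user:1", "alice")
node.write("user:2", "bob")

surviving_log = node.commit_log  # the "disk" outlives the process
del node                         # simulated crash before any flush

restarted = ReplicaNode(commit_log=surviving_log)
print(restarted.memtable)
```

After the simulated crash, the restarted node holds both writes again because they were appended to the log before being acknowledged.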

What do you mean by replication factor?

A replication factor of one means that there is only one copy of each row in the Cassandra cluster. A replication factor of two means there are two copies of each row, where each copy is on a different node.

What is replication Strategy?

Cassandra stores data replicas on multiple nodes to ensure reliability and fault tolerance. The replication strategy for each keyspace determines the nodes where replicas are placed. The total number of replicas for a keyspace across a Cassandra cluster is referred to as the keyspace's replication factor.

What is Simple Strategy?

SimpleStrategy is a basic replication strategy intended for clusters with a single datacenter. It is rack-unaware: the first replica goes to the node that owns the row's token, and the remaining replicas go to the next nodes clockwise around the ring.
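
As a rough illustration of clockwise placement, here is a small Python sketch. The ring layout, node names, and the function `simple_strategy_replicas` are made up for this example and are not Cassandra APIs.

```python
from bisect import bisect_left


def simple_strategy_replicas(ring, token, replication_factor):
    """Return the nodes that would hold replicas for `token`.

    `ring` is a list of (token, node_name) pairs sorted by token.
    """
    tokens = [t for t, _ in ring]
    # First replica: the first node whose token is >= the data's token,
    # wrapping around to the start of the ring if there is none.
    start = bisect_left(tokens, token) % len(ring)
    count = min(replication_factor, len(ring))
    # Remaining replicas: the next nodes clockwise, with no rack awareness.
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


ring = [(0, "node-a"), (25, "node-b"), (50, "node-c"), (75, "node-d")]
print(simple_strategy_replicas(ring, token=60, replication_factor=3))
# ['node-d', 'node-a', 'node-b']
```

A token of 60 falls in the range owned by node-d, so the replicas wrap around the ring to node-a and node-b.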

What is Network Topology Strategy?

NetworkTopologyStrategy lets you define how many replicas are placed in each datacenter, which makes it suitable for multi-datacenter deployments. It is a rack-aware replication strategy, so it tries to avoid placing two replicas on the same rack.
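
The rack-aware idea can be sketched in Python as follows. This is a simplified model under stated assumptions, not Cassandra's actual placement algorithm: the function `nts_replicas`, the ring contents, and the fallback rule for datacenters with too few racks are all illustrative.

```python
def nts_replicas(ring, start, replicas_per_dc):
    """Pick replicas per datacenter, preferring distinct racks.

    `ring` is a list of (node, datacenter, rack) in ring order and
    `start` is the index of the first node clockwise from the token.
    """
    chosen = {dc: [] for dc in replicas_per_dc}
    used_racks = {dc: set() for dc in replicas_per_dc}
    skipped = {dc: [] for dc in replicas_per_dc}

    for i in range(len(ring)):
        node, dc, rack = ring[(start + i) % len(ring)]
        if dc not in replicas_per_dc or len(chosen[dc]) >= replicas_per_dc[dc]:
            continue
        if rack in used_racks[dc]:
            skipped[dc].append(node)  # same rack as an earlier replica
        else:
            chosen[dc].append(node)
            used_racks[dc].add(rack)

    # If a datacenter has fewer racks than replicas, reuse skipped nodes.
    for dc, wanted in replicas_per_dc.items():
        missing = wanted - len(chosen[dc])
        chosen[dc].extend(skipped[dc][:missing])
    return chosen


ring = [
    ("n1", "dc1", "r1"),
    ("n2", "dc1", "r1"),
    ("n3", "dc2", "r1"),
    ("n4", "dc1", "r2"),
    ("n5", "dc2", "r1"),
    ("n6", "dc2", "r2"),
]
print(nts_replicas(ring, 0, {"dc1": 2, "dc2": 2}))
# {'dc1': ['n1', 'n4'], 'dc2': ['n3', 'n6']}
```

Here n2 and n5 are passed over because another replica already sits on their rack; they are used only when a datacenter runs out of distinct racks.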

What is a Column Family?

A column family is a database object that contains columns of related data; in current Cassandra versions it is called a table. Conceptually it is a key-value pair in which the key is mapped to a value that is a set of columns.

What is tunable consistency?

Tunable consistency lets clients choose, for each read or write, how many replicas must respond before the operation counts as successful. Consistency refers to how up to date and synchronized a row is across all of its replicas. Levels such as ONE, QUORUM, and ALL let users pick the trade-off between latency, availability, and consistency that suits their use case. Low levels give eventual consistency; read and write levels that together cover more replicas than the replication factor give strong consistency.
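
The usual rule of thumb is that reads see the latest write when the number of replicas read (R) plus the number written (W) exceeds the replication factor (RF). A minimal Python sketch of that arithmetic, with helper names chosen for this example:

```python
def quorum(replication_factor):
    """Smallest majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas, write_replicas, replication_factor):
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


rf = 3
q = quorum(rf)
print(q)                                  # 2
print(is_strongly_consistent(q, q, rf))   # True: QUORUM reads and writes
print(is_strongly_consistent(1, 1, rf))   # False: ONE reads and writes
```

With RF = 3, QUORUM reads and writes overlap on at least one replica, while ONE for both does not guarantee any overlap.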

What is CAP Theorem?

The CAP theorem states that a distributed system can provide only two of three desired properties: consistency, availability, and partition tolerance.

What is Cassandra Data Model?

The Cassandra data model defines how data is organized for storage. Its components are keyspaces, tables, and columns.

What is CQL?

CQL (Cassandra Query Language) is the language used to define schemas and to read and write data in Apache Cassandra. CQL statements are parsed on the server, which keeps the implementation details on the server side. The syntax of CQL is similar to SQL, but it does not alter the Cassandra data model; for example, there are no joins.

What is Thrift?

Thrift is a legacy RPC framework that combines a protocol with a code generation tool. It was Cassandra's original client API and made the database accessible from many programming languages. It has been superseded by CQL, and Thrift support was removed in Cassandra 4.0.

What is replication factor?

The replication factor is the number of copies of each row that exist in the cluster. Increasing it improves fault tolerance and availability, at the cost of more storage and more work per write.

What are CRUD operations?

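CRUD operations are the four basic operations for working with data in a Cassandra database.

CRUD stands for:

  • Create operation (INSERT in CQL)
  • Read operation (SELECT)
  • Update operation (UPDATE)
  • Delete operation (DELETE for data; DROP removes schema objects such as tables)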

What are the components of Cassandra?

The components of Cassandra are:

  • Table
  • Node
  • Cluster
  • Data Centre
  • Memtable
  • SSTable
  • Commit log
  • Bloom Filter

Advanced Cassandra Interview Questions

What is Gossip Protocol?

Cassandra uses a protocol called gossip to discover location and state information about the other nodes participating in a Cassandra cluster. Gossip is a peer-to-peer communication protocol in which nodes periodically exchange state information about themselves and about other nodes they know about.
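
A toy Python sketch of this exchange is shown below. It assumes each node keeps a map of node name to (heartbeat version, status) and that the higher version wins; the function names and state layout are illustrative, not Cassandra's internal structures.

```python
def merge_state(local, remote):
    """Keep, for every node, the entry with the highest heartbeat version."""
    for node, (version, status) in remote.items():
        if node not in local or local[node][0] < version:
            local[node] = (version, status)


def gossip_exchange(a, b):
    """One gossip round between two nodes: each learns what the other knows."""
    snapshot_a = dict(a)  # what `a` knew before hearing from `b`
    merge_state(a, b)
    merge_state(b, snapshot_a)


# Each node starts out knowing mostly about itself.
n1 = {"n1": (5, "UP")}
n2 = {"n2": (3, "UP"), "n3": (7, "DOWN")}  # stale view of n3
n3 = {"n3": (9, "UP")}

gossip_exchange(n1, n2)  # n1 learns about n2 and, indirectly, n3
gossip_exchange(n2, n3)  # n2 picks up the newer state of n3
gossip_exchange(n1, n2)  # the newer n3 state reaches n1 via n2

print(n1 == n2 == n3)  # True
print(n1["n3"])        # (9, 'UP')
```

Note that n1 never talks to n3 directly; it learns n3's current state second-hand, which is the point of gossiping about other nodes.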

What are partitions?

Cassandra organizes data into partitions. Each partition holds one or more rows that share the same partition key, and each row consists of columns. A partition is stored on a node and copied to that node's replicas. Nodes are part of a cluster in which each node is responsible for a fraction of the partitions.

What are Tokens in Cassandra?

A token is the hashed value of the partition key (the first part of the primary key). When you add nodes to Cassandra, you assign a token range to each node or let Cassandra do that for you. When you then add data, Cassandra calculates the token and uses it to work out which server (node) should store the new data.
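
The lookup from key to node can be sketched in Python. This is an illustration under assumptions: Cassandra's default partitioner is Murmur3Partitioner, whereas this sketch uses MD5 from the standard library as a stand-in, with a toy 32-bit token space and made-up node names.

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 2**32  # toy token space (assumption for this sketch)


def token_for(partition_key):
    """Hash a partition key to a token in [0, RING_SIZE)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def owner(ring, partition_key):
    """Return the node whose token range contains the key's token."""
    tokens = [t for t, _ in ring]
    index = bisect_left(tokens, token_for(partition_key)) % len(ring)
    return ring[index][1]


# Each node owns the range ending at its token (hypothetical assignment).
ring = [
    (2**30, "node-a"),
    (2**31, "node-b"),
    (3 * 2**30, "node-c"),
    (2**32 - 1, "node-d"),
]

for key in ("user:1", "user:2", "user:3"):
    print(key, token_for(key), owner(ring, key))
```

The same key always hashes to the same token, so any node can work out where a row lives without a central lookup table.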

What is Snitch in Cassandra?

A snitch determines which datacenters and racks nodes belong to. Snitches inform Cassandra about the network topology so that requests are routed efficiently, and they allow Cassandra to distribute replicas by grouping machines into datacenters and racks.

What is Memtable?

A memtable is an in-memory, write-back cache that holds recent writes in key and column format. The data in a memtable is sorted by key, and each column family (table) has its own memtable, from which column data is retrieved by key. The memtable accumulates writes until it reaches its size limit and is then flushed to disk as an SSTable.

What is CommitLog?

Commitlogs are an append only log of all mutations local to a Cassandra node. Any data written to Cassandra will first be written to a commit log before being written to a memtable. This provides durability in the case of unexpected shutdown.

What are SSTables?

Sorted Strings Table (SSTable) is a persistent file format used by Scylla, Apache Cassandra, and other NoSQL databases to take the in-memory data stored in memtables, order it for fast access, and store it on disk in a persistent, ordered, immutable set of files. Immutable means SSTables are never modified: updates and deletes are written to new SSTables, and compaction later merges them.
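
The relationship between a memtable and SSTables can be sketched in Python. This is a toy model with assumed names: the `Table` class and its `memtable_limit` parameter are hypothetical, and immutable sorted tuples in memory stand in for files on disk.

```python
class Table:
    """Toy table: one memtable plus a list of immutable, sorted SSTables."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.sstables = []  # oldest first; each is a sorted tuple of pairs
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the memtable out as a new sorted, immutable SSTable."""
        if self.memtable:
            self.sstables.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def read(self, key):
        """Check the memtable first, then SSTables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            for k, v in sstable:
                if k == key:
                    return v
        return None


t = Table(memtable_limit=2)
t.write("b", 1)
t.write("a", 2)   # limit reached: flushed as (("a", 2), ("b", 1))
t.write("a", 3)   # newer value lives in the memtable for now
print(t.sstables)  # ((('a', 2), ('b', 1)),) wrapped in a list
print(t.read("a"), t.read("b"))  # 3 1
```

The update to "a" never touches the existing SSTable; the newer value shadows the older one at read time.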

What is Anti-Entropy?

Anti-entropy is a process of comparing the data of all replicas and updating each replica to the newest version.
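
A minimal Python sketch of the idea follows. It assumes each replica stores key to (timestamp, value) and that the highest timestamp wins; it compares every key directly, whereas Cassandra's repair process uses Merkle trees to find differences without shipping all the data. The function name is made up for this example.

```python
def anti_entropy(replicas):
    """Bring every replica up to the newest version of each key.

    Each replica is a dict mapping key -> (timestamp, value).
    """
    newest = {}
    for replica in replicas:
        for key, (timestamp, value) in replica.items():
            if key not in newest or timestamp > newest[key][0]:
                newest[key] = (timestamp, value)
    for replica in replicas:
        replica.update(newest)


r1 = {"k1": (1, "old"), "k2": (5, "x")}
r2 = {"k1": (2, "new")}
r3 = {}

anti_entropy([r1, r2, r3])
print(r1 == r2 == r3)  # True
print(r3)              # {'k1': (2, 'new'), 'k2': (5, 'x')}
```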

What is Hinted Handoff?

Hinted handoff is a Cassandra feature that helps keep replicas consistent when a replica-owning node is unavailable, because of network issues or other problems, at the time of a successful write. The coordinator stores a hint for the missed write and replays it once the node is reachable again, which reduces the work left for anti-entropy.
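
The mechanism can be sketched in Python as a toy model. The `Replica` and `Coordinator` classes are hypothetical and leave out details such as hint expiry and consistency-level checks.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}


class Coordinator:
    """Toy coordinator that stores hints for replicas that are down."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.hints = []  # (target replica name, key, value)

    def write(self, key, value):
        acks = 0
        for replica in self.replicas:
            if replica.up:
                replica.data[key] = value
                acks += 1
            else:
                # Remember the write so it can be delivered later.
                self.hints.append((replica.name, key, value))
        return acks

    def replay_hints(self):
        by_name = {r.name: r for r in self.replicas}
        remaining = []
        for name, key, value in self.hints:
            target = by_name[name]
            if target.up:
                target.data[key] = value
            else:
                remaining.append((name, key, value))
        self.hints = remaining


a, b, c = Replica("a"), Replica("b"), Replica("c")
coordinator = Coordinator([a, b, c])

c.up = False
acks = coordinator.write("k", "v")  # 2 acks, one hint stored for c
c.up = True
coordinator.replay_hints()          # c receives the missed write
print(acks, c.data, coordinator.hints)  # 2 {'k': 'v'} []
```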

What is JMX?

Cassandra exposes a number of statistics and management operations via Java Management Extensions (JMX).

JMX supplies tools for managing and monitoring Java applications and services.

What are snapshots?

A snapshot first flushes all in-memory writes to disk, then makes a hard link to each SSTable file in the keyspace. The snapshot files are stored under the data directory (/var/lib/cassandra/data by default), in a snapshots directory inside each table's directory.
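
Why hard links make snapshots cheap can be shown with a short Python sketch. It uses a temporary directory and made-up file and snapshot names rather than a real Cassandra data directory, and assumes a filesystem that supports hard links.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as data_dir:
    # A stand-in for an SSTable file (hypothetical file name).
    sstable = os.path.join(data_dir, "example-Data.db")
    with open(sstable, "w") as f:
        f.write("row data")

    # "Snapshot": hard-link the file instead of copying it.
    snapshot_dir = os.path.join(data_dir, "snapshots", "before-upgrade")
    os.makedirs(snapshot_dir)
    snapshot_file = os.path.join(snapshot_dir, "example-Data.db")
    os.link(sstable, snapshot_file)

    # Both names point at the same data on disk.
    same_inode = os.stat(sstable).st_ino == os.stat(snapshot_file).st_ino

    # Removing the live file leaves the snapshot readable.
    os.remove(sstable)
    with open(snapshot_file) as f:
        snapshot_content = f.read()

print(same_inode, snapshot_content)
```

Because SSTables are immutable, a hard link is enough to preserve their contents; no data is copied when the snapshot is taken.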

What is Bloom Filter?

In Cassandra, Bloom filters are used to boost the performance of reads by letting a node skip SSTables that cannot contain the requested data. A Bloom filter is a probabilistic data structure for testing whether an element is a member of a set. It is probabilistic because it can return a false positive, but never a false negative.
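
A minimal Bloom filter can be sketched in Python. The sizes, the use of SHA-256 to derive bit positions, and the class itself are choices made for this illustration; they are not how Cassandra implements its filters.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size=256, hash_count=3):
        self.size = size
        self.hash_count = hash_count
        self.bits = [False] * size

    def _positions(self, item):
        # Derive `hash_count` bit positions by salting one hash function.
        for seed in range(self.hash_count):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for position in self._positions(item):
            self.bits[position] = True

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[position] for position in self._positions(item))


sstable_filter = BloomFilter()
for key in ("user:1", "user:2", "user:3"):
    sstable_filter.add(key)

print(sstable_filter.might_contain("user:2"))  # True: it was added
```

On a read, a negative answer lets the node skip the SSTable entirely; a positive answer only means the SSTable is worth checking.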