It’s true, there’s no such thing as a stateless architecture.

Persistent applications and services, while delivering the best user experience, are what is driving the innovation we are seeing in container technologies. Add in scale-out databases and complexity increases exponentially. This has fueled innovation and a growth of software solutions to support ever-changing requirements, but it also presents new challenges for the operations teams who must maintain these new technologies.

Cassandra: Scale-out Database

Scale-out databases like Cassandra have become quite popular over the past few years, and many organizations are starting to deploy them at scale. Although Cassandra brings great availability and performance enhancements for applications, it also presents new operational challenges. It is generally deployed on bare-metal servers, with each database instance consisting of multiple server nodes; the largest instances can have thousands of nodes. Dealing with software and hardware upgrades, OS and database operations, and general day-to-day activities is manageable in environments with few nodes or clusters; however, manageability becomes significantly more complex as your environment grows in node count and cluster count.

As scale increases, flexibility and speed during node maintenance becomes critical.

When bringing a node down for maintenance, your cluster is left vulnerable: part of your token range has one less node available to respond to read/write requests. There should be additional replicas available to handle each incoming request, but available performance is reduced, and any write destined for the unavailable node must be tracked elsewhere (as hints) for the duration of the downtime. And if you use QUORUM read/write consistency levels, having only two available replicas for a given key can be problematic: one more failure makes that key unavailable.
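The arithmetic behind that vulnerability is simple enough to sketch. A minimal Python illustration (no real cluster involved; `quorum_available` is a hypothetical helper for this post, not a driver API):

```python
# QUORUM on a single token range: floor(RF / 2) + 1 replicas must respond.

def quorum(rf: int) -> int:
    """Number of replicas required for a QUORUM read or write."""
    return rf // 2 + 1

def quorum_available(rf: int, nodes_down: int) -> bool:
    """Can a QUORUM request succeed with `nodes_down` replicas offline?"""
    return rf - nodes_down >= quorum(rf)

# RF=3: one node down for maintenance still leaves a quorum (2 of 3)...
print(quorum_available(rf=3, nodes_down=1))  # True
# ...but a second failure during that window makes the key unavailable.
print(quorum_available(rf=3, nodes_down=2))  # False
```

With RF=3 and one node down, every request for that range must be served by exactly the two surviving replicas, which is one more reason to keep maintenance windows short.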

It is important to explore new ways to deploy and manage Cassandra clusters in order to increase operational efficiency. This direction leverages software to ensure the stability of services by reducing the complexity associated with operating Cassandra and other distributed software.

Emerging cloud native environments include components such as container schedulers and platforms that are critical to managing this complexity. They introduce a new application-oriented consumption layer in the data center, opening new possibilities: instead of manually preparing Cassandra nodes, applications can declare the compute, memory, and storage resources they need. Portability through containers and external storage then solves many of today’s maintenance complexities.

Managing Cassandra with Containers

Containers make managing Cassandra easier. They simplify packaging and deployment while removing the need to manage the host operating system as a dependent component. Containers are emerging as a packaging standard that can be scheduled across a cluster of physical servers by schedulers such as Mesos, Kubernetes, and Docker. Specific to Cassandra, containerizing lets you easily deploy multiple nodes on each physical server while maintaining close to bare-metal performance.

  • Container benefits for Cassandra:
    • Colocation with Bare-metal Performance
      Container technology allows you to deploy multiple applications on a bare-metal server, colocated and sharing resources much more efficiently.
    • Node Sizing and Resizing
      Containers allow you to pick the core/memory footprint for your nodes, and it’s also simple to resize an instance by destroying and recreating it on the fly with a new specification. When the container starts back up with its existing dataset, the node is none the wiser, other than having more or less horsepower than before. This is helpful for increasing performance when an application must scale, and for dialing performance back down when it is no longer needed, and it is much faster than adding or decommissioning nodes through the Cassandra interfaces.
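One concrete effect of resizing: Cassandra’s startup script derives the JVM heap from the RAM it sees, so a container recreated with a new memory limit picks up a new heap on the next start. A hedged Python sketch of that heuristic (roughly `max(min(ram/2, 1 GB), min(ram/4, 8 GB))` per `cassandra-env.sh`; check your version for the exact rule):

```python
# Approximate default heap heuristic from cassandra-env.sh
# (verify against your Cassandra version before relying on it):
#   heap = max(min(ram/2, 1024 MB), min(ram/4, 8192 MB))

def default_heap_mb(ram_mb: int) -> int:
    return max(min(ram_mb // 2, 1024), min(ram_mb // 4, 8192))

# Recreating a container with more memory raises the heap on restart:
print(default_heap_mb(8 * 1024))   # 2048: an 8 GB container gets a 2 GB heap
print(default_heap_mb(32 * 1024))  # 8192: a 32 GB container gets an 8 GB heap
```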

Remote storage introduces a new level of flexibility when running Cassandra clusters at scale. Over the past few years, the Cassandra community has delivered a strong message against running nodes on remote storage, and for good reason: most remote storage systems don’t scale the way Cassandra does. Cassandra also requires significant disk throughput, and when you have 20, 50, or even hundreds of nodes pointed at a SAN, aggregate throughput won’t come close to the combined local storage performance of all those physical servers. There are, however, software-based storage technologies such as ScaleIO that scale in the same complementary way Cassandra does, adding both capacity and performance linearly as you add more nodes.
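The difference between a fixed array and linear scale-out is easy to model. A toy Python comparison (every throughput number here is a made-up assumption for illustration, not a measurement of any product):

```python
# Toy model: aggregate disk throughput available to a growing cluster.
# All numbers are illustrative assumptions.

LOCAL_MBPS_PER_NODE = 500      # assumed local SSD throughput per node
SAN_TOTAL_MBPS = 5_000         # assumed fixed backplane limit of a SAN
SCALEOUT_MBPS_PER_NODE = 400   # assumed per-node contribution of a
                               # scale-out storage layer (e.g. ScaleIO)

for nodes in (10, 50, 100):
    local = nodes * LOCAL_MBPS_PER_NODE        # grows linearly with nodes
    san = min(SAN_TOTAL_MBPS, local)           # capped by the array
    scaleout = nodes * SCALEOUT_MBPS_PER_NODE  # also grows linearly
    print(f"{nodes:>3} nodes: local={local} MB/s, "
          f"san={san} MB/s, scale-out={scaleout} MB/s")
```

At 10 nodes the SAN keeps up; at 100 nodes it is an order of magnitude behind the linear options, which is the heart of the community’s objection.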

  • Remote storage benefits for Cassandra:
    • Migrating Nodes
      When there’s a need to move a Cassandra node to a different physical server, whether due to maintenance or failure, the usual options are to copy the local SSTables manually to a new server or to decommission the node from the ring and bootstrap a replacement. Centralized storage removes this burden: the node’s volume can simply be reattached to the new server.
    • Increasing Storage Capacity
      When compaction cannot complete because a server is out of space, it can be difficult to increase the capacity of a bare-metal server: additional storage hardware must be installed, which most likely means a maintenance window. With remote storage, increasing a volume’s size is generally a simple online operation.
    • Local Node Performance
      In any given Cassandra cluster, each node has a maximum storage performance available to it. Some nodes are likely to be hot at any given point in time, with utilization that is not balanced relative to the rest of the cluster. When running hot, a node may not have adequate performance available to serve incoming read/write requests. With remote storage, the maximum theoretical performance of an individual volume can be much higher than that of most individual local disks, allowing hot nodes to burst disk read/write performance when necessary.
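Part of why individual nodes run hot is that ring ownership itself is uneven. A small Python simulation (one token per node and no vnodes, a simplifying assumption; real clusters smooth this out with vnodes but still see hot partitions):

```python
import random

# Simulate a single-token-per-node ring and measure ownership skew.
random.seed(42)     # deterministic for illustration
RING = 2 ** 64      # Murmur3-style token space size
NODES = 12

tokens = sorted(random.randrange(RING) for _ in range(NODES))

# Each node owns the span from the previous token up to its own,
# wrapping around the ring for the first node.
ownership = [(tokens[i] - tokens[i - 1]) % RING for i in range(NODES)]
shares = [owned / RING for owned in ownership]

# A perfectly balanced ring would give every node 1/12 ~ 0.083.
print(f"max share: {max(shares):.3f}  min share: {min(shares):.3f}")
```

The hottest node ends up owning a noticeably larger slice of the ring than the coldest one, and those are exactly the nodes that benefit from remote storage headroom.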

In Summary

While it’s important to focus on increasing operational efficiency, it becomes critical when you consider that Cassandra isn’t the only distributed application in the modern data center. I have mentioned just a few reasons to explore the use of remote storage and containers. Cloud native thinking truly makes Cassandra a smarter application through flexibility, faster deployments, better resource utilization, and performance elasticity. I will be digging a bit deeper into the technical details in future posts.

For slides discussing some of these topics please see my Dell EMC World 2017 presentation.