Macquarie Bank migrated stateful workloads including databases to the cloud using Kubernetes. The talk goes over how they progressively matured automation while also going from self-managed Kubernetes on AWS to Google’s GKE. As early adopters of the beta version of StatefulSets, they discuss how it matured and how they added additional customizations and database awareness with kOps, Helm charts, and operators. They also discuss backup/restore, secure config management, and a roadmap for the future.

This talk was given by Jon Lau and Jeremy Hanna as part of DoK Day at KubeCon NA 2021. Watch it below. You can access the other talks here.

0:02 Jeremy Hanna: Hi everyone. My name is Jeremy Hanna and I work as a data architect at DataStax. I’ve been working with Jon Lau and the Macquarie team on their Cassandra clusters since 2014.

0:16 Jon Lau: I’m Jon Lau, the digital data platform owner for Macquarie’s banking and financial services group, otherwise known as BFS. Today, we’d like to share with you our journey to running databases on Kubernetes. At BFS we’ve developed an award-winning online and mobile digital banking experience that provides a highly personalized and intuitive customer experience. Features such as dashboards are auto-created from your transaction data, so you can view summarized information at a glance, such as your financial position, recent transaction activity, and spending grouped by the categories with the most spend. Budgets can also be set on different categories, so you can track your spending. These features allow users to take a holistic approach to personal finance. You can search transactions across your accounts the way you would ask a person, by typing questions such as “How much did I spend at McDonald’s last month?” Or, if you are from the land down under, “How much did I spend at Macca’s last month?” Customers can customize individual transactions. You can upload documents such as receipts or warranties to transactions to make warranty applications easier, and add notes to transactions such as #tax to easily identify tax-deductible transactions at tax time. You can even change a transaction’s category to your preferred category. When you make a purchase in-store, online, or from scheduled payments on your Macquarie credit card or debit card, you’ll receive a push notification, so you know the correct transaction has been processed. You can also customize which notification types you want to receive, as well as which accounts you want to receive them for. All these features are powered by Cassandra. The data is first extracted from source systems, then enriched with contextual data, such as merchant details and category groups like groceries, utilities, and leisure, and then stored in Cassandra.
The entire digital banking solution stack, including the backend Cassandra databases, is running on Kubernetes. But this wasn’t always the case, so how did we get here?

Back in 2015, BFS embarked on our digital transformation journey. As a digitally-led bank, it was important for us to have a strong online presence. This was our chance to revolutionize our digital banking solutions from the ground up. Our next-generation digital banking solution required a data platform with scalable performance to handle the ever-increasing workload, advanced and powerful search capabilities, and high availability. It had to be always on, even during maintenance windows or infrastructure failures. We chose DataStax Enterprise, built on Apache Cassandra. It is active everywhere with no single point of failure, and horizontally scalable, with integrated search. This was the all-in-one solution that satisfied all our needs. We first deployed Cassandra using Ansible onto our on-prem infrastructure. However, we quickly hit the scalability and performance limits of our on-prem infrastructure, which had implications for our customer experience. In the end, the best option for us was moving to the cloud. In 2017, in search of stronger horizontal scalability, performance, and out-of-the-box automation, we turned to Kubernetes. This would prove to be a steep learning curve. You see, back in mid-2017, there were no Kubernetes managed service offerings available to us. So we had to build our own self-managed Kubernetes cluster, and due to our restricted AWS environment, we couldn’t simply use the out-of-the-box kOps to create Kubernetes clusters. So we partnered with Kubernetes experts and, over several months of CI/CD, we had a modified kOps that could be used to deploy our first kOps-provisioned Kubernetes cluster onto our AWS environment. Then in 2020, we started exploring DataStax’s Cass Operator to see how we could further automate our Cassandra deployment and reduce our operational overhead. We worked closely with DataStax on this, suggesting features and improvements along the way.
StatefulSets are the key ingredient to running stateful workloads such as Cassandra on Kubernetes. In 2017, we used StatefulSets to deploy our first Cassandra clusters onto Kubernetes. And today, in 2021, we are still using StatefulSets.

6:10 Jeremy Hanna: I find it fascinating that even in 2017 and the early days of StatefulSets, when they were alpha as PetSets and then beta, they were actually really solid. You had to make some changes and do some significant testing around kOps, the orchestration piece, and your restricted AWS environment, but StatefulSets actually worked really well for you guys.

6:34 Jon Lau: That’s right, Jeremy. From our experience, it just works well out of the box for its intended purpose. We rely on the StatefulSet to do four things. First, manage the creation of sequentially ordered pods with unique ordinal identifiers, like Cassandra pod 0, Cassandra pod 1, Cassandra pod 2, and so on. Second, auto-heal the pods it manages: if any pods in the Cassandra cluster are offline or deleted, we expect the StatefulSet to restart them. Third, maintain the link between pods and their persistent volumes: the same persistent volume should be mounted to the same pod every time the pod restarts. Fourth, perform zero-downtime pod updates. On the StatefulSet object, if we increase CPU or memory resources, or update the Cassandra container image, the StatefulSet controller will automatically update one pod at a time. It won’t move to the next pod if the current pod fails to update.
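The ordering and one-at-a-time update behavior described here can be illustrated with a small simulation. This is a minimal Python sketch rather than anything from the talk: the helper names `pod_names` and `rolling_update` and the simulated health check are hypothetical, and the highest-ordinal-first order reflects the StatefulSet `RollingUpdate` strategy as documented by Kubernetes.

```python
from typing import Callable, List


def pod_names(statefulset: str, replicas: int) -> List[str]:
    """Return the stable pod names a StatefulSet assigns: <name>-0 .. <name>-(N-1)."""
    return [f"{statefulset}-{ordinal}" for ordinal in range(replicas)]


def rolling_update(pods: List[str], update_pod: Callable[[str], bool]) -> List[str]:
    """Update one pod at a time, highest ordinal first, stopping at the first failure.

    `update_pod` is a stand-in for "replace the pod and wait until it is ready".
    Returns the pods that were updated successfully.
    """
    updated = []
    for pod in reversed(pods):
        if not update_pod(pod):
            break  # leave the remaining pods on the old revision
        updated.append(pod)
    return updated


if __name__ == "__main__":
    pods = pod_names("cassandra", 3)
    print(pods)
    # Hypothetical failure: pretend cassandra-1 never becomes ready after the update.
    print(rolling_update(pods, lambda pod: pod != "cassandra-1"))
```

In this simulated run only `cassandra-2` is updated, because the rollout halts as soon as `cassandra-1` fails, which is the safety property that keeps a bad image or resource change from taking down the whole Cassandra cluster.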

To deploy StatefulSets onto Kubernetes, we first describe our requirements in a StatefulSet manifest.yml file. Say we want three pods in our Cassandra cluster. In the StatefulSet manifest, we specify 3 replicas. We want each pod to run a Cassandra container, so in the manifest we specify the Cassandra image and the Cassandra version we want to deploy. Each pod also has its own persistent data storage volume, so in the manifest we add a volume claim template section with storage requirements and define a volume mount to specify the persistent volume’s mount path in the pod. Then, we submit the request to Kubernetes and the StatefulSet will be created. The StatefulSet will then proceed to ensure the desired number of pods and volumes are created for our Cassandra cluster, and that the correct volumes are mounted to the Cassandra pods. Kubernetes and the cloud have allowed us to simplify the deployment and management of Cassandra and the underlying infrastructure. We no longer need to manually provision or maintain infrastructure, thus reducing our operational overhead, cost, and complexity, while gaining significant improvements in scalability, performance, delivery speed, and automation. However, there are still some gaps. More on this later. Let’s talk about backups first.
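The manifest described above has three moving parts: the replica count, the container image, and a volume claim template paired with a volume mount. The following Python sketch builds such a manifest as a dictionary and prints it as JSON, which `kubectl apply -f` accepts as well as YAML. The image tag, storage size, label, and mount path are illustrative assumptions, not the values Macquarie uses.

```python
import json


def cassandra_statefulset(replicas: int, image: str, storage: str) -> dict:
    """Build a minimal StatefulSet manifest for a Cassandra cluster."""
    volume_name = "cassandra-data"  # must match between the claim template and the mount
    labels = {"app": "cassandra"}   # hypothetical label used to select the pods
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": "cassandra"},
        "spec": {
            "serviceName": "cassandra",  # headless Service that gives pods stable DNS names
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "cassandra",
                        "image": image,
                        # Where the persistent volume appears inside the pod.
                        "volumeMounts": [
                            {"name": volume_name, "mountPath": "/var/lib/cassandra"}
                        ],
                    }]
                },
            },
            # One PersistentVolumeClaim is created per pod from this template.
            "volumeClaimTemplates": [{
                "metadata": {"name": volume_name},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }


if __name__ == "__main__":
    # Assumed example values: three pods, a public Cassandra image, 100Gi per pod.
    print(json.dumps(cassandra_statefulset(3, "cassandra:4.0", "100Gi"), indent=2))
```

Keeping the claim template name and the volume mount name identical is what ties each pod to its own volume; a mismatch between the two is a common reason a StatefulSet is rejected or its pods fail to start.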

Although Cassandra is highly available and highly fault tolerant by design, and Kubernetes retains your persistent volumes even after StatefulSet deletion, backups are still needed in case of data loss or corruption due to human error, software bugs, or hardware failure. We use Velero to back up and restore our Cassandra clusters running on Kubernetes. Since Velero is a Kubernetes-native and cloud-native application, it understands how to back up the Kubernetes resources that manage our Cassandra clusters. It also knows how to back up the persistent data volume for each Cassandra pod, using the cloud provider’s snapshot capabilities, like EBS snapshots, or, more generally, Kubernetes CSI volume snapshots. We use the scheduled backup feature in Velero to automatically back up all workloads running on our Kubernetes clusters at recurring intervals. Different backup schedules and retention periods can be set up for different Kubernetes resources. Resources with more frequent changes, like Cassandra clusters, we back up several times a day, and other Kubernetes resources that don’t change that much are only backed up, say, once a day. Velero can also be used to perform ad hoc snapshots, such as just before Cassandra upgrades or Kubernetes upgrades. We’ve used Velero to restore entire Kubernetes clusters and single Cassandra clusters from backups. We’ve even migrated a Cassandra cluster, including all its data, from one Kubernetes cluster onto an entirely different Kubernetes cluster. However, we can only restore full data snapshots of a Cassandra cluster with Velero; we cannot do partial data restores. Maybe this is where Medusa, the Cassandra backup tool, can help.
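The idea of different schedules and retention periods for different resources maps onto Velero’s `Schedule` custom resource. The sketch below generates two such manifests in Python. The namespaces, cron expressions, and retention values are made-up examples, and the field names follow the `velero.io/v1` Schedule API as best understood, so they should be checked against the Velero version in use.

```python
import json
from typing import List


def velero_schedule(name: str, cron: str, namespaces: List[str], ttl: str) -> dict:
    """Build a Velero Schedule manifest that backs up the given namespaces."""
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Schedule",
        "metadata": {"name": name, "namespace": "velero"},
        "spec": {
            "schedule": cron,  # standard five-field cron expression
            "template": {
                "includedNamespaces": namespaces,
                "snapshotVolumes": True,  # also snapshot the persistent volumes
                "ttl": ttl,               # how long each backup is retained
            },
        },
    }


# Hypothetical policy: Cassandra every six hours kept for a week,
# everything else once a day kept for thirty days.
SCHEDULES = [
    velero_schedule("cassandra-6h", "0 */6 * * *", ["cassandra"], "168h0m0s"),
    velero_schedule("platform-daily", "0 1 * * *", ["*"], "720h0m0s"),
]

if __name__ == "__main__":
    for schedule in SCHEDULES:
        print(json.dumps(schedule, indent=2))
```

An ad hoc backup before an upgrade uses the same backup template without the cron expression, typically through the Velero CLI rather than a Schedule.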

Jon Lau: Although StatefulSets know how to manage Kubernetes resources, they don’t know how to manage Cassandra resources, so we still had to manually maintain and update Cassandra configuration files. There can be differences between configuration files across Cassandra versions. We needed to extract the template configuration files, then save these templated config files to our Helm charts, so we could override certain values on deployment. This made Cassandra upgrades time-consuming. We created bespoke scripts, init containers, and PreStop hooks to perform Cassandra operations. Scaling down a StatefulSet didn’t scale down a Cassandra cluster. After we scaled down the Cassandra StatefulSet, we needed to manually run Cassandra operations for each node that we removed from the Cassandra cluster. We also learned that the cloud compute and storage need to be in the same zone. Otherwise, the persistent volume cannot be mounted to the pod. To manually set up multi-zone Cassandra clusters, we used Kubernetes taints, tolerations, and affinity rules in the StatefulSets. We also created and deployed a different StatefulSet for each Cassandra data center. The result was one data center per availability zone. Thus, we could tolerate an outage of one AZ. Then, in early 2020, DataStax announced the arrival of Cass Operator. Cass Operator bridges the knowledge gap between managing resources on Kubernetes and operating Cassandra clusters. It is a turnkey solution for deploying and managing Cassandra clusters on Kubernetes. We worked closely with DataStax, suggesting new features and improvements, then tested the new versions of Cass Operator with the changes once they were released, and repeated this process over a few months.
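The gap between scaling a StatefulSet and scaling a Cassandra cluster is easiest to see as a list of manual steps. The sketch below generates one possible command sequence in Python: it decommissions each Cassandra node with `nodetool decommission` before shrinking the StatefulSet by one replica. This ordering is an assumption chosen for illustration (the talk does not say which Cassandra operations were run or when), and the StatefulSet name and namespace are hypothetical.

```python
from typing import List


def scale_down_commands(statefulset: str, current: int, target: int,
                        namespace: str = "default") -> List[str]:
    """Return the manual steps to shrink a Cassandra StatefulSet one node at a time.

    Nodes are removed from the highest ordinal down, because that is the pod
    a StatefulSet deletes first when its replica count is reduced.
    """
    if not 0 <= target <= current:
        raise ValueError("target must be between 0 and the current replica count")
    commands = []
    for ordinal in range(current - 1, target - 1, -1):
        pod = f"{statefulset}-{ordinal}"
        # Stream this node's data to the remaining nodes and leave the ring.
        commands.append(f"kubectl -n {namespace} exec {pod} -- nodetool decommission")
        # Only then remove the pod by lowering the replica count.
        commands.append(
            f"kubectl -n {namespace} scale statefulset {statefulset} --replicas={ordinal}"
        )
    return commands


if __name__ == "__main__":
    for command in scale_down_commands("cassandra", current=5, target=3, namespace="db"):
        print(command)
```

The persistent volume claims of the removed pods are left in place by the StatefulSet controller, so cleaning them up is a further manual step that this sketch deliberately leaves out.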

14:06 Jeremy Hanna: As we were talking about before, an operator acts as a bridge between Kubernetes and StatefulSets, and the Day 2 operations required by a database like Cassandra. I remember a lot of conversations where Jon would kind of sigh and say, “Well, I guess I could use an init container for this, but it’d be really nice if this could be done automatically with the operator,” and we’d subsequently get it into the operator. One example of this was putting cluster configuration into Secrets rather than environment variables, so that access to sensitive information could be properly restricted.

14:54 Jon Lau: With the open source Cass Operator, we focus more on the Cassandra cluster that we want to deploy, and less on what needs to be created at the Kubernetes level to achieve the setup we want. We rely on Cass Operator to manage our Cassandra clusters on Kubernetes. If there are any new events or updates to the Cassandra data center custom resource, Cass Operator will roll out the changes to meet the desired state. Now, we simply need to create a Cassandra data center custom resource manifest file, and Cass Operator will auto-provision and configure Kubernetes resources such as StatefulSets, Persistent Volumes, Services, ConfigMaps, and Secrets. Before, we had to manually create a manifest for each of these. Cass Operator knows how to seamlessly manage Cassandra upgrades and Cassandra configuration file updates, and how to perform Cassandra lifecycle operations like bootstrapping a new node, graceful shutdowns and restarts, and scaling down Cassandra clusters. It can run advanced workloads such as DSE Search and DSE Graph. It knows which ports need to be opened for each advanced workload, and it automatically manages multi-zone Cassandra data center deployments. For safe data storage, Cass Operator makes sure that data replicas are placed across different zones.
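To make the contrast with hand-written StatefulSets concrete, here is a Python sketch of a single Cassandra data center custom resource. The field names (`clusterName`, `serverType`, `serverVersion`, `size`, `racks`, `storageConfig`) follow the Cass Operator `v1beta1` CRD as best recalled and should be verified against the operator version in use. The cluster name, version, rack names, and storage class are invented for illustration.

```python
import json
from typing import List


def cassandra_datacenter(name: str, cluster: str, version: str, size: int,
                         racks: List[str], storage_class: str, storage: str) -> dict:
    """Build a minimal CassandraDatacenter manifest for Cass Operator."""
    return {
        "apiVersion": "cassandra.datastax.com/v1beta1",
        "kind": "CassandraDatacenter",
        "metadata": {"name": name},
        "spec": {
            "clusterName": cluster,
            "serverType": "cassandra",  # "dse" would select DataStax Enterprise
            "serverVersion": version,
            "size": size,               # total Cassandra nodes in this data center
            # How a rack is pinned to a zone depends on the operator version,
            # so only the rack names are given here.
            "racks": [{"name": rack} for rack in racks],
            "storageConfig": {
                "cassandraDataVolumeClaimSpec": {
                    "storageClassName": storage_class,
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                }
            },
        },
    }


if __name__ == "__main__":
    manifest = cassandra_datacenter(
        "dc1", "cluster1", "4.0.1", 6, ["rack1", "rack2", "rack3"], "standard", "100Gi"
    )
    print(json.dumps(manifest, indent=2))
```

This one resource stands in for the StatefulSets, Services, ConfigMaps, and Secrets that previously each needed their own manifest; the operator derives those from it.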

16:33 Jeremy Hanna: So before Cass Operator, a Cassandra data center in these clusters only comprised a single availability zone within the cloud region. Cass Operator can take failure zones, such as availability zones or racks, into account and automatically scale the cluster evenly across availability zones, so that if there are three replicas, one replica will be in each failure zone. So, you can withstand the failure of one zone and be just fine.
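The arithmetic behind "withstand the failure of one zone" can be checked with a few lines of Python. The sketch below is an illustration under stated assumptions, not Cass Operator’s actual algorithm: it spreads nodes across racks as evenly as possible, assumes replicas are likewise spread evenly across zones, and assumes reads and writes need a quorum of replicas.

```python
from typing import Dict, List


def nodes_per_rack(size: int, racks: List[str]) -> Dict[str, int]:
    """Spread `size` nodes across racks as evenly as possible."""
    base, extra = divmod(size, len(racks))
    # The first `extra` racks each take one additional node.
    return {rack: base + (1 if i < extra else 0) for i, rack in enumerate(racks)}


def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond for a quorum read or write."""
    return replication_factor // 2 + 1


def survives_one_zone_loss(replication_factor: int, zones: int) -> bool:
    """True if a quorum of replicas remains after the worst single-zone outage."""
    worst_zone_replicas = -(-replication_factor // zones)  # ceiling division
    return replication_factor - worst_zone_replicas >= quorum(replication_factor)


if __name__ == "__main__":
    print(nodes_per_rack(6, ["zone-a", "zone-b", "zone-c"]))
    print(survives_one_zone_loss(replication_factor=3, zones=3))
    print(survives_one_zone_loss(replication_factor=3, zones=1))
```

With three replicas over three zones, losing one zone leaves two replicas, which still meets a quorum of two. With a single zone, one outage removes all three, which matches the limitation of the earlier one-zone-per-data-center layout.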

 

17:07 Jon Lau: Earlier this year, we also worked on deploying Cass Operator and Cassandra clusters to GKE using our GitOps pipeline. The GitOps pipeline on GKE uses Anthos Config Management and a custom install operator that has integration with HashiCorp Vault and cert-manager. This means we don’t need to manage a separate CI/CD platform, and we can simplify our password rotation and certificate renewal processes. Using Cass Operator, we are now able to deploy Cassandra data centers on both AWS and GCP. And since DNS entries as additional seeds are now supported in Cass Operator, we can connect Cassandra data centers running on AWS and GCP to form multi-cloud Cassandra clusters.

 

18:08 Jeremy Hanna: Cass Operator avoids issues with trying to span multiple Kubernetes clusters across different clouds and regions by taking advantage of Cassandra’s native multi-data-center features. Once the networking is set up between the Kubernetes clusters, you just need to specify hosts by IP or DNS entries in each region as Cassandra seed values, and Cassandra will operate all regions as a single Cassandra cluster. If one region goes down, or connectivity between regions is cut for some reason, the other regions can still operate independently. This completes the picture of what we’ve done today with deploying Cassandra with Cass Operator in Kubernetes.
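The seed wiring described here is symmetric: each data center lists seed hosts from the other data centers. The Python sketch below computes those lists. The DNS names are hypothetical, and placing the result in an `additionalSeeds` field of each Cassandra data center resource is an assumption about the Cass Operator spec that should be confirmed for the version in use.

```python
from typing import Dict, List


def additional_seeds(seed_hosts_by_dc: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """For each data center, list the seed hosts of every other data center."""
    return {
        dc: sorted(
            host
            for other_dc, hosts in seed_hosts_by_dc.items()
            if other_dc != dc
            for host in hosts
        )
        for dc in seed_hosts_by_dc
    }


if __name__ == "__main__":
    # Hypothetical seed DNS entries, one data center per cloud.
    seeds = {
        "aws-dc1": ["seed-0.aws.cassandra.example.internal"],
        "gcp-dc1": ["seed-0.gcp.cassandra.example.internal"],
    }
    for dc, remote in additional_seeds(seeds).items():
        print(dc, "->", remote)
```

Because each data center only needs reachable seed addresses from its peers, no single Kubernetes cluster has to span clouds; the cross-cloud part is reduced to networking and DNS.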

 

19:03 Jon Lau: We worked across time zones and continents when we first started creating Kubernetes clusters and running stateful workloads on Kubernetes. Today, the barrier to entry has been dramatically lowered. Many turnkey solutions and SaaS offerings exist in the marketplace. This lets us focus on using the technology rather than operating it. So what’s next? There’s plenty left to explore. First, Kubernetes upgrades: as Kubernetes evolves, APIs get deprecated and are eventually removed in newer Kubernetes versions. How do we seamlessly ensure existing workflows will still work in newer Kubernetes versions? Second, dynamic volume resizing: volume expansion is possible through Kubernetes storage classes. However, it is not supported natively by StatefulSets or Cassandra operators, so manual workarounds are required at this stage. Third, evaluating the K8ssandra suite and SaaS offerings like Astra serverless.
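The talk does not say which manual workaround is used for volume resizing. One commonly described approach, sketched below in Python purely as an illustration, is to patch each persistent volume claim directly, delete the StatefulSet while orphaning its pods, and re-create it with the larger size in its claim template. It assumes the storage class has `allowVolumeExpansion: true`; the names, namespace, and manifest path are hypothetical.

```python
import json
from typing import List


def resize_commands(statefulset: str, claim_template: str, replicas: int,
                    new_size: str, namespace: str, manifest_path: str) -> List[str]:
    """Return kubectl steps for growing the volumes of an existing StatefulSet."""
    patch = json.dumps({"spec": {"resources": {"requests": {"storage": new_size}}}})
    commands = []
    for ordinal in range(replicas):
        # PVCs created from a claim template are named <template>-<statefulset>-<ordinal>.
        pvc = f"{claim_template}-{statefulset}-{ordinal}"
        commands.append(f"kubectl -n {namespace} patch pvc {pvc} -p '{patch}'")
    # volumeClaimTemplates cannot be edited in place, so remove only the
    # StatefulSet object and leave its pods running.
    commands.append(
        f"kubectl -n {namespace} delete statefulset {statefulset} --cascade=orphan"
    )
    # Re-create the StatefulSet from a manifest whose template has the new size.
    commands.append(f"kubectl -n {namespace} apply -f {manifest_path}")
    return commands


if __name__ == "__main__":
    for command in resize_commands(
        "cassandra", "cassandra-data", 3, "200Gi", "db", "statefulset.json"
    ):
        print(command)
```

Whether the file system grows online or only after a pod restart depends on the storage driver, which is part of why this remains a manual, provider-specific procedure.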

 

20:23 Jeremy Hanna: Dynamic resizing of volumes would be helpful in certain circumstances when you need a bit of extra space. I’ve done this with Google’s specific dynamic sizing with GKE, but it would be nice to have that standardized across Kubernetes versions. Speaking of K8ssandra, Cass Operator was originally a standalone open source project. But since then, we saw that there were multiple companion open source projects that, taken together, could really help manage Cassandra clusters better. K8ssandra includes open source tools and integrations such as metrics collection and Grafana dashboards, Medusa for backup and restore, Reaper for automating cluster repairs, and an API gateway called Stargate to persist to and query from Cassandra with REST, GraphQL, and a schema-less JSON API. All of that is put together in the K8ssandra project. Finally, DataStax Astra started out as a cloud-based service offering to manage Cassandra clusters for you, and it used Cass Operator under the covers to deploy and manage those clusters. DataStax Astra serverless is an interesting innovation in managing Cassandra within Kubernetes. It fundamentally re-architects Cassandra to be cloud native from the ground up: it decouples the storage from the compute to break up the monolithic structure of Cassandra.

This makes for much faster elastic scaling up and down, enables multi-tenancy, and allows for consumption-based pricing. We’ve seen through all of this that data on Kubernetes is a fast-moving space for data systems. It’s gone from a lot of manual work for Day 2 management of clusters, to mostly automated, to now being able to run Cassandra in a serverless, fully cloud-native model. Over the next year or two, we think this space will continue to accelerate to make Cassandra much more naturally deployable and manageable within Kubernetes.

Thanks everyone, for your time and attention today. Feel free to reach out if you have any additional questions.