For stateful, cloud-native applications, data operations must often be performed by tools with a semantic understanding of the data. The volume-level primitives provided by orchestrators are not sufficient to support data workflows like backup/recovery of complex, distributed databases.

To bridge this gap between operational requirements for these applications and Kubernetes, the open-source project Kanister was created. Kanister is a framework to support application-level data management in Kubernetes. It lets developers define relationships between tools and applications, and then makes running those tools in Kubernetes simple. Kanister is managed through Kubernetes API objects called CustomResourceDefinitions, and all interactions with Kanister take place through Kubernetes tools and APIs. In short, Kanister allows administrators and automation to perform data operations at the Kubernetes level regardless of the complexity of the application.

In this live webinar, Pavan presents how Kanister is used and demos protection operations on a live MongoDB cluster. This webinar is targeted towards developers and ops teams interested in stateful applications in Kubernetes.

This talk was given by Pavan Navarathna, Member of Technical Staff at Kasten by Veeam, as part of DoK Day at KubeCon NA 2021. Watch it below. You can access the other talks here.

Bart Farrell: Very nice to see you, Pavan, how are you doing?

Pavan Navarathna: I’m doing well. How are you?

Bart Farrell: Great. So what are we going to hear about today?

Pavan Navarathna: Today I’ll be talking about an open-source project called Kanister, and I can show how it can be used to do some application-level data operations on Kubernetes.

Bart Farrell: Alright folks, as usual, remember, we can continue the conversation in Slack. Feel free to leave comments and questions as well in the YouTube comments. Also, follow us on Twitter, smash that subscribe button on YouTube. It’s all yours Pavan, you can take it from here.

Pavan Navarathna: Thanks, Bart. So, a little bit about myself. I am Pavan Navarathna. I am a member of the technical staff here at Kasten by Veeam. For the past three and a half to four years, my focus has been mostly on solving data management problems for stateful workloads on Kubernetes. I’ve also been actively contributing to open-source projects and Kanister is one of them. I’ll be covering that in our session today. 

Before we talk about Kanister, let’s talk about data management on Kubernetes. In general, once all the infra and storage-related problems have been solved, data services like backup, disaster recovery, and application mobility take precedence. Based on our learnings here at Kasten, we have seen that there can be multiple flavors of data management. The first is storage-centric snapshots, which mostly utilize the underlying storage provider’s snapshot capabilities. In most cases, these are crash-consistent, and they provide a pretty good way to snapshot the underlying volumes. One thing to note is that with the storage-centric approach, the snapshot process doesn’t really interact with the application itself. The second approach is to take storage-centric snapshots but use hooks or APIs provided by the data services to freeze and unfreeze the application during the process. There’s also the data-centric approach, where you utilize the tools provided by the databases, like mysqldump and pg_dump. These are capabilities of the database itself: compression, encryption, everything is provided by the tool. In most cases, the recovery process is more complex with this approach. And finally, we have the application-centric approach. This is more of a logical method that provides a way to execute all these different kinds of data management in a coordinated manner.

Each of the data management approaches that I talked about has its own pros and cons in terms of speed and consistency. So the optimal strategy for your organization or your application would depend on the capabilities that are available on your infrastructure as well as the needs of the application you’re deploying. Cloud-native applications, in general, use different components. They could use different data services or different storage technologies in the same application. Each of these components can have different domain experts: the Kubernetes administrator, the app developer, and, if you’re using any databases, the database administrators. All these experts have their own requirements for data protection, and it’s pretty difficult to separate these concerns. Then the different infrastructures have moving parts. You’d have to select different types of backups, whether logical, volume snapshots, or even provider-specific ones like Amazon RDS. Some applications require scaling up and scaling down during the snapshot process. Then, once the snapshots are taken, you need a destination for them: you either move them out to an object store or to NFS or file storage.

A good data management approach would have to be able to provide a way to coordinate among all these things and implement these complex workflows. That’s where Kanister comes in, we built Kanister to address some of these concerns. It’s an open-source project which is purpose-built for Kubernetes and stateful workloads on Kubernetes. All these database-specific workflows can be captured in custom resources called blueprints and these blueprints can be easily extended and shared between different experts. We provide a standardized Kubernetes API to execute these workflows. 

What is Kanister made up of? It has four main components: the Kanister controller, blueprints, action sets, and profiles. The controller is an operator based on the Kubernetes operator pattern. It’s responsible for the state management of blueprints, action sets, and profiles. As I said, blueprints are custom resources used to define workflows for backup, restore, or even delete operations. Once these operations have been defined, we can use an action set to execute a particular action from a particular blueprint. Finally, we can use profiles to determine the destination for these backups, or, in the case of restore, the source. There are also a couple of tools that Kanister provides, namely kanctl and kando. kanctl can be used to create action sets and profiles, and kando is a tool that is used inside the container to move data to and from the object store.

As I described earlier, a blueprint can be used to define the workflows for different operations. So let’s look at an example here. This blueprint is a CR for, let’s say, a MongoDB backup. The workflows are executed using actions, and each action has one or more phases. In this example, we see a backup action with a single phase that uses a function called KubeTask. This is a Kanister function; with these functions you could run a shell script as a command, or even create volume snapshots. I will talk about the functions at the end. We also have output artifacts, which are used to capture state between phases. From this KubeTask phase, I’m capturing the S3 path.

Moving on, once these blueprints are defined, we have action sets. This action set selects the backup action from the blueprint we just saw. It also selects an object as the target for this action, which is the MongoDB replica set. And then we have a profile called example profile that is used as the destination for this backup. Once the backup is done, the Kanister controller sets the status of the action set. That could be the artifacts, like we see here, and while the execution is happening, the controller constantly updates the progress of the action itself.

Finally, we have the profile. We saw that it selects the destination, but we didn’t see how it does that. A profile contains a location that could be of S3-compliant type, or it could be Google or Azure Blob Storage. Here we are selecting an S3-compliant bucket called Kanister backup. The profile also holds the credentials for that location. Now that we have seen all the components, let’s see how it behaves.
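The three resource kinds described above can be sketched as plain data. The Python below builds illustrative manifests as dictionaries and checks that the action set only refers to things that exist. It is a sketch, not the manifests shown in the talk: the resource names, namespaces, bucket name, phase name, and the placeholder command are assumptions, and the exact schema should be checked against the Kanister documentation before applying anything to a cluster.

```python
import json

API_VERSION = "cr.kanister.io/v1alpha1"

# Blueprint: one "backup" action with a single KubeTask phase.
# The phase name and the command are placeholders, not the upstream blueprint.
blueprint = {
    "apiVersion": API_VERSION,
    "kind": "Blueprint",
    "metadata": {"name": "mongo-blueprint", "namespace": "kanister"},
    "actions": {
        "backup": {
            "outputArtifacts": {
                "cloudObject": {"keyValue": {"path": "<S3 path reported by the phase>"}}
            },
            "phases": [
                {
                    "name": "takeConsistentBackup",
                    "func": "KubeTask",
                    "args": {"command": ["<dump the database and push it with kando>"]},
                }
            ],
        }
    },
}

# Profile: where backups go, plus a reference to the credentials (assumed names).
profile = {
    "apiVersion": API_VERSION,
    "kind": "Profile",
    "metadata": {"name": "example-profile", "namespace": "kanister"},
    "location": {"type": "s3Compliant", "bucket": "kanister-backup"},
    "credential": {"type": "keyPair", "keyPair": {"secret": {"name": "example-secret"}}},
}

# ActionSet: run the "backup" action against the MongoDB StatefulSet.
action_set = {
    "apiVersion": API_VERSION,
    "kind": "ActionSet",
    "metadata": {"generateName": "backup-", "namespace": "kanister"},
    "spec": {
        "actions": [
            {
                "name": "backup",
                "blueprint": "mongo-blueprint",
                "object": {"kind": "StatefulSet", "name": "mongodb", "namespace": "mongo"},
                "profile": {"name": "example-profile", "namespace": "kanister"},
            }
        ]
    },
}


def unresolved_references(blueprint, action_set, profile):
    """Return a list of problems where the action set points at something missing."""
    problems = []
    for action in action_set["spec"]["actions"]:
        if action["blueprint"] != blueprint["metadata"]["name"]:
            problems.append(f"unknown blueprint {action['blueprint']!r}")
        elif action["name"] not in blueprint["actions"]:
            problems.append(f"blueprint has no action {action['name']!r}")
        if action["profile"]["name"] != profile["metadata"]["name"]:
            problems.append(f"unknown profile {action['profile']['name']!r}")
    return problems


print(json.dumps(action_set, indent=2))
print("problems:", unresolved_references(blueprint, action_set, profile))
```

The point of the check is the relationship between the resources: an action set carries no workflow of its own, only references to an action in a blueprint, a target object, and a profile.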

Assume that you have a Kanister controller, a blueprint, and a database workload. How do we back this up? The first step would be to create an action set. Once the action set is created, the controller discovers the blueprint referred to in the action set, gets the definition of that action, and then interacts with the database workload using the Kanister function. If we use KubeTask, it is going to spin up a pod. That pod is going to connect to the database workload and execute whatever command we have provided. Once the backup is complete, it can move the data to an object store, which is selected by the profile. After all these actions are executed, the controller updates the action set and sets its status to complete. In theory, we have seen how Kanister behaves. Let’s see a live demo of how we can use Kanister to back up our MongoDB replica set.
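The sequence just described (look up the blueprint, run each phase through a function, record artifacts, mark the action set complete) can be modelled in a few lines. This is a toy model for illustration, not Kanister’s controller code: the in-memory registries, the stubbed KubeTask function, and all names and paths are assumptions.

```python
from typing import Callable, Dict


def kube_task(args: dict, context: dict) -> dict:
    """Stand-in for a KubeTask-style function.

    A real function would start a pod and run the command; this stub only
    reports where the backup would be written.
    """
    bucket = context["profile"]["bucket"]
    return {"path": f"s3://{bucket}/{context['object']}/dump.gz"}


# Registry of available functions, keyed by the name used in a phase.
FUNCTIONS: Dict[str, Callable[[dict, dict], dict]] = {"KubeTask": kube_task}

# Hypothetical in-memory stand-ins for the Blueprint and Profile resources.
BLUEPRINTS = {
    "mongo-blueprint": {
        "backup": {
            "phases": [
                {
                    "name": "takeConsistentBackup",
                    "func": "KubeTask",
                    "args": {"command": ["<backup command>"]},
                }
            ],
            # artifact name -> (phase that produced it, output key)
            "outputArtifacts": {"cloudObject": ("takeConsistentBackup", "path")},
        }
    }
}
PROFILES = {"example-profile": {"bucket": "kanister-backup"}}


def reconcile(action_set: dict) -> dict:
    """Execute one action set and return its final status."""
    status = {"state": "running", "phases": [], "artifacts": {}}

    # 1. Discover the blueprint and the requested action.
    action = BLUEPRINTS[action_set["blueprint"]][action_set["action"]]
    context = {
        "profile": PROFILES[action_set["profile"]],
        "object": action_set["object"],
    }

    # 2. Run each phase in order, recording progress as we go.
    outputs = {}
    for phase in action["phases"]:
        function = FUNCTIONS[phase["func"]]
        outputs[phase["name"]] = function(phase["args"], context)
        status["phases"].append({"name": phase["name"], "state": "complete"})

    # 3. Capture output artifacts from the phase outputs.
    for artifact, (phase_name, key) in action["outputArtifacts"].items():
        status["artifacts"][artifact] = outputs[phase_name][key]

    # 4. Mark the action set as complete.
    status["state"] = "complete"
    return status


status = reconcile(
    {
        "action": "backup",
        "blueprint": "mongo-blueprint",
        "object": "mongodb",
        "profile": "example-profile",
    }
)
print(status)
```

Keeping the functions in a registry mirrors the idea that a blueprint names a function rather than embedding its implementation, so the same workflow definition can be shared between teams.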

I’m just showing the namespace where I have the MongoDB replica set deployed on this cluster. It’s a GKE cluster running Kubernetes 1.21. We have the stateful set running. We will check the pods, so we have two pods running. Now that we have Mongo deployed here, let’s add some data to it: we exec into the container there and add some data. It’s just adding a simple restaurants table with four entries for different restaurants. If you have ever been to the Bay Area, you’d be familiar with these restaurants. We are just verifying that it’s added four entries. What we can do now is see how we can use Kanister to protect this MongoDB replica set. I’m going to check whether Kanister has been added to my Helm charts. It provides a Helm chart, so it’s pretty easy and simple to deploy. Let’s go ahead and create the namespace for Kanister. I think I took a while to copy the Helm command. We are installing the 0.68.0 version of Kanister. The operator has been installed. Let’s go ahead and verify that the controller, or the operator, is running right now. Let’s go ahead and install the tools that are required to create and execute some of these operations. There’s an easy script available to install kanctl and kando. Let’s skip ahead to minute 3 and 18 seconds, since the download took a while. Now that we have the tools, let’s just verify that they were installed; it should be version 0.68.0. Yes.

There’s also another step, where we need a destination for these backups. Let’s go ahead and create a profile. We use kanctl for this, where we provide the type of the profile and then the bucket name along with the credentials we need to use for it. We will see that there is a secret created for the credentials, and the profile is also created here. This is all happening in the Kanister namespace. The last step to protect MongoDB is the blueprint itself. Let’s go ahead and deploy the blueprint. I’m using the latest version that is available in the upstream Kanister repository. The blueprint should be deployed. You can always use kubectl commands to get these CRs. Just to verify what is in this blueprint, we see that there are three different actions here. The first one is the backup, and it has output artifacts where we store the location of the file that we create for the backup. Then we have the phases, where we use the KubeTask Kanister function, which executes the mongodump command and finally uses kando to push that mongodump data into an S3 bucket. The phase itself is called consistent backup. You can also pass objects into phases, and here we are providing the Mongo secret with the MongoDB credentials. We see that the delete operation has an input artifact, which is the cloud object output artifact that we created, so that the delete action can go ahead and delete that file from the S3 location. Again, we are using KubeTask here. Finally, the restore action itself uses the input artifact and does a mongorestore. The data is first pulled using kando location pull, which is another command that is provided. Then we use that data to do a mongorestore. This is all done, again, in a KubeTask. A blueprint is like any other CR: it has the metadata section with name and namespace.
Everything is set up now, so let’s go ahead and create an action set to run the backup action. Using the profile that we just created, we provide the subject for the action, which is the Mongo stateful set, and then the blueprint from which the action has to be run. The action set has been created. Let’s go and check its status. The phase, consistent backup, is currently running. Let’s give it a few seconds and check again. We now see that the state is marked as complete. We see the S3 path stored there, and it was using the stateful set as the subject. Now that the backup is complete, let’s verify that the S3 bucket has that particular file, so I’m just running an S3 ls on it. We see that the file was created. I recorded this yesterday, so the timestamp shows last night, and the file is there. Now what we can do is simulate a scenario where we delete the database or the tables and then see how we can restore them quickly. I just dropped the table here. We’ll verify that it’s dropped. There are no entries right now. Let’s go ahead and recover it using Kanister. What I’ll do now is create an action set again, based on the backup action set that we just ran. kanctl can be used again to do that. We’ll provide the input action set so that it takes all the artifacts from the previous run and creates an action set for restore. There is a from flag here, which will do that for us. Again, we can go and check the status by doing a describe. This had already completed by the time we checked the status. We can verify that it ran the restore action from the Mongo blueprint, that the subject is the stateful set, and that everything is complete. As I mentioned, events are recorded at every point of the action’s execution. So if the data is quite large and the action takes a few minutes to run, we’ll know at every point what the status of that action is.
Again, let’s go ahead and verify that the entries are back. Yep, we see the entries again. That was a simple backup and restore that we saw using Kanister. I had dropped the tables and finally recovered that using Kanister. We can go back to the slides. Thank you. 
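The restore step above reuses the completed backup: the new action set keeps the same blueprint, subject, and profile, and receives the backup’s output artifacts as its input artifacts. The sketch below models that derivation in Python. It is an illustration of the idea behind the from flag, not kanctl’s implementation; the simplified status layout and all names and paths are assumptions.

```python
import copy


def derive_action_set(parent: dict, action_name: str) -> dict:
    """Build a new action set (for example "restore") from a completed one."""
    if parent["status"]["state"] != "complete":
        raise ValueError("parent action set has not completed")

    parent_action = parent["spec"]["actions"][0]
    child_action = {
        "name": action_name,
        # Same blueprint, subject, and profile as the parent action.
        "blueprint": parent_action["blueprint"],
        "object": copy.deepcopy(parent_action["object"]),
        "profile": copy.deepcopy(parent_action["profile"]),
        # The parent's output artifacts become the child's input artifacts.
        "artifacts": copy.deepcopy(parent["status"]["artifacts"]),
    }
    return {
        "kind": "ActionSet",
        "metadata": {"generateName": f"{action_name}-"},
        "spec": {"actions": [child_action]},
    }


# A completed backup action set, in the simplified shape used by this sketch.
backup = {
    "kind": "ActionSet",
    "metadata": {"name": "backup-example"},
    "spec": {
        "actions": [
            {
                "name": "backup",
                "blueprint": "mongo-blueprint",
                "object": {"kind": "StatefulSet", "name": "mongodb", "namespace": "mongo"},
                "profile": {"name": "example-profile", "namespace": "kanister"},
            }
        ]
    },
    "status": {
        "state": "complete",
        "artifacts": {
            "cloudObject": {"keyValue": {"path": "s3://kanister-backup/mongodb/dump.gz"}}
        },
    },
}

restore = derive_action_set(backup, "restore")
print(restore["spec"]["actions"][0]["artifacts"])
```

Because the artifact travels with the action set, the restore and delete actions do not need to be told where the dump lives; they read the path recorded by the backup.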

So that was a pretty simple demo, and we saw that we used the KubeTask Kanister function there. But that’s not the only available function. There are many other functions in Kanister as well, and I have listed a few here. Due to lack of time, I’ll skip explaining each one of them, but feel free to reach out; I would be happy to give more details on each of these functions. There are also some new things coming up that I’m pretty excited to talk about. We are planning to add a guide for writing blueprints. So in case you are using a database that we don’t already have a blueprint for, this guide should help you write your own. Feel free to reach out to us and we can work on a blueprint together. There’s also a plan to add file storage as a destination for backups. And we saw that kando moved all the backup data into S3 storage, right? We don’t know in advance how small or how big that data is, so we are planning to add encryption, compression, and deduplication for the data that is being moved. In the case of large database dumps, we can use these features to save storage on either S3 or file storage. Finally, we have seen a lot of data services come up these days with operators, so it is also on our roadmap to add functions that help back up these operators.

So finally, just to wrap up, feel free to reach out to us. Try out the project yourself. You’ll see here a small snippet showing how easy it is to deploy Kanister as a developer, or to use the code itself and run a tutorial with a Mongo blueprint and a Mongo backup and restore. I’ve added all the links here, to the Twitter handle and the Slack channel as well. Join us, feel free to try the project, recommend changes, and even come up with your own blueprints. If you are in LA, do come and visit our Kasten booth. We will be here in person to talk to you about Kanister. Thank you.