Cloud Native Database as a Service using Kubernetes

In the most recent Cloud Native Computing Foundation (CNCF) survey, managing complexity is the number one issue faced by Kubernetes architects and practitioners.

Deploying stateful applications in Kubernetes further adds to the cognitive load of practitioners. MayaData and Percona have partnered to build a unified Helm chart that provides an easy on-ramp to a highly performant Database as a Service offering.

In this talk, Percona Product Manager Sergey Pronin and MayaData VP of Product and Solution Engineering Murat Karslioglu demonstrate the joint solution and answer questions on how to run stateful applications in Kubernetes.

This talk was given as part of DoK Day at KubeCon NA 2021; watch it below. You can access the other talks here.

Murat Karslioglu  00:00

Hello everyone. Thanks for joining us today. My name is Murat Karslioglu. I’m the VP of product and solution engineering at MayaData. Today I have Sergey Pronin and Jason Meshberg joining me from Percona. We will talk about cloud native Database as a Service with Percona and MayaData, and about simplifying database management on Kubernetes. We assume our audience today has a basic understanding of Kubernetes and data management with cloud native applications, so in this session we will speak to practitioners; we won’t cover many beginner concepts, but they can be found in the documentation link we will share at the end. Sergey, anything you’d like to add? Could you please introduce yourself?

Sergey Pronin  00:54

Nothing to add, but just to introduce myself: I’m Sergey, a manager of technical product managers at Percona, and I’m mostly focused on our Kubernetes operators and databases in the cloud. I will be thrilled to share our knowledge and wisdom with you today.

Murat Karslioglu  01:16

Glad to have you, thank you so much. This graph is from the 2020 CNCF survey report, the most recent available as we present today; the 2021 survey was still in progress, and that report will probably be out before or during the KubeCon North America event. In summary, Kubernetes use in production has increased, up to 83% from 78% a year before, and, most importantly, more than half of respondents now run stateful applications in containers in production. Organizations are testing new use cases or moving more workloads as they become more comfortable with containers. And again in 2020, as this graph shows, complexity, together with cultural changes in development teams, shows up as the top challenge in using and deploying containers. As you can see, storage is in fourth place at 29%. Sergey, can you tell us why people increasingly run databases and stateful applications on Kubernetes?

Sergey Pronin  02:39

We see it more like an evolution. We all started a long time ago with bare metal servers: there was a big metal box where you were running your piece of software and the database. Later on, as IT systems developed and the demand for running software grew, VMs appeared. At first, VMs were not considered a good place to run your database, or even your application, because database administrators were hesitant. They were saying, “Hey, what about performance? If I run on a VM, my performance is going to degrade, I don’t want it.” Looking at today, we see that almost all databases in the cloud are running in VMs; RDS, for example, is a virtual machine with some virtualization behind it. Then containers appeared, and containers allow users not only to pack things more densely but also to empower the microservices architecture. So instead of having a huge monolith, a huge multi-petabyte database, businesses were breaking down their applications and databases into smaller pieces. And as containers moved forward and running hundreds and thousands of containers started to become a problem, Kubernetes appeared to orchestrate all of this. This evolution is ongoing, not only for stateless applications but also for databases; everything is moving to containers, and nowadays everything is turning into microservices. For some, containers are becoming a strategy, and for Percona it is mostly customer- and user-driven, because people started asking us, “Hey folks, how can we run our databases on Kubernetes? Kubernetes showed up, we’re moving all our infrastructure there,” and it’s not only big companies but also startups and even huge enterprises. If we talk about databases on Kubernetes, usually it is a cluster, because Kubernetes is volatile and ephemeral: nodes can come and go, containers restart, and they are immutable. It takes a lot to run a database on Kubernetes because you need to configure this cluster, and you also need to execute management tasks. It’s not only the database that you’re running; there is also some kind of proxy (for MySQL it can be HAProxy or ProxySQL) that you need to configure, and you need to configure the database itself to set up the replication, to take backups, and so on. At the same time, there are lots of options in the Kubernetes world for the storage layer: your database can run on local storage, on cloud storage like Amazon EBS or GCP compute storage, or on other cloud native solutions, like Rook, for example, that you can run on your Kubernetes cluster. And the greatest complexity comes with Day 2 operations. As I said, backups, scaling, upgrades, maintenance of the database: on Kubernetes this becomes a huge problem, especially for new users or for someone who is just starting this journey of running databases on Kubernetes.

Murat Karslioglu  06:39

Having options can sometimes be both a good and a bad thing, because as we see from users, they are looking at the alternative of a managed Database as a Service. The developers just want the database to work; they don’t want to think about storage concepts, as we have witnessed. With our users, they don’t want to think about how many replicas they should have at the storage layer. They just want the availability, they want their replication to work.

Sergey Pronin  07:13

I agree. And, as you said, what does the user want? The user wants the database; the user doesn’t want to configure Kubernetes primitives like StatefulSets, or the database itself. What the user wants to get is a service. And that is where operators jump in nicely, because operators live in between the Kubernetes primitives and the application, or in our case the database, configuration. Instead of spinning up the StatefulSets, creating the Services, configuring Secrets, ConfigMaps, everything on Kubernetes, and then going on to configure the database, what the user does is just send a YAML manifest to the Kubernetes API saying, “Hey, Mr. Operator, please give me the database with this number of nodes, with this version, with this proxy, and this is how I want to configure my backups,” and that’s it. The operator does the rest: it provisions the Kubernetes objects and resources and configures the database. What the user gets in the end is just the endpoint, the root user, and the password that the user can use to connect to the database. The user doesn’t need to know anything about how the database is configured or what’s going on behind the scenes; it doesn’t matter, the operator does the rest. And obviously, the operator does the Day 2 operations and management like backups, scaling, and so on. If we talk about Percona Distribution for MySQL Operator, our goal is simple: we want to provide users with a way to deploy and manage enterprise-grade MySQL clusters on Kubernetes. We do it with Percona XtraDB Cluster, which is synchronous replication based on Galera and MySQL. We have numerous integrations out of the box: the operator not only gives you the MySQL cluster, but also provides you with HAProxy or ProxySQL, your choice; HAProxy is for layer 4, ProxySQL goes deeper into the queries. We also provide an integration with Percona Monitoring and Management, which means you can deploy the operator and your database and you get all the metrics and all the alerting out of the box, just working. We also support custom sidecars, so you can tweak the operator in any way you want, for example to do some benchmarking or some third-party monitoring. And, I believe most importantly, Day 2 operations like backups, scaling, and everything that goes with the database and the pains of it work out of the box with the operator. More than that, our operator is 100% open source; anyone can use it, we are not going with the open core model. Everything is open, and all the features can be accessed by users with no fees.
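To make that flow concrete, here is a minimal sketch of the kind of manifest a user might send to the Kubernetes API to request a cluster from the operator. The cluster name, image tag, and exact field layout below are illustrative; the operator documentation has the authoritative schema for each version.

    kubectl apply -f - <<EOF
    apiVersion: pxc.percona.com/v1
    kind: PerconaXtraDBCluster
    metadata:
      name: my-cluster                              # illustrative cluster name
    spec:
      pxc:
        size: 3                                     # "give me a database with this number of nodes"
        image: percona/percona-xtradb-cluster:8.0   # "...with this version" (tag is illustrative)
        volumeSpec:
          persistentVolumeClaim:
            resources:
              requests:
                storage: 10Gi                       # storage request per node
      haproxy:
        enabled: true                               # "...with this proxy" (HAProxy or ProxySQL)
        size: 3
    EOF

The operator then reconciles this custom resource into StatefulSets, Services, Secrets, and ConfigMaps, and hands back the endpoint and credentials described above.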

Murat Karslioglu  10:38

Thank you, Sergey, that was insightful, and it matches what we have witnessed in our community. Probably four years ago, when we were at KubeCon, some of the users were still experimenting with Kubernetes, asking, “Is Kubernetes the right platform for me now that I have containerized my workload?” But in the last two years, we don’t hear that question anymore. Kubernetes is now accepted, and people move their databases and stateful applications onto Kubernetes. The questions of the last two years, with the increasing use of operators, have been: how do we do the Day 2 operations? How do I keep my database updated? Because instead of one monolithic, larger database (there are use cases for that, too), users now host hundreds of databases, and how do you upgrade when you need to? One by one would be almost impossible. Operators changed our life. And, as Sergey mentioned, every workload now comes with different storage expectations and different performance and availability requirements, because distributed databases provide their own multi-master replication. Using legacy highly available storage would not only waste resources, the disk resources, but would also slow down the performance you could get directly from raw devices, because the network adds overhead, and synchronous replication over the network would add another layer on top of what you already have with your database. Percona XtraDB Cluster enables synchronous multi-master replication; each node is a regular Percona Server for MySQL instance, and it puts ProxySQL in front of the cluster to assist with several functions, such as splitting read and write traffic among nodes as needed. These volumes need to be placed, dynamically, on fast and reliable storage. So Percona XtraDB Cluster can be provisioned with OpenEBS volumes using different flavors of OpenEBS, such as Local Persistent Volumes, or CSI-based replicated storage if additional high availability is needed based on the cloud configuration and application requirements. In our session today, we will be installing a PXC environment on OpenEBS Local PV hostpath. Before we get to the demo, let’s also talk about the lessons we have learned over the last four and a half years working with users moving their stateful applications to Kubernetes. Kubernetes teams vary in size, but a large percentage of them are self-sufficient small to mid-size teams, so per-team control and granularity became important to these teams, and being able to decouple storage from the infrastructure or cloud vendor is important. We want to run the same environment, the same manifests, the same pipelines on any cloud, any hardware, even on-prem, and migrate if we need to, if there is a change in any parameter, whether cost or any other decision. We don’t want our deployment to be affected by or tied to infrastructure pieces. Kubernetes by design supports all of this; we call these agility enablers, unless you are tightly coupled to an infrastructure piece such as storage, and this is where the OpenEBS project comes in: an open-source, easy-to-use, granular container attached storage solution. Granularity is a big requirement. Container attached storage (CAS) is a new term that was coined after Kubernetes’ popularity increased; OpenEBS is a CNCF Sandbox project, the leading CAS solution in the CNCF, currently also looking to take the steps to move into incubation. And as a storage solution, we should say that it’s not a replacement for your storage.
It’s additive to the underlying systems, cloud volumes, JBODs, or your existing appliance, because you don’t replace your legacy storage system overnight; you want to take advantage of Kubernetes CSI primitives and be able to granularly and dynamically provision volumes, so you reduce the complexity. How does OpenEBS Local PV hostpath work with Percona? OpenEBS provides, as I said, multiple storage options, both replicated and non-replicated. The Local PV option we will be using in our demo today is a dynamic hostpath provisioner. Kubernetes also has a local PV, but that requires manual provisioning and therefore is not always suitable for dynamic scaling of your database. In this deployment, OpenEBS is not directly in the data path of the database, therefore there is no performance overhead at all; you get close to raw device performance, with only a file system between your data and the device, which is especially important with the latest NVMe SSDs, and it is increasingly portable. We will share the link at the end to the repository of what we have done here. We have created a Helm chart; let me move to that screen. The Helm chart installs the Percona Kubernetes Operator for Percona XtraDB Cluster in a very simple way that reduces complexity a lot, and it requires OpenEBS to exist; if you go into the chart requirements you can see it has a dependency on OpenEBS. The other difference, if you go to the values, is that when it deploys the cluster it uses the storage classes that OpenEBS creates. You can change these parameters in the values file before you apply, but we’ll just demonstrate what is documented in this repository. To show you how it works, I’m going to switch to my terminal screen first and install the repositories. Before I do that: just before the webinar, thanks to the new simplified tools available, I used kops to deploy a four-node cluster, and there is nothing on my cluster yet. What you need to do, if you haven’t done it before, is add two repositories. I already have them on my setup, but I’ll run it so it says they exist. You also need to add the Percona repository (https://github.com/percona/percona-helm-charts); this also exists for me. Then you do helm repo update to make sure you have the latest charts; in my case I have a bunch of others. It’s very simple: from the repository here, you deploy the chart. You can also deploy this in a namespace if you prefer, but I’m just showing the very basic case (helm install pxc-operator mayadata/percona-openebs). Once this is done, we can run kubectl get pods; because I didn’t specify a namespace, it just deployed everything to default. It’s doing its thing here, and with kubectl get sc you can see it created some storage classes; I want to talk about those for a second while waiting for the pods to come up. There is an optional step here: I shared a couple of default storage classes, and if you prefer, and you’re running on AWS instance types with NVMe devices, you can also create additional storage classes so that different types of databases can use that faster device. What you need to specify here are two things: the name, of course, and where that device is mounted, that is, the basePath where you want your hostpath data to be stored, and the volume binding mode, which in this case is WaitForFirstConsumer.
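For readers following along, the steps described above look roughly like the commands below, followed by an example of the kind of additional StorageClass being discussed. The MayaData repository URL is a placeholder (use the one from the repository linked at the end), and the StorageClass name and basePath are illustrative.

    # Add the two chart repositories (the MayaData URL below is a placeholder)
    helm repo add mayadata https://example.com/mayadata-charts
    helm repo add percona https://percona.github.io/percona-helm-charts/
    helm repo update

    # Install the joint chart: the Percona operator plus the OpenEBS dependency
    # (default namespace, as in the demo)
    helm install pxc-operator mayadata/percona-openebs

    # Watch the pods come up and list the storage classes the chart created
    kubectl get pods
    kubectl get sc

    # Optional: an extra StorageClass for fast local NVMe devices, using the OpenEBS
    # Local PV hostpath provisioner with a custom basePath and WaitForFirstConsumer binding
    kubectl apply -f - <<EOF
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: openebs-nvme                 # illustrative name
      annotations:
        openebs.io/cas-type: local
        cas.openebs.io/config: |
          - name: StorageType
            value: hostpath
          - name: BasePath
            value: /mnt/nvme             # where the fast device is mounted (illustrative)
    provisioner: openebs.io/local
    volumeBindingMode: WaitForFirstConsumer
    EOF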
After this, the PXC operator should be up and running; it’s very quick. Let’s take a look. I hope my cluster is in good shape too. I see one out of two pods not running, so there might be a difficulty in my cluster, but this will not stop me from showing you the next step. What you do here: from the Percona repository there is another chart that deploys the database. You can create this in a different namespace or just deploy it in the default one for the sake of the demo, but I will deploy it in a pxc namespace. Again, if you have a different storage class, you can define your storage class here. Before we do that, let’s take a quick look at kubectl get sc; as you can see, we have a whole spread, so let’s describe the OpenEBS hostpath storage class, which my PVs are using (kubectl describe sc openebs-hostpath). Let’s take a look at how the default one looks: as you can see, it’s using the default basePath under openebs/local, and there is a PVC in the Bound state using that OpenEBS storage class. So let’s create this, and that will be it. You can follow these steps to get root access to your cluster: this gets the root password from the Secrets, and then using that password you can connect to the application, but that’s not the point of our demo today. I’ll go back to some additional details. Step-by-step instructions are also on the MayaData website; under Workloads you can find Percona, and you can follow those instructions to get your cluster up and running using different flavors of OpenEBS. Again, as the CNCF survey shows, complexity is an important blocker in front of wider Kubernetes adoption and containerization, and you can use the joint OpenEBS and Percona operator solution to simplify it. Everything you have seen in the demo, both OpenEBS and Percona, as Sergey also mentioned, is open source, and there is a big community behind both projects. What we see from our users, and that was the purpose of building the joint solution: dynamic provisioning helps OpenEBS users accelerate their deployments and provides autonomy to their developers and small, two-pizza teams, so they don’t need to ask the storage admin to create a LUN and then mount it on a device. The bar is high in storage, of course; cloud vendors have simplified the way we consume storage, and now the expectation is the same everywhere: we want everything to be provisioned very quickly, very fast. That’s what we hear from our community: using this type of solution, they accelerate the deployment of their services into production. And here I share some resources: you can find the chart repositories from MayaData and Percona here, our solution guide, and also the Percona CTO’s blog post about Kubernetes operators with OpenEBS local storage on the Percona blog.
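As a reference for this last part of the demo, the database deployment and root login steps look roughly like the following. The chart name, release name, and Secret name follow the conventions of Percona’s published Helm charts and operator but are shown here as assumptions; check the linked repositories for the exact names in your version.

    # Deploy the database itself from the Percona chart repository into its own namespace
    # ("pxc-db" is the database chart in that repo; the release name "my-db" is illustrative)
    helm install my-db percona/pxc-db --namespace pxc --create-namespace

    # Inspect the storage class and the volumes the cluster claimed
    kubectl describe sc openebs-hostpath
    kubectl get pvc -n pxc

    # Fetch the generated root password from the cluster's Secret and use it to connect
    # (the Secret name below is hypothetical; list the Secrets in the namespace to find yours)
    kubectl get secrets -n pxc
    kubectl get secret my-db-secrets -n pxc -o jsonpath='{.data.root}' | base64 --decode

A custom storage class can also be passed through the chart’s values file, as mentioned earlier, so the cluster’s PersistentVolumeClaims bind to the OpenEBS storage class of your choice.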