
Anatomy of a DBaaS: Bringing self-serve databases to Kubernetes with open source

This talk was an under-the-hood look at developing an open source DBaaS. We looked at the components behind its backend services and frontend, and at the inner workings of a DBaaS running on top of Kubernetes.

We also showed demos of creating, deleting, managing, and otherwise wrangling databases using operators, explaining what happens behind the scenes. The project was still in heavy development, and we had lots of recent lessons to share.

Speakers:

  • Peter Szczepaniak – Senior Product Manager, Percona
  • Diogo Recharte – Backend Software Engineer, Percona

Watch the Replay

Read the Transcript

Speaker 1: 00:01 Alright. All right, we're live. Thanks, everybody, for joining us for this DoK Talk. We're excited. We have Peter and Diogo here to talk about the anatomy of a DBaaS: bringing self-serve databases to Kubernetes with open source. We're really excited to have you two here. We've had this planned for a couple of months, so without further ado, I'm just going to hand things over to Peter and Diogo to get started, and I'll stop sharing.

Speaker 2: 00:30 Thank you very much. I'll take over the share. First of all, thank you very much for the opportunity to talk to you today. We are really happy to have this opportunity, and hopefully we can share some of the knowledge and some of the things we learned. Without further ado, I'm going to dive into our agenda. First, I'm going to quickly share some of my learnings about whether you should buy or DIY your database as a service: some of the benefits, some of the reasons people are moving to database as a service, and some of the differences between the self-hosted, self-managed options and the managed ones available from the hyperscalers. Then Diogo will show us a database as a service in action, and we are going to play around with it, wrangle it a little, and hopefully show you some cool, neat things you can do and how easy it is. Then Diogo will also explain how this all works in practice, and at the very end there will be my humble request as a product manager to all of you.

01:42 Meanwhile, if you have any questions, the chat is open. We'll try to keep our ears out, but if we miss anything, there will also be time at the end of the presentation to ask questions, and hopefully we'll be able to answer. Alright, so: should you buy or DIY? I love this phrase. I learned it from some crazy YouTube channel, and I think it's just amazing; there are so many situations in real life where you need a thing and have to decide whether to buy it or DIY it. But first, quickly, what I learned: I'm a product manager, and I talk to a lot of customers and end users, as well as our community users and non-users, people who are not using our services or products but are using our competitors'. I've tried to combine that knowledge and those findings into a couple of slides.

02:40 And I want to explain here, like I mentioned earlier, why people in general are picking database as a service. When is the right moment? What is usually the trigger that makes people say, "I cannot maintain this on my own, directly on VMs; I need a database as a service"? Then I'm going to go through some of the challenges that larger and smaller companies are facing with the available hyperscalers, and hopefully this will help you decide whether it's worthwhile, because you always need to take into consideration that self-managing and self-hosting your own servers is always some additional work. Okay, so when people go to database as a service, there are, like I mentioned, three top reasons. First, they're looking for quick and easy database deployment, and it's not always the DBA who is looking for this; many times the DBA is looking for a way to shift some of the responsibilities to the development team, to give them more power and empower them to be self-sufficient and able to self-provision.

03:45 And for that they need a quick and easy UI or API, some sort of interface they can operate with. Another trigger is having multiple internal customers, or product teams, or whatever you call them. Basically, if you are maintaining multiple products and the scale is just huge, scale is one of the biggest triggers to move to database as a service. If you keep provisioning, backing up, destroying, and recreating databases, that is definitely the trigger. And the third reason, closely connected with the second, is that the infrastructure and operations teams are just not scaling as fast as the organization is scaling with new products and new ideas. This pushes them to look for solutions like database as a service, to shift that responsibility instead of having to scale the teams along with everything else.

04:49 Okay, so a couple of the challenges I heard about public database as a service. It's never a single reason that means you either must or cannot use one; it's usually a combination, and it's always a weighing of the lesser evil. But one of the first and main ones was security and data privacy. All the hyperscalers have the means to provide security, and they mean well; they're spending tons of money building new security features and measures. But we know leaks happen, and when they happen, there are some companies that simply cannot afford them. So this would be a big one. Another one, maybe even more common than security and data privacy, or at least equally common, was limited customization.

05:54 The big issue with hyperscalers and all the public DBaaS providers is that you basically get what you get. There is some limited customization you can perform, but if your application requires something more, or you actually want to go above and beyond to optimize your database, you might hit some walls: the edges of the black box sitting inside, which you don't have access to. In some cases this was a limiting factor. Another thing is performance variability. This is happening less and less, but you can still find stories on the internet about teams investigating for a very long time why their performance varied, with the reason turning out to be a so-called noisy neighbor at the cloud provider. It comes down to being unable to fully control what is happening in the environment around you.

06:56 For some organizations and some applications, that's just unacceptable. Compliance challenges are another very big thing. Like I said, all the hyperscalers are building up their security in those areas, but there are still compliance rules that require you to be completely on-prem, completely isolated, and this is also one of the reasons people cannot go with a public database as a service. And last but not least is vendor lock-in, which gets mentioned here and there. It's actually a huge issue, because if you're committing to a single vendor with a specific technology, or maybe a modified open source technology, you are tying yourself to this vendor, or risking a heavy and painful migration to a different provider if you ever have to move. Looking at the chat real quick, taking a quick sip; let's continue. Okay, so is a private DBaaS, a private database as a service, something your organization should look into? If you're looking for flexibility in your deployment, if you want that customization, that ease of use, and the freedom of giving your developers access while still keeping some sort of control over what is happening, if you want to integrate it tightly with your environment or maybe run it in a completely air-gapped environment, then absolutely, the answer is yes.

08:42 Again, a couple of other benefits of running a private database as a service. You can enhance your data security essentially without limit; it's your data center, and how you manage it is all up to you. I mentioned customization and control, which are super important when you actually want to give development teams the power to manage their databases themselves. Predictable performance for production and critical environments is also very important. Compliance and data sovereignty I mentioned before, and then there's scalability and resource efficiency. This one isn't mentioned very often, and as with a lot of things in our IT industry, it depends, but you can think of it this way: hyperscalers are great at small to average scale, but there is a point where you reach a scale at which they become inefficient, and you can get a lot more bang for your buck running things on-prem. Checking the chat real quick; I don't see any other questions. So now Diogo is going to show us a DBaaS in action. I'm going to stop my share.

Speaker 3: 10:05 Yeah, hi everyone. While I start sharing, I just want to say that I'm going to showcase Percona Everest, which is a fully open source private database as a service that can be easily deployed to your Kubernetes cluster with a single command. You can run it in whatever cloud you want: on public clouds like AWS, GCP, or Azure, or on your own infrastructure. But let me start by showing you how easy it is to get a database going. This is the main dashboard for Percona Everest. If you click on the Create database button, you'll see first that you need to pick one of the namespaces. When you install Everest, you can provision as many namespaces as you want.

11:08 You can say, basically, "Hey Everest, I want you to manage the dev and the prod namespaces," and it'll install all the needed components there, and you can then follow up with the installation. Here you can see that I have three different database engines available in this namespace: MySQL, MongoDB, and Postgres. After choosing which database engine you want to deploy, you can name it, pick from a list of database engine versions, and pick and choose your storage class. This is one of those places where you start seeing the benefits of self-hosting and being in control: if I had any number of storage classes provided by my storage provider through a CSI driver, I would be able to choose which one to use for this given database deployment. If I continue, you move to the page where you get a little deeper into how this database is going to be run. You can first choose the number of nodes, the horizontal scalability of your database cluster, as well as set resource limits for the database engine nodes. You can pick between the presets we have here, or choose the custom one and manually input the values you want. I'm going to continue with the small one for this purpose.
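
What the wizard collects ultimately lands in a single DatabaseCluster custom resource. Below is a minimal sketch of creating one with the official Kubernetes Python client, assuming the everest-operator CRDs are installed in the cluster; the group/version, plural, and spec field names are assumptions based on the v1alpha1 CRD at the time of writing, so verify them against the CRDs in your own cluster before relying on this.

```python
# A minimal sketch: create an Everest DatabaseCluster CR with the official
# Kubernetes Python client. Group/version, plural, and spec field names are
# assumptions based on the everest-operator CRDs; check them with, e.g.,
#   kubectl get crd databaseclusters.everest.percona.com -o yaml
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod
api = client.CustomObjectsApi()

database_cluster = {
    "apiVersion": "everest.percona.com/v1alpha1",
    "kind": "DatabaseCluster",
    "metadata": {"name": "dok-demo", "namespace": "dev"},
    "spec": {
        "engine": {
            "type": "psmdb",       # MongoDB; "pxc" (MySQL) and "postgresql" are the other engines
            "version": "7.0.8-5",  # illustrative engine version string
            "replicas": 1,         # one replica set member, as in the demo
            "resources": {"cpu": "1", "memory": "2G"},
            "storage": {"size": "25G", "class": "standard"},  # any CSI storage class
        },
    },
}

api.create_namespaced_custom_object(
    group="everest.percona.com",
    version="v1alpha1",
    namespace="dev",
    plural="databaseclusters",
    body=database_cluster,
)
```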

Speaker 2: 12:43 Diogo, can you make the font a little bit bigger? Sure. There is a question about this.

Speaker 3: 12:48 Is it better? Perfect,

Speaker 2: 12:49 Thank you. Yes, much better.

Speaker 3: 12:52 Okay, then moving on to this section. As you know, a database is not just day-one operations. You don't just want to deploy your database and forget about it; you want to be able to do data operations, which means backing up and being ready for a disaster recovery scenario. With Everest, this is very easy. You can just come here and say, hey, I want to create a backup schedule; I want to back up daily at midnight, for example, and I want these backups to go to this S3 bucket. You can use any S3-compatible API; I pre-configured this one, but as long as it's an S3-compatible API, you can set it up in Everest and set a schedule for the backups. You can also enable point-in-time recovery so that your transactions are constantly being backed up, letting you be more granular when you're actually restoring.
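
On the custom resource side, the schedule and point-in-time recovery toggles map to a backup stanza on the same DatabaseCluster. A hedged sketch of patching it in, with field names assumed from the v1alpha1 CRD and "daily-backup" and "my-s3" as hypothetical names:

```python
# Sketch: add a daily backup schedule and PITR to the DatabaseCluster created
# earlier. Field names (schedules, pitr, backupStorageName) are assumptions
# from the everest-operator CRD; adjust to match your installed version.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

backup_patch = {
    "spec": {
        "backup": {
            "schedules": [
                {
                    "name": "daily-backup",
                    "enabled": True,
                    "schedule": "0 0 * * *",       # standard cron: daily at midnight
                    "retentionCopies": 5,
                    "backupStorageName": "my-s3",  # a pre-configured backup storage
                }
            ],
            "pitr": {"enabled": True, "backupStorageName": "my-s3"},
        }
    }
}

api.patch_namespaced_custom_object(
    group="everest.percona.com",
    version="v1alpha1",
    namespace="dev",
    plural="databaseclusters",
    name="dok-demo",
    body=backup_patch,
)
```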

13:54 Then you can also get into some more advanced configurations. Let's say you actually want to expose your database externally to your Kubernetes cluster. You can quickly toggle that here; this will provision a load balancer for you, and you can configure a single IP or a source range from which this database will be accessible, so that the database isn't fully open to everyone; you can just pin a few IPs here. If you want to be more specific, you can also set specific database engine parameters, which will allow any DBA to customize the hell out of the database engine if they need to. And last but not least, we also have the possibility to enable monitoring. As of now, Everest only supports PMM, which is Percona Monitoring and Management, but we'll be adding different monitoring technologies in the future.

14:54 But again, by enabling this toggle and selecting which instance you want to push metrics to, you will immediately get the database metrics flowing to the monitoring instance you configured. And by just pressing Create database, the process has started: your database is being created, and you can see it show up in the list here. While it's initializing, you can go into the details of this database, navigate to the Components tab, and see the status of each component. In this case, this is a MongoDB database and I deployed a single replica set member, so that's the only thing you see, but you can expand it to see which containers are running within this pod. If you had more nodes, they would show up here as well. So, as on any cooking show, we have a precooked database already running here, and I want to show you a full stack application running on this database.

16:00 If I switch tabs, you'll see this application; I actually borrowed it from the MongoDB developer's website. It's a MERN stack application: basically a React frontend running with a Node.js backend and a MongoDB database. It's very simple. You just have a list of employees, and you can come here and create an employee. For example, I can add myself as a backend dev. It's just a standard list; you can add and delete things, nothing too fancy here. But let's take a standard use case, a disaster recovery use case. Let's say we have this guy, this Bob Ross guy, who was doing an internship with us, but we don't want to hire him because his thinking is too abstract; he doesn't really know too much about web design. So I actually want to remove him from the list.

17:03 But by mistake, I come here and delete Chuck Norris. Oh God, our 10x developer, our full stack developer who does all the work for us, is now gone from the company records. What can we do? Quickly, we can jump back to the Everest interface and open the database we had running there, this DoK demo. By going into the backups, I can see that, thankfully, I have two backups that ran just a few hours ago. By selecting here, I can easily restore this database. I can restore directly to this database, or, if I wanted, I could create a new database from this backup, evaluate the data, do some testing, and then switch over. But in this case, I just want to fix the data on this database, so I'm restoring to this database, and here I can pick from the two full backups we have there.

17:59 If I had point-in-time recovery enabled, which I didn't, we would also be able to use point-in-time recovery to be more granular and set a specific date: "I want to go back to this moment, to this transaction," and restore from there. But for now, this full backup will do. If we press Restore, you'll see the database go into the restoring state, and if I open the database, we first see a banner that says, hey, you're restoring the database, so you can't do any editing actions on it. There's a restore happening, and within a few seconds (obviously this depends on the size of your database as well as the database technology; in this MongoDB case it should be relatively fast because the database is very small) you can see that the database is already up and running. So if I switch to my application and refresh it, you see that Chuck Norris is back. We just avoided the big problem of firing our best developer. And this is how easy it can be to manage a database on Everest: it's super easy to restore data and create databases, no matter what the technology is. In this demo I showed you MongoDB, but the process is exactly the same for MySQL or Postgres.
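
Behind the Restore button, a restore resource is created for the operators to act on. A minimal sketch under the same assumptions as the earlier snippets, with a hypothetical backup name:

```python
# Sketch: restore the cluster in place by creating a DatabaseClusterRestore.
# Kind, plural, and field names are assumptions from the everest-operator
# CRDs; "dok-demo-backup-1" is a hypothetical backup name.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

restore = {
    "apiVersion": "everest.percona.com/v1alpha1",
    "kind": "DatabaseClusterRestore",
    "metadata": {"name": "dok-demo-restore", "namespace": "dev"},
    "spec": {
        "dbClusterName": "dok-demo",  # restore to the same cluster, as in the demo
        "dataSource": {"dbClusterBackupName": "dok-demo-backup-1"},
        # With PITR enabled, a timestamp could be targeted instead, e.g.:
        # "dataSource": {"pitr": {"type": "date", "date": "2024-07-15T12:00:00Z"}},
    },
}

api.create_namespaced_custom_object(
    group="everest.percona.com",
    version="v1alpha1",
    namespace="dev",
    plural="databaseclusterrestores",
    body=restore,
)
```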

Speaker 2: 19:31 There is a question from Ben in the chat. Ben is asking: so Everest is a frontend to different operators running on Kubernetes? Yes, you can think of Everest as a frontend or abstraction layer. I also wanted to mention that everything you've seen here in the frontend is also available through a REST API; you can perform every single action in the same way through the REST API. Everest currently supports the three database technologies covered by Percona operators, but in the near future we want to add more, such as ClickHouse. And we eventually would like to make Everest a platform where you can connect any type of operator running in Kubernetes, at least any database operator.

Speaker 3: 20:21 Yeah, thank you, Peter. Hopefully the architecture of Everest will become completely clear to you with the rest of the presentation. So let me go a bit deeper into this. I don't want to spend too much time here, because this is the DoK community, so you most likely know how a database is run on Kubernetes. But just as a refresher: it's a common misconception that databases should not be run on Kubernetes. This misconception comes from the fact that core Kubernetes does not have primitives that are ready to go for databases. We do have StatefulSets, which give us very important things like ordering when spinning pods up and down, and which ensure that the PVC gets attached to the same pod; both are prerequisites for running databases, but they are not enough. As you know, a database is a lot more than just deploying something and having it ready to go. You need failover, you need replication, you need to be able to perform backups and restores. All of this comprises a database system, and that's exactly why the community built operators that leverage these primitives and build on top of them. I'm using a generic MongoDB operator here as an example, but you can replace it with whatever operator you want.
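
To make the "necessary but not sufficient" point concrete, here is a minimal sketch of the StatefulSet primitive in question: it gives stable pod identity and storage that follows each replica, and nothing more; the failover, replication wiring, and backups all come from the operator layered on top. Image, names, and sizes are illustrative.

```python
# A minimal StatefulSet: stable pod identity (mongo-0, mongo-1, ...) and a
# PVC that re-attaches to the same pod across restarts. No failover, no
# replica set configuration, no backups: that is the operator's job.
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

statefulset = {
    "apiVersion": "apps/v1",
    "kind": "StatefulSet",
    "metadata": {"name": "mongo", "namespace": "dev"},
    "spec": {
        "serviceName": "mongo",  # headless Service gives each pod stable DNS
        "replicas": 3,
        "selector": {"matchLabels": {"app": "mongo"}},
        "template": {
            "metadata": {"labels": {"app": "mongo"}},
            "spec": {
                "containers": [
                    {
                        "name": "mongod",
                        "image": "mongo:7",
                        "volumeMounts": [{"name": "data", "mountPath": "/data/db"}],
                    }
                ]
            },
        },
        # Each replica gets its own PVC, bound to the same pod identity,
        # the ordering/attachment property the talk calls a prerequisite.
        "volumeClaimTemplates": [
            {
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": "10Gi"}},
                },
            }
        ],
    },
}

apps.create_namespaced_stateful_set(namespace="dev", body=statefulset)
```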

22:08 And an operator is nothing more than a process that runs, watches resources, and reconciles them based on the specification you provide. In this case, there are three relevant custom resource definitions that the MongoDB operator reconciles into the Kubernetes primitives that make up a database. Starting with the MongoDB CRD: this is where you specify how many nodes you have, what kind of architecture and topology, and so on, which greatly simplifies the deployment process of a database. By managing a single manifest, you can easily tweak the database, and the operator will make sure that whatever you specify in this small number of fields gets reconciled onto the Kubernetes primitives that make it happen. It will also take care of ensuring the failovers I mentioned, and so on. The same can be said about the backup and restore CRDs.

23:19 If you want to trigger an on-demand backup, you can just create this single resource, which the MongoDB operator will then reconcile, making some magic happen under the hood that triggers the database backup. The same goes for restores. What we found at Percona is that while this is very good and very useful for many folks, it's not enough. It starts to become a bit cumbersome, and it still requires a lot of technical expertise, Kubernetes expertise, to manage. It obviously simplified the deployment process, but you still need to be a Kubernetes expert. And like Peter was mentioning, we think it's very important that the people managing these clusters, the Kubernetes admins, can shift the responsibility left to developers, so they can deploy databases and do some testing without always needing the ops folks to help them.

24:39 So we went a step further and created our Everest operator, which is more opinionated. It has fewer tweakable fields for the database, but it's also easier to understand and easier to manage, and it allows people to deploy databases faster, and agnostically: it's a common interface that works for MongoDB, for MySQL, and for Postgres. People can learn these simple CRDs, this simple way of specifying resources, and then the Everest operator does the hard work of translating them to the specific database engine operators. This also makes it extensible: if you want to add a new operator, the API stays the same. You don't need to learn a new API; you just need to extend the Everest operator to add support for the new operator.

25:45 And to wrap this up, as Peter was saying, we also created a RESTful API that allows non-Kubernetes experts to interface with all of this. If you want to write a script in whatever language that makes API requests to provision 100 databases at a time, you can easily do that with the Everest API; or if you'd rather just use the UI, you're free to do so. So, to summarize what I showed you in the demo: whenever the user presses the Create database button, the UI makes an HTTP request to the Everest API server to create a database cluster. The Everest API server generates a DatabaseCluster CR, which gets interpreted by the Everest operator, which translates it into a MongoDB CR, which the MongoDB operator then reconciles onto the Kubernetes resources needed to get that database running. In general, as an Everest user, you can easily use the UI or API to interface with all of this without much added complexity. I'm sure there will be further questions; Peter, are you seeing anything there?
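
As a rough illustration of that first hop, here is what a script driving the Everest REST API might look like. The base URL, token handling, and endpoint path are assumptions for illustration only; consult the Everest API reference for the actual contract. The payload shape mirrors the DatabaseCluster resource shown earlier.

```python
# Hedged sketch: create a database through the Everest API server instead of
# the UI. EVEREST_URL, API_TOKEN, and the endpoint path are hypothetical.
import requests

EVEREST_URL = "http://localhost:8080"  # e.g. a port-forward to the Everest API server
API_TOKEN = "REPLACE_ME"               # hypothetical: a token from your Everest install

database_cluster = {
    "apiVersion": "everest.percona.com/v1alpha1",
    "kind": "DatabaseCluster",
    "metadata": {"name": "dok-demo-2", "namespace": "dev"},
    "spec": {
        "engine": {
            "type": "psmdb",
            "replicas": 1,
            "storage": {"size": "25G"},
        }
    },
}

resp = requests.post(
    f"{EVEREST_URL}/v1/namespaces/dev/database-clusters",  # hypothetical path
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json=database_cluster,
    timeout=30,
)
resp.raise_for_status()
print("created:", resp.json()["metadata"]["name"])
```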

Speaker 2: 27:12 Yes, there are actually a couple of questions, so take a short break, Diogo. There's a question from Anthony: can you please go into more detail about where the sweet spot is for running a DBaaS on a hyperscaler versus on-prem? And unfortunately, the answer is: it depends. There is no single sweet spot. If you are not restricted by compliance rules, don't need the additional flexibility, and can accept or work around things like noisy-neighbor situations in your application, then I think pricing is the most common thing people look at, right? Because at some point the hyperscalers become very expensive, but that's usually at large scale, and I cannot put a real number on it because, unfortunately, it depends. So I'm sorry I cannot answer directly. There's one more question, for you, Diogo: will Everest read the list of namespaces in the cluster where it is deployed?

Speaker 3: 28:22 So, I didn't show you the installation process of Everest. We will add Helm support, installation through Helm charts, in the future, but right now the way you install Everest is by downloading our CLI tool and running a single command: you run everestctl install, and as part of that install command you get asked which namespaces you want Everest to manage, and also which database technologies you want to deploy in each of those namespaces. Let's say you don't care about Postgres at all; then you don't need to deploy the Postgres operator in those namespaces. You can say, okay, I just want the MySQL operator, nothing else. And this is how you manage the namespaces, and the operators in those namespaces.

Speaker 2: 29:22 Alright, thank you, Diogo. There's one more question: where is the DB backup stored? Can we store backups in AWS S3 and restore whenever we need? Yes, exactly; this is how it works. You define as many S3-compatible storage locations as you want; this could be AWS S3 or other S3-compatible storage. Then, when you're provisioning a cluster or creating a new backup schedule, you can say, hey, I want this schedule to always be stored in this S3, or this on-demand backup to be stored in this specific S3. And when you're restoring, it's the same: you can pick any of the backups for your database and recreate from it.

Speaker 3: 30:06 A little bit more color on that. We have this screen to configure the backup storages, which looks like this. When you're adding a backup storage, you need to give the storage a name, select in which namespaces you want this storage to be available, and then enter your S3-compatible API details: the bucket name, the region, the endpoint, and the access and secret keys. And for further compatibility, if you're running, for example, a self-hosted S3-compatible API and you don't have a valid certificate because you're using a self-signed one, you can uncheck the Verify TLS certificate box, which skips certificate verification so it works with self-signed certificates. Or, if you're running an S3 API that needs path-style URL access, you can force that here too. It should work with any S3-compatible API.

31:03 In the future, we will also be adding other types of storage, such as native access to Google Cloud Storage or Azure Blob Storage, but for now only S3-compatible APIs are supported. But yeah, like Peter was saying, after you have these backup storages set up, when you try to run a backup on a database, either an on-demand one or a scheduled one, you'll be prompted to select one of those storages. If you have multiple, you can open this dropdown menu and select which storage you want the backups stored in. In this case, there's just the one: that S3 bucket I had prepared.
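
For completeness, a hedged sketch of what such a backup storage definition can look like as a custom resource. Kind and field names (endpointURL, verifyTLS, forcePathStyle, credentialsSecretName) are assumptions based on the everest-operator CRDs; the MinIO endpoint and secret name are hypothetical.

```python
# Sketch: declare an S3-compatible backup storage. Field names are assumptions
# from the everest-operator CRDs; verify against your installed version.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

backup_storage = {
    "apiVersion": "everest.percona.com/v1alpha1",
    "kind": "BackupStorage",
    "metadata": {"name": "my-s3", "namespace": "dev"},
    "spec": {
        "type": "s3",
        "bucket": "dok-demo-backups",
        "region": "us-east-1",
        "endpointURL": "https://minio.internal.example:9000",  # any S3-compatible API
        "credentialsSecretName": "my-s3-credentials",  # Secret holding access/secret keys
        "verifyTLS": False,      # tolerate self-signed certificates, as in the UI checkbox
        "forcePathStyle": True,  # for S3 APIs that require path-style URLs
    },
}

api.create_namespaced_custom_object(
    group="everest.percona.com",
    version="v1alpha1",
    namespace="dev",
    plural="backupstorages",
    body=backup_storage,
)
```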

Speaker 2: 31:45 Right. There is one follow-up question: when you say namespace, is that a namespace inside the Kubernetes cluster? Yes, that is correct; all the namespaces are inside the Kubernetes cluster. Sometime later this year, we are planning to also add support for multiple Kubernetes clusters, but that's later in the year. And there's one more question: what operators are actually running the databases for MySQL, MongoDB, and Postgres? Can you install Everest into a Kubernetes cluster that already has a Postgres operator running? So, underneath we are running the Percona Operator for MySQL, the Percona Operator for MongoDB, and the one for PostgreSQL. And it's not a super straightforward migration if you're already running an operator and want to migrate those databases into Everest. I also wanted to add that, again, we want Everest to be able to work with multiple different operators; it's not like we're limiting it. And maybe something very important to mention: Everest is completely open source, like all products from Percona. But right now, as of today, we're supporting three technologies with Percona operators.

Speaker 3: 33:01 And following up on that: we support those operators, and the way we deploy them is by using a framework from Red Hat called OLM, the Operator Lifecycle Manager. This allows us to manage these operators not only by installing them but also by upgrading them whenever there's a new version. Whenever there's a new version, it will pop up in your Everest UI, saying, hey, there's a new version of the Postgres operator, and we will run some validations to make sure that all your databases are compatible with the new version of the operator. Then you can do this with a rollout strategy: you first do the development namespace, upgrade the Postgres operator there, check that the databases are okay, and then move on to the next namespace. So it gives you control over when you upgrade, doing it in a phased way for each namespace.

Speaker 2: 34:15 Right. There's one more question: is there an ETA for Helm charts to be released for Percona Everest? Yes, there is an ETA. We're actually about a week away from our next release, which is going to happen at the end of July, and the release after that, at the end of August, will include Helm charts to deploy Everest.

Speaker 3: 34:45 All right. And with this, Peter, I think that we can move on to you again.

Speaker 2: 34:51 Great. This is where I come in with my huge request. Like I mentioned, Percona Everest is an open source product. We are still a very young product; we just went GA a month ago, and we are definitely looking for feedback from amazing communities like this one. You can help us shape this product into a better solution that hopefully can benefit all of us, and we need a lot of feedback. So please give it a spin, try it out, and let us know what you think through our forum, through our GitHub, whatever channel you want. I'm also available on the DoK Slack channel if you want to chat with me, and if you want to have a closer session to discuss some of the possibilities, we are open to it. Alright,

Speaker 1: 35:49 Awesome. Thank you, Peter and Diogo. If anybody has more questions, we've already had a bunch of really great ones come through, but if you have more, feel free to drop them in the chat. People have already asked you a bunch of questions, and you had pretty good answers. This is great. It seems like, with Everest, you're really lowering the barrier to entry, so it's getting easier to run databases on Kubernetes: first there were operators, and now there's this sort of tool that abstracts them a little but makes it easier to get started. I also like that you said there's an API option as well, right? For those who don't want to use the GUI?

Speaker 2: 36:35 Absolutely. For any automation purposes or things like this. Yeah.

Speaker 1: 36:41 Okay. We have another question from Edith: could you please show the forum where we can find more about Percona Everest?

Speaker 3: 36:57 So I'm not sure... share where we can find more about Everest?

Speaker 1: 37:03 Oh, sorry, go ahead.

Speaker 3: 37:06 So there's... oh, the

Speaker 2: 37:07 Forum.

Speaker 3: 37:08 Oh, the forum. Okay. So let me quickly share here. Oh, sorry, I just opened GitHub by mistake; I should have had links on the page. There's the Percona forum, where in the community forum there's a section completely devoted to Everest. This is an easy way for you to ask questions: if you come to the Percona forum, here is the category for Percona Everest. You can ask questions here, which we monitor very frequently. But you can also just go to the Percona Everest repo on GitHub and open an issue there, which we monitor as well. Either of these is a good way of reaching us, along with what Peter said about Slack.

Speaker 1: 38:15 And if people have more questions that come up, I know Peter is on the DoK Slack too, so you can reach out to him there. Also, if you want to talk about it in the topic database channel, you can do that there. Somebody asked how to join the DoK Slack, and we added a link; if you go to the homepage of the DoK website, there's a link to join the Slack group. So you can definitely ask questions there, and you can talk to Peter directly and ask him all those questions. Well, I think we are done with the questions, it seems, but I think this is really interesting, really cool, and we love that it's open source. So thank you both for presenting. Anything else, any final notes you want to give the community?

Speaker 2: 39:13 Thank you for the opportunity. And again, please, if you want to say thank you to us, star Everest on GitHub and give us feedback; we really appreciate it.

Speaker 1: 39:23 Great. Alright, well thank you so much for joining. Really appreciate your time and thank you everybody else for coming to watch. I hope you found this useful. I think I did. So yeah, we’ll see you next time at our next town hall in August. Alright, thanks a lot. Have a nice day everybody. Bye all.
