Migrating MongoDB to Kubernetes

Kubernetes needs help to create and manage stateful applications like databases. Such applications need a specific network configuration, persistent storage, and dedicated computing capacity. MongoDB is no exception.

This talk gives an overview of Kubernetes operators for MongoDB and explains why operators are needed to run MongoDB on Kubernetes. Sergey also explains how to migrate MongoDB to Kubernetes, ending with a demo.

This talk was given by Percona Product Manager Sergey Pronin as part of DoK Day at KubeCon NA 2021. Watch it below. You can access the other talks here.

0:00  

Bart Farrell: Sergey, welcome, and very nice to have you. Today you’re gonna be telling us about migrating MongoDB to Kubernetes. What’s up, Sergey? How are you doing?

 

0:27  

Sergey Pronin: I’m doing good, thank you for having me today.

My name is Sergey. I work at Percona as a product manager, and my main focus is on Kubernetes, our operators, cloud technologies, and everything related to that. Today, I’m going to briefly take you on the journey of migrating MongoDB to Kubernetes.

First of all, this is the talk plan. We’re going to talk about why someone wants to run MongoDB on Kubernetes. I’ll talk a bit about the operator that we have at Percona and some other options. We’re going to review the migration options, and I’m going to give an overview of how migration through replication can be done. And I’m going to quickly show you a demo, just to showcase that what I’m talking about is real, right? It’s not something that I’m imagining.

So, is Mongo on Kubernetes a thing? The short answer is yes. At Percona, we have lots of customers and lots of community users using our operator to run MongoDB on Kubernetes, and they run it in production for various use cases. MongoDB on Kubernetes is growing day by day, which is interesting to see. If we talk about the whys, the first thing that we usually hear when some big enterprise or company comes to us is that they want to avoid vendor lock-in. They usually talk about Atlas or Ops Manager, and the main driver is usually the cost. Well, it’s not that obvious, but it is.

Another item there is that there are not so many managed services. What I mean by that is, if you look at the MongoDB space, you can see DocumentDB on Amazon, but it is not the real Mongo; it is a Mongo-like database, right? There is DigitalOcean and some other vendors which provide MongoDB. And because of MongoDB’s unfriendly SSPL license, it’s not so easy to provide a managed service. So people start to think: okay, how can I run Mongo, if not on Atlas and not on a managed service? What are the options?

I still would love to have the automation. I still would love to have day-2 operations, which are super important: scaling, taking backups, upgrades, and everything else. And I want to avoid vendor lock-in. I don’t want to get stuck with one cloud. I want to migrate from Amazon to Google and to my private cloud anytime I want. The answer here is operators, and there are a few of them. One is Percona’s; I’m going to talk about it a bit later on. Another one that I really like is KubeDB.

If you are participating in the DoK Days, you should know about KubeDB. It’s a Swiss Army knife operator to run databases on Kubernetes. It can run Postgres, MySQL, and MongoDB. The only thing that might stop you from running KubeDB is that it follows an open core model, so some of the features are not available for free. You cannot take backups, for example, which is obviously a showstopper for production use cases. There is also the MongoDB community operator, but it doesn’t have backups either, so it’s not very suitable for production, I would say.

Another item here on the why is that MongoDB runs well on Kubernetes, and I’m saying this after running Postgres and MySQL on Kubernetes. It is not that hard to run MongoDB. I believe it’s mostly because of its design and its ability to scale easily; it’s more aligned with Kubernetes primitives and the Kubernetes ideas of deploying containers and pods, and with the volatile environment that Kubernetes brings in. Obviously, running Postgres and MySQL is also possible, but I find that running MongoDB on Kubernetes is much, much easier.

So if we talk about the Percona operator briefly: first of all, it’s 100% open source, as is everything at Percona. You have sharding working out of the box, so you can deploy your MongoDB cluster and scale it out or in any way you want. You can add more compute resources, more storage, and everything else, which is awesome. We support backup and restore with point-in-time recovery, so it will reduce your restoration time to the point you would love. We do that by uploading oplogs to remote storage, so it’s more in line with the Kubernetes way: we don’t store backups locally, and you can easily migrate from one Kubernetes cluster to another. We have integration with Percona Monitoring and Management, our tool to monitor and manage databases, and it works out of the box. What I like about that is you deploy the database, and you have all the metrics and alerts right there where you need them.

For customizations, we have custom sidecar containers, which means you can extend your MongoDB cluster with any monitoring tool of your choice, you can deploy benchmarks in a sidecar container, and much more. So customization works nicely out of the box. We have automatic upgrades, and the last one is multi-region deployments. This is one of the features that I’m going to highlight in this talk. It’s a new one, which we delivered just recently, and it is focused mostly on solving two problems. The first is disaster recovery. If you have two Kubernetes clusters, and you want one MongoDB cluster in the US and another MongoDB cluster in Europe, for example, you can set up replication with a single ReplicaSet across the two clusters. The other use case for multi-region deployment is the migration capability that it also provides, so you can migrate into Kubernetes and out of Kubernetes easily. I’m going to tell you how it can be done.

So the migration options that we have are not only for MongoDB, but for almost any database. But we’re going to focus on MongoDB. The first migration option, on the left, is through backup and restore. You take the backup of some ReplicaSet which is running somewhere on-prem, and you put it into some S3 bucket or some other storage. Then you deploy the operator on Kubernetes with a ReplicaSet as well, and you do the restoration to this ReplicaSet. The only downside of such an approach is the huge recovery time. The bigger the database, the longer it takes: if you have multiple terabytes or petabytes of data, restoring it is going to take some time, and you’re going to breach all possible SLAs, which you obviously don’t want. But it is the simplest way possible to migrate a database from point A to point B, including for MongoDB.

Another way, for MongoDB specifically, is to add remote nodes which are running somewhere else to your on-prem ReplicaSet. That’s the picture on the right. You have a ReplicaSet running somewhere on a Linux box or, I don’t know, on Atlas, on Amazon, and you want to move it to Kubernetes. What you do is you deploy the operator, you deploy the ReplicaSet nodes, and you just add these nodes to the on-prem ReplicaSet. You synchronize the data, and then you can switch the application to the MongoDB that you have in Kubernetes. That is what I’m going to highlight today and talk about in more detail.
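To put a rough number on the recovery-time concern with the backup-and-restore option, here is a back-of-the-envelope sketch in Python. The function name `restore_hours` and the 200 MB/s throughput are illustrative assumptions, not measurements from the talk; real restores also spend time on index builds and oplog replay.

```python
def restore_hours(data_size_tb: float, throughput_mb_per_s: float) -> float:
    """Hours needed to move data_size_tb terabytes at a sustained throughput.

    Uses decimal units (1 TB = 1,000,000 MB) and ignores index rebuilds,
    oplog replay, and any other overhead, so treat it as a lower bound.
    """
    size_mb = data_size_tb * 1_000_000
    return size_mb / throughput_mb_per_s / 3600


if __name__ == "__main__":
    # 200 MB/s is a hypothetical sustained restore rate chosen for illustration.
    for size_tb in (1, 10, 100):
        hours = restore_hours(size_tb, 200)
        print(f"{size_tb} TB at 200 MB/s: about {hours:.1f} hours")
```

Even under these optimistic assumptions, the application is unavailable (or writing to a soon-to-be-stale source) for the whole copy, which is why the replication approach is attractive for large data sets.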

Well, I have one more slide and the demo, so I’m going to be quick. This is going to be the setup that we’re going to have, right? On the left, we have an on-prem MongoDB. Again, it can be anywhere: it can be a datacenter, it can be bare metal, it can be cloud, it doesn’t matter. And you have a desire to move it to Kubernetes. Number two, you have a Kubernetes cluster with the operator deployed. But the key here is that the operator is going to deploy these nodes in unmanaged mode, which means they are not going to form a ReplicaSet by themselves, and you can only add these nodes into the ReplicaSet which is running on-prem. I am going to show it in the demo.

This is the key item: these nodes are running unmanaged, so no ReplicaSet is formed on Kubernetes. For now, right? The third thing here is that we’re going to need to expose each and every ReplicaSet pod, or each and every ReplicaSet node, either to the outside world or to some private network where the on-prem Mongo can reach it. The reason is that all nodes in a ReplicaSet should have access to each other, so it’s like a full mesh deployment of nodes. And the reason for that, again, is the need to form a quorum if one node fails and another one becomes the primary, and so on. So exposing the nodes is key here as well. In step number four, we’re going to add these nodes to the on-prem ReplicaSet, and this will just kick-start the synchronization process. Once the nodes on the Kubernetes cluster are synchronized, you can switch the application to use them and then just slowly kill the on-prem MongoDB cluster.
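The quorum and full-mesh requirements mentioned above come down to simple arithmetic. The following Python sketch (function names are hypothetical) shows how many voting members must agree to elect a primary, how many failures a ReplicaSet tolerates, and how many node-to-node connections a full mesh implies.

```python
def majority(voting_members: int) -> int:
    """Smallest number of voting members that forms a strict majority."""
    return voting_members // 2 + 1


def tolerated_failures(voting_members: int) -> int:
    """How many voting members can be lost while a majority still exists."""
    return voting_members - majority(voting_members)


def full_mesh_links(nodes: int) -> int:
    """Number of distinct node pairs that must be able to reach each other."""
    return nodes * (nodes - 1) // 2


if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(
            f"{n} voting members: majority={majority(n)}, "
            f"tolerated failures={tolerated_failures(n)}"
        )
    # One on-prem node plus three Kubernetes nodes, as in the demo setup.
    print(f"4 nodes need {full_mesh_links(4)} reachable node pairs")
```

Note that four voting members tolerate no more failures than three, which is one reason to think about which nodes are allowed to vote during a migration.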

And it is important to note that the same process works the other way around. If you want to move out from Kubernetes to on-prem, for some reason, you can do it through the operator as well. The beauty here is that once you finalize this migration from on-prem to Kubernetes, you have a fully managed cluster on Kubernetes. The operator takes care of all day-2 operations: backups, scaling, and everything. So you can finally forget about the on-prem pain. And when I say pain: when I was preparing for this talk, I was spinning up a MongoDB cluster on DigitalOcean, I think, or on some Linux boxes. It took me a lot of time, because I forgot how hard it is to spin up something manually without the operator. With the operator, things are much, much easier, and you just forget how hard it is to configure everything with your own hands or with some scripts.

So the demo, let’s get down to it. It’s going to be a quick one. Let me share my screen. I think it’s this one. So here, I have a ReplicaSet. It’s just one node; it’s some Linux box running somewhere. So it’s a single-node ReplicaSet. And on the right side, I have a Kubernetes cluster. I’m going to show you the custom resource here. The custom resource is the main manifest to deploy the database through the operator. The operator is just a piece of code which takes the custom resource as an input. It probes the Kubernetes API and checks for a specific custom resource in Kubernetes, and if it sees it, it does something.

In this case, when it sees this custom resource YAML file applied to Kubernetes, it says: okay, I need to provision an unmanaged MongoDB cluster. That means the nodes are not forming a ReplicaSet, no certificates are created, nothing, right? And it says: okay, I need to provision three nodes of a ReplicaSet, but it ignores the ReplicaSet name. It also sees that it needs to expose these nodes through a load balancer. I’m using a load balancer here, but I do not recommend it, because it might be pricey. I also do it through the public internet, which is obviously not recommended. There are other ways to do it, like through some VPN tools, Submariner, and so on. But for the sake of the demo, I’m going to keep it simple.

I already have this cluster running. Here, you can see I have the operator deployed and three pods. They are called ReplicaSet 0, 1, and 2, but in reality it’s not a ReplicaSet; they are standalone pods for now. We’re going to turn them into a ReplicaSet later. And here, I already have the services. You can see I have a load balancer for each node, and each has a public IP. I also created some domain names, just to simplify the deployment and the configuration, so that I don’t need to specify the IP addresses anywhere.
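A custom resource like the one described above can be sketched as a plain Python dictionary and serialized to JSON, which `kubectl apply -f` accepts alongside YAML. Only the `unmanaged` flag, the three-node size, and the load balancer exposure come from the talk. The `apiVersion`, `kind`, and the `expose` field names below are assumptions based on the Percona operator's custom resource and differ between operator versions, and the names `demo` and `rs0` are hypothetical, so check everything against the operator documentation before use.

```python
import json


def unmanaged_cluster_manifest(name: str, replset: str, size: int) -> dict:
    """Build a custom resource that asks for unmanaged, exposed MongoDB nodes.

    Field names other than `unmanaged` are assumptions and may not match
    the operator version you run.
    """
    return {
        "apiVersion": "psmdb.percona.com/v1",  # assumed API group/version
        "kind": "PerconaServerMongoDB",        # assumed resource kind
        "metadata": {"name": name},
        "spec": {
            # The operator provisions the nodes but does not form a ReplicaSet.
            "unmanaged": True,
            "replsets": [
                {
                    "name": replset,
                    "size": size,
                    # One load balancer per node, as in the demo (assumed keys).
                    "expose": {"enabled": True, "exposeType": "LoadBalancer"},
                }
            ],
        },
    }


if __name__ == "__main__":
    manifest = unmanaged_cluster_manifest("demo", "rs0", 3)
    print(json.dumps(manifest, indent=2))
```

Keeping the manifest in code like this makes the later promotion step a one-field change: the same function output with `unmanaged` set to `False`.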

So let’s now add these nodes into the ReplicaSet on-prem. This is on-prem; this is my single-node ReplicaSet, and I want to migrate it to Kubernetes. Now I’m going to add the nodes that I have on Kubernetes into this ReplicaSet. So one, two, and three. Once I do that, what should happen is that the replication to these nodes should start. And as you can see, they are already added into the ReplicaSet, and they are marked as secondary. So here, ID one is my on-prem ReplicaSet node, and if I go further, IDs two, three, and so on are my new nodes on Kubernetes, and they’re already marked as secondary. I don’t have much data on my primary, so the migration is going to be super fast. I think it’s already finalized, but let’s see. Yeah, I think it’s already done. I will wait a bit. So okay, yeah. Let’s see. Okay, yep, migration is finalized. Now I’m going to reconfigure the ReplicaSet. I’m going to make one of the nodes on the Kubernetes side the primary. Obviously, before doing that, you need to ensure that your application knows about these nodes. I’m not going to focus on the application part here; I am going to focus on databases. But be careful and make sure that your application knows where your new primary is, and so on.
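The step of adding the Kubernetes nodes to the on-prem ReplicaSet can be sketched as a pure function over a ReplicaSet configuration document. This is an illustration under assumptions, not the exact commands from the demo: the host names are hypothetical, and the new members are added with `priority` 0 and `votes` 0 so that they only receive data until they are promoted. In the mongo shell, `rs.add()` with a member document achieves the same effect.

```python
import copy


def add_nonvoting_members(config: dict, hosts: list[str]) -> dict:
    """Return a new ReplicaSet config with extra non-voting secondaries.

    The input config is left untouched. The version is bumped, mirroring
    what the shell helpers do when they reconfigure a ReplicaSet.
    """
    new_config = copy.deepcopy(config)
    next_id = max(member["_id"] for member in new_config["members"]) + 1
    for offset, host in enumerate(hosts):
        new_config["members"].append(
            {"_id": next_id + offset, "host": host, "priority": 0, "votes": 0}
        )
    new_config["version"] += 1
    return new_config


if __name__ == "__main__":
    # Hypothetical single-node on-prem ReplicaSet.
    on_prem = {
        "_id": "rs0",
        "version": 1,
        "members": [{"_id": 0, "host": "onprem.example.com:27017"}],
    }
    # Hypothetical domain names pointing at the per-pod load balancers.
    k8s_hosts = [f"mongo-{i}.example.com:27017" for i in range(3)]
    for member in add_nonvoting_members(on_prem, k8s_hosts)["members"]:
        print(member)
```

Adding the members as non-voting keeps the on-prem node in full control of elections while the initial sync is still running.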

And yeah, I’m setting a high priority for this node on Kubernetes, and I’m allowing it to vote. Then I’m reconfiguring the ReplicaSet. What should happen now is that the node on Kubernetes is going to become primary. Right now it’s still the one on-prem, with this IP address, that is primary. But after I execute the command, I can see that my node on Kubernetes is primary now, which means that I’m ready to switch my application there. But before doing that, I’m going to promote my Kubernetes cluster to become managed. It’s really super easy. I’m just going to change this unmanaged flag from true to false, and that’s it. Now I’m going to apply this YAML file to my Kubernetes cluster, and the operator now manages this ReplicaSet completely. So whenever I want to take a backup, or whenever I want to scale this ReplicaSet, I can now do it through Kubernetes. I can now forget about this on-prem node and just remove it from the ReplicaSet, and that’s it. My data is now in Kubernetes, and my MongoDB cluster is now managed by the operator, which is kind of awesome. This is a short talk and a quick demo. I also have a blog post where I describe in detail what you should do to perform such a migration, and what its key benefits are.
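The promotion and cleanup described above can likewise be sketched as transformations of the ReplicaSet configuration document. The function names, host names, and the priority value of 2 are illustrative assumptions; any priority higher than the on-prem node's would serve the same purpose. In practice the resulting document would be submitted with a reconfiguration command such as `rs.reconfig()` in the mongo shell.

```python
import copy


def promote_member(config: dict, host: str, priority: int = 2) -> dict:
    """Return a new config where `host` may vote and is preferred as primary."""
    new_config = copy.deepcopy(config)
    for member in new_config["members"]:
        if member["host"] == host:
            member["votes"] = 1
            member["priority"] = priority
            break
    else:
        raise ValueError(f"no member with host {host!r}")
    new_config["version"] += 1
    return new_config


def remove_member(config: dict, host: str) -> dict:
    """Return a new config without `host`, e.g. the retired on-prem node."""
    new_config = copy.deepcopy(config)
    remaining = [m for m in new_config["members"] if m["host"] != host]
    if len(remaining) == len(new_config["members"]):
        raise ValueError(f"no member with host {host!r}")
    new_config["members"] = remaining
    new_config["version"] += 1
    return new_config


if __name__ == "__main__":
    # Hypothetical state after the Kubernetes node has finished syncing.
    config = {
        "_id": "rs0",
        "version": 2,
        "members": [
            {"_id": 0, "host": "onprem.example.com:27017", "priority": 1, "votes": 1},
            {"_id": 1, "host": "mongo-0.example.com:27017", "priority": 0, "votes": 0},
        ],
    }
    promoted = promote_member(config, "mongo-0.example.com:27017")
    final = remove_member(promoted, "onprem.example.com:27017")
    print(final)
```

Doing the promotion and the removal as two separate reconfigurations matches the order in the demo: the application is switched to the new primary before the on-prem node is dropped.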

But believe me, running MongoDB on Kubernetes is a much easier journey than running it manually. So that’s it for me today. I really appreciate your time. Thank you very much for coming and listening. If you have any questions, please ping me on DoK Slack, through email, or through Twitter. Thank you, everyone.