Data on Kubernetes Day Europe 2024 talks are now available for streaming!


Leveraging Running Stateful Workloads on Kubernetes for the Benefit of Developers

Kubernetes comes with many useful features, such as Volumes and StatefulSets, that make running stateful workloads simple. Combined with the right tools, these same features can make Kubernetes very valuable for developers who want to work with massive production databases during development. This is exactly what happened at Extendi.

The developers at Extendi deal with a large amount of data in their production Kubernetes clusters. When developing locally, however, they had no easy way to replicate this data. They needed it so they could test new features right away against realistic data, without worrying whether those features would work as expected once pushed to production. But replicating a 100 GB+ production database for development was not turning out to be an easy task!

This is where Kubernetes-based remote development environments came to the rescue. Running the data on Kubernetes turned out to be much faster than the traditional approaches, thanks to Kubernetes’ ability to handle stateful workloads well. And since Extendi already used Kubernetes in production, the setup process was fairly simple.

This talk covers the practical steps by which Kubernetes-based development environments let the dev teams at Extendi run production data on Kubernetes during development, using features like Volume Snapshots, with a large positive impact on developer productivity.

Ramiro Berrelleza

Hi everyone, I hope you’re having a great day at DoK Day. Today I’m here to talk to you about how you can leverage stateful workloads on Kubernetes to benefit your developers. My name is Ramiro. I am one of the founders of Okteto, and I am super excited to be here.


Arsh Sharma  00:21

I’m Arsh, and I work as a developer experience engineer at Okteto.


Lapo Elisacci  00:27

Hello everyone, I’m Lapo, and I’m a senior software engineer at Extendi.


Arsh Sharma  00:33

Alright, let’s get this started then. So, Kubernetes. In today’s world, Kubernetes needs no introduction. It is the go-to container orchestration system when it comes to production, and for very good reason: it handles tasks we struggled with before it arrived. Kubernetes allows applications to scale much more easily than was possible before, and as a result we can now reliably reach far more users. What this meant for our data handling systems was that they needed to keep up, because this larger number of users was generating an even larger amount of data.

So that covers Kubernetes and data. But that’s not what this talk is all about. How do developers fit into this? What role do they have? To answer that, I think we need to recognize that after Kubernetes arrived, and dealing with all this data became normal, the way developers develop applications did not change. We changed how we deployed applications: we started containerizing them, and we started writing more and more microservices. But we did not reflect on how we were developing them.

So what is the solution for all of this? The solution is remote dev environments, which leverage Kubernetes underneath. With these remote dev environments, instead of running the microservices you need during development locally, you deploy them to a Kubernetes cluster. Let’s switch to the next slide to make clear what happens once you deploy these microservices. Once they’ve been deployed in the cluster, you replace the container running in the cluster with a development container, and this container inherits all the configuration from the original container.
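To make the container swap concrete, here is a minimal Python sketch that treats a Deployment as a plain dict, as it would look after parsing its YAML. The helper name swap_in_dev_container, the image names, and the idle keep-alive command are hypothetical; the sketch only illustrates "replace the image, inherit everything else", not how Okteto actually performs the swap.

```python
import copy


def swap_in_dev_container(deployment: dict, container_name: str, dev_image: str) -> dict:
    """Return a copy of `deployment` with one container replaced by a dev container.

    Only the image and the start command change; env, ports, volumeMounts and
    every other field are inherited from the original container.
    Hypothetical helper for illustration, not Okteto's implementation.
    """
    dev = copy.deepcopy(deployment)  # never mutate the deployed spec
    for container in dev["spec"]["template"]["spec"]["containers"]:
        if container["name"] == container_name:
            container["image"] = dev_image
            # Assumed keep-alive command: the container idles so synced code
            # can be run inside it by the developer.
            container["command"] = ["tail", "-f", "/dev/null"]
            container.pop("args", None)
            return dev
    raise KeyError(f"container {container_name!r} not found")


api = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "api"},
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "api",
                        "image": "registry.example.com/api:1.4.2",
                        "env": [{"name": "DATABASE_HOST", "value": "postgres"}],
                    }
                ]
            }
        }
    },
}

dev_api = swap_in_dev_container(api, "api", "registry.example.com/api-dev:latest")
# The environment carries over unchanged:
print(dev_api["spec"]["template"]["spec"]["containers"][0]["env"])
# [{'name': 'DATABASE_HOST', 'value': 'postgres'}]
```

Because the copy keeps every field it does not explicitly override, the dev container still sees the same environment variables, mounts, and service names as the deployed one.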
Since the configuration is inherited from the original container, everything continues to work as if it were still running in production, except it’s not. Once this replacement happens, a file synchronization service gets set up, which syncs the code you write on your local machine with the code running inside the dev container. That means any changes you make get transferred to the dev container running in the remote cluster in the cloud. So as soon as you hit save after making a change, you can see it exactly as it would look in the deployed version of your application. This is very cool, and now we’re going to see it in action. I’m going to hand it over to Lapo, who’s going to show how this is leveraged beautifully at Extendi. Take it away, Lapo.
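The core of the file synchronization step, deciding what changed since the last sync, can be sketched with content hashes. This is a toy, one-way change detector written for illustration (fingerprint and diff are hypothetical names); real sync tools, including the one Okteto uses, watch the file system, sync in both directions, and transfer deltas, so treat it only as a picture of the idea.

```python
import hashlib
import tempfile
from pathlib import Path


def fingerprint(root: Path) -> dict[str, str]:
    """Map every file under root (as a relative POSIX path) to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def diff(before: dict[str, str], after: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (paths to upload, paths to delete) to bring the remote copy up to date."""
    upload = sorted(p for p, digest in after.items() if before.get(p) != digest)
    delete = sorted(p for p in before if p not in after)
    return upload, delete


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "app.py").write_text("print('v1')\n")
    (root / "README.md").write_text("docs\n")
    before = fingerprint(root)  # state last synced to the dev container

    (root / "app.py").write_text("print('v2')\n")  # edit and hit save
    (root / "README.md").unlink()                  # delete a file
    (root / "util.py").write_text("X = 1\n")       # add a file
    upload, delete = diff(before, fingerprint(root))

print(upload)  # ['app.py', 'util.py']
print(delete)  # ['README.md']
```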


Lapo Elisacci  03:29

What we are looking at here is an Okteto demo environment, which has a Kubernetes cluster underneath. As you can see, I have a StatefulSet application deployed with a volume called pgdata attached, all deployed inside a namespace called Dolly. What I’m going to show you now is how to create a snapshot of the volume attached to this StatefulSet. You can imagine this to be a Postgres application where you take the data from production and seed the database with it. By creating a snapshot of the attached volume, you can give your developers their own environment, always up and ready with the data inside, without the need to perform seeding operations and all that kind of stuff. It can be done automatically: you create a snapshot of that volume, then apply a configuration and annotation to the deployment saying that the volume has to be created from that snapshot. Here you can see I have this services manifest that declares a Postgres deployment, where I’ve set some environment variables and the volume. Now I’m going to run okteto up, which is the feature Arsh mentioned earlier that lets me connect to the Kubernetes cluster in the cloud. What it does here is create a container that acts as a proxy and forwards the Postgres application I have deployed to my localhost, so that I can connect to it. Now I’m going to open Beekeeper, which is a database client, and select a Postgres connection, to show you that I can connect to the Postgres instance that belongs to the Kubernetes cluster in the cloud directly from my localhost.
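For reference, a Postgres setup like the one in this demo could be expressed roughly as below. This is a sketch, not the manifest shown on screen: it assumes the official postgres image (tag 14 chosen arbitrarily), a standalone PVC named pgdata mounted by a single-replica StatefulSet, the password demo used later in the demo, and a 10Gi size. It emits a v1 List as JSON, which kubectl apply -f accepts just like YAML.

```python
import json


def postgres_manifests(namespace: str, password: str, storage: str = "10Gi") -> dict:
    """PVC 'pgdata' plus a one-replica Postgres StatefulSet that mounts it."""
    labels = {"app": "postgres"}
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": "pgdata", "namespace": namespace},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": storage}},  # assumed size
        },
    }
    statefulset = {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": "postgres", "namespace": namespace},
        "spec": {
            "serviceName": "postgres",  # headless Service assumed to exist
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "postgres",
                        "image": "postgres:14",  # assumed tag
                        "ports": [{"containerPort": 5432}],
                        "env": [
                            {"name": "POSTGRES_USER", "value": "postgres"},
                            # Demo only: use a Secret for real credentials.
                            {"name": "POSTGRES_PASSWORD", "value": password},
                            {"name": "POSTGRES_DB", "value": "postgres"},
                            # Subdirectory, so initdb is not confused by lost+found.
                            {"name": "PGDATA", "value": "/var/lib/postgresql/data/pgdata"},
                        ],
                        "volumeMounts": [
                            {"name": "pgdata", "mountPath": "/var/lib/postgresql/data"}
                        ],
                    }],
                    "volumes": [
                        {"name": "pgdata", "persistentVolumeClaim": {"claimName": "pgdata"}}
                    ],
                },
            },
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [pvc, statefulset]}


print(json.dumps(postgres_manifests("dolly", "demo"), indent=2))
```

Using a standalone PVC rather than volumeClaimTemplates keeps the claim name exactly pgdata, which is the name the snapshot refers to later.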
Here I’m configuring a connection to localhost, and the user is going to be postgres. The password is going to be the one I have configured here, which is demo, and the default database is going to be postgres. I’m going to test my connection, and the connection looks good, so I’m going to connect. As you can see, I already have a table here with some data in it; it’s just a small table with some users in it. The goal now is to create a snapshot of the volume that contains this data, then create the same application in another namespace and find out that the data has been restored from the snapshot I created. So here we go, I’m going to create the snapshot. I’m going to use kubectl -n dolly, Dolly being the namespace I’ve selected here, where my Postgres belongs. First of all, I’m going to check whether any volume snapshots exist there, and there are none, so I’m going to create one. The snapshot is defined by this manifest here, which I called postgres-snapshot. What it describes is that the snapshot’s name is going to be postgres, and it’s going to be taken from the persistent volume claim pgdata, which is the volume attached to the Postgres StatefulSet. So I’m going to run kubectl -n dolly apply -f postgres-snapshot.yaml, and as you can see, the snapshot is created. I can also run kubectl -n dolly describe volumesnapshot, and as you can see here, I have my snapshot. So what’s next? Now I can stop my connection to the Dolly namespace and its Postgres instance, and switch to another namespace, which is going to be lovely. It’s a different namespace, and as you can see here, it is actually empty.
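The snapshot manifest described here can be sketched as follows. It uses the standard snapshot.storage.k8s.io/v1 VolumeSnapshot kind with the names from the demo (snapshot postgres, PVC pgdata, namespace dolly); the helper name is hypothetical, and the cluster is assumed to have a CSI driver and the snapshot CRDs and controller installed. Leaving volumeSnapshotClassName unset makes Kubernetes use the cluster’s default VolumeSnapshotClass, if one is defined.

```python
import json
from typing import Optional


def volume_snapshot(name: str, namespace: str, pvc: str,
                    snapshot_class: Optional[str] = None) -> dict:
    """Build a VolumeSnapshot of an existing PVC."""
    spec: dict = {"source": {"persistentVolumeClaimName": pvc}}
    # Only set the class when given; otherwise the default class applies.
    if snapshot_class is not None:
        spec["volumeSnapshotClassName"] = snapshot_class
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


if __name__ == "__main__":
    print(json.dumps(volume_snapshot("postgres", "dolly", "pgdata"), indent=2))
```

Redirecting the output to postgres-snapshot.json and running kubectl -n dolly apply -f postgres-snapshot.json mirrors the step in the demo; kubectl -n dolly describe volumesnapshot postgres then shows when status.readyToUse becomes true.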
Now I’m going to deploy Postgres with okteto deploy into the empty namespace. It’s going to do some magic, but essentially, because I’ve set up these annotations on my volume, what I expect to happen is that the volume gets created from the snapshot I created.
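The annotated volume could look roughly like the first PVC below. The annotation keys are an assumption based on Okteto’s volume snapshot documentation as we recall it, so verify them for the Okteto version you run. For comparison, the second PVC shows the plain Kubernetes way to restore from a snapshot via spec.dataSource, which (without the alpha cross-namespace feature) only works when the VolumeSnapshot lives in the same namespace as the PVC. That limitation is why a cross-namespace mechanism is needed for the lovely namespace. Helper names and the 10Gi size are hypothetical.

```python
import json

# ASSUMPTION: annotation keys as recalled from Okteto's docs; verify before use.
FROM_SNAPSHOT_NAME = "dev.okteto.com/from-snapshot-name"
FROM_SNAPSHOT_NAMESPACE = "dev.okteto.com/from-snapshot-namespace"


def okteto_pvc_from_snapshot(name: str, snapshot: str, snapshot_namespace: str,
                             storage: str = "10Gi") -> dict:
    """PVC annotated so Okteto restores it from a snapshot in another namespace."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        # No namespace here, so it lands in whichever namespace it is applied to.
        "metadata": {
            "name": name,
            "annotations": {
                FROM_SNAPSHOT_NAME: snapshot,
                FROM_SNAPSHOT_NAMESPACE: snapshot_namespace,
            },
        },
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            # Must be at least the snapshot's restore size.
            "resources": {"requests": {"storage": storage}},
        },
    }


def plain_pvc_from_snapshot(name: str, snapshot: str, storage: str = "10Gi") -> dict:
    """Standard Kubernetes restore: the VolumeSnapshot must be in the PVC's namespace."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": storage}},
            "dataSource": {
                "name": snapshot,
                "kind": "VolumeSnapshot",
                "apiGroup": "snapshot.storage.k8s.io",
            },
        },
    }


print(json.dumps(okteto_pvc_from_snapshot("pgdata", "postgres", "dolly"), indent=2))
```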


Ramiro Berrelleza  09:06

The deploy phase.


Lapo Elisacci  09:09

No, I did it. It’s just that the proxy is now getting created, which is completely fine. Perfect. As you can see, Postgres is now getting created, and in a minute, once it’s ready, we’ll do the same operation I did earlier: I’m going to run okteto up, connect to Postgres in the new namespace, and find out that the data has already been filled in. So I’m going to run okteto up and get rid of this shell here. There you go. I’m going to open Beekeeper and close this connection, because otherwise you might think this is the old one and that I’m cheating, and create a new Postgres connection to localhost. The user is going to be postgres. It’s the same connection I set up earlier, because the environment variables and all those kinds of things are the same, but this Postgres belongs to a different namespace. The password is going to be demo, and the default database is going to be postgres. I’m going to test the connection, and the connection looks good. As you can see, the users table already exists in the new namespace and has data in it. So imagine a flow like this: you grab your data, you dump your production database, you fill a namespace volume with the data you need, and then you create a snapshot. If you have set up the cluster and the manifest correctly, your whole team will be able to create a Postgres instance inside their own namespace with the data already filled in, which is pretty cool.
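The flow summarized here (dump production once, seed one volume, snapshot it, let each developer deploy) can be written down as an ordered command plan. The sketch below only builds the commands and never runs them. pg_dump and pg_restore flags are standard PostgreSQL client options; the okteto deploy --namespace flag, all URLs, file names, and developer namespaces are assumptions for illustration. Credentials are expected to come from PGPASSWORD or ~/.pgpass rather than the command line, and a sanitization step could sit between dump and restore.

```python
import shlex


def seeding_plan(prod_url, seed_url, seed_namespace, snapshot_file, dev_namespaces):
    """Return, in order, the commands for the seed-once, restore-everywhere flow.

    The commands are only built here, never executed.
    """
    dump_file = "prod.dump"  # hypothetical local file name
    plan = [
        # 1. Dump production in pg_dump's custom format (readable by pg_restore).
        ["pg_dump", "--format=custom", f"--file={dump_file}", f"--dbname={prod_url}"],
        # 2. Restore into the seed Postgres (e.g. reached through a port-forward).
        ["pg_restore", "--no-owner", f"--dbname={seed_url}", dump_file],
        # 3. Snapshot the seeded volume.
        ["kubectl", "-n", seed_namespace, "apply", "-f", snapshot_file],
    ]
    # 4. Each developer deploys into their own namespace; the annotated PVC is
    #    restored from the snapshot. ASSUMPTION: okteto deploy's --namespace flag.
    plan += [["okteto", "deploy", "--namespace", ns] for ns in dev_namespaces]
    return plan


for cmd in seeding_plan(
    prod_url="postgresql://readonly@prod-db.example.com/app",
    seed_url="postgresql://postgres@localhost:5432/postgres",
    seed_namespace="dolly",
    snapshot_file="postgres-snapshot.yaml",
    dev_namespaces=["alice", "bob"],
):
    print(shlex.join(cmd))
```

The expensive steps (dump and restore) happen once; every additional developer only adds one cheap deploy that restores from the snapshot.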


Ramiro Berrelleza  11:29

Just to summarize what you saw: with Okteto remote dev environments and volume snapshots, you can now give developers dev environments that not only look like production but also have data. As we all know, data is a fundamental part of modern applications. It’s not just code; it’s also data and what you can do with it. Okteto makes that so much easier for developers.


To give you a quick look at what’s happening behind the scenes: a snapshot is created in one namespace, and it gets copied to the other namespace on deployment. Developers now have access to data. This could be a full production dump, something sanitized, or even different variations, like different schemas or migrations. There are no limits: once data is available as part of your toolbox for developers, there are a lot of things that can be done.
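One standard Kubernetes way to get a snapshot from one namespace into another is to create a pre-provisioned VolumeSnapshotContent (cluster-scoped) that points at the same storage-level snapshot, plus a VolumeSnapshot in the target namespace bound to it. Okteto may do this differently; the sketch only shows the generic mechanism. The handle and CSI driver values are hypothetical examples. In a real cluster the handle is read from the source snapshot’s VolumeSnapshotContent (its status.snapshotHandle field, with the content’s name found in the VolumeSnapshot’s status.boundVolumeSnapshotContentName).

```python
import json


def import_snapshot(target_namespace, name, snapshot_handle, driver):
    """Expose an existing storage-level snapshot in another namespace.

    Returns a cluster-scoped VolumeSnapshotContent that points at the backing
    snapshot, plus a VolumeSnapshot in the target namespace bound to it.
    """
    content_name = f"{target_namespace}-{name}-content"
    content = {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshotContent",
        "metadata": {"name": content_name},  # cluster-scoped, so no namespace
        "spec": {
            # Retain: deleting this copy must not delete the shared backing snapshot.
            "deletionPolicy": "Retain",
            "driver": driver,
            "source": {"snapshotHandle": snapshot_handle},
            "volumeSnapshotRef": {"name": name, "namespace": target_namespace},
        },
    }
    snapshot = {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": name, "namespace": target_namespace},
        "spec": {"source": {"volumeSnapshotContentName": content_name}},
    }
    return content, snapshot


content, snapshot = import_snapshot(
    target_namespace="lovely",
    name="postgres",
    snapshot_handle="snap-0123456789abcdef0",  # hypothetical handle
    driver="ebs.csi.aws.com",  # example CSI driver; use your cluster's
)
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [content, snapshot]}, indent=2))
```

Once the imported VolumeSnapshot is ready, a PVC in the target namespace can restore from it through spec.dataSource like any same-namespace snapshot.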


What I want everybody to get out of this talk is that data needs to be part of your dev environment. Remote dev environments let you move your development onto Kubernetes and get a realistic flow as you write code, and adding volume snapshots and data makes it even more realistic, because you can also validate that your data integrates well with your code changes. At the end of the day, applications without data are not useful to anybody.


So I hope you found this useful. Remember, it’s all about bridging the gap: making sure developers have all the tools they need to write code and ship value. Data, security policies, configurations, Kubernetes: it’s all part of the same package. So as you go home and rethink your developer experience, please take this into consideration. Thank you so much, and I hope you enjoy the conference. Arsh, Lapo, thank you so much for joining us today and showing this really cool demo. And for those of you who are in Valencia, we’ll see you at the party. Enjoy the rest of the conference.


Lapo Elisacci  13:53

Thank you. Bye-bye!