Lightning Talk: My Database Runs on Kubernetes. What’s Next? Data Platforms!

Jul 03, 2024 by Paul Au

There’s not much doubt that databases now run well on Kubernetes: operators have matured, storage management works, and there are lots of success stories. What do you do now? Build your own data platform to replace expensive, proprietary cloud services! Argo CD and Flux make it possible to combine databases, data ingest, visualization, integration, and operations into a single platform that deploys from GitHub or GitLab. Our talk reviewed open source projects for data platform configuration as well as standard design patterns for applying them in real systems.

Speakers:

Robert Hodges – CEO, Altinity

Watch the Replay

Read the Transcript

Speaker 1: 00:00 I’m just going to kick this off myself. My name’s Robert Hodges. I am a DOK Ambassador. I’m going to be doing a Lightning Talk called My Database Runs on Kubernetes. What’s Next? Well, the answer is data platforms. This was originally supposed to be about a 30 minute talk. It’s down to five. So I’m just going to talk about one little teeny tiny aspect of building data platforms, but I hope you’ll find it an interesting one. So intros, keep this under 10 seconds. I’ve been writing code for 52 years. I’ve been on Kubernetes since 2018. My company does a lot of work with Kubernetes. We are service providers for ClickHouse, run a cloud on it. We wrote the ClickHouse operator and are very, very invested in Kubernetes as a way of running analytic apps. Let’s talk about platforms. So data platforms. They are basically custom stacks that solve specific business problems.

01:02 So a few years ago, people used to talk about the modern data stack. The idea was, hey, we’ll just take a bunch of SaaS services and we’re going to weave them together, and we’re going to create data pipelines and storage and ingest, event processing, all this stuff, but it’s all going to be based on SaaS services. Well, fast forward eight years. Kubernetes now makes it possible to just put that all in a single Kubernetes cluster, thanks to things like operators, thanks to wide availability of open source components. And so you get these stacks, which are increasingly run in the cloud, of course. So you’ve got the basic cloud infrastructure, sort of undifferentiated compute, storage and networking, and then an entire stack on top of it running in Kubernetes. And as you get to the top layers of the stack, it’s increasingly customized to specific business problems. So what are the issues in building this?

02:02 Well, there are many, but I think there’s two that really stand out that become very interesting when you see a lot of people do this. One is there’s a huge array of tooling that’s now available to do this. And just to take a few examples, you can use Kubernetes to set up, excuse me, you can use Terraform to set up Kubernetes and cloud resources. But on the other hand, if you’re deploying services within Kubernetes itself, you might use Argo CD. There are many other tools, as this picture shows, and they overlap. So that’s issue number one. Issue number two is a little less obvious. There’s different types of knowledge required to build this stack. In particular, there’s knowledge about clouds, which is specific to things like, hey, do you understand how IAM works in Amazon or in Google, all the way to building applications: do you understand how developer tools work to build pipelines, build containers, get them deployed, and bring the application up?

02:58 So these are very, very different types of knowledge and they tend not to reside in the same person. So what happens when you have technology that has this kind of complexity and this sort of tooling associated with it? Well, if we look at history, it actually has an interesting effect on social structure. This is a Norman knight, circa 1066, happily invading England. What he actually is also is a complex, expensive weapons system. And those two properties led to changes in medieval society around things like land ownership, feudalism, so on and so forth, some of which are even visible today. We get the same sort of thing in Kubernetes. And the key thing that really stands out as you look across platforms that people are building is you tend to have a platform function, that is to say the folks who understand and run the underlying platform, the Kubernetes and the underlying cloud resources, and the apps team.

04:10 So those are the folks that actually build the app that rides on top of this and runs in Kubernetes. And this is surprisingly durable. I see this even in startups with two people. So Joe takes care of the cloud stuff. Susan writes the app because it’s just more efficient to split this up. So the question is, how can you make this work best and not have these teams become bottlenecks for each other or do things inefficiently because they’re repeating work? And one of the answers to this that we have found in our work is to pull management, particularly of resources, as much into Kubernetes as possible. Example: don’t use external load balancers, just have a load balancing service inside. Keep it inside Kubernetes. Another example: pull the management of cloud resources into Kubernetes itself so people can do cloud operations without leaving Kubernetes.

05:09 How does that work? Let me give you an example. We are really big fans of EBS storage, and gp3 storage has the ability to dial bandwidth up and down at will. The problem is you have to step outside of Kubernetes to do it. So what we did was develop a controller that can read labels on your volume claims and automatically do the adjustments to EBS for you without you ever leaving Kubernetes. So this is an example of a shim, which allows your application team to do platform management things without having to be cloud experts or necessarily even to know how it’s done.

05:56 So the point I wanted to make in this talk is that when you’re trying to build these stacks, you have to allow for the fact that you have the apps team or the apps function. They have a certain set of tools that they like to work with, they have a certain amount of domain knowledge. You have the platform team that understands clouds, has a certain set of tools that they prefer. By building these shims and sort of mechanisms like what I showed, you can enable them to work productively together. So platforms are the next big thing for databases on Kubernetes. This is how we take the databases, which now run well, and make them solve real problems at scale. So what I’d like to invite you to do is, if you’re interested in this topic, come contact me. Let’s talk about it. This is something we want to focus on in the Data on Kubernetes group. I think also for many of us as vendors or as users, this is a problem we want to solve together. Thank you very much. You can get hold of me on LinkedIn or on the DOK Slack or right here. I’m around all week. So thank you very much.
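The shim described in the talk (a controller that reads labels on volume claims and adjusts EBS for you) can be illustrated with a minimal Python sketch of just the decision step. It is not the speaker's controller: the label keys, the `VolumeState` type, and the `plan_modification` helper are hypothetical names invented here, and the gp3 limits are assumed values that should be checked against current AWS documentation. The sketch only computes what would need to change; it does not talk to Kubernetes or AWS.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical label keys: the talk does not say which labels the real controller reads.
THROUGHPUT_LABEL = "example.com/ebs-throughput-mibps"
IOPS_LABEL = "example.com/ebs-iops"

# Assumed gp3 limits (min, max); verify against AWS documentation before relying on them.
GP3_THROUGHPUT_RANGE = (125, 1000)  # MiB/s
GP3_IOPS_RANGE = (3000, 16000)


@dataclass(frozen=True)
class VolumeState:
    """Current settings of the EBS volume that backs a volume claim."""
    volume_id: str
    throughput: int
    iops: int


def _clamp(value: int, bounds: tuple) -> int:
    low, high = bounds
    return max(low, min(high, value))


def _read_int(labels: Mapping[str, str], key: str) -> Optional[int]:
    """Parse an integer label value; a missing or malformed label is ignored."""
    raw = labels.get(key)
    if raw is None:
        return None
    try:
        return int(raw)
    except ValueError:
        return None


def plan_modification(pvc_labels: Mapping[str, str],
                      current: VolumeState) -> Optional[dict]:
    """Return the settings that must change, or None if the volume already matches.

    Simplification: a real controller would also have to respect other
    provider constraints (for example limits tied to volume size).
    """
    changes = {}
    wanted_throughput = _read_int(pvc_labels, THROUGHPUT_LABEL)
    if wanted_throughput is not None:
        throughput = _clamp(wanted_throughput, GP3_THROUGHPUT_RANGE)
        if throughput != current.throughput:
            changes["Throughput"] = throughput
    wanted_iops = _read_int(pvc_labels, IOPS_LABEL)
    if wanted_iops is not None:
        iops = _clamp(wanted_iops, GP3_IOPS_RANGE)
        if iops != current.iops:
            changes["Iops"] = iops
    if not changes:
        return None
    return {"VolumeId": current.volume_id, **changes}
```

The returned keys are chosen to mirror the `VolumeId`, `Throughput`, and `Iops` parameters of the EC2 ModifyVolume request, so a reconcile loop could hand the dictionary to a cloud client. From the application team's side, the whole interaction would be editing a label on the volume claim, which is the point of the shim.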

Data on Kubernetes Community resources

Check out our Meetup page to catch an upcoming event
Engage with us on Slack
Find DoK resources
Read DoK reports
Become a community sponsor

The CFP is open for DoK Day at KubeCon NA 2024 through July 14.
