An honest discussion about the challenges of running stateful, distributed applications inside Kubernetes. This talk features Umair Mufti, former Manager of Data Services at DreamWorks, where he led the development of a Kubernetes-based Database-as-a-Service platform. He is currently building products at Portworx that incorporate the lessons learned from managing enterprise-grade databases on Kubernetes.
The talk discusses the extensions Kubernetes needs in order to properly handle cluster topologies, leader election, membership and failure detection, sharded ingress, service discovery, operators, and more.
Umair Mufti 0:01
At the end of the day, we’ve all had a lot of content to digest, so I’m going to keep my talk pretty short, hopefully. I’m going to try to do something a little differently than I’d originally planned. When I originally proposed this topic to Bart, I imagined a full hour of talking through the problems DBAs face running databases in Kubernetes and proposing some solutions. But I want to turn this into a dialogue. Since I’m limited on time, I’m going to propose the problems and itemize them; it’s probably nothing you haven’t heard before, since we’ve been talking about these same problems throughout the day. Then I’ll turn it into more of a discussion so we can solve these problems together; there’s going to be a call to action at the end. We’ll leave plenty of time for Q&A, and whatever we don’t answer here, let’s take into the Slack channel and keep that conversation going.

A quick introduction about myself. My name is Umair Mufti. I am the product manager of a brand new product we introduced two weeks ago called Portworx Data Services by Pure Storage. If you haven’t heard, read, or seen the announcement, I highly encourage you to head over to the Portworx site. We’ve posted a bunch of blogs, lightboards, and demo videos, and you can register for early access to PDS. The TL;DR for Portworx Data Services is that it allows anyone to create a database-as-a-service on top of Kubernetes, on-prem or in the cloud, multi-cloud or hybrid-cloud. It’s very apropos to our DoK day here today. I’m going to do another shameless plug: we’re hiring very actively for the team. Feel free to find me on the DoK community Slack; you can tweet at me, DM me, or email me. As I go through today’s presentation, if you see some of the problems I mention and you say, “Hey, I’ve solved that before,” or “I disagree with the way you’re approaching this,” it’s all valid. Come at me with those, and we’re happy to find a spot for you on our team to solve these problems.

Getting back to the matter at hand: before my work at Portworx, I was at DreamWorks for almost eight years. I started life there as a DBA, a Cassandra DBA to be specific. By the time I left, I was managing the data services team, where we built a Database-as-a-Service platform on top of Kubernetes and were running over 10 different database types. If you didn’t get a chance to listen to Ara’s discussion earlier today with Sylvain, I highly recommend you go back and listen to it. He does a great job of talking through the challenges and the journey they went on to get all those databases onto Kubernetes, so I’m not going to rehash it here. Suffice it to say that a lot of the lessons I learned during that time at DreamWorks inform this presentation; I cut my teeth there.

So let’s talk about the problems. What is so hard about running databases on Kubernetes? We’ve all probably said it a hundred times today: Kubernetes wasn’t built for stateful applications; it was built for stateless applications. But what does that mean? And haven’t we solved some of these problems already? The biggest misconception I hear in the industry, even from people who aren’t DBAs, is that StatefulSets exist, so the problem must be solved.
Don’t they solve all the problems of running stateful applications and databases on Kubernetes? The short answer is no; it’s much more complicated. As an aside, when I was at DreamWorks, we were very early adopters of containerization. We were building databases in containers before Docker: we were doing LXC, then moved to Docker in the very early days of Kubernetes. We evaluated Kubernetes to see if it would be a solution, and that was before StatefulSets existed, so it was a non-starter, a showstopper for us. It’s true that when StatefulSets were released, they changed the game for us, and it was the Portworx folks who approached us at DreamWorks at the time and said, “You should take a look at this again.”
So it’s true to say that, yes, StatefulSets have enabled stateful applications, but they’re not enough, and this is going to be a common refrain throughout my talk today. If there’s one takeaway from this presentation, it’s this: the hard part about running databases in Kubernetes is not that they’re stateful; it’s that most databases are distributed applications. And as we all know, distributed systems are some of the hardest problems in computer science; naming variables and distributed systems are probably the top two. That only gets more difficult when we try to manage these things with an orchestrator like Kubernetes, right? It’s not just that I need a dedicated Elasticsearch master and dedicated data nodes; it’s that Kubernetes itself needs to schedule those things and also be aware of them. So StatefulSets don’t get us where we need to go. This leads us to CRDs. Besides StatefulSets, we have this notion in Kubernetes of being able to extend the Kubernetes API with custom resources, which is great. Before the operator pattern emerged, people often looked at CRDs as the panacea for stateful services. The problem, again, is that they only get you about halfway there. With CRDs, it’s great that I can extend the Kubernetes API and define a custom resource that gets saved in etcd on the back end. But at the end of the day, if I’m not doing anything with that custom resource, if I can’t apply any business logic to it, then it’s somewhat useless, right? This brings us to operators, something we all love dearly, and there have been a lot of talks today about operators. It’s true, they are game-changers; operators have enabled this next phase of stateful services on Kubernetes. But at its core, the operator pattern is a design pattern: it’s nothing more than a CRD and a controller. So when I think of operators, I think of things like the MVC (Model-View-Controller) framework. MVC is great for writing applications, but at the end of the day, you still have to write your application; you still have to come up with the logic and build all the code. It just provides the framework. The operator pattern is no different: we still need to develop the operators, define what the CRDs are, and define what the controllers do. The result is that we’re seeing a flood of operators released into the market. That’s a good thing, but it’s also problematic in its own way because, as we saw in the survey results earlier today, the problem now shifts to: which operators should I use?
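To make the “just a CRD and a controller” point concrete, here is a minimal sketch of the controller half in Go, using the controller-runtime library. The `DatabaseReconciler` name and the empty loop body are placeholders for illustration, not any particular operator’s code:

```go
package controllers

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// DatabaseReconciler is the "controller" half of the operator pattern.
// The CRD half only declares desired state in etcd; all of the
// business logic lives here, and every operator author writes it
// from scratch.
type DatabaseReconciler struct {
	client.Client
}

func (r *DatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// The typical shape of the loop:
	//   1. Fetch the custom resource named by req.NamespacedName.
	//   2. Compare its declared spec against what is actually running.
	//   3. Create or update StatefulSets, Services, ConfigMaps, etc.
	//      until reality converges on the spec.
	// The framework is free; the database-specific logic
	// (bootstrapping, failover, upgrades) is entirely on you.
	return ctrl.Result{}, nil
}
```

Because every vendor and community fills in that loop independently, the ecosystem ends up with many competing implementations of the same ideas.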
There are multiple Postgres operators; there are even multiple Cassandra operators. Which is the right one to use? What’s its level of maturity? It’s even harder if I’m running 10 different database applications in my Kubernetes cluster, which, let’s face it, people are. With the advent of microservices and cloud-native development, there’s this notion of polyglot persistence: microservice A might need a relational database, while microservice B uses a graph database and C uses a key-value store. Every one of these has its own needs, so we’re seeing many more database types needing support in Kubernetes. If I’m a DBA administering 10 different database applications in my cluster, I need at least 10 different Kubernetes operators. Not only that, my team and I need to learn the syntax and semantics of the custom resources for each one of those operators. So my hypothesis is that what we need beyond operators is a higher level of abstraction, something that encapsulates specifically what databases need. For all their talk of being different, a graph database doing something different from a relational database, at their core there are certain APIs, certain operations, that we expect of any data store: we expect to be able to scale it, to take backups, to monitor it, and several other things.
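As a sketch of what that common contract might look like, here is a hypothetical, vendor-neutral Go interface. Every name in it is invented for illustration; no such standard exists today:

```go
package dataservices

import "context"

// Backup identifies a completed backup; the fields are illustrative.
type Backup struct {
	ID       string
	Location string
}

// DataStore is a hypothetical contract: a relational database, a graph
// database, and a key-value store all differ internally, yet could
// expose the same lifecycle operations, so one generic operator could
// drive any engine that implements the interface.
type DataStore interface {
	Scale(ctx context.Context, replicas int) error
	Backup(ctx context.Context) (Backup, error)
	Restore(ctx context.Context, b Backup) error
	Healthy(ctx context.Context) (bool, error)
}
```

The point isn’t these four methods specifically; it’s that once vendors code to a shared contract, the per-database operators could collapse into one.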
So what I’m going to put to the community at the end of this presentation is a call to action: let’s start discussing as a community how we can come up with common standards, so that we can go to the database vendors and say, “Hey, instead of coding for this common operator pattern, why don’t you code to a higher level of abstraction?” Then we can have one operator to rule them all, and your code will automatically work in Kubernetes, which is, at the end of the day, what database vendors want. Before I get into solutions, I want to take a deeper dive into some of the problems, and I’m going to come at them from a different angle: let’s look at the specific resources that do exist in Kubernetes and ask who they were designed for and who they weren’t. We all know the crux of the issue is that Kubernetes wasn’t designed with stateful applications in mind, and we see that reflected in the actual Kubernetes APIs, in the resources themselves. Really, what it comes down to is that DBAs have different needs than application developers. Our first example is load balancing, which is provided by the native Service resource type, one of the oldest resources in Kubernetes. If you’re an application developer who’s built a web app with five Nginx pods up and running, it makes a lot of sense to throw a load balancer in front of them and have your label selector point to each of the five pods; any one of those pods can serve your read requests. But databases aren’t necessarily that way; they behave differently. A lot of the databases we talk about are sharded. It might be that node two in my Cassandra cluster holds the key I care about, so if I send a read request to node one looking for data that lives on node two, either node one has to redirect me to node two, or node one may simply not know about it and fall over. So the notion of balancing read requests across all my backing pods doesn’t make much sense for a stateful, distributed application.
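For contrast, this is the stock load-balancing model expressed with the Kubernetes Go API types (names are illustrative). It is exactly right for interchangeable stateless pods and exactly wrong for a sharded data store:

```go
package example

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// A plain Service: every pod matching the selector is treated as an
// equally valid backend, and connections are spread across all of
// them. Perfect for five identical Nginx pods; wrong for a sharded
// database, where only the node that owns a key can answer for it.
var webService = corev1.Service{
	ObjectMeta: metav1.ObjectMeta{Name: "web"},
	Spec: corev1.ServiceSpec{
		Selector: map[string]string{"app": "nginx"},
		Ports:    []corev1.ServicePort{{Port: 80}},
	},
}
```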
Likewise, Ingress: how many of us use an Ingress controller in front of a database? We don’t; it doesn’t make sense, right? Everything in the Ingress specification and the Ingress controllers is about HTTP paths and hostnames; these are all L7 HTTP constructs. But as DBAs, we’re thinking about other protocols: SQL, CQL, Thrift. Sure, there are systems like Elasticsearch and Couchbase that have RESTful endpoints, but by and large, we need to be thinking about different protocols. So what we need is a database-aware Ingress API, and I’ll get to that on my solutions slide. Coming back to StatefulSets: I talked in the last slide about how they aren’t the full solution, so let’s look at what StatefulSets do include support for. In the StatefulSet spec itself there’s a notion of a pod template. In my pod template I can say I want to run Nginx, here’s the image name, and I want five replicas of it. That’s great if you’re running a web application where all of the pods run the same image. But in distributed applications, and databases specifically, we often have a notion of topology; that is, some of the pods might have a different role or a different identity than the others. In Elasticsearch you might have a dedicated master and dedicated data nodes; in Couchbase you might have some pods serving the data service and some serving the indexing service. So I might need a StatefulSet that lets me specify different images for different pods. I still want to think of the entire thing as one atomic unit, as one application, a Cassandra application or in this case a Couchbase application, but with different images or different pod templates for each of the roles.
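As a sketch of the kind of extension being described, one could imagine a topology-aware spec along these lines. This is hypothetical; nothing like it exists in the core StatefulSet API today:

```go
package example

// RoleSpec gives one role within the application its own image and
// replica count; per-role configuration would slot in here as well.
// (For Elasticsearch the images match and only the configs differ;
// other databases genuinely need different images per role.)
type RoleSpec struct {
	Name     string // e.g. "master" or "data"
	Image    string
	Replicas int32
}

// TopologyAwareSetSpec is a hypothetical resource that keeps the whole
// database as one atomic application while letting each role differ.
type TopologyAwareSetSpec struct {
	Application string
	Roles       []RoleSpec
}

var elasticsearch = TopologyAwareSetSpec{
	Application: "elasticsearch",
	Roles: []RoleSpec{
		{Name: "master", Image: "elasticsearch:7.14.0", Replicas: 3},
		{Name: "data", Image: "elasticsearch:7.14.0", Replicas: 5},
	},
}
```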
And that carries over to environment variables as well. In the pod template of your StatefulSet, you can specify ConfigMaps or environment variables that get attached to your pods. That makes sense for a microservice but, common refrain, it doesn’t always make sense for a distributed application: if my pods each have a different role, they might require different configurations as well. Now, there are ways around this. You might have an intelligent entrypoint script in your pod that parses out the environment variables it needs and only applies them if some condition is met. But that necessarily makes your image more bloated, and in any case, why should we have to do that? Kubernetes should have the primitives built in for this, instead of us having to code that logic in at the image level. Finally, again with StatefulSets, there’s the notion of ordered scaling. If I scale up my StatefulSet, pod zero is guaranteed to start first, then pod one, then pod two; if I scale down, the reverse order applies. By and large that’s a good thing, and it makes a lot of sense for non-distributed applications. But in the world of database administration and distributed applications, we often want to shoot a specific node in the head. It could be that Cassandra node one has some file or data corruption, and I need to decommission that node, pull it out, and let the application replicate the data back across the remaining nodes. So I should be able to take out just node one and let node zero and node two remain. These are just a handful of examples of designs that make a lot of sense for microservices and app developers but don’t translate well to the world of data services and databases.
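To see the ordered-scaling limitation in code, here is a sketch with client-go, assuming a StatefulSet named `cassandra` with three replicas. The only supported scale-down is reducing the replica count, and the controller always removes the highest ordinal first:

```go
package example

import (
	"context"

	autoscalingv1 "k8s.io/api/autoscaling/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// scaleDown shrinks the StatefulSet from three replicas to two. This
// always removes cassandra-2; there is no API for "decommission
// cassandra-1 and keep cassandra-0 and cassandra-2", which is what
// shooting a corrupted node in the head actually requires.
func scaleDown(ctx context.Context, cs kubernetes.Interface, ns string) error {
	scale := &autoscalingv1.Scale{
		ObjectMeta: metav1.ObjectMeta{Name: "cassandra", Namespace: ns},
		Spec:       autoscalingv1.ScaleSpec{Replicas: 2},
	}
	_, err := cs.AppsV1().StatefulSets(ns).UpdateScale(ctx, "cassandra", scale, metav1.UpdateOptions{})
	return err
}
```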
So what is it that we need? How can we solve some of these problems? The first thing I’d like to point out is that whatever solutions we come up with, we need to redefine this notion of high availability. We always think about high availability in terms of making sure that one of my many pods is up and running; we think of service availability. But for a data service, a database, we need to flip that on its head and think about data availability. To keep using Cassandra as the example: if one of my Cassandra nodes goes down, do I still have a copy of the data somewhere? Can my cluster still respond and fulfill my read requests and my write requests? That’s a different notion of availability than what’s built into the existing Kubernetes APIs. And this is much more complicated than it sounds. There’s application-level replication to account for: the application might have a replication factor of three, or if it’s Elasticsearch, my index might have a certain replication factor. That might inform what I need to do at the StatefulSet level; if Cassandra is already replicating this data three times, maybe I don’t need a three-node Cassandra cluster, or maybe I do. Or it might inform things at the volume level, if I’m replicating at the block storage level too. There’s also a notion of convergence: my pod should be scheduled on top of my data so that there’s low latency. But at the same time, we want anti-affinity, so that no two pods in my StatefulSet are scheduled on the same node, because if that Kubernetes node crashes, I need to be thinking about the data availability. Another area where I think we need a common, industry-wide solution is topologies. I hinted earlier that StatefulSets have no notion of dedicated Elasticsearch masters or Couchbase roles. We need an extension to Kubernetes that everybody agrees to, a common standard across all application types, that carries this notion of topologies. Most databases have the construct, and every operator I’ve seen solves the problem differently, so we need to standardize. Again, what I’m talking about is being able to deploy a single application, call it a StatefulSet, where some of the nodes run a specific image with specific configurations and other nodes run a different image with different configurations.
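Going back to the convergence and anti-affinity trade-off for a moment: the anti-affinity half does have a standard expression in Kubernetes today. A minimal sketch with the Go API types, assuming the replicas carry an `app: cassandra` label:

```go
package example

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Require that no two pods with the same app label are scheduled onto
// the same Kubernetes node, so a single node failure can take out at
// most one replica's data.
var spreadReplicas = corev1.PodAntiAffinity{
	RequiredDuringSchedulingIgnoredDuringExecution: []corev1.PodAffinityTerm{{
		LabelSelector: &metav1.LabelSelector{
			MatchLabels: map[string]string{"app": "cassandra"},
		},
		TopologyKey: "kubernetes.io/hostname",
	}},
}
```

The convergence half, scheduling each pod onto the node that already holds its data, is where implementations still diverge, since it depends on the storage layer.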
Service discovery is also a pain point, and one that needs to be solved. We often think of service discovery in terms of clients being able to discover the pods or services and route requests to them. While that matters for distributed applications too, there’s another notion of service discovery, and it relates to the pods being able to find each other. In a distributed application, node three needs to be able to find nodes one and two. And there’s some nuance here: when node three crashes and Kubernetes reschedules it on a different server with a new IP, how do nodes one and two understand that this new thing joining the cluster is the same node, still the same node ID, still the same pod, even though its IP may have changed? This notion of cluster membership is quite difficult in a system like Kubernetes, which is trying to handle some of these same semantics for you by rescheduling things; it could just as well be that I’m scaling to five nodes and the two pods that are joining are genuinely new nodes. Likewise, failure detection is the opposite direction: if node three or node two crashes, the other nodes need to be aware of the node leaving, and any automated monitoring you set up for your cluster needs to distinguish whether the pod that just stopped running has truly failed and can’t come back up, or whether the failure is ephemeral and Kubernetes is going to restart it. Sure, its IP address might change, but it’s coming back; it didn’t fail, it’s just restarting. So that’s service discovery.
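The building block Kubernetes does provide here is the headless Service, which gives each StatefulSet pod a stable DNS name that survives rescheduling even though the pod IP does not. A sketch, assuming a StatefulSet whose `serviceName` points at it; this solves the naming half of membership, not the semantics of recognizing a returning member:

```go
package example

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// With ClusterIP "None", no load-balanced VIP is allocated; instead
// each pod gets a stable DNS record of the form
//   cassandra-0.cassandra.<namespace>.svc.cluster.local
// so peers can gossip with that name rather than a throwaway IP.
var peerDiscovery = corev1.Service{
	ObjectMeta: metav1.ObjectMeta{Name: "cassandra"},
	Spec: corev1.ServiceSpec{
		ClusterIP: corev1.ClusterIPNone,
		Selector:  map[string]string{"app": "cassandra"},
		Ports:     []corev1.ServicePort{{Name: "gossip", Port: 7000}},
	},
}
```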
And then finally, Ingress. I touched on this in the earlier slide: today’s Ingress API and Ingress controllers are very specific to HTTP requests. What I believe we need in the industry is a notion of data-aware Ingress, and that probably requires a whole new Ingress controller API, probably even specific ones per database type. You can think of it like Pgpool or PgBouncer for Postgres: it does load balancing and connection pooling, and it knows that an incoming SQL INSERT statement gets routed to my primary, while a read request can go to any of the secondaries. I’m talking about read and write requests that are intelligently routed to the proper backing node on a request-by-request basis, and that doesn’t exist today. So I think these are some of the higher-level problems we need to solve together as a community, and that’s where I’m going to leave it. I’m not going to propose any solutions here. Certainly, these are the types of problems we’re thinking about at Portworx, specifically on the Portworx Data Services team, and the kinds of problems our team and others had to tackle at DreamWorks. There’s a lot of collective knowledge in this space; we’ve been thinking about these problems for a long time, and I know a lot of you have been as well. The breadth and expertise of the presenters at DoK day today is a testament to that. So what I’m asking is: let’s get together, let’s work as a community, let’s partner and create some standards we can take to the database vendors and say, “Hey, DataStax, don’t code for this common operator pattern anymore; code for this higher-level abstraction.” By doing that, you’ll automatically get all the benefits of ingress, load balancing, and everything else that a truly stateful, distributed application on Kubernetes can benefit from. I think that’s how we solve some of these problems. With that, I’m going to open up the Q&A and start the discussion here. I’d love to hear your thoughts and feedback, and whatever we don’t answer today, we’ll pull into the Slack channel and keep that discussion going. Thank you.
Bart Farrell 24:16
It’s interesting looking at how we’ve gone from the last KubeCon we did in May to this one several months later. At the KubeCon in May, Patrick McFadin gave a talk called “From DBA to SRE,” with this whole notion centering on the DBA role. We’ve talked about this previously in some of the panels: we have shifting technologies, but also shifting responsibilities. Given your background, what do you think are some of the barriers, and how can they be overcome, when it comes to the mentality challenges and the cultural challenges that come along with starting to run stateful workloads on Kubernetes?
Umair Mufti 24:58
It’s a very loaded question. The quick answer is that when I think of DBAs and Kubernetes, I think of a Venn diagram: DBAs are one circle, SREs are the other, and the overlap is minimal at best. Frankly, people who understand database administration might not understand how to build a Kubernetes operator; most won’t, and vice versa. Somebody skilled in writing a Kubernetes operator, thinking about Go code and mutexes, probably doesn’t know much about Cassandra replication. So in practice, finding that unicorn has been difficult; it certainly was for us at DreamWorks, and that’s the problem PDS is attempting to solve. We’re going after that problem, and we think it’s non-trivial. On whether we’ve crossed the chasm: I get some of the sentiment, and I understand it, for sure. Kubernetes is ready for stateful workloads. But for the administrators, the people maintaining these systems, there is a dedicated Kubernetes admin or platform engineer, and the admin of a database is still a separate role.
Bart Farrell 26:34
With that in mind, because this is a recurring conversation we’ve had in our community: one part is the empathy side, as you mentioned; you can never forget the human element that needs to be incorporated into all these teams, the stakeholders who are actively going to be working on crossing the chasm. From a technical perspective, though, something we’ve talked about on various occasions with other companies is how one quantifies these skill sets. For other aspects of Kubernetes, there are the CKA and CKAD certifications. A recurring question we ask a fair amount of our guests is: how could you structure the necessary knowledge on the certification side, so that for organizations such as yours, candidates can show they understand this? What will you be looking for in the folks you’re hiring?
Umair Mufti 27:28
Great question. Mentioning the book again, another plug: Portworx is the sponsor of that book, and we’re distributing some of the early chapters Patrick has already released. I don’t mean to just plug Portworx; what I mean to say is that this is an area we’re putting all of our effort and resources behind, not just Portworx but Pure Storage, because we think these are worthwhile problems to solve. What we think we need to find, again, are those unicorns: people who understand cloud-native development and Kubernetes development, but also have very specific domain expertise in Cassandra, in Couchbase, in Elasticsearch. Those are the people we need.
Bart Farrell 28:22
And that’s precisely one of the tricky things folks have mentioned when we’ve talked about certification: you have to go database by database; you can’t have one overarching umbrella, because of the intricacies and the ins and outs of each kind of database.
Umair Mufti 28:34
We already have a notion of a certified Cassandra admin, right? And I think what you’ll see at that level is a certification for Cassandra on Kubernetes as a bonus. That makes sense: I know how to run this, but I also know how to run this on Kubernetes.
Bart Farrell 28:54
It’s a good point. Taking a question from the audience, one that was asked in Slack too but I’m posting here for more visibility: “I enjoyed ‘availability means data availability, not service availability.’ Any thoughts on SLOs and SLIs for that type of mindset?” Particularly as we’re talking about DBAs, we’re thinking about SLOs and SLIs.
Umair Mufti 29:11
Great question. Yes, I think those are all problems worth solving. I hinted at it when I talked about needing to think about both convergence and anti-affinity; I think those feed into your SLOs as an organization. It’s certainly an important topic. I don’t have any answers around it, but I do think it’s important to consider when designing any sort of solution.
Bart Farrell 29:43
I think once again, when we talk about DBA, SRE, and DBRE (database reliability engineering), the mindset of SLOs, SLAs, and SLIs is also a big part of the vocabulary. We’re talking about ownership; we’re talking once again about the stakeholders involved, and also about customer-facing service. We had a great conversation with someone from Google on their CRE team who said, “You’re an engineer, but you’re thinking about customer happiness.” What kind of metrics are going to provide that? What kind of transparency from the get-go? They’re technical questions, but they once again have a very strong business component, service component, and customer-centric component.
Umair Mufti 30:22
I’ll bring it back to the customer. When we were at DreamWorks, and again in the way we’re approaching PDS, we think of our application developers as our customers. We were building a platform where an application developer could come in and run any database on demand, whether Cassandra, Elasticsearch, or whatever. It was then on the DBA team to keep that thing up and running and to think about providing an SLO or SLA to the application developer. So providing availability was very much top of mind for us. That’s why we would come up with best practices around how many nodes our StatefulSet would need for Cassandra, and where we’d provide replication: at the application level, at the volume level, even at the block storage level, making sure the data, and the database itself, was always available to our application developers.
Bart Farrell 31:15
Very good. Going back to your comment on the previous question about crossing the chasm: we’re very much celebrating that it’s perhaps happening, but once again, our job as a community is to get some visibility on that, to shine some light on it. Let’s find folks, and we did that today: we had people in Singapore, in Australia, in different parts of Europe, and obviously plenty joining us from the US, and you’re here in person. So it’s refreshing to hear so many different voices about these experiences and to see the different technologies in play. And it’s great to see your organization, your company, invested in this and saying, this is what’s going to be happening. We’re in a kind of vanguard situation right now, but we aren’t alone; once again, the survey in the research report brought to light that this isn’t just a crazy idea. We see this happening.
Umair Mufti 32:05
Can I interject? I want to call you out specifically for building this community. The growth in the DoK community from a year ago to now is unbelievable, so props. And again, plugging Portworx: we saw that, and we signed on as a platinum sponsor. We’re putting the weight of our organization behind this community, evangelizing it, as you said, and solving some of these higher-level problems so that the next generation benefits from our lessons learned. I think that’s paying it forward.
Bart Farrell 32:43
I think it’s great, and we see the momentum growing from KubeCon to KubeCon. When we did it in May, it was kind of our first step out; now we’re really consolidating and focusing on what we mean when we say data on Kubernetes, getting strict with that in terms of our content. You’ll see that in the upcoming live streams we have planned: doubling down on the research report from the very first talk we had today with Melissa, as well as on how organizations are backing this in a very serious way. So, we’ve heard it here in October in LA, and the next KubeCon will be in May of 2022 in Valencia, Spain. It’s going to be very interesting to see how much further we’ll drive this, how much further down the field we’ll run with the ball, and the difference in waking people up to this idea. Over 500 organizations were interviewed, and the results speak for themselves. We’ll be extending those points and taking deeper dives into some of the talks we had today; we had a super great lineup of speakers. For reasons of time, with so many talks, we had to cut some things a little short, but these are all things we could easily expand into live streams and panels, so you can expect more content based on what we covered today.
Umair Mufti 34:00
Come find me, I’m on Slack, easy to find; tweet me, DM me, whatever. Just look for Umair to find me. As I said, you don’t have to agree with any of this stuff I said; if you want to challenge any of my assumptions, I’m happy to hear your voice, so contact me.
Bart Farrell 34:19
Alright. Perfect.