From Enemy to Evangelist of Stateful Workloads on Kubernetes

Feb 01, 2022 by melissa

Kubernetes was originally designed to run stateless workloads. Today, it is increasingly used to run databases and other stateful workloads.

Rick Vasquez used to be a huge opponent of putting any data in Kubernetes; now he is one of the most vocal proponents of doing so. Learn why he changed his mind.

Bart Farrell 00:00

You’ve been with us for quite a while. How are you?

Bart Farrell 01:15

If anybody in the Data on Kubernetes community is OG, it is Rick Vasquez. Rockin’ those cool shades, killing it down in Texas. Rick, you are no stranger to our community, and I hope you never will be. Anyway, it’s good to have you with us. Today we’re going to talk about a lot of different stuff, but in case somebody doesn’t know you: who are you, Rick?

Rick Vasquez 01:40

I’m Rick Vasquez, and I am a technologist. I’ve been a data nerd pretty much my entire life. Whenever people ask me how I got into computers, it’s a really funny story, because it has to do with gaming. The first computer I ever built was so that I could have a better experience playing Counter-Strike. This is OG Counter-Strike, none of this new CS:GO stuff. Starting to understand networking, how operating systems work, how graphics cards work, was all in pursuit of better FPS and a lower ping. And then that transformed over time. As I matured into a young adult, I thought, hey, how can I turn this into a career, something that I’m really, truly passionate about? How can I make it something that I’m able to do every day and enjoy doing? Out of school, I ended up starting a pretty cool software company with some guys, focused on data. We collected data from all kinds of disparate sources all over the internet; we had to normalize it, shove it into a database, and then make it as fast as possible for people to retrieve that data, not just through search, but through analytics and some summary stuff. That’s really where I was introduced to databases. I was thrown into the fire with MySQL, a super early version of MySQL. From there, we somehow decided that NDB Cluster was going to be the next thing that differentiated us from everybody, not just from an availability perspective but from a performance perspective, because everything was held in memory. So I was a super young buck, deploying NDB Cluster by hand and scripting everything. I know a thing or two about distributed SQL from back in the day, when it was a shared-nothing architecture that was super hard to develop on.
Anyhow, I moved on from that company and worked at a bunch of other little startups, some that did well, some that didn’t, and then ended up at a company that built proprietary databases for geospatial: a geospatial analytics platform. What we did there was put a rendering engine next to the query engine in the same application space, so they shared a memory pool. That really cut the rendering time for things like drawing district maps, or those election maps you see on CNN where you can drill into the counties down to the zip code level. All of that data needs to get rendered fast, and we had a cool, clever way of doing it. I worked there, helped them build that product, and got it to market. From there, I started pursuing my passion for open source at a company called Percona. Some of you may know Percona; it’s a pretty big leader in the open-source community, especially when it comes to keeping open source open. They strive to take the databases that are the building blocks of the internet, the building blocks of the world we know today, and make sure there’s always going to be a fully open-source version where you don’t even have to pay for enterprise features. That really ignited my passion for open source in a way I didn’t have before. Fast forward a bit: after five years at Percona, it made sense for me to jump into the hardware game. Now I’m at Western Digital, focused on strategic and emerging programs within the flash business, trying to bring the world the vertical integration it deserves, down to the storage device. So a lot of exciting stuff. That’s it in a nutshell: who I am and how I got here.

Bart Farrell 06:23

I would like to know two things. Do you still play Counter-Strike, the OG Counter-Strike?

Rick Vasquez 06:28

Oh no, I don’t. I don’t even know if you could find servers to play it. I don’t have time to play any games anymore. It’s kind of sad.

Bart Farrell 06:36

I think it is. My go-to game back in the day was Command & Conquer: Red Alert, and about every five years I go through a binge phase where I find some way to get it running on a modern computer. That has its complications, but with enough tutorials and READMEs, you can get there. It’s funny how games so often spark that interest, like, “oh, I’m going to see what’s under the hood.” The second question: since we’re going to be talking a lot today about before and after, the differences that have marked your career and what you’ve seen, what has been the before-and-after effect when it comes to open source?

Rick Vasquez 07:14

Before, I thought open source was a means to an end. You just pick it up, you don’t have to pay anybody to use it, right? You’ve got some libraries, you can slap some stuff together, and it helps increase the speed of development. But you didn’t owe anything to anybody, right? Because it’s just random stuff you found on the internet, and from a general community perspective, it existed because people were passionate about it. After spending so much time at a company purely dedicated to open source and the ideology of open being the right way to do things, one of my biggest shifts has been realizing that proprietary is now the laggard. Before, you could say proprietary was the conventional deployment model: it had the lion’s share of revenue, and people focused on it because that was the way you deployed things. Open source was what people did when they didn’t have enough money to buy Oracle or SQL Server. And I was one of those guys who didn’t have enough money to buy those things. So it was a necessary evil that let you still operate and get done what you needed to get done. But what ended up happening is that, over time, the number of people opting into open-source technology overtook the number of people willing to part with the upfront money a proprietary solution requires. Then you start to see innovation stagnate a little at some of these companies, because they’re focused only on what their customers are asking for, instead of asking what they can build that fundamentally changes the way we do things. That’s something an open ecosystem provides, not just open source from a software perspective but open from an ecosystem perspective, because you can have some of the best and brightest from many different companies trying to solve the same issue.
And instead of everybody solving it in their own nuanced way, you get one solution that gets adopted industry-wide, and then the pie just gets bigger for everyone. I think that’s one of the biggest things you can see from a successful open-source project: it fundamentally makes the pie bigger for everyone, and whatever slice of that pie your company happens to have as a steward of that open source should also grow with the overall pie size, right? And even if your share shrinks, the growth of these open-source ecosystems is at this point explosive, and Kubernetes is one of the most explosive, at the forefront of that; we’ll get into that later in this talk. We’re seeing this trend toward open not just in software, which has done a great job representing it, but in the hardware world too, right? OCP is a huge movement. We’re seeing it everywhere: people want to develop out in the open because of the co-development they gain and because of the edge cases they’d otherwise never see in their own customer base. And really, it’s all about the end user, whose experience is most likely going to be fundamentally better because it’s open than if you’re just after it for commercial gain.

Bart Farrell 11:03

Related to that, you did mention this word a few times. What’s your favorite kind of pie?

Rick Vasquez 11:10

Ah, it’s hard, because it’s seasonal. As a universal, year-round favorite: key lime pie, 500%. Easy. But once you introduce seasonality, I’m a sucker for an incredible pecan pie. It’s starting to cool off, the leaves are turning colors, we’re about to stuff our faces with turkey, and there’s nothing better than a good Texas pecan pie to go along with that. For those who don’t know, I live in Austin, Texas, which is sunny most of the time, and today it’s a nice day out there too.

Bart Farrell 11:55

Okay, good. You mentioned ecosystems such as Kubernetes, which, by being open, unleashed a ton of activity, though others shouldn’t be forgotten. Can we touch a little more on your background, to understand the mentality of the folks you’ve worked with in the database world? Starting from the beginning, when you got in there, how would you describe their mindset for dealing with problems and their positioning inside an organization? Obviously there are lots of people out there, so we don’t want to paint with too broad a brush. But in terms of the kinds of problems they’re tackling and the way they’re positioned inside organizations, how has that looked from your perspective?

Rick Vasquez 12:44

It’s a really good question, and I’ve thought a lot about it. In general, what is so important, what’s the key characteristic, about databases? The one answer I can come up with is that the database is the high-performance application. It is the lifeblood of many organizations. If your database is slow, it doesn’t matter how great the application code sitting on top of it is; your apps are going to be slow. So what I’ve seen most is a focus on performance, or min-maxing, right, and optimizing. You see a lot of people get into that niche whenever you talk about databases. It’s a lot of people who are obsessed with performance. It’s a lot of people who maybe were laggards on even adopting virtualization, because on bare metal, that 5 to 10% performance on a query sometimes makes a big difference, right? With early virtualization, you were taking a bit of a hit, and people attributed that directly to virtualization of different kinds. So what you see is this combination of being obsessed with maximizing performance and obsessed with minimizing anything that could hinder it, and that probably includes new deployment methodologies. You don’t see a lot of people on the bleeding-edge latest release, which is a strange thing to think about. When there’s a new version of a database, you’d think you should always want to be on the latest version, right? But when you talk to people in the database community, they’re like, “Yeah, no, we let those newfangled guys go after the latest version. We’re going to wait 6 to 12 weeks and see how it works out, or maybe never adopt the first GA release. You adopt on the next release.”

Bart Farrell 15:03

I’ve heard that so many times with Cassandra 4.0 coming out: “oh, no, no, no.” Even though we waited years for it to happen, “let’s wait till 4.1 comes out.” That’s what we got.

Rick Vasquez 15:14

“We’re not the guinea pigs for 4.0.” And I don’t know what you guys think, but the faster you adopt 4.0, the faster 4.1 comes, right? They don’t release 4.0 knowing that everything is wrong with it, right? They’re not going to find the edge cases that are your edge cases until you start using it. So it’s an interesting dichotomy in the DNA of somebody who is a database engineer or database administrator: a balancing act between achieving and maintaining the maximum level of performance, and weighing that against either operational distraction or doing what I know works instead of trying something new. It’s a really interesting oil-and-water mixture. But if you can figure that out within your organization, it unlocks a level of performance that wasn’t previously possible, because more people and more applications within your company can get access to that much-needed data. And that’s really what makes a lot of companies valuable.

Bart Farrell 16:35

With that in mind, thinking about mentality and how problem-solving works: somebody with a lot of experience in the database world told me recently that 80% of data projects fail, which is why I’m curious about your opinion. That sounds like a lot. I’d have to dig deeper to see where that stat came from, and I guess it depends on how you define a data project. But would you agree, to a certain extent, that these are folks who are often dealing with things not working?

Rick Vasquez 17:08

I think that’s a huge symptom of garbage in, garbage out, right? Databases don’t make your data better. A lot of people don’t understand that; they want a silver bullet, especially with some of the streaming stuff that’s come about now. It’s like, “oh, now we’ve got all the data, right? Everything!” And it’s like, yeah, but if you’ve got a poop chute, and the only thing coming down the poop chute is poop, you’re not going to get gold nuggets unless you know somebody with a strange diet. That’s just how it works, and databases are no different. If you are storing or persisting data and relying on a database management system to do that for you, sure, it will, as long as you organize it and so on. That’s great, right? You’re telling the database how you want the data to be persisted. But if, when you retrieve that data, it’s absolute nonsense, what does it matter? I feel like that is more the issue with some of these data-related projects than the actual platform behind the scenes. Fair enough?

Bart Farrell 18:29

Fair. Looking at today’s subject, obviously, “from enemy to evangelist,” I’d like to hear how you first encountered Kubernetes, what you thought at that point about running databases and other stateful workloads on Kubernetes, and how that changed. Where did you first hear about Kubernetes? Who told you about it? What was your initial reaction?

Rick Vasquez 18:52

Oh, man, I think we have to start even before that, right?

Bart Farrell 18:56

Let’s do this.

Rick Vasquez 18:57

The story starts with containers, right? If you want to go super, super far back, it’s VMs, but let’s start with containers. The first time I heard about containers was at my second startup. We were starting to use Node.js to build quickly, and we had a microservices architecture. For the quick APIs we needed to build and scale up, we started looking at containers: how can we get to an immutable deployment style, with what we now know as a CI/CD pipeline? We were doing test-driven development, and on every commit we’d have a build get popped out. Instead of that, we said, let’s just pop a container out the back end. So we started doing that, and it was great. This was Docker way before people today were looking at Docker, right? Today we’re way past the split between Docker Enterprise and Community Edition. This was back when it was just Docker. There were no community or enterprise editions; containers were this new thing for immutable stuff, and they made deployment super consistent. So that was cool. At the same time, I was looking at these containers from the application side, especially from a back-end perspective, and thinking, well, we’re just hooking that up to a database. At that point, I was running a pretty substantial Postgres deployment, and looking at it, I thought, why would you ever want to containerize this? Why would you ever stick this in a container? Number one, you need control. The whole point of a database is that it’s mutable, right? Everything you’re doing to the data is changing the data, so everything in that container is no longer disposable. You have to really, really know what’s going on inside that container.
Eventually we got this idea that you could have mount points that persisted beyond the lifecycle of a container, and that started to make me think maybe it was worth looking at. By that time, I was already at Percona, and I gave a talk at Percona Live in 2017 on bare metal, VMs, or containers: what should you do for a multi-faceted database deployment? If you’re in a hybrid environment running MongoDB, MySQL, and Postgres, all three, would it be easier to run them in containers? And the answer was straight up no, because it opened up a whole can of worms. On top of that, you had the performance hit I was talking about, which was a big deal. Back in the day, containers were nowhere near what they are today in terms of transparency and performance. So I think that’s step zero: why would I ever containerize my database? And even when there were some mechanisms that made it workable, it still just wasn’t very interesting. All you can do is restart that container, or replace it with a new one, and then you have to worry about what happens. It doesn’t make rollback easier; it doesn’t make anything easier than apt-get, yum install, or yum update. Nothing is that much better about putting the database binary in a container and leaving the data in some data directory you’re mounting. There were no advantages, only disadvantages from a performance perspective, and none of the portability. Then I hear about Kubernetes, which is supposed to be this cool new thing. It hadn’t been around that long, and it was a way to have a whole data center orchestrated. This was still during the Mesos war; it was like the Blu-ray versus HD DVD of modern workload orchestration happening in front of our faces.
At Percona, we were working with the guys at Mesosphere on something for one of the databases, because we thought DC/OS was going to be the cool thing, right? There was a lot to like about DC/OS from a deployment perspective, and I think Kubernetes and the whole ecosystem learned a lot from it and said, “Yeah, we can do that too.” In the early days, Kubernetes was probably about as capable as what we know today as Docker Swarm: you could put six things together, push a button, and kind of make them all work together, and that was great. But it was pretty much strictly for containers, so just the immutable workloads. At that point, it’s like, okay, you can still do these container things and put them on very specific machines, and all you’re doing is managing the runtime lifecycle, like upgrade and downgrade, and maybe a little bit of configuration magic. But Puppet and Chef and all of that are still around at this point, right? That’s what people are using in VMs and on bare metal. So again, what advantage am I gaining at all from putting it in Kubernetes? It’s no more portable, right? It’s not portable at all. It’s not making my life easier from a general management perspective. And it’s a database in a container, which for me was still terrible from a performance perspective. Then, fast forward a little, StatefulSets get introduced, along with the notion that it probably makes a lot more sense to have data move around with containers, or to have containers that can land on other machines and magically stitch back to a set of available disks. Networking has also been improving drastically over this time.
I think that deserves a shout-out, because a lot of people don’t realize how stuck in the mud Kubernetes would have been if we were still mainly on 1, 10, maybe 25 gig networking. The move to 40 gig and 100 gig networking is a very big deal from a Kubernetes perspective. But keep that off to the side; it’s a distraction. We now have the right ways to move workloads around, and on some machine you can be accessing disks that have acceptable performance, even if they’re network-attached or fiber-attached or whatever. Now you’re getting into a new world: “I’ve been able to deploy VMs on different servers and connect them back to a SAN with iSCSI for a while now,” and I’m starting to get that same general feeling from Kubernetes with StatefulSets. Then you upgrade that one step further with CSI, and you get the robustness you would expect from a full-on SAN deployment or network-attached storage deployment. That’s when it starts to really feel like, “hey, I could put a database in there and see what happens.” And then, fast forward even more, we now have an entire data layer in Kubernetes that’s accessed from the outside. You have technologies like MayaData’s OpenEBS, Longhorn, and Rook, all of these container-attached storage solutions that themselves orchestrate, behind the scenes, everything a CSI driver gives you. From an application perspective, you just ask for a resource with some level of performance, and the scheduler is smart enough to understand where that container needs to sit to get you that level of performance.
And the data layer underneath is smart enough to respond, “hey, if that exists, I can serve it; I can move data around, create replicas, and provide persistence” that many containers can use, without the user or the application needing to think about anything other than “this is where I want the data.” So that evolution, start to finish: sometime around when CSI became a thing is when I started to look at Kubernetes and say, hey, maybe it’s possible, and then the full-blown data plane is what really sold me on it. Because some of the hardest things to manage are replication, whether logical or physical, and backups, those types of things. You’ve always been able to get that if you had a really expensive SAN, a nice one from NetApp, or a great Dell EMC box, something you paid the dollars for. It’s cool: you have a UI, and you don’t have to worry if your stuff blows up, because you have this time-machine-type functionality. That just didn’t exist in an open way, in open source. Now you can have all of those benefits from a software-driven approach. That’s really what did it for me: this is better than deploying on direct-attached storage and having to manage everything on your own, backups and this and that. There’s no replacement for off-site backups, or for moving your data entirely out of the data center; that’s still mandatory. But knowing your data is going to be safe because it’s in more than one place, and not having to worry about that, is such a big deal, especially for database applications. That’s where it started to open my eyes, and I said, look, I need to look into how this works. And it has been fantastic.
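The StatefulSet-plus-CSI pattern Rick describes can be made a little more concrete. The following is a minimal sketch, not taken from the talk, that builds a Kubernetes StatefulSet manifest in Python in which `volumeClaimTemplates` gives every replica its own PersistentVolumeClaim. The storage class `fast-ssd`, the Secret `postgres-secret`, the image tag, and the sizes are hypothetical placeholders; the storage class is assumed to be backed by some CSI driver (OpenEBS, Longhorn, or Rook-Ceph, for example), and a headless Service with the same name is assumed to exist.

```python
import json


def postgres_statefulset(name: str, replicas: int, storage_class: str, size: str) -> dict:
    """Return a StatefulSet manifest whose pods each get their own PVC.

    All names and values passed in are illustrative placeholders.
    """
    labels = {"app": name}
    data_dir = "/var/lib/postgresql/data"
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            # Headless Service (assumed to exist) that gives pods stable DNS names.
            "serviceName": name,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "postgres",
                        "image": "postgres:14",
                        "env": [
                            # Hypothetical Secret holding the superuser password.
                            {"name": "POSTGRES_PASSWORD",
                             "valueFrom": {"secretKeyRef": {"name": "postgres-secret", "key": "password"}}},
                            # Subdirectory so initdb does not trip over lost+found on a fresh volume.
                            {"name": "PGDATA", "value": f"{data_dir}/pgdata"},
                        ],
                        "volumeMounts": [{"name": "data", "mountPath": data_dir}],
                    }],
                },
            },
            # One claim per pod, provisioned by whatever CSI driver backs storage_class.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "storageClassName": storage_class,
                    "resources": {"requests": {"storage": size}},
                },
            }],
        },
    }


def claim_names(sts: dict) -> list[str]:
    """PVC names Kubernetes derives for a StatefulSet: <template>-<statefulset>-<ordinal>."""
    name = sts["metadata"]["name"]
    template = sts["spec"]["volumeClaimTemplates"][0]["metadata"]["name"]
    return [f"{template}-{name}-{i}" for i in range(sts["spec"]["replicas"])]


if __name__ == "__main__":
    sts = postgres_statefulset("pg", 3, "fast-ssd", "100Gi")
    print(json.dumps(sts, indent=2))  # kubectl apply -f accepts JSON as well as YAML
    print(claim_names(sts))
```

If pod `pg-1` is rescheduled, the StatefulSet controller recreates it under the same name and binds it back to claim `data-pg-1`, which is roughly the “stitch back to a set of available disks” behavior described above. Whether the volume can actually follow the pod to a different node depends on the storage backend: network-attached or replicated volumes can, node-local disks cannot.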

Bart Farrell 31:01

While you were going through this process at Percona and other places, I’m curious: what conversations were you having with customers during that transition? Were they even aware of these changes? I’ve heard in different livestreams that sometimes the customer is never going to see this, and other times it sounds like customers can be even more advanced than the vendors. What was your experience in that regard?

Rick Vasquez 31:31

This is interesting, because during the journey I just took everybody through, there’s been a change brewing for a long time, ever since what we can call cloud-native has existed as a paradigm. If we rewind to 2010-ish, when people were using EC2 and everybody was starting to get going in the cloud, everybody pitched the total-cost-of-ownership benefit: it’ll just run better in the cloud, you don’t have to worry about anything, you don’t have to hire system administrators. That’s not really why the cloud took off. Spoiler alert: nobody cares that they didn’t have to hire sysadmins anymore. What ended up happening is that developers got access, because the CTO or the CEO straight up told the CIO, “hey, you’re in the way, you need to give this guy access.” More and more, you saw organizations with a self-service workflow where the developer could do what they needed without having to interact with other people at the company. That’s the reason all this happened. If we cut through all the BS, that’s the reason we’re in the world we are today. So what you see is a spectrum of companies with varying degrees of control granted to the CTO organization. When I say CTO, I mean the people building applications for end users; their core product is what the CTO team is working on. By CIO organization, I mean either internal applications, because there are still application developers for internal tools, or systems integration and system deployment, right? Resource management, all of that is what I mean by the CIO organization. Some companies are still very traditional when it comes to how they’re set up.
They have a very traditional CIO org, where technical leadership is driven and executed by the CIO organization, and then a CTO org executing on R&D that may not be code-based, or if it is, it’s not the main driver of revenue. You see those companies being a bit of a laggard here. No surprise: some of your bigger banks, some of your more traditional manufacturing, like auto manufacturers. These are very, very data-driven companies, but their product isn’t necessarily a technical application served up through a web browser or some other machine that consumes it. On the other end of the spectrum, you have telecom, which is almost absurdly pushing the boundaries of what’s possible, and big tech, your hyperscalers, and those companies are almost 100% CTO-driven, to the point where they don’t even have a CIO, right? The CIO got diminished to maybe the security guy. The CIO is basically the big bouncer out front now, because the CTO owns the whole provisioning lifecycle and has a big DevOps team. Everything’s about developer self-service and enablement, because their product, how they make money at the end of the day, is what comes out of these developers’ fingertips, and how productive they can be in pushing the next iteration: how do we make the user experience better, those types of things. So there’s that spectrum of companies. On the more traditional extreme, I think you are seeing, in the current timeframe, people starting to wake up to this. I think the reason it’s taken so long is that the maturity cycle had to happen within Kubernetes, because those companies don’t necessarily need the operational improvements, per se, right? They are okay paying Oracle, right?
They are okay paying for SQL Server, a lot of these other distributed SQL databases, or big analytics engines that have had a managed service in one way or another, whatever that means. It’s not called a managed service, but they give you all the tools for push-button deployments in a UI. It’s a bit of self-service on-prem, and that has existed on the proprietary side for quite some time. What’s happened is that some of these big tech firms committed to open source and were solving a lot of the same problems together, which has now birthed Kubernetes. And on Kubernetes you have some of the biggest and baddest databases of all time, right? You’ve got MySQL, you’ve got Cassandra, you’ve got Postgres, and they’re all getting closer and closer from an integration perspective. That’s not being driven by one or two people, or by Percona, or by any one database company; database companies are kind of the laggards here. It’s companies saying, “Hey, I have 10,000 instances of MySQL. I don’t want to have to worry about deploying. Restarting a server or doing a failover is a nightmare; I have to find that needle in the haystack.” So they’ve pioneered a way to do these database operations, and fundamentally they’re leaning on Kubernetes to do it. Then there are the operational gains and the maturity cycle that happened: people were scripting lifecycle management before operators existed, then the Operator Framework came out, and now we have operators that are robust and repeatable. They work for pretty much everybody, and some of them are near bulletproof. It’s starting to look more and more like what people traditionally think of as a managed service, a database as a service: I don’t have to think about it, I can push buttons and things happen for me. There’s a lot more to a managed service than that, right?
I think the clouds have kind of diluted the term “managed service.” But what you see is that people in traditional IT have a favorable lean toward these managed services, especially if they want to, quote-unquote, outsource, or don’t want to hire the talent; they’ll hire somebody else to do it. There are lots of big companies in the world that have set up these deployments for people; IBM and SAP come to mind. When’s the last time you heard of an SAP deployment costing less than $10 million?

Bart Farrell 39:11

Yeah, that’s a hefty price tag.

Rick Vasquez 39:15

But once it’s done, right, the care and feeding is almost a black box to the company that’s deployed it. There are a few experts involved. That’s the more traditional landscape. In the big tech or open landscape, not only do you not have to know what it is, but it’s also not a black box. If you wanted to know what it is and you wanted to tinker with it, you can, but you have this robust base that you can deploy off of. And I think what people are starting to see is that it’s really attractive to start doing things on a small scale, and if that small scale starts to snowball, you don’t have the oh-shit moment of “What do I do to scale this thing? How do I do this?” That’s because most things you’re deploying in Kubernetes now have this operationally sound foundation that’s based in cloud-native, and cloud-native is all about having a robust, horizontally scalable, independent way to scale the different aspects of your application. From a customer experience perspective, I do think it depends on which company you’re in. But those traditional companies are starting to come into the Kubernetes landscape now because of the maturity, right? They’re seeing that this is a viable way to solve a problem they’ve been having for a while, or a future problem they think they’re gonna have but just don’t want to deal with.

Bart Farrell 40:59

A couple of things there. You mentioned starting small. A lot of folks out there are still in the naysayer camp: this is a bad idea, I don’t want to do it. There could be internal obstacles; they feel they’re gonna have to fight too much. For those people who want a sort of MVP or proof of concept to show to a boss or a team, how would you recommend doing that? Like, “Try this first, and if you don’t like it, then I’ll shut up.” What would you recommend in terms of first steps, if someone wants to say, “All right, I’ll try this out and see how it goes”?

Rick Vasquez 41:36

So this is cool, and it doesn’t cost that much money. Budget-wise, I think you can probably get away with the free tier in one of the clouds. Choose your favorite cloud if you don’t already have something available to you to just install Kubernetes on. It’s really easy to install. Beyond that, go to a public cloud, and you can either install Kubernetes on some elastic compute nodes, or whatever general compute they offer, or just use one of the managed Kubernetes services that are out there. They’re all right; they’re all good. And what you can do is install something, what am I forgetting? Let me find it, because we’ve got to link it as well. Anyway, in essence, it’s a repository for Helm charts, and you can effectively have this running. Bitnami is an old thing, right? So Bitnami got bought by Pivotal, then Pivotal got bought by VMware, and now it’s Tanzu; these are the bigger fish. Anyway, we’ll link it after this so that everybody has access. I would install this. Immediately, you get a bunch of Helm charts that let you deploy any number of applications, and it’s soup to nuts, right? We’re just gonna use something super simple: WordPress. You can click WordPress, and it will do everything, including setting you up with a sidecar container for the database; you can choose MariaDB or MySQL, whichever variant you’re familiar with. And it’s guaranteed to work together, right? You don’t have to worry about setting anything up. And then you can also have a metrics sidecar, so you have full observability into the frequency of requests for all these things.
And then with a couple more presses of a button, you can set up inbound rules into this Kubernetes cluster, so that you can have multiple WordPress sites: you’ve now clicked a button six times, and you have six different WordPress sites, each with its own database. Or if you want to use a shared database container, you can do that too; it gives you that option within the configuration. And quickly, you can start to see, “Oh man, I can deploy many apps that have a persistent backend I don’t necessarily need to know about.” And that’s a cool demo. One thing you can do to take it a step further, if you’re comfortable enough on the command line to get this installed, is install the operator repository, and now you have access to enterprise-grade deployments of so many databases, right? MySQL is one, Cassandra is another, and as long as there is an operator in the operator ecosystem at Red Hat, you can click a button, install it, and configure the CRDs, everything in one easy place. What I would say is play with that for maybe three to four hours, half a day, get familiar with it, and then challenge the guy in your IT department who knows how to do this for your company, and race them. It’s a really powerful demonstration, because in half a day, without having to know the intricacies of how to deploy things, you can get there five hours faster than the guy who has to spin up VMs, provision LUNs on the SAN, and figure out what’s going on there. Or you’ve spent a whole lot of money on hyper-converged and you’ve already got Nutanix, in which case, what are you doing? Just use Nutanix.
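For readers who would rather try Rick’s six-WordPress-sites exercise from the command line than click through the Kubeapps UI, here is a minimal Python sketch that only builds (and prints) the Helm commands; it never executes them. It assumes the Bitnami chart repository URL shown, the `bitnami/wordpress` chart, and that chart’s `mariadb.enabled` and `metrics.enabled` values; the release names and namespace are hypothetical placeholders. Check the keys against the chart’s own `values.yaml` before running anything.

```python
# Dry-run sketch: build Helm CLI argument lists for several independent
# WordPress releases, each with its own bundled MariaDB.
# Assumptions (verify before use): the repo URL, the chart name, and the
# chart values "mariadb.enabled" / "metrics.enabled". Release names and the
# namespace are hypothetical.
import shlex

REPO_NAME = "bitnami"
REPO_URL = "https://charts.bitnami.com/bitnami"  # assumed; Bitnami also publishes OCI charts


def repo_add_cmd() -> list[str]:
    """argv that registers the chart repository with Helm."""
    return ["helm", "repo", "add", REPO_NAME, REPO_URL]


def wordpress_install_cmd(release: str, namespace: str, metrics: bool = True) -> list[str]:
    """argv for one WordPress release with its own MariaDB instance."""
    return [
        "helm", "install", release, f"{REPO_NAME}/wordpress",
        "--namespace", namespace,
        "--create-namespace",
        "--set", "mariadb.enabled=true",                 # per-release database
        "--set", f"metrics.enabled={str(metrics).lower()}",  # metrics exporter sidecar
    ]


def plan_sites(count: int, namespace: str = "wordpress-demo") -> list[list[str]]:
    """The repo-add command followed by `count` install commands."""
    cmds = [repo_add_cmd()]
    for i in range(1, count + 1):
        cmds.append(wordpress_install_cmd(f"wp-site-{i}", namespace))
    return cmds


if __name__ == "__main__":
    for argv in plan_sites(6):
        print(shlex.join(argv))  # paste into a shell with helm pointed at your cluster
```

Printing rather than executing keeps the sketch safe to run anywhere. For the shared-database variant Rick mentions, the chart is assumed to accept `mariadb.enabled=false` together with `externalDatabase.*` settings; confirm the exact keys in the chart documentation.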

Bart Farrell 46:07

Right, this is good, because when we’re talking about this, we’re telling people yes, you can run stateful workloads on Kubernetes. But there are different factors here: what is it, how is it done, and why is it done? As you explained there, we have timing, we have performance, we have productivity, and in some cases even cost savings. Another thing, as you mentioned, is the operator catalog that’s readily available. Are there any other benefits that I haven’t mentioned? Or that you generally try to keep in conversations when you’re debating with different folks about why this is a good idea?

Rick Vasquez 46:41

I think the scariest part about Kubernetes is: what about my stuff that’s not in Kubernetes? That’s the scariest part. And I just think that is, quite frankly, not a good excuse anymore, because there are so many good ways to get ingress and egress traffic to do what you want from your Kubernetes cluster. And Kubernetes federation is coming along the way too. So the next generation of what we’re going to be talking about is, well, how do I span the entire globe with Kubernetes, because that gets pretty tricky. But I feel like the key takeaway here is: don’t be afraid to move what you can into Kubernetes, but don’t move things into Kubernetes just because you’re committed to getting everything in Kubernetes. Some things just never make sense to move. Some things don’t need automated provisioning and care and feeding, and some things are just fine to live on their own as little one-offs that don’t necessarily need orchestration and don’t need to be cloud-native. But they may need to connect and have some type of egress and ingress from your cloud-native applications. And so I would say there’s this whole lift-and-shift fallacy of “you can containerize it and just throw it in Kubernetes.” I know this firsthand from a friend of mine who does managed services for a really big enterprise. They have a lot of legacy applications. They decided, “Hey, we’re gonna offer a managed service, and we want to simplify operations, so let’s just containerize it and throw it in Kubernetes.” It’s not working out great. And the reason it’s not working out great is that you fundamentally have to embrace cloud-native architecture if you’re going to go down the Kubernetes path. But that doesn’t exclude the things that don’t embrace it; you just have to interact with them differently. So Kubernetes isn’t all or nothing, where all of my data has to be in Kubernetes or none of it does.
There are ways to share data from your Kubernetes instances and managed infrastructure out to your non-Kubernetes stuff, so I wouldn’t be scared of that. Oh, and it’s called Kubeapps.

Bart Farrell 49:15

I finally got it. I don’t know how you forget something so simple, but there are too many names out there. Let’s see, a couple of other things. What do you think needs to happen for this to become more commonplace? Because when I started the Data on Kubernetes community last year in September, a lot of what I was hearing was, “Well, Kelsey Hightower says keep everything stateless. Don’t get data involved; it makes it too complex.” We even saw that in June he made his tweet about Crossing the Chasm. You commented on that very well in your article for The New Stack, as well as in the panel we did at KubeCon. We’ve now seen that Kelsey also tweeted again about this. So if that was kind of the issue, it seems to be going away. What are some of the other things that need to change for this to become more commonplace?

Rick Vasquez 50:12

So I think Tyler Dusan, shout-out to him, a wonderful person…

Bart Farrell 50:15

Wonderful. We also had him on the livestream, and we had a pre-recorded talk from him today. He’s amazing.

Rick Vasquez 50:22

Yeah, but he had this wonderful analogy about data being the anchor in Kubernetes, and how much that’s still true.

Bart Farrell 50:34

True?

Rick Vasquez 50:35

Right, I don’t know how much that’s still true, because of this thing that I mentioned earlier: networking has gotten so much better, even compared to when we were talking about it, that you can have data move around really, really quickly behind the scenes with a container. So your data has become significantly more portable, and in much larger quantities, I would say, and so the data portability piece, I think, is mostly getting solved. I think there’s a good approach by some of these NewSQL, distributed SQL databases that are taking horizontal and vertical partitioning to a new realm with the amount of orchestration you can have at both the data layer and the application layer. But I’m still stuck on this: Kubernetes can only meet you halfway, and I think Kelsey Hightower said this very recently. We’re gonna have to have vendors wake up and say, “I need my application to fully vertically integrate with Kubernetes,” right, and into that ecosystem. It’s not enough just to have an operator that lets people use in Kubernetes the application they’re used to using outside of Kubernetes. We have to make applications that are Kubernetes-native, with features developed on top of things that are only available in the Kubernetes ecosystem. And I think then you’ll start to unlock a new level of progress within the community. That’s one of the things we’re trying to work on at Western Digital: how do we get more devices to be the thing that you’re directly interacting with, instead of having to go through a host where those devices are plugged in? It’s a really interesting question to be asking, especially when we’re starting to disaggregate everything, right? We’ve got CXL right around the corner; we’re about to have just a box of CPUs, a box of memory, a box of SSDs, a box of HDDs.
And all that’s just supposed to magically work, just like it was all plugged into one server. So it’s a really exciting future. And I think, to fully leverage what that future has to offer, these applications are going to need to evolve into that arena.

Bart Farrell 53:19

And with that in mind, we’ve talked about this before on several occasions, but operators seem to be one of the most promising solutions at the moment. Do you imagine a post-operator world where another solution might come in? I know you’ve mentioned deeper integrations at some point, but what would that involve?

Rick Vasquez 53:39

I think operators have a maturation lifecycle ahead of them. I don’t know that you’ll have operators for operators. But obviously, you just spoke with somebody who has a way to have essentially a primary and a secondary operator that interface with each other and make operations easier across clusters. And I think the big next frontier is this: we’ve got this thing down within one Kubernetes cluster, which can be datacenter-wide, which can span multiple availability zones. If you’ve got a low-latency link between two availability zones in a data center, there’s no reason why you can’t have a Kubernetes cluster that spans the entire thing, right? I mean, you can have hundreds or thousands of nodes within that and have a fundamentally seamless experience. Now, when we start to split that by geography, we start to introduce things like physics and the speed of light, and there’s intrinsically always going to be latency. Sometimes it just doesn’t make sense to span a cluster across the WAN, going from US East to US West, or, even more extreme, from US East all the way to Asia. We’re going to need a way for federation to happen, not in a “two independent clusters” way of thinking, but in an “I’m all one single deployment, and all of my data belongs to that deployment as well” way. And I think that’s the next real evolution we’re going to experience in the Kubernetes landscape, especially when it comes to workload orchestration and data persistence, because data persistence is the only reason why it makes a difference. If you have one cluster here and one cluster there, the deployment experience is the same for a stateless application: push a button, it’s gonna go and deploy a bunch of stuff, and it’ll take ingress traffic. You can have all your DNS settings and all your service mesh, right? All of that is very, very easily transportable, and none of it needs to talk from region to region.
But then when you have data that’s persisted that application needs to do its job. That’s where things get complicated between regions.
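To put rough numbers on the “physics and the speed of light” point, here is a back-of-the-envelope Python sketch of the minimum round-trip time between two sites. It assumes light in optical fiber travels at roughly 200,000 km/s (about two-thirds of its speed in a vacuum), and the distances are rough illustrative assumptions, not measured network routes. Real paths are longer and add switching delay, so actual round trips are higher.

```python
# Lower bound on round-trip time (RTT) between two sites over fiber.
# Assumption: ~200,000 km/s signal speed in fiber; distances are illustrative.
FIBER_SPEED_KM_PER_S = 200_000


def min_rtt_ms(distance_km: float, speed_km_per_s: float = FIBER_SPEED_KM_PER_S) -> float:
    """Minimum round-trip time in milliseconds over a straight path of distance_km."""
    # There and back (2 * distance), converted from seconds to milliseconds.
    return (2 * distance_km * 1000) / speed_km_per_s


# Rough, assumed distances for illustration only.
examples = [
    ("same metro, ~50 km", 50),
    ("US East to US West, ~4,000 km", 4_000),
    ("US East to East Asia, ~11,000 km", 11_000),
]

for label, km in examples:
    print(f"{label}: >= {min_rtt_ms(km):.1f} ms")
```

Under these assumptions the floors come out to about 0.5 ms, 40 ms, and 110 ms. Any write that has to be synchronously acknowledged by another region pays at least one such round trip, which is why stateless services move between regions easily while persisted data does not.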

Bart Farrell 56:14

Very good, Rick. Anything else you’d like to add before we finish? We know about pie. We know that you don’t play Counter-Strike; not enough time for video games. Are there any other fun facts you’d like to add before we wrap it up?

Rick Vasquez 56:24

Ah, no, I’m just gonna say it here that I hope that the UT Longhorns football team figures out what their issues are.

Bart Farrell 56:35

They’re near and dear to my heart. They’ve got some serious debugging to do. I actually wanted to mention this at the beginning, because we’re a community, but you have a long-standing history with the Longhorns community, if I’m not wrong.

Rick Vasquez 56:46

Yeah, so I have a pretty large Longhorns community. It’s called Surly Horns, if you want to check it out. Don’t judge me. It’s a community. But yeah, they’re a bunch of surly people who are fans, and I think people forget that “fan” stands for “fanatic.” So there are some fanatics there. And it’s hard to be happy about a whole lot when they’re this bad, but it’s okay, right? We’ve got Kubernetes to save the day.

Bart Farrell 57:15

I was gonna say it. What I would like to touch on, though: how long have you been involved in this? You’ve been at it for a long time. And for folks out there, there’s a wonderful word right underneath the name when you go on there, and looking at the threads, oh, wow. I’m not even going to read some of them. “Who can we realistically hire tomorrow if Sark was hypothetically fired?” Anyway, that’s probably the best PG-13 title out there. I will drop a link so that folks can check it out. Interacting with Rick in the DoK community isn’t enough; you definitely should check out Surly Horns.

Rick Vasquez 58:03

Yeah. Or just hit me up on Twitter. That’s kind of got a life of its own.

Bart Farrell 58:10

Right. Yeah. But still, I’ve been intrigued. I do think, if nothing else, you’ve very much seen the power of community. What has it given you, connecting with current students and alumni? How’s that been?

Rick Vasquez 58:25

Yeah, there are a lot of parallels between this personal endeavor and most of my business endeavors. I think the more the world becomes this open, community-based place, the more important it becomes to have a very, very structured identity and focus. Otherwise, you just turn into a loose bag. One of the things I’ve learned from having this community is: how do we keep it focused? How do we keep it honed in? How do we maintain an identity instead of just adopting a different one? Because we’re distinctly different from something like Barstool, and distinctly different from what Reddit or any other message board offers you. And I think that’s true of almost all of the Linux Foundation and Apache Foundation-sponsored projects: they are very, very focused and directed on one thing, and that’s one thing that sets them apart from some of the more random projects you’ll find on the internet that are just floating through time. Those have feature development, but it’s not necessarily super focused and honed in. I do think the CNCF has been one of the main reasons Kubernetes has been at the forefront and has had as much adoption as it has. And I think a key indicator of an extremely healthy community is that you buy into a certain style and just keep going with it.

Bart Farrell 1:00:16

Wow, very, very good. Like I said, whether it’s playing an instrument or some other hobby, there’s always a way we can see the interplay between these different things. That being said, Rick, you are no stranger to this community, and I hope you never will be. If you can see my screen, let me know. You see it? Good. So while you were talking, we had an amazing graphic recorder in the background drawing a few of the many things you touched on today. We’ll be unpacking that and getting some quotes out of there, because there were a lot of nice insights in hearing about your journey. You can find Rick on Twitter: Rick Vasquez.

Rick Vasquez 1:01:03

All right, we’ll see. I’ve got a son, but his name’s not Rick. So maybe if we have another son.

Bart Farrell 1:01:09

Yeah, there’s a chance. All right, we’re not necessarily going for a dynasty. That’s all right. Anyway, Rick, thank you very much. Rick’s always very easy to find and talk to on Slack. If you have any questions about databases, operators, or any of the stuff we talked about today, he’s very friendly and willing to help. I don’t think we have a better community member; we might have some at the same level. But anyway, Rick, you’ve always been great to our community, and we’re very grateful for all your help.

Rick Vasquez 1:01:34

Always love hanging out with Bart. Happy to be back anytime.

Bart Farrell 1:01:37

All right. Take it easy, man. Bye, everybody.
