Postgres on Kubernetes Applied at Scale

DoKC staff Sylvain and Bart interviewed OnGres CEO Álvaro, engaging in an entertaining, technical discussion about a very real use case: running Postgres on Kubernetes, at scale, in production. We covered open-source tools to use, how to properly handle day-2 operations and why you’d be missing out by not running your database on K8s!

This talk was part of DoK Day at KubeCon NA 2021; watch it below. You can access the other talks here.

Bart Farrell 00:00

If you ever have any Postgres difficulties, just say the word Postgres three times, and since I’m allowed to say Postgres one more time, look what happened: I magically have Alvaro appearing. I learned that from Alvaro himself. If we’re talking about wonderful community members, and also about amazing sponsors involved in the Data on Kubernetes Community, OnGres and Alvaro are a driving force behind that. When I got involved in the community, one of the first things I did was start scrolling through the names of the folks who were in there. For Alvaro’s good or bad luck, since his name starts with an A, he was at the top. I introduced myself and basically we hit it off immediately. I started to realize how much Alvaro can offer the community, which is exactly what he’s been doing since then. He’s joining us today. We had indeed planned to do a panel with Alexander Kukushkin, who is also a very good friend of Alvaro in the Postgres community. Unfortunately, Alexander is not able to join us; we hope everything’s going okay with him, and I’m sure we will have him on in another live stream in the future. We had him last year talking about Zalando and Patroni, as his nickname is Mr. Patroni. But anyway, today we’ll be joined exclusively by Alvaro, so we have his full, undivided attention. The first thing we’ve got to get out there is a big congratulations to Alvaro, who recently became a father. That is not the only surprise he is going to be sharing with us today. Alvaro, very nice to have you with us. How are you doing?

Álvaro Hernández 01:14

Thank you so much. I’m really flattered by your introduction. Yes, what can I say?

Bart Farrell 01:20

Jump right into it, jump into your slides, and then Sylvain and I will be ready and waiting with questions to take this conversation further. But go for it, tell us what the surprise is for today.

Álvaro Hernández 01:29

First of all, indeed I became a father, a little bit more than three weeks ago already. It’s been a real roller coaster; it’s an amazing period of time and it’s also really hard. But anyway, I’m here virtually, and that’s actually the only reason I’m not in LA today. So today we’re gonna chat about Postgres on Kubernetes. This is going to be like a live session, mostly unprepared, so you can shoot any question at me and I’ll try to do my best. Moving on, I’d like to introduce a very small surprise today. But first of all, the compulsory slide about myself. So who am I? Alvaro Hernandez, I am the founder and CEO of OnGres. OnGres, by the way, means “On Postgres”, so it should be easy to understand what we do. I’ve been using Postgres for a long time, more than 20 years. It is essentially the only database that I’ve used, and the only reason is that it always fulfills my needs. So why look elsewhere, right? And I always try to look at innovative projects and go and create them. I’ve been working on projects like StackGres, which we’ll probably talk about briefly, because this is Postgres on Kubernetes, and ToroDB, which was an open source project to move data from MongoDB to Postgres and create tables dynamically. And in 2019 I was also named an AWS Data Hero. So I’m here to answer any questions about Postgres, but also databases in general, and whatever I can about cloud and whatnot.

So let’s move on to the small announcement that I have for you today: it is this open source project we’ve been working on for more than two years already, called StackGres. StackGres is a whole stack of components around Postgres to run Postgres on Kubernetes. We are at the 1.0.0 release candidate one (1.0.0 RC1) stage, which means we’re almost there, ready to release the final GA version of StackGres and announce it to the world. So this is a premiere. I’m very happy to be able to say today: there’s StackGres, go to https://stackgres.io, give it a try before we announce it publicly, and please share your feedback with us.

Let me now move on to what StackGres is, so that you understand it. It’s basically an operator and a Postgres platform on Kubernetes for innovation. We want to keep Postgres at the core; you know, databases need to be boring, Postgres is boring, and we want to keep it that way. But around it, we can construct and build innovative features. Let me just point very briefly to four of them. The first one is extensions. Everybody loves Postgres extensions; they are what allows you to do really cool things with Postgres. We have focused so much on extensions that StackGres is the platform with the most extensions available in the world, more than 120, and this number is going to keep growing over time to reach hundreds of extensions. There’s nothing like this.

We have also developed, together with the Envoy community, the first Envoy filter for Postgres, which is a parser that intercepts the Postgres protocol and allows us to do two really cool things: one is to provide additional metrics without having to touch or query Postgres, and the second one is to terminate SSL, so we don’t need to go to Postgres to add SSL. There’s also a fully featured web console, which even has dark mode, so you can do everything either from the command line or the web console. And last but not least, we can collect both Postgres and Patroni logs, because StackGres uses Patroni (thanks, Mr. Patroni, even though you’re not here with us today). They are collected into a single location, and this central location is yet another Postgres database with one extension, Timescale, for time series data, so we can collect a huge amount of logs and process them. You can query logs with SQL or from the web console. I hope that these four very innovative features are at least a reason for you to give it a try and check it out on https://stackgres.io. And last but not least, if you are especially interested in the extensions, I’m going to be giving another talk at KubeCon tomorrow, specifically explaining how we did these Postgres extensions. It’s very interesting, first of all because it’s the only Postgres talk in this KubeCon, and also in the last several KubeCons, and second because we developed a mechanism for dynamically loading extensions into the containers. Extensions are code that you load dynamically, but containers are built from images that are immutable, so we had to do something to solve this problem. I’ll explain it in my talk on Wednesday the 13th at 5:25 pm PST, so please join me if you’re interested. And that’s all I wanted to say for now. I think I’m ready for any questions.

Bart Farrell 06:45

So while you were talking, Alvaro, Sylvain and I were playing the Postgres drinking game, so every time you said the word Postgres we had to drink. Needless to say, that was a very intense few minutes. But anyway, awesome; it’s very, very clear that Alvaro is a strong authority on Postgres. I don’t want to drop another spoiler, because I saw this recently on Twitter, but are you going to be doing a conference in Ibiza in Spain next summer, perhaps?

Álvaro Hernández 07:11

Yes, you’re absolutely right.

Bart Farrell 07:14

We’ve got that to look forward to. Like I said, if you have anything Postgres related, jump into our Slack, and Alvaro is very good at answering questions. That’s exactly what we’re gonna do here. At that event, do you think we’ll be able to talk about Data on Kubernetes?

Álvaro Hernández 07:26

It’s gonna be a must. We organized this conference for the first time in 2019. In 2020 and 2021 we stopped for obvious reasons, but we’re coming back in 2022: go to https://pgibz.io or look for Postgres Ibiza. It’s a conference where you can go to the beach, swim, and then go back for the next talk. It can’t get better than that.

Bart Farrell 07:48

Anyway, I’m already looking at flights and my partner’s quite excited about it too; Sylvain is also invited. But now we want to jump into some more Postgres-related questions, looking at your experience, because this is something we talk about a lot, and something Sylvain mentioned earlier in his interview with Ara from DreamWorks: how do we make running data on Kubernetes look like something that’s not so foreign, not so strange, to folks who have, let’s say, been working with databases for quite some time? I’ve seen this with friends of mine who have done the whole big data stack and have been working with data for a long time; you say, hey, what about data on Kubernetes? They’re like, “no, no, no, no, that’s too hard”. In your experience, what was your first experience with that like? And based on that, what recommendations would you make to folks out there who are starting to think about running databases on Kubernetes? What are some things you would like to share with them so that it’s not such a rocky road when they get started?

Álvaro Hernández 08:40

Okay, so let me briefly tell you the story of how we went to Kubernetes. It’s not because we decided, oh, Kubernetes is the next fancy thing and we need to be on Kubernetes, or for whatever other reason. No, the reason we came to Kubernetes is its API and the automation. If you look at Postgres, it’s something that requires a whole stack of components around it. It’s not that you can just install Postgres and deploy it to production; you can deploy it on your laptop, but you will not run it like that in production. You typically want to add connection pooling, monitoring, backups, log management and high availability, and all these components don’t come with Postgres. Postgres is kind of like the Linux kernel: you need a distribution to run it in production. So where do you take all these components from in the ecosystem? How do you make them all work together? It’s a difficult problem, because it depends on the environment. If you have it working with a load balancer, or you have a virtual IP, or you have DNS, then the entry point is going to be different. And then if you look at the storage, whether it’s a NAS or a SAN or a cloud-native volume, everything is going to be different. So we were repeating Ansible, Chef, Puppet and all these things; every customer’s environment was different in its own way. We asked ourselves: is there any way we could create a single deployable package for all environments? And the answer was that Kubernetes is the only answer to this problem, thanks to its API. So there’s a very good reason to run databases on Kubernetes: it standardizes the environment underneath, which otherwise can be too complicated, so you’re really saving a lot of time and effort by working on Kubernetes. But also, contrary to some beliefs, your availability, which is a critical concern for many, can be increased on Kubernetes. Some people fear that things on Kubernetes vanish away. Well, they don’t. Running containers are basically wrappers around processes; if your processes run well, your containers are going to still be there, they just don’t vanish, right? There are certain situations in which the Kubernetes orchestrator may reschedule them, but it’s easy to protect against those situations. And because of the self-healing characteristics of Kubernetes and StatefulSets, for example, it’s actually trivial on Kubernetes that if a node fails, the pod will be rescheduled somewhere else. Then tools like Patroni re-adjust the cluster so that there will be a failover if required, the new node will connect to the existing ones, and the cluster will be healed. This is not so easy in all environments. In most of the environments that we see, maybe they have high availability, but they don’t have automatic re-healing of nodes; so if a node dies, you’re left with one node fewer. This doesn’t happen on Kubernetes, so it can actually lead to higher availability. So I think it’s a really good thing in general.
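As a concrete illustration of protecting against those rescheduling situations, here is a minimal sketch of a PodDisruptionBudget, one common guardrail against voluntary evictions such as node drains. The resource kind is standard Kubernetes; the name and labels are illustrative assumptions, not taken from any particular operator.

```yaml
# A minimal PodDisruptionBudget: it caps how many Postgres pods a voluntary
# disruption (for example a node drain) may evict at once.
# The name and labels below are illustrative assumptions, not from any operator.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-postgres-pdb
spec:
  maxUnavailable: 1        # allow at most one Postgres pod to be evicted at a time
  selector:
    matchLabels:
      app: my-postgres     # must match the labels on the Postgres pods
```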

Bart Farrell 11:41

Very, very good. I think Sylvain is already thinking about stuff he wants to write based on your answer, but I’ll let him take the next question. So Sylvain if you want to jump in, push the button.

Sylvain Kalache 11:49

So maybe something interesting you could go over, Alvaro, is the history of stateful workloads on Kubernetes. It hasn’t always been an easy thing to do. Could you go over the timeline of how different concepts came and disappeared?

Álvaro Hernández 12:12

This is actually one of the areas where people show concern about Kubernetes because of how it was in the past. We need to understand that Kubernetes has evolved fast and well, but it’s been a changing landscape, so what was true four or five years ago may not be true today; most likely it isn’t, right? So what do you do when you want to run a stateful workload like a database on Kubernetes? First of all, you could use the storage within the container, the ephemeral storage, but that’s probably a pretty bad idea: first, it is ephemeral, its lifetime is tied to the container, and second, it is very slow, because it goes through some, let’s call it, emulation. Anyone who has tried this will have gotten bad results. So on Kubernetes there was this need to have storage, and because ephemeral container storage wasn’t appropriate, there was a first concept called PetSets that appeared in Kubernetes. It was good because it laid the foundation for the next one, but by itself it had significant problems in many areas. PetSets actually gave a bit of a bad name to running stateful workloads in Kubernetes. But things have moved on, and all the problems that PetSets had have been solved in StatefulSets, which are kind of the next generation of storage management in Kubernetes. And that is not the only option; the key in Kubernetes right now is that you have the ability to create Persistent Volumes, which are managed by handles called Persistent Volume Claims. This allows an abstraction of the storage: the storage can be redundant and external to the container, it can be cloud volumes, for example, or container-attached storage that is software defined, which can also be redundant enough. Because it is external to the container, it is not tied to the lifetime of the container, and there’s no performance hit because it’s just mounted as a filesystem inside the container. So as of today, most operators for data on Kubernetes use StatefulSets, a higher-level abstraction that manages these Persistent Volumes and Claims for you and is able to understand that you’re potentially running a cluster of several instances. But there’s even the option of just managing the Persistent Volumes and Persistent Volume Claims yourself; I know a few operators that actually do this, which means they don’t use StatefulSets, but they have the equivalent notion of a StatefulSet implemented as part of the operator. Both approaches are potentially good and correct. The key is that we moved on from PetSets to Persistent Volumes, which are storage external to the containers. And this actually brings enough reliability and tolerance for what you want to do running data on Kubernetes, essentially.
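As a minimal sketch of what this looks like in practice, the fragment below shows a StatefulSet with a volumeClaimTemplate, so each replica gets its own PersistentVolumeClaim whose lifetime is independent of the pod. The names, image and storage size are illustrative assumptions, not taken from any particular operator.

```yaml
# Minimal StatefulSet sketch: each replica gets its own PersistentVolumeClaim,
# which survives pod rescheduling. Names, image and sizes are illustrative.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-postgres
spec:
  serviceName: my-postgres
  replicas: 2
  selector:
    matchLabels:
      app: my-postgres
  template:
    metadata:
      labels:
        app: my-postgres
    spec:
      containers:
        - name: postgres
          image: postgres:14        # illustrative; a real operator ships its own image
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:             # one PVC per replica, managed by the StatefulSet
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```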

Sylvain Kalache 15:15

Historically it was a risky thing, and PetSets were part of the reason why running stateful workloads in Kubernetes wasn’t really a thing. But that has really shifted. We can actually see that: we surveyed 500 organizations, asking them about stateful workloads in Kubernetes, and 90% of the respondents believe that Kubernetes is ready to run stateful workloads in production. So I think this belief that Kubernetes is only for stateless workloads is kind of gone, or at least going away, according to that. Like I said, we got to a point where Kubernetes is like the basics for us. How do you go about managing day-2 operations once you have set up your stateful workloads on Kubernetes, and what tools are available to users?

Álvaro Hernández 16:19

Day-2 operations are, in my opinion, one of the other reasons to bring data onto Kubernetes. Speaking specifically about Postgres, it is a database that is actually complex to run in production; it looks simple to deploy and operate until you reach a certain volume of operations. Then you start realizing that the database requires some maintenance operations: it requires vacuums to clean the bloat, it requires reindexing because your indexes become bloated and start responding slowly, you need some kind of tuning, you obviously need backups, you sometimes need a repack operation, which removes this bloat, and many others. These operations are something that traditionally needs to be run by humans, by DBAs. Even on cloud-managed environments these operations are still present, so it’s basically on you. In reality, even though they require a lot of domain knowledge and expertise, the execution of those operations is potentially a not-so-complicated runbook, which means there was an opportunity for automating them. And again, thanks to the Kubernetes API and all the automation that is built in, this is possible to do from an operator perspective. Operators can leverage the Kubernetes API and bring this domain expertise of running day-2 operations, what we would normally call DBA maintenance tasks, as fully automated operations. Then you’re not only saving a lot of time and effort by not having to run those manually, you’re also running them error free, because they’re already programmed and correct, right? I don’t want to piggyback too much on the project I just mentioned before, StackGres, but we have done this: speaking from my experience with Postgres, we have automated vacuum, reindex, repack, major and minor version upgrades, and even benchmarks. You can just specify a simple YAML file and run a benchmark and then collect the results as part of the status of the CRD. These are just examples of how you can leverage automation in Kubernetes to simplify running your database, something that’s really hard to do with custom software developed for running databases outside of Kubernetes. I haven’t seen it done before.
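As a hedged sketch of the pattern being described, a declarative day-2 operation might be expressed as a custom resource along these lines. The kind and field names below are hypothetical, chosen only to convey the idea; they are not the actual StackGres API.

```yaml
# Hypothetical sketch of a declarative day-2 operation expressed as a custom
# resource. Kind and field names are illustrative only; they are not the
# actual StackGres API.
apiVersion: example.io/v1
kind: DatabaseOperation
metadata:
  name: nightly-vacuum
spec:
  clusterRef: my-postgres     # target cluster, referenced by name
  operation: vacuum           # could equally be reindex, repack, upgrade or benchmark
  schedule: "0 3 * * *"       # optional: run every night at 03:00
# The operator's controller watches resources of this kind, runs the maintenance
# task against the referenced cluster, and reports the outcome in .status.
```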

Sylvain Kalache 18:57

We actually saw this in the report: the more stateful workloads organizations run on Kubernetes, the bigger the productivity boost they get. Organizations that run more than 75% of their production workloads on Kubernetes report being twice as productive as before. Operators enabled that, but on the flip side of the coin, for organizations not running stateful workloads on Kubernetes, operators were mentioned as a top barrier. If we dig into that, the two main issues are the varying levels of quality and also the lack of standards. So what do you have to say about this? How do you think we could have better standards for operators?

Álvaro Hernández 19:53

It’s a hard problem. Right now in the Postgres world there are close to 10 operators, maybe I’m even missing some; not all of them are very well known. I can try to recap mentally, but it’s around 10 operators for sure; let me know if you want me to try to list all of them. So which one is best, or which one is a better fit for my use case? It is a difficult question to answer. Speaking of Postgres specifically, this is a well-known problem: the Postgres ecosystem has created many tools for the same tasks, so it is just by experience that people learn which tools are best for which use cases. I don’t know how we can better assess the quality or maturity of operators. There are the famous five levels of maturity and capabilities developed by Red Hat, but to be honest, everybody claims that they achieve mostly all of them, right, without further ado. So I don’t think that’s a very reliable mechanism to assess quality. But I have some thoughts about standardization, the second part of your question. So what is an operator? An operator is a development pattern for creating software on Kubernetes, which essentially consists of two parts. One is a CRD, or custom resource definition, which is essentially a custom object you create on Kubernetes. The other is a controller, a daemon or server that runs inside Kubernetes, talks to the Kubernetes API and reacts to changes on these custom resources. On the controller part there could be many implementations, but let’s look at the CRD part. This custom resource definition, again, what is it? It is a way to convey your domain-level expertise into a custom object which can have the properties that you want. If they’re properly designed, these CRDs can express very high-level properties of what you want to achieve, so that the user doesn’t need to be an expert on the subject matter, and it is declarative. Let me give you an example. This is something we have also tried to achieve in StackGres: if you look at the CRDs to create a cluster, they don’t ask you for any specific Postgres knowledge or expertise. You specify the number of instances and the Postgres version that you want instantiated. If you want to reference a Postgres configuration, you provide its name; then obviously the disk space, the kind of instance that you want, and whether you want monitoring or not. But there’s nothing super low level, like Patroni configuration parameters to make replication work. The goal is to hide all this complexity. Okay, now every operator has created their own CRDs with their own shape, some of them more high level, some of them less. How can we collaborate, how can we standardize? We could standardize on the CRDs. Instead of seeing these CRDs as so tightly coupled with their own controllers as part of each operator, we could look at them as if they were specs, specifications. We could gather together, create something like a committee, and say: okay, a user wants to create a Postgres cluster; every Postgres operator has a Postgres cluster CRD, with different names, but everybody has a cluster. So why don’t we sit together and create specifications, with many optional fields if needed, to say: what is the cluster expectation of any user? What does the user need to provide in order to create a cluster?
What is the expectation? Then, with some compulsory fields and some optional fields, potentially even some proprietary extensions, that’s a way we could standardize: by standardizing on the CRDs. Let me give you another example. Let’s look at object storage, or cloud storage: S3, or Google Cloud Storage, or others. There are many operators that store data on these buckets, and in each one of them, if you want to store data on a bucket, you need to provide a reference to it. So you’ll probably provide the endpoint, the access credentials, the path within the backend, and maybe a few other options. Why don’t we standardize on those? I’ve already seen so many implementations of really the same thing, a reference to an S3 bucket. So why don’t we as a community sit down to create committees, and well, committees, I don’t want to make it very formal, but you know, conceptually, and standardize on the CRDs? Then anyone can provide their own implementation, and you can choose to implement a feature or not, that’s up to you. But then users would be able to more easily port and move from one operator to another.
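To make the idea concrete, a hypothetical standardized object-storage reference, were such a spec ever agreed on, might look something like this. None of these fields come from an existing standard; they are purely illustrative.

```yaml
# Hypothetical standardized object-storage reference, as a sketch of the idea.
# None of these fields come from an existing standard; they are purely illustrative.
apiVersion: example.io/v1
kind: ObjectStorage
metadata:
  name: backups-bucket
spec:
  type: s3                         # or gcs, azure-blob, any s3-compatible store
  endpoint: https://s3.us-east-1.amazonaws.com
  bucket: my-postgres-backups
  path: /cluster-1/                # prefix within the bucket
  credentialsSecretRef:
    name: backups-bucket-credentials   # credentials kept in a regular Kubernetes Secret
# Any operator could accept a reference to an object like this, making it easier
# to move backup configuration from one operator to another.
```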

Sylvain Kalache 25:05

Thanks, Alvaro, that was a very detailed answer and I really appreciate your thoughts on that. Actually, if some of you in the audience have an opinion about how to make operators better or come up with a standard, we, the Data on Kubernetes Community, will organize working groups to discuss this type of question, try to come up with ideas and suggestions, and then involve the broader Kubernetes community. I will be sharing more about this shortly, but join our Slack, and hopefully we can see you there. So, taking a step back, as you said, Postgres is not fancy, but that’s also why it’s great: it just works, right? But Kubernetes also offers different technical paradigms that were not accessible to us before. So what are your thoughts on what’s next for databases when they’re all hosted on Kubernetes? Are there new paradigms or new possibilities coming up that some company or some open source project is already offering, or that are still to come?

Álvaro Hernández 26:34

That’s a very interesting question. There are some databases that claim to be cloud native, and for the most part I agree with that definition. The main difference between these new databases and more classical databases is essentially the replication architecture. Postgres works in a primary-replicas architecture: you have a single read-write node and potentially multiple read-only nodes, whereas other databases may offer an architecture where you have multiple write nodes. That’s absolutely fine, and it’s something always desirable in principle, but it’s not that much needed in the case of Postgres. So it’s an advantage, but I wouldn’t say it is a reason to rewrite Postgres from scratch and, you know, build it in this cloud native way. Actually, building a database is quite a complicated thing. There’s a quote by MongoDB’s CEO, shooting himself in the foot a little bit some years ago, saying that taking a database to a point where it’s mature enough to run in production takes a decade; and they were like five or six years old at the time, so, okay, let’s wait four more years. But it’s a good point; someone at Oracle, I don’t know who exactly, said the same thing, that a database takes 10 years to mature, and I don’t significantly disagree with this. So is it justifiable right now to say: okay, we love Postgres, but let’s rewrite Postgres in a cloud native way? Okay, then it’s like 10 years until people are going to run this in production. So instead of thinking of a whole rewrite or redoing things, I’m more prone to look at databases like Postgres and, instead of reinventing them completely, look at what they contain. I consider Postgres a little bit of a monolith; I don’t say this in a bad way, but it does a lot of things. Now, if you look at the Kubernetes world, at the CNCF ecosystem, there are many other components that could take over and replace some of the functionality that Postgres provides. For example, I just mentioned it before: the Envoy proxy has a Postgres filter that we helped develop, and this Postgres filter is able to terminate Postgres SSL. So instead of letting Postgres terminate SSL, you can remove all this SSL processing from Postgres and do it on the Envoy side. The advantage is that Envoy has APIs for managing configuration and already connects to things like cert-manager, so you can use a tool that is normal and usual for Kubernetes users to manage the certificates, like cert-manager or others. Then A) you don’t need to learn Postgres specifics, and B) you relieve Postgres of this workload so it can focus on other work, or you can use the CPU for other stuff. This is just one example. If we start looking, we could take functionality out of Postgres and replace it with native components of the CNCF ecosystem. Another example is log management, which is something we have accomplished by using Fluentbit and Fluentd. Postgres supports syslog, CSV logging, and a lot of other logging mechanisms; all of those, for the most part, can be replaced by Fluentd and Fluentbit. Monitoring nowadays is more or less integrated with Prometheus, but there’s still a lot of advanced work to be done. There’s no integration with OpenTelemetry, and the same goes for tracing: there is actually some tracing in Postgres, but not the way we know it in the Kubernetes world.
And this is also functionality that could either be kept built into Postgres or stripped out of Postgres into external components. That’s how I would reshape databases for working on Kubernetes nowadays.
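As a hedged sketch of how such an Envoy listener fragment can look, the following uses the upstream postgres_proxy network filter. The field names reflect my reading of that filter and may differ across Envoy versions; the stat prefixes and cluster name are illustrative assumptions.

```yaml
# Sketch of an Envoy listener filter chain using the postgres_proxy network filter.
# Field names reflect my reading of the filter and may differ across Envoy versions;
# the stat prefixes and cluster name are illustrative assumptions.
filter_chains:
  - filters:
      - name: envoy.filters.network.postgres_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.postgres_proxy.v3alpha.PostgresProxy
          stat_prefix: postgres        # protocol-level metrics without querying Postgres
          enable_sql_parsing: true     # parse statements to derive extra metrics
          terminate_ssl: true          # offload SSL so Postgres sees plain traffic
      - name: envoy.filters.network.tcp_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
          stat_prefix: postgres_tcp
          cluster: postgres_backend    # upstream Postgres endpoint(s)
```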

Bart Farrell 30:39

And we’re almost out of time, but just one last question, the same one Sylvain asked earlier to the speaker we had from DreamWorks: someone gives you, Alvaro, a magic wand to make the process of running stateful workloads on Kubernetes easier. What are you going to do with that wand? If you’ve got a wish to make, what would you like to change? It could be something in a technical sense, it could be something in a mentality sense; what needs to happen in order to take the next step forward?

Álvaro Hernández 31:06

I talked before about the CRDs: really make the CRDs super high level and super easy to use, so that they don’t require any low-level knowledge. This doesn’t require any magic, just a little bit of thoughtful work by the people who are creating the operators to make them really easy to use, so that they don’t require expertise in the database you’re running. If you’re a Kubernetes administrator, even without a CKA, you should be able to deploy a Postgres cluster, a MySQL cluster, a Cassandra cluster. I think that should be enough.

Bart Farrell 31:42

So let’s reduce the fear factor. As we get closer to Halloween, let’s get brave and not worry about that. You’ve also said on previous occasions that CRDs are your favorite feature, so it’s no surprise that you’ve stuck with that. And that’s a good thing; not everybody likes to use the word favorite, because then debates start getting heated, but the really good point in that answer is: let’s make it more high level, more accessible, so that folks who have the experience and the know-how, who have already overcome other technical challenges, can get in there. So I think that’s going to be a focal point in our community: let’s drive more attention to CRDs, so that for folks who are maybe a little bit behind or just catching on to this trend of running stateful workloads on Kubernetes, it can be simpler and they can have a better chance of success.