The CFP is open for DoK Day at KubeCon NA 2024 through July 14.

Submit Now!

Production Postgres Made Easy on Kubernetes

It’s one thing to get a database up and running on Kubernetes; it’s another to run it in production.

There are many “Day 2” operations to manage when running any database in production, including high availability, backups, monitoring, security, and disaster recovery. Kubernetes operators can take care of that and also manage such as advanced features like self-healing, simple scaling, secure credential management, and more.

In this talk, Jonathan S. Katz covers important concepts for running databases in production. Featuring PGO, the open source Postgres Operator, making the management of production Postgres clusters on Kubernetes the GitOps way easy.

Jonathan explains how using simple manifests, PGO deploys HA Postgres clusters with self-healing capabilities that apps can securely connect to. how PGO implements these features, including a demo on how to implement key “Day 2” features for managing PostgreSQL on Kubernetes.

Bart Farrell  00:00

Welcome to live stream number 112 of the Data on Kubernetes community. Before we get started, as usual, check us out on Twitter, LinkedIn, subscribe to our YouTube channel, jump in on Slack, we have plenty of things going on. Jonathan, welcome to the Data on Kubernetes Community. Can you just give us a little bit about your experience? How did you get started with Postgres and Kubernetes?


Jonathan Katz  01:48

I started as an application developer. Back when I was hacking in high school and trying to build websites. I started building websites on it and I fell in love with the features. Instead of just trying to do things through an ORM(Object-relational mapping), I dived into a database layer, like writing raw SQL queries and using Postgres’ full support for data types and indexes. Postgres ended up becoming very popular for a lot of different reasons. And that’s what got me into Kubernetes. I joined Crunchy Data and they had already been running Postgres on Kubernetes for several years. But what I’ve been looking at was not so much about running a database and containers; it was, how do we run Postgres in a production environment on Kubernetes? But I always find myself having to manage the database at the end of the day. So how can you make it easy to do that in a production environment?


I’ve been active in the Postgres community for well over a decade. I’ve been in the startup world for a very long time and I joined Crunchy Data because I just wanted to work on Postgres full-time. We love contributing to open-source projects, including the Postgres upstream. We have a certified open-source Postgres distribution. We do stuff with Postgres on Kubernetes. And we also have our own fully managed cloud service called Crunchy Bridge. Kubernetes continues to evolve very rapidly in a lot of different ways. On the Postgres side, we’ve been very excited about the Kube 1.22 release. The support for cgroup version two is being able to support swap which is very important for running a Postgres database within that environment. As Kubernetes evolves, we need to make sure we can tailor Postgres to run within the production environment. And the reason production is so important to us is because that’s ultimately where our data is going to live. 

I can do all sorts of things in my development environment and test things out. But if I can’t get it to run in production, then what’s the point? The focus from the product and engineering side has been, what can you do to not only make it possible to run Postgres in production on Kubernetes, but also make it easy to manage a database? In that way, you just have to worry about building your application. So this is where the Crunchy Postgres for Kubernetes comes from. We read this as the Postgres operator, PGO for short. PGO has been open-source for a very long time and has been battle-tested in production environments. So what’s important when you’re trying to deploy anything in Kubernetes, is that you need to be decorative. This is why I love Kubernetes because I’m going to define what I want to deploy, what my topology looks like, and then it should just appear. We want the same experience for our databases, whether it is Postgres or not. This was one of the first big design decisions of PGO, to make it fully decorative. I want high availability, with my backups stored somewhere and with my monitoring turned on. 


GitOps is another big advantage of running things in Kubernetes. You can successfully version your manifest and be able to move forward and back in time. With databases, it can be a little bit trickier, particularly if you’re doing destructive things like a point-in-time recovery, but we can get as close to GitOps as possible with a lot of the other operations involved. Just to get the long story short, PGO works with all the tools. Let’s get a production database up and running. I’m going to go for a few architectural concepts that are important for PGO. And I would say for running anything where you are backed by a key stateful service. The first is this notion of being declarative. This is why Kubernetes is so popular because I build up my YAML manifest, and I want my HA(High-availability) and DR(Disaster recovery). I want my monitoring and I put it up to my API on my custom resource. And PGO just takes care of it or an operator takes care of it or a controller in Kubernetes takes care of it and deploys it as you want. We are going to see that we are going to leverage Kube Secrets so that all your connections are in there and you do not even need to look inside them at all. You just need to pop them in, and everything just works through the application. So we are up and running. 


If you delete a deployment or a service, they do not automatically come back up. That is when an operator helps. The great thing about the operator pattern, in general, is that it can detect those changes and automatically heal them. But that is only one side of the equation. An operator can certainly help maintain the state of how you want your Kubernetes objects deployed. But then there is availability within your stateful services themselves, including Postgres. Some key aspects of PGO are high availability uses a distributed consensus. We leveraged the Kubernetes API’s for that; and disaster recovery, so backups can run independently of the operator, and the same goes with monitors. 


There are many reasons why you might want to update. The first reason is CVs are coming out all the time and your base image needs some updating. You need to update your Postgres image or backup image. You might need to resize your Postgres cluster, you need to add more memory or add more CPU resources, and all those are destructive operations. So you want to make sure that updating has minimal impact on your cluster. Then, being able to do continuous delivery, ensuring that you can build the cluster in your various environments, and having the declarative workflow certainly helps with that. The next point here is infrastructure agnostic, something you get inherently with Kubernetes. You can start your backups in multiple locations. We are almost to the demo. 


You should think about when you’re in a production environment, and how to look at it with high availability. I think there is a lot of focus on this, particularly in Kubernetes, is that you want to make sure that you are always available. But you also want to make sure that you are not sacrificing things around data integrity, and this is where disaster recovery is really important. Make sure you have backups, and you can restore quickly from them, and be able to put them flexibly anywhere. Then monitoring because, you can anticipate problems before they occur, such as running out of disk or a runaway query. 


Other things that an operator can certainly help with, not just PGO are being able to provide security out of the box, running with unprivileged containers, and enabling TLS default. In the case of PGO, it brings its own PKI (Public Key Infrastructure), having performance, you being able to tune your cluster, easily being able to clone data, and connection pooling as you continue to scale up. I am in my Kubernetes cluster and I’m using a local time cluster. I went ahead and deployed the Postgres operator to this. You can see it is up and running. I have already created a few manifests to deploy everything. We are going to go through the script which I set up together. We are first going to deploy our application and our monitoring stack. Today, we are going to deploy Keycloak and we have Keycloak backed by a Postgres database.


Keycloak is an identity management application and we have a Postgres database connected to it. I am going to go through the manifest here and explain some of the different things. First, you can see we are loading in our Postgres image and we are specifying high availability. In this case, we can run two replicas and I have high availability because of using the DCS that Kubernetes provides. I set up two different backup repositories. One is using local Kubernetes storage, the other is using my S3 bucket. I have deployed a monitoring sidecar, so we can start exporting metrics. This is the favorite part of the entire demo for me, and it is binding our application to the database. All of the credentials exist within a secret. I do not need to know the user credentials and I can just refer to the secret to my application manifest. I am going to take a manual backup to S3, which is going to be helpful later. You can schedule backups on your own and have them occur using KubeCon jobs and I strongly recommend that you set that up anyway. As I am using my local kind cluster, I am just going to port forward it. 


So here is Keycloak and this is the welcome screen. I am going to log in. I am on the main screen for Keycloak and let us see if I can save some data. Data got saved. Our application is up and running and we are against a highly available Postgres database. Let me show you what the topology looks like. We can see our two database instances here. We want a backup job completed when one is still running the one to S3, which did take a little bit of time, but we have our application up and running. There are multiple ways to connect to the database. There is a secret that contains the database credentials, which you saw that I bound to the application. You can take the database credentials and connect directly or you can also execute the pod. We can see our data was saved to the database. I strongly recommend reading the documentation and going through the tutorial because you will learn lots of cool things. We have a cluster here, it is up and running and we connected our application to it. But in production, chaos can occur and it can occur in a lot of different ways. 


One of the most important elements in our cluster is a service. Why is the service important? A service is a stable endpoint. It lets you always connect to whatever it may be: your application, your database, etc, through this endpoint. An important one for us is this one called keycloakdb-primary, which is a stable endpoint to whichever Postgres instance is your primary or the one that accepts reads, and writes. It would be a bad thing if that service gets deleted. Under normal circumstances, if I delete a service, it is gone and you do not see it ever again. So I am going to delete that service and try to create havoc. But we will see that it comes back up almost instantly, it is like the service was never deleted. And that’s because you know, PGO, the operator detected that the service was deleted, and recreated it. The fundamental power of the operator pattern is that it is constantly watching what is gone in the environment. It appears that I can still save data to the database at least through the UI. I am going to execute it and we see that our data is still there. 


Let us add another instance, because two Postgres instances give us high availability, but three is better. Three gives us a little bit more resiliency against one node or two nodes. I am going to go into our manifest, and change replicas from two to three. I am going to do a “kubectl apply” and have the changes roll out. We can see the new instance coming up. We have a primary, a replica, and this new instance initializing, which is a part of the initialization process; It needs to get a copy of the data. The way PGO does this is that it uses one of the backups to speedily recreate that information, and then it attaches it to the current primary to get up to date, and it becomes a streaming replica. We can see that we have everything up and we have three instances. 


Let us get into a software update. Let us say that our application is widely successful and we need to tune our memory a little bit. Let us give our Postgres instances a little bit more memory and then subsequently we could do some adjustments to the Postgres configuration itself. We are going to send a memory request for instances and we want to have a one-gigabyte memory request. Depending on your production application, that might be way too little memory for your database. When we do apply this change, this is a destructive operation, because we are going to the stateful set and saying, we need to adjust the memories; which means that we have to recreate our pods. PGO does this in a rolling fashion; it first takes a replica, takes it down, and makes sure that it does a safe shutdown. It then applies the change, makes sure it’s healthy, and then it goes to the next pod. In this case, we are handling the second replica and it is the same process. When we know it’s ready and able to accept connections, at this point, we are going to go after the primary and we are going to do a controlled switchover. We promote the most healthy replica, which becomes the new primary. All the connections get rerouted there. We take down the old primary and apply the change to it. Sometimes with Keycloak, there might be some connection retry logic that will get it into a loop. Let us make sure that we are still able to save. We are indeed going to a new database now but it looks like the value is saved. If we execute it, the data is still there. 


There have certainly been a lot of advancements in Kubernetes over the past few years to make it easier to run multi-cluster Kubernetes environments. One of the critical linchpins of that is your Postgres database, and making sure that we can get the data and the information between the two, and set us up for promotion or failover scenarios. We are going to create a standby cluster in a different data center. We are going to be leveraging S3 as our broker here. The reason we use the external third party is that it gives us a broker to be able to get the information between both systems. Recall that our second backup repository was the S3 repository, and we do have a backup there. There is information there to create the standby cluster. Let us go ahead and do that. 


Our standby cluster is going to look very similar to our primary cluster, and that’s by design. But the one difference is that we have a session called standby, it says enable true and it is pointing at the backup repository. We are using bootstrapping and continuously reading some of the data. For application, we are deploying Keycloak in that second data center. In this case, I am starting it at zero and we will see us bring it up. Now I am handweaving one thing on the multi Kubernetes cluster aspect, which is to save some kubectl context switching. I deployed this cluster to a namespace that I called data center 2(DC2). 


Let us see how we are doing in terms of bringing up our Postgres cluster. I am going to do it in a risky way by running a query against the database. We are on the DC2 database now, and it is up and running. We can see our database pods, and one of our backup repository pods is already there. How do we know this is a standby cluster? The command in Postgres is  “pg_is_in_recovery()”. If it returns true that means it is in replay mode, or it is accepting changes from a different database. In this case, it returned true which means it is a standby cluster. I am going to run this command just to synchronize the value of the Keycloak secret between our primary cluster and our standby cluster. You could use an operator that synchronizes secrets between two different data centers. In our standby data center, we can see that we are using the backup. We are leveraging the backup from the primary data center, which is in S3. We have our standby cluster set up and it is ready to be promoted. Let us go into a full “day 2” operation and promote that. 


We can explicitly shut down data center one or our Postgres cluster. First, we are going to shut down all the replicas. We do a safe shutdown to make sure any active transactions get flush to the disk or get pushed up to the backup repository. We then terminate the primary and after that, we will be safe to promote the backup data center. Let us promote our standby cluster. We are going to bring Keycloak up as well since we are promoting the standby. When we promote the standby, it is almost instantaneous as the database is already up and running. But Keycloak is going to take a little bit to boot up, some of that is based on the application. I am going to set a new port forward to Keycloak, and I am going to do it on a different port. I was running on 5000 before and now I am going to go to 5001, to be clear we are accessing Keycloak from a different data center. 


We are back in Keycloak. It is a new session because we restarted our pods. We still have our data that is from the primary data center. This is Keycloak connected previously to our standby data center. We can see that the data did exist in the DC2 database. I am going to make sure we get a backup saved to the local storage.


Point-in-time recovery allows you to roll back your database to a specific point in time. The classic example I like to use is dropping the user’s table. If you are ever doing work in your production database, always do it in a transaction typing begin first and then go ahead and make your changes. If you do not do that, you might accidentally drop your user table. PGO offers two ways of doing a point-in-time recovery. The first preferred method is you create a new Postgres cluster, and you do the point-in-time recovery there. The second method is a little bit riskier, because if you mess up the point-in-time recovery, you may not be able to go back to the point in time that you want it to. I am going to modify our manifest to set the point-in-time recovery. We are using the backup from the local backup repository within this new Kubernetes cluster that we set up because it will be a little bit faster to perform than me going out to S3. The time will be in UTC because that is what our container time is set to. 


Let us prepare our in place point-in-time recovery. We declare, this is how we want a point-in-time recovery to look like and we say it is enabled. To truly carry it out, you have to set an annotation. We apply the annotation to the manifest and the point-in-time recovery begins. Let us see it occurring. We can see our restore job is running. We are going to scale Keycloak back up, which means I am going to restart a port forward. The reason why I am spending a lot of time on point-in-time recovery is that when you are running databases in Kubernetes, the storage layer in Kubernetes is still being developed, and in a distributed system particularly when using network storage, a lot can go wrong. 


When people emphasize high availability, I always emphasize backups. Make sure you backup your data. Do not have one backup, have two backups. With PGO, I just need to put in my various times and move forward in the timeline, and the operator orchestrates all of that. This is one of the understated powers of using an operator to manage your databases is that you can encapsulate some of the very complex operations and simplify them to make them more accessible. Let me do one more quick thing with our cluster and set up a connection proxy. Often you set the connection proxy for dealing with more throughput of your applications where you have a lot of connections, and you want to scale your connection load appropriately in your Postgres database. We use an open-source connection pooler called PgBouncer. We can see that it is already up and running. As soon as you add the PgBouncer, the user secret gets updated to have the connection credentials to it. You do not need to pass the credentials around, you could just plug them right into your manifest. 


If we look at the stat activity, there is a special administrative PgBouncer user, but we also have Keycloak connecting through the JDBC(Java Database Connectivity) Driver through the PgBouncer user. This allows us to tie our application back to our database through the proxy. The Kubernetes community does a wonderful job of making monitoring and visualization accessible. 


With PGO we leverage some of the wonderful open-source monitoring tools available. Prometheus, Grafana, and Alertmanager provide us with some turnkey Postgres metrics. We can see our two clusters. One is our primary data center, which is down, but we have a HA(High-Availability) cluster in the second data center. We can go in and look at the details of it. Some of the things that a Postgres DBA expects are what is your connection saturation? What is the size of your database? What is the workload on the database? Is there any replication lag or, how big is your write-ahead log directory growing? These are the things that you look for before troubleshooting. You get details about the pod. You get details about the backups, like when was your last backup taken, what is the size of your wall archive, what is the size of your backups, are your archives perfectly pushing, and even things like query statistics. This concludes the demo.


Bart Farrell  46:52

Thinking from the perspective of folks that are approaching this challenge of running a database on Kubernetes, what are the things that have been difficult for you in this process? Or what are the things you might anticipate that could be easy mistakes to make?


Jonathan Katz  47:16

If you are just diving into Kubernetes, the community moves fast, and it is very hard to keep track of everything. Even before you start running a database in Kubernetes, I am going to plug a book that I get zero commission on, but this finally helped put Kubernetes in perspective. It is called “Kubernetes in Action.” Even though it was originally written around Kubernetes 1.4, it covers all the fundamentals. It is not hard to get a database up and running in Kubernetes. You are going to face a lot of challenges running it in production if you don’t apply automation to it and use an operator against your databases.


The reason is chaos. I think the biggest mistake people make is they focus so much on availability, but you should focus on the storage layer, like which storage system you are using. And most importantly, focus on the backups. The biggest mistake I have seen is people not caring for their backups and then running into an issue where there is corruption on their primary database. Keep in mind, having a replica is not a backup. Focus on the backups, ensure you understand your backup strategy with your data in Kubernetes, and this extends beyond Postgres. I would say most of the issues reported in GitHub do come around backups and for a variety of reasons. I feel comfortable running my applications in production on Kubernetes with PGO. At last, monitor. Get those insights, you want to be able to anticipate problems before they become problems.


Bart Farrell  51:21

We get feedback from folks that might not be 100% convinced that this is something their organizations are ready for, but when someone asks you why should I do this, how would you respond to that?


Jonathan Katz  51:56

There are a few reasons. One is having control over where your data is deployed. Kubernetes is powerful in the sense that with that API, my node can move anywhere. It means I can deploy my data anywhere. If I want to have a uniform way of managing my data, and I want control over where I’m deploying that, without worrying about the underlying infrastructure, I would deploy my data on Kubernetes.
That is a very powerful concept. Now, it’s not without its challenges because in addition to managing the Kubernetes infrastructure you’re then managing your data, some people may have been managing their databases for years and they’re very comfortable with dealing with all the challenges that come with that. There are people in the country staff, who before joining crunchy were managing 10 or 20k Oracle databases and they essentially had to switch to Postgres but with that kind of institutional knowledge, you get a leg up in terms of managing everything in Kubernetes. Now, we can certainly go into a lot more specifics on “why run a database container”. You’re able to lock down the entire file system except for your database directory. It’s easier to shift your workloads around to the different nodes.

There’s certainly all this semantics but it’s solving the higher-level question of why even go through this rigour.


The answer I always come back to is the freedom to move my data around. Moving your database is rare although when you do, it is much easier.


Bart Farrell  54:58

Can you do a major upgrade of Postgres with this operator? Or can we use other scenarios like logical replication or PG dump?


Jonathan Katz  55:16

So spoiler alert, there is a new version coming out at some point this quarter. it is going to have Postgres major version upgrades and it is going to be a fully declarative way of doing it. We also added in the pgAdmin before user interface for managing databases, and some enhancements to rolling updates as well. 


To answer the question about doing a major upgrade using logical replication, you can do that, though we don’t provide the orchestration for it. The operator will orchestrate managing the two clusters, but you would have to orchestrate the logical replication. If you are interested in open-source, PGO is entirely open-source. So feel free to inspect it, deploy and beat it up. You really should use the examples with the tutorial that comes in the documentation.


Bart Farrell  58:00

The point is that there are lots of different ways to reach the same objective; part of it is to make the open-source community a part of your strategy about how you learn this. So you don’t have to do it 100% alone.


Bart Farrell  59:47

Are there any other resources that you recommend or things that you think folks should anticipate soon?


Jonathan Katz  1:00:09

I think what helped crystallize a lot of the version five design of PGO was taking real applications like Keycloak, putting it through things like Argo, and seeing how people are trying to deploy their applications. The service broker operator helps bind credentials between different stateful services. The Kubernetes documentation is great.


Bart Farrell  1:01:47

Jonathan, thank you very much for your time today and looking forward to connecting again.


Jonathan Katz  1:02:34

Thank you, Bart.