
From Zero to Hero: Scaling Postgres in Kubernetes Using the Power of CloudNativePG

Unleash PostgreSQL’s potential in Kubernetes with CloudNativePG, a community-driven control plane reshaping the database landscape. Join a dedicated CloudNativePG maintainer and active Postgres contributor on a captivating journey through managing highly available clusters in the Cloud Native era. Discover best practices for large-scale databases: architecture, deployment on bare metal or virtual machines in Kubernetes, storage optimization, robust backup, recovery strategies, vertical scalability, and performance tuning. Gain insights into real-world challenges and battle-tested solutions for seamless PostgreSQL cluster operation, catering to applications and AI use cases. Delve into CloudNativePG’s strengths, discussions on existing capabilities, limitations, and the future roadmap. Revolutionize PostgreSQL in Kubernetes—liberate data, embrace portability, and empower your organization with CloudNativePG.

Speaker: Gabriele Bartolini (EDB)

Watch the Replay

Read the Transcript

Speaker 1: 00:01  Okay, so as a Postgres user, have you ever wondered how to make the whole world the single point of failure of your infrastructure? Whether it's private, public, hybrid, or multi-cloud, you're probably wondering which one to use, and then how you can mitigate the risk of vendor lock-in. So hi everyone, I'm Gabriele Bartolini, and today I'll try to help you answer these questions by sharing with you the unparalleled range of possibilities and freedom that the open source trio of Kubernetes, Postgres, and CloudNativePG offers. Before we dive in, allow me to introduce myself. I'm a seasoned open source programmer, a former entrepreneur, and I'm deeply passionate about databases and data warehousing. My journey with Postgres started in the early 2000s, and now I'm the Vice President of Cloud Native at EDB, one of the major contributors to the open source Postgres project, as well as a Data on Kubernetes ambassador.

01:15  So I'm really happy to be here today, and I'm proud to advocate for the seamless integration of stateful workloads in Kubernetes. My mission is to spread the message that running Postgres in Kubernetes is not only efficient, but often superior to traditional VMs or bare metal setups. And then I'm all about lean and DevOps: I've been practicing these two disciplines for many, many years, and they're actually the reason why I'm into Kubernetes. And who knows, they might be the reason I move away from Kubernetes one day. I'm also one of the people behind two open source projects in the Postgres ecosystem. One is Barman, which is a popular open source project, and the other one is CloudNativePG, the topic for today. If you want to talk with me, I'm available until Friday evening. We've got a booth, so come and join me, and also the other developers of CloudNativePG, if you have questions.

02:25  I'm also speaking about vertical scalability of Postgres databases in Kubernetes with Gary Singh from Google Cloud on Thursday afternoon. So today's agenda includes exploring the potential of CloudNativePG and Kubernetes when managing Postgres, high availability and disaster recovery, and then we'll delve into recommended architectures and strategies, followed by the conclusions of this presentation. Before we begin, I don't want to waste too much time, but just as a reminder: I think everyone knows what Postgres is. Postgres was also voted database of the year very recently. Despite its age, it keeps improving and reinventing itself; pgvector, for example, and other extensions around vector databases are one of the areas that are growing more and more in Postgres these days. If you want to know more, I suggest two blog articles. The first is about the microservice database pattern, if you're interested.

03:40  It also explains why, and the whole journey that led us here with this project. The other one is about the recommended architectures for Postgres in Kubernetes; it's a blog post that I wrote for the CNCF website. Let's start with CloudNativePG and high availability. CloudNativePG is basically a level five Kubernetes operator that seamlessly manages Postgres clusters, primarily for high availability, throughout the entire operational life cycle, from day zero to day two. It's production ready, and it's widely embraced by top-tier database-as-a-service solutions like BigAnimal from EDB, IBM Cloud Pak, Google Cloud, and Tembo. As an open source project, it's available under the Apache license, and it originated in 2019 with my team when I was part of 2ndQuadrant, which was later acquired by EDB. CloudNativePG made a significant leap in May 2022, before KubeCon Valencia, when EDB contributed the project to an openly governed, vendor-neutral open source community, the CloudNativePG community, which you are more than welcome to join. CloudNativePG was acknowledged as the most popular operator for Postgres in 2023 according to a Timescale survey, and it's rapidly gaining traction, with over 3,000 GitHub stars in less than two years.

05:40  Anyway, given our time constraints, I'll refrain from covering basic instructions and commands today; for deeper insights, please find the documentation and all the information available on the website. And again, stop me if you want: I think it's more important to use these twenty-some minutes to trigger questions and to talk about possibilities. So we will talk about building blocks that will give you unprecedented, in my opinion, possibilities to run Postgres. I will briefly mention these four pillars, which were actually defined by Patrick McFadin and Jeff Carpenter in their book about cloud native databases. So CloudNativePG leverages the Kubernetes API: we extend the Kubernetes controller, teaching it how to manage a Postgres cluster through the operator pattern. Okay, then the second one is declarative configuration, through which we can deploy, scale, and maintain databases that self-heal, and also implement infrastructure-as-code practices.

07:00  The third one is about observability. We have a native Prometheus exporter, we log to standard output in JSON natively, so you can pretty much integrate it with everything, we have a Grafana dashboard, and so on. Finally, the security-by-default paradigm. Again, read the article about the microservice database, in which I explain this security-by-default concept where the database is owned by the application developer, not the administrators. So developers own the database and they can put it in their pipelines. And with security by default, we start from the actual code writing, then how we build the containers and scan the images, and then also in the container itself with Postgres measures and Kubernetes measures: for example, we have mTLS by default and we actually advocate for certificate-based authentication. Okay, everything I'll show you today is achieved declaratively; there's only one thing that at the moment needs to be done manually, and we'll see it.
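As a side note on the observability point above, here is a minimal, hedged sketch (not from the talk) of how the built-in Prometheus exporter can be scraped by enabling a PodMonitor in the Cluster manifest; the cluster name and sizes are invented, and the setting assumes the Prometheus Operator CRDs are installed:

```yaml
# Illustrative only: enable a PodMonitor for the native Prometheus exporter.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-observed            # hypothetical name
spec:
  instances: 3
  storage:
    size: 10Gi
  monitoring:
    enablePodMonitor: true     # the operator creates a PodMonitor for the metrics port
```

JSON logging to standard output needs no configuration; it is the default behavior.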

08:23  This is how we implement a Postgres cluster in Kubernetes with CloudNativePG. I think what strikes you here is the simplicity; this pretty much highlights the convention over configuration paradigm that we implement with our declarative approach. You don't have to specify all the parameters, but you can actually modify all of them, because we make opinionated decisions about the defaults. For example, in this YAML file here, we request CloudNativePG to create one primary and two replicas, so three instances, and one of them to be synchronous. This means that every time the application starts a transaction and writes a commit, the primary doesn't return to the application until the commit is written to disk on at least one other standby. We can also change this and ensure not only that it's written, but also that it's replayed on the standby, so that we can perform a read-only query later and it's consistent across the entire cluster.

09:43  That means you slow down the entire write process, but you have a much lower probability of data loss and a faster RTO. Anyway, all of this is configurable; all of this is Postgres, okay? That's the beauty, I think, of open source in general, whether it's Kubernetes or Postgres: it's all about you, all about us. So this is what happens under the hood. Suppose you've got a Kubernetes cluster with three availability zones and our worker nodes. I forgot to say that here I request to place the instances on nodes that have the postgres workload label.
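To make the manifest being described more concrete, here is a minimal, hedged sketch along those lines: three instances, one synchronous standby, and placement on nodes carrying a postgres workload label. All names and sizes are illustrative, and the exact fields should be checked against the CloudNativePG documentation for your version.

```yaml
# Illustrative sketch, not the exact manifest shown on the slide.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-ha
spec:
  instances: 3                 # one primary plus two replicas
  minSyncReplicas: 1           # a commit waits for at least one synchronous standby
  maxSyncReplicas: 1
  affinity:
    nodeSelector:
      workload: postgres       # place instances only on labeled worker nodes
  storage:
    size: 50Gi
  # To also require replay on the standby (cluster-wide read consistency at the
  # cost of slower writes), synchronous_commit could be raised, for example to
  # remote_apply, via postgresql.parameters; verify this against the docs.
```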

10:30  So basically here we have three worker nodes with the postgres label, and the operator places the instances only on those workers. We can choose different affinity settings; it's all there. Then we start provisioning the volumes; that's where we start, because the volumes are the most important part in Postgres. We start with PGDATA, which is where all the database files are stored, and the WAL files, which are the transaction logs. That's how we achieve data durability with Postgres. Then the primary is started, and the operator automatically creates a read-write service so that your applications or your AI workloads can connect directly to it. Then it automatically clones the primary and creates the first replica, which is synchronous, and then the second one, all in streaming replication. So we don't use storage replication; we just rely on Postgres replication, which can be controlled at the transaction level.

11:42  And then we create the read-only service, so if you want to perform read operations, you can use the standbys. Let's see what happens in case of failover. Suppose, for example, the worker node where the primary runs has a failure. Kubernetes immediately detects it, and the operator stops the read-write service, so we've got downtime. This operation is very fast, normally a few seconds; you can try it yourself. The synchronous standby is promoted and the service is updated. Then, when the worker node comes back again, our instance manager actually stops that instance. That's how we prevent split brain from happening: it says, you think you are the primary, but you're not. So it demotes itself, resynchronizes as a standby, and the service is updated.
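Returning for a moment to the volumes described earlier, the sketch below (again with invented names and sizes) shows how the PGDATA and WAL volumes can be declared separately, each with its own storage class:

```yaml
# Illustrative storage section: dedicated volumes for PGDATA and WAL files.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-ha
spec:
  instances: 3
  storage:                     # PGDATA volume (database files)
    storageClass: fast-ssd     # hypothetical storage class
    size: 100Gi
  walStorage:                  # separate volume for the WAL (transaction logs)
    storageClass: fast-ssd
    size: 20Gi
```

The read-write and read-only services mentioned above correspond to the -rw and -ro services the operator creates for the cluster (for example pg-ha-rw and pg-ha-ro).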

12:52  Let's talk about backup and recovery. By the way, if you want to hear the whole story, there's my talk with Michelle Au from Google in Chicago, in which we covered disaster recovery of very large databases with Postgres, and we basically recovered a 4.5 terabyte database in two minutes thanks to volume snapshots. That's the whole story, but briefly, continuous backup is achieved in two ways. The first is the WAL archive: we basically copy the WAL files to another location. At the moment we only support object stores, and by default WAL files are stored in the WAL archive at least every five minutes. That means your RPO is at most five minutes by default, if you've got backup in place. The other part is physical base backups. You can take physical base backups, which is a Postgres technology, from either the primary or a standby; you can choose the target, and by default it is the standby.

14:04  They can be scheduled or on demand. They can be stored on object stores, in which case they are only hot (online) backups, or as Kubernetes volume snapshots, which can be either hot or cold, meaning they are consistent and don't require WAL files to be restored. And you can exploit the storage class capabilities in terms of transparent incremental and differential backups. We are also working on an interface through which you can pretty much write your own backup scripts and backup tools and extend it. This is how continuous backup works: you've got a cluster, so it's done at the cluster level, not at the instance level. We copy the WAL files to the WAL archive, and then we take base backups to build a catalog of backups. That's continuous backup. Recovery is essentially a bootstrap method: we restore a physical base backup somewhere, and then we start replaying the redo logs from the WAL files.
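As a hedged sketch of what such a continuous backup setup can look like (bucket, secret, and schedule are invented, and the fields should be verified against the documentation for your version), the WAL archive and a nightly scheduled base backup could be declared roughly like this:

```yaml
# Illustrative continuous backup: WAL archiving to an S3-compatible object
# store plus a nightly scheduled base backup.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-ha
spec:
  instances: 3
  storage:
    size: 100Gi
  backup:
    retentionPolicy: "30d"
    # target: prefer-standby            # base backups are taken from a standby by default
    barmanObjectStore:
      destinationPath: s3://my-backup-bucket/pg-ha   # hypothetical bucket
      s3Credentials:
        accessKeyId:
          name: backup-creds            # hypothetical Secret
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: backup-creds
          key: SECRET_ACCESS_KEY
      wal:
        compression: gzip
---
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: pg-ha-nightly
spec:
  schedule: "0 0 2 * * *"               # six-field cron: every night at 02:00
  cluster:
    name: pg-ha
  # method: volumeSnapshot              # alternative to the object store, if the
                                        # storage class supports volume snapshots
```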

15:14  We replay until we reach a target. The target can be full recovery, so until the end, and if we do that, it's also the foundation of continuous replication and continuous recovery: you can create what we call replica clusters, which can be pretty much aligned or even delayed. Or, if you select a target, you get point-in-time recovery. When the target is reached, by default the cluster promotes itself, so it becomes another cluster. You can also use this cluster for reporting: for example, every day you can recreate a Postgres cluster just for development or reporting and destroy it at the end of the day. On the same technology, we build the replica cluster. The replica cluster is primarily used for DR, but also for reporting, and it's essentially the same technology: instead of promoting the Postgres cluster, we keep it in continuous recovery, and we can perform read-only queries on those servers. So think about that: you've got this replica cluster in another region continuously replicating, and I can promote it if needed.
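For the point-in-time recovery case just described, a hedged sketch of the declarative bootstrap might look like the following; the source name, bucket, and timestamp are all invented:

```yaml
# Illustrative point-in-time recovery: bootstrap a new cluster from the
# backup catalog in the object store and stop at a given timestamp.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-ha-restored
spec:
  instances: 3
  storage:
    size: 100Gi
  bootstrap:
    recovery:
      source: pg-ha                            # the original cluster's backups
      recoveryTarget:
        targetTime: "2024-03-20 08:00:00+00"   # illustrative target
  externalClusters:
    - name: pg-ha
      barmanObjectStore:
        destinationPath: s3://my-backup-bucket/pg-ha
        s3Credentials:
          accessKeyId:
            name: backup-creds
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: backup-creds
            key: SECRET_ACCESS_KEY
```

Dropping the recoveryTarget gives full recovery up to the end of the available WAL, which is the continuous recovery case described above.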

16:41  So we've seen a single Kubernetes cluster; let's go beyond the single Kubernetes cluster, which is our single point of failure. We've got Postgres with its backup, and we can reuse it in a different cluster simply through the WAL archive. Think about it: we create a replica cluster in another Kubernetes cluster from a backup, and then we start replaying the WAL files from the object store. We don't even need a connection between the two servers, just the WAL files, and it lags pretty much five minutes by default, without doing anything, if you want. The other interesting thing is that, if you want, you can set up backups in the other region too, and you have two architecturally independent backups in place. And if you want to reduce the RPO, you create a streaming replication connection between the two clusters.
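A hedged sketch of the object-store-only variant described above, deployed in the second Kubernetes cluster, might look like the following (names and bucket invented; no network connection to the primary cluster is assumed):

```yaml
# Illustrative replica cluster in a second Kubernetes cluster, fed only from
# the WAL archive in the object store.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-ha-dr
spec:
  instances: 3
  storage:
    size: 100Gi
  bootstrap:
    recovery:
      source: pg-ha                     # bootstrap from the primary cluster's backups
  replica:
    enabled: true                       # stay in continuous recovery (designated primary)
    source: pg-ha
  externalClusters:
    - name: pg-ha
      barmanObjectStore:
        destinationPath: s3://my-backup-bucket/pg-ha
        s3Credentials:
          accessKeyId:
            name: backup-creds
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: backup-creds
            key: SECRET_ACCESS_KEY
```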

17:45  So that's simple, right? They're all building blocks you build on top of, okay? You don't have to get there immediately, but you can get there. Starting from this: this is normally the development cluster. You can do it like you do in production with three instances, or you can even use one single instance, but remember to disable pod disruption budgets, otherwise the node where the instance is placed cannot be drained. And if you want, you can create your continuous backup infrastructure. So this is how it looks: a production cluster in a Kubernetes cluster with at least three availability zones, which gives you a very low RTO. In case of failure of the primary, in a few seconds you're up again without doing anything. At the same time it gives you an RPO of five minutes, which with the CloudNativePG interface we are aiming to reduce to zero. So all you have to do is have the WAL archive. The WAL archive can also be used as a fallback in case the streaming replication between the primary and the standbys goes down temporarily, so you have a dual channel for resilience, and then you build your catalog of base backups.
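Going back to the single-instance development cluster mentioned a moment ago, a minimal sketch could look like this; note that the enablePDB field only exists in recent CloudNativePG releases, so treat it as an assumption to verify against your version:

```yaml
# Illustrative single-instance development cluster.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-dev
spec:
  instances: 1
  enablePDB: false       # avoid a PodDisruptionBudget that would block node drains
  storage:
    size: 5Gi
```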

19:18  So we've seen one single cluster, all good: you don't have to do anything. Kubernetes can do self-healing and HA with CloudNativePG without you doing anything except monitoring and receiving alerts. Let's extend the architecture to two Kubernetes clusters. Normally we can also think about them in terms of regions: we have a pretty much identical Kubernetes cluster in another region, and we use the base backup to create the PVCs and the WAL archive to build what we call the designated primary. It's basically a standby that is ready to be promoted in case the first data center, the first Kubernetes cluster, goes down. Then you create the local WAL archive and the catalog of backups, so you are already ready, in case there's a disaster, to assume the role of primary cluster. And then, if you want, you can create the replicas either immediately or later, when it's promoted; that's entirely up to you. And if you want to reduce the RPO, you can set up streaming replication.
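Building on the replica cluster sketch shown earlier, the designated primary can also archive locally and stream from the source to reduce the RPO. The fragment below is a hedged illustration of those two additions; the endpoints, buckets, and secret names are invented, and the TLS material assumes certificate-based authentication for the streaming_replica user:

```yaml
# Illustrative additions to the DR cluster: a local backup section and a
# streaming connection to the source, with the WAL archive kept as fallback.
spec:
  backup:
    barmanObjectStore:
      destinationPath: s3://my-dr-bucket/pg-ha-dr    # local, independent backups
      s3Credentials:
        accessKeyId:
          name: dr-backup-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: dr-backup-creds
          key: SECRET_ACCESS_KEY
  externalClusters:
    - name: pg-ha
      connectionParameters:              # optional streaming connection to cut RPO
        host: pg-ha-rw.primary-region.example.com    # illustrative endpoint
        user: streaming_replica
        dbname: postgres
      sslCert:
        name: pg-ha-replication          # hypothetical client certificate Secret
        key: tls.crt
      sslKey:
        name: pg-ha-replication
        key: tls.key
      sslRootCert:
        name: pg-ha-ca
        key: ca.crt
      barmanObjectStore:                 # keep the WAL archive as a fallback channel
        destinationPath: s3://my-backup-bucket/pg-ha
        # credentials omitted for brevity
```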

20:49  Promoting the replica cluster is the only manual thing you need to do at the moment, and let's call it manually controlled, because what we are aiming to do is define a declarative way to perform these operations across clusters. Say the whole region is down; this should be treated as a rare event. But if that happens, you have probably lost at most five minutes of data, and we are talking about a massive disaster, and you can quickly make that cluster become the primary. You can even go beyond that: you can use cascading replication and go with three, four, five regions. It's really up to you; these are all building blocks.
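As a hedged illustration of that manual step, one approach documented at the time is to flip the replica flag on the DR cluster, after which the designated primary exits continuous recovery and becomes a full primary:

```yaml
# Illustrative promotion of the DR cluster described above.
spec:
  replica:
    enabled: false   # was true while acting as a replica cluster
    source: pg-ha
```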

21:57  The only problem is when you have a single availability zone Kubernetes cluster, which is very typical from what I see in on-premise setups, where the lift and shift mindset is still prevalent: you're pretty much mapping one data center to one Kubernetes cluster. All you can do here is spread the database across nodes and storage, so divide as much as you can, but your data center is still the single point of failure, and you're missing out on a lot of Kubernetes. What that generates is more complex business continuity procedures, like we used to have. You could stop doing them with Kubernetes if you had at least three availability zones; instead you have to do them, and you have to do more of them the more applications you have, because for each application you need business continuity procedures that you could otherwise avoid. So my advice is: if you have two data centers, try and push for a stretched Kubernetes cluster to at least extend the Kubernetes control plane over three data centers somehow, and plan for the third data center.

23:00  So this is the end. I want to thank the whole CloudNativePG community, starting from some of the developers who are here, but really the adopters and everyone who has contributed to this growing community. What we're working on: the image catalog. For example, there are people who want to use Timescale or other extensions with our operator; you can write your own image catalog and just point to that without specifying the image name. The generic interface I was talking about before will allow us to pretty much simplify the work of the operator through external plugins, primarily for WAL management, backup, metrics, and logging. We want to control the switchover across Kubernetes clusters with the replica cluster switchover. We also want to introduce synchronous replica clusters for those that have two data centers in the same city and two Kubernetes clusters, where the only way for them to talk is through replica clusters.

24:14  And we can add the synchronous one. Then there's the declarative management of databases (databases are the only global object that we are still missing in CloudNativePG), and then logical replication, publications, and subscriptions. By the way, we already have an imperative way to set up logical replication publications and subscriptions and also update the sequences; we released it last week. So theoretically you can already move from any Postgres 10 database in the world into CloudNativePG in three steps with the plugin, and if you want, we can talk more about it. On servers and also storage, and scaling with DoK, we are working on that. So ultimately, choose whatever works for your organization. Every organization is unique; don't believe those who tell you that all organizations are the same. Define your goals (RTO, RPO, TPS) and let them guide you. And on Thursday, I'll show you some amazing results in terms of TPS that you can achieve.

25:31  We are working with storage companies on this new frontier. Mitigate the risk of vendor lock-in at all levels, from the cloud down to the infrastructure level, whether it's on-prem, private, public, hybrid, or multi-cloud. You can do pretty much everything with vanilla Kubernetes, third-party Kubernetes distributions, bare metal, or VMs, and choose the right storage for you. Our advice is to go with shared-nothing architectures and to take advantage of nodes and availability zones that, as I said, come for free with Kubernetes. So I hope that I gave you an idea of how you can make the world be the single point of failure for your Postgres databases, thanks to Kubernetes. I've been using Postgres for many years, and I really believe that the best way to run Postgres is in Kubernetes. Finally, this is my last slide, and it's completely off topic. This is a great book; I don't know if you know it, or if anyone has read it, by Gene Kim and Steven Spear. This book is basically, in my opinion, a tremendous opportunity for Kubernetes to shine as a way to move from the danger zone to the winning zone through slowification and simplification. Okay? So if you have time, I suggest you read that book, and come and meet me at booth G30 and around here. So I'm done. Thank you.
