What are the most important security requirements that we expect from a PostgreSQL deployment on Kubernetes? What challenges arise for stateful workloads? Discover how EDB has addressed them while building Cloud Native PostgreSQL to make it secure by default. The talk focuses on the 4C security model, with emphasis on cluster, container, and code. It then covers client and server CAs, certificate management, logging, auditing, and more.
The talk deck is available here.
Bart Farrell: Our next speaker is Philippe Scorsolini from EDB, also one of our amazing sponsors; it’s wonderful to have them on board. He is going to be giving a talk about “Securing Postgres”, doing things the right way, so let’s bring him on so he can start giving his talk.
Thank you. Hey everyone, we’ll be speaking again about Postgres.
Let’s get this started. Postgres is such a strong topic, I really mean that every time we’ve done livestreams about Postgres, we really get a solid turnout and it generates a lot of interest. I have no doubt that your talk will do the same.
So anyway, you can take it away, go for it.
Okay, so as you said, I work for EDB, and we’ll be speaking about securing PostgreSQL on Kubernetes. Before starting, a few words about EnterpriseDB, or EDB as we call it. Our focus at EnterpriseDB is PostgreSQL. We are a PostgreSQL company with thousands of customers, we employ a few of the founders of the project themselves, and we are one of the major contributors to the open source project. In recent years we, like everyone, saw an increasing interest in the cloud, so much so that we just launched a preview of our own PostgreSQL as a service on Azure, which will soon reach GA (General Availability) and will then move to a few other cloud providers.

Now, EDB and Kubernetes. For this reason, we at EDB have been investing a lot in Kubernetes. We were the first PostgreSQL company to become a Kubernetes Certified Service Provider, and we are a silver member of the CNCF. We are also platinum founding sponsors of the Data on Kubernetes Community, as you can see. Moreover, we have two Kubernetes operators certified by Red Hat.

And we’ll be speaking about CNP, or Cloud Native PostgreSQL. As we were saying, I work in the team developing Cloud Native PostgreSQL, which is an opinionated operator to run PostgreSQL on Kubernetes. So it’s really the same topic Álvaro Hernández was speaking about. Like all other operators, it is designed to take care of the whole lifecycle of its operands: from simple things like self-healing, scale up, scale down, upgrades, and major upgrades, to more complex things like backups, recovery, replica clusters, and much more, as we’ll see.

I said opinionated earlier. That’s because going from a simple declarative definition to a running cluster means a lot of decisions we have to take, and most of them are security-sensitive. That’s why one of our focuses is to make CNP secure by default. What do we mean when we say secure by default? The Kubernetes community came up with a model to think about cloud native security, called the 4C’s of cloud native security: cloud, cluster, container, and code.
In this model, we have four layers, and each of them builds upon the next outermost layer. We will take for granted the outermost layer, the cloud, as it’s not something we can directly control. Instead, we’ll focus on the three other layers: the cluster, container, and code layers.

So we start from the cluster layer. There are two areas of concern here: securing the configurable cluster components (what Kubernetes gives us to configure in order to make our application secure), and securing the application itself, which is running in the cluster.

First, starting from securing the configurable cluster components: one of the main things an application running on Kubernetes can do is explicitly set its security context, applying the principle of least privilege. So we dropped all the privileges we don’t need by explicitly defining the security context in both the operator and the cluster pods. For OpenShift, this means we can run with the default restricted security context constraint. We follow the same principle for the RBAC part: roles, service accounts, and role bindings.

So, about securing the application which is running in the cluster, here are a few requirements we posed to ourselves before starting to develop, and while developing, CNP. All applications should be able to use encrypted communication, both from inside and outside the cluster. We do not want to handle certificates manually, because it’s hard. And as we said, these should be the default settings, not some opt-in configuration. For both internal and external traffic, all instances are going to allow SSL connections with the provided SSL certificates.
And on this slide we can see the configuration for the server side in the upper part, and the client connection string in the lower part. The client is then able to verify that it is reaching the right server by specifying, in the connection string, the correct root CA certificate.

About client authentication: we support basic authentication, but we also support and encourage certificate authentication, so that users can connect to Postgres instances just by providing a certificate signed by the right CA, which, as we will see, we can handle.

By default, the operator generates a self-signed CA, which is used to sign both the server and the client certificates. In this mode the operator also takes care of rotating all the certificates when needed. But we also provide a few user-managed options to accommodate all possible requirements. There is a hybrid mode, in which the operator is given only an intermediate CA, which is then used to generate the necessary certificates; this can be applied to the server and the client CA separately. And then there is a fully user-managed mode, in which the operator only has to be provided with the names of the secrets containing the public part of the CA and the required certificates. This allows it to support tools such as cert-manager or any other automation you could have in place, as long as it’s using Kubernetes secrets, obviously. In the default and hybrid modes, we also have a kubectl plugin that allows us to generate client certificates for users using the in-cluster client CA.

As you may know, and as Álvaro was saying previously, PostgreSQL supports a leader/follower kind of clustering by default. This means followers have to connect to the leader instance as a user with some specific grants, always applying the principle of least privilege.
So by default, followers communicate securely with the leader by leveraging what we showed earlier: they connect via TLS using a client certificate for a dedicated streaming replica user, which is all handled by the operator itself in the automated or half-automated modes.

Another really important aspect when we talk about security is logging and, more specifically, auditing. PostgreSQL supports logging to files in plain text or CSV (csvlog) format, and the same goes for the most used auditing extensions, such as PGAudit, or EDB Audit if you’re using EDB Postgres Advanced Server (EPAS). But when we are running on Kubernetes, we don’t want CSV logs: we want JSON-formatted logs, and we want them on stdout (standard output). We could have added one more sidecar to the Postgres instances, as Álvaro was suggesting, running something like Fluentd or Fluent Bit, reading and parsing all the logs and outputting them to standard output. But as we learned, log formats can change between Postgres versions, and these projects are much more complex than what we need, since we only care about Postgres CSV logs. So we implemented it ourselves: we set PostgreSQL to log to a few FIFO files in CSV, which a small goroutine reads from, parsing the records and outputting them in JSON to standard output. This way, we are able to parse the logs correctly for all the supported PostgreSQL and EPAS versions, for standard logs, PGAudit logs, and EDB Audit logs.

Here you can see how setting some PGAudit-specific parameters in the cluster configuration results in some nice JSON-formatted logs on standard output. This also automatically enables the PGAudit extension in all the reachable databases.
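The CSV-to-JSON step described above can be sketched roughly as follows. This is a Python illustration, not the operator’s actual Go code. The column names follow the csvlog format as documented by PostgreSQL, with the trailing columns that newer major versions append; the `logger` and `record` keys of the JSON envelope are assumptions made for this example.

```python
import csv
import io
import json

# csvlog columns, in order. The first 23 are the long-standing set;
# the last three were appended by newer PostgreSQL major versions,
# which is one reason a parser has to be version-aware.
CSVLOG_COLUMNS = [
    "log_time", "user_name", "database_name", "process_id",
    "connection_from", "session_id", "session_line_num", "command_tag",
    "session_start_time", "virtual_transaction_id", "transaction_id",
    "error_severity", "sql_state_code", "message", "detail", "hint",
    "internal_query", "internal_query_pos", "context", "query",
    "query_pos", "location", "application_name",
    "backend_type", "leader_pid", "query_id",
]


def csvlog_to_json(stream, out):
    """Read csvlog records from `stream` and write one JSON object per line."""
    # csv.reader copes with quoted fields that contain commas or newlines,
    # which is common in log messages that embed SQL statements.
    for row in csv.reader(stream):
        # zip stops at the shorter sequence, so older versions that emit
        # fewer columns simply produce fewer keys.
        record = dict(zip(CSVLOG_COLUMNS, row))
        # Hypothetical envelope; the key names are illustrative only.
        out.write(json.dumps({"logger": "postgres", "record": record}) + "\n")


if __name__ == "__main__":
    sample = io.StringIO()
    csv.writer(sample).writerow(
        ["2021-09-01 10:00:00.000 UTC", "app", "app", "42"] + [""] * 7
        + ["LOG", "00000", "statement: SELECT 1, 2"] + [""] * 8 + ["psql"]
    )
    sample.seek(0)
    result = io.StringIO()
    csvlog_to_json(sample, result)
    print(result.getvalue(), end="")
```

In a real deployment the input stream would be the FIFO that PostgreSQL writes to, opened with `newline=""` so that the `csv` module handles embedded newlines itself.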
Going back to the authentication part for a second: in PostgreSQL, client authentication is controlled by a configuration file, which traditionally is named pg_hba.conf. We set a few default rules that we need for the cluster to work properly. The rules are matched in order. So we have a few lines at the top, let’s say our header. The first allows all local connections, to all databases and for all users, to be authenticated with the peer authentication method, which means the Postgres user name must match the operating system user name. Then all SSL connections for the streaming replica user, to both the postgres and the replication databases, from all IP addresses, are required to use cert authentication. Then you can inject your own rules, as shown on the right. Finally, as a kind of fallback, all connections from everywhere, for all users and all databases, are authenticated with md5.
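To make the “rules are matched in order” point concrete, here is a small Python sketch of first-match evaluation over a simplified model of pg_hba.conf. It is not how PostgreSQL or the operator implements it, and it ignores details such as the special handling of replication connections. The rule tuples mirror the header and fallback described above; the `streaming_replica` user name and the example user rule are assumptions.

```python
from ipaddress import ip_address, ip_network

# Simplified rules: (type, databases, users, address, method).
DEFAULT_HEADER = [
    ("local", {"all"}, {"all"}, None, "peer"),
    ("hostssl", {"postgres", "replication"}, {"streaming_replica"}, "all", "cert"),
]
DEFAULT_FALLBACK = [("host", {"all"}, {"all"}, "all", "md5")]


def build_rules(user_rules=()):
    """User rules are injected between the header and the fallback."""
    return DEFAULT_HEADER + list(user_rules) + DEFAULT_FALLBACK


def _type_matches(rule_type, is_local, ssl):
    if rule_type == "local":
        return is_local
    if is_local:
        return False
    if rule_type == "hostssl":
        return ssl
    if rule_type == "hostnossl":
        return not ssl
    return rule_type == "host"  # "host" matches TCP with or without SSL


def _address_matches(address, client_ip):
    if address is None or address == "all":
        return True
    return ip_address(client_ip) in ip_network(address)


def auth_method(rules, database, user, client_ip=None, ssl=False):
    """Return the method of the first matching rule, or None if none match.

    client_ip=None models a local (Unix socket) connection.
    """
    is_local = client_ip is None
    for rule_type, databases, users, address, method in rules:
        if not _type_matches(rule_type, is_local, ssl):
            continue
        if "all" not in databases and database not in databases:
            continue
        if "all" not in users and user not in users:
            continue
        if not is_local and not _address_matches(address, client_ip):
            continue
        return method  # first match wins; later rules are never consulted
    return None


if __name__ == "__main__":
    # Hypothetical user rule: require certificates for "app" from 10.0.0.0/8.
    rules = build_rules([("hostssl", {"app"}, {"app"}, "10.0.0.0/8", "cert")])
    print(auth_method(rules, "app", "app", "10.1.2.3", ssl=True))
    print(auth_method(rules, "app", "app", "192.168.1.1", ssl=True))
```

Because the first match wins, a user rule placed before the fallback can tighten authentication for specific users or networks, while anything it does not cover still falls through to the final md5 rule.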
So the next C is about containers, which means you have to build and run containers with care. The usual rule of thumb is one application per container. This could lead to having a lot of sidecar containers: one for monitoring, one for logging, one for service meshes, and so on. Sidecar containers are a useful pattern to extend an application with some add-ons. But still, coordination between them is kind of hard, and you could end up with some magic tricks you cannot really understand if you’re not really into it. That’s why we decided not to adopt this pattern. Instead, we decided to inject a single binary into the container, which then wraps the actual PostgreSQL process, handling everything it needs to run Postgres properly.

We build a lot of images, both for the operator and for the operands, that is PostgreSQL and EPAS (EDB Postgres Advanced Server). We have them on Quay.io and on Docker Hub and, more importantly, we regularly scan them for vulnerabilities and rebuild them as soon as fixes are available upstream for any installed package.

About operand images, we have a few options there, mostly to be compliant with Red Hat certification and with all the other requirements of our customers. For the PostgreSQL images, we offer both Debian-based and UBI-based images. We also support a few other versions; all of these images are checked and rebuilt once fixes for any of the installed packages are released upstream. We also install all the additional libraries required by CNP, such as Barman Cloud for backup, PGAudit, and PostGIS, and the Dockerfiles are available online. About EDB Postgres Advanced Server, we only offer UBI images; we have images for all the supported versions, with all the additional needed libraries.
But once you are sure that an image is scanned for known vulnerabilities, and that it is rebuilt every time a fix is available, you also want to be sure about which image you’re actually running. For this reason we support specifying the image digest, and you should specify the image digest and not just the tag for the image. In this way, you’re going to be precisely sure about what’s running in your cluster. And if you pair it with some flavor of container image signing mechanism, like Notary or, right now, sigstore, if you’re into it, you can be sure that the image you’re running is exactly the one that was built, and that it was built by someone you trust.
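As a small illustration of the “digest, not just the tag” advice, the following Python sketch checks whether an image reference is pinned by a SHA-256 digest. It is a simple string check and does not verify signatures or contact a registry; the image names in the example are hypothetical.

```python
import re

# An image reference pinned by digest ends with "@sha256:" followed by
# 64 lowercase hexadecimal characters. A tag may also be present before it.
_DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_pinned_by_digest(image_ref: str) -> bool:
    """Return True if the reference identifies the image by content digest."""
    return bool(_DIGEST_RE.search(image_ref))


def unpinned_images(image_refs):
    """Return the references that rely on a mutable tag only."""
    return [ref for ref in image_refs if not is_pinned_by_digest(ref)]


if __name__ == "__main__":
    digest = "sha256:" + "a" * 64  # placeholder digest, not a real image
    refs = [
        f"quay.io/example/postgresql:13.4@{digest}",
        "quay.io/example/postgresql:13.4",
    ]
    print(unpinned_images(refs))
```

A tag can be moved to point at a different image after the fact, whereas a digest identifies one exact image, which is why pinning by digest pairs well with signature verification.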
The last and innermost C is about code. Application code is one of the primary attack surfaces, and the one over which you can have the most control, so we put in place a rigorous process. Code is probably the hardest part of all of them. That’s why we have an ever-evolving process to be sure that we ship the best code to our users.

It all starts from the developer who writes a patch. The patch then gets pushed to a development branch and the developer opens a pull request, according to the GitHub flow that we adopt. This triggers a GitHub Action that executes automated tests, unit tests, and static code analysis through linters, which allows us to shift left as many issues as possible. If everything passes, we build our container images, which are then used to run a set of end-to-end tests against all the different versions of Kubernetes we support. First, we use kind (Kubernetes in Docker), so we run a set of tests against all the different versions of Kubernetes in, let’s say, a restricted environment we can spin up when we need it. But then we also run end-to-end tests against all the supported cloud providers and the supported OpenShift versions. So we have a really huge suite of end-to-end tests that allows us to ship code that we believe is working as expected.

We run end-to-end tests both before and after merging. Before merging, the patch is assigned to a few reviewers, and at least two of them review it. Once all the tests pass, we merge it into main, and then all the tests are run again to check that everything is fine. Finally, a release is done once we are ready to ship a new version of the operator.