
Dutch Government: Implementing Data & Databases on K8s

A summary of our discussion during our DoKC Town Hall held in July

Data on Kubernetes Community (DoKC) Town Halls allow our community to meet one another, share end user journey stories, discuss DoK-related projects and technologies, and learn more about community events and ways to participate.

The following provides a brief overview of the Town Hall held on July 20, 2023, including the full video and transcript. 

Implementing Data & Databases on K8s within the Dutch Government
Presented by Sebastiaan Mannem, Director at Mannem Solutions
Sebastiaan likes to combine databases with out-of-the-box thinking with a DevOps mindset. After he fell in love with Postgres and Kubernetes in 2016, he became committed to helping enable Dutch organizations to run their database workloads cloud natively. Sebastiaan has spent the last few years working as a private contractor for two large government agencies and hopes to share his learnings so that it might help others along their journey to adopt data on Kubernetes.

Sebastiaan led the Data on Kubernetes Community July Town Hall to walk through projects within the Dutch government that are running databases on OpenShift. This Town Hall also features success stories, debunks some common misconceptions, and provides best practices Sebastiaan has learned along the way. 

Watch the Replay

Read the Transcript
Download the PDF or scroll to the bottom of this post to read the full transcript.

Ways to Participate
To catch an upcoming Town Hall, check out the DoKC Meetup page.

Let us know if you’re interested in sharing a case study or use case with the community.

Data on Kubernetes Community
Website | Slack | LinkedIn | YouTube | Twitter | Meetups

Operator SIG
#sig-operator on Slack | Meets every other Tuesday

 

[TRANSCRIPT BELOW]

(00:03):

Welcome everyone. I’m sure more people will be arriving in the next few minutes; there’s always a bit of a delay on that front. It’s great to have everyone here. This is the July DoK Town Hall. We’ve been doing these monthly, and it’s great to be here and great to have you all here. My name is Hugh Ashbrook. I’m Director of Community here at DoK. I’m very excited about this month’s town hall. We’ll be hearing from Sebastiaan, but before we get to that and before I introduce him, I’d first like to thank our gold sponsors for making this whole community possible. We’ve got Google Cloud and Percona as our gold sponsors, as well as a bunch of other community sponsors.

 

(00:58):

So thank you to all of them for making this possible. You can head over to the DoK site, dok.community, to find out more about them. So yeah, thank you. As we always do, we’ll kick things off with a bit of a community question, just to get people talking. Last month we spoke about podcasts people are listening to. This month I just want to ask: what is a movie you watched lately that you would recommend to people, or not recommend? Maybe it was terrible. It could be anything. If you’ve got a thought about that, you can drop it in the chat, and you’re welcome to unmute as well if you’d like to speak. I’d also like to ask our speaker: Sebastiaan, I don’t know if you want to answer first?

 

(01:51):

Yeah. So, ever since I was young, I’ve always been a sucker for the Transformers, and I acquired a lot of those Transformers as well. About two months ago, I guess, the latest Transformers movie came out, and there’s no other movie for me to bring forward at the moment. So yeah, that’s it.

 

(02:14):

That’s cool. Nice. I haven’t actually watched the new one yet, but yeah, the Transformers franchise has been some entertaining stuff. <Laugh>

 

(02:23):

Yeah, it was fun.

 

(02:25):

Yeah, if anyone’s got any others, any movies you’ve watched lately that you’d recommend, just drop them in there. Mark says he watched Bullet Train recently. I’ve heard that’s excellent, actually; I really need to watch that, I’ve heard really good things about it. I recently watched Weird: The Al Yankovic Story. If you know Weird Al Yankovic, he does parody music, parody songs, and the movie itself is like a biopic about him, but it’s a parody biopic. So the whole movie is really weird, but it’s fantastic. Daniel Radcliffe plays him, and I love Daniel Radcliffe; he’s fantastic. It’s a bizarre movie, but it’s super fun. Yeah, I can really recommend it.

 

(03:08):

There was this awesome scene with Madonna in that, right?

 

(03:14):

Yeah, yeah. The whole thing was played up; he had this whole love interest thing with Madonna, which isn’t true, it never happened in real life. The whole thing is parody and just bizarre; you can’t really explain it. It’s a fantastic film, though. NY said he’s revisiting Lord of the Rings; that’s always good to do, I’ve done that quite a few times. <Laugh> And Tree, you said Mission: Impossible – Dead Reckoning. Is that out already? I didn’t realize it was already out; I’m really excited to see that. Awesome. <laugh> Cool. I love the Mission: Impossible films; that’s amazing stuff. Awesome. Well, if anyone else has any other thoughts, drop them in the chat. The Zoom chat’s always there and open.

 

(04:04):

And before I hand over to Sebastiaan to speak, I just have two announcements. One is, we mentioned this last time, but it’s still open: the CFP, the call for papers, for DoK Day is open. That’s going to be part of KubeCon, happening on November 6th in Chicago. I’ll drop a link to the CFP into the chat here, and you can apply to speak. It’s going to be part of the CNCF’s co-located events around KubeCon, so the CNCF is facilitating it and taking in the applications, but we’ve got our own team of people who’ll be reviewing the talks. So when you apply, you’ll see some other events listed there; just select DoK Day.

 

(04:47):

The CFP is open until August 6th, so there’s not too much time left, but you’ve got a little bit of time to get those in. We’d love to hear from anyone in the community who’s interested, so go have a look at that. The other very exciting announcement is that this week, yesterday in fact, we launched the DoK Community Ambassadors program, which is something we’ve been planning for a little while, and I’m very excited that it’s live. The ambassadors are essentially people who are active in the community, who can continue to help foster more growth in the community, support other members on their journeys, answer questions, and just be around as a resource for people. I’ve dropped a link to the announcement post in the chat.

 

(05:35):

You can have a look at that, and it’ll link to more info about the program. You can find out about the first group of ambassadors; we’ve got five ambassadors who signed up upfront and are on board. There’s also a form there to apply to become an ambassador, if you’d like to do so. We’d love to grow that team to make it as big as possible, so everyone in the community can support each other, we can see even more growth, and we can get more people excited about what we’re talking about here. So have a look at that; it’s a very simple form to apply, so please send that in. It would be great to have more folks on there. I’m very excited about that program. It’s something we’ve been wanting to launch for a while, so it’s great to see it happen.

 

(06:13):

And if you have any questions about it, feel free to reach out to me on the DoK Slack. I’m always on there and happy to chat. So today we have Sebastiaan here talking to us. He’s in the Netherlands. I know we’re all spread all over; I’m in New Zealand and he’s in the Netherlands, so we’re pretty much geographically opposite each other, <Laugh> or quite close to that, I think. He’s going to be talking about implementing data and databases on Kubernetes within the Dutch government, which, to be honest, sounds really interesting, so I’m excited for this. And with that, I’ll hand over to Sebastiaan. Thank you.

 

(06:57):

Let me see. Is my screen sharing properly at the moment?

 

(07:05):

Yeah, we can see it all fine here.

 

(07:08):

Cool. Awesome. Alright, so this is a presentation that I also gave in April in Amsterdam during KubeCon. I haven’t changed anything in it, but I will add updates as we go through the slides, just by talking to them. This is a presentation about running databases on Kubernetes within the Dutch government. So let’s get going. There’s going to be a short introduction: who am I? Then I’m going to talk about a challenge that we had, which we call break the glass; I’ll get to that in just a little bit. We did some tests on database performance on Ceph on-premise, which might be interesting as well, and we’ll go into the depths of how databases work on that subject too, so that’s going to be cool. There’s some information about some other challenges and ideas that we’ve run into; I’ll just spell them out, and I can update you on where we are with those, because there are some updates there. I’ll end with general recommendations, like, if you want to introduce data on Kubernetes in your organization, how should you approach it? And I’ll finish with some conclusions and takeaways.

 

(08:37):

Let’s first talk about who am I. So I’m Sebastiaan Mannem. Currently I’m only working for my own company, which is called Mannem Solutions, a one-man shop, you could say. I’m starting a bigger company, trying to find people to work with me to enable our customers with data and databases on Kubernetes, and working together with a company of about 120 people. So it’s going to be huge, it’s going to be awesome. But yeah, currently I’m still employed by myself. By the way, this logo was created by my youngest son, and I’m so proud of him that he could create such a beautiful logo, so I just wanted to mention that.

 

(09:29):

A little bit more about me. Maybe some people already know me, maybe some don’t, which is fine. I’ve been in the database space since 2000, and I’ve been a cloud native enthusiast since 2015. Even before that I was already into Docker, looking into the technology and what you could do with it, but the whole cloud native approach, yeah, that really struck me deep. In 2015 I worked for DJI, the Dutch government agency for jail systems. I worked for bol.com, which is a huge Dutch company; you could compare it a little to Amazon, but Amazon is global and bol.com is mostly in the Netherlands, though they’re technically very well equipped as well. I worked for EDB, EnterpriseDB, and I did many cool things there.

 

(10:31):

I started off as a consultant and then a solutions architect, which was basically what I loved to do most. But then I noticed that I wanted to bring databases into the cloud native world, and they needed a product manager for that. So I did the product management work and I worked together with the development team; I even did a lot of development myself, in Go, on the operator and all that stuff. That was a really cool time. Then I worked for RIVM, a Dutch government organization which is basically about healthcare. They brought me in because they needed to do a lot of work related to Corona; everybody needed to do stuff with Corona, and they did as well.

 

(11:15):

That was also when I started Mannem Solutions. Currently I’m working for the Dutch government again, but this time for the Dutch Tax Office, and I’m also working for BMW. So as you can see, I’m really broadening the projects that I work on. I’ve been a contributor since forever. I once contributed a module to Ansible for managing pg_hba. I contributed to Stolon and to WAL-G, which are a high availability product and a backup product for Postgres. When I joined the Dutch Tax Office, we did a lot of work on Bitbucket, including contributions to a Bitbucket CLI as well. And there’s a lot more that I’ve contributed to. I also wrote a lot of tools myself, such as pgfga and pgroute66.

 

(12:09):

I think they’re all written in Golang at the moment. I write a lot of software in Python and Golang, and recently I started contributing in Rust, which I’m going to be talking about soon. And you might say I’m a dreamer, but I really feel that we should do everything open source, which is the way to move forward as a community. I feel strongly about communities in general, and of course the DoK and Postgres communities specifically have an important place in my heart, you might say. Cool. When I was working at EDB, I also did a presentation called “60,000 TPS with how many CPUs?”, which is very closely related to some of the stuff that I’m trying to do over here. I also wrote a tool there, which was written in Go and in Rust.

 

(13:04):

I have currently rewritten it in Rust, and I’m going to talk about that in just a little bit, so I wanted to call that out specifically as well. So that’s the introduction. What’s my mission? Well, one hundred percent open source, both government and private. I’m really an early adopter: everything that everybody will be trying to do in ten years, that’s what I want to be aiming at now. And data: ever since I started in IT, I’ve specialized in data, and databases specifically, but data in general as well. Cool. As I mentioned before, late last December I joined the Dutch government and worked together with some other guys on a special project where we wanted to introduce OpenShift for the Dutch Tax Office. They’re huge on that front.

 

(14:08):

All of the stuff that I’m talking about is part of that venture we’re going into. One of the first challenges that we ran into is what we called temporary access. I’m going to first explain the challenge. So we wanted to run databases on OpenShift, which is Kubernetes, you might say. And then you have a database and an app, and the app needs to connect to the database with a user, and that user needs to be authenticated. So the big challenge was: how do you do that? I was working closely with a database team, but they really come from, let’s just say, classical environments, and they wanted to stand up federated authentication using LDAP and all kinds of other stuff.

 

(15:02):

The architect and I pushed back on that. So why did we push back? Well, the CNCF has a GitHub page where they keep the definition of cloud native, and the text that you see over here is what the CNCF says cloud native is. You see all of these keywords like scalable, modern, dynamic, loosely coupled, and all of that. And, so to say, LDAP is not very cloud native. What do I mean by that? There are no real declarative APIs. It’s not really loosely coupled. It’s not really resilient, not really scalable, not really observable. So basically there’s a lot of stuff over here that you cannot properly live up to if you do move to LDAP.

 

(16:04):

And there’s newer stuff, right? Like OAuth and all of that. But we were looking at other approaches as well, and we sort of felt that you would naturally drift away from cloud native if you went looking for authentication methods based on federated authentication and things like that. The main thing we were trying to fix there was the way that humans connect. I mean, for humans it’s very easy and well established to use a federated authentication system: you could have, let’s just say, a hundred or a thousand DBAs or developers, and they’re all registered in that LDAP with their usernames and passwords and all that. But if you think about the natural way that the application is used, basically you only want the app to connect.

 

(17:06):

Basically, you want to create your OpenShift project, create your database, create your app, and then you want users to authenticate against the app, and the app, of course, to authenticate against the database. But you don’t want every user of the app to be authenticated individually against the database, for many reasons, actually. So that was one thing. The other thing was, of course, that you might still want humans to connect, but we could find other ways for that as well. So we decided to focus on the way that apps connect. And as we were looking at CloudNativePG, it has this natural option of using client certificates. I’m a huge fan of client certificates myself; basically everybody uses them already, in tons of different ways.

 

(18:03):

When both sides present certificates, we call it mTLS: a server-side certificate but also a client-side certificate. There are a lot of security benefits there, but it’s also very much decoupled: you can have this without LDAP being up, and it would still work properly. It just requires certificates to be provided to Postgres so it can check that the client certificate is valid; all of the rest is client-side configuration, which is the way that you want it. It’s very specific to a project, so if those client certificates were leaked, you could not use them with any of the other projects. So there’s much going on here, which is awesome.
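
As a concrete sketch of what that client-side configuration might look like, the snippet below builds a libpq-style connection string for mTLS. The host name and certificate directory are hypothetical (in a pod they would typically point at files mounted from a Kubernetes Secret), but the parameter names (`sslmode`, `sslrootcert`, `sslcert`, `sslkey`) are standard libpq connection settings.

```python
def mtls_conninfo(host: str, dbname: str, user: str, certdir: str) -> str:
    """Build a libpq connection string that enforces mutual TLS:
    the client verifies the server certificate and presents its own."""
    params = {
        "host": host,
        "dbname": dbname,
        "user": user,
        "sslmode": "verify-full",            # verify server cert and host name
        "sslrootcert": f"{certdir}/ca.crt",  # CA that signed the server cert
        "sslcert": f"{certdir}/tls.crt",     # this client's certificate
        "sslkey": f"{certdir}/tls.key",      # this client's private key
    }
    return " ".join(f"{k}={v}" for k, v in params.items())

# Hypothetical values, as they might appear with a Secret mounted in the pod:
print(mtls_conninfo("mydb-rw", "appdb", "app", "/etc/secrets/client-cert"))
```

A string like this can be passed directly to any libpq-based client (psql, psycopg, etc.); no password or LDAP lookup is involved in the connection path.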

 

(18:58):

We also wanted an option for DBAs to log into Postgres. And we decided that client certificates work there as well, because they can be short-lived: you could have a client certificate that is only valid for a day, or for a couple of hours. So if you have an issue and somebody needs to investigate it, you could issue a very short-lived certificate, valid for only four hours, to investigate the issue. The actual fix should then happen by deploying a new version of the application, so you don’t want the DBA to change anything directly at all. So this really felt like a very secure option. And it also felt very, how do you say it...

 

(19:50):

...scalable, very loosely coupled, very resilient, and all of that. So we were thinking about how we would implement it, and we’ve come up with a solution. Basically, the CloudNativePG operator would just create your Postgres cluster, and you could create a secondary object within OpenShift, which is called a break-the-glass object. It would basically be a placeholder to create users in the database and to create certificate requests for cert-manager, and then cert-manager would create all of the certificates, including the certificates that the users use for logging in. Now, we are planning to build this. At first it’s going to be a separate project, but I’ve already talked to the guys of the CNPG operator and also to the guys of the StackGres operator, and they might include it in their projects, so that you could use it natively from Postgres as well, if you like.

 

(20:54):

So yeah, I just wanted to call this out; this was one of the things that we ran into. As a short summary: if you look at it from the first angle, the way that you usually approach it in classical database environments, you might go a totally different route than if you look at it from the cloud native perspective. That was one of the things that I wanted to point out. And as you can see, the break-the-glass option is very cloud native: it’s scalable, it’s modern, it’s loosely coupled, it’s observable, and it isn’t problematic to make high-impact changes frequently, and all that. So it fits perfectly well. And this sort of is my credo, you could say; I feel that that’s the way the modern integrator should be operating.

 

(21:58):

We should be looking at the problem cases, we should be looking at resolutions, and we should be building them and open-sourcing them. I see a lot of people doing that, and that is basically how we got to OpenShift, how we got to Postgres, and how we got to a lot of those other cool tools that are out there. That’s the path forward, and from here on it’s just going to help you if you choose a path yourself as well. So that’s strongly part of Mannem Solutions’ mission too. Right. If you have any questions, you can bring them up in the chat or anything. I don’t see any questions yet. Just ping me if you see any questions popping up, and we can go through them directly.

 

(22:54):

Okay. So the second challenge that we ran into was database performance. As I mentioned, we were running CloudNativePG on OpenShift, on-premise, on Ceph, and we wanted to test how fast it could go. So we did some performance tests. We used pgbench for the testing, and I was hoping for huge performance, like over 9,000, but in reality it was only 1,300 TPS, transactions per second, which is not that awesome. And the way that you could look at it is that we have 62 milliseconds of latency, which is huge; it’s a lot of latency for every call, and that is also why we have this low TPS. Currently we have a DBA who is constantly rerunning a lot of those tests, and he even got a little more than 1,300, but this is basically what we could push out of the system.
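
As a rough sanity check on those numbers, throughput and per-transaction latency are tied together by Little’s law (TPS ≈ concurrent clients ÷ latency). The client count below is not stated in the talk; it is inferred from the reported 1,300 TPS at 62 ms.

```python
def implied_clients(tps: float, latency_s: float) -> float:
    """Little's law: concurrency = throughput * time-in-system."""
    return tps * latency_s

def tps_at(clients: float, latency_s: float) -> float:
    """Throughput at a fixed client count for a given per-transaction latency."""
    return clients / latency_s

# 1,300 TPS at 62 ms implies roughly 80 transactions in flight at once.
clients = implied_clients(1300, 0.062)
print(f"implied concurrency: {clients:.1f}")

# At the same concurrency, halving the latency doubles the throughput,
# which matches the fsync experiment described later in the talk.
print(f"TPS at 31 ms: {tps_at(clients, 0.031):.0f}")
```

This is why the latency number matters so much: with commit latency fixed, the only way to push TPS up is to add concurrency or to make each commit round trip faster.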

 

(24:10):

So what’s going on? I did some more tests. First of all, I separated the write-ahead log from the data files, which is what you should do anyway, and CloudNativePG supports it. We also added memory, we added CPU, and all that. And, just to rule out that this was a storage performance problem, we also looked at what would happen with faster-performing storage. We did that the wrong way, I just want to call it out; don’t ever do this yourself, but it was an easy way to see what would happen if we had better-performing storage. First, with a separate WAL, you could see that we ran a little bit faster, but it wasn’t really that much faster, so separating out the WAL did not really help the case. We tried increasing the resources, with more CPU and more gigabytes of RAM; that wasn’t a huge increase either. I see a question in the chat, by the way. Let me just see.

 

(25:32):

Yes. So the question from Mark is: this would require all of the dependent web apps to leverage client certificates, and administrators to manage groups and RBAC inside of Postgres. Yes-ish. Your web apps should be able to leverage client certificates, which is just a little bit of extra configuration, and all of the clients support it, so it’s not difficult to set up. And once you have set it up, you probably know how to set it up for all of your web apps. But yes, it is something that you need to configure correctly, and the biggest challenge is that you need to make sure you get access to the client certificates, which would be in a Secret, but you still need to mount that into the pod and all of that.

 

(26:23):

So that’s totally true. You also mentioned that you need to manage groups and RBAC inside of Postgres. Well, that is something that you probably want to do, and if you do, you need to do it with client certificates, but you would also need to do it without them; it’s sort of a separate thing. I would advise doing it, but it is something that you just do within the database schema: as you create your database schema, you create your tables and all that, and you can also create your whole RBAC on top of that. The cool thing is that with client certificates you could have separate groups, so you could have users entering not as an individual user, but with a connection to a group.
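
A minimal sketch of that group-based setup, with hypothetical role names: the generated statements are ordinary Postgres DDL (NOLOGIN group roles plus grants), and a client certificate’s common name would then map the connecting user into one of these groups.

```python
def group_role_sql(group: str, privileges: str, schema: str = "public") -> list:
    """Generate DDL for a NOLOGIN group role with table privileges on a schema."""
    return [
        f"CREATE ROLE {group} NOLOGIN;",
        f"GRANT USAGE ON SCHEMA {schema} TO {group};",
        f"GRANT {privileges} ON ALL TABLES IN SCHEMA {schema} TO {group};",
    ]

# Hypothetical groups matching the talk: a read/write app group
# and a read-only reporting group.
ddl = group_role_sql("app_rw", "SELECT, INSERT, UPDATE, DELETE") \
    + group_role_sql("reporting_ro", "SELECT")
print("\n".join(ddl))
```

Individual login roles (or certificate-mapped users) are then granted membership in one of these groups, so permissions live on the group, not on the person.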

 

(27:10):

And then you could have those groups have permissions, which is cool, right? So you could have a DBA group, you could have an app group, you could have a reporting group which only has read-only access, or things like that. But definitely, you need to set it up inside Postgres correctly. Yeah. Okay. No, that’s fine, Mark, just interrupt as much as you want. Okay. So yeah, we also increased resources, and that didn’t really do the trick either. And then I did the one thing that you should never do: I tested with fsync off. Never do anything with fsync off; what happens is that you can corrupt your database big time. Whenever you have database corruption and you reach out to the community and they find out that you have fsync off, they will basically not help you.

 

(28:11):

They’ll say: dump the data, create a new database, and reload the data; that’s the only way to make it proper again. Now, in my case, what I did was bypass the whole syncing just to see if it gave a performance improvement, because I wanted to know if fsync was where we got the latency from, and if improving sync latency would also improve TPS. And it definitely did, right? The latency dropped by half, and the transactions per second doubled. So basically you could say faster storage would help, obviously. Let me see, I have one more question... Yeah, thanks. Okay, cool. So now, this 1,300: is it awesome or is it poor? What are we actually looking at?
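
The fsync cost being discussed here can be felt directly on any machine. The rough sketch below times a small write followed by `os.fsync`, which is essentially what a WAL commit waits on; the numbers depend entirely on the storage it runs against, so treat it as an illustration, not a benchmark.

```python
import os
import tempfile
import time

def measure_fsync_latency(iterations: int = 200):
    """Time a small write followed by fsync, mimicking a WAL commit."""
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        for _ in range(iterations):
            os.write(fd, b"x" * 512)  # a tiny, WAL-record-sized write
            os.fsync(fd)              # the round trip a commit waits on
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)
    latency = elapsed / iterations
    # Per-commit latency, and the max commit rate of a single connection.
    return latency, 1.0 / latency

lat, rate = measure_fsync_latency()
print(f"avg fsync latency: {lat * 1000:.2f} ms -> ~{rate:.0f} commits/s per connection")
```

On fast local NVMe this typically comes out far below a millisecond; at 62 ms per round trip, as measured on the Ceph setup, a single connection can only commit about 16 times per second, which is why concurrency was needed to reach even 1,300 TPS.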

 

(29:06):

I wanted to compare, of course; I know the answer already, but I just wanted to show some numbers to you as well. So I ran CloudNativePG on AKS too, and there you could see that we got 1,200 TPS. But if we increased the vCores, then you could see that we also got higher TPS. So here, adding resources like vCores and RAM actually helps. Now, how does it help? Let me see if you can see it.

 

(29:55):

No, that’s not true, actually. The reason this helped is that the storage was better attached; it had more resources attached to it, more buffering, and all that. A friend of mine works inside Microsoft, and he mentioned that I should redo these tests, because they definitely fixed something in the way that storage is attached, and it really helps for the Azure Postgres one, but I want to redo it for AKS as well. So I will redo these tests, but at least more resources did help there, and it didn’t in our environment. For RIVM I also created PG Village; PG Village is basically a VM deployment, and you can easily deploy it on Azure.

 

(30:57):

So I wanted Postgres running on Azure VMs, and I wanted to test with that. What you can see is that if you just use normal VMs on Azure and run Postgres on them, separating out the WAL really does help: the latency drops by half and the TPS doubles. And you can also see that increasing the machine size helps in cloud environments too: you get higher throughput from your storage, the latency evens out again, and we doubled the TPS once more, so we could do even more transactions per second on VMs. And this basically scales all the way up: if we moved to even larger VMs, we could get even bigger TPS.

 

(31:57):

So this is the way that you want to have it, I guess. Okay. Of course, I also tested with Azure Database for PostgreSQL, which is basically Postgres running on Azure as a service. We used burstable VMs there; it’s just crazy expensive to do anything else, which is weird, but anyway, you could see that we get nice numbers there as well. And this is basically what we got out of our system, and this was the top of it: 1,337. So now the big question is: why is on-premise not over 9,000? Why can’t we scale it? Why is it bottlenecked, and where is the bottleneck? The way that I see it is that if you look at databases, most of your data is basically in memory.

 

(32:53):

So when you run a query, the database fetches data pages from storage into the file system, and the file system offers them as memory blocks to the database engine; then you can change them a gazillion times, but it’s always a change in memory. What also happens is that whenever you have decided that this is the way you want your data, you can persist it, which is what you call a commit. You use the whole transaction mechanism, and when you commit the transaction, that’s when the write-ahead log, which is where all of your changes are recorded, is persisted to disk, and there is a round trip there. The file system basically says to the storage: I want all of these writes, up until this point at least, on disk, and tell me when they are on disk.

 

(33:49):

So you get this round trip: the file system does an fsync, the request goes to the storage, and the storage answers: yes, it’s on disk. And this whole round-trip business is where it starts to become interesting. So let’s zoom into it a little. First, a side story. The way that I look at these things is that I see data storage as, you could say, a warehouse. This is where all your data is stored, and fetching the data takes a lot of time, but it isn’t really inline. It’s not like every box that comes in here ends up on these shelves anyway; usually it stays on these conveyor belts over here.

 

(34:40):

And some of those packages are distributed directly from this truck into that truck. But sometimes you get stuff that should be stored for a longer period of time, and then you get all of this. So it’s not fast, but you can store a lot of data over there. That’s basically the way databases store their data as well: they store it on disk. Now, this is totally different from how they store the apply log. The apply log is basically where they keep a record of all of the transactions, all of the changes that are applied to the data. And the apply log is not like a storage distribution center at all; it’s more like an assembly line. You get all of these transactions coming through, like one big train of transactions.

 

(35:29):

And so this has an interesting property: it's mostly write-only, it's updated inline, and it's stream-based as well. So if you want to improve your TPS, usually what you should do is make sure that your WAL can go faster, so that there's a bigger lane over here, or that things go faster through this whole lane, if you can do something like that. That's the way I always look at it. Okay, I see another question in the chat, by the way. Exactly, I think Azure Boost is what this friend of mine was talking about as well. So I created a test program, and I want to do a lot of tests with this test program, and I want to look at a lot of different architectures.

 

(36:24):

So I want to do a presentation on that, maybe somewhere next year, I guess. I'm definitely going to have a look at Azure Boost; thanks for that comment. Cool. Right, so this is as deep as I'm going. I'm sorry for going this technically deep, but it's just the way my brain works. Alright, so we have this database and this data, which is constantly changed in memory. Sometimes it's checkpointed in the background, but that's not inline with the performance. And then you have this WAL, which is continuously written to, and every time you push a commit, you get this fsync. Now, if you look at this piece over here where you have the communication between the file system and the storage: in a classic setup you would basically have a file system, which is a kernel driver, then something like a SCSI bus, and it writes to a disk, right?

(37:24):

So this is very directly coupled, and with that also extremely fast. You could have a faster bus and then you would go faster, or you could have a faster disk, or even memory caches, things like that; then you would bring down latency. But basically, this is very fast. Now, what do we have with Ceph? With Ceph, we have this file system and this kernel driver as well, but then we have the TCP network, and through the TCP network you go to different nodes, and every node is running software, basically the Ceph software, which writes through the kernel driver. It used to have a file system on top of it, but that's now included in Ceph; it writes directly to the device driver, and then it still goes through the SCSI bus and to the SSD.

 

(38:19):

But too many things are different. First of all, you have this TCP network, there's extra software and an extra call on the kernel driver somewhere. But next to that, you also have it on two different nodes, right? That depends, of course, on how you store your data: you could have it on two nodes, three nodes, or one node. And this has a huge impact on latency. If your latency is not that performant, then you immediately see it in the TPS as well. So that's basically what we are looking at now within the Dutch government. I brought this forward, and I also mentioned that we could look into other options for storage and improve it, so I wanted to spend some extra time on it.
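The link between commit latency and TPS is simple arithmetic: a connection doing synchronous commits can do at most one commit per fsync round trip. A minimal sketch, with illustrative latencies that are assumptions rather than measurements from the talk:

```go
package main

import "fmt"

// perConnTPS gives the ceiling on synchronous commits per second for
// a single connection: each commit must wait one full fsync round trip.
func perConnTPS(commitLatencyMs float64) float64 {
	return 1000.0 / commitLatencyMs
}

func main() {
	// Illustrative latencies (assumed, not measured): a local NVMe
	// fsync versus a networked write that crosses TCP and waits for
	// replicas on other nodes before acknowledging.
	fmt.Printf("local  0.3 ms -> %.0f TPS/conn\n", perConnTPS(0.3))
	fmt.Printf("remote 4.0 ms -> %.0f TPS/conn\n", perConnTPS(4.0))
}
```

This is why a few milliseconds of extra round-trip latency shows up immediately in per-connection TPS, even when the storage has plenty of bandwidth.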

 

(39:15):

But we figured: yeah, well, probably 1300 TPS is enough. Now, why is it enough? In classical environments, you would have your app, a huge monolithic application, a big thing, running on a huge database as well. It's hammering the database constantly, requesting a lot of data to be crunched, all that stuff. And this one big monolith is directly in line with the performance of the database. But what we are running is microservices, and within microservices we could have many different paths. And you could say that Ceph has the bandwidth. It doesn't have the TPS, it has slow latencies, but it does have the bandwidth: it can do a lot in parallel. And this is where it gets really strong as well.

 

(40:12):

So if you have all of your microservices running together, every one of them gets 1300 TPS. Now, what if you need a microservice that requires more than 1300 TPS? Then of course we still need to look into other options. But for now we felt like, yeah, this is probably not going to be a huge problem in the near future. Let me have another look at the chat. OpenEBS? No, I sort of used the cloud as, how do you say it... because OpenEBS is, I guess, a cloud thing, right? It's probably an AWS thing, I guess. We use the cloud as a comparison; we want to run our databases on premise. I did look at other solutions like Portworx and Longhorn and things like that, and I actually want to test with those. But no, we haven't looked into OpenEBS at the moment.

 

(41:22):

Ah, okay, I'll definitely have a look at it. Is it something like Portworx then? I guess, because with Portworx and with Longhorn, you basically get directly attached storage, and then there is a syncing mechanism in between, but you also write directly to your local volumes. Ah, that's cool. Okay, cool, I'll have a look at OpenEBS as well; thanks for the suggestion. Yeah, cool. Okay, nice, thanks. Okay, cool, we will have a look at it as well. Thanks. So anyway, I still wanted to fix it. So I talked to our architect, and I felt like there's much we could do. We could do storage with far better performance; we could even write directly to the virtual disk. We are testing with that at the moment.

 

(42:36):

Once we have it available, we can use it in our deployments as well. But this is one option, and I mentioned the other options as well, right? And there's OpenEBS, so we're definitely going to have a look at whether we can improve it. But I talked to the architect, and basically he said: well, 1300 TPS is fine, don't worry about it. So I don't know how many cycles I can put into it, which is fine as well. Okay, yeah, this is the main thing, the main story of this talk. But I wanted to talk about some other things as well. Oh, yeah. So as I was going through this, I noticed that I was constantly running pgbench. And why was I constantly running pgbench?

 

(43:24):

I wanted to run pgbench with 20 clients and with 40 clients and with 60 clients. I wanted to know the whole spectrum, and why did I want to know the whole spectrum? Well, first of all, some applications can only run with a few clients in parallel. But second of all, I also wanted to know the optimum, and there is an optimum here. There is a point where your latency increases and your TPS doesn't really increase anymore. So there is sort of a Goldilocks zone, you could say. And I wanted to know that Goldilocks zone for our architecture, and for most other architectures as well. So I was running pgbench, but I was running multiple versions, multiple configurations of pgbench, and I sort of got tired of it.

(44:10):

So I wrote a tool. It's called pg_tps_optimizer; you can find it over here on the Mannem Solutions GitHub. It's out there; I think it's just a release zero-point-whatever. But I am now at the point where I want to start running pg_tps_optimizer against multiple configurations, just to see how those configurations compare to each other. And basically what you get out of it is an image like this. We use the Fibonacci curve, so we have 1, 2, 3, 5, 8; it's sort of a logarithmic increase in the number of clients. And then you get the latency and the TPS, and you also get this awesome graph over here, which is the TPS divided by the latency. This is where you can see that there's a sort of Goldilocks zone, and you get a sort of table output.
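The sweep idea can be sketched in a few lines of Go. This is a hand-rolled illustration, not the actual pg_tps_optimizer code: generate the Fibonacci series of client counts, then pick the count with the best TPS-to-latency ratio from a set of (assumed, made-up) measurements.

```go
package main

import "fmt"

// sample holds the result of one benchmark run at a given client count
// (the numbers used below are illustrative, not real measurements).
type sample struct {
	clients int
	tps     float64
	latency float64 // average latency in ms
}

// fibClients returns a Fibonacci series of client counts
// (1, 2, 3, 5, 8, ...) up to max, like the sweep described in the talk.
func fibClients(max int) []int {
	out := []int{}
	for a, b := 1, 2; a <= max; a, b = b, a+b {
		out = append(out, a)
	}
	return out
}

// goldilocks returns the client count with the best tps/latency ratio:
// past this point latency keeps rising while TPS flattens out.
func goldilocks(samples []sample) int {
	best, bestRatio := 0, 0.0
	for _, s := range samples {
		if r := s.tps / s.latency; r > bestRatio {
			best, bestRatio = s.clients, r
		}
	}
	return best
}

func main() {
	fmt.Println(fibClients(60)) // [1 2 3 5 8 13 21 34 55]
	samples := []sample{
		{1, 300, 3.3}, {2, 580, 3.4}, {3, 830, 3.6}, {5, 1150, 4.3},
		{8, 1290, 6.2}, {13, 1310, 9.9}, {21, 1320, 15.9},
	}
	fmt.Println("optimum at", goldilocks(samples), "clients")
}
```

With these sample numbers, TPS flattens around 1300 while latency keeps climbing, so the ratio peaks at a modest client count: that peak is the Goldilocks zone the graph shows.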

 

(45:18):

Yeah, I just wanted to call it out. And let's continue with some other things that we ran into. Well, I talked about the Bitbucket CLI already. I think it's important for companies to enable themselves. If you have organizations that want to run Bitbucket, which is fine, then at least enable yourself so that you can go as fast as you could go. And for me, that was basically adding some small additions to the Bitbucket CLI. We run our own compiled version of it, but it's really enabling us. We can even use it in containers and do stuff from our pipelines with Bitbucket, all that stuff. So it's awesome.

 

(46:04):

We have a thing where we work with project quota, and we have currently rebuilt this: we have built our own internal operator, which basically does everything with projects and quotas the way that we want it. I'm on the verge of open sourcing it; I guess in a few months it's going to be open sourced out there. It's very opinionated, very Dutch-tax-office-like, but it's just cool how easy it was for us to create our own operator and enable ourselves with that as well. So I think this is just the way to go. Interesting thing, by the way: we created another image for this whole project quota thing, and it only has two criticals and sixteen highs... sorry, we built an image with oc and tkn and all kinds of other tools in there, and we used that image before, but it was huge and it had a lot of vulnerabilities. Now we created our own Golang image, and it's tiny and it also has zero vulnerabilities.

 

(47:14):

So that's awesome as well, which is one other reason for me to say: just build it yourself, and don't do all of this bash stuff with all kinds of other tools; I would go for this option. Anyway, then we have the break-the-glass functionality, which we already talked about. I wrote a very small Golang program, I call it the pipeline runner. You can just set parameters from the environment; it'll read the pipeline definition and it'll create and start a pipeline. You could do sort of the same with oc and tkn and all kinds of other tools, but then you get this whole thing again, right? So: smaller image, fewer vulnerabilities. And we are thinking about building our own image pull tooling as well. There are a lot of projects out there that do stuff with image pulling, but I think we will do something ourselves as well.

 

(48:04):

So, okay. Let's just say that you are an organization and you want to do stuff like this as well. Maybe you are in some kind of government agency, but you also want to do bleeding-edge stuff. My first tip would be: just think cloud native. That's the way we approached it, and I mentioned it before. If you ever come up with a solution, just ask: does it match up with cloud native, with all of these keywords? If it doesn't, rethink it, probably. So that's my first suggestion. The second one is: enable yourself. We use CloudNativePG, not because... well, there are more options there, but it was open source. We could just run it, we could do a PoC with it.

 

(49:00):

That's not to say it's going to be the Postgres option for the Dutch tax office; currently we're still looking into which options we want to run. But we could run with open source and we could do a PoC with it. So I would say: do that yourselves as well. Just start running databases on Kubernetes and get some experience with it. And as you go, you will learn what you need, and once you know what you need, then you can look into which tool fits best, and you can acquire support for it and all that stuff. GPL, yeah, the GPL is funny. Everybody hates the GPL for some reason, but the Linux kernel is GPL, so I just don't understand it. Most of my software, by the way, is GPL, which basically means that if you want to sell it, then you need to upstream your changes as well, which is the only thing that I'm looking for, actually.

 

(49:59):

So if you want to use my software and you want to sell it, fine, but if you change it, just give me your changes as well, and then we can make the upstream project better too, which is the only thing. Cool. Okay. So this is basically where I laid out everything. The Dutch government is embracing cloud native. It's beyond the Dutch tax office, by the way: RVM is also looking into it, DGI is also looking into it, and a lot of other Dutch government agencies are looking into it as well. Currently, we're on the verge of embracing cloud native, data on cloud native, you could say. Yeah, if you want to run data cloud natively, it's very doable. Well, this is what we got out of this system; maybe you could go a little bit faster, but it's not going to be a big game changer, I guess.

 

(50:53):

But look at other storage options if you need more. I mean, there are tons of other options out there, and you can already do it, right, with 1300 TPS. So, the key takeaways: think cloud native. Don't fear building your own. Just open source it, share it with the whole world, right? It's the way forward. And decide upfront on the expectations that you have and the investments that you are willing to make. For us, 1300 seems to be fine, so we just don't put any more effort into it, right? If it's fine, just go with that. But upfront, decide what your expectations are: how fast do you want to go? Alright, that was the end of the presentation. So, do you want me to stop sharing, I guess?

 

Speaker 1 (51:47):

Yeah, that was great. I know we had some questions during your talk, but does anyone else have any other questions? Now that we're at the end, if you have any other questions, just drop them in the chat. You can also unmute and ask if you would like. Thanks so much, Sebastiaan, that was really interesting. It's great to hear from your experience, and it's always great to hear use cases and case studies like this, so that's fantastic stuff. Thank you so much; we have a lot to learn there. We really appreciate you taking the time to come in and share with us. For sure. Yeah, if anyone thinks of questions later that they would like to ask, you can send them over to us.

 

(52:31):

You can ping me or Ian on the DoK Slack, and we can get answers to you and publish them on the DoK blog afterwards if stuff comes in after this. So if you think of stuff later, you're welcome to do that. This recording, like I said before, will be up on the DoK YouTube channel. So the other thing we always do in these town halls is... sorry, I clicked pause on the recording somehow; my touchpad is obviously very sensitive <laugh>. The thing we like to do in these town halls <laugh> is run a quiz, where anyone can answer questions and win a Run DoK t-shirt. If you look on the screen now, you'll see the details for that.

 

(53:25):

It's run through an app called Menti. You can go to menti.com and enter that code, or if you want, just scan the QR code there with your phone and answer the questions that way. We'll give people a minute to get on. I also put it in the chat: menti.com, and the code is 3164 0720. So you can scan that QR code or go to menti.com and pop in that eight-digit code, and it'll join the quiz. Just a few very simple questions. Ooh, are we ready? Are we ready to go to the questions? Cool. <inaudible> is managing the quiz here. Okay, we've got three questions today. We've got three folks in there already. Cool, four players, awesome. And the winner receives a Run DoK t-shirt; it's like the classic Run-DMC logo, but with Run DoK on the t-shirt. It's pretty cool. We can gather your address afterwards to send it over to you. The faster you answer, the more points you get. So, question one: what is the biggest challenge Kubernetes users face, highlighted in both the 2021 and 2022 DoK reports? Yeah, answer on your phones, and the quicker you answer, the more points you get.

 

(54:59):

Ooh, no one got it right: integration with existing... oh my word <laugh>. So, leaderboard: everyone's on zero points. There we go. Question two: what Dutch painting is behind Sebastiaan <laugh>? Starry Night, The Night Watch, The Garden of Earthly Delights, or Composition No. III with Red, Blue, Yellow and Black <laugh>? Everyone has voted The Night Watch. There we go. Sebastiaan told me before this that it's the most famous painting to come out of the Netherlands, and since I've seen it before, that sounds accurate <laugh>. Of course, it's not

 

Speaker 2 (55:52):

Fair, right? It’s not fair. <Laugh>.

 

Speaker 1 (55:56):

Okay, last question. That was a good one. Where will DoK Day be hosted this year? I did say that up top, and if you go back to the link in the... that one's been voted on already. There we go: Netherlands... quick, and whoa, that's a tight race, and it looks like Sebastiaan was the overall winner. Nice work <laugh>, as promised. Cool, I'll chat with you afterwards to get the details for sending it over; we can chat on Slack about that. Cool, thanks so much everyone, that was great. So yeah, thanks for joining. The recording will be up on YouTube. Just to note, we do these town halls monthly; our next town hall is on the 17th of August, and we'll be hearing from Brian Chambers of Chick-fil-A about how they implemented data on Kubernetes at massive scale. There's a link to that Meetup event in the chat, and you can register there. That's coming up in a month's time. And again, thank you to our gold sponsors, Google Cloud and Percona, and all our other sponsors; check out our website for that. And that's it. See you all next time. Thank you so much for joining, and thank you, Sebastiaan, for sharing with us. It's been great. For sure. Thank you. Thanks.

# # #