The CFP for DoK Day at KubeCon EU 2024 is open until Dec 4th.

Submit Proposal!

DoK Talk | Vitess: A cloud-native distributed system for scaling MySQL

This DoK Talk explores Vitess, a powerful database solution for deploying and managing large clusters of MySQL and Percona Server instances. The talk highlights Vitess’s versatility across cloud and dedicated-hardware environments, its ability to extend SQL features with impressive scalability, and the Vitess architecture and its Kubernetes-friendly features, demonstrating why many enterprises have adopted this solution for their production needs.

Speaker:

  • Deepthi Sigireddi – Software Engineer, PlanetScale & Vitess Maintainer and Project Lead

Watch the Replay

Read the Transcript

Speaker 1: 00:00:00 Okay. All right. Welcome everybody. Thanks for joining us for another DoK Talk. We’re here today with Deepthi Sigireddi, who’s a software engineer at PlanetScale and a Vitess maintainer and project lead. Before we get into the talk, I’m just going to go over some community things. So what’s coming up? First I want to say thank you to our community sponsors, our gold sponsors, Google Cloud, as well as Percona. So a big thank you to them for sponsoring the community. We also want to thank our silver sponsors and our community collaborators; you can see more about them if you go to dok.community/sponsors.

00:00:45 DoK Day at KubeCon is coming up on November 12th. The schedule is out, so you can check that out. If you scan the QR code, that’ll take you to the schedule for DoK Day. Again, that’s for the co-located events; it’s the day before KubeCon actually starts, and to attend you do need an All-Access pass. The pass prices have increased just recently, and I think they go up one more time, so if you’re going to go, you should get your ticket now. But yeah, hopefully we’ll see you there. And then also a thank you to Percona for being an event gold sponsor as well. So we appreciate that sponsorship, which allows us to do DoK Day. Some upcoming events: we have our town hall next week on the 19th, Empowering Developers with Easy, Scalable Stream Processing on Kubernetes, and that’s with Sri Yai and VI Maurice from Intuit.

00:01:40 And then we also have another DoK Talk with Viktor Gamov, who’ll be talking about scaling real-time analytics with Apache Pinot on Kubernetes. You can register on Meetup there; that’ll take you to our Meetup page where you can register for those events. Then of course, if you want to join the community, you can scan the QR code that’ll take you to our community page, and we have another link if you’re interested in learning about sponsorship. Yeah, so I just want to introduce Deepthi, who’s maintainer and project lead for Vitess, and Vitess is used by companies like Slack and Etsy and a number of others, which I’m sure Deepthi can tell us about. But yeah, Deepthi, take it away.

Speaker 2: 00:02:24 Thank you, Paul.

Speaker 1: 00:02:27 Welcome.

Speaker 2: 00:02:29 I will just share my slides and then we’ll get started. Hello everyone. I’m here to talk about Vitess. This is how the talk will go: I’ll start with an introduction, then do an overview of Vitess, then we’ll move on to how Vitess runs on Kubernetes, and then we’ll do a Q&A. So whatever questions are posted, I’ll take them at the end. Introduction first, a little bit about me. My name is Deepthi Sigireddi. I am currently a software engineer at PlanetScale, and I’m a maintainer and the technical lead for Vitess, both at PlanetScale and in the open source community.

00:03:23 I started a long time ago working on supply chain software, and at that time, the way we did supply chain planning was that you get all of the data that you need in files, you read them into memory, you do something with them, and you write those results back out into files. But at some point, somebody had this brilliant idea: why don’t we start reading things directly from a database instead of this flat-file business? So then people built some adapters for reading and writing directly to a database. This was driven by the way data was exploding. When data sizes were small, it was actually plausible to read everything into an in-memory process, process it, and then write it back onto disk. But data was growing at a much faster rate than compute and memory were growing, so they were simply not able to keep up with the rate at which data was growing. So then we moved on to doing things with databases. To start with, all the databases that people were using were monolithic. You have all the data in one database and then you do whatever you’re doing with it. Everyone goes to that one database.

00:04:55 But as time went on, the database became a bottleneck. It became a bottleneck in multiple ways. One was scalability, because the more processes you had running against the database, the more contention that would create and the more things would slow down, and the only way to improve that (these were the on-prem, pre-cloud days) was to keep running the database on larger and larger machines. And the cost of buying larger machines is actually not linear; it goes up much faster than linear, and it becomes more and more expensive. So companies were spending millions of dollars just on the hardware, on top of paying licenses to the database vendors whose databases they were running.

00:05:48 Eventually what happened was that instead of running things on-prem, people started running them in the cloud. While things were still being run on-prem, the team I was working with at that time set out to solve the scalability problem with these enterprise monolithic databases. And the way we did it was that you would partition it. You had to take the problem space and partition it in such a way that you could line up the compute with the data. So you partition the data, but you want to make sure that whatever processes are performing work, the workers go against one partition. So the problem had to be decomposable, and because at that time I was working on retail supply chain planning, the problem was decomposable. Eventually we got to the SaaS world, where, for the same thing, retail supply chain planning, instead of every company running it for themselves, they would purchase it as a service from a vendor. But the same principle of partitioning the data was still applicable in this new multi-tenant world where you were providing supply chain planning as a service and you had all of these clients whose data is all disjoint and can live in different partitions.

00:07:05 After working for about 10 years on supply chain planning, I moved on to mobile device security. And in that world of mobile device security, the problems were still the same. You were running an enterprise database, you had a SaaS service, you had thousands of customers who all had their own data, and you had to store this data. And partitioning and multiple schemas were essentially the strategies that we were still using to achieve scalability. Eventually, about five years ago, I ended up joining PlanetScale, and that’s when I started working on Vitess, and we will talk about how Vitess came about in a minute.

00:07:52 So here’s a brief history of Vitess. What is Vitess? First of all, Vitess is what we call a cloud-native distributed database system. It is built around MySQL; MySQL is the only dialect it supports. It was started at YouTube in 2010. And the reason Vitess was built, the reason Vitess came about, is because YouTube was running on MySQL. In the early 2000s, when we had the dot-com boom, a lot of dot-coms adopted the LAMP stack, which included MySQL as the data store. And MySQL was great for people who were trying to build businesses because it was open source, free if you wanted to use the Community Edition, or you could pay Oracle for it. It was actually very easy to set up and use, and very easy to administer, but people started running into issues with MySQL as their data started growing, and that’s what YouTube was encountering in 2010. What would happen was that the database would actually crash and the service would go down on a regular basis, because the amount of data YouTube had for their video service, the metadata for the videos, was just too much to fit into one MySQL instance. So Vitess was started at YouTube to solve the specific problems they were running into with MySQL, which had to do with availability, scalability, and data size. They invented Vitess to solve those problems.

00:09:39 Vitess was open sourced almost from the beginning. Vitess was intended to be open source, and it was published on GitHub as an open source repository in 2013. And once it became open source, there was interest in using Vitess from other people who were having similar problems with their MySQL deployments. Over the next few years, 2014 to ’17, Vitess saw a fair amount of adoption from open source users. That included companies like Flipkart, which is a major e-retailer (not an electronics retailer, just an e-retailer) in India, and HubSpot, Slack, Square; a lot of these companies that had large amounts of data started adopting Vitess to solve their problems.

00:10:39 After a while, Vitess was donated by Google to CNCF. That was in 2018, and it became a CNCF graduated project in 2019. A little bit about PlanetScale, which is where I currently work. PlanetScale was started in 2018, after Vitess had been donated to CNCF, by people who had worked on Vitess at YouTube. So they had started the project and deployed it and run it in production for several years. PlanetScale is a fully managed database, and it is available on the public cloud. It runs on EKS and GKE. PlanetScale provides a multi-tenant managed database service, but there are also managed single-tenant offerings, and PlanetScale runs Vitess with a custom Kubernetes operator.

00:11:40 Vitess is in production in many places; this is a subset of those places. Many of these companies are big, but some of them are not huge companies. They may be small in terms of their revenue, but they still have the same kinds of problems with the amount of data they are dealing with, and they still have the same kind of availability requirements as larger enterprises. Some of the key adopters of Vitess: Slack, where every Slack message goes through Vitess because Slack is a hundred percent on Vitess. GitHub, where every pull request, every issue, every notification goes through Vitess. JD.com is a huge Chinese online retailer, probably the biggest in China, and they are running 10,000-plus databases on Vitess in Kubernetes. And then there are places like Shopify and HubSpot where the data infrastructure team provides their own internal database as a service, and Vitess is part of that database-as-a-service offering that their application teams can choose to use for their applications.

00:12:55 A little bit about the community around Vitess. We have 15 maintainers from five different companies. In the last two years, we’ve had 256 different contributors from 47 companies; 115 of them actually contributed code, and they came from 23 different companies. How is Vitess deployed? Vitess is deployed in a variety of ways, which means that we have to support that variety of ways. There are people who run it on-prem, for whatever reason; they may have security or compliance requirements. There are people who deploy it on cloud VMs like Amazon EC2 or Google Cloud or Azure or whatever cloud; you can run it on any cloud. Kubernetes is the preferred way of deployment for Vitess, and the maintainer team actually maintains an operator for Vitess for Kubernetes, and that’s what we recommend to everyone. So there are a lot of people doing that. And then there are some mixed deployments where some portion of the Vitess deployment runs on Kubernetes and some of it runs on VMs. I think there is one deployment which is running on Nomad, so there are some other custom deployments.

00:14:22 Now let’s get into an overview of Vitess. Why does Vitess even exist? The reason Vitess exists is because the characteristics that we expect of modern applications have changed and evolved in the last 20 years. It’s no longer acceptable to take downtime for maintenance. It’s no longer acceptable for things to be slow. If somebody comes to your website and it is slow, people are just going to leave. And some of these things have to be addressed at every part of the software stack. What you are building may be a web application, which means that the web application has to be responsive, but whatever you’re running on the backend, whatever infrastructure you’re running on the backend, also has to be responsive. And we are in an age where, unless you are running a static website, literally every application has a backing data store. And what we expect from this backing data store is no different from what we expect in terms of responsiveness and availability from the front end. So modern applications have to be performant and they have to be scalable. You have to be able to grow from 10 users to a hundred thousand users to millions of users. It has to be always up; you cannot take downtime, and you cannot lose people’s data.

00:15:55 So Vitess solves for all of this at the data layer, because remember, when you have a stack of software, every part of that has to provide you very similar guarantees in terms of being up, being scalable, and all of that other jazz. Your stack is only as strong as the weakest link, and if your database is a monolith, and if your database is not providing you high availability, not giving you good uptime, then you’re going to have problems. So Vitess is massively scalable: we have numbers like up to 20,000 petabytes of data stored in Vitess. It’s highly available; it has been built with availability in mind, and we will talk about that as we go through the talk. It provides durability guarantees which can be chosen, so when it is being configured, you can say to what level of data durability you want it to guarantee, and that is what Vitess will provide you. And it’s compatible with MySQL.

00:17:08 So Vitess provides an illusion to applications of being a single database, even though it is actually a distributed database system with many components running at the same time and communicating with each other. As far as the application is concerned, it looks like a single database, and applications can make a single connection to Vitess without knowing what the actual MySQL databases are that are running behind the scenes or how many of them may be running behind the scenes. And Vitess speaks the MySQL protocol, so MySQL clients can connect directly to Vitess without actually going through any other layer, and it works with both MySQL, meaning the Oracle or Community Edition of MySQL, and Percona’s edition of MySQL.
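As an illustration of the MySQL protocol compatibility described above, here is a minimal Go sketch of an application connecting to a Vitess cluster through VTGate with the standard Go MySQL driver. The host, port, credentials, keyspace name, and table (vtgate-host:3306, commerce, customer) are placeholder assumptions for the example, not values from the talk.

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	// VTGate speaks the MySQL protocol, so the stock driver works unchanged.
	_ "github.com/go-sql-driver/mysql"
)

func main() {
	// Placeholder DSN: point it at a VTGate endpoint and a keyspace name.
	// From the application's point of view this looks like a single MySQL database.
	dsn := "app_user:app_password@tcp(vtgate-host:3306)/commerce"

	db, err := sql.Open("mysql", dsn)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// An ordinary query; VTGate parses it and routes it to the right shard(s).
	var count int
	if err := db.QueryRow("SELECT COUNT(*) FROM customer").Scan(&count); err != nil {
		log.Fatal(err)
	}
	fmt.Println("customers:", count)
}
```

Nothing in this snippet is Vitess-specific; the same program would run against a plain MySQL server, which is the compatibility point being made here.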

00:18:03 And Vitess can be used with database frameworks. Applications use various different ways of connecting to databases: frameworks, ORMs, their own homegrown code through which they connect to databases, or a third-party driver like, for instance, the Go MySQL driver. Vitess works with all of those things, and that’s because it actually understands the MySQL protocol. Here are some of the features that Vitess provides. We’ve already talked a fair amount about MySQL compatibility. Some of the other things Vitess provides are things like resharding. So the way you can scale when your data starts becoming too big for a single MySQL is through sharding, and we’ll talk a little more about sharding as we go along. But what Vitess provides you is a way of defining how you want to shard your database, and it allows you to do the resharding process in an online way with minimal downtime.

00:19:16 So when you switch over, there may be a few seconds of write unavailability, reads will still continue, and then everything switches over to your new configuration with the shards. Vitess provides materialization features. This is something that MySQL does not provide: you have views in MySQL, but you don’t have materialized views. With Vitess, you can actually do materialized views. It provides APIs and a CLI for managing your cluster. That’s a very important thing for anyone who’s deploying Vitess, because they need to be able to manage the various components that are being deployed. Vitess provides online, non-blocking schema changes. One of the longstanding issues that people have had with MySQL is that schema changes can end up being blocking, in the sense that they end up taking locks that prevent other traffic from being served, and your queries may end up hanging, timing out, running into deadlocks, all those kinds of things.

00:20:24 And there is actually a huge community and a huge body of work in the MySQL community which is dedicated to working around the limitations of MySQL when it comes to doing schema changes. MySQL has made improvements in the last few years on how schema changes work, and certain types of schema changes are instant and not blocking anymore, but it is still not where it needs to be in order to run in production. So in Vitess, we have actually built out online schema changes where everything happens in the background, and then you cut over to the new version of the schema and things work seamlessly. Vitess provides backup and recovery operations, and these are actually key to running in Kubernetes, as we will see when we get to the Kubernetes part of the talk. Vitess also provides some safety features. For instance, if we are receiving the exact same query hundreds of thousands of times, there is no need to fetch that same data from MySQL over and over and over again. We can consolidate them, fetch the data once, and return it to a hundred thousand clients. This is the feature that we call query consolidation. And Vitess also comes with automatic failure detection and repair. There are various failure modes that you can encounter in any distributed system, and with Vitess, the way we run MySQL, the way the architecture is laid out, there are specific failure modes that we have to worry about, and we have a way of monitoring for those and repairing them automatically.
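The query consolidation feature mentioned above (many identical in-flight queries collapsed into a single fetch) is implemented inside Vitess itself. Purely as a hedged illustration of the pattern, not Vitess’s actual code, here is a small Go sketch using the golang.org/x/sync/singleflight package, with a hypothetical fetchFromMySQL helper standing in for the real database round trip.

```go
package main

import (
	"fmt"
	"sync"

	"golang.org/x/sync/singleflight"
)

var group singleflight.Group

// fetchFromMySQL is a stand-in for the real database round trip.
func fetchFromMySQL(query string) (string, error) {
	fmt.Println("hitting MySQL for:", query)
	return "result-rows", nil
}

// consolidatedQuery collapses concurrent identical queries into one fetch;
// all callers waiting on the same key share the single result.
func consolidatedQuery(query string) (string, error) {
	v, err, _ := group.Do(query, func() (interface{}, error) {
		return fetchFromMySQL(query)
	})
	if err != nil {
		return "", err
	}
	return v.(string), nil
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// A burst of identical queries triggers far fewer real fetches.
			consolidatedQuery("SELECT * FROM product WHERE id = 42")
		}()
	}
	wg.Wait()
}
```

Concurrent callers asking for the same key share one result, so a burst of identical queries produces only a handful of trips to the backend; later bursts, after the first fetch completes, still trigger fresh fetches, which is the intended behavior.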

00:22:08 Let’s talk about the Vitess architecture. How does the Vitess architecture enable transparent database operations? Because I said that as far as the application is concerned, Vitess looks like a single database. Before we do that, we need to go through a few concepts. So in Vitess there is the concept of a keyspace. A keyspace is a logical database. It’ll be used by clients as if it’s the database name, but behind the scenes it is actually many MySQL instances. As far as the client is concerned, a keyspace is a database. So it’s a logical database which could be backed by three or 50 or a hundred physical databases. Next, a shard. When you have a large amount of data and you want to split it up into disjoint parts, we call each of those parts a shard. An example for this would be when you go to a conference and you’re checking in: there may be two counters, one of which says A to M, and the other one says N to Z.

00:23:24 That’s an example of sharding. You have taken the entire set of people and you have divided them into two parts, and if you combine them, you get the full set. But the two sets are disjoint. Each person belongs either to the A-to-M set or the N-to-Z set. Nobody belongs to both sets. So if we extend that analogy to databases: in a given table, once you shard it, you have that same table living in multiple different databases, but any given row lives in only one of those databases, and every row is present in exactly one of them. It’s there somewhere.

00:24:10 And then we also have something called the keyspace ID. The keyspace ID is what we use to identify which shard a row belongs to. The way this keyspace ID is computed is customizable, and there are many functions that we provide out of the box for computing the keyspace ID, but people can also write their own. And the way you compute the keyspace ID is what we call the sharding scheme, or Vindex: how do you map your rows to shards? So you define which function or which scheme, which strategy, you are going to use to do that computation. That computation gives you a keyspace ID, which is then mapped to a shard.
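To make the keyspace ID idea concrete, here is a hedged Go sketch: the sharding key is hashed into a keyspace ID, and each shard owns a contiguous range of that ID space (Vitess conventionally names shards by hex ranges such as "-80" and "80-" for a two-way split). The hash below is an arbitrary stand-in chosen for illustration; it is not the hash Vindex that Vitess actually ships.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// keyspaceID hashes a sharding key into an 8-byte value.
// Illustrative only: Vitess's built-in hash Vindex uses its own function.
func keyspaceID(shardingKey uint64) uint64 {
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], shardingKey)
	sum := sha256.Sum256(buf[:])
	return binary.BigEndian.Uint64(sum[:8])
}

// shardFor maps a keyspace ID to one of two shards named by hex ranges,
// the way a two-way split is conventionally written ("-80", "80-").
func shardFor(ksid uint64) string {
	if ksid < 0x8000000000000000 {
		return "-80"
	}
	return "80-"
}

func main() {
	for _, customerID := range []uint64{1, 4, 42, 99999} {
		ksid := keyspaceID(customerID)
		fmt.Printf("customer_id=%d -> keyspace_id=%016x -> shard %s\n",
			customerID, ksid, shardFor(ksid))
	}
}
```

The important property is that the mapping is deterministic: the same customer ID always hashes to the same keyspace ID and therefore always lands on the same shard.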

00:25:02 In the MySQL world, in the modern database world, it’s common to run databases in replicated mode. You have a primary and replicas, and this is pretty much the standard way people run MySQL when they want high availability, when they want to make sure that if the database goes down, they have a backup, they have something to fail over to. So this is actually very standard. What we do in Vitess is that we run MySQL in this replicated mode. We are running multiple MySQL servers, and to each of those servers we assign a sidecar or daemon process called a VTTablet. This VTTablet actually controls the mysqld process, and it mediates all communication with the mysqld server. Typically it’s run on the same host as mysqld and communicates through the Unix socket, so that allows you to run this without adding additional network latency into the system.

00:26:09 In production, people run multiple clusters. Some of them may be sharded, some of them may be unsharded, and each shard looks like one of these: in each shard you have a primary and you have some number of replicas. The component that receives application traffic in Vitess is called VTGate. It’s a gateway to Vitess, and it’s a stateless process, a proxy that receives connections and queries from clients. It speaks the MySQL protocol, and it presents this illusion of a single MySQL server to clients. But what it does internally is take the query, parse it, figure out where to send it, and route the query to the right place, to the right keyspace, to the right database, to the right tablet; then it gets the results back and forwards them back to the client. And even though I’m saying MySQL, VTGate can speak both MySQL and gRPC, so there are people who send everything as a gRPC call to VTGate because they themselves are able to speak gRPC. So that’s VTGate. In order to scale the cluster for traffic, people will run multiple VTGate servers. Some of this probably looks very familiar to you from Kubernetes deployments, and you can already map what these things look like in a Kubernetes deployment.

00:27:49 VTGate must take those queries and route them to the correct shards, and the way it does that is by looking at the schema and by looking at the sharding scheme. So if there is a query, for example a query that says select something where customer ID is four, you take that ID, four, and apply the sharding scheme to it. You compute a keyspace ID that tells you which shard this row must live in, and then VTGate can send that query just to that one shard. It does not need to send it to all shards. So in that sense it’s efficient, because you are not blindly passing along the query to all of the shards.

00:28:32 Another key component of Vitess is what we call the topology server. This is a distributed key-value store that we use to store the metadata associated with the Vitess cluster. The tablets, the schemas, the shards, the keyspaces, the sharding scheme: all of that is stored in this, and we support etcd and ZooKeeper. etcd is the standard thing that ships with the operator, the Vitess Kubernetes operator. This data set is pretty small, and VTGate typically caches it and refreshes it periodically. The next component I want to talk about is a control daemon, vtctld. Given that we are running a distributed system, there has to be an API and an entry point that people can use to manage things, because they have to deploy, they have to undeploy, and something may go wrong that requires an administrator to repair things. So vtctld is the process that presents that API and a command-line interface that people can use to manage their Vitess cluster.

00:29:46 I talked a little bit about failure detection and recovery, and how important it is in a distributed system that we think about all the failure modes and how we recover from them. The automated way that Vitess does this is using a component called VTOrc. VTOrc stands for Vitess Orchestrator. What VTOrc does is monitor all the VTTablets, which in turn are talking to MySQL, and it makes sure that every shard has a primary at all times, because that governs our write availability. If the primary is not available, then you cannot write to the database; even though you can probably still read from the database, you won’t be able to write anything, and that means the system cannot make progress. So VTOrc monitors for primary failures and repairs those failures if they happen by electing a new primary, a new leader, for the shard where it detects a failure. It monitors replication on all of the replicas, and if replication fails and it is a recoverable failure, it’ll actually repair the replication. And it also enforces the durability policies, because in Vitess you can choose what your durability is, and VTOrc enforces that at each tablet and MySQL level.

00:31:14 In addition to everything that we have talked about, we also have a management UI. There is a web UI that people can use to view and manage their Vitess cluster. It talks to a web app which provides the API that the UI needs, and that in turn talks to vtctld, which is the main API server for Vitess. So to summarize the architecture: you have a bunch of data in a whole host of MySQL instances, the data has been sharded, it lives in multiple shards, and each MySQL is managed by a VTTablet. Traffic to the Vitess cluster comes through VTGates; people typically run multiple VTGates and put a load balancer in front of them. And then we have the management components of the Vitess cluster that allow both automatic management and human management.

00:32:23 Alright, let’s move on to running Vitess on Kubernetes. We will start with a little bit of history again. Given that Vitess was born at YouTube, which was already a part of Google at that point, Vitess had to run in Borg. Borg is the infrastructure that everything at Google runs on, and Borg’s infrastructure is all ephemeral. So Vitess had to run in an environment very similar to what we think of as Kubernetes now; Borg in fact was the precursor to Kubernetes. At Google, you are running your Vitess cluster, you are running your application, whatever it is, and you are running it in whatever they called it at that time, what we think of as pods: you are running a process, a container, a pod in this environment where the pod can be rescheduled at any point. You have no control over what physical infrastructure that pod runs on, and the orchestration layer, whether it is Borg or Kubernetes, can decide, hey, I need to perform maintenance on this node, so I’m just going to move this entire workload to a different node.

00:33:44 So Vitess had to run in that environment. And if you remember the debates in the Kubernetes community over the last five years about whether you can run databases on Kubernetes, whether you can run stateful workloads on Kubernetes: that is not an easy thing to do. It’s not an easy thing to run something like a database in an environment where the process can disappear at any point and restart somewhere else; you have to deal with the instability of “my process can disappear and come back somewhere else.” So from the beginning, Vitess was built to be able to run in this kind of an environment. In fact, at YouTube in Borg, there were no persistent disks, there were no persistent volumes. All storage was ephemeral, so Vitess had to run in an ephemeral storage environment. This is why the backup and restore functions of Vitess are very important, because the way YouTube did it was that they would say, okay, we have one primary and we have all of these replicas, and they would run tens of replicas, dozens of replicas. If a replica disappears, it’s fine; it doesn’t make a substantial difference to how they are serving traffic because it’s one out of 50 or one out of 80, a small fraction. You will immediately bring up a new replica from backup: you restore from a backup, you connect to whatever is the current primary, and you catch up to it. At some point you’re caught up and everything is good.

00:35:30 Now, what happens if it’s the primary that is disappearing? You have to deal with that too, which is where the automatic failure detection and recovery comes in. Because if the primary disappears, you have a monitoring component that is watching for primary liveness and immediately saying, hey, I detected that this thing disappeared, so I’m going to elect one of the replicas primary. So from the beginning, the way Vitess has evolved, it has evolved in such a way that it is Kubernetes friendly. It was running in Borg; then at some point, within YouTube, within Google, they moved Vitess from Borg to Kubernetes and started running it on Kubernetes. Once Vitess was open sourced, people built a Helm chart for it, and the maintainer team actually maintained the Helm chart for a while; then we deprecated it, and then we built an operator, a Kubernetes operator, for running Vitess. So in terms of timeline, this is how the various ways of running Vitess on Kubernetes have evolved.

00:36:49 So when did Vitess start running in Kubernetes? Vitess started running in Kubernetes in 2015. And for anyone who remembers Kubernetes history, March 2015 was actually prior to Kubernetes 1.0, because Kubernetes 1.0 only became available in June 2015 or something like that. So Vitess was one of the very first large applications to run on Kubernetes within Google, when Kubernetes was still under development. So what do current Kubernetes deployments look like? There are people who still run with Helm charts even though the maintainers don’t provide an official one; either they have written their own or they’ve taken the one that used to be supported and they are still running with it. There is a Vitess operator for Kubernetes that we maintain. There are people who use custom operators: PlanetScale is an example, HubSpot is another example, Shopify probably has a custom operator. And then there are other custom deployments that people have built on their own.

00:38:01 What is it about Vitess that makes it cloud or Kubernetes native? We’ve already touched on some of these aspects. The reason Vitess is Kubernetes native is because it was designed to be so from the beginning; it was built in order to be able to run in an environment where the disk can disappear or the compute can disappear, and you have to deal with all of those failure conditions. What Vitess is able to do is that in this sort of Kubernetes orchestration environment, it is still able to provide high availability, scalability, and data durability. So we will look at how it does that. The way Vitess is able to provide these features, high availability, scalability, data durability, is through all of these capabilities that were built out over time. In order to have high availability, you need to have a backup instance that can take over if your primary instance, which is taking writes, goes down.

00:39:12 So we achieve that through replication. You always have a certain number of replicas running, constantly receiving the data stream from the primary and staying as up to date as possible, right? So replication is a big part of it, but the failure detection and recovery is also a big part of it. If the primary goes away for whatever reason, it ran out of memory, it lost its disk, it was rescheduled by Kubernetes or by an operator, we can detect that failure and we can actually recover from it by electing a new primary. Now, when people run the Vitess operator, for instance, there is a distinction between planned rollouts and unplanned failures. The operator can take care of planned rollouts: what the operator will do is actually elect a new primary before taking down an instance, so that you don’t see any downtime at all. But unplanned failures are a different category, and for those we have the VTOrc component, which does the failure detection and recovery. So replication, what we do in the operator for planned rollouts, and failure detection and recovery together, these things give you high availability.

00:40:35 They also give you data durability. The primary is the one that is taking the writes and it is replicating them to the replicas. Replicas can be up to date or they can be lagged, but you can run MySQL in a mode which is called semi-synchronous replication, where if any write has been accepted by the primary, it has also been sent to at least one replica. That way you are not going to lose any writes even if your primary goes down. So that is what we call durability, and this durability is configurable as a policy.

00:41:15 The other critical part that allows us to provide high availability is the backup and restore part of it. There is a difference between running 50 or 80 replicas and running two replicas. Maybe you are running with one primary and two replicas; you don’t need any more than that, and it costs money to run more than that. So if you lose a replica, then what do you do? If you lose a disk, what do you do? You restore from backup and then you catch up to the primary. That’s how you recover from that kind of failure, and that’s how you get the high availability, because if you’re running with just three MySQL instances, one primary and two replicas, you want to keep that number pretty steady at all times so that you don’t end up in a catastrophic failure situation. And then sharding. Sharding is how Vitess provides scalability, because we know that individual MySQL instances will not scale well beyond a certain data size and beyond a certain number of concurrent connections. By sharding, we are actually able to distribute the load on the system across many different MySQLs, and that’s how we are able to scale.

00:42:35 So if we go back to the diagram we had earlier and think about how this would be deployed in a Kubernetes environment: each of these disks could be an ephemeral disk, but the way we do it in the operator, they’re actually persistent disks, persistent volumes, so Amazon EBS or Google Persistent Disk. The disk associated with each of these MySQLs is a persistent volume, and VTTablet and MySQL run as pods which both share that persistent volume. So VTTablet and MySQL, along with their persistent volume, we can think of as one unit. For each shard we have multiple of those units, and we have multiple shards and multiple keyspaces, so it grows pretty quickly at that point. Now, VTGate on the other hand is not a stateful component; it is stateless. And because it is stateless, we run it using a Deployment with a ReplicaSet: you specify how many replicas you want for VTGate, they’re all effectively equal, and you put a load balancer in front of them which distributes traffic across the VTGates. And as your traffic goes up, you can just increase the size of your deployment and that will scale.
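Because VTGate is stateless and runs as an ordinary Deployment, scaling the query-serving tier is just a matter of changing the replica count. As a hedged sketch of that idea using the Kubernetes Go client: the namespace and Deployment name (vitess, example-vtgate) are hypothetical, and in practice the Vitess operator manages these objects, so you would normally adjust the operator's custom resource rather than touch the Deployment directly.

```go
package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load kubeconfig from the default location; in-cluster config would also work.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatal(err)
	}

	ctx := context.Background()
	// Hypothetical names: namespace and Deployment name depend on how the cluster was installed.
	ns, name := "vitess", "example-vtgate"

	scale, err := clientset.AppsV1().Deployments(ns).GetScale(ctx, name, metav1.GetOptions{})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("current vtgate replicas:", scale.Spec.Replicas)

	// Scale out the stateless VTGate tier to absorb more query traffic.
	scale.Spec.Replicas = 5
	if _, err := clientset.AppsV1().Deployments(ns).UpdateScale(ctx, name, scale, metav1.UpdateOptions{}); err != nil {
		log.Fatal(err)
	}
}
```

The stateful VTTablet/mysqld pods cannot be scaled this casually, which is exactly the distinction the speaker draws between the stateless and stateful parts of the architecture.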

00:44:13 That brings us pretty much to the end of what I wanted to talk about. Here are some resources: the Vitess website, the PlanetScale website, our GitHub repository. On the Vitess website, we have links to our Slack workspace and to our social media. There are about 4,000 people on our Slack workspace right now, and it’s a very good resource for people who want to try to run Vitess on their own or who want to run Vitess in production. We also have an email list that is hosted by CNCF. So with that, we can go to Q&A.

Speaker 1: 00:44:54 Thanks so much, Deepthi. That was a lot of great information, very thorough. We have a couple of questions, but I just want to say it’s really interesting to hear about the history of Vitess, in particular going way back to running on Borg and the very early versions of Kubernetes, considering running stateful applications seems to be more of a newer concept. It seems like you all were doing it long before everybody else, so that was really interesting. By the way, anybody else, if you have more questions, feel free to add them to the chat and we’ll ask Deepthi. But a question from JD said, what are the typical reasons for custom operators?

Speaker 2: 00:45:47 There are a few different reasons why people are running custom operators. In some cases, the operators that people wrote predated the open source operator that we are maintaining, and they are just not going to migrate. It is actually an open question for me, which I haven’t figured out yet, how you could possibly take running containers and migrate them from one operator to another. So that probably adds to the difficulty,

Speaker 3: 00:46:20 But

Speaker 2: 00:46:22 So some people are doing it because they did it before we provided an open source operator. Some people are doing it because they have additional requirements beyond what the open source operator is providing. For instance, if you look at PlanetScale, they want to do backups in a slightly different way, which is different from how open source Vitess does backups even. So that was one of the reasons why PlanetScale built a custom operator. Some of the others are probably doing it because they are also providing a database as a service within their enterprise and they need to build additional functionality which is not available in the open source operator. I would say that is the main reason for someone to build a custom operator.

Speaker 1: 00:47:16 And for instance, at PlanetScale, is that just a fork off of the main Vitess version, or is that a loaded question?

Speaker 2: 00:47:25 No, no, no. It’s actually the parent, because what happened is that even now, the open source Vitess operator still lives in the PlanetScale GitHub repository. We would like to donate it to CNCF and make it part of the Vitess repository, but we can’t figure out how to change the namespacing, which currently says PlanetScale. The history of that is that the Vitess operator for Kubernetes was built at PlanetScale to start with, and then it was open sourced and made available to the community. And over time, the internal version of the operator evolved in its own way and diverged from the open source version of the operator.

Speaker 1: 00:48:17 Thank you. Okay. And then we have a question from David Murphy. He says, what are the medium- to long-term plans for Vitess to natively support cross-cluster Vitess deployments?

Speaker 2: 00:48:29 I’m assuming this question is about cross-Kubernetes-cluster Vitess deployments. Right now, there are really no plans. The thing is that the maintainer team is pretty small, and not everyone on the maintainer team is an expert on Kubernetes or Kubernetes operators. So we have a pretty small group of people who are maintaining that operator, and it becomes difficult to build any large features. To some extent, given that it’s open source, we would need someone who needs cross-Kubernetes-cluster Vitess deployments to drive it.

Speaker 1: 00:49:19 Yeah, that’s an invitation. Are you seeking other contributors?

Speaker 2: 00:49:25 We’re open to contributors.

Speaker 1: 00:49:28 Great. Who should they contact? Should they contact you as the lead maintainer?

Speaker 2: 00:49:37 So the way people usually communicate with us is either through GitHub issues, so people can create a feature request saying, hey, this is a feature that we want in the operator (which has its own issues list), or through our Slack workspace. The question is from David. I know David; I know that he wants this.

Speaker 1: 00:50:01 Well, David, well, you should submit a GitHub issue.

Speaker 2: 00:50:06 Yeah, he’s going to say he’s already done that.

Speaker 1: 00:50:08 Okay. Alright.

Speaker 2: 00:50:11 But for anyone else, yeah, GitHub issues are the canonical way to communicate feature requests. You are always welcome to join the Slack workspace and ask questions, start discussions. Things can be discussed on GitHub issues as well.

Speaker 1: 00:50:28 Yeah. Well, I had sort of a general question about, given how far back Vitess goes, what would you say are the major changes that have happened since, say, 2015?

Speaker 2: 00:50:41 Okay, so since 2015. I would say that until 2015, Vitess was really built for the YouTube case, so it just needed to work the way YouTube needed it to work. But starting in 2014 or ’15, because people who were not YouTube started adopting Vitess, it actually had to grow in different directions. One example would be that the query set supported by Vitess in the beginning was pretty limited, because when you have a sharded system and you are trying to do a complex join, it’s not easy to execute, because you can’t just send that query as-is to every shard. You have to figure out which rows you need from which shard and how you combine them with rows from other shards, because your join could be going across shards. So a lot of this work has to be done by VTGate in memory.

00:51:51 So the set of complex queries that was supported was fairly small, and that set has been expanding over time, because every new adopter of Vitess, I would say from 2015 to 2020, every new adopter would bring in a few more things that didn’t work for them. And the maintainer team actually pretty aggressively added support for things that were previously not supported, and there were some community contributions too. When I say maintainer team, they are from many different companies. It wasn’t just people from PlanetScale who were adding support; there were people from other companies like Slack and Pinterest who were also adding support. So that’s one of the biggest things, adding support for MySQL constructs that were not supported back in 2015, because with MySQL 8.0 they also introduced a lot of new syntax that was not in 5.7 (Vitess started on 5.5, then 5.6, 5.7, and then 8.0). So that’s one of the big changes. I think the other big change is that many of the people who adopted Vitess in the 2015, ’16, ’17 timeframe were actually running either on-prem or on their own cloud VMs, because Kubernetes was still very new at that time. Whereas if we look at people who have adopted Vitess more recently, in the last two, three, four years, a lot of them are starting with Kubernetes. So that’s another big change.

Speaker 1: 00:53:38 Probably the introduction of the operator made that a lot easier as well. I’m guessing

Speaker 2: 00:53:46 People managed to do it with Helm charts too, because they would basically say, oh, if Kubernetes is going to kill a process, I can put a pre-stop hook which does the failover, because I don’t want my primary to go down. So they were able to work with it, but I think the operator definitely made it a lot easier to deploy Vitess on Kubernetes.

Speaker 1: 00:54:11 Okay. I have one more question here from JD, which is: what are some of the limitations of Vitess as compared to normal MySQL? Or what is different? Can I just migrate my apps from traditional MySQL to Vitess?

Speaker 2: 00:54:28 So in general, for the most part, you can just migrate your apps from traditional MySQL to Vitess. Now, there may be a few things that don’t work, because the compatibility is not a hundred percent. We actually have a MySQL compatibility page on our website which lists a few of the things that we don’t yet support, and periodically we will publish announcements of something that we added support for. So for example, a couple of years ago, we did not support CTEs, common table expressions, at all. In the last release, we added support for simple CTEs, not recursive CTEs, and in the upcoming release, we’ll have support for recursive CTEs. So some of this keeps evolving, but in general, you should be able to migrate your app to Vitess 99 percent of the time.

Speaker 1: 00:55:30 And then I think maybe David was also responding to that question, but he says one big thing is primary keys versus sequences when it comes to sharding and the VSchema. Other things are like foreign keys and triggers, et cetera.

Speaker 2: 00:55:48 So typically when people migrate to Vitess, they don’t shard right away. You do the migration, you get through whatever the issues are, and then you say, okay, the reason I’ve migrated is because my database is so big that I do need to shard it. What David is talking about comes into play when you try to shard. MySQL has auto-increment fields, but you cannot use MySQL’s auto-increment when you have a distributed system, so you have to have your own implementation of the equivalent of auto-increment, and that is sequences in Vitess. So you would have to make sure that you drop the auto-increment from the column and you replace it with a Vitess sequence. There are things you have to do in order to shard, but if you just want to migrate to Vitess, then some of these things don’t have to be solved right away. But David is absolutely right: you have to solve for these things eventually, when you’re ready to shard.

Speaker 1: 00:56:58 Yeah, that’s really interesting to think about, turning off auto-incrementing on a distributed system. That makes a lot of sense. And so forgive my naivete, but with sharding, the benefit of sharding is you’re saying your database is too big, and so trying to access a piece of data in a giant database just takes too long. Is that sort of the point of sharding, to make the data more accessible?

Speaker 2: 00:57:24 So sharding can be done because your data size is too big, which is slowing down your queries, or it can also be done because you have too much traffic, because there’s a limit to the number of concurrent connections a MySQL instance can handle. Each connection is a separate thread, and if you have hundreds of thousands of concurrent threads, it just slows things down.

Speaker 3: 00:57:51 So

Speaker 2: 00:57:52 Typically the limit is 10,000 concurrent connections. But what we are able to do in Vitess is that we actually pool the connections. Not every connection is active all the time, and you can support 10,000 concurrent clients with a much smaller set of pooled connections. And this idea of connection pooling, which was implemented in Vitess back in 2012 or whenever, is now commonplace. Almost everybody, every modern application, will say, I’m not going to open a new connection to the backend database for each query; I’m going to create a set of pooled connections, maybe five, maybe you don’t need more than that, and every query that the application is doing will go through one of those connections. So then you end up with connection pooling at the front end and connection pooling at the backend, which gives you really good scalability.

Speaker 1: 00:58:49 Okay. So are the queries then queued to use those connections?

Speaker 2: 00:58:57 Yeah, they’re essentially queued. So you do want to tune your number of connections to pretty much the maximum number of active queries at any given point. And this really only works because not all connections will have active queries at every point in time.
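The front-end half of that connection pooling is something the application can set up itself with Go’s database/sql pool. Here is a minimal sketch with placeholder credentials and pool sizes, assuming a VTGate (or any MySQL-protocol endpoint) at vtgate-host:3306.

```go
package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	// Placeholder DSN pointing at a VTGate (or any MySQL-protocol endpoint).
	db, err := sql.Open("mysql", "app_user:app_password@tcp(vtgate-host:3306)/commerce")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// A small pool of reused connections serves many concurrent callers;
	// queries queue briefly for a free connection instead of each opening its own.
	db.SetMaxOpenConns(5)
	db.SetMaxIdleConns(5)
	db.SetConnMaxLifetime(30 * time.Minute)

	if err := db.Ping(); err != nil {
		log.Fatal(err)
	}
}
```

With a small pool, excess concurrent queries wait briefly for a free connection rather than each opening a new one, which matches the queueing behavior described above.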

Speaker 1: 00:59:17 Sure. Okay. Alright, well, thank you for answering my questions. Well, I think that’s all of the questions that we have. Yeah, this is really interesting and I really appreciate you taking the time to put this presentation together and to talk to the community. So thank you so much, Deepti, for joining us today.

Speaker 2: 00:59:38 Thank you for having me.

Speaker 1: 00:59:40 Yeah, well, you can see Deepthi as part of our DoK panel that we’re putting on at KubeCon, I believe on the first day of KubeCon, along with Gabriele Bartolini. Melissa Logan, the director of DoK, will be there as well, and it’s going to be talking about the future of DBAs on Kubernetes, so it should be a really interesting panel. So yeah, hope to see you all there. And of course we’ll post this video on YouTube; actually, it should be available as soon as we finish, but I’ll remind everybody to go watch that. So yeah, thank you again for joining us. Really appreciate it, and we’ll see you next time.

Speaker 2: 01:00:19 Thank you.

Speaker 1: 01:00:20 All right.

Data on Kubernetes Community resources