The Kubernetes community has made major strides in supporting stateful workloads, but it’s just the beginning of how Kubernetes could be leveraged as a standard to revolutionize data: to make data declarative, just like Kubernetes. The Data on Kubernetes Community (DoKC) is an openly governed community of practitioners who share an interest in making this a reality.

In this talk, Logan will provide an update on the DoK Community and share results from the first-ever Data on Kubernetes survey.

This talk was given by Melissa Logan as part of DoK Day at KubeCon NA 2021.

Melissa Logan

Welcome to Data on Kubernetes Day at KubeCon North America 2021. We’re glad to have you with us as we journey through the data on Kubernetes landscape today. We have over 20 talks that’ll walk us through the high level, including case studies with end users like DreamWorks Animation, Zalando, Sourcegraph, Macquarie Bank, and 99 Group, as well as the practical how-tos and tutorials to help you start your journey of running data on Kubernetes.

My name is Melissa Logan, and I’m the director of the Data on Kubernetes Community. I’ve been working in tech for over 20 years and in open source primarily for the past decade. In that time, I’ve had a front-row seat to the rise of several new paradigms that have shaped our industry, starting with the dot-com era in the 2000s, then Big Data, virtualization, cloud, containers, and open source. In recent years, we’ve seen a lot of energy around Kubernetes as it matures into its position as the de facto standard container orchestrator. This has been especially pronounced in the past year due to the pandemic, which has accelerated IT transformations everywhere and has fueled more open source software adoption overall.

We all know Kubernetes for its ability to run stateless workloads; that’s what it was designed to do originally. Until persistent volumes came around, there wasn’t a reason to even think about running stateful workloads on Kubernetes. When PVs made it possible and the operator pattern emerged, the organizations that had seen efficiencies running stateless workloads started doing the same with stateful workloads. But it can be really challenging to run data on Kubernetes. We have early patterns for success, but they haven’t been hardened into code or standards. Today, I’m here to share data from the first-ever Data on Kubernetes survey, which tells us how people are running data on Kubernetes today, the challenges and benefits, and what the future may hold.

Before we dive into the survey, for those of you who are new to us, we’re glad you’re here. DoKC is a new community just over a year old. It’s a group of practitioners who are sharing techniques and best practices for running data on Kubernetes. We’ve held 100 meetups, and we have a growing community of thousands of people on Slack. The first DoK Day was actually held earlier this year at KubeCon in Europe. We had a few hundred people register in the spring, and for this, our second event, over 3,000 people have registered to be with us today. We’re seeing growing interest in this topic across all vectors.

The community was originally stewarded by MayaData and later by DataStax, and the intention was always to bring more people together to solve these challenges. To do that, we recently announced a new governance structure, and over 20 sponsors joined us to focus on that mission. These sponsors believe the industry needs a place to collaborate to solve the challenges of working with data on Kubernetes. The addition of sponsors allowed us to field the survey we’ll talk about today, and it’s helping us facilitate an industry conversation among our sponsors and the community at large. We want to say a special thank you to our platinum sponsors DataStax, EnterpriseDB, MayaData, and Portworx by Pure Storage for enabling us to be here with you today, as well as our gold sponsors Percona and Red Hat, and our silver sponsors.

The DoK community is in the process of forming working groups to help us better define the data on Kubernetes use cases and common challenges as a means of developing our collective roadmap for DoK. This also includes the formation of an end user working group to shed light on these questions. We talked to a number of end users both inside and outside our community today to understand the challenges they face running data on Kubernetes. And in the community, we’re able to bring these people together to collectively learn and grow.

The bottom line is that we’re just getting started, and if you’re running data on Kubernetes, or want to learn, it’s a great time to get involved, so please reach out to us.

Digging into the survey, here you see our key findings. First, we learned that stateful workloads are pervasive on Kubernetes, and that the most advanced users are seeing massive productivity gains. Standardization is a key driver for people to move all workloads to Kubernetes, but it’s not without challenges. We’ll spotlight some of the top issues people struggle with when running data on Kubernetes. And of course, looking ahead, people want to see further standardization in and around Kubernetes. Toward the end of this talk, you’ll see a key factor that may be driving this and what people would like to see to make running data on Kubernetes easier in the long run.

The survey was fielded by a research firm named ClearPath Strategies. We surveyed 502 respondents, only from companies that are using or evaluating Kubernetes. They came from a broad mix of roles and geographies. For our targeted demographic, we focused on practitioners, who made up about 35%, with managers at about 20% and executives at about 45%. This was an online panel survey that was internationally sourced, with 50% in North America, 20% in Europe, and 30% in Australia, New Zealand, and Asia. For company size, we spoke with a mix of folks. The largest segment was enterprise at 60% or more (those are the folks that have 5,000 or more employees), then mid-market at 30%, and small business at 10%. Almost half came from technology organizations, which we define as those making software, hardware, services, or a mix. Then 12% came from financial services, 8% from manufacturing and heavy industry, and 6% from telco. This gave us a view of data on Kubernetes across a diversity of company sizes and sectors. Finally, we wanted to understand organizations’ IT infrastructure hosting strategy. As you can see here, this is a mix of environments, which should be no surprise. I think it affirms for all of us that orgs operate in a very hybrid and multi-cloud world.
As we dug into our survey, before we asked specifically about stateful workloads, we wanted to generally understand how respondents are using Kubernetes today and the benefits that they see. Half of the respondents are running 50% or more of their overall production workloads on Kubernetes. And a majority are 50% or more productive, which is a pretty spectacular result. 69% said they are highly satisfied. What this tells us is that Kubernetes has become a core part of IT: respondents are highly satisfied, more productive, and they also intend to grow their footprint. That data is in the report, if you want to download it.

We also took a look at what we call Kubernetes leaders throughout the survey. This is a cohort of people who have 75% or more of their workloads on Kubernetes, a huge amount. When we look at this cohort, the Kubernetes leaders show even greater levels of productivity, with most being two times or more productive. This is the long-term promise of Kubernetes being realized in a big way. It’s also notable that this adoption is fairly recent, with over half adopting in just the past 12 months. The pandemic has fueled these fast transformations in a big way.

In this next section, we sharpen our focus to look at data on Kubernetes rather than overall Kubernetes usage. When we start looking specifically at data on Kubernetes, it’s clear that enterprises are confident that Kubernetes is ready to run their organization’s stateful workloads in production: 90% believe this is true and 70% are already doing so. This gives us an idea of what they’re running today. Databases take the top spot, with persistent storage, streaming, messaging, and backup and archival storage all tying for the second spot here. But overall, as you can see, it’s pretty distributed. And they intend to grow this footprint: over half intend to increase the volume of stateful workloads by 30% or more in the next 12 months.

Looking again at the cohort of Kubernetes leaders, databases are still in the number one spot and become even more important to this group, jumping 11%. Backup and archival jumps 15% to the number two spot, and we see AI and ML jump up a couple of spots from the bottom compared with all respondents. And then you can see on the bottom half that the need for persistent storage and analytics seems to become less of a concern for this group. You see those both drop down to the bottom of this list.

We wanted to know what was driving the decision to run stateful workloads on Kubernetes, so we asked people to identify the three most important factors. Consistency and standardization top the list here, followed by a number of operational efficiency gains such as simplifying management and enabling developers to self-manage. This is even further underscored when we look at the Kubernetes leaders. The ability to standardize on Kubernetes is very important to them, jumping 10% to the number one spot. The more workloads an organization runs on Kubernetes, the more it can capitalize on this standardization advantage.

In this next section, we talk about the challenges of running data on Kubernetes. There are a lot of benefits to be gained, as we saw previously, but it’s a pretty new pattern, and people do run into a range of different challenges. The primary challenge they run into is a lack of integration. It’s in the number one spot here. But you can see there isn’t much of a gap between that and the next few items on the list, including a lack of interoperability with tools and stack, vendor solutions solving niche needs, and a lack of qualified talent. These all become really important challenges to solve for those running data on Kubernetes.

The Kubernetes leaders face a different set of challenges, with a four-way tie for first place: vendor solutions solving niche needs, little or no vendor solutions existing, too much time and effort to manage, and a lack of qualified talent. So there’s some crossover with all respondents, as you can see, but a different ordering and level of concern for Kubernetes leaders here. The talent gap was the most drastic difference when compared with all respondents: it jumped 11%.

Next, we wanted to understand how Kubernetes operators were being used by people. The operator pattern came out in 2016 and extended Kubernetes’ use to stateful workloads. In subsequent years, many technologies created operators to act as the translation layer between the technology and Kubernetes. And today, as you can see, they’re being used to manage a wide range of apps. But it’s still a relatively new pattern and comes with its own set of challenges. Besides just trying to understand how they were being used, we wanted to understand those challenges. We asked people to identify what they run into with Kubernetes operators and again, we see interoperability as a top concern here. The majority of respondents experienced difficulty maintaining interoperability with other operators as they manage them across their portfolio. With databases being the number one workload, you can imagine how many databases an org uses, and they may have an operator for each one. The number two challenge they run into is that operator quality is lacking. In fact, the number one reason people cite for not running data on Kubernetes is a lack of quality operators. And number three, they list that there are no standards. As a consequence of all of this, you can see on the right that 61% of people are creating their own operators professionally.

Finally, in our survey, we wanted to understand what would make people’s lives easier for running data on Kubernetes. We showed people a series of statements and asked which they agree with more, even if they agree with both a little bit. We asked about the concept of declarative data: an ability to declare an outcome for your data and have Kubernetes do the work, just like it does for your app or microservice. The majority believe that data should become declarative, just like Kubernetes. As we saw in the last section, our data suggests that a lack of standards makes data on Kubernetes more challenging to implement. So next we asked whether the creation of standards for data would simplify management and automation or make it more complex. A two-to-one majority believes that it would help simplify.
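
Editor’s illustration (not part of the talk): the declarative idea described above, declaring an outcome and letting the system work out the steps, can be sketched roughly as a comparison between a desired state and an observed state. The names `DataState` and `plan_actions` are hypothetical and invented for this sketch; they are not Kubernetes APIs, and a real controller would do considerably more.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataState:
    """Hypothetical description of a data service (not a real Kubernetes type)."""
    replicas: int
    backup_enabled: bool


def plan_actions(desired: DataState, observed: DataState) -> List[str]:
    """Return the steps needed to move the observed state toward the desired one."""
    actions: List[str] = []
    if observed.replicas < desired.replicas:
        actions.append(f"add {desired.replicas - observed.replicas} replica(s)")
    elif observed.replicas > desired.replicas:
        actions.append(f"remove {observed.replicas - desired.replicas} replica(s)")
    if observed.backup_enabled != desired.backup_enabled:
        actions.append("enable backups" if desired.backup_enabled else "disable backups")
    return actions


# The user declares only the outcome; the steps are derived from the difference.
desired = DataState(replicas=3, backup_enabled=True)
observed = DataState(replicas=1, backup_enabled=False)
print(plan_actions(desired, observed))
```

When the observed state already matches the declared one, the plan is empty, which is the property that lets such a comparison run repeatedly without side effects.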

Finally, we wanted to understand the market conditions that may be accelerating the demand for running data on Kubernetes. We asked companies how important real-time data is to gaining competitive advantage. The majority here agree that how they leverage it will be important to gaining that competitive advantage.

Those were the bulk of our key findings from the first survey of the data on Kubernetes landscape. Data on Kubernetes is pervasive and beneficial, but the lack of interoperability and integration is a struggle for organizations today. In the future, this may be remedied by industry standards, exemplified by declarative data or other similar concepts, and many people are working to solve this problem today. We see our work at the Data on Kubernetes Community growing to encompass the world of data technologies, data infrastructure, and data governance, and it requires contributions from everyone. We’re excited about the future of data on Kubernetes and hope you’ll join us in this journey.

The report has some additional data we didn’t talk about today. You can download it at https://dok.community, and we welcome your feedback so we can sharpen next year’s survey. Please get in touch with us on Slack or email and let us know what you think. And if you are running data on Kubernetes or are looking to do so, we hope you’ll join our growing community and share your perspectives as we build the future of data on Kubernetes together. Thanks again to all of our sponsors for making DoK Day possible, and thank you all for being here with us today. Bart, I’ll turn it back over to you.