We hold live meetups every week where different guests share their stories, wisdom, and practical advice on how to overcome common issues. All meetups are recorded and published on our YouTube and podcast channels.


5:00pm UK/9:00am PT
1-Is Kubernetes even ready for data?
Kubernetes has been a great solution for deploying application infrastructure. Trying to manage your data with the same control plane has been less than ideal. This has been even more true when using distributed databases like Apache Cassandra. Once you get past storage and StatefulSets, you still have a lot to do. Let’s have a frank talk about the new opportunities to make Kubernetes ready for data.
Patrick McFadin
5:00pm UK/9:00am PT
2-Data on k8s maturity check
Let’s talk about storage. Optoro has moved to running stateful data stores on Kubernetes. It’s a challenge, but it brings a lot of value. Let’s talk about how we chose to do it and what we figured out along the way.
Zach Dunn
5:00pm UK/9:00am PT
3-Design considerations for operationalizing Distributed SQL on Kubernetes
This talk is targeted at cloud-native developers and architects looking to deploy operational databases on Kubernetes. We will walk you through the design decisions YugabyteDB's team made when architecting the database as a service on Kubernetes. We will cover concepts related to Kubernetes volume provisioning, pod placement strategies for data resilience and high availability, and how cluster events are used to reconcile k8s workloads during day-2 operations such as upgrades and scaling up or down.
Nikhil Chandrappa
5:00pm UK/9:00am PT
4-The problem of stateful workloads - balance of keeping data HA vs. costs
In an engineer’s ideal world, we would have all the resources and redundancy we could possibly want for our services and the infrastructure that supports them, for our sanity and, of course, for HA. However, how do you balance "enough" redundancy against the actual operational cost of supporting such engineering choices, and what are some of the tough engineering decisions that need to be made? This talk focuses primarily on services run on Kubernetes (or public cloud Kubernetes offerings), but the principles can be extended to any infrastructure environment. Key topics: capacity planning, cost management, distributed services.
Ren Lee
5:00pm UK/9:00am PT
5-The full cycle of doing data on k8s: a case study
Scaling ACID-compliant databases in the cloud is challenging. We’ll look at a specific use case where we’re trying to scale a SaaS Odoo ERP offering on Kubernetes and build a scalable Postgres cluster as a backend service.
Dave Cook
5:00pm UK/9:00am PT
6-Operators, operators, operators…operators
Operators represent a great opportunity for the data community to solve the complexities of managing data products for their customers in a way that standardizes UX and integration points. Historically, the most powerful solutions had to be niche and highly customized.
Amit Gupta
1:00am UK/5:00pm PT
7-Conway’s Law & Kubernetes: Centralization vs. small team autonomy
Big clusters or small clusters? Where do you draw the line, and how do you know what's best for your use case? We will be talking to Joseph and Mike from Adobe about the inevitable questions that arise when running k8s at scale. If it is run by the platform team, is it inevitably a pet? Or more of a pet? Is the idea that we give anything that “must not fail” to platform teams so it becomes a common service with SLAs? How is it decided what is owned by the platform team vs. the individual teams? While talking with Joseph and Mike, we will also dive into what their stack looks like, the must-have tools they use daily, VMs vs. K8s, differences in running stateful apps on k8s, and war stories!
Joseph Sandoval, Mike Tougeron
5:00pm UK/9:00am PT
8-Appropriate workloads for databases in K8s
As more companies move to Kubernetes and cloud native as the standard for developing net-new functionality, something has to happen to their legacy workloads. We often see a lift-and-shift mentality when moving into Kubernetes; we will talk about how that mentality can be dangerous or cause more work than expected.
Rick Vasquez
5:00pm UK/9:00am PT
9-Geospatial Sensor Networks and Partitioning Data
We use resources like weather reports or air quality measurements to navigate the world. These resources become especially important when we face extreme events like the current wildfires in the western USA. The data for the reports, predictions, and maps all originates in real-time sensor networks. In this talk, I’ll present some of my research into scientific data representation on the Web and show how its key mechanism is the partitioning, annotation, and naming of data representations. We’ll look at a few examples, including some recent work on air quality data relating to the current wildfires in the western USA. We’ll explore the central question of how geospatial sensor network data can be collected and consumed within K8s deployments.
Alex Miłowski
5:00pm UK/9:00am PT
10-Data on Kubernetes and container attached storage - an update
Back in 2018, the CNCF published a blog post we wrote called Container Attached Storage. Today, September 22nd 2020, a new blog post updating Container Attached Storage is appearing on their site. This talk borrows very heavily from that post. What is CAS? Why would anyone use Kubernetes itself for storage? How does a microservices architecture help? Why is shared storage at the end of the road, even though it is still sometimes used underneath CAS?
Evan Powell
5:00pm UK/9:00am PT
11-Doing Data Wrong
In this talk, we'll look at great ways to lose data (like running databases on Kubernetes and bare metal), pain points for developers, and lessons we've learned, and we'll hold a Festivus-in-September airing of grievances for those who have felt this pain.
Jeremy Tanner, David McKay
5:00pm UK/9:00am PT
12-PostgreSQL-as-a-Service on K8s at Zalando
PostgreSQL is a powerful, open-source object-relational database system with over 30 years of active development that has earned it a strong reputation for reliability, feature robustness, and performance, but a production-grade deployment requires many technologies complementary to the database core: high availability and automated failover, backup and recovery, monitoring and alerting, centralized access control and logging, connection pooling, and so on. Although Kubernetes was not initially designed for stateful workloads, its infrastructure-as-code paradigm, CustomResourceDefinitions, and the Operator pattern turned out to be extremely convenient for deploying and running PostgreSQL at scale. I will talk about a few open-source projects developed and maintained by the database team at Zalando that anybody can use to build their own PgaaS:
1. https://github.com/zalando/patroni - A tool for PostgreSQL high availability and cluster management. It integrates with the K8s API and makes PostgreSQL cloud-native.
2. https://github.com/zalando/spilo - The Docker image that packages Patroni, multiple versions of PostgreSQL, and tools for backup and recovery.
3. https://github.com/zalando/postgres-operator - Implements the Kubernetes operator pattern and orchestrates hundreds to thousands of deployments of Patroni/Spilo clusters.
The aforementioned projects would never have reached their current state without the efforts of dozens of external contributors.
Alexander Kukushkin
5:00pm UK/9:00am PT
13-Distributed Workloads on Kubernetes: Operators to the Rescue
How easily can you run distributed workloads on Kubernetes? The initial deployment of your 10-node database might be easy to set up, but day-2 operations (changing the configuration, adding and removing nodes, version upgrades, etc.) are much more complicated. We'll discuss how operators can help you manage distributed workloads, and share a few operator tricks we learned while working on ECK (Elastic Cloud on Kubernetes), an operator for the Elastic Stack.
Sebastien Guilloux
5:00pm UK/9:00am PT
14-Kubernetes Cost Control
Why cost control matters when working with the cloud: K8s, data, and cost control, with hints and tips for controlling your K8s costs.
Arie van den Bos
4:00pm UK/9:00am PT
15-Reaching limits in K8s: A case study with Ingress Controller
When talking about data, we usually think about big data and scale, and what we do next. Reaching such limits is sometimes a good problem to have. In this talk, we'll discuss how we approached this situation with the Ingress Controller.
Laurent Rouquette
4:00pm UK/8:00am PT
16-HyperStore-C: S3 object storage managed by Kubernetes
Cloudian’s HyperStore is S3-compatible object storage software focused on the enterprise market. In this talk, I'll discuss how and why we are working on Kubernetes-managed versions of HyperStore, including where we are now and what we're looking at next.
Gary Ogasawara
4:00pm UK/8:00am PT
17-Is k8s Even Ready For Data? Round II
Data on Kubernetes Community: Is K8s even ready for data? Round II: Cassandra on OpenEBS, aka CaSS on CAS. In our inaugural DoKC meetup, Patrick McFadin, Developer Advocate at DataStax, emphasized the challenges of running Cassandra on Kubernetes, concluding at one point that “Kubernetes might not be ready for Cassandra.” Since that meeting, the use of the open-source Container Attached Storage project OpenEBS as simple, high-performance, per-workload storage for Cassandra has proliferated. The Cassandra Operator from DataStax, aka “CaSS”, has also progressed. So where are we now? Is CaSS on CAS working well? What is the future of collaboration between DataStax/Cassandra and MayaData/OpenEBS? Is Kubernetes now ready for Cassandra? What are the emerging technologies that might shape storage and Kubernetes in the near future?
Jeffry Molanus, Patrick McFadin
9:00am UK/1:00am PT
18-DoK Panel: The State of State
Stateful vs. stateless? We will stately be stating our statutes regarding the status of the state of statefulness and statelessness on k8s. Oh yeah! In the DoK Community, one of the main issues folks have is how in the world they can flatten the learning curve when it comes to running stateful applications on k8s. That's why we've brought on 3 experts from 3 different countries to tell us what state state (intentionally doubled) is in! A fireside chat on state and Kubernetes. Event page: https://www.meetup.com/Data-on-Kubernetes-community/events/274551382/
Rosemary Wang, Lili Cosic, Tomasz Cholewa, Jacquie Grindrod
5:00pm UK/9:00am PT
19-Towards a K8s Native Streaming Application
Starting from a simple application that can be deployed on any machine running Docker, we will go through all the steps required to transform it into a Kubernetes-native streaming application. We will explain the theory and then illustrate the concepts learned to define a recipe for running streaming applications on Kubernetes. We will focus on both cultural and technical tricks to help you successfully adopt streaming applications at scale. By the end of the talk, you will have a comprehensive view of all the platform building blocks and application requirements needed to successfully run a streaming application on Kubernetes. Spoiler: you will hear the words Apache Kafka, Kafka Streams, and Strimzi several times.
Francesco Nobilia, Jeremy Frenay
5:00pm UK/9:00am PT
20-Tips and tricks to get Kubernetes certifications
CKA (Certified Kubernetes Administrator) has a bad reputation as the hardest certification many people have faced. In this talk, we will go through the process of successfully passing the exam, tips on the exam itself and its environment, and any other questions that might arise. How to fly into a Kubernetes certification.
Eneko Perez, Carlos Gomez Carrero
5:00pm UK/9:00am PT
21-Data on Kubernetes: my insights
Data handling is one of the hardest things in Kubernetes. This talk will be an informal conversation about things (related to data management) that Eduard found while helping customers embrace Kubernetes. I hope you find them useful!
Eduard Tomàs
5:00pm UK/9:00am PT
22-Vitess Operator for Kubernetes
In this talk, I would like to introduce our newly announced Vitess Operator for Kubernetes. The talk demonstrates a sample implementation of Vitess in a Kubernetes topology. I also explore common DBA tasks by demonstrating how they are handled in the Vitess ecosystem. Out of the box, Vitess comes with many tools and utilities that one would otherwise have to incorporate or develop to manage a MySQL topology. Let’s take a look at the capabilities of Vitess in these areas and demonstrate how these tasks are performed with the operator.
5:00pm UK/9:00am PT
23- 2021 DoK Community Kickoff! Trends, friends, and more!
For our 23rd installment of the Data on K8s community meetup, we will be talking with Ariel Munafo, a CNCF ambassador and the founder of EuropeClouds (among many other things); Arie van den Bos, Senior Systems Engineer on Cloud Systems at Kurago; and Jake Page, a DevOps and cloud-native enthusiast.
Ariel Munafo, Jake Page, Arie van den Bos
5:00pm UK/9:00am PT
24-The architecture of a distributed database
Cockroach Labs has built a database architected from the ground up to be distributed. It is a perfect fit for the cloud and Kubernetes, as it naturally scales and survives without manual intervention. The unique architecture of CockroachDB delivers some key innovations that may not only provide value for your applications but might also give you insight into the challenges and solutions in distributed systems. In this session, we will deliver a deep-dive exploration into the internals of the database, exploring the following, and more:
* How the database uses KV at the storage layer to effectively distribute data
* How Raft and MVCC are used to guarantee serializable isolation for transactions
* How Cockroach automates scale and guarantees an always-on, resilient database
* How to tie data to a location to help with performance and data privacy
Jim Walker
5:00pm UK/9:00am PT
25-Deconstructing Postgres into a Cloud Native Platform
Is deploying Postgres in Kubernetes just repackaging it into a container? Can’t Postgres leverage the wide range of cloud-native software and integrate well with K8s? Join this journey, which will cover and demonstrate, with demos running on StackGres:
* How to structure Postgres into an init-less container, plus several sidecar containers for connection pooling, backups, agents, etc.
* Defining high-level CRDs as the single API to interact with the Postgres operator.
* Using K8s RBAC for user authentication in a web UI management interface.
* Using Prometheus for monitoring, bundling the node, Postgres, and PgBouncer exporters together.
* Proxying Postgres traffic through Envoy, and terminating Postgres SSL with an Envoy plugin that also exports wire protocol metrics to Prometheus.
* Using Fluent Bit to capture Postgres logs and forward them to Fluentd, which stores them in a centralized Postgres database.
Alvaro Hernandez
5:00pm UK/9:00am PT
26- How to unblock your release pipelines with data
Even though microservices are becoming a common pattern, we still see a lot of "monolithic" deploys and manual, reactive actions. This prevents teams from achieving maximum release velocity. We can leverage data and smart use of traffic shaping to achieve higher release velocity AND quality.
Olaf Molenveld
3:00pm UK/7:00am PT
Nederkube Edition #1 - Is Kubernetes ready for Data Management?
Kubernetes has become the standard for microservices architectures. But what about handling massive, scalable data management on top of it? Is it possible, and what does it mean for operations? Cassandra has been widely adopted and globally accepted as the most scalable and reliable database. Now it adds ease of use by offering a Kubernetes-native, plug-and-play solution for enterprise use!
Michel de Ru, Arie van den Bos, Jeffry Molanus
10:00pm UK/2:00pm PT
DoK Brazil #1 - DevOps, Kubernetes and Data
My experience on this contemporary technology journey over the last 4 years: fears, mistakes, IT paradigms, and how agile methodologies have impacted my goals.
Rogeria Portilho Rodrigues
5:00am UK/9:00pm PT
27- Cost management for OpenShift, a new SaaS service to understand your Kubernetes costs
For IT decision-makers, the job goes above and beyond just keeping infrastructure running and efficient; it is about understanding how your IT budget affects your business and how well your resources maximize the use of that budget. This makes it critically important that IT teams can quickly and easily see the totality of their IT costs across the hybrid cloud. We’re pleased to introduce a new software-as-a-service (SaaS) offering intended to help our customers better understand the costs of their OpenShift environments: OpenShift cost management. Available free of charge as part of a Red Hat OpenShift Container Platform subscription, OpenShift cost management provides a simplified, more intuitive view of the costs of an OpenShift deployment, from the macro to the granular.
Sergio Ocón Cárdenas
5:00am UK/9:00pm PT
28- Getting Started Contributing to Kubernetes
This talk will walk through how to get started contributing to Kubernetes, combating impostor syndrome, the many ways you can contribute to K8s other than writing code, and the benefits of joining a community such as K8s.
Rin Oliver, Savitha Raghunathan
5:00pm UK/9:00am PT
#30 Kyverno for Kubernetes!
Kubernetes is powerful but can be complex to manage! In this talk, Jim Bugwadia from Nirmata will show how policy managers can help address the complexity via admission controls and dynamic configurations. Jim will introduce Kyverno, a Kubernetes native policy engine and CNCF sandbox project. Jim will then demonstrate how you can use Kyverno to ensure security and best practice compliance for your clusters.
Jim Bugwadia
7:00pm UK/11:00am PT
DoK Brazil #2: Let's understand cloud databases with the help of Wagner Bianchi! (Talk in Portuguese)
A relaxed conversation about the future of databases as a service. Data on Kubernetes from a DBA's point of view. And several other related topics.
Wagner Bianchi
5:00pm UK/9:00am PT
#31 The Data Lifecycle - Where Do We Go From Here
Going from raw data to machine learning models successfully, in companies of all sizes, requires more than just an understanding of programming. Teams need to manage the lifecycle of their data products, including their software as well as their data. Data products like machine learning models aren’t created out of thin air. They are built on layers of best practices that ensure the models use accurate data, output reliable numbers, and have some method of interacting with the outside world. So how do we get there? The purpose of this talk is to discuss the current state of the data lifecycle as it pertains to creating data products, such as machine learning models, dashboards, and data APIs. We will outline the general architecture that helps take data from raw form to some form of machine learning model. In addition, we will discuss some of the concepts being borrowed from DevOps, as well as those being created in MLOps, to better facilitate your data lifecycle.
Benjamin Rogojan
5:00pm UK/9:00am PT
#32 How to choose a Kubernetes distribution for on-prem environments?
Buy a ready off-the-shelf product, customize an existing open source project, or build your own distribution? When you can't go to the cloud and leverage its powerful features you have to make a choice. On-prem environments need more attention, but they also often can be more cost-effective and are highly coveted by the development and operations teams. In this talk, I will cover some of the most important topics related to building an on-prem Kubernetes platform and I will describe the most popular distributions.
Tomasz Cholewa
5:00pm UK/9:00am PT
#33 Making observability accessible is the fourth pillar
Observability systems are typically a collection of tools that cover the three pillars of logs, metrics, and tracing. These enable skilled engineers to correlate telemetry insights to perform data-driven diagnostics and rectify degraded services. In this talk, I discuss how, over the course of three years, I have worked towards removing the built-in gatekeeping that comes with creating monitoring solutions and enabling them to work for an entire organisation. We shine a light on the overlooked developer community that interacts with observability but does not necessarily hail from SRE disciplines. I share anecdotes from my past that illustrate the inherent barrier to success that comes with connecting multiple tools together, and the context required to achieve results. With years of experience working to improve adoption and create consumer-friendly facades for tools such as Grafana, Prometheus, and Jaeger, I draw upon my background within large financial institutions to show how building engaging and simplified DX can compel and excite engineers to work with observability.
Alex Jones
5:00pm UK/9:00am PT
#34 Opstrace: an open source alternative to services like Datadog, SignalFx, and others...
Open source observability should not be hard. What companies package as their enterprise offering should be available to anyone who wants to monitor their systems. Opstrace is a complete monitoring platform designed for the end user instead of the expert. Its goal is to be as easy to use and operate as a hosted SaaS provider, but within one's own cloud account. This is not only up to 10x more cost-efficient but also allows full control over one's data.
Sébastien Pahl
4:00pm UK/9:00am PT
#35 Make Kubernetes your development environment
Developers spend a lot of time making their local machine look like a cluster. But why do we do that? Our local machine is not where our code is supposed to run! We built okteto (github.com/okteto/okteto) so we can make our Kubernetes clusters look like our local machine. In this talk, we'll show you how okteto helps you take advantage of all the goodness of Kubernetes and the cloud without having to sacrifice a really fast development and feedback loop.
Ramiro Berrelleza
4:00pm UK/9:00am PT
St. Patrick's Day Special - A diplomatic answer to the meaning of data, Kubernetes, and everything
I will talk about my experiences entering the world of databases and data management after a very different life as a diplomat. I will introduce TerminusDB and its world-history origins. Finally, I will situate the project and its roadmap from a k8s perspective.
Luke Feeney
4:00pm UK/9:00am PT
#36 A Snapshot of DevOps
DevOps is like a camera. We focus on what's important, we capture the good times, we develop from the negatives, and if things don't work out, we take another shot. Many teams that establish working best practices for their tools improve their time to delivery and their ability to scale. However, the real challenges exist outside of tools and technology, and many teams today still have questions about DevOps. So join this session to learn the fundamentals of shaping a DevOps culture. We'll discuss key attributes around people, process, and technology, likening you and DevOps to pro photographers and cameras.
Tiffany Jachja
2:00pm UK/7:00am PT
My questions about Data on K8s
Kunal Kushwaha
4:00am UK/9:00pm PT
#37 Running Data Replication Pipelines on Kubernetes with Argo
Hundreds of data teams have migrated to the ELT pattern in recent years, leveraging SaaS tools like Stitch or Fivetran to reliably load data into their infrastructure. These SaaS offerings are outstanding and can accelerate your time to production significantly. However, many teams prefer to roll their own tools. One solution in these cases is to deploy singer.io taps and targets: Python scripts that perform data replication between arbitrary sources and destinations. The Singer specification is the foundation of the popular Stitch SaaS, and it is also leveraged by a number of independent consultants and data projects. Singer pipelines are highly modular. You can pipe any tap to any target to build a data pipeline that fits your needs, which makes them a good fit for containerized workflows. This talk walks through the workflow at a high level and provides some example code for getting up and running with shared templates. I also drill into the reasons for choosing the Argo approach over other orchestration tools like Airflow or Dagster, and the implications from a team perspective.
Stephen Bailey
5:00pm UK/10:00am PT
DoK en español #1 - Our learnings with Kubernetes
Our learnings from Kubernetes.
Aitor Artola, Isidro Nistal, Miriam González, Raquel López Ruiz
4:00pm UK/9:00am PT
#29 How Absa Developed Cloud Native Global Load Balancer for Kubernetes
Global load balancing solutions, commonly referred to as GSLB (Global Server Load Balancing), have typically been the domain of proprietary network software and hardware vendors, installed and managed by siloed network teams. k8gb is a completely open source, cloud-native global load balancing solution for Kubernetes. k8gb focuses on load balancing traffic across geographically dispersed Kubernetes clusters using multiple load balancing strategies to meet requirements such as region failover for high availability. Global load balancing for any Kubernetes Service can now be enabled and managed by any operations or development team in the same Kubernetes-native way as any other custom resource. The talk will cover both the technical and business aspects of creating k8gb, including its ongoing adoption within a very large organization.
Yury Tsarev
5:00pm UK/10:00am PT
DoK en español #2: Release the Krake! Bringing Energy into the Compute Loop
Cloud&Heat has always focused on providing energy-efficient data centers. Over the last 8 years, we have developed an innovative water-cooling technology for servers that converts waste heat into a valuable asset. By doing so, we have already greatly improved the energy efficiency of individual data centers. However, this isn’t enough. To maximize the efficiency of distributed data center infrastructures globally, this talk presents Krake. Krake is orchestration software for compute-intensive jobs. It improves the global cost and energy efficiency of infrastructures by balancing the load between data centers. Krake evaluates and selects the most efficient site to run jobs based on metrics such as energy availability, heat demand, and latency. It also reacts to changes in the system by migrating jobs. In other words, it ensures a job is run in the most energy- and/or cost-efficient way at any given time.
Juan A. Fraire
5:00pm UK/9:00am PT
#38 Patterns to create stateful applications on Kubernetes
In this talk, we will discuss the best patterns for creating stateful applications on top of Kubernetes. This will include application-layer caching, embeddable databases, and leveraging Kubernetes objects to store and sync state across multiple replicas.
Prashant Ghildiyal
5:00pm UK/9:00am PT
#39 A fireside chat with Jérôme Petazzoni
A fireside chat with Jérôme Petazzoni in which we will get to know him up close and personal, ask him about how his personal music projects influence his professional work, and answer questions from the audience.
Jérôme Petazzoni
5:00pm UK/9:00am PT
#40 Cloud-Native Chaos Engineering in Databases
Chaos Engineering is revolutionizing testing, and doing it the cloud-native way is the best approach in today's rapidly changing world, with its huge paradigm shift toward Kubernetes resiliency. Karthik S, one of the maintainers of LitmusChaos, will introduce how to carry out Chaos Engineering the cloud-native way. Further, he will touch upon how Chaos Engineering is carried out in cloud-native databases with LitmusChaos. He will also cover observability considerations for Chaos Engineering and the hooks Litmus provides for them.
Karthik Satchitanand
5:00pm UK/9:00am PT
#41 Designing Stateful Apps for the Cloud and Kubernetes
Almost all applications have some kind of state. Some data processing apps and databases have huge amounts of state. How do we navigate a cloud-based world of containers where stateless and functions-as-a-service are all the rage? As a long-time architect, designer, and developer of very stateful apps (databases and data processing apps), I’d like to take you on a journey through the modern cloud world and Kubernetes, offering helpful design patterns, considerations, tips, and a look at where things are going. How is Kubernetes shaking up stateful app design?
- What kind of state is there, and what are some important characteristics?
- Kubernetes, containers, and the stateless paradigm (pushing state into DBs)
- Where state lives and its persistence characteristics
- Stateless vs. serverless: why stateless is not really stateless, but server-less really is
- Improving on the stateless paradigm using the local state pattern
- Logs and event streaming for reasoning about state and failure recovery
- The case for local disks: ML, databases, etc.
- Kubernetes and PersistentVolumes/StatefulSets
- Leveraging Kubernetes PVs as a basis for building distributed data systems
- Mapping the solution space
Evan Chan
10:00pm UK/2:00pm PT
DoK Brazil #3: How CNCF Brasil can help us in our SRE, DevOps, or Dev careers
Talk in Portuguese
Paulo Alberto Simoes
5:00pm UK/9:00am PT
DoK en español #3: Storing Big Data on k8s: the challenge of getting the best performance
Experiences from the process of building a cloud-native startup, where one of the main battles is, and will continue to be, storage on Kubernetes.
Aitor Artola
5:00am UK/9:00pm PT
#42 Spark on Kubernetes is Now Generally Available: Why & How to Migrate to It
Apache Spark has been able to run natively on Kubernetes (instead of Hadoop YARN) since 2018, but only since Spark 3.1 (released in March 2021) has the integration been officially generally available and production-ready. What is the high-level architecture of Spark on Kubernetes, how does it compare to the alternatives, and what does the migration look like? These are some of the questions we will answer together. We will first introduce the core concepts, then go through the stories of customers who migrated, and then give you concrete technical tips to help you be successful with Spark on Kubernetes. If time permits, I may do a risky live demo. This will be a technical talk with very fresh content, and I hope you will like it. I plan to keep it short enough to make room for Q&A and improvisation based on your requests, so let me know if there's something specific you're interested in.
Jean-Yves Stephan
5:00pm UK/9:00am PT
#43 Kubecost: open source cost monitoring for Kubernetes
Measuring costs in Kubernetes environments is complex. Applications and their resource needs are often dynamic. Teams share resources without transparent prices attached to workloads, while organizations increasingly run resources on a range of machine types and even cloud providers. Kubecost provides an approach built on open source for ensuring consistent and accurate cost visibility across all your workloads. This discussion will cover practical examples of implementing cost monitoring and optimization, and of managing the data these efforts generate.
Webb Brown
5:00pm UK/9:00am PT
#44 DataOps
This talk will cover the various aspects of DataOps and why DataOps is important. It will also share some client experiences and show how a DataOps strategy is helping to address some of their challenges. Finally, it will cover DataOps implementations, tools, and technologies.
Vijay AB Kumar
5:00pm UK/9:00am PT
#45 K8s DX Chronicles: Evolution From CLI to GitOps & Cloud Native IDEs
In its 7 years of existence, Kubernetes has been the gravitational center of the Cloud Native landscape, elevating a pluggable system that contributed to the diversification of the entire ecosystem. Wider adoption of the tool diversified its end-user base, and a consistent DX for cluster interaction became essential for Kubernetes. The community channeled herculean efforts toward enhancing the developer experience by extending the cluster CLI and building portals and highly responsive UIs.
Katie Gamanji
5:00pm UK/9:00am PT
#46 Recovering and Porting Applications in the Fast-Paced DevOps World
Are you a Cloud Architect, DevOps Engineer, or SRE who is developing cloud-native applications, managing complex app migration projects, or in need of infrastructure resiliency? Cloud-native applications present extraordinary performance, scale, and compliance challenges in hybrid and multi-cloud environments that legacy tools simply cannot support. In this session and demo, we’ll take you through a case study of a large aerospace and defense company that is managing and migrating Kubernetes applications and databases in a multi-cloud environment. You’ll also learn how to handle common cloud-native development challenges, such as recovering from accidental namespace deletions during test/dev or migrating your application to another cloud for scale and performance testing.
Prashanto Kochavara
1:00pm UK/5:00am PT
DoK #47 FullStack OpenSource Observability using SigNoz
In this talk, we will dive deep into the latest open-source tools, such as Prometheus and Jaeger, and into our journey of using them and ultimately building our own open-source observability tool, SigNoz. We will discuss:
- What is observability? The 3 pillars of observability: metrics, traces, and logs
- How is monitoring different from observability?
- What are the hard things about Prometheus?
- Why did distributed tracing become so important?
- Running both Prometheus and Jaeger to get metrics + traces: how complex can it get?
- Pros and cons of using SaaS vs. OSS solutions: why self-host in the 21st century?
- Why did we build SigNoz?
- What is OpenTelemetry, and how do you instrument a sample app using OpenTelemetry?
- A demo of SigNoz to get detailed insights into your applications
Ankit Nayan
2:30pm UK/6:30am PT
DoK in Hindi #1: First Steps into the Data on Kubernetes Community!
What is Kubernetes? Where should you start? How do you become part of the community? Do these questions come to your mind too? Join us at this meetup, where we'll talk about everything Data on K8s (in Hindi)! On May 3rd, we'll discuss how you can become part of the community, what the CNCF is, what an SRE does, and much more! But that's not all! Take part in the quiz at the end of the meetup, where you can win some special swag from DoK!
Kunal Kushwaha
5:00pm UK/9:00am PT
DoK #48 Airflow vs Argo - Battle Royale
We will compare Airflow (the established player) with Argo Workflows (the new kid on the block) and see how they measure up: what you would use each for, why you would choose one over the other, and which would win a battle for data workflow management supremacy.
Tim van de Keer
5:00pm UK/9:00am PT
DoK #49 Deployments vs StatefulSets vs DaemonSets
Kubernetes provides different resources for deploying applications. We will look at each of them, the differences between them, and how we can persist data using each one.
Ali Kahoot
5:00pm UK/9:00am PT
DoK #50 - Going Full Circle with Kafka
Tecton is building a data platform for machine learning. This talk shares some of the adventures and lessons learned while introducing Kafka into our data pipelines.
Ravi Trivedi
5:00pm UK/9:00am PT
DoK #51 Promscale: Using Prometheus + Promscale + PostgreSQL to go from Observation to Understanding
Often when I talk about putting observability data into PostgreSQL, people ask me: are you crazy? And yet this somewhat heretical view has the potential to unlock much of the power and promise of observability. Thanks to TimescaleDB (an extension to PostgreSQL), storing time-series metric data inside a relational database is now efficient, fast, and scalable. This is due to its unique partitioning, compression, and horizontal-scalability features. But even if this is possible, why would you do it? The answer lies in the power of a flexible data model, joins, and SQL (which Promscale supports in addition to PromQL). A flexible data model allows you to combine metric data with various other data, from machine information such as the number of cores and memory to location information using GPS coordinates. This allows you to enrich your metrics with supplemental information using joins and to perform much more sophisticated analysis using SQL for capacity analysis, BI, and more. A flexible data model brings us to our second heretical idea: combining multiple modalities of observation in a single database. Combining metrics, logs, traces, event data, etc. in one DB has two major advantages. The first is an analytical advantage similar to the one described above: the ability to join and cross-correlate various types of signals. The second is operational simplicity. As we all know, databases are the hardest things in our infrastructure to maintain and operationalize because of that pesky thing called state. So why maintain multiple different types of database systems if you could maintain just one? While these ideas about observability data on Kubernetes may seem unusual and counter-intuitive, I hope they will generate interest and start a good conversation.
Matvey Arye
3:30pm UK/7:30am PT
Introduction to Docker: a meetup with DoK + CNCF Students Group
In this event we’ll introduce Docker and containers. You’ll learn how Docker works, and how to create and run Docker images. Since this is a Data on Kubernetes event, we’ll also look at how to attach Docker containers to persistent storage. This will set us up for future talks in this series, where we’ll talk about running applications on Kubernetes, including databases. Jeff will also share a few thoughts on being adaptable in your career path and how to stay current.
Jeffrey Carpenter
5:00pm UK/9:00am PT
DoK #52 Enterprise-grade Kubernetes requirements
We'll discuss the best practices companies are adopting for enterprise-grade Kubernetes management.
Haseeb Budhani
5:00pm UK/9:00am PT
DoK #53 Day Zero - Azure Kubernetes Service
Are you new to Azure Kubernetes Service and just want to see how the nuts and bolts come together? This is the talk for you: a single slide and an end-to-end demo of how to run your first container on AKS.
Raj Balakrishnan
5:00pm UK/9:00am PT
DoK #54 Putting Chaos into Continuous Delivery - How to increase the resiliency of your applications
Continuous Delivery practices have evolved significantly with the cloud-native paradigm. GitOps and Chaos Engineering are at the forefront of this new CD approach, with an increasingly common pattern of Git-backed pipeline definitions that implement “chaos stages” in pre-prod environments to gauge service-level objective (SLO) compliance. In this talk, Juergen Etzlstorfer (maintainer of the Keptn CNCF project) will discuss how you can construct pipelines that include chaos experimentation (using LitmusChaos) while simulating real-world load, and implement quality gates (based on SLOs) to ensure that only resilient applications are deployed into production. He will also demonstrate how you can add chaos tests to your existing CD pipelines without needing to rewrite them.
Jürgen Etzlstorfer
5:00pm UK/9:00am PT
DoK #55 How to optimise operations and life cycle management for containers?
Modern applications are built to run on containerized infrastructure, and businesses are also migrating their existing apps from traditional to container deployments. In this scenario, gaining end-to-end visibility of the complete Kubernetes container environment is an important challenge for IT operators and administrators. In this talk, we will cover the following:
- New-age business complexities
- How applications are moving from monolithic to microservice architectures
- Operational challenges in monitoring container architectures
- Strategies to efficiently manage the life cycle of containers
Rajalakshmi Srinivasan
5:00pm UK/9:00am PT
DoK #56 It's just a SQL - Crash course on Synapse Serverless for T-SQL ninjas!
Are you a seasoned T-SQL developer, used to solving every challenge by writing plain old SQL? But now you need to leverage data coming from semi-structured or unstructured sources? What if I told you that you can accomplish your mission by writing your favorite T-SQL syntax? In this session, you will learn what a serverless SQL pool in Azure Synapse Analytics is, how it works behind the scenes, and how you can preserve your "T-SQL Ninja" status even when dealing with data coming from CSV and Parquet files, or from a NoSQL database.
Nikola Ilic
5:00pm UK/9:00am PT
DoK #57 Key Criteria for Evaluating Kubernetes Data Storage
Enterprises of all sizes are embracing hybrid cloud strategies that are ever more complex and structured, moving quickly from a first adoption phase, where data and applications are distributed manually and statically across different on-premises and cloud environments, to a new paradigm in which data and application mobility is the key to flexibility and agility. Now organizations want the freedom to choose where applications and data should run dynamically, depending on any number of business, technical, and financial factors. Kubernetes is instrumental in executing this vision, but it needs the right integration with infrastructure layers—such as storage—to make it happen.
Enrico Signoretti
5:00pm UK/9:00am PT
DoK #58 Benchmarking for PostgreSQL workloads in Kubernetes
“Databases like PostgreSQL cannot run on Kubernetes.” That’s the refrain we hear all the time, and it is our motivation to break this barrier once and for all. Hear the story of our journey so far in bringing PostgreSQL to Kubernetes. Discover why we believe that benchmarking both the storage and the database before production leads to a healthier and longer-lasting experience with the DBMS, even in Kubernetes. We’ll share our process and the results obtained so far, and unveil our plans for the future.
Gabriele Bartolini, Francesco Canovai
5:00pm UK/9:00am PT
DoK #59 Let's get Real: SRE | Do we need it?
More and more companies around the world are adopting SRE. Despite Google's great series of books on SRE, there is no default implementation of SRE. Join me as I explain this, using my home country as an example =).
Benoit Schipper
5:00pm UK/9:00am PT
Postgres on Kubernetes Hands-On-Lab
From 0 to 60/100 (mph or km/h, depending on where you live) in just 2 hours! It may sound "slow" if you're talking about cars, but when you're talking about databases in general, and Postgres in particular, it isn't! Starting from an empty Kubernetes cluster, you will leave the session with one or more Postgres clusters created, all with: high availability and automatic failover; automated backups with lifecycle management; distributed logs with lifecycle management; a Web Console to manage it all; connection pooling; tuned Postgres and connection pool configurations; and any number of installed Postgres extensions. All this in just a single session! BYOK (Bring Your Own Kubernetes): come to the session with a Kubernetes cluster, ready to create YAMLs and deploy via GitOps, and to do the same using the Web Console if you are a point-and-click lover. And learn how to automate Postgres Day 2 operations! This session is a tutorial on production-quality Postgres clusters based on the open source StackGres.io platform. Go from Zero to Postgres Hero in just one Hands-on-Lab! Courtesy of AWS, $50 AWS vouchers will be provided to the first 10 attendees who register and share the event on both Twitter and LinkedIn, so you can bring an AWS EKS cluster!
Alvaro Hernández
3:30pm UK/7:30am PT
DoK #60 Intro to Kubernetes
In this event, we will introduce Kubernetes, containers, and the cloud native initiative. You will get an overview of the benefits of containers running on Kubernetes and the new mindset they require, a mindset driven by the cultural change that the cloud native initiative is promoting. Concepts related to microservices and automation will be covered, giving an overview of the different kinds of open tools you can find in the cloud native ecosystem to build and run modern applications in the cloud. Sections:
1. Introduction to Kubernetes & Cloud Native
2. Docker & containers; microservices
3. Kubernetes: the container orchestrator
4. Cloud Native with Kubernetes: modern applications
5. Cloud Native tool landscape
Aitor Artola
5:00pm UK/9:00am PT
DoK #61 Perfecting Machine Learning Workloads on Kubernetes
More and more applications are powered by Machine Learning (ML) models. While the gap between software engineers and a production environment on Kubernetes is already big, the gap between data scientists and that same production environment is enormous. In this talk, we will provide you with a framework for translating ML requirements into infrastructural requirements and concrete Kubernetes resources. In the first half of this talk, we will discuss how ML applications differ from most other applications, how ML workloads are structured, and how ML requirements translate into Kubernetes resource configurations. In the second half, we will put this theory into practice with a live demonstration of an ML Deployment on Kubernetes using Istio, Knative, and Kubeflow Serving.
Lars Suanet
4:00pm UK/8:00am PT
DoK #62 Easy Kubernetes Volumes using Longhorn
Longhorn is a lightweight, reliable, and powerful distributed block storage system for Kubernetes. It is an open source tool that can be installed on any Kubernetes cluster. It offers features such as incremental snapshots and backups to NFS or S3-compatible object storage. In this talk, you will learn about Longhorn and its features, including backup and recovery, and how to get the most out of it for your persistent Kubernetes volumes. You will also be shown its UI, which makes the features much easier to understand.
Saiyam Pathak
6:00pm UK/10:00am PT
DoK #63 Stranger Danger - Kubernetes Edition
Kubernetes is a powerful set of abstractions, but its flexibility and configurability mean it's pretty insecure by default. In this hands-on talk, I'll show how an attacker can expand the blast radius of an exploit from a vulnerable web application in a container to owning the entire cluster. I'll also cover some ways you can prevent this from happening to you!
Matt Jarvis
5:00pm UK/9:00am PT
DoK #65 Using Kubernetes and ClickHouse to enable high performance app analytics
Embedded analytics are a major source of value to application users. Virtually every SaaS offering has them or is adding them now. This talk shows how to build low-latency analytic applications on Kubernetes with ClickHouse, a popular, open source data warehouse. We'll start with the ClickHouse Kubernetes Operator to manage data warehouses, then cover ingest and visualization options to build complete apps. Since this is a K8s talk, we'll of course geek out on the underlying plumbing as well.
Robert Hodges
5:00pm UK/9:00am PT
DoK #66 Crossplane Packages as a Distribution Mechanism
A typical user's journey with Crossplane starts with provisioning infrastructure using the Kubernetes API, then evolves to composing infrastructure into higher-level abstractions, and culminates in building a complete platform using packages. Crossplane packages are distributed as OCI images, meaning that a platform API can easily be reproduced in any cluster, and they can declare dependencies, which specify the lower-level services that support the higher-level abstractions. This functionality allows companies to distribute their product in an infrastructure-provider-agnostic manner, and allows infrastructure admins to build internal platforms made up of both generic and organization-specific components.
Daniel Mangum
5:00pm UK/9:00am PT
DoK #67 Run Apache APISIX in Kubernetes
Apache APISIX is a dynamic, real-time, high-performance API gateway. You can use Apache APISIX to handle traditional north-south traffic, as well as east-west traffic between services. It can also be used as a k8s ingress controller. In this talk, Jintao Zhang will introduce how to run Apache APISIX on k8s and how to use Apache APISIX as an ingress controller.
Jintao Zhang
5:00pm UK/9:00am PT
DoK Talks #68 - The Kubernetes-native way to provide database services to developers
As Kubernetes becomes the infrastructure platform of choice in many companies, database teams are struggling with the question of whether to run databases on the cluster or outside of it. This talk will not answer that age-old question. Instead, I would like to focus everyone's attention on the developer experience. Modern CI/CD processes need development teams to be flexible and able to deliver without having to ask other teams for resources. So how can a database team provide reliable service while upping its DevEx game? The answer in Kubernetes-land is to provide Custom Resources backed by operators, which handle database provisioning on or off the cluster in a way that fits into a GitOps CI/CD workflow. In this talk, I will explain the concepts and dive into how you can build your own operator to provide a self-service interface for developers.
Adam Sandor
5:00pm UK/9:00am PT
DoK #69 To Certify or Not to Certify: Is Kubernetes Certification Worth It?
As an engineer, should I consider getting a certification? What makes a certification valuable to me or my employer? How do I pick which one to get? Will it really help me build stateful applications on Kubernetes? In this talk, we will discuss the relative value of certifying on different technologies, with a specific focus on CNCF certifications for administering k8s and developing Kubernetes-native applications. In this session, we will cover:
- The pros and cons of getting certified
- Why your current and future employers might care about your certifications
- What else you can do to make yourself a more attractive candidate in the cloud-native landscape
And of course, since Keith is a long-time database geek, we'll talk about how these might (or might not) help you build stateful applications on Kubernetes.
Keith McClellan
5:00pm UK/9:00am PT
DoK Talks #70 - YugabyteDB - Distributed SQL Database on Kubernetes
Kubernetes has hit a home run for stateless workloads, but can it do the same for stateful services such as distributed databases? Before we can answer that question, we need to understand the challenges of running stateful workloads on, well, anything. In this talk, we will first look at which stateful workloads, specifically databases, are ideal for running inside Kubernetes. Second, we will explore the various concerns around running databases in Kubernetes in production environments, such as:
- The production-readiness of Kubernetes for stateful workloads in general
- The pros and cons of the various deployment architectures
- The failure characteristics of a distributed database inside containers
In this session, we will demonstrate what Kubernetes brings to the table for stateful workloads and what database servers must provide to fit the Kubernetes model. This talk will also highlight some of the modern databases that take full advantage of Kubernetes and offer a peek into what’s possible if stateful services can meet Kubernetes halfway. We will go into the details of deployment choices and how the managed container offerings from different cloud vendors differ, and we will compare the performance and failure characteristics of a Kubernetes-based deployment with those of an equivalent VM-based deployment.
Amey Banarse
5:00pm UK/9:00am PT
DoK Talks #71 Introducing Kubestr: A new way to benchmark your Kubernetes storage
Benchmarking storage is not a new concept; it has been done for a long time. But have we overlooked benchmarking capabilities, or at least how easily they can be achieved, in a cloud-native, container-based Kubernetes landscape? Stateful workloads are on the rise, and support for persistent storage in Kubernetes is improving. Now we can run our traditional workloads, such as SQL Server, Oracle, and SAP, on the same storage system as the data stores for our microservices, such as MongoDB, Cassandra, Redis, MySQL, and PostgreSQL. With each of these stateful applications having different performance requirements, it becomes necessary to benchmark the storage backing these Persistent Volumes. The CSI (Container Storage Interface) is the standard for creating custom components that work with data storage, and it has enabled many more storage vendors to adapt their platforms to the cloud-native approach. All of this is great, but how do we ensure that the right datastore is used to achieve the performance required by the microservices running these stateful workloads?
Michael Cade
5:00pm UK/9:00am PT
DoK Talks #72 - Highly available, pluggable and long-term storage metrics for everyone: Extending Prometheus with Thanos
Prometheus was initially made for short metric retention, to answer questions about “what is happening ‘now’”. It is a strong project that solves certain problems really well, but it does so as a monolith. Thanos was created to enable scaling, highly available setups, and long-term (cheap) storage for Prometheus, and anyone can leverage Thanos for these features. It does not stop there: Thanos has multiple components that can be used for multi-cluster telemetry, remote writes, and multi-tenancy. We want to introduce everyone to Thanos, explaining its use cases and how it can benefit your stack, now that observability has become such an important factor in tech.
Wiard van Rij
5:00pm UK/9:00am PT
DoK Talks #73- Build Reproducible Experiments with Kubeflow and lakeFS
Kubeflow is a cloud-native ML platform that simplifies the training and deployment of machine learning pipelines on Kubernetes. lakeFS is a wrapper layer around an object store that enables Git-like operations, such as branching and committing, over datasets. Learn how to build ML workflows that are portable, scalable, and reproducible by integrating lakeFS operations into your Kubeflow pipeline components.
Barak Amar