We hold live meetups every week where different guests share their stories, wisdom and practical advice on how to overcome common issues. All meetups are recorded and put on our YouTube and Podcast channels.

Events

2020-07-21
5:00pm GMT/9:00am PST
1-Is Kubernetes even ready for data?
Kubernetes has been a great solution for deploying application infrastructure. Trying to manage your data with the same control plane has been less than ideal. This has been even more true when using distributed databases like Apache Cassandra. Once you get past the storage and stateful sets, you still have a lot to do. Let’s have a frank talk about the new opportunities to make Kubernetes ready for data.
Patrick McFadin
2020-07-28
5:00pm GMT/9:00am PST
2-Data on k8s maturity check
Let’s talk about storage. Optoro has moved to running stateful stores on Kubernetes. It’s a challenge, but it has a lot of value. Let’s talk about how we chose to do it, and what we figured out along the way.
Zach Dunn
2020-08-04
5:00pm GMT/9:00am PST
3-Design considerations for operationalizing Distributed SQL on Kubernetes
This talk is targeted towards cloud-native developers and architects looking to deploy an operational database on Kubernetes. We are going to walk you through the design decisions YugabyteDB's team took when architecting the database as a service on Kubernetes. We are going to cover concepts related to Kubernetes volume provisioning, pod placement strategies for data resilience and high availability, and how cluster events are used for reconciling the k8s workloads during day-2 operations such as upgrades and scaling up or down.
Nikhil Chandrappa
2020-08-11
5:00pm GMT/9:00am PST
4-The problem of stateful workloads - balance of keeping data HA vs. costs
In an engineer’s ideal world we would love all the resources and redundancies we can possibly get for our services and the infrastructure that supports them, for sanity and, of course, HA. However, how do you balance between “enough” redundancy and the actual operational costs of supporting such engineering choices, and what are some of the tough engineering decisions that need to be made? This talk focuses primarily on services being run on Kubernetes (or a public cloud offering of Kubernetes), but the principles can be extended to any infrastructure environment. Key topics: capacity planning, cost management, distributed services.
Ren Lee
2020-08-18
5:00pm GMT/9:00am PST
5-The full cycle of doing data on k8s: a case study
Scaling ACID-compliant databases in the cloud is challenging. We’ll look at a specific use case where we’re trying to scale a SaaS Odoo ERP offering on Kubernetes and build a scalable Postgres cluster as a backend service.
Dave Cook
2020-08-25
5:00pm GMT/9:00am PST
6-Operators, operators, operators…operators
Operators represent a great opportunity for the data community to solve for the complexities of managing data products for their customers in a way that standardizes UX and integration points -- historically the most powerful solutions had to be niche and highly customized.
Amit Gupta
2020-09-01
1:00am GMT/5:00pm PST
7-Conway’s Law & Kubernetes: Centralization vs. small team autonomy
Big clusters or small clusters? Where to draw the line, and how to know what's best for your use case? We will be talking to Joseph and Mike from Adobe about the inevitable questions that arise when running k8s at scale. If it is run by the platform team, is it inevitably a pet? Or more of a pet? Is that the idea, that we give stuff that “must not fail” to platform teams so they are common services with SLAs? Or how is it decided what is owned by the platform vs. the individual teams? While talking with Joseph and Mike we will also dive into what their stack looks like, must-have tools they use on a daily basis, VMs vs. K8s, differences in stateful apps on k8s, and war stories!
Joseph Sandoval, Mike Tougeron
2020-09-08
5:00pm GMT/9:00am PST
8-Appropriate workloads for databases in K8s
As more companies move to Kubernetes and cloud native as a standard for developing net-new functionality, something has to happen to the legacy workloads. Oftentimes we see a lift-and-shift mentality into Kubernetes. We will talk about how that mentality can be dangerous or cause more work than expected.
Rick Vasquez
2020-09-15
5:00pm GMT/9:00am PST
9-Geospatial Sensor Networks and Partitioning Data
We use resources like weather reports or air quality measurements to navigate the world. These resources become especially important when faced with extreme events like the current wildfires in the Western USA. The data for the reports, predictions, and maps all start as real-time sensor networks. In this talk, I’ll present some of my research into scientific data representation on the Web and how the key mechanism is the partitioning, annotation, and naming of data representations. We’ll take a look at a few examples, including some recent work on air quality data relating to the current wildfires in the western USA. We’ll explore the central question of how geospatial sensor network data can be collected and consumed within K8s deployments.
Alex Miłowski
2020-09-22
5:00pm GMT/9:00am PST
10-Data on Kubernetes and container attached storage - an update
Back in 2018 the CNCF published a blog we wrote called Container Attached Storage. Today - September 22nd 2020 - a new blog is appearing on their site updating Container Attached Storage. This talk borrows very heavily from that blog. What is CAS? Why would anyone use Kubernetes itself for storage? How does a microservices architecture help? Why is shared storage at the end of the road - though still used underneath CAS sometimes?
Evan Powell
2020-09-29
5:00pm GMT/9:00am PST
11-Doing Data Wrong
In this talk, we'll look at great ways to lose data (like running databases on Kubernetes and bare metal), pain points for developers, and lessons we've learned, and hold a Festivus-in-September airing of grievances session for those who have felt this pain.
Jeremy Tanner, David McKay
2020-10-06
5:00pm GMT/9:00am PST
12-PostgreSQL-as-a-Service on K8s at Zalando
PostgreSQL is a powerful, open-source object-relational database system with over 30 years of active development that has earned it a strong reputation for reliability, feature robustness, and performance. But a production-grade deployment requires many technologies complementary to the database core: high availability and automated failover, backup and recovery, monitoring and alerting, centralized access control and logging, connection pooling, and so on. Although not initially designed for running stateful workloads, Kubernetes, with its infrastructure-as-code paradigm, CustomResourceDefinition, and Operator pattern, turned out to be extremely convenient for deploying and running PostgreSQL at scale. I will talk about a few open-source projects developed and maintained by the database team at Zalando which anybody can use to build their own PgaaS: 1. https://github.com/zalando/patroni - Tool for PostgreSQL high availability and cluster management. Integrates with the K8s API and makes PostgreSQL cloud-native. 2. https://github.com/zalando/spilo - The Docker image that packages Patroni, multiple versions of PostgreSQL, and tools for backup and recovery. 3. https://github.com/zalando/postgres-operator - Implements the Kubernetes operator pattern and orchestrates hundreds or thousands of deployments of Patroni/Spilo clusters. The aforementioned projects would never have reached their current state without the efforts of dozens of external contributors.
Alexander Kukushkin
2020-10-13
5:00pm GMT/9:00am PST
13-Distributed Workloads on Kubernetes: Operators to the Rescue
How easily can you run distributed workloads on Kubernetes? The initial deployment of your 10-node database might be easy to set up, but day-2 operations (changing the configuration, adding and removing nodes, version upgrades, etc.) are much more complicated. We'll discuss how operators can help you manage distributed workloads, and a few operator tricks we learned while working on ECK (Elastic Cloud on Kubernetes), an operator for the Elastic stack.
Sebastien Guilloux
2020-10-20
5:00pm GMT/9:00am PST
14-Kubernetes Cost Control
The importance of cost control while working with the cloud: K8s, data, and cost control, with hints and tips for controlling your K8s costs.
Arie van den Bos
2020-10-27
4:00pm GMT/9:00am PST
15-Reaching limits in K8s: A case study with Ingress Controller
When talking about data, we usually think about big data and scale, and what do we do next. Such limits are sometimes a good problem to have. In this talk, we'll discuss our approach to this situation using the Ingress Controller.
Laurent Rouquette
2020-11-03
4:00pm GMT/8:00am PST
16-HyperStore-C: S3 object storage managed by Kubernetes
Cloudian’s HyperStore is S3-compatible object storage software focused on the enterprise market. In this talk, I'll discuss how and why we are working on Kubernetes-managed versions of HyperStore, including where we are now and what we're looking at.
Gary Ogasawara
2020-11-10
4:00pm GMT/8:00am PST
17-Is k8s Even Ready For Data? Round II
Data on Kubernetes Community: Is K8s even ready for data, Round II - Cassandra on OpenEBS - aka CaSS on CAS. In our inaugural DOKC meet-up, Patrick McFadin, Developer Advocate at DataStax, emphasized the challenges of running Cassandra on Kubernetes, concluding at one point that “Kubernetes might not be ready for Cassandra.” Since that meeting, the use of the open-source Container Attached Storage project OpenEBS as simple, high-performance, per-workload storage for Cassandra has proliferated. The Cassandra Operator from DataStax, aka “CaSS”, has progressed as well. So - where are we now? Is CaSS on CAS working well? What is the future of collaboration between DataStax / Cassandra and MayaData / OpenEBS? Is Kubernetes now ready for Cassandra? What are the emerging technologies that might shape storage and Kubernetes in the near future?
Jeffry Molanus, Patrick McFadin
2020-11-17
9:00am GMT/1:00am PST
18-DoK Panel: The State of State
Stateful vs stateless? We will stately be stating our statutes regarding the status of the state of statefulness and statelessness on k8s - oh yeah! In the DoK Community, one of the main issues that folks have is how in the world they can flatten the learning curve when it comes to running stateful applications in k8s. That's why we've brought on 3 experts from 3 different countries to tell us what state state (intentionally doubled) is in! A fireside chat on state and Kubernetes. https://www.meetup.com/Data-on-Kubernetes-community/events/274551382/
Rosemary Wang, Lili Cosic, Tomasz Cholewa, Jacquie Grindrod
2020-11-24
5:00pm GMT/9:00am PST
19-Towards a K8s Native Streaming Application
Starting from a simple application which can be deployed on any machine running Docker, we will go through all the steps required to transform the simple app into a Kubernetes native streaming application. We will explain the theory and then exemplify the learnt concepts to define a recipe for running streaming applications on Kubernetes. We will focus on both cultural and technical tricks to help you successfully adopt streaming applications at scale. At the end of the talk, you will have a comprehensive view of all the platform building blocks and application requirements needed to successfully run a streaming application on Kubernetes. Spoiler: you will hear the words Apache Kafka, Kafka Streams and Strimzi several times.
Francesco Nobilia, Jeremy Frenay
2020-12-01
5:00pm GMT/9:00am PST
20-Tips and tricks to get Kubernetes certifications
CKA (Certified Kubernetes Administrator) has a bad reputation as the hardest certification many people have faced. In this talk, we will go through the process to successfully pass the exam, tips on the exam itself, the environment, and any other questions that might arise. How to fly into a Kubernetes certification.
Eneko Perez, Carlos Gomez Carrero
2020-12-08
5:00pm GMT/9:00am PST
21-Data on Kubernetes: my insights
Data handling is one of the hardest things in Kubernetes. This talk will be an informal conversation about things (related to data management) Eduard found while helping customers to embrace Kubernetes. I hope you find them useful!
Eduard Tomàs
2020-12-15
5:00pm GMT/9:00am PST
22-Vitess Operator for Kubernetes
In this talk, I would like to uncover our newly announced Vitess Operator for Kubernetes. This talk demonstrates a sample implementation of Vitess in a Kubernetes topology. I also explore common DBA tasks by demonstrating how they are handled in the Vitess ecosystem. Vitess, out of the box, comes with a lot of tools and utilities that one would otherwise have to either incorporate or develop to manage a MySQL topology. Let’s take a look at the capabilities of Vitess in these areas and demonstrate how they are performed under the operator realm.
Alkin Tezuysal
2021-01-05
5:00pm GMT/9:00am PST
23- 2021 DoK Community Kickoff! Trends, friends, and more!
For our 23rd installment of the Data on K8s community meetup, we will be talking with Ariel Munafo, who is a CNCF ambassador and the founder of EuropeClouds (among many other things); Arie van den Bos, Senior Systems Engineer on Cloud Systems at Kurago; and Jake Page, who is a DevOps and Cloud Native Enthusiast.
Ariel Munafo, Jake Page, Arie van den Bos
2021-01-12
5:00pm GMT/9:00am PST
24-The architecture of a distributed database
Cockroach Labs has built a database architected from the ground up to be distributed. It is a perfect fit for the cloud and Kubernetes as it naturally scales and survives without manual interaction. The unique architecture of CockroachDB delivers some key innovations that may not only provide value for your applications but might also give you insight into the challenges/solutions in distributed systems. In this session, we will deliver a deep-dive exploration into the internals of the database, exploring the following, and more: * How the database uses KV at the storage layer to effectively distribute data * How Raft and MVCC are used to guarantee serializable isolation for transactions * How Cockroach automates scale and guarantees an always-on resilient database * How to tie data to a location to help with performance and data privacy
Jim Walker
2021-01-19
5:00pm GMT/9:00am PST
25-Deconstructing Postgres into a Cloud Native Platform
Is deploying Postgres in Kubernetes just repackaging it into a container? Can’t Postgres leverage the wide range of Cloud-Native software and integrate well with K8s? Join this journey that will cover and demonstrate, with demos running on StackGres: * How to structure Postgres into an init-less container, plus several sidecar containers for connection pooling, backups, agents, etc. * Defining high level CRDs as the single API to interact with the Postgres operator. * Using K8s RBAC for user authentication of a web UI management interface. * Using Prometheus for monitoring; bundling a node, Postgres and PgBouncer exporters together. * Proxying Postgres traffic through Envoy. Terminate Postgres SSL with an Envoy plugin, that also exports wire protocol metrics to Prometheus. * Using Fluentbit to capture Postgres logs and forward them to Fluentd, which stores them on a centralized Postgres database.
Alvaro Hernandez
2021-01-26
5:00pm GMT/9:00am PST
26- How to unblock your release pipelines with data
Even though microservices are becoming a pattern, we still see a lot of "monolithic" deploys and manual, reactive actions. This blocks the ability to achieve maximum velocity in your release. We can leverage data and smart use of traffic-shaping to achieve a higher release velocity AND quality.
Olaf Molenveld
2021-01-28
3:00pm GMT/7:00am PST
Nederkube Edition #1 - Is Kubernetes ready for Data Management?
Kubernetes became the standard for microservices architectures. But what about handling massive and scalable data management on top of it? Is it possible, and what does it mean for operations? Cassandra has been adopted widely and accepted globally as the most scalable and reliable database. Now it adds ease of use by offering a Kubernetes native plug-and-play solution for enterprise use!
Michel de Ru, Arie van den Bos, Jeffry Molanus
2021-01-29
10:00pm GMT/2:00pm PST
DoK Brazil #1 - DevOps, Kubernetes and Data
My experience in this contemporary technology journey over the last 4 years: fears, mistakes, IT paradigms, and how agile methodologies impact my goals.
Rogeria Portilho Rodrigues
2021-02-02
5:00am GMT/9:00pm PST
27- Cost management for OpenShift, a new SaaS service to understand your Kubernetes costs
For IT decision-makers, this goes above and beyond just keeping infrastructure running and efficient; it is about understanding how your IT budget affects your business, and how well your resources maximize the use of your budget. This makes it critically important that IT teams can more quickly and easily see the totality of their IT costs across the hybrid cloud. We’re pleased to introduce a new software-as-a-service (SaaS) offering intended to help our customers better understand the costs of their OpenShift environments: OpenShift cost management. Available free of charge as part of a Red Hat OpenShift Container Platform subscription, OpenShift cost management provides a simplified, more intuitive view into the costs, from the macro to the granular, of an OpenShift deployment.
Sergio Ocón Cárdenas
2021-02-09
5:00am GMT/9:00pm PST
28- Getting Started Contributing to Kubernetes
This talk will walk through how to get started contributing to Kubernetes, combatting imposter syndrome, the many other ways you can get started contributing to K8s other than by writing code, and the benefits of joining a community such as K8s.
Rin Oliver, Savitha Raghunathan
2021-02-23
5:00pm GMT/9:00am PST
#30 Kyverno for Kubernetes!
Kubernetes is powerful but can be complex to manage! In this talk, Jim Bugwadia from Nirmata will show how policy managers can help address the complexity via admission controls and dynamic configurations. Jim will introduce Kyverno, a Kubernetes native policy engine and CNCF sandbox project. Jim will then demonstrate how you can use Kyverno to ensure security and best practice compliance for your clusters.
Jim Bugwadia
2021-02-26
7:00pm GMT/11:00am PST
DoK Brazil #2: Let's understand databases in the cloud with the help of Wagner Bianchi! (Talk in Portuguese)
A relaxed conversation about the future of databases as a service. Data on Kubernetes from a DBA's point of view. And various other related topics.
Wagner Bianchi
2021-03-02
5:00pm GMT/9:00am PST
#31 The Data Lifecycle - Where Do We Go From Here
Going from raw data to machine learning models successfully in companies of all sizes requires more than just an understanding of programming. Teams need to manage their data products' lifecycle: their software as well as the data. Data products like machine learning models aren’t created out of thin air. They are built on layers of best practices that ensure the models are using accurate data, are outputting reliable numbers, and have some method to interact with the outside world. So how do we get there? The purpose of this talk is to discuss the current state of the data lifecycle as it pertains to creating data products. These could be machine learning models, dashboards and data APIs. We will outline the general architecture that helps take data from raw to some form of machine learning model. In addition, we will discuss some of the concepts that are being applied from DevOps as well as being created in MLOps to help better facilitate your data lifecycle.
Benjamin Rogojan
2021-03-04
5:00pm GMT/9:00am PST
#32 How to choose a Kubernetes distribution for on-prem environments?
Buy a ready off-the-shelf product, customize an existing open source project, or build your own distribution? When you can't go to the cloud and leverage its powerful features you have to make a choice. On-prem environments need more attention, but they also often can be more cost-effective and are highly coveted by the development and operations teams. In this talk, I will cover some of the most important topics related to building an on-prem Kubernetes platform and I will describe the most popular distributions.
Tomasz Cholewa
2021-03-09
5:00pm GMT/9:00am PST
#33 Making observability accessible is the fourth pillar
Observability systems are typically a collection of tools that cover the three pillars of logs, metrics and tracing. These enable skilled engineers to correlate telemetry insights to perform data-driven diagnostics and rectify degraded services. In this talk, I discuss how, over the course of three years, I have worked towards removing the built-in gatekeeping that comes with creating monitoring solutions and enabling them to work for an entire organisation. We shine a light on the overlooked developer community that interacts with observability but does not necessarily hail from SRE disciplines. I engage with anecdotes from my past and illustrate the inherent bar to success that comes with connecting multiple tools together and the context required to achieve results. With years of experience working to improve adoption and create consumer-friendly facades for tools such as Grafana, Prometheus and Jaeger, I draw upon my background within large financial institutions and show how building engaging and simplified DX can compel and excite engineers to work with observability.
Alex Jones
2021-03-11
5:00pm GMT/9:00am PST
#34 Opstrace, An open source alternative to services like Datadog, SignalFx, and others...
Open source observability should not be hard. What companies package as their enterprise offering should be available to anyone who wants to monitor their systems. Opstrace is a complete monitoring platform designed for the end user instead of the expert. Its goal is to be as easy to use and operate as a hosted SaaS provider but within one's own cloud account. This is not only up to 10x more cost-efficient but also allows full control over one's data.
Sébastien Pahl
2021-03-16
4:00pm GMT/9:00am PST
#35 Make Kubernetes your development environment
Developers spend a lot of time making their local machine look like a cluster. But why do we do that? Our local machine is not where our code is supposed to run! We built okteto (github.com/okteto/okteto) so we can make our Kubernetes clusters look like our local machine. In this talk, we'll show you how okteto helps you take advantage of all the goodness of Kubernetes and the cloud without having to sacrifice a really fast development and feedback loop.
Ramiro Berrelleza
2021-03-18
4:00pm GMT/9:00am PST
#36 A Snapshot of DevOps
DevOps is like a camera. We focus on what's important, we capture the good times, we develop from the negatives, and if things don't work out, we take another shot. Many teams establishing working best practices for their tools improve their time to deliver and ability to scale. However, the real challenges exist outside of tools and technology and many teams today still have questions about DevOps. So, join this session to learn the fundamentals of shaping a DevOps culture. We'll discuss key attributes around people, process, and technology, likening you and DevOps to pro photographers and cameras.
Tiffany Jachja
2021-03-23
4:00am GMT/9:00pm PST
#37 Running Data Replication Pipelines on Kubernetes with Argo
Hundreds of data teams have migrated to the ELT pattern in recent years, leveraging SaaS tools like Stitch or FiveTran to reliably load data into their infrastructure. These SaaS offerings are outstanding and can accelerate your time to production significantly. However, many teams prefer to roll their own tools. One solution in these cases is to deploy singer.io taps and targets — Python scripts that can perform data replication between arbitrary sources and destinations. The Singer specification is the foundation for the popular Stitch SaaS, and it is also leveraged by a number of independent consultants and data projects. Singer pipelines are highly modular. You can pipe any tap to any target to build a data pipeline that fits your needs, making them a good fit for containerized workflows. This article walks through the workflow at a high level and provides some example code to get up and running with some shared templates. I also drill into reasons for choosing the Argo approach over other orchestration tools like Airflow or Dagster, and the implications from a team perspective.
Stephen Bailey
2021-03-25
4:00pm GMT/9:00am PST
#29 How Absa Developed Cloud Native Global Load Balancer for Kubernetes
Global load balancing solutions, commonly referred to as GSLB (Global Server Load Balancing), have typically been the domain of proprietary network software and hardware vendors, installed and managed by siloed network teams. k8gb is a completely open source, cloud native, global load balancing solution for Kubernetes. k8gb focuses on load balancing traffic across geographically dispersed Kubernetes clusters using multiple load balancing strategies to meet requirements such as region failover for high availability. Global load balancing for any Kubernetes Service can now be enabled and managed by any operations or development team in the same Kubernetes native way as any other custom resource. The talk will cover both technical and business aspects of k8gb's creation, including ongoing adoption within a huge-scale organization.
Yury Tsarev
2021-03-25
5:00pm GMT/10:00am PST
DoK in Spanish #1 ¡Suelten el Krake! Trayendo la Energía al Lazo de Cómputo / Release the Krake! Bringing Energy into the Compute Loop
Cloud&Heat has always focused on providing energy-efficient data centers. In the last 8 years, we have developed an innovative water cooling technology for servers, converting waste heat into a valuable asset. By doing so, we have already greatly improved the energy efficiency of individual data centers. However, this isn’t enough. To globally maximize the efficiency of distributed data center infrastructures, this talk presents Krake. Krake is an orchestration software for compute-intensive jobs. It improves the global cost and energy efficiency of infrastructures by balancing the load between data centers. Krake evaluates and selects the most efficient site to run jobs based on certain metrics, such as energy availability, heat demand, and latency. It also reacts to changes in the system by migrating jobs. In other words, it ensures a job is run in the most energy- and/or cost-efficient way at any given time.
Juan A. Fraire