Data on Kubernetes Day Europe 2024 talks are now available for streaming!

Watch Now!

The Architecture of Distributed Databases with / Cockroach Labs

Even though 2020 is over an 2021 brings a great deal  of optimism and Hope not that much has changed apart from the changing of the calendar year. So many people still have Zoom meetings to attend to, children to take care of in the next room and a beautiful constellation of responsibilities that only a person who works from home has to deal with, so it’s totally understandable that you didn’t have time to check out  one of the latest meetups that we held here at Data on Kubernetes.  But fear not, in this article we will give you the lowdown on what was discussed, a summary with the key points of everything you need to know about the key points of the cockroach DB presentation. In case you have some time, and you like to watch up just follow the link.

The protagonists

                              Jim Walker                                       Keith McClellan                                       Lisa-Marie Namphy

Plot: 

Jim took the reins to explain the architecture of a distributed system not so much from the Kubernetes angle, but more from the angle of what his team had to go through at Cockroach Labs, that way trying to help the rest of us build better distributed systems.

Key points:

What are the advantages of a distributed database?

The age of cloud scale and advent of microservices requires a new approach for the relation, transactional database.

Characteristics of the database that has evolved for cloud native, distributed transactions.

 

In the case of CockroachDB it is a relational database, managed by

RAFT, scales across multiple regions, clouds or even k8s clusters. It is naturally resilient any node can go done and the db will be just fine. 

Data can be geolocated, great news for edge computing.

Fun fact: any node is an entryway into the logical db.

 

Ranges: SQL to KV 

Problem:

The old way of storing data doesn’t work for a distributed database, continually appending new data that would then be indexed in alphabetical order. In distributed systems we would have a very hard to parsing that index system and wouldn’t be able to find where data lives very easily.

Solution:

Raft 

Along with Paxos, Raft is one of the core algorithms that is driving distributed systems. 

Raft was used in the cockroachDB system over Paxos for simplicity reasons.

Fun fact: It took Jim 5 years to build a production ready RAFT algorithm

 

Distributed Data: Range Distribution, Scale and Resilience 

 

Fun fact: Jim likes to talk about the Cockroach Labs documentation team, and the word “Latency”.

Distributed Transactions 

 

“Transactions are tough, the corner cases will get you” – Jim

What’s ACID?

Distributed SQL Execution 

 

Able to utilize a cost based optimizer + location.

Distributed Latency 

 

Of course our DB is no longer in the backroom of the office it is distributed, therefore  latency becomes a big issue.

To counteract this CockroachDB fires up multiple clusters in different regions to deal with any query.

 

Distributed Performance Optimizations

 

For a deeper dive check out there SIGMOD paper (White paper for The Resilient  Geo-Distributed SQL Database

Deploy CockrochDB easily using the Kubernetes operator they developed