Using stateful services to index the world of open source

Nov 30, 2021 by melissa

Sourcegraph is a universal code search engine that allows developers to search across their entire codebases. It can be deployed on-prem or used directly from the cloud. The service currently indexes over 1 million open source repositories and provides code search across them. Sourcegraph is planning on indexing every open source repository with more than 1 star on GitHub, GitLab, and other code hosts. To achieve this scale, it relies on Kubernetes to scale several stateful services.

These services range from commonplace databases like Postgres and Redis to bespoke stateful services written in Go. This talk describes these services and the tradeoffs made in their design to aid practitioners in developing their stateful services. This talk also discusses the failure scenarios the Sourcegraph team has encountered. Examples will include long PVC mounting time causing downtime in seemingly unrelated services, and the on-call culture Sourcegraph has adopted due to many novel services. Lastly, this talk discusses plans to make our existing stateful services more resilient and highly available with information from previous downtime incidents.

If you want to gain an understanding of the challenges of running stateful data on Kubernetes, this talk is for you. Newcomers will understand possible failure cases and how to remedy them. Practitioners will gain knowledge to help them design stateful services on top of Kubernetes and ideas for making those services highly available.

This talk was given by Sourcegraph’s DevOps Engineer Dax McDonald as part of DoK Day at KubeCon NA 2021. Watch it below. You can access the other talks here.

Bart Farrell 00:00

Our next speaker, Dax McDonald, is joining us from Sourcegraph. You heard Melissa talking a lot about the differences between stateless and stateful. In the case of Sourcegraph, they’re running all different kinds of stateful services, and Dax is going to give us a little bit more info directly, so we can turn it over to you now. Welcome, Dax.

Dax McDonald 00:26

Let me give you this great overview of how we run stateful data at Sourcegraph.

Bart Farrell 00:33

Just a reminder, folks, you can leave comments and questions in the YouTube chat. But also, if you really want to continue the conversation afterward with the speakers, jump into our Slack. That’s where you can have more direct interaction.

Dax McDonald 00:47

I’m Dax McDonald. I’m gonna be talking today about how we use stateful services at Sourcegraph to index the world of open source. I previously worked at Rancher Labs on multi-cluster management. Now, at Sourcegraph, we’re focused on bringing code search to every developer. And we’re hiring: if this sounds interesting, please reach out to me after the talk. Here’s an overview for today. I’ll be talking about why Sourcegraph needs so much stateful data, and I’m gonna be focusing on two particular code paths in our product: indexed search and non-indexed search. I’ll be providing a high-level overview of stateful data in Kubernetes. I’ll talk about some of the problems that we’ve encountered, and our attempted solutions.