Using Kubernetes for Healthcare Data

Healthcare organizations are transforming their applications and embracing digital platforms to deliver more efficient patient care. Edge computing plays a critical role, enabling innovative models of care that would not otherwise be possible.

Avesha’s Olyvia Rakshit and Prasad Dorbala share how data on Kubernetes powered a real-time colonoscopy use case.

View their talk from DoK Day NA 2022 or read the full video transcript below: 

Olyvia Rakshit  00:00

Hello, everyone. I’m Olyvia from Avesha, where I head up marketing and product UX, and this is my colleague Prasad, our chief product officer and co-founder. Today, as Bart said, we’ll be talking about a healthcare use case that we developed on Kubernetes. It brings together all of the technology, the multi-cluster and everything, that we’ve been hearing about today: we have a multi-cluster, multi-tenant orchestration platform called KubeSlice, and we built this healthcare system on KubeSlice, which I’ll describe a little bit.

I’ll start by presenting the use case, and then we’ll go into the technology a bit and how we did it. So, show of hands, how many of you have had or know about colonoscopy? If you’re above 45, you’re advised to have a colonoscopy, and we know how fun that is. In this particular system that was built on Kubernetes, there was an AI model assisting a doctor doing a colonoscopy in real time. I have a picture here: there was a bounding box telling the doctor, there is a suspicious polyp here, look at it.

As for the components of the system, we had edge servers, we had stuff running on the cloud in a multi-cluster system, and we had the feed node in the operating room, where all the data was being collected and sent to the AI model running at the edge, with some processing and learning happening on the cloud. In addition to this real-time AI detection system, there was also a workflow automation component that was all hands-free. Talk about a hands-free system these days, right? It was all NLP (natural language processing) driven, so the doctors would actually speak commands to the system. When they see a bounding box like this, they’ll say “take a picture,” through voice, and the system takes a picture. Then the doctor will say “sessile polyp” or whatever medical terms they use, and our workflow automation system was smart enough to understand all the medical terms and write them directly into a report.

So, talking about data and healthcare on Kubernetes: the system that we developed needed to be very secure, and it needed to perform with ultra-low latency. What we mean by that is the whole back and forth, from capturing those video feeds while the scope is inside your tummy, to the AI model running at the edge, to the cloud and back, that whole round-trip latency needed to be less than 120 milliseconds. And since we are capturing frame by frame while the model is learning and detecting at the same time, the inter-frame latency needed to be 17 milliseconds. So there is a lot of real-time processing, and it needed to be secure.

And then coming back to the topology. Before I get to this, I’d like to say that even last week, this was actually being used in New Delhi, India, by Apollo Hospitals, a big hospital brand for those of you who know the name, with Airtel and AWS as partners. And in the US, together with Verizon, we tested all of this back-and-forth multi-cluster connectivity and the latencies with one of the largest ambulatory surgery providers. This is where we are going with the system: there will be multiple regional hospitals, each room will have a feed node collecting data from the patients, and there will be multiple edges processing the data. In this hub-and-spoke model, there will be real-time inferencing for the doctors and the nurses, and also real-time voice processing as they record their findings into a report.

So this is the whole workflow automation part that I alluded to a little earlier: the doctors are watching those monitors with the green bounding box assisting them during a real-time colonoscopy, speaking commands to the system, and actually writing reports in real time by voice. And with that, I will hand it over to Prasad, who will talk about how this multi-cluster connectivity and communication happened, and how we built a multi-tenant architecture where KubeSlice, our fundamental technology, creates a slice for every hospital and creates that isolation between, say, the voice, the AI, and the video streaming data that’s going back and forth.

Prasad Dorbala  06:05

So, as Olyvia described, one of the foundational things we needed was a network that is L3 and above; purely API-driven systems were not going to be possible, and we needed RTP (Real-time Transport Protocol) streaming, encrypted all the way through. Essentially, ingress and egress work well, but not for everything on the data side. So how do you bring about low-latency east-west traffic, rather than north-south traffic? That’s one thing we wanted to look into. And obviously, Kubernetes was necessary for us to run containerized workloads; nothing could be easier than Kubernetes at the edge, with feed nodes running small k3s clusters in different shapes. One other thing: at the edge node we had to do image processing using Nvidia GPUs, so CUDA cores needed to be there, and we needed a dedicated node to make sure we could process images.

And the communication is very important for us because, as Olyvia described, inter-frame latency cannot be more than 17 milliseconds, and the overall round trip matters: you realize that the scope is inside the body, and the doctor has a MicTap or phone, which over Bluetooth is going to indicate if there is any bounding box or a detected polyp in there. So all of that round trip is important for the system to work. That’s the reason we built around latency, as I described. The images need to be stored because you’re constantly learning: in the cloud, we have a learning system that has already processed a whole bunch of images, but as new images come in, we are constantly learning. At the edge node is where the inferencing module sits, and the feed node is actually displaying the detections. We have two screens: one screen is from the scope provider, whoever it is, Boston Scientific or whoever, and there is another screen where the bounding boxes show up, so that at least the doctor is not distracted. So all of these things needed a more data-intense kind of transport, right?
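
For readers following along, this is roughly what pinning an inference workload to a dedicated GPU node looks like in plain Kubernetes. This is a minimal sketch, not Avesha’s actual manifest: the names, labels, and image are illustrative, and it assumes the NVIDIA device plugin is installed on the node.

```yaml
# Minimal sketch: pin an inference pod to a dedicated GPU (CUDA) node.
# All names, labels, and images are illustrative, not Avesha's manifests.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: polyp-inference          # hypothetical inference service
  namespace: colonoscopy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: polyp-inference
  template:
    metadata:
      labels:
        app: polyp-inference
    spec:
      nodeSelector:
        node-type: gpu           # illustrative label on the dedicated GPU node
      containers:
        - name: inference
          image: registry.example.com/polyp-detect:latest  # placeholder image
          resources:
            limits:
              nvidia.com/gpu: 1  # requires the NVIDIA device plugin on the node
```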

So what we built is KubeSlice, which is essentially a way of sharding the Kubernetes cluster and then creating a virtual cluster across multiple clusters: you have a cluster in the cloud, you have a cluster at the edge, and you have a cluster at the feed node, and all of these form how it needs to communicate with each of them. Obviously, everything has to be end-to-end encrypted because of HIPAA (Health Insurance Portability and Accountability Act) compliance and the different patient data that is going through. Although a lot of these things are obfuscated, you realize that when an image is there, a patient name may show up on top of it or something like that, so we need to make sure that everything is kosher (legitimate) from the standpoint of encryption, storage, and all that stuff.
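
As a rough illustration of what Prasad describes, here is what a slice spanning cloud, edge, and feed-node clusters might look like using the open-source KubeSlice SliceConfig CRD. This is a sketch: the field shape follows the public KubeSlice API, but the cluster names and values are illustrative, not the production configuration from the talk.

```yaml
# Sketch of a slice spanning cloud, edge, and feed-node clusters.
# Field shape follows the open-source KubeSlice SliceConfig CRD;
# cluster names and values are illustrative.
apiVersion: controller.kubeslice.io/v1alpha1
kind: SliceConfig
metadata:
  name: hospital-slice
  namespace: kubeslice-avesha   # illustrative project namespace on the controller
spec:
  sliceSubnet: 10.1.0.0/16      # overlay subnet; no per-cluster IPAM to manage
  sliceType: Application
  sliceGatewayProvider:
    sliceGatewayType: OpenVPN   # encrypted inter-cluster tunnels (HIPAA)
    sliceCaType: Local
  sliceIpamType: Local
  clusters:                     # registered worker clusters joined to the slice
    - cloud-cluster
    - edge-cluster
    - feed-node-cluster
```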

So tenancy is needed because of different hospitals or different divisions. Let’s say our first entry is in colonoscopy; that doesn’t mean it stops at, say, upper GI versus lower GI, and you can use the same system. And then cardiology has other applications inside it. So the same hospital may have different divisions that are going to be using the same inferencing system, and you needed to have tenancy associated with that.
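
The per-division tenancy Prasad mentions maps naturally onto namespace isolation on the slice. The excerpt below would slot into the spec of the SliceConfig sketch above; again the shape follows the open-source KubeSlice CRD, and the namespaces are illustrative.

```yaml
# Excerpt of a SliceConfig spec: per-division tenancy via namespace isolation.
# Shape follows the open-source KubeSlice CRD; namespaces are illustrative.
  namespaceIsolationProfile:
    isolationEnabled: true        # enforce isolation around the slice
    applicationNamespaces:
      - namespace: colonoscopy    # upper and lower GI share this division
        clusters:
          - '*'                   # onboard this namespace in every slice cluster
      - namespace: cardiology     # a separate division on the same system
        clusters:
          - '*'
```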

Obviously, if you look underneath the hood at the architecture from a slice standpoint, we actually have dual interfaces inside the pod. You have a CNI interface, which is essentially for the lifecycle management of the cluster, but we also created something called an overlay interface. The overlay interface is what we use to have this real-time data flowing across, from a pod-to-pod standpoint. And then, in order for the services to discover each other, we needed service discovery on the slice. What we call a slice is a way of describing this: it’s a CRD, it’s an operator, like everybody has been talking about. In all the clusters, we have an operator running that actually instantiates all the pods and then gives them an address. The advantage we wanted was to not have IP address management across all these clusters; we wanted to get rid of all that, because we need to be able to bring up the edges as fast as possible and not have a long lifecycle with respect to addressing. So this overlay address gave us a much easier way of defining a slice, and there are a bunch of components, with a software-defined network built into the cluster.

There are different ways of bringing up the edge: AWS and Verizon bringing the edge is one way, and MEC is another aspect. With respect to Airtel, we are trying to use AWS Outposts as an edge and bring it up that way; AWS, in association with us, did this trial in India. But regardless, we wanted to be self-sufficient in how we run these clusters. That’s the reason we needed a better form of isolation: a better network that can do layer three and above, that can do RTP streaming, and that doesn’t have any firewalls or NATing in the way. So when you have a multi-cluster, you’re actually creating a virtual cluster from a networking standpoint, and every pod on the slice is able to talk, be it in a cluster in the cloud, at the edge, or at the feed node. That’s the most important thing from a latency standpoint. That’s about it. If you look at our YAML file (because everybody started showing YAMLs), our YAML file is very simple. We have a booth at SU-46; if you want to see it, I can show you all of that there.
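
For the service discovery piece, the open-source KubeSlice project exposes a ServiceExport CRD: exported services become discoverable from every cluster attached to the slice, over the overlay. Below is a hedged sketch with illustrative names and ports, not the actual manifest from this deployment.

```yaml
# Sketch: export the edge inference service onto the slice so pods in any
# attached cluster can reach it over the overlay, with no NAT or firewall
# in the path. Shape follows the open-source KubeSlice ServiceExport CRD;
# names and ports are illustrative.
apiVersion: networking.kubeslice.io/v1beta1
kind: ServiceExport
metadata:
  name: polyp-inference
  namespace: colonoscopy
spec:
  slice: hospital-slice           # the slice defined earlier
  selector:
    matchLabels:
      app: polyp-inference
  ports:
    - name: rtp
      containerPort: 5004         # conventional RTP port, illustrative
      protocol: UDP
```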

Olyvia Rakshit  13:24

And I’d just like to add: please check out kubeslice.io. We’ve open-sourced a lot of our components and our tools. So, as someone said, go and star it, like it, use it, and please do reach out to us.

# # #