Attend Data on Kubernetes Day at KubeCon North America on Nov 12th

Register Now!!

DoK Town Hall: Running Batch Data Workloads in Kubernetes at Dish Network

Learn about optimizing batch data processing at scale on Kubernetes from David Fox of Dish Network. He addresses the challenge of fast, reliable data processing in the face of exponential data growth and showcases an architecture that leverages the Apache Spark Operator, Apache YuniKorn, Karpenter, and Argo Workflows to improve resource allocation, storage performance, and job scheduling, demonstrating how businesses can efficiently handle large-scale data processing for better decision-making.

Speaker: David Fox, Staff Software Engineer, Dish Network

Watch the Replay

Read the Transcript

Speaker 1: 00:00:03 All right, thanks everybody for joining us for the August edition of the Data on Kubernetes Community Town Hall. Today we have David Fox. He’s going to be talking about running Spark batch data workloads on Kubernetes. David, from Dish Network, is a staff software engineer. Before we get into the talk, I’m just going to go over some community announcements for you all. So first of course, I want to thank our gold sponsors, Google Cloud and Percona. Thank you for sponsoring the community. It allows us to do things like this, which is have great talks and presenters present to the community. And I also just want to thank our silver sponsors and then our community collaborators as well.

00:00:50 And then a couple of community spotlights. Just want to point out some things that are going on. So Nelson Alde just released a book, Big Data on Kubernetes. We’re going to have Nelson come talk in October about the book, but you can scan the QR code; there’s a post he put on LinkedIn about the book. And then also Xing Yang, who’s the SIG Storage co-chair, was also featured in a blog post in the Kubernetes at 10 series. So you should check that out. There have been a total of six articles by the same author, all about Kubernetes at 10, featuring members of the community, talking about storage on Kubernetes and how it’s evolved. So you can check that out if you scan the QR code there.

00:01:39 Okay, again, just always letting you know, DoK Day will be at KubeCon North America in Salt Lake City on November 12th. The call for proposals has closed, and the CNCF will be making notifications about the schedule and which talks made it in on Monday, August 26th. So if you put in your proposal, you’ll be finding out in a couple of short weeks. Just to let you know, all-access passes are increasing from $899 to $1,079 after August 26th, and it does require an all-access pass to go to the co-located events. I also just want to thank our event gold sponsor Percona for sponsoring the event so far; I appreciate it. We’re definitely always looking for more sponsors, but we want to thank them for sponsoring so far. And you can scan the QR code there if you want to register and get more information about DoK Day and KubeCon.

00:02:37 Let’s see. We’ve got some upcoming events. So we have a pretty full calendar coming up, but in September we have a DoK Talk on Vitess with Deepthi Sigireddi. And then we also have two people from Intuit coming to talk to us at our Town Hall on September 19th, empowering developers with easy, scalable stream processing on Kubernetes, and that’s with RI and Maurice. So you can go to the meetup page and register for those events. And just a little shout out, we reached over 3,000 meetup members, so that’s a pretty exciting milestone for us. So thanks for joining us on our meetup group. If you want to learn more about the community, scan the QR code on the left; it takes you to our community page. If you’re interested in sponsoring the community, you can scan the QR code on the other side, on the right, and it’ll take you to our sponsors page for more information. And then lastly, if you stay until the end of our talk, we do a little quiz. Hopefully you’ve paid attention, including during David’s talk, because we’ll have some questions from the intro and from David’s talk, and the winner can get a DoK shirt. Yeah, so before I introduce David, if you have questions, you can put ’em in the chat and most likely we’ll get ’em answered at the end.

00:04:07 So just put them in the chat and we’ll relay them to David. So I’ll stop sharing my screen. Yeah, so David, you can take it away and let us know about running Spark batch data workloads on Kubernetes. Oh, you are on mute. Can you hear me? Yep.

Speaker 2: 00:04:32 Okay, great. Yeah. Hi everyone. So today’s presentation is on running Spark batch data workloads on Kubernetes. I’m David Fox, I work at Dish on a data team, and I focus a lot on building and deploying scalable data applications in Kubernetes. So yeah, as data volumes continue to grow, organizations are seeking more scalable ways to process their data. Apache Spark is a powerful tool for batch data processing, while Kubernetes has revolutionized container orchestration. So in this talk, I’m going to explore how these two technologies can come together to create a scalable, cost-effective solution for data processing.

00:05:26 Let’s see. Yeah, so this is just a brief outline of the talk. So first we’re going to go over why we should run Spark on Kubernetes and why it’s becoming increasingly popular. We’re going to go over the architecture of the Spark operator and how it simplifies Spark deployments. And then we’re going to go over how to batch-schedule Spark applications using Apache YuniKorn. And then finally, we’re going to go over one strategy for reducing costs while also being resilient to failure. So let’s dig in. So I’m going to give two demos in this presentation. And for the demos, we’re going to be processing a dataset, just an example dataset, that contains 112 million rows of yellow taxi trips in New York City.

00:06:34 And it contains various information about pickup and drop-off times, trip distances, et cetera. But basically the Python script will just process the data and extract long trips for a particular vendor to an output file in S3, for some context. So yeah, why Spark on Kubernetes? For those of you who don’t know what Spark is, it’s a distributed data processing framework that can distribute and process huge amounts of data, from gigabytes to terabytes or more, over many machines. And Spark and Kubernetes marry well together, because the distributed nature of Spark across many machines benefits from parallel processing, and with Kubernetes running in the cloud, you have the burstable capacity to spin up resources on demand and scale back down when processing is finished, which ends up saving you costs.

00:07:49 Another incentive for our team to use Kubernetes for running Spark workloads was the resilience to spot failure in auto scaling groups. So there’s a lot of data stored from various operations in memory and on disk, a lot of disk I/O. But when a spot instance is terminated, it’s very difficult to reattach and reuse data that is stored on attached EBS volumes. In Kubernetes, there are standardized and automated mechanisms to recover from node failure by using data from persistent volumes. And finally, there’s reduced operational overhead, because you can use the same underlying infrastructure to run many different Spark jobs. So yeah, basically Spark applications are submitted to the cluster and run and managed on Kubernetes through the Spark operator. And here I’m showing the architecture of the operator, and here’s the sequence of events that happens. So first the user will create a SparkApplication object using kubectl, and then, through the API server, the Spark application controller will get notified and sends a submission to the submission runner to create a driver pod. And it’s the driver pod that orchestrates the Spark job. It contains the critical metadata, the execution status, everything. This is the main driver of the Spark application. And then subsequently the driver will request executor pods, and the executors are the guys doing the work.

00:10:16 And then throughout the Spark application’s lifecycle, the Spark pod monitor will track the status of the Spark pods and report the status back to the SparkApplication object, whether it’s in progress or completed,

Speaker 3: 00:10:35 Sorry. So yeah,

Speaker 2: 00:10:41 Basically this whole process automates the deployment of Spark applications and just streamlines the entire workflow from execution to monitoring. And so here’s an example of a SparkApplication manifest and its spec. And here you can specify everything to do with the complete Spark application, from the Spark version to the application type. Here we’re using Python, but you can use Scala or Java, and cluster mode as opposed to client mode, where you might use it in a Jupyter notebook. You can specify different images, and it’s easy to swap different images in and out. Here’s the location in S3 where the PySpark script is located, and there are arguments to that Python script. This is basically saying where to grab the input data and where to send the output data.

Speaker 3: 00:11:48 Oh, sorry.

Speaker 2: 00:11:54 And then you can see here we can specify the resource constraints and the scheduling constraints for both the driver and the executor. And I think this is pretty self-explanatory: basically resource requests and limits, memory overhead for the JVM, the service account so that Spark can access S3, and so on.
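For reference, here is a minimal sketch of what such a SparkApplication manifest can look like, based on the description above. The image name, S3 paths, and resource values are illustrative placeholders, not the exact manifest used in the talk.

```yaml
# Hypothetical SparkApplication manifest, sketched from the talk's description.
# Bucket paths, image name, and resource values are illustrative placeholders.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: taxi-trip-app
  namespace: spark
spec:
  type: Python
  pythonVersion: "3"
  mode: cluster                                           # cluster mode, as opposed to client mode
  image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/spark-py:3.5.0   # example image
  mainApplicationFile: s3a://example-bucket/scripts/taxi_trips.py      # PySpark script in S3
  arguments:
    - "s3a://example-bucket/input/yellow_taxi/"           # where to grab the input data
    - "s3a://example-bucket/output/long_trips/"           # where to send the output data
  sparkVersion: "3.5.0"
  driver:
    cores: 1
    memory: "4g"
    memoryOverhead: "1g"                                  # extra room for JVM overhead
    serviceAccount: spark-sa                              # service account so Spark can reach S3
  executor:
    instances: 4
    cores: 2
    memory: "4g"
    memoryOverhead: "1g"
```

Once applied with kubectl, the operator picks the object up and runs spark-submit on your behalf, which is the flow described next.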

Speaker 3: 00:12:28 Yep. So

Speaker 2: 00:12:33 The first challenge with running Spark applications in Kubernetes is that the default Kubernetes scheduler struggles with Spark’s two-phase deployment when multiple Spark applications are being submitted. So as you saw when I described the operator, first the driver pod is spawned, and then the driver pod will request one or more executor pods. But this is a sequential process. The API server has no idea how many executors this Spark application will need to execute the job. So yeah, it may be that multiple Spark drivers are scheduled without ensuring that they can subsequently acquire the necessary executors to operate at full performance. So basically it can saturate the cluster with too many drivers competing for limited executor resources. So here’s an example. We have two Spark applications, A and B, and here we have time. So here, Spark application A was submitted first and the driver requested three executors. The first two were successfully scheduled and run, but during this phase, another driver was scheduled and run, and it requested one executor. And this was probably a smaller data processing job.

00:14:42 And it turns out that it is competing for the same resources, and it kind of hijacked the node that this executor was on. So that means that when multiple Spark applications are running, this can lead to unpredictable performance degradation and inefficient utilization of the cluster, because this driver has fewer executors than it needed to run at full performance. And finally, since the default Kubernetes scheduler considers pods independently of one another, it lacks awareness of entire Spark applications. The entire Spark application is a logical grouping of driver and executors together. And so yeah, this means that smaller apps can cut in line, as you can see here: application B cut in line ahead of this bigger application and starved application A of resources that it needed to get going.

00:16:12 So yeah, to overcome that limitation of the default scheduler, there is a batch scheduler called YuniKorn. And this allows for the logical grouping of pods that belong to the same Spark application. Basically it will group pods based on task groups and then add that group to a queue in a first-in, first-out manner. And then entire groups of pods, which are applications, wait on a queue, which has finite resources. So if the queue can fit multiple Spark applications, then both Spark applications will be scheduled concurrently. If a queue runs out of resources, then the second Spark application will be queued and will wait until space is freed up on the queue. So yeah, YuniKorn has this awareness.

00:17:38 Since it’s aware of entire applications, it can schedule all pods at the same time. So scheduling all pods at the same time according to some kind of logical grouping is called gang scheduling. And the main benefit of this is that it ensures that all components of a Spark application, both the driver and all of the executors, are scheduled together, and it follows an all-or-nothing approach. So you can see here that through the operator, when spark-submit is run, the driver pod is created, and then basically YuniKorn creates placeholder pods for both the driver and the executors. And you can see those in red. Red means placeholder.

00:18:37 And then basically the placeholders will have the same resource requests and constraints in terms of memory and CPU, and the same scheduling constraints, as the actual application pods. And so if necessary, the cluster will scale up with the Karpenter node autoscaler, and once it scales up and brings up the nodes, we have the compute nodes that we need to schedule all of the Spark application pods together. And so when the nodes are up, YuniKorn will replace the placeholder driver and executors with the actual pods. So yeah, this basically guarantees that Spark applications will have all the necessary resources before starting. It basically prevents partial deployments, or Spark applications which don’t have all the necessary executors that they need to get going.

00:19:53 And the way you do this is by adding labels and annotations. There’s a YuniKorn admission controller, and when it sees those labels and annotations, it will mutate the pod spec and inject schedulerName: yunikorn so that the default Kubernetes scheduler will skip this pod and it’ll be scheduled by YuniKorn instead. And we’ll look at that in a demo. So in this little example, you can see this is a SparkApplication, but now we have labels and annotations which YuniKorn will pick up. So here we have the name of the application, which will be viewable in YuniKorn, and we have the name of the queue that this Spark application will be put on. And then we have some annotations. So we have the scheduling policy parameters, which basically say that we’re going to set up gang scheduling for the application. And then here we define task groups. So for Spark, we have task groups for both the driver and the executors.

00:21:29 And basically this is specifying the minimum resource capacity that must be available before scheduling the pods and starting the job. So you can see that we’re going to spin up the placeholder pods; we’re going to have one for the driver, and it should have the specified CPU and memory. We want to specifically schedule those pods on certain types of machines, so we’re going to use node selectors and tolerations to target those machines. And then, when the job starts, we want a minimum of four executors. You can note for later, in the second part of the talk, that the driver is on demand and the executors will be on spot. So this is all that is needed to batch-schedule workloads. Here’s a sketch of what those labels and annotations can look like.
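As a rough sketch of what is described above, the gang-scheduling hints are just labels and annotations on the driver and executor pod templates inside the SparkApplication spec. The queue name, resource sizes, node selector, and toleration below are illustrative placeholders, and exact annotation parameters can vary between YuniKorn versions.

```yaml
# Hypothetical fragment of a SparkApplication spec with YuniKorn gang-scheduling
# hints. Queue name, sizes, node selector values and the toleration are placeholders.
driver:
  labels:
    applicationId: taxi-trip-app-1        # application name shown in the YuniKorn UI
    queue: root.sandbox                   # YuniKorn queue for this application
  annotations:
    yunikorn.apache.org/schedulingPolicyParameters: "placeholderTimeoutInSeconds=60 gangSchedulingStyle=Hard"
    yunikorn.apache.org/task-group-name: spark-driver
    yunikorn.apache.org/task-groups: |-
      [{
         "name": "spark-driver",
         "minMember": 1,
         "minResource": {"cpu": "1", "memory": "6Gi"},
         "nodeSelector": {"karpenter.sh/capacity-type": "on-demand"},
         "tolerations": [{"key": "spark", "operator": "Exists", "effect": "NoSchedule"}]
       },
       {
         "name": "spark-executor",
         "minMember": 4,
         "minResource": {"cpu": "1", "memory": "6Gi"},
         "nodeSelector": {"karpenter.sh/capacity-type": "spot"},
         "tolerations": [{"key": "spark", "operator": "Exists", "effect": "NoSchedule"}]
       }]
executor:
  labels:
    queue: root.sandbox
  annotations:
    yunikorn.apache.org/task-group-name: spark-executor   # executors join the executor task group
```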

Speaker 3: 00:22:30 I’m going to give a quick demo of this. Let me get that set up; bear with me.

Speaker 1: 00:22:48 I’ll just say, but while you’re getting that ready, if anybody has questions, remember, feel free to just throw them in the chat. We’ll pass ’em on to David.

Speaker 2: 00:22:55 Oh yeah, if you have questions now while I’m setting up, feel free to ask.

Speaker 3: 00:23:15 Okay. So yeah,

Speaker 2: 00:23:18 As you saw in the presentation, I showed a SparkApplication spec that is going to put this application on a queue called root.sandbox. There is basically limited capacity on this queue. I can quickly show you that. I won’t go into the nitty gritty gory details, but in YuniKorn you can define various queues. And then under the parent queues, you can have child queues, and each queue will have a maximum amount of memory and CPU. This will determine how many Spark applications you can fit on that queue, and if there’s one that doesn’t quite fit, then it’ll be queued; it’ll have to wait until the other jobs are finished. So here I’m specifying that queue, and as you can see, I’m specifying gang scheduling and the two task groups.
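For context, a YuniKorn queue hierarchy like the one described, a root parent queue with a capped child queue, is defined in the scheduler’s queue configuration. This is a minimal sketch; the queue names, limits, and unit formats are illustrative and depend on the YuniKorn version in use.

```yaml
# Hypothetical YuniKorn queues.yaml along the lines described: a parent queue
# with a child "sandbox" queue capped on memory and CPU. Names and limits are
# illustrative placeholders.
partitions:
  - name: default
    queues:
      - name: root
        submitacl: "*"
        queues:
          - name: sandbox
            resources:
              max:
                memory: 48Gi     # total memory the queue may use
                vcore: 12        # total CPU the queue may use
```

If a submitted application’s task groups don’t fit within the queue’s max resources, its placeholders simply wait in the queue, which is what the second half of this demo shows.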

Speaker 3: 00:24:36 I’m going to first make sure I’ve got rid of all the

Speaker 4: 00:24:43 Spark applications.

Speaker 3: 00:24:52 Okay, I’m going to submit the first one. So it’s been created and we can get the status of the applications. You see we have

Speaker 2: 00:25:34 The Spark application is picked up by the operator, and it will do the spark-submit for us, and when it submits, a driver pod will be created

Speaker 3: 00:25:47 And we can see that here. Okay, so it’s been submitted.

Speaker 1: 00:26:31 Hey David, can you increase the fonts a little bit? I dunno if that’s easy or zoom in a little bit.

Speaker 3: 00:26:41 Yeah, how do I do that in the terminal?

Speaker 1: 00:26:47 I wonder.

Speaker 3: 00:26:48 Oh,

Speaker 1: 00:26:52 Command-plus might help too. I think that did it. Yeah, that’s better. Yeah, I think it’s definitely better on my end for sure. Okay, cool. Thanks.

Speaker 3: 00:27:19 Yeah, so I’m going to look at the

Speaker 4: 00:27:23 Pods. One sec.

Speaker 1: 00:27:58 Sorry, I forgot my k9s. No worries. This is the fun of live demos.

Speaker 2: 00:28:06 So yeah, you can see here that we have the four placeholder pods that YuniKorn spun up. And I can describe this pod. You can see the annotations on the pod relating to YuniKorn, and if we go to the events, instead of the default Kubernetes scheduler we now have YuniKorn. And also notice that at the time of scheduling these placeholders, we didn’t have sufficient resources. So Karpenter, I don’t know if you know Karpenter, it’s a node autoscaler, basically sees that there are pending pods with certain resource constraints and that there are no nodes to accommodate those pods. It goes ahead and says that the pod should schedule on a specific node, and then it creates it. Basically the placeholder is just a container running pause. And while we were looking at those events, those placeholder pods were replaced by actual Spark application pods. So basically this demonstrates that the Spark application had all the resources that it needed to get going. It wasn’t starved of resources, we weren’t degrading performance, and there was no partial deployment. And yeah, this will carry on going, but now I want to show you what happens when we deploy two Spark applications to the same queue. Using YuniKorn, we can demonstrate that we won’t be starving one application of resources when the two applications are competing for the same underlying nodes. So let me just get rid of this application,

Speaker 1: 00:30:50 And I don’t want to distract you, but there was a question from Omer. It says, can you please explain how the operator handles spot interruption from the executors?

Speaker 2: 00:31:01 Yeah, so the Spark operator, or more specifically, the driver, will have mechanisms to handle executor failure. When an executor is lost, the driver automatically requests a new executor to replace the lost one. And actually that’s going to be the second part of my talk. When that happens, if you don’t have any mechanism to gracefully handle that node failure, then you can lose important shuffle data that was generated up to the moment when the executor was lost, and that can increase your runtime and make the job run for a lot longer. So that’s the second part of the talk, where we show how to handle spot interruptions.

Speaker 1: 00:32:01 All right, thanks.

Speaker 2: 00:32:03 Yeah, so now I’m going to submit two. So for this part, I’m going to submit two applications, and they’re basically doing the same thing. But if you remember, the YuniKorn queue had a limited capacity and it won’t be able to handle two applications at the same time. We’ll have to wait for the operator to submit the application before the driver shows up.

Speaker 4: 00:32:51 Take some time.

Speaker 3: 00:33:04 Yeah, this is the time to ask you questions. Okay, submit

Speaker 2: 00:33:07 It. Okay, so we should see, yeah, we already see the driver and placeholder pods. They’ll already be on the queue. Now I’m going to submit a second one,

Speaker 3: 00:33:33  And it’s

Speaker 2: 00:33:33 Exactly the same except for a few things. I think I may have modified the resource requests and limits and changed the application name, but yeah, I’m going to submit that too. And then we can go back to k9s. And so you see this one is already running actually, and we should see the second one come up, but it won’t be able to run immediately.

Speaker 3: 00:34:14 And while that’s being submitted, there is a YuniKorn UI which you

Speaker 2: 00:34:18 Can check out. So you can see we have the first application taxi trip app one, and we have app two. And if we look at app two and describe it, look at the events,

Speaker 3: 00:35:09 It should say that it cannot be run.

Speaker 2: 00:35:34 There you go. YuniKorn is giving a message that this executor, which is part of that gang or that task group, is queued and waiting for allocation because there’s not enough space on the queue. So it ensures fairness in scheduling for the entire application, not just individual pods. And you can see this. Can you see my YuniKorn UI?

Speaker 4: 00:36:06 Yeah.

Speaker 2: 00:36:07 Yep. Yeah. So you can see here you can view specific applications on particular queues. And you can see the two applications that I submitted. The first one’s running because it fits on the queue, but the second one is accepted; it’s not allowed to run yet because it has to wait for the first one to finish. So yeah, just to reiterate, with the default Kubernetes scheduler, priority and fairness are only applied at the pod level. So it could be the case, when you saturate the cluster with multiple drivers, that smaller applications could cut in line and steal needed resources from bigger data processing jobs.

Speaker 3: 00:37:04 Hope that makes sense.

Speaker 2: 00:37:09 Okay. So I don’t think we need to wait for these to finish, but basically when these finish, these placeholders will be replaced and then this will run. But if the queue had enough space to accommodate both of these applications, then both would run at the same time. I’m going to go to

Speaker 3: 00:37:28 The next section if I can find the tool.

Speaker 1: 00:37:41 There was one more question, but if it makes sense, if you’re going to answer this later, then

Speaker 2: 00:37:47 Oh no, please ask the questions.

Speaker 1: 00:37:50 Sure. Okay. This is from Kalin. It says, I was taking notes and might’ve missed it, but how does your application-aware scheduling problem relate to Kubernetes resource requests/limits and taints/tolerations? And then the follow up was, it seems like those tools must be insufficient to solve the problem, but YuniKorn addresses it. Well, have I got that right?

Speaker 2: 00:38:15 If I understood the question correctly, the person is asking how does YuniKorn solve the problem of resource constraints and scheduling? I don’t quite understand the question.

Speaker 1: 00:38:33 Let’s see. Okay, let’s see.

Speaker 2: 00:38:35 But basically, you use the YuniKorn task group spec to specify the minimum resources that should be made available for the actual pods, and they must match the pod spec of the actual driver and executor. So take this as an example. You have node selectors and tolerations here, and the driver should be on a machine that can accommodate six gig and one CPU, and this matches exactly what we have for the driver in the main spec of the SparkApplication. Sorry, that one CPU is for the executor, but it’s the same idea. We have minMember, the minimum number of pods that we would like to schedule for the main application. So that’s saying that we will schedule four placeholders, and they’re going to ensure the minimum underlying compute in terms of EC2 nodes; it’s going to bring up EC2 nodes which are tainted with these labels, so that we get the right type of machines. Basically the resource constraint of six gigabytes matches up with this; it’s memory plus memory overhead, which corresponds to the JVM overhead. So it’s just ensuring the minimum compute at an application level while respecting resource and scheduling constraints, if that makes sense.

Speaker 1: 00:40:28 Yeah. Great. Thank you.

Speaker 3: 00:40:31 Was there a second question?

Speaker 1: 00:40:33 Oh, well, there was. Okay, and maybe he answered this, but: so you need a higher level of abstraction than the pod, something that takes into account resources, like a deployment and maybe volumes?

Speaker 3: 00:40:52 That’s

Speaker 4: 00:40:53 A good question.

Speaker 2: 00:41:03 We could probably get onto volumes in the next section.

Speaker 4: 00:41:08 Okay.

Speaker 2: 00:41:09 Yep.

Speaker 3: 00:41:15 So let’s move on.

Speaker 2: 00:41:21 Yeah. Next section is, there was a question on how do we gracefully handle spot interruptions? Yeah, this section I,

Speaker 3: 00:41:39 Sorry about that. Yeah, so like I said before,

Speaker 2: 00:41:44 Spark executors are resilient to spot interruptions because the driver will detect that an executor was lost and request additional executors to replace it. The shuffle files, and I’ll explain what shuffle files are in a second, are basically important data written to disk that is going to be lost when the executor gets killed. And that means that data will need to be recomputed, which will increase the overall runtime of the job. Spark released two features to address this issue. One is called node decommissioning, whereby executors, upon receiving the termination signal, will copy over any important data to neighboring executors during the two-minute grace period. And if a neighboring executor doesn’t exist, it can fall back to S3.
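For reference, node decommissioning is switched on through Spark configuration. Here is a minimal sketch of the relevant sparkConf entries, shown as they might appear inside a SparkApplication manifest; the fallback bucket path is an illustrative placeholder.

```yaml
# Hypothetical sparkConf entries (inside a SparkApplication spec) enabling the
# node-decommissioning behaviour described above. The fallback bucket is a placeholder.
sparkConf:
  "spark.decommission.enabled": "true"
  "spark.storage.decommission.enabled": "true"
  "spark.storage.decommission.rddBlocks.enabled": "true"
  "spark.storage.decommission.shuffleBlocks.enabled": "true"
  # Optional: if no peer executor is available, fall back to object storage
  "spark.storage.decommission.fallbackStorage.path": "s3a://example-bucket/spark-fallback/"
```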

00:42:56 But there’s another approach too, where we can use persistent volumes attached to each executor to store important data. When an executor gets killed, the data on the persistent volume claim can be reused by the new executor that comes up. So yeah, this is just going to improve the overall resiliency of the application and decrease the runtime. So basically, in a Spark application running on Kubernetes, you can put the driver on on-demand instances, and it must go on on-demand because it’s the driver, right? It holds the critical metadata and the execution state of the entire job. It’s the thing that’s orchestrating everything. The executors, however, can go on spot instances, because they’re doing a bunch of stateless computation and don’t store any critical metadata.
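One common way to express that placement, sketched here as an assumption rather than the exact setup from the talk, is to use Karpenter’s capacity-type node label as a node selector on the driver and executor specs:

```yaml
# Hypothetical placement sketch: keep the driver on on-demand capacity and let
# executors land on spot, using Karpenter's capacity-type node label.
driver:
  nodeSelector:
    karpenter.sh/capacity-type: on-demand   # driver holds job state; keep it off spot
executor:
  nodeSelector:
    karpenter.sh/capacity-type: spot        # executors do stateless compute; spot is fine
```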

00:44:10 And although spot instances do offer steep discounts, they come with a caveat: they can be reclaimed at any moment. And like I said before, whilst Spark can be resilient to interruption by requesting additional executors, it can result in lost data and having to waste time recomputing data that we’ve already operated on, so it’s adding to the runtime of the overall job. The loss forces recomputation of files; basically you have to either go back to the last cached stage or, worse, go back to the original data source. So just to appreciate why this process is so time consuming, I’m going to go through a brief high-level overview of what’s happening under the hood. So Spark is used for data processing, and if you’ve heard of Pandas, it’s similar in that it uses DataFrames and you can perform various operations on the data. And there’s a specific type of operation which causes a reorganization of the data across nodes in the cluster, and it’s this operation which is expensive. One such operation is called groupByKey, and it basically redistributes data from all the other partitions on all the executors. So here we have three executors, the data is organized according to partitions, and we apply groupByKey,

00:46:16 Sorry. Basically each one is holding a set of key-value pairs where the key is a word and the value is basically a count. This data is redistributed across the cluster so that all counts for a given word end up on the same executor, ready for aggregation. So you can see that the word hello ended up on this executor, and we have two and one. So it’s basically sorting and merging values ready for aggregation, but it’s expensive, because all of this data has to be written to disk before it’s transferred over the network from source to target. Once it’s on the source executor, it has to be written to disk and read back from disk, and then you can see here that the values have to be sorted and merged, which is very CPU- and memory-intensive. So this groupByKey operation, which causes a shuffle, is inherently expensive. If we lose an executor due to spot termination, we lose all the shuffle data it held, which means we’ll potentially have to repeat all of these steps, or only specific steps, and we may have to go back one stage or to the original data source. So it’s basically adding a lot of time onto our entire job and it’s going to slow us down. And basically this is why developing a strategy to persist

00:48:10 this data, for instance the shuffle data, to disk allows us to quickly recover in the event that we lose an executor. So yeah, it’s a balance between saving money and maintaining job performance and reliability. So this is basically the way it works, and there’s an AWS blog on how it works. Using a persistent volume claim, which is tied to an EBS volume, allows us to decouple the data and the processing, because we can use it to spill the shuffle data onto local storage. So you can see here we had a Kubernetes SIGTERM, which comes from the spot termination (I’m going to wait for it to come back), and it kills the executor, and we don’t have to do much in that two-minute grace period; because the data is stored locally on the EBS volume, we don’t have to move it around.

00:49:42 And then basically the persistent volume claim, which is tied to the EBS volume, is not tied to the lifetime of the executor, because the Spark driver owns that persistent volume claim. So when a new executor comes up to replace the old executor, the persistent volume claim can be reattached to the new executor. And that means we don’t have to recompute the shuffle files, and the new executor can pick up where the old executor left off before it got killed. This also saves the time required to request a new volume for an executor, and the files can be reused without moving them around, because they stay on the EBS volume.

00:50:47 To enable this, you just have to add specific configurations to the Spark conf in the application. I think there was a question about specifying EBS volumes in the YuniKorn config; that is actually done dynamically by the driver. So the driver is the one requesting persistent volume claims tied to EBS volumes, and it assumes that you have the EBS CSI driver installed in your cluster to dynamically provision and manage the lifecycle of EBS volumes. And that’s what we’re doing here. We’re giving the name of the claim, the storage class gp3, the size, and the mount path inside the container for the driver and executor. We’re going to be writing, so read-only is false, and there will be a separate EBS volume per executor. And then these settings are so that the driver owns the PVC, meaning the PVC is not tied to the lifetime of the executor. So when an executor dies, the PVC remains, and this means it can be reused by the new executor that comes up. So I’m going to give a quick demo. Basically this demo is just going to show that we can still run a Spark application in roughly the expected time, even after manually killing one of the spot nodes.
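Here is a minimal sketch of the sparkConf entries being described, assuming the EBS CSI driver is installed; the storage class, size, and mount path are illustrative placeholders.

```yaml
# Hypothetical sparkConf entries (inside a SparkApplication spec) for the
# PVC-reuse setup described above. Storage class, size and mount path are placeholders.
sparkConf:
  # Dynamically request one EBS-backed PVC per executor (requires the EBS CSI driver).
  # The "spark-local-dir-" name prefix tells Spark to use the mount for local/shuffle data.
  "spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName": "OnDemand"
  "spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.storageClass": "gp3"
  "spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.sizeLimit": "50Gi"
  "spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.path": "/data"
  "spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.readOnly": "false"
  # The driver owns the PVCs, so they outlive individual executors...
  "spark.kubernetes.driver.ownPersistentVolumeClaim": "true"
  # ...and replacement executors reuse an existing PVC instead of creating a new one.
  "spark.kubernetes.driver.reusePersistentVolumeClaim": "true"
```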

Speaker 3: 00:53:05 So again, I’m going to run delete all the apps, and this is the time to ask you questions. I’m

Speaker 2: 00:53:21 Going to submit this same one I ran before, and notice those settings in the Spark conf. You can dynamically mount PVCs per executor, given a storage class and size. This setting is basically saying that the driver pod becomes the owner instead of the executors, and then this one is saying that instead of creating a new PVC, we can kind of reuse the old one. So I’m going to apply that.

Speaker 3: 00:54:11 Okay.

Speaker 2: 00:54:15 I am going to look at the nodes. I’m going to want to kill one of them. I can probably see that in here, actually.

Speaker 4: 00:54:27 Okay, so wait, sorry.

Speaker 3: 00:54:56 Okay, so the job was submitted and we have pods; they haven’t been scheduled on a node yet,

Speaker 1: 00:55:17 But just so you know, someone had put a comment asking, saying they were pretty new to this, and asking if this was all open source. And I pointed them to both of the Apache websites, noting Spark and YuniKorn are both part of the Apache Software Foundation.

Speaker 2: 00:55:36 These are all open source. Both YuniKorn and the Spark operator are installed as Helm charts. Basically, when I spin up the EKS cluster with Terraform during the bootstrap stage, I install these things as Helm charts. But I would recommend starting this journey by going to, actually, it’s a similar acronym, Data on EKS as opposed to Data on Kubernetes. And here there’s a set of blueprints

Speaker 3: 00:56:18 To help you get started with

Speaker 2: 00:56:25 Deploying data applications on Kubernetes. So

Speaker 3: 00:56:31 You can see, sorry, I forget where it’s

Speaker 2: 00:56:41 Blueprints. Oh yeah, here. So there’s a set of blueprints, and you can see data analytics on EKS, the Spark operator with YuniKorn. This goes into a lot more detail than I’m going into, and they have a whole guide on deploying the solution with Karpenter and YuniKorn. I showed using EBS volumes to store data, but they also show examples of using local instance storage for shuffle data. Yeah, I would recommend going to Data on EKS and checking out their blueprints. They have a GitHub repo too that you can check out.

Speaker 3: 00:57:30 Great resource. Okay,

Speaker 2: 00:57:38 So it looks like they put them all on the same node, but it doesn’t matter. There’s an EBS volume per executor, and we’ll wait for it to get started and then we’ll kill it. That is another thing to mention: this application has four executors, and maybe you don’t want ’em all to be on the same node. So you can add pod anti-affinities or topology spread constraints to force spreading of the pods across different nodes, but I didn’t do that here. And I think Karpenter just picks the node and tries to bin-pack as many pods onto the same node as it can.
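A rough sketch of how executors could be spread across nodes with pod anti-affinity, assuming the operator’s executor spec accepts a standard affinity stanza; the application name in the label selector is an illustrative placeholder.

```yaml
# Hypothetical executor anti-affinity to spread executors across nodes instead of
# letting them bin-pack onto one node. The label value is a placeholder app name.
executor:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            topologyKey: kubernetes.io/hostname          # prefer distinct nodes
            labelSelector:
              matchLabels:
                sparkoperator.k8s.io/app-name: taxi-trip-app-1
```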

Speaker 3: 00:58:40 So you can see, let’s have a look at the logs. Yeah,

Speaker 2: 00:58:50 So you can see it’s going along, it’s performing tasks across the executors. 1, 2, 3, 4, 5, 1, 2, 3, 4. And I’m going to kill one of, well, I’ll be killing them all.

Speaker 3: 00:59:09 So let’s take this IP,

Speaker 1: 00:59:18 Just an FYI David, we’re running up to the hour shortly.

Speaker 2: 00:59:22 Alright, I’m nearly finished.

Speaker 1: 00:59:25 Cool.

Speaker 2: 00:59:26 I’m going to kill the node and then we’re done.

Speaker 1: 00:59:30 That sounds very dramatic.

Speaker 2: 00:59:35 Where are you? I’m simulating a spot termination.

Speaker 4: 00:59:41 Okay.

Speaker 3: 00:59:51 It’s terminating. And

Speaker 2: 00:59:55 I can look at it here. Decommissioning not enabled, trying to create a persistent volume, and it will say it doesn’t need to create one because it’s going to reuse one, right? As requested, we deleted pods 1, 2, 3, 4, and now it’s going to spin up 5, 6, 7, 8, and it will

Speaker 3: 01:00:20 reuse an existing PVC, one marked as reusable.

Speaker 2: 01:00:49 It’s still pending, but yeah, that is the end of the talk. When these pods come up, I can show that they reuse the existing PVCs. But yeah, I guess I’m done. The main takeaway is that we can use a batch scheduler which is application-aware, so we can schedule all pods together to ensure that the application has the resources it needs to get going, and also that we can be resilient to spot termination by reusing data that’s stored on PVCs. Thank you.

Speaker 1: 01:01:33 Awesome. Thanks David. There were a couple more questions; maybe we can throw them in the Slack chat for DoK, and then I’ll ping you, David. If you can answer, that’d be awesome.

Speaker 3: 01:01:53 Sure.

Speaker 1: 01:01:56 But yeah, thank you so much. Yeah, thanks everybody for watching. I think that was,

Speaker 2: 01:02:01 Thank you.

Speaker 1: 01:02:02 Very informative. So I think you helped a lot of people out there. We will try and get through our DoK quiz real quick. I know we’re past the hour a little bit, but if anybody is still here and wants to participate, let’s go for it. Let’s do that. And D Digest will be queuing that up in just a second. Okay. So bust out your phones and scan that QR code. You can take part in our quiz; it just asks some questions about things that David talked about and maybe some things that I talked about. We’ll see. Yeah, it’s just for fun. So if you win, you can get a DoK shirt. There’s a little delay, so we’ll wait for people to register. David, you can participate too, although you might have an unfair advantage.

Speaker 1: 01:03:12 My next question.

Speaker 1: 01:03:20 All right, we’ve got some participants here. All right, one of five questions. Let’s do it. And the faster you answer, the more points you get. So what is one of the main benefits of running Spark on Kubernetes?

01:03:47 All right, we’ve got four answers in. Okay. Everybody’s voted. Let’s see who’s in the lead. All right, this is anonymized, so we don’t know who’s who, but St. Elmo was fastest and correct. And by the way, at the end of this, since I don’t know who it is, message me on Slack or LinkedIn or whatever, and let me know, and then we’ll get the information to send you the shirt. Alright. Which component is described as providing application-aware scheduling for Spark jobs on Kubernetes? The Spark operator, Karpenter, YuniKorn, or Argo Workflows? Oh, wait a second. Okay. All right. Saint Elmo still in the lead, I think. Oh wait,

01:05:00 We have a new leader? Oh wait, there’s some reshuffling. Okay, you’re right, Saint Elmo is still in the lead here. Question number three: what type of scheduling does YuniKorn provide for Spark applications? Round robin, priority-based, all-or-none gang scheduling, or random scheduling? Ooh. All right. Everybody got it right. Oh, we even got a new player; I think we went from five to six players. Okay. Ooh, it’s so close. Alright, it’s going to come down to this last question, whoever gets this fastest, I think. Okay. Oh no, I’m sorry, there’s two more questions. Okay. What are the two strategies mentioned for running fault-tolerant and cost-optimized Spark clusters on EKS?

01:06:22 I think this is one of our most challenging quizzes. You really had to pay attention. Oh wow. This is actually one of the closest games I’ve seen. Alright. Okay, question five. This is the last one, I promise, and this is not technical. When will DoK Day talk announcements occur? So you had to pay attention to me, and if you were here early, you would benefit from that. So: August 26th, August 30th, September 10th, or September 19th? Ooh, okay. The one person who really heard me. This tells you it benefits you to listen to what I have to say at the beginning, the community announcements. Okay. Alright. This was pretty close, so I think we’ll find out who, I think maybe Saint, oh wait, did the Hustler win?

01:07:45 I can’t tell, Ness. Is that the winner? Yes, that’s the winner. Oh wow, came from behind, the Hustler. Okay. The Hustler, if that’s you, reach out to me on Slack or LinkedIn, wherever you can find me. Paul Au on LinkedIn; Au, it’s my last name. Or on Slack, let me know and we’ll get you a shirt. But yeah, once again, David, thank you so much. Really appreciate you taking the time. This was an informative talk, so really appreciate you taking the time to do that. And then also thank you to everybody who came to the talk and asked questions. And like I said, I’ll post the last couple of questions in Slack, and maybe David will be so kind as to answer them for us. All right, cool. Well everybody, thank you for coming and we’ll see you at the next one.

Speaker 3: 01:08:41 Alright, thank you.

Speaker 1: 01:08:42 All right, see you later. Bye-Bye.

 
