DIOS: Dynamic Instrumentation for (not so) Outstanding Scheduling
Blake Sutton & Chris Sosa

Motivation
[Figure: slide graphic not recoverable from the extracted text]

Approach: Adaptive Distributed Scheduler
  - Centralized global scheduler and distributed local services
  - Hares monitor machines for “undesirable” events
  - Hares also gather application-specific info with Pin
  - Rhino schedules jobs and responds to events from Hares:
      - Migrate
      - Pause / Resume
      - Kill / Restart

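A minimal sketch of the division of labor on this slide, assuming a simple in-process model: each Hare tracks the jobs on its machine and applies the actions Rhino sends down, while Rhino places jobs and can migrate them between Hares. The class and method names (`Hare.apply`, `Rhino.submit`, `Rhino.migrate`) and the exact state change for each action are illustrative assumptions, not the DIOS implementation.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Action(Enum):
    """Actions Rhino can send to a Hare (names taken from the slide)."""
    MIGRATE = "migrate"
    PAUSE = "pause"
    RESUME = "resume"
    KILL = "kill"
    RESTART = "restart"


@dataclass
class Hare:
    """Local service on one machine: tracks that machine's jobs (hypothetical)."""
    name: str
    running: List[str] = field(default_factory=list)
    paused: List[str] = field(default_factory=list)

    def apply(self, action: Action, job: str) -> None:
        if action is Action.PAUSE:
            self.running.remove(job)
            self.paused.append(job)
        elif action is Action.RESUME:
            self.paused.remove(job)
            self.running.append(job)
        elif action in (Action.KILL, Action.MIGRATE):
            # The job leaves this machine; for MIGRATE, Rhino re-queues it elsewhere.
            if job in self.running:
                self.running.remove(job)
            else:
                self.paused.remove(job)
        elif action is Action.RESTART:
            # Kill, then start again from the beginning on the same machine.
            self.apply(Action.KILL, job)
            self.running.append(job)


@dataclass
class Rhino:
    """Central scheduler: places jobs and sends actions down to Hares."""
    hares: Dict[str, Hare]

    def submit(self, job: str, hare_name: str) -> None:
        self.hares[hare_name].running.append(job)

    def migrate(self, job: str, src: str, dst: str) -> None:
        self.hares[src].apply(Action.MIGRATE, job)
        self.submit(job, dst)
```

Keeping all decisions in Rhino and all machine state in the Hares mirrors the slide's "centralized global scheduler and distributed local services" split; in a real deployment the method calls would become network messages.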
“Pinvolvement”: What it is
  - Inserts new code into apps on the fly
  - No recompile
  - Operates on a copy of the code (code caching)
  - Our Pintool: routine-level and instruction-level instrumentation
  - pin -t mytool -- ./myprogram
  - Diagram borrowed from Luk et al. 2005.

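To make "insert new code on the fly" and "code caching" concrete, the toy model below (Python, not Pin) instruments each basic block the first time it runs, caches the instrumented copy, and reuses it on later executions. The tool inserts a counter increment before every instruction, which is the instruction-counting example from the speaker notes for this slide. Everything here (`ToyDBI`, opcode-string "blocks", `count_tool`) is hypothetical; a real Pintool is written in C++ against Pin's API, where the equivalent step registers a callback with `INS_AddInstrumentFunction` and inserts the counter with `INS_InsertCall`.

```python
from functools import partial
from typing import Callable, Dict, List

Analysis = Callable[[], None]


class ToyDBI:
    """Toy model of a dynamic instrumentation engine with a code cache.

    A 'basic block' is a list of opcode strings; running it calls the
    inserted analysis routines plus a stand-in for each original instruction.
    """

    def __init__(self, blocks: Dict[int, List[str]],
                 tool: Callable[[str], List[Analysis]]):
        self.blocks = blocks
        self.tool = tool                  # decides what code to insert, and where
        self.code_cache: Dict[int, List[Analysis]] = {}
        self.translations = 0             # how many times a block was instrumented
        self.native_insns = 0             # original instructions "executed"

    def _execute(self, opcode: str) -> None:
        self.native_insns += 1            # stand-in for running the real instruction

    def _translate(self, block_id: int) -> List[Analysis]:
        self.translations += 1
        steps: List[Analysis] = []
        for opcode in self.blocks[block_id]:
            steps.extend(self.tool(opcode))               # inserted before the instruction
            steps.append(partial(self._execute, opcode))  # the original instruction
        return steps

    def run(self, trace: List[int]) -> None:
        for block_id in trace:
            if block_id not in self.code_cache:           # instrument on first use...
                self.code_cache[block_id] = self._translate(block_id)
            for step in self.code_cache[block_id]:        # ...then reuse the cached copy
                step()


# An instruction-counting "tool": one increment before every instruction.
icount = 0


def docount() -> None:
    global icount
    icount += 1


def count_tool(opcode: str) -> List[Analysis]:
    return [docount]


engine = ToyDBI({0: ["mov", "add", "cmp", "jne"], 1: ["ret"]}, count_tool)
engine.run([0] * 100 + [1])   # a 100-iteration loop over block 0, then block 1
```

Here the program executes 401 instructions, but each block is instrumented only once (two translations in total). Amortizing that one-time instrumentation cost over many executions is the benefit code caching provides.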
“Pinvolvement”: What it measures
  - No reliance on hardware-specific performance counters
  - Goal: capture memory behavior over time
  - Gathered:
      - Ratio of malloc to free calls
      - Wall-clock time to execute 10,000,000 instructions
      - Number of memory operations in the last 2,000,000 instructions

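The three measurements can be kept as plain counters that the Pintool's analysis routines update. Below is a Python sketch of that bookkeeping, assuming the instrumentation calls `on_routine` on every routine entry and `on_instruction` before every executed instruction (both names are hypothetical). The 2,000,000-instruction window and 10,000,000-instruction sampling period are the slide's values, exposed as parameters so they can be scaled down; storing one deque entry per instruction keeps the sketch simple at the cost of memory.

```python
import time
from collections import deque
from typing import Callable, Deque, List


class MemoryMetrics:
    """Counters a memory-behavior Pintool might keep for one application."""

    def __init__(self, window: int = 2_000_000, period: int = 10_000_000,
                 clock: Callable[[], float] = time.monotonic):
        self.window = window              # instructions in the sliding window
        self.period = period              # instructions per wall-clock sample
        self.clock = clock
        self.mallocs = 0
        self.frees = 0
        self.insns = 0
        self.recent: Deque[int] = deque() # 1 per memory instruction, else 0
        self.mem_in_window = 0
        self.period_start = clock()
        self.period_times: List[float] = []  # seconds per `period` instructions

    def on_routine(self, name: str) -> None:
        """Routine-level hook: called on entry to each routine."""
        if name == "malloc":
            self.mallocs += 1
        elif name == "free":
            self.frees += 1

    def on_instruction(self, is_memory_op: bool) -> None:
        """Instruction-level hook: called before each executed instruction."""
        self.insns += 1
        bit = 1 if is_memory_op else 0
        self.recent.append(bit)
        self.mem_in_window += bit
        if len(self.recent) > self.window:     # slide the window forward
            self.mem_in_window -= self.recent.popleft()
        if self.insns % self.period == 0:      # one wall-clock sample per period
            now = self.clock()
            self.period_times.append(now - self.period_start)
            self.period_start = now

    def malloc_free_ratio(self) -> float:
        return self.mallocs / self.frees if self.frees else float("inf")
```

Passing a fake clock (for example one that returns 0, 1, 2, ...) makes the wall-clock metric deterministic for testing.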
Evaluation
  - Distributed scheduler
      - Rhino on realitytv13; Hares on realitytv13-16
      - Workload: heatedplate with modified parameters
      - Hares detect when less than 10% of memory is available and inform Rhino to take action
      - Rhino reschedules the youngest job at that Hare's site
      - Baseline: smallest queues
  - Pintool
      - 2 applications from SPLASH-2
      - heatedplate

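The evaluation policy fits in a few lines. This sketch assumes Rhino can see each Hare's free-memory fraction and job list. When a Hare drops below the 10% threshold, its youngest job (latest start time) moves to the healthy Hare with the shortest queue, reusing the smallest-queue baseline for placement. The slide only says Rhino "reschedules the youngest job", so the choice of destination and the `HareState`/`Job` fields are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

LOW_MEMORY_FRACTION = 0.10   # threshold from the evaluation slide


@dataclass
class Job:
    job_id: int
    start_time: float


@dataclass
class HareState:
    """What Rhino is assumed to know about one Hare (hypothetical fields)."""
    name: str
    free_mem_fraction: float
    jobs: List[Job] = field(default_factory=list)

    def low_memory(self) -> bool:
        return self.free_mem_fraction < LOW_MEMORY_FRACTION


def place_smallest_queue(hares: List[HareState], job: Job) -> HareState:
    """Baseline placement: the Hare with the fewest jobs (ties go to the first)."""
    target = min(hares, key=lambda h: len(h.jobs))
    target.jobs.append(job)
    return target


def react_to_low_memory(hares: List[HareState]) -> List[Tuple[int, str, str]]:
    """Move the youngest job off every Hare that reports low memory."""
    moves = []
    for hare in hares:
        if not (hare.low_memory() and hare.jobs):
            continue
        healthy = [h for h in hares if h is not hare and not h.low_memory()]
        if not healthy:
            continue  # nowhere better to go; leave the job in place
        youngest = max(hare.jobs, key=lambda j: j.start_time)
        hare.jobs.remove(youngest)
        target = place_smallest_queue(healthy, youngest)
        moves.append((youngest.job_id, hare.name, target.name))
    return moves
```

Evicting the youngest job is a cheap heuristic: it has done the least work, so moving or restarting it wastes the least progress.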
Results: The Good
  - Scheduler shows potential for improvement
  - Lower total runtime with a simple policy

Results: The Bad
  - Overhead from the Pintool is too high to realize gains
  - Pin isn’t designed for on-the-fly analysis
      - Could not reattach
      - Code caching isn’t enough

Runtime relative to native execution (native = 1.00):

application   native   pin (no tool)   count only   malloc/free   # mems   latency
lu            1.00     1.25            6.27         14.51         7.90     7.64
ocean         1.00     1.48            2.87          7.84         6.04     5.81
heatedplate   1.00     1.88            2.65          5.43         7.45     7.26

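The arithmetic behind "could not reattach" can be made explicit. The sketch below takes the overhead figures from this slide (read as runtime normalized to native) and models a selective-instrumentation scheme that runs the full tool for only a fraction `f` of the work. The weighted-average model is an assumption for illustration: it ignores the extra cost of re-instrumenting code, which the speaker notes report also erased the code-cache gains.

```python
# Runtimes from the overhead table, normalized to native execution (native = 1.00).
SLOWDOWN = {
    "lu":          {"pin": 1.25, "count only": 6.27, "malloc/free": 14.51, "# mems": 7.90, "latency": 7.64},
    "ocean":       {"pin": 1.48, "count only": 2.87, "malloc/free": 7.84,  "# mems": 6.04, "latency": 5.81},
    "heatedplate": {"pin": 1.88, "count only": 2.65, "malloc/free": 5.43,  "# mems": 7.45, "latency": 7.26},
}


def overhead_percent(slowdown: float) -> float:
    """Extra runtime relative to native, e.g. 1.25x -> 25%."""
    return (slowdown - 1.0) * 100.0


def effective_slowdown(frac_instrumented: float, tool_slowdown: float,
                       base_slowdown: float = 1.0) -> float:
    """Slowdown if the full tool runs for only a fraction of the work.

    base_slowdown is the speed of the uninstrumented remainder: 1.0 if the
    framework could detach completely, or the bare-Pin figure if it stays
    attached. Re-instrumentation costs are ignored (a simplifying assumption).
    """
    if not 0.0 <= frac_instrumented <= 1.0:
        raise ValueError("frac_instrumented must be in [0, 1]")
    return frac_instrumented * tool_slowdown + (1.0 - frac_instrumented) * base_slowdown


if __name__ == "__main__":
    for app, row in SLOWDOWN.items():
        print(f"{app}: bare Pin adds {overhead_percent(row['pin']):.0f}%")
    hp = SLOWDOWN["heatedplate"]
    for f in (0.0, 0.05, 0.25):
        detached = effective_slowdown(f, hp["# mems"])
        attached = effective_slowdown(f, hp["# mems"], base_slowdown=hp["pin"])
        print(f"f={f:.2f}: detach {detached:.2f}x, stay attached {attached:.2f}x")
```

For heatedplate, bare Pin alone adds 88% before any analysis runs, and in the stays-attached case no choice of `f` brings the slowdown below 1.88x. Only a framework that can fully detach lets the cost approach native speed as `f` shrinks.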
Results: The “Interesting”
  - Pintool does capture intriguing info…

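The speaker notes for this slide describe the figure as the change in memory-instruction count from one window to the next (hence the negative values). In the figure, lu and heatedplate look alike, while ocean is more irregular. The notes also float using that variation to "predict the predictability" of a job. One possible score, assumed here rather than taken from the deck, is the spread of those deltas divided by the mean window count. The traces below are synthetic, not values from the figure.

```python
from statistics import mean, pstdev
from typing import List


def window_deltas(mem_ops_per_window: List[int]) -> List[int]:
    """Change in memory-op count between consecutive windows (can be negative)."""
    return [cur - prev for prev, cur in zip(mem_ops_per_window, mem_ops_per_window[1:])]


def variability(mem_ops_per_window: List[int]) -> float:
    """Hypothetical predictability score: spread of deltas / mean window count."""
    if len(mem_ops_per_window) < 2:
        return 0.0
    scale = mean(mem_ops_per_window)
    return pstdev(window_deltas(mem_ops_per_window)) / scale if scale else 0.0


# Synthetic traces (not measurements): a tight loop vs. a phased computation.
looped = [800_000, 801_000, 799_000, 800_500, 800_000]
phased = [300_000, 900_000, 450_000, 1_100_000, 200_000]
for name, trace in (("looped", looped), ("phased", phased)):
    print(name, window_deltas(trace), round(variability(trace), 3))
```

A low score suggests a repetitive, loop-dominated job that simple heuristics handle well; a high score flags a phased job that may deserve closer attention from the scheduler. Any cutoff between the two would need to be calibrated on real traces.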
Other Issues
  - Condor
      - Process migration requires re-linking
      - Doesn’t support multithreaded applications
      - Other “user-level” process migration mechanisms have similar requirements
  - Pin
      - Unable to alternate between low- and high-overhead instrumentation within a Pintool
      - Even the smallest overhead was not negligible: up to almost 2x slowdown just running heatedplate under Pin with no extra instrumentation
  - Scheduling decisions have a bigger impact on long-running jobs

Conclusion: the Future of DIOS
  - Overhead is prohibitive (for now)
      - Pin needs to support reattach
      - Lighter instrumentation framework
  - However, instrumentation can capture aspects of application-specific behavior
  - Future work: Pin as a process migration mechanism

Questions?

Wait…hasn’t this been solved?
  - Condor
      - Popular user-space distributed scheduler with process migration
      - Tries to keep queues balanced, but jobs behave differently from each other and over time
  - LSF (Load Sharing Facility)
      - Monitors the system and moves processes around based on what they need
      - Requires static job information as input (profiling etc. beforehand)
          - What if something about your job isn’t captured by your input?
          - What if you end up giving it margins that are too large? Too small? Unnecessary inefficiencies?
          - It’s not exactly hassle-free...
  - Hardware feedback (PAPI)
      - Still not very portable (invasive kernel patch to install)
  - Wouldn’t it be nice if the scheduler could just..."do the right thing"?
DIOS - compilers


Editor's Notes

  • #3: Our project is about how to schedule jobs among a group of machines. Our implementation is at the user level, but the same idea could be applied in the kernel of a distributed operating system. Long-running, short-running, memory-intensive, CPU-bound…we don’t know what kind of jobs to expect. So how can the scheduler put them where they should be if it doesn’t know these things? Transition: Wouldn’t it be nice if the scheduler could just “handle it” – without the user having to specify the characteristics of their jobs in advance?
  • #4: Our approach to this problem is DIOS – an adaptive distributed scheduler. Describe diagram: local schedulers (Hares) run on each machine, with queues of jobs. The global scheduler (Rhino) receives events from the Hares and sends down actions, such as migrate or pause. Transition: So you must be thinking…wait, how are you going to just “gather application-specific info”?
  • #5: The answer is – we’ll write a tool with Pin, a dynamic instrumentation framework. Describe diagram – as you can see from the diagram, and from this command up here, Pin is kind of like a miniature virtual machine. It takes in a pintool and the program binary, and runs the program in the context of Pin, inserting new code into the application as it runs and using the tool as the instructions for what code to execute and where to insert it. For example, a pintool to count the number of instructions executed in a program could insert code to increment a variable before every instruction. There are several points at which instrumentation can be introduced; our pintool uses routine-level and instruction-level instrumentation.
  • #6: So we’ve established that Pin is a tool for what we want to do – dynamically instrument applications. But what code do we want to insert? What are we looking to get from our pintool? Since we are trying to detect and avoid memory contention between processes, it makes sense to study the memory behavior of the applications. To this end, we chose three things (describe them). The figure to the side shows how the pintool fits into our overall plan – it would collect information for each application and report the results to Hare, the local scheduler. Then Hare, which is also monitoring the memory subsystem of the local machine, reports to Rhino, and Rhino decides what to do.
  • #7: Considering our motivation, it was important to evaluate DIOS on a somewhat realistic workload. Since most long-running jobs on clusters seem to be scientific applications, we wanted to use real scientific benchmarks. Describe benchmarks. To evaluate the scheduler, we measured the total runtime of groups of 100 jobs. We varied the parameters to the heatedplate program (dataset size and number of iterations) to vary the length of the jobs, and produced a set of jobs on a curve – a great many short-running jobs with a few long-running jobs. Past work indicates that this is a common job submission trend in batch systems. Then, to evaluate our pintool, we measured the overhead of running each application with our pintool and also tracked the information we collected over time to see if we could correlate it with interesting behavior or differences between programs.
  • #8: So here are our results from evaluating the distributed scheduler by itself. The good news is we saw potential for improvement – just from using a simple policy to react to the presence of memory contention, the total runtime goes down. We might be able to get even better results on long-running jobs with better information on the running processes (like we could get from dynamic instrumentation!). So if you’re wondering why we’re showing you results for our scheduler with this simple policy, but not with our whole system including application-specific information…well, that brings me to The Bad.
  • #9: Although our scheduler works perfectly well with the pintool, we discovered that the overhead introduced by Pin is just too much. Some of our overhead results are below – we show the time to run the application natively, with Pin (no pintool), with a tool that only counts instructions, and with each of our three metrics. The way we originally hoped to solve the overhead problem was to instrument only when we needed to – for example, when the scheduler decided the machine was performing badly. Then the relatively high overhead of running the analysis wouldn’t have much impact overall. However, we were unable to get the performance gains we hoped for. Pin can attach to a running program, but it doesn’t let us completely detach and reattach, and when we tried to add and remove instrumentation dynamically, we lost the gains from code caching. So while this idea could work with another system or a future version of Pin, we couldn’t bring the overhead down.
  • #10: But on the bright side, we were able to collect some interesting information – this figure shows how our memory-instruction measurement varies over time. It plots the change in the number of memory instructions executed per window, hence the negative numbers. Note how similar the patterns of LU and heatedplate are – talk about how that’s probably because they are tightly looped and very repetitive, whereas Ocean is obviously performing a more irregular and complex analysis with some possible distinct phases in it. Possibility of using the variation in a metric like this to “predict the predictability”: to separate applications that are better left alone from those that are more likely to be safely handled by common heuristics, etc.
  • #12: So – the future of DIOS.
  • #13: Questions?
  • #14: Kind of...but no comprehensive solution.
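Speaker note #7 describes the scheduler workload as groups of 100 heatedplate jobs whose parameters were varied so that job lengths follow a curve: many short jobs and a few long ones. One way to generate such a mix is sketched below. It assumes a Pareto distribution (the notes do not name one) and a hypothetical "work units" knob standing in for dataset size and iteration count.

```python
import random
from typing import List


def make_job_mix(n: int = 100, alpha: float = 1.5, min_work: int = 1_000,
                 seed: int = 0) -> List[int]:
    """Heavy-tailed job lengths in hypothetical 'work units'.

    random.Random.paretovariate(alpha) returns values >= 1 with a long right
    tail, so most jobs land near min_work while a few are much longer.
    """
    rng = random.Random(seed)   # seeded so the same mix can be replayed
    return [int(min_work * rng.paretovariate(alpha)) for _ in range(n)]


jobs = make_job_mix()
short = sum(1 for j in jobs if j < 2 * 1_000)
print(f"{short} of {len(jobs)} jobs are under twice the minimum; longest = {max(jobs)}")
```

With `alpha = 1.5`, a job exceeds twice the minimum with probability 2^-1.5, about 0.35, so roughly two thirds of the jobs come out short. Lowering `alpha` fattens the tail and produces more very long jobs.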