Bionimbus: A Cloud-Based Infrastructure for Managing, Analyzing and Sharing Genomics Data
March 29, 2011
Robert Grossman
Institute for Genomics & Systems Biology
Computation Institute
University of Chicago
and
Open Cloud Consortium
Part 1
Biology, Big Data & Clouds
Two of the 14 high throughput sequencers at the Ontario Institute for Cancer Research (OICR).
Source: Lincoln Stein
The Challenge is to Support Cubes of Next Gen Sequence Data
Each cell in the data cube can be a ChIP-chip, ChIP-seq, RNA-seq, movie, etc. data set.
Different developmental stages
Different pathologies
Perturb the environment
Genomics as a Big Data Science
What is new about clouds?
Scale is New
Elastic, On-Demand Computing with Usage Based Pricing Is New
1 computer in a rack for 120 hours
costs the same as
120 computers in three racks for 1 hour
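The pricing claim on this slide is simple arithmetic: under usage-based pricing the bill depends on node-hours, not on how those node-hours are arranged. A minimal sketch follows; the hourly rate and the function names are assumptions made for illustration and do not come from the talk.

```python
# Sketch of usage-based pricing: cost is proportional to node-hours.
# RATE_PER_NODE_HOUR is a hypothetical figure, not a price quoted in the talk.
RATE_PER_NODE_HOUR = 0.10


def node_hours(nodes: int, hours: int) -> int:
    """Total node-hours consumed by `nodes` machines running for `hours`."""
    return nodes * hours


def usage_cost(nodes: int, hours: int, rate: float = RATE_PER_NODE_HOUR) -> float:
    """Cost under a flat per-node-hour rate (an assumed pricing model)."""
    return node_hours(nodes, hours) * rate


if __name__ == "__main__":
    one_for_120 = usage_cost(nodes=1, hours=120)
    many_for_one = usage_cost(nodes=120, hours=1)
    print("1 node x 120 hours:", one_for_120)
    print("120 nodes x 1 hour:", many_for_one)
    print("same cost:", one_for_120 == many_for_one)
```

Both configurations consume 120 node-hours, so a flat per-node-hour rate charges them identically, while the second finishes in one hour instead of five days.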
Part 2. What is Bionimbus?
www.bionimbus.org
Bionimbus is a community cloud for storing, analyzing and sharing genomics and related data.
Step 1. Get Bionimbus ID (BID), assign project, private/community, public cloud, etc.
Step 2. Send sample to be sequenced.
Step 3a. Return raw reads.
Step 3b. Return variant calls, CNV, annotation…
Step 4. Secure data routing to appropriate cloud based upon BID.
Step 5. Cloud based analysis using IGSB and 3rd party tools and applications.
[Diagram labels: BID Generator; IGSB Sequencers; External Sequencers; Bionimbus Private Cloud UC; Bionimbus Community Cloud; Bionimbus Private Cloud XY; Amazon; dbGaP]
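Step 4 of the workflow, routing data to a cloud based on the BID, can be sketched as a lookup on the access level recorded when the BID is issued. Everything in this sketch is hypothetical: the `BionimbusID` record, the `Visibility` levels, the BID string format, and the mapping from level to destination are assumptions for illustration, not the actual Bionimbus implementation. Only the destination names are taken from the slide.

```python
from dataclasses import dataclass
from enum import Enum


class Visibility(Enum):
    """Access level assigned in Step 1 (hypothetical encoding)."""
    PRIVATE = "private"
    COMMUNITY = "community"
    PUBLIC = "public"


@dataclass(frozen=True)
class BionimbusID:
    """Hypothetical record created when a BID is issued."""
    bid: str
    project: str
    visibility: Visibility


# Assumed mapping from access level to a destination named on the slide.
ROUTES = {
    Visibility.PRIVATE: "Bionimbus Private Cloud UC",
    Visibility.COMMUNITY: "Bionimbus Community Cloud",
    Visibility.PUBLIC: "Amazon",
}


def route(record: BionimbusID) -> str:
    """Return the destination cloud for data tagged with this BID."""
    return ROUTES[record.visibility]


if __name__ == "__main__":
    sample = BionimbusID(bid="BID-0001", project="demo", visibility=Visibility.COMMUNITY)
    print(sample.bid, "->", route(sample))
```

Keeping the routing decision in a single table keyed by the BID's access level means sequencers only need to attach the BID to their output; they never need to know which cloud the data ends up in.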
What is a good unit to understand data intensive computing of biological data?
Bionimbus & OSDC Today
The NIH in the U.S. currently makes available for download approximately 2 PB of data.
Bionimbus 2010 consists of 6 racks, 212 nodes, 1568 cores and 0.9 PB of storage.
Bionimbus is part of the POC Open Science Data Cloud that consists of 14 racks, 472 nodes, 3776 cores and 3+ PB of storage.
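To put these figures side by side, the short sketch below computes cores per node for each system and the share of the proof-of-concept OSDC that Bionimbus represents. The numbers are the ones quoted on the slide; treating "3+ PB" as exactly 3.0 PB is an assumption, so the storage share it yields is an upper bound.

```python
# Figures quoted on the slide. "3+ PB" is taken as a lower bound of 3.0 PB (assumption).
BIONIMBUS = {"racks": 6, "nodes": 212, "cores": 1568, "storage_pb": 0.9}
OSDC = {"racks": 14, "nodes": 472, "cores": 3776, "storage_pb": 3.0}


def cores_per_node(cluster: dict) -> float:
    """Average number of cores per node."""
    return cluster["cores"] / cluster["nodes"]


def share(part: dict, whole: dict, key: str) -> float:
    """Fraction of `whole` accounted for by `part` for one resource."""
    return part[key] / whole[key]


if __name__ == "__main__":
    print("Bionimbus cores/node:", round(cores_per_node(BIONIMBUS), 2))
    print("OSDC cores/node:", round(cores_per_node(OSDC), 2))
    for key in ("racks", "nodes", "cores", "storage_pb"):
        # The storage share is an upper bound because OSDC storage is "3+ PB".
        print(f"Bionimbus share of OSDC {key}: {share(BIONIMBUS, OSDC, key):.1%}")
```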
GWT-based Front End
Elastic Cloud Services
Database Services
Analysis Pipelines & Re-analysis Services
Intercloud Services
Large Data Cloud Services
Data Ingestion Services
Bionimbus Deployment Options
Bionimbus Community Cloud
www.bionimbus.org
Bionimbus AMIs & Amazon hosted applications
Bionimbus Private Clouds
Part 3. Some Bionimbus Case Studies
Case Study: Public Datasets in Bionimbus
Case Study: modENCODE
Bionimbus is used to process the modENCODE data from the White lab (over 1000 experiments).
Bionimbus VMs were used for some of the integrative analysis.
Bionimbus is used as a backup for the modENCODE DCC.
Case Study: IGSB
All samples processed by the Institute for Genomics & Systems Biology High-Throughput Genome Analysis Core (HGAC) at the University of Chicago use Bionimbus.
Bionimbus Virtual Machine Releases
Part 4
What is the OSDC?
Open Science Data Cloud
Astronomical data
Biological data (Bionimbus)
NSF-PIRE OSDC Data Challenge
Earth science data (& disaster relief)
U.S.-based not-for-profit corporation.
Manages cloud computing infrastructure to support scientific research: Open Science Data Cloud.
Manages cloud computing testbeds: Open Cloud Testbed.
Develops reference implementations, benchmarks and standards.
www.opencloudconsortium.org
OCC Members
Companies: Cisco, Citrix, Yahoo!, …
Universities: University of Chicago, Calit2, Johns Hopkins, Northwestern Univ., ORNL, University of Illinois at Chicago, …
Federal agencies: NASA
Other: National Lambda Rail
Adding international partners in 2011.
Infrastructure
2010 Proof-of-Concept Infrastructure
450+ nodes
3000+ cores
3+ PB
Four data centers (two more to come in 2011)
Data centers have 10G network connections (some 100G links in 2011)
Plan to add approximately 1 PB of data in 2011.
With current funding, we will refresh 1/3 of the infrastructure in 2011 and 2012.
Towards a Long Term, Sustainable Model
Cap Exp about $1M/year
Op Exp about $1M/year
Moore Foundation providing $1M/year for 2011 and 2012 to support the Cap Exp.
[Figure: variety of analysis (Low, Med, Wide) versus data size (Small, Medium to Large, Very Large). Labels: Scientist with laptop; Open Science Data Cloud; Sequencing centers, LHC, LSST; No infrastructure; General infrastructure; Dedicated infrastructure.]
[Figure: persistent data (Small, Med, Large) versus cycles (Single workstations, Small to medium clusters, Large & spec. clusters). Labels: data clouds; databases; HPC.]
Bionimbus Team*
David Hanley, Nicolas Negre, Elizabeth Bartom, Nicholas Bild, Christopher D. Brown, Marc Domanus, Robert L Grossman, A. Jason Grundstad, Xiangjun Liu, Michal Sabala, Parantu K Shah, Kevin P White
Institute for Genomics & Systems Biology
University of Chicago
Jia Chen, Yunhong Gu and Damian Roqueiro
University of Illinois at Chicago
Lincoln Stein and Zheng Zha
Ontario Institute for Cancer Research
*In alphabetical order

Bionimbus Cambridge Workshop (3-28-11, v7)

Thank You
For more information:
www.bionimbus.org
www.opencloudconsortium.org
www.igsb.org
rgrossman.com