Improving HDFS Availability with Hadoop RPC Quality of Service
Hadoop Summit 2015
Who We Are
Ming Ma, Twitter Hadoop Team
Chris Li, Data Platform
• Hadoop performance at scale
• Hadoop reliability and scalability
@twitterhadoop
Agenda
‣Diagnosis of Namenode Congestion
• How does QoS help?
• How to use QoS in your clusters
Hadoop Workloads @ Twitter, eBay
• Large scale
• Thousands of machines
• Tens of thousands of jobs / day
• Diverse
• Production vs ad-hoc
• Batch vs interactive vs iterative
• Require performance isolation
Solutions for Performance Isolation
• YARN: flexible cluster resource management
• Cross Data Center Traffic QoS
• Set QoS policy via DSCP bits in IP header
• HDFS Federation
• Cluster Separation: run high SLA jobs in another cluster
Unsolved Extreme Cluster Slowdown
• hadoop fs -ls takes 5+ seconds
• Worst case: cluster outage
• Namenode lost some datanode heartbeats → replication storm
Audit Logs to the Rescue
• Username, operation type, and date are logged for each operation
• We automatically back up the audit logs into HDFS
(Hadoop Learning about Itself)
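A minimal sketch of the per-user tally that audit logs make possible. It assumes each audit line carries a `ugi=<user>` field, as HDFS audit output typically does; the class name `AuditTally` and the sample lines are hypothetical and not taken from the talk.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AuditTally {
    // Assumption: the user appears as "ugi=<name>" followed by whitespace.
    private static final Pattern UGI = Pattern.compile("ugi=(\\S+)");

    /** Counts audit log lines per user; lines without a ugi field are skipped. */
    public static Map<String, Long> callsPerUser(List<String> auditLines) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : auditLines) {
            Matcher m = UGI.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample lines in the usual audit layout.
        List<String> sample = Arrays.asList(
            "INFO FSNamesystem.audit: allowed=true ugi=alice (auth:SIMPLE) ip=/10.0.0.1 cmd=getfileinfo src=/foo dst=null perm=null",
            "INFO FSNamesystem.audit: allowed=true ugi=bob (auth:SIMPLE) ip=/10.0.0.2 cmd=listStatus src=/bar dst=null perm=null",
            "INFO FSNamesystem.audit: allowed=true ugi=alice (auth:SIMPLE) ip=/10.0.0.1 cmd=getfileinfo src=/foo dst=null perm=null");
        System.out.println(callsPerUser(sample)); // prints {alice=2, bob=1}
    }
}
```

At cluster scale the same tally would run as a job over the logs stored in HDFS; the in-memory version only shows the shape of the computation.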
Cause: Resource Monopolization
[Chart: each color is a different user; area is the number of calls]
What’s wrong with this code?
while (true) {
  fileSystem.exists(new Path("/foo"));
}
Don’t do this at home, unless QoS is on ;)
Bad Code + MapReduce = DDoS on Namenode!
[Diagram: Namenode, Bad User, Good Users, Other Users]
Hadoop RPC Overview
[Diagram: Client Process (DFS Client, RPC Client) and Namenode Process (RPC Server with Readers, FIFO Call Queue, Handlers, and Responders; Namenode Service; NN Lock)]
Diagnosing Congestion
[Diagram: Good User, Bad User, Readers, FIFO Call Queue, Handlers]
Solutions we’ve considered
• HDFS Federation
• Use separate RPC server for datanode requests (service RPC)
• Namenode global lock
Agenda
✓ Diagnosis of Namenode Congestion
‣How does QoS help?
• How to use QoS in your clusters
Goals
• Achieve Fairness and QoS
• No performance degradation
• High throughput
• Low overhead
Model it as a scheduling problem
• Available resource is the RPC handler thread
• Users should be given a fair share of resources
Design Considerations
• Pluggable, configurable
• Simplifying assumptions:
• All users are equal
• All RPC calls have the same cost
• Leverage existing scheduling algorithms
Solving Congestion with FairCallQueue
[Diagram: Good User, Bad User, Readers, Scheduler, Call Queue (Queue 0, Queue 1, Queue 2, Queue 3), Multiplexer, Handlers]
Fair Scheduling
• Good User: 11% of calls, below the 12% threshold, so its calls go to Queue 0
• Bad User: 80% of calls, above the 50% threshold, so its calls go to Queue 3
Weighted Round-Robin Multiplexing
[Animation: take 3 calls from one queue, take 2 from the next, continue through the remaining queues, then repeat]
FairCallQueue preventing high latency
[Charts: FIFO CallQueue vs. FairCallQueue]
RPC Backoff
• Prevents RPC queue from completely filling up
• Clients are told to wait and retry with exponential backoff
[Diagram: Good User, Bad User, Readers, Call Queue, Handlers, RetriableException]
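The client side of "wait and retry with exponential backoff" can be sketched as follows. This is an illustration only, not Hadoop's retry policy: `BackoffSketch`, `ServerBusyException` (a stand-in for the server's RetriableException signal), and the base and cap parameters are all hypothetical.

```java
import java.util.concurrent.Callable;

public class BackoffSketch {
    /** Hypothetical stand-in for the server's "queue is full, retry later" signal. */
    public static class ServerBusyException extends Exception {
        public ServerBusyException(String msg) { super(msg); }
    }

    /** Delay before retry number `attempt` (0-based): baseMs doubled per attempt, capped at maxMs. */
    public static long delayMs(int attempt, long baseMs, long maxMs) {
        if (attempt < 0 || baseMs < 0 || maxMs < 0) {
            throw new IllegalArgumentException("arguments must be non-negative");
        }
        long delay = baseMs;
        // Stop doubling once the cap is reached so the value cannot run away.
        for (int i = 0; i < attempt && delay < maxMs; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMs);
    }

    /** Runs the call, sleeping with exponential backoff whenever the server reports it is busy. */
    public static <T> T callWithBackoff(Callable<T> call, int maxAttempts,
                                        long baseMs, long maxMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        ServerBusyException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (ServerBusyException e) {
                last = e;
                if (attempt + 1 < maxAttempts) {
                    Thread.sleep(delayMs(attempt, baseMs, maxMs));
                }
            }
        }
        throw last; // every attempt was rejected
    }
}
```

With a base of 100 ms and a cap of 10 s, the delays grow 100, 200, 400, 800 ms and then level off at the cap, which keeps a rejected client from hammering a full call queue.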
RPC Backoff Effects
[Chart: good app latency (ms), axis 0 to 9000, against abusive app load (number of clients x number of connections): 100 x 100, 1k x 1k, 10k x 100, 10k x 500, 10k x 10k, 50k x 50k. Series: Normal, FairCallQueue, FairCallQueue + RPC Backoff. Two points are annotated ConnectTimeoutException.]
Current Status
• Enabled on all Twitter and eBay production clusters for 6+ months
• Open source availability: HADOOP-9640
• Swappable call queue in 2.4
• FairCallQueue in 2.6
• RPC Backoff in 2.8
Agenda
✓ Diagnosis of Namenode Congestion
✓ How does QoS help?
‣How to use QoS in your clusters
QoS is Easy to Enable
hdfs-site.xml:
<property>
  <name>ipc.8020.callqueue.impl</name>
  <value>org.apache.hadoop.ipc.FairCallQueue</value>
</property>
<property>
  <name>ipc.8020.backoff.enable</name>
  <value>true</value>
</property>
8020 is the port you want QoS on.
Future Possibilities
• RPC scheduling improvements
• Weighted share per user
• Prioritize datanode RPCs over client RPC
• Overall HDFS QoS
• Namenode fine-grained locking
• Fairness for data transfers
• HTTP based payloads such as webHDFS
Conclusion
• Try it out!
• No more namenode congestion since it’s been enabled at both Twitter and eBay
• Providing QoS at the RPC level is an important step towards HDFS fine-grained QoS
Special thanks to our reviewers:
• Arpit Agarwal (Hortonworks)
• Daryn Sharp (Yahoo)
• Andrew Wang (Cloudera)
• Benoy Antony (eBay)
• Jing Zhao (Hortonworks)
• Hiroshi Ideka (vic.co.jp)
• Eddy Xu (Cloudera)
• Steve Loughran (Hortonworks)
• Suresh Srinivas (Hortonworks)
• Kihwal Lee (Yahoo)
• Joep Rottinghuis (Twitter)
• Lohit VijayaRenu (Twitter)
Questions and Answers
• For help setting up QoS, feature ideas, questions:
Ming Ma: @mingmasplace
Chris Li: chrili_sf@ebaysf.com
Appendix
FairCallQueue Data
• 37 node cluster
• 10 users run a job which has:
• 20 Mappers, each mapper:
• Runs 100 threads. Each thread:
• Continuously calls hdfs.exists() in a tight loop
• Spikes are caused by garbage collection, a separate issue
Client Backoff Data
• See https://issues.apache.org/jira/secure/attachment/12670619/MoreRPCClientBackoffEvaluation.pdf
Related JIRAs
• FairCallQueue + Backoff: HADOOP-9640
• Cross Data Center Traffic QoS: HDFS-5175
• nntop: HDFS-6982
• Datanode Congestion Control: HDFS-7270
• Namenode fine-grained locking: HDFS-5453
Thoughts on Tuning
• Worth considering if you run a larger cluster or have many users
• Make your life easier while tuning by refreshing the queue with hadoop dfsadmin -refreshCallQueue
Anatomy of a QoS conf key
• core-site.xml
• key: ipc.8020.faircallqueue.priority-levels
• 8020 is the RPC server’s port; customize it if using a non-default port / service RPC port
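Since every QoS key embeds the RPC port, a small helper can make the pattern explicit. This is a sketch only; `QosKeys` is a hypothetical helper, not a Hadoop class, and it simply concatenates the pieces shown on the slide.

```java
public class QosKeys {
    /** Builds "ipc.<port>.<suffix>", the key layout shown on the slide. */
    public static String key(int port, String suffix) {
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return "ipc." + port + "." + suffix;
    }

    public static void main(String[] args) {
        // Default namenode RPC port from the slides.
        System.out.println(key(8020, "faircallqueue.priority-levels"));
        // Hypothetical service RPC port, to show the customization.
        System.out.println(key(8021, "callqueue.impl"));
    }
}
```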
Number of Sub-queues
• More subqueues = more unique classes of service
• Recommend 10 for larger clusters
key: ipc.8020.faircallqueue.priority-levels, default: 4
Scheduler: Decay Factor
• Controls how much accumulated call counts are decayed on each sweep. Larger values decay more slowly.
• Ex: 1024 calls with a decay factor of 0.5 will take 10 sweeps to decay, assuming the user makes no additional calls.
key: ipc.8020.faircallqueue.decay-scheduler.decay-factor, default: 0.5
Scheduler: Sweep Period
• How many ms between each decay sweep. Smaller is more responsive, but sweeps have overhead.
• Ex: if it takes 10 sweeps to decay and we sweep every 5 seconds, a user’s activity will remain for 50s.
key: ipc.8020.faircallqueue.decay-scheduler.period-ms, default: 5000
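The decay arithmetic in the two examples can be checked with a short sketch. It assumes "decayed" means the count has dropped to 1 and that each sweep truncates the product to a whole number; `DecaySketch` and its method names are hypothetical.

```java
public class DecaySketch {
    /** Number of sweeps until the count is at or below floor, multiplying by factor each sweep. */
    public static int sweepsToDecay(long count, double factor, long floor) {
        if (factor <= 0.0 || factor >= 1.0) {
            throw new IllegalArgumentException("factor must be between 0 and 1 (exclusive)");
        }
        if (floor < 0) {
            throw new IllegalArgumentException("floor must be >= 0");
        }
        int sweeps = 0;
        long current = count;
        while (current > floor) {
            current = (long) (current * factor); // truncate, so the count strictly shrinks
            sweeps++;
        }
        return sweeps;
    }

    /** How long a user's past activity lingers: sweeps needed times the sweep period. */
    public static long residualMs(int sweeps, long periodMs) {
        return sweeps * periodMs;
    }

    public static void main(String[] args) {
        int sweeps = sweepsToDecay(1024, 0.5, 1);
        System.out.println(sweeps);                   // prints 10
        System.out.println(residualMs(sweeps, 5000)); // prints 50000
    }
}
```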
Scheduler: Thresholds
• List of floats that determines the boundaries between service classes. If you have 4 queues, you’ll have 3 bounds.
• Each number represents a percentage of total calls.
• First number is the threshold for going into queue 0 (highest priority). Second number decides queue 1 vs. the rest, etc.
• Recommend trying even splits (10, 20, 30, … 90) or exponential (default)
key: ipc.8020.faircallqueue.decay-scheduler.thresholds, default: 12%, 25%, 50%
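A minimal sketch of how thresholds map a user's share of calls to a queue, using the default bounds from this slide. Treating a share exactly on a boundary as belonging to the lower-priority queue is an assumption, and `ThresholdSketch` is a hypothetical name.

```java
public class ThresholdSketch {
    /**
     * Returns the queue index for a user whose calls make up `share` (0.0 to 1.0)
     * of total calls. thresholds must be ascending; n thresholds give n + 1 queues.
     */
    public static int queueFor(double share, double[] thresholds) {
        // Walk from the lowest-priority queue upward until a bound is met.
        for (int i = thresholds.length; i > 0; i--) {
            if (share >= thresholds[i - 1]) {
                return i;
            }
        }
        return 0; // below every bound: highest priority
    }

    public static void main(String[] args) {
        double[] defaults = {0.12, 0.25, 0.50};
        System.out.println(queueFor(0.11, defaults)); // prints 0
        System.out.println(queueFor(0.80, defaults)); // prints 3
    }
}
```

The two calls in `main` reproduce the Good User (11%) and Bad User (80%) cases from the fair scheduling slides.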
Multiplexer: Weights
• Each weight is how many times the mux will try to read from its sub-queue before moving on to the next sub-queue.
• Ex: 4,3,1 is used for 3 queues, meaning: read up to 4 times from queue 0, read up to 3 times from queue 1, read once from queue 2, repeat.
• The mux controls the penalty of being in a low-priority queue. Recommend not setting anything to 0, as starvation is possible in that case.
key: ipc.8020.faircallqueue.multiplexer.weights, default: 8,4,2,1
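The 4,3,1 example can be sketched as a plain weighted round-robin drain. This is a single-threaded illustration, not the FairCallQueue implementation; `WrrSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class WrrSketch {
    /** Drains the sub-queues in weighted round-robin order and returns items in service order. */
    public static <T> List<T> drain(List<Deque<T>> queues, int[] weights) {
        if (queues.size() != weights.length) {
            throw new IllegalArgumentException("need one weight per queue");
        }
        List<T> served = new ArrayList<>();
        boolean progressed = true;
        // Keep cycling until a full pass serves nothing.
        while (progressed) {
            progressed = false;
            for (int q = 0; q < weights.length; q++) {
                Deque<T> queue = queues.get(q);
                // Read up to weights[q] items, then move on to the next sub-queue.
                for (int n = 0; n < weights[q] && !queue.isEmpty(); n++) {
                    served.add(queue.poll());
                    progressed = true;
                }
            }
        }
        return served;
    }
}
```

With queues holding a1..a5, b1..b4, and c1..c2 and weights 4,3,1, the service order is a1 a2 a3 a4, b1 b2 b3, c1, then a5, b4, c2. A weight of 0 means that queue's items are never served in this sketch, which is the starvation case the slide warns about.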
Backoff Max Attempts
• The default is equivalent to 90 seconds of retrying
• To achieve the equivalent of 10 minutes of retrying, set it to 44.
key: dfs.client.retry.max.attempts, default: 10
