Software Engineering |  30 Apr 2024 |  19 min
Database Sharding: Everything You Need to Know
Shubham Alavni
Senior Software Engineer
 
Shubham Alavni is a Senior Software Engineer at Nitor
Infotech. With seasoned expertise in web
development, he has left his mark on diverse...
Imagine this: your latest application is booming with daily active users, more features are
being added, and data piles up by the second. This may sound like a great success, but
underneath, your database performance can start to suffer. When a single database can no
longer keep up with the data load, database sharding is one of the most effective ways to
scale.
In this blog, I’ll provide a clear understanding of database sharding, its architecture, and
advantages. Apart from these, you’ll also get a peek into real-life scenarios and use cases
where database sharding shines.
Let’s get started with the basics!
Understanding Sharding
Sharding, derived from the term “shard” (a fragment of a whole), is a technique used in
database management. It divides a large database into smaller, more manageable units
spread across servers, which is what makes “horizontal scaling” possible (something
you will explore in a while). This approach involves splitting the rows of a single table into
distinct tables known as “shards.”
Despite maintaining identical schema and columns, each shard houses different rows,
ensuring that the data within each shard remains unique and non-overlapping. This method
effectively addresses the constraints of a single database by segmenting the data into
smaller portions and dispersing them across multiple database servers.
For example, it’s like having smaller buckets instead of one big bucket to carry water – each
bucket is easier to manage than one large, heavy bucket.
Keep reading to know how database sharding can help you.
Benefits of Database Sharding
Database Sharding offers several advantages:
• Improved Scalability: Sharding allows you to add more servers to your database, spreading the
load and enabling more traffic and faster processing. This contrasts with the traditional method of
scaling up, which involves adding more resources to a single server.
• Increased Operation Capacity: By distributing your database across multiple shards, you can
increase both read and write capacity, provided each operation touches only one shard at a
time.
• Expanded Storage Capacity: Total storage is no longer capped by a single machine; it grows with
each server you add.
• High Availability: If one shard goes down, the other shards can still be accessed, which prevents a
total system shutdown.
Onwards to the two different scaling techniques of database sharding.
Techniques of Scaling Database Sharding: Vertical
Partitioning vs Horizontal Partitioning
Here’s a tabular comparison of vertical versus horizontal partitioning.
| Aspect | Vertical Partitioning | Horizontal Partitioning |
| --- | --- | --- |
| Criteria | Divides a table based on columns. | Divides a table based on rows. |
| Suitability | Useful for tables with many columns, where some columns are rarely used. | Useful for tables with many rows, where data can be divided based on some criteria. |
| Performance Improvement | Improves query performance by reducing I/O and allowing efficient indexing of relevant columns. | Improves query performance by reducing the number of rows to be scanned for specific queries. |
| Requirements | May require joins to retrieve data from multiple partitions. | Joins between partitions are typically not required because they contain disjoint sets of rows. |
| Example | A table with 100 columns, where 20 columns are frequently accessed, and 80 columns are rarely accessed. | A table with 1 billion rows, where 300 million rows are accessed frequently, and 700 million rows are rarely accessed. |
For clarity, here’s how vertical and horizontal partitions would appear in
contrast to the original table:
Fig: Original vs. Vertical vs. Horizontal Partitions
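To make the contrast concrete, here is a minimal Ruby sketch that partitions a toy in-memory table both ways. The table, its column names, and the split point are all made up for illustration; a real database would do this at the storage level.

```ruby
# Toy "users" table; the rows and column names are illustrative only.
USERS = [
  { id: 1, name: "Asha", bio: "Likes hiking" },
  { id: 2, name: "Bora", bio: "Plays chess" },
  { id: 3, name: "Chen", bio: "Bakes bread" },
  { id: 4, name: "Dara", bio: "Runs marathons" }
].freeze

# Vertical partitioning: split by columns. Each partition keeps the key (id)
# so the pieces can be joined back together later.
PROFILE_NAMES = USERS.map { |row| row.slice(:id, :name) }
PROFILE_BIOS  = USERS.map { |row| row.slice(:id, :bio) }

# Horizontal partitioning: split by rows. Each partition keeps every column
# and holds a disjoint subset of the rows.
SHARD_LOW, SHARD_HIGH = USERS.partition { |row| row[:id] <= 2 }
```

Reading a full user back from the vertical partitions needs a join on `id`, while each horizontal partition already holds complete rows.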
By now, you should be clear on the basics of database sharding and may want to learn
more about how it works and how to use it correctly. The next sections cover exactly
that!
Architecture of Sharding
After deciding to shard your database, the next step is to determine how to implement it.
This involves the critical process of running queries or distributing incoming data to sharded
tables or databases, ensuring data goes to the appropriate shard to prevent data loss or slow
queries.
In the following section, we will discuss several prevalent sharding architectures:
1. Key-Based Sharding:
This is also known as hash-based sharding. It uses a hash function to distribute data across
shards. A specific data value, such as a user ID, IP address, ZIP code, or Region, is used as
input to the hash function. The output is a Shard ID, which determines where the data will be
stored.
The data value used in the hash function is called the Shard key. The Shard key should be a
static column, like a primary key, to ensure consistent data distribution and efficient update
operations.
Note: Key-based sharding can complicate the process of adding or removing
database servers. When the number of servers changes, the hash function typically maps many
existing keys to different shards, so the data must be remapped and migrated. This can be
an expensive and time-consuming process, potentially causing system downtime.
Despite its challenges, key-based sharding is popular for evenly distributing data across
shards and minimizing the risk of database hotspots, ensuring balanced workloads.
Fig: Key-Based Sharding
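As a rough illustration of key-based sharding, the sketch below hashes a shard key and takes the result modulo the number of shards. The use of CRC32 and the helper names `shard_id_for` and `keys_moved` are assumptions made for this example, not something a particular database prescribes.

```ruby
require "zlib"

# Map a shard key to a shard ID in 0...shard_count.
# Zlib.crc32 stands in for the hash function because its output is stable
# across processes, unlike String#hash, which Ruby seeds per process.
def shard_id_for(shard_key, shard_count)
  Zlib.crc32(shard_key.to_s) % shard_count
end

# Count how many keys would land on a different shard if the number of
# shards changed. This is the remapping cost of adding or removing a server.
def keys_moved(keys, old_count, new_count)
  keys.count { |key| shard_id_for(key, old_count) != shard_id_for(key, new_count) }
end
```

Calling `keys_moved((1..1000).map { |i| "user-#{i}" }, 4, 5)` gives a feel for how much data a simple modulo scheme would migrate when a fifth shard is added.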
2. Range-Based Sharding:
This is a technique that divides data into shards based on a specific range of values. For
example, in a product’s database, the products could be sharded based on their price
ranges. Products with prices between $0 and $100 could be stored in one shard, while
products with prices between $100 and $200 could be stored in another shard.
Range-based sharding is a simple and straightforward method. Each shard contains a unique
set of data but maintains the same schema as the original database. The application then
decides the range of the data and writes it to the correct shard.
Note: Range-based sharding can cause uneven data distribution, leading to “database
hotspots” where some shards receive more traffic than others. This can result in
performance issues, slow queries, and imbalanced workloads. For example, a shard
containing products with prices between $0 and $100 may receive more traffic than a shard
containing products with prices between $100 and $200.
Fig: Range-Based Sharding
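The price example can be sketched in a few lines of Ruby. The shard names and the choice of half-open ranges are assumptions for illustration; half-open ranges are used here so that a product priced at exactly $100 belongs to one shard only.

```ruby
# Half-open ranges (0...100 excludes 100) keep the boundaries non-overlapping.
# The shard names are hypothetical.
PRICE_SHARDS = {
  (0...100)   => :shard_a,
  (100...200) => :shard_b
}.freeze

# Return the shard whose range covers the given price.
def shard_for_price(price)
  _range, shard = PRICE_SHARDS.find { |range, _shard| range.cover?(price) }
  raise ArgumentError, "no shard covers price #{price}" if shard.nil?

  shard
end
```

Here `shard_for_price(50)` returns `:shard_a` and `shard_for_price(100)` returns `:shard_b`. A price outside every configured range raises an error instead of being silently misplaced.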
3. Directory-Based Sharding:
This is a database strategy that utilizes a lookup table to dictate data storage locations. It
assigns each key to a specific shard, using the lookup table that contains fixed data location
information. It is adaptable and simplifies the process of adding new shards.
For example, consider a lookup table with columns for Delivery Zone and Shard ID. The
Delivery Zone column serves as the Shard key, directing data from a particular delivery zone
to the corresponding shard ID in the lookup table.
Directory-based sharding provides flexible data distribution, efficient query routing, and
dynamic scalability. It uses a central directory for managing data-to-shard mapping,
optimizing query performance, and enabling efficient load balancing. The system can scale
dynamically by modifying the number of shards without affecting the application logic, so it
adapts easily to changing needs and workloads.
Note: Directory-based sharding can slow down operations because the lookup table must be
consulted for each query or write. It can also create a single point of failure, making the
entire database inaccessible if the lookup table fails. Using a distributed lookup table can
mitigate this but adds system complexity.
Fig: Directory-Based Sharding
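A minimal sketch of the delivery-zone example follows. The `ShardDirectory` class and the zone names are hypothetical; in practice the mapping would live in a database table or a configuration service rather than in memory.

```ruby
# In-memory stand-in for the lookup table (Delivery Zone -> Shard ID).
class ShardDirectory
  def initialize(mapping = {})
    @mapping = mapping.dup
  end

  # Add a zone or move an existing zone to another shard.
  def assign(zone, shard_id)
    @mapping[zone] = shard_id
  end

  # Look up the shard for a zone; fail loudly for unknown zones.
  def shard_for(zone)
    @mapping.fetch(zone) { raise KeyError, "no shard assigned to zone #{zone.inspect}" }
  end
end

directory = ShardDirectory.new("North" => 1, "South" => 2)
directory.assign("East", 3) # a new shard needs only a new directory entry
```

Because routing goes through `shard_for`, adding a shard changes the directory's contents but not the code that calls it. Every lookup depends on the directory, which is why it can become a bottleneck or a single point of failure.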
Onwards to the use cases!
Use Cases of Database Sharding
Sharding finds common application in the following scenarios:
• E-commerce Platforms: These platforms deal with large volumes of product data, customer data,
and order data. Here, sharding helps distribute the load across multiple servers and improve
performance.
• Social Media Platforms: With billions of users and large amounts of user-generated content,
sharding helps these platforms manage data effectively.
• Gaming Platforms: Real-time data management for millions of players in online multiplayer games
benefits from sharding, as it distributes the load and boosts performance.
To make the use case concrete, let’s look at a particular scenario!
SCENARIO: DATABASE SHARDING FOR SCALABILITY
Envision that you’re architecting a user account management system for an application. To
address scalability and performance challenges, you decide to distribute the user
data across multiple database shards.
You can select directory-based sharding, utilizing the country_code as the key attribute for
sharding. The country_code is a three-letter code representing each country. A lookup table
can be used to store the mapping of each country_code to its corresponding shard_id.
Here are the steps that you can follow:
Step 1: Determine the number of shards
Assuming the application is used in 3 countries, we’ll use 3 shards.
Step 2: Lookup table for mapping country_code to shard_id
• We’ll create a lookup table to store the mapping of country_code to shard_id. The table will have two
columns: country_code and shard_id.
• The country_code column will store the three-letter code for each country.
• country_code example: South Korea (KOR), Thailand (THA), and Malaysia (MYS).
country_code shard_id
KOR 1
THA 2
MYS 3
Step 3: Handling the queries
• We’ll demonstrate how a new user is signed up and how to
choose the correct shard based on the country_code of the user.
• We will also show how to choose the correct shard to fetch user data from the database while
signing in the user.
Step 4: Basic Implementation of Database Sharding in Ruby on Rails
Framework (6.1+)
1. First, let’s set up our Rails application with multiple databases. In config/database.yml,
we’ll define our shards:
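The original configuration is not reproduced here, so the following is only a sketch of what such a file could look like. The database names, adapter, and environment are placeholders; `replica: true` and `migrations_paths` are standard options for Rails multiple-database setups.

```yaml
# config/database.yml (sketch; names and adapter are placeholders)
default: &default
  adapter: postgresql
  encoding: unicode

development:
  primary:
    <<: *default
    database: app_primary
  primary_replica:
    <<: *default
    database: app_primary
    replica: true
  shard_one:
    <<: *default
    database: app_shard_one
    migrations_paths: db/shards_migrate
  shard_two:
    <<: *default
    database: app_shard_two
    migrations_paths: db/shards_migrate
  shard_three:
    <<: *default
    database: app_shard_three
    migrations_paths: db/shards_migrate
```

Each of the three shard entries corresponds to one row of the lookup table from Step 2.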
Next, we’ll make changes to the ApplicationRecord class to connect to the primary and
replica databases, and to the Shard model to connect to the shard databases. We’ll also
define a method to choose the correct shard based on the country_code in this manner:
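The shard-selection method might look roughly like the plain-Ruby sketch below. `ShardRouter` and the shard names are hypothetical. In a Rails 6.1+ app the names would need to match the keys declared with `connects_to shards: { ... }` on the abstract model class.

```ruby
module ShardRouter
  # Mirrors the lookup table from Step 2. The shard names are hypothetical.
  SHARD_BY_COUNTRY = {
    "KOR" => :shard_one,
    "THA" => :shard_two,
    "MYS" => :shard_three
  }.freeze

  # Return the shard name for a three-letter country code.
  def self.shard_for(country_code)
    code = country_code.to_s.strip.upcase
    SHARD_BY_COUNTRY.fetch(code) do
      raise ArgumentError, "unsupported country_code: #{country_code.inspect}"
    end
  end
end

# In a Rails 6.1+ app, the result would select the connection, for example:
#   ActiveRecord::Base.connected_to(role: :writing, shard: ShardRouter.shard_for("KOR")) do
#     # queries in this block run against the chosen shard
#   end
```

`ShardRouter.shard_for("kor")` returns `:shard_one`. Input is normalized first so that case and stray whitespace do not send a user to the wrong shard.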
2. When a new user signs up, we need to choose the correct shard based on the user’s
country_code. We can do this by using the connected_to method to connect to the correct
shard and then create the user.
3. For user sign-in, we again need to choose the correct shard to fetch
the user data from the database.
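The sign-up and sign-in flow from steps 2 and 3 can be simulated without a database. In this sketch, `AccountStore` is a hypothetical class and each shard is just an in-memory Hash. In Rails, the body of each method would instead run inside a `connected_to(role: ..., shard: ...)` block.

```ruby
class AccountStore
  # Lookup table from Step 2: country_code -> shard_id.
  COUNTRY_TO_SHARD = { "KOR" => 1, "THA" => 2, "MYS" => 3 }.freeze

  def initialize
    # One Hash per shard_id stands in for each shard database (email => user).
    @shards = COUNTRY_TO_SHARD.values.to_h { |shard_id| [shard_id, {}] }
  end

  # Sign-up: pick the shard from the user's country_code, then write there.
  def sign_up(email:, country_code:, name:)
    users = users_on_shard(country_code)
    raise ArgumentError, "#{email} is already registered" if users.key?(email)

    users[email] = { email: email, name: name, country_code: country_code }
  end

  # Sign-in: the same country_code must be known to find the right shard.
  def sign_in(email:, country_code:)
    users_on_shard(country_code)[email]
  end

  private

  def users_on_shard(country_code)
    shard_id = COUNTRY_TO_SHARD.fetch(country_code) do
      raise ArgumentError, "no shard for country_code #{country_code.inspect}"
    end
    @shards.fetch(shard_id)
  end
end
```

One consequence worth noting is that the sign-in request has to carry the shard key. If only the email were known, the application would have to query every shard or keep a second directory from email to shard.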
It’s important to note that the above is a simplified example. In a real-world scenario, you
would need to consider additional factors such as data consistency, replication, and failover.
After all this explanation, you might be asking yourself – “Should I shard my database or
not?”, right?
Keep reading to know the conditions when you can consider sharding!
Factors to be considered before Sharding
Consider the following factors before deciding to shard your database:
• Database Size: Sharding is typically used for large databases that have outgrown the capacity of a
single server.
• Traffic Patterns: If your database experiences uneven traffic patterns, sharding may be beneficial.
• Growth Projections: If your database is expected to scale significantly in the future, sharding may
be a good option.
• Complexity: Sharding adds complexity to your database architecture and requires careful planning
and maintenance.
• Cost: Sharding can be expensive, as it requires additional hardware resources and infrastructure to
support multiple servers.
Note: Sharded databases can increase latency because a routing layer is needed to direct
each query to the right shard. They can also raise maintenance overhead by requiring upkeep
of shards and additional nodes, along with syncing data updates if replication is used.
So, database sharding has both its perks and challenges, and you can decide if it suits your
application’s needs.
To know more about database management, reach out to us at Nitor Infotech.
Dive into the experience of building world class software products
in 2024.
Download Datasheet
Table of contents
Understanding Sharding
Benefits of Database Sharding
Techniques of Scaling Database Sharding: Vertical Partitioning vs Horizontal Partitioning
Architecture of Sharding
Use Cases of Database Sharding
SCENARIO: DATABASE SHARDING FOR SCALABILITY
Factors to be considered before Sharding
Related Blogs
10+ Tips to Optimize Your Angular Application Performance
BDD: Your secret weapon for building better software
Product Testing Completeness with GenAI: A Comprehensive Overview
 Previous Blog Next Blog 
Recent Blogs
10+ Tips to Optimize
Your Angular
Application
Performance
Software Engineering
Pandas vs. PySpark:
Comparing Modern
Python Data
Processing
Paradigms
Big Data and Analytics
BDD: Your secret
weapon for building
better software
Software Engineering
Subscribe to our
fortnightly newsletter!
we'll keep you in the loop with everything that's trending in the tech world.

Nitor Infotech, an Ascendion company, is an ISV preferred IT software product development
services organization. We serve cutting edge Gen-AI powered services and solutions for the
web, Cloud, data, and devices. Nitor’s consulting-driven value engineering approach makes it
the right fit to be an agile and nimble partner to organizations on the path to digital
transformation.
Armed with a digitalization strategy, we build disruptive solutions for businesses through
innovative, readily deployable, and customizable accelerators and frameworks.
COMPANY
About Us
Leadership
PR &
Events
Career
Contact
Us
INSIGHTS
Blogs
Podcast
Videos
TechKnowpedia
INDUSTRIES
Healthcare
BFSI
Retail
Manufacturing
Supply
Chain
TECHNOLOGIES
AI & ML
Generative
AI
Blockchain
Big Data &
Analytics
Cloud &
DevOps
IoT
SERVICES
Idea To MVP
Product
Engineering
Platform
Engineering
Prompt
Engineering
Research As A
Service
Peer Product
Management
Quality
Engineering
Product
Modernization
Mobile App
Development
Web App
Development
UX
Engineering
Cloud
Migration
GET IN TOUCH
900 National Pkwy, Suite 210,
Schaumburg, IL 60173,
USA
marketing@nitorinfotech.com
+1 (224) 265-7110
     
SUBSCRIBE
Subscribe to our newsletter & stay updated

Enter Email Address
© 2024 Nitor Infotech All rights reserved Terms Of Usage Privacy Policy Cookie Policy

More Related Content

PPTX
Understanding Database Sharding and Partitioning
DOCX
Microsoft SQL Azure - Scaling Out with SQL Azure Whitepaper
PDF
What is Scalability and How can affect on overall system performance of database
DOCX
Cassandra data modelling best practices
PDF
Massive sacalabilitty with InterSystems IRIS Data Platform
PDF
Scaling apps using azure cloud services
PPTX
No sql database
PDF
Lecture4 big data technology foundations
Understanding Database Sharding and Partitioning
Microsoft SQL Azure - Scaling Out with SQL Azure Whitepaper
What is Scalability and How can affect on overall system performance of database
Cassandra data modelling best practices
Massive sacalabilitty with InterSystems IRIS Data Platform
Scaling apps using azure cloud services
No sql database
Lecture4 big data technology foundations

Similar to Database Sharding: Complete understanding (20)

PPTX
Hadoop Integration with Microstrategy
PDF
Data management in cloud study of existing systems and future opportunities
PDF
EVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMING
PPTX
Introduction to Big Data
PDF
HYBRID DATABASE SYSTEM FOR BIG DATA STORAGE AND MANAGEMENT
PDF
Hybrid Database System for Big Data Storage and Management
PDF
HYBRID DATABASE SYSTEM FOR BIG DATA STORAGE AND MANAGEMENT
DOCX
PDF
Enterprise Data Lake
PDF
Enterprise Data Lake - Scalable Digital
DOCX
Microsoft Fabric data warehouse by dataplatr
DOCX
Report 1.0.docx
PDF
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
PPTX
Module 2.2 Introduction to NoSQL Databases.pptx
DOCX
Data warehouse 2.0 and sql server architecture and vision
DOCX
LEGO EMBRACING CHANGE BY COMBINING BI WITH FLEXIBLE INFORMATION SYSTEM
DOCX
Report 2.0.docx
PPT
CouchBase The Complete NoSql Solution for Big Data
PPTX
Exploring Microsoft Azure Infrastructures
 
PDF
Steps to Modernize Your Data Ecosystem | Mindtree
Hadoop Integration with Microstrategy
Data management in cloud study of existing systems and future opportunities
EVALUATING CASSANDRA, MONGO DB LIKE NOSQL DATASETS USING HADOOP STREAMING
Introduction to Big Data
HYBRID DATABASE SYSTEM FOR BIG DATA STORAGE AND MANAGEMENT
Hybrid Database System for Big Data Storage and Management
HYBRID DATABASE SYSTEM FOR BIG DATA STORAGE AND MANAGEMENT
Enterprise Data Lake
Enterprise Data Lake - Scalable Digital
Microsoft Fabric data warehouse by dataplatr
Report 1.0.docx
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
Module 2.2 Introduction to NoSQL Databases.pptx
Data warehouse 2.0 and sql server architecture and vision
LEGO EMBRACING CHANGE BY COMBINING BI WITH FLEXIBLE INFORMATION SYSTEM
Report 2.0.docx
CouchBase The Complete NoSql Solution for Big Data
Exploring Microsoft Azure Infrastructures
 
Steps to Modernize Your Data Ecosystem | Mindtree
Ad

More from servicesNitor (19)

PDF
Why Variational Autoencoders Matter in Modern AI
PDF
Unlock Your Dream Career in IT with Nitor Infotech
PDF
Getting Started with Microservices – Part 2
PDF
Nitor Infotech: Future of Product Engineering
PDF
What is hybrid mobile app development? | Nitor Infotech
PDF
Hands-on with Apache Druid: Installation & Data Ingestion Steps
PDF
Cloud Migration Services | Nitor Infotech
PDF
How Mulesoft Enhances Data Connectivity Across Platforms?
PDF
a guide to install rasa and rasa x | Nitor Infotech
PDF
five best practices for technical writing
PDF
How to integrate salesforce data with azure data factory
PDF
substrate: A framework to efficiently build blockchains
PDF
The three stages of Power BI Deployment Pipeline
PDF
IP Centric Solutioning Whitepaper | Nitor Infotech
PDF
Quality engineering Services | Nitor Infotech
PDF
Cloud and devops.pdf
PDF
Product engineering services_seo.pdf
PDF
02.pdf (2).pdf
PDF
Regression Testing How It Works (1).pdf
Why Variational Autoencoders Matter in Modern AI
Unlock Your Dream Career in IT with Nitor Infotech
Getting Started with Microservices – Part 2
Nitor Infotech: Future of Product Engineering
What is hybrid mobile app development? | Nitor Infotech
Hands-on with Apache Druid: Installation & Data Ingestion Steps
Cloud Migration Services | Nitor Infotech
How Mulesoft Enhances Data Connectivity Across Platforms?
a guide to install rasa and rasa x | Nitor Infotech
five best practices for technical writing
How to integrate salesforce data with azure data factory
substrate: A framework to efficiently build blockchains
The three stages of Power BI Deployment Pipeline
IP Centric Solutioning Whitepaper | Nitor Infotech
Quality engineering Services | Nitor Infotech
Cloud and devops.pdf
Product engineering services_seo.pdf
02.pdf (2).pdf
Regression Testing How It Works (1).pdf
Ad

Recently uploaded (20)

PDF
Cost to Outsource Software Development in 2025
PDF
SAP S4 Hana Brochure 3 (PTS SYSTEMS AND SOLUTIONS)
PPTX
Oracle E-Business Suite: A Comprehensive Guide for Modern Enterprises
PDF
Adobe Illustrator 28.6 Crack My Vision of Vector Design
PPTX
Odoo POS Development Services by CandidRoot Solutions
PDF
System and Network Administration Chapter 2
PPTX
Why Generative AI is the Future of Content, Code & Creativity?
PDF
Navsoft: AI-Powered Business Solutions & Custom Software Development
PDF
Claude Code: Everyone is a 10x Developer - A Comprehensive AI-Powered CLI Tool
PPTX
Embracing Complexity in Serverless! GOTO Serverless Bengaluru
PDF
Upgrade and Innovation Strategies for SAP ERP Customers
PDF
T3DD25 TYPO3 Content Blocks - Deep Dive by André Kraus
PDF
Why TechBuilder is the Future of Pickup and Delivery App Development (1).pdf
PPTX
Introduction to Artificial Intelligence
PDF
wealthsignaloriginal-com-DS-text-... (1).pdf
PDF
Internet Downloader Manager (IDM) Crack 6.42 Build 42 Updates Latest 2025
PDF
Which alternative to Crystal Reports is best for small or large businesses.pdf
PDF
Addressing The Cult of Project Management Tools-Why Disconnected Work is Hold...
PPTX
Transform Your Business with a Software ERP System
PPTX
assetexplorer- product-overview - presentation
Cost to Outsource Software Development in 2025
SAP S4 Hana Brochure 3 (PTS SYSTEMS AND SOLUTIONS)
Oracle E-Business Suite: A Comprehensive Guide for Modern Enterprises
Adobe Illustrator 28.6 Crack My Vision of Vector Design
Odoo POS Development Services by CandidRoot Solutions
System and Network Administration Chapter 2
Why Generative AI is the Future of Content, Code & Creativity?
Navsoft: AI-Powered Business Solutions & Custom Software Development
Claude Code: Everyone is a 10x Developer - A Comprehensive AI-Powered CLI Tool
Embracing Complexity in Serverless! GOTO Serverless Bengaluru
Upgrade and Innovation Strategies for SAP ERP Customers
T3DD25 TYPO3 Content Blocks - Deep Dive by André Kraus
Why TechBuilder is the Future of Pickup and Delivery App Development (1).pdf
Introduction to Artificial Intelligence
wealthsignaloriginal-com-DS-text-... (1).pdf
Internet Downloader Manager (IDM) Crack 6.42 Build 42 Updates Latest 2025
Which alternative to Crystal Reports is best for small or large businesses.pdf
Addressing The Cult of Project Management Tools-Why Disconnected Work is Hold...
Transform Your Business with a Software ERP System
assetexplorer- product-overview - presentation

Database Sharding: Complete understanding

  • 1. Software Engineering |  30 Apr 2024 |  19 min Database Sharding: Everything You Need to Know Shubham Alavni Senior Software Engineer   Shubham Alavni is a Senior Software Engineer at Nitor Infotech. With a seasoned expertise in web development, he has left his mark on diverse... Read More Imagine this: your latest application is booming with daily active users, more features are being added, and data seems to pile up by every second. Although this may sound like a great success but deep down, it’s not as your database performance can be hampered. So, to keep up with the data load and other bottlenecks, Database Sharding stands out as the best solution. In this blog, I’ll provide a clear understanding of database sharding, its architecture, and advantages. Apart from these, you’ll also get a peek into real-life scenarios and use cases where database sharding shines. Let’s get started with the basics! Understanding Sharding Sharding, derived from the term “shard,” signifies a fraction of a complete entity, and is a technique used in database management. It involves the division of a large database into smaller, more manageable units, a process also known as “horizontal scaling” (something you will explore in a while). This approach involves splitting the rows of a single table into distinct tables known as “shards.” Despite maintaining identical schema and columns, each shard houses different rows, We use cookies to ensure that we give you the best experience on our website. If you continue to use this site we will assume that you are happy with it. Accept Cookie policy
  • 2. ensuring that the data within each shard remains unique and non-overlapping. This method effectively addresses the constraints of a single database by segmenting the data into smaller portions and dispersing them across multiple database servers. For example, it’s like having smaller buckets instead of one big bucket to carry water – each bucket is easier to manage than one large, heavy bucket. Keep reading to know how database sharding can help you. Benefits of Database Sharding Database Sharding offers several advantages:  Improved Scalability: Sharding allows you to add more servers to your database, spreading the load and enabling more traffic and faster processing. This contrasts with the traditional method of scaling up, which involves adding more resources to a single server.  Increased Operation Capacity: By distributing your database into multiple shards, you can increase both read and write operation capacity if the tasks are performed within one shard at a time.  Expanded Storage Capacity: It also increases the storage capacity of your database, potentially achieving nearly infinite storage capacity.  High Availability: If one shard goes down, the other shards can still be accessed. Thus, preventing a total system shutdown. Onwards to the two different scaling techniques of database sharding. Techniques of Scaling Database Sharding: Vertical Partitioning vs Horizontal Partitioning Here’s a tabular comparison of vertical versus horizontal partitioning. Aspect Vertical Partitioning Horizontal Partitioning Criteria Divides a table based on columns. Divides a table based on rows. Suitability Useful for tables with many columns, where some columns are rarely used. Useful for table with many rows, where data can be divided based on some criteria. Performance Improvement Improves query performance by reducing I/O and allowing efficient indexing of relevant columns. 
Improves query performance by reducing the number of rows to be scanned for specific queries. Requirements May require joins to retrieve data from multiple partitions. Joins between partitions are typically not required because they contain disjoint sets of rows. Example A table with 100 columns, where 20 columns are frequently accessed, and 80 columns are rarely accessed. A table with 1 billion rows, where 300 million rows are accessed frequently, and 700 million rows are rarely accessed. To get you some clarity, here’s how vertical and horizontal partitions would appear in contrast to the original table:
  • 3. Fig: Original vs. Vertical vs. Horizontal Partitions I’m confident that you are clear with initials of database sharing and now you want to learn more about how it works and how you can use it in the right manner. For that, get the answers in the next sections! Architecture of Sharding After deciding to shard your database, the next step is to determine how to implement it. This involves the critical process of running queries or distributing incoming data to sharded tables or databases, ensuring data goes to the appropriate shard to prevent data loss or slow queries. In the following section, we will discuss several prevalent sharding architectures: 1. Key-Based Sharding: This is also known as hash-based sharding. It uses a hash function to distribute data across shards. A specific data value, such as a user ID, IP address, ZIP code, or Region, is used as input to the hash function. The output is a Shard ID, which determines where the data will be stored. The data value used in the hash function is called the Shard key. The Shard key should be a static column, like a primary key, to ensure consistent data distribution and efficient update operations. Note: However, key-based sharding can complicate the process of adding or removing
  • 4. database servers. As servers change, the data must be remapped and migrated. This can be an expensive and time-consuming process, potentially causing system downtime. Despite its challenges, key-based sharding is popular for evenly distributing data across shards and minimizing the risk of database hotspots, ensuring balanced workloads. Fig: Key-Based Sharding 2. Range-Based Sharding: This is a technique that divides data into shards based on a specific range of values. For example, in a product’s database, the products could be sharded based on their price ranges. Products with prices between $0 and $100 could be stored in one shard, while products with prices between $100 and $200 could be stored in another shard. ange-based sharding is a simple and straightforward method. Each shard contains a unique
  • 5. set of data but maintains the same schema as the original database. The application then decides the range of the data and writes it to the correct shard. Note: Range-based sharding can cause uneven data distribution, leading to “database hotspots” where some shards receive more traffic than others. This can result in performance issues, slow queries, and imbalanced workloads. For example, a shard containing products with prices between $0 and $100 may receive more traffic than a shard containing products with prices between $100 and $200. Fig: Range-Based Sharding 3. Directory-Based Sharding: This is a database strategy that utilizes a lookup table to dictate data storage locations. It assigns each key to a specific shard, using the lookup table that contains fixed data location information. It is adaptable and simplifies the process of adding new shards. For example, consider a lookup table with columns for Delivery Zone and Shard ID. The Delivery Zone column serves as the Shard key, directing data from a particular delivery zone to the corresponding shard ID in the lookup table. Directory-based sharding provides flexible data distribution, efficient query routing, and dynamic scalability. It uses a central directory for managing data-to-shard mapping, optimizing query performance, and enabling efficient load balancing. The system can scale dynamically by modifying the number of shards, without affecting the application logic. Thus, easily adapting to changing needs and workloads. Note: However, it can potentially slow down operations due to lookup table access for each query or write. It can also create a single point of failure, making the entire database inaccessible if the lookup table fails. Using a distributed lookup table can mitigate this but adds system complexity.
  • 6. Fig: Directory-Based Sharding Onwards to the use cases! Use Cases of Database Sharding Sharding finds common application in the following scenarios:  E-commerce Platforms: These platforms deal with large volumes of product data, customer data, and order data. Here, harding helps distribute the load across multiple servers and improve performance.  Social Media Platforms: With billions of users and large amounts of user-generated content, sharding helps these platforms manage data effectively.  Gaming Platforms: Real-time data management for millions of players in online multiplayer games benefits from sharding, as it distributes the load and boosts performance. To get you an in-depth clarity about its use case, let’s look at a particular scenario! SCENARIO: DATABASE SHARDING FOR SCALABILITY Envision that you’re architecting a user account management system for an application. To address scalability and performance challenges, you need to choose to distribute the user data across multiple database shards. You can select directory-based sharding, utilizing the country_code as the key attribute for sharding. The country_code is a three-letter code representing each country. A lookup table can be used to store the mapping of each country_code to its corresponding shard_id.
  • 7. Here are the steps that you can follow: Step 1: Determine the number of shards Assuming the application is used in 3 countries, we’ll use 3 shards. Step 2: Lookup table for mapping country_code to shard_id  We’ll create a lookup table to store the mapping of country_code to shard_id. The table will have two columns: country_code and shard_id.  The country_code column will store the three-letter code for each country.  country_code example: South Korea (KOR), Thailand (THA), and Malaysia (MYS). country_code shard_id KOR 1 THA 2 MYS 3 Step 3: Handling the queries  We’ll demonstrate how a user goes through the process of signing up a new user and how to choose the correct shard based on the country_code of the user.  We will also show how to choose the correct shard to fetch user data from the database while signing in the user. Step 4: Basic Implementation of Database Sharding in Ruby on Rails Framework (6.1+) 1. First, let’s set up our Rails application with multiple databases. In config/database.yml, we’ll define our shards:
  • 8. Next, we’ll change the ApplicationRecord class to connect to the primary and replica databases, and the Shard model to connect to the shard databases. We’ll also define a method that chooses the correct shard based on the country_code.
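The shard-selection method described here can be sketched in plain Ruby, without Rails, as a lookup against the directory from Step 2. The names SHARD_DIRECTORY, SHARD_NAMES, UnknownCountryError, and shard_name_for are hypothetical, and the shard names are assumed to match whatever the application declares through Rails' connects_to.

```ruby
# Directory (lookup table) from Step 2: country_code => shard_id.
SHARD_DIRECTORY = { "KOR" => 1, "THA" => 2, "MYS" => 3 }.freeze

# Hypothetical shard names. In a Rails 6.1+ app these would be the keys an
# abstract model class passes to `connects_to shards: { ... }`.
SHARD_NAMES = { 1 => :shard_one, 2 => :shard_two, 3 => :shard_three }.freeze

# Raised when a country_code has no entry in the directory.
class UnknownCountryError < StandardError; end

# Resolves a country_code to the name of the shard that owns its rows.
def shard_name_for(country_code)
  code = country_code.to_s.strip.upcase
  shard_id = SHARD_DIRECTORY.fetch(code) do
    raise UnknownCountryError, "no shard mapped for country_code #{code.inspect}"
  end
  SHARD_NAMES.fetch(shard_id)
end

puts shard_name_for("kor") # prints shard_one
```

Failing loudly on an unmapped country_code is a design choice for this sketch. A real system might instead route unknown countries to a default shard.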
  • 9. 2. When a new user signs up, we need to choose the correct shard based on the user’s country_code. We can do this by using the connected_to method to connect to the correct shard and then create the user.
3. For user sign-in, we again need to choose the correct shard, this time to fetch the user’s data from the database.
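The sign-up and sign-in flow can be illustrated with a small in-memory stand-in, where each shard is a Ruby hash rather than a real database. This is a sketch under stated assumptions: the class ShardedUserStore and its methods are hypothetical, and in a Rails application the reads and writes would instead run inside a block passed to ActiveRecord::Base.connected_to(role: ..., shard: ...).

```ruby
# Directory from Step 2: country_code => shard_id.
COUNTRY_SHARD_IDS = { "KOR" => 1, "THA" => 2, "MYS" => 3 }.freeze

# In-memory stand-in for the shard databases (hypothetical, for illustration).
class ShardedUserStore
  def initialize(directory)
    @directory = directory
    # shard_id => { email => user attributes }, created lazily per shard.
    @shards = Hash.new { |shards, shard_id| shards[shard_id] = {} }
  end

  # Looks up the shard_id that owns the given country_code.
  def shard_id_for(country_code)
    @directory.fetch(country_code) do
      raise KeyError, "no shard mapped for country_code #{country_code.inspect}"
    end
  end

  # Sign-up: write the new user to the shard chosen by country_code.
  def sign_up(email:, country_code:)
    shard = @shards[shard_id_for(country_code)]
    raise ArgumentError, "#{email} is already registered" if shard.key?(email)

    shard[email] = { email: email, country_code: country_code }
  end

  # Sign-in: read the user back from the same shard; nil when not found.
  def find_user(email:, country_code:)
    @shards[shard_id_for(country_code)][email]
  end
end

store = ShardedUserStore.new(COUNTRY_SHARD_IDS)
store.sign_up(email: "mina@example.com", country_code: "KOR")
user = store.find_user(email: "mina@example.com", country_code: "KOR")
puts user[:email]
```

Note that sign-in needs the country_code as well as the email, because the directory is keyed on country_code. Without it, the application would have to query every shard or keep a separate index from email to shard.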
  • 10. It’s important to note that the above is a simplified example. In a real-world scenario, you would need to consider additional factors such as data consistency, replication, and failover.
After all this explanation, you might be asking yourself: “Should I shard my database or not?” Keep reading to learn when sharding is worth considering!
Factors to be considered before Sharding
Consider the following factors before deciding to shard your database:
- Database Size: Sharding is typically used for large databases that have outgrown the capacity of a single server.
- Traffic Patterns: If your database experiences uneven traffic patterns, sharding may be beneficial.
- Growth Projections: If your database is expected to scale significantly in the future, sharding may be a good option.
- Complexity: Sharding adds complexity to your database architecture and requires careful planning and maintenance.
- Cost: Sharding can be expensive, as it requires additional hardware resources and infrastructure to support multiple servers.
Note: Sharded databases can increase latency, because a separate routing service is needed to direct each query to the right shard. They also add maintenance overhead: the shards and additional nodes need upkeep, and data updates must be kept in sync if replication is used.
So, database sharding has both its perks and challenges, and you can decide if it suits your application’s needs. To know more about database management, reach out to us at Nitor Infotech.
  • 13. © 2024 Nitor Infotech. All rights reserved.