10053 – NULL is not “NOTHING”
“How many times have you ignored NULL and replaced it with a special value?”
Have you ever done that before? Or maybe you are doing it right now?
Please, stop it now! Take a seat and read this small example before you continue what you are doing.
Mr. A works for a research company and his job is to enter employees’ details into the database. He is
working with the PERSON table. These are the details of that table:
- It has 10,000 rows and 4 columns.
- Column NAME (varchar) has 10,000 distinct values; it holds the employee’s name.
- Column BIRTH (number) has 12 distinct values; it represents the employee’s month of birth.
- Column ID (number) has 10,000 distinct values; it holds the employee’s ID.
- Column CATEGORY (number) has 1,000 distinct values; it holds the employee’s category.

Here is the situation:
- Mr. A finds that 5 rows in CATEGORY do not have any number yet (it should be a value between
  10,001 and 11,000).
- Since his manager is not in the office, Mr. A decides to store “0” for those undefined employees’
  categories (“0” is the special value in this case).
- As a result, column CATEGORY now has 1,001 distinct values.

Is he doing something right, or wrong?
Let’s answer that question using the example below.
P.S. Another good example: people use “1 January 1900” as a date of birth. Who has lived for 100 years now?
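For readers who want to follow along, here is a minimal sketch of how such a test table could be built. The exact DDL is not part of the original write-up, so the index name and the data-generation logic below are my assumptions:

CREATE TABLE person (
  name     VARCHAR2(30),   -- 10,000 distinct values
  birth    NUMBER,         -- 12 distinct values (month of birth)
  id       NUMBER,         -- 10,000 distinct values
  category NUMBER          -- values 10,001..11,000; 5 rows get the special value
);

-- Generate 10,000 rows; the first 5 get the "special value" 0 in CATEGORY
INSERT INTO person
SELECT 'EMP_' || LPAD(level, 5, '0'),
       MOD(level, 12) + 1,
       level,
       CASE WHEN level <= 5 THEN 0
            ELSE 10000 + MOD(level, 1000) + 1 END
FROM   dual
CONNECT BY level <= 10000;

COMMIT;

CREATE INDEX person_cat_idx ON person (category);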
Start the Exercise
I ran 15 scenarios for this exercise and the comparison table can be found below. Later I will go through the
scenarios one by one. In general, there are 3 categories of scenarios:
1. SELECT * FROM person WHERE category <= 10001
   → Query with the “special value” in the range.
2. SELECT * FROM person WHERE category >= 11000
   → Query without the “special value” in the range, with a bounded, closed predicate.
3. SELECT * FROM person WHERE category BETWEEN 10999 AND 11000
   → Query without the “special value” in the range, with a bounded, closed–closed predicate.
The main objective of this exercise is to show how Oracle calculates cardinality in the following situations:
- When there are no statistics (to see the impact when the dynamic sampling feature is turned on and off)
- When there is a “special value” (I use “0” as the “special value”) that sits at an extreme distance from the
  lower bound of the real data
- When there is a histogram on the column
- The impact of the histogram’s bucket count

Attachments: autotrace outputs.zip, 10053 trace files.zip
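For anyone who wants to reproduce these artifacts: autotrace and the 10053 event can be enabled as below. These are standard SQL*Plus and Oracle commands; the author's exact session settings are not shown in the article:

-- Autotrace in SQL*Plus
SET AUTOTRACE TRACEONLY EXPLAIN STATISTICS

-- 10053 optimizer trace, written during the next hard parse
ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';
SELECT * FROM person WHERE category <= 10001;
ALTER SESSION SET EVENTS '10053 trace name context off';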
The Scenarios
Category 1, query with less than or equal
1. With Special Value and Without Statistics.
Without statistics on the table, Oracle uses dynamic sampling to gather statistics at run time (dynamic
sampling is enabled by default). This can be seen in the autotrace output or in the 10053 trace file. With only
10K rows in the table, the sample size is 100% (all rows). During the dynamic sampling process, Oracle runs the
following query, which gives the correct answer in this case. For a bigger table, things can be different.
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB)
opt_param('parallel_execution_enabled', 'false') NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */
NVL(SUM(C1),0), NVL(SUM(C2),0) FROM (SELECT /*+ IGNORE_WHERE_CLAUSE NO_PARALLEL("PERSON")
FULL("PERSON") NO_PARALLEL_INDEX("PERSON") */ 1 AS C1, CASE WHEN "PERSON"."CATEGORY"<=10001 THEN 1
ELSE 0 END AS C2 FROM "PERSON" "PERSON") SAMPLESUB;
NVL(SUM(C1),0) NVL(SUM(C2),0)
-------------- --------------
         10000             15

[Trace excerpt: dynamic sampling level, sample size (in % and number of rows), and the original and
computed cardinality after the predicate is applied.]
In the above example, Oracle calculates the cardinality perfectly when the dynamic sampling feature is turned
on. As a comparison, when dynamic sampling is disabled, the calculation is far from perfect. As can be seen in
the example below, the cardinality goes up to 176 and the cost of a full table scan (11.22) is very close to the
cost of an index scan (10.01). This is critical, since Oracle can easily switch to a full table scan for a different
predicate (in this example, the index range scan is more efficient).
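For reference, dynamic sampling can be switched off either for the whole session or per query. Both forms below are standard Oracle syntax, though the exact statements used for this test are not shown:

-- Session level
ALTER SESSION SET optimizer_dynamic_sampling = 0;

-- Per query, via a hint
SELECT /*+ dynamic_sampling(person 0) */ *
FROM   person
WHERE  category <= 10001;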

Attachments: nostats_nodyn.LST, orcl10_ora_6584_nostats_nodyn.trc

Lastly, when the table is big (has a lot of rows), the sample size can be different. In the next example, the
same query is executed against a table with 10,000,000 rows in it; based on its calculation, Oracle uses a
full table scan instead of the more efficient index range scan, and the cardinality calculation is wrong.

Attachments: nostats_big.LST, orcl10_ora_4008_nostats_big.trc
2. With Special Value and Statistics but Without Histogram.
In the second scenario, I gather both table and index statistics.
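A typical call for gathering table and index statistics together looks like the sketch below; the exact parameters used in the article are an assumption on my part:

BEGIN
  dbms_stats.gather_table_stats(
    ownname => user,
    tabname => 'PERSON',
    cascade => TRUE   -- also gather statistics on the table's indexes
  );
END;
/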

In the absence of a histogram on the column, as we can see above, the density is simply 1/num_distinct.
Another interesting fact is that Oracle decides to use all rows as the sample (the same effect as when we use
estimate_percent => 100 in dbms_stats.gather_table_stats).
The cardinality calculation in this scenario is the worst of all: the computed cardinality is 9,102 and Oracle uses
a full table scan as the access method. This happens because of skewed data in column CATEGORY (the data
is not evenly distributed).
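The density and distinct-value count can be verified straight from the data dictionary. This query is standard Oracle and not part of the original article:

SELECT column_name, num_distinct, density, histogram
FROM   user_tab_col_statistics
WHERE  table_name  = 'PERSON'
AND    column_name = 'CATEGORY';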

We can create a histogram on the skewed column so that Oracle is able to calculate the cardinality
better than before.
3. With Special Value, Statistics and Histogram of Bucket 50.
I will combine the explanations of scenarios 3 and 4, since the only difference is the number of buckets in the
histogram.
4. With Special Value, Statistics and Histogram of Bucket 250.
In these scenarios, I create a histogram on the CATEGORY column with 50 and 250 buckets. With 50 buckets,
Oracle calculates the cardinality as 200 (10,000 * 0.019962), and with 250 buckets, Oracle calculates the
cardinality as 40 (10,000 * 0.0039988); the values in brackets are taken from the 10053 trace file. So with
more buckets, the calculated cardinality tends to get closer to the actual number of filtered rows.
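A histogram with a chosen bucket count can be created through method_opt; the syntax below is standard, though the author's exact call is not shown:

BEGIN
  dbms_stats.gather_table_stats(
    ownname    => user,
    tabname    => 'PERSON',
    method_opt => 'FOR COLUMNS category SIZE 50'  -- use SIZE 250 for scenario 4
  );
END;
/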
For the histogram with 50 buckets, the calculation of selectivity and cardinality is as follows:
Selectivity = ((required range) / (high value – low value) + density) / number of buckets
            = ((10,001 – 0) / (10,020 – 0) + 0.0009993) / 50
            = (0.9981038 + 0.0009993) / 50
            = 0.9991031 / 50
            = 0.019982
Cardinality = selectivity * number of rows
            = 0.019982 * 10,000
            = 199.82 ≈ 200
When the number of buckets is increased to 250, the calculated cardinality is 39.96. Here are the details:
Selectivity = ((required range) / (high value – low value) + density) / number of buckets
            = ((10,001 – 0) / (10,020 – 0) + 0.0009993) / 250
            = (0.9981038 + 0.0009993) / 250
            = 0.9991031 / 250
            = 0.003996
Cardinality = selectivity * number of rows
            = 0.003996 * 10,000
            = 39.96 ≈ 40
In the above 2 scenarios (when there are buckets), Oracle correctly uses an index range scan as the access
method.

[Trace excerpt: the computed cardinality and selectivity values from the 10053 trace.]

5. With NULL and Statistics but Without Histogram.
Next, we update the “0” records to NULL (the good and recommended way to store an undefined value). This does not create a huge gap in the data distribution.
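The update itself is a one-liner; this is a sketch, since the exact statement is not shown in the article (the statistics must be re-gathered afterwards so the optimizer sees the change):

UPDATE person SET category = NULL WHERE category = 0;
COMMIT;

-- Refresh statistics so NUM_NULLS and the low value are up to date
EXEC dbms_stats.gather_table_stats(user, 'PERSON');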
Now we can see that Oracle calculates the cardinality perfectly and also chooses an index range scan as the access
method. The output of the 10053 trace file is as below; since there is no histogram on the column, the density
is 1/num_distinct = 1/1,000 = 0.001.

Selectivity = ((10,001 – 10,001) / (11,000 – 10,001) + 0.001)
= (0 + 0.001)
= 0.001
Cardinality = 0.001 * 10,000
= 10
6. With NULL, Statistics and Histogram of Bucket 50.
If we have evenly distributed data with a normal distribution and no popular values, a histogram is not
needed on that kind of column, because in most cases Oracle is able to calculate the cardinality
correctly. In this scenario, a histogram is created with 50 buckets and Oracle calculates the cardinality
perfectly. But if we look at the output of the 10053 trace file, Oracle uses 0.001 as the selectivity, since
it thinks the predicate is out of range.
So, the cardinality will be 0.001 * 10,000 = 10.

For comparison, let’s try one more query with a different predicate, for example 10,019. When there is a
histogram with 50 buckets, we can calculate the selectivity and cardinality as follows:
Selectivity = ((10,019 – 10,001) / (10,021 – 10,001) + 0.001) / 50
            = ((18/20) + 0.001) / 50 = 0.01802
Cardinality = 0.01802 * 10,000 = 180.2 ≈ 180
When we don’t have a histogram, the calculation is as below. Both results are close to the real
number of returned rows (185).
Selectivity = ((10,019 – 10,001) / (11,000 – 10,001) + 0.001)
            = ((18/999) + 0.001) = 0.01902
Cardinality = 0.01902 * 10,000 = 190.2 ≈ 190

Category 2, query with greater than or equal
7. With Special Value and Statistics but Without Histogram.
8. With Special Value, Statistics and Histogram of Bucket 50.
9. With Special Value, Statistics and Histogram of Bucket 250.
I will combine the explanation of the above 3 scenarios (7–9) here. In this query (where category >=
11000) there is no “special value”, and the predicate is at the upper bound of the range, so Oracle uses the
prorated density as the selectivity; therefore I don’t need to show how the selectivity and cardinality are calculated.
Now, let’s take another predicate to simulate the calculation, for example 10,995. The calculation
when there is no histogram will be:
Selectivity = ((11,000 – 10,995) / (11,000 – 10,001) + 0.001)
            = ((5/999) + 0.001) = 0.006005
Cardinality = 0.006005 * 10,000 = 60.05 ≈ 60
And here is the calculation when we create a histogram with 50 buckets:
Selectivity = ((11,000 – 10,995) / (11,000 – 10,981) + 0.001) / 50
            = ((5/19) + 0.001) / 50
            = 0.264158 / 50
            = 0.005283
Cardinality = 0.005283 * 10,000 = 52.83 ≈ 53

So, similar to scenario 6: when we have well-distributed data in a column, we don’t need to
create any histogram on it. Just leave it alone and Oracle will do the job nicely.
Category 3
10. With Special Value and Statistics but Without Histogram.
11. With Special Value, Statistics and Histogram of Bucket 50.
12. With Special Value, Statistics and Histogram of Bucket 250.
13. With NULL and Statistics but Without Histogram.
14. With NULL, Statistics and Histogram of Bucket 50.
15. With NULL, Statistics and Histogram of Bucket 250.
The category 3 scenarios are not very relevant to the objective of this exercise (NULL is not
“Nothing”); they are here only to show that when there is no “special value” in the range of
the predicate, a histogram does not have a significant impact on the cardinality and selectivity
calculation, and again it looks like the result is better without a histogram.
The comparison table below shows that when we don’t have a histogram on the column, the result is close to
the real number of rows, and since the “special value” is out of range, it doesn’t have any impact either.

Conclusion
1. NULL is not “Nothing”; it is something and was created for a purpose, so please do not be afraid to use
   it in your tables. The optimizer also includes NULL in its calculation of selectivity and cardinality.
2. Oracle’s dynamic sampling feature is good enough for a “small” table or index that has no
   statistics. For a bigger table, the result is unpredictable and can produce a wrong execution plan.
3. A histogram can help Oracle choose a better execution plan for a query by providing a better
   density value. Depending on the presence of a histogram, selectivity is calculated as either
   1/num_distinct or the density.
4. The relation between the number of buckets in the histogram and the calculated cardinality is roughly
   linear: the more buckets, the better the cardinality estimate. The number of buckets in a histogram is
   hard-limited to 254.

5. Another popular bad habit is storing date values in a varchar column (for example, storing dates as
   strings formatted like YYYYDDMM). I would like to do that exercise as well, but it would be great if
   someone has already done it.
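As a quick illustration of why that habit hurts (my example, not taken from the article): with a YYYYDDMM layout, string order does not match chronological order, so range predicates quietly go wrong:

-- In YYYYDDMM, 2 April 2025 is '20250204' and 3 February 2025 is '20250302'.
-- As strings, '20250204' < '20250302', yet 2 April is the later date,
-- so a range predicate on such a column no longer means what it looks like.
SELECT CASE WHEN '20250204' < '20250302'
            THEN 'string order contradicts date order' END AS demo
FROM   dual;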
-heri-
