
Sunday, January 31, 2016

SQL - COALESCE() Function

SQL - COALESCE() - This function plays a very important role in SQL: it takes any number of parameters and returns the first value encountered that isn't NULL. All parameters passed to the function should be of compatible data types. If all parameters are NULL, COALESCE returns NULL, and at least one of the null values must be a typed NULL.
COALESCE is an ANSI-standard function that is internally translated to a CASE expression, which has the advantage that we know exactly what to expect when using it.
COALESCE is harder to spell, but at least it doesn't lead to incorrect results.
SQL Syntax
COALESCE ( expression [ ,...n ] )
Arguments
Expression: An expression of any type.
Return Types: Returns the data type of the expression with the highest data type precedence.
Note: COALESCE requires at least two expressions. If the passed parameters are not all of the same data type, they are implicitly cast to an appropriate type using the data type order of precedence.


To understand the powerful features of the COALESCE function, let's walk through an example. We are going to create a temporary table to store user comments on various topics along with their contact numbers.

USE tempdb
GO

---- CREATE TEMP TABLE FOR COMMENTS
CREATE TABLE #UserComments
(
UserId Int,
Topic varchar(200),
Comments varchar(500),
Phone1 Varchar(12),
Phone2 Varchar(12),
Phone3 Varchar(12)
)

---- insert data into comments table
INSERT INTO #UserComments(UserId,Topic,Comments,Phone1,Phone2,Phone3)
VALUES(1,'Population','Population could be the biggest problem for the world','1245679870',NULL,'9852631450'),
(2,'Pollution','Now a day, people are suffering lots of disease due to Pollution',NULL,'6758901243',NULL),
(3,'Population','India and China is the biggest country for the Population','9321456235',NULL,NULL),
(4,'Pollution','There are many type of the Pollution',NULL,NULL,'8856321456')


----- pull the records from the comments table
SELECT UserId,Topic,Comments,
Phone1,Phone2,Phone3
FROM #UserComments;

UserId | Topic      | Comments                                                          | Phone1     | Phone2     | Phone3
-------|------------|-------------------------------------------------------------------|------------|------------|-----------
1      | Population | Population could be the biggest problem for the world             | 1245679870 | NULL       | 9852631450
2      | Pollution  | Now a day, people are suffering lots of disease due to Pollution  | NULL       | 6758901243 | NULL
3      | Population | India and China is the biggest country for the Population         | 9321456235 | NULL       | NULL
4      | Pollution  | There are many type of the Pollution                              | NULL       | NULL       | 8856321456


Return the First Non-NULL Value
On the basis of the user comments table, we can get any available contact number for each user id, because the COALESCE function can take n arguments and will return the first non-NULL value among them, as given below:
---- Pull the contact number for the user
select
UserId,
Topic,
ContactNumber=Coalesce(Phone1,Phone2, Phone3)
from
#UserComments

UserId | Topic      | ContactNumber
-------|------------|--------------
1      | Population | 1245679870
2      | Pollution  | 6758901243
3      | Population | 9321456235
4      | Pollution  | 8856321456
Before COALESCE is run, SQL Server checks what data types it is comparing, for example INT, INT, INT and DATETIME. It decides that they should all be processed as the type with the highest precedence (DATETIME in that case) and implicitly converts them before evaluating the function.
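As a hedged, runnable sketch of the same first-non-NULL behaviour, the query above can be reproduced in SQLite through Python's sqlite3 module (SQLite stands in for SQL Server here, and the table and data are abbreviated from the example):

```python
import sqlite3

# Reproduce the COALESCE(Phone1, Phone2, Phone3) lookup from the example
# above in SQLite (standing in for SQL Server).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE UserComments (UserId INT, Phone1 TEXT, Phone2 TEXT, Phone3 TEXT)")
conn.executemany(
    "INSERT INTO UserComments VALUES (?, ?, ?, ?)",
    [
        (1, "1245679870", None, "9852631450"),   # first phone present
        (2, None, "6758901243", None),           # only the second phone present
        (4, None, None, "8856321456"),           # only the third phone present
    ],
)
rows = conn.execute(
    "SELECT UserId, COALESCE(Phone1, Phone2, Phone3) AS ContactNumber FROM UserComments"
).fetchall()
print(rows)  # [(1, '1245679870'), (2, '6758901243'), (4, '8856321456')]
```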

Using COALESCE to build comma-separated values
Using the same table, we can use the COALESCE function to build a comma-separated list of values, as given below:
---- declare local variable
DECLARE @UserComments VARCHAR(1000)

----- get the vales into variable
SELECT @UserComments = COALESCE(@UserComments,'') + Comments + ',' 
FROM #UserComments
WHERE (Topic = 'Population')

----- pull value from variable
SELECT @UserComments AS UserComments
UserComments
Population could be the biggest problem for the world.,India and China is the biggest country for the Population.,
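Note that the variable-concatenation trick above leaves a trailing comma and relies on undocumented behaviour; on SQL Server 2017 and later, STRING_AGG is the supported way to build such lists. For comparison, the same list can also be built in application code; a minimal sketch with SQLite standing in for SQL Server:

```python
import sqlite3

# Build the comma-separated Population comments in application code
# rather than via the COALESCE variable-concatenation trick.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE UserComments (Topic TEXT, Comments TEXT)")
conn.executemany("INSERT INTO UserComments VALUES (?, ?)", [
    ("Population", "Population could be the biggest problem for the world"),
    ("Pollution", "There are many type of the Pollution"),
    ("Population", "India and China is the biggest country for the Population"),
])
comments = [row[0] for row in conn.execute(
    "SELECT Comments FROM UserComments WHERE Topic = 'Population'")]
user_comments = ",".join(comments)  # join() adds no trailing comma
print(user_comments)
```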

Conclusion
COALESCE allows multiple items to be compared in one statement. COALESCE() most often appears within a very specific context, such as in a query, view, or stored procedure. COALESCE is a powerful tool if you are returning numerous values to a user and want to substitute occurrences of NULL with values from a different column or with an expression.

Wednesday, January 20, 2016

DW - Full load & Incremental or Refresh load

Data is a business asset which is used to make the right business decisions at the right time. It is very important to design our database correctly, up to whatever normal form we can bear.
While OLAP cubes are still common across the BI technology marketplace, data warehousing has definitely seen growth as it serves consumers in more modern and complete ways. In the current business world, every organization is spending a lot of money on BI technologies for quick access to valuable data, because they believe that data visualisation is the key to success in making the right decisions at the right time.

So, choosing a loading strategy has become very important, sometimes even treated as a do-or-die situation, and we need to apply well-proven approaches based on real cases. Here we highlight the importance of using an incremental load and a change data capture technique for loading data into the dashboards for the end users.
ETL tools such as SQL Server Warehouse Builder have the ability to set table loading to insert/update, which supports both full load and incremental load with the same ETL routines. The key to supporting this is that staging tables are created in the staging layer and joined to the source tables based upon the update or create dates of each record.
What is Incremental Load?
As we know, data is a very important factor for an organisation in making the right business decision at the right time. Incremental load is the best option and a widely used method to load data into data warehouses from the respective source systems. This technique is employed to perform faster loads in less time while utilizing fewer system resources. To understand incremental load functionality, we can take the example of a daily product hierarchy sales chain, where we put staging tables between the source and the destination.
In this scenario, the staging tables are always refreshed at the beginning of the ETL process and then loaded from the source files after proper data transformation. In the last step of the ETL process, the target tables are filled from the staging tables based on the following conditions:

  1. If a record matches one in the target table, the matching data will be updated.
  2. If a record does not match any in the target tables, it will be treated as new and appended to the target tables.

  
Incremental data loading is the most common approach to capturing ongoing changes while keeping historical data on a predefined schedule. In the incremental approach, the key to supporting this is a staging table created in the staging layer, joined to the source tables based upon the update or create dates of each record.
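The update-if-matched / insert-if-new step can be sketched as follows. This is an illustrative upsert in SQLite via Python (SQL Server itself would typically use a MERGE statement); the table and column names are made up for the example:

```python
import sqlite3

# Incremental load: staging rows update matching target rows and
# append non-matching ones (an "upsert").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (ProductId INTEGER PRIMARY KEY, Sales REAL)")
conn.execute("CREATE TABLE staging (ProductId INTEGER PRIMARY KEY, Sales REAL)")
conn.execute("INSERT INTO target VALUES (1, 100.0), (2, 200.0)")
conn.execute("INSERT INTO staging VALUES (1, 150.0), (3, 300.0)")  # 1 changed, 3 new

conn.execute("""
    INSERT INTO target (ProductId, Sales)
    SELECT ProductId, Sales FROM staging WHERE true
    ON CONFLICT(ProductId) DO UPDATE SET Sales = excluded.Sales
""")
rows = conn.execute("SELECT ProductId, Sales FROM target ORDER BY ProductId").fetchall()
print(rows)  # [(1, 150.0), (2, 200.0), (3, 300.0)]
```

The `WHERE true` is SQLite's documented workaround for a parsing ambiguity when combining INSERT ... SELECT with ON CONFLICT.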

What is Full Load?
In this technique, there is no staging area; the target tables are always truncated before all the data is loaded from the source raw data files. This technique is very time consuming and historical changes are easily lost, but it guarantees that the target matches the source. Full data loading is also known as full refresh loading.

To know more, click on 
  1. Data Warehouse Architecture and Multidimensional Model
  2. Data Warehouse - Dimension tables.
  3. Data Warehouse - Fact tables.
  4. Data Warehouse - Conceptual Modeling.
  5. Data Warehouse - Star schema.
  6. Data Warehouse - Snowflake schema.
  7. Data Warehouse - Fact constellations
  8. Collaboration of OLTP and OLAP systems
  9. Major differences between OLTP and OLAP
  10. Data Warehouse - Multidimensional Cube 

Friday, December 25, 2015

SQL - LOG() Mathematical Function


SQL LOG()
This function is used to calculate the natural logarithm of a number in SQL (the optional base argument was introduced with SQL Server 2012). In a very simple manner, we can say that the LOG function computes the logarithm of an expression. It plays a very important role in data analytics.

Syntax
We can use this function by using below syntax-
LOG (Float Expression [, Base ] )
Base
Optional integer argument that sets the base for the logarithm. When you do not specify a value, the function computes the natural logarithm of the expression, using e as the base, where e is equal to 2.718281828459.
Float Expression
A numeric expression which must be greater than zero. When the value is equal to or less than zero, LOG raises an invalid floating point operation error.
Return Types: The important thing is that it always returns a float value.
Remarks: By default, LOG() returns the natural logarithm. Users can change the base of the logarithm to another value using the optional base parameter.
The natural logarithm of the exponential of a number is the number itself: LOG( EXP( n ) ) = n. And the exponential of the natural logarithm of a number is the number itself:
EXP( LOG( n ) ) = n.
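These identities are easy to check numerically; a quick sketch with Python's math module, whose math.log is the natural logarithm just like T-SQL's single-argument LOG:

```python
import math

n = 10.24
# LOG(EXP(n)) = n and EXP(LOG(n)) = n, up to floating-point rounding.
print(math.isclose(math.log(math.exp(n)), n))  # True
print(math.isclose(math.exp(math.log(n)), n))  # True
```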
Calculating the logarithm
You can easily calculate the log for the specified float expression as given below:


--- declare float variable and set value
Declare @inpFloat float = 10.24
--- declare int variable and set value
Declare @inpInt int = 10
--- declare decimal variable and set value
Declare @inpDecimal decimal(10,2) = 10.24

---- select value from the log function
SELECT LOG(@inpFloat) as 'LOG Value from Float',
LOG(@inpInt) as 'LOG Value from Int',
LOG(@inpDecimal) as 'LOG Value from Decimal'

LOG Value from Float | LOG Value from Int | LOG Value from Decimal
---------------------|--------------------|-----------------------
2.32630162           | 2.302585093        | 2.32630162

--- declare base
Declare @base int = 2

---- select value from the log function
SELECT LOG(@inpFloat, @base) as 'LOG Value from Float',
LOG(@inpInt, @base) as 'LOG Value from Int',
LOG(@inpDecimal, @base) as 'LOG Value from Decimal'

LOG Value from Float | LOG Value from Int | LOG Value from Decimal
---------------------|--------------------|-----------------------
3.35614381           | 3.321928095        | 3.35614381


Now we can see the LOG function's two forms of result: one without a base value (the natural logarithm) and one with a base value.
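The two-argument form is just the change-of-base rule, LOG(x, b) = LOG(x) / LOG(b). We can confirm this with Python's math module (math.log also takes an optional base, like T-SQL LOG since SQL Server 2012):

```python
import math

x = 10.24
print(round(math.log(x), 8))     # natural logarithm of 10.24
print(round(math.log(x, 2), 8))  # base-2 logarithm of 10.24
# Change-of-base rule behind the optional base argument:
print(math.isclose(math.log(x, 2), math.log(x) / math.log(2)))  # True
```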

Conclusion
As we already know, this function plays an important role in data-analytics calculations. Beyond that, logarithms are used by navigators, scientists, engineers, and others to perform computations more easily. In public-key cryptography applications, the discrete logarithm is used to generate keys.

Sunday, October 25, 2015

SQL – LEAD function

The LEAD function was introduced in SQL Server 2012 in the group of analytic functions, which are very helpful in making analytics in T-SQL a possibility and add real value from a BI perspective. The LEAD function is used for accessing subsequent rows (or columns from a subsequent row) along with the current row, which can make certain operations that used to be done in a multi-step fashion more efficient.
Fundamentally, the LEAD function accesses data from a subsequent row in the same result set without the use of a self-join. Use this analytic function in a SELECT statement to compare values in the current row with values in a following row.

How to use LEAD function
The LEAD function is non-deterministic. Non-deterministic functions may return different results each time they are called with a specific set of input values, even if the database state that they access remains the same. We can use this function with the syntax below:
LEAD (scalar_expression [,offset] [,default])
    OVER ( [ partition_by_clause ] order_by_clause )

Arguments
scalar_expression
Within this parameter, we can specify a scalar expression or column name whose value from the subsequent row is to be returned.
Scalar_expression cannot be an analytic function.

offset, default
Within these parameters, we can specify an offset to access not only the next immediate row but any row after the current row. The default offset of 1 accesses the next immediate row, whereas a value of 3 accesses the third row after the current row. The default argument is the value to return when the offset goes beyond the scope of the partition; if it is not specified, NULL is returned.

OVER ( [ partition_by_clause ] order_by_clause)
Partition_by_clause divides the result set produced by the FROM clause into partitions to which the function is applied. If not specified, the function treats all rows of the query result set as a single group. Order_by_clause determines the order of the data before the function is applied.

Return Types
The data type of the specified scalar_expression. NULL is returned if scalar_expression is nullable or default is set to NULL.

LEAD is flexible in letting us specify, via the offset parameter, the number of rows to move forward from the current row. To better understand this analytic function, we can take an example of calculating the daily performance of a stock, where we need the current day's value divided by the next day's value as shown below:

Performance= (Current Day Value/Next Day’s Value)-1

Now we need to get the next day's value, and the LEAD function is capable of fulfilling this requirement. But first we need to look at our data table and the position of the actual data in the table before using the LEAD function.

USE TEMPDB
GO

---- Declare Table variable for Daily Stock Values
DECLARE @TableStock Table
(
StockId int,
StockName Varchar(20),
StockDate Date,
StockValue real
)
---- Insert Values in the stock Table variable
INSERT INTO @TableStock (StockId, StockName, StockDate, StockValue)
VALUES
(101, 'StockName S1', '2015-10-23', '546.56'),
(101, 'StockName S1', '2015-10-22', '544.22'),
(101, 'StockName S1', '2015-10-21', '543.50'),
(101, 'StockName S1', '2015-10-20', '544.75'),
(101, 'StockName S1', '2015-10-19', '543.78')

---- Values in the Table Variable
SELECT StockId, StockName, StockDate, StockValue FROM @TableStock
StockId | StockName    | StockDate  | StockValue
--------|--------------|------------|-----------
101     | StockName S1 | 10/23/2015 | 546.56
101     | StockName S1 | 10/22/2015 | 544.22
101     | StockName S1 | 10/21/2015 | 543.50
101     | StockName S1 | 10/20/2015 | 544.75
101     | StockName S1 | 10/19/2015 | 543.78



Get the next day's value with the help of the LEAD function as given below:

---- USE CTE to get the Next Day's Stock Value
;WITH CTE AS
(
SELECT StockId, StockName, StockDate, StockValue,
---- LEAD Function to get Next Day's Value
NextDayValue=LEAD(StockValue,1,0) Over (ORDER BY StockDate)
FROM @TableStock
)

----- Values in CTE table
SELECT StockId,
StockName, StockDate,
CurrentValue=StockValue,
NextDayValue FROM CTE
ORDER BY StockDate DESC;

StockId | StockName    | StockDate  | CurrentValue | NextDayValue
--------|--------------|------------|--------------|-------------
101     | StockName S1 | 10/23/2015 | 546.56       | 0.00
101     | StockName S1 | 10/22/2015 | 544.22       | 546.56
101     | StockName S1 | 10/21/2015 | 543.50       | 544.22
101     | StockName S1 | 10/20/2015 | 544.75       | 543.50
101     | StockName S1 | 10/19/2015 | 543.78       | 544.75


Get the Daily performance now as shown below:

---- USE CTE to get the Next Day's Stock Value
;WITH CTE AS
(
SELECT StockId, StockName, StockDate, StockValue,
---- LEAD Function to get Next Day's Value
NextDayValue=LEAD(StockValue,1,0) Over (ORDER BY StockDate)
FROM @TableStock
)

----- Performance Calculation in CTE table
SELECT StockId,
StockName, StockDate,
CurrentValue=StockValue,
NextDayValue,
---- If Next day value is 0 then set current value
Performance=(StockValue/ (Case when NextDayValue=0 then StockValue else NextDayValue end) )-1
FROM CTE
ORDER BY StockDate DESC;

StockId | StockName    | StockDate  | CurrentValue | NextDayValue | Performance
--------|--------------|------------|--------------|--------------|------------
101     | StockName S1 | 10/23/2015 | 546.56       | 0.00         | 0.000000
101     | StockName S1 | 10/22/2015 | 544.22       | 546.56       | -0.004281
101     | StockName S1 | 10/21/2015 | 543.50       | 544.22       | -0.001323
101     | StockName S1 | 10/20/2015 | 544.75       | 543.50       | 0.002300
101     | StockName S1 | 10/19/2015 | 543.78       | 544.75       | -0.001781

With the help of the LEAD function, we can get any subsequent value by day, month, year, or anything else you want to use in your analytics calculations.
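The same look-ahead calculation can be sketched outside SQL Server as well; here is an illustrative version using SQLite's window functions (SQLite 3.25+, as bundled with recent Python) with a trimmed-down stock table:

```python
import sqlite3

# LEAD(StockValue, 1, 0) over StockDate, then the performance formula
# (current / next) - 1, guarding the last row where LEAD falls back to 0.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Stock (StockDate TEXT, StockValue REAL)")
conn.executemany("INSERT INTO Stock VALUES (?, ?)", [
    ("2015-10-23", 546.56), ("2015-10-22", 544.22), ("2015-10-21", 543.50),
    ("2015-10-20", 544.75), ("2015-10-19", 543.78),
])
rows = conn.execute("""
    SELECT StockDate, StockValue,
           LEAD(StockValue, 1, 0) OVER (ORDER BY StockDate) AS NextDayValue
    FROM Stock ORDER BY StockDate DESC
""").fetchall()
for stock_date, value, next_value in rows:
    base = next_value if next_value != 0 else value  # last row has no next day
    print(stock_date, round(value / base - 1, 6))
```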
Query at a Glance

USE TEMPDB
GO
---- Declare Table variable for Daily Stock Values
DECLARE @TableStock Table
(
StockId int,
StockName Varchar(20),
StockDate Date,
StockValue real
)

---- Insert Values in the stock Table variable
INSERT INTO @TableStock (StockId, StockName, StockDate, StockValue)
VALUES
(101, 'StockName S1', '2015-10-23', '546.56'),
(101, 'StockName S1', '2015-10-22', '544.22'),
(101, 'StockName S1', '2015-10-21', '543.50'),
(101, 'StockName S1', '2015-10-20', '544.75'),
(101, 'StockName S1', '2015-10-19', '543.78')

---- Values in the Table Variable
SELECT StockId, StockName, StockDate, StockValue FROM @TableStock

---- USE CTE to get the Next Day's Stock Value
;WITH CTE AS
(
SELECT StockId, StockName, StockDate, StockValue,
---- LEAD Function to get Next Day's Value
NextDayValue=LEAD(StockValue,1,0) Over (ORDER BY StockDate)
FROM @TableStock
)

----- Performance Calculation in CTE table
SELECT StockId,
StockName, StockDate,
CurrentValue=StockValue,
NextDayValue,
---- If Next day value is 0 then set current value
Performance=(StockValue/ (Case when NextDayValue=0 then StockValue else NextDayValue end) )-1
FROM CTE
ORDER BY StockDate DESC;



So, the LEAD function can really help in looking ahead at subsequent records to compute differences between records in a single T-SQL statement that is more readable and logical than previous methods for gathering this information. Learn more about other features of SQL in the related posts.