Data Science Interview Questions
What do you understand from Measures and Dimensions?
Each field from the data source is automatically assigned a
datatype (such as string, integer) and a role (dimension or
measure)
Aggregation applied on measures is ‘Sum’ by default but you
can always change the default aggregation in the settings
Can you solve?
You have two buckets, one of 3 liters and the other of 5 liters. You are expected to measure exactly 4 liters. How will you complete the task?
Note: There is no third bucket
Step 1: Fill the 5 liter bucket and pour from it into the 3 liter bucket until the 3 liter bucket is full. You are left with 2 liters in the 5 liter bucket
Step 2: Empty the 3 liter bucket and pour the contents of the 5 liter bucket into it. The 3 liter bucket now has 2 liters
Step 3: Fill the 5 liter bucket again and pour water from it into the 3 liter bucket (which already has 2 liters from step 2) until the 3 liter bucket is full. This takes 1 liter
You now have 4 liters in the 5 liter bucket
1 List the differences between supervised and unsupervised learning
Supervised Learning
• Uses known and labeled data as input
• Requires both an input and an output to be given to the model for it to be trained
• Supervised learning has a feedback mechanism
• Most commonly used supervised learning algorithms are decision tree, logistic regression, support vector machine
Unsupervised Learning
• Uses unlabeled data as input
• Unsupervised learning has no feedback mechanism
• Most commonly used unsupervised learning algorithms are k-means clustering, hierarchical clustering, apriori algorithm
2 How is logistic regression done?
Logistic Regression measures the relationship between the dependent variable (our label, what we want to predict) and one or more independent variables (our features), by estimating probabilities using its underlying logistic function (sigmoid)
[Figure: inputs X1 to X4 pass through a linear model and a sigmoid function to give probabilities (values close to 0 and 1); a threshold classifier then outputs 0 or 1]
[Figure: marks vs. no. of hours studied (linear fit) and pass vs. no. of hours studied (sigmoid curve)]
Sigmoid Function
$$y = m x + c$$
$$p = \frac{1}{1 + e^{-y}}$$
$$\ln\left(\frac{p}{1-p}\right) = m x + c$$
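The pipeline on this slide (linear model, sigmoid, threshold) can be sketched in a few lines of Python. This is only an illustration: the coefficients `m` and `c` and the 0.5 threshold are hypothetical values chosen for the example, not fitted from data or taken from the slides.

```python
import math

def sigmoid(y):
    """Map a linear score y to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-y))

def predict(x, m, c, threshold=0.5):
    """Return (probability, class label) for a single input x."""
    y = m * x + c                 # linear model
    p = sigmoid(y)                # sigmoid function
    return p, int(p >= threshold) # threshold classifier

# Hypothetical coefficients: pass probability rises with hours studied.
m, c = 1.2, -6.0
for hours in (2, 5, 8):
    p, label = predict(hours, m, c)
    print(f"hours={hours}  p={p:.3f}  class={label}")
```

Taking the log-odds of the returned probability, `math.log(p / (1 - p))`, gives back the linear score `m * x + c`, which is the third equation on the slide.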
3 Explain the steps in making a decision tree
Take the entire dataset as input
Calculate entropy of target variable as well as predictor attributes
Calculate information gain of all attributes
Choose the attribute with highest information gain as the root node
Repeat the same process on every branch till the decision node of each
branch is finalized
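The entropy and information gain calculations in the steps above can be sketched as follows. The toy target and attribute values are hypothetical and only serve to exercise the functions; base-2 logarithms are assumed.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(labels, attribute_values):
    """Entropy of the target minus the weighted entropy after splitting
    on the given attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    weighted = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - weighted

# Hypothetical toy data: the attribute separates the classes perfectly.
target = [0, 0, 0, 1, 1, 1, 1, 1]
attribute = ["a", "a", "a", "b", "b", "b", "b", "b"]
print(round(entropy(target), 3))
print(round(information_gain(target, attribute), 3))
```

The attribute with the largest `information_gain` would be chosen as the root node, and the same calculation is then repeated on each branch.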
For example, if you want to build a decision tree to decide whether we should accept or decline a job offer:
[Figure: decision tree with the decision nodes "Salary > $50,000", "Commute > 1 hour" and "Offers Incentives", Yes/No branches, and the leaves "Accept Offer" and "Decline Offer"]
Tip: You should know the formulae for entropy and information gain!
4 How do you build a random forest model?
1. Randomly select “k” features from total “m” features, where k << m
2. Among the “k” features, calculate the node “d” using the best split point
3. Split the node into daughter nodes using the best split
4. Repeat steps 2 and 3 until leaf nodes are finalized
5. Build the forest by repeating steps 1 to 4 “n” times to create “n” trees
5 How can you avoid overfitting of your model?
There are three main methods to avoid overfitting:
• Keep the model simple: take into account fewer variables, thereby removing some of the noise in the training data
• Use cross-validation techniques such as k-fold cross-validation
• Use regularization techniques such as LASSO that penalize certain model parameters if they’re likely to cause overfitting
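As an illustration of the cross-validation idea, here is a minimal sketch of how k-fold splits can be generated. The function name `k_fold_indices` is hypothetical, and the folds are contiguous with no shuffling, which a real workflow would usually add.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds.

    Each sample appears in exactly one test fold; the remaining
    samples form the training set for that fold.
    """
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop

for train_idx, test_idx in k_fold_indices(10, 3):
    print("train:", train_idx, "test:", test_idx)
```

A model would be trained on each `train_idx` and scored on the matching `test_idx`; averaging the k scores gives a less optimistic estimate than scoring on the training data.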
Can you solve?
There are 9 balls, out of which one ball is heavier and the rest are of the same weight. What is the minimum number of weighings needed to find the heavier ball?
You will need to perform 2 weighings:
Step 1: Place three balls on each side of the balance
Scenario (a): The two sides balance out
The heavier ball is among the remaining three balls. For step 2, take two of them and place one ball on each side. If they balance out, then the left-out ball is the heavier ball. Otherwise, the balance shows which ball is heavier.
Scenario (b): The two sides do not balance out
Take the three balls from the heavier side and repeat step 2 with them to find the heavier ball.
6 Differentiate between univariate, bivariate and multivariate analysis
Univariate: This type of data contains only one variable. The purpose of univariate analysis is to describe the data and find patterns that exist within it
Example: height of students
The patterns can be studied by drawing conclusions using mean, median and mode, dispersion or range, minimum, maximum etc.
Height (in cm)
164
167.3
170
174.2
178
180
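The summary measures mentioned for univariate analysis can be computed with Python's standard `statistics` module. The sketch below uses the height values from the table; the variable names are illustrative.

```python
import statistics

# Heights (in cm) from the table above.
heights_cm = [164, 167.3, 170, 174.2, 178, 180]

summary = {
    "mean": statistics.mean(heights_cm),
    "median": statistics.median(heights_cm),
    "min": min(heights_cm),
    "max": max(heights_cm),
    "range": max(heights_cm) - min(heights_cm),
    "stdev": statistics.stdev(heights_cm),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```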
Bivariate: This type of data involves two different variables. The analysis of this type of data deals with causes and relationships, and is done to find out the relationship between the two variables
Example: temperature and ice cream sales in summer season
Here, the table shows that sales increase as the temperature increases
Temperature (in Celsius)    Sales
20                          2000
25                          2100
26                          2300
28                          2400
30                          2600
35                          3100
Multivariate: When the data involves three or more variables, it is categorized under multivariate.
It is similar to bivariate analysis but involves more than two variables, for example several predictors and one target
Example: data for house price prediction
The patterns can be studied by drawing conclusions using mean, median and mode, dispersion or range, minimum, maximum etc.
No. of rooms    Floor    Sqft. Area    Price
2               0        900           40,00,00
3               2        1100          60,00,000
3.5             5        1500          90,00,000
4               3        2100          1,20,00,000
7 What are the feature selection methods to select the right variables?
There are two main methods for feature selection:
Filter Methods
• Linear Discriminant Analysis
• ANOVA
• Chi-Square
Wrapper Methods
• Forward Selection
• Backward Selection
• Recursive Feature Elimination
8 In your choice of language: Write a program that prints the numbers from 1 to 50. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”
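The code shown on the original slide did not survive extraction. One possible Python solution, offered as a sketch rather than the deck's own answer, is:

```python
def fizzbuzz(n):
    """Return the FizzBuzz text for a single number."""
    if n % 15 == 0:      # multiple of both three and five
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)

for i in range(1, 51):
    print(fizzbuzz(i))
```

Checking the combined case first matters: a multiple of 15 would otherwise be caught by the test for three and print only “Fizz”.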
9 You are given a dataset consisting of variables having more than 30% missing values. How will you deal with them?
Ways to handle missing data values:
• If the dataset is huge, we can simply remove the rows with missing data values. It is the quickest way, i.e. we use the rest of the data to predict the values
• We can substitute missing values with the mean of the rest of the data using a pandas dataframe in Python, i.e. df.mean() and df.fillna(mean)
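Both options can be illustrated without any third-party library. The sketch below uses plain Python lists with `None` marking a missing value; the helper names and the sample data are hypothetical. With pandas, the same idea is usually expressed through `DataFrame.dropna`, `DataFrame.mean` and `DataFrame.fillna`.

```python
import statistics

def drop_missing(rows):
    """Option 1: keep only the rows that have no missing (None) values."""
    return [row for row in rows if None not in row]

def fill_missing_with_mean(values):
    """Option 2: replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    return [mean if v is None else v for v in values]

# Hypothetical sample data.
rows = [(25, 50000), (None, 42000), (31, None), (40, 61000)]
ages = [25, None, 31, 40, None]

print(drop_missing(rows))
print(fill_missing_with_mean(ages))
```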
10 For the given points, how will you calculate the Euclidean distance in Python?
Given points:
from math import sqrt
plot1 = [1, 3]
plot2 = [2, 5]
# straight-line distance between the two points
euclidean_distance = sqrt((plot1[0] - plot2[0])**2 + (plot1[1] - plot2[1])**2)
Can you solve?
What is the angle between the hour and minute hands of a clock when the time is half past six?
• The minute hand has travelled for 30 minutes. So, it has
covered 30×6=180°
• The hour hand has travelled for 6.5 hours. So, it has
covered 6.5×30=195°
• The difference between the two will give the angle between
the two hands. Thus, the required angle=195°-180°=15°
Note: A clock is a complete circle having 360 degrees
In 1 hour, the hour hand covers: 360/12 = 30°
In 1 minute, the minute hand covers 360/60 = 6°
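The same reasoning generalizes to any time. A small sketch, with a hypothetical function name `clock_angle`, that returns the smaller of the two angles between the hands:

```python
def clock_angle(hour, minute):
    """Smaller angle (in degrees) between the hour and minute hands."""
    minute_angle = minute * 6.0                     # 360 / 60 degrees per minute
    hour_angle = (hour % 12) * 30.0 + minute * 0.5  # 360 / 12 per hour, plus drift
    diff = abs(hour_angle - minute_angle)
    return min(diff, 360.0 - diff)

print(clock_angle(6, 30))
```

The `minute * 0.5` term accounts for the hour hand moving 30° over 60 minutes, which is the 6.5 × 30 step in the solution above.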
11 Explain dimensionality reduction, and list its benefits
Dimensionality reduction refers to the process of converting a set of data having vast dimensions into data with fewer dimensions (fields) to convey similar information concisely
• It helps in compressing data and reducing the storage space
• It reduces computation time, as fewer dimensions lead to less computing
• It removes redundant features. For example, there is no point in storing a value in two different units (meters and inches)
12 How will you calculate eigenvalues and eigenvectors of a 3 by 3 matrix?
$$A = \begin{bmatrix} -2 & -4 & 2 \\ -2 & 1 & 2 \\ 4 & 2 & 5 \end{bmatrix}$$
Characteristic equation:
$$\det(A - \lambda I) = 0$$
Expanding determinant:
$$(-2-\lambda)\,[(1-\lambda)(5-\lambda) - 2 \times 2] + 4\,[(-2)(5-\lambda) - 4 \times 2] + 2\,[(-2) \times 2 - 4(1-\lambda)] = 0$$
$$-\lambda^3 + 4\lambda^2 + 27\lambda - 90 = 0$$
$$\lambda^3 - 4\lambda^2 - 27\lambda + 90 = 0$$
By hit and trial, for λ = 3:
$$3^3 - 4 \times 3^2 - 27 \times 3 + 90 = 0$$
Hence (λ - 3) is a factor:
$$\lambda^3 - 4\lambda^2 - 27\lambda + 90 = (\lambda - 3)(\lambda^2 - \lambda - 30)$$
$$(\lambda - 3)(\lambda^2 - \lambda - 30) = (\lambda - 3)(\lambda + 5)(\lambda - 6)$$
So, the eigenvalues are 3, -5, 6
Calculate the eigenvector for λ = 3. For X = 1:
$$-5 - 4Y + 2Z = 0$$
$$-2 - 2Y + 2Z = 0$$
Subtracting the two equations:
$$3 + 2Y = 0, \quad Y = -\frac{3}{2}$$
Substituting back into the second equation:
$$Z = -\frac{1}{2}$$
Similarly, we can calculate the eigenvectors for -5 and 6
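The hand calculation can be checked numerically. This sketch uses only plain Python; the helper names `char_poly` and `mat_vec` are illustrative. It confirms that 3, -5 and 6 are roots of the characteristic polynomial and that (1, -3/2, -1/2) satisfies A·v = 3·v.

```python
A = [[-2, -4, 2],
     [-2, 1, 2],
     [4, 2, 5]]

def char_poly(lam):
    """Characteristic polynomial lambda^3 - 4*lambda^2 - 27*lambda + 90."""
    return lam**3 - 4 * lam**2 - 27 * lam + 90

def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Each eigenvalue should make the characteristic polynomial vanish.
for lam in (3, -5, 6):
    print(lam, char_poly(lam))

# Eigenvector for lambda = 3 with X = 1, Y = -3/2, Z = -1/2.
v = [1, -1.5, -0.5]
print(mat_vec(A, v), [3 * x for x in v])
```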
13 How should you maintain your deployed model?
Monitor: Constant monitoring of all of the models is needed to determine the performance accuracy of the models
Evaluate: Evaluation metrics of the current model are calculated to determine if a new algorithm is needed
Compare: The new models are compared against each other to determine which model performs the best
Rebuild: The best performing model is re-built on the current state of data
14 What are recommender systems?
A recommender system predicts the “rating” or “preference” a user would give to a product
Collaborative Filtering
Example: Last.fm recommends tracks that are often played by other users with similar interests
Content-based Filtering
Example: Pandora uses the properties of a song to recommend music with similar properties
15 How to find RMSE and MSE in linear regression model?
RMSE and MSE are two of the most common measures of accuracy for a linear regression
RMSE indicates the Root Mean Square Error
MSE indicates the Mean Square Error
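The formulas on the original slide were images and did not survive extraction. Under the standard definitions (MSE is the mean of the squared differences between actual and predicted values, and RMSE is its square root), a minimal sketch looks like this; the sample values are hypothetical.

```python
import math

def mse(actual, predicted):
    """Mean Square Error: average of the squared prediction errors."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Square Error: square root of the MSE."""
    return math.sqrt(mse(actual, predicted))

# Hypothetical actual and predicted values.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.0, 8.0, 9.5]
print(mse(actual, predicted), rmse(actual, predicted))
```

RMSE is expressed in the same units as the target variable, which often makes it easier to interpret than MSE.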
Can you solve?
If it rains on Saturday with probability 0.6, and it rains on Sunday with probability 0.2, what is the probability that it rains this weekend?
Probability of rain this weekend = Total probability - (Probability that it will not rain on Saturday) × (Probability that it will not rain on Sunday), assuming the two days are independent
1 − (1 − 0.6)(1 − 0.2) = 1 − 0.32 = 0.68
16 How can you select k for k-means?
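We use the “Elbow Method” to select k for k-means
• The idea of the elbow method is to run k-means clustering on the dataset for a range of values of ‘k’, where ‘k’ is the number of clusters, and to compute the WSS for each value
• Within sum of squares (WSS) is defined as the sum of the squared distance between each member of the cluster and its centroid
• When WSS is plotted against the number of clusters, the elbow point, where the decrease in WSS levels off, suggests the value of k
[Figure: WSS vs. no. of clusters, with the elbow point marked]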
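To make the WSS definition concrete, here is a small sketch that computes it for a given grouping of points. It does not run k-means itself: the points and the two cluster assignments are hard-coded, hypothetical inputs standing in for the output of a clustering run.

```python
def centroid(points):
    """Coordinate-wise mean of a list of points (tuples)."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def wss(clusters):
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for points in clusters:
        center = centroid(points)
        for p in points:
            total += sum((a - b) ** 2 for a, b in zip(p, center))
    return total

# Hypothetical 2-D points grouped as one cluster, then as two clusters.
points = [(1, 1), (1, 2), (8, 8), (9, 8)]
wss_k1 = wss([points])
wss_k2 = wss([points[:2], points[2:]])
print(wss_k1, wss_k2)
```

WSS drops sharply when going from one cluster to two for these points; in the elbow method, k is chosen where further increases stop producing such large drops.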
17 What is the significance of p-value?
• p-value typically ≤ 0.05: indicates strong evidence against the null hypothesis, so you reject the null hypothesis
• p-value typically > 0.05: indicates weak evidence against the null hypothesis, so you fail to reject the null hypothesis
• p-value at the cutoff 0.05: considered to be marginal (could go either way)
18 How can outlier values be treated?
1. You can drop an outlier only if it is a garbage value
Example: Height of adult = abc ft. This cannot be true, as height cannot be a string value. In this case, the outlier can be removed
2. If the outliers have extreme values, they can be removed
For example, if all the data points are clustered between 0 and 10 but one point lies at 100, then we can remove this point
[Figure: predicted values plotted against actual values]
If you cannot drop outliers, you can try the following:
1. Try a different model. Data detected as outliers by a linear model can be fit by a non-linear model. So, be sure you are choosing the right model
2. Try normalizing the data. This way, the extreme data points are pulled to a similar range
3. You can use algorithms which are less affected by outliers, for example random forest
19 How can you say that a time series data is stationary?
We can say that a time series is stationary when the variance and mean of the series are constant with time
[Figure: four time-series panels]
• Stationary: here, mean is constant with time
• Non-Stationary: here, mean is increasing with time
• Stationary: here, variance is constant with time
• Non-Stationary: here, variance is changing with time
20 How can you calculate accuracy using confusion matrix?
Total = 650      Actual p                Actual n
Predicted P      262 (True Positive)     15 (False Positive)
Predicted N      26 (False Negative)     347 (True Negative)
Accuracy = (True Positive + True Negative) / Total Observations
= (262 + 347) / 650
= 609 / 650
≈ 0.94
21 Write the equation and calculate precision and recall rate
Total = 650      Actual p                Actual n
Predicted P      262 (True Positive)     15 (False Positive)
Predicted N      26 (False Negative)     347 (True Negative)
Precision = (True Positive) / (True Positive + False Positive)
Recall Rate = (True Positive) / (True Positive + False Negative)
Precision = 262/277 ≈ 0.95
Recall = 262/288 ≈ 0.91
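The accuracy, precision and recall calculations can be reproduced directly from the four cells of the confusion matrix. The variable names below are illustrative; the counts are the ones from the table.

```python
# Cells of the confusion matrix from the table above.
tp, fp = 262, 15   # predicted positive: true positives, false positives
fn, tn = 26, 347   # predicted negative: false negatives, true negatives

total = tp + fp + fn + tn
accuracy = (tp + tn) / total
precision = tp / (tp + fp)   # of all predicted positives, how many are correct
recall = tp / (tp + fn)      # of all actual positives, how many were found

print(f"total={total}")
print(f"accuracy={accuracy:.4f}")
print(f"precision={precision:.4f}")
print(f"recall={recall:.4f}")
```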
Can you solve?
If a drawer contains 12 red socks, 16 blue socks, and 20 white socks, how many must you pull out to be sure of having a matching pair?
The answer is 4.
An example:
First pick is white
Second pick is red
Third pick is blue, so no pairs yet
Fourth pick is 100% guaranteed to make a pair, because it's either white, blue or red.
So, four picks guarantee a pair.
If there were four colors, the answer would be 5, and so on.
22 ‘People who bought this, also bought…’ recommendations seen on Amazon are a result of which algorithm?
The recommendation engine is built using Collaborative Filtering.
Collaborative Filtering exploits the behavior of other users and their purchase history in terms of ratings, selection etc.
It makes predictions on what might interest a person based on the preferences of many other users.
In this algorithm, features of the items are not known.
For example, suppose x number of people buy a new phone and then also buy a tempered glass with it. Next time, when a person buys a phone, they will be recommended to buy a tempered glass along with it.
23 Write a SQL query to list all orders with customer information
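SELECT OrderNumber, TotalAmount, FirstName, LastName, City, Country
FROM "Order"
JOIN Customer
ON "Order".CustomerId = Customer.Id
Note: ORDER is a reserved word in SQL, so the table name is quoted here. The quoting style (double quotes, square brackets or backticks) depends on the database
Order Table: OrderId, CustomerId, OrderNumber, TotalAmount
Customer Table: Id, FirstName, LastName, City, Country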
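The join can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the column types and the sample rows are invented for illustration, and the table name is written as `"Order"` in double quotes because ORDER is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (
    Id INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT,
    City TEXT, Country TEXT
);
CREATE TABLE "Order" (
    OrderId INTEGER PRIMARY KEY, CustomerId INTEGER REFERENCES Customer(Id),
    OrderNumber TEXT, TotalAmount REAL
);
-- Hypothetical sample rows; the second customer has no orders.
INSERT INTO Customer VALUES (1, 'Ana', 'Silva', 'Lisbon', 'Portugal');
INSERT INTO Customer VALUES (2, 'Raj', 'Mehta', 'Pune', 'India');
INSERT INTO "Order" VALUES (10, 1, 'A-100', 250.0);
""")

rows = conn.execute("""
SELECT OrderNumber, TotalAmount, FirstName, LastName, City, Country
FROM "Order"
JOIN Customer
ON "Order".CustomerId = Customer.Id
""").fetchall()

for row in rows:
    print(row)
conn.close()
```

Because this is an inner join, only customers who have at least one order appear in the result.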
24 You are given a dataset on cancer detection. You’ve built a classification model and achieved an accuracy of 96%. Why shouldn’t you be happy with your model performance? What can you do about it?
Cancer detection results in imbalanced data.
In an imbalanced dataset, accuracy should not be used as a measure of performance. If only a small fraction of patients have cancer, a model that predicts "no cancer" for almost everyone can still score a high accuracy. It is important to focus on the remaining 4%, which are the people who were wrongly diagnosed.
Wrong diagnosis is of major concern because there can be people who have cancer but were not predicted so.
Hence, in order to evaluate model performance, we should use Sensitivity (True Positive Rate), Specificity (True Negative Rate) and F measure to determine the class-wise performance of the classifier.
25 Which of the following machine learning algorithms can be used for imputing missing values of both categorical and continuous variables?
K-means clustering
Linear regression
K-NN
Decision trees
Answer: K-NN. A missing value can be filled in from the nearest neighbours of the record, using the most frequent value for a categorical variable and the average for a continuous one
Can you solve?
Given a box of matches and two ropes, not necessarily identical, measure a period of 45 minutes
Note: The ropes are not uniform in nature, and each rope takes exactly 60 minutes to completely burn out
We have two ropes A and B.
• Light A from both ends and B from one end.
• When A has finished burning, we know that 30 minutes have elapsed and B has 30 minutes remaining.
• Now, light the other end of B as well, so that the remaining part of B burns out in 15 minutes.
• Thus, we have got 30 + 15 = 45 minutes.
26 Below are the 8 actual values of the target variable in the train file: [0,0,0,1,1,1,1,1]. What is the entropy of the target variable?
-(5/8 log(5/8) + 3/8 log(3/8))
5/8 log(5/8) + 3/8 log(3/8)
3/8 log(5/8) + 5/8 log(3/8)
5/8 log(3/8) – 3/8 log(5/8)
Hint: entropy is the negative sum, over the classes, of each class proportion multiplied by the log of that proportion
Answer: -(5/8 log(5/8) + 3/8 log(3/8)), since five of the eight values are 1 and three are 0
27 We want to predict the probability of death from heart disease based on three risk factors: age, gender, and blood cholesterol level. What is the most appropriate algorithm for this use case?
Logistic regression
Linear regression
K-means clustering
Apriori algorithm
Answer: Logistic regression, because the target is the probability of a binary outcome
28 After studying the behavior of a population, you have identified four specific individual types who are valuable to your study. You would like to find all users who are most similar to each individual type. Which algorithm is most appropriate for this study?
K-means clustering
Linear regression
Association rules
Decision trees
Answer: K-means clustering, with k = 4, so that users are grouped around the four individual types
29 You have run the association rules algorithm on your dataset, and the two rules {banana, apple} => {grape} and {apple, orange} => {grape} have been found to be relevant. What else must be true?
{banana, apple, grape, orange} must be a frequent itemset
{banana, apple} => {orange} must be a relevant rule
{grape} => {banana, apple} must be a relevant rule
{grape, apple} must be a frequent itemset
Answer: {grape, apple} must be a frequent itemset. Both rules come from frequent itemsets that contain apple and grape, and every subset of a frequent itemset is itself frequent
30 Your organization has a website where visitors randomly receive one of two coupons. It is also possible that visitors to the website will not receive a coupon. You have been asked to determine if offering a coupon to visitors to your website has any impact on their purchase decision. Which analysis method should you use?
One-way ANOVA
K-means clustering
Association rules
Student T-test
Answer: One-way ANOVA, because three groups are compared (first coupon, second coupon, no coupon), whereas a t-test compares only two groups
Data Science Interview Questions | Data Science Interview Questions And Answers | Simplilearn

PDF
GENETICS IN BIOLOGY IN SECONDARY LEVEL FORM 3
PPTX
Presentation on HIE in infants and its manifestations
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
01-Introduction-to-Information-Management.pdf
Final Presentation General Medicine 03-08-2024.pptx
102 student loan defaulters named and shamed – Is someone you know on the list?
A systematic review of self-coping strategies used by university students to ...
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
Black Hat USA 2025 - Micro ICS Summit - ICS/OT Threat Landscape
1st Inaugural Professorial Lecture held on 19th February 2020 (Governance and...
Anesthesia in Laparoscopic Surgery in India
A GUIDE TO GENETICS FOR UNDERGRADUATE MEDICAL STUDENTS
Module 4: Burden of Disease Tutorial Slides S2 2025
Classroom Observation Tools for Teachers
GDM (1) (1).pptx small presentation for students
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
RMMM.pdf make it easy to upload and study
master seminar digital applications in india
Saundersa Comprehensive Review for the NCLEX-RN Examination.pdf
Introduction-to-Literarature-and-Literary-Studies-week-Prelim-coverage.pptx
GENETICS IN BIOLOGY IN SECONDARY LEVEL FORM 3
Presentation on HIE in infants and its manifestations

Data Science Interview Questions | Data Science Interview Questions And Answers | Simplilearn

  • 2. Can you solve? You have two buckets, one of 3 liters and the other of 5 liters. You are expected to measure exactly 4 liters. How will you complete the task? Note: There is no third bucket.
  • 3. Step 1: Fill the 5-liter bucket and pour from it into the 3-liter bucket until the 3-liter bucket is full. You are left with 2 liters in the 5-liter bucket. Step 2: Empty the 3-liter bucket and pour the 2 liters from the 5-liter bucket into it. The 3-liter bucket now holds 2 liters. Step 3: Fill the 5-liter bucket again and pour from it into the 3-liter bucket (which already holds 2 liters from step 2) until the 3-liter bucket is full. Only 1 liter fits, so you now have 4 liters in the 5-liter bucket.
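The three steps can be traced with a small simulation. This is only a sketch: the helper names `pour` and `measure_four_liters` are made up for illustration, and the one rule it assumes is that a pour stops when the source is empty or the destination is full.

```python
def pour(src, dst, dst_capacity):
    """Pour from src into dst until src is empty or dst is full."""
    moved = min(src, dst_capacity - dst)
    return src - moved, dst + moved


def measure_four_liters():
    small_cap, large_cap = 3, 5
    small, large = 0, 0

    # Step 1: fill the 5-liter bucket, then top up the 3-liter bucket from it.
    large = large_cap
    large, small = pour(large, small, small_cap)

    # Step 2: empty the 3-liter bucket, then move the remaining 2 liters into it.
    small = 0
    large, small = pour(large, small, small_cap)

    # Step 3: refill the 5-liter bucket and top up the 3-liter bucket again.
    large = large_cap
    large, small = pour(large, small, small_cap)

    return small, large


small, large = measure_four_liters()
print(f"3-liter bucket: {small} L, 5-liter bucket: {large} L")
```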
  • 4. 01 List the differences between supervised and unsupervised learning
  • 5. 1 List the differences between supervised and unsupervised learning. Supervised Learning: • Uses known and labeled data as input, so both an input and an output must be given to the model for it to be trained • Has a feedback mechanism • Most commonly used algorithms are decision tree, logistic regression, support vector machine. Unsupervised Learning: • Uses unlabeled data as input • Has no feedback mechanism • Most commonly used algorithms are k-means clustering, hierarchical clustering, apriori algorithm
  • 6. 02 How is logistic regression done?
  • 7. 2 How is logistic regression done? Logistic regression measures the relationship between the dependent variable (our label, what we want to predict) and one or more independent variables (our features) by estimating probabilities using its underlying logistic (sigmoid) function. [Diagram: inputs (X1, X2, X3, X4) → linear model → sigmoid function → probabilities → threshold classifier → values close to 0 and 1, output as 0 or 1]
  • 8. 2 How is logistic regression done? Linear model: $y = m x + c$. Sigmoid function: $p = \frac{1}{1 + e^{-y}}$, which is equivalent to $\ln\left(\frac{p}{1 - p}\right) = m x + c$. [Figures: marks vs. no. of hours studied (linear model); pass (0 to 1) vs. no. of hours studied (sigmoid curve)]
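As a rough illustration of the linear model → sigmoid → threshold pipeline on the slide, the sketch below scores a student by hours studied. The slope `m`, intercept `c`, and the 0.5 threshold are made-up values, not fitted parameters; a real model would learn `m` and `c` from data.

```python
import math


def sigmoid(y):
    """Map any real number to a probability strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-y))


def predict_pass(hours, m=1.5, c=-6.0, threshold=0.5):
    """Return (probability, label); m, c, and threshold are hypothetical."""
    y = m * hours + c          # linear model: y = m*x + c
    p = sigmoid(y)             # p = 1 / (1 + e^(-y))
    return p, int(p >= threshold)


for hours in (2, 4, 6):
    p, label = predict_pass(hours)
    print(f"{hours} hours -> p = {p:.3f}, predicted class = {label}")
```

Taking the log-odds of the returned probability, `math.log(p / (1 - p))`, recovers `m * hours + c`, which is the third equation on the slide.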
  • 9. 03 Explain the steps in making a decision tree
  • 10. 3 Explain the steps in making a decision tree. (1) Take the entire dataset as input. (2) Calculate the entropy of the target variable as well as the predictor attributes. (3) Calculate the information gain of all attributes. (4) Choose the attribute with the highest information gain as the root node. (5) Repeat the same process on every branch until the decision node of each branch is finalized.
  • 11. 3 Explain the steps in making a decision tree. For example, suppose you want to build a decision tree to decide whether to accept or decline a job offer. [Decision tree diagram with Yes/No branches; decision nodes: Salary > $50,000, Commute > 1 hour, Offers Incentives; leaves: Accept Offer, Decline Offer] Tip: You should know the formulae for entropy and information gain!
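The entropy and information-gain steps can be sketched as follows. Entropy here is the usual $H = -\sum_i p_i \log_2 p_i$ and information gain is the parent entropy minus the weighted entropy of the child subsets. The four job-offer rows and the column names `high_salary`, `long_commute`, and `accept` are invented purely for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical toy data: salary alone decides the outcome here.
offers = [
    {"high_salary": True, "long_commute": False, "accept": True},
    {"high_salary": True, "long_commute": True, "accept": True},
    {"high_salary": False, "long_commute": False, "accept": False},
    {"high_salary": False, "long_commute": True, "accept": False},
]

gains = {attr: information_gain(offers, attr, "accept")
         for attr in ("high_salary", "long_commute")}
root = max(gains, key=gains.get)  # attribute with the highest gain
print(gains, "-> root node:", root)
```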
  • 12. 04 How do you build a random forest model?
  • 13. 4 How do you build a random forest model? (1) Randomly select “k” features from the total “m” features, where k << m. (2) Among the “k” features, calculate the node “d” using the best split point. (3) Split the node into daughter nodes using the best split. (4) Repeat steps 2 and 3 until the leaf nodes are finalized. (5) Build the forest by repeating steps 1 to 4 “n” times to create “n” trees.
  • 14. 05 How can you avoid overfitting of your model?
  • 15. 5 How can you avoid overfitting of your model? There are three main methods to avoid overfitting: (1) Keep the model simple: take fewer variables into account, thereby removing some of the noise in the training data. (2) Use cross-validation techniques such as k-fold cross-validation. (3) Use regularization techniques such as LASSO that penalize certain model parameters if they are likely to cause overfitting.
  • 16. Can you solve? There are 9 balls, of which one is heavier and the rest are of the same weight. What is the minimum number of weighings needed to find the heavier ball?
  • 17. You will need to perform 2 weighings. Step 1: Place three balls on each side of the balance. Scenario (a), the sides balance: the heavier ball is among the three remaining balls. Step 2: Take two of those three balls and place one on each side. If they balance, the ball left out is the heavier one; otherwise the balance shows which one is heavier. Scenario (b), the sides do not balance: take the three balls from the heavier side and apply step 2 to them to find the heavier ball.
  • 18. 06 Differentiate between univariate, bivariate and multivariate analysis
  • 19. 6 Differentiate between univariate, bivariate and multivariate analysis. Univariate: This type of data contains only one variable. The purpose of univariate analysis is to describe the data and find patterns that exist within it. Example: height of students. Height (in cm): 164, 167.3, 170, 174.2, 178, 180. The patterns can be studied by drawing conclusions using mean, median and mode, dispersion or range, minimum, maximum, etc.
  • 20. 6 Differentiate between univariate, bivariate and multivariate analysis. Bivariate: This type of data involves two different variables. The analysis deals with causes and relationships and is done to find the relationship between the two variables. Example: temperature and ice cream sales in the summer season. Temperature (in Celsius) | Sales: 20 | 2000; 25 | 2100; 26 | 2300; 28 | 2400; 30 | 2600; 35 | 3100. Here the table shows that sales increase as temperature increases.
  • 21. 6 Differentiate between univariate, bivariate and multivariate analysis. Multivariate: When the data involves three or more variables, it is categorized as multivariate. It is similar to bivariate but contains more than one dependent variable. Example: data for house price prediction. No. of rooms | Floor | Area (sq. ft.) | Price: 2 | 0 | 900 | 40,00,000; 3 | 2 | 1100 | 60,00,000; 3.5 | 5 | 1500 | 90,00,000; 4 | 3 | 2100 | 1,20,00,000. The patterns can be studied by drawing conclusions using mean, median and mode, dispersion or range, minimum, maximum, etc.
  • 22. 07 What are the feature selection methods to select the right variables?
  • 23. 7 What are the feature selection methods to select the right variables? There are two main methods for feature selection. Filter Methods: • Linear Discriminant Analysis • ANOVA • Chi-Square. Wrapper Methods: • Forward Selection • Backward Selection • Recursive Feature Elimination
  • 24. 08 In your choice of language: Write a program that prints the numbers from 1 to 50. But for multiples of three print “Fizz” instead of the number, and for multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”.
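The answer slides for this question are not part of the transcript, so the following is one possible Python solution rather than the deck's own. The function name `fizzbuzz` is an arbitrary choice; the key point is to test the "multiple of both" case before the individual cases.

```python
def fizzbuzz(n):
    """Return the FizzBuzz output for a single number."""
    if n % 15 == 0:        # multiple of both 3 and 5, so check this first
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


lines = [fizzbuzz(n) for n in range(1, 51)]
print("\n".join(lines))
```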
  • 27. 09 You are given a dataset consisting of variables having more than 30% missing values. How will you deal with them?
  • 28. 9 You are given a dataset consisting of variables having more than 30% missing values. How will you deal with them? Ways to handle missing data values: (1) If the dataset is huge, we can simply remove the rows with missing data values. It is the quickest way, i.e. we use the rest of the data to predict the values. (2) We can substitute missing values with the mean of the rest of the data using a pandas DataFrame in Python, i.e. df.fillna(df.mean(numeric_only=True))
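Both options can be tried on a small frame. This is a sketch that assumes pandas and NumPy are installed; the column names and values are invented, and `numeric_only=True` is used so the mean is only taken over numeric columns.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({
    "age": [25.0, np.nan, 35.0, 40.0],
    "salary": [50000.0, 60000.0, np.nan, 80000.0],
})

# Fraction of missing values per column, useful for checking a 30% cutoff.
missing_share = df.isna().mean()

# Option 1: drop every row that contains a missing value.
dropped = df.dropna()

# Option 2: replace each missing value with the mean of its column.
imputed = df.fillna(df.mean(numeric_only=True))

print(missing_share)
print(dropped)
print(imputed)
```

Dropping rows is simplest when the dataset is large; mean imputation keeps every row but shrinks the variance of the imputed columns.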
  • 29. 10 For the given points, how will you calculate the Euclidean distance in Python?
  • 30. 10 For the given points, how will you calculate the Euclidean distance in Python? Given points: plot1 = [1, 3] and plot2 = [2, 5]. With sqrt imported from the math module: euclidean_distance = sqrt((plot1[0] - plot2[0])**2 + (plot1[1] - plot2[1])**2)
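A runnable version of the slide's expression, with the import it needs. The second computation uses `math.dist`, which is available in the standard library from Python 3.8, as a cross-check; it is an addition and not part of the slide.

```python
import math

plot1 = [1, 3]
plot2 = [2, 5]

# Square root of the sum of squared coordinate differences.
euclidean_distance = math.sqrt(
    (plot1[0] - plot2[0]) ** 2 + (plot1[1] - plot2[1]) ** 2
)

# Same result via the standard-library helper (Python 3.8+).
same_distance = math.dist(plot1, plot2)

print(euclidean_distance, same_distance)
```

For these points the differences are 1 and 2, so the distance is the square root of 5.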
  • 31. Can you solve? What is the angle between the hour and minute hands of a clock when the time is half past six?
  • 32. Note: A clock is a complete circle of 360 degrees. In 1 hour, the hour hand covers 360/12 = 30°. In 1 minute, the minute hand covers 360/60 = 6°. • The minute hand has travelled for 30 minutes, so it has covered 30×6 = 180° • The hour hand has travelled for 6.5 hours, so it has covered 6.5×30 = 195° • The difference between the two gives the angle between the two hands. Thus, the required angle = 195° − 180° = 15°
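The same reasoning generalizes to any time. A minimal sketch, where `clock_angle` is a hypothetical helper name and the smaller of the two possible angles is returned:

```python
def clock_angle(hour, minute):
    """Smaller angle in degrees between the hour and minute hands."""
    hour_hand = (hour % 12) * 30 + minute * 0.5  # 30 deg per hour, 0.5 deg per minute
    minute_hand = minute * 6                     # 6 deg per minute
    diff = abs(hour_hand - minute_hand)
    return min(diff, 360 - diff)


print(clock_angle(6, 30))  # half past six
```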
  • 33. 11 Explain dimensionality reduction and list its benefits
  • 34. 11 Explain dimensionality reduction and list its benefits. Dimensionality reduction refers to the process of converting a dataset with a vast number of dimensions into data with fewer dimensions (fields) that conveys similar information concisely. Benefits: • It helps in compressing data and reducing storage space • It reduces computation time, as fewer dimensions lead to less computing • It removes redundant features; for example, there is no point in storing a value in two different units (meters and inches)
  • 35. 12 How will you calculate eigenvalues and eigenvectors of a 3 by 3 matrix?
  • 36. 12 How will you calculate eigenvalues and eigenvectors of a 3 by 3 matrix? Given the matrix $A = \begin{pmatrix} -2 & -4 & 2 \\ -2 & 1 & 2 \\ 4 & 2 & 5 \end{pmatrix}$. Characteristic equation: $\det(A - \lambda I) = 0$. Expanding the determinant: $(-2-\lambda)\,[(1-\lambda)(5-\lambda) - 2 \times 2] + 4\,[(-2)(5-\lambda) - 4 \times 2] + 2\,[(-2) \times 2 - 4(1-\lambda)] = 0$, which gives $-\lambda^3 + 4\lambda^2 + 27\lambda - 90 = 0$, i.e. $\lambda^3 - 4\lambda^2 - 27\lambda + 90 = 0$.
  • 37. 12 How will you calculate eigenvalues and eigenvectors of a 3 by 3 matrix? By hit and trial: $3^3 - 4 \times 3^2 - 27 \times 3 + 90 = 0$, hence $(\lambda - 3)$ is a factor. $\lambda^3 - 4\lambda^2 - 27\lambda + 90 = (\lambda - 3)(\lambda^2 - \lambda - 30) = (\lambda - 3)(\lambda + 5)(\lambda - 6)$. So the eigenvalues are 3, −5, 6. Calculate the eigenvector for $\lambda = 3$: for $X = 1$, $-5 - 4Y + 2Z = 0$ and $-2 - 2Y + 2Z = 0$.
  • 38. 12 How will you calculate eigenvalues and eigenvectors of a 3 by 3 matrix? Subtracting the two equations: $3 + 2Y = 0$, so $Y = -\frac{3}{2}$. Substituting back into the second equation: $Z = -\frac{1}{2}$. Similarly, we can calculate the eigenvectors for −5 and 6.
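The hand calculation can be checked numerically. This sketch assumes NumPy is available and uses `numpy.linalg.eig`; note that NumPy returns unit-length eigenvectors, so they match the hand-derived vector (1, −3/2, −1/2) only up to a scale factor.

```python
import numpy as np

A = np.array([[-2.0, -4.0, 2.0],
              [-2.0, 1.0, 2.0],
              [4.0, 2.0, 5.0]])

# Columns of `eigenvectors` correspond to the entries of `eigenvalues`.
eigenvalues, eigenvectors = np.linalg.eig(A)
print(np.sort(eigenvalues))

# Hand-derived eigenvector for lambda = 3, with X = 1.
v = np.array([1.0, -1.5, -0.5])
print(A @ v, 3 * v)  # the two vectors should agree
```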
  • 39. 13 How should you maintain your deployed model?
  • 40. 13 How should you maintain your deployed model? Monitor: constant monitoring of all the models is needed to determine their performance accuracy. Evaluate: evaluation metrics of the current model are calculated to determine whether a new algorithm is needed. Compare: the new models are compared against each other to determine which model performs best. Rebuild: the best-performing model is rebuilt on the current state of the data.
  • 41. 14 What are recommender systems?
  • 42. 14 What are recommender systems? A recommender system predicts the "rating" or "preference" a user would give to a product. Collaborative Filtering. Example: Last.fm recommends tracks that are often played by other users with similar interests. Content-based Filtering. Example: Pandora uses the properties of a song to recommend music with similar properties.
  • 43. 15 How to find RMSE and MSE in a linear regression model?
  • 44. 15 How to find RMSE and MSE in a linear regression model? RMSE and MSE are two of the most common measures of accuracy for a linear regression. RMSE is the Root Mean Square Error. MSE is the Mean Square Error.
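The formulas on this slide did not survive extraction, so the sketch below uses the standard definitions: MSE is the mean of the squared differences between actual and predicted values, and RMSE is its square root. The sample values are invented.

```python
import math


def mse(actual, predicted):
    """Mean of squared differences between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Square root of the MSE, in the same units as the target."""
    return math.sqrt(mse(actual, predicted))


# Hypothetical actual and predicted values.
actual = [3.0, -0.5, 2.0, 7.0]
predicted = [2.5, 0.0, 2.0, 8.0]

print(mse(actual, predicted), rmse(actual, predicted))
```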
  • 45. Can you solve? If it rains on Saturday with probability 0.6, and it rains on Sunday with probability 0.2, what is the probability that it rains this weekend?
  • 46. Assuming rain on the two days is independent: P(rain this weekend) = Total probability − (Probability that it will not rain on Saturday) × (Probability that it will not rain on Sunday) = 1 − (1 − 0.6)(1 − 0.2) = 1 − 0.32 = 0.68
  • 47. 16 How can you select k for k-means?
  • 48. 16 How can you select k for k-means? We use the “Elbow Method” to select k for k-means. • The idea of the elbow method is to run k-means clustering on the dataset for a range of values of ‘k’, where ‘k’ is the number of clusters • Within sum of squares (WSS) is defined as the sum of the squared distances between each member of a cluster and its centroid • k is chosen at the elbow point, where adding more clusters stops reducing WSS sharply. [Plot: WSS vs. no. of clusters, with the elbow point marked]
  • 49. 17 What is the significance of p-value?
  • 50. 17 What is the significance of p-value? • p-value typically ≤ 0.05: indicates strong evidence against the null hypothesis, so you reject the null hypothesis • p-value typically > 0.05: indicates weak evidence against the null hypothesis, so you fail to reject the null hypothesis • p-value at the cutoff 0.05: considered to be marginal (could go either way)
  • 51. 18 How can outlier values be treated?
  • 52. 18 How can outlier values be treated? 1. You can drop an outlier if it is a garbage value. Example: height of adult = abc ft. This cannot be true, as height cannot be a string value, so such values can be removed. 2. If the outliers have extreme values, they can be removed. For example, if all the data points are clustered between 0 and 10 but one point lies at 100, then we can remove this point. [Plot: actual values vs. predicted values]
  • 53. 18 How can outlier values be treated? If you cannot drop outliers, you can try the following: 1. Try a different model. Data detected as outliers by a linear model can be fit by a non-linear model, so be sure you are choosing the right model. 2. Try normalizing the data, so that the extreme data points are pulled into a similar range. 3. Use algorithms that are less affected by outliers, for example random forest. [Plot: actual values vs. predicted values]
  • 54. 19 How can you say that a time series data is stationary?
  • 55. 19 How can you say that a time series data is stationary? We can say that a time series is stationary when the mean and variance of the series are constant with time. [Plots, mean: stationary (mean is constant with time) vs. non-stationary (mean is increasing with time)] [Plots, variance: stationary (variance is constant with time) vs. non-stationary (variance is changing with time)]
  • 56. 20 How can you calculate accuracy using confusion matrix?
  • 57. 20 How can you calculate accuracy using confusion matrix? Confusion matrix (total = 650): predicted P, actual p: 262 (True Positive); predicted P, actual n: 15 (False Positive); predicted N, actual p: 26 (False Negative); predicted N, actual n: 347 (True Negative). Accuracy = (True Positive + True Negative) / Total Observations = (262 + 347) / 650 = 609 / 650 ≈ 0.94
  • 58. 21 Write the equation and calculate precision and recall rate
  • 59. 21 Write the equation and calculate precision and recall rate. Confusion matrix (total = 650): predicted P, actual p: 262 (True Positive); predicted P, actual n: 15 (False Positive); predicted N, actual p: 26 (False Negative); predicted N, actual n: 347 (True Negative). Precision = (True Positive) / (True Positive + False Positive) = 262 / 277 ≈ 0.95. Recall Rate = (True Positive) / (True Positive + False Negative) = 262 / 288 ≈ 0.91
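The three metrics can be recomputed directly from the four cells of the confusion matrix. The variable names are arbitrary; the counts are the ones on the slides.

```python
# Cells of the confusion matrix from the slides.
tp, fp = 262, 15   # predicted positive: actual positive / actual negative
fn, tn = 26, 347   # predicted negative: actual positive / actual negative

total = tp + fp + fn + tn

accuracy = (tp + tn) / total    # share of all predictions that are correct
precision = tp / (tp + fp)      # share of predicted positives that are correct
recall = tp / (tp + fn)         # share of actual positives that are found

print(f"total = {total}")
print(f"accuracy = {accuracy:.3f}, precision = {precision:.3f}, recall = {recall:.3f}")
```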
  • 60. Can you solve? If a drawer contains 12 red socks, 16 blue socks, and 20 white socks, how many must you pull out to be sure of having a matching pair?
  • 61. The answer is 4. An example: the first pick is white, the second pick is red, and the third pick is blue, so there are no pairs yet. The fourth pick is 100% guaranteed to make a pair, because it is either white, blue or red. So four picks guarantee a pair. If there were four colors, the answer would be 5, and so on.
  • 62. 22 ‘People who bought this, also bought…’ recommendations seen on Amazon are a result of which algorithm?
  • 63. 22 ‘People who bought this, also bought…’ recommendations seen on Amazon are a result of which algorithm? The recommendation engine is built using Collaborative Filtering. Collaborative filtering exploits the behavior of other users and their purchase history in terms of ratings, selection, etc. It makes predictions about what might interest a person based on the preferences of many other users. In this algorithm, features of the items are not known.
  • 64. 22 ‘People who bought this, also bought…’ recommendations seen on Amazon are a result of which algorithm? For example, suppose x number of people buy a new phone and also buy a tempered glass with it. The next time a person buys a phone, they will be recommended a tempered glass along with it.
  • 65. 23 Write a SQL query to list all orders with customer information
  • 66. 23 Write a SQL query to list all orders with customer information. Order table columns: Orderid, CustomerId, OrderNumber, TotalAmount. Customer table columns: Id, FirstName, LastName, City, Country. Query (ORDER is a reserved word in SQL, so the table name is quoted; the quoting style varies by database): SELECT OrderNumber, TotalAmount, FirstName, LastName, City, Country FROM "Order" JOIN Customer ON "Order".CustomerId = Customer.Id
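To try the join end to end, the sketch below builds both tables in an in-memory SQLite database using Python's standard `sqlite3` module. The sample rows and the column types are invented, and SQLite's double-quote identifier quoting is used for the `Order` table; other databases may need backticks or square brackets instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (
    Id INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT,
    City TEXT, Country TEXT
);
CREATE TABLE "Order" (
    OrderId INTEGER PRIMARY KEY, CustomerId INTEGER REFERENCES Customer(Id),
    OrderNumber TEXT, TotalAmount REAL
);
-- Hypothetical sample rows.
INSERT INTO Customer VALUES (1, 'Asha', 'Rao', 'Pune', 'India');
INSERT INTO Customer VALUES (2, 'Liam', 'Reed', 'Leeds', 'UK');
INSERT INTO "Order" VALUES (10, 1, 'A-100', 250.0);
INSERT INTO "Order" VALUES (11, 2, 'A-101', 99.5);
""")

rows = conn.execute("""
SELECT OrderNumber, TotalAmount, FirstName, LastName, City, Country
FROM "Order"
JOIN Customer ON "Order".CustomerId = Customer.Id
""").fetchall()

for row in rows:
    print(row)
conn.close()
```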
  • 67. 24 You are given a dataset on cancer detection. You’ve built a classification model and achieved an accuracy of 96%. Why shouldn’t you be happy with your model performance? What can you do about it?
  • 68. 24 You are given a dataset on cancer detection. You’ve built a classification model and achieved an accuracy of 96%. Why shouldn’t you be happy with your model performance? What can you do about it? Cancer detection results in IMBALANCED DATA. In an imbalanced dataset, accuracy should not be used as a measure of performance because it is important to focus on the remaining 4%, which are the people who were wrongly diagnosed. Wrong diagnosis is of major concern because there can be people who have cancer but were not predicted so.
  • 69. 24 You are given a dataset on cancer detection. You’ve built a classification model and achieved an accuracy of 96%. Why shouldn’t you be happy with your model performance? What can you do about it? Hence, in order to evaluate model performance, we should use Sensitivity (True Positive Rate), Specificity (True Negative Rate), and F measure to determine the class-wise performance of the classifier.
  • 70. 25 Which of the following machine learning algorithms can be used for imputing missing values of both categorical and continuous variables?
  • 71. 25 Which of the following machine learning algorithms can be used for imputing missing values of both categorical and continuous variables? Options: K-means clustering; Linear regression; K-NN; Decision trees
  • 73. Can you solve? Given a box of matches and two ropes, not necessarily identical, measure a period of 45 minutes. Note: The ropes are not uniform in nature, and each rope takes exactly 60 minutes to burn out completely.
  • 74. We have two ropes, A and B. • Light A from both ends and B from one end. • When A has finished burning, we know that 30 minutes have elapsed and B has 30 minutes remaining. • Now light the other end of B as well, so that the remaining part of B burns out in 15 minutes. • Thus, we have 30 + 15 = 45 minutes.
  • 75. 26 Below are the 8 actual values of the target variable in the train file: [0,0,0,1,1,1,1,1]. What is the entropy of the target variable?
  • 76. 26 What is the entropy of the target variable? [0,0,0,1,1,1,1,1] Options: −(5/8 log(5/8) + 3/8 log(3/8)); 5/8 log(5/8) + 3/8 log(3/8); 3/8 log(5/8) + 5/8 log(3/8); 5/8 log(3/8) − 3/8 log(5/8)
  • 77. 26 What is the entropy of the target variable? [0,0,0,1,1,1,1,1] Options: −(5/8 log(5/8) + 3/8 log(3/8)); 5/8 log(5/8) + 3/8 log(3/8); 3/8 log(5/8) + 5/8 log(3/8); 5/8 log(3/8) − 3/8 log(5/8) Hint:
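The entropy of the target can be worked out from the class proportions: the list has five 1s and three 0s out of eight values. The derivation below uses the standard Shannon entropy definition and assumes a base-2 logarithm, which the slide does not state; with a different base only the final number changes, not the form of the expression.

```latex
\begin{align*}
H &= -\sum_{i} p_i \log_2 p_i \\
  &= -\left( \tfrac{5}{8} \log_2 \tfrac{5}{8} + \tfrac{3}{8} \log_2 \tfrac{3}{8} \right) \\
  &\approx -\left( 0.625 \times (-0.678) + 0.375 \times (-1.415) \right) \\
  &\approx 0.954 \text{ bits}
\end{align*}
```

This has the same form as the first option, −(5/8 log(5/8) + 3/8 log(3/8)).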
  • 78. 27 We want to predict the probability of death from heart disease based on three risk factors: age, gender, and blood cholesterol level. What is the most appropriate algorithm for this use case?
  • 79. 27 Choose the right algorithm. Options: Logistic regression; Linear regression; K-means clustering; Apriori algorithm
  • 81. 28 After studying the behavior of a population, you have identified four specific individual types who are valuable to your study. You would like to find all users who are most similar to each individual type. Which algorithm is most appropriate for this study?
  • 82. 28 Choose the right algorithm. Options: K-means clustering; Linear regression; Association rules; Decision trees
  • 84. 29 You have run the association rules algorithm on your dataset, and the two rules {banana, apple} => {grape} and {apple, orange} => {grape} have been found to be relevant. What else must be true?
  • 85. 29 Choose the right answer. Options: {banana, apple, grape, orange} must be a frequent itemset; {banana, apple} => {orange} must be a relevant rule; {grape} => {banana, apple} must be a relevant rule; {grape, apple} must be a frequent itemset
  • 87. 30 Your organization has a website where visitors randomly receive one of two coupons. It is also possible that visitors to the website will not receive a coupon. You have been asked to determine if offering a coupon to visitors to your website has any impact on their purchase decision. Which analysis method should you use?
  • 88. 30 Choose the right analysis method. Options: One-way ANOVA; K-means clustering; Association rules; Student's t-test

Editor's Notes

  • #4: Note: We have to measure 4 liters in the 5-liter bucket only. Also, please mention that there are no measurement markings on the buckets.
  • #8: Note: Please mention the significance of the sigmoid function and the threshold classifier. As we are moving from linear to logistic, we can also talk about the difference between linear and logistic regression in a line or two. The logistic function, also called the sigmoid function, is an S-shaped curve that takes any real-valued number and maps it to a value between 0 and 1, but never exactly at those limits. These probabilities must then be transformed into binary values in order to actually make a prediction: the values between 0 and 1 are converted into either 0 or 1 using a threshold classifier.
  • #9: Note: This is an example that determines whether a student will pass or fail based on the number of hours studied. The more hours a student studies, the more likely the student is to pass.
  • #12: Example is “to build a decision tree to determine whether you should accept a job offer or not”
  • #16: Explain overfitting
  • #24: LDA: Linear discriminant analysis is used to find a linear combination of features that characterizes or separates two or more classes (or levels) of a categorical variable. ANOVA: ANOVA stands for analysis of variance. It is similar to LDA except that it operates on one or more categorical independent features and one continuous dependent feature. It provides a statistical test of whether the means of several groups are equal or not. Chi-Square: It is a statistical test applied to groups of categorical features to evaluate the likelihood of correlation or association between them using their frequency distribution. Forward Selection: Forward selection is an iterative method in which we start with no features in the model. In each iteration, we add the feature which best improves the model, until adding a new variable no longer improves the performance of the model. Backward Selection (Backward Elimination): In backward elimination, we start with all the features and, at each iteration, remove the least significant feature whose removal improves the performance of the model. We repeat this until no improvement is observed on removal of features. Recursive Feature Elimination: It is a greedy optimization algorithm which aims to find the best-performing feature subset. It repeatedly creates models and sets aside the best or the worst performing feature at each iteration. It constructs the next model with the remaining features until all the features are exhausted. It then ranks the features based on the order of their elimination.
  • #33: Please explain the concept of angles in the clock so that the viewer can answer similar questions in the interview
  • #45: Please explain these terms and why they are significant for measuring accuracy
  • #51: Please explain the concept of null hypothesis and alternative hypothesis
  • #60: Please explain the significance of precision and recall rate
  • #64: In this, we can explain briefly about how recommender systems work
  • #76: Note: We can talk about entropy and how it affects the decision tree