5/23/2021 MATLAB for Data Processing and Visualization - Quick Reference
MATLAB for Data Processing and Visualization
1. Getting Started
Summary: Getting Started with the Data
hurrs = readtable("hurricaneData1990s.txt",...
    "NumHeaderLines",5,"CommentStyle","##");
The readtable function creates a table in MATLAB from a data file.
Additional inputs can help import irregularly formatted files.
scatter(hurrs.Windspeed,hurrs.Pressure)
You can use dot notation to access variables in a table. The
scatter function creates a scatter plot of two vectors.
hurrs.Country = categorical(hurrs.Country);
The categorical function creates a categorical array from data.
t = hurrs.Timestamp;
By default, the readtable function may import certain variables in
the table as datetime.
In this example, hurrs.Timestamp is a datetime.
h = hour(t);
histogram(h)
The hour function returns the hour numbers of the input datetime
values.
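The steps above can be tried without the data file. The following is a minimal sketch that builds a small stand-in table by hand; the three rows of values are made up for illustration and are not taken from hurricaneData1990s.txt.

```matlab
% Hypothetical stand-in for the imported hurricane table (values are invented).
Timestamp = datetime(1995,8,[1;2;3],[6;18;6],0,0);
Country   = ["USA";"Mexico";"USA"];
Windspeed = [100;85;120];
Pressure  = [960;980;940];
hurrs = table(Timestamp,Country,Windspeed,Pressure);

% Convert the text variable to a categorical array using dot notation.
hurrs.Country = categorical(hurrs.Country);

% Extract the hour of each datetime value.
t = hurrs.Timestamp;
h = hour(t);

% Plot two table variables against each other (figure kept invisible here).
fig = figure("Visible","off");
scatter(hurrs.Windspeed,hurrs.Pressure)
```

With these invented timestamps, h holds the hours 6, 18, and 6.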
2. Preprocessing Data
Review - Preprocessing Data
Missing Data
https://matlabacademy.mathworks.com/artifacts/quick-reference.html?course=mlvi&release=R2021a&language=en 1/11
data = readtable("myfile")
data =
  4×2 table
    Var1    Var2
    ____    ____
      7     0.81
      1      NaN
      9     0.13
     10     0.91
When you import data into MATLAB, missing numerical values are
replaced with NaN, which stands for Not a Number.
v = mean(data.Var2)
v =
   NaN
When you calculate statistics on arrays that contain NaNs, the
result is another NaN.
v = mean(data.Var2,"omitnan")
v =
    0.6167
To ignore NaNs in the calculation, use the "omitnan" flag.
cleaned = rmmissing(data)
cleaned =
  3×2 table
    Var1    Var2
    ____    ____
      7     0.81
      9     0.13
     10     0.91
You can delete rows containing missing data with rmmissing.
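The missing-data steps can be reproduced without a file. This sketch rebuilds the 4×2 table shown above by hand; the ismissing call is an addition not mentioned in the original and is included only to count the missing entries.

```matlab
% Rebuild the example table directly instead of reading "myfile".
data = table([7;1;9;10],[0.81;NaN;0.13;0.91], ...
    'VariableNames',{'Var1','Var2'});

% A plain mean propagates the NaN.
vAll = mean(data.Var2);

% The "omitnan" flag skips missing values in the calculation.
vOmit = mean(data.Var2,"omitnan");

% Count missing entries, then drop the rows that contain them.
nMissing = sum(ismissing(data.Var2));
cleaned = rmmissing(data);
```

Here vAll is NaN, vOmit is the mean of the three remaining values, and cleaned has three rows.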
Categories and Sets
x = categorical(["medium" "large" "large" "red" "small" "red"]);
Categorical arrays use less memory and work with many plotting functions.
c = categories(x)
c =
  4×1 cell array
    {'large' }
    {'medium'}
    {'red'   }
    {'small' }
Use the categories function to get a list of unique categories.
x = mergecats(x,["small" "medium" "large"],"size")
x =
  1×6 categorical array
     size      size      size      red      size      red
Merge different categories with the mergecats function.
x = renamecats(x,"red","color")
x =
  1×6 categorical array
     size      size      size      color      size      color
Rename categories with the renamecats function.
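As a quick check of the category functions above, the following sketch repeats the same calls and counts how many elements fall in each category. The countcats call is an addition for illustration and is not part of the original reference.

```matlab
x = categorical(["medium" "large" "large" "red" "small" "red"]);

% Categories are listed in alphabetical order: large, medium, red, small.
c = categories(x);

% Count the elements in each category, in the same order as c.
counts = countcats(x);

% Merge the three size categories, then rename the remaining one.
x = mergecats(x,["small" "medium" "large"],"size");
x = renamecats(x,"red","color");

% Logical comparison against a category name counts its members.
nSize = sum(x == "size");
```

After merging and renaming, four of the six elements belong to size and two to color.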
Discretizing Continuous Data
Ranges in continuous data can represent categories. Categorize continuous data into discrete bins with the discretize
function.
>> y = discretize(X,edges,"Categorical",cats)
Outputs
y                     If the "Categorical" option is set, y is a categorical array. Otherwise, y is numeric bin values.
Inputs
X                     Array of continuous data. X is usually numeric or datetime.
edges                 Consecutive elements in edges form discrete bins. There will be one fewer bin than the number of edges specified. You can use inf in edges to create a bin with no edge.
"Categorical",cats    Optional input for the name of each bin category.
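A small worked case may make the inputs concrete. The values, edges, and category names below are invented for illustration; four edges produce three bins, and the inf edge leaves the last bin open-ended.

```matlab
% Hypothetical measurements to be binned.
X = [3 12 25 40 7];

% Four edges define three bins: [0,10), [10,20), and [20,inf].
edges = [0 10 20 inf];

% One name per bin (a cell array of character vectors).
cats = {'low','medium','high'};

% Without the option, y holds numeric bin indices.
yNum = discretize(X,edges);

% With the option, y is a categorical array using the bin names.
y = discretize(X,edges,"Categorical",cats);
```

Here yNum is 1 2 3 3 1, and y holds low, medium, high, high, low.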
3. Graphics Formatting Functions
Review - Graphics Formatting Functions
plot(x,y)
As a reminder, here is a plot with default
properties. There are no markers and
the line color is blue.
plot(x,y,"o-","MarkerSize",8,"MarkerFaceColor","r")
When you create a graphic with the plot function, you can set various
options to modify the appearance.
For example, the options in this plot use circle markers with a solid line,
increase the marker size, and color the markers red while leaving the
default line color.
Many of these line properties can also be set in other graphics functions,
such as scatter and fplot.
grid("on")
grid("minor")
axis("square")
xlim([0 8])
You can also customize the appearance of existing plots. Here are a few common
graphics customization functions:
Function          Controls
hold              Plot replacement behavior of axes
xlim/ylim/zlim    Limits of the appropriate axis
grid              Axes grid lines
axis              Axis limits, shape, and appearance
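The formatting options and customization functions above can be combined in one short script. This is a sketch with made-up sine and cosine data; the figure is created invisibly so the script can also run without a display.

```matlab
% Hypothetical data to plot.
x = 0:0.5:8;
y = sin(x);

fig = figure("Visible","off");

% Circle markers on a solid line, larger markers, red marker fill.
p = plot(x,y,"o-","MarkerSize",8,"MarkerFaceColor","r");

% Keep the first line while adding a second one.
hold("on")
plot(x,cos(x),"--")
hold("off")

% Customize the existing axes.
grid("on")
axis("square")
xlim([0 8])

ax = gca;   % handle to the axes that were just formatted
```

Setting the options in the plot call and then adjusting the axes afterwards keeps the data styling separate from the axes styling.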
4. Review Project 1
5. Importing Data from Multiple Files
Review - Importing Data from Multiple Files
A datastore is a reference to a file or set of files. The datastore function tells MATLAB where to find the files.
Code Description
ds = datastore(filename) Reference a single file
ds = datastore(directory) Reference a folder of files
data = read(ds) Read data incrementally
data = readall(ds) Read all data referenced in datastore
If your data isn't formatted the way datastore expects, you can set the datastore properties. Examples of common properties
are shown below. You can find all the properties in the documentation.
>> ds = datastore(filename,"Delimiter","-","TextscanFormats","%D%C%f","SelectedVariableNames",var)
Outputs
ds                             Reference to a collection of data.
Inputs
filename                       File location.
"Delimiter","-"                Delimiter is one or more characters that separate data values in the file.
"TextscanFormats","%D%C%f"     Import variables using the output class in the format specification string.
"SelectedVariableNames",var    Import only the variables listed in var.
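To see a datastore at work without existing files, the sketch below writes two small CSV files to a temporary folder and then reads them back. The file names, variable names, and values are all hypothetical, and the property is set after creation rather than in the datastore call.

```matlab
% Create a temporary folder holding two small, made-up CSV files.
folder = tempname;
mkdir(folder);
writetable(table([1;2],[0.5;0.7],'VariableNames',{'ID','Value'}), ...
    fullfile(folder,"part1.csv"));
writetable(table([3;4],[0.9;1.1],'VariableNames',{'ID','Value'}), ...
    fullfile(folder,"part2.csv"));

% Reference the whole folder and import only one variable.
ds = datastore(folder);
ds.SelectedVariableNames = "Value";

% Read incrementally until no data is left.
nRows = 0;
while hasdata(ds)
    chunk = read(ds);
    nRows = nRows + height(chunk);
end

% Return to the start and read everything in one call.
reset(ds);
data = readall(ds);
```

Both approaches see the same four rows; read is the one to reach for when the files are too large to load at once.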
Merging Data
Once you read in multiple tables, you may want to join them together. You can join two tables in many ways. The various join
functions are listed in the table below.
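The differences between the join functions can be sketched with two tiny tables. Tleft and Tright follow the names used in this section; their contents and the variable names LeftVal and RightVal are invented for illustration.

```matlab
% Hypothetical tables sharing the key variable Key1.
Tleft  = table([1;2;2;4],["a";"b";"c";"d"], ...
    'VariableNames',{'Key1','LeftVal'});
Tright = table([1;2;3],[10;20;30], ...
    'VariableNames',{'Key1','RightVal'});

% join needs every key of the left table to appear exactly once on the right,
% so the row with key 4 is left out here.
Tjoin = join(Tleft(1:3,:),Tright);

% innerjoin keeps only the keys found in both tables.
Tinner = innerjoin(Tleft,Tright);

% outerjoin keeps all keys and creates one key variable per input table.
Touter = outerjoin(Tleft,Tright);

% With "MergeKeys" on, the two key variables are combined into one.
Tmerged = outerjoin(Tleft,Tright,"MergeKeys",true);
```

Tinner has three rows (keys 1, 2, 2), while both outer joins have five rows (keys 1, 2, 2, 3, 4); Touter has one more variable than Tmerged because of the duplicated key.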
Function                          Notes
join                              Key1 in Tright must have unique values and contain every key in Tleft.
innerjoin
outerjoin                         Two key variables are created.
outerjoin with "MergeKeys" on
6. Analyzing Groups within Data
Review - Analyzing Groups within Data
petdata = readtable("petdata.txt","Format","%C%C%f")
  5×3 table
    Species    Color     Weight
    _______    ______    ______
    cat        orange       12
    fish       orange     0.68
    cat        black        14
    cat        white         8
    fish       black      0.54
The table petdata has two categorical variables, Species and Color.
Using these two variables, there are five potential groups:
orange cat, orange fish, black cat, black fish, and white cat.
[grpS,speciesVals] = findgroups(petdata.Species)
grpS =
     1
     2
     1
     1
     2
speciesVals =
  2×1 categorical array
     cat
     fish
The findgroups function will return a group number for each element in an array.
The second output is the name associated with each group number. Here, the value 1
means cat.
splitapply(@mean,petdata.Weight,grpS)
ans =
   11.3333
    0.6100
The splitapply function will perform a calculation on each group of the input data.
You can interpret this code as "What is the average weight of each species?"
[grpC,colorVals] = findgroups(petdata.Color)
splitapply(@min,petdata.Weight,grpC)
grpC =
     2
     2
     1
     3
     1
colorVals =
  3×1 categorical array
     black
     orange
     white
ans =
    0.5400
    0.6800
    8.0000
findgroups and splitapply are commonly used together. This code answers "What is the
minimum weight of each color?"
Notice that grpC has values 1, 2, and 3 because there are three different colors in the
data. colorVals contains the meaning for each value.
maxWeight = accumarray([grpS grpC],petdata.Weight,[],@max)
maxWeight =
   14.0000   12.0000    8.0000
    0.5400    0.6800         0
accumarray calculates a value for all five potential groups.
The first input is an array containing both group numbers. The first vector (grpS)
corresponds to the output rows, and the second vector (grpC) corresponds to the
output columns.
Notice that the element in the second row, third column (white fish) is 0 because there's no
data in that group.
bar(maxWeight)
xticklabels(speciesVals)
ylabel("Weight")
legend(colorVals)
The output of accumarray can be difficult to
interpret on its own, but the format is
convenient for visualizations or further
processing.
For example, the output can be passed
directly to the bar function.
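The grouping workflow in this section can be run end to end without petdata.txt. This sketch rebuilds the same five rows by hand; the final findgroups call with two grouping variables is an addition for illustration, showing that only the five combinations present in the data get a group number.

```matlab
% Rebuild the petdata table shown above.
Species = categorical(["cat";"fish";"cat";"cat";"fish"]);
Color   = categorical(["orange";"orange";"black";"white";"black"]);
Weight  = [12;0.68;14;8;0.54];
petdata = table(Species,Color,Weight);

% Group numbers for each variable on its own.
[grpS,speciesVals] = findgroups(petdata.Species);
[grpC,colorVals]   = findgroups(petdata.Color);

% Average weight of each species, minimum weight of each color.
avgBySpecies = splitapply(@mean,petdata.Weight,grpS);
minByColor   = splitapply(@min,petdata.Weight,grpC);

% Species-by-color matrix of maximum weights; empty cells are filled with 0.
maxWeight = accumarray([grpS grpC],petdata.Weight,[],@max);

% Grouping by both variables at once numbers only the combinations that occur.
grpSC = findgroups(petdata.Species,petdata.Color);
maxPerGroup = splitapply(@max,petdata.Weight,grpSC);
```

maxWeight has two rows (cat, fish) and three columns (black, orange, white), while maxPerGroup has five elements because the white fish combination never occurs.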
7. Customizing Graphics Objects
Review - Customizing Graphics Objects
All graphics objects are part of a hierarchy. Most graphics consist of a figure window
containing one or more axes, each of which contains any number of plot objects.
You can use the graphics object hierarchy to modify specific graphics objects after a plot is created.
If you stored a handle to the Figure, you could use the Children properties to reach and modify the Bar plot.
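Walking the hierarchy from a stored figure handle might look like the following sketch. The bar heights are made up, and the code assumes the figure contains a single axes with a single bar plot, so the first child at each level is the object of interest.

```matlab
% Store a handle to the figure when it is created (kept invisible here).
fig = figure("Visible","off");
bar([3 5 2]);

% The figure's child is the axes; the axes' child is the Bar object.
ax = fig.Children(1);
b  = ax.Children(1);

% Modify the existing objects through their handles.
b.FaceColor = "r";
ax.YGrid = "on";
```

In figures with several axes or plot objects, Children returns an array, so indexing by position is only safe when the contents are known.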
8. Review Project 2
9. Images and 3-D Surface Plots
Review - Images and 3-D Surface Plots
data = readtable("my3Ddata")
        x            y             z
    _________    ________    ___________
       2.2506    -0.30105       0.012974
      -1.3443    -0.79976       -0.11638
      0.53421    -0.92891        0.16945
    -0.070088    -0.67461      -0.044245
       ...          ...           ...
plot3(data.x,data.y,data.z,'.')
Images or 3-D plots generally begin with x, y, and z data. In many cases, the x
and y data are not evenly spaced on a grid.
xvec = -2:.2:2;
yvec = -2:.05:2;
To interpolate the data onto a grid, start by defining the grid points. Here, yvec is
denser than xvec.
[xgrid,ygrid] = meshgrid(xvec,yvec);
The meshgrid function will convert your
vectors into the grid expected by surf
and pcolor .
zgrid = griddata(data.x,data.y,data.z,xgrid,ygrid);
Then use the griddata function to interpolate your data onto the grid.
Consistent naming of your variables from previous steps will make the griddata
syntax easier.
surf(xgrid,ygrid,zgrid);
Once your x, y, and z data is gridded,
you can visualize it in a variety of ways.
surf creates a surface plot.
Notice the difference between the x and
y axes. This is because xvec and yvec
had a different number of grid points.
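The gridding steps can be tried on synthetic data. In this sketch the scattered points are random and the z values come from an arbitrary smooth function, standing in for the contents of my3Ddata; neither is from the original course data.

```matlab
% Hypothetical scattered data: 300 random (x,y) locations with a smooth z.
rng(0);
x = 4*rand(300,1) - 2;
y = 4*rand(300,1) - 2;
z = x .* exp(-x.^2 - y.^2);

% Define the grid points; yvec is denser than xvec.
xvec = -2:.2:2;
yvec = -2:.05:2;
[xgrid,ygrid] = meshgrid(xvec,yvec);

% Interpolate the scattered data onto the grid.
zgrid = griddata(x,y,z,xgrid,ygrid);

% Visualize the gridded data as a surface (figure kept invisible here).
fig = figure("Visible","off");
s = surf(xgrid,ygrid,zgrid);
```

The gridded arrays have one row per element of yvec and one column per element of xvec. Grid points outside the region covered by the scattered data come back as NaN with the default interpolation method, so the surface can have gaps at its edges.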
im = pcolor(xgrid,ygrid,zgrid);
im.EdgeAlpha = 0;
You can also visualize your 3-D data as
a pseudocolor image.
imagesc(xvec,yvec,zgrid);
This scaled image contains the same
data, but the first two inputs are the
vectors of grid points instead of the
output from meshgrid .
If you inspect the right yellow shape, you
can see that the imagesc plot is
vertically flipped from the pcolor plot.
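The vertical flip comes from the direction of the y-axis that each function sets up. The sketch below draws both plots side by side from a made-up gridded function and inspects the YDir property of each axes; the tiledlayout arrangement is an illustrative choice, not part of the original example.

```matlab
% Hypothetical gridded data computed directly on the grid.
xvec = -2:.2:2;
yvec = -2:.05:2;
[xgrid,ygrid] = meshgrid(xvec,yvec);
zgrid = xgrid .* exp(-xgrid.^2 - ygrid.^2);

fig = figure("Visible","off");
tiledlayout(1,2);

% Pseudocolor plot: takes the full grids, y increases upward.
ax1 = nexttile;
im = pcolor(xgrid,ygrid,zgrid);
im.EdgeAlpha = 0;

% Scaled image: takes the grid vectors, y increases downward.
ax2 = nexttile;
imagesc(xvec,yvec,zgrid);

dirPcolor  = ax1.YDir;
dirImagesc = ax2.YDir;
```

If the two views need to match, one option is to set the YDir property of the imagesc axes to "normal" after plotting.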
10. Importing Unstructured Data
Review - Importing Unstructured Data
To import data from files where the formatting changes and must be inferred from the data itself, you can use functions that
allow you to interact directly with files.
fid = fopen("myfile");
Open the file and store the
file identifier. You'll use fid
with the other low-level
import functions.
firstLine = fgetl(fid)
firstLine =
    '09/12/2005 Level1 12.34 45 1.23e10 inf'
secondLine = fgetl(fid)
secondLine =
    '10/12/2005 Level2 23.54 60 9e19 -inf 0.001'
You can import files line-by-line using fgetl.
There is a file position indicator that keeps track of where you're located in the
file, so calling fgetl twice will return the first two lines.
frewind(fid)
To return to the beginning of the file, you can
rewind the file position indicator.
formatSpec = "%{MM/dd/uuuu}D %s %f32 %d8 %u %f";
myData = textscan(fid, formatSpec)
myData =
  1×6 cell array
    {3×1 datetime}    {3×1 cell}    {3×1 single}    {3×1 int8}    {3×1 uint32}    {3×1 double}
If you know the format of the data, you can pass a format specification string to
textscan. The output contains one cell for each conversion specifier in the format.
fclose(fid);
When you're finished
importing, make sure you
close the file connection.
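The whole low-level workflow can be exercised on a throwaway file. This sketch first writes three made-up lines to a temporary file, then reads them back; the file contents and the shortened four-field format are assumptions chosen to keep the example small, not the file used in the original.

```matlab
% Write a small, hypothetical text file to a temporary location.
fname = [tempname '.txt'];
fid = fopen(fname,"w");
fprintf(fid,"09/12/2005 Level1 12.34 45\n");
fprintf(fid,"10/12/2005 Level2 23.54 60\n");
fprintf(fid,"11/12/2005 Level3 34.90 75\n");
fclose(fid);

% Open the file for reading and import the first two lines one at a time.
fid = fopen(fname);
firstLine  = fgetl(fid);
secondLine = fgetl(fid);

% Move the file position indicator back to the start of the file.
frewind(fid)

% Import every line with a known format: date, text, double, 32-bit integer.
formatSpec = "%{MM/dd/uuuu}D %s %f %d";
myData = textscan(fid,formatSpec);

% Close the file connection and remove the temporary file.
fclose(fid);
delete(fname)
```

Because of the frewind call, textscan reads all three lines even though fgetl has already consumed two of them; without it, only the last line would be scanned.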
11. Conclusion