Data Frame
Load data frame
Load data
We will use one of the example data sets from R, mtcars, for these examples. First, switch into the Lisp-Stat package:
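A minimal sketch of this step, assuming Lisp-Stat was installed through Quicklisp and that the user package is named `ls-user` (both are assumptions about your setup):

```lisp
;; Assumption: Quicklisp is installed and can locate the lisp-stat system.
(ql:quickload :lisp-stat)

;; Assumption: LS-USER is the package intended for interactive work.
(in-package :ls-user)
```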
Now load the data:
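One way the load might look is sketched here. It assumes that a `defdf` macro binds the data frame to a name, and that an `rdata` package exposes the location of the R data set as `rdata:mtcars`. Treat both names as assumptions and check them against your installed version:

```lisp
;; Assumption: DEFDF defines a global data frame named MTCARS, and
;; RDATA:MTCARS names the location of the CSV file for the R data set.
(defdf mtcars (read-csv rdata:mtcars))
```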
Examine data
Lisp-Stat’s printing system is integrated with the Common Lisp Pretty Printing facility. To control aspects of printing, you can use the built-in Lisp pretty printing configuration system. By default Lisp-Stat sets *print-pretty* to nil.
Basic information
Type the name of the data frame at the REPL to get a simple one-line summary.
Printing data
By default, the head function prints the first 6 rows:
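For example, assuming the data frame has been loaded and bound to `mtcars`:

```lisp
;; Print the first rows of the data frame (6 by default, per the text above).
(head mtcars)
```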
The tail function prints the last 6 rows:
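Again assuming `mtcars` is already loaded:

```lisp
;; Print the last rows of the data frame (6 by default, per the text above).
(tail mtcars)
```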
print-data can be used to print the whole data frame:
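A call might look like this, assuming `print-data` takes the data frame as its only required argument:

```lisp
;; Assumption: PRINT-DATA accepts the data frame as its first argument.
(print-data mtcars)
```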
The two dots “..” at the end indicate that output has been truncated.
Lisp-Stat sets the pretty printer variable *print-lines* to 25 by default, and output longer than this is truncated. To print all rows, set this value to nil with (setf *print-lines* nil).
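If you would rather not change the setting for the whole session, a dynamic binding lifts the limit for a single call. This sketch uses only the standard `*print-lines*` variable and assumes `print-data` honours it as described above:

```lisp
;; Bind *PRINT-LINES* to NIL only for the duration of this form;
;; the previous limit is restored automatically afterwards.
(let ((*print-lines* nil))
  (print-data mtcars))
```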
Notice the column named X1. This is the name given to the column by the data reading function. Note the warning that was issued during the import. Columns with missing names are named X1, X2, …, Xn in increasing order for the duration of the Lisp-Stat session.
This column is actually the row name, so we’ll rename it:
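A sketch of the rename, assuming the function is called `rename-column!` and takes the new name before the old one (verify both against your version of the library):

```lisp
;; Assumption: (rename-column! data-frame new-name old-name).
;; Both names are quoted symbols because data frame keys are symbols.
(rename-column! mtcars 'model 'x1)
```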
The keys of a data frame are symbols, so you need to quote them to prevent the reader from trying to evaluate them to a value.
Note that your column may be named something other than X1, depending on whether or not you have loaded any other data frames with variable name replacement. Also note the ! at the end of the function name. This is a convention indicating a destructive operation: no copy is returned, and the data frame itself is modified.
Now let’s view the results:
Column names
To see the names of the columns, use the column-names function:
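For instance, assuming `mtcars` is loaded:

```lisp
;; Returns the column names of the data frame.
(column-names mtcars)
```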
Remember we mentioned that the keys (column names) are symbols? Compare the above to the keys of the data frame:
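The keys can be inspected with a call along these lines, assuming the accessor is named `keys`:

```lisp
;; Assumption: KEYS returns the data frame's keys as symbols.
(keys mtcars)
```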
These symbols are printed without double quotes. If a function takes a key, it must be quoted, e.g. 'mpg and not mpg or "mpg".
Dimensions
We saw the dimensions above in basic information. That was printed for human consumption. To get the values in a form suitable for passing to other functions, use the dims function:
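A sketch, assuming `dims` is the generic function from the array-operations library (package nickname `aops`) and that it is implemented for data frames:

```lisp
;; Assumption: AOPS:DIMS works on data frames and returns the
;; dimensions as a list in row-column order.
(aops:dims mtcars)
```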
Common Lisp specifies dimensions in row-column order, so mtcars has 32 rows and 12 columns.
Note
Lisp-Stat generally follows the tidyverse philosophy when it comes to row names. By definition, row names are unique, so there is no point including them in a statistical analysis. Nevertheless, many data sets include row names, so we include some special handling for columns with all distinct values; they are excluded by default from summaries (and you can include them if you wish). There is no concept of independent row names as with an R data frame. A Lisp-Stat data frame is more like a tibble.
Basic Statistics
Minimum & Maximum
To get the minimum or maximum of a column, say mpg, you can use several Common Lisp functions. Let’s see what mpg looks like by typing the name of the column into the REPL:
You could, for example, use something like this to find the minimum:
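With standard Common Lisp alone, one option is `reduce` (assuming the column is reachable as `mtcars:mpg`, as described below):

```lisp
;; REDUCE with MIN folds the vector down to its smallest element.
(reduce #'min mtcars:mpg)
```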
or the Lisp-Stat function seq-max to find the maximum:
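Assuming `seq-max` takes the sequence as its only argument, the call would be:

```lisp
;; Assumption: SEQ-MAX returns the largest element of a sequence.
(seq-max mtcars:mpg)
```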
or perhaps you’d prefer alexandria:extremum, a general-purpose tool to find the minimum in a different way:
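`alexandria:extremum` takes a sequence and a predicate and returns the element that satisfies the predicate against all others, so `#'<` selects the minimum:

```lisp
;; With #'< the extremum is the smallest element; use #'> for the largest.
(alexandria:extremum mtcars:mpg #'<)
```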
The important thing to note is that mtcars:mpg is a standard Common Lisp vector and you can manipulate it like one.
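To illustrate, here is a small sketch using only standard sequence functions. The vector `*mpg-sample*` is a hypothetical stand-in holding a few illustrative values; in a live session you would pass `mtcars:mpg` instead.

```lisp
;; Hypothetical stand-in for a column; replace with mtcars:mpg in a session.
(defparameter *mpg-sample* (vector 21.0 22.8 18.7 14.3))

;; Any standard sequence function applies to a column vector.
(defparameter *sample-min* (reduce #'min *mpg-sample*))   ; smallest value
(defparameter *sample-max* (reduce #'max *mpg-sample*))   ; largest value

;; Count the elements greater than 20.
(defparameter *over-20*
  (count-if (lambda (x) (> x 20)) *mpg-sample*))

;; SORT is destructive, so sort a copy and leave the column untouched.
(defparameter *sorted* (sort (copy-seq *mpg-sample*) #'<))
```

Copying before sorting matters here because the column vector is the data frame's own storage, not a snapshot of it.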
Mean & standard deviation
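A sketch of both calculations, assuming Lisp-Stat exports functions named `mean` and `sd` that accept a vector (treat the names as assumptions and check your installed version):

```lisp
;; Assumption: MEAN returns the arithmetic mean of the column.
(mean mtcars:mpg)

;; Assumption: SD returns the standard deviation of the column.
(sd mtcars:mpg)
```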
Summarise
You can summarise a column with the summarize-column function:
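For example, assuming `summarize-column` takes the column as a quoted, package-qualified symbol:

```lisp
;; Assumption: the column is identified by a quoted symbol.
(summarize-column 'mtcars:mpg)
```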
or the entire data frame:
Recall that the column named model is treated specially; notice that it is not included in the summary. You can see why it’s excluded by examining the column’s summary:
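Under the same assumption about how `summarize-column` identifies a column, the call for the model column would be:

```lisp
;; Assumption: same calling convention as for any other column.
(summarize-column 'mtcars:model)
```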
Columns with unique values in each row aren’t very interesting.
Saving data
To save a data frame to a CSV file, use the write-csv method. Here we save mtcars into the Lisp-Stat datasets directory, including the column names:
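A sketch of the call follows. It assumes that the logical pathname host `LS` is defined with a `DATA` directory, and that the `:add-first-row` keyword controls whether column names are written as a header row. Both are assumptions to verify locally:

```lisp
;; Assumptions: the LS:DATA; logical pathname translation exists, and
;; :ADD-FIRST-ROW T writes the column names as the first row.
(write-csv mtcars
           #P"LS:DATA;mtcars.csv"
           :add-first-row t)
```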