Quickstart Guide

This guide shows how to quickly get started with Kotlin DataFrame:
you'll learn how to load data, perform basic transformations, and build a simple plot using Kandy.

We recommend starting with Kotlin Notebook for the best beginner experience: everything works out of the box, including interactivity and rich rendering of DataFrames and plots.
You can instantly see the results of each operation: view the contents of your DataFrames after every transformation, inspect individual rows and columns, and explore data step-by-step in a live and interactive way.

You can view this guide as a notebook on GitHub or download quickstart.ipynb.

To start working with Kotlin DataFrame in a notebook, run a cell with the following code:

%useLatestDescriptors
%use dataframe

This loads all necessary DataFrame dependencies (of the latest stable version), all required imports, and DataFrame rendering. Learn more here.

Read DataFrame

Kotlin DataFrame supports all popular data formats, including CSV, JSON, and Excel, as well as reading from various databases. Read a CSV file with the "JetBrains Repositories" dataset into the df variable:

val df = DataFrame.readCsv(
    "https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv",
)

Display And Explore

To display a DataFrame as a cell output, place it on the last line of the cell. For example, a cell whose last line is just df renders the DataFrame loaded above.

Kotlin Notebook has special interactive outputs for DataFrame. Learn more about them here.

Use the .describe() method to get a summary of the dataset: column types, number of nulls, and simple statistics.

df.describe()
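
Beyond .describe(), a few smaller calls are handy for a first look at the data. The cell below is a minimal sketch that assumes the df variable from the "Read DataFrame" section; note that only the last expression of a cell is rendered as its output, so the other values are printed explicitly.

```kotlin
// Assumes `df` was loaded in the "Read DataFrame" cell above.
println("rows: ${df.rowsCount()}, columns: ${df.columnsCount()}")

// Print the column names together with their inferred types.
println(df.schema())

// Render only the first five rows as the cell output.
df.head(5)
```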

Select Columns

Kotlin DataFrame features a typesafe Columns Selection DSL, enabling flexible and safe selection of any combination of columns. Column selectors are widely used across operations. One of the simplest examples is .select { }, which returns a new DataFrame with only the columns chosen in the Columns Selection expression.

After executing the cell where a DataFrame variable is declared, extension properties for its columns are automatically generated. These properties can then be used in the Columns Selection DSL expression for typesafe and convenient column access.

Select some columns:

// Select "full_name", "stargazers_count" and "topics" columns
val dfSelected = df.select { full_name and stargazers_count and topics }
dfSelected
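
Because extension properties appear only after the declaring cell has been executed, it can help to know that columns may also be selected by their string names. The following is a sketch of that alternative, assuming the same df as above; the variable name dfSelectedByName is introduced here purely for illustration.

```kotlin
// Assumes `df` from the "Read DataFrame" cell.
// Select the same three columns by name, without extension properties.
val dfSelectedByName = df.select("full_name", "stargazers_count", "topics")

// The result should contain exactly the requested columns, in this order.
dfSelectedByName.columnNames()
```

The string form gives up compile-time checking of column names and types, so the extension-property form is usually preferable inside a notebook.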

Row Filtering

Some operations use the DataRow API, with expressions and conditions that are evaluated for every row of the DataFrame. For example, .filter { } returns a new DataFrame containing only the rows that satisfy the condition given by a row expression.

Inside a row expression, you can access the values of the current row by column name through auto-generated properties. This is similar to the Columns Selection DSL, except that here the properties represent actual values rather than column references.

Filter rows by "stargazers_count" value:

// Keep only rows where the "stargazers_count" value is at least 1000
val dfFiltered = dfSelected.filter { stargazers_count >= 1000 }
dfFiltered
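
Since a row expression is ordinary Kotlin, conditions can be combined with the usual Boolean operators. The sketch below assumes dfSelected from the previous section and that "topics" is still a plain String at this point; the variable names are illustrative only. It also shows .drop { }, which keeps the rows that do not match the condition.

```kotlin
// Assumes `dfSelected` from the "Select Columns" cell.
// Keep repositories with at least 1000 stars whose topics string mentions "kotlin".
val dfPopularKotlin = dfSelected.filter {
    stargazers_count >= 1000 && topics.contains("kotlin")
}

// `drop` is the complement of `filter`: remove rows with at least 1000 stars.
val dfBelowThreshold = dfSelected.drop { stargazers_count >= 1000 }

dfPopularKotlin
```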

Columns Rename

Columns can be renamed using the .rename { } operation, which also uses the Columns Selection DSL to select a column to rename. The rename operation does not perform the renaming immediately; instead, it creates an intermediate object that must be finalized into a new DataFrame by calling the .into() function with the new column name.

Rename "full_name" and "stargazers_count" columns:

// Rename "full_name" column into "name"
val dfRenamed = dfFiltered.rename { full_name }.into("name")
    // And "stargazers_count" into "starsCount"
    .rename { stargazers_count }.into("starsCount")
dfRenamed
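
When several columns are renamed at once, passing name pairs can be more compact than chaining. This is a sketch of that form, assuming dfFiltered from the previous section; dfRenamedByPairs is a name chosen here for illustration.

```kotlin
// Assumes `dfFiltered` from the "Row Filtering" cell.
// Each pair maps an existing column name to its new name.
val dfRenamedByPairs = dfFiltered.rename(
    "full_name" to "name",
    "stargazers_count" to "starsCount",
)

dfRenamedByPairs.columnNames()
```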

Modify Columns

Columns can be modified using the .update { } and .convert { } operations. Both operations select columns to modify via the Columns Selection DSL and, similar to rename, create an intermediate object that must be finalized to produce a new DataFrame.

The update operation preserves the original column types, while convert allows changing the type. In both cases, column names and their positions remain unchanged.

Update "name" and convert "topics":

val dfUpdated = dfRenamed
    // Update "name" values with only its second part (after '/')
    .update { name }.with { it.split("/")[1] }
    // Convert "topics" `String` values into `List<String>` by splitting:
    .convert { topics }.with { it.removePrefix("[").removeSuffix("]").split(", ") }
dfUpdated

Check the new type of the "topics" column:

dfUpdated.topics.type()

Output:

kotlin.collections.List<kotlin.String>
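
The intermediate objects can be finalized in other ways as well. As a sketch, assuming dfUpdated from the cell above, the example below restricts an update to matching rows with .where { } and converts a column to another numeric type with .to<Double>(); the transformation itself is arbitrary and only serves as an illustration.

```kotlin
// Assumes `dfUpdated` from the cell above.
val dfAdjusted = dfUpdated
    // Upper-case only the names that start with "intellij"; other rows stay as they are.
    .update { name }.where { it.startsWith("intellij") }.with { it.uppercase() }
    // Change the column type of "starsCount" from Int to Double.
    .convert { starsCount }.to<Double>()

// Look the column up by name to inspect its runtime type.
dfAdjusted["starsCount"].type()
```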

Adding New Columns

The .add { } function allows creating a DataFrame with a new column, where the value for each row is computed based on the existing values in that row. These values can be accessed within the row expressions.

Add a new Boolean column "isIntellij":

// Add a `Boolean` column indicating whether the `name` contains the "intellij" substring
// or the topics include "intellij".
val dfWithIsIntellij = dfUpdated.add("isIntellij") {
    name.contains("intellij") || "intellij" in topics
}
dfWithIsIntellij

Grouping And Aggregating

A DataFrame can be grouped by column keys, meaning its rows are split into groups based on the values in the key columns. The .groupBy { } operation selects columns and groups the DataFrame by their values, using them as grouping keys.

The result is a GroupBy — a DataFrame-like structure that associates each key with the corresponding subset of the original DataFrame.

Group dfWithIsIntellij by "isIntellij":

val groupedByIsIntellij = dfWithIsIntellij.groupBy { isIntellij }
groupedByIsIntellij

A GroupBy can be aggregated — that is, you can compute one or several summary statistics for each group. The result of the aggregation is a DataFrame containing the key columns along with new columns holding the computed statistics for a corresponding group.

For example, .count() computes the size of each group:

groupedByIsIntellij.count()

To compute several statistics at once, use .aggregate { }, which takes an expression describing the aggregations:

groupedByIsIntellij.aggregate {
    // Compute sum and max of "starsCount" within each group into "sumStars" and "maxStars" columns
    sumOf { starsCount } into "sumStars"
    maxOf { starsCount } into "maxStars"
}
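
Other statistics follow the same pattern inside .aggregate { }. The cell below is a sketch assuming groupedByIsIntellij from above; the result column names ("repos", "minStars", "medianStars") are chosen here for illustration.

```kotlin
// Assumes `groupedByIsIntellij` from the "Grouping And Aggregating" cell.
groupedByIsIntellij.aggregate {
    // Number of rows in each group
    count() into "repos"
    // Smallest and median "starsCount" within each group
    minOf { starsCount } into "minStars"
    median { starsCount } into "medianStars"
}
```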

Sorting Rows

.sortBy { } and .sortByDesc { } sort rows by the values in the selected columns and return a new DataFrame with the sorted rows. .take(n) returns a new DataFrame with the first n rows.

Combine them to get Top-10 repositories by number of stars:

val dfTop10 = dfWithIsIntellij
    // Sort by "starsCount" value descending
    .sortByDesc { starsCount }.take(10)
dfTop10
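
Sorting by more than one column uses the same selector syntax. As a sketch, assuming dfWithIsIntellij from the "Adding New Columns" section, the cell below puts the IntelliJ-related repositories first and orders each part by stars, descending; dfIntellijFirst is an illustrative name.

```kotlin
// Assumes `dfWithIsIntellij` from the "Adding New Columns" cell.
val dfIntellijFirst = dfWithIsIntellij
    // Sort by "isIntellij" (true first), then by "starsCount", both descending
    .sortBy { isIntellij.desc() and starsCount.desc() }
    .take(10)

dfIntellijFirst
```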

Plotting With Kandy

Kandy is a Kotlin plotting library designed to bring Kotlin DataFrame features into chart creation, providing a convenient and typesafe way to build data visualizations.

Kandy can be loaded into a notebook using %use kandy:

%use kandy

Build a simple bar chart with the .plot { } extension for DataFrame, which lets you use extension properties inside the Kandy plotting DSL (the plot is rendered as the cell output after execution):

dfTop10.plot {
    bars {
        x(name)
        y(starsCount)
    }

    layout.title = "Top 10 JetBrains repositories by stars count"
}

Write DataFrame

A DataFrame supports writing to all formats that it is capable of reading.

Write into Excel:

dfWithIsIntellij.writeExcel("jb_repos.xlsx")
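
Other formats work the same way. The sketch below assumes dfWithIsIntellij from above and uses hypothetical output file names. How list-valued columns such as "topics" are represented in CSV is not covered in this guide, so the example removes that column before writing CSV and keeps it for JSON.

```kotlin
// Assumes `dfWithIsIntellij` from the "Adding New Columns" cell.
// File names are placeholders chosen for this example.

// Write a flat CSV without the list-valued "topics" column.
dfWithIsIntellij.remove("topics").writeCsv("jb_repos.csv")

// JSON can hold the full DataFrame, including "topics".
dfWithIsIntellij.writeJson("jb_repos.json")

// Read the CSV back to check what was written.
val restored = DataFrame.readCsv("jb_repos.csv")
restored.columnNames()
```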

What's Next?

In this quickstart, we covered the basics — reading data, transforming it, and building a simple visualization.
Ready to go deeper? Check out what’s next:

📘 Explore in-depth guides and various examples with different datasets, API usage examples, and practical scenarios that help you understand the main features of Kotlin DataFrame.
🛠️ Browse the operations overview to learn what Kotlin DataFrame can do.
🧠 Understand the design and core concepts in the library overview.
🔤 Learn more about Extension Properties and make working with your data both convenient and type-safe.
💡 Use Kotlin DataFrame Compiler Plugin for auto-generated column access in your IntelliJ IDEA projects.
📊 Master Kandy for stunning and expressive DataFrame visualizations by studying the Kandy documentation.

16 June 2025