Extension Properties API

When working with a DataFrame, the most convenient and reliable way to access its columns, both in operations and when retrieving column values in row expressions, is through auto-generated extension properties. They are generated from the dataframe schema, with the name and type of each property inferred from the name and type of the corresponding column. This also works for all kinds of hierarchical dataframes.
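To make the idea concrete, here is a minimal sketch in plain Kotlin of how a typed extension property can wrap a name-based column lookup. It is a simplified model, not the library's actual generated code: `Row`, the marker interfaces `Person` and `PersonInfo`, and `sampleRow()` are hypothetical stand-ins introduced only for illustration.

```kotlin
// Hypothetical marker interfaces standing in for generated data schemas.
interface Person
interface PersonInfo

// Hypothetical, simplified row type: values are stored untyped by column name,
// and the type parameter T only carries the schema at compile time.
class Row<T>(private val values: Map<String, Any?>) {
    operator fun get(column: String): Any? = values[column]
}

// The kind of extension properties a generator could emit, one per column.
// Each property name and type mirrors the column name and type in the schema.
val Row<Person>.name: String get() = this["name"] as String

@Suppress("UNCHECKED_CAST")
val Row<Person>.info: Row<PersonInfo> get() = this["info"] as Row<PersonInfo>

val Row<PersonInfo>.age: Int get() = this["age"] as Int
val Row<PersonInfo>.height: Double get() = this["height"] as Double

// Builds one row with a nested group, for demonstration only.
fun sampleRow(): Row<Person> = Row(
    mapOf(
        "name" to "Alice",
        "info" to Row<PersonInfo>(mapOf("age" to 23, "height" to 175.5))
    )
)

fun main() {
    val row = sampleRow()
    // Typed access: no string keys or casts at the call site.
    println(row.name.startsWith("A") && row.info.age >= 16)
}
```

Because the properties are defined only for a specific schema type, a misspelled or missing column name in this model becomes a compile-time error rather than a runtime lookup failure.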

Example

Consider a simple hierarchical dataframe from example.csv.

This table consists of two columns: name, which is a String column, and info, which is a column group containing two nested value columns — age of type Int, and height of type Double.

name	info
	age	height
Alice	23	175.5
Bob	27	160.2

In Kotlin Notebook, read the DataFrame from the CSV file:

val df = DataFrame.readCsv("example.csv")

After the cell is executed, a data schema and extension properties for this DataFrame are generated, so you can use them to access columns and in operations inside the Column Selector DSL and the DataRow API:

// Get nested column
df.info.age
// Sort by multiple columns
df.sortBy { name and info.height }
// Filter rows using a row condition. 
// These extensions express the exact value in the row 
// with the corresponding type:
df.filter { name.startsWith("A") && info.age >= 16 }

If you change the dataframe's schema by renaming a column, changing its type, or adding a new column, you need to run a cell with a new DataFrame declaration first. For example, rename the name column to "firstName":

val dfRenamed = df.rename { name }.into("firstName")

After running the cell with the code above, you can use firstName extensions in the following cells:

dfRenamed.firstName
dfRenamed.rename { firstName }.into("name")
dfRenamed.filter { firstName == "Nikita" }

See the Quickstart Guide in Kotlin Notebook with basic Extension Properties API examples.

With the compiler plugin, for now, if you read a DataFrame from a file or URL, you need to define its schema manually. You can do it quickly with generate..() methods.

Define schemas:

@DataSchema
data class PersonInfo(
    val age: Int,
    val height: Double
)

@DataSchema
data class Person(
    val info: PersonInfo,
    val name: String
)

Read the DataFrame from the CSV file and specify the schema with convertTo() or cast():

val df = DataFrame.readCsv("example.csv").convertTo<Person>()

Extensions for this DataFrame are generated automatically by the plugin, so you can use them to access columns and in operations inside the Column Selector DSL and the DataRow API:

// Get nested column
df.info.age
// Sort by multiple columns
df.sortBy { name and info.height }
// Filter rows using a row condition. 
// These extensions express the exact value in the row 
// with the corresponding type:
df.filter { name.startsWith("A") && info.age >= 16 }

Moreover, new extensions are generated on the fly after each schema change: renaming a column, changing its type, or adding a new column. For example, rename the name column to "firstName"; the firstName extension can then be used in the operations that follow:

// Rename "name" column into "firstName"
df.rename { name }.into("firstName")
    // Can use `firstName` extension in the row condition 
    // right after renaming
    .filter { firstName == "Nikita" }

See the Compiler Plugin Example IDEA project for basic Extension Properties API examples.

16 June 2025