Updated data schema documentation #1859

Open
AndreiKingsley wants to merge 2 commits into master from data_schema_docs

AndreiKingsley commented May 13, 2026

Closes #309.

We now explicitly recommend against using “complex” classes for data schemas in the documentation.

I also think it's better to use interfaces for describing data schemas, especially for beginners, so I put interfaces first and updated the examples to use them.

@AndreiKingsley AndreiKingsley marked this pull request as ready for review May 13, 2026 12:28
```kotlin
    val orders: List<_DataFrameType11>
    val user: kotlin.String
}
data class Customer(
```

Please re-generate the classes: they'll have a nested structure and the domain name `Orders` instead of `Customer1`.

koperagen commented May 13, 2026

Btw, as for closing #309:

```kotlin
val DataRow<DataSchemaType>.myComputedProperty get() = ...
```

It can be used on the type directly:

```kotlin
DataFrame.readCsv().cast<DataSchemaType>().aggregate {
    last().myComputedProperty into "last"
    first().myComputedProperty into "first"
}
```

Keep in mind that compiler plugin operations change the initial type:

```kotlin
DataFrame.readCsv().cast<DataSchemaType>()
    .add("example") { 1 }
    .filter { myComputedProperty } // unresolved because `add` changes the type
```

Workaround: use `cast` in the `DataRow` context:

```kotlin
df.filter { cast<DataSchemaType>().myComputedProperty }
```
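As a plain-Kotlin analogy of the pattern above (no DataFrame dependency, so it only illustrates the idea): a hypothetical `Person` interface plays the role of `DataSchemaType` and keeps only trivial stored properties, while the computed values live in extension properties. An implementation then only has to supply the stored properties, which is the point of keeping computed properties out of the schema class.

```kotlin
// Hypothetical schema: only trivial stored properties, nothing computed.
interface Person {
    val firstName: String
    val lastName: String
    val age: Int
}

// Computed values live outside the schema as extension properties
// (in DataFrame code the receiver would be DataRow<DataSchemaType>).
val Person.fullName: String
    get() = "$firstName $lastName"

val Person.isAdult: Boolean
    get() = age >= 18

// An implementation only supplies the stored properties;
// no value is needed for fullName or isAdult.
data class PersonRecord(
    override val firstName: String,
    override val lastName: String,
    override val age: Int,
) : Person

fun main() {
    val people: List<Person> = listOf(
        PersonRecord("Alice", "Smith", 36),
        PersonRecord("Bob", "Jones", 12),
    )
    // The extensions resolve on the schema type, similar to filter { myComputedProperty }.
    println(people.filter { it.isAdult }.map { it.fullName }) // [Alice Smith]
}
```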

@AndreiKingsley

@koperagen should we add this trick to the documentation?

```xml
<primary>
    <title>First steps</title>
    <a href="SetupKotlinNotebook.md"/>
    <a href="SetupGradle.md"/>
```

Hmm, maybe we should keep the Kotlin Notebook setup page for a while, but just move it further down.


```kotlin
// Previous version of this line:
// val df = DataFrame.readCsv("example.csv").convertTo<Person>()
val df = DataFrame.readCsv("example.csv").cast<Person>()
```

This is an interesting example... CSV is inherently flat, yet we have a nested type here XD. This can only occur if there's JSON inside the CSV, which is not that common.

Maybe specify that it's not "just a CSV file", but one that contains a JSON column.

with a separate class representing the schema of each column group or nested `DataFrame`.

For example, consider a simple hierarchical dataframe from

Yes, here we should also mention that this is a CSV with a JSON column, which is why it has a hierarchical structure.
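A plain-Kotlin sketch of the "separate class per column group or nested `DataFrame`" shape, e.g. for a CSV whose JSON column produces nested data. All names (`Customer`, `Address`, `Order`) are hypothetical, and this is only an analogy: it uses ordinary data classes rather than `@DataSchema`, with a nested object standing in for a column group and a `List<Order>` standing in for a nested dataframe, matching the `List<...>` type in the generated excerpt earlier in this review.

```kotlin
// Hypothetical column group: a single nested object per row.
data class Address(
    val city: String,
    val street: String,
)

// Hypothetical row type of a nested table.
data class Order(
    val id: Int,
    val amount: Double,
)

// Top-level schema refers to the nested classes instead of flattening them.
data class Customer(
    val name: String,
    val address: Address,    // column group
    val orders: List<Order>, // nested dataframe, one element per nested row
)

fun main() {
    val customer = Customer(
        name = "Alice",
        address = Address(city = "Berlin", street = "Main St"),
        orders = listOf(Order(id = 1, amount = 10.0), Order(id = 2, amount = 5.5)),
    )
    // Nested values are reached through the typed structure.
    println(customer.address.city)               // Berlin
    println(customer.orders.sumOf { it.amount }) // 15.5
}
```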

## `@DataSchema` annotation

`@DataSchema` is a Kotlin annotation that marks a data class or interface as a data schema.
Compiler plugin generates [extension properties](extensionPropertiesApi.md) for the `DataFrame`

"The compiler plugin" (and a link to it).

(or `DataRow`, `ColumnGroup`, etc.)

links?

> we highly recommend using it only on interfaces and data classes specially made
> for representing the data schema of a `DataFrame`.
>
> Use only trivial properties, avoiding computed, `lateinit` or delegated properties.

Nit: ", or" here, i.e. a serial comma: "computed, `lateinit`, or delegated properties".

@koperagen

> @koperagen should we add this trick to the documentation?

Yes, please, somewhere in the data schema docs, as an alternative to having computed properties right in the classes.

Development

Successfully merging this pull request may close these issues:

- `convertTo()` requires values for computed properties

3 participants