
Assume columns can have missing values by default #153

@cynddl

Description


Currently, CSV.jl infers the type of each column by scanning only its first row_for_type_detect rows.

However, for large files in which null cells are rare, this forces the user to set row_for_type_detect to the total number of rows. The heuristic cannot detect null cells that appear only after the scanned rows, for instance in the last row of the file.
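To make the failure mode concrete, here is a minimal, language-agnostic sketch in Python of limited-lookahead type inference. It is not CSV.jl's implementation: the names infer_column_type and parse_column are hypothetical, and empty strings stand in for null cells.

```python
def _is_int(cell: str) -> bool:
    """Return True if the cell parses as an integer."""
    try:
        int(cell)
        return True
    except ValueError:
        return False


def infer_column_type(values: list[str], rows_for_type_detect: int) -> tuple[type, bool]:
    """Infer (element type, nullable) from the first rows_for_type_detect cells only.

    Nothing beyond the sample is inspected, so a null cell after it goes unnoticed.
    """
    sample = values[:rows_for_type_detect]
    non_null = [v for v in sample if v != ""]
    elem_type = int if all(_is_int(v) for v in non_null) else str
    nullable = any(v == "" for v in sample)
    return elem_type, nullable


def parse_column(values: list[str], elem_type: type, nullable: bool) -> list:
    """Parse every cell with the inferred type; fail on a null in a non-nullable column."""
    parsed = []
    for row, cell in enumerate(values, start=1):
        if cell == "":
            if not nullable:
                raise ValueError(f"row {row}: null cell in a column inferred as non-nullable")
            parsed.append(None)
        else:
            parsed.append(elem_type(cell))
    return parsed


# 999 integers followed by a single null cell in the last row.
column = [str(i) for i in range(1, 1000)] + [""]

print(infer_column_type(column, 100))          # (<class 'int'>, False): the null is never seen
print(infer_column_type(column, len(column)))  # (<class 'int'>, True): only a full scan finds it
```

With a sample of 100 rows, calling parse_column(column, *infer_column_type(column, 100)) raises ValueError at row 1000; the only way to avoid it under this heuristic is to scan the whole column.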


Solution

One workaround would be to assume by default that every column can contain null cells, unless the user disables this. This was, for instance, the behavior of DataFrames.readtable.

Another would be to add a flag to CSV.Options that makes every column accept null values.
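Both proposals amount to decoupling nullability from the sampled rows; they differ only in the default. A minimal sketch, again in Python, using the hypothetical names infer_column_type, parse_column and allow_missing (none of these are CSV.jl options): with allow_missing=True as the default, as in the first proposal, a late null parses as None instead of raising. The second proposal corresponds to the same flag with its default flipped to False.

```python
def _is_int(cell: str) -> bool:
    """Return True if the cell parses as an integer."""
    try:
        int(cell)
        return True
    except ValueError:
        return False


def infer_column_type(values: list[str], rows_for_type_detect: int,
                      allow_missing: bool = True) -> tuple[type, bool]:
    """Infer (element type, nullable) from a leading sample of cells.

    allow_missing is a hypothetical flag. When True, every column is treated as
    nullable regardless of the sample; when False, nullability is inferred from
    the sample alone, so nulls after it are missed.
    """
    sample = values[:rows_for_type_detect]
    non_null = [v for v in sample if v != ""]  # empty strings stand in for nulls
    elem_type = int if all(_is_int(v) for v in non_null) else str
    nullable = allow_missing or any(v == "" for v in sample)
    return elem_type, nullable


def parse_column(values: list[str], elem_type: type, nullable: bool) -> list:
    """Parse every cell with the inferred type; fail on a null in a non-nullable column."""
    parsed = []
    for row, cell in enumerate(values, start=1):
        if cell == "":
            if not nullable:
                raise ValueError(f"row {row}: null cell in a column inferred as non-nullable")
            parsed.append(None)
        else:
            parsed.append(elem_type(cell))
    return parsed


# 999 integers followed by a single null cell in the last row.
column = [str(i) for i in range(1, 1000)] + [""]

elem_type, nullable = infer_column_type(column, 100)  # default: assume nulls are possible
parsed = parse_column(column, elem_type, nullable)
print(parsed[-1])                                      # None
print(infer_column_type(column, 100, allow_missing=False))  # (<class 'int'>, False)
```

The trade-off of the nullable default is that columns which never contain nulls are still typed as nullable unless the user opts out, whereas the opt-in flag keeps today's behavior and leaves it to the user to anticipate late nulls.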
