Read SAS (sas7bdat) #12654

Closed
@ywhcuhk

Very excited to have this new feature in Pandas. I have a few comments to share:

  1. pd.read_sas() doesn't read SAS date variables correctly (this is noted in the docs). Dates are read as numpy.float64. Note that SAS stores dates as the number of days since 1960-01-01. It would be helpful to have an argument for parsing date variables correctly.
  2. Moreover, SAS has special missing values such as .B or .R. How are these cases treated?
  3. It is not nearly as fast as read_csv(). Reading a 700 MB SAS dataset takes:
CPU times: user 1min 47s, sys: 955 ms, total: 1min 48s
Wall time: 1min 48s

Reading the same data from a CSV file (I converted the SAS file to CSV using SAS) takes:

CPU times: user 3.93 s, sys: 343 ms, total: 4.28 s
Wall time: 4.29 s
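
As a possible workaround for point 1, the float values can be converted after loading. This is only a minimal sketch: the column name `visit_date`, the helper `sas_days_to_datetime`, and the sample values are made up for illustration, and it assumes the column holds SAS date values (days since 1960-01-01) rather than SAS datetime values, which count seconds from the same epoch and would need `unit="s"` instead.

```python
import pandas as pd

# SAS counts date values in days from this epoch.
SAS_EPOCH = "1960-01-01"


def sas_days_to_datetime(days: pd.Series) -> pd.Series:
    """Convert SAS date values (days since 1960-01-01) to datetime64.

    Missing values (NaN) come out as NaT.
    """
    return pd.to_datetime(days, unit="D", origin=SAS_EPOCH)


# Hypothetical stand-in for a frame returned by pd.read_sas(),
# where the date column arrived as float64.
df = pd.DataFrame({"visit_date": [0.0, 366.0, float("nan")]})
df["visit_date"] = sas_days_to_datetime(df["visit_date"])
print(df)
```

The caller still has to know which columns are dates, which is why an argument on read_sas() itself would be more convenient.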

Labels: IO SAS, Performance