HDF5 index corruption #8265

Closed
@rockg

Description

I generated a MultiIndexed DataFrame and wrote it to HDF5 using to_hdf with zlib level 5 compression. The file was written all at once and is available here: https://www.dropbox.com/s/122q55g5ubcf4fl/indexIssue.h5?dl=0

The two methods below should return identical results, but the first (select with a where clause) returns 2892 records, while the second (selecting all values and then subselecting on Path) returns 2972. The missing values are for path 6 between 3-5-2015 20:00 and 3-6-2015 9:00. I tried using reindex on the table, but that didn't fix anything. I don't really know what's going on.

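from pandas import HDFStore, Term

# path_to_file is the local path to the downloaded indexIssue.h5
store = HDFStore(path_to_file, mode='r')

# Method 1: filter on Path inside the HDF5 query
p1 = store.select('ts', where=Term('Path', '=', 6), auto_close=False)
print(len(p1))
# Method 2: read the whole table, then filter on the Path index level in memory
p2 = store.select('ts', auto_close=False)
p2s = p2[p2.index.get_level_values('Path') == 6]
print(len(p2s))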
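
To pin down exactly which rows differ between the two results, one option is to compare their indexes directly. The following is only a minimal sketch: it uses a small synthetic frame instead of the actual file, the helper name `missing_index_entries` and the `Time` level name are hypothetical, and the "query" result is simulated by dropping a row.

```python
import pandas as pd


def missing_index_entries(full, subset):
    """Return index entries present in `full` but absent from `subset`."""
    return full.index.difference(subset.index)


# Synthetic stand-in for the 'ts' table: a (Path, Time) MultiIndex.
# The level name 'Time' is an assumption; only 'Path' appears in the report.
times = pd.to_datetime(
    ["2015-03-05 20:00", "2015-03-05 21:00", "2015-03-05 22:00"]
)
index = pd.MultiIndex.from_product([[5, 6], times], names=["Path", "Time"])
full = pd.DataFrame({"value": range(len(index))}, index=index)

# Equivalent of p2s: filter in memory on the 'Path' level.
in_memory = full[full.index.get_level_values("Path") == 6]

# Stand-in for p1: simulate a query result that lost its last row.
from_query = in_memory.iloc[:-1]

# Index entries that the simulated query failed to return.
missing = missing_index_entries(in_memory, from_query)
print(list(missing))
```

With the real store, `missing_index_entries(p2s, p1)` would list the index entries that the where-clause select did not return.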
