BUG: Load ORC-format data failed when pandas version>1.2.0.dev0 #40918

Closed
@amznero

Description

Code Sample, a copy-pastable example

...
import pandas as pd
orc_data = pd.read_orc(orc_file_path)

Problem description

pandas uses the PyArrow package to load ORC and Parquet data.

For the ORC format, pandas reads the data through pyarrow.orc.ORCFile (orc.py). However, PyArrow does not import the orc submodule in its __init__.py file, so pandas raises AttributeError: module 'pyarrow' has no attribute 'orc'.
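The underlying Python behaviour can be reproduced without PyArrow. The sketch below builds a throwaway package whose `__init__.py` does not import its `orc` submodule; the names `demo_pkg` and `ORCFile` are hypothetical stand-ins that only mirror the layout described above, not PyArrow's actual source.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a temporary package `demo_pkg` with an `orc` submodule.
# The package's __init__.py is empty, so it never imports `orc` itself.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "orc.py").write_text("class ORCFile:\n    pass\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_pkg

# Right after importing the package, the submodule is not an attribute of it,
# so `demo_pkg.orc.ORCFile` would raise AttributeError at this point.
before = hasattr(demo_pkg, "orc")

# Importing the submodule explicitly binds it onto the parent package.
importlib.import_module("demo_pkg.orc")
after = hasattr(demo_pkg, "orc")

print(before, after)
```

A submodule only becomes an attribute of its parent package once something has imported it, which is why accessing pyarrow.orc after a bare `import pyarrow` can fail.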

This bug occurs in pandas versions later than v1.2.0.dev0 (after commit 6d1541e). Before that commit, pandas/io/orc.py ran import pyarrow.orc before using pyarrow to load ORC data (see pandas/io/orc.py in v1.1.5).
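Given that description, one possible workaround on affected versions is to import the submodule yourself before calling pd.read_orc, which restores what pandas did up to v1.1.5. This is a minimal sketch, not a confirmed fix from the maintainers; the helper name `read_orc_with_workaround` is hypothetical, and it assumes both pandas and pyarrow are installed.

```python
def read_orc_with_workaround(orc_file_path):
    """Read an ORC file after explicitly importing pyarrow.orc.

    Hypothetical helper: the explicit import makes `pyarrow.orc` available
    as an attribute of the `pyarrow` package before pandas looks it up.
    """
    # Imports are kept inside the function so that defining the helper
    # does not itself require pandas or pyarrow to be present.
    import pyarrow.orc  # noqa: F401  (imported only for its side effect)
    import pandas as pd

    return pd.read_orc(orc_file_path)
```

Calling `read_orc_with_workaround(orc_file_path)` should then behave like the original code sample, provided the installed PyArrow build includes ORC support.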


Testing environment:

  • Ubuntu 18.04
  • Python 3.7
  • pandas v1.2.1
  • pyarrow v3.0.0 (installed via pip; not yet tested with a Conda install)

Labels: Bug, Dependencies, IO Data
