read_csv() & EOF character in string cause parsing issue #5500

Closed
@stephenjshaw

Description

While importing large text files with read_csv, we occasionally encounter an EOF (end-of-file) character inside a string, which raises an exception: "Error tokenizing data. C error: EOF inside string starting at line 844863". This occurs even with "error_bad_lines = False".

Further, the line stated in the error message did not appear to be the line containing the EOF character. In this particular case the actual row was approximately 230 rows before the one stated, which hinders exception handling. (I now see that this difference was caused by other "bad lines" being skipped: the line number quoted in the error is correct, but fewer rows were imported.)

I feel it would be appropriate if "error_bad_lines = False" handled this exception and allowed such rows to be skipped.

I note that when importing this text file into Excel, the "premature" EOF is simply ignored.
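
For anyone hitting the same problem, here is a minimal workaround sketch that imitates Excel's behaviour by dropping the stray character before parsing. It assumes the "EOF character" is the Ctrl-Z byte (0x1A); the report above does not say which byte is involved, so adjust `EOF_BYTE` if yours differs. The helper names `strip_eof_bytes` and `find_eof_lines` and the sample data are hypothetical, made up for illustration.

```python
import io

# Assumption: the offending character is Ctrl-Z (0x1A), the legacy
# DOS/Windows end-of-file marker. The issue does not name the byte.
EOF_BYTE = b"\x1a"


def strip_eof_bytes(raw):
    """Return a BytesIO holding `raw` with every EOF_BYTE removed."""
    return io.BytesIO(raw.replace(EOF_BYTE, b""))


def find_eof_lines(raw):
    """Return 1-based physical line numbers that contain EOF_BYTE.

    Lines are counted by splitting on newline bytes, so the numbers refer
    to the raw file and are unaffected by rows a parser skips.
    """
    lines = raw.split(b"\n")
    return [i for i, line in enumerate(lines, start=1) if EOF_BYTE in line]


# Hypothetical sample: the third line has a stray 0x1A inside a quoted field.
sample = b'id,comment\n1,"fine"\n2,"bad \x1a value"\n3,"also fine"\n'

bad_lines = find_eof_lines(sample)
cleaned = strip_eof_bytes(sample)
```

For a real file, read the bytes with `open(path, "rb").read()` so the byte is not interpreted on the way in. Since read_csv accepts file-like objects, the `cleaned` buffer can then be passed to it in place of the file path. This loads the whole file into memory, which may not suit very large inputs.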

We are running Windows 8 with Python 2.7 and pandas 0.12.
