Description
While importing large text files using read_csv, we occasionally get an EOF (End of File) character within a string, which causes an exception: "Error tokenizing data. C error: EOF inside string starting at line 844863". This occurs even with "error_bad_lines = False".
Further, the line stated in the error message is not the line containing the EOF character. In this particular case the actual row was approx. 230 rows before the one stated, which hinders exception handling. (I now see this difference was caused by other "bad_lines" that were being skipped - the quoted error line is correct, but fewer rows were imported.)
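One possible way to locate the offending row before calling read_csv is sketched below. It rests on an assumption not confirmed above: that the "EOF inside string" error comes from a double quote that is opened but never closed. The helper name find_unbalanced_quote_lines and the sample data are hypothetical, and the check would give false positives on files that legitimately contain quoted fields spanning several lines.

```python
import io


def find_unbalanced_quote_lines(lines, quotechar='"'):
    """Return the 1-based numbers of lines holding an odd count of quotechar.

    Hypothetical diagnostic helper: assumes every quoted field opens and
    closes on the same line, so an odd count marks a suspect row.
    """
    suspects = []
    for number, line in enumerate(lines, start=1):
        # An escaped quote ("") adds two characters, so it keeps the count even.
        if line.count(quotechar) % 2 == 1:
            suspects.append(number)
    return suspects


# Made-up sample: the third line opens a quote that is never closed.
sample = io.StringIO(u'id,comment\n1,"fine"\n2,"never closed\n3,ok\n')
suspect_lines = find_unbalanced_quote_lines(sample)
print(suspect_lines)
```

For a real file, the same function could be given an open file object in place of the in-memory sample; the reported numbers refer to physical lines in the file, not to rows remaining after other bad lines have been skipped.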
I feel it would be appropriate if "error_bad_lines = False" handled this exception and allowed such rows to be skipped.
I note that when importing this text file into Excel, the "premature" EOF is simply ignored.
We are running on Windows 8, with Python 2.7 and pandas 0.12.