Closed
Labels
Enhancement; Error Reporting (incorrect or improved errors from pandas); IO CSV (read_csv, to_csv)
Description
When read_fwf is called with both iterator=True and a list-valued skiprows argument, it does not skip all of the rows in the skiprows list. Each argument works properly when used on its own.
Here is a simple bit of code to reproduce:
import pandas as pd

# Create a fixed-width file to test with.
df = pd.DataFrame({'a': range(10)})
with open('testfwf.txt', 'w') as f:
    f.write(df.to_string(index=False, header=False))

rows_to_skip = [0, 1, 2, 6, 9]
df_iter = pd.read_fwf('testfwf.txt', colspecs=[(0, 2)], names=['a'], iterator=True,
                      chunksize=2, skiprows=rows_to_skip)
print('The fixed width file in chunks with rows [0,1,2,6,9] skipped: ')
for df in df_iter:
    print(df)
print('Notice how row 6 of the fixed width file has not been skipped even though it should')
print('have been.')
It seems that rows are skipped only until the first row that is not in the skiprows list. For example, the leading rows 0, 1, and 2 are skipped. But once rows that should be kept are encountered, skipping stops for all following rows (so row 6 is read) until the end of the file, where row 9 IS skipped.
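One possible workaround, sketched below, is to leave skiprows out of the read_fwf call and drop the unwanted rows from each chunk after it has been read. This does not fix the parser. The helper name read_fwf_dropping_rows is hypothetical and not part of pandas. The sketch also assumes the file has no header row and no blank lines, so that each chunk's default integer index matches the row number in the file.

```python
import os
import tempfile

import pandas as pd


def read_fwf_dropping_rows(path, rows_to_skip, chunksize, **kwargs):
    """Hypothetical helper: read in chunks, then drop rows by file row number.

    Assumes no header row and no blank lines, so the default integer index
    of each chunk equals the row's position in the file.
    """
    skip = list(rows_to_skip)
    for chunk in pd.read_fwf(path, chunksize=chunksize, **kwargs):
        kept = chunk.loc[~chunk.index.isin(skip)]
        if not kept.empty:  # a chunk may consist entirely of skipped rows
            yield kept


# Build the same kind of ten-row test file as in the report above.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "testfwf.txt")
    with open(path, "w") as f:
        f.write("\n".join(str(i) for i in range(10)) + "\n")

    chunks = list(
        read_fwf_dropping_rows(
            path, [0, 1, 2, 6, 9], chunksize=2, colspecs=[(0, 2)], names=["a"]
        )
    )

remaining = [int(v) for chunk in chunks for v in chunk["a"]]
print(remaining)
```

Note that with this approach the chunks are filtered after reading, so they can be smaller than chunksize, unlike chunks produced by a working skiprows.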