Skip to content

ExcelWriter's append mode unable to write to existing worksheet #28653

Closed
@garygsw

Description

@garygsw

Problem description

The openpxyl engine for ExcelWriter's append mode allows us to write data into a new Worksheet and append it to an existing Workbook. However, when trying to append data into an existing Worksheet, it creates a new blank Worksheet (without any existing data) with the Worksheet's name suffixed numerically incremented.

Sample Code

import pandas as pd
from openpyxl import load_workbook

excel_filename = 'excel_file.xlsx'

# Create initial excel file
initial_data = pd.DataFrame({'col1': ['A', 'B'], 'col2': [1, 2]})
with pd.ExcelWriter(excel_filename, engine='openpyxl', mode='w') as writer:
    initial_data.to_excel(writer, sheet_name="sheet", index=False)

# Append some new rows into an existing worksheet
new_data = pd.DataFrame({'col1': ['C', 'D'], 'col2': [3, 4]})
workbook = load_workbook(excel_filename)
with pd.ExcelWriter(excel_filename, engine='openpyxl', mode='a') as writer:
     new_data.to_excel(writer,
                       sheet_name="sheet",
                       startrow=workbook['sheet'].max_row,
                       header=False,
                       index=False)

Expected Output

We would expect the Excel file to contain a Worksheet named "sheet", with the following data:

col1 col2
A 1
B 2
C 3
D 4

Actual Output

However, the Excel file contains two Worksheets, namely "sheet" and "sheet1", with the following data:

sheet:

col1 col2
A 1
B 2

sheet1:

C 3
D 4

Explanation

When the ExcelWriter object is instantiated, it assumes a new Empty Workbook with no Worksheets (see ExcelWriter: line 701).

self.sheets = {}

So when we try to write data, it always creates a new blank Worksheet instead (see _OpenpyxlWriter: line 400-403).

if sheet_name in self.sheets:
wks = self.sheets[sheet_name]
else:
wks = self.book.create_sheet()

In addition, the openpxyl library performs a check before writing to "avoid duplicate names" (see openpxyl: line 18), which numerically increment the suffix of the sheet name.

Existing solution

An existing solution is found in Stack Overflow (see here).

It is a simple fix with just one line to add all existing worksheets after the ExcelWriter object is instantiated (see Sample Code above):

writer.sheets = dict((ws.title, ws) for ws in writer.book.worksheets)

It would be preferred if this is done in the backend, right after the workbook is loaded here:

if self.mode == "a": # Load from existing workbook
from openpyxl import load_workbook
book = load_workbook(self.path)
self.book = book

It will improve usability of the DataFrame.to_excel function.

However, some additional flags should be added to warn the user that existing data in existing worksheets may be overwritten (if the startrow or startcol parameter is stated wrongly by accident). Perhaps a new default parameter overwrite=false should be created for DataFrame.to_excel to only write data to empty regions (by performing bound checks for startrow and startcol).

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Darwin
OS-release: 18.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.24.2
pytest: 2.8.5
pip: 10.0.1
setuptools: 39.2.0
Cython: 0.23.4
numpy: 1.16.4
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 5.5.0
sphinx: 1.3.5
patsy: 0.4.0
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.6
feather: None
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml.etree: 3.5.0
bs4: 4.4.1
html5lib: None
sqlalchemy: 1.0.11
pymysql: 0.8.0
psycopg2: 2.7.5 (dt dec pq3 ext lo64)
jinja2: 2.8
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions