-
-
Notifications
You must be signed in to change notification settings - Fork 18.6k
ENH: Excel output in non-ascii encodings #5025
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -1396,8 +1398,11 @@ def to_excel(self, excel_writer, sheet_name='Sheet1', na_rep='', | |||
""" | |||
from pandas.io.excel import ExcelWriter | |||
need_save = False | |||
if encoding == None: | |||
encoding = 'ascii' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks off
@jtornero please replace all tabs with 4 spaces. Otherwise we won't be able to incorporate this. |
Well, I've replaced the tabs by spaces and commited again. I got some complains from Travis IC, but I can't interpretate them very well, specially those about python2. . What should I do now? A new pull request? Thank you very much Jorge Tornero |
you don't need a new pull request, let me take a look at Travis. |
It's an issue with a new excel engine we added. Going to need the expert on this. @jmcnamara - how do you handle non-ascii encoding in xlsxwriter? |
Xlsxwriter handles utf8 at the very least. I'll have have a look at the issue. |
The xlsxwriter issue isn't with the encoding, it handles that correctly and passes the test. The issue is that the constructor doesn't handle arbitrary keywords like the new I should have spotted that and added some mapping of user keywords to keywords supported by xlsxwriter (an earlier version of my patch had that I think). Anyway, for now @jtornero can patch
|
This patch is only required for So it seems like overkill to add a new keyword just for this and opening the flood gates for any other future patches that want to add configuration to specific engines. So, instead perhaps the new parameter should be something like Or perhaps that is the way it already works and I've missed it. I've only had a chance to have a quick look. |
@jmcnamara okay, that's fine. Yes, the clear answer is to not pass encoding to other engines. Easiest way to handle this is to add to the constructor: |
@jmcnamara @jtornero To be clear, to fix this all you need to do is change the
I'm not convinced we need to validate kwargs or anything like that (yet). |
One additional note - before this PR, you could pass encoding to xlwt without issue right? So this is really just a testing thing. The other easy way to do this would be to have the xlwt engine set a default for encoding, and then set read_excel to do: if encoding:
kwargs['encoding'] = encoding And that way it only gets passed to the constructor if necessary. I might prefer this second option because it means that constructors don't need to deal with extra keyword arguments. |
@jtornero: The Travis test that failed is being run under Python3.2. That version of Python does not accept the To fix, add
to the top of |
Guess it's all done |
okay I'll take look in the next few days. No major changes to Excel so should be fine. |
@@ -544,7 +547,11 @@ def __init__(self, path, **engine_kwargs): | |||
|
|||
super(_XlwtWriter, self).__init__(path, **engine_kwargs) | |||
|
|||
self.book = xlwt.Workbook() | |||
if 'encoding' in engine_kwargs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should have __init__
with encoding=None
and then add
if encoding is None:
encoding = 'ascii'
self.book = xlwt.Workbook(encoding=encoding)
@jtratner what's the state of this? |
Well I did the PR and just waiting. |
Please rebase this on current master and then we can make sure everything passes |
pls rebase on master and can see where this is |
@jtornero needs a rebase |
I rebased you to this commit: pls incorporate this and go from here |
@jreback can you take a look at https://gist.github.com/jtornero/1324e77425715cdfe987 Best regards, Jorge Tornero |
@jtornero just post here what you need |
@jreback Well... what it is supposed that I have to expect and what it is supposed from me to do? I mean: when I rebase, I should run the tests again? or just make a new PR? I will clone my forked repo again, because the "original" one (where I did the work that origined my PR) is not available anymore for me. Does it matter? |
in this case just confirm that the version I out up looks ok if not |
So steps are
If not:
|
you don't need to fork again, just create a new branch like this:
then you can work with the new branch I am suggesting this as sort of an exercise as the fix is fine, but this is nice to know how to do |
Well, I've cloned and installed and updated some libraries and run the tests. Some errors arise related to google and timeseries (see the nose output at https://gist.github.com/jtornero/9496276) It is ok then? |
hmm those are a bit odd |
@ jreback Well I my first attempts complainted with lots of messages sort of "compiled against version 9 but your numpy version y 7" so I installed 1.8.0 with pip, so currently I'm using 1.8.0. I'll try to compile it from source and see what happens |
@jreback See the output from tests https://gist.github.com/jtornero/9503057 |
not sure what you are doing |
@jreback This is what I get https://gist.github.com/jtornero/9503717 |
So first thing is to install AT LEAST xlrd and xlwt (sorry, I thought this system already had it) Will update all when installed |
you are running with an old version of master git pull upstream/master |
@jreback This is... what is this?!?!? Well Cloned my repo git checkout -b new_excel_branch upstream/master fatal: Cannot update paths and switch to branch 'new_excel_branch' at the same time. OMG!!! I think I need a shrink!! |
merged via 268ee80 thanks for the PR. maybe next time will be a bit easier. |
@jreback Thank you very much for all your patience. For, say so, very amateurs, it is reasonablely easy to try to improve things, but the big white git shark really annoys us (or, at least, me). You've suffered the bite, also... my apologies. Best regards, Jorge Tornero |
git can be a bit of a beast..... try looking here for a sample workflow: https://github.com/pydata/pandas/wiki/Git-Workflows give a try again! always appreciate contributions! |
ENH/TST: Support for non-ascii encodings in DataFrame.to_excel
Closes #3710.
Notice: Despite (for my modest knowledge of python) it should be easier to just put encoding='ascii' in the declaration of to_excel in line 1352 of frame.py, I've declared it with encoding=None as jreback suggested, and then making it default to ascii later with the if clause.