ENH: Excel output in non-ascii encodings #5025

jtornero · 2013-09-28T21:13:11Z

ENH/TST: Support for non-ascii encodings in DataFrame.to_excel

Closes #3710.

Note: Although (to my modest knowledge of Python) it would be simpler to put encoding='ascii' directly in the signature of to_excel at line 1352 of frame.py, I've declared it as encoding=None, as jreback suggested, and then default it to 'ascii' later with an if clause.
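
For readers following along, the encoding=None pattern described above can be sketched as below. This is only an illustration: resolve_encoding is a hypothetical helper, not part of pandas, and the real to_excel takes many more arguments.

```python
def resolve_encoding(encoding=None):
    """Return the encoding to use, falling back to 'ascii' when none is given.

    Hypothetical helper for illustration; not a pandas function.
    """
    # Keeping None as the signature default lets callers (and wrappers)
    # distinguish "not specified" from an explicit choice.
    # `is None` is the idiomatic identity check recommended by PEP 8.
    if encoding is None:
        encoding = 'ascii'
    return encoding
```

Calling resolve_encoding() gives the fallback, while resolve_encoding('utf-8') passes the explicit choice through unchanged.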

ENH/TST: Support for non-ascii encodings in DataFrame.to_excel

jtratner · 2013-09-29T19:16:21Z

pandas/core/frame.py

@@ -1396,8 +1398,11 @@ def to_excel(self, excel_writer, sheet_name='Sheet1', na_rep='',
        """
        from pandas.io.excel import ExcelWriter
        need_save = False
+        if encoding == None:
+	  encoding = 'ascii'


this looks off

jtratner · 2013-09-29T19:39:58Z

@jtornero please replace all tabs with 4 spaces. Otherwise we won't be able to incorporate this.

ENH: Excel output in non-ascii encodings Replaced tabs to follow PEP8

jtornero · 2013-09-29T22:04:53Z

Well, I've replaced the tabs with spaces and committed again. I got some complaints from Travis CI, but I can't interpret them very well, especially those about Python 2. What should I do now? A new pull request?

Thank you very much

Jorge Tornero

jtratner · 2013-09-29T22:16:10Z

you don't need a new pull request, let me take a look at Travis.

jtratner · 2013-09-29T22:17:31Z

It's an issue with a new excel engine we added. Going to need the expert on this. @jmcnamara - how do you handle non-ascii encoding in xlsxwriter?

jmcnamara · 2013-09-30T09:02:32Z

Xlsxwriter handles utf8 at the very least. I'll have a look at the issue.

jmcnamara · 2013-09-30T10:23:50Z

The xlsxwriter issue isn't with the encoding, it handles that correctly and passes the test. The issue is that the constructor doesn't handle arbitrary keywords like the new encoding keyword.

I should have spotted that and added some mapping of user keywords to keywords supported by xlsxwriter (an earlier version of my patch had that I think).
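
The failure mode described here (a constructor with a fixed signature receiving an unexpected keyword) and the kind of keyword mapping mentioned above can be sketched roughly as follows. Everything in this block is hypothetical: strict_workbook merely stands in for an engine constructor, and SUPPORTED is an assumed whitelist, not xlsxwriter's actual option list.

```python
def strict_workbook(path, options=None):
    """Hypothetical stand-in for an engine constructor with a fixed signature."""
    return {'path': path, 'options': options}


# Assumed whitelist of keywords the stand-in engine understands.
SUPPORTED = ('options',)


def filter_kwargs(engine_kwargs):
    """Keep only the keywords the engine constructor accepts."""
    return {k: v for k, v in engine_kwargs.items() if k in SUPPORTED}


engine_kwargs = {'encoding': 'utf-8'}

# Forwarding every user keyword blindly raises TypeError for unknown ones.
try:
    strict_workbook('out.xlsx', **engine_kwargs)
    error = None
except TypeError as exc:
    error = str(exc)

# Filtering first drops 'encoding', so the constructor call succeeds.
book = strict_workbook('out.xlsx', **filter_kwargs(engine_kwargs))
```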

Anyway, for now @jtornero can patch excel.py as follows to remove keyword handling from XlsxWriter while I work on a cleaner solution.

$ git diff
diff --git a/pandas/io/excel.py b/pandas/io/excel.py
index 2c82261..8b49a40 100644
--- a/pandas/io/excel.py
+++ b/pandas/io/excel.py
@@ -668,7 +668,7 @@ class _XlsxWriter(ExcelWriter):

         super(_XlsxWriter, self).__init__(path, **engine_kwargs)

-        self.book = xlsxwriter.Workbook(path, **engine_kwargs)
+        self.book = xlsxwriter.Workbook(path)

     def save(self):
         """

jmcnamara · 2013-09-30T11:01:18Z

@jreback

This patch is only required for xlwt since openpyxl, xlsxwriter and pyexcelerate (I'm getting around to it) don't need it for utf8 and don't handle other encodings anyway (afaik).

So it seems like overkill to add a new keyword just for this and opening the flood gates for any other future patches that want to add configuration to specific engines.

So, instead perhaps the new parameter should be something like engine_kwargs or even just **kwargs as a generic interface to passing options into the various engines. The engines can then have their own logic for handling or ignoring options.

Or perhaps that is the way it already works and I've missed it. I've only had a chance to have a quick look.

jtratner · 2013-09-30T11:31:21Z

@jmcnamara okay, that's fine. Yes, the clear answer is to not pass encoding to other engines.

Easiest way to handle this is to add to the constructor: encoding=None and not pass it forwards for engines that don't need it. Simple. Plus add a warning that non-utf8 encodings aren't supported.

jtratner · 2013-09-30T11:41:23Z

@jmcnamara @jtornero To be clear, to fix this all you need to do is change the __init__ method of each ExcelWriter to the following (I agree we should just change it to kwargs):

def __init__(self, io, encoding=None, **kwargs):

I'm not convinced we need to validate kwargs or anything like that (yet).
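
A rough sketch of that constructor change, assuming made-up class names (BaseWriter, EncodingAgnosticWriter, EncodingAwareWriter) rather than the real pandas writers, and using path where the signature above says io:

```python
class BaseWriter(object):
    """Hypothetical base class standing in for ExcelWriter."""

    def __init__(self, path, **kwargs):
        self.path = path


class EncodingAgnosticWriter(BaseWriter):
    """Accepts `encoding` so callers can always pass it, but never forwards it."""

    def __init__(self, path, encoding=None, **kwargs):
        super(EncodingAgnosticWriter, self).__init__(path, **kwargs)
        # `encoding` was captured by the named parameter, so it is absent here.
        self.engine_options = kwargs


class EncodingAwareWriter(BaseWriter):
    """Uses `encoding`, defaulting to 'ascii' when it is not given."""

    def __init__(self, path, encoding=None, **kwargs):
        super(EncodingAwareWriter, self).__init__(path, **kwargs)
        self.encoding = 'ascii' if encoding is None else encoding
```

Because encoding is a named parameter, it never ends up in **kwargs, so an engine that cannot use it simply does nothing with it.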

jtratner · 2013-09-30T12:03:13Z

One additional note - before this PR, you could pass encoding to xlwt without issue right? So this is really just a testing thing. The other easy way to do this would be to have the xlwt engine set a default for encoding, and then set read_excel to do:

if encoding:
    kwargs['encoding'] = encoding

And that way it only gets passed to the constructor if necessary. I might prefer this second option because it means that constructors don't need to deal with extra keyword arguments.
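
As a sketch of this second option, the caller-side logic might look like the following. build_writer_kwargs is a hypothetical helper introduced only for illustration; it is not a pandas function.

```python
def build_writer_kwargs(encoding=None, **kwargs):
    """Collect keyword arguments for a writer constructor.

    Hypothetical helper: `encoding` is included only when the caller set it,
    so engines that do not know the keyword never see it.
    """
    if encoding:
        kwargs['encoding'] = encoding
    return kwargs
```

With this shape, build_writer_kwargs() yields an empty dict, and only an explicit encoding='utf-8' adds the key.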

Fixed some spacing issues

unutbu · 2013-10-01T10:55:00Z

@jtornero: The Travis test that failed is being run under Python 3.2. That version of Python does not accept the u'...' syntax for strs (or what's called unicode in Python 2). The u'...' syntax was added back in Python 3.3, which is why the last Travis test passed.

To fix, add

from pandas.compat import u

to the top of test_excel.py, and replace u'...' with u('...').
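
For context, a compatibility helper of this kind can be sketched as below. This is an assumed, minimal version written for illustration and is not claimed to match the exact code in pandas.compat.

```python
import sys

if sys.version_info[0] >= 3:
    def u(s):
        # On Python 3 every str literal is already unicode text.
        return s
else:
    def u(s):
        # On Python 2, decode the byte string, honouring \uXXXX escapes.
        return unicode(s, "unicode_escape")  # noqa: F821 (Python 2 only)
```

Writing u('...') instead of u'...' keeps the source valid on interpreters that reject the literal prefix, such as Python 3.2.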

jtornero · 2013-10-02T05:51:33Z

Guess it's all done

jtratner · 2013-10-02T11:23:04Z

okay I'll take a look in the next few days. No major changes to Excel so should be fine.

jtratner · 2013-10-02T11:24:40Z

pandas/io/excel.py

@@ -544,7 +547,11 @@ def __init__(self, path, **engine_kwargs):

        super(_XlwtWriter, self).__init__(path, **engine_kwargs)

-        self.book = xlwt.Workbook()
+        if 'encoding' in engine_kwargs:


you should have __init__ with encoding=None and then add

if encoding is None:
    encoding = 'ascii'
self.book = xlwt.Workbook(encoding=encoding)

jreback · 2014-01-03T22:24:05Z

@jtratner what's the state of this?

jtornero · 2014-01-03T22:31:46Z

Well, I did the PR and am just waiting.

jtratner · 2014-01-04T03:43:19Z

Please rebase this on current master and then we can make sure everything passes

jreback · 2014-02-16T12:09:01Z

pls rebase on master and can see where this is

jreback · 2014-03-09T14:57:01Z

@jtornero needs a rebase

jtornero · 2014-03-10T09:08:37Z

@jreback @jtratner I'm so sorry but I am absolutely lost about this. I guess I have to clone my forked repo again, then rebase (don't know the steps clearly)? And then? I'm so sorry this is disappointing but my git skills are unfortunately too low!!

jreback · 2014-03-10T12:40:10Z

I rebased you to this commit: pls incorporate this and go from here

jreback@1478660

jtornero · 2014-03-11T07:35:30Z

@jreback can you take a look at https://gist.github.com/jtornero/1324e77425715cdfe987

Best regards,

Jorge Tornero

jreback · 2014-03-11T11:43:44Z

@jtornero just post here what you need

jreback · 2014-03-11T11:44:38Z

https://github.com/pydata/pandas/wiki/Using-Git

jtornero · 2014-03-11T11:52:08Z

@jreback Well... what should I expect, and what am I supposed to do? I mean: when I rebase, should I run the tests again, or just make a new PR? I will clone my forked repo again, because the "original" one (where I did the work that led to my PR) is no longer available to me. Does it matter?

jreback · 2014-03-11T11:55:59Z

in this case just confirm that the version I put up looks ok
if so I can just directly incorporate it

if not
then create a new branch
pull this commit in
then make changes and submit a new pr

jtornero · 2014-03-11T12:07:11Z

So steps are

clone my forked repo
Because it is already rebased (you did it yesterday), run the tests
If tests are ok, post here

If not:

create branch
pull this commit in???
make whatever changes are necessary to pass the tests and in that case, make PR

jreback · 2014-03-11T12:52:21Z

you don't need to fork again, just create a new branch

like this:

git checkout -b new_excel_branch upstream/master
git pull https://github.com/jreback/pandas.git excel_encoding

then you can work with the new branch

I am suggesting this as sort of an exercise as the fix is fine, but this is nice to know how to do

jtornero · 2014-03-11T22:20:22Z

Well, I've cloned and installed and updated some libraries and run the tests. Some errors arise related to google and timeseries (see the nose output at https://gist.github.com/jtornero/9496276)

Is it OK then?

jreback · 2014-03-12T00:16:45Z

hmm those are a bit odd
what version of numpy do u have?

jtornero · 2014-03-12T07:19:16Z

@jreback Well, my first attempts complained with lots of messages along the lines of "compiled against version 9 but your numpy version is 7", so I installed 1.8.0 with pip, and I'm currently using 1.8.0. I'll try to compile it from source and see what happens.

jtornero · 2014-03-12T08:38:29Z

@jreback See the output from tests https://gist.github.com/jtornero/9503057

jreback · 2014-03-12T09:07:57Z

not sure what you are doing
tests pass on 0.13.1 for a properly installed version?
can u show print_versions?

jtornero · 2014-03-12T09:37:06Z

@jreback This is what I get

https://gist.github.com/jtornero/9503717

jtornero · 2014-03-12T09:38:58Z

So first thing is to install AT LEAST xlrd and xlwt (sorry, I thought this system already had it)

Will update all when installed

jreback · 2014-03-12T09:40:44Z

you are running with an old version of master
you need to

git pull upstream master

jtornero · 2014-03-12T10:59:33Z

@jreback This is... what is this?!?!?

Well

Cloned my repo
Added upstream
now

git checkout -b new_excel_branch upstream/master

fatal: Cannot update paths and switch to branch 'new_excel_branch' at the same time.
Did you intend to checkout 'upstream/master' which can not be resolved as commit?

OMG!!! I think I need a shrink!!

jreback · 2014-03-12T13:00:55Z

merged via 268ee80

thanks for the PR.

maybe next time will be a bit easier.

jtornero · 2014-03-12T13:09:00Z

@jreback Thank you very much for all your patience. For, so to speak, real amateurs, it is reasonably easy to try to improve things, but the big white git shark really annoys us (or, at least, me). You've suffered the bite, also... my apologies.

Best regards,

Jorge Tornero

jreback · 2014-03-12T13:15:23Z

git can be a bit of a beast.....

try looking here for a sample workflow: https://github.com/pydata/pandas/wiki/Git-Workflows

give it a try again!

always appreciate contributions!

#3710

a714e82

ENH/TST: Support for non-ascii encodings in DataFrame.to_excel

jtratner reviewed Sep 29, 2013

#5025

f629686

ENH: Excel output in non-ascii encodings Replaced tabs to follow PEP8

jreback mentioned this pull request Sep 29, 2013

Excel output in non-ascii encodings #3710

Closed

jtornero added 3 commits October 1, 2013 00:35

#5025 ENH: Excel output in non-ascii encodings

5a19169

#5025 ENH: Excel output in non-ascii encodings

c1203c6

#5025 ENH: Excel output in non-ascii encodings

b4d3ea7

Fixed some spacing issues

#5025 TST: Fixed unicode compatibility

5fc3ed5

jtratner reviewed Oct 2, 2013

#5025 ENH: Several fixes proposed by @jtratner

9106cca

jreback added Enhancement labels Feb 16, 2014

jreback closed this Mar 12, 2014
