get_dummies chokes on unicode values #6885

Closed
@maxgrenderjones

(Context: pandas version 0.13.1 running on Python 2.7.6 |Anaconda 1.9.1 (64-bit)| (default, Nov 11 2013, 10:49:15) [MSC v.1500 64 bit (AMD64)])

In my code I have a category containing lots of non-English names and want to create dummies out of it.

So I call:

dummies = pandas.get_dummies(data[cat], prefix=prefix)

and get:

c:\Anaconda\lib\site-packages\pandas\core\reshape.pyc in get_dummies(data, prefix, prefix_sep, dummy_na)
    971     if prefix is not None:
    972         dummy_cols = ['%s%s%s' % (prefix, prefix_sep, str(v))
--> 973                       for v in levels]
    974     else:
    975         dummy_cols = levels

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 19: ordinal not in range(128)

The issue appears to be the call to str(v): if v is a unicode string containing non-ASCII characters, str() on Python 2 tries to encode it with the default ASCII codec and raises UnicodeEncodeError.
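
For illustration, here is a minimal sketch of the failure mode and one possible way to build the labels without it. `make_dummy_cols` is a hypothetical helper, not part of pandas and not necessarily how the bug was fixed; the sample levels and the `drink` prefix are made up. The explicit `.encode("ascii")` stands in for what `str(v)` does implicitly on Python 2, so the snippet also runs on Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals


def make_dummy_cols(levels, prefix, prefix_sep="_"):
    """Hypothetical helper: build column labels without calling str() on each level."""
    # Formatting into a unicode template keeps non-ASCII text intact on Python 2.
    # On Python 3 every str is already unicode, so nothing special happens here.
    return ["{0}{1}{2}".format(prefix, prefix_sep, v) for v in levels]


# Made-up sample data: one level contains u'\xe9', the character from the traceback.
levels = ["caf\xe9", "tea"]
cols = make_dummy_cols(levels, "drink")

# Stand-in for Python 2's implicit behaviour in str(v): encode with the ASCII codec.
try:
    levels[0].encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

Until something along these lines lands, it may be possible to work around the error by calling get_dummies without a prefix and renaming the resulting columns afterwards, though I have not verified that on 0.13.1.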

Labels: Bug, Strings, Unicode
