json_normalize() can't deal with non-ascii characters in unicode keys #13213

Closed
@fmarczin

Description

Example code:

import pandas as pd
import json

testjson = u'''
[{"Ünicøde":0,"sub":{"A":1, "B":2}},
 {"Ünicøde":1,"sub":{"A":3, "B":4}}]
 '''.encode('utf8')
pd.io.json.json_normalize(json.loads(testjson))

Output:

Traceback (most recent call last):
  File "...lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2885, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-12-f866f9c7ec7c>", line 5, in <module>
    pd.io.json.json_normalize(json.loads(testjson))
  File ".../lib/python2.7/site-packages/pandas/io/json.py", line 715, in json_normalize
    data = nested_to_record(data)
  File ".../lib/python2.7/site-packages/pandas/io/json.py", line 617, in nested_to_record
    newkey = str(k)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xdc' in position 0: ordinal not in range(128)

Expected output

   sub.A  sub.B  Ünicøde
0      1      2        0
1      3      4        1

The cause is probably
https://github.com/pydata/pandas/blob/master/pandas/io/json.py#L618
and https://github.com/pydata/pandas/blob/master/pandas/io/json.py#L620

Those lines seem to have been introduced to handle numeric keys, but on Python 2 the str(k) call fails when k is a unicode object containing non-ASCII characters, because it implicitly encodes with the ascii codec.

It seems to be the same bug in principle as #13101
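
One possible direction for a fix, sketched under assumptions: coerce a key to text only when it is not already a string, so that unicode keys never go through str(). The helpers to_text_key and flatten below are hypothetical names used for illustration; they are not pandas functions, and flatten is only a simplified stand-in for what nested_to_record does with a single dict.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

import sys

# Text type: `unicode` on Python 2, `str` on Python 3.
# Only the selected branch is evaluated, so `unicode` is never looked up on Python 3.
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821


def to_text_key(k):
    """Hypothetical helper: return k as text without an implicit ASCII encode."""
    if isinstance(k, text_type):
        return k                    # already text, leave untouched
    if isinstance(k, bytes):
        return k.decode("utf-8")    # assumption: byte-string keys are UTF-8
    return text_type(k)             # numeric and other keys, e.g. 1 -> "1"


def flatten(record, prefix="", sep="."):
    """Hypothetical, simplified stand-in for nested_to_record (one dict only)."""
    flat = {}
    for k, v in record.items():
        key = to_text_key(k)
        newkey = prefix + sep + key if prefix else key
        if isinstance(v, dict):
            # Recurse into nested dicts, carrying the dotted prefix along.
            flat.update(flatten(v, newkey, sep))
        else:
            flat[newkey] = v
    return flat
```

With this sketch, flatten({"Ünicøde": 0, "sub": {"A": 1, "B": 2}}) produces the keys "Ünicøde", "sub.A" and "sub.B" without raising, while numeric keys are still converted to text.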
