Description
Example code:
import pandas as pd
import json
testjson = u'''
[{"Ünicøde":0,"sub":{"A":1, "B":2}},
{"Ünicøde":1,"sub":{"A":3, "B":4}}]
'''.encode('utf8')
pd.io.json.json_normalize(json.loads(testjson))
Output:
Traceback (most recent call last):
  File "...lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2885, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-12-f866f9c7ec7c>", line 5, in <module>
    pd.io.json.json_normalize(json.loads(testjson))
  File ".../lib/python2.7/site-packages/pandas/io/json.py", line 715, in json_normalize
    data = nested_to_record(data)
  File ".../lib/python2.7/site-packages/pandas/io/json.py", line 617, in nested_to_record
    newkey = str(k)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xdc' in position 0: ordinal not in range(128)
Expected output
   sub.A  sub.B  Ünicøde
0      1      2        0
1      3      4        1
The cause is probably these two lines:
https://github.com/pydata/pandas/blob/master/pandas/io/json.py#L618
and https://github.com/pydata/pandas/blob/master/pandas/io/json.py#L620
They appear to have been introduced to handle numeric keys, but on Python 2 str(k) implicitly
encodes with the ASCII codec, so it fails when k is a unicode object containing non-ASCII characters.
It seems to be the same bug in principle as #13101
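A possible direction for a fix, as a minimal sketch only: convert a key to text only when it is not already text, so that unicode keys never pass through `str()`. The names `key_to_text` and `flatten` are hypothetical and written for illustration; `flatten` is a simplified stand-in for `nested_to_record`, not the pandas implementation, and treating byte-string keys as UTF-8 is an assumption.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

import sys

# Text type: `unicode` on Python 2, `str` on Python 3.
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821


def key_to_text(k):
    """Hypothetical helper: return k as text without an implicit ASCII encode."""
    if isinstance(k, text_type):
        return k                  # already text, leave it untouched
    if isinstance(k, bytes):
        return k.decode("utf-8")  # assumption: byte-string keys are UTF-8
    return text_type(k)           # numeric and other non-string keys


def flatten(record, prefix="", sep="."):
    """Simplified stand-in for nested_to_record: flatten nested dicts."""
    flat = {}
    for k, v in record.items():
        newkey = prefix + key_to_text(k)
        if isinstance(v, dict):
            flat.update(flatten(v, newkey + sep, sep))
        else:
            flat[newkey] = v
    return flat


row = flatten({"Ünicøde": 0, "sub": {"A": 1, "B": 2}, 7: "numeric key"})
```

With this approach numeric keys are still converted to strings, which seems to be what the original `str(k)` call was meant to do, while text keys are passed through unchanged. The actual fix in pandas may look different.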