pd.read_json(file, lines=True) does not work if json has quotes inside it #15132

Closed
@Bigbrd

Code Sample, a copy-pastable example if possible

{"errors":["This check-in does not exist, it may have been deleted."]},
{"list":{"id":487004,"description":"foo.”\r\n\r\n* “I am aware that I’m drafting an email responding to a complaint.”\r\n\r\n* “I am aware that I’m wondering who will win.”\r\n\r\nThe great thing about this exercise is that it is generalizable. You practice during the meditation, but then you use it for your own goals during your day to day."..........}

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 32.3.1
Cython: None
numpy: 1.11.3
scipy: 0.17.1
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None

Problem description

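The `description` field in the JSON data contains typographic quotes (“ and ”) inside the string value. I expected `pd.read_json(fileName, lines=True)` to read this data line by line, but it raises a UnicodeDecodeError at the position of the inner quote in the description.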
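For illustration, here is a small sketch of why such a quote can trip an ASCII decode. It assumes the file is UTF-8 encoded, which the report does not state explicitly. The helper `decodes_as_ascii` is a hypothetical name used only for this example. The first byte of the encoded quote matches the byte named in the traceback in this report.

```python
# -*- coding: utf-8 -*-
# Sketch only: assumes the input file is UTF-8 encoded (not stated in the report).

# The right double quotation mark that appears in the sample description (U+201D).
quote = u"\u201d"

# In UTF-8 this single character takes three bytes; the first one is 0xe2.
encoded = quote.encode("utf-8")
first_byte = bytearray(encoded)[0]


def decodes_as_ascii(raw):
    """Hypothetical helper: return True if the bytes are pure ASCII."""
    try:
        raw.decode("ascii")
        return True
    except UnicodeDecodeError:
        return False


print(hex(first_byte))
print(decodes_as_ascii(encoded))
print(decodes_as_ascii(b'plain "straight" quotes'))
```

Plain ASCII double quotes decode without trouble, so the problem appears to be the non-ASCII quote characters rather than quoting as such.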

Expected Output

read successful

Output:

Traceback (most recent call last):
data = pd.read_json(fileName, lines=True)
File "/usr/local/lib/python2.7/site-packages/pandas/io/json.py", line 275, in read_json
json = u'[' + u','.join(lines) + u']'
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 4924: ordinal not in range(128)
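A possible workaround, sketched under assumptions: decode the file explicitly and parse each line with the standard `json` module instead of relying on `read_json`. This is not the pandas fix, only an illustration. It assumes the file is UTF-8 encoded and holds one JSON object per line. `read_json_lines` and the demo data are hypothetical names and values, not part of the original report.

```python
# -*- coding: utf-8 -*-
# Workaround sketch, not the pandas implementation.
# Assumptions: UTF-8 input, one JSON object per line.
import io
import json
import os
import tempfile


def read_json_lines(path, encoding="utf-8"):
    """Hypothetical helper: parse a line-delimited JSON file into a list of dicts."""
    records = []
    # io.open decodes while reading, on both Python 2 and Python 3,
    # so no implicit ASCII decode happens later.
    with io.open(path, "r", encoding=encoding) as handle:
        for line in handle:
            # Drop surrounding whitespace and a trailing comma, which the
            # pasted sample shows at the end of a line.
            line = line.strip().rstrip(",")
            if line:
                records.append(json.loads(line))
    return records


# Demo with made-up data containing a typographic quote (U+201D).
sample = (
    u'{"id": 1, "description": "foo.\u201d"},\n'
    u'{"id": 2, "description": "plain"}\n'
)
fd, demo_path = tempfile.mkstemp(suffix=".json")
os.close(fd)
with io.open(demo_path, "w", encoding="utf-8") as handle:
    handle.write(sample)

records = read_json_lines(demo_path)
os.remove(demo_path)
print(len(records))
```

The resulting list of dicts can then be passed to `pd.DataFrame(records)` if a DataFrame is needed.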

Labels

IO JSON (read_json, to_json, json_normalize), Unicode (Unicode strings)
