Converting nested JSON structures to Pandas DataFrames
Last Updated :
22 Nov, 2021
In this article, we are going to see how to convert nested JSON structures to Pandas DataFrames.
JSON with multiple levels
In this case, the nested JSON data contains another JSON object as the value for some of its attributes. This makes the data multi-level, and it has to be flattened to the depth the project requires before it reads well as a table, as explained below.
Python3
# importing the libraries used
import pandas as pd

# initializing the data
data = {
    'company': 'XYZ pvt ltd',
    'location': 'London',
    'info': {
        'president': 'Rakesh Kapoor',
        'contacts': {
            'email': '[email protected]',
            'tel': '9876543210'
        }
    }
}
Here, the data contains multiple levels. To convert it to a dataframe we will use the json_normalize() function of the pandas library.
Python3
# converting the data to dataframe
pd.json_normalize(data)
Output:
json data converted to pandas dataframe
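As a quick check of the flattened result, the following minimal sketch lists the generated column names. The email address is a placeholder chosen for this example, and the `sep` argument (a `json_normalize()` parameter not used elsewhere in this article) is included only to illustrate that the separator in the column names can be changed.

```python
import pandas as pd

# Sample record; the email address is a placeholder for illustration only
data = {
    'company': 'XYZ pvt ltd',
    'location': 'London',
    'info': {
        'president': 'Rakesh Kapoor',
        'contacts': {
            'email': 'contact@example.com',
            'tel': '9876543210'
        }
    }
}

# Default flattening: nested keys are joined with '.'
df = pd.json_normalize(data)
print(sorted(df.columns))

# Same flattening, but nested keys are joined with '_' instead
df_underscore = pd.json_normalize(data, sep='_')
print(sorted(df_underscore.columns))
```

Each nested value ends up in a column named after its full path, for example `info.contacts.email`.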
Here, we see that the data is fully flattened, with each nested value in its own column. If we do not wish to completely flatten the data, we can limit the depth with the max_level parameter, as shown below.
Python3
pd.json_normalize(data, max_level=0)
Output:
json data converted to pandas dataframe
Here, we see that with max_level=0 the info column is not flattened further; it still holds the nested object.
Python3
pd.json_normalize(data, max_level=1)
Output:
json data converted to pandas dataframe
Here, we see that with max_level=1 the info object is flattened one level, but the info.contacts column is not flattened further.
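To compare the effect of each depth side by side, here is a small sketch that prints the column names for `max_level=0`, `max_level=1`, and the default (`None`). The email address is a placeholder used only for this example.

```python
import pandas as pd

# Sample record; the email address is a placeholder for illustration only
data = {
    'company': 'XYZ pvt ltd',
    'location': 'London',
    'info': {
        'president': 'Rakesh Kapoor',
        'contacts': {
            'email': 'contact@example.com',
            'tel': '9876543210'
        }
    }
}

# Compare the columns produced at each flattening depth
for level in (0, 1, None):
    df = pd.json_normalize(data, max_level=level)
    print(level, sorted(df.columns))

# At max_level=1 the innermost object is kept as a Python dict in one cell
partial = pd.json_normalize(data, max_level=1)
print(type(partial.loc[0, 'info.contacts']))
```

A column that is left unflattened stores the remaining nested object as an ordinary Python dictionary, so it can still be processed later if needed.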
List of nested JSON
Now, if the data is a list of nested JSON objects, each object becomes one record (row) in our dataframe.
Python3
data = [
    {
        'id': '001',
        'company': 'XYZ pvt ltd',
        'location': 'London',
        'info': {
            'president': 'Rakesh Kapoor',
            'contacts': {
                'email': '[email protected]',
                'tel': '9876543210'
            }
        }
    },
    {
        'id': '002',
        'company': 'PQR Associates',
        'location': 'Abu Dhabi',
        'info': {
            'president': 'Neelam Subramaniyam',
            'contacts': {
                'email': '[email protected]',
                'tel': '8876443210'
            }
        }
    }
]
pd.json_normalize(data)
Output:
json data converted to pandas dataframe
So, in the case of multiple levels of JSON, we can try out different values of the max_level parameter to control how far each record is flattened.
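Real-world lists of records do not always share exactly the same keys. The sketch below uses two shortened, hypothetical records (the second has no `contacts` object) to show that `json_normalize()` still produces one row per record and fills the missing value with NaN.

```python
import pandas as pd

# Shortened, hypothetical records for illustration
records = [
    {'id': '001', 'info': {'president': 'Rakesh Kapoor',
                           'contacts': {'tel': '9876543210'}}},
    # This record has no 'contacts' object at all
    {'id': '002', 'info': {'president': 'Neelam Subramaniyam'}},
]

df = pd.json_normalize(records)

# One row per record, one column per distinct flattened key
print(df.shape)

# True marks the record where the nested key was missing
print(df['info.contacts.tel'].isna().tolist())
```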
JSON with nested lists
In this case, the nested JSON has a list of JSON objects as the value for some of its attributes. In such a case, we can choose the inner list items to be the records/rows of our dataframe using the record_path parameter.
Python3
# initialising the data
data = {
    'company': 'XYZ pvt ltd',
    'location': 'London',
    'info': {
        'president': 'Rakesh Kapoor',
        'contacts': {
            'email': '[email protected]',
            'tel': '9876543210'
        }
    },
    'employees': [
        {'name': 'A'},
        {'name': 'B'},
        {'name': 'C'}
    ]
}

# converting the data to dataframe
df = pd.json_normalize(data)
Output:
json data converted to pandas dataframe
Here, the nested list is not flattened: the employees column holds the whole list in a single cell. We need to use the record_path parameter to turn the list items into rows.
Python3
pd.json_normalize(data, record_path=['employees'])
Output:
nested list flattened into one row per employee
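The difference between the two calls can be checked with a shortened, hypothetical version of the data. This sketch compares the shape of the result with and without `record_path`.

```python
import pandas as pd

# Shortened sample data for illustration
data = {
    'company': 'XYZ pvt ltd',
    'employees': [{'name': 'A'}, {'name': 'B'}, {'name': 'C'}]
}

# Without record_path: a single row, the list sits in one cell
whole = pd.json_normalize(data)
print(whole.shape)

# With record_path: one row per list item, only the item's own keys as columns
rows = pd.json_normalize(data, record_path=['employees'])
print(rows.shape)
print(rows['name'].tolist())
```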
Now, we observe that the result does not include 'info' and the other top-level fields. To include them, we use another parameter, meta. Note that, in the code below, to include an attribute of an inner JSON object we specify its path as a list: ['info', 'president'].
Python3
pd.json_normalize(data, record_path=['employees'], meta=[
'company', 'location', ['info', 'president']])
Output:
json data converted to pandas dataframe
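When record fields and meta fields are combined, it can help to make their origin visible in the column names. The following sketch, on shortened hypothetical data, uses the `record_prefix` parameter of `json_normalize()` (not used elsewhere in this article) to prefix the columns that come from the employees list.

```python
import pandas as pd

# Shortened sample data for illustration
data = {
    'company': 'XYZ pvt ltd',
    'location': 'London',
    'info': {'president': 'Rakesh Kapoor'},
    'employees': [{'name': 'A'}, {'name': 'B'}, {'name': 'C'}]
}

df = pd.json_normalize(
    data,
    record_path=['employees'],                 # list items become rows
    meta=['company', ['info', 'president']],   # parent fields repeated on each row
    record_prefix='employee.',                 # prefix for columns taken from the list items
)

print(sorted(df.columns))

# The meta value is repeated for every employee row
print(df['info.president'].unique().tolist())
```

A nested meta path such as `['info', 'president']` appears as the dotted column `info.president`.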
Now, in the case of a list of such nested JSON objects, we will get a dataframe with one row per employee across all the objects, as shown below.
Python3
data = [
    {
        'id': '001',
        'company': 'XYZ pvt ltd',
        'location': 'London',
        'info': {
            'president': 'Rakesh Kapoor',
            'contacts': {
                'email': '[email protected]',
                'tel': '9876543210'
            }
        },
        'employees': [
            {'name': 'A'},
            {'name': 'B'},
            {'name': 'C'}
        ]
    },
    {
        'id': '002',
        'company': 'PQR Associates',
        'location': 'Abu Dhabi',
        'info': {
            'president': 'Neelam Subramaniyam',
            'contacts': {
                'email': '[email protected]',
                'tel': '8876443210'
            }
        },
        'employees': [
            {'name': 'L'},
            {'name': 'M'},
            {'name': 'N'}
        ]
    }
]
df = pd.json_normalize(data, record_path=['employees'], meta=[
    'company', 'location', ['info', 'president']])
print(df)
Output:
json data converted to pandas dataframe
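Because every employee row carries the fields of its parent object, the flattened dataframe can be grouped back by company. This is a minimal sketch on shortened, hypothetical records; including `'id'` in `meta` is an addition made for this example.

```python
import pandas as pd

# Shortened sample records for illustration
data = [
    {'id': '001', 'company': 'XYZ pvt ltd',
     'employees': [{'name': 'A'}, {'name': 'B'}, {'name': 'C'}]},
    {'id': '002', 'company': 'PQR Associates',
     'employees': [{'name': 'L'}, {'name': 'M'}, {'name': 'N'}]},
]

# One row per employee, with the parent's id and company repeated on each row
df = pd.json_normalize(data, record_path=['employees'], meta=['id', 'company'])
print(len(df))

# Collect the employee names belonging to each company
employees_by_company = df.groupby('company')['name'].apply(list).to_dict()
print(employees_by_company)
```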