Saturday, November 12, 2022

What does pd.json_normalize() do?

JSON files can be awkward to work with because the data is nested across several levels of hierarchy. pandas has a handy built-in function, json_normalize(), that flattens simple to moderately nested, semi-structured JSON into a flat table.
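A minimal sketch of what "flattening" means, using made-up records (the field names below are hypothetical, not from any real dataset). Each nested key path is joined with a dot to form a column name, no matter how many levels deep it sits:

```python
import pandas as pd

# Hypothetical records: 'user' is nested one level, 'address' two levels deep
records = [
    {"id": 1, "user": {"name": "Alice", "address": {"city": "Austin"}}},
    {"id": 2, "user": {"name": "Bob", "address": {"city": "Boston"}}},
]

# Every nested key path becomes one dot-separated column
flat = pd.json_normalize(records)
print(flat.columns.tolist())  # ['id', 'user.name', 'user.address.city']
print(flat.shape)             # (2, 3): one row per record
```

Each top-level record turns into one row, and each leaf value into one cell.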

Below are three ways pandas can convert JSON to a DataFrame:


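# Use json_normalize() to convert JSON to DataFrame

import json
import pandas as pd

parsed = json.loads(data)  # data: a JSON string whose top-level object has a 'technologies' list
df = pd.json_normalize(parsed['technologies'])  # use pd.json_normalize; pandas.io.json.json_normalize is deprecated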
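Because `data` is never defined in the snippet, here is a self-contained sketch with a hypothetical JSON string. Only the `technologies` key comes from the snippet; the course fields inside it are invented for illustration:

```python
import json
import pandas as pd

# Hypothetical JSON string; only the 'technologies' key mirrors the snippet above
data = """
{
  "technologies": [
    {"Course": "Spark", "Fee": 22000, "Details": {"Duration": "30days", "Discount": 1000}},
    {"Course": "Pandas", "Fee": 25000, "Details": {"Duration": "40days", "Discount": 2300}}
  ]
}
"""

parsed = json.loads(data)

# Each element of the list becomes a row; the nested 'Details' object is flattened
df = pd.json_normalize(parsed["technologies"])
print(df)
```

The resulting columns are `Course`, `Fee`, `Details.Duration` and `Details.Discount`.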


# Convert JSON to DataFrame Using read_json()

from io import StringIO
df2 = pd.read_json(StringIO(jsonStr), orient='index')  # pandas 2.1+ warns when a literal JSON string is passed directly
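`orient='index'` expects JSON shaped as `{row label: {column: value}}`. A self-contained sketch with a hypothetical `jsonStr` in that shape (the labels and values are made up):

```python
from io import StringIO

import pandas as pd

# Hypothetical JSON string laid out as {row label: {column: value}}
jsonStr = '{"r1": {"Course": "Spark", "Fee": 22000}, "r2": {"Course": "Pandas", "Fee": 25000}}'

# Wrap the literal string in StringIO; pandas 2.1+ emits a FutureWarning otherwise
df2 = pd.read_json(StringIO(jsonStr), orient="index")
print(df2)
```

The outer keys `r1` and `r2` become the row index, and the inner keys become the columns `Course` and `Fee`.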


# Use pandas.DataFrame.from_dict() to Convert JSON to DataFrame

parsed = json.loads(data)  # avoid naming the variable dict, which shadows the built-in
df3 = pd.DataFrame.from_dict(parsed, orient="index")  # outer keys become row labels
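A self-contained sketch of the from_dict() route, assuming the same hypothetical `{row label: {column: value}}` layout as in the read_json() sketch. Here the string is parsed with `json.loads` first and the resulting dict is handed to pandas:

```python
import json

import pandas as pd

# Hypothetical JSON object keyed by row label
data = '{"r1": {"Course": "Spark", "Fee": 22000}, "r2": {"Course": "Pandas", "Fee": 25000}}'

parsed = json.loads(data)

# orient='index': outer keys -> row index, inner keys -> columns
df3 = pd.DataFrame.from_dict(parsed, orient="index")
print(df3)
```

With the default `orient='columns'` the roles flip: `r1` and `r2` would become columns and `Course`/`Fee` the row index. Note that unlike json_normalize(), from_dict() does not flatten deeper nesting; a nested object would land in a single cell as a dict.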



Suppose a JSON record has already been parsed into a Python dict named test_dict and looks like this:


test_dict = {
  '_index': 'complaint-public-v2',
  '_type': 'complaint',
  '_id': '3230997',
  '_score': 0.0,
  '_source': {'tags': None,
   'zip_code': '49508',
   'complaint_id': '3230997',
   'issue': 'Managing an account',
   'date_received': '2019-05-01T12:00:00-05:00',
   'state': 'MI',
   'consumer_disputed': 'N/A',
   'product': 'Checking or savings account',
   'company_response': 'Closed with monetary relief',
   'company': 'JPMORGAN CHASE & CO.',
   'submitted_via': 'Referral',
   'date_sent_to_company': '2019-05-02T12:00:00-05:00',
   'company_public_response': None,
   'sub_product': 'Checking account',
   'timely': 'Yes',
   'complaint_what_happened': '',
   'sub_issue': 'Problem making or receiving payments',
   'consumer_consent_provided': 'N/A'}
}



df = pd.json_normalize(test_dict) 

df.columns


Index(['_index', '_type', '_id', '_score', '_source.tags', '_source.zip_code',
       '_source.complaint_id', '_source.issue', '_source.date_received',
       '_source.state', '_source.consumer_disputed', '_source.product',
       '_source.company_response', '_source.company', '_source.submitted_via',
       '_source.date_sent_to_company', '_source.company_public_response',
       '_source.sub_product', '_source.timely',
       '_source.complaint_what_happened', '_source.sub_issue',
       '_source.consumer_consent_provided'],
      dtype='object')


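When the record is loaded with json_normalize, it becomes a single-row DataFrame: the top-level keys stay as columns, and every key inside the nested _source dict becomes its own column, prefixed with "_source.".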
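A small sketch to check that shape, using a trimmed copy of the complaint record (only a few `_source` fields are kept, purely to keep the example short). It also shows the `max_level` parameter, which limits how many nesting levels get flattened:

```python
import pandas as pd

# Trimmed copy of the complaint record above (most _source fields omitted)
test_dict = {
    "_index": "complaint-public-v2",
    "_id": "3230997",
    "_score": 0.0,
    "_source": {"zip_code": "49508", "state": "MI", "company": "JPMORGAN CHASE & CO."},
}

df = pd.json_normalize(test_dict)
print(df.shape)  # (1, 6): a single dict becomes a single row
print(df.loc[0, "_source.state"])

# max_level=0 stops flattening at the top level, so _source stays one column of dicts
df_top = pd.json_normalize(test_dict, max_level=0)
print(df_top.columns.tolist())  # ['_index', '_id', '_score', '_source']
```

Limiting `max_level` can be useful when a deeply nested branch should stay intact as a single object column instead of exploding into many columns.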




