Friday, June 3, 2022

AI/ML: Using a pandas DataFrame to group and summarise

data.groupby(['month']).groups.keys()

Out[59]: ['2014-12', '2014-11', '2015-02', '2015-03', '2015-01']

len(data.groupby(['month']).groups['2014-11'])

Out[61]: 230
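A minimal sketch of the two calls above, assuming a small made-up DataFrame (the rows here are hypothetical; only the column names come from this post). It also shows size(), which returns the row count of every group in one call instead of looking up one key at a time.

```python
import pandas as pd

# Hypothetical sample data; only the column names mirror this post.
data = pd.DataFrame({
    "month": ["2014-11", "2014-11", "2014-12", "2015-01", "2015-01", "2015-01"],
    "date": ["2014-11-01", "2014-11-15", "2014-12-03",
             "2015-01-02", "2015-01-09", "2015-01-20"],
    "duration": [34.0, 13.0, 4.0, 23.0, 4.0, 10.0],
})

grouped = data.groupby("month")

# .groups maps each group key to the index labels of the rows in that group
group_keys = list(grouped.groups.keys())
rows_in_nov = len(grouped.groups["2014-11"])

# .size() gives the number of rows in every group at once
group_sizes = grouped.size()

print(group_keys)
print(rows_in_nov)
print(group_sizes)
```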


data.groupby('month').first()

===> This gives the first row of each month (more precisely, the first non-null value of each column within the month)


data.groupby('month')['duration'].sum()

===> This gives the sum of duration for each month


data.groupby('month')['date'].count() 

===> This gives the number of non-null date entries in each month
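The three calls above can be tried side by side. This is a small sketch on made-up data (the rows are hypothetical), with one missing date added on purpose to show that first() and count() skip nulls, while size() counts every row.

```python
import pandas as pd

# Hypothetical sample data; the first date is deliberately missing.
data = pd.DataFrame({
    "month": ["2014-11", "2014-11", "2014-12", "2014-12"],
    "date": [None, "2014-11-15", "2014-12-03", "2014-12-09"],
    "duration": [34.0, 13.0, 4.0, 23.0],
})

grouped = data.groupby("month")

first_rows = grouped.first()                 # first non-null value per column
total_duration = grouped["duration"].sum()   # sum of duration per month
date_counts = grouped["date"].count()        # counts non-null dates only
row_counts = grouped.size()                  # counts all rows, nulls included

print(first_rows)
print(total_duration)
print(date_counts)
print(row_counts)
```

If the goal is "how many rows per month", size() is usually the safer choice, because count() depends on which column is picked.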



data.groupby('month')['duration'].sum() 

===> This produces a pandas Series


data.groupby('month')[['duration']].sum()

===> This produces a pandas DataFrame (note the double brackets)
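A quick way to check the difference between single and double brackets, again on hypothetical sample rows:

```python
import pandas as pd

# Hypothetical sample data; only the column names mirror this post.
data = pd.DataFrame({
    "month": ["2014-11", "2014-11", "2014-12"],
    "duration": [34.0, 13.0, 4.0],
})

# Single brackets select one column, so the result is a Series
as_series = data.groupby("month")["duration"].sum()

# Double brackets select a list of columns, so the result is a DataFrame
as_frame = data.groupby("month")[["duration"]].sum()

print(type(as_series).__name__)
print(type(as_frame).__name__)
```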



data.groupby('month', as_index=False).agg({"duration": "sum"})

===> The groupby output will have an index or multi-index on rows corresponding to your chosen grouping variables. To avoid setting this index, pass as_index=False to the groupby operation.
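To see what as_index=False changes, here is a sketch on hypothetical rows comparing the default output, the as_index=False output, and the result of calling reset_index() afterwards.

```python
import pandas as pd

# Hypothetical sample data; only the column names mirror this post.
data = pd.DataFrame({
    "month": ["2014-11", "2014-11", "2014-12"],
    "duration": [34.0, 13.0, 4.0],
})

# Default: the grouping column becomes the row index
indexed = data.groupby("month").agg({"duration": "sum"})

# as_index=False: the grouping column stays a regular column
flat = data.groupby("month", as_index=False).agg({"duration": "sum"})

# reset_index() on the default output gives the same layout
reset = indexed.reset_index()

print(indexed)
print(flat)
```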


    

df_analyze.groupby(['week_label','evt']).agg({'evt':'count','t' : 'sum'})

===> This is where agg is powerful: in a single call it gives the evt field as a count and the t (time) field as a sum



agg_procedure = {
    'evt': 'count',
    't': 'sum'
}

df_analyze.groupby(['week_label','evt']).agg(agg_procedure)


===> This is equivalent to the previous call; the aggregation is just defined separately as a dictionary first
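A related option, not used in the snippets above, is pandas named aggregation, where each output column gets an explicit name. The sketch below assumes a made-up df_analyze (the rows and the output names n_rows and total_t are hypothetical) and counts rows through the t column rather than through a grouping column.

```python
import pandas as pd

# Hypothetical sample data; only the column names mirror this post.
df_analyze = pd.DataFrame({
    "week_label": ["w1", "w1", "w1", "w2", "w2"],
    "evt": ["call", "call", "sms", "call", "sms"],
    "t": [1.5, 2.5, 1.0, 4.0, 4.0],
})

# Named aggregation: output_name=(input_column, function)
summary = df_analyze.groupby(["week_label", "evt"]).agg(
    n_rows=("t", "count"),   # hypothetical output name
    total_t=("t", "sum"),    # hypothetical output name
)

print(summary)
```

The result has a (week_label, evt) multi-index on the rows and two plainly named columns, which is often easier to work with afterwards.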



df_analyze.groupby(['week_label','evt']).agg({
    'dt' : ['min', 'max', 'sum'],
    'evt' : 'count',
    't' : ['min', 'first', 'nunique']
})
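When a column is given a list of functions, as above, the result has two-level column names such as ('dt', 'min'). Here is a sketch on hypothetical rows that shows this and one possible way to flatten the names; the underscore-joined names are just an illustration, and the evt count is left out to keep the example small.

```python
import pandas as pd

# Hypothetical sample data; only the column names mirror this post.
df_analyze = pd.DataFrame({
    "week_label": ["w1", "w1", "w1", "w2", "w2"],
    "evt": ["call", "call", "sms", "call", "sms"],
    "dt": [10, 20, 5, 7, 3],
    "t": [1.5, 2.5, 1.0, 4.0, 4.0],
})

summary = df_analyze.groupby(["week_label", "evt"]).agg({
    "dt": ["min", "max", "sum"],
    "t": ["min", "first", "nunique"],
})

# Columns are now tuples like ('dt', 'min'); join them into flat names
flat = summary.copy()
flat.columns = ["_".join(col) for col in summary.columns]

print(summary)
print(flat.columns.tolist())
```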




references:

https://www.shanelynn.ie/summarising-aggregation-and-grouping-data-in-python-pandas/

