data.groupby(['month']).groups.keys()
Out[59]: ['2014-12', '2014-11', '2015-02', '2015-03', '2015-01']
len(data.groupby(['month']).groups['2014-11'])
Out[61]: 230
data.groupby('month').first()
===> This gives the first row of each month (more precisely, the first non-null value of each column within each month)
data.groupby('month')['duration'].sum()
===> This gives the sum of duration for each month
data.groupby('month')['date'].count()
===> This gives the number of entries in each month
data.groupby('month')['duration'].sum()
===> Produces a pandas Series
data.groupby('month')[['duration']].sum()
===> Produces a pandas DataFrame
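The difference between single and double brackets can be checked on a small frame. This is a minimal sketch: the sample values below are made up for illustration and are not the data used in these notes.

```python
import pandas as pd

# Hypothetical sample data; only the column names match the notes above.
data = pd.DataFrame({
    "month": ["2014-11", "2014-11", "2014-12"],
    "duration": [10.0, 20.0, 5.0],
})

# Single brackets select one column, so the result is a Series.
as_series = data.groupby("month")["duration"].sum()

# Double brackets select a list of columns, so the result is a DataFrame.
as_frame = data.groupby("month")[["duration"]].sum()

print(type(as_series).__name__)
print(type(as_frame).__name__)
```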
data.groupby('month', as_index=False).agg({"duration": "sum"})
===> By default, the groupby output is indexed (or multi-indexed) on its rows by the chosen grouping variables. To keep the grouping variables as regular columns instead, pass as_index=False to the groupby operation.
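A small side-by-side comparison may make the effect of as_index clearer. This is a sketch on made-up sample values, not the data from these notes.

```python
import pandas as pd

# Hypothetical sample data; only the column names match the notes above.
data = pd.DataFrame({
    "month": ["2014-11", "2014-11", "2014-12"],
    "duration": [10.0, 20.0, 5.0],
})

# Default: the grouping column becomes the row index.
indexed = data.groupby("month").agg({"duration": "sum"})

# as_index=False: the grouping column stays an ordinary column.
flat = data.groupby("month", as_index=False).agg({"duration": "sum"})

print(indexed.index.name)
print(list(flat.columns))
```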
df_analyze.groupby(['week_label','evt']).agg({'evt':'count','t' : 'sum'})
===> Passing a dictionary to agg applies a different aggregation to each column: here the evt field is counted and t is summed
agg_procedure = {
'evt':'count',
't' : 'sum'
}
df_analyze.groupby(['week_label','evt']).agg(agg_procedure)
===> This is equivalent to the call above, with the aggregation dictionary defined separately and then passed to agg
df_analyze.groupby(['week_label','evt']).agg({
'dt' : ['min','max', 'sum'],
'evt' : 'count',
't' : ['min', 'first', 'nunique']
})
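When a column is given a list of functions, the result has two-level (MultiIndex) columns of the form (column, function). The sketch below illustrates this on made-up values. As a simplifying assumption it groups by week_label only, and the "_".join flattening at the end is one common way to get single-level column names, not something taken from these notes.

```python
import pandas as pd

# Hypothetical sample data; only the column names match the notes above.
df_analyze = pd.DataFrame({
    "week_label": ["w1", "w1", "w2"],
    "evt": ["a", "b", "a"],
    "dt": [1.0, 2.0, 3.0],
    "t": [5, 5, 7],
})

# Several aggregations per column yield (column, function) column labels.
summary = df_analyze.groupby("week_label").agg({
    "dt": ["min", "max", "sum"],
    "evt": "count",
    "t": ["min", "first", "nunique"],
})
print(summary.columns.tolist())

# Optional: flatten the two-level columns into names like "dt_sum".
flat = summary.copy()
flat.columns = ["_".join(col) for col in summary.columns]
print(list(flat.columns))
```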
references:
https://www.shanelynn.ie/summarising-aggregation-and-grouping-data-in-python-pandas/