Wednesday, June 8, 2022

Pandas Dataframe tips

=====

 df_appmtns.apply(filter_date, axis=1) 

The axis argument is important here. Without it (the default, axis=0), the function receives each column as a Series, starting with the first column. If passed as 1, the function receives the values row by row.
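A small sketch of the difference, using a made-up dataframe (the column names `a` and `b` are assumptions for illustration, not from the original data):

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0 (the default): the function is called once per column.
col_sums = df.apply(lambda col: col.sum())

# axis=1: the function is called once per row.
row_sums = df.apply(lambda row: row["a"] + row["b"], axis=1)
```

Here `col_sums` has one entry per column, while `row_sums` has one entry per row.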

===== 

When a datetime object is stored in a dataframe, pandas stores it as its own Timestamp type.

dtVal = parser.parse(<date_str>) 

df['dateval'] = dtVal

When the value is read back, the type of the date field is

<class 'pandas._libs.tslibs.timestamps.Timestamp'>

So, when filtering with code that expects a plain datetime object, the value has to be converted to datetime first.

Storing the value as epoch milliseconds may be easier for that purpose.

A pandas Timestamp can be converted to a datetime as follows:
bookTs.to_pydatetime()
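A minimal sketch of the round trip, assuming a made-up `dateval` column with two sample dates. It also shows that a datetime column can be compared against a plain datetime when filtering:

```python
from datetime import datetime

import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame(
    {"dateval": [datetime(2022, 6, 8, 9, 30), datetime(2022, 6, 1, 14, 0)]}
)

# Reading a single value back gives a pandas Timestamp.
ts = df["dateval"].iloc[0]

# Convert it to a plain Python datetime.
py_dt = ts.to_pydatetime()

# Filter rows on or after a cutoff given as a plain datetime.
cutoff = datetime(2022, 6, 5)
recent = df[df["dateval"] >= cutoff]
```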

===================

To iterate over the rows of a dataframe and update one of its columns, the following can be done:

for i, row in df_appmtns.iterrows():
    df_appmtns.at[i,'t'] = row['bookedAt']
    df_appmtns.at[i,'week_label'] = get_week_index(row['bookedAt'])

Just doing row['week_label'] = 'value' does not work, because the row returned by iterrows can be a copy and writing to it does not change the dataframe.
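A short sketch of both behaviours on made-up data. `week_index` is a hypothetical stand-in for `get_week_index` (whose real definition is not shown here), assumed to return the ISO week number:

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"bookedAt": pd.to_datetime(["2022-06-01", "2022-06-08"])})
df["week_label"] = -1


def week_index(ts):
    # Hypothetical helper: ISO week number of a timestamp.
    return ts.isocalendar()[1]


# Writing to the row object does not touch the dataframe.
for i, row in df.iterrows():
    row["week_label"] = 99
unchanged = df["week_label"].tolist()

# Writing through .at updates the dataframe in place.
for i, row in df.iterrows():
    df.at[i, "week_label"] = week_index(row["bookedAt"])

# The same result without an explicit loop, using apply on the column.
via_apply = df["bookedAt"].apply(week_index)
```

For larger dataframes the `apply` form is usually the simpler choice, since it avoids the per-row `.at` writes.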

=========================
To stack two dataframes one below the other, the following can be used:

frames = [df_appmtns,df_data]
df_analyze = pd.concat(frames)
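One thing to watch is the index: by default concat keeps the original index labels, so they can repeat. A sketch with two made-up frames (names and columns are assumptions):

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df_a = pd.DataFrame({"id": [1, 2], "name": ["x", "y"]})
df_b = pd.DataFrame({"id": [3], "name": ["z"]})

# Default: original index labels are kept, so label 0 appears twice.
stacked = pd.concat([df_a, df_b])

# ignore_index=True renumbers the rows from 0.
renumbered = pd.concat([df_a, df_b], ignore_index=True)
```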


=================================

Dropping a column from a dataframe
df_data.drop('Id', inplace=True, axis=1)
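If the original dataframe should stay untouched, drop can also return a new dataframe instead of modifying in place. A sketch on made-up data (the `value` column is an assumption):

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"Id": [1, 2], "value": [0.5, 0.7]})

# Without inplace=True, drop returns a new dataframe and leaves df as it was.
trimmed = df.drop(columns=["Id"])
```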

===================================

Getting all numeric columns of a dataframe

import numpy as np

def is_numeric_dtype(_type):
    # True only for 64-bit integer and 64-bit float columns
    return _type == np.int64 or _type == np.float64

numeric_cols = [col for col in df_data.columns if is_numeric_dtype(df_data[col].dtype)]
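The check above only matches int64 and float64. As an alternative, pandas has select_dtypes, which also covers other numeric widths such as int32 or float32. A sketch on made-up data (column names are assumptions):

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"count": [1, 2], "ratio": [0.1, 0.2], "label": ["a", "b"]})

# Keep only the columns with a numeric dtype and list their names.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
```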

====================================

Appending a row to the end of a dataframe
df.loc[len(df)] = [line, target]
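This relies on the index being the default 0, 1, 2, ... numbering. A sketch on made-up data showing the normal case and what happens when the label len(df) already exists:

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"line": ["first"], "target": [0]})

# Default index 0..n-1: label len(df) is new, so a row is added.
df.loc[len(df)] = ["second", 1]

# Custom index: label 2 already exists, so that row is overwritten instead.
gapped = pd.DataFrame({"line": ["a", "b"], "target": [0, 1]}, index=[1, 2])
gapped.loc[len(gapped)] = ["c", 2]
```

When the index is not the default one, pd.concat with ignore_index=True is a safer way to add rows.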
