Saturday, December 24, 2022

What is Pandas Profiling

The pandas_profiling library for Python provides a class named ProfileReport, which generates a basic report on an input DataFrame.


The report consists of the following:


An overview of the DataFrame,

Details for each attribute (column) of the DataFrame,

Correlations between attributes (Pearson correlation and Spearman correlation), and

A sample of the DataFrame.


pandas_profiling.ProfileReport(df, **kwargs)
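
As a rough illustration of the call above, the sketch below builds a small DataFrame and writes a report to an HTML file. The sample data, the helper names `sample_columns` and `write_profile`, and the output file name are all made up for this example. The third-party imports are placed inside the function so that the file still loads where pandas or pandas_profiling is not installed. Newer releases of the library are published under the name ydata-profiling, so the import line may need adjusting there.

```python
def sample_columns():
    """Return a tiny, made-up dataset as plain Python lists."""
    return {
        "age": [23, 35, 41, 29, 52],
        "income": [32000, 54000, 61000, 45000, 78000],
        "city": ["Pune", "Delhi", "Pune", "Mumbai", "Delhi"],
    }


def write_profile(columns, path="report.html"):
    """Build a DataFrame from `columns` and save its profile report to `path`."""
    # Imported lazily: both packages are third-party dependencies.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.DataFrame(columns)
    report = ProfileReport(df)  # default settings, no extra keyword arguments
    report.to_file(path)        # writes the report as an HTML file
    return path
```

Calling `write_profile(sample_columns())` should leave a `report.html` file in the working directory, which can then be opened in a browser.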


The keyword arguments are:

bins (int): Number of bins in the histogram. The default is 10.

check_correlation (boolean): Whether or not to check correlation. It’s `True` by default.

correlation_threshold (float): Threshold used to decide whether a pair of variables is correlated. The default is 0.9.

correlation_overrides (list): Names of variables that should not be rejected for being correlated. By default the list is empty (`None`).

check_recoded (boolean): Whether or not to check recoded correlation (a memory-heavy feature). Since it’s an expensive computation, it is best activated only for small datasets. `check_correlation` must be true for this check to run. It’s `False` by default.

pool_size (int): Number of workers in the thread pool. The default is equal to the number of CPUs.
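
To make the correlation entries more concrete, here is a minimal pure-Python sketch of Pearson and Spearman correlation, plus a helper that flags column pairs whose Pearson correlation exceeds a threshold of 0.9. It illustrates the idea only and is not the library's own implementation. The function names, the sample columns, and the use of the absolute value when comparing against the threshold are assumptions made for this example.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def ranks(values):
    """1-based ranks of `values`; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


def correlated_pairs(columns, threshold=0.9):
    """Column pairs whose absolute Pearson correlation exceeds `threshold`."""
    flagged = []
    for a, b in combinations(columns, 2):
        if abs(pearson(columns[a], columns[b])) > threshold:
            flagged.append((a, b))
    return flagged


# Made-up data: the two height columns carry the same information.
columns = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "shoe_size": [40, 38, 44, 41, 43],
}
print(correlated_pairs(columns))
```

With this data only the two height columns are flagged, since one is just a unit conversion of the other. Spearman correlation works on ranks, so it reports a perfect relationship for any strictly increasing pair (for example `x` and `x**2` on positive values), while Pearson only does so for linear relationships.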



References:

https://www.geeksforgeeks.org/pandas-profiling-in-python/
