The pandas_profiling library in Python provides ProfileReport(), which generates a basic report on the input DataFrame.
The report consists of the following:
DataFrame overview,
Details of each attribute (column) of the DataFrame,
Correlations between attributes (Pearson Correlation and Spearman Correlation), and
A sample of DataFrame.
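To make the correlation part of the report more concrete, here is a minimal standard-library sketch of Pearson and Spearman correlation, plus a threshold check in the spirit of `correlation_threshold`. It is not how pandas_profiling computes its report; the helper names (`pearson`, `ranks`, `spearman`, `correlated_pairs`), the sample columns, and the use of the absolute value in the comparison are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(ranks(x), ranks(y))


def correlated_pairs(columns, threshold=0.9):
    """Return column-name pairs whose |Pearson correlation| exceeds threshold."""
    flagged = []
    for a, b in combinations(columns, 2):
        if abs(pearson(columns[a], columns[b])) > threshold:
            flagged.append((a, b))
    return flagged


# Hypothetical sample data: two columns carry the same information.
data = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "shoe_size": [40, 38, 44, 41, 43],
}
flagged = correlated_pairs(data, threshold=0.9)
```

With this sample data only the two height columns are flagged, since they are the same measurement in different units. Spearman differs from Pearson when a relationship is monotonic but not linear: for `[1, 2, 3, 4]` against `[1, 4, 9, 16]` the rank-based value is 1 while the Pearson value is slightly lower.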
pandas_profiling.ProfileReport(df, **kwargs)
Parameters:
bins (int): Number of bins in histogram. The default is 10.
check_correlation (boolean): Whether or not to check correlation. It’s `True` by default.
correlation_threshold (float): Threshold to determine if the variable pair is correlated. The default is 0.9.
correlation_overrides (list): Variable names not to be rejected because they are correlated. The default is `None` (no variables).
check_recoded (boolean): Whether or not to check recoded correlation, a memory-heavy feature. Because the computation is expensive, it is best enabled only for small datasets. `check_correlation` must be `True` for this check to run. It’s `False` by default.
pool_size (int): Number of workers in the thread pool. The default is equal to the number of CPUs.
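A call using the parameters listed above might look like the sketch below. It assumes a pandas_profiling release that accepts these keyword arguments (the accepted options may differ between releases), and the helper names `profile_options`, `build_report`, and `example`, as well as the sample DataFrame and output file name, are hypothetical.

```python
import multiprocessing


def profile_options(small_dataset=False):
    """Collect the keyword arguments described above, using their stated defaults."""
    return {
        "bins": 10,
        "check_correlation": True,
        "correlation_threshold": 0.9,
        "correlation_overrides": None,
        # The recoded check is expensive, so enable it only for small datasets.
        "check_recoded": small_dataset,
        # Stated default: one worker per CPU.
        "pool_size": multiprocessing.cpu_count(),
    }


def build_report(df, **kwargs):
    """Create the report; the library is imported here so the options
    helper above can be used without pandas_profiling installed."""
    import pandas_profiling

    return pandas_profiling.ProfileReport(df, **kwargs)


def example():
    """Profile a tiny, made-up DataFrame and save the report as HTML."""
    import pandas as pd

    df = pd.DataFrame(
        {
            "age": [23, 35, 41, 29, 52],
            "income": [32000, 54000, 61000, 45000, 80000],
            "city": ["Pune", "Delhi", "Pune", "Mumbai", "Delhi"],
        }
    )
    report = build_report(df, **profile_options(small_dataset=True))
    report.to_file("report.html")
```

Calling `example()` should write `report.html` to the current directory, provided pandas and a compatible pandas_profiling version are installed.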
References:
https://www.geeksforgeeks.org/pandas-profiling-in-python/