The lower the RMSE, the better a given model is able to “fit” a dataset. However, whether a given RMSE value counts as “low” depends on the range of the dataset you’re working with.
For example, consider the following scenarios:
Scenario 1: We would like to use a regression model to predict the price of homes in a certain city. Suppose the model has an RMSE value of $500. Since the typical range of house prices is between $70,000 and $300,000, this RMSE value is extremely low. This tells us that the model is able to predict house prices accurately.
Scenario 2: Now suppose we would like to use a regression model to predict how much someone will spend per month in a certain city. Suppose the model has an RMSE value of $500. If the typical range of monthly spending is $1,500 – $4,000, this RMSE value is quite high. This tells us that the model is not able to predict monthly spending very accurately.
Normalizing the RMSE Value
One way to gain a better understanding of whether a certain RMSE value is “good” is to normalize it using the following formula:
Normalized RMSE = RMSE / (max value – min value)
This typically produces a value between 0 and 1, where values closer to 0 represent better-fitting models. (The value can exceed 1 if the RMSE is larger than the range of the data.)
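As a rough illustration of the formula, here is a minimal Python sketch that computes the RMSE from observed and predicted values and then divides it by the observed range. The function names `rmse` and `normalized_rmse` and the sample numbers are made up for this example and are not part of any particular library.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def normalized_rmse(actual, predicted):
    """RMSE divided by the range (max value - min value) of the observed data."""
    value_range = max(actual) - min(actual)
    if value_range == 0:
        raise ValueError("observed values have zero range")
    return rmse(actual, predicted) / value_range

# Hypothetical data, chosen so that every prediction is off by exactly 10
actual = [100, 200, 300, 400]
predicted = [110, 190, 310, 390]

print(rmse(actual, predicted))                       # 10.0
print(round(normalized_rmse(actual, predicted), 4))  # 0.0333
```

The sketch uses the range of the observed values as the denominator, which matches the formula above. The zero-range check is an added safeguard against dividing by zero.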
For example, suppose our RMSE value is $500 and our range of values is between $70,000 and $300,000. We would calculate the normalized RMSE value as:
Normalized RMSE = $500 / ($300,000 – $70,000) = 0.002
Conversely, suppose our RMSE value is $500 and our range of values is between $1,500 and $4,000. We would calculate the normalized RMSE value as:
Normalized RMSE = $500 / ($4,000 – $1,500) = 0.2
The first normalized RMSE value is much lower, which indicates that the first model fits its data much better, relative to the range of that data, than the second model does.
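The two calculations above can be reproduced with a short Python sketch. The helper name `normalize_rmse` is hypothetical. The RMSE and range values are the ones from the two scenarios.

```python
def normalize_rmse(rmse, min_value, max_value):
    """Divide an already-computed RMSE by the range of the data."""
    return rmse / (max_value - min_value)

# Scenario 1: house prices between $70,000 and $300,000
house_prices = normalize_rmse(500, 70_000, 300_000)

# Scenario 2: monthly spending between $1,500 and $4,000
monthly_spending = normalize_rmse(500, 1_500, 4_000)

print(f"{house_prices:.3f}")      # 0.002
print(f"{monthly_spending:.3f}")  # 0.200
```

Although both models have the same RMSE of $500, the normalized values differ by roughly a factor of 100, which makes the difference in fit easy to see.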