Model Validation
Prediction error: error = actual - predicted
Mean Absolute Error (MAE): average of |error| over all predictions
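A minimal sketch of the two definitions above, in plain Python. The function name and the sample numbers are made up for illustration and are not from any dataset in these notes.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all paired values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # error = actual - predicted, one per prediction
    errors = [a - p for a, p in zip(actual, predicted)]
    return sum(abs(e) for e in errors) / len(errors)


# Hypothetical values, chosen only to make the arithmetic easy to follow
actual = [200, 150, 300]
predicted = [180, 160, 330]

# errors are 20, -10, -30, so the absolute errors are 20, 10, 30
mae = mean_absolute_error(actual, predicted)
print(mae)
```

Taking the absolute value keeps positive and negative errors from cancelling each other out.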
Validation Data
a model's predictions on new data are usually less accurate than its predictions on the data it was trained on
since a model's practical value comes from predicting new data, we measure performance on data that was not used to build the model
therefore, when building a model, you hold out some data and use it only for testing
this held-out test data is called validation data
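The hold-out idea can be sketched in a few lines of plain Python. The helper name holdout_split, the 25% fraction, and the stand-in rows are assumptions for illustration; scikit-learn's train_test_split serves the same purpose in practice.

```python
import random


def holdout_split(rows, validation_fraction=0.25, seed=0):
    """Shuffle the rows, then set aside a fraction as validation data."""
    shuffled = list(rows)                  # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_val = int(len(shuffled) * validation_fraction)
    # first n_val shuffled rows become validation data, the rest training data
    return shuffled[n_val:], shuffled[:n_val]


rows = list(range(100))  # stand-in for 100 rows of real data
train_rows, val_rows = holdout_split(rows)
print(len(train_rows), len(val_rows))
```

The model is fitted on train_rows only, and MAE is then computed on val_rows, which the model has never seen.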
Splitting
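in a decision tree, each level of splits doubles the number of groups: groups = 2^(# of split levels), e.g. 10 levels gives 2^10 = 1024 groups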
Overfitting
model matches training data almost perfectly
model does poorly on validation data and other new data
Underfitting
fails to capture patterns in the data, so it does poorly even on training data
Optimizing model
Aim for the sweet spot between underfitting and overfitting, where validation MAE is lowest
max_leaf_nodes
controls this
more leaves = model moves away from underfitting and toward overfitting
Can use a for loop to compare validation MAE for different values of max_leaf_nodes
In the example, max_leaf_nodes = 500 is optimal
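The for-loop comparison could look roughly like the sketch below, using scikit-learn's DecisionTreeRegressor. The data here is synthetic (a noisy sine curve) and the candidate values are arbitrary, so the best value it finds will not match the 500 from the original example; get_mae is a helper name chosen for this sketch.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption for illustration): a noisy sine curve
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=500)

# Hold out part of the data as validation data
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)


def get_mae(max_leaf_nodes):
    """Fit a tree with the given leaf limit and return its validation MAE."""
    model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=0)
    model.fit(train_X, train_y)
    return mean_absolute_error(val_y, model.predict(val_X))


candidates = [2, 5, 10, 25, 50, 100, 250]
scores = {}
for n in candidates:
    scores[n] = get_mae(n)
    print(f"max_leaf_nodes={n:>3}  validation MAE={scores[n]:.3f}")

# The candidate with the lowest validation MAE is the one to keep
best = min(scores, key=scores.get)
print("best max_leaf_nodes:", best)
```

Very small values tend to underfit and very large ones tend to overfit, so the lowest validation MAE usually lands somewhere in between.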