Model and Cost Function
Goal: To make the hypothesis/prediction, $h(x)$, a good predictor when $x$ is the input and $y$ is the output. We essentially want $h(x) \approx y$ (we want the prediction to equal the answer/output).
When $y$ is continuous → Regression problem
When $y$ is discrete → Classification problem
Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
$\theta_0, \theta_1$ are called parameters
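As a minimal sketch of this hypothesis in code (Python; the names `theta0` and `theta1` are just illustrative):

```python
def hypothesis(x, theta0, theta1):
    """Linear hypothesis h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# e.g., with theta0 = 0 and theta1 = 1, the prediction is just x itself
print(hypothesis(2.0, 0.0, 1.0))  # -> 2.0
```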
Q: How do we choose these parameters to make $h_\theta(x)$ close/equal to $y$?
A: Minimization problem! We want to minimize this: $h_\theta(x) - y$. If this difference is 0, then the two values are equal, thus a perfect prediction. So we have to find the $\theta$ values that minimize this expression. We can expand this idea to a quadratic function called a mean squared error function, a type of cost function:
$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$
$J(\theta_0, \theta_1)$ is the cost function. We want to minimize this function's output. Remember, these parameters are the ones in the hypothesis equation above.
Common cost function for linear regression problems
Note: the $\frac{1}{2}$ just makes the math for gradient descent easier for the moment (it cancels the 2 that comes down when differentiating the square), and $m$ represents the # of training samples.
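A minimal sketch of this cost function in Python, building on the `hypothesis` sketch above (assuming the training data comes as plain lists; the variable names are mine, not from the course):

```python
def cost(xs, ys, theta0, theta1):
    """Mean squared error cost J(theta0, theta1) = (1/2m) * sum of squared errors."""
    m = len(xs)  # number of training samples
    squared_errors = [
        (hypothesis(x, theta0, theta1) - y) ** 2
        for x, y in zip(xs, ys)
    ]
    return sum(squared_errors) / (2 * m)
```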
Example:
On the left, we have the hypothesis graphed. We can see that when $\theta_1 = 1$, the line fits the training data perfectly.
On the right, we have the cost function $J(\theta_1)$ graphed; when $\theta_1 = 1$, the cost is at its minimum, $J(\theta_1) = 0$.
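To make the computation behind that plot concrete, here is a worked run using the `cost` sketch above, assuming a toy training set like $(1, 1), (2, 2), (3, 3)$ (the actual data in the figure may differ):

```python
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

# With theta1 = 1 the line passes through every point, so J = 0.
print(cost(xs, ys, theta0=0.0, theta1=1.0))  # -> 0.0

# With theta1 = 0.5 the predictions fall short, so J > 0:
# errors are (0.5-1), (1-2), (1.5-3) -> J = (0.25 + 1 + 2.25) / 6 ~ 0.583
print(cost(xs, ys, theta0=0.0, theta1=0.5))  # -> ~0.583
```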
Note: We have set $\theta_0 = 0$ for simplicity's sake; if we were to graph $J(\theta_0, \theta_1)$, we would have a multivariable function and thus a 3D graph (yay for Math 237).
The cost function's graph is formed by plotting the mean squared error function from above, which depends on the hypothesis (equation above).
Conclusion: we can see that the value of $\theta_1$ that minimizes the cost function is the same as the slope of the line that fits the training data most closely. Our goal of fitting $h_\theta(x)$ closely to the training data has been achieved!
The cost function this time is represented using two variables, $\theta_0$ and $\theta_1$, and plotted using contour plots (remember from Math 237: the contours are kind of coming out of the page to create a 3D shape).
You can see that as $(\theta_0, \theta_1)$ gets closer to the minimum of $J$, the hypothesis $h_\theta(x)$, viewed back in 2D, becomes a better predictor. In other words, its line of best fit is closer to the training data. Yay!
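One way to see the bowl shape behind the contour plot is to evaluate $J$ over a grid of $(\theta_0, \theta_1)$ pairs and pick the smallest. This is a brute-force sketch reusing `cost`, `xs`, and `ys` from the earlier blocks, not real optimization (gradient descent, covered next, does this far more efficiently):

```python
# Brute-force scan: evaluate J on a coarse grid of (theta0, theta1)
# pairs from -2.0 to 2.0 in steps of 0.1 and report the best pair.
best = min(
    ((t0 / 10, t1 / 10) for t0 in range(-20, 21) for t1 in range(-20, 21)),
    key=lambda pair: cost(xs, ys, pair[0], pair[1]),
)
print(best)  # -> (0.0, 1.0) for the toy data above: the bottom of the bowl
```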