# Model Cost Function

## Formal Supervised Learning Model Representation

**Goal**: To make the hypothesis/prediction, $$h$$, a good predictor: when $$x$$ is the input and $$y$$ is the output, we want $$h(x)=y$$ (the prediction should equal the answer/output)

* When $$y$$ is continuous → Regression problem
* When $$y$$ is discrete → Classification problem

![Supervised Learning Process](https://868646840-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LztfBhQUrZzyA7O_ZkJ%2Fuploads%2Fgit-blob-0528b4f9f65bb8b0475c1de33f5f9dcd4450bab2%2Fflow.png?alt=media)

## Cost Function for Linear Regression

**Hypothesis**: $$h\_\theta(x) = \theta\_0+\theta\_1x$$

* $$\theta\_i$$ are called parameters

Q: How do we choose these parameters to make $$h$$ close/equal to $$y$$?

A: Minimization problem! We want to minimize the difference $$h(x\_i)-y\_i$$ for each training example. If this difference is 0, the two values are equal, thus a perfect prediction. So we have to find the $$\theta\_i$$ values that keep these differences small across all examples. We can expand this idea into a quadratic function called the mean squared error function, a type of cost function:

#### Mean Squared Error Function: $$J(\theta\_0, \theta\_1) = \frac{1}{2m}\sum\_{i=1}^{m}(h\_\theta(x\_i)-y\_i)^2$$

* $$J(\theta\_0, \theta\_1)$$ is the cost function. We want to minimize this function's output. Remember these parameters are represented in the hypothesis equation above.
* Common cost function for **linear regression** problems

**Note**: the $$\frac{1}{2}$$ just makes the math for gradient descent easier for the moment (it cancels when differentiating the square) and $$m$$ represents the # of training samples.
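As a quick sketch, the formula above can be computed directly. The tiny dataset here is hypothetical, chosen so that the line $$y = x$$ fits it perfectly:

```python
# Mean squared error cost J(theta0, theta1) for the linear hypothesis
# h(x) = theta0 + theta1 * x, on a small hypothetical dataset.

def cost(theta0, theta1, xs, ys):
    m = len(xs)
    predictions = [theta0 + theta1 * x for x in xs]
    squared_errors = [(h - y) ** 2 for h, y in zip(predictions, ys)]
    return sum(squared_errors) / (2 * m)

xs = [1, 2, 3]
ys = [1, 2, 3]  # y = x, so theta0 = 0, theta1 = 1 is a perfect fit

print(cost(0, 1, xs, ys))    # perfect fit -> cost is 0.0
print(cost(0, 0.5, xs, ys))  # worse fit -> cost is positive
```

Note how a perfect prediction drives every squared error, and hence the whole cost, to zero.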

**Example**:

![Minimizing Cost Function Example](https://868646840-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LztfBhQUrZzyA7O_ZkJ%2Fuploads%2Fgit-blob-43c72bfd7433f99b9cb647bfe2d235027a0d645c%2Fcost_function.png?alt=media)

* On the left, we have the hypothesis graphed. We can see that when $$\theta\_1=1$$, the line fits the training data perfectly.
* On the right, we have that the cost function, $$J(\theta\_1)=0$$ when $$\theta\_1=1$$
  * **Note**: We have set $$\theta\_0=0$$ for simplicity's sake. If we were to graph $$J(\theta\_0, \theta\_1)$$, we would have a multivariable function and thus a 3D graph (yay for Math 237)
  * The cost function is formed from graphing the mean squared error function from above which depends on the hypothesis (equation above).
  * **Conclusion**: we can see that the value of $$\theta\_1$$ that minimizes the cost function is the same as the slope of the line that is closest to the training data. Our goal of fitting closely to the training data has been achieved!
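The one-parameter picture above can be sketched numerically: sweep candidate $$\theta\_1$$ values (with $$\theta\_0$$ fixed at 0) and pick the one minimizing $$J$$. The dataset is hypothetical, again with $$y = x$$:

```python
# Sweep theta1 (theta0 fixed at 0) and select the minimizer of J,
# mirroring reading the minimum off the J(theta1) plot.

def cost(theta1, xs, ys):
    m = len(xs)
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

xs = [1, 2, 3]
ys = [1, 2, 3]  # slope of the true line is 1

candidates = [i / 10 for i in range(0, 21)]  # theta1 from 0.0 to 2.0
best = min(candidates, key=lambda t: cost(t, xs, ys))
print(best)  # -> 1.0, the slope that fits the data perfectly
```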

## Multivariable Cost Function for Linear Regression

![](https://868646840-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LztfBhQUrZzyA7O_ZkJ%2Fuploads%2Fgit-blob-636e7dee26412e5a65d39de79c343af14e28cd3f%2Fmulti_cost_func_bad.png?alt=media)

![](https://868646840-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LztfBhQUrZzyA7O_ZkJ%2Fuploads%2Fgit-blob-e8b6ac6f191de4b5124821a6362f46e0eb102f4f%2Fmulti_cost_func_flat.png?alt=media)

![](https://868646840-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LztfBhQUrZzyA7O_ZkJ%2Fuploads%2Fgit-blob-1b63b0a8776f1f2a322f018a53cddaf17c46b51b%2Fmulti_cost_func_good.png?alt=media)

The cost function this time is represented using two variables and plotted using contour plots (remember from Math 237: the contours are kind of coming out of the page to create a 3D shape).

You can see that, just as in the 2D case, $$h$$ becomes a better predictor as you get closer to the minimum. In other words, its line of best fit hugs the training data more closely. Yay!
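Reading the minimum off the contour plot has a simple numerical analogue: a brute-force grid search over both parameters. The dataset here is hypothetical, chosen so the minimum sits at $$(\theta\_0, \theta\_1) = (1, 1)$$:

```python
# Brute-force grid search over (theta0, theta1) -- the 2-parameter analogue
# of finding the center of the innermost contour.

def cost(theta0, theta1, xs, ys):
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

xs = [0, 1, 2, 3]
ys = [1, 2, 3, 4]  # y = 1 + x, so the minimum is at (1, 1)

grid = [i / 10 for i in range(-20, 21)]  # values from -2.0 to 2.0
best = min(((t0, t1) for t0 in grid for t1 in grid),
           key=lambda p: cost(p[0], p[1], xs, ys))
print(best)  # -> (1.0, 1.0)
```

Grid search scales badly as parameters are added, which is why gradient descent (next up) is the method actually used.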
