diff --git a/05-FittingModels.Rmd b/05-FittingModels.Rmd
index 85f66d4..c510ae5 100644
--- a/05-FittingModels.Rmd
+++ b/05-FittingModels.Rmd
@@ -597,10 +597,10 @@ Once we have described the central tendency of the data, we often also want to d
 
 We have already encountered the sum of squared errors above, which is the basis for the most commonly used measures of variability: the *variance* and the *standard deviation*. The variance for a population (referred to as $\sigma^2$) is simply the sum of squared errors divided by the number of observations - that is, it is exactly the same as the *mean squared error* that you encountered earlier:
 
 $$
-\sigma^2 = \frac{SSE}{N} = \frac{\sum_{i=1}^n (x_i - \mu)^2}{N}
+\sigma^2 = \frac{SSE}{N} = \frac{\sum_{i=1}^N (x_i - \mu)^2}{N}
 $$
 
-where $\mu$ is the population mean. The population standard deviation is simply the square root of this -- that is, the *root mean squared error* that we saw before. The standard deviation is useful because the errors are in the same units as the original data (undoing the squaring that we applied to the errors).
+where $\mu$ is the population mean and $N$ is the size of the entire population. The population standard deviation is simply the square root of this -- that is, the *root mean squared error* that we saw before. The standard deviation is useful because the errors are in the same units as the original data (undoing the squaring that we applied to the errors).
 
 We usually don't have access to the entire population, so we have to compute the variance using a sample, which we refer to as $\hat{\sigma}^2$, with the "hat" representing the fact that this is an estimate based on a sample. The equation for $\hat{\sigma}^2$ is similar to the one for $\sigma^2$:
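
As a quick numerical check of the population variance formula touched by this change, the sketch below computes $\sigma^2 = SSE/N$ and its square root by hand and compares the result with the standard library. It is only an illustration: the toy `population` values and all variable names are assumptions rather than anything from the chapter, and Python is used here for convenience even though the chapter itself is an R Markdown file.

```python
import math
import statistics

# Hypothetical toy population (values chosen only for illustration).
population = [2, 4, 6, 8]

N = len(population)                            # size of the entire population
mu = sum(population) / N                       # population mean
sse = sum((x - mu) ** 2 for x in population)   # sum of squared errors

pop_variance = sse / N                         # sigma^2 = SSE / N (mean squared error)
pop_sd = math.sqrt(pop_variance)               # sigma = sqrt(sigma^2) (root mean squared error)

# Cross-check against the standard library's population variance.
assert math.isclose(pop_variance, statistics.pvariance(population))

print(pop_variance, pop_sd)
```

For this toy population the mean is 5 and the squared errors sum to 20, so the variance is 20 / 4 = 5. Note that the sum runs over all $N$ values, which is the point of changing the upper index from $n$ to $N$.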