96 L1.04: Section 3
Section 3: Using the sum of squared deviations to compute the standard deviation for a model
While the sum-of-squared-deviations indicator works very well as the target for Solver to minimize, it does not directly provide information about the typical deviation size because several terms are added together, and each term in the sum is squared. It would be useful to have a standardized goodness-of-fit indicator that did not depend on the number of deviations and was expressed in the same units as the data, model, and deviations. This would also be valid for comparing the quality of models with different numbers of parameters that are fit to the same dataset.
We can compute such a standardized indicator by computing an average squared deviation, then taking the square root of that value. The only difference from a regular average is that the divisor is made smaller by the number of parameters (i.e., the divisor is n−2 instead of n for a linear or exponential model, where n is the number of squared deviations in the sum).
The number computed in this way is the standard deviation of the data from the model. This is the same kind of indicator as was computed to measure the noise in repeated measurements — in that case the STDEV function adjusted the divisor to be n−1 because the deviation is from a single-parameter model, the average. For more complex models we have to make an explicit adjustment rather than using the STDEV function. Since we are already computing the sum of squared deviations, this is not difficult.
For the 8-value sediment data, for example, this computation could be accomplished by the formula “=SQRT(H8/6)”, since H8 already has been set to the sum of the 8 squared deviations from E3 to E10. For the 10-value population data, the formula to use would be “=SQRT(H8/8)”. For the earlier 7-value quadratic basketball-path data, the formula would be “=SQRT(H8/4)”, because a quadratic model has three parameters.
The adjustment to the divisor for the average is needed because if there are only two data points for a linear model, for example, then the model can go through them exactly even when there would be scatter if more points from the same measurement process were added. It is only when there are more points than the number of parameters that you know anything about what scatter can be expected around the model predictions.
Example 5: Modify the worksheets from Examples 3 and 4 so they also compute standard deviation.
Solution approaches (any of these will give correct answers):
[A] Dataset-specific-formulas — Place the formulas “=SQRT(H8/8)” or “=SQRT(H8/6)”, respectively for Examples 3 and 4, into an empty cell in the worksheet, such as H9. The disadvantage of this approach is that you are likely to forget to change the formula (which is not visible) when the worksheet is re-used with a different number of data points.
[B] Visible count totals — Set H10 to the number of squared deviations summed in forming H8 (this will be the number of data points unless you have excluded some as outliers). Then set H11 to the number of model parameters (e.g., 2 for linear & exponential, 3 for quadratic). Finally, set H12 to the formula “=SQRT(H8/(H10-H11))”. If you re-use this spreadsheet with a different dataset, all you have to change is the deviation count in H10 to the appropriate setting. The formula in H12 will not need to be changed.
[C] Automatic count totals — Put the formula “=COUNT(E3:E99)” into cell H10. This will evaluate to the number of cells in column E that have numeric values, indicating the presence of a deviation. Then follow approach B (set H11 to the number of parameters, and set H12 to the formula “=SQRT(H8/(H11-H12))”. (Warning: approach C will cause errors if there are any extraneous numbers in column E.)
Whichever approach is used, the respective standard deviation values, rounded to the usual two significant digits, are σ = 0.46 million people for Example 3 and σ = 4.49 mm for Example 4.
Here is what the census spreadsheet would look like after using approach B or C:
A | B | C | D | E | F | G | H | I | |
1 | x | data y | model y | Data-Model | Squared | Exponential model: y = a * (1+r)^x | |||
2 | Year-1780 | Population | Prediction | deviation | deviation | y = 3.108069 * (1.028937^x) | |||
3 | 0 | 2.8 | 3.108101 | -0.308101 | 0.094926 | 3.108069 | a: Initial value at x=0 | ||
4 | 10 | 3.9 | 4.134104 | -0.234104 | 0.054804 | 0.028937 | r: Growth rate | ||
5 | 20 | 5.3 | 5.498797 | -0.198797 | 0.039520 | ||||
6 | 30 | 7.2 | 7.313983 | -0.113983 | 0.012992 | ||||
7 | 40 | 9.6 | 9.728374 | -0.128374 | 0.016480 | Goodness of fit for these settings | |||
8 | 50 | 12.9 | 12.93977 | -0.039770 | 0.001582 | Sum of sq. dev. | 1.721981 | ||
9 | 60 | 17.1 | 17.21127 | -0.111268 | 0.012381 | ||||
10 | 70 | 23.2 | 22.89281 | 0.307187 | 0.094364 | # of dev. summed | 10 | ||
11 | 80 | 31.4 | 30.44987 | 0.950128 | 0.902744 | # of parameters | 2 | ||
12 | 90 | 39.8 | 40.50156 | -0.701561 | 0.492188 | standard deviation | 0.463948 |