Editor’s note: George Butler provides modeling and scoring services through Iona Investment Corp., Redwood City, Calif.
If you think of regression modeling as unfathomable or if you had a hard time with high school algebra, this article is for you. For the others, it couldn’t hurt.
Indulge me for a bit and imagine that you are given a database containing the age and income of each resident in a certain neighborhood. Your boss requests that you use this data to come up with a model for that neighborhood to estimate someone’s income using their age as a predictor. An urgent call goes out for stalwart statistical help in the form of a certain Dr. Sigma over at Information Systems. Fortune smiles: the doctor is in. Doc Sigma wisely assures himself that there are no extreme values of income in the data to warp the analysis. Then he works his magic and presents you with a bona fide mathematical model: “Multiply the age in years by 971.4 and add 1536.2 and you get annual income in dollars. That’s your model and it’s optimal.”
You are duly grateful to Dr. Sigma and get to work on a report for your boss. You use the formula to graph income vertically versus age horizontally and admire the economy of this rule relating age to income. It’s a straight line, and an optimal one at that. The glow dims somewhat when you see that the model estimates the income of 18-year-olds to be $19,021. (These youngsters should still be doing homework, not racking up that kind of dough.) The luster vanishes completely when you see that the estimated income for 70-year-olds is $69,534 and that each additional year of survival means an automatic $971 boost (hardly accounted for by Social Security cost-of-living adjustments).
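If you would like to check those figures yourself, here is a minimal sketch in Python. The two coefficients are the ones Dr. Sigma quoted; the function name `estimated_income` is made up for illustration and is not part of any real package.

```python
# Dr. Sigma's straight-line model, using the coefficients quoted in the article.
SLOPE = 971.4       # dollars of annual income added per year of age
INTERCEPT = 1536.2  # dollars

def estimated_income(age):
    """Return the model's estimate of annual income in dollars for an age in years.

    Hypothetical helper name, used here only to illustrate the formula.
    """
    return SLOPE * age + INTERCEPT

# Evaluate the model at the two ages discussed above, plus one more year.
for age in (18, 70, 71):
    print(f"age {age}: ${estimated_income(age):,.0f}")

# The gap between any two consecutive ages is always the slope.
print(f"boost per extra year: ${estimated_income(71) - estimated_income(70):,.0f}")
```

Whatever pair of consecutive ages you try, the difference is the same $971, because a straight line has only one slope. That is exactly the feature that makes the estimates for 70-year-olds look so odd.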
Why is Sigma’s formula fishy? Because it’s a poor model. How could it be a poor model when it is “optimal”? It is optimal only if Sigma’s assumption about the shape of the model is correct. He a...