Editor's note: Scott MacStravic, Ph.D., is vice president, marketing/strategy with Provenant Health Partners, Denver.
Almost all the customer surveys that are conducted in health care use scales. Such scales typically (and arbitrarily) assign numerical values to verbal options. Common examples include:
- Agree/disagree scale:
1 = strongly disagree
2 = disagree
3 = neutral or unsure
4 = agree
5 = strongly agree
- Satisfaction scale:
1 = very dissatisfied
2 = mostly dissatisfied
3 = somewhat dissatisfied
4 = neither satisfied nor dissatisfied
5 = somewhat satisfied
6 = mostly satisfied
7 = very satisfied
- Quality rating scale:
1 = terrible
2 = poor
3 = okay
4 = good
5 = excellent
In summarizing responses on such scales, common practice is to calculate a mean score from the frequency distribution of responses and then compare the result to a "perfect" score. For example, the following frequency distribution of 100 responses on a one-to-five scale would be scored:
Score | x | Frequency | = | Points |
1 | x | 3 | = | 3 points |
2 | x | 7 | = | 14 points |
3 | x | 24 | = | 72 points |
4 | x | 30 | = | 120 points |
5 | x | 36 | = | 180 points |
Total | | 100 | | 389 points |
It has a cumulative value of 389 points and a mean of 3.89, given 100 responses. A perfect score would yield (5 x 100 =) 500 points, so the standardized score for this distribution would be 389 divided by 500 = 77.8 percent. By standardizing, we are able to compare results to scores based on other scales, involving a variety of scale points and verbal options.
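The arithmetic can be checked with a few lines of Python. This is a minimal sketch; the function name `standardized_score` is a hypothetical helper, and it assumes, as the text does, that the "perfect" score is the one obtained if every respondent chose the top-valued option.

```python
def standardized_score(values, frequencies):
    """Return (total points, mean, standardized score in percent).

    values[i] is the number assigned to the i-th verbal option and
    frequencies[i] is how many respondents chose it. The "perfect" score
    assumes every respondent chose the highest-valued option.
    """
    n = sum(frequencies)
    total = sum(v * f for v, f in zip(values, frequencies))
    perfect = max(values) * n
    return total, total / n, 100 * total / perfect


# One-to-five scale applied to the 100-response distribution in the table above.
total, mean, pct = standardized_score([1, 2, 3, 4, 5], [3, 7, 24, 30, 36])
print(total, round(mean, 2), round(pct, 1))  # 389 3.89 77.8
```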
A long-recognized problem with verbal scales, even when there is clearly an underlying ordinality to the words, is the arbitrary nature of numerical values assigned to those words. Who is to say that "strongly agree" is the same psychological distance from "agree" as "agree" is from "neutral" or "unsure"? Is "very dissatisfied" as far from "mostly dissatisfied" as the latter is from "neither satisfied nor dissatisfied"?
While the numbers assigned to verbal choices in such scales respect the underlying ordinality of those choices, they may not reflect the "true" distance between them. Yet some kind of numerical value must be assigned to each verbal choice to permit summarizing of responses, comparing one set of frequencies to another, calculating simple and complex correlations or deciding how close a set of responses comes to "perfection."
Another problem with scaling is that different numerical values produce different summary values, and therefore different perceptions of how "good" a given frequency distribution is. In the five-point example above, a plus-and-minus scoring system might be used instead of one-to-five, with values as follows:
Score | x | Frequency | = | Points |
-2 | x | 3 | = | -6 points |
-1 | x | 7 | = | -7 points |
0 | x | 24 | = | 0 points |
+1 | x | 30 | = | 30 points |
+2 | x | 36 | = | 72 points |
Total | | 100 | | 89 points |
The same distribution of answers as before would produce a calculated total of 89 points. With a potential "perfect" score of (100 x +2 =) 200, given 100 responses, the standardized score on such a scale would be only 44.5 percent, vastly different from the 77.8 percent derived from the one-to-five scaling.
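A short derivation shows why merely relabeling the scale points moves the standardized score so much. The notation is introduced here for illustration only: v_i is the number assigned to the i-th option, f_i its frequency, N the number of responses and v_max the top scale value. Shifting every scale point by a constant b adds bN to the total but also bN to the perfect score, so the ratio changes; multiplying every point by a positive constant a leaves the ratio unchanged.

```latex
\begin{align*}
S &= \frac{\sum_i v_i f_i}{v_{\max} N}, \qquad N = \sum_i f_i \\
S_{\text{shift}}(b) &= \frac{\sum_i (v_i + b) f_i}{(v_{\max} + b) N}
  = \frac{\sum_i v_i f_i + bN}{(v_{\max} + b) N} \\
S_{\text{shift}}(-3) &= \frac{389 - 3 \cdot 100}{(5 - 3) \cdot 100} = \frac{89}{200} = 44.5\% \\
S_{\text{scale}}(a) &= \frac{a \sum_i v_i f_i}{a \, v_{\max} N} = S \qquad (a > 0)
\end{align*}
```

Whenever S is below 100 percent and v_max + b stays positive, the shifted score rises as b rises, so subtracting a constant pushes the score down and adding one pushes it up.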
Even a slight modification in the assignment of numbers can have an impact on the normalized score, and therefore on the impression that such a score gives as to whether the results are good, bad or indifferent. If instead of scaling one to five, we used a zero-to-four scale, for example, the same distribution would yield:
Score | x | Frequency | = | Points |
0 | x | 3 | = | 0 points |
1 | x | 7 | = | 7 points |
2 | x | 24 | = | 48 points |
3 | x | 30 | = | 90 points |
4 | x | 36 | = | 144 points |
Total | | 100 | | 289 points |
With a "perfect" score of (100 x 4 =) 400, these results would produce a normalized score of 289 divided by 400 = 72.3 percent, different enough from the 77.8 percent derived from the one-to-five scale to suggest a result closer to a "C" grade than a "B."
If we were contriving to ensure good results on a normalized score, we could easily "game the system" by assigning a different set of numbers to the results:
Score | x | Frequency | = | Points |
96 | x | 3 | = | 288 points |
97 | x | 7 | = | 679 points |
98 | x | 24 | = | 2,352 points |
99 | x | 30 | = | 2,970 points |
100 | x | 36 | = | 3,600 points |
Total | | 100 | | 9,889 points |
With a total of 9,889 points against a "perfect" score of 10,000, the normalized score from this distribution would be 98.9 percent, apparently an excellent result.
Clearly, when different, arbitrary (if commonplace) ways of assigning numbers to a set of verbal answers can yield results as disparate as 44.5 percent and 77.8 percent (even 98.9 percent, if we're clever) from the same distribution of responses, such numerical scaling warrants some attention. Is there any way to come up with the "right" numbers for a given set of verbal answers?
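The spread is easy to reproduce. This minimal Python sketch applies each numbering used above to the same 100 responses and reports the total as a percentage of an all-top-answer "perfect" score; the names `FREQUENCIES`, `SCALES` and `normalized_score` are illustrative, not part of any standard method.

```python
# Same 100 responses, ordered from the lowest to the highest verbal option.
FREQUENCIES = [3, 7, 24, 30, 36]

# The four numberings discussed in the text.
SCALES = {
    "one-to-five": [1, 2, 3, 4, 5],
    "plus-and-minus": [-2, -1, 0, 1, 2],
    "zero-to-four": [0, 1, 2, 3, 4],
    "96-to-100": [96, 97, 98, 99, 100],
}


def normalized_score(values, frequencies):
    """Total points as a percentage of the score if everyone chose the top option."""
    total = sum(v * f for v, f in zip(values, frequencies))
    perfect = max(values) * sum(frequencies)
    return 100 * total / perfect


results = {name: normalized_score(vals, FREQUENCIES) for name, vals in SCALES.items()}
for name, pct in results.items():
    print(f"{name:>15}: {pct:.2f} percent")
```

It prints 77.80, 44.50, 72.25 and 98.89 percent; the text rounds the last two to 72.3 and 98.9 percent.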
An alternative to the arbitrary assignment of numerical values is to base the value of each answer on its calculated link to some outcome that matters. For example, in our community surveys, we have found that preference for our hospitals (intent to go there should the need for hospital care arise) is strongly influenced by self-reported familiarity with the institution.
In cross-tabulating familiarity levels and preference, we found the following distribution (results slightly modified to simplify calculations):
Familiarity scale | % Preferring hospitals |
completely familiar | 60 |
mostly familiar | 25 |
somewhat familiar | 10 |
mostly unfamiliar | 5 |
completely unfamiliar | 0 |
Since the preference percentages are all evenly divisible by five, dividing each by five translates this pattern into a scale where:
completely familiar = 12
mostly familiar = 5
somewhat familiar = 2
mostly unfamiliar = 1
completely unfamiliar = 0
In this case, the scale scores reflect the relative value of each nominal response in terms of its link to preference. Applied to the same distribution of 100 responses analyzed earlier, such a scale gives:
Response | Score | x | Frequency | = | Points |
completely familiar | 12 | x | 36 | = | 432 |
mostly familiar | 5 | x | 30 | = | 150 |
somewhat familiar | 2 | x | 24 | = | 48 |
mostly unfamiliar | 1 | x | 7 | = | 7 |
completely unfamiliar | 0 | x | 3 | = | 0 |
Total | | | 100 | | 637 |
With a "perfect" score of (12 x 100 =) 1,200 points, the normalized score would be 637 divided by 1,200 = 53.1 percent.
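The translation from preference percentages to scale points, and the normalized score that results, can be sketched in Python. This is a minimal sketch assuming the cross-tabulated percentages and frequencies given above; the dictionary layout and the use of the greatest common divisor to find the divisor (five, here) are illustrative choices, not part of the original method.

```python
from functools import reduce
from math import gcd

# % preferring the hospital at each familiarity level (from the cross-tabulation).
PREFERENCE_PCT = {
    "completely familiar": 60,
    "mostly familiar": 25,
    "somewhat familiar": 10,
    "mostly unfamiliar": 5,
    "completely unfamiliar": 0,
}
# Number of the 100 respondents choosing each level.
FREQUENCIES = {
    "completely familiar": 36,
    "mostly familiar": 30,
    "somewhat familiar": 24,
    "mostly unfamiliar": 7,
    "completely unfamiliar": 3,
}

# Divide by the largest common divisor to get small whole-number scale points.
divisor = reduce(gcd, PREFERENCE_PCT.values())
scale = {level: pct // divisor for level, pct in PREFERENCE_PCT.items()}

total = sum(scale[level] * FREQUENCIES[level] for level in scale)
perfect = max(scale.values()) * sum(FREQUENCIES.values())
normalized = 100 * total / perfect

print(scale)
print(total, perfect, round(normalized, 1))  # 637 1200 53.1
```

Using the raw percentages as scale points (a divisor of 1) gives the same 53.1 percent, because multiplying every scale point by a constant scales the total and the perfect score alike; dividing by five only keeps the numbers small.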
Similarly, in our survey of patient satisfaction we have found that the relationship between satisfaction scale choices and intention to choose the hospital again is roughly as follows:
Response | % Preferring same hospital |
more than dissatisfied | 0 |
mostly dissatisfied | 0 |
somewhat dissatisfied | 10 |
neutral | 25 |
somewhat satisfied | 40 |
mostly satisfied | 50 |
more than satisfied | 80 |
Dividing each percentage by five, these results translate into:
Response | Score |
more than dissatisfied | 0 |
mostly dissatisfied | 0 |
somewhat dissatisfied | 2 |
neutral | 5 |
somewhat satisfied | 8 |
mostly satisfied | 10 |
more than satisfied | 16 |
A distribution such as the following produces a total score of 932 points:
Response | Score | x | Frequency | = | Points |
more than dissatisfied | 0 | x | 2 | = | 0 |
mostly dissatisfied | 0 | x | 5 | = | 0 |
somewhat dissatisfied | 2 | x | 11 | = | 22 |
neutral | 5 | x | 20 | = | 100 |
somewhat satisfied | 8 | x | 10 | = | 80 |
mostly satisfied | 10 | x | 17 | = | 170 |
more than satisfied | 16 | x | 35 | = | 560 |
Total | | | 100 | | 932 |
With 932 points out of a potential maximum of (16 x 100 =) 1,600, the normalized score would be 932 divided by 1,600 = 58.3 percent. While it might seem odd to have two nominal options assigned the same numerical score, it makes sense in light of their links with preference. If patients who report themselves to be "mostly dissatisfied" are no more likely to choose the hospital in the future than are patients who report themselves to be "more than dissatisfied," then the hospital should take little pleasure in noting an "upward" shift between these two scale points.
Where scales assign substantially higher values to the top verbal choices than to the lower ones, as in both examples above, value-based scoring will generally dampen normalized scores compared with common, arbitrary scale numbers. A simple one-to-five scale for the familiarity distribution would have yielded a normalized score of 77.8 percent, where the value-derived point scale shows only 53.1 percent. An arbitrary one-to-seven scale for the patient satisfaction distribution would have produced a normalized score of 522 divided by 700 = 74.6 percent, instead of the 58.3 percent obtained with the value-based scale scores.
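The satisfaction comparison can be sketched the same way. This minimal sketch assumes the preference percentages and frequencies from the satisfaction tables above, listed from "more than dissatisfied" to "more than satisfied"; the helper name `normalized_score` is hypothetical. Because the top value-based point is 16, the "perfect" score for 100 responses is 1,600.

```python
# Satisfaction example, ordered from "more than dissatisfied" to "more than satisfied".
PREFERENCE_PCT = [0, 0, 10, 25, 40, 50, 80]  # % intending to choose the hospital again
FREQUENCIES = [2, 5, 11, 20, 10, 17, 35]


def normalized_score(values, frequencies):
    """Total points as a percentage of the score if everyone chose the top option."""
    total = sum(v * f for v, f in zip(values, frequencies))
    return 100 * total / (max(values) * sum(frequencies))


# All percentages here are multiples of five, so integer division is exact.
value_based = [pct // 5 for pct in PREFERENCE_PCT]  # [0, 0, 2, 5, 8, 10, 16]
arbitrary = list(range(1, 8))  # conventional one-to-seven numbering

vb = normalized_score(value_based, FREQUENCIES)
arb = normalized_score(arbitrary, FREQUENCIES)
print(f"value-based: {vb:.2f} percent; one-to-seven: {arb:.2f} percent")
```

It prints 58.25 percent for the value-based scale and 74.57 percent for the arbitrary one-to-seven scale, showing the dampening effect when the top answers carry most of the value.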
By contrast, if top levels of nominal choices are linked to only marginally different practical values, the value-based scales would produce normalized scores that are higher than those resulting from arbitrary numerical scales. Should a value-based connection between hospital food ratings and preference be something like:
Rating of food | % Preferring hospital |
terrible | 40 |
poor | 50 |
fair | 60 |
good | 65 |
very good | 70 |
outstanding | 75 |
It would translate into the following scale (with hypothetical frequencies for illustration):
Rating | Score | x | Frequency | = | Points |
terrible | 8 | x | 10 | = | 80 |
poor | 10 | x | 15 | = | 150 |
fair | 12 | x | 20 | = | 240 |
good | 13 | x | 15 | = | 195 |
very good | 14 | x | 20 | = | 280 |
outstanding | 15 | x | 20 | = | 300 |
Total | | | 100 | | 1,245 |
A frequency distribution such as the one above translates into a normalized score of 1,245 divided by 1,500 = 83.0 percent, whereas the same distribution would produce a score of only 380 divided by 600 = 63.3 percent on an arbitrary one-to-six scale.
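The food example runs in the opposite direction, which a short sketch makes visible. It assumes the preference percentages and the hypothetical frequencies from the food tables above, ordered from "terrible" to "outstanding"; the helper name `normalized_score` is illustrative.

```python
# Food ratings, ordered from "terrible" to "outstanding".
PREFERENCE_PCT = [40, 50, 60, 65, 70, 75]  # % preferring the hospital
FREQUENCIES = [10, 15, 20, 15, 20, 20]  # hypothetical frequencies from the text


def normalized_score(values, frequencies):
    """Total points as a percentage of the score if everyone chose the top option."""
    total = sum(v * f for v, f in zip(values, frequencies))
    return 100 * total / (max(values) * sum(frequencies))


value_based = [pct // 5 for pct in PREFERENCE_PCT]  # [8, 10, 12, 13, 14, 15]
arbitrary = list(range(1, 7))  # conventional one-to-six numbering

vb = normalized_score(value_based, FREQUENCIES)
arb = normalized_score(arbitrary, FREQUENCIES)
print(f"value-based: {vb:.1f} percent; one-to-six: {arb:.1f} percent")
```

It prints 83.0 and 63.3 percent: when even the lowest rating carries substantial preference, the value-based scale lifts the normalized score rather than dampening it.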
A side benefit of a value-based scaling system is that it will tend to show where the hospital is likely to see the greatest impact from improving survey results. The fact that food scores are linked to such modest differences in preference suggests that less can be accomplished by improving such scores than by improving scores on overall satisfaction or familiarity. The normalized scores reflecting value-based scales are likely to offer a significantly more accurate and meaningful message than scores based on arbitrary numerical values, such as are most commonly used.
Anyone considering a value-based scaling approach will first have to decide what value to link verbal responses to. Preference has a strong market value, though within a survey, links to overall satisfaction or perceived quality may work as well. Once the value is selected, the cross-tabulated relationship between answers to a question of interest and that value will provide the basis for determining value-based scale numbers, as in the examples above.
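The cross-tabulation step can also be sketched from respondent-level data. The function below is hypothetical and assumes each record is an (answer, prefers) pair, where prefers is True if the respondent intends to use the hospital; for each answer it returns the percentage of respondents giving that answer who prefer the hospital, which can then serve as scale points directly or be divided down into small whole numbers. The toy records are invented purely to show the mechanics and are not survey results.

```python
from collections import Counter


def value_based_scale(records):
    """Map each answer to the % of respondents giving it who also prefer the hospital.

    records: iterable of (answer, prefers) pairs, prefers being True or False.
    """
    records = list(records)
    asked = Counter(answer for answer, _ in records)
    preferring = Counter(answer for answer, prefers in records if prefers)
    # Counter returns 0 for answers that no preferring respondent gave.
    return {answer: 100 * preferring[answer] / asked[answer] for answer in asked}


# Hypothetical toy data, for illustration only.
records = (
    [("good", True)] * 3 + [("good", False)] * 1
    + [("poor", True)] * 1 + [("poor", False)] * 4
)
print(value_based_scale(records))  # {'good': 75.0, 'poor': 20.0}
```

Any other outcome, such as overall satisfaction or perceived quality, could stand in for preference by changing what the second element of each record records.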
This approach does not pretend to answer the question of how to assign numerical values to verbal responses once and for all. It does, however, offer an improvement over conventionally arbitrary assignment methods.