Data Use: Scale scoring in health care customer surveys

Abstract

Almost all health care customer surveys use scales. This article discusses problems with using scales and the benefits of using an alternative system: value-based scaling.

Editor's note: Scott MacStravic, Ph.D., is vice president, marketing/strategy with Provenant Health Partners, Denver.

Almost all the customer surveys that are conducted in health care use scales. Such scales typically (and arbitrarily) assign numerical values to verbal options. Common examples include:

Agree/disagree scale:
1 = strongly disagree
2 = disagree
3 = neutral or unsure
4 = agree
5 = strongly agree
Satisfaction scale:
1 = very dissatisfied
2 = mostly dissatisfied
3 = somewhat dissatisfied
4 = neither satisfied nor dissatisfied
5 = somewhat satisfied
6 = mostly satisfied
7 = very satisfied
Quality rating scale:
1 = terrible
2 = poor
3 = okay
4 = good
5 = excellent

In summing up such scales, common practice is to calculate a mean score from a frequency distribution of responses, then compare it to a "perfect" score. For example, the following frequency distribution would be scored:

Score	x	Frequency	=	Points
1	x	3	=	3 points
2	x	7	=	14 points
3	x	24	=	72 points
4	x	30	=	120 points
5	x	36	=	180 points
	Total			389 points

It has a cumulative value of 389 points and a mean of 3.89, given 100 responses. A perfect score would yield (5 x 100 =) 500 points, so the standardized score for this distribution would be 389 divided by 500 = 77.8 percent. By standardizing, we are able to compare results to scores based on other scales, involving a variety of scale points and verbal options.
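The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the calculation described in the text; the function name normalized_score and its return layout are assumptions made for this example, not part of any survey package.

```python
def normalized_score(values, frequencies):
    """Return (total points, mean score, percent of the "perfect" total).

    values      -- numbers assigned to the verbal options, lowest to highest
    frequencies -- how many respondents chose each option, in the same order
    """
    if len(values) != len(frequencies):
        raise ValueError("values and frequencies must have the same length")
    responses = sum(frequencies)
    total = sum(v * f for v, f in zip(values, frequencies))
    perfect = max(values) * responses  # every respondent picks the top option
    return total, total / responses, 100 * total / perfect


total, mean, percent = normalized_score([1, 2, 3, 4, 5], [3, 7, 24, 30, 36])
print(total, round(mean, 2), round(percent, 1))  # 389 3.89 77.8
```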

A long-recognized problem with verbal scales, even when there is clearly an underlying ordinality to the words, is the arbitrary nature of the numerical values assigned to those words. Who is to say that "strongly agree" is the same psychological distance from "agree" as "agree" is from "neutral or unsure"? Is "very dissatisfied" as far from "mostly dissatisfied" as the latter is from "somewhat dissatisfied"?
While the numbers assigned to verbal choices in such scales respect the underlying ordinality of those choices, they may not reflect the "true" distance between them. Yet some kind of numerical value must be assigned to each verbal choice to permit summarizing of responses, comparing one set of frequencies to another, calculating simple and complex correlations or deciding how close a set of responses comes to "perfection."

Another problem with scaling is that different numerical values produce different summary values, therefore different perceptions as to how "good" a given frequency distribution may be. In the five-point example above, a plus-and-minus scoring system might be used instead of one-to-five, with values as follows:

Score	x	Frequency	=	Points
-2	x	3	=	-6 points
-1	x	7	=	-7 points
0	x	24	=	0 points
+1	x	30	=	30 points
+2	x	36	=	72 points
		100		89 points

The same distribution of answers as before would produce a calculated total of 89 points. With a potential "perfect" score of (100 x +2 =) 200, given 100 responses, the standardized score on such a scale would be only 44.5 percent, vastly different from the 77.8 percent derived from the one-to-five scaling.
Even a slight modification in the assignment of numbers can have an impact on the normalized score, and therefore on the impression that such a score gives as to whether the results are good, bad or indifferent. If instead of scaling one to five, we used a zero-to-four scale, for example, the same distribution would yield:

Score	x	Frequency	=	Points
0	x	3	=	0 points
1	x	7	=	7 points
2	x	24	=	48 points
3	x	30	=	90 points
4	x	36	=	144 points
		100		289 points

With a "perfect" score of (100 x 4 =) 400, these results would produce a normalized score of 289 divided by 400 = 72.3 percent, different enough from the 77.8 percent derived from the one-to-five scale to suggest a result closer to a "C" grade rather than a "B."

If we were contriving to ensure good results on a normalized score, we could easily "game the system" by assigning a different set of numbers to the results:

Score	x	Frequency	=	Points
96	x	3	=	288 points
97	x	7	=	679 points
98	x	24	=	2,352 points
99	x	30	=	2,970 points
100	x	36	=	3,600 points
				9,889 points

With a total of 9,889 points against a "perfect" score of 10,000, the normalized score from this distribution would be 98.9 percent, apparently an excellent result.
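To see how much the choice of numbers alone moves the result, the four scoring schemes discussed so far can be run against the same frequency distribution. The sketch below is an illustration only; the scheme labels and the helper percent_of_perfect are names invented for this example.

```python
# Same 100 responses, ordered from the least to the most favorable option.
FREQUENCIES = [3, 7, 24, 30, 36]

# The four numbering schemes applied to that distribution in the text.
SCHEMES = {
    "one-to-five": [1, 2, 3, 4, 5],
    "plus-and-minus": [-2, -1, 0, 1, 2],
    "zero-to-four": [0, 1, 2, 3, 4],
    "96-to-100": [96, 97, 98, 99, 100],
}


def percent_of_perfect(values, frequencies):
    """Total points as a percentage of the total if everyone chose the top option."""
    total = sum(v * f for v, f in zip(values, frequencies))
    return 100 * total / (max(values) * sum(frequencies))


results = {name: percent_of_perfect(vals, FREQUENCIES) for name, vals in SCHEMES.items()}
for name, pct in results.items():
    print(f"{name:>15}: {pct:.2f} percent")
# Prints 77.80, 44.50, 72.25 and 98.89 percent, in that order.
```

Nothing about the respondents changes between the four lines of output; only the numbers attached to the words do.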

Clearly, when different, arbitrary (if commonplace) assignment of numbers to a set of verbal answers can yield results as disparate as 44.5 percent and 77.8 percent (even 98.9 percent, if we're clever) from the same distribution of responses, such numerical scaling warrants some attention. Is there any way to come up with the "right" numbers for a given set of verbal answers?

An alternative to the arbitrary assignment of numerical values is to link the pattern of answers to something of value where each answer has a calculated link to that value. For example, in our community surveys, we have found that preference for our hospitals (intent to go there should need for hospital care arise) is strongly influenced by self-reported familiarity with the institution.

In cross-tabulating familiarity levels and preference, we found the following distribution (results slightly modified to simplify calculations):

Familiarity scale	% Preferring hospitals
completely familiar	60
mostly familiar	25
somewhat familiar	10
mostly unfamiliar	5
completely unfamiliar	0

Since the preference percentages are all evenly divisible by five, we can translate this pattern into a scale where

completely familiar       = 12
mostly familiar            = 5
somewhat familiar       = 2
mostly unfamiliar         = 1
completely unfamiliar   =   0
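The translation from preference percentages to scale values can be sketched as follows. The text divides by five because all of the percentages happen to be multiples of five; generalizing that step to the greatest common divisor is an assumption of this sketch, and the function name value_based_scale is invented for illustration.

```python
from functools import reduce
from math import gcd


def value_based_scale(preference_percentages):
    """Reduce whole-number preference percentages to the smallest whole-number scale.

    Dividing by the greatest common divisor keeps the ratios between
    options intact, which is what the scale is meant to preserve.
    """
    divisor = reduce(gcd, preference_percentages.values())
    if divisor == 0:
        raise ValueError("at least one percentage must be non-zero")
    return {option: pct // divisor for option, pct in preference_percentages.items()}


familiarity_preference = {
    "completely familiar": 60,
    "mostly familiar": 25,
    "somewhat familiar": 10,
    "mostly unfamiliar": 5,
    "completely unfamiliar": 0,
}
scale = value_based_scale(familiarity_preference)
print(scale)
# {'completely familiar': 12, 'mostly familiar': 5, 'somewhat familiar': 2,
#  'mostly unfamiliar': 1, 'completely unfamiliar': 0}
```

Because normalized scores are ratios to the top value, using the raw percentages (60, 25, 10, 5, 0) as scale values would give the same normalized result; the division only makes the numbers easier to work with.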

In this case, the scale scores reflect the relative value of each nominal response in terms of its link to preference. The normalization of the previously analyzed results on such a scale would be:

Response	Score		Frequency		Points
completely familiar	= 12	x	36	=	432
mostly familiar	= 5	x	30	=	150
somewhat familiar	= 2	x	24	=	48
mostly unfamiliar	= 1	x	7	=	7
completely unfamiliar	= 0	x	3	=	0

	Total				637

With a perfect score of (100 x 12 =) 1,200, the normalized results would be 637 divided by 1,200 = 53.1 percent, or just a little more than halfway toward a "perfect" score. Such a score accurately indicates that only about half the potential value of public preference has been realized, based on the distribution of familiarity responses.

Similarly, in our survey of patient satisfaction we have found that the relationship between satisfaction scale choices and intention to choose the hospital again is roughly as follows:

Response                      % Preferring Same Hospital
more than dissatisfied                       0
mostly dissatisfied                            0
somewhat dissatisfied                       10
neutral                                           25
somewhat satisfied                           40
mostly satisfied                                50
more than satisfied                           80

These results translate into:

                                                 Scale Score
more than dissatisfied                        0
mostly dissatisfied                             0
somewhat dissatisfied                        2
neutral                                            5
somewhat satisfied                            8
mostly satisfied                                 10
more than satisfied                            16

A distribution such as:

Scale	Points	x	Frequency	=	Total points
more than dissatisfied	0	x	2	=	0
mostly dissatisfied	0	x	5	=	0
somewhat dissatisfied	2	x	11	=	22
neutral	5	x	20	=	100
somewhat satisfied	8	x	10	=	80
mostly satisfied	10	x	17	=	170
more than satisfied	16	x	35	=	560

produces a total score of 932 points.

With 932 points out of a potential (100 x 16 =) 1,600 maximum, the normalized score would be 932 divided by 1,600 = 58.3 percent. While it might seem at least odd to have two nominal options assigned the same numerical score, it makes sense in light of links with preference. If patients who report themselves to be "mostly dissatisfied" are no more likely to choose the hospital in the future than are patients who report themselves to be "more than dissatisfied," then the hospital should take little pleasure in noting an "upward" shift between these two scale points.

Where scales show a substantially higher value for the top verbal choices than for the lower choices, as was the case with both the above examples, the effect on normalized scores will generally be to dampen them as compared to common, arbitrary scale numbers. A simple one-to-five scale for the familiarity distribution would have yielded a normalized score of 77.8 percent where the value-derived point scale shows only 53.1 percent. An arbitrary one-to-seven scale for the patient satisfaction distribution would have produced a normalized score of 522 divided by 700 = 74.6 percent instead of the 58.3 percent computed with the value-based scale scores.
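The dampening effect can be checked by scoring each distribution twice, once with its value-based numbers and once with plain consecutive integers. The sketch below is illustrative only; compare and percent_of_perfect are names invented for this example, and the lists simply restate the scale values and frequencies from the two examples above.

```python
def percent_of_perfect(values, frequencies):
    """Total points as a percentage of the total if everyone chose the top option."""
    total = sum(v * f for v, f in zip(values, frequencies))
    return 100 * total / (max(values) * sum(frequencies))


def compare(value_scores, frequencies):
    """Return (value-based percent, one-to-n percent) for one distribution."""
    one_to_n = list(range(1, len(frequencies) + 1))
    return (percent_of_perfect(value_scores, frequencies),
            percent_of_perfect(one_to_n, frequencies))


# Lists run from the least favorable response to the most favorable.
familiarity = compare([0, 1, 2, 5, 12], [3, 7, 24, 30, 36])
satisfaction = compare([0, 0, 2, 5, 8, 10, 16], [2, 5, 11, 20, 10, 17, 35])

print(f"familiarity:  {familiarity[0]:.2f} value-based vs. {familiarity[1]:.2f} one-to-five")
print(f"satisfaction: {satisfaction[0]:.2f} value-based vs. {satisfaction[1]:.2f} one-to-seven")
# familiarity:  53.08 value-based vs. 77.80 one-to-five
# satisfaction: 58.25 value-based vs. 74.57 one-to-seven
```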

By contrast, if top levels of nominal choices are linked to only marginally different practical values, the value-based scales would produce normalized scores that are higher than those resulting from arbitrary numerical scales. Should a value-based connection between hospital food ratings and preference be something like:

Rating of Food           % Preferring Hospital
terrible                      40
poor                         50
fair                           60
good                         65
very good                  70
outstanding                75

It would translate into the following scale (with hypothetical frequencies for illustration):

	Score	x	Frequency	=	Points
terrible	8		10		80
poor	10		15		150
fair	12		20		240
good	13		15		195
very good	14		20		280
outstanding	15		20		300
			100		1,245

A frequency distribution such as the above translates into a normalized score of 1,245 divided by 1,500 = 83.0 percent, where the same distribution would produce, on an arbitrary one-to-six scale, a score of only 380 divided by 600 = 63.3 percent.
A side benefit of a value-based scaling system is that it will tend to show where the hospital is likely to see the greatest impact from improving survey results. The fact that food scores are linked to such modest differences in preference suggests that not as much can be accomplished by improving such scores as can through improving scores on overall satisfaction or familiarity. The normalized scores reflecting value-based scales are likely to offer a significantly more accurate and meaningful message than scores based on arbitrary numerical values, such as are most commonly used.
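One rough way to compare questions on this point is to look at the spread between the highest and lowest preference percentages linked to each one. Using the spread as a proxy for potential impact is an assumption of this sketch rather than a method set out in the article; the figures are the preference percentages from the three examples above.

```python
# Preference percentages linked to each question's options, lowest to highest.
PREFERENCE_BY_QUESTION = {
    "familiarity": [0, 5, 10, 25, 60],
    "overall satisfaction": [0, 0, 10, 25, 40, 50, 80],
    "food rating": [40, 50, 60, 65, 70, 75],
}


def preference_spread(percentages):
    """Percentage-point gap between the best and worst response options."""
    return max(percentages) - min(percentages)


spreads = {q: preference_spread(p) for q, p in PREFERENCE_BY_QUESTION.items()}
ranked = sorted(spreads, key=spreads.get, reverse=True)

for question in ranked:
    print(f"{question:>22}: {spreads[question]} points")
# overall satisfaction: 80, familiarity: 60, food rating: 35
```

On these figures, moving respondents up the food scale can change preference by at most 35 points, against 60 for familiarity and 80 for overall satisfaction.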

Anyone considering use of a value-based scaling approach will have to first decide what value to link verbal responses to. Preference has a strong market value, though within a survey, links to overall satisfaction or perceived quality may function as well. Once the value is selected, the cross-tabulated relationship between answers to a question of interest and that value will provide the basis for determining value-based scale numbers, as in the above examples.
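The cross-tabulation step might look something like the following. The respondent records here are hypothetical, made up purely to show the mechanics, and the function name preference_by_response is likewise an assumption of this sketch.

```python
from collections import Counter


def preference_by_response(records):
    """Percent of respondents preferring the hospital, per verbal response.

    records -- iterable of (response, prefers_hospital) pairs, one per respondent
    """
    totals = Counter()
    preferring = Counter()
    for response, prefers in records:
        totals[response] += 1
        if prefers:
            preferring[response] += 1
    return {r: 100 * preferring[r] / totals[r] for r in totals}


# Hypothetical survey records: 10 respondents at each of two familiarity levels.
records = (
    [("completely familiar", True)] * 6
    + [("completely familiar", False)] * 4
    + [("somewhat familiar", True)] * 1
    + [("somewhat familiar", False)] * 9
)
table = preference_by_response(records)
print(table)  # {'completely familiar': 60.0, 'somewhat familiar': 10.0}
```

The resulting percentages are the raw material for the scale values; with real data they would rarely be round numbers, so some rounding of the kind mentioned earlier would usually be needed before reducing them to a compact scale.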

This approach does not pretend to answer the question of how to assign numerical values to verbal responses once and for all. It does, however, offer an improvement over conventionally arbitrary assignment methods.