Editor's note: Based in Chesterton, Ind., Keith Chrzan is senior vice president of Sawtooth Analytics, Sawtooth Software Inc., Orem, Utah. He can be reached at 219-921-9215 or at keith@sawtoothsoftware.com.

Maximum difference scaling (max-diff) uses an experimentally designed set of conjoint-like choice questions to put a set of items on a common scale (Finn and Louviere 1992). For example, in a study where we want to understand how much respondents would like 20 different activities, we might ask several questions like those shown in Figure 1. Each question would contain a different subset of the 20 activities and, across a given respondent’s set of questions, each activity would appear three or four times.
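To make the idea of a common scale concrete, the sketch below scores items by simple best-minus-worst counting: the number of times an item is picked as best, minus the number of times it is picked as worst, divided by the number of times it is shown. This is only an illustration under stated assumptions. The activity names and answers are hypothetical, and counting is a rough stand-in for the statistical models usually used to analyze max-diff data, not necessarily the method used in any study described here.

```python
from collections import Counter

def best_worst_scores(responses):
    """Score each item as (times best - times worst) / times shown.

    `responses` is a list of (shown_items, best_item, worst_item) tuples,
    one tuple per max-diff question answered.
    """
    shown, best, worst = Counter(), Counter(), Counter()
    for items, best_item, worst_item in responses:
        shown.update(items)
        best[best_item] += 1
        worst[worst_item] += 1
    return {item: (best[item] - worst[item]) / shown[item] for item in shown}

# Hypothetical answers from one respondent to three questions.
responses = [
    (("hiking", "reading", "cooking", "golf"), "hiking", "golf"),
    (("hiking", "movies", "golf", "fishing"), "hiking", "fishing"),
    (("reading", "movies", "cooking", "fishing"), "reading", "fishing"),
]
scores = best_worst_scores(responses)
```

Scores run from -1 (always chosen as worst) to +1 (always chosen as best), so every item lands on the same scale regardless of which questions it appeared in.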

Marketing researchers primarily use max-diff scaling in two ways. One involves measuring the relative importance of various attributes (some recent examples include valuing the attributes of a retail store; the claims that could be made about a new drug; aspects of a casino’s loyalty program; the properties of a type of clothing; the customer experiences at a casual-dining restaurant and so on). Researchers also rely on max-diff for scaling the relative appeal of items in a set – of products (portfolio optimization), product concepts (concept screening) or of colors, flavors or styles (product-line optimization).

Less commonly, max-diff has been used as a general psychometric measurement tool to replace traditional rating scales. For example, Sa Lucas (2004) used max-diff to measure the perceived severity of crimes while Lee, Soutar and Louviere (2008) used max-diff to replace ratings in the Schwartz Value Survey (Schwartz 1992). In both cases the authors found that max-diff measured the intended constructs well and that external variables validated the max-diff measures of the constructs.

A customer’s personality may affect how she makes decisions, what kinds of information she seeks, which brands she trusts and so on. So measuring personality has been of interest to marketers. The following sections briefly review theories of personality, methods of personality measurement and an empirical test of a max-diff tool for quantifying the popular five-factor model (FFM) of personality.

Personality theories and measurements

Temperament theories and trait theories both have their proponents. Temperament theories describe people as belonging to one of several different personality types. Hippocrates’ ancient theory of four temperaments (sanguine, choleric, melancholic and phlegmatic) provides an early example and many other examples appear in Bryce’s (2002) book. The Myers-Briggs Type Indicator is another example, one in which a person’s high or low score on each of four basic psychological functions places that person in one of 16 personality types (INTJ, for example, for someone with high scores on being introverted, intuitive, thinking and judging).

Trait theories, on the other hand, identify dimensions that may be used to characterize personality without suggesting that people fall into any number of discrete personality types. Francis Galton suggested in 1884 that, because people are one of our favorite topics of conversation, language could be mined to identify personality traits. Allport and Odbert (1936) took Galton’s suggestion and categorized 4,500 personality descriptors from the dictionary into a hierarchy of personality traits. More rigorously, Cattell (1965) and Eysenck (1997) each used factor analysis of respondents’ self-ratings on long lists of personality descriptors but reached different conclusions: Cattell identified 16 latent personality factors, Eysenck just two or three. Many other researchers developed their own lists of traits but over the years a consensus across a large number of studies seems to have coalesced around the five-factor model (Costa and McCrae 1985). As Digman (1990) notes in a review, FFM appears to be the trait theory with the most empirical support.

FFM’s five factors are commonly abbreviated as OCEAN: openness to experience; conscientiousness; extraversion; agreeableness; and neuroticism.

The FFM also posits more specific personality traits nesting beneath each of these factors, providing more granularity in understanding personality differences among individual people.

An FFM measurement instrument typically uses a rating scale with verbal anchors about level of agreement or about the extent to which an item describes a person. Longer versions of an FFM questionnaire might have 40 or more items (eight or more per factor) while shorter versions might have as few as 10 items, two per factor. Some versions of the questionnaire reverse some items for each factor to counteract respondents’ inclination to use one side of the rating scale or the other.

Empirical study

To test the viability of using max-diff measurement to support the FFM we compare a rating scale version of FFM questions to a max-diff version. The ratings version features 20 items, four per factor, and has respondents indicate how accurately each item describes them, using a five-point rating scale. The max-diff version of the FFM instrument uses 15 max-diff questions per respondent, each with four items (one item each from four of the five factors). Figures 2 and 3 show screen shots of a portion of the ratings grid and of a sample max-diff question.
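The arithmetic behind this design is simple: 15 questions with four items each give 60 item slots, so each of the 20 items can appear exactly three times. The sketch below builds one such layout. It is a minimal illustration, not the design algorithm used in the study: it balances only how often each item is shown, whereas a production design would typically also balance item pairings and positions. The item labels (O1, C2 and so on) are hypothetical placeholders.

```python
import random
from collections import Counter

FACTORS = ["O", "C", "E", "A", "N"]   # OCEAN factor initials
ITEMS_PER_FACTOR = 4
N_QUESTIONS = 15

def build_design(seed=0):
    """Return 15 questions of four items, one item each from four factors."""
    rng = random.Random(seed)
    # Every question leaves out one factor; each factor is left out 3 times.
    omitted = FACTORS * (N_QUESTIONS // len(FACTORS))
    rng.shuffle(omitted)
    questions = [[] for _ in range(N_QUESTIONS)]
    for factor in FACTORS:
        # The 12 questions in which this factor contributes an item.
        slots = [q for q in range(N_QUESTIONS) if omitted[q] != factor]
        # Each of the factor's 4 items fills 3 of those 12 slots.
        labels = [f"{factor}{i + 1}" for i in range(ITEMS_PER_FACTOR)]
        labels *= len(slots) // ITEMS_PER_FACTOR
        rng.shuffle(labels)
        for q, label in zip(slots, labels):
            questions[q].append(label)
    for question in questions:
        rng.shuffle(question)  # randomize on-screen order within a question
    return questions

design = build_design(seed=1)
exposures = Counter(item for question in design for item in question)
```

In this layout every item is shown three times and no question contains two items from the same factor.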

In designing the specific items we had to balance keeping the item sets completely identical for ratings and max-diff against the fact that reversed items may help rating scale measures but may seem confusing in a max-diff context. Thus six of the 20 items were reversed in the ratings questions but not in the max-diff questions. The specific wording for the 20 items was as shown in Figure 4.
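Reversed rating items have to be flipped back before they are combined into a factor score. On a five-point scale that means replacing a rating of 1 with 5, 2 with 4 and so on. The sketch below shows one common way to do this and then averages the items for a factor. The item names and ratings are hypothetical, and averaging is an assumption on our part about how items are combined.

```python
def reverse_score(rating, scale_min=1, scale_max=5):
    """Flip a rating so that 1 becomes 5, 2 becomes 4, and so on."""
    return scale_min + scale_max - rating

def factor_score(ratings, reversed_items):
    """Average a factor's item ratings after un-reversing reversed items.

    `ratings` maps item name -> rating; `reversed_items` is the set of
    item names that were worded in the reverse direction.
    """
    adjusted = [
        reverse_score(value) if name in reversed_items else value
        for name, value in ratings.items()
    ]
    return sum(adjusted) / len(adjusted)

# Hypothetical ratings for four extraversion items, one of them reversed.
extraversion = {"e1": 4, "e2": 5, "e3_reversed": 2, "e4": 3}
score = factor_score(extraversion, reversed_items={"e3_reversed"})
```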

Each respondent completed both the 20 ratings questions and the 15 max-diff questions, with half of respondents answering the ratings first and half answering the max-diff questions first. For both tasks, items appeared in a random order across respondents. The survey was fielded in July 2014 with sample generously provided by Survey Sampling International. A total of 729 respondents, screened to be at least 18 years old, completed the survey. (Incidentally, you can take the max-diff version of the study online at www.sawtoothsoftware.com/ffm. An individualized report will allow you to see how your personality compares to the original 729 survey respondents and to other people who have since taken the survey.)

To evaluate the success of the max-diff measure relative to the rating scale measure we rely on standard psychometric assessments (reliability and construct validity) and on relations of the two measures with external variables as reported in previous research.

One way to assess construct validity is to construct a multitrait-multimethod matrix. The 20x20 table is too large to display here but the summary in Figure 5 tells the story: scores for each max-diff factor are correlated more highly with the corresponding ratings-based factor than they are with other factors measured with either max-diff or with ratings.
In other words, any given diagonal entry in Figure 5 is higher than any other entry in the same row or column. This means that the two measures of each factor correspond well. This analysis gives us no reason to doubt the validity of the max-diff measurement of the FFM.
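The diagonal check described above can be sketched in a few lines. The example below correlates each max-diff factor score with each ratings factor score and tests whether every diagonal entry is the largest in its row and column. It is a simplified illustration: it covers only the cross-method block of the matrix, not the same-method correlations, and the two-factor data set is hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def cross_method_matrix(maxdiff, ratings):
    """Rows are max-diff factors, columns are ratings factors."""
    factors = list(maxdiff)
    return {r: {c: pearson(maxdiff[r], ratings[c]) for c in factors}
            for r in factors}

def diagonal_dominates(matrix):
    """True if each diagonal entry exceeds all others in its row and column."""
    factors = list(matrix)
    for f in factors:
        diagonal = matrix[f][f]
        others = [matrix[f][c] for c in factors if c != f]
        others += [matrix[r][f] for r in factors if r != f]
        if any(diagonal <= value for value in others):
            return False
    return True

# Hypothetical factor scores for five respondents on two factors.
maxdiff = {"E": [1, 2, 3, 4, 5], "N": [5, 3, 4, 1, 2]}
ratings = {"E": [2, 2, 3, 5, 4], "N": [4, 3, 5, 2, 1]}
matrix = cross_method_matrix(maxdiff, ratings)
valid = diagonal_dominates(matrix)
```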

A scale’s reliability quantifies the extent to which a measure of the factor would be the same if we ran the study again on a different day or with different respondents. The standard measure for reliability is called Cronbach’s alpha and the rule of thumb for a reliable measure is that alpha should be at least 0.70 or 0.80. Figure 6 shows Cronbach’s alpha for the factor measurements.
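For readers who want to see the calculation, Cronbach’s alpha for a factor measured with k items is k/(k-1) times one minus the ratio of the summed item variances to the variance of respondents’ total scores. The sketch below applies that standard formula to hypothetical ratings for one four-item factor. It illustrates the statistic itself and makes no claim about how the max-diff scores in this study were prepared for the calculation.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for one factor.

    `item_scores` is a list of items; each item is a list holding one
    score per respondent, with respondents in the same order throughout.
    """
    k = len(item_scores)
    # Each respondent's total score across the k items.
    totals = [sum(values) for values in zip(*item_scores)]
    summed_item_variance = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - summed_item_variance / variance(totals))

# Hypothetical ratings: four items for one factor, five respondents.
items = [
    [4, 2, 5, 3, 1],
    [5, 2, 4, 3, 2],
    [4, 3, 5, 2, 1],
    [3, 2, 4, 3, 1],
]
alpha = cronbach_alpha(items)
```

Alpha rises toward 1 as the items move together across respondents and falls as they become less related to one another.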

We can see that measures for all of the max-diff factors pass the higher 0.80 target for reliability while measures for most of the rating scale factors fail even the less-demanding 0.70 threshold. In terms of reliability, max-diff is the hands-down winner.

Figure 7 summarizes relations of the five factors with other variables (Tkach and Lyubomirsky 2006; Sadowski and Cogburn 1997; Kardum and Hudek-Knezevic 2012; Sansone, Wiebe and Morgan 1999; Raynor and Levine 2009). Positive and negative relations found in these studies appear as “+” and “-” signs in the table.

In terms of their correlations with these variables (life satisfaction, need for cognition, pessimism, self-control, healthy behaviors) the ratings personality measures and max-diff measures performed comparably and mostly in line with expectations from previous research: Both were significantly and about equally related to life satisfaction, need for cognition, pessimism, self-control, eating fruits and vegetables and getting regular exercise. When we failed to replicate previous results (for conscientiousness being related to not smoking and for extraversion being related to smoking and not getting enough sleep) both the ratings and the max-diff measures failed to show the hypothesized relationships.

Much better reliability

Max-diff does a good job bringing the five-factor model of personality to life: It performs at parity with a rating scale version of FFM in terms of construct validity and external validation while providing much better reliability. As such, max-diff is a viable tool for researchers to use when they want to include an FFM personality measurement in a survey. More generally, this research confirms earlier findings that max-diff may be a viable measurement methodology for a wider variety of psychological constructs than its current use suggests.

Allport, G.W. and H.S. Odbert (1936). “Trait names: A psycholexical study.” Psychological Monographs, 47, 171-220.

Bryce, N. (2002). “Standing naked in the shower: Life-enriching insights that expose human nature.” Orem: Insight Learning Foundation.

Cattell, R.B. (1965). The Scientific Analysis of Personality. Baltimore: Penguin Books.

Digman, J.M. (1990). “Personality structure: Emergence of the five-factor model.” Annual Review of Psychology, 41, 417-440.

Finn, A., and Louviere, J. J. (1992). “Determining the appropriate response to evidence of public concern: The case of food safety.” Journal of Public Policy and Marketing, 11, 12-25.

Galton, F. (1884). “Measurement of character.” Fortnightly Review, 36: 179–185.

Eysenck, H.J. (1997). Dimensions of Personality. New Brunswick: Transaction.

Kardum, I. & J. Hudek-Knezevic (2012). “Relationships between five-factor personality traits and specific health-related personality dimensions.” International Journal of Clinical and Health Psychology, 12, 373-387.

Lee, J.A., G.N. Soutar & J. Louviere (2008). “An alternative approach to measuring Schwartz’s values: The best-worst scaling approach.” Journal of Personality Assessment, 90, 335-347.

Raynor, D.A. and H. Levine (2009). “Associations between the five-factor model of personality and health behaviors among college students.” Journal of American College Health, 58, 73-81.

Sa Lucas, L. (2004). “Scale development with max-diffs: A case study,” in Sawtooth Software Conference Proceedings. Sequim: Sawtooth Software.

Sadowski, C.J., & H.E. Cogburn (1997). “Need for cognition in the big-five factor structure.” Journal of Psychology, 131, 307-312.

Sansone, C., D.J. Wiebe, & C.L. Morgan (1999). “Self-regulating motivation: The moderating role of hardiness and conscientiousness.” Journal of Personality, 67, 701-733.

Schwartz, S.H. (1992). “Universals in the content and structure of values: Theory and empirical tests in 20 countries.” In M. Zanna (Ed.), Advances in Experimental Social Psychology, 25, 1-65. New York: Academic Press.

Tkach, C., & S. Lyubomirsky (2006). “How do people pursue happiness? Relating personality, happiness-increasing strategies and well-being.” Journal of Happiness Studies, 7, 183-225.