Editor's note: Vince Migliore is owner/manager of Accu-Stat, a San Jose, Calif., research firm specializing in statistical analysis. The Statistical Package for the Social Sciences (SPSS) is used for all tests and breakdowns in this article. SPSS is a registered trademark of SPSS Inc., Chicago, Ill.

Most people run away, or make a gagging gesture, when they hear the word "statistics." If you work in the field of market research, however, there are some basic principles of statistical analysis that you must be familiar with to do your job effectively. This article uses a simple example to explain the most important concepts of survey interpretation. No math or special jargon is used. If you hate statistics, but need to know the fundamentals of survey research analysis, then read on.

Once a survey is completed, writing the findings can be fairly easy if you're looking at the total population. "57 percent said 'yes' to question one." Or, for a rating question: "On a scale of 1 to 10, with 10 being the highest, Product A was given an average rating of 8.7." Occasionally, the person or company paying for the research will be satisfied with such a superficial report, but more likely they will demand deeper analysis.

Statistical analysis is required when you want to examine survey results in more detail. The two most commonly applied tests provide information about subgroup responses for a particular question. For instance, in question one above, 57 percent of the total sample said 'yes,' but the figure was 60 percent for women and 54 percent for men. Likewise, the rating scale question yielded 8.7 for the total sample, but 8.4 for women and 9.0 for men. The question that comes up immediately is whether such differences are real, or simply part of the normal variation found among respondents. The client needs to know if women really perceive the product differently. The tests we are about to describe will answer that question. More importantly, the statistical analysis will help paint a picture that gives you a comprehensive understanding of what's going on with the survey responses. This insight is exactly what you need to write a meaningful report.

The coach

The following story illustrates the abstractions you need in order to understand statistical tests. These tests determine whether the differences between subgroups of a population are significant, for both category questions and rating questions.

Sam Smith was the coach and organizer for the extracurricular activities of a high school in Kentucky. The school had just three teams - the jockey club, the glee club and the basketball team. In order to help him order the correct sizes for team shirts and jerseys, Sam got into the habit of recording the heights of the team members, by marking on a board in the gymnasium. Every year he changed the board (Figure 1, left). The black marks record the glee club heights, gray for the basketball team, and striped for the jockey club. Later, he standardized the tick marks by plotting them onto graph paper with equal squares to mark each student's height. The number of students in each 1-inch category is shown by the graphs on the right side of Figure 1.

Each team has 25 members, and obviously, the basketball team has the highest average height. The jockey club has a much lower average height and the glee club heights are spread all over the board. Figure 1 shows a typical grouping, or distribution, for a single school year. Sam collected data for 25 years. Naturally, the distributions for each club were different for each year, but as you might guess, the average height for the jockey club was always lower than the glee club and the basketball team was always higher.

Key concepts

Take a minute now for a hard look at Figure 1. Understanding these next concepts is crucial for a grasp of survey data analysis. The board on the left, with the height marks on it, has been translated into boxes plotted at each inch on a yardstick. Rotate the graph a quarter turn counterclockwise, and look at the distribution. There are three important points.

1. You can distinguish which club is which by the average height (marked by a dashed line). If you saw this graph without any labels, you should be able to identify each team by its average height, assuming, for example, the jockey club needs short, lightweight riders.

2. The central group, the glee club, has a wide range of heights, but the other two teams have a distribution of members that are clustered close to the team average. This has the effect of creating a distinct shape for their graphic representations. The shortest basketball player may be only three inches shorter than the average height for the team, but the shortest glee club member may be eight inches shorter than the average. Again, if there were no labels on the graph, you should be able to distinguish the glee club from the basketball team just from the spread, or range of heights.

3. Finally - and this is what forces many students to switch to liberal arts so they can avoid statistics - let's jump to a more abstract level. With the graph still rotated 1/4 turn, imagine a line forming a smooth envelope for the three distributions. The graph you're looking at is for just one year. If we combined the graphs for the basketball teams, for instance, for all 25 years we'd have a clean curve representing a more universal description of heights of people on the basketball team.

Once you've gleaned these three essentials, turn your attention to the top part of Figure 2. This diagram represents the abstract, or smoothed, distributions for the three teams over the 25-year period. Notice, by the way, that the axis shows increasing height, so the jockey club curve is on the left, as opposed to the right (bottom) of Figure 1.

To review the key concepts, now applied to the top of Figure 2:

  • You can distinguish the three teams just from the mean height: 55 inches for the jockey club, 65 inches for the glee club, and 75 inches for the basketball team.
  • You can distinguish the glee club from the basketball team just by the shape of the curve - the basketball team has most of its members crammed in close to the 75-inch average height.
  • The shape of the curve for any one year might be a little bumpy, but as we combine measures for several years for each team, the distribution takes on a smooth, bell-shaped curve.

The eyeball test

Congratulations! You've just learned the most difficult part of statistical theory! This type of analysis constitutes at least 80 percent of all the testing done on market research surveys. What follows is just the fine-tuning and technical procedures for carrying out the actual testing. Of course, there are many advanced statistical tests, and we're covering the basics of only two of them.

The phrase "statistical analysis" sounds authoritative. Remember, however, the science of statistics is just a tool to help you in the evaluation of your survey findings. If you can reach an understanding of your research results without a lot of mathematics, then you're actually ahead of the game. This occurs quite often, and we call it the "eyeball test" - which, of course, you'll never find in a textbook. If we took the average height of the basketball team, for example, and plotted it on a graph covering 25 years, and the numbers rose steadily from 68 inches in 1971 to 75 inches in 1995, then that's an eyeball test. Put the fact that the average height increased seven inches in 25 years into your report and you really don't need any further elaboration.

There's an eyeball test in the top of Figure 2 also. The question is, are these three groups different statistically from each other? The answer is yes, intuitively, but why? Let's look at just the jockey club (avg. = 55 inches) versus the glee club (avg. = 65 inches). There are three clues: 1) the averages are different, 2) the shapes of the curves are different, and 3) there is very little overlap between them. The statistical tests that you might perform on this data are simply mathematical techniques for verifying these same conditions.

If you have a survey with a rating question, the rating scale is similar to the height-in-inches scale that we have in Figure 2. Going back to a previous example with a 1 to 10 rating scale, the women respondents gave an average rating of 8.4 to a product, while the men rated it at 9.0. The question is whether this difference is real or just due to chance. Here's how you tell. Create a plot of the male ratings and the female ratings - do this by asking your programmer (or statistical vendor) for a histogram of the rating question by male vs. female. If the two plots show very little overlap and have different shapes, then the two subgroups (male and female) are different.
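If you don't have a programmer handy, the histogram described above can be roughed out in a few lines. This is only a sketch: the ratings are invented for illustration (chosen so the averages come out to 8.4 and 9.0, as in the example), and text_histogram is a made-up helper name, not a feature of any statistical package.

```python
from collections import Counter

def text_histogram(ratings, low=1, high=10):
    """Return one line per scale point, with one '#' per respondent."""
    counts = Counter(ratings)
    return [f"{score:>2} | {'#' * counts[score]}" for score in range(low, high + 1)]

# Hypothetical ratings, invented for illustration only.
women = [7, 8, 8, 8, 9, 9, 9, 9, 10, 7]
men = [9, 9, 9, 10, 10, 8, 9, 9, 8, 9]

for label, ratings in (("Women", women), ("Men", men)):
    print(label, "- average", sum(ratings) / len(ratings))
    print("\n".join(text_histogram(ratings)))
```

Stack the two printouts one above the other and you can apply the eyeball test directly: look at where each pile of marks is centered, how wide it is, and how much the two piles overlap.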

Experienced researchers don't use plots and histograms that often, but instead rely on a statistical shorthand that conveys the same information. The shape, or spread, of the distribution is denoted by its "standard deviation," and the average by the "mean." In Figure 2, the standard deviation for the jockey club is about two-and-a-half inches; for the glee club it's about five inches. For a bell-shaped curve, about 95 percent of the area lies within two standard deviations on either side of the mean. For the jockey club curve (mean = 55) you can see that at least 95 percent of the group does not overlap with the middle curve. Key point: if two distributions have a separation in their means of two standard deviations or more - that is, 95 percent of their areas do not overlap - then you can say the two are "significantly different" statistically.
(I know I promised not to use technical terms, but . . . I lied. Besides, we're almost home free, and if you're going to walk the walk, you may as well talk the talk! From here on out, all technical matters will be in italics, in case you want to skip them.)
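For readers who want to see the shorthand in action, here is a minimal sketch of the two-standard-deviation rule of thumb. The heights are invented, not Sam's actual records, and separation_in_sds is a hypothetical helper. The article doesn't say which group's standard deviation to measure the gap against, so this sketch assumes the larger of the two, which is the more cautious choice.

```python
from statistics import mean, stdev

def separation_in_sds(group_a, group_b):
    """Gap between the two means, in units of the larger standard deviation."""
    gap = abs(mean(group_a) - mean(group_b))
    return gap / max(stdev(group_a), stdev(group_b))

# Hypothetical heights in inches, invented for illustration only.
jockey_club = [52, 53, 54, 55, 55, 55, 56, 57, 58]
glee_club = [58, 60, 63, 65, 65, 65, 67, 70, 72]

print("Jockey club mean:", mean(jockey_club), "sd:", round(stdev(jockey_club), 2))
print("Glee club mean:", mean(glee_club), "sd:", round(stdev(glee_club), 2))

gap = separation_in_sds(jockey_club, glee_club)
print("Means are", round(gap, 2), "standard deviations apart")
print("Passes the two-sd rule of thumb:", gap >= 2)
```

Treat this as a quick screen, the numerical cousin of the eyeball test. It is not a substitute for the formal tests described in the next section.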

Statistical testing

In many research studies there is substantial overlap between two subgroups that are being tested, so the eyeball test doesn't work very well. In the bottom portion of Figure 2, for 1971, many of the basketball players were drafted into the Vietnam War. The mean height is close to that of the glee club, and there is substantial overlap between the two curves. It's still possible to prove, with confidence, that the groups are different, but we have to resort to a more formal statistical test. The test for a difference between two means is called the Student's t-test; the test for a difference in the shape of the curves is the F-test. In most statistical software packages the two tests are combined into one operation (Figure 3).

If the means are fairly close, but the shapes (standard deviations) are markedly different, then the F-test will show significance (Figure 3A, the probability value for F is .050 or less). If the means are different, and there is only a small amount of overlap in the distributions, then the t-test will show significance (Figure 3B, the probability value for t is .050 or less). The .050 or less standard represents the 95 percent confidence level. Some studies, such as in medical research, require a higher confidence level, such as 99 percent, in which case the significance threshold is .010 or less.
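To take some of the mystery out of what the software is doing, the sketch below computes the two raw statistics behind these tests: the F ratio (one variance divided by the other) and the pooled-variance form of Student's t. The ratings are invented, the function names are hypothetical, and the sketch stops short of the probability values, which a statistical package looks up for you from the t and F distributions.

```python
from math import sqrt
from statistics import mean, variance

def f_ratio(a, b):
    """Larger sample variance divided by the smaller one (always >= 1)."""
    va, vb = variance(a), variance(b)
    return max(va, vb) / min(va, vb)

def pooled_t(a, b):
    """Student's t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    std_error = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / std_error, df

# Hypothetical ratings, invented for illustration only.
women = [7, 8, 8, 8, 9, 9, 9, 9, 10, 7]   # average 8.4
men = [9, 9, 9, 10, 10, 8, 9, 9, 8, 9]    # average 9.0

t_stat, df = pooled_t(women, men)
print("F ratio:", round(f_ratio(women, men), 2))
print("t statistic:", round(t_stat, 2), "with", df, "degrees of freedom")
```

Published t tables put the two-tailed .05 cutoff at about 2.10 for 18 degrees of freedom. With only 10 made-up respondents per group, the t statistic here falls short of that, so the 8.4 versus 9.0 gap would not test as significant. The same gap in a larger sample could easily pass.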

Use the t-test for any rating question that is broken down into two distinct subgroups, such as male/female, or branch 1 vs. branch 2. In a typical survey there are many scale and rating questions, such as evaluations for products A, B, C, D and E, as well as demographic measures, like age, income, height, and weight.

Sometimes you'll need to test two category questions for significance. For instance, you may want to test the difference for a "yes/no" question where 60 percent of the women said "yes" versus 54 percent of the men. Since there is neither a mean nor a standard deviation, we have to use a different test that is based on expected probabilities. This is the Chi-square test, Figure 4. The Chi-square test can handle any number of categories, but for best results there should be at least five cases per cell. For example, if you're testing a "yes/no" question by ethnic group, the smallest ethnic category should have at least five cases that said "yes" and at least five that said "no." If the smallest ethnic category is Hispanic with 10 percent of the population, and the "yes/no" question is split 50/50, then 5 percent (50 percent of 10 percent) would have to be in the "yes" category for an effective Chi-square test. If five cases equals 5 percent, then your sample size must be at least 100 cases. If you have smaller numbers you may want to collapse categories to reach the required size per cell.
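The sample-size arithmetic in that example can be written out as a small calculation. This is a sketch only; minimum_sample_size is a hypothetical helper, and it takes whole-number percentages so the division comes out exact.

```python
def minimum_sample_size(group_pct, answer_pct, min_cases_per_cell=5):
    """Smallest sample that puts at least min_cases_per_cell in the smallest cell.

    group_pct:  size of the smallest subgroup, as a whole-number percent
    answer_pct: share of that subgroup giving the rarer answer, as a percent
    """
    cell_share = group_pct * answer_pct      # 10% x 50% = 500 parts per 10,000
    needed = min_cases_per_cell * 10_000
    return -(-needed // cell_share)          # ceiling division, whole respondents

# The example from the text: a 10 percent subgroup, question split 50/50.
print(minimum_sample_size(10, 50))   # 100

# A lopsided split needs a bigger sample: 10 percent subgroup, 30/70 split.
print(minimum_sample_size(10, 30))   # 167
```

Run it before fielding the survey, using your best guess at the smallest subgroup and the most lopsided answer you expect to see.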

The Chi-square test is read the same way as the t-test. If the probability value is .050 or less, then the test shows a significant difference between the subgroups. Chi-square statistics are printed as an option to cross-tabulation tables.
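Here is a minimal sketch of the Chi-square calculation for the simplest case, a two-by-two table such as "yes/no" by "male/female." The counts are invented to match the 60 percent versus 54 percent example, and chi_square_2x2 is a hypothetical helper. The probability formula used is exact only for a two-by-two table (one degree of freedom), and no continuity correction is applied, so a package that reports a corrected value will print a slightly different figure.

```python
from math import erfc, sqrt

def chi_square_2x2(table):
    """Pearson chi-square for a 2x2 table of counts.

    Returns (chi-square, probability value, smallest expected cell count).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
            min_expected = min(min_expected, expected)
    p_value = erfc(sqrt(chi2 / 2))   # valid for 1 degree of freedom only
    return chi2, p_value, min_expected

# Hypothetical counts: rows are women and men, columns are yes and no.
small_survey = [[60, 40], [54, 46]]       # 100 women, 100 men
large_survey = [[600, 400], [540, 460]]   # same percentages, 1,000 of each

for name, table in (("small", small_survey), ("large", large_survey)):
    chi2, p, min_expected = chi_square_2x2(table)
    print(name, "chi-square:", round(chi2, 2), "probability:", round(p, 3),
          "significant:", p <= 0.05, "smallest expected cell:", min_expected)
```

The two made-up surveys have identical percentages, but only the larger one falls under the .050 threshold. A six-point gap can be chance in a small sample and a real difference in a large one.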

Real-world analysis

Some surveys can be tabulated using a computer spreadsheet program, but these are difficult to manipulate for crosstabulations and statistical tests. The best bet is to use a dedicated statistical package that has options for automatically creating frequency counts, crosstabs with Chi-square tests, mean breakdowns, correlations, and t-tests. These are the basic necessities. Statistical programs often include features that make analysis a lot easier, such as temporary Select-Ifs, new variable computations, and pre-formatted output reports.

A thorough statistical analysis will include first the frequency counts for all questions, then means and standard deviations for the scale and rating questions. This is followed by the mean breakdowns for each subgroup (Figure 5) and finally t-tests between subgroups for the rating questions, and Chi-square tests for category question crosstabs.

By studying the gross findings, as well as the variables with significant statistical tests, you'll be able to discover and write about patterns that are important to the client. A word of caution: statistics are based on probabilities. If you're working at the 95 percent confidence level, this means that for every 100 tests run where no real difference exists, about five can be expected to show significance purely by chance. With ample practice you'll also be able to spot the cases where the statistical test does not show significance but you know in your heart that there is meaning in a particular table.
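The word of caution above is easy to put into numbers. The sketch below assumes the tests are independent of one another and that no real differences exist, which is a simplification; the function names are hypothetical.

```python
def expected_false_readings(num_tests, alpha=0.05):
    """Average number of tests that show significance by chance alone."""
    return num_tests * alpha

def chance_of_at_least_one(num_tests, alpha=0.05):
    """Probability that at least one test shows significance by chance alone."""
    return 1 - (1 - alpha) ** num_tests

for num_tests in (1, 20, 100):
    print(num_tests, "tests:",
          "expect about", round(expected_false_readings(num_tests), 1), "false readings,",
          "chance of at least one:", round(chance_of_at_least_one(num_tests), 2))
```

The more tables you test, the more likely it is that one or two "significant" results are flukes, so look for patterns that repeat across related questions rather than hanging a report on a single test.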

Many programmers and vendors provide "banners," which are multiple crosstabs on one page. For example, a one-page banner would have the totals for a "yes/no" question in the leftmost column, showing the number of cases and percentage. To the right of that appears the count and percentage for each demographic subgroup - such as male/female, high/medium/low income, ethnic group, etc. Here is another case where experience pays off. If the "yes/no" percentage is 50/50 for the total, and going across the banner categories all you see is 50 percent, 48 percent, 49 percent, 52 percent, etc., then you know the Chi-square tests will not be significant. On the other hand, if you see 60 percent/40 percent for the male/female crosstab, and 61 percent/51 percent/39 percent across the high/medium/low income groups, then you can see that these subgroups vary considerably from the total. You can be fairly certain the tests will show that the differences are real - not due to chance. To make short work of banners, use a yellow highlighter to mark all the "deviant" banner subgroups, then go back and note the patterns.
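If your banners arrive as a data file rather than on paper, the yellow-highlighter pass can be imitated with a few lines of code. This is a rough screen, not a statistical test: the five-point cutoff is an arbitrary assumption, highlight_banner is a hypothetical helper, and the percentages are the "yes" figures from the example above.

```python
def highlight_banner(total_pct, subgroup_pcts, min_gap=5):
    """Return subgroups whose percentage differs from the total by min_gap points or more."""
    return {
        name: pct - total_pct
        for name, pct in subgroup_pcts.items()
        if abs(pct - total_pct) >= min_gap
    }

# Percent answering "yes": 50 for the total, then each banner column.
banner = {
    "male": 60, "female": 40,
    "high income": 61, "medium income": 51, "low income": 39,
}

for name, gap in highlight_banner(50, banner).items():
    print(f"{name}: {gap:+d} points versus the total")
```

Anything the screen flags still deserves a proper Chi-square test before it goes into the report.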

Have no fear

There is no need to fear the idea of statistical analysis. For a comprehensive report using statistical testing, the two most important procedures, besides the total frequency counts, are the t-test and the Chi-square test. The t-test is used to compare the mean for a rating or scale question between two subgroups, such as male versus female. This requires a working knowledge of the mean and standard deviation for a subgroup distribution. A graphic display, such as a histogram plot, aids in this understanding. For a two-category question, such as a "yes/no" response by "male/female," use a cross-tabulation with the Chi-square test.

To proceed from survey findings to an authoritative report, you must review and understand both the total findings and the data for each subgroup. The subgroup patterns and statistical tests add another dimension to the analysis of market research surveys. The tests described here are used most often in market research. They are, however, only two out of a vast array of statistical tools that are useful in data analysis.