
A simple solution to nagging questions about survey, sample size and validity



Article ID:
19990101
Published:
January 1999
Author:
Susie Sangren

Article Abstract

The quality of a market analysis is judged by its validity. Unfortunately, data from non-probability, informal sample surveys lack measurable confidence. This article demonstrates an easy method of calculating the sample size needed for a specific market survey or experiment.

Editor’s note: Susie Sangren, president of Clearview Data Strategy, Inc., Ithaca, N.Y., is a consulting statistician.

You wouldn’t believe how many times I have been asked, "How big should my sample size be to give a reasonable estimate of the target population?" (My answer is, "It all depends. . .") The questioners are usually research analysts not trained in probability sampling and statistical theory.

The quality of a market analysis is judged by its validity -- in other words, how confident are you, as a researcher, about your findings being replicated in the real marketplace? Data collected from non-probability, informal sample surveys will not allow you to make conclusions about the population with measurable confidence. Remember that the intent of a survey is never just to describe the particular individuals who happen to be selected into the sample, but to obtain a composite profile of the population.

What I am about to show you is an easy (and nonetheless robust) method of calculating the sample size you would need for your specific market survey or experiment. The research design is simple random sampling, and the calculated sample size is the number of completed surveys required to achieve a given level of confidence and error rate. The number of "completes" may be much lower than the number of surveys you actually send out, depending on the response rate you expect.

The beauty of simple random sampling is that it is probability-based (and therefore representative of the population, because everyone in the population has an equal chance of being selected), and it is simple. You can use a random-number generator to pick the sampling units out of the entire population. Simple random sampling is robust because it can meet the needs of most managers. With probability sampling, you can report the following two quantities to relate the accuracy of your sample estimate to the population parameter:

  • Sampling errors: How close is your sample estimate to the true population number? A typical answer may be, "The population number is within ±3 percent of the sample estimate." Naturally, the smaller the sampling error you want, the larger the sample size you will need.

  • Level of confidence: How confident are you that the estimate from your one sample would repeat itself across repeated samples? An answer may be, "I am 95 percent confident that the population number is between A and B." The larger the confidence level you want, the larger the sample size you will need.
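The random-number selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sampling frame of 20,000 customer IDs, the function name draw_simple_random_sample and the seed are all hypothetical and do not come from the article.

```python
import random

def draw_simple_random_sample(population_ids, sample_size, seed=None):
    """Pick sample_size units without replacement; every unit is equally likely."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(list(population_ids), sample_size)

# Hypothetical sampling frame: customer IDs 1 through 20,000.
frame = range(1, 20001)
sample = draw_simple_random_sample(frame, 1011, seed=42)

# Both counts are 1011: the right number of units, with no unit picked twice.
print(len(sample), len(set(sample)))
```

Sampling without replacement is what keeps any one person from being surveyed twice.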

    The sample size should be determined before other survey considerations, such as what questions to ask, what response rate to expect, and how and by whom the data should be collected. There are two ways to approach the sample-size problem:

    1) You have already decided on the confidence level and the sampling error requirements, and now you want to know the sample size;

    2) You have decided on the sample size and the confidence level required, and now you want to know the error rate of your sample estimate.

    To solve Problem One for the sample size, I begin by assuming the following, rather limited, conditions:

  • All my survey questions have the yes/no type of dichotomous answers.

  • My absolute error-rate (E) requirement is 3 percent. (The true population number is within the range of ±3 percent of my sample estimate.)

  • My confidence level (C) requirement is 95 percent. (If I repeated the survey with 100 different samples, I would expect the range around the sample estimate to capture the true population number in about 95 of them.)

  • My first guess at the percentage estimate for the "yes" answer in my sample for a particular question (P) is 35 percent.

    The sample size (N) calculation formula is simply:

    N = square of {square root of [P x (1-P)] / [E / std(C)]},

    where "std(C)" is the equivalent of confidence level, expressed in terms of standard deviation. I list below three widely acceptable levels of confidence, and their standard-deviation counterparts:

    1. 68 percent confidence level -- The population number is within plus or minus one standard deviation of my sample estimate.

    2. 95 percent confidence level -- plus or minus two standard deviations (a rounding of the more exact 1.96). It is the most popular level.

    3. 99.7 percent (almost 100 percent) confidence level -- three standard deviations.
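The three pairings listed above are rounded rules of thumb for the normal distribution. As a hedged sketch, they can be checked with Python's standard library; the helper names coverage_within and std_for_confidence are illustrative and not part of the article's method.

```python
from statistics import NormalDist

def coverage_within(k):
    """Share of a normal distribution lying within +/- k standard deviations."""
    return 2 * NormalDist().cdf(k) - 1

def std_for_confidence(c):
    """Number of standard deviations whose two-sided coverage equals c."""
    return NormalDist().inv_cdf((1 + c) / 2)

for k in (1, 2, 3):
    # Prints roughly 0.6827, 0.9545 and 0.9973.
    print(k, round(coverage_within(k), 4))

# Exactly 95 percent corresponds to about 1.96 standard deviations.
print(round(std_for_confidence(0.95), 2))
```

Using 2 instead of 1.96 makes the calculated sample size slightly larger, which errs on the safe side.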

    Now, let’s substitute all the known quantities into the size calculation formula to solve for N:

    0.4770 = sq. rt. of [0.35 x (1-0.35)]

    0.015 = 0.03/2

    N = (0.4770/0.015) ** 2 = 1,011

    Therefore, the required survey sample size is 1,011, for a 95 percent confidence level and a tight error bound of ±3 percent. Exhibit 1 shows the calculated sample sizes under various levels of sampling error rates and estimated "yes" percentages, all at 95 percent confidence level by simple random sampling.
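The Problem One calculation above can be written as a small function. This is a minimal sketch of the same formula in its equivalent form N = P x (1-P) x [std(C)/E] ** 2; the function name sample_size and its default of two standard deviations are assumptions made for illustration.

```python
def sample_size(p, e, std_c=2.0):
    """Completed surveys needed for a yes/no question under simple random sampling.

    p     -- first guess at the "yes" proportion (0.35 for 35 percent)
    e     -- absolute sampling error (0.03 for plus or minus 3 percent)
    std_c -- confidence level in standard deviations (2 for about 95 percent)
    """
    return p * (1 - p) * (std_c / e) ** 2

n = sample_size(0.35, 0.03)
print(round(n))  # 1011, matching the worked example
```

Rounding up with math.ceil instead of round is a slightly more conservative choice, since it never yields fewer completes than the formula asks for.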

    To solve Problem Two for the error rate, suppose I have already been given a sample size, say, 1,011 (N), and the confidence level, say, 95 percent (C). I use the same formula, rearranged to solve for E: E = std(C) x square root of [P x (1-P) / N]. Converting the confidence level (C) into the appropriate standard deviation, std(C) = 2, and assuming that my sample percentage of the "yes" answer (P) is 35 percent, my sampling error rate is again calculated as ±3 percent. Remember that increased sample size generally means increased survey reliability, which must be traded off against increased cost and time.
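Problem Two can be sketched the same way, by solving the sample-size formula for E. The function name sampling_error and the alternative sample sizes of 500 and 2,000 are illustrative assumptions, chosen only to show the trade-off between sample size and error.

```python
import math

def sampling_error(p, n, std_c=2.0):
    """Absolute sampling error for a yes/no question under simple random sampling.

    p     -- sample proportion answering "yes"
    n     -- number of completed surveys
    std_c -- confidence level in standard deviations (2 for about 95 percent)
    """
    return std_c * math.sqrt(p * (1 - p) / n)

print(round(sampling_error(0.35, 1011), 3))  # 0.03, i.e. plus or minus 3 percent

# Hypothetical sample sizes: the error shrinks as the sample grows,
# but only with the square root of the sample size.
for n in (500, 1011, 2000):
    print(n, round(sampling_error(0.35, n), 4))
```

Because the error falls with the square root of N, halving the error requires roughly four times as many completes.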

    Exhibit 2 shows the calculated sampling errors under various sample sizes and estimated "yes" percentages, all at 95 percent confidence level by simple random sampling.

    Notice also that when P=0.5, or 50 percent, the value of [P x (1-P)] is at its maximum. This implies that the more unsure I am about the survey outcome (i.e., the percentage estimate for the "yes" answer, P, would be close to 50 percent -- I am only certain half the time), the larger the sampling error will be for a given sample size.
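The claim that [P x (1-P)] peaks at 50 percent can be verified by completing the square. The short derivation below uses the article's symbols (P, N, E, std(C)) and is offered as a supporting sketch rather than as part of the original method.

```latex
P(1-P) = \tfrac{1}{4} - \left(P - \tfrac{1}{2}\right)^{2} \le \tfrac{1}{4},
\qquad\text{so}\qquad
E = \mathrm{std}(C)\,\sqrt{\frac{P(1-P)}{N}} \;\le\; \frac{\mathrm{std}(C)}{2\sqrt{N}}.
```

Equality holds only at P = 0.5, so using 50 percent as the first guess gives the worst-case error for any sample size, and the worst-case sample size for any error requirement.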

    Going back to the Problem Two scenario, and changing my sample estimate for the "yes" percentage from the earlier 35 percent to 50 percent, now I would calculate a slightly larger sampling error (3.145 percent versus the earlier 3 percent):

    0.50 = Sq. rt. of [0.50 x (1-0.50)]

    31.7962 = Sq. rt. of 1,011

    E = (0.50 / 31.7962) x 2 = 0.03145 (or, 3.145%)

    Finally, I may want to enlarge the calculated sample size (done somewhat subjectively) because:

    1. My survey contains questions with multinomial answers. In such a case, I will pick the question with the highest number of answer categories to estimate my sample size. The resulting size should be good for the entire survey.

    2. I have to take into consideration the non-response rate.

    3. I want to ensure that when I crosstabulate one variable with another, I would have enough data in each cell.
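The non-response adjustment in the second item can be sketched as a simple division: the number of surveys to send out is the required number of completes divided by the expected response rate. The 25 percent response rate below is a hypothetical figure, and the function name surveys_to_send is illustrative only.

```python
import math

def surveys_to_send(completes_needed, response_rate):
    """Surveys to send out so that the expected number of completes is reached."""
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be greater than 0 and at most 1")
    # Round up: a fraction of a survey cannot be sent.
    return math.ceil(completes_needed / response_rate)

# Hypothetical: 1,011 completes needed, 25 percent of recipients expected to respond.
print(surveys_to_send(1011, 0.25))  # 4044
```

This covers only the expected shortfall in completes; it does nothing to correct for any difference between the people who respond and those who do not.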
