Estimating sample size for a descriptive study in quantitative research



Article ID:
19990603
Published:
June 1999
Author:
Gang Xu

Article Abstract

Sample size often must be calculated in quantitative marketing research, which requires knowing the variable of interest. Using two cases, this article discusses the variable of interest.

Editor’s note: Gang Xu is a senior research consultant in statistics at Brintnall & Nicolini, Inc., a Philadelphia, Pa., health care consulting and marketing research firm.

In quantitative marketing research, we frequently need to calculate the sample size in order to make inferences about the parent population with a given level of confidence. In general, the larger the sample size, the more precise your estimate. However, more subjects in the study also lead to a higher cost. Therefore, we need to calculate the minimum number of subjects required for a study.

In calculating the required sample size, we need to know the nature of the variable of interest. Is it a continuous variable (summarized by a mean) or a dichotomous variable (summarized by a proportion)? In a descriptive quantitative research study, the sample size formula depends on this characteristic. The following two sections take each type of variable in turn.

A. Variable of interest is a continuous variable

Case study one
A pharmaceutical company is interested in knowing the average weekly working hours of primary care physicians. You, as a researcher, want to be 95 percent confident that the true population mean of the working hours is within a specified number of hours of the mean you calculate from your sample. For instance, after the data is collected from your survey, you find that the average weekly working hours in your sample is 60. You want to be 95 percent confident that the population mean is within 10 hours of that figure, that is, 60 ± 10.

Here, the average weekly working hours is the variable of interest. It is a mean. In estimating the sample size, the variability of the data in the parent population needs to be taken into consideration. Assuming that the sampling distribution of the mean is approximately normal, the following formula can be used to calculate the size of the sample:

$$n = \frac{Z^2 S^2}{d^2}$$

Where:
n is the size of sample;
Z is the z-statistic for the desired level of confidence;
S is the population standard deviation;
d is the half width of the desired interval.
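
As a brief sketch of where this formula comes from, assume the sample mean is approximately normal with standard error S/√n (this standard-error form is the usual textbook assumption, not something stated explicitly above). Setting the half width of the confidence interval equal to d and solving for n gives:

$$d = Z \frac{S}{\sqrt{n}} \quad\Longrightarrow\quad \sqrt{n} = \frac{Z S}{d} \quad\Longrightarrow\quad n = \frac{Z^2 S^2}{d^2}$$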

Z is a fixed value set by you, the researcher. When we say "a desired level of confidence," we usually refer to one of two levels: 95 percent or 99 percent. Holding other variables constant, a higher level of confidence (e.g., 99 percent) requires a larger sample size than a lower level of confidence (e.g., 95 percent). For the 95 percent confidence level, Z = 1.96, and for the 99 percent confidence level, Z = 2.58. In this example, you have chosen a 95 percent confidence level.
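
For readers who want to check these Z values, or obtain one for another confidence level, the following is a minimal Python sketch. It assumes a two-sided interval and uses the standard library's `statistics.NormalDist`; the helper name `z_for_confidence` is hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def z_for_confidence(level):
    """Return the two-sided z value for a confidence level such as 0.95."""
    if not 0 < level < 1:
        raise ValueError("level must be strictly between 0 and 1")
    # Split the excluded probability equally between the two tails.
    tail = (1 - level) / 2
    return NormalDist().inv_cdf(1 - tail)


# The two levels discussed in the text, rounded to two decimals.
print(round(z_for_confidence(0.95), 2))
print(round(z_for_confidence(0.99), 2))
```

Rounded to two decimals, these should agree with the 1.96 and 2.58 quoted above.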

The half width d is also a fixed value that you choose. In simple terms, d can be thought of as a measure of the precision of the sample estimate. A narrow interval (say 55 to 65 with a mean of 60) is more precise than a wider one (say 50 to 70). The former requires a larger sample size than the latter. In this example, you have chosen d = 10.

We usually don’t know the population standard deviation (S). However, you may make educated guesses about it and calculate the size of the sample based on the guesses. For instance, you may guess that the population standard deviation is 30 and then the required size of the sample will be:

$$n = \frac{1.96^2 \times 30^2}{10^2} \approx 34.6$$

Rounding 34.6 up, you need a sample size of 35 to be 95 percent confident that the true mean of physicians’ weekly working hours is within 10 hours of your estimate. In other words, you are 95 percent confident that the true population mean lies between 10 hours below and 10 hours above the mean you obtain from the survey of the sample of 35 physicians.

Note that a higher confidence level would require a larger sample size. In the example above, if you want to increase the confidence level from 95 percent to 99 percent, you substitute 2.58 for 1.96, and you would need a sample size of 60. The half width d is inversely related to the sample size: a decrease in the value of d (a higher precision) requires a larger sample size. The variability of the population is positively related to the sample size: an increase in the value of S increases the required sample size.

Suppose now we have d = 5, S = 40 and a confidence level of 99 percent. Putting these values into the formula, we find that the required sample size is:

$$n = \frac{2.58^2 \times 40^2}{5^2} \approx 426$$
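
The three calculations in this section can be reproduced with a short Python sketch. The function name `sample_size_mean` is hypothetical; it simply evaluates the formula for a mean and leaves rounding to the caller.

```python
import math


def sample_size_mean(z, s, d):
    """Unrounded sample size n = z^2 * s^2 / d^2 for estimating a mean."""
    if d <= 0:
        raise ValueError("d (half width) must be positive")
    return (z ** 2) * (s ** 2) / (d ** 2)


# Case study one: 95 percent confidence, guessed S = 30, d = 10.
n_95 = sample_size_mean(1.96, 30, 10)
# Same S and d at 99 percent confidence.
n_99 = sample_size_mean(2.58, 30, 10)
# The final example: 99 percent confidence, S = 40, d = 5.
n_tight = sample_size_mean(2.58, 40, 5)

for n in (n_95, n_99, n_tight):
    # Show the raw value next to the value rounded up to whole subjects.
    print(round(n, 2), math.ceil(n))
```

The first two cases round up to 35 and 60. The third evaluates to just over 426, so rounding to the nearest whole number gives 426, while strictly rounding up would give 427.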

B. Variable of interest is a dichotomous variable

Case study two
A company is interested in knowing the market share of drug X among prescriptions written by primary care physicians for the treatment of Type I diabetes patients. You are asked by the company to conduct a survey among primary care physicians to find out how often they prescribe drug X. Based on a pilot study, 10 percent of patients with Type I diabetes were prescribed drug X by primary care physicians. You want to be 95 percent confident that the true population market share of drug X is no more than .05 greater or less than the proportion you estimate from your survey. What is the required sample size?

Here, the proportion of market share of drug X is the variable of interest. It is a dichotomous variable.

The formula for calculating the sample size is:

$$n = \frac{Z^2 \, p(1-p)}{d^2}$$

Where:

n is the size of sample;
Z is the z-statistic for the desired level of confidence;
p is the estimate of expected proportion with the variable of interest in the population;
d is the half width of the desired interval.

Again, Z = 1.96 for the 95 percent confidence level and 2.58 for the 99 percent confidence level. In the example above, p = .10 and d = .05. Putting these values into the formula, we have a required sample size of:

$$n = \frac{1.96^2 \times .1 \times (1-.1)}{.05^2} \approx 138.3$$

Rounding up, as in case study one, you need 139 physicians in your sample to be 95 percent confident that the true market share for drug X in the population is within .05 of the proportion you estimate.

Here, p refers to the market share for drug X that you expect to find in the survey. The required sample size increases with p (1-p), and p (1-p) reaches its maximum when p = 0.5. For that reason, when you have no prior knowledge or assumption about the market share for that drug, you can calculate the sample size based on the worst-case scenario of p = .50; d in this case again equals .05:

$$n = \frac{1.96^2 \times .5 \times (1-.5)}{.05^2} = 384.16$$

Rounding up, you thus need 385 physicians in your survey.
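
The two proportion calculations, and the claim that p = .50 is the worst case, can be checked with a short Python sketch. The function name `sample_size_proportion` is hypothetical, and the sketch rounds up to whole respondents, so its counts can be one higher than figures rounded to the nearest whole number.

```python
import math


def sample_size_proportion(z, p, d):
    """Unrounded sample size n = z^2 * p * (1 - p) / d^2 for a proportion."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    if d <= 0:
        raise ValueError("d (half width) must be positive")
    return (z ** 2) * p * (1 - p) / (d ** 2)


# Pilot-study estimate: p = .10, d = .05, 95 percent confidence.
n_pilot = sample_size_proportion(1.96, 0.10, 0.05)
# Worst case with no prior knowledge: p = .50.
n_worst = sample_size_proportion(1.96, 0.50, 0.05)
print(round(n_pilot, 1), math.ceil(n_pilot))
print(round(n_worst, 2), math.ceil(n_worst))

# Scan p from 0 to 1 in steps of .01 to see where p * (1 - p) peaks.
grid = [i / 100 for i in range(101)]
worst_p = max(grid, key=lambda p: p * (1 - p))
print(worst_p)
```

The scan over p is only a numerical illustration; the same result follows from the fact that p (1-p) is a downward-opening parabola centered at p = .50.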

It should be noted that, in this article, sample size is calculated for a descriptive study. For studies that involve inferential statistical tests such as the t-test, analysis of variance, correlation or regression, separate estimations of sample size are needed.

Summary

1. For a descriptive study, the calculation of a sample size largely depends on whether the variable of interest is a mean or a proportion.

2. When the variable of interest is a mean, we need to estimate the population standard deviation, whereas the other values in the formula are fixed.

3. When the variable of interest is a proportion, we need to give an estimate of the expected proportion in the population. A conservative approach is to use an estimate of 50 percent, meaning that the sample size is estimated for the worst-case scenario.

For a study that requires inferential statistics, the calculation of a sample size may need to be based on the particular statistical test used.
