Estimating sample size for a descriptive study in quantitative research



Article ID:
19990603
Published:
June 1999
Author:
Gang Xu

Article Abstract

Sample size often must be calculated in quantitative marketing research, which requires knowing the variable of interest. Using two cases, this article discusses the variable of interest.

Editor’s note: Gang Xu is a senior research consultant in statistics at Brintnall & Nicolini, Inc., a Philadelphia, Pa., health care consulting and marketing research firm.

In quantitative marketing research, we frequently need to calculate the sample size in order to make inferences about the parent population with a given level of confidence. In general, the larger the sample, the more precise the estimate. However, more subjects in the study also lead to a higher cost. Therefore, we need to calculate the minimum number of subjects required for a study.

In calculating the required sample size, we need to know the characteristic of the variable of interest. Is it a continuous variable (summarized by a mean) or a dichotomous variable (summarized by a proportion)? In a descriptive quantitative research study, the sample size formula depends on this characteristic. The following two sections take each type of variable in turn.

A. Variable of interest is a continuous variable

Case study one
A pharmaceutical company is interested in knowing the average weekly working hours of primary care physicians. You, as a researcher, want to be 95 percent confident that the true population mean of the working hours is within a specified number of units of the estimated mean you calculate from your sample. For instance, after the data is collected from your survey, you find that the average weekly working hours in your sample is 60. You want to be 95 percent confident that the population mean is within 10 hours of that figure, that is, 60±10.

Here, the average weekly working hours is the variable of interest. It is a mean. In estimating the sample size, the variability of the data in the parent population needs to be taken into consideration. Assuming that the sampling distribution of the mean is approximately normal, the following formula can be used to calculate the size of the sample:

$$ n = \frac{Z^2 S^2}{d^2} $$

Where:
n is the size of sample;
Z is the z-statistic for the desired level of confidence;
S is the population standard deviation;
d is the half width of the desired interval.
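
The formula follows from the width of a normal-theory confidence interval. As a sketch, assuming simple random sampling and a sample mean that is approximately normally distributed (assumptions the article does not spell out in full), the half width d equals Z standard errors of the mean, and solving for n gives the formula above:

$$ d = Z \, \frac{S}{\sqrt{n}} \quad\Longrightarrow\quad \sqrt{n} = \frac{Z S}{d} \quad\Longrightarrow\quad n = \frac{Z^2 S^2}{d^2} $$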

Z is a fixed value set by you, the researcher. When we say "a desired level of confidence," we usually mean one of two levels: 95 percent or 99 percent. Holding other variables constant, a higher level of confidence (e.g., 99 percent) requires a larger sample size than a lower level of confidence (e.g., 95 percent). For a 95 percent confidence level, Z = 1.96, and for a 99 percent confidence level, Z = 2.58. In this example, you have chosen a 95 percent confidence level.
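
The two Z values quoted here are two-sided critical values of the standard normal distribution. As a minimal sketch, they can be reproduced with Python's standard library. The helper name `z_for_confidence` is our own and does not come from the article.

```python
from statistics import NormalDist


def z_for_confidence(confidence: float) -> float:
    """Two-sided standard normal critical value for a confidence level in (0, 1)."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Split the leftover probability equally between the two tails.
    upper_tail = (1.0 - confidence) / 2.0
    return NormalDist().inv_cdf(1.0 - upper_tail)


if __name__ == "__main__":
    for level in (0.95, 0.99):
        print(f"{level:.0%} confidence: Z = {z_for_confidence(level):.2f}")
```

Rounded to two decimals, this gives 1.96 and 2.58, the values used throughout the article.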

The half width d is also a fixed value that you choose. In simple terms, d can be thought of as a measure of the precision of sample estimates. A narrow interval (say 55 to 65 with a mean of 60) is more precise than a wider one (say 50 to 70). The former requires a larger sample size than the latter. In this example, you have chosen d = 10.

We usually don’t know the population standard deviation (S). However, you may make educated guesses about it and calculate the size of the sample based on the guesses. For instance, you may guess that the population standard deviation is 30 and then the required size of the sample will be:

$$ n = \frac{1.96^2 \times 30^2}{10^2} = 34.6 $$

Rounding 34.6 up, you need a sample size of 35 to be 95 percent confident that the true mean of physicians’ weekly working hours is within 10 hours of the sample mean. In other words, you are 95 percent confident that the true population mean ranges from 10 hours lower to 10 hours higher than the mean you obtain from the survey of the sample of 35 physicians.
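
The calculation above can be sketched in a few lines of Python. The function name `sample_size_for_mean` and its parameter names are illustrative choices of ours. Rounding up with `math.ceil` follows the article's step from 34.6 to 35.

```python
import math


def sample_size_for_mean(z: float, s: float, d: float) -> int:
    """Minimum n so that the interval for a mean has half width d.

    z: z-statistic for the chosen confidence level (e.g., 1.96 for 95 percent)
    s: guessed population standard deviation
    d: half width of the desired interval
    """
    if s <= 0 or d <= 0:
        raise ValueError("s and d must be positive")
    n_exact = (z ** 2) * (s ** 2) / (d ** 2)
    # Round up: you cannot survey a fraction of a respondent.
    return math.ceil(n_exact)


if __name__ == "__main__":
    # Case study one: 95 percent confidence, S guessed at 30, d = 10.
    print(sample_size_for_mean(z=1.96, s=30, d=10))
```

Because S is only an educated guess, it can be worth running the function with a few plausible values of `s` to see how far the required sample size moves.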

Note that a higher confidence level requires a larger sample size. In the example above, if you want to increase the confidence level from 95 percent to 99 percent, you substitute 2.58 for 1.96 and would need a sample size of 60. The half width d is inversely related to the sample size: a smaller d (a higher precision) requires a larger sample. The variability of the population is positively related to the sample size, so a larger S also requires a larger sample.

Suppose now we have d = 5, S = 40 and a confidence level of 99 percent. Putting these values into the formula, we find that the required sample size is:

$$ n = \frac{2.58^2 \times 40^2}{5^2} = 426.01 $$

Rounding up, as before, gives a sample size of 427.

B. Variable of interest is a dichotomous variable

Case study two
A company is interested in knowing the market share of drug X among prescriptions written by primary care physicians for patients with Type I diabetes. You are asked by the company to conduct a survey among primary care physicians to find out how often they prescribe drug X. Based on a pilot study, 10 percent of patients with Type I diabetes were prescribed drug X by primary care physicians. You want to be 95 percent confident that the true population market share of drug X is no more than .05 greater or less than the proportion you estimate from your survey. What is the required sample size?

Here, the proportion of market share of drug X is the variable of interest. It is a dichotomous variable.

The formula for calculating the sample size is:

$$ n = \frac{Z^2 \, p \, (1-p)}{d^2} $$

Where:

n is the size of sample;
Z is the z-statistic for the desired level of confidence;
p is the estimate of expected proportion with the variable of interest in the population;
d is the half width of the desired interval.
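
This has the same form as the formula for a mean in section A. As a brief sketch of the connection: a dichotomous variable coded 0 or 1 has variance p (1-p), so substituting that variance for S² gives the proportion formula:

$$ S^2 = p\,(1-p) \quad\Longrightarrow\quad n = \frac{Z^2 S^2}{d^2} = \frac{Z^2 \, p \, (1-p)}{d^2} $$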

Again, Z = 1.96 for the 95 percent confidence level and 2.58 for the 99 percent confidence level. In the example above, p = .10 and d = .05. Putting these values into the formula gives the required sample size:

$$ n = \frac{1.96^2 \times .1 \times (1-.1)}{.05^2} = 138.3 $$

Rounding up, you need 139 physicians in your sample to be 95 percent confident that the true proportion of market share for drug X in the population is within .05 of the proportion you estimate.

Here, p refers to the proportion you expect to find for the market share of drug X. The required sample size increases with p (1-p), which reaches its maximum of .25 when p = .5. For that reason, when you have no prior knowledge or assumption about the market share for that drug, you can calculate the sample size based on a worst-case scenario of p = .50. With d again equal to .05:

$$ n = \frac{1.96^2 \times .5 \times (1-.5)}{.05^2} = 384.16 $$

Rounding up, you thus need 385 physicians in your survey.
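
The proportion calculation, including the worst case at p = .50, can be sketched the same way. The function name `exact_n_for_proportion` is our own. It returns the unrounded value, and `math.ceil` is applied afterwards so that the achieved half width is no wider than d.

```python
import math


def exact_n_for_proportion(z: float, p: float, d: float) -> float:
    """Unrounded sample size from n = Z^2 * p * (1 - p) / d^2."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if d <= 0:
        raise ValueError("d must be positive")
    return (z ** 2) * p * (1.0 - p) / (d ** 2)


if __name__ == "__main__":
    # Scan several expected proportions at 95 percent confidence and d = .05.
    for p in (0.10, 0.30, 0.50, 0.70, 0.90):
        n = exact_n_for_proportion(z=1.96, p=p, d=0.05)
        print(f"p = {p:.2f}: n = {n:6.2f}, rounded up = {math.ceil(n)}")
```

The scan is symmetric around p = .50, where the required sample size peaks, which is why .50 serves as the conservative default when no pilot estimate is available.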

It should be noted that, in this article, sample size is calculated for a descriptive study. For studies that involve inferential statistical tests such as the t-test, analysis of variance, correlation or regression, separate estimations of sample size are needed.

Summary

1. For a descriptive study, the calculation of a sample size largely depends on whether the variable of interest is a mean or a proportion.

2. When the variable of interest is a mean, we need to estimate the population standard deviation, whereas the other values in the formula (Z and d) are chosen by the researcher.

3. When the variable of interest is a proportion, we need to give an estimate of the expected proportion with such a variable of interest. A conservative approach to this estimate is to give an estimate of 50 percent, meaning that the sample size is estimated in a worst-case scenario.

For a study that requires inferential statistics, the calculation of a sample size should be based on the particular statistical test to be used.
