Estimating sample size for a descriptive study in quantitative research



Article ID:
19990603
Published:
June 1999
Author:
Gang Xu

Article Abstract

Sample size often must be calculated in quantitative marketing research, which requires knowing the variable of interest. Using two cases, this article discusses the variable of interest.

Editor’s note: Gang Xu is a senior research consultant in statistics at Brintnall & Nicolini, Inc., a Philadelphia, Pa., health care consulting and marketing research firm.

In quantitative marketing research, we frequently need to calculate the sample size in order to make inferences about the parent population with a given level of confidence. In general, the larger the sample, the more precise the estimate. However, more subjects in the study also lead to a higher cost. Therefore, we need to calculate the minimum number of subjects required for a study.

In calculating the required sample size, we need to know the nature of the variable of interest. Is it a continuous variable (summarized by a mean) or a dichotomous variable (summarized by a proportion)? In a descriptive quantitative research study, the sample size calculation depends on this characteristic. The following two sections take each type of variable in turn.

A. Variable of interest is a continuous variable

Case study one
A pharmaceutical company is interested in knowing the average weekly working hours of primary care physicians. You, as a researcher, want to be 95 percent confident that the true population mean of the working hours is within a specified number of hours of the mean you calculate from your sample. For instance, after the data is collected from your survey, you find that the average weekly working hours in your sample is 60. You want to be 95 percent confident that the population mean is within 10 hours of that estimate, that is, 60±10.

Here, average weekly working hours is the variable of interest, and it is summarized by a mean. In estimating the sample size, the variability of the data in the parent population needs to be taken into consideration. Assuming that the sample mean is approximately normally distributed, the following formula can be used to calculate the size of the sample:

$$ n = \frac{Z^{2} S^{2}}{d^{2}} $$

Where:
n is the size of sample;
Z is the z-value for the desired level of confidence;
S is the population standard deviation;
d is the half width of the desired interval.

Z is a fixed value set by you, the researcher. When we say "a desired level of confidence," we usually refer to two levels: 95 percent and 99 percent level of confidence. Holding other variables constant, a higher level of confidence (e.g., 99 percent) requires a larger sample size than a lower level of confidence (e.g., 95 percent). For 95 percent confidence level, Z = 1.96 and for 99 percent confidence level, Z = 2.58. In this example, you have chosen a 95 percent confidence level.
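The two Z values quoted above can be recovered from the standard normal distribution. The following is a minimal sketch, assuming a two-sided interval; the function name `z_value` is ours, not the article's, and it relies only on Python's standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist


def z_value(confidence: float) -> float:
    """Two-sided standard normal critical value for a confidence level in (0, 1)."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Split the leftover probability equally between the two tails.
    upper_tail = (1.0 - confidence) / 2.0
    return NormalDist().inv_cdf(1.0 - upper_tail)


print(round(z_value(0.95), 2))  # 1.96
print(round(z_value(0.99), 2))  # 2.58
```

The unrounded 99 percent value is closer to 2.576; the article rounds it to 2.58, and the calculations that follow use the rounded figures.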

The half width d is also a fixed value that you choose. In simple terms, d can be thought of as a measure of the precision of sample estimates. A narrow interval (say 55 to 65 with a mean of 60) is more precise than a wider one (say 50 to 70). The former requires a larger sample size than the latter. In this example, you have chosen d = 10.

We usually don’t know the population standard deviation (S). However, you may make educated guesses about it and calculate the size of the sample based on the guesses. For instance, you may guess that the population standard deviation is 30 and then the required size of the sample will be:

$$ n = \frac{1.96^{2} \times 30^{2}}{10^{2}} = 34.6 $$

Rounding 34.6 up, you need a sample size of 35 to be 95 percent confident that the true mean of physicians’ weekly working hours is within a half width of 10 hours. In other words, you are 95 percent confident that the true population mean lies between 10 hours below and 10 hours above the mean you obtain from the survey of the sample of 35 physicians.

Note that a higher confidence level would require a larger sample size. In the example above, if you want to increase the confidence level from 95 percent to 99 percent, you substitute 2.58 for 1.96 and would need a sample size of 60. The half width d is inversely related to the sample size: a smaller value of d (higher precision) requires a larger sample. The variability of the population is positively related to the sample size: a larger value of S requires a larger sample.

Suppose now we have d = 5, S = 40 and a confidence level of 99 percent. Putting these values into the formula, we find that the required sample size is:

$$ n = \frac{2.58^{2} \times 40^{2}}{5^{2}} \approx 426 $$
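The calculations in this section can be reproduced with a short script. This is a sketch under the article's assumptions (normal approximation, a guessed population standard deviation); the function names `sample_size_for_mean` and `half_width` are ours. Because it always rounds up to the next whole subject, the last example returns 427 rather than the 426 quoted above (the unrounded value is about 426.01).

```python
import math


def sample_size_for_mean(z: float, s: float, d: float) -> int:
    """Minimum n so that the interval mean +/- d holds at the confidence level implied by z."""
    if s <= 0 or d <= 0:
        raise ValueError("s and d must be positive")
    # n = Z^2 * S^2 / d^2, rounded up to a whole subject.
    return math.ceil(z ** 2 * s ** 2 / d ** 2)


def half_width(z: float, s: float, n: int) -> float:
    """Half width d achieved by a sample of size n (the same formula solved for d)."""
    return z * s / math.sqrt(n)


print(sample_size_for_mean(1.96, 30, 10))  # 35
print(sample_size_for_mean(2.58, 30, 10))  # 60
print(sample_size_for_mean(2.58, 40, 5))   # 427
print(half_width(1.96, 30, 35) < 10)       # True
```

The `half_width` check runs the formula in reverse: with 35 physicians and S = 30, the 95 percent half width falls just under the target of 10 hours, whereas 34 physicians would leave it slightly above.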

B. Variable of interest is a dichotomous variable

Case study two
A company is interested in knowing the market share of drug X among prescriptions written by primary care physicians for the treatment of Type I diabetes patients. You are asked by the company to conduct a survey among primary care physicians to find out how often these physicians prescribe drug X. Based on a pilot study, 10 percent of patients with Type I diabetes were prescribed drug X by primary care physicians. You want to be 95 percent confident that the true population market share of drug X is no more than .05 greater or less than the proportion you estimate from your survey. What is the required sample size?

Here, the proportion of market share of drug X is the variable of interest. It is a dichotomous variable.

The formula for calculating the sample size is:

$$ n = \frac{Z^{2}\, p(1-p)}{d^{2}} $$

Where:

n is the size of sample;
Z is the z-value for the desired level of confidence;
p is the estimate of expected proportion with the variable of interest in the population;
d is the half width of the desired interval.

Again, Z = 1.96 for the 95 percent confidence level and 2.58 for the 99 percent confidence level. In the example above, p = .10 and d = .05. Putting these values into the formula, we have a required sample size of:

$$ n = \frac{1.96^{2} \times 0.1 \times (1 - 0.1)}{0.05^{2}} = 138.3 $$

Thus you need about 138 physicians in your sample (139 if you round up, as in case study one) to be 95 percent confident that the true proportion of market share for drug X in the population is within .05 of the proportion you estimate.

Here, p is the market share for drug X that you expect before running the survey, for example from a pilot study. The required sample size increases with p (1-p), which reaches its maximum of .25 when p = .5. For that reason, when you have no prior knowledge or assumption about the market share for that drug, you can calculate the sample size based on a worst-case scenario in which p = .50; d in this case equals .05:

$$ n = \frac{1.96^{2} \times 0.5 \times (1 - 0.5)}{0.05^{2}} = 384.16 $$

You thus need about 384 physicians in your survey (385 if you round up).
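The two proportion calculations can be checked in the same way. This is a sketch, not a definitive implementation; the function name `sample_size_for_proportion` is ours. It rounds up to the next whole respondent, so it returns 139 and 385 where the text, rounding to the nearest whole number, quotes 138 and 384.

```python
import math


def sample_size_for_proportion(z: float, p: float, d: float) -> int:
    """Minimum n so that the interval p +/- d holds at the confidence level implied by z."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if d <= 0:
        raise ValueError("d must be positive")
    # n = Z^2 * p * (1 - p) / d^2, rounded up to a whole respondent.
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)


print(sample_size_for_proportion(1.96, 0.10, 0.05))  # 139
print(sample_size_for_proportion(1.96, 0.50, 0.05))  # 385

# No other expected proportion asks for more respondents than p = .5 does.
worst_case = sample_size_for_proportion(1.96, 0.50, 0.05)
print(all(sample_size_for_proportion(1.96, k / 100, 0.05) <= worst_case
          for k in range(1, 100)))  # True
```

The final check illustrates why p = .5 is the conservative choice: across expected proportions from .01 to .99, none requires a larger sample.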

It should be noted that, in this article, sample size is calculated for a descriptive study. For studies that involve inferential statistical tests such as the t-test, analysis of variance, correlation or regression, separate estimations of sample size are needed.

Summary

1. For a descriptive study, the calculation of a sample size largely depends on whether the variable of interest is a mean or a proportion.

2. When the variable of interest is a mean, we need to estimate the population standard deviation, whereas the other values in the formula (the confidence level and the half width) are chosen by the researcher.

3. When the variable of interest is a proportion, we need to estimate the expected proportion in the population. A conservative approach is to use an estimate of 50 percent, which gives the sample size for the worst-case scenario.

For a study that requires inferential statistics, the calculation of the sample size should be based on the particular statistical test to be used.
