Editor’s note: Steven Struhl is senior vice president and senior methodologist at Total Research Corporation in Chicago.

One of the fascinating things about Quirk’s Data Use columns (to this writer of them at least) is how many areas fall under this heading. Assuming that “data use” comprises, at the least, data collection and data analysis, we have an enormous field (or more accurately, set of fields) to cover. As many of us know firsthand, these fields still are fraught with difficulties at nearly every turn. We could spend plenty of time covering just a small sampling of the news related to ways in which you can use data better. Also, we could spend as much time - or more - covering areas that may not be news, but which nonetheless could be helpful to readers serious about their data.

The topics we will deal with in this article will illustrate the point that we indeed can follow many directions (likely no surprise to you alert readers). In fact, this review may have at least something for nearly everybody who needs to do some light to heavy analytical work. Specifically, we will cover these areas:

  • We’ll start with a program (called NCSS) that analyzes data and does much more at a price considerably lower than the major competitors.
  • We’ll then look into an intriguing program (called PASS) that does “power analysis” from the same company, and discuss the concept of power analysis for those to whom this is not familiar.
  • After having touched the edges of these two enormous areas (statistical packages and power analysis), we will conclude with a brief discussion of a new program from SPSS that promises to make the onerous task of creating paper-based questionnaires much easier.

NCSS: affordable statistics and power

Because NCSS can be a really excellent introduction to serious data analysis, in this section we will cover some topics that, in earlier reviews, we assumed readers knew well. This way, if you are just starting to consider a statistical analysis program, you may find some material that will help you get grounded in this topic. Other readers still can benefit from the awful jokes that are sprinkled at various spots.

NCSS could be an excellent choice both for the beginning user and many with more advanced needs. Experience has shown that a highly diverse set of people need to analyze, squeeze, and make sense of batches of numbers. Skill levels range all the way from incredibly expert to, shall we say, quite modest. Users’ budgets show almost as much variability - some people definitely need to find the most economical route to their answers, while others inherit such wonderfully costly items as complete “site licenses” to enormous analytical software suites.

If you lean more toward the first group, though, at least two salient problems are likely to confront you as you look into statistics programs. The first is surviving the sticker shock of pricing most major statistical programs. The other is a very real concern about whether the program will ever prove to be manageable - something that a human being could learn, live with, and put to good use on a regular basis.

Background on statistical and analytical packages, or, what’s a bargain (you can skip this if you already know nearly everything)

Most of us, I believe, would find it hard to name more than a few of the best-known packages that do heavy-duty statistical analysis. Many readers know about SAS and SPSS, and some doubtless are familiar with Systat and BMDP. However, there are over a dozen other packages which do some or many forms of advanced statistical processing. Aside from NCSS - the package that this portion of the article addresses - we can find many others including ABStat, CSS, EpiInfo, Kwikstat, Microstat, Minestat, PRODAS, Rats, Stata, Statistica, StatPac, and YStat. Then there are many, many other packages that cover just certain areas of analysis as well - and keeping up with these is even more difficult. For instance, we can find such packages as KnowledgeSeeker and CART (for classification tree analysis), Latent Gold (for latent class analysis), or Limdep (for limited dependent variable models, the last time your author checked). (Please be sure to drop us a note if we omitted your favorite package.) These programs cover an incredible range of areas, basic styles of doing things - and prices.

With this in mind, your author will offer his own, perhaps idiosyncratic, method for classifying statistical analysis packages.

Over-the-top packages

Starting at the high end, we have something usually called “enterprise class” software, which nearly always is truly, if not horribly, expensive. You can’t find most of the programs in this class on lists like the one above, since they typically are rarified beasts that attack specific problems. Generally, these are scaled to run in very large computing environments, although some will fit on a mere PC. If you have a problem that one of these programs can address, chances are that a friendly sales representative has called on you, or you already have the item in question. For the rest of us, costs are hard to get in actual dollars, but we can safely say that they typically range from “more than you want to know about” to truly incredible.

Many of these programs do important things, and some more or less run on their own. For instance, if you ever had a telephone card number stolen by professionals, you may have encountered a hard-working program in the “over-the-top” class. As the calls to Somalia, and/or Zimbabwe and/or the South Pole start to accumulate rapidly in your account (assuming these are not places you normally call 50 times a day), the sharp-eyed software will detect an unusual call pattern, and shut down your card number. This requires a sophisticated form of artificial intelligence, and one incredibly heavy-duty piece of software to monitor all the calling activity going through the system. This exemplifies a very practical form of statistical analysis, then, but also a very specialized one which you will not find for sale at your local computer superstore.

The really big packages

Returning to the analytical packages you are somewhat likely to use, your author will hazard that the most extensive (and costly) is SAS. This package still seems to retain its long-running title as the biggest and most feature-laden of all general-purpose statistical software suites. If you devote enough of your life to SAS, you most likely can get it to do anything imaginable to a helpless set of numbers - if not a few things beyond the imaginable.

SAS, however, is available only by licensing. (There are several other packages that use this scheme, so please don’t think we’re singling SAS out for excessive review, or invective. SAS just has the unenviable position of being a relatively familiar example.) Licensing is a form of remuneration largely restricted to the software industry, although it won’t be too unfamiliar to those of you with cable television - or for that matter, telephone service.

As the licensing system usually works, you pay a very large initial fee to obtain the honor of having a “license,” and then fork over a merely large fee each year thereafter to continue using the software. If you do not pay your renewal fees, the software itself most often handles this problem by (as they say in the industry) “dropping dead.” A key feature of most licensed software, then, is what’s called a “drop dead date” - which is what it sounds like. If you don’t pay on time, then one day the software just doesn’t work any more. This has the highly desirable benefit to all software users of making sure the manufacturer gets a nice steady stream of revenues. As mentioned earlier, it’s a lot like cable television.

Other statistics programs and the usual statistical software practice

Most other statistics programs of at least moderate familiarity appear to follow the more usual software model - but often on a larger scale than usual. That is, they command initial purchase fees (robust more often than not), and then follow these with a stream of upgrades which you coincidentally also must purchase to stay current with all the latest and greatest features.

SPSS follows a somewhat more complex variant of this plan, splitting its program into a “base” and “options.” The base has a wide array of features, but most likely if you are serious about doing some in-depth analysis, this base will be missing a few routines that you need in your specific area. For instance, many of us interested in market research and marketing (or anyhow, working in either of those fields) would want the base and several “options” to meet the analytical demands placed upon us. Specifically, at least a few of us would feel ill-armed against a hostile world without at least the SPSS “Categories” module (to do correspondence analysis and other related forms of mapping), the “Professional Statistics” module (to do choice-based modeling), and the “Conjoint” module (to do, you guessed it, conjoint analysis). Those with an econometric bent most likely would want the “Trends” module, which has excellent time series analysis capabilities - and so on.

By the time you have put together the base and a few modules, your purchase costs usually have gone well over four figures (of course, not counting the zeros after the decimal). Then, when upgrade time comes, it seems that you indeed will need to upgrade all the modules (or pieces) that you have grown to know and use, if not love.

Systat, billed as the scientific “alternative” statistical program to SPSS, comes all in one unit, and costs about $1,200. When a new version comes out, you are encouraged to upgrade. The upgrade to Systat 10 (just released) costs users of Systat 9 about $300. Information on other statistical programs is a bit harder to find, but most extant ones also seem to follow this type of model.

Serious power and serious decisions

By now, we hope all of you have noticed that the programs we’ve been discussing would require more of an investment - in both time and money - than most other types of PC-based programs that any of us ever would buy. Also, there can be no doubt that these programs are incredibly powerful and full-featured. Using just some of this power, the lucky user can do analyses which will pay for the purchase (or license) price many times over. We also should mention that the decisions in which these analyses figure often become truly momentous. To sum up what may already be obvious, then, making a good decision about one of these programs can make a big difference.

Enter NCSS

Still, some readers likely are still wondering if they can find some way to get serious and trustworthy analytical power - and a program that they actually can use - for a less formidable investment. Others may be just considering a statistics program, but are daunted by the high price of entry. NCSS is a program that will fill both needs amply. (NCSS is a wise abbreviation of the program’s long moniker: “Number Cruncher Statistical System.”) For roughly the last 20 years - extending all the way back to those primitive days of the DOS prompt and XT computers - NCSS, Inc. has (somewhat quietly) been building what is now a formidable statistical analysis package.

The newest version of NCSS (NCSS 2000) comes with a remarkable array of features, including many that cover advanced analyses - and, accordingly, quite a few features that you would not expect in anything but a very expensive piece of software.

Not to spoil our punch line, NCSS manages all this for $300 (with documentation on the CD-ROM) or $400 (with about 2,000 pages of nicely done large-format manuals).

NCSS: what’s inside the package

There are two basic aspects of a statistical analysis package that require some careful consideration. The first is how it treats you as a user. The second is the range and depth of the program’s features - or what it will do for you, assuming you can figure out how to get it working.

NCSS and how it treats the user

NCSS treats the user extremely well for a statistics program. “User friendliness” is one of those strange terms that did not exist before the advent of personal computers. Things either worked well and easily, or they didn’t. Items in the first class usually were the good products, and their operation was more or less taken for granted - for instance, toasters. (Back in the good old days, early makers of “user-unfriendly” toasters simply went out of business.) Those objects in the second group were tolerated as long as they were useful, but often were roundly cursed as much as appreciated. Early automobiles are an example. (Actually, so are a lot of more recent automobiles.) If they worked, they were wonderful. When they decided not to, they were (you may pick your modifiers) contraptions.

Only with the entry of computers and software do we now have large classes of products that, ambiguously, may or may not work, or may or may not be possible to use and keep using. Early computers and programs tended to be quite unfriendly. If you go back far enough in the PC era, you too will recall when at least 80 percent of the messages you got from your PC were, approximately, “ERROR ERROR ERROR.”

Statistics programs, more than most, have tended to cling to the old “user unfriendly” paradigm. Many retained a basic “command-line” structure - where you type in the program’s favorite syntax and hope that it runs — far into the Windows era. Some run this way still. (Others strike a sort of compromise, letting people choose from menus, point-and-click if they prefer, or type-in commands if that makes them feel better.) Even those with menu systems, button bars, and all the latest accoutrements can remain quite opaque in their operations to users less familiar with them.

NCSS has made quite a graceful transition to Windows, with the look and feel of a program that was actually designed for this environment. Taking a look at Figure 1, you can see that the opening screen looks something like a familiar data entry sheet from Excel. Those of you familiar with the more recent versions of SPSS will recognize the second tabbed sheet, which contains a description of the variables, including both their short names (eight characters or less) and their long labels.

NCSS and value labels: a neat solution

The way in which NCSS handles value labels is somewhat different from SPSS, though, and quite elegant. (Value labels are the descriptive values that can be given to numerically coded variables. If you have a variable such as region, for instance, you might have number codes such as 1, 2, 3, 4 to stand for North, South, East, and West. You could then use the actual names as the value labels for the numeric codes. In the analyses you run, these value labels could appear either with or instead of the actual numeric codes. A crosstabulated table that showed the actual names of the regions, for instance, would be much easier to use than one which had only the number codes.)

In any event, what NCSS does with value labels is to put them into a spot in the data sheet. Let’s say you specify that the value labels go in column 204. The program then looks at column 204 and, directly to the right, at column 205. The number codes should be in the left column (204) and the actual labels appear in the right column (205), something like the table below.
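
    Codes (column 204)    Labels (column 205)
    1                     North
    2                     South
    3                     East
    4                     West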

The elegant part of this is that any time you want to use these labels, you simply type into the appropriate spot on the “variable info” tab that the value labels are found in column 204. Any number of variables can point to the same labels.

This becomes very handy as you add new variables that have the same value labels as ones you already have in the data set. You simply type into the “variable info” tab that the appropriate labels are in the same spot (for instance, column 204 again).

This also can be very nice if you want to change the labels for a large group of variables at one time. Any change in the labels that appear in column 205 will apply to all variables that share those labels simultaneously. Also, if you have questions about the labels, they are right on the sheet in plain view. They are not hidden someplace inside the data structure as they are in many other statistics programs.

Overall this is quite a nice solution for labeling values, and likely to require less typing than others this writer has seen.
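
If you happen to think in code, the scheme amounts to many variables sharing one lookup table. Here is a minimal sketch of the idea in Python (purely illustrative - NCSS itself is menu-driven, and the variable names below are invented for the example):

    # Illustrative sketch only: in NCSS the labels live in two
    # spreadsheet columns; this mimics the "many variables point
    # to one label table" idea.
    value_labels = {1: "North", 2: "South", 3: "East", 4: "West"}

    # Several variables all point to the same label table.
    variable_info = {"region_home": value_labels,
                     "region_work": value_labels,
                     "region_shop": value_labels}

    # One edit to the shared table relabels every variable at once.
    value_labels[3] = "Midwest"
    print(variable_info["region_work"][3])   # -> Midwest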

NCSS help, hints and tips

Another thing that you may notice about NCSS, looking at Figure 1, is that it uses areas of the screen to give hints about what things are. For instance, toward the bottom, it reminds users that what they are looking at is the data entry screen. The program is extremely liberal in using hints, so that at most times there are on-screen reminders or pointers about what you are doing. Note in Figure 2 that the program is giving some very detailed pointers about what exactly the type of variable highlighted is and what it can do in the analysis. If you are just a tad rusty about “fuzzy clustering,” this type of information can be extremely helpful, and may even save you a reading session in the program’s manual.

Figure 1

The only small area in which these hints could be improved is that they sometimes are rather long, and tend to disappear as you move the cursor over to scroll down through them. Learning how to read the longer hints takes just a little practice, and perhaps a little forbearance in moving the mouse to the scroll bar.

Figure 2

Getting the analysis out of the program

Another good feature of NCSS is its way of handling output, or the key analytical results. The program has a built-in word processor which keeps everything (charts and graphs included) in a document in RTF (rich text format), which nearly any word processing program can read and modify. This means that once you are done with NCSS, you do not need to open the program again to read and use the results. (As a contrast, the standard output from SPSS, which officially is a “.spo” file, requires that you use SPSS or an accompanying SPSS output-reader program. However, on the positive side for SPSS, the “.spo” file is a “dynamic document,” which allows you to do things like flip the rows and columns in tables, manipulate charts, and so on.)

NCSS yields nicely formatted output, and can create very good-looking charts and graphs. If you are not satisfied with the way things look, you can use your word processing program to embellish at will. Using the RTF format, NCSS tables come into a word processing program as real tables, for instance, and then you can use the full power of Word, Word Pro, or Word Perfect (or whatever else you may have) to add looks or styles.

NCSS does not organize output in the way that will be familiar to SPSS users, with a sort of tree diagram in a separate window to the left that shows you where each portion of the analysis resides. (The SPSS tree diagram approach also allows you to move or delete entire sections of the output easily.) This is one area where SPSS seems to have a small edge. Since you can type at will into the NCSS word processor’s output, though, and use search-and-replace to find and move whatever you wish, most users should find the NCSS approach more than adequate.

Data in and out

The makers of NCSS have sensibly noticed that they do not live alone in the world, and so NCSS can import data from and export data to a wide range of programs. These include databases like Access, spreadsheets like Excel and Quattro, and statistics programs like SPSS, SAS and Systat. The only hitch your author found was that value labels did not make the transition from the other statistics programs to NCSS. This, then, would require some retyping - unless there is some special import routine that I missed.

Features and capabilities

NCSS packs in an amazing number of features, especially for a program in its price range. This section will review the key ingredients in NCSS. As you may have noticed in Figure 2, NCSS includes some rather esoteric analytical methods, like fuzzy clustering. This section of the review will by necessity read something like a catalogue, since we cannot hope that any of you would stay awake for a complete review of each feature. Rather, we hope that most of your responses will fall into the first of three relatively polite classes, namely (1) some careful reading followed by surprise at the number and extent of procedures included. We hope that only a small minority will fall into the other classes of somewhat polite response, namely (2) a smoldering feeling someplace behind the eyes followed by an outburst of indignation because a favorite method is missing, or (3) a shrug of the shoulders, perhaps followed by a feeling that “this is all a foreign language to me.” Still less polite responses are strictly discouraged.

NCSS, in any event, includes all the basic descriptive and nonparametric statistics you are likely to want. These include frequency distributions, tables of means and related univariate statistics, Chi-square tests, confidence limits, trimmed means, crosstabulation, Spearman’s and Kendall’s Tau correlations, Mantel-Haenszel tests, Fisher’s exact tests, and assorted other nonparametric tests. The program also throws in, for good measure, calculators for probabilities and for areas under a curve. You also can get it to do linear programming for you.

The program’s regression capabilities are particularly diverse, including several forms of regression intended to deal with the problems of multiple correlations among predictor variables (or as it is sometimes called, collinearity or multicollinearity) or “outliers” (extreme values) in the data. These include: all-possible search regression, principal components regression, robust regression and ridge regression.
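
To make one of these concrete: ridge regression fights collinearity by shrinking coefficients toward zero, trading a little bias for a lot of stability. Here is a minimal sketch of the idea in Python, using the scikit-learn library purely as an illustrative stand-in (this is not NCSS):

    # Minimal illustration of ridge regression taming collinearity.
    import numpy as np
    from sklearn.linear_model import LinearRegression, Ridge

    rng = np.random.default_rng(0)
    x1 = rng.normal(size=200)
    x2 = x1 + rng.normal(scale=0.01, size=200)   # nearly a copy of x1
    X = np.column_stack([x1, x2])
    y = 3 * x1 + rng.normal(size=200)

    print(LinearRegression().fit(X, y).coef_)    # unstable, offsetting values
    print(Ridge(alpha=1.0).fit(X, y).coef_)      # shrunken, stable values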

NCSS also handles several relatively rarified procedures including: canonical correlation, response surface analysis, non-linear regression and proportional-hazards regression in addition to the old standby, stepwise linear regression. The program in fact has full-fledged general linear model (GLM) capabilities, which appear to be roughly on a par with what you will find in the leading statistics programs.

NCSS also can do logistic regression, both binomial and multi-value (or as it is more often called, multinomial). This means that you can, in theory, use NCSS to solve discrete choice analysis problems. However, you should be forewarned that the manuals give you no help in this area, not being geared specifically toward market research analysis. Given that McFadden has just shared in a Nobel prize (in economics) for his work on choice-based analysis, perhaps NCSS will consider including more discussion of this method, and examples showing how to apply its multi-value logistic regression capabilities toward this end.
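
For readers wondering what the mechanics look like, here is a bare-bones multinomial logit sketched in Python with scikit-learn (an illustrative stand-in only - a real discrete choice study would involve a proper experimental design and choice sets):

    # Bare-bones multinomial logit: which of three "brands" does a
    # respondent choose, given two attribute ratings? Illustrative only.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    X = rng.normal(size=(300, 2))                    # e.g., price, quality
    beta = rng.normal(size=(2, 3))                   # weights per brand
    utility = X @ beta + rng.normal(size=(300, 3))   # random-utility story
    choice = utility.argmax(axis=1)                  # codes 0, 1, 2

    model = LogisticRegression(multi_class="multinomial", max_iter=1000)
    model.fit(X, choice)
    print(model.predict_proba(X[:3]).round(2))       # per-brand probabilities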

In its methods for one-way analysis of variance (ANOVA), the program could perhaps offer a little more depth. While it has a large number of methods for comparing multiple means, it does not include (for instance) Tukey’s Honestly Significant Difference (HSD), which has emerged in research over the last few years as a good choice in many cases. Nor does it include some of the newer methods which do not make as many assumptions about the data (in particular, that the variances of the variables being compared are equal), such as Tamhane’s T2, Dunnett’s T3 and C, or the Games-Howell tests.

If you are shaking your head in bemusement now, let us assure you that relatively recent research has shown most tests of multiple means can break down if the variances (patterns of scattering) in the variables being compared are too different. In most practical applications, this is a relatively little-explored problem, but it is nice to have a statistical program that can address the issue should it arise. This probably is not a concern to most users, however, as informal observations suggest that the most widely used test of multiple means remains the good old Newman-Keuls (or its variant, the SNK) - which more recent research unfortunately has shown to be one of the least accurate means of doing multiple comparisons.
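
For the curious, here is what one of the multiple-comparison tests mentioned above, Tukey’s HSD, looks like when run in Python’s statsmodels (shown purely for illustration, and precisely because NCSS does not include it):

    # Tukey's HSD across three groups, via statsmodels (illustration).
    import numpy as np
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    rng = np.random.default_rng(2)
    scores = np.concatenate([rng.normal(5.0, 1, 40),
                             rng.normal(5.5, 1, 40),
                             rng.normal(7.0, 1, 40)])
    groups = np.repeat(["North", "South", "East"], 40)

    # The output table flags which pairs of means differ reliably.
    print(pairwise_tukeyhsd(scores, groups, alpha=0.05))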

Beyond this slight limitation in test methods for one-way ANOVA, though, the program includes an impressive set of MANOVA capabilities, and can do repeated measures analysis of variance. Again, it does not seem that many users avail themselves of these procedures, and at times it seems like hardly anybody understands MANOVA, either as an audience member or as a user, but the program should perform handsomely in these areas if you need it to do so.

The program also offers a very wide range of other methods which fall under the heading of multivariate analysis. These include:

  • clustering - K means;
  • clustering - various hierarchical methods;
  • correspondence analysis;
  • discriminant analysis;
  • factor and principal components analysis;
  • item analysis and item response analysis;
  • loglinear models;
  • multi-way tables;
  • multidimensional scaling.

The correspondence analysis module is particularly nice. It produces clear scatter diagrams with the row and column labels right next to the points, and with different symbols for the rows and the columns. It also allows you to add special rows of data that are not used in calculating the overall solution but that are plotted against it. This can be very helpful if you want to contrast data in a current analysis with data from some other analysis, without having the data from the second analysis influence the results. The charts paste nicely into PowerPoint, for instance, where you can nudge around any labels in an overcrowded diagram.

For those of you less familiar with correspondence analysis, we should point out that this method can take any rectangular crosstabulated table of data, and turn it into a chart with one point for each column and one for each row. The proximities of the points to each other show their similarities. The distances of the points from the center of the chart show how “distinctive” or “average” each row or column is. This often is a tremendously useful means of visualizing data. For instance, a table with 10 columns and 12 rows would become a correspondence map with just 22 points on it. As we all should know by now, most audiences will quickly lose track of any table with more than three columns of numbers in it, so this method can prove most helpful in showing patterns that otherwise would become lost in a cloud of numbers.
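
For the technically inclined, the arithmetic underneath a correspondence map is surprisingly compact - in essence, a weighted singular value decomposition of the table. Here is a rough numpy sketch of the core calculation (illustrative only; NCSS handles all of this, plus the charting, for you):

    # Core of correspondence analysis: a crosstab becomes row and
    # column coordinates for a two-dimensional map. Toy data below.
    import numpy as np

    counts = np.array([[20, 10,  5],     # rows: three segments
                       [ 8, 25, 12],     # columns: three brands
                       [ 4,  6, 30]])
    P = counts / counts.sum()            # correspondence matrix
    r, c = P.sum(axis=1), P.sum(axis=0)  # row and column masses

    # Standardized residuals, then a singular value decomposition.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S)

    row_xy = (U[:, :2] * sv[:2]) / np.sqrt(r)[:, None]      # row points
    col_xy = (Vt.T[:, :2] * sv[:2]) / np.sqrt(c)[:, None]   # column points
    print(row_xy, col_xy, sep="\n")      # plot these to get the map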

Still more features

The program also includes a quite respectable time series analysis, which can handle ARIMA/Box-Jenkins analysis, and also can do decomposition, exponential smoothing models, harmonic analysis and spectral analysis. This is not quite as fancy as the SPSS Trends module, which includes both X-11 ARIMA and the ability to include leading indicators, but still could be all you need for most time series investigations. If you do time series analysis, these terms all should be quite familiar to you. If not, we can assure you that NCSS has an impressive list of capabilities in this area.
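
If you have never seen Box-Jenkins modeling in action, fitting an ARIMA model takes only a few lines in any modern package. Here is the flavor of it in Python’s statsmodels (an illustrative stand-in, not NCSS):

    # Fit a simple ARIMA(1,1,1) model and forecast six periods ahead.
    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(3)
    y = 50 + np.cumsum(rng.normal(size=120))   # a random-walk-like series

    result = ARIMA(y, order=(1, 1, 1)).fit()
    print(result.forecast(steps=6))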

Another surprise is the depth and extent of this program’s ability to do survival/reliability analyses. Again, this may reflect a slight bias toward the needs of those in engineering and the “hard sciences.” However, should you ever need to do this form of analysis, the program likely will have what you need, and enough descriptive information to help you make a good choice among its many methods. It can handle accelerated life tests, censoring of values (all types), exponential fitting, extreme-value fitting, the Kaplan-Meier method, lognormal fitting, log-rank tests, probit analysis, survival distributions, and Weibull analysis. This is quite a list of features - some of them even sent your reviewer back to his statistics books, just to make sure of what they are and what they do.

The program also can generate experimental designs. It does not have every design that has ever existed, but certainly has a wide range going beyond the fractional factorial (or Taguchi) designs that may be familiar to some of you from conjoint analysis. Should the situation arise, you can develop balanced incomplete block designs, Box-Behnken, central composite, Plackett-Burman and response surface designs. The program also has a D-optimal designs generator, so it definitely is keeping up with the latest fashions in the world of experimental design.
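
To give a feel for what a design generator produces: the simplest fractional factorial designs work by deliberately confounding one factor with an interaction of the others. Here is a hand-rolled sketch of a half-fraction design in Python (NCSS would generate this, and far fancier designs, automatically):

    # A 2^(3-1) fractional factorial: run all four combinations of
    # factors A and B, and alias C with their interaction (C = AB).
    from itertools import product

    runs = [(a, b, a * b) for a, b in product([-1, 1], repeat=2)]
    print(" A  B  C")
    for a, b, c in runs:
        print(f"{a:2d} {b:2d} {c:2d}")   # four runs instead of eight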

The program also ventures into the area of quality control analysis. Some of the terminology here also may seem a little unfamiliar, but should you wish to read about these methods, you might well find some of them helpful — and even familiar (although you never quite knew this form of nomenclature for them). Anyhow, here is a brief run-down of some of the features you will find: Xbar-R charts, capability analysis, EWMA charts, moving average charts, Pareto charts, and R & R studies. (That last item is not “rest and relaxation,” which any of you who have read this far doubtless deserve. Rather it stands for “repeatability and reproducibility” study, and sometimes elsewhere is called a “gauge study.” These studies usually are done with analysis of production processes to determine if a particular measurement procedure is “adequate.” There is a lot more to this, but basically they are good for determining which part of “process variation” comes from a measurement procedure being used, and which part is due to the production process itself.)

Plots and graphs

As mentioned earlier, NCSS does nicely at plotting and charting your data. It produces both old favorites and a few methods that may be new to most readers. A partial list of these would include: bar charts, box plots, contour plots, dot plots, error bar charts, histograms, percentile plots, pie charts, probability plots, ROC curves, scatter plots, scatter plot matrices, surface plots and even violin plots. (The latter is not a conspiracy among members of an orchestra, but rather a fairly technical-looking composite of two plots, that shows both an overall distribution, and details about its density at various places. This could be quite handy for dazzling a recalcitrant audience.) Figure 3 shows a selection of NCSS charts and graphs.
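
Readers curious about violin plots can conjure one up in a few lines of Python with matplotlib (shown purely for illustration; NCSS draws its own versions):

    # A violin plot: a box-plot-like summary plus a mirrored density.
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(4)
    data = [rng.normal(0, 1, 200), rng.normal(1, 2, 200)]

    fig, ax = plt.subplots()
    ax.violinplot(data, showmedians=True)
    ax.set_xticks([1, 2])
    ax.set_xticklabels(["Group A", "Group B"])
    plt.show()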


Figure 3


An NCSS wish list

Although NCSS can generate the experimental designs that are generally used for conjoint analysis, it does not have a module that specifically handles the analysis itself. (If you are truly talented, you could manipulate the program into doing this, but then you probably would not be reading this review.) In any event, conjoint is not terribly difficult for a piece of software to do, so perhaps this is something the makers of NCSS could consider adding. Also, as mentioned earlier, some additional material in the section on multi-value (multinomial) logit, showing how this method can be used for discrete choice modeling, would be a strong addition.

Other items on the wish list are scattered. Aside from adding a few more test methods for one-way analysis of variance (ANOVA), these could just reflect your reviewer’s idiosyncrasies. For instance, the otherwise very strong discriminant analysis module does not allow for rotation of the discriminant coefficients or correlations. Rotating a discriminant solution has an effect something like rotating a factor analysis solution - that is, it tends to prevent most of the predictors from piling into the first function formed. Rotation does not change anything in the way the discriminant analysis predicts the groups in which people should belong, but if you are using discriminant analysis to do mapping, this can help make more useful maps. (Readers who are curious about discriminant-analysis based mapping can find an article on this in the Article Archive at the Quirk’s site [www.quirks.com/articles/search.htm] although it lacks the illustrations from the original magazine piece.) NCSS also is missing another of your reviewer’s favorites, discriminant “territorial maps,” which can both look nice and impress your audience. But again, these last items are more matters of preference than of need — and so we promise to spare you any others.

The bottom line on NCSS

For the price asked, and indeed even for considerably more, you hardly can go wrong with NCSS. We strongly recommend that you splurge a little and spend the extra $100 required to get the full set of manuals. You will find them clearly written, nicely laid out and produced, and full of helpful background and reference materials. The makers of NCSS even include a long section of annotated references, discussing their strengths and the reasons they are recommended.

As mentioned earlier, you can get all the information in the manuals from the program’s CD-ROM in PDF format, which means you will need to get your hands on a free Acrobat reader (from the Adobe Web site). NCSS always comes with a rather neatly named, large format book called “Quick Start and Self Help Manual,” which goes over the basics of running NCSS. You should find this extremely clear, with plenty of helpful screen shots and annotations - a real model of how to introduce a complex software program.

However, it is your reviewer’s considered judgment that you will like having the full set of manuals around, simply because it is easier to prop a book open and keep it on hand while you work than constantly to refer to a PDF document which most likely is buried a few screens down from where your analysis is taking place. Also, just to become annoyingly obvious (again), you cannot put flags or Post-its on select spots in PDF documents or write yourself little useful notes in their margins.

Overall, the NCSS program could be an excellent starting point for those seeking an introduction to statistical analysis. Because of the program’s wealth of features, it also could serve as an excellent addition to your analytical armamentarium, even if you already own one of the larger statistics packages.

PASS: statistical power analysis

This is another program from the makers of NCSS, but with a slightly different goal. It does power analysis, which is the determination of how many respondents you will need to get results of some specified reliability. PASS does this with an amazing array of procedures, as we will discuss below.

Many of us are familiar with a relatively simple form of power analysis, namely, determining the sample size needed to reach a specified level of statistical confidence (such as 90 percent or 95 percent) with a sample proportion or percentage. In fact, for several years, one of the larger market research firms passed out nice cardboard devices (with either a sliding scale or a wheel you could rotate) that allowed you to do these calculations on the fly.
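
For the record, the arithmetic behind those cardboard wheels is just the standard sample-size formula for a proportion, n = z^2 * p(1 - p) / e^2, where z comes from the desired confidence level and e is the allowable margin of error. A quick Python rendering (any spreadsheet would do as well):

    # Sample size needed to estimate a proportion within +/- e
    # at a given confidence level: n = z^2 * p * (1 - p) / e^2.
    from statistics import NormalDist

    def sample_size(p=0.5, margin=0.05, confidence=0.95):
        z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
        return round(z ** 2 * p * (1 - p) / margin ** 2)

    print(sample_size())                  # 384, the familiar "about 400"
    print(sample_size(confidence=0.90))   # 271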

Real old-timers even could do these calculations with a slide-rule. For those of you who were not here in the exciting days before PCs, slide rules then were the ultimate in geek self-identification, especially if worn on the belt in their special leather cases designed just for that purpose. Slide rules contained several strangely labeled scales on each side, which you could slide by each other - and with enough practice, you could get pretty accurate answers for all sorts of strange calculations from them. Also, the best ones came from Germany - the former bastion of scientific precision - and cost far more than a powerful calculator does today.

Even with a Pentium IV humming in front of you, power analysis for other statistical methods requires some reading and thinking if you are not familiar with it. For most methods of power analysis, you also will need to make some assumptions about the patterns of scattering (or variance) in your data. Of course, you can make multiple sets of assumptions and see what sample size each one requires - which usually has been part of the power analyses your author has seen. Power analysis, as generally applied, lets you ask many what-if type questions. PASS is particularly flexible here, as it can solve for power, sample size, effect size, and alpha level.
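
As a small taste of these what-if questions, here is the same game played for a two-sample t-test in Python’s statsmodels (PASS itself is dialog-driven and covers vastly more procedures): fix any three of effect size, sample size, alpha, and power, and solve for the fourth.

    # Power analysis for a two-sample t-test: leave one quantity out
    # and solve_power fills it in (illustration, not PASS itself).
    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()

    # Respondents per group to detect a "medium" effect (d = 0.5)
    # with 80 percent power at alpha = 0.05:
    print(round(analysis.solve_power(effect_size=0.5, alpha=0.05,
                                     power=0.80)))          # about 64

    # Flip the question: how much power do 40 per group buy?
    print(analysis.solve_power(effect_size=0.5, alpha=0.05,
                               nobs1=40))                   # about 0.60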

As your work becomes more “official,” power analysis seems to become more important. Many government agencies require power analysis before approving any research project. I would suspect that before too long more corporations of the highly serious (or pretentious) type will start asking for it as well.

The number of procedures that this program works with is almost mind numbing. For instance, among its analysis of variance procedures you can do power calculations for: factorial analysis of variance (AOV), fixed effects AOV, Geisser-Greenhouse, one-way AOV, planned comparisons, randomized block AOV, and repeated measures AOV.

Among the types of regression and correlation this program can handle are not only linear regression and simple correlations, but also logistic regression (binomial and normal) and intraclass correlation.

Along with the more familiar power analyses for proportions and t-tests (such as determining confidence intervals around a proportion, or comparing two proportions), PASS includes power analysis for a wide range of related procedures. This list is so comprehensive that some of these likely will be unfamiliar to most readers. Nonetheless, here are some of them: group sequential proportions, Simon’s two-stage designs, cluster randomization, Mann-Whitney tests, and Wilcoxon tests.

The program also can do power analysis for a wide range of survival analysis methods, which include: Logrank survival (simple and advanced), group sequential survival analysis and post-marketing surveillance analysis. I would suggest that you don’t feel too concerned if these are not familiar to you (not that you are worried), as these are, for the moment, applied more often to scientific studies and studies of industrial processes than to analyses related to marketing or market research. However, it would not surprise your reviewer too much if certain organizations started demanding these types of power analysis before spending their hard-won money (or for that matter, ill-won money) on research projects.

The program is indeed very up to date in the power analysis applications it has available (certainly, more so than your humble reviewer). It now includes a set of analytical methods for group sequential tests, with such “cutting-edge” topics as alpha spending functions using the Lan-DeMets approach. Finding out what this is and how it is used will at the very least ensure that, in the right circles, you sound impressive - or at least obscure (which often comes to the same thing).

Figure 4

Perhaps best of all, PASS comes with a 490-page manual that contains tutorials, examples, annotated output, references, formulas, verification, and complete instructions on each procedure. It seems safe to say that it does as much as is possible to make power analysis approachable, and even possible, for many users. Mostly you need to fill in the appropriate boxes in rather nicely constructed screen dialogues — although as Figure 4 suggests, you may have a lot of concerns to address, and understand, while doing your analysis. Also fortunately for most users, the program automatically creates appropriate tables and charts of the results. It generates text summaries in a “portable” format that you can easily cut and paste using any word processor. This, of course, means you can easily include the required results in your proposal.

PASS overall

Your reviewer is not aware of any other program that can calculate required sample sizes and power for as many different statistical procedures as does PASS. For a relatively modest outlay ($250), PASS will provide you with all the power analysis you are ever likely to need. If you are new to this area of statistical analysis, it will provide an excellent introduction, and should help you get up to speed when you need to know something about statistical power.

The new mrPaper program and the “Paper Solution” from SPSS

With all this power (both statistical and of the power analysis variety) available at a relatively modest cost, it is not surprising to see the larger players in this software arena branching out into related areas. With some luck, we are just starting to see more applications that prove to be both new and useful.

SPSS has focused particularly on market research, and now has a group (or division, or section of the company) identified specifically as SPSSmr (for “market research”). Your author recently had an opportunity to see a long interactive demonstration of a new product from SPSS intended to simplify the process of creating and revising paper-based surveys. SPSS calls this product “mrPaper.” I also had a glimpse at another product that works along with this one, and which appears to make it much easier to translate surveys. Perhaps unsurprisingly, this is called “mrTranslate,” and we will discuss this along with several companion programs briefly toward the end of the review.

As a small caveat, I need to say that while I saw a lengthy demonstration, and had the chance to ask as many questions as I could muster, I did not spend time using a real working version of the mrPaper product. This means that I cannot make any semi-intelligent comments about the performance of this product under fire, or when it is being tortured or abused. This program is so intriguing, though, that I am bending a long-standing rule (self-imposed) of never writing about a program until I have given a full working version a stiff workout. (We can, though, at least promise that no software was harmed in the creation of this section of the review.)

The mrPaper program neatly addresses the problems that arise in the creation of nearly all paper questionnaires, namely the formatting of complex questions and dealing with revisions. The program uses a so-called “questionnaire authoring” tool (either Quanquest or In2form), along with a set of additions and modifications to Microsoft Word, to allow incredible flexibility in creating and modifying questions. The program adds a set of “question looks” to the Microsoft Word set of paragraph styles. However, “looks” are more detailed than paragraph styles, specifying the entire set-up and appearance of a given question type. For instance, if you want all long lists to appear in two columns, you can specify the exact spacing, where the columns start after the main body of the question, whether check boxes or numbers appear, and so on. If you later decide that you want the question to appear as three columns, it only takes a click of a mouse button to make the transformation. Anybody who ever has tried to do this type of reformatting by hand can understand how much work this can save.

The use of “question looks” also ensures consistency in formatting, no matter who works on the questionnaire. These “looks” can be gathered into broader themes, so that you can (for instance) have one type of formatting for one client and still a different type of formatting for another. This seems like a nearly ideal solution for dealing with more than one pesky person who wants to have everything in a questionnaire “just so.”

Perhaps the cleverest feature of the mrPaper product is that it keeps all the information about the questions in the questionnaire in a centralized location that is not a part of the document. SPSS calls this “meta-data.” The practical importance of this is that you can insert or move questions and the program will automatically change the numbering of all questions accordingly. (With mrPaper, you may never have to see anything like “Q6a.2b” in the numbering of your questions again.) The centralized meta-data file keeps track of everything for you over the course of as many revisions as you need to make. It was amazing to watch how questions and formatting effortlessly flowed into place after revisions, guided by mrPaper.

The mrPaper program also links into several other programs that SPSS offers. As mentioned earlier, the mrPaper program works with another module called mrTranslate. Here the use of the “meta-data” approach and “question looks” attains great power. Translators only have to worry about translating the key phrases and terms used in the questionnaire. This occurs outside the translated questionnaire that will appear in the new language. Once the appropriate wording has been translated, mrPaper lays out the questions and formats the questionnaire in the other language automatically.

If revisions get made to the main, English-language questionnaire, such as adding questions, placeholders for these are reserved in the foreign language version, and again all the translators need to do is provide the appropriate translation terms. Any moving or reformatting of questions in the main (English-language) questionnaire is immediately reflected in the translated versions. Everything is kept in neat order by the program.

If you wish to scan questions, mrPaper will work with still another program, called mrScan. This program again relies heavily on the “meta-data” that resides outside the questionnaire. It quickly goes through the questionnaire and identifies (by a process that looks like an optical character recognition program in action) where all the questions and answers are. It also can encode each questionnaire with a unique ID number and put the “scanning marks” on the questionnaire that are needed to align each page properly as it goes into the scanner. Obviously, you will need a fairly high-quality sheet-feeding scanner to use this program’s full capabilities.

How the various programs fit together is summarized in Figure 5. Here I have taken the liberty of simplifying and annotating a diagram from SPSS. This figure shows only the way in which mrPaper would work with other programs to handle scanned documents — which requires more software than a standard key-punched paper questionnaire. With apologies to SPSS for defacing their beautiful original artwork, then, and with hopes that this simplifies matters, I refer you to the flowchart shown in Figure 5.

Figure 5

The mrPaper program along with Quanquest (which you also need) costs $3,650 for one copy of both for one year. SPSS also strongly encourages that you send one or more people to two training sessions (one for mrPaper and one for Quanquest) before using the product. This would cost $3,600 if you have it done on-site. The cost would be substantially less - as little as $1,200 - if you can go to an SPSS training facility.

It seems that this program would, with any reasonable volume of questionnaires, more than pay for itself in time and effort saved - or as the catch phrase goes, in “increased productivity.” At the very least, it should save much of the aggravation that seems to accompany the evolution of nearly every paper-based questionnaire. It is a very intriguing product that looks well worth investigating and considering carefully.

While we can point out that this product is interesting, there most certainly does not seem to be any single “off the shelf” configuration that we can recommend as likely to meet most organizations’ needs. As you can see from Figure 5, the program is intended to work best and most efficiently as part of a suite of SPSS applications, or as SPSS calls it, “the SPSS paper solution.” You most likely will want to speak to your friendly SPSS representative about this product, then, since you will need to make some decisions about your needs for one or more of the related programs. This most likely will require some careful discussion and deliberation about the alternatives.

And now to the conclusion

Looking back at the three programs discussed, your reviewer will get to enjoy an appropriately beneficent feeling, since this review is being completed right after the Christmas holiday. These programs all appear to be, in brief, “very good stuff.”

  • NCSS seems like an excellent choice, both for those hoping to start with more in-depth statistical analysis, and as a supplement to other programs for more experienced users who need some of the many features it offers. (As a reminder, while NCSS costs $300 with documentation on disk, we also recommend that you spend the extra $100 to get the excellent paper-based manuals.)
  • PASS offers the most comprehensive approach to “power analysis” that your reviewer has yet seen, and does as much as possible to make something approachable and useable out of this sometimes arcane area.
  • The new mrPaper program from SPSS looks like a remarkable step toward simplifying and streamlining the process of creating paper-based questionnaires.

Overall, then, there seems to be quite a bit to praise in a wide range of areas, and only a few scattered grumbles. Being able to say that is truly remarkable when dealing with the mysterious world of software.

Those of you who still have more than one eye half open are invited to address any questions or (positive) comments directly to your reviewer at the e-mail address following. Complaints and criticism can be sent to top management — although I seem to have lost their address, but doubtless will get right back to you with it. This will happen any time, very soon. Right now it’s practically in the mail.

Important addresses

  • You can find information and even free evaluation downloads of NCSS at: www.ncss.com.
  • You can find information on mrPaper and many other SPSS products at: www.spss.com.
  • We know we have that address for complaints somewhere. We’ll keep looking.