Editor’s note: Eric Cai is the senior data scientist at Acosta. This is an edited version of an article that originally appeared under the title “Using Microsoft Excel’s Data Analysis TookPak to Perform Correlational Analysis” on Generation.1ca.
Microsoft Excel is a useful point-and-click software for data entry, data analysis and data visualization. The Data Analysis ToolPak extension allows you to conduct many types of statistical analysis, such as regression, t-tests, ANOVA and random-number generation.
This article will introduce the concept of correlation and show how to use Excel’s Data Analysis ToolPak to perform correlational analysis.
Suppose that a grocery store sells five products:
The store records transactional data about the expenditures of each product in every customer visit. The table below shows the first five rows of data.
Note: The data above is simulated from a random-number generator. The correlations are unusually high for illustrative purposes only; they do not represent real business data.
The grocery store’s owner wants to know which products tend to sell well together. Correlational analysis can help in this situation.
The correlation coefficient is a measure of the strength of association between two variables. It can range from -1 to +1, and the sign indicates whether the association is in the same direction or in the opposite direction.
The magnitude of the correlation coefficient denotes the strength of the relationship. In mathematical terms, you can get the magnitude by taking the absolute value of the correlation. A bigger magnitude denotes a stronger relationship. A correlation of -1 or +1 denotes a perfect correlation.
Returning to our example of the five products in the grocery store, how can we use correlation to determine which products sell well together? You can obtain the correlation coefficient for the sales...