Statistics & the Analysis ToolPak
Describe a dataset with averages and spread, measure how two columns move together with correlation, and run a one-click descriptive-statistics report.
What you will learn
- Compute the mean, median and standard deviation
- Measure the link between two columns with CORREL
- Run the Analysis ToolPak’s Descriptive Statistics tool
Describing data, not just totalling it
A total tells you how much; statistics tell you what the data is like — where its centre sits, how spread out it is, and whether two things move together. Because this course is called Excel for Analytics, these are core skills. The good news: Excel has a function for each idea, plus a one-click tool that reports them all at once.
The centre: mean and median
Two functions describe the “typical” value:
AVERAGEgives the mean — add everything up and divide by how many.MEDIANgives the middle value when the numbers are lined up in order.
They can disagree, and the gap is informative. Take these five salaries (in thousands): 30, 32, 35, 38 and 200 — the last one is an unusually high outlier.
=AVERAGE(30, 32, 35, 38, 200)
=MEDIAN(30, 32, 35, 38, 200)Note: Output: 67 (the mean) 35 (the median) The single 200 dragged the mean up to 67, far above what most people earn. The median — the middle value, 35 — ignores the outlier’s size and better represents a “typical” salary. Whenever the mean and median differ a lot, suspect outliers or a skewed dataset.
The spread: standard deviation
Two datasets can share the same average yet feel completely different — one tightly clustered, one wildly scattered. Standard deviation measures that spread: a small value means the numbers hug the average, a large value means they are spread out. The function is STDEV.S (the .S means a sample, the usual choice when your data is a sample of something larger).
=STDEV.S(30, 32, 35, 38, 40)Note: Output: 4.04 A standard deviation of about 4 says the values typically sit within ~4 of the mean — a tight cluster. If the numbers were 10, 25, 40, 55, 70 instead, STDEV.S would return about 23.7, signalling a much wider spread even though both lists could share a similar average.
The link between two columns: correlation
Correlation measures whether two columns move together. The function CORREL returns a number from -1 to +1: near +1 means they rise together, near -1 means one rises as the other falls, and near 0 means no linear link. Suppose column A is advertising spend and column B is sales:
| Ad spend (A) | Sales (B) |
|---|---|
| 10 | 120 |
| 20 | 150 |
| 30 | 200 |
| 40 | 230 |
| 50 | 290 |
=CORREL(A2:A6, B2:B6)Note: Output: 0.99 A correlation of about 0.99 is very close to +1, so as ad spend rises, sales rise almost perfectly in step. That is a strong positive relationship.
Watch out: Correlation is not proof of cause. A high CORREL means two things move together, not that one causes the other — a third factor (like the season) may drive both. Always say “these are correlated”, never “this proves X causes Y”.
The Analysis ToolPak: a one-click report
Excel ships with the Analysis ToolPak, an add-in of ready-made statistics tools. Once switched on, its Descriptive Statistics tool produces mean, median, standard deviation, minimum, maximum, range and more for a column — all at once, no formulas. Turn it on first:
- Go to File > Options > Add-ins.
- At the bottom, next to Manage: Excel Add-ins, click Go.
- Tick Analysis ToolPak and press OK. A new Data Analysis button appears on the Data tab.
- Click Data > Data Analysis > Descriptive Statistics, choose your data range, tick Summary statistics, and press OK.
Note: Output: Excel writes a small report block, for example: Mean 35 Median 35 Standard Deviation 4.04 Minimum 30 Maximum 40 Count 5 One dialog produced every headline statistic for the column — the fast way to size up a new dataset before you analyse it. The ToolPak also offers Histogram, Correlation, Regression and more.
Tip: For a quick visual of spread, pair the ToolPak’s Histogram tool with these functions: it buckets your numbers into ranges and charts how many fall in each, instantly showing whether the data is clustered, spread, or skewed by outliers.
Q. Your data is 30, 32, 35, 38, 200. The AVERAGE is 67 but the MEDIAN is 35. What does this gap most likely tell you?
✍️ Practice
- Enter ten numbers and compute their mean (AVERAGE), median (MEDIAN) and spread (STDEV.S).
- Make two columns that rise together and one CORREL formula, then read whether the result is near +1.
🏠 Homework
- Collect a real set of at least 15 numbers (such as daily steps or prices). Compute the mean, median and standard deviation, turn on the Analysis ToolPak, run Descriptive Statistics, and write two sentences describing the centre and spread of your data.