Advanced AnalyticsPro· 45 min read

Statistics & the Analysis ToolPak

Describe a dataset with averages and spread, measure how two columns move together with correlation, and run a one-click descriptive-statistics report.

What you will learn

  • Compute the mean, median and standard deviation
  • Measure the link between two columns with CORREL
  • Run the Analysis ToolPak’s Descriptive Statistics tool

Describing data, not just totalling it

A total tells you how much; statistics tell you what the data is like — where its centre sits, how spread out it is, and whether two things move together. Because this course is called Excel for Analytics, these are core skills. The good news: Excel has a function for each idea, plus a one-click tool that reports them all at once.

The centre: mean and median

Two functions describe the “typical” value:

  • AVERAGE gives the mean — add everything up and divide by how many.
  • MEDIAN gives the middle value when the numbers are lined up in order.

They can disagree, and the gap is informative. Take these five salaries (in thousands): 30, 32, 35, 38 and 200 — the last one is an unusually high outlier.

Mean versus median on data with one large outlier
=AVERAGE(30, 32, 35, 38, 200)
=MEDIAN(30, 32, 35, 38, 200)

Note: Output: 67 (the mean) 35 (the median) The single 200 dragged the mean up to 67, far above what most people earn. The median — the middle value, 35 — ignores the outlier’s size and better represents a “typical” salary. Whenever the mean and median differ a lot, suspect outliers or a skewed dataset.

The spread: standard deviation

Two datasets can share the same average yet feel completely different — one tightly clustered, one wildly scattered. Standard deviation measures that spread: a small value means the numbers hug the average, a large value means they are spread out. The function is STDEV.S (the .S means a sample, the usual choice when your data is a sample of something larger).

How spread out these five tightly-grouped numbers are
=STDEV.S(30, 32, 35, 38, 40)

Note: Output: 4.04 A standard deviation of about 4 says the values typically sit within ~4 of the mean — a tight cluster. If the numbers were 10, 25, 40, 55, 70 instead, STDEV.S would return about 23.7, signalling a much wider spread even though both lists could share a similar average.

The link between two columns: correlation

Correlation measures whether two columns move together. The function CORREL returns a number from -1 to +1: near +1 means they rise together, near -1 means one rises as the other falls, and near 0 means no linear link. Suppose column A is advertising spend and column B is sales:

Ad spend (A)Sales (B)
10120
20150
30200
40230
50290
Measure how strongly ad spend and sales move together
=CORREL(A2:A6, B2:B6)

Note: Output: 0.99 A correlation of about 0.99 is very close to +1, so as ad spend rises, sales rise almost perfectly in step. That is a strong positive relationship.

Watch out: Correlation is not proof of cause. A high CORREL means two things move together, not that one causes the other — a third factor (like the season) may drive both. Always say “these are correlated”, never “this proves X causes Y”.

The Analysis ToolPak: a one-click report

Excel ships with the Analysis ToolPak, an add-in of ready-made statistics tools. Once switched on, its Descriptive Statistics tool produces mean, median, standard deviation, minimum, maximum, range and more for a column — all at once, no formulas. Turn it on first:

  1. Go to File > Options > Add-ins.
  2. At the bottom, next to Manage: Excel Add-ins, click Go.
  3. Tick Analysis ToolPak and press OK. A new Data Analysis button appears on the Data tab.
  4. Click Data > Data Analysis > Descriptive Statistics, choose your data range, tick Summary statistics, and press OK.

Note: Output: Excel writes a small report block, for example: Mean 35 Median 35 Standard Deviation 4.04 Minimum 30 Maximum 40 Count 5 One dialog produced every headline statistic for the column — the fast way to size up a new dataset before you analyse it. The ToolPak also offers Histogram, Correlation, Regression and more.

Tip: For a quick visual of spread, pair the ToolPak’s Histogram tool with these functions: it buckets your numbers into ranges and charts how many fall in each, instantly showing whether the data is clustered, spread, or skewed by outliers.

Q. Your data is 30, 32, 35, 38, 200. The AVERAGE is 67 but the MEDIAN is 35. What does this gap most likely tell you?

Answer: A large gap between mean and median signals an outlier or skew. The high 200 inflates the mean to 67, while the median (the middle value, 35) is unaffected by the outlier’s size and better represents the typical case.

✍️ Practice

  1. Enter ten numbers and compute their mean (AVERAGE), median (MEDIAN) and spread (STDEV.S).
  2. Make two columns that rise together and one CORREL formula, then read whether the result is near +1.

🏠 Homework

  1. Collect a real set of at least 15 numbers (such as daily steps or prices). Compute the mean, median and standard deviation, turn on the Analysis ToolPak, run Descriptive Statistics, and write two sentences describing the centre and spread of your data.
Want to learn this with a mentor?

CodingClave runs guided, project-based training (28-day, 45-day & 6-month batches).

Explore Training →