Pareto analysis

Sort your categories by size, sum them cumulatively, and circle where 80% of the impact comes from. Juran's generalization of Pareto's 1896 income-distribution observation, adapted for quality management — the fastest way to focus a team on the vital few causes instead of the trivial many.

What it is

The 80/20 observation is often called the Pareto principle, after Italian economist Vilfredo Pareto. In his 1896 study Cours d’économie politique, Pareto observed that roughly 80 percent of Italy’s land was owned by 20 percent of its population — a distribution he went on to find in other countries and datasets. Pareto himself never claimed 80/20 was a universal law; he was documenting a skewed distribution.

The generalization into a quality-management technique came later. Joseph M. Juran, working on post-war industrial quality, noticed the same distribution in defect data: a small number of defect types accounted for the majority of the defects. He coined the phrase “the vital few and the trivial many” and formalized the technique in his Quality Control Handbook (McGraw-Hill, 1951). Juran explicitly credited Pareto for the underlying observation while making clear that the technique’s value lay in the management application: focus on the vital few, ignore the trivial many.

The numbers 80 and 20 aren’t magic. In practice you’ll see 90/10 distributions, 70/30 distributions, sometimes even 99/1. The point isn’t that 80 is the right threshold — it’s that distributions of cost, defects, effort and value are almost always lopsided, and surfacing which few items dominate is high-leverage.

In a waste walk, you’re sitting on a dataset that’s tailor-made for Pareto analysis: hours invested in activities by stereotype, broken down by whether they’re value-add or non-value-add. Sort the non-value-add hours by category, compute the cumulative, find the 80 percent line, and you have your list of improvement candidates for this sprint.

When to use it

Use Pareto analysis when:

  • You have numeric data across several categories — hours, defect counts, support tickets, customer complaints, revenue, anything measurable.
  • You need to focus a team’s attention. Pareto converts a long list into a short one, and the length of the list is usually what prevents action.
  • You’re planning what to automate next, or what to eliminate from a process. Automation and elimination both benefit from going after the biggest category first.
  • You want to track improvement over time. Running Pareto at the end of each sprint shows whether your actions are shifting the distribution — categories that used to dominate should shrink.
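Tracking improvement over time, as in the last bullet above, comes down to comparing each category's share between two runs. The sketch below shows one way to do that; the function names and the two sprint datasets are hypothetical, made up for illustration.

```python
def shares(hours_by_category):
    """Return each category's share of the total, in percent."""
    total = sum(hours_by_category.values())
    if total <= 0:
        raise ValueError("grand total must be positive")
    return {name: 100.0 * hours / total for name, hours in hours_by_category.items()}


def share_shift(previous, current):
    """Percentage-point change in each category's share between two runs."""
    before, after = shares(previous), shares(current)
    names = sorted(set(before) | set(after))
    return {name: round(after.get(name, 0.0) - before.get(name, 0.0), 1) for name in names}


# Hypothetical non-value-add hours from two consecutive sprints.
sprint_1 = {"Waiting on reviews": 20, "Manual deploys": 12, "Status meetings": 8}
sprint_2 = {"Waiting on reviews": 9, "Manual deploys": 12, "Status meetings": 9}

shift = share_shift(sprint_1, sprint_2)
for name, change in shift.items():
    print(f"{name}: {change:+.1f} percentage points")
```

In this made-up data the share of "Manual deploys" rises even though its hours did not change, because the total shrank. Reading the shift alongside the absolute hours avoids mistaking that for a regression.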

Don’t use Pareto analysis when:

  • The data is too thin to show a distribution. Two data points aren’t a Pareto. You’ll need at least five or six categories before the cumulative curve tells you anything useful.
  • The categories aren’t independent. If fixing category A also eliminates category B, Pareto will mislead you — the top category is doing double duty. Either merge the categories or accept that the analysis is indicative, not definitive.
  • You’re looking at rare catastrophic events, not recurring low-grade ones. A single outage that cost 200 hours dwarfs everything else; Pareto correctly tells you to focus there, but the analysis is trivial. A 5 Whys or fishbone is the better tool for the single event.
  • The distribution turns out to be flat. Every category contributes roughly the same amount. Pareto tells you nothing — which is itself useful information: the problem isn’t concentrated, it’s systemic. Switch to a different lens.
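The flat case in the last bullet above can be checked before reading too much into a ranking. The sketch below is one possible check, not a standard definition: comparing the smallest category against the largest, and the 15 percent tolerance, are both assumptions chosen for illustration.

```python
def is_flat(values, tolerance=0.15):
    """Return True when the smallest category is within `tolerance`
    (as a fraction of the largest) of the largest category.

    Both the comparison and the default tolerance are illustrative
    assumptions, not an established rule.
    """
    largest, smallest = max(values), min(values)
    if largest <= 0:
        raise ValueError("largest value must be positive")
    return (largest - smallest) / largest <= tolerance


# Hypothetical hours per category.
even_spread = [10, 10, 9, 9, 9]   # smallest is 10% below the largest
lopsided = [40, 25, 20, 10, 5]    # smallest is far below the largest

print(is_flat(even_spread))  # True
print(is_flat(lopsided))     # False
```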

The automation workshop uses Pareto analysis as its primary prioritization tool for picking the first automation candidate. The rework workshop uses it to pick which rework-origin category is worth a root-cause dig.

How to run it

Total time: 10 minutes once the data is in hand.

Pull the data (before the workshop). Export the relevant numeric breakdown — hours by stereotype, defects by type, tickets by category, whatever the question is. One row per category, one column for the measurement. If you’re running this inside a workshop, have this ready in advance; doing the data pull live kills the flow.

Sort descending. Biggest category on top. This is non-negotiable — Pareto analysis depends on the ordering.

Add a cumulative column. For each row, the cumulative column is the sum of this row plus all preceding rows, divided by the grand total, as a percentage. Row 1’s cumulative is its own share; row 2’s is the combined share of rows 1 and 2; and so on until the last row reaches 100 percent.
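The cumulative column described above can be sketched in a few lines. This is a minimal illustration that assumes the values are already sorted in descending order; the function name and the sample numbers are hypothetical.

```python
from itertools import accumulate


def cumulative_percentages(values):
    """Return the running share of the grand total, in percent, per row.

    `values` is assumed to be sorted in descending order already.
    """
    total = sum(values)
    if total <= 0:
        raise ValueError("grand total must be positive")
    # accumulate() yields the running sums: v0, v0 + v1, v0 + v1 + v2, ...
    return [100.0 * running / total for running in accumulate(values)]


# Hypothetical measurements, biggest first.
hours = [50, 30, 15, 5]
print([round(p, 1) for p in cumulative_percentages(hours)])
# [50.0, 80.0, 95.0, 100.0]
```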

Find the 80 percent line. The first row whose cumulative crosses 80 percent is the last row in the vital few. Everything above that row (inclusive) is the shortlist. Everything below is the trivial many.
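Finding the cutoff row can be sketched as below. The function name and the support-ticket data are hypothetical, and the sketch assumes that reaching exactly 80 percent counts as crossing the line.

```python
def vital_few(rows, threshold=80.0):
    """Split (category, value) rows into (vital_few, trivial_many).

    Rows are sorted descending here, so the input order does not matter.
    The first row whose cumulative share reaches `threshold` percent is
    the last row of the vital few.
    """
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    total = sum(value for _, value in ordered)
    if total <= 0:
        raise ValueError("grand total must be positive")
    running = 0
    for index, (_, value) in enumerate(ordered):
        running += value
        # Same test as running / total >= threshold / 100, without dividing.
        if running * 100 >= threshold * total:
            return ordered[: index + 1], ordered[index + 1 :]
    return ordered, []  # only reachable when threshold is above 100


# Hypothetical support tickets by category.
tickets = [
    ("Password resets", 40),
    ("Billing questions", 25),
    ("Shipping delays", 20),
    ("Bug reports", 10),
    ("Other", 5),
]

shortlist, deferred = vital_few(tickets)
print([name for name, _ in shortlist])
# ['Password resets', 'Billing questions', 'Shipping delays']
```

The cumulative shares here are 40, 65 and 85 percent for the first three rows, so the third row is the first to pass 80 and closes the shortlist.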

Plot it (optional but valuable). A bar chart with bars in descending order, plus a line overlay showing cumulative percentage with a horizontal reference at 80 percent, makes the story visible in two seconds. Teams grasp a Pareto chart faster than a Pareto table.
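When no charting tool is at hand, a rough text rendering gets across the same idea. The sketch below is a dependency-free stand-in for the bar-plus-line chart described above, not a replacement for it; the function name, layout and complaint data are all hypothetical.

```python
def text_pareto(rows, threshold=80.0, width=30):
    """Render (category, value) rows as a text Pareto chart.

    Each line shows a bar scaled to the largest category and the
    cumulative percentage. A dashed marker line follows the row where
    the cumulative share first reaches `threshold` percent.
    """
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    total = sum(value for _, value in ordered)
    if total <= 0:
        raise ValueError("grand total must be positive")
    biggest = ordered[0][1]
    label_width = max(len(name) for name, _ in ordered)

    lines = []
    running = 0
    marked = False
    for name, value in ordered:
        running += value
        # At least one character so small categories stay visible.
        bar = "#" * max(1, round(width * value / biggest))
        percent = 100.0 * running / total
        lines.append(f"{name:<{label_width}}  {bar:<{width}}  {percent:5.1f}%")
        if not marked and running * 100 >= threshold * total:
            lines.append("-" * (label_width + width + 10) + f" {threshold:g}% line")
            marked = True
    return "\n".join(lines)


# Hypothetical customer complaints by category.
complaints = [
    ("Late delivery", 40),
    ("Damaged item", 25),
    ("Wrong item", 20),
    ("Rude service", 10),
    ("Other", 5),
]

chart = text_pareto(complaints)
print(chart)
```

Rows above the dashed marker are the vital few; rows below it are the trivial many.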

Act on the vital few. The shortlist above the 80 percent line is where the next-sprint action comes from. Start by root-causing the top category with 5 Whys or fishbone, park the remaining categories on the deferred list, and re-run the Pareto next sprint to see whether the distribution moved.

Worked example

A six-person product team ran their sprint-ending waste walk and pulled the rework-hours data by origin category. Total rework this sprint: 38 hours. Sorted descending:

Category                      Hours   Cumulative hours   Cumulative %
Requirements ambiguity           12                 12          31.6%
Late customer feedback           10                 22          57.9%
Missed acceptance criteria        6                 28          73.7%
Test environment issues           4                 32          84.2%
Integration surprises             3                 35          92.1%
Tooling friction                  2                 37          97.4%
Miscellaneous                     1                 38         100.0%

The cumulative crosses 80 percent at row 4. The vital few are the first four categories — Requirements ambiguity, Late customer feedback, Missed acceptance criteria and Test environment issues — accounting for 84 percent of rework hours. The last three categories combined account for only 16 percent.
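The arithmetic behind the table can be checked with a short script. This is a sketch for verification only; the hours are copied from the table above, and the variable names are arbitrary.

```python
# Rework hours by origin category, copied from the table above.
rework_hours = [
    ("Requirements ambiguity", 12),
    ("Late customer feedback", 10),
    ("Missed acceptance criteria", 6),
    ("Test environment issues", 4),
    ("Integration surprises", 3),
    ("Tooling friction", 2),
    ("Miscellaneous", 1),
]

total = sum(hours for _, hours in rework_hours)

# Build (category, hours, cumulative hours, cumulative %) rows.
table = []
running = 0
for category, hours in rework_hours:
    running += hours
    table.append((category, hours, running, round(100.0 * running / total, 1)))

# 1-based number of the first row whose cumulative share reaches 80 percent.
cutoff_row = next(
    number
    for number, (_, _, cumulative, _) in enumerate(table, start=1)
    if cumulative * 100 >= 80 * total
)

for category, hours, cumulative, percent in table:
    print(f"{category:<28}{hours:>4}{cumulative:>5}{percent:>8.1f}%")
print("Cutoff row:", cutoff_row)  # Cutoff row: 4
```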

Here’s the same data as a bar chart with the cumulative percentage line overlaid. The dashed line at 80 percent is the threshold: the vital few are the bars up to and including the one where the curve first rises above it.

[Figure: Pareto chart of rework hours by origin category. Seven bars in descending order (Requirements ambiguity 12h, Late customer feedback 10h, Missed acceptance criteria 6h, Test environment issues 4h, Integration surprises 3h, Tooling friction 2h, Miscellaneous 1h) with a cumulative-percentage curve crossing the 80 percent reference line between the third and fourth bars.]

What the team does with this. They don’t start seven improvement initiatives. They pick the top category — Requirements ambiguity — and root-cause it with 5 Whys. The chain ends at “we don’t have a lightweight mid-sprint customer-feedback ritual,” which is a concrete problem they can address. Acting on it this sprint is expected to reduce the first two categories at once, since both share the same root (late feedback). Next sprint’s Pareto will show whether the distribution actually shifted — that’s the verdict on whether the action worked.

Why they didn’t act on the trivial many. Miscellaneous at one hour isn’t worth a root-cause dig even if the cause were obvious. Tooling friction at two hours might be worth a quick fix if someone spots it in passing, but it doesn’t warrant a workshop. Focusing action on the trivial many is how teams stay busy without moving the numbers.

Common failure modes

  • Measuring in the wrong unit. A team Paretos defects by count, but one category contains a single catastrophic outage and the rest are minor, so ranking by count pushes the outage to the bottom of the list. Fix: be deliberate about the unit. Sometimes count is right. Sometimes cost, duration or customer impact is right. The unit changes which category wins.
  • Stopping at the ranking. The team produces the Pareto, nods, and doesn’t pick an action. Symptom: the ranking becomes an artifact instead of a decision input. Fix: the Pareto is only useful if it triggers work on the vital few. If you produce one and don’t act, you wasted the exercise.
  • Re-running without acting. The team runs a Pareto every sprint, same top category every sprint, nobody fixes it. Fix: stop running the Pareto. Run an A3 or 5 Whys on the top category instead. The Pareto is telling you what to fix; if you don’t fix it, running it again is just procrastination.
  • Pareto on a flat distribution. The team runs the numbers and the seven categories are all within 15 percent of each other. Fix: accept the finding. The problem isn’t concentrated, it’s systemic. Switch to a different lens — a SWOT for pattern-finding, or a retrospective on working norms.
  • Treating 80/20 as a rule. The cumulative crosses 80 percent at row 2, which would give you a vital few of two items. Teams sometimes force row 4 because “80/20 means four.” Fix: use whatever row the cumulative actually crosses. The threshold is a ritual, not a law.
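The first failure mode in the list above (choosing the unit) is easy to demonstrate: the same log ranks differently depending on which column is sorted. The incident data and the helper below are hypothetical, made up to echo the single-catastrophic-outage scenario.

```python
# Hypothetical defect log: (category, number of incidents, hours lost).
incidents = [
    ("Login timeouts", 40, 10),
    ("Report typos", 25, 3),
    ("Database outage", 1, 200),
]

COUNT, HOURS = 1, 2  # column positions in each row


def ranking(rows, column):
    """Return category names sorted descending by the chosen column."""
    ordered = sorted(rows, key=lambda row: row[column], reverse=True)
    return [row[0] for row in ordered]


print("By count:", ranking(incidents, COUNT))
# By count: ['Login timeouts', 'Report typos', 'Database outage']
print("By hours:", ranking(incidents, HOURS))
# By hours: ['Database outage', 'Login timeouts', 'Report typos']
```

Ranked by count, the outage comes last; ranked by hours lost, it comes first. Neither ranking is wrong on its own terms, which is why the unit has to be chosen deliberately.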

References

External references

  • Vilfredo Pareto, Cours d’économie politique (F. Rouge, 1896) — the original observation about land ownership distribution in Italy.
  • Joseph M. Juran, Quality Control Handbook (McGraw-Hill, 1951; many editions since) — the work that generalized Pareto’s observation into a quality-management technique. Juran coined “the vital few and the trivial many.”
  • Joseph M. Juran, “Universals in Management Planning and Controlling,” The Management Review (November 1954) — an accessible article-length treatment by Juran himself.
  • American Society for Quality, Pareto chart — practitioner reference with worked examples.
  • Lean Enterprise Institute, Pareto chart — the Lean community’s reference entry.
  • Tague, The Quality Toolbox (ASQ Quality Press, 2005) — the standard single-volume reference for Pareto charts and their companion tools.