Value stream mapping

Toyota's pipeline-visualization technique codified by Mike Rother and John Shook. Draw the delivery flow end-to-end with process times, wait times, handoffs and defects — then compute how much of the total lead time is actually value-adding. Rarely more than ten percent, which is the whole point.

What it is

Value stream mapping originated in the Toyota Production System under Taiichi Ohno and was codified for Western practitioners by Mike Rother and John Shook in Learning to See, published by the Lean Enterprise Institute in 1999 (second edition 2003). The original manufacturing VSM traced materials and information from raw input to finished product. The software adaptation traces a unit of work — a story, a feature, a deploy — through the delivery pipeline from “idea” to “in the customer’s hands.”

The map itself is deceptively simple. A horizontal strip on a wall or board, divided into the steps the work passes through. Each step gets three numbers: process time (how long the work is actively being worked on), wait time (how long it sits idle between steps), and an optional first-pass yield or defect rate (how often the step has to redo work). Handoffs between steps get arrows, annotated with what’s transferred and how (ticket, email, PR, meeting). The map surfaces three things a backlog tool can’t:

  1. How much time the work actually sits idle. Usually more than the working time, often an order of magnitude more.
  2. Where the flow crosses a role or team boundary. Handoffs are where rework originates; each boundary is a place to investigate.
  3. Which single step is the throughput bottleneck. Per Goldratt’s Theory of Constraints, improving anything other than the bottleneck doesn’t improve the system — and the bottleneck is often not where the team thinks it is.
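
The wall is the primary artifact, but each step's annotations reduce to a small record, which makes it easy to keep the map under version control and recompute the totals at the re-map. A minimal sketch in Python — the step names and numbers are illustrative, not from a real map:

```python
from dataclasses import dataclass

@dataclass
class Step:
    name: str
    process_hours: float           # PT: time the work was actively worked on
    wait_hours: float              # WT: time the work sat idle
    first_pass_yield: float = 1.0  # fraction passing through without rework

# Illustrative fragment of a map, not real measurements.
value_stream = [
    Step("In review", process_hours=1.5, wait_hours=4.0, first_pass_yield=0.75),
    Step("Release approval", process_hours=0.25, wait_hours=72.0),
]

idle = sum(s.wait_hours for s in value_stream)
working = sum(s.process_hours for s in value_stream)
print(f"Idle-to-working ratio: {idle / working:.0f}:1")  # 43:1
```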

VSM is the highest-leverage discovery tool for a waste walk because it produces a single picture that everyone looks at, argues over, and agrees on. Arguments about whether the team is “slow” or “fast” evaporate in front of a VSM that shows the work sat in code review for three days and in release queue for two.

When to use it

Reach for value stream mapping when:

  • The automation workshop readout shows heavy manual work and you need to know where in the pipeline the manual work lives. VSM plus a manual-work inventory together reveals automation candidates with context, not just a pile of stickies.
  • The team says “we’re busy but nothing ships.” VSM exposes whether the problem is too much work, too much waiting, or one specific bottleneck.
  • You’re investigating a lead-time problem — why a typical story takes three weeks from pickup to production — and the team is guessing at causes.
  • You’re about to commit to a delivery-pipeline overhaul (new CI/CD, new test automation, new release orchestration) and want a baseline to measure against. Run VSM before, and again six months after, to quantify the improvement.
  • Multiple teams touch the same value stream. VSM across team boundaries surfaces the handoff friction nobody owns.

Don’t reach for VSM when:

  • The problem is already localized to a single step (“our tests take 40 minutes”). That’s a 5 Whys or a focused fishbone — you don’t need a full pipeline map.
  • The team has no shared example story to walk through. VSM is grounded in a real, concrete flow; “typical story” is usually too abstract to map usefully. Pick a specific recent story and trace its actual journey.
  • You don’t have the measurement data or the willingness to collect it. A VSM with guessed numbers is a diagram, not a data-driven exercise. If nobody knows how long code review takes, the mapping exercise has to pause and go collect — or the numbers stay unreliable and the rest of the analysis unravels.

How to run it

Total time: 60–90 minutes for a focused team with the data in hand. Stretches to 2 hours for cross-team flows or when the team needs to pause to pull real numbers.

Pick one concrete story (5 min). Not “a typical story” — an actual one the team recently shipped, representative of normal work. Write its title at the top of the board. The specific example anchors the conversation; abstractions drift.

Draw the step strip (15 min). On a long horizontal surface, list every step the story passed through, left to right, from trigger to customer. Typical software-delivery steps: ideation, refinement, sprint planning, in progress (dev), in review (PR), QA or automated testing, merge to main, build, deploy to staging, integration testing, release approval, deploy to prod, post-release verification. Your pipeline’s steps may differ — write yours, not the canonical list.

Annotate each step (20 min). For each step, record:

  • Process time (PT): how long the work was actively being worked on within this step. Coding, reviewing, testing — the wrench was turning.
  • Wait time (WT): how long the work sat idle between when the upstream step finished and this step started (or between handoffs inside this step). In queue, waiting for review, waiting for deploy slot.
  • First-pass yield (FPY) or defect rate: what percentage of the time the work passes through without being sent back upstream. A step with 70% FPY means 30% of the time it rejects work and makes the upstream step redo part of it.

Write the three numbers on each step. Use consistent units — hours is usually the right grain for software-delivery VSM.
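
If nobody has the numbers on hand, most trackers can export status-change timestamps, and PT and WT fall out of the gaps between consecutive transitions. A minimal sketch, assuming a hypothetical export format — adapt the field layout to whatever your tracker actually emits:

```python
from datetime import datetime

# Hypothetical status-change export for one story, oldest event first.
events = [
    ("2024-03-04 09:00", "In progress"),
    ("2024-03-04 15:00", "Waiting for review"),
    ("2024-03-05 09:00", "In review"),
    ("2024-03-05 10:30", "Merged"),
]

def hours_per_status(events):
    """Sum elapsed hours spent in each status between consecutive transitions."""
    totals = {}
    parsed = [(datetime.fromisoformat(ts), status) for ts, status in events]
    for (entered, status), (left, _) in zip(parsed, parsed[1:]):
        totals[status] = totals.get(status, 0.0) + (left - entered).total_seconds() / 3600
    return totals

print(hours_per_status(events))
# {'In progress': 6.0, 'Waiting for review': 18.0, 'In review': 1.5}
```

Statuses where the work sits in a queue feed the WT column; statuses where the wrench is turning feed PT.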

Annotate the handoffs (10 min). Between each pair of steps, draw an arrow. Label it with:

  • What is transferred (ticket, pull request, merged commit, build artifact, release candidate).
  • How it’s communicated (ticket link, email, Slack message, automated pipeline, meeting).
  • Whether information is lost or reshaped at the handoff — common rework amplifier.
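
If you're capturing the map digitally, a handoff is just another small record alongside the steps — again a sketch, with illustrative field names:

```python
from typing import NamedTuple

class Handoff(NamedTuple):
    from_step: str
    to_step: str
    artifact: str    # what is transferred
    channel: str     # how it is communicated
    info_lost: str   # what gets dropped or reshaped, if anything

# Illustrative example, not a real map.
review_to_merge = Handoff(
    from_step="In review",
    to_step="Merge to main",
    artifact="approved pull request",
    channel="automated pipeline",
    info_lost="review discussion stays behind in PR comments",
)
```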

Compute the totals (5 min).

  • Total process time: sum of all PT values.
  • Total lead time: sum of all PT + WT across the whole map.
  • Process cycle efficiency (PCE): total process time divided by total lead time, expressed as a percentage.
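
The arithmetic is trivial, but scripting it means the re-map later uses identical definitions. A minimal sketch over (name, PT, WT) tuples in hours:

```python
def totals(steps):
    """Return (process_time, lead_time, PCE) from (name, pt_hours, wt_hours) tuples."""
    process_time = sum(pt for _, pt, _ in steps)
    lead_time = sum(pt + wt for _, pt, wt in steps)
    return process_time, lead_time, process_time / lead_time

# Illustrative numbers, not a real map.
pt, lead, pce = totals([("In review", 1.5, 4.0), ("Release approval", 0.25, 72.0)])
print(f"PT {pt:.2f} h, lead time {lead:.2f} h, PCE {pce:.1%}")
# PT 1.75 h, lead time 77.75 h, PCE 2.3%
```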

A healthy manufacturing VSM might see 25–30% PCE. Software delivery rarely gets above 10–15%. If you’re at 3–5% you’re not alone — that’s where many teams start. The point is the gap between process time and lead time: every hour of wait time is a place where improvement might live.

Identify the waste (15 min). Walk the map and label the biggest wait times. For each, ask:

  1. Why does the work wait here? (Queue? Capacity? Single-owner? Scheduled batch?)
  2. Could this wait be eliminated or compressed?
  3. What step is the bottleneck — the one that determines overall throughput?

Mark the bottleneck with a star. Improvement effort that isn’t targeted at the bottleneck won’t move the overall lead time — that’s the Theory of Constraints discipline.
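
As a first pass at question 3, ranking steps by total time consumed is a crude but serviceable heuristic — the largest combined PT + WT is not always the true throughput constraint, but it's where the investigation should start. A sketch with illustrative numbers:

```python
# (name, pt_hours, wt_hours) -- illustrative numbers, not a real map.
steps = [
    ("In review", 1.5, 4.0),
    ("Integration testing", 2.0, 4.0),
    ("Release approval", 0.25, 72.0),
]

# Heuristic: the step where work spends the most total time gets the first look.
star = max(steps, key=lambda s: s[1] + s[2])
print(f"Candidate bottleneck: {star[0]}")  # Release approval
```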

Draft the future-state map (optional, 15 min). Some teams end the exercise by sketching what a target-state VSM would look like — which waits could be eliminated, which steps automated, which handoffs simplified. This is helpful for committing to specific improvement work but isn’t required to produce a useful present-state map.

Worked example

A team maps the lead time for a representative recent story — a small API change with a UI-visible behavior, which they consider typical work.

| Step                                      | PT (hrs) | WT (hrs) | FPY  |
|-------------------------------------------|----------|----------|------|
| Refinement                                | 1.0      | 48.0     | 85%  |
| Sprint planning (committed)               | 0.5      | 16.0     | 100% |
| In progress (dev)                         | 6.0      | 4.0      | 95%  |
| Pull request open, waiting for review     | 0.0      | 18.0     | 100% |
| In review                                 | 1.5      | 4.0      | 75%  |
| Merged, waiting for CI                    | 0.0      | 1.5      | 100% |
| Build + automated tests                   | 0.5      | 0.0      | 90%  |
| Deploy to staging                         | 0.25     | 0.5      | 100% |
| Integration testing                       | 2.0      | 4.0      | 80%  |
| Release approval (weekly release window)  | 0.25     | 72.0     | 100% |
| Deploy to prod                            | 0.5      | 0.5      | 100% |
| Post-release verification                 | 0.75     | 0.0      | 100% |
| Totals                                    | 13.25    | 168.5    |      |

Process cycle efficiency: 13.25 / (13.25 + 168.5) = 7.3%.
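
A few lines of Python confirm the arithmetic — and, as a hypothetical what-if, show why the release window discussed below dominates: removing its 72 hours of wait alone would nearly double the PCE:

```python
pt_total = 13.25   # summed process time, hours
wt_total = 168.5   # summed wait time, hours

print(f"PCE: {pt_total / (pt_total + wt_total):.1%}")   # 7.3%

# Hypothetical what-if: on-demand deploys eliminate the 72 h release-window wait.
wt_without_window = wt_total - 72.0
print(f"PCE without release window: {pt_total / (pt_total + wt_without_window):.1%}")  # 12.1%
```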

[Figure: value stream map of the representative story — twelve process boxes from Refinement to Post-release verification, each annotated with PT, WT and FPY; below them, a bar chart of wait (red) vs process (blue) time per step shows Release approval's 72-hour wait dominating, starred as the bottleneck; totals panel: 13.25 h PT, 168.5 h WT, 181.75 h lead time, 7.3% PCE.]

The team expects a number around 30%. Seeing 7.3% in writing creates immediate alignment that lead time is a problem, not a perception issue. Walking the map together, the team identifies three sources of waste:

  • The weekly release window adds 72 hours of wait time to every story. It’s the single largest line on the map — 40% of the total lead time, and 43% of the total wait time. Moving to on-demand deploys (with appropriate release orchestration) would compress this to near-zero.
  • 48 hours of wait time in refinement — stories sit in the refinement backlog longer than they spend being actively refined. The refinement step itself has an 85% first-pass yield, meaning 15% of stories get sent back for clarification. That’s a planning-waste signal.
  • In-review wait time (18 hours) plus a 75% first-pass yield in review means reviewers both take time to get to work and send a meaningful fraction back for rework. Two mitigations: (1) reduce wait by routing reviews more explicitly, (2) reduce rework by pairing earlier or pre-reviewing designs before code.

The bottleneck — the step that determines overall throughput — is the release window. Anything the team improves elsewhere gets absorbed by that 72-hour ceiling. Automating tests faster, reviewing PRs faster, deploying staging faster — none of it moves lead time as long as releases batch weekly. That’s the star on the map, and that’s where the next improvement sprint targets its effort.

Common failure modes

  • Mapping “typical” instead of concrete. The team maps a hypothetical story and the numbers are rounded guesses. The resulting map is a diagram, not a data artifact. Fix: always map a specific recent story, preferably with the original ticket or PR open so you can verify timestamps.
  • Only counting process time. The team produces a VSM that shows PT but omits WT, which hides the real problem (wait is usually 5–10× process time). Fix: enforce both numbers on every step; absence of WT data is itself a signal that nobody’s measuring flow.
  • Skipping the efficiency calculation. Without the PCE number, the map is just a pretty diagram. The ratio is what makes the argument impossible to ignore. Fix: always compute and write the PCE prominently.
  • Targeting improvements away from the bottleneck. The team sees the 18-hour review wait and attacks it, ignoring the 72-hour release window. Local optimization that doesn’t move the bottleneck is wasted effort. Fix: explicitly tag the bottleneck on the map and apply the Theory of Constraints lens — fix the constraint first, then re-measure.
  • Drawing the map once and never updating it. VSM is a baseline, not a one-time artifact. Re-map after a major improvement to prove (or disprove) the gain. Fix: schedule a re-map 8–12 weeks after the baseline, using the same story type, and compare PCE.
