Monte Carlo for FI planning — what a 90% success rate actually means

Run any FI calculator — Personal Capital's retirement planner, FIRECalc, ProjectionLab, NewRetirement, HELM's Monte Carlo, doesn't matter which — and you'll get a number like "87% success" or "92% success" or "your plan succeeds in 9,142 of 10,000 simulations." Most operators read that number, see it's above 80%, and conclude they're fine.

That conclusion is incomplete. The success rate doesn't mean what most readers think it means, and the parts of the simulation that actually matter usually don't get displayed. This post explains the math behind FI Monte Carlo, what success rate is and isn't, why the P10 line matters more than the median, and how to read a Monte Carlo output without lying to yourself.

What Monte Carlo actually does

A Monte Carlo simulator runs N independent simulations (typically 10,000) of your portfolio across the planning horizon. Each simulation samples a sequence of annual returns from a probability distribution, usually a normal distribution parameterized by historical means and standard deviations. The simulator applies your contributions during accumulation years, applies your withdrawals during retirement years, and tracks whether the portfolio survives to the end of the horizon.
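To make that loop concrete, here is a minimal sketch, not any particular product's implementation. It counts a path as a success when it reaches the end of the horizon with a positive balance. The function names, the start-of-year cash-flow timing, and the default 5% mean / 12% standard deviation (a hypothetical blended portfolio) are all assumptions for illustration.

```python
import random

def simulate_path(rng, start, contribution, spend, years_to_retire, horizon,
                  mean=0.05, stdev=0.12):
    """Return the year the portfolio hit zero, or None if it survived."""
    balance = start
    for year in range(1, horizon + 1):
        # Cash flow at the start of the year: add while working, withdraw after.
        if year <= years_to_retire:
            balance += contribution
        else:
            balance -= spend
        if balance <= 0:
            return year  # ran out of money: this path failed
        # One normally distributed real return for the year (assumed parameters).
        # Clamp at -100% so returns alone can never push the balance negative.
        annual_return = max(rng.gauss(mean, stdev), -1.0)
        balance *= 1.0 + annual_return
    return None if balance > 0 else horizon


def success_rate(n_paths=10_000, seed=42, **plan):
    """Share of simulated paths that end the horizon with money left."""
    rng = random.Random(seed)
    survived = sum(simulate_path(rng, **plan) is None for _ in range(n_paths))
    return survived / n_paths


# Hypothetical plan: 15 working years, then 30 years of withdrawals.
plan = dict(start=500_000, contribution=50_000, spend=80_000,
            years_to_retire=15, horizon=45)
print(f"success rate: {success_rate(**plan):.1%}")
```

Everything in real dollars, one return per year, one fixed spend amount: that is the whole model behind the headline number.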

"Success" almost always means: the portfolio balance at the end of the horizon is greater than zero. "Failure" means: the portfolio ran out of money before the horizon ended. The success rate is the percent of simulations that survived.

The technical details of how the random returns are generated:

Normal-distribution sampling. Most simulators (including HELM's) use the Box-Muller transform to convert two uniform-random samples into a normally-distributed sample. The mean = expected real return; the standard deviation = historical volatility. US stocks ~6.7% mean, 18.4% stdev; bonds ~2.0% mean, 5.8% stdev (Damodaran 1928–2024 data).
Correlated stock-bond returns. Better simulators use Cholesky decomposition (or simpler approximations) to apply a correlation between stock and bond returns. Long-run correlation has been mildly negative; period-specific correlation can flip dramatically. Most simulators use ρ = -0.10 as a default.
Sequence randomization. Each simulation samples 30 (or whatever your horizon is) independent return values, applied year-by-year. This is what creates "sequence-of-returns risk" — same average return, different order, different terminal outcomes.
Real vs nominal. Better simulators run in real (inflation-adjusted) terms — your returns and your spending are both in real dollars, so inflation drops out of the math. Nominal-dollar simulators have to model inflation as a separate distribution and the results get fuzzier.
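The sampling steps above can be sketched in a few lines. This is an illustrative version under stated assumptions, not HELM's or any other simulator's actual code: box_muller and sample_year are hypothetical names, and the defaults are the article's real-return figures and the -0.10 correlation. For two assets, the Cholesky step reduces to a single line.

```python
import math
import random

def box_muller(rng):
    """Turn two uniform samples into two independent standard normal samples."""
    u1 = 1.0 - rng.random()  # shift [0, 1) to (0, 1] so log(u1) is defined
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


def sample_year(rng, rho=-0.10,
                stock_mean=0.067, stock_stdev=0.184,
                bond_mean=0.020, bond_stdev=0.058):
    """One year of correlated real returns: (stock_return, bond_return)."""
    z1, z2 = box_muller(rng)
    # The Cholesky factor of [[1, rho], [rho, 1]] is
    # [[1, 0], [rho, sqrt(1 - rho^2)]], so the bond shock mixes z1 and z2.
    z_stock = z1
    z_bond = rho * z1 + math.sqrt(1.0 - rho * rho) * z2
    return (stock_mean + stock_stdev * z_stock,
            bond_mean + bond_stdev * z_bond)


rng = random.Random(2024)
# Sequence randomization: a 30-year path is 30 independent draws.
path = [sample_year(rng) for _ in range(30)]
print(path[:3])
```

Because the draws are independent from year to year, two paths with nearly the same average return can still put their bad years in very different places.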

What success rate is — and isn't

"87% success rate" does NOT mean: "87% probability your specific real-life plan will succeed."

It means: "87% of randomly-generated futures, drawn from a distribution that approximates historical US returns, ended with a positive balance under your stated assumptions about contributions, withdrawals, and asset allocation."

This distinction matters because the distribution-of-historical-returns is not the distribution-of-real-life-futures. Several specific places it diverges:

Future returns may be lower than past. High starting valuations (an elevated CAPE) and bond yields well below historical averages suggest forward expected returns may be 1-2 percentage points lower than the historical mean. Many simulators let you input an "expected real return" rather than using the historical mean; use that input.
Tail events aren't normally distributed. A normal distribution, however it is sampled (Box-Muller included), underestimates the frequency of extreme events. Real markets have fatter tails: moves of 3+ standard deviations happen more often than a normal model predicts. Simulators using a historical bootstrap (sampling from actual past return sequences) capture this better; pure-normal simulators don't.
You aren't a passive automaton. Real retirees adjust spending in bad years. The simulator assumes you withdraw your fixed real spend amount every year regardless of market conditions; in real life you'd cut back if your portfolio dropped 35%. Your real-life success rate is probably higher than the simulator says — but only if you actually do cut back.
Black-swan events. War, pandemic, civil unrest, currency-debasement events. These exist in the tail of the distribution but their conditional impact (when they happen) is often worse than the model predicts.
Personal-life shocks. Disability, divorce, kids who need help, parents who need help, healthcare costs spiraling. Simulators don't model these; they're additive risks on top of the market-return risk being modeled.
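To put a number on the fat-tails point in the list above, it helps to see how rare a normal model assumes extreme years are. The sketch below only computes what the model implies, using the stock mean and standard deviation quoted earlier; it says nothing about how often such years occurred in actual history. normal_tail_probability is a hypothetical helper name.

```python
import math

def normal_tail_probability(k):
    """P(Z <= -k) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))


stock_mean, stock_stdev = 0.067, 0.184  # the article's US stock figures (real)

for k in (2, 3, 4):
    threshold = stock_mean - k * stock_stdev
    p = normal_tail_probability(k)
    print(f"{k} sigma down: real return <= {threshold:+.1%}, "
          f"model probability {p:.5f}, about 1 year in {1 / p:,.0f}")
```

Under these assumptions a 3-sigma down year (a real return of roughly -48.5%) shows up about once in 740 simulated years. If you believe real markets produce that kind of year more often than that, a pure-normal success rate is optimistic in the tail.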

The reasonable framing: read "87% success rate" as "If I lived through 100 different parallel futures with returns drawn from a normal-distribution approximation of US history, I'd succeed in roughly 87 of them, conditional on my stated plan and conditional on returns actually behaving like history." It's a useful planning number. It's a probability inside the model, not a probability about your actual life.

Why P10 matters more than median

Most simulator UIs show three lines: P10 (the level that only the worst 10% of paths fall below), P50 (the median), and P90 (the level that only the best 10% of paths exceed). Most readers focus on the median because it's the middle.

That's the wrong line to focus on. The median tells you what your portfolio looks like in a typical future. The P10 tells you what it looks like in a bad-but-not-catastrophic future, roughly 1-in-10 odds. If your P10 line stays positive throughout the horizon, you have meaningful margin. If your P10 line zeroes out at year 22, you don't: at least 10% of simulated paths are out of money by then, and you're betting that the actual future will be better than the worst 10% of simulated paths.

Two plans · same median · different P10

Plan A: Median P50 = $4.2M @ Y30 · P10 = $1.8M (positive throughout)

Plan B: Median P50 = $4.2M @ Y30 · P10 = $0 @ Y23 (failed)

Same median outcome. Plan A is dramatically safer.

How does this happen? Plan B has a higher allocation to volatile assets, or a higher real spend rate, or both. The middle of the distribution lands in the same place, but the lower band holds more failure paths. The median hides them; the P10 surfaces them.

If your simulator only shows you the success rate and the median, you're looking at a partial picture. The P10 is the line that tells you whether the plan is robust or just nominally OK.
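If your tool doesn't draw the P10 line, it is straightforward to compute from raw paths. The sketch below is one way to do it under simple assumptions: a retirement-only plan, normally distributed real returns, and a percentile picked by rounding to the nearest index rather than interpolating. All function names and plan numbers are hypothetical.

```python
import random

def percentile(sorted_values, pct):
    """Pick the value at the given percentile (0..100) from a sorted list."""
    index = round(pct / 100 * (len(sorted_values) - 1))
    return sorted_values[index]


def percentile_bands(paths, pcts=(10, 50, 90)):
    """For each year, the requested percentiles across all simulated paths."""
    bands = {p: [] for p in pcts}
    for balances_in_year in zip(*paths):  # transpose: one tuple per year
        ordered = sorted(balances_in_year)
        for p in pcts:
            bands[p].append(percentile(ordered, p))
    return bands


def simulate_balances(rng, start, spend, horizon, mean, stdev):
    """Year-end balances for one retirement path; stays at 0 once depleted."""
    balance, history = start, []
    for _ in range(horizon):
        balance = max(balance - spend, 0.0)
        balance *= 1.0 + max(rng.gauss(mean, stdev), -1.0)
        history.append(balance)
    return history


rng = random.Random(7)
paths = [simulate_balances(rng, 2_000_000, 90_000, 30, 0.05, 0.12)
         for _ in range(2_000)]
bands = percentile_bands(paths)
p10_zero_year = next((y for y, b in enumerate(bands[10], start=1) if b == 0), None)
print("median at year 30:", round(bands[50][-1]))
print("P10 first hits zero in year:", p10_zero_year)
```

Two plans can print the same median on the first line and very different answers on the second, which is the whole argument for looking at the lower band.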

Sequence-of-returns risk — what it actually is

The phrase "sequence-of-returns risk" gets thrown around a lot in FI discussions without clarity on what it means. Specifically: same average return, different order, dramatically different outcomes — but only in the withdrawal phase.

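During accumulation, sequence matters far less, and an early bad stretch does little damage. With no cash flows at all, order is irrelevant: a lump sum compounds to the same terminal value whether the bad years come first or last, because multiplication doesn't care about order. With ongoing contributions (say $50K/yr for 30 years at an average 7% real), order has some effect, but early losses hit a small balance and later contributions buy in at lower prices. The years that matter most are the ones just before retirement, when the balance is largest.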

During withdrawal, sequence matters enormously. If you retire and the first 5 years deliver -25%, +5%, -8%, +12%, +3% (cumulative about -16% before withdrawals), you've withdrawn 5 years of real spending while your portfolio shrank. The same real spend is now a much larger share of a smaller portfolio, so even strong average returns afterward may not be enough to recover. If the same 5-year sequence landed in years 25-30 instead, you'd barely notice.
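A small deterministic example makes the order effect visible. It takes the five returns from the paragraph above and applies them in both orders, first with no cash flows and then with a fixed withdrawal. The $1M starting balance, the $40K withdrawal, and the start-of-year timing are assumptions chosen for illustration.

```python
import math

def end_balance(start, returns, withdrawal=0.0):
    """Apply returns in order, withdrawing at the start of each year."""
    balance = start
    for r in returns:
        balance = max(balance - withdrawal, 0.0)
        balance *= 1.0 + r
    return balance


bad_first = [-0.25, 0.05, -0.08, 0.12, 0.03]  # the sequence from the text
bad_last = list(reversed(bad_first))          # same returns, opposite order

# No cash flows: multiplication commutes, so the order is irrelevant.
lump_first = end_balance(1_000_000, bad_first)
lump_last = end_balance(1_000_000, bad_last)

# With a fixed $40K withdrawal each year, the early crash leaves less behind.
drawn_first = end_balance(1_000_000, bad_first, withdrawal=40_000)
drawn_last = end_balance(1_000_000, bad_last, withdrawal=40_000)

print(f"no withdrawals:   {lump_first:,.0f} vs {lump_last:,.0f}")
print(f"with withdrawals: {drawn_first:,.0f} vs {drawn_last:,.0f}")
```

With withdrawals, the crash-first order ends at roughly $628K and the crash-last order at roughly $679K, even though both used exactly the same five returns. Stretch the gap between the crash and the end of the horizon to 25 years and the difference compounds.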

The Trinity Study (the 4% rule's foundation) captured this — the historical 4%-fail cases all happened to retirees who started just before 1929, 1937, or 1973. Same average returns over the full 30 years; the timing was the killer.

This is why FI thinkers obsess about the first 5 years of retirement. It's also why a "bridge bucket" of 2-3 years of real spending in cash/short-bonds is a meaningful risk reducer — it lets you avoid selling stocks in down years.

What a "failed" simulation actually looks like

"Plan failed at year 23" means the portfolio hit zero. In real life, nobody passively withdraws to zero. They adjust:

Cut spending 20-30%
Pick up part-time / consulting work
Sell the house and downsize
Move to a lower-cost geography
Tap home equity through HELOC or reverse mortgage
Lean on Social Security earlier (suboptimal but available)

Each of these is a real lever, and a sophisticated planner models them. Most consumer simulators don't. So on this dimension the displayed failure rate overstates your risk: holding the return assumptions fixed, your real-life failure rate is lower because you'd actually adapt.

This is also why simulators that let you input variable spending (e.g., "spend 4% of remaining portfolio annually" rather than "spend $120K real annually") show dramatically higher success rates. A percent-of-portfolio rule mathematically can't run out of money, because you only ever withdraw a fraction of whatever is left. The catch is that what you spend in bad years can drop uncomfortably low.
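The trade-off between the two spending rules is easy to see on a single bad path. The sketch below is an illustration, not a planning recommendation: the $3M starting balance and the return path (a deep early drawdown followed by flat real returns) are hypothetical, and run_rule is an invented helper.

```python
def run_rule(start, returns, fixed_spend=None, pct_spend=None):
    """Return (year-end balances, yearly spending) for one spending rule."""
    balance, balances, spending = start, [], []
    for r in returns:
        if fixed_spend is not None:
            spend = min(fixed_spend, balance)  # cannot spend more than is left
        else:
            spend = pct_spend * balance        # always a fraction of what is left
        balance = (balance - spend) * (1.0 + r)
        balances.append(balance)
        spending.append(spend)
    return balances, spending


# Hypothetical bad path: three down years, then 27 years of 0% real return.
returns = [-0.30, -0.15, -0.10] + [0.0] * 27

fixed_bal, fixed_spent = run_rule(3_000_000, returns, fixed_spend=120_000)
pct_bal, pct_spent = run_rule(3_000_000, returns, pct_spend=0.04)

fixed_depleted_year = next(
    (year for year, b in enumerate(fixed_bal, start=1) if b == 0), None)
print("fixed $120K rule depleted in year:", fixed_depleted_year)
print(f"4% rule final balance: {pct_bal[-1]:,.0f}")
print(f"4% rule spending, first vs last year: "
      f"{pct_spent[0]:,.0f} vs {pct_spent[-1]:,.0f}")
```

On this path the fixed rule is out of money in year 15, while the percentage rule still has a balance at year 30 but is spending under $20K a year by then, about a sixth of where it started. The plan "succeeds" only in the simulator's narrow sense of the word.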

The numbers worth running yourself

Three sanity-check simulations every operator approaching FI should run:

Base case: your current contribution rate, target retirement year, target spend, current allocation. What's the success rate? What's the P10 trajectory?
Bear case: reduce expected stock return by 1.5 percentage points (a proxy for high starting valuations). Same other inputs. Success rate?
Longevity stress: add 5 years to your retirement horizon (live to 95 instead of 90). Same other inputs. Success rate? If your P10 line survives the base-case horizon but runs out a few years into the extension, you have a longevity-risk gap that needs work.

If all three scenarios show >80% success and a P10 that stays positive, your plan is genuinely robust. If your base case is 92% but your bear case is 67%, you're betting heavily on returns matching the historical mean.
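If your simulator makes it tedious to rerun scenarios, the three checks can be approximated in a short script. This is a rough sketch under loud assumptions: retirement phase only, a 60/40 mix rebalanced annually (so the portfolio return is a single normal draw with a blended mean and standard deviation), the article's return figures, and a hypothetical $2M / $80K plan. Each path gets its own seed so all three scenarios see the same random draws and differ only because of their inputs.

```python
import math
import random

def success_rate(spend, start, horizon, stock_mean, stock_weight=0.6,
                 n_paths=5_000, seed=11):
    """Share of retirement-phase paths that end with a positive balance."""
    # Assumed parameters: the article's bond mean, volatilities, and correlation.
    bond_mean, stock_sd, bond_sd, rho = 0.020, 0.184, 0.058, -0.10
    w, b = stock_weight, 1.0 - stock_weight
    mean = w * stock_mean + b * bond_mean
    stdev = math.sqrt((w * stock_sd) ** 2 + (b * bond_sd) ** 2
                      + 2 * rho * w * b * stock_sd * bond_sd)
    survived = 0
    for i in range(n_paths):
        # Per-path seed (common random numbers across scenarios).
        rng = random.Random(seed * 1_000_003 + i)
        balance = start
        for _ in range(horizon):
            balance -= spend
            if balance <= 0:
                break  # depleted before the horizon ended
            balance *= 1.0 + max(rng.gauss(mean, stdev), -1.0)
        else:
            survived += balance > 0
    return survived / n_paths


plan = dict(spend=80_000, start=2_000_000)  # hypothetical plan
scenarios = {
    "base": dict(horizon=30, stock_mean=0.067),
    "bear (stocks -1.5 pts)": dict(horizon=30, stock_mean=0.052),
    "longevity (+5 years)": dict(horizon=35, stock_mean=0.067),
}
results = {name: success_rate(**plan, **s) for name, s in scenarios.items()}
for name, rate in results.items():
    print(f"{name:24s} {rate:.1%}")
```

The size of the gap between the base line and the other two is the useful output. A plan whose success rate barely moves across the three is far less dependent on history repeating than one that drops by twenty points.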

What HELM's Monte Carlo specifically does

HELM's Monte Carlo cash-flow forecast runs 10,000 paths client-side using Box-Muller normal sampling with a -0.10 stock-bond correlation. Inputs you control: starting portfolio, annual contribution + growth rate, years to retirement, total horizon, retirement spend, asset mix (stocks/bonds/cash). Outputs: success/failure rate, final P10/P50/P90 values, an SVG chart with shaded P10–P90 band + retirement-year marker, milestone table at year 5/10/15/20/retire/end.

The math is real and matches industry-standard simulator output for the same inputs. What we explicitly don't model (yet): variable-spending rules, healthcare-cost trajectories, Social Security integration, tax drag during withdrawal phase, sequence-stress scenarios with bear-market starting points. Those are queued for v1.5+.

The visible P10 band is the part most consumer simulators bury. Looking at "your worst 10% of paths" alongside the success rate gives you a more honest picture of your plan than either number alone.

Run the Monte Carlo on your own portfolio data.

10,000 simulations · Box-Muller normal sampling · sequence-of-returns risk visible in the lower band · saved scenarios for revisiting. Educational only — confirm any retirement decision with a CFP. Founding-25 lock $79/mo for the lifetime of the subscription.


Disclosure: Vantage Digital LLC publishes this post and builds HELM. We sell software that includes a Monte Carlo simulator. Numerical illustrations above use simplified historical-mean assumptions for explanatory purposes; your actual outcome depends on returns, sequence, tax drag, fees, and behavioral choices we don't model.

Educational only. Monte Carlo simulation is a planning tool, not a forecast. Confirm retirement-readiness decisions with a licensed CFP. HELM is software, not an investment advisor.
