Lyft: Quality Engineer → Developer Experience PM
When to stop optimizing for conventional metrics and start optimizing for what actually matters.
Developer Experience · Internal Tools · Simulation · Systems Thinking · Validation Strategy · Platform Reliability

Challenge
- Major driver workflow change touched pricing, dispatch, and earnings services across regions
- Weekly SEVs tied to earnings/pricing logic exposed gaps in existing validation
- Critical bugs slipped past manual regression suites and UI automation
- Severe failures emerged only when multiple services interacted under real-world conditions
- Needed to move beyond “maximize coverage” and prevent high-stakes system failures before production
Role
- System Quality Engineer → Internal Tools Product Manager
- Set quality strategy for complex, multi-service driver systems
- Created alignment on where validation effort mattered most (failure modes over coverage)
- Identified simulation + metrics as the highest-leverage solution
- Owned roadmap, reliability, and adoption of the internal simulation tool
Approach & Decisions
UI automation → acceptance tests → still not enough → simulation + metrics
Used evidence to question UI automation ROI
Reviewed six months of Jira bugs and SEVs to see what automation could realistically have prevented (triage sketched below).
- UI tests would have caught only a small fraction of high-impact issues
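A minimal sketch of that retrospective triage, assuming a hypothetical `Incident` record distilled from Jira exports; the fields and the catchability rule here are illustrative, not the actual schema:

```python
from dataclasses import dataclass

@dataclass
class Incident:
    severity: int        # 1 = SEV-worthy, 2 = high-impact, 3+ = minor
    layer: str           # "ui", "single-service", "cross-service"
    deterministic: bool  # reproducible from one service's inputs alone?

def ui_automation_could_catch(inc: Incident) -> bool:
    # UI suites only reliably catch deterministic failures visible at the UI layer.
    return inc.layer == "ui" and inc.deterministic

# Illustrative sample standing in for six months of triaged tickets.
incidents = [
    Incident(1, "cross-service", False),  # pricing x dispatch interaction
    Incident(2, "single-service", True),  # earnings rounding bug
    Incident(1, "ui", True),              # broken trip-confirmation screen
    Incident(2, "cross-service", False),  # region-specific fare edge case
]

high_impact = [i for i in incidents if i.severity <= 2]
caught = sum(ui_automation_could_catch(i) for i in high_impact)
print(f"UI automation would have caught {caught}/{len(high_impact)} high-impact issues")
```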
Named where automation breaks down
The worst failures came from real-world combinations (region logic, ride types, pricing edge cases) that tests can’t reliably cover, as the sketch after this list shows.
- Multi-service interactions
- Edge-case condition explosions
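Why edge-case explosions defeat coverage-maximizing suites: the scenario space is a product of independent dimensions, so every new dimension multiplies the test count. The dimensions below are hypothetical but shaped like the real ones:

```python
from itertools import product

# Hypothetical dimensions; production systems had more of each.
regions    = ["us-west", "us-east", "canada", "latam"]
ride_types = ["standard", "shared", "xl", "lux"]
pricing    = ["base", "surge", "promo", "toll", "airport-fee"]
statuses   = ["online", "en-route", "on-trip"]

scenarios = list(product(regions, ride_types, pricing, statuses))
print(f"{len(scenarios)} scenarios from only 4 dimensions")  # 4*4*5*3 = 240

# Each added dimension multiplies the space, and the worst bugs live in
# specific combinations -- exhaustive scripted coverage cannot keep up.
```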
Rebalanced the validation strategy
Shifted effort away from flaky UI tests toward deterministic backend acceptance tests (example sketched below).
- Automate what’s deterministic
- Acknowledge the limits of coverage
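What “automate what’s deterministic” looks like in practice, as a sketch: fixed inputs, exact expected outputs, no browser in the loop. `quote_fare` is a toy stand-in, not the real pricing API:

```python
def quote_fare(distance_km: float, surge: float) -> int:
    """Toy pricing function standing in for a backend endpoint; returns cents."""
    base_cents, per_km_cents = 250, 120
    return round((base_cents + per_km_cents * distance_km) * surge)

# Deterministic acceptance checks: same inputs always give the same answer,
# so a failure points at a real behavior change rather than test flakiness.
def test_surge_is_applied():
    assert quote_fare(10.0, surge=1.5) == 2175

def test_no_surge_matches_base_rate():
    assert quote_fare(10.0, surge=1.0) == 1450

if __name__ == "__main__":
    test_surge_is_applied()
    test_no_surge_matches_base_rate()
    print("acceptance checks passed")
```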
Made simulation the center of gravity
Postmortems + metrics made it clear: system-level simulation paired with alerts was the only reliable early-warning mechanism (core pattern sketched after this list).
- Surface failures before launch
- Detect behavioral changes across services
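The early-warning pattern, reduced to a sketch in which the metrics, tolerance, and `simulate` stub are all hypothetical: run identical simulated trip load against the baseline and candidate builds, then alert when an aggregate metric drifts.

```python
import random

TOLERANCE = 0.02  # alert on >2% relative drift

def simulate(build: str, seed: int = 42) -> dict:
    """Toy stand-in for a rider-driver simulation run against one build."""
    rng = random.Random(seed)
    fares = [rng.uniform(8.0, 30.0) for _ in range(1000)]
    return {
        "avg_fare_usd": sum(fares) / len(fares),
        # Injected regression in the candidate build, for illustration.
        "match_rate": 0.93 if build == "baseline" else 0.89,
    }

baseline, candidate = simulate("baseline"), simulate("candidate")
for metric, base_val in baseline.items():
    drift = abs(candidate[metric] - base_val) / base_val
    if drift > TOLERANCE:
        print(f"ALERT: {metric} drifted {drift:.1%} "
              f"(baseline {base_val:.3f} -> candidate {candidate[metric]:.3f})")
```

The same comparison runs pre-release, so a behavioral regression like the dropped match rate above surfaces before launch instead of as a production SEV.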
Productized the solution for adoption
Took ownership of the rider–driver simulation tool and improved reliability, usability, and adoption under constraints.
- Roadmap from user feedback + usage data
- Better support loops and documentation
Outcomes
- Earlier detection: shifted discovery of high-risk failures from post-launch to pre-release
- Prevented recurring SEVs with metrics, alerts, and simulation checks automation couldn’t cover
- Improved reliability/adoption of internal validation tooling (+10%)
- Reduced support load and on-call noise (−30%), translating to ~$150K in validated quarterly savings
Learnings
- Judgment beats tools: redirect effort when it stops creating value
- Leverage comes from understanding system interactions, not optimizing a single layer
- Evidence is the fastest way to influence and align without authority
- Internal tools need product rigor: clear users, workflows, metrics, and positioning