Designing and evaluating metrics

27-08-2025

  • https://medium.com/@seanjtaylor/designing-and-evaluating-metrics-5902ad6873bf
  • measurement forms the basis of science
    • investments in our ability to capture data and measure outcomes often precede step-function changes in our understanding of the world and the ability to better solve problems
  • five properties of metrics
    • cost
      • you can measure anything if you are willing to pay an arbitrary cost (money, time, resources, technical debt)
    • simplicity
      • the worst metric is one that people mistrust, second-guess, or ignore
    • faithfulness
      • measurements may fail to accurately represent the thing you care about
      • metrics without construct validity measure the wrong thing (human-labelled data can be misleading as different people make different observations)
      • measures with sampling bias measure the right thing for the wrong set of units (e.g. people, items, events, etc.)
    • precision
      • transformations (taking logs, winsorizing, variance stabilising, discretising continuous outcomes)
      • normalisations (i.e. if both numerator and denominator are skewed then the ratio will be less noisy)
      • sums or averages (esp. of a few uncorrelated ways of measuring the same thing)
    • causal proximity
      • when causal proximity is low you are unlikely to move the metric with your changes, because a sequence of intermediate outcomes must occur first (so low-proximity metrics like profit or revenue are very ineffective)
      • prefer metrics with high causal proximity, and describe a theory of change that links your actions to the desired outcomes (even at some cost to faithfulness)
  • metric design
    • proxy metrics
      • acknowledge that these may measure things we don't directly care about, but on which we can detect effects
    • surrogate metrics
      • estimates of long-term outcomes from short-term metrics
    • metric design is iterative and cross-functional
    • don't just pick metrics because they are cheap or convenient
    • people believe metrics when a small number of examples agree with their intuitions; showing the metric moving in the expected direction for known good or bad changes helps build initial trust
    • bad metrics should be excluded from experimental results (they reduce the signal-to-noise ratio)
    • for many metrics there is a point of saturation, keep Goodhart's Law in mind ("When a measure becomes a target, it ceases to be a good measure") - https://en.wikipedia.org/wiki/Goodhart%27s_law
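
The precision techniques noted above (winsorizing, log transforms, averaging uncorrelated measurements) can be sketched in Python as follows. This is a minimal illustration on simulated data, not a method taken from the source article: the helper names (`winsorize`, `log_transform`, `coefficient_of_variation`), the 5%/95% cut-offs, and the log-normal "spend" distribution are all assumptions chosen for the example.

```python
import math
import random
import statistics


def winsorize(values, lower_pct=0.05, upper_pct=0.95):
    """Clamp values outside the given percentiles to the percentile values."""
    ordered = sorted(values)
    n = len(ordered)
    lo = ordered[int(lower_pct * (n - 1))]
    hi = ordered[int(upper_pct * (n - 1))]
    return [min(max(v, lo), hi) for v in values]


def log_transform(values):
    """Apply log(1 + x), which compresses a long right tail; assumes x >= 0."""
    return [math.log1p(v) for v in values]


def coefficient_of_variation(values):
    """Standard deviation relative to the mean: a rough, scale-free noise measure."""
    return statistics.pstdev(values) / statistics.mean(values)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical heavy-tailed outcome (e.g. spend per user), simulated as log-normal.
spend = [rng.lognormvariate(0, 1.5) for _ in range(10_000)]

cv_raw = coefficient_of_variation(spend)
cv_winsorized = coefficient_of_variation(winsorize(spend))
cv_logged = coefficient_of_variation(log_transform(spend))

# Averaging k measurements of the same quantity with uncorrelated errors
# shrinks the noise by roughly 1 / sqrt(k).
true_value = 10.0
k = 4
single = [true_value + rng.gauss(0, 1) for _ in range(10_000)]
averaged = [
    statistics.mean(true_value + rng.gauss(0, 1) for _ in range(k))
    for _ in range(10_000)
]
sd_single = statistics.pstdev(single)
sd_averaged = statistics.pstdev(averaged)

print(f"CV raw={cv_raw:.2f} winsorized={cv_winsorized:.2f} logged={cv_logged:.2f}")
print(f"SD single={sd_single:.2f} averaged over {k}={sd_averaged:.2f}")
```

Under these assumptions the winsorized and logged versions should show a lower coefficient of variation than the raw values, and averaging four uncorrelated measurements should roughly halve the standard deviation. Comparing coefficients of variation across transformed scales is only a rough guide; what matters in practice is whether the transformed metric still answers the question you care about (faithfulness).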