outcome-dependent weighting in randomized experiments

Posted by Yuling Yao on Apr 08, 2026.       Tag: causal  

In causal inference, we usually think of inverse-probability weighting (IPW) as using weights that are functions of treatment assignment $z$ and pretreatment variables $x$. But mathematically there is nothing preventing us from considering weights that are functions of the observed outcome $y$.

Start with the simplest case: binary treatment $z\in\{0,1\}$, equal randomization, and target \(\tau = E(Y\mid Z=1)-E(Y\mid Z=0).\)

The usual estimator is the difference in sample means, \(\hat\tau=\bar y_1-\bar y_0.\)

Equivalently, this comes from the usual IPW summand; here the weight is just a constant: \(\psi(Z,Y)=2ZY-2(1-Z)Y.\)
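A quick numerical check of this equivalence, on simulated data (the data-generating process below is my illustration, not from the post). With exactly balanced groups, the sample average of $\psi$ coincides with the difference in means; under Bernoulli randomization the two only agree in expectation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# exactly balanced randomization: n/2 treated, n/2 control
z = rng.permutation(np.repeat([0, 1], n // 2))
y = z * 1.0 + rng.normal(size=n)          # true tau = 1

# classical difference in sample means
tau_dm = y[z == 1].mean() - y[z == 0].mean()

# IPW summand psi(Z, Y) = 2ZY - 2(1-Z)Y, averaged over the sample
psi = 2 * z * y - 2 * (1 - z) * y
tau_ipw = psi.mean()

print(tau_dm, tau_ipw)                    # identical up to rounding
```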

But we can also think of the problem differently. Suppose we want to compute the target $\tau$ using draws from the mixture distribution \(0.5\,p(Y\mid Z=1)+0.5\,p(Y\mid Z=0),\) which is the marginal outcome distribution we observe under equal randomization. Then we can define the outcome-space importance weight \(w(y)=4P(Z=1\mid Y=y)-2 = \frac{p(Y\mid Z=1)-p(Y\mid Z=0)}{0.5\,p(Y\mid Z=1)+0.5\,p(Y\mid Z=0)}.\)

An importance-weighted estimator of the ATE is then the sample average of $y_i w(y_i)$. The weight depends on $y$, but that is fine. The estimator is not only unbiased, like the usual IPW estimator; it is guaranteed not to increase the variance, because \(E[\psi(Z,Y)\mid Y=y]=y\,w(y).\) In other words, outcome-space importance weighting is just the Rao–Blackwellization of the usual randomized-experiment estimator.

If $w(y)$ were known, this weighted estimator would have lower variance. So yes, even in a randomized experiment, you can go beyond ordinary IPW and let the weight be a function of the outcome.
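A small simulation sketch of this oracle case. I assume (my choice, not the post's) $Y\mid Z=1 \sim N(1,1)$ and $Y\mid Z=0 \sim N(0,1)$, so $w(y)$ can be computed exactly from the two densities; the Rao–Blackwellized estimator should show visibly smaller variance across replications.

```python
import numpy as np

def normal_pdf(x, mu):
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2 * np.pi)

def w(y):
    # oracle outcome-space weight for Y|Z=1 ~ N(1,1), Y|Z=0 ~ N(0,1)
    p1, p0 = normal_pdf(y, 1.0), normal_pdf(y, 0.0)
    return (p1 - p0) / (0.5 * p1 + 0.5 * p0)

rng = np.random.default_rng(1)
n, reps = 200, 2000
ipw_est, rb_est = [], []
for _ in range(reps):
    z = rng.integers(0, 2, size=n)                    # Bernoulli(1/2) assignment
    y = rng.normal(loc=z.astype(float), scale=1.0)    # true tau = 1
    ipw_est.append(np.mean(2 * z * y - 2 * (1 - z) * y))
    rb_est.append(np.mean(y * w(y)))                  # ignores z entirely

print(np.var(ipw_est), np.var(rb_est))  # second variance is smaller
```

Both estimators center on $\tau = 1$; only the spread differs, exactly as the conditional-variance decomposition predicts.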

But you have to estimate the weight

So far this sounds like a neat trick. But then comes the paradox.

In the same simple setup, the difference in sample means is also the MLE under the standard two-group model. It then becomes harder to imagine that a weighted estimator could outperform the MLE. Suppose you run a regression $y\sim z$, and from that compute the conditional outcome model $p(y\mid z)$. Then the plug-in estimator from this regression is the MLE, which in this case is just the difference of group means. By contrast, the weighted estimator seems to take the data, fit something extra, and then inject additional Monte Carlo noise. By the asymptotic efficiency of the MLE, such a weighted estimator cannot beat the MLE, and is generically worse, hence worse than the classical randomized-experiment estimator.

Two guarantees, pulling in opposite directions

So the story is really driven by two different guarantees, plus a loophole.

  1. If $w(y)$ is known, the $w(y)$-weighted estimator is a genuine Rao–Blackwell improvement over the usual IPW estimator.

  2. If $w(y)$ must be estimated, there is no longer any automatic variance-reduction result. In particular, if you compute $w(y)$ from a generative model by estimating $p(y\mid z)$, then you cannot beat the MLE, so the final weighted estimator is asymptotically worse.

  3. If $w(y)$ is estimated from the reverse (discriminative) regression $Z\mid Y$, we are no longer obviously in the same model comparison. We are using a different factorization of the joint distribution, and the usual “MLE wins” argument is no longer automatic. At that point, it is no longer clear which estimator is better.
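A sketch of the third option: estimate $P(Z=1\mid Y=y)$ by a logistic regression of $Z$ on $Y$, then plug $\hat w(y)=4\hat P(Z=1\mid Y=y)-2$ into the weighted estimator. In the toy normal model I use below ($Y\mid Z=z \sim N(z,1)$, my assumption), the logistic link happens to be correctly specified, since the true logit is $y - 0.5$; nothing here settles the general comparison, it only shows the discriminative route is workable.

```python
import numpy as np

def fit_logistic(y, z, iters=25):
    # Newton-Raphson for logit P(Z=1|Y=y) = a + b*y
    # (a minimal sketch; any logistic-regression routine would do)
    X = np.column_stack([np.ones_like(y), y])
    beta = np.zeros(2)
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ beta))
        grad = X.T @ (z - p)
        hess = (X * (p * (1 - p))[:, None]).T @ X
        beta += np.linalg.solve(hess, grad)
    return beta

rng = np.random.default_rng(2)
n = 5000
z = rng.integers(0, 2, size=n)
y = rng.normal(loc=z.astype(float))       # Y|Z=z ~ N(z,1): true logit is y - 0.5

beta = fit_logistic(y, z.astype(float))
p_hat = 1 / (1 + np.exp(-(beta[0] + beta[1] * y)))
w_hat = 4 * p_hat - 2                     # estimated outcome-space weight
tau_hat = np.mean(y * w_hat)              # plug-in weighted estimate of tau
print(beta, tau_hat)
```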

So there is a real statistical point here. Outcome-dependent weighting is kosher. In principle it can improve the classical randomized-experiment estimator because it uses more information. But once the weights are estimated, we leave the tidy Rao–Blackwell world and enter a messier reality: bias-variance tradeoffs, model misspecification, and no universal winner.