When dealing with random variables, we're often concerned with bounding the probability of large deviations. For example, if the random variable is the number of parking tickets I get next year, I may be interested in bounding .
Markov's inequality for a nonnegative random variable bounds the probability of large deviation when we know that :
A typical proof follows from the fact that
where is the probability mass function of the distribution.
But what if we know more than just ? How do we incorporate the fact that we have bounds on the third moment? Or that we know ? In this post, we'll define probability bounds as an optimization problem in which we can incorporate additional prior information. We'll also see that Markov's inequality can be derived from the dual of this optimization problem.
Note that this post is similar in flavor and techniques to one of my previous posts. When in doubt, take the dual...
When we only have mean information, we can formulate finding a bound on as a simple optimization problem:
where the optimization variable is the probability distribution . Note that the objective and constraint are both linear in .
For simplicity, we'll make take nonnegative discrete values , so the distribution is a nonnegative vector that sums to . The optimization problem can be formulated as
To derive a bound on the objective, we will construct a dual problem, since any feasible point in the dual provides a valid upper bound on the primal (by weak duality). In fact, this problem is a linear program (a type of convex optimization problem), so a minimizer of the dual problem will give a tight bound.
We take a particular dual where we keep the nonnegativity constraint intact. This approach results in the (partial) Lagrangian
where the indicator function equals if and otherwise, and is in the first entries and otherwise. We find the dual function by maximizing over the primal variable :
Thus, the dual problem is
Recall that, by weak duality, any feasible point for this problem gives a valid upper bound on the primal objective . One such point is , . These values satisfy the constraints since for . This point has objective value , which recovers Markov's inequality.
We can also show that this inequality is tight (for example, one can find a solution to the primal problem that matches the bound).However, Markov's inequality is usually a pretty terrible bound for practical purposes. Just knowing the mean of a random variable isn't much information.
To get a better bound, we can incorporate additional primal information with this optimization-based framework. In fact, we can compute any bound of the form
for arbitrary functions , by simply solving a linear program. Similarly to in a previous post, we can extend these ideas to the continuous case with a little more care.
While a computational approach is not always appropriate, in many situations we can do better than the bounds given by the standard set of inequalities. Furthermore, computational bounds can scale to higher dimensions and more easily incorporate non-i.i.d. information.
Thank you to Kshitij Kulkarni for reviewing a draft of this post!