Let $g(X)$ be a function of $m$ variables $X = (X_1, \ldots, X_m)$ over a domain $\Omega \subseteq R^m$, such that computing $g(X)$ for any $X$ is feasible. Consider the problem of approximate computation of the integral

$$I = \int_\Omega g(X)\, dX\,. \qquad (1)$$
Importance sampling approaches this problem by writing the integral (1) as

$$I = \int_\Omega \frac{g(X)}{f(X)}\, f(X)\, dX\,, \qquad (2)$$

where $f(X)$, the importance function, is a probability density function over $\Omega$ that is positive wherever $g(X) \neq 0$. After we have independently sampled $n$ points $s_1, s_2, \ldots, s_n$ according to the probability density function $f(X)$, we can estimate the integral $I$ by

$$\hat{I}_n = \frac{1}{n} \sum_{i=1}^{n} \frac{g(s_i)}{f(s_i)}\,, \qquad (3)$$
and estimate the variance of $\hat{I}_n$ by

$$\hat{\sigma}_n^2 = \frac{1}{n(n-1)} \sum_{i=1}^{n} \left( \frac{g(s_i)}{f(s_i)} - \hat{I}_n \right)^2\,. \qquad (4)$$

It is straightforward to show that this estimator has the following properties: it is unbiased, $E[\hat{I}_n] = I$; it converges to $I$ almost surely as $n \to \infty$; and its variance is

$$\mathrm{Var}[\hat{I}_n] = \frac{\sigma^2}{n}\,, \quad \text{where} \quad \sigma^2 = \int_\Omega \frac{(g(X) - I f(X))^2}{f(X)}\, dX\,,$$

which is the quantity that $\hat{\sigma}_n^2$ in Equation 4 estimates.
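The estimator and variance estimate above can be sketched in a few lines of Python. This is a minimal illustration under assumptions of our own choosing: the toy integrand $g(x) = x e^{-x}$ on $[0, \infty)$ (so $I = 1$) and the importance density $f(x) = e^{-x}$ are not from the text.

```python
import math
import random

def importance_sampling(g, f_pdf, f_sample, n, seed=0):
    """Estimate the integral of g by averaging the weights g(s)/f(s)
    over n points s drawn from f, and estimate the estimator's variance."""
    rng = random.Random(seed)
    weights = [g(s) / f_pdf(s) for s in (f_sample(rng) for _ in range(n))]
    i_hat = sum(weights) / n                                          # Eq. 3
    var_hat = sum((w - i_hat) ** 2 for w in weights) / (n * (n - 1))  # Eq. 4
    return i_hat, var_hat

# Toy problem: I = \int_0^inf x e^{-x} dx = 1, importance density f(x) = e^{-x}.
i_hat, var_hat = importance_sampling(
    g=lambda x: x * math.exp(-x),
    f_pdf=lambda x: math.exp(-x),
    f_sample=lambda rng: rng.expovariate(1.0),
    n=100_000,
)
```

With this choice of $f$, the weight reduces to $g(s)/f(s) = s$, so the estimate is simply the sample mean of exponential draws and lands close to the true value $I = 1$.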
Although in practice sampling from precisely $f(X) = g(X)/I$ will occur rarely, we expect that functions close enough to it can still reduce the variance effectively. Usually, the closer the shape of the function $f(X)$ is to the shape of the function $g(X)$, the smaller is $\hat{\sigma}_n^2$. In high-dimensional integrals, selection of the importance function $f(X)$ is far more critical than increasing the number of samples, since the former can dramatically affect $\hat{\sigma}_n^2$. It seems prudent to put more energy into choosing an importance function whose shape is as close as possible to that of $g(X)$ than to apply the brute-force method of increasing the number of samples.
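The effect of matching the shape of $f$ to $g$ can be seen on a small one-dimensional example (the integrand and both importance densities below are assumptions chosen for illustration): estimating $I = \int_0^1 3x^2\,dx = 1$, a density proportional to $x$ tracks the integrand's shape far better than a uniform one and yields a visibly smaller variance estimate at the same sample size.

```python
import random

def is_estimate(weight, sampler, n, rng):
    # weight(s) computes g(s)/f(s); sampler(rng) draws one point from f.
    ws = [weight(sampler(rng)) for _ in range(n)]
    i_hat = sum(ws) / n
    var_hat = sum((w - i_hat) ** 2 for w in ws) / (n * (n - 1))
    return i_hat, var_hat

rng = random.Random(1)
n = 50_000
# Target: I = \int_0^1 3x^2 dx = 1.
# (a) uniform f(x) = 1 on [0, 1]; weight g/f = 3x^2.
i_u, v_u = is_estimate(lambda x: 3 * x * x, lambda r: r.random(), n, rng)
# (b) f(x) = 2x on [0, 1], closer in shape to g; sampled by inverting the
#     CDF F(x) = x^2, i.e. x = sqrt(u); weight g/f = 3x^2 / (2x) = 1.5x.
i_c, v_c = is_estimate(lambda x: 1.5 * x, lambda r: r.random() ** 0.5, n, rng)
# Analytically, the per-sample variance is 0.8 in case (a) but 0.125 in
# case (b), so (b) needs over six times fewer samples for the same accuracy.
```

Both estimates are close to 1, but the variance estimate `v_c` for the shape-matched density comes out several times smaller than `v_u`.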
It is worth noting here that if $f(X)$ is uniform, importance sampling reduces to general Monte Carlo sampling. Another noteworthy property of importance sampling, which can be derived from Equation 4, is that we should avoid $f(X) \ll |g(X) - I f(X)|$ in any part of the domain of sampling, even if $f(X)$ matches $g(X)/I$ well in the important regions. If $f(X) \ll |g(X) - I f(X)|$, the variance can become very large or even infinite. We can avoid this by adjusting $f(X)$ to be larger in the unimportant regions of the domain of $X$.
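This failure mode can be demonstrated on a one-dimensional toy problem (the integrand and both importance densities are assumptions chosen for illustration): for the target $\int_0^\infty e^{-x}\,dx = 1$, an importance density with thinner tails than the integrand makes the weight $g/f$ unbounded and the variance infinite, while a heavier-tailed density keeps the weight bounded.

```python
import math
import random

def is_estimate(weight, sampler, n, rng):
    # weight(s) computes g(s)/f(s); sampler(rng) draws one point from f.
    ws = [weight(sampler(rng)) for _ in range(n)]
    i_hat = sum(ws) / n
    var_hat = sum((w - i_hat) ** 2 for w in ws) / (n * (n - 1))
    return i_hat, var_hat

rng = random.Random(7)
n = 100_000
# Target: I = \int_0^inf e^{-x} dx = 1.
# (a) f(x) = 2 e^{-2x} decays FASTER than g in the "unimportant" tail:
#     the weight g/f = e^x / 2 is unbounded and its variance is infinite,
#     so the estimate stays erratic no matter how large n is.
i_thin, v_thin = is_estimate(lambda x: math.exp(x) / 2,
                             lambda r: r.expovariate(2.0), n, rng)
# (b) f(x) = 0.5 e^{-x/2} decays SLOWER than g: the weight 2 e^{-x/2}
#     is bounded by 2 and the per-sample variance is finite (1/3).
i_heavy, v_heavy = is_estimate(lambda x: 2 * math.exp(-x / 2),
                               lambda r: r.expovariate(0.5), n, rng)
```

Across runs, `i_heavy` lands reliably near 1, while `i_thin` fluctuates from run to run and its reported variance estimate is dominated by rare huge weights, which is exactly the danger the inequality above warns against.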
While in this section we discussed importance sampling for continuous variables, the results stated are valid for discrete variables as well, in which case integration should be substituted by summation.
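As a small illustration of the discrete case (the series and the sampling distribution are our own choices, not from the text), one can estimate $\sum_{k \ge 1} 1/k^2 = \pi^2/6$ by sampling $k$ from the pmf $f(k) = 1/(k(k+1))$, which admits exact inverse-CDF sampling:

```python
import math
import random

# Discrete importance sampling: estimate S = sum_{k>=1} 1/k^2 = pi^2/6.
# Importance pmf: f(k) = 1/(k(k+1)) for k >= 1 (it sums to 1 telescopically).
# Since P(K >= k) = 1/k, setting K = floor(1/U) with U uniform on (0, 1]
# samples exactly from f.
rng = random.Random(42)
n = 100_000
total = 0.0
for _ in range(n):
    u = 1.0 - rng.random()    # uniform on (0, 1]
    k = int(1.0 / u)          # K = floor(1/U) >= 1, distributed as f
    total += (k + 1) / k      # weight g(k)/f(k) = (1/k^2) * k(k+1)
s_hat = total / n             # estimate of pi^2/6 = 1.6449...
```

Here the weight $(k+1)/k$ is bounded between 1 and 2, so the variance is small and the estimate is stable, illustrating the same shape-matching principle in the discrete setting.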