Monday, August 12, 2024

Important Sampling: intuitive explanation

We'd like to use Monte Carlo method to estimate the integral by randomly sampling from a distribution and take their average. But for large dataset with many dimensions, that would require taking too many samples to achieve a good standard deviation. Importance sampling is a way to reduce the sampling requirement and achieve a good standard deviation.

This overall ideas is to invent a new normal distribution q(x) around position X on the axis, where the product of probability p(x) and value f(x) is large. Then you can randomly sample from the new distribution, and let the value function be f(x) * p(x) / q(x) and then take an average.

Example:

  We have weather data of 10 years with temperature by days and its rainfall amount. The goal is to estimate the overall rainfall in a year. Let x be the temperature. p(x) is the probability of temperature given a day. f(x) is the rainfall amount given a temperature. So overall rainfall is ∫ p(x) * f(x) dx. We can iterate through all temperatures to take an average, or by Monte Carlo method - randomly sampling some temperatures (thus by distribution of p(x)) and take an average. Or by Monte Carlo method important sampling to sample from a new distribution q(x) to center at:

- where p(x) is large (where a temperature is common)

- where f(x) is large (where a rain fall is large)

- and hopefully we pick where both p(x) * f(x) are large, aka the importance values.

Take a Gaussian distribution q(x) around that region of x temperature values to centering the important values. Sample from q(x) distribution and take an average of f(x) * p(x) / q(x).  (You can take a smaller sample. That's the whole point.)

Intuitively, at the temperature x where q(x) is small (under sampled), 1/q(x) applies a higher weight. At the places where q(x) is large (over sampled), 1/q(x) applies a lower weight.

Questions from the example:

- What happens to negative temperature? Answer: we are getting p(x) for probability of a temperature - not the temperature value itself. So the value will be positive.

- How to pick the distribution q(x)? Answer: it is arbitrary and a manual step. 


Reference:

https://www.youtube.com/watch?v=C3p2wI4RAi8

https://www.youtube.com/watch?v=V8kruQjqpuw

https://www.youtube.com/watch?v=3Mw6ivkDVZc


No comments: