The problem is to generate random pixel positions from a probability distribution described by a grayscale image. To visualize the generated pixels we add a constant amount of intensity to the generated position. In the end we rescale the resulting image back to [0, 1] range. Using such a visualization, a correct generation algorithm should convergence towards the original image (or a multiple of it) as the number of generated pixels increases.
S = an array of 3-tuples (x, y, value), empty at start
cumulativeValue = 0
for each pixel (x, y)
if image(x, y) > 0
cumulativeValue += image(x, y)
append (x, y, cumulativeValue) to S
repeat desired amount of times
randomValue = random value in [0, cumulativeValue]
pick the first element s of S that has s.value >= randomValue [*]
output s
[*] Since S is an array ordered by 'value', we can use a binary search to find the element in logarithmic time.
For 10 000 000 points, the algorithm took 10 seconds on my machine.
Below is the original image.
Below you can see 10 000 random pixels.
Below you can see 100 000 random pixels.
Below you can see 1 000 000 random pixels.
Below you can see 10 000 000 random pixels.
Below you can see 100 000 000 random pixels.
I also tried different approaches to this problem which worked in 2d. They worked by subdividing the image hierarchically down to a single pixel. However, these approaches gave worse results since they needed log(n) + log(m) = log(nm) random variables per point compared to one random variable for the presented technique. After going through the implementation of these algorithms I am pretty sure that the presented technique is near optimal in practical terms.