Hash Crash Reloaded

Suppose our hash table has 1 0 3 10^3 slots. Under the simple uniform hashing assumption, what is the expected number of distinct elements that needs to be inserted into the table to get the first hash collision, i.e, a slot with two elements?


This problem is a part of the Hash Crash Series


The answer is 40.3032.

This section requires Javascript.
You are seeing this because something didn't load right. We suggest you, (a) try refreshing the page, (b) enabling javascript if it is disabled on your browser and, finally, (c) loading the non-javascript version of this page . We're sorry about the hassle.

1 solution

Christopher Boo
Jul 9, 2016

A great problem, @Agnishom Chattopadhyay .

The probability that an element collides after n n elements are inserted (means there are total of n + 1 n+1 elements) is

P ( n ) = 1000 × ( 1000 1 ) × × ( 1000 n + 1 ) 100 0 n × n 1000 P(n)=\frac{1000\times (1000-1)\times \dots \times (1000-n+1)}{1000^{n}} \times \frac{n}{1000} .

Hence, the expected value is

n = 0 999 ( n + 1 ) P ( n ) = 40.3032129262 \sum_{n=0}^{999} (n+1) P(n) = 40.3032129262

Here is the code :

1
2
3
4
5
6
7
8
9
def prob(n):
    v = 1.0
    for i in range(n):
        v = v * (1000 - i) / 1000
    return v * n / 1000

ans = sum([(n + 1) * prob(n)  for n in range(1000)])

print ans

If this is not convincing enough, let's try Monte-Carlo simulation

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
from random import randint

tst = 0
cnt = 0
while True:
    L = [0 for i in range(1001)]
    while True:
        cnt += 1
        x = randint(1,1000)
        if L[x] == 1:
            break
        L[x] = 1
    tst += 1
    if tst % 10000 == 0:
        print tst, cnt * 1.0 / tst

Which returns 40.316 .

Nice job on the calculations. Mine were pretty messy. And as always, nice to see verification by a simulation. They almost never lie.

Agnishom Chattopadhyay - 4 years, 11 months ago

0 pending reports

×

Problem Loading...

Note Loading...

Set Loading...