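Find the smallest example of (strict) Simpson's paradox; that is, construct such a table in which the total number of cases is as small as possible. Formally, suppose that $a, b, x, y$ are nonnegative integers and $A, B, X, Y$ are positive integers such that $a \le A$, $b \le B$, $x \le X$, $y \le Y$, and also $\frac{a}{A} > \frac{x}{X}$, $\frac{b}{B} > \frac{y}{Y}$, but $\frac{a+b}{A+B} < \frac{x+y}{X+Y}$. Determine the minimum value of $A + B + X + Y$.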
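To make the conditions concrete, here is a minimal Python sketch of the formal definition. The function name `is_strict_simpson` is hypothetical, and exact `fractions.Fraction` arithmetic is used (an assumption of this sketch) so that ties between rates are detected exactly rather than through floating point.

```python
from fractions import Fraction


def is_strict_simpson(a, A, b, B, x, X, y, Y):
    """Return True if (a, A, b, B, x, X, y, Y) is a strict Simpson's paradox.

    Mirrors the formal statement: a, b, x, y are nonnegative integers,
    A, B, X, Y are positive integers, numerators do not exceed their
    denominators, and all three rate comparisons are strict.
    """
    # Domain constraints from the problem statement.
    if min(a, b, x, y) < 0 or min(A, B, X, Y) < 1:
        return False
    if a > A or b > B or x > X or y > Y:
        return False
    # Exact rational comparisons, so equal rates never count as a win.
    better_in_first = Fraction(a, A) > Fraction(x, X)
    better_in_second = Fraction(b, B) > Fraction(y, Y)
    worse_combined = Fraction(a + b, A + B) < Fraction(x + y, X + Y)
    return better_in_first and better_in_second and worse_combined


# Equal rates everywhere: not a strict paradox, since 1/2 > 1/2 fails.
print(is_strict_simpson(1, 2, 1, 2, 1, 2, 1, 2))  # False
```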
Example: There are two kinds of kidney stone problems, those with small stones and those with large stones. There are also two kinds of treatments, a simple treatment and a complex treatment. For each stone/treatment combination, the table below shows the number of successful cases out of the total number of cases.
| | Small stone | Large stone | Both |
|---|---|---|---|
| Complex treatment | 81/87 (93%) | 192/263 (73%) | 273/350 (78%) |
| Simple treatment | 234/270 (87%) | 55/80 (69%) | 289/350 (83%) |
As one can see, the complex treatment performs better on small-stone cases and also on large-stone cases, but when the data are combined, the simple treatment performs better.
In the example above, a total of 700 cases are considered: 350 with the complex treatment and 350 with the simple treatment (or, split the other way, 357 small-stone cases and 343 large-stone cases). This problem asks for the minimum possible total number of cases.
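The rates and totals quoted for the kidney stone example can be rechecked directly. A minimal Python sketch (the variable and helper names are illustrative, not from the source) that recomputes each rate with exact fractions and pools the cells:

```python
from fractions import Fraction

# (successes, cases) for each treatment/stone-size cell, from the table above.
complex_treatment = {"small": (81, 87), "large": (192, 263)}
simple_treatment = {"small": (234, 270), "large": (55, 80)}


def rate(successes, cases):
    # Exact success rate as a rational number.
    return Fraction(successes, cases)


def combined(cells):
    # Pool the cells: total successes over total cases.
    return (sum(s for s, _ in cells.values()), sum(n for _, n in cells.values()))


for size in ("small", "large"):
    # Within each stone size, the complex treatment has the higher rate.
    assert rate(*complex_treatment[size]) > rate(*simple_treatment[size])

complex_total = combined(complex_treatment)  # (273, 350)
simple_total = combined(simple_treatment)    # (289, 350)
# ...but pooled over both sizes, the simple treatment has the higher rate.
assert rate(*complex_total) < rate(*simple_total)

total_cases = complex_total[1] + simple_total[1]
small_cases = complex_treatment["small"][1] + simple_treatment["small"][1]
large_cases = complex_treatment["large"][1] + simple_treatment["large"][1]
print(total_cases, small_cases, large_cases)  # 700 357 343
```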
Clarification: In the usual form of Simpson's paradox, some of the inequalities above are allowed to be weak (that is, some of them may actually hold with equality). This problem therefore asks for a stronger form of Simpson's paradox, in which none of these inequalities may be an equality.
Observe that $a, b \ge 1$. This is because if $a = 0$, then $\frac{a}{A} = 0$. But since $\frac{x}{X} \ge 0$, it cannot happen that $0 = \frac{a}{A} > \frac{x}{X} \ge 0$, so the assumption $a = 0$ is false. The same argument applies to $b$.
Also observe that $x \le X - 1$ and $y \le Y - 1$. The reasoning is similar to the above. If $x > X - 1$, then since $x \le X$ we must have $x = X$, and so $\frac{x}{X} = 1$. But since $a \le A$, we also have $\frac{a}{A} \le 1$, so it cannot happen that $1 = \frac{x}{X} < \frac{a}{A} \le 1$. Hence the assumption $x = X$ is wrong, and $x \le X - 1$. The same argument applies to $y$.
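Both observations use only the per-group strict inequality, so they can be spot-checked exhaustively for small denominators. A minimal Python sketch; the helper name `observations_hold` and the bound `max_den = 10` are arbitrary choices for illustration, and this is a sanity check rather than a proof:

```python
from fractions import Fraction
from itertools import product


def observations_hold(max_den):
    """Check a >= 1 and x <= X - 1 whenever a/A > x/X, for denominators up to max_den.

    This covers the first group; the argument for (b, B, y, Y) is identical.
    """
    for A, X in product(range(1, max_den + 1), repeat=2):
        for a, x in product(range(A + 1), range(X + 1)):
            if Fraction(a, A) > Fraction(x, X):
                # A strict win forces a nonzero numerator on the winning side
                # and a numerator below the denominator on the losing side.
                if a < 1 or x > X - 1:
                    return False
    return True


print(observations_hold(10))  # True
```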
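Let $C = A + B$ and $Z = X + Y$, so we want to minimize $C + Z$. Using $a, b \ge 1$ on the left, the combined inequality in the middle, and $x \le X - 1$, $y \le Y - 1$ on the right, we have
$$\frac{2}{C} \le \frac{a+b}{C} < \frac{x+y}{Z} \le \frac{(X-1)+(Y-1)}{Z} = \frac{Z-2}{Z},$$
or in other words $\frac{2}{C} < \frac{Z-2}{Z}$. This can be simplified:
$$\frac{2}{C} < \frac{Z-2}{Z}$$
$$\frac{2}{C} < 1 - \frac{2}{Z}$$
$$\frac{2}{C} + \frac{2}{Z} < 1$$
$$2(C+Z) < CZ \quad \text{(multiplying both sides by } CZ > 0\text{)}$$
$$8(C+Z) < 4CZ$$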
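The condition $2(C+Z) < CZ$ can also be explored numerically by listing every pair of positive integers $(C, Z)$ with small sum that satisfies it. A short Python sketch (the helper `feasible_pairs` is hypothetical); it suggests that no pair with $C + Z \le 8$ works and that the only pairs with $C + Z = 9$ are $(4, 5)$ and $(5, 4)$. The pair $(C, Z) = (5, 4)$ is the split used by the construction at the end.

```python
def feasible_pairs(max_total):
    """List (C, Z) with C, Z >= 1, C + Z <= max_total, and 2(C + Z) < C*Z."""
    return [
        (C, Z)
        for C in range(1, max_total)
        for Z in range(1, max_total - C + 1)
        if 2 * (C + Z) < C * Z
    ]


print(feasible_pairs(8))  # []
print(feasible_pairs(9))  # [(4, 5), (5, 4)]
```

Equivalently, $2(C+Z) < CZ$ rearranges to $(C-2)(Z-2) > 4$, which makes it clear why both $C$ and $Z$ must be at least $3$.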
Since $C, Z > 0$, the AM-GM inequality gives $CZ \le \left(\frac{C+Z}{2}\right)^2$, or in other words $4CZ \le (C+Z)^2$. So,
$$8(C+Z) < 4CZ \le (C+Z)^2,$$
$$8(C+Z) < (C+Z)^2.$$
Since $C + Z > 0$, we can divide both sides by it, giving $8 < C + Z$. Since $C + Z$ is an integer, $C + Z \ge 9$, so the total number of cases is at least $9$. Can we construct an example that attains this bound?
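Indeed, we can, with $(a, A, b, B, x, X, y, Y) = (1, 4, 1, 1, 0, 1, 2, 3)$. Here $\frac{1}{4} > \frac{0}{1}$ and $\frac{1}{1} > \frac{2}{3}$, but $\frac{2}{5} < \frac{2}{4}$, and the total is $4 + 1 + 1 + 3 = 9$. In table format:

| | First group | Second group | Both |
|---|---|---|---|
| First treatment | 1/4 (25%) | 1/1 (100%) | 2/5 (40%) |
| Second treatment | 0/1 (0%) | 2/3 (67%) | 2/4 (50%) |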
Thus the minimum is 9 .
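The bound and the construction can be cross-checked by exhaustive search. The following Python sketch (helper names are hypothetical) enumerates every table by its total number of cases, smallest first, and stops at the first total that admits a strict example. It is a finite check of the result, not a replacement for the proof.

```python
from fractions import Fraction
from itertools import product


def is_strict_simpson(a, A, b, B, x, X, y, Y):
    # Strict wins in both groups for the first treatment, strict loss when pooled.
    # Domain checks are omitted because tables_with_total only yields valid tuples.
    return (
        Fraction(a, A) > Fraction(x, X)
        and Fraction(b, B) > Fraction(y, Y)
        and Fraction(a + b, A + B) < Fraction(x + y, X + Y)
    )


def tables_with_total(total):
    """Yield every (a, A, b, B, x, X, y, Y) with A + B + X + Y == total."""
    for A, B, X in product(range(1, total), repeat=3):
        Y = total - A - B - X
        if Y < 1:
            continue
        # Numerators range over 0..denominator, as in the problem statement.
        for a, b, x, y in product(range(A + 1), range(B + 1), range(X + 1), range(Y + 1)):
            yield a, A, b, B, x, X, y, Y


def smallest_examples(max_total):
    """Return the least total admitting a strict example, plus all examples of that size."""
    for total in range(4, max_total + 1):
        examples = [t for t in tables_with_total(total) if is_strict_simpson(*t)]
        if examples:
            return total, examples
    return None, []


total, examples = smallest_examples(9)
print(total)  # 9
print((1, 4, 1, 1, 0, 1, 2, 3) in examples)  # True
```

Searching totals in increasing order means the first hit is automatically the minimum, so no bookkeeping for a running best is needed.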