Chi Square test for polydactyly in cats

This is part of the "DNA Fingerprint" section of the Computational Biology course. The mechanics of how you'd apply the chi-square test in the scenario aren't explained, and I would have liked more information on it since it's such a useful tool in statistical analysis. The scenario is:

You are given 2 arrays: one with gene sequences from 10 cats with polydactyly (cases) and the other with gene sequences from 10 normal cats (controls). Each gene sequence is 113 bases long. How would you go about conducting a chi-square test to find the position and nucleotide change that is the most likely cause of polydactyly? This is what I'd like to discuss.

Read on if you'd like to understand the mechanics of the code from Brilliant.

Sum up the number (frequency) of A, C, T, or G nucleotides per position for the cases and for the controls, which produces 2 separate arrays, one for cases and another for controls. Each is a 2D NumPy array with 4 rows, one per nucleotide, and each row holds 113 frequencies, one per position. Each array is of the form:

[[113 frequencies of Nucleotide A] , [113 frequencies of Nucleotide T] , [113 frequencies of Nucleotide G] , [113 frequencies of Nucleotide C]]
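The counting step above could be sketched roughly as follows. This is only an illustration, not the course's actual code: the helper name `count_nucleotides`, the toy sequences, and the assumption that the input is a list of equal-length strings are all mine. The row order A, T, G, C follows the layout shown above.

```python
import numpy as np

NUCLEOTIDES = "ATGC"  # row order assumed to match the layout described above


def count_nucleotides(sequences):
    """Return a 4 x L array of per-position nucleotide counts (hypothetical helper)."""
    # One row per cat, one column per position in the sequence.
    seqs = np.array([list(s) for s in sequences])
    # For each nucleotide, count how many cats carry it at each position.
    return np.array([(seqs == nuc).sum(axis=0) for nuc in NUCLEOTIDES])


# Toy data: 3 sequences of length 4 instead of 10 sequences of length 113.
toy_cases = ["ATGC", "ATGA", "TTGC"]
case_counts = count_nucleotides(toy_cases)
print(case_counts)
```

Every column of the result sums to the number of sequences, which is a quick sanity check that no base was missed.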

Take the corresponding values for a given nucleotide and position from the cases array and from the controls array and pass them to the "chisquare" function (from "scipy.stats") as a single array, as follows:

n,p = chisquare([cases[nucleotide,position], controls[nucleotide,position]])

It returns the chi-square statistic as "n" and the p-value as "p".
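To find the most likely position and nucleotide, one plausible approach is to repeat that call for every nucleotide and position and keep the pair with the smallest p-value. The sketch below assumes that is what the course code does, which I have not confirmed. The function name `most_significant` and the toy counts are made up for illustration. Pairs where both counts are zero are skipped, because the expected value would be zero and the statistic would be undefined.

```python
import numpy as np
from scipy.stats import chisquare


def most_significant(cases, controls):
    """Return (nucleotide, position, statistic, p) with the smallest p-value."""
    best = None
    for nucleotide in range(cases.shape[0]):
        for position in range(cases.shape[1]):
            pair = [cases[nucleotide, position], controls[nucleotide, position]]
            if sum(pair) == 0:
                continue  # nucleotide never seen at this position: 0/0 is undefined
            stat, p = chisquare(pair)
            if best is None or p < best[3]:
                best = (nucleotide, position, stat, p)
    return best


# Toy counts (rows A, T, G, C; 3 positions; 10 cats per group), not real data.
toy_cases = np.array([[10, 0, 9],
                      [0, 10, 1],
                      [0, 0, 0],
                      [0, 0, 0]])
toy_controls = np.array([[10, 0, 1],
                         [0, 10, 9],
                         [0, 0, 0],
                         [0, 0, 0]])

best = most_significant(toy_cases, toy_controls)
print(best)
```

In the toy data only the last position differs between the groups, so that is the position the scan picks out.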
After looking at the documentation for the "chisquare" function, I found that when only a single array is passed in, the function uses the average of the values in that array as the expected value for every category. Surely this cannot be correct, as you want to be doing cases - controls, or observed - expected, to calculate the chi-square statistic.
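A small check of that behaviour, using made-up counts (9 cases and 1 control carrying a given nucleotide at a given position): with no expected frequencies supplied, the statistic from "chisquare" matches a manual calculation that uses the mean as the expected value. With equal group sizes this looks like the null hypothesis that the nucleotide splits evenly between cases and controls, though that interpretation is mine and not from the course. For comparison, the sketch also builds a 2x2 contingency table (carries the nucleotide or not, by group) and passes it to `chi2_contingency`. That is an alternative formulation, not necessarily what the course intended.

```python
import numpy as np
from scipy.stats import chisquare, chi2_contingency

# Hypothetical counts at one (nucleotide, position): 9 cases, 1 control.
observed = np.array([9, 1])

# Default call: expected frequencies are uniform, i.e. the mean of the counts.
stat, p = chisquare(observed)

# The same statistic by hand: sum((observed - expected)^2 / expected).
expected = np.full(2, observed.mean())
manual_stat = ((observed - expected) ** 2 / expected).sum()
print(stat, manual_stat, p)

# Alternative: 2x2 table of (has nucleotide, lacks nucleotide) per group.
n_cats = 10  # cats per group, as in the scenario
table = np.array([[9, n_cats - 9],   # cases
                  [1, n_cats - 1]])  # controls
ct_stat, ct_p, dof, ct_expected = chi2_contingency(table, correction=False)
print(ct_stat, ct_p, dof)
```

The two formulations give different statistics because the contingency table also counts the cats that lack the nucleotide, so the results are not interchangeable.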

Comments

@Samarth Satish hi Samarth, you're right, we basically employ the $\chi^2$ test in computational biology but don't stop to explain it. As it happens, our new course on statistical methods is set to publish in the next few months and it treats the t-test, the $\chi^2$ test, and ANOVA from the ground up. I can email you when it's released if you like.

Josh Silverman Staff - 11 months, 1 week ago

@Josh Silverman. That's good to hear. I'll be sure to jump on that course to clear my doubts. Thank you very much for offering to email me, but I've selected the option to notify me on the course page itself.

Samarth Satish - 11 months, 1 week ago
