Python Help

Hi, I am doing a science project that requires me to analyze very long strings of text. I have to compare two strings with each other and determine how many elements between them are different. For example, the strings 18351294 and 19352994 have 3 differences. The difference is that my strings are about 250 chars long, are made of letters, and I have to compare 10 of them against each other, so I can't compare them manually. I can do this with Excel, but it would take me a really long time. My friend told me about these things called "while loops" that I could use in Python, but I don't know anything about it. Obviously, I could learn it on sites like Codecademy or KhanAcademy, but I am *really* pressed for time (I have to have the code written and run by next weekend). Can someone please post an example of a while loop that would be able to compare the chars of two strings of text and return the number of differences? Thank you very much!

Note: This is not me being lazy and trying to take advantage of you. I am going to learn Python at some point, but I am really busy and have very little time to work.

#ComputerScience #HelpMe! #Advice

Note by Trevor B.
7 years, 6 months ago


Comments

This is a classic problem in bioinformatics, comparing strings of DNA. As long as these strings are the same size, this is easy to do. The number of corresponding symbols that differ, by the way, is called the Hamming distance between the two strings. Check out this link. The website in general is great for practicing programming and bioinformatics skills.

Anyway, to the code. Since you don't know what while loops are, I'm going to assume that you are very new to programming. While your problem could be solved with a while loop, I'm going to use a for loop, so that we can be sure that our process terminates. Here is the code that will give you your desired answer.

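# string1 and string2 are the two equal-length strings to compare.
count = 0

# Check every position and count the ones where the characters differ.
for i in range(0, len(string1)):
    if string1[i] != string2[i]:
        count += 1

print(count)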

If you're more descriptive about your problem (i.e., tell me whether the strings are all the same size, or how you want to be able to compare all ten of them more easily), I'd be happy to write you another code. (And to those who actually code well, I know that this isn't the shortest or most efficient piece of code for this problem. However, I think that it is probably the most understandable to a beginner.)
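For reference, a shorter form of the same count might look like the sketch below. The function name `hamming_distance` and the two sample strings are made up for illustration, and the length check is an added assumption (the loop above does not perform one). It is written for Python 3.

```python
def hamming_distance(string1, string2):
    """Count the positions at which two equal-length strings differ."""
    if len(string1) != len(string2):
        raise ValueError("strings must have the same length")
    # zip pairs up the characters position by position;
    # each mismatch is True, which sum() counts as 1.
    return sum(a != b for a, b in zip(string1, string2))


# Made-up sample strings, not real sequence data.
print(hamming_distance("GATTACA", "GACTATA"))
```

It does the same work as the for loop, just packed into one expression, so it is no faster in any way that matters for strings of a few hundred characters.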

Bob Krueger - 7 years, 6 months ago

It's funny you mention bioinformatics, because that is exactly my project. I'm comparing the amino acid sequences of a protein from ten different animals. The strings have the same length. I had originally intended to copy and paste the code for the 55 different comparisons to be made, but now that I think about it, there is probably a way to repeat it in Python.

I can sort of see how that program works. It puts i in a range of numbers from 0 to the length of the first string, and then tests if that position [i] is the same as in the second string. Then it prints the count, the number of times the first string's [i] is not the same as the second string's. (I think)
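That reading can be checked by tracing the loop on two short strings. The sketch below is the same loop with one extra print added (an addition for illustration only) that reports each position where the characters differ. Positions are counted from 0, as Python does.

```python
string1 = "18351294"
string2 = "19352994"

count = 0
for i in range(0, len(string1)):
    if string1[i] != string2[i]:
        count += 1
        # Report where the mismatch is and which characters are involved.
        print("position " + str(i) + ": " + string1[i] + " vs " + string2[i])

print(count)
```

The mismatches are at positions 1, 4, and 5, so the final count is 3.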

I am a novice in programming (except for LaTeX, which will do nothing except make my project look pretty); in fact, I only started programming in Python 15 minutes ago.

Thank you very much!

Trevor B. - 7 years, 6 months ago

You're welcome. What format do you currently have the information in? Is it in a text file? How is it laid out? Or is it easiest to copy the information into a list in the code? I could easily whip something up that would cycle through all the possibilities for you. It would just use two for loops, but I'm sure you wouldn't know how to do it.
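A minimal sketch of that idea, assuming the sequences are pasted into a list in the code: two for loops pick each pair, and the counting loop from the earlier code runs inside them. The list `sequences`, its placeholder contents, and the `results` dictionary are all made up for illustration.

```python
# Placeholder sequences; the real ones would be pasted in here.
sequences = ["ABCDE", "ABCXE", "AXCXE"]

results = {}
for i in range(len(sequences)):
    # Start j after i so that each pair is compared exactly once.
    for j in range(i + 1, len(sequences)):
        count = 0
        for k in range(len(sequences[i])):
            if sequences[i][k] != sequences[j][k]:
                count += 1
        results[(i, j)] = count
        print("sequence " + str(i + 1) + " vs sequence " + str(j + 1) + ": " + str(count))
```

Starting the inner loop at `i + 1` skips comparing a sequence with itself and avoids counting the same pair twice in the opposite order.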

Also, note that some complications could arise. When you compare them in this way, you are only looking for point mutations in the AA string. Deleted or inserted amino acids shift everything after them, which can completely change the picture, and the process above would then be an inaccurate representation of the differences. If that is the case, the code becomes much more complex, but still doable.
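One standard way to count differences when insertions and deletions are allowed is the edit (Levenshtein) distance. The sketch below is a textbook dynamic-programming version, offered as an illustration and not necessarily the approach meant above. The function name `edit_distance` is made up.

```python
def edit_distance(a, b):
    """Minimum number of single-character substitutions, insertions,
    and deletions needed to turn string a into string b."""
    # previous[j] is the distance between the first i-1 characters of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        current = [i]  # turning i characters into an empty string takes i deletions
        for j in range(1, len(b) + 1):
            substitution = previous[j - 1] + (a[i - 1] != b[j - 1])
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[len(b)]


# Position-by-position, these two strings differ everywhere (6 mismatches),
# but one deletion at the front plus one insertion at the end converts one into the other.
print(edit_distance("ABCDEF", "BCDEFA"))
```

The example shows why a single missing character makes the position-by-position count misleading: every later position is shifted by one.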

Bob Krueger - 7 years, 6 months ago

@Bob Krueger Sorry it's been a while. I have the text in a Word document, copied off of a database. A little editing to the text enabled me to account for the additions and omissions in the text. I added dashes to the text and added loops to the code based off of your original post to count those. I'd put the code, but I don't know how to insert code into those grey boxes using LaTeX. Can you tell me what commands you used?

Trevor B. - 7 years, 6 months ago

@Trevor B. I'm glad you were able to figure it out. To post the code, just indent each line, including the empty ones, four spaces. I hope everything turns out well for your project.

Bob Krueger - 7 years, 6 months ago

@Bob Krueger Thank you very much for the help, Bob. Here is the code.

protein_1 = '1st prion protein here'
protein_2 = '2nd prion protein here'

# count_1: positions where the two sequences differ, not counting dashes.
count_1 = 0

for i in range(0, len(protein_1)):
    if protein_1[i] != protein_2[i]:
        # A dash on either side is a gap, not a substitution,
        # so cancel out the increment that follows.
        if protein_1[i] == '-':
            count_1 = count_1 - 1
        elif protein_2[i] == '-':
            count_1 = count_1 - 1
        count_1 = count_1 + 1

# count_2: dashes in the first sequence.
count_2 = 0

for i in range(0, len(protein_1)):
    if protein_1[i] == '-':
        count_2 = count_2 + 1

# count_3: dashes in the second sequence.
count_3 = 0

for i in range(0, len(protein_1)):
    if protein_2[i] == '-':
        count_3 = count_3 + 1


# Print each count, using the singular word when the count is exactly 1.
if count_1 == 1:
    print(str(count_1) + ' difference')
else:
    print(str(count_1) + ' differences')

if count_2 == 1:
    print(str(count_2) + ' addition')
else:
    print(str(count_2) + ' additions')

if count_3 == 1:
    print(str(count_3) + ' omission')
else:
    print(str(count_3) + ' omissions')

Trevor B. - 7 years, 6 months ago

@Trevor B. That's awesome. Although I bet you have already done this procedure to all the proteins, there is a way to cycle through all the pairs of AA sequences. The idea isn't tricky, but the syntax is relatively hard to figure out. If you'd like to know how to do that, feel free to ask.
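As a rough sketch of what that could look like, the three counts can be wrapped in a function and `itertools.combinations` from the standard library can produce every pair once. The function name `compare`, the dictionary `proteins`, the animal names, and the sequences are all made-up placeholders. It is written for Python 3.

```python
from itertools import combinations


def compare(protein_1, protein_2):
    """Return (differences, dashes in protein_1, dashes in protein_2)."""
    differences = 0
    for a, b in zip(protein_1, protein_2):
        # A position counts as a difference only if neither side is a dash.
        if a != b and a != '-' and b != '-':
            differences += 1
    return differences, protein_1.count('-'), protein_2.count('-')


# Hypothetical names and placeholder sequences, padded with '-' to equal length.
proteins = {
    "animal_1": "MANL-GCW",
    "animal_2": "MVNLSGCW",
    "animal_3": "MANLSGC-",
}

# combinations(..., 2) yields each unordered pair exactly once.
for (name_1, seq_1), (name_2, seq_2) in combinations(sorted(proteins.items()), 2):
    print(name_1, name_2, compare(seq_1, seq_2))
```

Keeping the counting in one function means the pair loop stays short, and adding another animal only requires one more entry in the dictionary.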

Bob Krueger - 7 years, 6 months ago

@Bob Krueger I'm good. I actually ran this code this morning and I got the data I needed. I copied the sequences into the first two variables from a Word file and was done with the code in 15 minutes (instead of the hours it would have taken me to do manually). Thanks for all of the help.

Trevor B. - 7 years, 6 months ago