Hi, I am doing a science project that requires me to perform analysis of very long strings of text. I have to compare two strings with each other and determine how many elements between them are different. For example, the strings and have differences. The difference is that my strings are about chars long, are made of letters, and I have to compare of them against each other, so I can't compare them manually. I can do this with Excel, but it would take me a really long time. My friend told me about these things called "while loops" that I could use in Python, but I don't know anything about it. Obviously, I could learn it on sites like Codecademy or KhanAcademy, but I am pressed for time (I have to have the code written and run by next weekend). Can someone please post an example of a while loop that would be able to compare the chars of two strings of text and return the number of differences? Thank you very much!
Note: This is not me being lazy and trying to take advantage of you. I am going to learn Python at some point, but I am really busy and have very little time to work.
Comments
This is a classic problem in bioinformatics, comparing strings of DNA. As long as these strings are the same size, this is easy to do. The number of corresponding symbols that differ, by the way, is called the Hamming distance between the two strings. Check out this link. The website in general is great for practicing programming and bioinformatics skills.
Anyway, to the code. Since you don't know what while loops are, I'm going to assume that you are new to programming. While your problem could be solved with a while loop, I'm going to use a for loop, so that we can be sure that the process terminates. Here is the code that will give you your desired answer.
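The snippet itself is not reproduced here, so what follows is only a minimal sketch of the kind of for loop described, assuming both strings have the same length. The function name `hamming_distance` and the two sample sequences are illustrative choices, not taken from the original post.

```python
def hamming_distance(first, second):
    """Count the positions at which two equal-length strings differ."""
    if len(first) != len(second):
        raise ValueError("strings must be the same length")
    count = 0
    # Walk through every position and compare the two characters there.
    for i in range(len(first)):
        if first[i] != second[i]:
            count += 1
    return count


# Made-up sequences for illustration only (not data from this thread).
seq_a = "MKTAYIAKQR"
seq_b = "MKTGYIAKHR"
print(hamming_distance(seq_a, seq_b))  # prints 2
```

The loop always stops after `len(first)` steps, which is the reason for preferring `for` over `while` here.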
If you tell me more about your problem (e.g., whether the strings are all the same size, or how you want to compare all ten of them more easily), I'd be happy to write you another piece of code. (And to those who actually code well, I know that this isn't the shortest or most efficient piece of code for this problem. However, I think that it is probably the most understandable to a beginner.)
It's funny you mention bioinformatics, because that is exactly my project. I'm comparing the amino acid sequences of a protein from ten different animals. The strings have the same length. I had originally intended to copy and paste the code for the 55 different comparisons to be made, but now that I think about it, there is probably a way to repeat it in Python.
I can sort of see how that program works. It puts i in a range of numbers from 0 to the length of the first string, and then tests if that position [i] is the same as in the second string. Then it prints the count, the number of times the first string's [i] is not the same as the second string's. (I think)
I am a novice in programming (except for LaTeX, which will do nothing except make my project look pretty); in fact, I only started programming in Python 15 minutes ago.
Thank you very much!
You're welcome. What format do you currently have the information in? Is it in a text file? How is it laid out? Or is it easiest to copy the information into a list in the code? I could easily whip something up that would cycle through all the possibilities for you. It would just use two for loops, but I'm sure you wouldn't know how to do it.
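As a rough sketch of the two-for-loop idea, assuming the sequences are pasted into a dictionary in the code: the labels (`animal_1`, ...), the sequences, and the helper `pairwise_distances` are all hypothetical placeholders.

```python
def hamming_distance(first, second):
    """Count the positions at which two equal-length strings differ."""
    count = 0
    for i in range(len(first)):
        if first[i] != second[i]:
            count += 1
    return count


def pairwise_distances(sequences):
    """Compare every sequence with every later one, once per pair."""
    names = list(sequences)
    results = {}
    for i in range(len(names)):
        # Start at i + 1 so no sequence is compared with itself
        # and no pair is counted twice.
        for j in range(i + 1, len(names)):
            a, b = names[i], names[j]
            results[(a, b)] = hamming_distance(sequences[a], sequences[b])
    return results


# Placeholder data: replace with the real, equal-length sequences.
sequences = {
    "animal_1": "MKTAYIAKQR",
    "animal_2": "MKTGYIAKHR",
    "animal_3": "MKTAYLAKQR",
}

for (a, b), distance in pairwise_distances(sequences).items():
    print(a, b, distance)
```

Because the inner loop starts at `i + 1`, n sequences produce n(n-1)/2 comparisons, which is 45 for ten sequences.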
Also, note that some complications could arise. When you compare them in this way, you are only looking for point mutations in the AA string. Deleted or included AA can completely change this picture, and the process above would be an inaccurate representation of its differences. If that is the case, the code becomes much more complex, but still doable.
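One common way to handle insertions and deletions is the Levenshtein edit distance, which counts the fewest single-character substitutions, insertions, and deletions needed to turn one string into the other. That is a swapped-in technique rather than something named in this thread, and the sketch below (with the hypothetical name `edit_distance`) is a plain dynamic-programming version, not a substitute for a proper sequence-alignment tool.

```python
def edit_distance(first, second):
    """Levenshtein distance: minimum substitutions, insertions, deletions."""
    # previous[j] holds the distance between the first i-1 characters
    # of `first` and the first j characters of `second`.
    previous = list(range(len(second) + 1))
    for i in range(1, len(first) + 1):
        current = [i]  # turning i characters into "" takes i deletions
        for j in range(1, len(second) + 1):
            # Substitution costs 1 only when the characters differ.
            substitution = previous[j - 1] + (first[i - 1] != second[j - 1])
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


# Made-up example: the second string has one residue deleted near the
# start and one appended at the end.
print(edit_distance("MKTAYIAKQR", "MKAYIAKQRL"))  # prints 2
```

A position-by-position comparison would report many mismatches for this pair, because every residue after the deletion is shifted by one place.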
I'm good. I actually ran this code this morning and got the data I needed. I copied the information into the first two variables from a Word file and was done with the code in 15 minutes (instead of the hours it would have taken me to do manually). Thanks for all of the help.