In this practical, we'll be writing a function to find the overlap between two strings. This function will be a building block in a lot of our later genome assembly functions. So I'm going to define a function overlap, and this will take as arguments two strings, a and b, and the minimum length of overlap. And it will find the overlap between a and b, ensuring that b comes after a. So it won't put b before a for this overlap. >> So it's a suffix of a and a prefix of b that are involved in the overlap? >> Right. So we'll create an index, start. And for this function, we're going to be using the find method in Python, and we're going to be saying a.find. >> So we're going to look in a for the prefix of b of the minimum length, and this will find us the next occurrence of it in a. And we can also add the argument start, and this tells the function what index to start looking from in a. We're going to do this multiple times, so we're going to put this in a while loop. We're going to say while True, search for the next occurrence of the prefix of b in a. Now if there's no occurrence of this prefix in a, then find will return negative one. So, I'll say if start = -1. Then there is no overlap between them. >> You need a double equal sign there. >> All right. So in this case we'll return 0. But if start is not negative one, this means we have found an occurrence of the prefix of b in a. So what we have to do now is just check that the rest of b matches the rest of a. So we'll say if b.startswith(a[start:]): this just verifies that a prefix of b is equal to the suffix of a starting at position start. Then we'll return len(a) - start, and this is the length of the overlap. If this is not true, then we're going to re-enter our loop and search for the next occurrence of that prefix in a. So we want to increment start so that we don't just find the same location over and over again. >> Hm.
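Putting the steps just described together gives something like the following minimal sketch. The default `min_length=3` and the example strings are illustrative assumptions, not necessarily exactly what was typed in the practical.

```python
def overlap(a, b, min_length=3):
    """Return the length of the longest suffix of a that matches a prefix
    of b and is at least min_length long, or 0 if there is none."""
    start = 0  # leftmost offset in a where a match may begin
    while True:
        # Next occurrence in a of b's first min_length characters.
        start = a.find(b[:min_length], start)
        if start == -1:  # no more occurrences: no qualifying overlap
            return 0
        # Candidate found; check that the whole suffix a[start:] is a prefix of b.
        if b.startswith(a[start:]):
            # The leftmost valid start gives the longest overlap.
            return len(a) - start
        start += 1  # step past this occurrence so find doesn't return it again


print(overlap('TTACGT', 'CGTGTGC'))                # 3 ("CGT")
print(overlap('TTACGT', 'GTGTGC'))                 # 0 (only "GT" overlaps)
print(overlap('TTACGT', 'GTGTGC', min_length=2))   # 2
print(overlap('AAGTAAG', 'AAGCC'))                 # 3 (first "AAG" hit fails)
```

The last call shows why `start += 1` matters: the first occurrence of `AAG` at offset 0 is not a full suffix/prefix match, so the loop moves past it and finds the one at offset 4.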
So when you call a.find with those two arguments, the second argument is telling it this is the leftmost offset that I care about finding a match at? >> Right. >> Okay. >> And this is our overlap function. So we'll either return zero if there's no overlap, or if the longest overlap is shorter than the minimum length, which here is three. Otherwise it will return the length of the longest overlap between a and b. So let's test this out with some strings. In this case, I made these strings so that CGT should match between both of them, and we get an overlap of 3. If our overlap is less than 3 (so let me copy this but make an overlap of only 2), now it will return 0, which is what we want.
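One way to sanity-check results like these is to compare against a naive version that simply tries every overlap length, longest first. This is a sketch under stated assumptions: `overlap_naive` is a hypothetical name, the default minimum of 3 mirrors the example above, and the strings are illustrative.

```python
def overlap_naive(a, b, min_length=3):
    """Hypothetical brute-force reference: try every overlap length from
    the longest possible down to min_length (assumed to be at least 1)."""
    for length in range(min(len(a), len(b)), max(min_length, 1) - 1, -1):
        # Compare the last `length` characters of a with the first `length` of b.
        if a[-length:] == b[:length]:
            return length  # first hit is the longest overlap
    return 0  # nothing at least min_length long


print(overlap_naive('TTACGT', 'CGTGTGC'))               # 3
print(overlap_naive('TTACGT', 'GTGTGC'))                # 0
print(overlap_naive('TTACGT', 'GTGTGC', min_length=2))  # 2
```

It compares every candidate suffix, so it does more work than the find-based loop, but its correctness is easy to see, which makes it a convenient reference when testing on your own strings.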