So, what we will discuss next is something called edit distance. Edit distance is often used when we cannot assume whether the time is synchronous. Now, if you remember, we had discussed edit distance before in lectures on sequences. We had discussed what an edit distance is, right? So let's remember that. So, let's assume that we have two sequences or two strings. Say one of the string is cattle, the other thing is cat, and we want to basically measure the similarity or distance between these. As you can see, these two strings are similar to each other, except that to convert one of the strings to the other we need to apply some edit operations. In this case to convert the string cattle to the string cat, I need to delete L and E. So I have edit operations which are delete that converts, in this case two delete operations, that convert cattle to cat. I can have more complex scenario, there's value if you remember. So for example, I can have table, and I can have tackle. So once again, these are two strings that are kind of similar to each other. To convert table to tackle, I might replace B with C, so I have a replace operation, and then, I might insert K. So, I have again two operations, one replace and one insertion. In that way I can convert table to tackle. Now, if you remember, if you have seen the edit distance operation that we had learned for sequences, can also apply for time series. The only difference was that we need to replace the insertion and deletion and replacement costs of symbols with insertion, deletion, and replacement costs for numeric values and result. Now, while edit distance can be also used for time series, since indicator times says we work with numeric values, often we use a variant of edit distance that's called dynamic time warping. If you look at the literature, if you look at basically the time series literature, you will see that dynamic time warping, which we'll learn next, is often preferred over the edit distance based algorithms. This is because it has been shown that dynamic time warping usually gives more accurate comparisons of time series. In terms of the structures, the dynamic time warping algorithm is very similar to the edit distance algorithm. In the edit distance algorithm reverse, taking an eye length prefix of P and J length prefix of Q, and we were basically comparing the last symbol P_i and Q_j and I were trying to decide I'm deleting P_i, I'm inserting Q_j or my replacing P_i with Q_j? That's basically what we have done in edit distance. In the case of dynamic time warping, we only have one operation, we replace. So, what we say is that, let's assume that the P_i's value is two, and let's assume that Q_j value is say five, in the case dynamic time warping, will say that we'll be replacing two with five so the cost of this operation will be three, and that's it. We do a replace operation. We don't have insertion, we don't have deletion, we will always replace the last symbol with the other symbol. The difference, the major difference between dynamic time warping and an edit distance is what happens after we do the replace operation. In the case of dynamic time warping, we have three options after we do the replace operation. One option is after I replace VPi with the value of Q_j, I progress and I made the compare P_i with Q_j minus one. I compare P_i with Q_j minus one. My second alternative in dynamic time warping after I replace P_i with Q_j, is I do progress on the string P. That is I compare Q_j with P_i minus one. I compare Q_j with P_i minus one. My third alternative in dynamic time warping is that I progress in both of them, that is, I replace P\_i with the value of Q_j, so I compare P_i minus one with Q_j minus one. I compare P_i minus one with Q_j minus one. I have three operations. In all these cases, first I replace with P_i and Q_j, and then I either progress on the string P, I progress on string Q, or I progress in both of these strings. Note that the look and feel is very similar to edit distance. Once again, I compute the distance between the prefix, I length prefix of P, and J length prefix of Q using recursion. However, the semantics of the edit distance is different. I don't have insertion and deletion, I always do replacement but then I progress on either P or Q or both of them. One thing and theoretically there is really no difference, basically there is no reason why you should use edit distance or dynamic time warping. They are both asynchronous measures and they can both be used to compare two time series asynchronously. However, in practice it has been shown that dynamic time warping often works very effectively. It often leads to distance measures that in practice has good behavior. Because of this, there has been a very large amount of research in dynamic time warping based algorithms for comparing time series, and efficient ways to compute dynamic time warping. So this brings us to our next question. How do we compute dynamic time warping? Yes, we have given a recursive algorithm, but in practice how was this computed, how is dynamic time warping computed? So what we will learn next is that dynamic programming base implementation of the dynamic time warping algorithm. If you remember, dynamic programming is a way to implement recursive algorithms using a memorization table. In the case of dynamic time warping, essentially our memorization table is a two-dimensional array. One of the dimensions correspond to string Q, the other dimension correspond to string P. Note that the size of the table, if P has length N, and if Q has length M, the size of the table is N times M, which essentially means that the complexity of the dynamic time warping algorithm is order M times N. It is essentially quadratic in the length of the strings which means it is complex, it is expansive, it's competition expansive. So, first we will learn about the dynamic time warping algorithm, and now we'll learn ways to make this dynamic time warping algorithm a bit more efficient. So let's basically see an example, let's learn the dynamic time warping algorithm through an example. Let us assume that we are given two strings. We are given a string P and a string Q. Our string P has one, two, three, four, five, six, seven entries and our string Q or our time series Q has one, two, three, four, five, six, seven, eight, nine entries. They are of different length. Note that, if we were doing euclidean distance or if you want to do correlation we cannot compare these two time series, because they are of different lengths. Euclidean distance or correlation require that the two given strings have the very same length. But, if we are using edit distance or if you're using dynamic time warping, then this not a problem since we are not assuming that the time series on synchronous. Since we are not assuming that they are of the same time frame, we can actually go ahead and compare two time series of different length. So, how does the dynamic time warping works, or the dynamic time warping algorithm works? So, the first step in the dynamic time warping algorithm, is to consider all possible positions in this table, and to fill this table with the cost of replacement. Remember, the way the dynamic time warping works, is through replacements. So, I will replace one symbol with another symbol. So, what the dynamic time warping algorithm does, is its first step, is it feeds the table with the cost of replacement. So, for example, if I'm replacing this value with this value, it's cost is the difference between them is two, so the corresponding entry has value two. If I'm replacing this entry with this entry here, the value that I have here is six, the value I have here is two, so the cost of replacement is four, so the value that I'm going to record in the table is four. So, as the first step of my algorithm, I fill the table with the differences between the values for each possible combination. So, since I have seven entries in my string, P and since I have nine entries in my stream, Q, the size of the table that I fill is nine times seven, 63. So, that is the cost that I have to pay, that is the number of operations that I have to do. Okay, so I am done with the first step of the algorithm, what do I do next? Once I've filled the table with the differences for all pair of entries, my next step is to fill the rest of the table using a left to right, bottom up and diagonal manner with the actual dynamic time warping distances. So, let's basically go over the step-by-step, and see how I fill the table. So, let's pay focus first on the first initial left to right process. So, let's assume that we are given a two-length prefix of the string Q, and a one-length prefix of string, P. Remember here, this is my string P and this is my string Q, so I'm given a two-length prefix of my string Q and one-length prefix of string P. So, what is the operation I can do, is as we say so will consider the last symbols of the two prefixes. In this case both of them are two, and I know that the cost of basically doing the replacement here, since both of them have value nine is zero. Once I make this replacement in this case, the only alternative that I have, is to try to align the value nine with the value seven, which basically has on my table, I see that it has cost two. Which essentially means that what I have to do is basically I have to add the value zero that I have here, with the value two which will give me again two. After I do that, what I will do is that I will keep considering one-length substring of P and I will consider three-length prefix of Q. When I look at the three-length prefix of Q, I know that the first replacement operation that I have to do, has caused seven. Indeed basically if I look at my table I see that the cost that I have for that replacement is seven. But, if you remember, once I do this replacement seven, the next operation that I have to do is move my pointer and compare the one-length prefix of P, with two-length prefix of Q because I've already aligned this symbol and I already pay the cost seven. But, I had already computed this. This cost I already computed. This cost was already recorded in my table is value two, which essentially means that basically what I have to do, is to add the value seven, with the value two which will give me the value nine. As you can see, essentially what I can do is basically I can go left to right on this table, and I can add values that I have. So, I started two, then I add zero to two, this gives me two, then I add seven to two, this gives me nine, then I add zero to nine, this gives me another nine, then I add seven to nine, this gives me 16, then I add two to 16, this gives me 18, then I add three to 18, this gives me 21, and so on, and so on. So, I can essentially go in a left-right manner and I can fill my table. So, this essentially corresponds to comparing computing the time dynamic time warping, in this manner. So, I'm basically keeping the length one in my string P and I am basically computing dynamic time warping for different sequence prefix, different length prefix for the string Q. Once I'm done with this, then I can switch gears, and what I can do is that I can consider one-length prefix of Q and I can compute incrementally the dynamic time warping distances for different length prefixes of the string P, and in my table, essentially this means that I'm going to go in a top-down manner. In this case, what I'm going to do is I will start with two, then I will add two to two, this is going to give me the value four, then I am going to add value five to four, this is going to give me value nine, that I'm going to add the value three to my value nine, this going to give me value 12 and so on. This way, first I fill left right the first row and then I fill in a bottom-up manner the first column, and I'm done with the first row and first column. What do I do next? Then I move to the next row and the next column. After that, I will compute the following row and the following column, and after that I will compute the following row and the following column, and this way I will fill the entire table. That's essentially basically how the dynamic time warping works. So, let's see the following step, in the following step, as I mentioned I will fill the following row starting from this entry. So, how do I obtain the value of the ten three. Let's take a look at this, so first of all in my table, I know that the cost of replacing nine with five is four. So, that I know that my table records that it tells me that the cost of replacement is four. Then I have three options, I need to to either move on the string P or move on string Q or move on both strings or both time series. The cost of moving on time series P, I'd already computed its value was four. The cost of moving on time-series Q, once again I had computed before, its cost is two. The cost of moving on both time series again I had computed earlier, its cost was two. So, the cost of moving on P on Q or on both of them, is either four plus four eight, or four plus two six, or four plus two six. So, given that I'm trying to find the minimum distance, essentially what I will do I will simply select the smallest of these values. That is basically what algorithm is telling me to do, the algorithm is telling me to do is, either move on P or move on Q or move on both of them, at the cost of replacement and select the minimum among them. So, if I do that essentially what I have to do, as you can see is replaced the entry with three. The next one let's do it again, so the cost of replacement is three, I can either basically move on P, would cost nine, I can move on Q with cost 12 or I can move on both of them with the cost three plus two cost five. So, obviously, I will select the smallest one which is five and then I repeat this process to fill the entire table. So, once I fill the entire table, essentially, the last entry here tells me the cost of dynamic time warping between these two strings 80. So essentially, what this tells me that the cost of converting from one time series to the other using the dynamic time warping algorithm has caused 80. The difference between these two time series is 18, that's the minimum error that I have to introduce to warp one time series to the others. Note that this eighteen essentially can correspond to different ways of warping the time. So, one alternative of this is to see that essentially what we are doing is, we are first aligning for an eight. Then, we are aligning four and five. Then, we are aligning five and five. Then, we are aligning five and six. Then, we are aligning four and seven, and so on. Essentially, the way you're going to find the alignments is to remember how the algorithm is working. So, remember the first step is basically the way algorithm works data lines, the last elements. In this case, the alignment of four and eight is given, I align those elements. After that, remember, I basically when I compute this 18, I compute that by selecting the smallest of the three alternative replacement that I have. In this case, I can either select 17, 14, 14. Seventeen is a large replacement, I will not select that for sure. So I will basically select either this 14 or that 14. Let's assume that I had selected this 14 to compute 18. Which essentially would mean that basically I'm aligning this entry with this actually. Now, once I have this alignment, I will again select basically between 14, 14, and 13. Obviously, dynamic tower the way it works is that I would have selected the smallest among them which is the 13 which is basically alignment between five and five, so I would align as the next element five and five. To compute this 13, I had three options, I will have either selected 13,14, or 15. Obviously, the way the algorithm works is that it always select the smallest one, given that among 13, 14, and 15, the smallest one is 13. It is actually means that to compute this dynamic time warping I would have aligned six and five which is what this algorithm tells me. So, this essentially tells me that my dynamic time warping 18 would have been computed by aligning nine with seven nine nine, by five with two and nine, two with two, four with seven and so on. Note it is not the only alignment that I have. I could have also computed same dynamic time warping by doing an alternative alignment. So, I could have aligned first, eight and four. Then I could have aligned five and five, then I could have aligned five and six. Then I could have aligned four and seven. Then I could have aligned two and two. Then I could have aligned five and nine. Then I could move on string two and align five and two. Then I could have aligned nine and nine. Then finally, I would have aligned nine and seven. So, this is a different dynamic time warping but it has the same cost. So, the cost of converting my time-series P to my time-series Q is 18, whether I use this alignment strategy or this alignment strategy. In both of them, my cost is 18, it essentially means that the dynamic time warping distance between my time-series P and my time-series Q is 18. So, now we have an algorithm to compute dynamic time warping, and it is a very simple algorithm, implementation is very simple and it works in a dynamic programming base, very simple algorithm. The problem obviously is its cost. The cost of computing dynamic time warping is quadratic. That is, if I have once again a string length seven and another string of length nine, the cost is seven times nine, 63. Now that if I had a string 70 units, 70 length and another string of 90, the cost will be 6300. If basically my string length was 700 and the other string was 900, the cost grows very quickly. So that's basically one problem with dynamic time warping. If I'm given very long time series, the cost of computing the dynamic time warping distance becomes large and large very quickly. So what's the solution for that? Well, the first solution for that is to add additional constraints on alignment. Note that actually what's happening if you basically once again consider these alignments, what's happening is that basically each entry here corresponds to an alignment between two time series. But one thing that you will notice is that, in many cases, when I'm aligning the time series, I often don't have alignment like this. I don't have an alignment that basically starts all the way from here and go all the way there. Most of the alignments that I am doing are nearby, because I'm not basically introducing significant shifts. Remember we want to be shift in variant but usually shifts are localized. I don't basically shift from the beginning of one time series to the end of the other one. Usually the shift that I do are near each other, they are localized.