Hi everyone. Remember (and I'm sure you can't forget) all of those hybrid assembly jobs I gave you using the Comprehensive Genome Analysis service? When I asked you to submit all of those and to assemble all the data, I talked about how you could use that to look at not only assembly quality, but also downstream annotation quality. I wanted to go over some of those results with you. Remember, we were comparing Unicycler and Canu because this was a hybrid assembly with both long and short reads. We also compared Racon, which is fine-tuning for the long reads, and Pilon, which is fine-tuning for the short reads. That's at single-nucleotide positions. In terms of time, when we compare Unicycler versus Canu, hands down, Canu is so much faster. It's these guys in green here, and Canu generally takes between 1 and 2 hours. Unicycler takes from 10 to more than 24 hours. Is Unicycler worth it? I guess that's the question we would ask. When you look at coverage, remember that the assembly report gives you coverage for both long reads and short reads. Canu does a little bit better for long-read coverage and, strangely enough, Unicycler does a little bit better for short-read coverage. I wanted to point out one thing. We have this as part of the assembly report, and it tells you when changes were being made. You can see that it's in the first Pilon iteration that most of the changes are made, not in the second or the third or the fourth, or, actually, this is at zero. All the changes are made pretty quickly, so do you need to do all of these successive iterations? That's one thing to think about when you're doing it. The time doesn't change too much, but you don't want to do more rounds than you need. When we set the default at PATRIC, it's at two rounds of Racon and two rounds of Pilon, because in our experience it doesn't change much after two rounds. How about the number of contigs and their size? If you look at this, Canu calls three contigs, Unicycler calls two contigs.
At first glance I would say, "Oh, clearly Unicycler must win, because two contigs gives me an indication of a better assembly than three." But not so fast. I wanted to show you the Bandage plots for Unicycler with the two and for Canu with the three. It's got that poor little one isolated over here. Not that this tells you anything really, but it's just very pretty. If you go into the sequence part of the genome overview for particular genomes, this is where you can really see the differences. You can see that Unicycler has this big one at 2.8 megabases and then 5,386 base pairs for the second one. When you start getting into it and you look at Canu, it's a little bit better on the size, and these other contigs are a lot bigger. It's hard for me to distinguish what this means: is it that Unicycler is taking the short reads and creating a more cohesive single contig, or that Canu is better? The fact that these are bigger in size makes me more inclined to feel comfortable with them. But the jury is out, and I think it would take someone going really deep to figure it out. Total length: how big are they? That's what you see right here. With Canu, the total size of the genome is bigger, and when you look at the GC content, which is here, it's a little bit higher in Unicycler, but total size is generally something that people are looking at and thinking about. Let's move on to annotation. When you do an annotation job in PATRIC, you get measures of consistency, completeness, and contamination. We have coarse and fine consistency. You can see that there's absolutely no difference in the coarse consistency. But when you get to the fine consistency, Unicycler is a little bit better. When you get to the completeness, there's no difference. But when you get to contamination, you can see some funny numbers here, and these seem to come up when you're doing Racon iterations with no Pilon iterations: that's when you start seeing some contamination flags.
Those contamination flags do resolve, though, the more iterations you do. This is something that you might want to think about. If it's your own data, you might want to mix up the parameters, look at those things, and try to figure out what works best for you. But I thought that was interesting. Now, one of the things I care about the most is the genes that are called. You can see that there are more genes called with Unicycler than with Canu, generally. But what really matters to me is the functional genes, the genes with an assignment other than hypothetical. If those are called more frequently, to me, that's a better annotation. Although Unicycler calls more genes, it also calls fewer genes with functional assignments than Canu does. I think I like Canu better because of that, because it gives me more functional genes, and if you walk down, every single one of these is better in Canu. To summarize: for time, Canu is a clear winner. If we are in a hurry, Canu is the one to go with. Coverage, Unicycler was a little bit better. Contigs, if you're interested in the total number of contigs, you'd go with Unicycler. But if you were looking at the size of the contigs, I might go with Canu instead. It's got the biggest contig, and the two fragments that it had were also really big contigs. Total length, Canu did better. Annotation fine consistency, Canu did better. Annotation contamination, well, this was just to point out that in both of them, Racon iterations influence this. If you looked at genes with functional assignment in the annotation, it would be Canu. But still, even though at this point I'm thinking Canu is pretty awesome, remember that this is just for hybrid assembly. It's not necessarily true when you're doing short reads alone, because in the studies I've done with that, Unicycler generally does a better job than SPAdes. Go ahead and do the different comparisons and try to get the best assembly you can. This was just to show you how you could interpret those things.
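If you want to build the same kind of side-by-side scorecard for your own assemblies, here is a minimal Python sketch. It is only an illustration, not how the service computes its reports: the function names (`read_fasta`, `assembly_summary`, `functional_counts`) are hypothetical, the toy data is made up, and it assumes that a gene counts as functional whenever its product description does not contain the word "hypothetical".

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {contig_id: sequence}."""
    chunks = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the contig id.
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line.upper())
    return {contig: "".join(parts) for contig, parts in chunks.items()}


def assembly_summary(contigs):
    """Contig count, largest contig, total length, and GC percent."""
    lengths = sorted((len(seq) for seq in contigs.values()), reverse=True)
    total = sum(lengths)
    gc = sum(seq.count("G") + seq.count("C") for seq in contigs.values())
    return {
        "contigs": len(lengths),
        "largest": lengths[0] if lengths else 0,
        "total_length": total,
        "gc_percent": round(100.0 * gc / total, 2) if total else 0.0,
    }


def functional_counts(products):
    """Return (functional, total) for a list of gene product descriptions.

    Assumption: any product whose text lacks "hypothetical" is functional.
    """
    functional = sum(1 for p in products if "hypothetical" not in p.lower())
    return functional, len(products)


if __name__ == "__main__":
    # Toy, made-up inputs; replace with your own contigs and product lists.
    toy_fasta = ">contig_1 toy example\nGGCC\nAATT\n>contig_2\nGC\n"
    toy_products = [
        "DNA polymerase III",
        "hypothetical protein",
        "ribosomal protein L2",
    ]
    print(assembly_summary(read_fasta(toy_fasta)))
    print(functional_counts(toy_products))
```

Run it once per assembler on the contigs file and the list of annotated products, then put the numbers next to each other. Fewer contigs, a larger total length, or more functional genes do not settle the question individually, which is why it helps to look at them together.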
Next, we're moving on. We are taking our annotated genome, and we're going to start building some trees. I'll see you in the next cycle.