Today, I'm going to step you through one of the most valuable documents, I think, that we produce in PATRIC: the comprehensive genome analysis report, which is constructed as part of the Comprehensive Genome Analysis service. So you've submitted your job; let's look at this report. The first thing is: how do I find that job? Well, you could go here and click on the Jobs monitor, which takes you to your jobs page. Or you could go up to Workspaces, click on the down arrow, and click on My Jobs. This replaces the page and shows all the jobs you've ever submitted in PATRIC. It's brutal with its results, because it shows you the jobs that failed as well as the ones that succeeded. It also gives you a status update on jobs that are currently running. I want to look at the results for a particular job, so I highlight the row, and you'll notice that this populates the vertical green bar on the right with possible downstream functions. I want to click the View icon. In subsequent videos, I'll talk about each of the documents that are part of this job, and I'll also go into detail on the annotation job and the assembly job. But today, we want to see my favorite, which is the full genome report HTML file. So I highlight that. I can download it or I can view it, so let's click View. Once again, it replaces the page. At the top, it shows me a breadcrumb of where I am, so you never get lost in PATRIC. It also gives you another chance to download. This gives me a summary of the entire job. It tells me that I submitted the job to the service, and notice this here, which links to the references at the bottom of this document. When we constructed this, we wanted to make it as easy as possible for you to publish. The way I envision it, we're giving you information that you can put directly into a Materials and Methods section or something like that.
So this gives you the reference that you can use. And please, it's a free service: if you use it, please cite us. That helps us continue to provide you with these services. Then comes the summary of what I submitted to PATRIC, and this is the most valuable part: "This genome appears to be of good quality." A lot of our users had trouble interpreting their results. They would submit an assembly, then an annotation, then build a tree, and maybe do some analysis with the Proteome Comparison tool or the Protein Family Sorter, and we realized it was taking them a long time to figure out whether they needed to do more work on the genome. So we decided to put that assessment right up front: it tells you the genome appears to be of good quality, or it tells you if it's of poor quality. I submitted reads for assembly, so it tells me the job was assembled using Canu, and that links to the reference for Canu. That makes it very easy when you're writing your publication. Next come the contigs and the genome size. These are some of the statistics you can use when you are submitting information or writing a publication about a particular genome. Then it gives you information about the annotation. If you provided a genus and species name, it gives you the lineage of this organism in the taxonomic tree, and then it goes into detail about the annotated genes: how many coding sequences there were, how many repeat regions were found, how many tRNAs, and how many rRNAs. All of that information is there. Then we break it down further to show you how many hypothetical proteins there are, and how many proteins have functional assignments, which basically means any protein whose name isn't "hypothetical protein." There are counts of proteins with EC numbers, that is, Enzyme Commission numbers, which indicate a specific enzymatic activity, and of proteins with GO (Gene Ontology) assignments.
Proteins that could be assigned to pathways are counted too, as are proteins that could be mapped to the genus-specific families (PLfams) or the cross-genus families (PGfams). Now, this is a circular view of all the annotations in the genome, and each ring shows different information. The outer dark blue ring shows the contigs. There are four in this genome: one big one and three small ones. Having looked at this organism previously, I know that this is a Staphylococcus aureus genome sequenced with PacBio, so it has one chromosome and three plasmids, and you're seeing the three plasmids here. The next ring in shows the genes on the forward strand, and the one after that shows the genes on the reverse strand. You don't have to remember this; the legend right here shows the order of the rings. The next ring shows the RNA genes, which include the ribosomal RNA genes and the tRNA genes. These are hits to virulence factors. Every genome annotated in PATRIC is BLASTed against specific databases of virulence factors, antimicrobial resistance genes, transporters, drug targets, and human homologs. In this view, we only show you the virulence factors here and the antimicrobial resistance genes here. The last two rings are GC content and GC skew. Why would you want to see those? Because sometimes they indicate horizontal transfer between different organisms. When you see a significant change in GC content, like here and here, that indicates a region that might have come in by horizontal transfer. Now, you'll notice that some of these genes are colored. What do the colors mean? If we go down, we see something specific that we do at PATRIC and also at RAST: subsystem analysis, in which curators group together genes that are known to work in a specific function, organized into particular classes.
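The GC content and GC skew rings described above are simple sliding-window statistics: GC content is (G + C) divided by the window length, and GC skew is (G - C) / (G + C). Below is a minimal Python sketch, assuming one contig held as a plain string. The function names, the window and step sizes, and the z-score cutoff used to flag unusual windows are all hypothetical, illustrative choices; this is not how PATRIC itself draws the circular view.

```python
import statistics


def gc_windows(seq: str, window: int = 1000, step: int = 500) -> list[tuple[int, float, float]]:
    """Slide a window along one contig and report (start, GC content, GC skew).

    GC content = (G + C) / window length
    GC skew    = (G - C) / (G + C), defined here as 0.0 if the window has no G or C
    """
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        gc_content = (g + c) / window
        gc_skew = (g - c) / (g + c) if (g + c) else 0.0
        results.append((start, gc_content, gc_skew))
    return results


def flag_gc_outliers(windows: list[tuple[int, float, float]], z: float = 2.0) -> list[int]:
    """Hypothetical heuristic: return window starts whose GC content lies more than
    z population standard deviations from the mean. These are only candidate
    regions that might have arrived by horizontal transfer, not proof of it."""
    gcs = [gc for _, gc, _ in windows]
    if len(gcs) < 2:
        return []
    mean = statistics.mean(gcs)
    sd = statistics.pstdev(gcs)
    if sd == 0:
        return []
    return [start for start, gc, _ in windows if abs(gc - mean) > z * sd]


if __name__ == "__main__":
    # Toy contig: an AT-rich background with a GC-rich island in the middle.
    contig = "AT" * 500 + "GC" * 100 + "AT" * 500
    windows = gc_windows(contig, window=100, step=100)
    print(flag_gc_outliers(windows))  # [1000, 1100]
```

On a real genome, the window size trades resolution against noise, and a sharp shift in GC content is only a hint; it's worth following up with other evidence before calling a region horizontally transferred.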
So if these genes are blue, that indicates they are involved in metabolism; if they're orange, that indicates they are involved in protein processing. It's just a way for you to see how things are distributed across the genome. Earlier I told you that we BLAST against specific databases: transporters, virulence factors, drug targets, and antibiotic resistance genes. The citations for those are provided here. This shows the number of genes in this particular organism that had significant BLAST hits when it was BLASTed against CARD, NDARO, the PATRIC antimicrobial resistance genes, and so on. Another thing we do in PATRIC: for specific organisms where we have enough data, we've run a machine learning process to identify regions of the genome that are indicative of susceptibility or resistance to antibiotics. Staphylococcus aureus is one of those species that we have a lot of data on, so we were able to run this machine learning process on it. When any genome of that genus and species is annotated in PATRIC, we check whether it has any of the features those classifiers identified. You can see that this particular genome shows indications of being resistant to penicillin and susceptible to a number of antibiotics. So that's a pretty useful service for some organisms in PATRIC. Antimicrobial resistance is very important to NIAID, and it is of growing global importance. We also have a k-mer-based method for detecting antimicrobial resistance genes, and these are the genes found in my particular genome that might confer resistance to specific antibiotics. The last thing we do: when you submit a genome to PATRIC, it searches the reference and representative genomes for close relatives, to give you an indication of where that genome sits in the great tree of bacterial life.
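The search for close relatives is done with Mash, which compares MinHash sketches of k-mers instead of aligning whole genomes. Below is a toy Python sketch of that idea, assuming genomes held as plain strings. It keeps the `size` smallest hashes of the canonical k-mers, estimates the Jaccard index j from two sketches, and converts it with the Mash distance formula D = -(1/k) ln(2j / (1 + j)). The defaults k = 21 and a sketch size of 1,000 mirror Mash's defaults, but the hash function here is BLAKE2b from `hashlib` rather than the MurmurHash3 that Mash uses. The function names `kmer_hashes`, `sketch`, `mash_distance`, and `closest_genomes` are hypothetical.

```python
import hashlib
import math


def kmer_hashes(seq: str, k: int = 21) -> set[int]:
    """64-bit hashes of all canonical k-mers (a k-mer or its reverse complement,
    whichever sorts first). k-mers containing non-ACGT characters are skipped."""
    complement = str.maketrans("ACGT", "TGCA")
    seq = seq.upper()
    hashes = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) - set("ACGT"):
            continue
        canonical = min(kmer, kmer.translate(complement)[::-1])
        digest = hashlib.blake2b(canonical.encode(), digest_size=8).digest()
        hashes.add(int.from_bytes(digest, "big"))
    return hashes


def sketch(seq: str, k: int = 21, size: int = 1000) -> list[int]:
    """Bottom-s MinHash sketch: the `size` smallest k-mer hashes."""
    return sorted(kmer_hashes(seq, k))[:size]


def mash_distance(sk_a: list[int], sk_b: list[int], k: int = 21, size: int = 1000) -> float:
    """Estimate the Jaccard index j from two sketches, then D = -(1/k) * ln(2j / (1 + j))."""
    # The smallest `size` hashes of the combined sketches stand in for the union.
    union_sketch = sorted(set(sk_a) | set(sk_b))[:size]
    if not union_sketch:
        raise ValueError("both sketches are empty")
    shared = set(sk_a) & set(sk_b) & set(union_sketch)
    j = len(shared) / len(union_sketch)
    if j == 0:
        return 1.0  # no shared hashes: treat the pair as maximally distant
    return -math.log(2 * j / (1 + j)) / k


def closest_genomes(query: str, references: dict[str, str],
                    k: int = 21, size: int = 1000) -> list[tuple[str, float]]:
    """Rank reference genomes by estimated Mash distance to the query, closest first."""
    q = sketch(query, k, size)
    dists = [(name, mash_distance(q, sketch(seq, k, size), k, size))
             for name, seq in references.items()]
    return sorted(dists, key=lambda pair: pair[1])
```

Because each genome is reduced to a fixed-size sketch, comparing a new genome against many references is cheap, which is why this approach suits a first pass that picks which relatives go into the tree.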
We identify the closest reference and representative genomes by Mash (MinHash) distance, and here's the citation for it. Then we gather the global protein families, which are the cross-genus families, and use five of them to generate a phylogenetic tree. PATRIC's phylogenetic tree pipeline takes the amino acid and nucleotide sequences for those genes, concatenates them into one alignment, and then uses that alignment to build the tree. Here's the tree, and here are the support values for it. This is based on only five genes; you should try to use more than that. If you're going to publish on this genome and include a phylogenetic tree, you should do a more robust phylogenetic analysis than just five genes. I look at every paper that cites PATRIC, and I tweet and post on Facebook about it. So you should cite us, because I try to publicize your data. And when I see someone using one of these particular trees (I know what they look like), I think: no, please do a more robust analysis. Here are the references cited in the information above. So this is a very useful report. The main things it gives you are an indication of the quality of your genome and of where it sits in the grand scheme of things. In subsequent videos, we'll look more at the different files that come with the comprehensive genome analysis job, and then we'll explore what that genome looks like in PATRIC. Thanks for joining.

Remember when we submitted all those hybrid assembly jobs with the different SRR numbers for the PacBio and short-read data? For your third assignment, I want you to open two of the reports: one where Unicycler was the assembly strategy and one where Canu was the strategy. Choose the ones where you did 0 Racon iterations and 0 Pilon iterations. Open both of them, look at them together, and let's look at some of the differences. Do you see differences in the number of contigs?
Do you see differences in the number of genes annotated between the two? Are the specialty gene predictions similar? And finally, scroll down to the phylogenetic tree. When you submitted the job, we only said it was a bacterium. Based on the tree, what does it look like it is? Good luck.