And if you have a very short sequence (remember, when we read DNA
sequences we're only reading a few hundred base pairs at a time),
it's not unusual to get a sequence that consists entirely of repeats.
That means we don't really know where it came from,
because repeats occur all over the genome.
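To make the repeat problem concrete, here is a minimal Python sketch that looks for every place a short read occurs in a genome string. The genome and reads are made up, and the names `find_all` and `placement` are hypothetical. Real read aligners use genome indexes and tolerate mismatches rather than scanning with exact string search. The point is only that a read matching more than one location can't be placed with certainty.

```python
def find_all(genome: str, read: str) -> list[int]:
    """Return every 0-based start position where `read` occurs in `genome`, overlaps included."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)  # step one base so overlapping hits are found
    return positions


def placement(genome: str, read: str) -> str:
    """Classify a read as unmapped, uniquely placed, or ambiguous (multiple hits)."""
    hits = find_all(genome, read)
    if not hits:
        return "unmapped"
    if len(hits) == 1:
        return f"unique at {hits[0]}"
    return f"ambiguous: {len(hits)} locations"


# Made-up toy genome containing the repeat "ACGTACGT" at two places
genome = "TTGACGTACGTCCATGGAAACGTACGTGGC"
print(placement(genome, "ACGTACGT"))  # ambiguous: 2 locations
print(placement(genome, "CCATGG"))    # unique at 11
```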
There are also structures of RNA that are important to understand.
A typical human messenger RNA,
which is what our proteins are made from, looks like this.
The sequence shown in green is the coding sequence.
Sometimes when we talk about mRNA or
RNA, we're only talking about the part that encodes proteins, just the green part.
But it's important to understand that the part that's copied from
the DNA, the transcribed portion that goes into messenger RNA,
is actually longer than that.
Typically you'll have some stretch at the beginning of the messenger RNA
that's not translated, and we call that the UTR, or untranslated region.
Because it's at the beginning,
and remember, the beginning is the 5 prime end, we call it the 5 prime UTR.
On the other end we have the 3 prime UTR, which is usually a longer stretch, for
various reasons.
That's also not translated.
And somewhere in the middle is the coding sequence.
So if we want to know what the protein sequence is, and we can identify the UTRs
and get rid of them, then we can read off the coding sequence
and directly translate it into an amino acid sequence.
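As a rough illustration of reading off the protein once the UTRs are known, the sketch below assumes we already know the lengths of the 5 prime and 3 prime UTRs (in practice, finding those boundaries is the hard part), strips them, and translates the remaining coding sequence with the standard genetic code. The function names and the example sequence are hypothetical.

```python
# Standard genetic code (NCBI translation table 1), written with RNA bases.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate codon by codon, stopping at the first stop codon ('*')."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def protein_from_mrna(mrna: str, utr5_len: int, utr3_len: int) -> str:
    """Strip the 5' and 3' UTRs (lengths assumed known) and translate what is left."""
    cds = mrna[utr5_len:len(mrna) - utr3_len]
    return translate(cds)


# Made-up example: 6-nt 5' UTR, 15-nt coding sequence, 9-nt 3' UTR
mrna = "GGCACU" + "AUGGCUUGGAAAUAA" + "CCGAUUAGC"
print(protein_from_mrna(mrna, utr5_len=6, utr3_len=9))  # MAWK
```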
Another very important feature of eukaryotic cells,
not just human cells, is that their messenger RNAs get a poly-A tail added to them.
After transcription, once the DNA has been copied into RNA,
a long series of A's gets added to the 3 prime end, and the introns are removed.
And we're going to use that as a kind of hook
to grab these RNAs out of the cell with some technologies.
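Computationally, a poly-A tail is easy to recognize at the end of a sequence. This hypothetical sketch measures the run of A's at the 3 prime end and keeps only molecules whose tail is at least 10 bases long, loosely mimicking how the tail lets us pick messenger RNAs out of a mixture. The threshold of 10 is an arbitrary assumption, and the exact-match check ignores the sequencing errors that real trimming tools have to handle.

```python
def poly_a_length(seq: str) -> int:
    """Length of the run of A's at the 3' end of the sequence."""
    return len(seq) - len(seq.rstrip("A"))


def trim_poly_a(seq: str, min_len: int = 10) -> tuple[str, bool]:
    """Remove a trailing poly-A run if it is at least `min_len` long.

    Returns (trimmed_sequence, had_tail). min_len=10 is an arbitrary choice for this sketch.
    """
    n = poly_a_length(seq)
    if n >= min_len:
        return seq[: len(seq) - n], True
    return seq, False


# Made-up molecules: only the first carries a long poly-A tail
molecules = [
    "GGCACUAUGGCUUGGAAAUAACCGAUUAGC" + "A" * 25,
    "GGCAUUAGCGA",
    "CCGUAAA",
]
captured = []
for m in molecules:
    trimmed, had_tail = trim_poly_a(m)
    if had_tail:
        captured.append(trimmed)
print(captured)
```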
Things are a little more complicated than what I just showed:
the DNA that gets transcribed is actually much longer than that picture suggests.
DNA gets transcribed into RNA, and
the DNA that eventually produces a protein includes introns as well.
In this picture you see a gene with five exons,
shown as different colored rectangles.
In between those exons are sequences that we call introns.
Those are spliced out and discarded, or really
recycled, by the cell.
The remaining parts, the exons, are joined together to form
the mature messenger RNA, which is what gets translated into a protein.
So the messenger RNA on the previous slide, UTRs and coding sequence alike,
was just the exons concatenated together.
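Splicing, at its simplest, means keeping the exon intervals and concatenating them in order. The sketch below assumes we're given exon coordinates (0-based, end-exclusive) for a made-up pre-mRNA; the `splice` function is hypothetical, and for brevity the toy transcript has no UTRs.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Concatenate exon intervals (0-based, end-exclusive) in order, dropping the introns."""
    ordered = sorted(exons)
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start < prev_end:
            raise ValueError("exon intervals overlap")
    return "".join(pre_mrna[start:end] for start, end in ordered)


# Made-up pre-mRNA: exons in upper case, introns in lower case for readability.
# Both introns begin with "gu" and end with "ag", the most common splice-site pattern.
pre_mrna = "AUGGCC" + "guaaguuuag" + "UGGAAA" + "gugcgcag" + "UAA"
exons = [(0, 6), (16, 22), (30, 33)]
print(splice(pre_mrna, exons))  # AUGGCCUGGAAAUAA
```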
But one important thing this structure allows the cell to do is this:
while it's splicing out the introns,
it can put the exons together in different ways.
It can combine them in different combinations.
That's called alternative splicing.
And this is a very common phenomenon.
When it was first discovered it was considered unusual and
probably rare, but
now we know that over 90% of human genes undergo some form of alternative splicing.
That means that even when you know the complete sequence of the DNA, and
even when you know what gets transcribed into RNA,
you still have more work to do to figure out exactly which proteins might be formed.
By joining different combinations of exons into different mature messenger
RNAs, the cell can make different proteins from the same gene.
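To see how quickly the combinations add up, this sketch enumerates isoforms for a five-exon gene like the one in the picture, under a deliberately simplified model: the first and last exons are always kept, and each internal exon may be skipped. Real alternative splicing also includes other events (alternative splice sites, retained introns, mutually exclusive exons), and not every combination is actually produced by the cell. The function name is hypothetical.

```python
from itertools import combinations


def exon_skipping_isoforms(exons: list[str]) -> list[list[str]]:
    """List every isoform that keeps the first and last exon plus any subset of the
    internal exons, preserving their order (exon skipping only)."""
    if len(exons) < 2:
        return [list(exons)]
    first, *middle, last = exons
    isoforms = []
    # Start with all internal exons kept, then progressively skip more of them.
    for k in range(len(middle), -1, -1):
        for kept in combinations(middle, k):  # combinations preserves input order
            isoforms.append([first, *kept, last])
    return isoforms


isoforms = exon_skipping_isoforms(["E1", "E2", "E3", "E4", "E5"])
for iso in isoforms:
    print("-".join(iso))
print(len(isoforms), "isoforms")  # 8 isoforms
```

With three optional internal exons, this model gives 2 × 2 × 2 = 8 possible mature transcripts from a single gene.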