Let's talk about my favorite toys.
Intel makes a variety of processors and you are probably familiar
with consumer grade processors such as Intel Core and Intel Atom.
I will introduce you to enterprise-class, general-purpose processors branded Intel Xeon.
In addition to them,
Intel produces Intel Xeon Phi processors and
coprocessors, which are specialized for computing applications.
In this course, you will have remote access to
these platforms for exercises and test assignments.
Finally, to allow multiple processors to work in tandem on one application,
Intel makes its own high performance network interconnects such as Intel Omni-Path.
Plus you may be interested in learning
that Intel is working on visual compute accelerators,
machine learning appliances and deep learning inference appliances based on FPGAs.
The traditional processors such as Xeons are general purpose, highly parallel CPUs.
You can find the Xeon in one-way,
two-way or four-way form factors,
which means that you can install
not one but two or four CPU chips on the same platform to double or quadruple the number of cores.
Xeons are resource rich, which means that they have high clock speeds,
large caches and smart cores that make the Xeon forgiving.
Even if your application is not highly optimized,
it will likely do well on a Xeon.
The theoretical peak performance of a top-of-the-line
two-way Xeon platform with
Broadwell architecture processors is around one teraflop per second in double precision.
This is one trillion floating-point operations per second in double precision.
This estimate assumes fused multiply-add operations, each of which counts as two floating-point operations.
The memory bandwidth, the rate at which a Xeon can read data
from its memory, is around 150 gigabytes per second.
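To put the two Xeon figures side by side, here is a minimal sketch in Python. It uses only the round numbers quoted in this video (about one teraflop per second and about 150 gigabytes per second). The variable names are my own, and the result is a rough ratio of theoretical peaks, not a measurement.

```python
# Round figures quoted in the video for a two-way Broadwell Xeon platform.
PEAK_FLOPS = 1.0e12        # about 1 teraflop per second, double precision
MEM_BANDWIDTH = 150.0e9    # about 150 gigabytes per second
BYTES_PER_DOUBLE = 8       # size of one double precision number

# How many double precision numbers can be read from memory per second.
doubles_per_second = MEM_BANDWIDTH / BYTES_PER_DOUBLE

# Floating-point operations the cores could perform in the time it takes
# to read one double from memory, if both ran at their theoretical peaks.
flops_per_double = PEAK_FLOPS / doubles_per_second

print(f"{doubles_per_second / 1e9:.2f} billion doubles per second")
print(f"about {flops_per_double:.0f} operations per double read from memory")
```

With these round numbers, the cores could perform roughly 53 operations for every double that arrives from memory.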
Intel Xeon Phi was first introduced in
2012 and back then it was only available as a coprocessor.
This means that it was a PCI Express add-in card.
It required a host processor, usually a Xeon,
and you would install
one or several Xeon Phi coprocessors into the PCI Express bus on that computer.
Xeon Phi is specialized
for computational applications.
That is because it is highly parallel, with up to 61 cores in the top-of-the-line
first-generation Xeon Phi, and because it is balanced for computation.
It uses a lower clock speed,
it has simpler cores,
it has smaller caches, but in return it has more floating-point processing power.
Of course there's a catch.
This makes Xeon Phi less forgiving than a Xeon,
meaning that if your application is not fully
optimized to take advantage of its capabilities,
it will likely not do as well as it would on a Xeon.
The theoretical peak performance of a first-generation Xeon Phi
is 1.2 teraflops per second in double precision.
So 1.2 trillion floating-point operations per second, again for fused multiply-add operations.
Not much greater than the number that I showed for Broadwell,
but mind that this is a 2012 figure.
So four years before the Broadwell that I showed in the previous slide,
the Xeon Phi was faster and this performance was at
your disposal if only you knew how to optimize your application.
The same goes for memory bandwidth.
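The 1.2 teraflops figure for the first-generation Xeon Phi can be reproduced with simple arithmetic. This is a sketch, not an official formula. The function name and its parameters are my own, and the clock speed (1.238 GHz), the vector width (8 doubles per 512-bit vector) and the single vector unit per core are my assumptions for a top-of-the-line first-generation coprocessor. Only the 61 cores and the 1.2 teraflops result come from this video.

```python
def peak_gflops(cores, clock_ghz, doubles_per_vector, vector_units_per_core):
    """Theoretical double precision peak in gigaflops per second."""
    # A fused multiply-add counts as two floating-point operations per element.
    flops_per_fma = 2
    flops_per_cycle = doubles_per_vector * vector_units_per_core * flops_per_fma
    return cores * clock_ghz * flops_per_cycle


# Assumed parameters for a top-of-the-line first-generation Xeon Phi.
first_gen_phi = peak_gflops(
    cores=61,
    clock_ghz=1.238,
    doubles_per_vector=8,
    vector_units_per_core=1,
)
print(f"{first_gen_phi / 1000:.1f} teraflops per second")
```

The estimate only holds if every core issues a full-width fused multiply-add on every cycle, which is why an application that is not optimized for vectors and threads falls far short of it.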
In 2016, Xeon Phi received a major update.
Second-generation Xeon Phi processors are available as bootable CPUs,
that is, as traditional CPUs without a PCI Express connector.
They will also be available as coprocessors that you install in the PCI Express bus.
This device is also specialized for
computing because it is highly parallel and because it has smaller caches,
simpler cores and a lower clock speed.
This makes it less forgiving than a Xeon.
So if your application is not highly optimized,
it will likely do worse on a Xeon Phi than it will do on the Xeon.
But in return, Xeon Phi can deliver
three times the theoretical peak performance of a Xeon.
Same goes for memory bandwidth.
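The "three times" claim can be checked with the same kind of arithmetic. Again, this is a hedged sketch: the second-generation numbers below (68 cores, 1.4 GHz, two 512-bit vector units per core) are my assumptions for one high-end model and do not come from this video. The one teraflop Xeon figure does.

```python
# Assumed parameters for one high-end second-generation Xeon Phi processor.
CORES = 68
CLOCK_GHZ = 1.4
DOUBLES_PER_VECTOR = 8      # a 512-bit vector holds 8 doubles
VECTOR_UNITS_PER_CORE = 2
FLOPS_PER_FMA = 2           # a fused multiply-add is a multiply plus an add

phi_peak_gflops = (CORES * CLOCK_GHZ * DOUBLES_PER_VECTOR
                   * VECTOR_UNITS_PER_CORE * FLOPS_PER_FMA)

# Round figure quoted earlier in the video for the two-way Broadwell Xeon.
XEON_PEAK_GFLOPS = 1000.0

ratio = phi_peak_gflops / XEON_PEAK_GFLOPS
print(f"Xeon Phi peak: {phi_peak_gflops / 1000:.1f} teraflops per second")
print(f"Ratio to the Xeon: {ratio:.1f}")
```

Under these assumptions the ratio comes out at about three, in line with the figure quoted in the video.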
In this course, you will have access to
a Xeon Phi processor, but the techniques that we are going to
learn for programming these devices will also apply to a traditional CPU, be it a Xeon,
an Intel Core or even an Intel Atom in some cases.
That is because we will rely on practices known as modern code that allow
you to have one code base for all platforms.
In the next video we'll discuss these modern code practices.