This essentially speeds up the processing of loops with independent iterations.
Of course, this is only safe to do if your iterations are
independent or almost independent of each other.
To create a parallel loop, or rather to transform a serial loop into a parallel loop,
you can issue the directive pragma omp parallel for.
This is a combined directive which creates
threads and then teams up these threads to process the iteration space in parallel,
so different values of i will be assigned to different threads.
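As a minimal sketch of what this looks like in code (the array, its size, and the loop body are just illustrative, and you would compile with OpenMP enabled, for example gcc -fopenmp):

    #include <stdio.h>

    #define N 1000

    int main(void) {
        double a[N];

        /* The combined directive creates a team of threads and splits the
           iterations of the following loop among them. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++) {
            a[i] = 2.0 * i;   /* iterations are independent of each other */
        }

        printf("a[%d] = %f\n", N - 1, a[N - 1]);
        return 0;
    }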
If you don't want to issue
the parallel and the for directives at the same time, you can split them.
And this could be useful in some cases: for example,
you can start the parallel region and, while
you are in the region that is executed by every thread,
create and initialize thread-private storage.
Then, when it's time to do the loop processing,
you insert the directive pragma omp for.
Notice that there is no keyword parallel in this directive.
What will happen here is that multiple threads reaching this line
will be teamed up to process the iteration space in parallel.
At the end of the for loop you will have an implicit barrier,
meaning that threads will wait for each
other before they proceed with the rest of the parallel region.
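Here is a hedged sketch of that split form; the per-thread variable offset is purely hypothetical, standing in for whatever thread-private storage you would actually need:

    #include <omp.h>
    #include <stdio.h>

    #define N 1000

    int main(void) {
        double a[N];

        #pragma omp parallel
        {
            /* This block runs in every thread: a convenient place to create
               and initialize thread-private storage. */
            double offset = (double)omp_get_thread_num();

            /* No "parallel" keyword here: the existing team of threads
               shares the iteration space of the loop that follows. */
            #pragma omp for
            for (int i = 0; i < N; i++) {
                a[i] = offset + i;
            }
            /* Implicit barrier at the end of the for loop: every thread has
               finished its iterations before any thread continues past here. */
        }

        printf("a[0] = %f\n", a[0]);
        return 0;
    }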
The beauty of OpenMP is that with
parallel loops it is very easy to change the scheduling mode.
Scheduling mode is the algorithm with which iterations are assigned to threads.
All you have to do is to add the clause schedule and in parentheses specify the mode.
Alternatively, you can set the scheduling mode using an environment variable.
The scheduling modes that are supported are static,
dynamic, and guided scheduling.
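As a sketch of both options (the work routine below is hypothetical, chosen only so that iterations take different amounts of time), you could write:

    #include <math.h>
    #include <stdio.h>

    #define N 1000

    /* Hypothetical work routine whose cost grows with i, so the
       iterations are not equally expensive. */
    static double work(int i) {
        double s = 0.0;
        for (int k = 0; k < i; k++)
            s += sin((double)k);
        return s;
    }

    int main(void) {
        double a[N];

        /* Scheduling mode fixed in the source code: */
        #pragma omp parallel for schedule(dynamic)
        for (int i = 0; i < N; i++)
            a[i] = work(i);

        /* Scheduling mode deferred to run time: with schedule(runtime) the
           mode is read from the OMP_SCHEDULE environment variable,
           e.g.  export OMP_SCHEDULE="guided"  before running the program. */
        #pragma omp parallel for schedule(runtime)
        for (int i = 0; i < N; i++)
            a[i] = work(i);

        printf("a[%d] = %f\n", N - 1, a[N - 1]);
        return 0;
    }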
Static scheduling is the simplest algorithm.
In this mode OpenMP decides which iterations go to
which thread at the beginning of the loop, and the decision is not changed at runtime.
So you have very low overhead but the penalty to pay
here is that if some iterations take longer than others,
you will end up with a load imbalance.
The dynamic mode does the opposite.
It assigns only a few iterations to
every thread and then the scheduler waits for one of the threads to become available.
Whichever thread finishes its work first gets the next chunk of work.
So you can get great load balancing but
potentially you can experience high scheduling overhead.
There's also the guided mode, which tries to combine the best of the static and dynamic modes.
It also dynamically assigns iterations to threads,
but it begins by handing out large chunks of iterations to minimize the scheduling overhead,
and towards the end the chunks get smaller and smaller.
In addition to the mode you can specify the chunk size.
This is an integer controlling the behavior of the algorithm.
With static scheduling the chunk size is
how many iterations you assign to one thread before moving on to the next thread.
When you run out of threads, but not out of iterations, the assignment wraps around to the first thread.
In dynamic mode the chunk size is how many iterations
the scheduler is going to assign to a thread at every scheduling event.
And finally, in the guided mode,
the chunk size is the minimum number
of iterations assigned to a thread towards the end of the calculation.
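The chunk size is simply a second argument to the schedule clause. A small illustrative sketch, with the loop bodies kept trivial:

    #include <stdio.h>

    #define N 1000

    int main(void) {
        double a[N];

        /* static, chunk 4: iterations are dealt out 4 at a time, round-robin
           over the threads, decided once at the start of the loop */
        #pragma omp parallel for schedule(static, 4)
        for (int i = 0; i < N; i++)
            a[i] = 2.0 * i;

        /* dynamic, chunk 4: each scheduling event hands the next 4 iterations
           to whichever thread becomes available first */
        #pragma omp parallel for schedule(dynamic, 4)
        for (int i = 0; i < N; i++)
            a[i] += 1.0;

        /* guided, chunk 4: chunks start large and shrink towards the end of
           the loop, but not below 4 iterations (the very last chunk may be
           smaller) */
        #pragma omp parallel for schedule(guided, 4)
        for (int i = 0; i < N; i++)
            a[i] *= 0.5;

        printf("a[%d] = %f\n", N - 1, a[N - 1]);
        return 0;
    }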
With parallel loops in OpenMP you have access to very powerful functionality, but as I
mentioned, iterations have to be independent or nearly independent of each other.
How to handle situations where iterations depend on each other in certain ways
is what we will discuss in the next video.