Name
Loop unroll pragma extension
Name Strings
cl_nv_pragma_unroll
Number
OpenCL Extension #19
Dependencies
OpenCL 1.0 is required
Contributors
Jian-Zhong Wang
Bastiaan Aarts
Vinod Grover
Overview
This extension extends the OpenCL C language with a pragma that
hints to the compiler that a loop should be unrolled. The pragma
must be applied to a loop and can request either full unrolling or
partial unrolling by a given factor. It is only a hint: the
compiler may ignore it for any reason.
Goals
The principal goal of pragma unroll is to improve the performance
of loops via unrolling. Typically this enables other optimizations
or improves the instruction-level parallelism of a thread.
Details
A user may specify that a loop in the source program be
unrolled. This is done via a pragma. The syntax of this pragma is
as follows:
#pragma unroll [unroll-factor]
The pragma unroll may optionally specify an unroll factor. The
pragma must be placed immediately before the loop and only applies
to that loop.
If no unroll factor is specified, the compiler will try to fully
unroll the loop. If an unroll factor is specified, the compiler
will try to partially unroll the loop by that factor. The unroll
factor, if specified, must be a non-negative integer constant
known at compile time.
A loop unroll factor of 1 means that the compiler should not
unroll the loop.
A complete unroll specification has no effect if the trip count of
the loop is not compile-time computable.
Examples
This section lists a few examples illustrating valid and invalid
uses.
- Complete unrolling example
#pragma unroll
for (int i = 0; i < 32; i++) {
...
}
In this example, full unrolling is requested from the compiler.
Since the trip count is known to be 32, the compiler will most
likely honor this request. In the following example, the trip
count is not known at compile time, so the unroll pragma will be
ignored.
#pragma unroll
for (int i = 0; i < n; i++) {
...
}
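To make concrete what a full unroll request asks for, here is a
minimal sketch in plain C (not OpenCL C) that writes the
transformation out by hand. The function names dot4_rolled,
dot4_unrolled and check_full_unroll are hypothetical, and a trip
count of 4 is assumed instead of 32 purely to keep the listing
short. What a given compiler actually emits may differ.

```c
#include <assert.h>

/* Rolled form: the loop as the programmer would write it beneath
 * "#pragma unroll". The trip count (4) is a compile-time constant. */
int dot4_rolled(const int a[4], const int b[4]) {
    int acc = 0;
    for (int i = 0; i < 4; i++) {
        acc += a[i] * b[i];
    }
    return acc;
}

/* Hand-written equivalent of a full unroll: one copy of the body
 * per iteration, with no loop counter and no branch left. */
int dot4_unrolled(const int a[4], const int b[4]) {
    int acc = 0;
    acc += a[0] * b[0];
    acc += a[1] * b[1];
    acc += a[2] * b[2];
    acc += a[3] * b[3];
    return acc;
}

/* Illustrative self-check: both forms must compute the same value. */
void check_full_unroll(void) {
    const int a[4] = {1, 2, 3, 4};
    const int b[4] = {5, 6, 7, 8};
    assert(dot4_rolled(a, b) == dot4_unrolled(a, b));
}
```

The unrolled form is only possible because the number of iterations
is fixed when the code is compiled, which is why a full unroll
request has no effect on the loop bounded by n.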
- No unrolling example
#pragma unroll 1
for (int i = 0; i < 64; i++) {
...
}
- Partial unrolling example
#pragma unroll 4
for (int i = 0; i < n; i++) {
...
}
Note that in this example the trip count is not known at compile
time, but a partial unroll factor of 4 is still valid.
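One way to see why a factor of 4 remains valid for an unknown trip
count is to write the transformation out by hand. The sketch below
is plain C (not OpenCL C) and uses hypothetical names (sum_rolled,
sum_unrolled4, check_partial_unroll). It assumes the common scheme
of a main loop that handles four iterations per pass followed by a
remainder loop; an actual compiler is free to choose a different
strategy or to ignore the hint.

```c
#include <assert.h>

/* Rolled form: the loop as written beneath "#pragma unroll 4". */
int sum_rolled(const int *a, int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += a[i];
    }
    return sum;
}

/* Hand-written equivalent of partial unrolling by 4. */
int sum_unrolled4(const int *a, int n) {
    int sum = 0;
    int i = 0;

    /* Main loop: four copies of the body per pass, entered only
     * while at least four elements remain. */
    for (; n - i >= 4; i += 4) {
        sum += a[i];
        sum += a[i + 1];
        sum += a[i + 2];
        sum += a[i + 3];
    }

    /* Remainder loop: handles the last 0 to 3 elements when n is
     * not a multiple of 4. */
    for (; i < n; i++) {
        sum += a[i];
    }
    return sum;
}

/* Illustrative self-check: both forms agree for a length that is
 * not a multiple of the unroll factor. */
void check_partial_unroll(void) {
    const int data[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    assert(sum_rolled(data, 10) == sum_unrolled4(data, 10));
}
```

The remainder loop is what removes the need to know n in advance:
the unrolled body runs as many whole passes as fit, and the
leftover iterations run one at a time.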
- Invalid unroll pragma usage
The following examples describe some invalid uses of loop
unrolling pragmas.
#pragma unroll -1
for (...) {
...
}
This is invalid because the loop unroll factor is negative.
#pragma unroll
if (...) {
...
}
This is invalid because the pragma is used on a non-loop
construct.
#pragma unroll x+1
for (...) {
...
}
This is invalid since the loop unroll factor is not a
compile-time known value.