Name
NV_compiler_options
Name Strings
cl_nv_compiler_options
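An application would typically confirm that the name string above appears in
the device's extension list before relying on the options. The following is a
minimal sketch in C, assuming the space-separated extension string has already
been obtained with clGetDeviceInfo and CL_DEVICE_EXTENSIONS. The helper name
has_extension is hypothetical and is not part of OpenCL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `name` occurs as a whole,
 * space-delimited token in `extensions`, and 0 otherwise. */
static int has_extension(const char *extensions, const char *name)
{
    size_t len;
    const char *p;

    assert(extensions != NULL && name != NULL);
    len = strlen(name);
    if (len == 0)
        return 0;

    p = extensions;
    while ((p = strstr(p, name)) != NULL) {
        /* Accept the match only if it is bounded by spaces or string ends,
         * so that a longer name sharing this prefix is not mistaken for it. */
        int starts_token = (p == extensions) || (p[-1] == ' ');
        int ends_token = (p[len] == '\0') || (p[len] == ' ');
        if (starts_token && ends_token)
            return 1;
        p++;
    }
    return 0;
}
```

Matching whole tokens rather than using a plain substring search avoids a
false positive when another extension name merely starts with the same text.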
Number
OpenCL Extension #17
Dependencies
OpenCL 1.0 is required
Contributors
Cyril Zeller
Joshua Newman
Overview
This extension allows the programmer to pass options to the PTX assembler
(ptxas), giving greater control over code generation.
Details
Section 5.4.3 of the OpenCL 1.0 specification lists compiler options that
can be passed to clBuildProgram. This extension adds the following
options:
-cl-nv-maxrregcount <N>
Passed on to ptxas as --maxrregcount <N>
N is a positive integer.
Specify the maximum number of registers that GPU functions can use.
Up to a function-specific limit, a higher value will generally increase
the performance of individual GPU threads that execute this function.
However, because thread registers are allocated from a global register
pool on each GPU, a higher value of this option will also reduce the
maximum thread block size, thereby reducing the amount of thread
parallelism. Hence, a good maxrregcount value is the result of a
trade-off.
If this option is not specified, then no maximum is assumed. Otherwise
the specified value will be rounded up to the next multiple of 4
registers, up to the GPU-specific maximum of 128 registers.
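The rounding rule and the register trade-off described for this option can be
sketched in C. This is an illustrative model only, not NVIDIA's
implementation. The helper names effective_maxrregcount and
max_threads_per_block are hypothetical, the model assumes that a value which
is already a multiple of 4 is left unchanged, and the register pool size is a
caller-supplied assumption that this specification does not state.

```c
#include <assert.h>

enum {
    NV_REG_GRANULARITY = 4,   /* values are rounded to a multiple of 4 */
    NV_REG_MAX = 128          /* GPU-specific maximum named in the text */
};

/* Hypothetical helper: the register limit that would take effect for a
 * requested -cl-nv-maxrregcount value, under the assumptions above. */
static unsigned effective_maxrregcount(unsigned requested)
{
    unsigned rounded;

    assert(requested > 0);    /* N must be a positive integer */
    rounded = (requested + NV_REG_GRANULARITY - 1)
              / NV_REG_GRANULARITY * NV_REG_GRANULARITY;
    return rounded > NV_REG_MAX ? NV_REG_MAX : rounded;
}

/* Hypothetical helper, simplified model: if every thread of a block draws
 * regs_per_thread registers from a pool of pool_size registers, this is the
 * largest block that fits. Real hardware allocation granularity is ignored. */
static unsigned max_threads_per_block(unsigned pool_size,
                                      unsigned regs_per_thread)
{
    assert(regs_per_thread > 0);
    return pool_size / regs_per_thread;
}
```

Under these assumptions a requested limit of 30 becomes 32. With an assumed
pool of 16384 registers, a limit of 32 registers per thread would cap a block
at 512 threads, compared with 1024 threads at 16 registers per thread.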
-cl-nv-opt-level <N>
Passed on to ptxas as --opt-level <N>
N is a positive integer, or 0 (no optimization).
Specify optimization level.
Default value: 3.
-cl-nv-verbose
Passed on to ptxas as --verbose
Enable verbose mode.
Output will be reported in the build log (retrievable with
clGetProgramBuildInfo and CL_PROGRAM_BUILD_LOG, for instance from the
callback passed to clBuildProgram).
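As a usage illustration, the three options can be combined into the single
options string that clBuildProgram accepts. The sketch below only formats
that string; the helper name format_nv_build_options and its convention that
a maxrregcount of 0 means "omit the option" are assumptions made for this
example and are not defined by the extension.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes an options string for clBuildProgram into buf.
 * maxrregcount == 0 omits -cl-nv-maxrregcount (no maximum is assumed).
 * Returns 0 on success, or -1 if buf is too small. */
static int format_nv_build_options(char *buf, size_t size,
                                   unsigned maxrregcount,
                                   unsigned opt_level,
                                   int verbose)
{
    static const char verbose_opt[] = " -cl-nv-verbose";
    int n;

    assert(buf != NULL && size > 0);

    if (maxrregcount > 0)
        n = snprintf(buf, size, "-cl-nv-maxrregcount %u -cl-nv-opt-level %u",
                     maxrregcount, opt_level);
    else
        n = snprintf(buf, size, "-cl-nv-opt-level %u", opt_level);
    if (n < 0 || (size_t)n >= size)
        return -1;            /* output error or truncated output */

    if (verbose) {
        /* Make sure the flag and the terminating NUL still fit. */
        if (strlen(buf) + strlen(verbose_opt) >= size)
            return -1;
        strcat(buf, verbose_opt);
    }
    return 0;
}
```

The resulting string would be passed as the options argument of
clBuildProgram. With -cl-nv-verbose set, the ptxas output can then be read
from the build log using clGetProgramBuildInfo with CL_PROGRAM_BUILD_LOG.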