blob: f8e0c78f369435b6b8c2a7205fb93e57f87883c9 [file] [log] [blame]
Name String
cl_intel_required_subgroup_size
Contributors
Ben Ashbaugh, Intel
Contact
Ben Ashbaugh, Intel (ben.ashbaugh 'at' intel.com)
Version
Version 1, July 14, 2016
Number
OpenCL Extension #43
Status
Final Draft
Dependencies
Support for OpenCL 2.1, cl_khr_subgroups, or cl_intel_subgroups is required.
This extension is written against revision 23 of the OpenCL 2.1 API
specification, against revision 30 of the OpenCL 2.0 OpenCL C specification,
against version 31 of the OpenCL 2.0 Extensions specification, and against
version 3 of the cl_intel_subgroups specification.
Overview
The goal of this extension is to allow programmers to optionally specify
the required subgroup size for a kernel function. This information is
important for the correctness of many subgroup algorithms, and in some cases
may be used by the compiler to generate more optimal code.
New API Enums
Accepted as the <param_name> parameter of clGetDeviceInfo:
CL_DEVICE_SUB_GROUP_SIZES_INTEL 0x4108
Accepted as the <param_name> parameter of clGetKernelWorkGroupInfo:
CL_KERNEL_SPILL_MEM_SIZE_INTEL 0x4109
Accepted as the <param_name> parameter of clGetKernelSubGroupInfo and/or
clGetKernelSubGroupInfoKHR:
CL_KERNEL_COMPILE_SUB_GROUP_SIZE_INTEL 0x410A
New OpenCL C Optional Attribute Qualifiers
Optional __kernel qualifier:
__attribute__((intel_reqd_sub_group_size(<int>)))
Add to Table 4.3 - "OpenCL Device Queries":
"--------------------------------------------------------------------------------------
cl_device_info Return Type Description
------------------------------- ----------- ----------------------------------------
CL_DEVICE_SUB_GROUP_SIZES_INTEL size_t[] Returns the set of subgroup sizes
supported by the device.
--------------------------------------------------------------------------------------"
Add to Table 5.21 - "clGetKernelWorkGroupInfo parameter queries":
"--------------------------------------------------------------------------------------
cl_kernel_work_group_info Return Type Description
------------------------- ----------- ----------------------------------------------
CL_KERNEL_SPILL_MEM_ cl_ulong Returns the amount of spill memory used by
SIZE_INTEL a kernel. The meaning of this value will
vary from implementation-to-implementation,
however a return value of 0 will always
indicate that compiler was able to compile
the kernel to fit into the device's register
file without spilling registers to memory.
--------------------------------------------------------------------------------------"
Add to Table 5.22 - "clGetKernelSubGroupInfo parameter queries" in the OpenCL 2.1 API
spec, to the table in Section 9.17.2.1 for clGetKernelSubGroupInfoKHR in the OpenCL
2.0 Extensions spec, and to the table describing the changes to Section 5.9.3 for
clGetKernelSubGroupInfoKHR in the cl_intel_subgroups spec:
"--------------------------------------------------------------------------------------
cl_kernel_sub_group_info Input Type Return Type Description
------------------------ ---------- ----------- -----------------------------------
CL_KERNEL_COMPILE_ ignored size_t Returns the subgroup size specified
SUB_GROUP_SIZE_INTEL by the __attribute__((
intel_reqd_sub_group_size(<int>)))
qualifier. Refer to section 6.7.2.
If the subgroup size is not
specified using the above attribute
qualifier then 0 is returned.
--------------------------------------------------------------------------------------"
Add to Section 6.7.2 - "Optional Attribute Qualifiers"
"The optional __attribute__((intel_reqd_sub_group_size(<int>))) can be used to indicate
that the kernel must be compiled and executed with the specified subgroup size. When
this attribute is present, get_max_sub_group_size() is guaranteed to return the
specified integer value. This is important for the correctness of many subgroup
algorithms, and in some cases may be used by the compiler to generate more optimal
code.
Note that there is no guarantee for the value of get_sub_group_size() even when this
attribute is present, particularly when the work-group size is not evenly divisible by
the required subgroup size.
Note as well that some devices may support a limited number of subgroup sizes, and
that some devices may not support all language constructs with all subgroup sizes.
This means that some kernels may fail compilation with one required subgroup size and
succeed with another required subgroup size, even if both subgroup sizes are supported
by the device.
Finally, note that requiring one subgroup size (particularly, a larger subgroup size)
may require more spill memory than another subgroup size, and may negatively impact
application performance."
Revision History
Version 1 - Initial Revision