| Name |
| |
| ARM_core_id |
| |
| Name Strings |
| |
| cl_arm_core_id |
| |
| Contributors |
| |
| Robert Elliott, ARM Ltd. (robert.elliott 'at' arm.com) |
| Hui Chen, ARM Ltd. (hui.chen 'at' arm.com) |
| Kevin Petit, ARM Ltd. (kevin.petit 'at' arm.com) |
| |
| Contacts |
| |
| Kevin Petit, ARM Ltd. (kevin.petit 'at' arm.com) |
| |
| Status |
| |
| Shipping. |
| |
| Version |
| |
| Revision: #2, Feb 26th, 2018 |
| |
| Number |
| |
| OpenCL Extension #26 |
| |
| Dependencies |
| |
| Requires OpenCL version 1.2 or later. |
| |
| Overview |
| |
| This extension provides a built-in function which returns a unique ID |
| for the compute unit that a work-group is running on. This value is |
| uniform for a work-group. |
| |
| This value can be used for a core-specific cache or atomic pool where the |
| storage is required to be in global memory and persistent (but not ordered) |
| between work-groups. This does not provide any additional ordering on top |
| of the existing guarantees between workgroups, nor does it provide any |
| guarantee of concurrent execution. |
| |
| The IDs for the compute units may not be consecutive and applications must |
| make sure they allocate enough memory to accommodate all the compute units |
| present on the device. A device info query allows the application to |
| know the IDs associated with the compute units on a given device. |
| |
| The extension string cl_arm_core_id is returned for devices and platforms |
| which support this extension. |
| |
| Glossary |
| |
| No new terminology is introduced by this extension. |
| |
| New Types |
| |
| None |
| |
| New Procedures and Functions |
| |
| Device Info query |
| |
| CL_DEVICE_COMPUTE_UNITS_BITFIELD_ARM (return type cl_ulong) returns a |
| bitfield where each bit set represents the presence of compute unit whose |
| ID is the bit position. The highest ID for any compute unit on the device |
| is the position of the most significant bit set. The total number of |
| elements an application should allocate in an array indexed by core IDs is |
| thus given by: |
| |
| ALLOC = sizeof(cl_ulong) * 8 - CLZ(CL_DEVICE_COMPUTE_UNITS_BITFIELD_ARM) |
| |
| Built-in Function |
| |
| uint arm_get_core_id( void ) |
| |
| Description |
| |
| Returns the compute unit id as a uint, in the range [0;ALLOC-1] |
| |
| Example |
| |
| // Host code, size pool based on required_instance_size * ALLOC |
| size_t required_instance_size = 1024; |
| cl_mem core_pool = clCreateBuffer(context, |
| CL_MEM_READ_WRITE, |
| required_instance_size * ALLOC, NULL, NULL); |
| |
| // Device/Kernel code, select memory instance |
| kernel void test( global char *per_core_pool, global char *input, uint required_instance_size ) |
| { |
| global char *core_pool_instance = per_core_pool[ arm_get_core_id() * required_instance_size ]; |
| ... |
| } |
| |
| New Tokens |
| |
| OpenCL kernel code Now has access to: |
| |
| #pragma OPENCL EXTENSION cl_arm_core_id : enable |
| The preprocessor macro cl_arm_core_id is also present |
| |
| Revision History |
| |
| Revision: #1, Apr 2nd, 2013 - Initial revision |
| Revision: #2, Feb 26th, 2018 - Added support for sparsely allocated compute |
| unit IDs. |