| Name |
| |
| ARM_import_memory |
| |
| Name Strings |
| |
| cl_arm_import_memory |
| cl_arm_import_memory_host |
| cl_arm_import_memory_dma_buf |
| |
| cl_arm_import_memory will be reported if at least one of the other extension |
| strings is also reported. |
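| |
| As an illustration of how an application might test for these strings, |
| the sketch below searches a space-separated extension list (as returned |
| by clGetDeviceInfo with CL_DEVICE_EXTENSIONS) for a whole-token match. |
| The helper names has_extension and host_import_supported are hypothetical |
| and not part of this extension. A whole-token match matters because |
| cl_arm_import_memory is a prefix of the other two strings. |

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if `name` appears as a whole space-separated token in
 * `ext_list`, 0 otherwise. `ext_list` must be a non-NULL C string. */
static int has_extension(const char *ext_list, const char *name)
{
    size_t name_len = strlen(name);
    const char *p = ext_list;

    if (name_len == 0)
        return 0;

    while (*p != '\0') {
        size_t tok_len;

        /* Skip separators between tokens. */
        while (*p == ' ')
            p++;

        /* Length of the current token (0 at end of string). */
        tok_len = strcspn(p, " ");
        if (tok_len == name_len && strncmp(p, name, name_len) == 0)
            return 1;
        p += tok_len;
    }
    return 0;
}

/* Host import is usable only when its own extension string is reported. */
static int host_import_supported(const char *ext_list)
{
    return has_extension(ext_list, "cl_arm_import_memory_host");
}
```

| A plain strstr() search for cl_arm_import_memory would also succeed on a |
| device that reports only cl_arm_import_memory_host, which is why the |
| sketch compares token lengths as well as contents. |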
| |
| Contributors |
| |
| Robert Elliott, ARM |
| Vatsalya Prasad, ARM |
| |
| Contact |
| |
| Robert Elliott, ARM (robert.elliott 'at' ARM.com) |
| Vatsalya Prasad, ARM (vatsalya.prasad 'at' ARM.com) |
| |
| IP Status |
| |
| No claims or disclosures are known to exist. |
| |
| Version |
| |
| Revision: #4, August 4th, 2015 |
| |
| Number |
| |
| OpenCL Extension #38 |
| |
| Status |
| |
| Complete. |
| |
| Extension Type |
| |
| OpenCL device extension |
| |
| Dependencies |
| |
| Requires OpenCL version 1.0 or later. Using the host access cl_mem_flags |
| when importing requires OpenCL 1.2, as these flags were introduced in |
| that version. |
| |
| Overview |
| |
| This extension adds a new function, clImportMemoryARM, that allows memory |
| to be imported directly into OpenCL. |
| |
| Memory imported through this interface will be mapped into the device's page |
| tables directly, providing zero copy access. It will never fall back to copy |
| operations and aliased buffers, instead producing an error when mapping is |
| not possible. |
| |
| Types of memory supported for import are specified as additional extension |
| strings. |
| |
| Header File |
| |
| cl_ext.h |
| |
| Glossary |
| |
| No new terminology is introduced by this extension. |
| |
| New Types |
| |
| None |
| |
| New Procedures and Functions |
| |
| The new function |
| |
| cl_mem clImportMemoryARM( cl_context context, |
| cl_mem_flags flags, |
| const cl_import_properties_arm *properties, |
| void *memory, |
| size_t size, |
| cl_int *errorcode_ret ); |
| |
| Description |
| |
| Given a suitable pointer to an external memory allocation <memory> this |
| function will map the memory into the device page tables. |
| |
| The flags argument provides standard OpenCL memory object flags. |
| |
| Valid <flags> are: |
| * CL_MEM_READ_WRITE, CL_MEM_WRITE_ONLY, CL_MEM_READ_ONLY |
| * CL_MEM_HOST_WRITE_ONLY, CL_MEM_HOST_READ_ONLY, CL_MEM_HOST_NO_ACCESS - |
| where the host flags are only hints and only apply from OpenCL 1.2 |
| onwards. |
| * CL_MEM_USE_HOST_PTR - this flag has no effect and is ignored. |
| |
| The properties argument provides a list of key value pairs, with a zero |
| terminator. properties can be NULL or point to a single zero value if the |
| default behaviour is desired. |
| |
| Valid <properties> are: |
| * CL_IMPORT_TYPE_ARM |
| Valid values for CL_IMPORT_TYPE_ARM are: |
| * CL_IMPORT_TYPE_HOST_ARM - this is the default |
| * CL_IMPORT_TYPE_DMA_BUF_ARM |
| If <properties> is NULL, default values are taken. |
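| |
| The list format described above can be sketched as a small walker that |
| resolves the import type, falling back to the host default. This is an |
| illustration only: the typedef import_property_t and the IMPORT_TYPE_* |
| values are placeholders invented for the sketch so that it compiles |
| without cl_ext.h, and resolve_import_type is a hypothetical helper, not |
| how any implementation is known to parse the list. |

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the declarations in cl_ext.h. The typedef and the numeric
 * values are placeholders chosen for this sketch only. */
typedef intptr_t import_property_t;
#define IMPORT_TYPE_KEY     ((import_property_t)1) /* CL_IMPORT_TYPE_ARM */
#define IMPORT_TYPE_HOST    ((import_property_t)2) /* CL_IMPORT_TYPE_HOST_ARM */
#define IMPORT_TYPE_DMA_BUF ((import_property_t)3) /* CL_IMPORT_TYPE_DMA_BUF_ARM */

/* Walks a zero-terminated key/value list and stores the requested import
 * type in *type_out. A NULL list or a list holding only the terminator
 * selects the default (host). Returns 0 on success, or -1 for an unknown
 * key or value, the case the Errors section maps to CL_INVALID_VALUE. */
static int resolve_import_type(const import_property_t *props,
                               import_property_t *type_out)
{
    *type_out = IMPORT_TYPE_HOST; /* default when nothing is specified */

    if (props == NULL)
        return 0;

    for (; props[0] != 0; props += 2) {
        if (props[0] != IMPORT_TYPE_KEY)
            return -1;
        if (props[1] != IMPORT_TYPE_HOST && props[1] != IMPORT_TYPE_DMA_BUF)
            return -1;
        *type_out = props[1];
    }
    return 0;
}
```

| Because a value of zero is rejected before the walker advances, a key |
| followed directly by the terminator is reported as invalid rather than |
| read past the end of the list. |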
| |
| The valid form of the <memory> pointer depends on the import type passed |
| in <properties>. |
| |
| Errors |
| |
| CL_INVALID_CONTEXT on invalid context. |
| |
| CL_INVALID_VALUE on invalid flag input. |
| |
| CL_INVALID_VALUE when invalid properties are passed. |
| |
| CL_INVALID_VALUE if memory is NULL. |
| |
| CL_INVALID_OPERATION when host virtual pages in the range of <memory> to |
| <memory>+<size> are not mapped in the userspace address space. This does |
| _not_ include cases where physical pages are not allocated. For specific |
| behaviour see documentation for those memory types. |
| |
| Further error information may be reported via the cl_context callback |
| function. |
| |
| New memory import types |
| |
| Linux DMA_BUF memory type - CL_IMPORT_TYPE_DMA_BUF_ARM |
| |
| If the extension string cl_arm_import_memory_dma_buf is exposed then |
| importing from DMA_BUF file handles is supported. |
| |
| DMA_BUF memory coherency can be complicated by devices not part of the |
| normal coherency domain for the CPU and GPU. It is the |
| responsibility of the application to ensure that memory is up to date and |
| that the CPU has a valid view (i.e. it has no stale data in cache) before |
| the OpenCL command using the resource is able to run. This is achieved |
| either by not enqueueing the workload, or by using a user event to prevent |
| the command being executed. Before the workload executes, the application |
| must ensure other drivers or devices operating on the resource have |
| synchronised with main memory and the CPU. |
| |
| DMA_BUF allocation modes supersede the memory flag argument. For example, |
| if a DMA_BUF allocation was originally created read-only and is then |
| passed to clImportMemoryARM with CL_MEM_READ_WRITE, the more restrictive |
| read-only access will take precedence. |
| |
| See also the code example under Sample Code. |
| |
| |
| Host process memory type - CL_IMPORT_TYPE_HOST_ARM |
| |
| If the extension string cl_arm_import_memory_host is exposed then importing |
| from normal process allocations (such as malloc) is supported. |
| |
| If the host OS is Linux and oversubscription of VA is allowed, then this |
| function will commit and pin physical pages for the VA range. This may |
| cause larger physical allocations than the application typically provokes |
| if memory is sparsely used. In this case sub-ranges of the host allocation |
| should be passed to the import function individually. |
| |
| It is the application's responsibility to align for the datatype being |
| accessed, though the application is free to provide allocations without |
| any specific alignment. When alignment is less than a page size, whole |
| pages touched by addresses in the range of <memory> to <memory>+<size> |
| will be mapped into the device. If a page is already mapped by another |
| unaligned import, an error will occur. |
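| |
| The page arithmetic described above can be sketched as follows. The |
| names page_span, pages_touched and spans_share_page are hypothetical, |
| and the sketch assumes the range is half-open, that is, it covers the |
| bytes from <memory> up to but not including <memory>+<size>. On a POSIX |
| host the page size could be obtained from sysconf(_SC_PAGESIZE). An |
| application could use a check like this to detect in advance that two |
| unaligned imports would land on a common page. |

```c
#include <stddef.h>
#include <stdint.h>

/* A run of whole pages, identified by page index rather than address. */
typedef struct {
    uintptr_t first_page;
    uintptr_t page_count;
} page_span;

/* Returns the whole pages touched by the byte range [addr, addr + size).
 * Requires size > 0 and page_size > 0. */
static page_span pages_touched(uintptr_t addr, size_t size,
                               uintptr_t page_size)
{
    page_span s;
    uintptr_t first = addr / page_size;
    uintptr_t last = (addr + size - 1) / page_size; /* page of last byte */

    s.first_page = first;
    s.page_count = last - first + 1;
    return s;
}

/* Returns 1 if the two spans have at least one page in common. */
static int spans_share_page(page_span a, page_span b)
{
    return a.first_page < b.first_page + b.page_count &&
           b.first_page < a.first_page + a.page_count;
}
```

| For example, with 4096-byte pages a 4096-byte allocation starting at |
| offset 0x1800 touches two pages, so a second unaligned allocation |
| starting at 0x2800 would share a page with it. |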
| |
| This method is recommended to be used when interoperating with an existing |
| host library which performs its own allocation and cannot be passed |
| handles to mapped OpenCL buffers. |
| |
| See also the code example under Sample Code. |
| |
| |
| New Tokens |
| |
| None |
| |
| Interactions with other extensions |
| |
| This extension produces cl_mem allocations which are compatible with all |
| other uses of cl_mem in the standard API. |
| |
| This extension also provides an alternative to image import via EGL. |
| |
| Sample Code |
| |
| CL_IMPORT_TYPE_DMA_BUF_ARM |
| |
| #define WIDTH 1024 |
| #define HEIGHT 512 |
| |
| // Create buffer to be used as a hardware texture with graphics APIs (can also |
| // include video/camera use flags here) |
| int dma_buf_handle = get_dma_buf_handle_from_exporter_kernel_module( ..., WIDTH * HEIGHT * 2 ); |
| |
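| // Zero-terminated list of key/value pairs selecting the DMA_BUF import type |
| const cl_import_properties_arm import_properties[] = |
| { CL_IMPORT_TYPE_ARM, CL_IMPORT_TYPE_DMA_BUF_ARM, 0 }; |
| |
| cl_int error = CL_SUCCESS; |
| cl_mem buffer = clImportMemoryARM( ctx, |
| CL_MEM_READ_WRITE, |
| import_properties, |
| &dma_buf_handle, |
| WIDTH * HEIGHT * 2, |
| &error ); |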
| |
| if( error == CL_SUCCESS ) |
| { |
| // Use <buffer> as you would any other cl_mem buffer |
| } |
| |
| |
| CL_IMPORT_TYPE_HOST_ARM |
| |
| #define WIDTH 1024 |
| #define HEIGHT 512 |
| |
| // tightly packed buffer we will treat as RGB565 |
| char *host_memory = malloc( WIDTH * HEIGHT * 2 ); |
| |
| // The type CL_IMPORT_TYPE_HOST_ARM can be omitted as it is the default |
| cl_int error = CL_SUCCESS; |
| cl_mem buffer = clImportMemoryARM( ctx, |
| CL_MEM_READ_WRITE, |
| NULL, |
| host_memory, |
| WIDTH * HEIGHT * 2, |
| &error ); |
| |
| if( error == CL_SUCCESS ) |
| { |
| // Use <buffer> as you would any other cl_mem buffer |
| } |
| |
| Issues |
| |
| 1. Should we have properties to pass in type? |
| A: Yes, added cl_import_properties_arm and associated flags |
| |
| 2. How does the application select coherency behaviour? |
| A: The same flags as the coherency extension are used; passing HOST_ALLOC |
| in cl_mem_flags will result in a buffer suitable for frequent sharing. |
| |
| 3. How does alignment and page alignment for import work? |
| A: dma_buf allocations are page aligned and multiple of page sizes. Host |
| allocations are mapped to the nearest page with padding either side. |
| |
| 4. What is the default if we don't support TYPE_HOST_ARM? Ideally the |
| shortest form is for malloc import. |
| A: The type is exposed, but we don't expose the extension string. Attempts |
| to use TYPE_HOST_ARM return CL_INVALID_OPERATION. |
| |
| 5. Host allocations are always outer shared on CPU (and already mapped) so |
| we just do the mapping into GPU to select coherency. How does this fall |
| out for dma_api allocations? |
| A: We follow the same coherency options for dma_api allocations, the only |
| issue is that other devices in the system may not be coherent. This is an |
| application problem, as it has to perform the necessary coherency steps |
| for other non-GPU devices. The GPU driver will handle this once caches |
| of devices that aren't coherent have been flushed. |
| |
| Note that with dma_api memory, the act of using it with a command |
| requires that the GPU caches are flushed before the GPU can access that |
| memory region. Masters other than the CPU and GPU with dma_api memory |
| access are not part of the coherency domain. |
| Whether cache maintenance is needed because dma_api memory may be coming |
| from non-coherent sources, as opposed to the optimal path with IO/full |
| coherent CPU and GPU, is not detectable with the current dma_api design, |
| so it is recommended that malloc memory is used when just communicating |
| between CPU and GPU. |
| |
| If malloc memory is being used with a non-coherent device outside of the |
| visibility of the GPU driver, GPU caches may contain stale data. This |
| behaviour model is not guaranteed to be supported. Note: this may work |
| today as the GPU aggressively flushes caches due to graphics workloads. |
| |
| 6. What happens if dma_api pages are mapped into the CPU and then this is |
| imported using TYPE_HOST_ARM? |
| A: Undefined. |
| |
| 7. What happens if two unaligned allocations are imported which share a |
| page, but request different memory flags? |
| A: This is considered an application bug; the application must choose the |
| same memory flags. An error will be returned. |
| |
| 8. What guarantees are given to have the same VA, as in SVM? |
| A: For this functionality the SVM import extension should be used. |
| |
| 9. Image support and image properties? |
| A: This works through clCreateImage (you can pass it a buffer object). |
| This feature is available as part of the CL2 functionality which allows |
| the same memory to be referenced as an image. |
| |
| 10. Hardware alignment for images? Specify bytes for most but require app |
| to manage? |
| We have no more than component alignment other than for packed formats. |
| A: This should be exposed in CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT as a |
| start. We can relax rules if needed for interop with host image |
| manipulation libraries later. |
| |
| Conformance Tests |
| |
| None |
| |
| Revision History |
| |
| Revision: #1, Apr 27th, 2015 - Initial revision |
| Revision: #2, Apr 28th, 2015 - Added properties field to avoid type |
| inference. Added Issues section. |
| Revision: #3, May 5th, 2015 - Added image support info in Issues. |
| Revision: #4, Aug 4th, 2015 - Revised based on implementation and design |
| changes made during review. |