Name
ARM_import_memory
Name Strings
cl_arm_import_memory
cl_arm_import_memory_host
cl_arm_import_memory_dma_buf
cl_arm_import_memory will be reported if at least one of the other extension
strings is also reported.
Contributors
Robert Elliott, ARM
Vatsalya Prasad, ARM
Contact
Robert Elliott, ARM (robert.elliott 'at' ARM.com)
Vatsalya Prasad, ARM (vatsalya.prasad 'at' ARM.com)
IP Status
No claims or disclosures are known to exist.
Version
Revision: #4, August 4th, 2015
Number
OpenCL Extension #38
Status
Complete.
Extension Type
OpenCL device extension
Dependencies
Requires OpenCL version 1.0 or later. Requires OpenCL 1.2 for use of the
host access cl_mem_flags when importing, as these flags were introduced in
OpenCL 1.2.
Overview
This extension adds a new function, clImportMemoryARM, that allows memory
to be imported directly into OpenCL.
Memory imported through this interface will be mapped into the device's page
tables directly, providing zero copy access. It will never fall back to copy
operations and aliased buffers, instead producing an error when mapping is
not possible.
Types of memory supported for import are specified as additional extension
strings.
Header File
cl_ext.h
Glossary
No new terminology is introduced by this extension.
New Types
None
New Procedures and Functions
The new function
cl_mem clImportMemoryARM( cl_context context,
                          cl_mem_flags flags,
                          const cl_import_properties_arm *properties,
                          void *memory,
                          size_t size,
                          cl_int *errorcode_ret );
Description
Given a suitable pointer to an external memory allocation, <memory>, this
function will map the memory into the device page tables.
The flags argument provides standard OpenCL memory object flags.
Valid <flags> are:
* CL_MEM_READ_WRITE, CL_MEM_WRITE_ONLY, CL_MEM_READ_ONLY
* CL_MEM_HOST_WRITE_ONLY, CL_MEM_HOST_READ_ONLY, CL_MEM_HOST_NO_ACCESS -
where the host flags are only hints and only apply from OpenCL 1.2
onwards.
* CL_MEM_USE_HOST_PTR - this flag has no effect and is ignored.
The <properties> argument provides a list of key-value pairs terminated by
a zero. <properties> can be NULL or point to a single zero value if the
default behaviour is desired.
Valid <properties> are:
* CL_IMPORT_TYPE_ARM
Valid values for CL_IMPORT_TYPE_ARM are:
* CL_IMPORT_TYPE_HOST_ARM - this is the default
* CL_IMPORT_TYPE_DMA_BUF_ARM
If <properties> is NULL, default values are taken.
What constitutes a valid <memory> pointer depends on the import type passed
in <properties>.
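For illustration, a minimal sketch of such a property list, selecting the
DMA_BUF import type (the list must end with a zero terminator):

// A minimal sketch: a zero-terminated key-value property list selecting
// the DMA_BUF import type. Passing NULL instead selects the default,
// CL_IMPORT_TYPE_HOST_ARM.
const cl_import_properties_arm props[] = {
    CL_IMPORT_TYPE_ARM, CL_IMPORT_TYPE_DMA_BUF_ARM,
    0  // terminator
};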
Errors
CL_INVALID_CONTEXT on invalid context.
CL_INVALID_VALUE on invalid flag input.
CL_INVALID_VALUE when invalid properties are passed.
CL_INVALID_VALUE if memory is NULL.
CL_INVALID_OPERATION when host virtual pages in the range of <memory> to
<memory>+<size> are not mapped in the userspace address space. This does
_not_ include cases where physical pages are not allocated. For specific
behaviour see documentation for those memory types.
Further error information may be reported via the cl_context callback
function.
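For illustration, a sketch of registering such a callback when the context
is created; the callback body and the <device> variable are assumptions:

#include <stdio.h>

// Hypothetical callback: print any asynchronous error information the
// implementation reports for this context.
static void CL_CALLBACK context_notify( const char *errinfo,
                                        const void *private_info,
                                        size_t cb,
                                        void *user_data )
{
    fprintf( stderr, "OpenCL context error: %s\n", errinfo );
}

// Later, when creating the context used with clImportMemoryARM:
cl_int err = CL_SUCCESS;
cl_context ctx = clCreateContext( NULL, 1, &device,
                                  context_notify, NULL, &err );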
New memory import types
Linux DMA_BUF memory type - CL_IMPORT_TYPE_DMA_BUF_ARM
If the extension string cl_arm_import_memory_dma_buf is exposed then
importing from DMA_BUF file handles is supported.
DMA_BUF memory coherency can be complicated by devices that are not part of
the normal coherency domain for the CPU and GPU. It is the responsibility
of the application to ensure that memory is up to date and that the CPU has
a valid view (i.e. it has no stale data in its cache) before the OpenCL
command using the resource is able to run. This is achieved either by not
enqueueing the workload, or by using a user event to prevent the command
from being executed. Before the workload executes, the application must
ensure other drivers or devices operating on the resource have synchronised
with main memory and the CPU.
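As an illustration only, a sketch of the user-event approach, assuming
<ctx>, <queue> and <kernel> already exist and wait_for_exporter_to_sync()
stands in for whatever application-specific synchronisation is required:

cl_int err = CL_SUCCESS;
size_t global_size = 1024 * 512;  // e.g. one work-item per pixel
cl_event gate = clCreateUserEvent( ctx, &err );

// Enqueue the work now; it will not start until the gate completes.
err = clEnqueueNDRangeKernel( queue, kernel, 1, NULL, &global_size, NULL,
                              1, &gate, NULL );

// Ensure the DMA_BUF producer has synchronised with main memory and the
// CPU before releasing the gate.
wait_for_exporter_to_sync();
clSetUserEventStatus( gate, CL_COMPLETE );
clReleaseEvent( gate );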
DMA_BUF allocation modes supersede the memory flag argument. For example,
if a DMA_BUF allocation originally created as read-only is passed to
clImportMemoryARM with the CL_MEM_READ_WRITE flag, the more restrictive
read-only access will take precedence.
See also the code example below.
Host process memory type - CL_IMPORT_TYPE_HOST_ARM
If the extension string cl_arm_import_memory_host is exposed then importing
from normal process allocations (such as malloc) is supported.
If the host OS is Linux and oversubscription of VA is allowed, then this
function will commit and pin physical pages for the VA range. This may
cause larger physical allocations than the application would otherwise
provoke if the memory is sparsely used. In this case sub-ranges of the
host allocation should be passed to the import function individually.
It is the application's responsibility to ensure alignment suitable for
the datatype being accessed, though the application is free to provide
allocations without any specific alignment. When the alignment is less
than a page size, whole pages touched by addresses in the range <memory>
to <memory>+<size> will be mapped into the device, as illustrated by the
sketch below. If a page is already mapped by another unaligned import, an
error will occur.
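A sketch of that page rounding (Linux, assuming <memory> and <size> are
the pointer and length being imported):

#include <stdint.h>
#include <unistd.h>

size_t page = (size_t) sysconf( _SC_PAGESIZE );
uintptr_t start = (uintptr_t) memory & ~( (uintptr_t) page - 1 );   // round down
uintptr_t end = ( (uintptr_t) memory + size + page - 1 )
                & ~( (uintptr_t) page - 1 );                        // round up
size_t mapped = (size_t)( end - start );  // bytes of VA mapped into the device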
This method is recommended when interoperating with an existing host
library which performs its own allocations and cannot be passed handles to
mapped OpenCL buffers.
See also the code example below.
New Tokens
None
Interactions with other extensions
This extension produces cl_mem allocations which are compatible with all
other uses of cl_mem in the standard API.
This extension also provides an alternative to image import via EGL.
Sample Code
CL_IMPORT_TYPE_DMA_BUF_ARM
#define WIDTH 1024
#define HEIGHT 512
// Create buffer to be used as a hardware texture with graphics APIs (can also
// include video/camera use flags here)
int dma_buf_handle = get_dma_buf_handle_from_exporter_kernel_module( ..., WIDTH * HEIGHT * 2 );
cl_int error = CL_SUCCESS;
cl_import_properties_arm props[] = { CL_IMPORT_TYPE_ARM,
                                     CL_IMPORT_TYPE_DMA_BUF_ARM,
                                     0 };
cl_mem buffer = clImportMemoryARM( ctx,
                                   CL_MEM_READ_WRITE,
                                   props,
                                   &dma_buf_handle,
                                   WIDTH * HEIGHT * 2,
                                   &error );
if( error == CL_SUCCESS )
{
// Use <buffer> as you would any other cl_mem buffer
}
CL_IMPORT_TYPE_HOST_ARM
#define WIDTH 1024
#define HEIGHT 512
// tightly packed buffer we will treat as RGB565
char *host_buffer = malloc( WIDTH * HEIGHT * 2 );
// The type CL_IMPORT_TYPE_HOST_ARM can be omitted as it is the default
cl_int error = CL_SUCCESS;
cl_mem buffer = clImportMemoryARM( ctx,
                                   CL_MEM_READ_WRITE,
                                   NULL,
                                   host_buffer,
                                   WIDTH * HEIGHT * 2,
                                   &error );
if( error == CL_SUCCESS )
{
// Use <buffer> as you would any other cl_mem buffer
}
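For completeness, a sketch of one way the imported <buffer> could then be
used, assuming <queue> and a built <kernel> taking a single global buffer
argument already exist:

size_t global_size = WIDTH * HEIGHT;
clSetKernelArg( kernel, 0, sizeof( cl_mem ), &buffer );
clEnqueueNDRangeKernel( queue, kernel, 1, NULL, &global_size, NULL,
                        0, NULL, NULL );
clFinish( queue );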
Issues
1. Should we have properties to pass in type?
A: Yes, added cl_import_properties_arm and associated flags
2. How does the application select coherency behaviour?
A: The same flags as the coherency extension; cl_mem_flags passed with
HOST_ALLOC will result in a buffer suitable for frequent sharing.
3. How do alignment and page alignment for import work?
A: dma_buf allocations are page aligned and a multiple of the page size.
Host allocations are mapped to the nearest page with padding either side.
4. What is the default if we don't support TYPE_HOST_ARM? Ideally the
shortest form is for malloc import.
A: The type is exposed, but we don't expose the extension string. Attempts
to use TYPE_HOST_ARM return CL_INVALID_OPERATION.
5. Host allocations are always outer shared on the CPU (and already
mapped) so we just do the mapping into the GPU to select coherency. How
does this fall out for dma_api allocations?
A: We follow the same coherency options for dma_api allocations; the only
issue is that other devices in the system may not be coherent. This is an
application problem, as it has to perform the necessary coherency steps
for other non-GPU devices. The GPU driver will handle this once caches
of devices that aren't coherent have been flushed.
Note that with dma_api memory, the act of using it with a command
requires that the GPU caches are flushed before the GPU can access that
memory region. Masters other than the CPU and GPU with dma_api memory
access are not part of the coherency domain.
Whether dma_api memory may be coming from non-coherent sources, versus
the optimal path with an IO/fully coherent CPU and GPU, is not detectable
with the current dma_api design, and so it is recommended that malloc
memory is used when communicating only between the CPU and GPU.
If malloc memory is being used with a non-coherent device outside of the
visibility of the GPU driver, GPU caches may contain stale data. This
behaviour model is not guaranteed to be supported. Note: this may work
today as the GPU aggressively flushes caches due to graphics workloads.
6. What happens if dma_api pages are mapped into the CPU and then this is
imported using TYPE_HOST_ARM?
A: Undefined.
7. What happens if two unaligned allocations are imported which share a
page, but request different memory flags?
A: This is considered an application bug; the application must choose the
same memory flags. An error will be returned.
8. What guarantees are given to have the same VA, as in SVM?
A: For this functionality the SVM import extension should be used.
9. Image support and image properties?
A: This works through clCreateImage (you can pass it a buffer object).
This feature is available as part of the CL2 functionality which allows
the same memory to be referenced as an image.
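As an illustration only, a sketch of viewing an imported RGB565 buffer as
a 2D image via the OpenCL 2.0 image-from-buffer path; <ctx> and <buffer>
refer to the host-import sample above, and the format, dimensions and row
pitch are assumptions (the usual pitch and base-address alignment rules
still apply):

cl_int err = CL_SUCCESS;
cl_image_format fmt = { CL_RGB, CL_UNORM_SHORT_565 };
cl_image_desc desc = { 0 };
desc.image_type = CL_MEM_OBJECT_IMAGE2D;
desc.image_width = 1024;          // WIDTH
desc.image_height = 512;          // HEIGHT
desc.image_row_pitch = 1024 * 2;  // bytes per row of RGB565 data
desc.buffer = buffer;             // the imported cl_mem

cl_mem image = clCreateImage( ctx, CL_MEM_READ_WRITE,
                              &fmt, &desc, NULL, &err );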
10. Hardware alignment for images? Specify bytes for most but require the
app to manage?
We have no more than component alignment other than for packed formats.
A: This should be exposed in CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT as a
start. We can relax rules if needed for interop with host image
manipulation libraries later.
Conformance Tests
None
Revision History
Revision: #1, Apr 27th, 2015 - Initial revision
Revision: #2, Apr 28th, 2015 - Added properties field to avoid type
inference. Added Issues section.
Revision: #3, May 5th, 2015 - Added image support info in Issues.
Revision: #4, Aug 4th, 2015 - Revised based on implementation and design
changes made during review.