Name
ARM_import_memory
Name Strings
cl_arm_import_memory
cl_arm_import_memory_host
cl_arm_import_memory_dma_buf
cl_arm_import_memory will be reported if at least one of the other extension
strings is also reported.
Contributors
Robert Elliott, ARM
Vatsalya Prasad, ARM
Kévin Petit, ARM
Contact
Kévin Petit, ARM (kevin.petit 'at' ARM.com)
IP Status
No claims or disclosures are known to exist.
Version
Revision: #6, Jan 5th, 2018
Number
OpenCL Extension #38
Status
Complete.
Extension Type
OpenCL device extension
Dependencies
Requires OpenCL version 1.0 or later. Using the host access cl_mem_flags
when importing requires OpenCL 1.2, the version that introduced them.
Overview
This extension adds a new function, clImportMemoryARM, that allows memory to
be imported directly into OpenCL.
Memory imported through this interface will be mapped into the device's page
tables directly, providing zero copy access. It will never fall back to copy
operations and aliased buffers, instead producing an error when mapping is
not possible.
Types of memory supported for import are specified as additional extension
strings.
Header File
cl_ext.h
Glossary
No new terminology is introduced by this extension.
New Types
None
New Procedures and Functions
The new function
cl_mem clImportMemoryARM( cl_context context,
cl_mem_flags flags,
const cl_import_properties_arm *properties,
void *memory,
size_t size,
cl_int *errorcode_ret );
Description
Given a suitable pointer to an external memory allocation <memory> this
function will map the memory into the device page tables.
The flags argument provides standard OpenCL memory object flags.
Valid <flags> are:
* CL_MEM_READ_WRITE, CL_MEM_WRITE_ONLY, CL_MEM_READ_ONLY
* CL_MEM_HOST_WRITE_ONLY, CL_MEM_HOST_READ_ONLY, CL_MEM_HOST_NO_ACCESS -
where the host flags are only hints and only apply from OpenCL 1.2
onwards.
* CL_MEM_USE_HOST_PTR - this flag has no effect, it is ignored.
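As an illustration of the flag list above, the following sketch checks that a
flags word contains only bits from that list before it is passed to
clImportMemoryARM. The DEMO_ constants mirror the bit positions of the
standard cl_mem_flags values in CL/cl.h so that the block builds without the
OpenCL headers; the demo_ helper names are hypothetical and not part of this
extension. Whether an implementation rejects other combinations is up to the
implementation, so treat this as a conservative pre-check only.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the cl_mem_flags bits defined in CL/cl.h, duplicated here
 * only so that the sketch builds without the OpenCL headers. */
#define DEMO_MEM_READ_WRITE      (UINT64_C(1) << 0)
#define DEMO_MEM_WRITE_ONLY      (UINT64_C(1) << 1)
#define DEMO_MEM_READ_ONLY       (UINT64_C(1) << 2)
#define DEMO_MEM_USE_HOST_PTR    (UINT64_C(1) << 3)  /* accepted but ignored */
#define DEMO_MEM_ALLOC_HOST_PTR  (UINT64_C(1) << 4)  /* not in the list      */
#define DEMO_MEM_HOST_WRITE_ONLY (UINT64_C(1) << 7)
#define DEMO_MEM_HOST_READ_ONLY  (UINT64_C(1) << 8)
#define DEMO_MEM_HOST_NO_ACCESS  (UINT64_C(1) << 9)

/* Hypothetical helper: returns 1 if every bit set in flags appears in the
 * list of valid import flags, 0 otherwise. */
static int demo_import_flags_valid(uint64_t flags)
{
    const uint64_t accepted =
        DEMO_MEM_READ_WRITE | DEMO_MEM_WRITE_ONLY | DEMO_MEM_READ_ONLY |
        DEMO_MEM_HOST_WRITE_ONLY | DEMO_MEM_HOST_READ_ONLY |
        DEMO_MEM_HOST_NO_ACCESS | DEMO_MEM_USE_HOST_PTR;
    return (flags & ~accepted) == 0;
}

/* Debug-build guard: aborts if an unlisted bit is set, otherwise returns
 * flags unchanged so the call can wrap the argument at the call site. */
static uint64_t demo_checked_import_flags(uint64_t flags)
{
    assert(demo_import_flags_valid(flags));
    return flags;
}
```

In a real application the DEMO_ constants would be replaced by the CL_MEM_
tokens and the argument type by cl_mem_flags.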
The properties argument provides a list of key value pairs, with a zero
terminator. properties can be NULL or point to a single zero value if the
default behaviour is desired.
Valid <properties> are:
* CL_IMPORT_TYPE_ARM
Valid values for CL_IMPORT_TYPE_ARM are:
* CL_IMPORT_TYPE_HOST_ARM - this is the default
* CL_IMPORT_TYPE_DMA_BUF_ARM
If <properties> is NULL, default values are taken.
What constitutes a valid <memory> pointer depends on the import type passed
in <properties>.
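The property list rules above (zero-terminated key/value pairs, NULL or a
single zero meaning "use the defaults") can be sketched as a small parser.
This is only an illustration of how the list is laid out, not the driver's
implementation. The DEMO_ token values are placeholders (the real
CL_IMPORT_TYPE_* values come from cl_ext.h), the element type is assumed to
be a pointer-sized integer, and demo_resolve_import_type is a hypothetical
name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: properties are pointer-sized integers. */
typedef intptr_t demo_import_properties;

/* Placeholder values; the real tokens are defined in cl_ext.h. */
enum {
    DEMO_IMPORT_TYPE         = 1, /* stands in for CL_IMPORT_TYPE_ARM         */
    DEMO_IMPORT_TYPE_HOST    = 2, /* stands in for CL_IMPORT_TYPE_HOST_ARM    */
    DEMO_IMPORT_TYPE_DMA_BUF = 3  /* stands in for CL_IMPORT_TYPE_DMA_BUF_ARM */
};

/* Walks a zero-terminated key/value list. Returns 0 and stores the selected
 * import type in *type_out on success, or -1 if an unknown key or value is
 * found (the contents of *type_out should then be ignored). */
static int demo_resolve_import_type(const demo_import_properties *props,
                                    demo_import_properties *type_out)
{
    assert(type_out != NULL);
    *type_out = DEMO_IMPORT_TYPE_HOST;   /* default when nothing is given */
    if (props == NULL)
        return 0;

    for (size_t i = 0; props[i] != 0; i += 2) {
        demo_import_properties key   = props[i];
        demo_import_properties value = props[i + 1];

        if (key != DEMO_IMPORT_TYPE)
            return -1;                   /* unknown property key */
        if (value != DEMO_IMPORT_TYPE_HOST &&
            value != DEMO_IMPORT_TYPE_DMA_BUF)
            return -1;                   /* unknown or missing value */
        *type_out = value;
    }
    return 0;
}
```

A list such as { DEMO_IMPORT_TYPE, DEMO_IMPORT_TYPE_DMA_BUF, 0 } resolves to
the dma_buf type, while NULL and { 0 } both resolve to the host default.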
Errors
CL_INVALID_CONTEXT on invalid context.
CL_INVALID_VALUE on invalid flag input.
CL_INVALID_PROPERTY when invalid properties are passed.
CL_INVALID_VALUE if memory is NULL.
CL_INVALID_OPERATION when host virtual pages in the range of <memory> to
<memory>+<size> are not mapped in the userspace address space. This does
_not_ include cases where physical pages are not allocated. For specific
behaviour see documentation for those memory types.
CL_INVALID_OPERATION when an imported memory object has been passed to
one of the following API functions (they can't be used with imported
memory objects):
* clEnqueueMapBuffer
* clEnqueueMapImage
* clEnqueueUnmapMemObject
* clEnqueueReadImage
* clEnqueueWriteImage
* clEnqueueReadBuffer
* clEnqueueReadBufferRect
* clEnqueueWriteBuffer
* clEnqueueWriteBufferRect
* clEnqueueCopyBuffer
* clEnqueueCopyBufferRect
* clEnqueueCopyBufferToImage
* clEnqueueCopyImageToBuffer
* clEnqueueCopyImage
* clEnqueueFillBuffer
* clEnqueueFillImage
Further error information may be reported via the cl_context callback
function.
New memory import types
Linux dma_buf memory type - CL_IMPORT_TYPE_DMA_BUF_ARM
If the extension string cl_arm_import_memory_dma_buf is exposed then
importing from dma_buf file handles is supported.
The CL runtime manages dma_buf memory coherency between the host CPU and
GPU. It is the application's responsibility to ensure visibility in memory
of changes done by devices which aren't in the same coherency domain as
the GPU and CPU before using that memory from an OpenCL command. This can
be achieved either by not enqueueing the workload until the data is
visible, or by using a user event to prevent the command from being
executed until the expected data has reached memory.
Flags attached to a dma_buf file handle take precedence over memory flags
supplied to clImportMemoryARM. For example, if a dma_buf allocation
originally created with a read-only flag is passed to clImportMemoryARM
with the READ_WRITE flag, the more restrictive READ_ONLY will take
precedence.
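The precedence rule above can be pictured as an intersection of access
rights. The sketch below is a simplified model under that assumption, not
the runtime's actual logic: the enum and demo_effective_access are
hypothetical, and this specification does not say what happens when the two
sets of rights have nothing in common (for example a read-only handle
imported with a write-only flag), which this model reports as no access.

```c
#include <assert.h>

/* Hypothetical access model: one bit for read, one for write. */
enum demo_access {
    DEMO_ACCESS_NONE       = 0,
    DEMO_ACCESS_READ       = 1 << 0,
    DEMO_ACCESS_WRITE      = 1 << 1,
    DEMO_ACCESS_READ_WRITE = DEMO_ACCESS_READ | DEMO_ACCESS_WRITE
};

/* Returns the rights that both the dma_buf handle and the requested import
 * flags allow, i.e. the more restrictive of the two. */
static unsigned demo_effective_access(unsigned dma_buf_access,
                                      unsigned requested)
{
    assert((dma_buf_access & ~(unsigned)DEMO_ACCESS_READ_WRITE) == 0);
    assert((requested & ~(unsigned)DEMO_ACCESS_READ_WRITE) == 0);
    return dma_buf_access & requested;
}
```

With this model, a read-only handle imported with read-write flags yields
read-only access, matching the example in the text.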
dma_buf allocations are page-aligned and their size is a whole number of
pages.
If an application only requires communication between the host CPU and
the GPU, it should favour host imports, as described below.
See also, the code example below.
Host process memory type - CL_IMPORT_TYPE_HOST_ARM
If the extension string cl_arm_import_memory_host is exposed then importing
from normal userspace allocations (such as those created via malloc) is
supported.
If the host OS is Linux and overcommit of VA is allowed, then this
function will commit and pin physical pages for the VA range. This may
cause larger physical allocations than the application typically provokes
if memory is sparsely used. In this case sub-ranges of the host allocation
should be passed to the import function individually.
It is the application's responsibility to align for the datatype being
accessed. Though the application is free to provide allocations without
any specific alignment on coherent systems, there is a requirement to
provide pointers aligned to a cache line on systems where there is no
HW-managed coherency between CPU and GPU. When alignment is less than a
page size then whole pages touched by addresses in the range of <memory>
to <memory>+<size> will be mapped into the device. If the page is already
mapped by another unaligned import, an error will occur.
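To make the page-granularity rule concrete, the sketch below computes which
whole pages an unaligned range touches, checks whether two ranges would land
on a common page, and tests a pointer against a cache-line size. All demo_
names are hypothetical, the page size is passed in as a parameter (on POSIX
systems it can be queried with sysconf(_SC_PAGESIZE)), and the overlap test
is deliberately conservative: it flags any shared page, whether or not a
given implementation would reject that particular pair of imports. Address
wrap-around at the top of the address space is not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_page_span {
    uintptr_t first_page;  /* address of the first page touched     */
    size_t    page_count;  /* number of whole pages that get mapped */
};

/* Returns 1 if addr is a multiple of alignment (a power of two). */
static int demo_is_aligned(uintptr_t addr, size_t alignment)
{
    assert(alignment > 0 && (alignment & (alignment - 1)) == 0);
    return (addr & ((uintptr_t)alignment - 1)) == 0;
}

/* Whole pages touched by [addr, addr + size). page_size must be a power of
 * two and size must be non-zero. */
static struct demo_page_span demo_pages_touched(uintptr_t addr, size_t size,
                                                size_t page_size)
{
    assert(size > 0);
    assert(page_size > 0 && (page_size & (page_size - 1)) == 0);

    uintptr_t mask  = (uintptr_t)page_size - 1;
    uintptr_t first = addr & ~mask;               /* page of the first byte */
    uintptr_t last  = (addr + size - 1) & ~mask;  /* page of the final byte */

    struct demo_page_span span;
    span.first_page = first;
    span.page_count = (size_t)((last - first) / page_size) + 1;
    return span;
}

/* Returns 1 if the two spans have at least one page in common. */
static int demo_spans_share_page(struct demo_page_span a,
                                 struct demo_page_span b, size_t page_size)
{
    uintptr_t a_end = a.first_page + a.page_count * page_size;
    uintptr_t b_end = b.first_page + b.page_count * page_size;
    return a.first_page < b_end && b.first_page < a_end;
}
```

For example, with 4096-byte pages a 32-byte range starting at 0x1ff0 touches
two pages, so both would be mapped into the device.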
Cache coherency will be HW-managed on systems where it is supported.
Otherwise, cache maintenance operations will be added by the CL runtime
where needed.
Importing host memory that is otherwise being used by a device outside
of the CPU/GPU coherency domain isn't guaranteed to work and the GPU
caches may contain stale data.
Importing dma_buf pages through a CPU mapping is undefined.
Importing two allocations that aren't page-aligned and that request
different memory flags is unsupported; an error will be returned.
This method is recommended when interoperating with an existing host
library which performs its own allocation and cannot be passed handles to
mapped OpenCL buffers.
See also, the code example below.
New Tokens
None
Interactions with other extensions
This extension produces cl_mem memory objects which are compatible with all
other uses of cl_mem in the standard API, including creating images from
the resulting cl_mem and subject to the restrictions listed in this
document.
In order to guarantee data consistency, applications must ensure that neither
the host nor any device attempt to perform simultaneous read and write
operations on any part of the memory backing an imported buffer or sub-buffers
created therefrom, even if these accesses do not overlap. For example, this
implies that it is not possible to write part of the memory backing an imported
buffer on the host while reading a sub-buffer created from that buffer on a
device, even if the memory written by the host is not visible through the
sub-buffer.
This extension also provides an alternative to image import via EGL.
Sample Code
CL_IMPORT_TYPE_DMA_BUF_ARM
#define WIDTH 1024
#define HEIGHT 512
// Create buffer to be used as a hardware texture with graphics APIs (can also
// include video/camera use flags here)
int dma_buf_handle = get_dma_buf_handle_from_exporter_kernel_module( ..., WIDTH * HEIGHT * 2 );
// Zero-terminated list of key/value pairs selecting the dma_buf import type
const cl_import_properties_arm props[] = { CL_IMPORT_TYPE_ARM, CL_IMPORT_TYPE_DMA_BUF_ARM, 0 };
cl_int error = CL_SUCCESS;
cl_mem buffer = clImportMemoryARM( ctx,
                                   CL_MEM_READ_WRITE,
                                   props,
                                   &dma_buf_handle,
                                   WIDTH * HEIGHT * 2,
                                   &error );
if( error == CL_SUCCESS )
{
// Use <buffer> as you would any other cl_mem buffer
}
CL_IMPORT_TYPE_HOST_ARM
#define WIDTH 1024
#define HEIGHT 512
// tightly packed buffer we will treat as RGB565
char *host_ptr = malloc( WIDTH * HEIGHT * 2 );
// The type CL_IMPORT_TYPE_HOST_ARM can be omitted as it is the default
cl_int error = CL_SUCCESS;
cl_mem buffer = clImportMemoryARM( ctx,
                                   CL_MEM_READ_WRITE,
                                   NULL,
                                   host_ptr,
                                   WIDTH * HEIGHT * 2,
                                   &error );
if( error == CL_SUCCESS )
{
// Use <buffer> as you would any other cl_mem buffer
}
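On systems without HW-managed coherency between CPU and GPU, the host import
section requires pointers aligned to a cache line, which plain malloc does
not guarantee. A minimal sketch of obtaining such an allocation with C11
aligned_alloc follows. The 64-byte line size is an assumption (query the
real value for the target), the demo_ names are hypothetical, and rounding
the size up to a whole number of cache lines is a choice made here to satisfy
aligned_alloc and to avoid sharing the final line with unrelated heap data;
this specification only states the pointer alignment requirement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_CACHE_LINE 64u  /* assumed cache line size in bytes */

/* Rounds n up to the next multiple of `multiple` (which must be non-zero). */
static size_t demo_round_up(size_t n, size_t multiple)
{
    assert(multiple > 0);
    return ((n + multiple - 1) / multiple) * multiple;
}

/* Allocates at least `size` bytes starting on a cache-line boundary.
 * If padded_size is not NULL it receives the number of bytes actually
 * requested from the allocator. Returns NULL on failure; release the
 * memory with free() once the imported cl_mem has been released. */
static void *demo_alloc_for_import(size_t size, size_t *padded_size)
{
    size_t padded = demo_round_up(size, DEMO_CACHE_LINE);
    if (padded_size != NULL)
        *padded_size = padded;
    return aligned_alloc(DEMO_CACHE_LINE, padded);
}
```

The returned pointer would then take the place of the malloc result in the
CL_IMPORT_TYPE_HOST_ARM sample, with the size argument left as the number of
bytes the application actually uses.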
Conformance Tests
None
Revision History
Revision: #1, Apr 27th, 2015 - Initial revision
Revision: #2, Apr 28th, 2015 - Added properties field to avoid type
inference. Added Issues section.
Revision: #3, May 5th, 2015 - Added image support info in Issues.
Revision: #4, Aug 4th, 2015 - Revised based on implementation and design
changes made during review.
Revision: #5, May 3rd, 2017 - Additional restrictions on host operations
and general cleanup / clarification.
Revision: #6, Jan 5th, 2018 - Support creating a sub-buffer from an imported
buffer.