| Name String |
| |
| cl_intel_motion_estimation |
| |
| Contributors |
| |
| Nico Galoppo, Intel |
| Craig Hansen-Sturm, Intel |
| Yejun Guo, Intel |
| Ben Ashbaugh, Intel |
| |
| Contact |
| |
| Ben Ashbaugh, Intel (ben.ashbaugh 'at' intel.com) |
| |
| IP Status |
| |
| TBD |
| |
| Version |
| |
| Version 2, March 18, 2016 |
| |
| Number |
| |
| OpenCL Extension #23 |
| |
| Status |
| |
| Final draft |
| |
| Dependencies |
| |
| OpenCL 1.2 and support for the cl_intel_accelerator extension are required. |
| This extension is written against revision 19 of the OpenCL 1.2 |
| specification and against version 2 of the cl_intel_accelerator extension |
| specification. |
| |
| Overview |
| |
| This document describes an OpenCL extension to expose basic motion estimation |
| capabilities. There are two key parts of this extension: The first is a |
| built-in kernel for frame-based motion estimation that is exposed using the |
| built-in kernel infrastructure added to OpenCL 1.2. The second is the ability |
| to create motion estimation accelerators, which can be used to control |
| behavior of the motion estimation engine. The motion estimation accelerators |
| use the framework described in the cl_intel_accelerator extension. |
| |
| New Procedures and Functions |
| |
| None |
| |
| New Tokens |
| |
| Accepted as the <accelerator_type> parameter of clCreateAcceleratorINTEL: |
| |
| CL_ACCELERATOR_TYPE_MOTION_ESTIMATION_INTEL 0x0 |
| |
| Accepted as values for fields of the cl_motion_estimation_desc_intel structure: |
| |
| Accepted values for <mb_block_type>: |
| |
| CL_ME_MB_TYPE_16x16_INTEL 0x0 |
| CL_ME_MB_TYPE_8x8_INTEL 0x1 |
| CL_ME_MB_TYPE_4x4_INTEL 0x2 |
| |
| Accepted values for <subpixel_mode>: |
| |
| CL_ME_SUBPIXEL_MODE_INTEGER_INTEL 0x0 |
| CL_ME_SUBPIXEL_MODE_HPEL_INTEL 0x1 |
| CL_ME_SUBPIXEL_MODE_QPEL_INTEL 0x2 |
| |
| Accepted values for <sad_adjust_mode>: |
| |
| CL_ME_SAD_ADJUST_MODE_NONE_INTEL 0x0 |
| CL_ME_SAD_ADJUST_MODE_HAAR_INTEL 0x1 |
| |
| Accepted values for <search_path_type>: |
| |
| CL_ME_SEARCH_PATH_RADIUS_2_2_INTEL 0x0 |
| CL_ME_SEARCH_PATH_RADIUS_4_4_INTEL 0x1 |
| CL_ME_SEARCH_PATH_RADIUS_16_12_INTEL 0x5 |
| |
| New Types |
| |
| Accepted as a type pointed to by <accelerator_desc> for |
| clCreateAcceleratorINTEL, and pointed to by <param_value> for |
| clGetAcceleratorInfoINTEL: |
| |
| typedef struct _cl_motion_estimation_desc_intel { |
|     cl_uint mb_block_type; |
|     cl_uint subpixel_mode; |
|     cl_uint sad_adjust_mode; |
|     cl_uint search_path_type; |
| } cl_motion_estimation_desc_intel; |
| |
| Modifications to Section 5.XX.1 - "Creating Accelerator Objects" in the |
| cl_intel_accelerator specification: |
| |
| Add a table of accelerator types, immediately following the description of |
| <accelerator_type>: |
| |
| "---------------------------------------------------------------------------- |
| cl_accelerator_type_intel    Description |
| -------------------------    ----------- |
| CL_ACCELERATOR_TYPE_         Create a basic full-frame motion estimation |
| MOTION_ESTIMATION_INTEL      accelerator |
| ----------------------------------------------------------------------------" |
| |
| Add a table of accelerator descriptors, immediately following the |
| description of <descriptor>: |
| |
| "---------------------------------------------------------------------------- |
| Descriptor Type                    Description |
| ---------------                    ----------- |
| cl_motion_estimation_desc_intel    Used to represent the configuration state |
|                                    of basic full-frame motion estimation |
|                                    accelerators |
| ----------------------------------------------------------------------------" |
| |
| Add a new sub-section "Motion Estimation Accelerators" to Section 5.XX.3 - |
| "Accelerator Descriptors" in the cl_intel_accelerator specification: |
| |
| "The cl_motion_estimation_desc_intel descriptor structure is used to create |
| an accelerator to control behavior of the motion estimation engine. This |
| motion estimation descriptor structure is defined as: |
| |
| typedef struct _cl_motion_estimation_desc_intel { |
|     cl_uint mb_block_type; |
|     cl_uint subpixel_mode; |
|     cl_uint sad_adjust_mode; |
|     cl_uint search_path_type; |
| } cl_motion_estimation_desc_intel; |
| |
| <mb_block_type> describes the size of the blocks (partitioning mode) |
| considered by the motion estimator. A frame is first divided into |
| macroblocks of size 16x16, and then may be sub-divided into blocks |
| of size 8x8 or 4x4, or retained as 16x16 blocks. This field influences |
| the size of the output image(s) and buffer(s) of the motion estimation |
| algorithms defined in this extension. Valid values are described in the |
| table below. |
| |
| ---------------------------------------------------------------------------- |
| Type Description |
| ---- ----------- |
| CL_ME_MB_TYPE_16x16_INTEL Each block is of size 16x16 pixels |
| CL_ME_MB_TYPE_8x8_INTEL Each block is of size 8x8 pixels |
| CL_ME_MB_TYPE_4x4_INTEL Each block is of size 4x4 pixels |
| ---------------------------------------------------------------------------- |
| |
| <subpixel_mode> defines the search precision (and hence, the precision of |
| the returned motion vectors). Valid values are described in the table below. |
| |
| ---------------------------------------------------------------------------- |
| Type Description |
| ---- ----------- |
| CL_ME_SUBPIXEL_MODE_INTEGER_INTEL Integer pixel mode searching |
| CL_ME_SUBPIXEL_MODE_HPEL_INTEL Half-pixel mode searching |
| CL_ME_SUBPIXEL_MODE_QPEL_INTEL Quarter-pixel mode searching |
| ---------------------------------------------------------------------------- |
| |
| <sad_adjust_mode> specifies the distortion measure adjustment used for the |
| motion search SAD (Sum of Absolute Differences) comparison. Valid values are |
| described in the table below. |
| |
| ---------------------------------------------------------------------------- |
| Type Description |
| ---- ----------- |
| CL_ME_SAD_ADJUST_MODE_NONE_INTEL Non-adjusted SAD |
| CL_ME_SAD_ADJUST_MODE_HAAR_INTEL Haar transformed SATD (frequency space) |
| ---------------------------------------------------------------------------- |
| |
| <search_path_type> specifies the search path and search radius used when |
| matching each source pixel block against the reference frame. Currently, all |
| search algorithms exhaustively compare the source block with pixel blocks in |
| the reference frame that lie within a [Rx, Ry] radius of the search center. |
| The search center is the center of the macroblock in the reference frame that |
| is co-located with the source macroblock containing the block, optionally |
| offset by a predicted motion vector. Valid values are described in the table |
| below. |
| |
| ---------------------------------------------------------------------------- |
| Flag                                    Description |
| ----                                    ----------- |
| CL_ME_SEARCH_PATH_RADIUS_2_2_INTEL      Exhaustive search within |
|                                         [+/-2, +/-2] radius |
| CL_ME_SEARCH_PATH_RADIUS_4_4_INTEL      Exhaustive search within |
|                                         [+/-4, +/-4] radius |
| CL_ME_SEARCH_PATH_RADIUS_16_12_INTEL    Exhaustive search within |
|                                         [+/-16, +/-12] radius |
| ----------------------------------------------------------------------------" |
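| |
| As an illustration of the descriptor fields and tokens above, the following C |
| sketch builds one descriptor and checks each field against its accepted |
| values. It is a minimal sketch and not part of the extension: me_desc_example |
| and me_desc_is_valid are hypothetical helpers, the cl_uint typedef is a |
| stand-in for the one in the OpenCL headers, and the token values are repeated |
| from the New Tokens section only so that the code compiles on its own. |
| |
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Token values repeated from the "New Tokens" section. Real host code would
   take them from the OpenCL extension headers instead. */
#define CL_ME_MB_TYPE_16x16_INTEL            0x0
#define CL_ME_MB_TYPE_8x8_INTEL              0x1
#define CL_ME_MB_TYPE_4x4_INTEL              0x2
#define CL_ME_SUBPIXEL_MODE_INTEGER_INTEL    0x0
#define CL_ME_SUBPIXEL_MODE_HPEL_INTEL       0x1
#define CL_ME_SUBPIXEL_MODE_QPEL_INTEL       0x2
#define CL_ME_SAD_ADJUST_MODE_NONE_INTEL     0x0
#define CL_ME_SAD_ADJUST_MODE_HAAR_INTEL     0x1
#define CL_ME_SEARCH_PATH_RADIUS_2_2_INTEL   0x0
#define CL_ME_SEARCH_PATH_RADIUS_4_4_INTEL   0x1
#define CL_ME_SEARCH_PATH_RADIUS_16_12_INTEL 0x5

/* Stand-in for the cl_uint type from the OpenCL headers (assumption). */
typedef uint32_t cl_uint;

typedef struct _cl_motion_estimation_desc_intel {
    cl_uint mb_block_type;
    cl_uint subpixel_mode;
    cl_uint sad_adjust_mode;
    cl_uint search_path_type;
} cl_motion_estimation_desc_intel;

/* Hypothetical helper: one possible configuration, chosen for illustration. */
static cl_motion_estimation_desc_intel me_desc_example(void)
{
    cl_motion_estimation_desc_intel d;
    d.mb_block_type    = CL_ME_MB_TYPE_16x16_INTEL;
    d.subpixel_mode    = CL_ME_SUBPIXEL_MODE_QPEL_INTEL;
    d.sad_adjust_mode  = CL_ME_SAD_ADJUST_MODE_NONE_INTEL;
    d.search_path_type = CL_ME_SEARCH_PATH_RADIUS_16_12_INTEL;
    return d;
}

/* Hypothetical helper: true if every field holds one of the accepted values. */
static bool me_desc_is_valid(const cl_motion_estimation_desc_intel *d)
{
    assert(d != NULL);

    bool mb_ok = d->mb_block_type == CL_ME_MB_TYPE_16x16_INTEL ||
                 d->mb_block_type == CL_ME_MB_TYPE_8x8_INTEL ||
                 d->mb_block_type == CL_ME_MB_TYPE_4x4_INTEL;
    bool subpixel_ok = d->subpixel_mode == CL_ME_SUBPIXEL_MODE_INTEGER_INTEL ||
                       d->subpixel_mode == CL_ME_SUBPIXEL_MODE_HPEL_INTEL ||
                       d->subpixel_mode == CL_ME_SUBPIXEL_MODE_QPEL_INTEL;
    bool sad_ok = d->sad_adjust_mode == CL_ME_SAD_ADJUST_MODE_NONE_INTEL ||
                  d->sad_adjust_mode == CL_ME_SAD_ADJUST_MODE_HAAR_INTEL;
    /* The search path tokens are not contiguous (0x0, 0x1, 0x5), so a range
       check would wrongly accept 0x2 through 0x4. */
    bool path_ok = d->search_path_type == CL_ME_SEARCH_PATH_RADIUS_2_2_INTEL ||
                   d->search_path_type == CL_ME_SEARCH_PATH_RADIUS_4_4_INTEL ||
                   d->search_path_type == CL_ME_SEARCH_PATH_RADIUS_16_12_INTEL;

    return mb_ok && subpixel_ok && sad_ok && path_ok;
}
```
| |
| A descriptor that passes such a check would then be passed to |
| clCreateAcceleratorINTEL through <accelerator_desc>, with <accelerator_type> |
| set to CL_ACCELERATOR_TYPE_MOTION_ESTIMATION_INTEL. |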
| |
| Add a new sub-section "Using Motion Estimation Accelerators" to Section 5.XX.4 - |
| "Using Accelerators" in the cl_intel_accelerator specification: |
| |
| "Motion estimation accelerators control behavior of the motion estimation |
| engine and are used by motion estimation kernels. The |
| cl_intel_motion_estimation extension adds a built-in kernel for basic |
| motion estimation. This section describes the motion estimation built-in |
| kernel and how to set the motion estimation accelerator as an argument to |
| the motion estimation built-in kernel. |
| |
| The motion estimation built-in kernel uses the built-in kernel functionality |
| added in OpenCL 1.2. After creating the motion estimation built-in kernel, |
| the motion estimation accelerator can be set as a kernel argument to control |
| the motion estimation engine. |
| |
| There is currently no OpenCL C type for motion estimation accelerators, and |
| motion estimation accelerators can only be used with the motion estimation |
| built-in kernel. |
| |
| The kernel |
| |
| __kernel void block_motion_estimate_intel( |
|     accelerator_intel_t accelerator, |
|     __read_only image2d_t src_image, |
|     __read_only image2d_t ref_image, |
|     __global short2 * prediction_motion_vector_buffer, |
|     __global short2 * motion_vector_buffer, |
|     __global ushort * residuals); |
| |
| computes motion vectors by comparing a 2D source image with a 2D reference |
| image, producing an output field of motion vectors. The algorithm searches |
| for the best match of each pixel block in the source image by searching a |
| region of the reference image, centered on the coordinates of that pixel |
| block in the source image. |
| |
| The starting search coordinate may optionally be offset by a prediction |
| motion vector. Additionally, the kernel may optionally return a vector |
| field of per-pixel-block best-match distortion (SAD) values. |
| |
| When enqueuing this kernel, the <global_work_size> and <global_work_offset> |
| determine the region of interest (ROI) of the input frame in pixels. The |
| <global_work_offset> determines the offset to the start of the region of |
| interest. The <global_work_size> determines the width and height of the |
| region of interest. The amount of data written to the <motion_vector_buffer> |
| is dependent on the size of the region of interest and the partitioning |
| mode specified by the accelerator. |
| |
| <accelerator> must be a valid accelerator object created by |
| clCreateAcceleratorINTEL. For the block_motion_estimate_intel kernel, the |
| type of the accelerator must be CL_ACCELERATOR_TYPE_MOTION_ESTIMATION_INTEL. |
| |
| <src_image> is the input source image, typically representing 8-bit luminance |
| information. The <image_channel_order> and <image_data_type> of <src_image> |
| are restricted as follows: |
| |
| ---------------------------------------------------------------------------- |
| Channel Order Src Channel Data Type |
| ------------- --------------------- |
| CL_R CL_UNORM_INT8 |
| ---------------------------------------------------------------------------- |
| |
| <ref_image> is the input reference image, typically representing 8-bit |
| luminance information. The <image_channel_order> and <image_data_type> of |
| <ref_image> must match those of <src_image>, as follows: |
| |
| ---------------------------------------------------------------------------- |
| Channel Order Ref Channel Data Type |
| ------------- --------------------- |
| CL_R CL_UNORM_INT8 |
| ---------------------------------------------------------------------------- |
| |
| <motion_vector_buffer> is the output motion vector buffer, representing a |
| vector field of pixel block motion vectors, stored linearly in row-major |
| order. The elements of this buffer describe a motion vector for the |
| corresponding pixel block, with its x/y components packed as two 16-bit |
| integer values. Each component is encoded as a S13.2 fixed point value |
| (two's complement). The precision is specified by the <subpixel_mode> |
| of the <accelerator>. The size of the returned data is determined by the |
| <mb_block_type> of the <accelerator>. The <motion_vector_buffer> must be |
| sized so that it can hold the results for all pixel blocks of the source |
| image, i.e. the number of full or partial 16x16 source pixel macroblocks times |
| the number of returned motion vectors per source block (1, 4, or 16). |
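| |
| The sizing rule and the S13.2 encoding described above can be sketched in C as |
| follows. This is an illustrative sketch under stated assumptions, not part of |
| the extension: all me_* function names are hypothetical, and the token values |
| are repeated from the New Tokens section so that the code compiles on its own. |
| |
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Token values repeated from the "New Tokens" section. */
#define CL_ME_MB_TYPE_16x16_INTEL 0x0
#define CL_ME_MB_TYPE_8x8_INTEL   0x1
#define CL_ME_MB_TYPE_4x4_INTEL   0x2

/* Motion vectors returned per 16x16 macroblock: 1, 4, or 16.
   Returns 0 for an unknown partitioning mode. */
static size_t me_vectors_per_macroblock(uint32_t mb_block_type)
{
    switch (mb_block_type) {
    case CL_ME_MB_TYPE_16x16_INTEL: return 1;   /* one 16x16 block      */
    case CL_ME_MB_TYPE_8x8_INTEL:   return 4;   /* four 8x8 blocks      */
    case CL_ME_MB_TYPE_4x4_INTEL:   return 16;  /* sixteen 4x4 blocks   */
    default:                        return 0;
    }
}

/* Number of full or partial 16x16 macroblocks covering width x height
   pixels. Rounding up counts partial macroblocks at the right and bottom. */
static size_t me_macroblock_count(size_t width, size_t height)
{
    size_t mb_x = (width + 15) / 16;
    size_t mb_y = (height + 15) / 16;
    return mb_x * mb_y;
}

/* Number of motion vectors the buffer must be able to hold. */
static size_t me_motion_vector_count(size_t width, size_t height,
                                     uint32_t mb_block_type)
{
    size_t per_mb = me_vectors_per_macroblock(mb_block_type);
    assert(per_mb != 0); /* an unknown mode is treated as a caller bug here */
    return me_macroblock_count(width, height) * per_mb;
}

/* Buffer size in bytes: each element is a short2, i.e. two 16-bit values. */
static size_t me_motion_vector_buffer_bytes(size_t width, size_t height,
                                            uint32_t mb_block_type)
{
    return me_motion_vector_count(width, height, mb_block_type) *
           2 * sizeof(int16_t);
}

/* Convert one S13.2 component to pixels. The two fractional bits give
   quarter-pixel units, so the raw two's complement value is divided by 4. */
static float me_s13_2_to_pixels(int16_t raw)
{
    return (float)raw / 4.0f;
}
```
| |
| For example, a 1920x1080 source image covers 120 x 68 macroblocks, the last |
| row being partial, so with CL_ME_MB_TYPE_8x8_INTEL the buffer must hold |
| 120 * 68 * 4 = 32640 motion vectors, or 130560 bytes. |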
| |
| <prediction_motion_vector_buffer> is an optional input buffer representing a |
| vector field of prediction motion vectors, stored linearly in row-major |
| order. When provided, this buffer must contain one prediction motion vector |
| for each full or partial 16x16 pixel macroblock in the region of interest. |
| The prediction motion vector is used to offset the default search center |
| for each 16x16 pixel block search window. The default search center is the |
| center of the co-located source macroblock in the reference frame. Note |
| that in feedback algorithms, where the output <motion_vector_buffer> of one |
| frame is used as the input <prediction_motion_vector_buffer> for the next |
| frame, the output buffer may need to be downsampled. The |
| <prediction_motion_vector_buffer> argument is optional and may be set to |
| NULL, in which case the prediction motion vectors are implied to be (0,0). |
| |
| <residuals> is an optional output buffer representing a field of residuals |
| or "distortion values", one for each returned motion vector, stored linearly |
| in row-major order. These "residuals of the compensated image" represent the |
| sum-of-absolute-differences (SAD) between the source frame pixel block and |
| the best-match reference frame pixel block that produced the returned motion |
| vector. The <sad_adjust_mode> of the <accelerator> determines whether plain |
| SAD or SATD (Haar-adjusted) values are returned. The <residuals> argument is |
| optional and may be set to NULL, in which case residual information is not |
| returned. |
| |
| The motion estimation built-in kernel is queued for execution by the host |
| application using clEnqueueNDRangeKernel(). This function will return the |
| usual error codes, augmented with the following specific error codes for |
| this kernel: |
| |
| * CL_INVALID_WORK_DIMENSION if <work_dim> is not 2. This built-in kernel |
| requires a 2D ND-range. |
| * CL_INVALID_WORK_GROUP_SIZE if <local_work_size> is not NULL. This |
| built-in kernel requires the work-group size to be set by the runtime. |
| * CL_INVALID_IMAGE_SIZE if the region of interest defined by the |
| <global_work_size> and <global_work_offset> exceeds the dimensions of |
| the <src_image> or <ref_image>. |
| * CL_INVALID_IMAGE_FORMAT_DESCRIPTOR if the <src_image> or <ref_image> |
| kernel arguments do not meet the image format restrictions described |
| above." |
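| |
| An application can mirror the first three conditions above on the host before |
| calling clEnqueueNDRangeKernel. The following C sketch is one way to do so, |
| under stated assumptions: me_check_enqueue_args is a hypothetical helper, the |
| order in which it tests the conditions is a choice made here and is not |
| specified by the extension, and the image format check is left out because it |
| requires querying the image objects. The error code values are repeated from |
| the OpenCL headers so that the code compiles on its own. |
| |
```c
#include <assert.h>
#include <stddef.h>

/* Error code values as defined in the OpenCL headers (CL/cl.h). */
#define CL_SUCCESS                   0
#define CL_INVALID_IMAGE_SIZE      -40
#define CL_INVALID_WORK_DIMENSION  -53
#define CL_INVALID_WORK_GROUP_SIZE -54

/* Hypothetical helper: returns CL_SUCCESS if the enqueue arguments satisfy
   the kernel-specific rules for a source/reference image of the given size.
   global_work_offset may be NULL, which means an offset of (0, 0). */
static int me_check_enqueue_args(unsigned work_dim,
                                 const size_t *global_work_offset,
                                 const size_t *global_work_size,
                                 const size_t *local_work_size,
                                 size_t image_width, size_t image_height)
{
    /* The built-in kernel requires a 2D ND-range. */
    if (work_dim != 2)
        return CL_INVALID_WORK_DIMENSION;

    /* The work-group size must be left to the runtime. */
    if (local_work_size != NULL)
        return CL_INVALID_WORK_GROUP_SIZE;

    assert(global_work_size != NULL);
    size_t off_x = global_work_offset ? global_work_offset[0] : 0;
    size_t off_y = global_work_offset ? global_work_offset[1] : 0;

    /* The region of interest must lie inside the image. Comparing the size
       against (dimension - offset) avoids overflow in offset + size. */
    if (off_x > image_width || global_work_size[0] > image_width - off_x)
        return CL_INVALID_IMAGE_SIZE;
    if (off_y > image_height || global_work_size[1] > image_height - off_y)
        return CL_INVALID_IMAGE_SIZE;

    return CL_SUCCESS;
}
```
| |
| Since <src_image> and <ref_image> may differ in size, such a check would be |
| run once against the dimensions of each image. |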
| |
| Revision History |
| |
| Version 1 - Initial Revision |
| Version 2 - Formatting and minor bug fixes. |