Name String
cl_intel_motion_estimation
Contributors
Nico Galoppo, Intel
Craig Hansen-Sturm, Intel
Yejun Guo, Intel
Ben Ashbaugh, Intel
Contact
Ben Ashbaugh, Intel (ben.ashbaugh 'at' intel.com)
IP Status
TBD
Version
Version 2, March 18, 2016
Number
OpenCL Extension #23
Status
Final draft
Dependencies
OpenCL 1.2 and support for the cl_intel_accelerator extension are required.
This extension is written against revision 19 of the OpenCL 1.2
specification and against version 2 of the cl_intel_accelerator extension
specification.
Overview
This document describes an OpenCL extension to expose basic motion estimation
capabilities. There are two key parts of this extension: The first is a
built-in kernel for frame-based motion estimation that is exposed using the
built-in kernel infrastructure added to OpenCL 1.2. The second is the ability
to create motion estimation accelerators, which can be used to control
behavior of the motion estimation engine. The motion estimation accelerators
use the framework described in the cl_intel_accelerator extension.
New Procedures and Functions
None
New Tokens
Accepted as the <accelerator_type> parameter of clCreateAcceleratorINTEL:
CL_ACCELERATOR_TYPE_MOTION_ESTIMATION_INTEL 0x0
Accepted as values for fields of the cl_motion_estimation_desc_intel structure:
Accepted values for <mb_block_type>:
CL_ME_MB_TYPE_16x16_INTEL 0x0
CL_ME_MB_TYPE_8x8_INTEL 0x1
CL_ME_MB_TYPE_4x4_INTEL 0x2
Accepted values for <subpixel_mode>:
CL_ME_SUBPIXEL_MODE_INTEGER_INTEL 0x0
CL_ME_SUBPIXEL_MODE_HPEL_INTEL 0x1
CL_ME_SUBPIXEL_MODE_QPEL_INTEL 0x2
Accepted values for <sad_adjust_mode>:
CL_ME_SAD_ADJUST_MODE_NONE_INTEL 0x0
CL_ME_SAD_ADJUST_MODE_HAAR_INTEL 0x1
Accepted values for <search_path_type>:
CL_ME_SEARCH_PATH_RADIUS_2_2_INTEL 0x0
CL_ME_SEARCH_PATH_RADIUS_4_4_INTEL 0x1
CL_ME_SEARCH_PATH_RADIUS_16_12_INTEL 0x5
New Types
Accepted as a type pointed to by <accelerator_desc> for
clCreateAcceleratorINTEL, and pointed to by <param_value> for
clGetAcceleratorInfoINTEL:
typedef struct _cl_motion_estimation_desc_intel {
cl_uint mb_block_type;
cl_uint subpixel_mode;
cl_uint sad_adjust_mode;
cl_uint search_path_type;
} cl_motion_estimation_desc_intel;
Modifications to Section 5.XX.1 - "Creating Accelerator Objects" in the
cl_intel_accelerator specification:
Add a table of accelerator types, immediately following the description of
<accelerator_type>:
"----------------------------------------------------------------------------
cl_accelerator_type_intel    Description
-------------------------    -----------
CL_ACCELERATOR_TYPE_         Create a basic full-frame motion estimation
MOTION_ESTIMATION_INTEL      accelerator
----------------------------------------------------------------------------"
Add a table of accelerator descriptors, immediately following the
description of <descriptor>:
"----------------------------------------------------------------------------
Descriptor Type                    Description
---------------                    -----------
cl_motion_estimation_desc_intel    Used to represent the configuration state
                                   of basic full-frame motion estimation
                                   accelerators
----------------------------------------------------------------------------"
Add a new sub-section "Motion Estimation Accelerators" to Section 5.XX.3 -
"Accelerator Descriptors" in the cl_intel_accelerator specification:
"The cl_motion_estimation_desc_intel descriptor structure is used to create
an accelerator to control behavior of the motion estimation engine. This
motion estimation descriptor structure is defined as:
typedef struct _cl_motion_estimation_desc_intel {
cl_uint mb_block_type;
cl_uint subpixel_mode;
cl_uint sad_adjust_mode;
cl_uint search_path_type;
} cl_motion_estimation_desc_intel;
<mb_block_type> describes the size of the blocks (partitioning mode)
considered by the motion estimator. A frame is first divided into
macroblocks of size 16x16, and then may be sub-divided into blocks
of size 8x8 or 4x4, or retained as 16x16 blocks. This field influences
the size of the output image(s) and buffer(s) of the motion estimation
algorithms defined in this extension. Valid values are described in the
table below.
----------------------------------------------------------------------------
Type Description
---- -----------
CL_ME_MB_TYPE_16x16_INTEL Each block is of size 16x16 pixels
CL_ME_MB_TYPE_8x8_INTEL Each block is of size 8x8 pixels
CL_ME_MB_TYPE_4x4_INTEL Each block is of size 4x4 pixels
----------------------------------------------------------------------------
<subpixel_mode> defines the search precision (and hence, the precision of
the returned motion vectors). Valid values are described in the table below.
----------------------------------------------------------------------------
Type Description
---- -----------
CL_ME_SUBPIXEL_MODE_INTEGER_INTEL Integer pixel mode searching
CL_ME_SUBPIXEL_MODE_HPEL_INTEL Half-pixel mode searching
CL_ME_SUBPIXEL_MODE_QPEL_INTEL Quarter-pixel mode searching
----------------------------------------------------------------------------
<sad_adjust_mode> specifies the distortion measure adjustment applied to the
motion search SAD (sum of absolute differences) comparison. Valid values are
described in the table below.
----------------------------------------------------------------------------
Type Description
---- -----------
CL_ME_SAD_ADJUST_MODE_NONE_INTEL Non-adjusted SAD
CL_ME_SAD_ADJUST_MODE_HAAR_INTEL Haar transformed SATD (frequency space)
----------------------------------------------------------------------------
<search_path_type> specifies the search path and search radius used when
matching each source pixel block against the reference frame. Currently, all
search algorithms are exhaustive: the source block is compared with every
pixel block in the reference area within a [Rx, Ry] radius of the search
center. The search center is the center of the macroblock in the reference
frame that is co-located with the source macroblock the block belongs to,
optionally offset by a predicted motion vector. Valid values are described
in the table below.
----------------------------------------------------------------------------
Flag                                    Description
----                                    -----------
CL_ME_SEARCH_PATH_RADIUS_2_2_INTEL      Exhaustive search within
                                        [+/-2, +/-2] radius
CL_ME_SEARCH_PATH_RADIUS_4_4_INTEL      Exhaustive search within
                                        [+/-4, +/-4] radius
CL_ME_SEARCH_PATH_RADIUS_16_12_INTEL    Exhaustive search within
                                        [+/-16, +/-12] radius
----------------------------------------------------------------------------"
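As a non-normative illustration, the sketch below fills in a descriptor and
checks that each field holds one of the tokens listed above. The typedefs and
macros are stand-ins for what the OpenCL extension headers would normally
supply, with values copied from the "New Tokens" section. The helper names
me_desc_is_valid and me_desc_example are hypothetical and not part of this
extension. Creating the accelerator itself (clCreateAcceleratorINTEL, see the
cl_intel_accelerator specification) is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for definitions that the OpenCL extension headers normally
   provide. Token values are copied from the "New Tokens" section. */
typedef uint32_t cl_uint;

#define CL_ME_MB_TYPE_16x16_INTEL            0x0
#define CL_ME_MB_TYPE_8x8_INTEL              0x1
#define CL_ME_MB_TYPE_4x4_INTEL              0x2
#define CL_ME_SUBPIXEL_MODE_INTEGER_INTEL    0x0
#define CL_ME_SUBPIXEL_MODE_HPEL_INTEL       0x1
#define CL_ME_SUBPIXEL_MODE_QPEL_INTEL       0x2
#define CL_ME_SAD_ADJUST_MODE_NONE_INTEL     0x0
#define CL_ME_SAD_ADJUST_MODE_HAAR_INTEL     0x1
#define CL_ME_SEARCH_PATH_RADIUS_2_2_INTEL   0x0
#define CL_ME_SEARCH_PATH_RADIUS_4_4_INTEL   0x1
#define CL_ME_SEARCH_PATH_RADIUS_16_12_INTEL 0x5

typedef struct _cl_motion_estimation_desc_intel {
    cl_uint mb_block_type;
    cl_uint subpixel_mode;
    cl_uint sad_adjust_mode;
    cl_uint search_path_type;
} cl_motion_estimation_desc_intel;

/* Hypothetical helper: returns 1 if every field holds a token defined by
   this extension, 0 otherwise. Note that the search path tokens are not
   contiguous (0x0, 0x1, 0x5). */
int me_desc_is_valid(const cl_motion_estimation_desc_intel *d)
{
    int mb_ok = d->mb_block_type == CL_ME_MB_TYPE_16x16_INTEL ||
                d->mb_block_type == CL_ME_MB_TYPE_8x8_INTEL ||
                d->mb_block_type == CL_ME_MB_TYPE_4x4_INTEL;
    int subpel_ok = d->subpixel_mode == CL_ME_SUBPIXEL_MODE_INTEGER_INTEL ||
                    d->subpixel_mode == CL_ME_SUBPIXEL_MODE_HPEL_INTEL ||
                    d->subpixel_mode == CL_ME_SUBPIXEL_MODE_QPEL_INTEL;
    int sad_ok = d->sad_adjust_mode == CL_ME_SAD_ADJUST_MODE_NONE_INTEL ||
                 d->sad_adjust_mode == CL_ME_SAD_ADJUST_MODE_HAAR_INTEL;
    int path_ok =
        d->search_path_type == CL_ME_SEARCH_PATH_RADIUS_2_2_INTEL ||
        d->search_path_type == CL_ME_SEARCH_PATH_RADIUS_4_4_INTEL ||
        d->search_path_type == CL_ME_SEARCH_PATH_RADIUS_16_12_INTEL;
    return mb_ok && subpel_ok && sad_ok && path_ok;
}

/* Hypothetical example configuration: 8x8 blocks, quarter-pixel search,
   Haar-adjusted distortion, +/-16 by +/-12 search radius. */
cl_motion_estimation_desc_intel me_desc_example(void)
{
    cl_motion_estimation_desc_intel d;
    d.mb_block_type    = CL_ME_MB_TYPE_8x8_INTEL;
    d.subpixel_mode    = CL_ME_SUBPIXEL_MODE_QPEL_INTEL;
    d.sad_adjust_mode  = CL_ME_SAD_ADJUST_MODE_HAAR_INTEL;
    d.search_path_type = CL_ME_SEARCH_PATH_RADIUS_16_12_INTEL;
    assert(me_desc_is_valid(&d));
    return d;
}
```

A descriptor filled in this way would be passed by pointer, together with its
size, as the <accelerator_desc> when creating the accelerator.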
Add a new sub-section "Using Motion Estimation Accelerators" to Section 5.XX.4 -
"Using Accelerators" in the cl_intel_accelerator specification:
"Motion estimation accelerators control behavior of the motion estimation
engine and are used by motion estimation kernels. The
cl_intel_motion_estimation extension adds a built-in kernel for basic
motion estimation. This section describes the motion estimation built-in
kernel and how to set the motion estimation accelerator as an argument to
the motion estimation built-in kernel.
The motion estimation built-in kernel uses the built-in kernel functionality
added in OpenCL 1.2. After creating the motion estimation built-in kernel,
the motion estimation accelerator can be set as a kernel argument to control
the motion estimation engine.
There is currently no OpenCL C type for motion estimation accelerators, and
motion estimation accelerators can only be used with the motion estimation
built-in kernel.
The kernel
__kernel void block_motion_estimate_intel(
accelerator_intel_t accelerator,
__read_only image2d_t src_image,
__read_only image2d_t ref_image,
__global short2 * prediction_motion_vector_buffer,
__global short2 * motion_vector_buffer,
__global ushort * residuals);
computes motion vectors by comparing a 2d source image with a 2d reference
image, producing an output field of motion vectors. The algorithm searches
for the best match of each pixel block in the source image by searching a
region of the reference image, centered on the coordinates of that pixel
block in the source image.
The starting search coordinate may optionally be offset by a prediction
motion vector. Additionally, the kernel may optionally return a vector
field of per-pixel-block best-match distortion (SAD) values.
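As a non-normative illustration of the search described above, the following
sketch performs an exhaustive integer-pixel search for one 16x16 block on the
host. It is a reference model of the general technique only, not the
behavior of any particular implementation. The names me_match,
block_sad_16x16 and search_block_exhaustive are hypothetical. The sketch
assumes tightly packed 8-bit images, a prediction offset given in whole
pixels, candidates outside the reference image being skipped, and the first
candidate in raster order winning ties; none of these details are specified
by this extension.

```c
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical result type: best motion vector in whole pixels and its SAD. */
typedef struct {
    int x;
    int y;
    unsigned sad;
} me_match;

/* Sum of absolute differences between two 16x16 blocks that live in 8-bit
   planes sharing the same row stride. */
unsigned block_sad_16x16(const uint8_t *src, const uint8_t *ref, int stride)
{
    unsigned sad = 0;
    for (int y = 0; y < 16; ++y)
        for (int x = 0; x < 16; ++x)
            sad += (unsigned)abs((int)src[y * stride + x] -
                                 (int)ref[y * stride + x]);
    return sad;
}

/* Exhaustively test every candidate within [+/-rx, +/-ry] of the search
   center. (bx, by) is the top-left corner of the source block and
   (pred_x, pred_y) is the optional prediction offset in whole pixels. */
me_match search_block_exhaustive(const uint8_t *src, const uint8_t *ref,
                                 int width, int height, int bx, int by,
                                 int rx, int ry, int pred_x, int pred_y)
{
    me_match best = { 0, 0, UINT_MAX };
    for (int dy = -ry; dy <= ry; ++dy) {
        for (int dx = -rx; dx <= rx; ++dx) {
            int mvx = pred_x + dx;
            int mvy = pred_y + dy;
            int cx = bx + mvx;
            int cy = by + mvy;
            /* Assumption: candidates outside the reference are skipped. */
            if (cx < 0 || cy < 0 || cx + 16 > width || cy + 16 > height)
                continue;
            unsigned sad = block_sad_16x16(src + by * width + bx,
                                           ref + cy * width + cx, width);
            if (sad < best.sad) {
                best.x = mvx;
                best.y = mvy;
                best.sad = sad;
            }
        }
    }
    return best;
}
```

If no candidate lies inside the reference image, the returned sad remains
UINT_MAX, which a caller can treat as "no match found".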
When enqueuing this kernel, the <global_work_size> and <global_work_offset>
determine the region of interest (ROI) of the input frame in pixels. The
<global_work_offset> determines the offset to the start of the region of
interest. The <global_work_size> determines the width and height of the
region of interest. The amount of data written to the <motion_vector_buffer>
is dependent on the size of the region of interest and the partitioning
mode specified by the accelerator.
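As a non-normative illustration, the number of motion vectors produced for a
region of interest can be estimated as follows. The helper names are
hypothetical, and the macros are stand-ins that repeat the token values from
the "New Tokens" section. Partial macroblocks at the right and bottom edges
are rounded up, following the "full or partial" wording used for the buffer
arguments.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the tokens normally supplied by the extension header. */
#define CL_ME_MB_TYPE_16x16_INTEL 0x0
#define CL_ME_MB_TYPE_8x8_INTEL   0x1
#define CL_ME_MB_TYPE_4x4_INTEL   0x2

/* Motion vectors returned per 16x16 macroblock for a partitioning mode,
   or 0 if the mode is not one of the tokens above. */
size_t me_vectors_per_macroblock(unsigned mb_block_type)
{
    switch (mb_block_type) {
    case CL_ME_MB_TYPE_16x16_INTEL: return 1;
    case CL_ME_MB_TYPE_8x8_INTEL:   return 4;
    case CL_ME_MB_TYPE_4x4_INTEL:   return 16;
    default:                        return 0;
    }
}

/* Total motion vectors for a region of roi_width x roi_height pixels. */
size_t me_motion_vector_count(size_t roi_width, size_t roi_height,
                              unsigned mb_block_type)
{
    size_t mbs_x = (roi_width + 15) / 16;   /* round partial blocks up */
    size_t mbs_y = (roi_height + 15) / 16;
    return mbs_x * mbs_y * me_vectors_per_macroblock(mb_block_type);
}

/* Bytes needed when each vector is a short2 (two 16-bit components). */
size_t me_motion_vector_bytes(size_t roi_width, size_t roi_height,
                              unsigned mb_block_type)
{
    return me_motion_vector_count(roi_width, roi_height, mb_block_type) *
           2 * sizeof(int16_t);
}
```

For example, a 1920x1080 region contains 120 x 68 macroblocks, because 1080
is not a multiple of 16 and the last row is partial.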
<accelerator> must be a valid accelerator object created by
clCreateAcceleratorINTEL. For the block_motion_estimate_intel kernel, the
type of the accelerator must be CL_ACCELERATOR_TYPE_MOTION_ESTIMATION_INTEL.
<src_image> is the input source image, typically representing 8-bit luminance
information. The <image_channel_order> and <image_data_type> of <src_image>
are restricted as follows:
----------------------------------------------------------------------------
Channel Order Src Channel Data Type
------------- ---------------------
CL_R CL_UNORM_INT8
----------------------------------------------------------------------------
<ref_image> is the input reference image, typically representing 8-bit
luminance information. The <image_channel_order> and <image_data_type> of
<ref_image> must match those of <src_image>, as follows:
----------------------------------------------------------------------------
Channel Order Ref Channel Data Type
------------- ---------------------
CL_R CL_UNORM_INT8
----------------------------------------------------------------------------
<motion_vector_buffer> is the output motion vector buffer, representing a
vector field of pixel block motion vectors, stored linearly in row-major
order. The elements of this buffer describe a motion vector for the
corresponding pixel block, with its x/y components packed as two 16-bit
integer values. Each component is encoded as an S13.2 fixed point value
(two's complement). The precision is specified by the <subpixel_mode>
of the <accelerator>. The size of the returned data is determined by the
<mb_block_type> of the <accelerator>. The <motion_vector_buffer> must be
large enough to hold the results for all pixel blocks of the source image,
i.e. the number of full or partial 16x16 source pixel macroblocks times the
number of returned motion vectors per macroblock (1, 4, or 16).
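As a non-normative illustration of the S13.2 encoding, a component with two
fractional bits is expressed in units of one quarter pixel, so dividing the
raw 16-bit value by 4 yields pixels. The helper names below are hypothetical.

```c
#include <stdint.h>

/* Convert one raw S13.2 motion vector component to pixels.
   Two fractional bits mean the raw value counts quarter pixels. */
double me_component_to_pixels(int16_t raw)
{
    return raw / 4.0;
}

/* Convert a whole-pixel displacement to its raw S13.2 encoding.
   The caller must keep the result within the 16-bit range. */
int16_t me_component_from_whole_pixels(int pixels)
{
    return (int16_t)(pixels * 4);
}
```

Under this encoding a raw value of 6 denotes a displacement of 1.5 pixels and
a raw value of -5 denotes -1.25 pixels.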
<prediction_motion_vector_buffer> is an optional input buffer representing a
vector field of prediction motion vectors, stored linearly in row-major
order. When provided, this buffer must contain one prediction motion vector
for each full or partial 16x16 pixel macroblock in the region of interest.
The prediction motion vector is used to offset the default search center
for each 16x16 pixel block search window. The default search center is the
center of the co-located source macroblock in the reference frame. Note
that in feedback algorithms, where the output <motion_vector_buffer> of one
frame is used as the input <prediction_motion_vector_buffer> for the next
frame, the output buffer may need to be downsampled. The
<prediction_motion_vector_buffer> argument is optional and may be set to
NULL, in which case the prediction motion vectors are implied to be (0,0).
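As a non-normative illustration of the downsampling mentioned above, the
sketch below reduces a field of 8x8 block vectors to one prediction vector
per 16x16 macroblock by averaging the four vectors of each macroblock. This
extension does not prescribe a downsampling method, so averaging is only one
possible choice. The sketch also assumes that the 8x8 vectors form a single
row-major grid that is twice as wide and twice as tall as the macroblock
grid; that layout is an assumption, not something stated here. The type mv16
(a stand-in for short2) and the function name are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the OpenCL short2 type. */
typedef struct {
    int16_t x;
    int16_t y;
} mv16;

/* Average each 2x2 group of 8x8 block vectors into one macroblock vector.
   blocks: (2 * mbs_x) x (2 * mbs_y) vectors, row-major (assumed layout).
   preds:  mbs_x x mbs_y vectors, row-major. */
void me_downsample_8x8_to_16x16(const mv16 *blocks, mv16 *preds,
                                size_t mbs_x, size_t mbs_y)
{
    size_t grid_w = 2 * mbs_x;
    for (size_t my = 0; my < mbs_y; ++my) {
        for (size_t mx = 0; mx < mbs_x; ++mx) {
            const mv16 *top = blocks + (2 * my) * grid_w + 2 * mx;
            const mv16 *bot = top + grid_w;
            int sum_x = top[0].x + top[1].x + bot[0].x + bot[1].x;
            int sum_y = top[0].y + top[1].y + bot[0].y + bot[1].y;
            /* Integer division truncates toward zero. */
            preds[my * mbs_x + mx].x = (int16_t)(sum_x / 4);
            preds[my * mbs_x + mx].y = (int16_t)(sum_y / 4);
        }
    }
}
```

Other reductions, such as taking the median or the vector with the lowest
residual, may suit a given algorithm better.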
<residuals> is an optional output buffer representing a field of residuals
or "distortion values", one for each returned motion vector, stored linearly
in row-major order. These "residuals of the compensated image" represent the
sum-of-absolute-differences (SAD) between the source frame pixel block and
the best-match reference frame pixel block that produced the returned motion
vector. The <sad_adjust_mode> of the <accelerator> determines whether plain
SAD or SATD (Haar-adjusted) values are returned. The <residuals> argument is
optional and may be set to NULL, in which case residual information is not
returned.
The motion estimation built-in kernel is queued for execution by the host
application using clEnqueueNDRangeKernel(). This function will return the
usual error codes, augmented with the following specific error codes for
this kernel:
* CL_INVALID_WORK_DIMENSION if <work_dim> is not 2. This built-in kernel
requires a 2D ND-range.
* CL_INVALID_WORK_GROUP_SIZE if <local_work_size> is not NULL. This
built-in kernel requires the work-group size to be set by the runtime.
* CL_INVALID_IMAGE_SIZE if the region of interest defined by the
<global_work_size> and <global_work_offset> exceeds the dimensions of
the <src_image> or <ref_image>.
* CL_INVALID_IMAGE_FORMAT_DESCRIPTOR if the <src_image> or <ref_image>
kernel arguments do not meet the image format restrictions described
above."
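As a non-normative illustration, a host application could mirror the first
three conditions above before enqueuing the kernel. The sketch below does
not call the OpenCL runtime. The enum me_check_result and the function
me_check_enqueue_args are hypothetical, and the parameters simply follow the
argument names of clEnqueueNDRangeKernel. A NULL <global_work_offset> is
treated as an offset of (0, 0).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes, one per condition checked below. */
typedef enum {
    ME_CHECK_OK,
    ME_CHECK_BAD_WORK_DIM,      /* maps to CL_INVALID_WORK_DIMENSION  */
    ME_CHECK_BAD_LOCAL_SIZE,    /* maps to CL_INVALID_WORK_GROUP_SIZE */
    ME_CHECK_ROI_OUT_OF_BOUNDS  /* maps to CL_INVALID_IMAGE_SIZE      */
} me_check_result;

/* Check the ND-range arguments against one image of the given size.
   Call once for the source image and once for the reference image. */
me_check_result me_check_enqueue_args(unsigned work_dim,
                                      const size_t *global_work_offset,
                                      const size_t *global_work_size,
                                      const size_t *local_work_size,
                                      size_t image_width,
                                      size_t image_height)
{
    assert(global_work_size != NULL);

    if (work_dim != 2)
        return ME_CHECK_BAD_WORK_DIM;
    if (local_work_size != NULL)
        return ME_CHECK_BAD_LOCAL_SIZE;

    size_t off_x = global_work_offset ? global_work_offset[0] : 0;
    size_t off_y = global_work_offset ? global_work_offset[1] : 0;

    /* Written as subtractions so that offset + size cannot overflow. */
    if (off_x > image_width || global_work_size[0] > image_width - off_x ||
        off_y > image_height || global_work_size[1] > image_height - off_y)
        return ME_CHECK_ROI_OUT_OF_BOUNDS;

    return ME_CHECK_OK;
}
```

The image format restriction is not covered here, since checking it requires
querying the image objects through the OpenCL runtime.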
Revision History
Version 1 - Initial Revision
Version 2 - Formatting and minor bug fixes.