Name String
cl_intel_advanced_motion_estimation
Contributors
Biju George
James Holland
RaghuKrishnan Embar
Adam Herr
Tomasz Olejniczak
Scott Pillow
Ben Ashbaugh
Contact
Biju George (biju.george 'at' intel.com)
IP Status
TBD
Version
Version 2, February 15, 2016
Number
OpenCL Extension TBD
Status
Final Draft
Extension Type
OpenCL platform extension
Dependencies
OpenCL 1.2
cl_intel_accelerator version 1
cl_intel_motion_estimation version 1
Overview
This document presents the advanced motion estimation extension for OpenCL. This
extension builds upon the cl_intel_motion_estimation extension by providing block-
based estimation and greater control over the estimation algorithm.
This extension reuses the set of host-callable functions and "motion estimation
accelerator objects" defined in the cl_intel_motion_estimation extension version 1.
This extension depends on the OpenCL 1.2 built-in kernel infrastructure and on the
cl_intel_accelerator extension version 1, which provides an abstraction for domain-
specific acceleration in the OpenCL runtime.
New Procedures and Functions
None
New Tokens
Accepted as arguments to clGetDeviceInfo
CL_DEVICE_ME_VERSION_INTEL 0x407E
Accepted as flags passed to the kernel:
CL_ME_CHROMA_INTRA_PREDICT_ENABLED_INTEL 0x1
CL_ME_LUMA_INTRA_PREDICT_ENABLED_INTEL 0x2
CL_ME_SKIP_BLOCK_TYPE_16x16_INTEL 0x0
CL_ME_SKIP_BLOCK_TYPE_8x8_INTEL 0x4
CL_ME_COST_PENALTY_NONE_INTEL 0x0
CL_ME_COST_PENALTY_LOW_INTEL 0x1
CL_ME_COST_PENALTY_NORMAL_INTEL 0x2
CL_ME_COST_PENALTY_HIGH_INTEL 0x3
CL_ME_COST_PRECISION_QPEL_INTEL 0x0
CL_ME_COST_PRECISION_HPEL_INTEL 0x1
CL_ME_COST_PRECISION_PEL_INTEL 0x2
CL_ME_COST_PRECISION_DPEL_INTEL 0x3
Valid intra-search predictor mode constants:
CL_ME_LUMA_PREDICTOR_MODE_VERTICAL_INTEL 0x0
CL_ME_LUMA_PREDICTOR_MODE_HORIZONTAL_INTEL 0x1
CL_ME_LUMA_PREDICTOR_MODE_DC_INTEL 0x2
CL_ME_LUMA_PREDICTOR_MODE_DIAGONAL_DOWN_LEFT_INTEL 0x3
CL_ME_LUMA_PREDICTOR_MODE_DIAGONAL_DOWN_RIGHT_INTEL 0x4
CL_ME_LUMA_PREDICTOR_MODE_PLANE_INTEL 0x4
CL_ME_LUMA_PREDICTOR_MODE_VERTICAL_RIGHT_INTEL 0x5
CL_ME_LUMA_PREDICTOR_MODE_HORIZONTAL_DOWN_INTEL 0x6
CL_ME_LUMA_PREDICTOR_MODE_VERTICAL_LEFT_INTEL 0x7
CL_ME_LUMA_PREDICTOR_MODE_HORIZONTAL_UP_INTEL 0x8
CL_ME_CHROMA_PREDICTOR_MODE_DC_INTEL 0x0
CL_ME_CHROMA_PREDICTOR_MODE_HORIZONTAL_INTEL 0x1
CL_ME_CHROMA_PREDICTOR_MODE_VERTICAL_INTEL 0x2
CL_ME_CHROMA_PREDICTOR_MODE_PLANE_INTEL 0x3
Valid constant values returned by clGetDeviceInfo:
CL_ME_VERSION_ADVANCED_VER_1_INTEL 0x1
CL_ME_VERSION_ADVANCED_VER_2_INTEL 0x2
Valid macroblock type constants:
CL_ME_MB_TYPE_16x16_INTEL 0x0
CL_ME_MB_TYPE_8x8_INTEL 0x1
CL_ME_MB_TYPE_4x4_INTEL 0x2
Valid skip mode constants:
CL_ME_FORWARD_INPUT_MODE_INTEL 0x1
CL_ME_BACKWARD_INPUT_MODE_INTEL 0x2
CL_ME_BIDIRECTION_INPUT_MODE_INTEL 0x3
Valid bidirectional weight constants:
CL_ME_BIDIR_WEIGHT_QUARTER_INTEL 0x10
CL_ME_BIDIR_WEIGHT_THIRD_INTEL 0x15
CL_ME_BIDIR_WEIGHT_HALF_INTEL 0x20
CL_ME_BIDIR_WEIGHT_TWO_THIRD_INTEL 0x2B
CL_ME_BIDIR_WEIGHT_THREE_QUARTER_INTEL 0x30
New Types
None
Additions to Chapter 4 of the OpenCL Specification:
Modify the description of function clGetDeviceInfo
Table 4.3 must be extended to include the following enumeration constants:
cl_device_info Return Type Description
-------------- ----------- ---------------
CL_DEVICE_ME_VERSION_INTEL cl_uint The motion estimation API version number
supported by the device and driver.
This extension requires a minimum motion estimation device version number of
CL_ME_VERSION_ADVANCED_VER_1_INTEL.
The cl_intel_motion_estimation extension defines a motion estimation accelerator
object. This object is used without modification in this extension.
The tokens defined in the "New Tokens" section of the cl_intel_motion_estimation
extension are used by this extension.
This extension includes two new built-in kernels for block-based motion estimation.
The second built-in kernel is supported only for devices that report a motion estimation
device version number of CL_ME_VERSION_ADVANCED_VER_2_INTEL. A program object for one or
both kernels is obtained via clCreateProgramWithBuiltInKernels, passing the kernel name
as a string to the kernel_names argument. A kernel object is obtained from this program
by calling the clCreateKernel function, passing the kernel name as a string to the
kernel_name argument.
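The version rule above can be sketched as a small host-side helper. This is a minimal illustration, not part of the extension: the helper name me_builtin_kernel_names is hypothetical, and it relies only on the fact that clCreateProgramWithBuiltInKernels takes a semicolon-separated list in kernel_names. The value of me_version is assumed to come from a clGetDeviceInfo query of CL_DEVICE_ME_VERSION_INTEL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Version tokens as listed under "New Tokens". */
#define CL_ME_VERSION_ADVANCED_VER_1_INTEL 0x1
#define CL_ME_VERSION_ADVANCED_VER_2_INTEL 0x2

#define ME_KERNEL_CHECK "block_advanced_motion_estimate_check_intel"
#define ME_KERNEL_BIDIR_CHECK \
    "block_advanced_motion_estimate_bidirectional_check_intel"

/* Hypothetical helper: writes the semicolon-separated kernel_names string
 * for a device reporting me_version. Returns 0 on success, -1 if the
 * version is below VER_1 or the output buffer is too small. */
static int me_builtin_kernel_names(unsigned me_version, char *out,
                                   size_t out_size)
{
    size_t needed;

    assert(out != NULL);
    if (me_version < CL_ME_VERSION_ADVANCED_VER_1_INTEL)
        return -1;

    needed = strlen(ME_KERNEL_CHECK) + 1;            /* name + NUL */
    if (me_version == CL_ME_VERSION_ADVANCED_VER_2_INTEL)
        needed += strlen(ME_KERNEL_BIDIR_CHECK) + 1; /* ';' + name */
    if (needed > out_size)
        return -1;

    strcpy(out, ME_KERNEL_CHECK);
    if (me_version == CL_ME_VERSION_ADVANCED_VER_2_INTEL) {
        strcat(out, ";");
        strcat(out, ME_KERNEL_BIDIR_CHECK);
    }
    return 0;
}
```

The resulting string would be passed as kernel_names, and each individual name would later be passed to clCreateKernel.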
Each kernel operates on 16x16 pixel blocks (macroblocks) on the source and
reference images. The number of macroblocks (MBs) in a given image is determined by
the number of 16x16 regions that evenly divide the global_work_size[0] (width) and
global_work_size[1] (height) arguments passed to the clEnqueueNDRangeKernel function. If
the image dimensions are not evenly divisible by 16, a partial MB is defined for the
remaining pixels. The kernel references macroblocks sequentially using contiguous
row-major ordering. For example, a 128x128 source image would have the following
macroblock ordering:
-------------------------
| 0| 1| 2| 3| 4| 5| 6| 7|
+--+--+--+--+--+--+--+--+
| 8| 9|10|11|12|13|14|15|
+--+--+--+--+--+--+--+--+
|16|17|18|19|20|21|22|23|
+--+--+--+--+--+--+--+--+
|24|25|26|27|28|29|30|31|
+--+--+--+--+--+--+--+--+
|32|33|34|35|36|37|38|39|
+--+--+--+--+--+--+--+--+
|40|41|42|43|44|45|46|47|
+--+--+--+--+--+--+--+--+
|48|49|50|51|52|53|54|55|
+--+--+--+--+--+--+--+--+
|56|57|58|59|60|61|62|63|
-------------------------
The data layout of the kernel's input and output arrays is based on this ordering and
requires a specific data layout per macroblock as described below.
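The macroblock ordering above can be expressed as simple index arithmetic. The following is a minimal sketch with hypothetical helper names (mb_count_1d, mb_count, mb_index); it assumes that a trailing partial macroblock counts as one macroblock, as the text describes.

```c
#include <assert.h>
#include <stddef.h>

#define MB_SIZE 16 /* macroblock edge length in pixels */

/* Macroblocks along one dimension; a trailing partial MB counts as one. */
static size_t mb_count_1d(size_t pixels)
{
    return (pixels + MB_SIZE - 1) / MB_SIZE;
}

/* Total macroblocks for a width x height ND-range. */
static size_t mb_count(size_t width, size_t height)
{
    return mb_count_1d(width) * mb_count_1d(height);
}

/* Row-major index of the macroblock that contains pixel (x, y). */
static size_t mb_index(size_t x, size_t y, size_t width)
{
    assert(x < width);
    return (y / MB_SIZE) * mb_count_1d(width) + (x / MB_SIZE);
}
```

For the 128x128 example, mb_count gives 64, and the pixel at (48, 32) falls in macroblock 19, matching the grid.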
Inter-prediction is the process of determining the best inter-frame motion vectors
that describe the transform from a 2D reference image to another 2D source image. This
is done by searching for temporal patterns, usually in adjacent frames in a video
sequence. The estimation algorithm operates on 16x16 macroblocks, with either 4x4,
8x8 or 16x16 sub-block sizes. Each of these sub-block sizes has a corresponding
number of motion vectors within a given macroblock:
Sub-block Size MVs per MB
-------------- --------------
4x4 16
8x8 4
16x16 1
The algorithm searches for the best match of each pixel block in the source image by
searching an image region in the reference image, centered on the coordinates of that
pixel block in the source image. This center coordinate can be offset by a set of
prediction motion vectors (MVs). The predictor_motion_vector_buffer argument is used
to define up to eight prediction MVs per macroblock. The count_motion_vector_buffer
argument is used to configure the number of actual prediction motion vectors used
within each macroblock.
A cost function scheme can be specified for motion search. Distortion for a MV is
computed as a sum of the SAD and the MV cost penalty. Cost penalty is computed based
on the distance between the computed MV and a specific cost-center. This cost-center
is specified as the first predictor motion vector configured for a given MB. The
search_cost_penalty argument specifies the cost penalty function and can be configured
for low, normal or high penalty. The search_cost_precision argument is used to
configure the range of the cost function by specifying the precision of control
points at which the cost penalties are applied to quarter, half, full, or double pixel
precision. The cost penalties at in-between control points are linearly interpolated.
Generally, a low penalty can be used when using low quantization parameter values
during encoding and a high penalty can be used when using high quantization parameter
values during encoding.
Search results are populated in the search_motion_vector_buffer array. This array
contains a set of best-search motion vectors per MB; the number of MVs per MB is
determined by the sub-block size. It is also possible to obtain the SAD-adjusted
residual values corresponding to the best search MVs in the array specified by the
search_residuals argument.
The kernel can perform skip-checks to produce distortion values based on the skip-
check MVs specified for each macroblock. Skip-checks may be configured with either 8x8
or 16x16 sub-block sizes, via the skip_block_type argument. The
skip_motion_vector_buffer is used to configure multiple sets of skip-check MVs per
MB. The number of vectors in each set is determined by the sub-block size:
Sub-block Size MVs per MB
-------------- --------------
8x8 4
16x16 1
Results are obtained via the skip_residuals argument as SAD-adjusted distortion values
corresponding to each skip-check MV defined for each macroblock.
Intra-prediction describes the transform from previous adjacent macroblocks to
subsequent macroblocks within the same 2D source frame. It searches for spatial
patterns and produces predictor modes based on previous adjacent macroblocks within
the same frame. The kernel may be configured to report the intra-prediction modes via
the intra_search_prediction_modes_buffer argument. This array contains, for each
macroblock, a record containing the predictor mode constants for one 16x16, four 8x8
and sixteen 4x4 luma blocks. There is also a predictor mode entry reserved for an 8x8
chroma block. Residual values derived during the intra-prediction process are
accessible via the intra_search_residuals argument.
The first kernel
block_advanced_motion_estimate_check_intel(
accelerator_intel_t accelerator,
__read_only image2d_t src_image,
__read_only image2d_t ref_image,
uint flags,
uint skip_block_type,
uint search_cost_penalty,
uchar search_cost_precision,
__global short2 *count_motion_vector_buffer,
__global short2 *predictor_motion_vector_buffer,
__global short2 *skip_motion_vector_buffer,
__global short2 *search_motion_vector_buffer,
__global char *intra_search_predictor_modes,
__global ushort *search_residuals,
__global ushort *skip_residuals,
__global ushort *intra_residuals );
defines a kernel that provides various block-based motion estimation computations.
There are three basic use cases for this kernel:
1.) Perform inter-prediction motion estimation on the source and reference images to
obtain the best search motion vectors and their associated distortion values.
2.) Perform skip-checks on the source and reference images by providing a set of
motion vectors, then obtain the corresponding distortion values.
3.) Perform intra-prediction computations to obtain the best-search prediction modes
between adjacent macroblocks and associated residual values.
This kernel can be set up to do some or all of these operations in a single enqueue.
block_advanced_motion_estimate_check_intel arguments:
accelerator is a valid accelerator object created by clCreateAcceleratorINTEL, where
the type of the accelerator must be CL_ACCELERATOR_TYPE_MOTION_ESTIMATION_INTEL.
Refer to the cl_intel_motion_estimation extension for a detailed description of
configuring the accelerator object with the cl_motion_estimation_desc_intel structure.
src_image is the input source image, typically representing 8-bit luminance
information. Currently, the image_channel_order and the image_data_type of src_image
are restricted as follows:
Channel Order Src Channel Data Type
-------------- ---------------------
CL_R CL_UNORM_INT8
Additional formats will be supported by future extensions. The host program is
responsible for populating the tiled image using the clEnqueueWriteImage or other
appropriate API function.
ref_image is the input reference image, representing 8-bit luminance information.
image_channel_order and the image_data_type must match src_image, as follows:
Channel Order Src Channel Data Type
-------------- ---------------------
CL_R CL_UNORM_INT8
Additional formats will be supported by future extensions. The host program is
responsible for populating the tiled image using the clEnqueueWriteImage or other
appropriate API function.
flags defines any optional modes or behaviors used in computing motion estimation,
skip check and/or intra-prediction algorithms. Currently supported are:
Type Description
-------------------------------------- -----------
CL_ME_LUMA_INTRA_PREDICT_ENABLED_INTEL Enables luma-based intra-prediction.
The following additional token is reserved for future support:
Type Description
-------------------------------------- -----------
CL_ME_CHROMA_INTRA_PREDICT_ENABLED_INTEL Enables chroma-based intra-prediction.
skip_block_type flag specifies the sub-block size used in evaluating skip checks. The
specified sub-block size will determine the data layout of the
skip_motion_vector_buffer array:
Type Sub-block size MVs per MB entry
--------------------- --------------- ----------------
CL_ME_MB_TYPE_16x16_INTEL 16x16 1
CL_ME_MB_TYPE_8x8_INTEL 8x8 4
search_cost_penalty defines the cost function scheme used in computing cost penalties.
Type Description
-------------------------------------- -----------
CL_ME_COST_PENALTY_NONE_INTEL penalty is zero
CL_ME_COST_PENALTY_LOW_INTEL penalty for low motion
CL_ME_COST_PENALTY_NORMAL_INTEL penalty for normal motion
CL_ME_COST_PENALTY_HIGH_INTEL penalty for high motion
search_cost_precision defines the pixel precision of the cost penalty calculations. If
the search_cost_penalty flag is set to CL_ME_COST_PENALTY_NONE_INTEL, this argument
is ignored. Possible values are:
Type Description
-------------------------------------- -----------
CL_ME_COST_PRECISION_QPEL_INTEL quarter pixel
CL_ME_COST_PRECISION_HPEL_INTEL half pixel
CL_ME_COST_PRECISION_PEL_INTEL full pixel
CL_ME_COST_PRECISION_DPEL_INTEL double pixel
count_motion_vector_buffer defines the number of predictor motion vectors and skip-
check motion vectors defined for each macroblock. The buffer contains an array of
short integer pairs, one pair per MB. The indices of the array correspond to the
contiguous row-major block layout of the input frame. The first value in each pair
defines the number of predictor motion vectors for a given MB; this value defines the
range of valid entries for the MB contained within the predictor_motion_vector_buffer
array. The second value in each pair defines the number of skip-check motion vectors
for the MB; this value defines the range of valid entries in the
skip_motion_vector_buffer array. All size values must be between 0 and 8 inclusive;
size values greater than 8 result in undefined behavior.
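As an illustration of the count layout, the sketch below fills a host-side array with one pair per macroblock and rejects counts outside the stated 0 to 8 range. The me_short2 struct is a stand-in for a two-lane short vector, and me_fill_counts is a hypothetical helper; neither name comes from the extension.

```c
#include <assert.h>
#include <stddef.h>

#define ME_MAX_COUNT_FIRST_KERNEL 8 /* upper bound stated above */

/* Host-side stand-in for one short2 element (two 16-bit lanes). */
typedef struct { short s[2]; } me_short2;

/* Fill every per-MB pair with the same counts.
 * Returns 0 on success, -1 if a count falls outside 0..8. */
static int me_fill_counts(me_short2 *counts, size_t num_mbs,
                          short predictor_count, short skip_count)
{
    size_t i;

    assert(counts != NULL);
    if (predictor_count < 0 || predictor_count > ME_MAX_COUNT_FIRST_KERNEL ||
        skip_count < 0 || skip_count > ME_MAX_COUNT_FIRST_KERNEL)
        return -1;

    for (i = 0; i < num_mbs; ++i) {
        counts[i].s[0] = predictor_count; /* first value: predictor MVs */
        counts[i].s[1] = skip_count;      /* second value: skip-check MVs */
    }
    return 0;
}
```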
predictor_motion_vector_buffer defines an input array of signed short integer
predictor MVs with quarter-pixel resolution. The array is partitioned into clusters of
8 motion vectors per MB in contiguous row-major ordering. The buffer layout assumes
the maximum size of 8 predictor MVs per MB even if the count_motion_vector_buffer
array specifies a smaller predictor count. If the value of the search_cost_penalty
argument does not equal CL_ME_COST_PENALTY_NONE_INTEL, the first predictor MV for
each MB is used as the cost center for cost penalty calculations. If the array passed
to count_motion_vector_buffer argument specifies a predictor size of zero for all
macroblocks this argument can be NULL.
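The fixed cluster size makes predictor addressing a single multiply and add. The sketch below is illustrative only: the helper names are hypothetical, and the assumption that lane 0 holds the horizontal component and lane 1 the vertical component is not stated in the text above.

```c
#include <assert.h>
#include <stddef.h>

#define ME_PREDICTORS_PER_MB 8 /* fixed cluster size for the first kernel */

/* Host-side stand-in for one short2 element. */
typedef struct { short s[2]; } me_short2;

/* Element index of predictor `slot` for macroblock `mb`. */
static size_t me_predictor_index(size_t mb, size_t slot)
{
    assert(slot < ME_PREDICTORS_PER_MB);
    return mb * ME_PREDICTORS_PER_MB + slot;
}

/* Build a quarter-pixel MV from whole-pixel offsets (4 units per pixel).
 * Assumption: lane 0 is x and lane 1 is y. */
static me_short2 me_mv_from_pixels(int dx_pixels, int dy_pixels)
{
    me_short2 mv;
    mv.s[0] = (short)(dx_pixels * 4);
    mv.s[1] = (short)(dy_pixels * 4);
    return mv;
}
```

Under these assumptions, the cost center for macroblock mb is the element at me_predictor_index(mb, 0).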
skip_motion_vector_buffer defines an input array of signed short integer skip-check
MVs. The array is partitioned into clusters of 8 sets of motion vectors per MB, in
contiguous row-major ordering. The value of skip_block_type determines the number of
MVs for each of the 8 entries:
Value of skip_block_type Number of MVs in each entry
------------------------- ---------------------------
CL_ME_MB_TYPE_16x16_INTEL 1 MV per entry
CL_ME_MB_TYPE_8x8_INTEL 4 MVs per entry
The buffer layout assumes the maximum size of 8 MV entries per MB, even if the
count_motion_vector_buffer array specifies a smaller skip-check count. If the array
passed to count_motion_vector_buffer specifies a skip-check size of zero for all
macroblocks, no skip check computation is performed and this argument can be NULL.
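The skip-check layout can be sized and addressed as follows. This is a sketch under the assumption that the MVs of one entry are stored contiguously and that entries follow each other within a macroblock cluster; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define CL_ME_MB_TYPE_16x16_INTEL 0x0
#define CL_ME_MB_TYPE_8x8_INTEL   0x1
#define ME_SKIP_ENTRIES_PER_MB    8 /* fixed cluster size per MB */

/* MVs in one skip-check entry; returns 0 for an unsupported type. */
static size_t me_skip_mvs_per_entry(unsigned skip_block_type)
{
    switch (skip_block_type) {
    case CL_ME_MB_TYPE_16x16_INTEL: return 1;
    case CL_ME_MB_TYPE_8x8_INTEL:   return 4;
    default:                        return 0;
    }
}

/* Total number of MV elements in skip_motion_vector_buffer. */
static size_t me_skip_mv_buffer_len(size_t num_mbs, unsigned skip_block_type)
{
    return num_mbs * ME_SKIP_ENTRIES_PER_MB *
           me_skip_mvs_per_entry(skip_block_type);
}

/* Index of the first MV of `entry` for macroblock `mb`. */
static size_t me_skip_entry_offset(size_t mb, size_t entry,
                                   unsigned skip_block_type)
{
    size_t per_entry = me_skip_mvs_per_entry(skip_block_type);

    assert(per_entry != 0 && entry < ME_SKIP_ENTRIES_PER_MB);
    return (mb * ME_SKIP_ENTRIES_PER_MB + entry) * per_entry;
}
```

The buffer is always sized for 8 entries per macroblock, regardless of the per-MB skip-check count.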
search_motion_vector_buffer defines an output array of signed short integer pairs
defining the best search motion vectors per macroblock. The array contains 1, 4 or 16
motion vectors per MB in contiguous row-major ordering. The number of vectors per MB
is determined by the value of mb_block_type specified during the creation of the
accelerator object:
Value of mb_block_type Number of MVs
---------------------- --------------
CL_ME_MB_TYPE_16x16_INTEL 1 MV per MB
CL_ME_MB_TYPE_8x8_INTEL 4 MVs per MB
CL_ME_MB_TYPE_4x4_INTEL 16 MVs per MB
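A host program needs the element count of the output array before creating the buffer. The sketch below derives it from the table above; the helper names are hypothetical, and the token values are those listed under "New Tokens".

```c
#include <assert.h>
#include <stddef.h>

#define CL_ME_MB_TYPE_16x16_INTEL 0x0
#define CL_ME_MB_TYPE_8x8_INTEL   0x1
#define CL_ME_MB_TYPE_4x4_INTEL   0x2

/* Search MVs produced per macroblock; returns 0 for an unknown type. */
static size_t me_search_mvs_per_mb(unsigned mb_block_type)
{
    switch (mb_block_type) {
    case CL_ME_MB_TYPE_16x16_INTEL: return 1;
    case CL_ME_MB_TYPE_8x8_INTEL:   return 4;
    case CL_ME_MB_TYPE_4x4_INTEL:   return 16;
    default:                        return 0;
    }
}

/* Number of MV elements search_motion_vector_buffer must hold. */
static size_t me_search_mv_buffer_len(size_t num_mbs, unsigned mb_block_type)
{
    size_t per_mb = me_search_mvs_per_mb(mb_block_type);

    assert(per_mb != 0);
    return num_mbs * per_mb;
}
```

For a 128x128 image (64 macroblocks) with 4x4 sub-blocks, the array holds 1024 motion vectors.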
intra_search_prediction_modes_buffer specifies an output buffer containing a sequence
of signed chars describing the predictor modes used during motion estimation. The
array is divided into a sequence of 22 bytes per MB in contiguous row-major ordering.
Each entry in the array has the following form:
struct search_predictor_modes
{
char luma_16x16_block;
char luma_8x8_block[4];
char luma_4x4_block[16];
char chroma_8x8_block;
};
The luma_16x16_block, luma_8x8_block and luma_4x4_block fields contain valid values
only when the CL_ME_LUMA_INTRA_PREDICT_ENABLED_INTEL flag is set.
Each value in the luma_8x8_block and luma_4x4_block arrays contains one of the
following constants:
CL_ME_LUMA_PREDICTOR_MODE_VERTICAL_INTEL
CL_ME_LUMA_PREDICTOR_MODE_HORIZONTAL_INTEL
CL_ME_LUMA_PREDICTOR_MODE_DC_INTEL
CL_ME_LUMA_PREDICTOR_MODE_DIAGONAL_DOWN_LEFT_INTEL
CL_ME_LUMA_PREDICTOR_MODE_DIAGONAL_DOWN_RIGHT_INTEL
CL_ME_LUMA_PREDICTOR_MODE_VERTICAL_RIGHT_INTEL
CL_ME_LUMA_PREDICTOR_MODE_HORIZONTAL_DOWN_INTEL
CL_ME_LUMA_PREDICTOR_MODE_VERTICAL_LEFT_INTEL
CL_ME_LUMA_PREDICTOR_MODE_HORIZONTAL_UP_INTEL
The value of luma_16x16_block contains one of the following constants:
CL_ME_LUMA_PREDICTOR_MODE_VERTICAL_INTEL
CL_ME_LUMA_PREDICTOR_MODE_HORIZONTAL_INTEL
CL_ME_LUMA_PREDICTOR_MODE_DC_INTEL
CL_ME_LUMA_PREDICTOR_MODE_PLANE_INTEL
The chroma_8x8_block field contains a valid value only if the
CL_ME_CHROMA_INTRA_PREDICT_ENABLED_INTEL flag is set. If enabled, the chroma_8x8_block
contains one of the following constants:
CL_ME_CHROMA_PREDICTOR_MODE_VERTICAL_INTEL
CL_ME_CHROMA_PREDICTOR_MODE_HORIZONTAL_INTEL
CL_ME_CHROMA_PREDICTOR_MODE_DC_INTEL
CL_ME_CHROMA_PREDICTOR_MODE_PLANE_INTEL
This argument can be NULL.
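The 22-byte record size follows from the struct fields (1 + 4 + 16 + 1). A host program that has read the buffer back into a byte array could extract one record as sketched below. The helper me_read_modes is hypothetical, and the compile-time size check assumes the compiler adds no padding to a struct made only of char members.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct search_predictor_modes {
    char luma_16x16_block;
    char luma_8x8_block[4];
    char luma_4x4_block[16];
    char chroma_8x8_block;
};

/* Compile-time check: fails to compile if the record is not 22 bytes. */
typedef char me_modes_size_check
    [(sizeof(struct search_predictor_modes) == 22) ? 1 : -1];

/* Copy the record for macroblock `mb` out of the raw output bytes. */
static struct search_predictor_modes
me_read_modes(const char *buffer, size_t mb)
{
    struct search_predictor_modes rec;

    assert(buffer != NULL);
    memcpy(&rec, buffer + mb * sizeof rec, sizeof rec);
    return rec;
}
```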
search_residuals defines an output buffer containing vectors of unsigned short SAD-
adjusted values corresponding to the best search motion vectors populated in the
search_motion_vector_buffer array. The array is divided into one vector per MB in
contiguous row-major block ordering. Each vector contains 1, 4, or 16 components
depending on the value of mb_block_type specified during the creation of the
accelerator object:
Value of mb_block_type Vector Size
------------------------- -------------
CL_ME_MB_TYPE_16x16_INTEL 1
CL_ME_MB_TYPE_8x8_INTEL 4
CL_ME_MB_TYPE_4x4_INTEL 16
This argument can be NULL.
skip_residuals defines an output buffer containing vectors of unsigned short SAD-
adjusted values corresponding to the skip-check MVs defined by
skip_motion_vector_buffer. The array is partitioned into clusters of 8 sets of
residual values per MB, in contiguous row-major ordering. The value of skip_block_type
determines the number of values in each of the 8 entries:
Value of skip_block_type Number of residuals in each entry
------------------------ ------------------------
CL_ME_MB_TYPE_16x16_INTEL 1 residual per entry
CL_ME_MB_TYPE_8x8_INTEL 4 residuals per entry
The buffer layout assumes the maximum size of 8 residual entries per MB; however, the
number of valid residual entries corresponds to the skip-check MV count specified in
count_motion_vector_buffer for each MB. This argument can be NULL.
intra_search_residuals defines an output buffer of unsigned short SAD-adjusted vectors
that correspond to the residual values used during intra-prediction. The buffer
contains 4 values per MB in contiguous row-major ordering using the following layout:
struct intra_search_residuals
{
short luma_16x16_block_residual;
short luma_8x8_block_residual;
short luma_4x4_block_residual;
short chroma_8x8_block_residual;
};
The chroma_8x8_block_residual value is only valid if the
CL_ME_CHROMA_INTRA_PREDICT_ENABLED_INTEL flag is set. This argument can be NULL.
The second built-in kernel extends the functionality provided by the first
kernel by additionally supporting bidirectional skip checks and minor additional
control over the estimation algorithm. The additional control includes the
ability to perform skip checks on a different set of source and reference frames
as used for motion search, and the option to specify a pair of scalar counts for
input search predictors and skip motion vectors globally for all macro-blocks in
the frame instead of having to specify such counts on a per macroblock basis. This
kernel is only supported for devices that report a motion estimation device version
number of CL_ME_VERSION_ADVANCED_VER_2_INTEL.
The second kernel
block_advanced_motion_estimate_bidirectional_check_intel(
accelerator_intel_t accelerator,
__read_only image2d_t src_search_image,
__read_only image2d_t ref_search_image,
__read_only image2d_t src_check_image,
__read_only image2d_t ref0_check_image,
__read_only image2d_t ref1_check_image,
uint flags,
uint search_cost_penalty,
uint search_cost_precision,
short2 count_global,
uchar bidir_weight,
__global short2 * count_motion_vector_buffer,
__global short2 * prediction_motion_vector_buffer,
__global char *skip_input_mode_buffer,
__global short2 * skip_motion_vector_buffer,
__global short2 *search_motion_vector_buffer,
__global char *intra_search_predictor_modes,
__global ushort *search_residuals,
__global ushort * skip_residuals,
__global ushort * intra_residuals );
defines a kernel that provides various block-based motion estimation computations.
There are three basic use cases for this kernel:
1.) Perform unidirectional inter-prediction motion estimation on the source and
reference images to obtain the best search motion vectors and their associated
distortion values.
2.) Perform unidirectional or bidirectional skip-checks on the source and
reference images by providing a set of motion vectors, then obtain the
corresponding distortion values.
3.) Perform intra-prediction computations to obtain the best-search prediction
modes between adjacent macroblocks and associated residual values.
This kernel can be set up to do some or all of these operations in a single
enqueue.
The kernel can perform unidirectional or bidirectional skip-checks to produce
distortion values based on the input skip-check motion vectors specified for
each sub-block in the macroblock. Skip-checks for each motion vector for a
sub-block may be independently configured for either unidirectional or
bidirectional skip-checks by means of the skip_input_mode_buffer argument.
A sub-block configured for bidirectional skip-check will have two component
input motion vectors; one specifying a rectangular region in the forward
reference frame and the other in the backward reference frame. A
bidir_weight argument is additionally specified for sub-blocks configured
for bidirectional skip checks. The effective reference region is a
weighted blend of the forward and backward reference region as specified by
the bidirectional skip motion vector and bidir_weight arguments. If a motion
vector is configured for bidirectional skip check, then the ref0_check_image
argument is taken as the forward reference image and ref1_check_image is
taken as the backward reference image, else only ref0_check_image is taken
as the single reference image. Skip-checks may be configured with either
8x8 or 16x16 sub-block sizes, via the flags argument. The
skip_motion_vector_buffer is used to configure up to four sets of bidirectional
skip-check MV pairs per MB. The number of vectors in each set is determined
by the sub-block size:
Sub-block Size Bidirectional MV pairs per MB
-------------- -----------------------------
8x8 4
16x16 1
Results are obtained via the skip_residuals argument as SAD-adjusted distortion
values corresponding to each skip-check MV defined for each macroblock.
block_advanced_motion_estimate_bidirectional_check_intel arguments:
accelerator is a valid accelerator object created by clCreateAcceleratorINTEL,
where the type of the accelerator must be
CL_ACCELERATOR_TYPE_MOTION_ESTIMATION_INTEL.
Refer to the cl_intel_motion_estimation extension for a detailed description
of configuring accelerator object with the cl_motion_estimation_desc_intel
structure.
src_search_image is the input source image for motion search operations,
typically representing 8-bit luminance information. Currently, the
image_channel_order and the image_data_type of src_search_image are
restricted as follows:
Channel Order Src Channel Data Type
------------- ---------------------
CL_R CL_UNORM_INT8
Additional formats will be supported by future extensions. The host program
is responsible for populating the tiled image using the clEnqueueWriteImage
function or other appropriate API function.
ref_search_image is the input reference image for motion search operations,
representing 8-bit luminance information. The image_channel_order and the
image_data_type must match src_search_image. The host program is responsible
for populating the tiled image using the clEnqueueWriteImage function or other
appropriate API function.
src_check_image is the input source image for skip check operations,
typically representing 8-bit luminance information. It has the same
restrictions as src_search_image. The host program is responsible for
populating the tiled image using the clEnqueueWriteImage function or other
appropriate API function.
ref0_check_image is the input forward reference image for unidirectional
and bidirectional skip check operations, representing 8-bit luminance
information. The image_channel_order and the image_data_type must match
src_check_image. The host program is responsible for populating the tiled
image using the clEnqueueWriteImage function or other
appropriate API function.
ref1_check_image is the input backward reference image for bidirectional
skip check operations, representing 8-bit luminance information. The
image_channel_order and the image_data_type must match src_check_image.
If bidirectional skip checks are not used, then this must be set to
the same image as ref0_check_image. The host program is responsible for
populating the tiled image using the clEnqueueWriteImage function or other
appropriate API function.
flags defines any optional modes or behaviors used in computing motion
estimation, skip check and/or intra-prediction algorithms.
Currently supported are:
Type Description
---- -----------
CL_ME_SKIP_BLOCK_TYPE_16x16_INTEL Specifies a 16x16 skip check sub-block type.
CL_ME_SKIP_BLOCK_TYPE_8x8_INTEL Specifies an 8x8 skip check sub-block type.
CL_ME_LUMA_INTRA_PREDICT_ENABLED_INTEL Enables luma-based intra-prediction.
The following additional token is reserved for future support:
Type Description
---- -----------
CL_ME_CHROMA_INTRA_PREDICT_ENABLED_INTEL Enables chroma-based intra-prediction.
The CL_ME_SKIP_BLOCK_TYPE_16x16_INTEL flag cannot be set along with
CL_ME_SKIP_BLOCK_TYPE_8x8_INTEL as skip checks can either be configured
for a 16x16 sub-block size or an 8x8 sub-block size. The behavior is
undefined if both flags are set.
search_cost_penalty defines the cost function scheme used in computing
cost penalties.
Refer to the description of argument search_cost_penalty in the first
built-in kernel block_advanced_motion_estimate_check_intel for details.
search_cost_precision defines the pixel precision of the cost penalty
calculations.
Refer to the description of argument search_cost_precision in the first
built-in kernel block_advanced_motion_estimate_check_intel for details.
count_global can be used to specify the scalar counts of predictor
motion vectors and skip-check motion vectors globally for all
macroblocks. This can be used in lieu of setting the
count_motion_vector_buffer if the number of predictor motion vectors
and skip-check motion vectors are uniform for all macro-blocks.
It is specified as a pair of short integers. The first value in the
pair is the count of predictor MVs and the second value is the count
of skip check MVs. If either value is -1, then the corresponding count
is taken from the per macroblock entry for each macroblock as
specified in count_motion_vector_buffer.
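The fallback rule for count_global can be sketched on the host as follows. This is only an illustration of the rule described above; me_short2 is a stand-in for a two-lane short vector and me_effective_counts is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Host-side stand-in for one short2 element. */
typedef struct { short s[2]; } me_short2;

/* Effective (predictor, skip-check) counts for macroblock `mb`.
 * A count_global lane of -1 defers to the per-MB entry for that lane. */
static me_short2 me_effective_counts(me_short2 count_global,
                                     const me_short2 *per_mb, size_t mb)
{
    me_short2 out;
    int lane;

    for (lane = 0; lane < 2; ++lane) {
        if (count_global.s[lane] == -1) {
            assert(per_mb != NULL); /* per-MB buffer is needed here */
            out.s[lane] = per_mb[mb].s[lane];
        } else {
            out.s[lane] = count_global.s[lane];
        }
    }
    return out;
}
```

Each lane is resolved independently, so a global predictor count can be combined with per-macroblock skip-check counts, or the reverse.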
count_motion_vector_buffer defines the number of predictor motion
vectors and skip-check motion vectors defined for each macroblock.
The buffer contains an array of short integer pairs, one pair per
MB. The indices of the array correspond to the contiguous row-major
block layout of the input frame. The first value in each pair
defines the number of predictor motion vectors for a given MB; this
value defines the range of valid entries for the MB contained within
the predictor_motion_vector_buffer array. This value is used only if
the predictor count in count_global is -1. The second value in each
pair defines the number of skip-check motion vectors for the MB;
this value defines the range of valid entries in the
skip_motion_vector_buffer array. This value is used only if the
skip-check MV count in count_global is -1. All size values must be
between 0 and 4 inclusive; size values greater than 4 result in
undefined behavior. If neither value in count_global is -1,
then this argument can be NULL.
bidir_weight defines the implicit bidirectional weight to be used
when performing bidirectional skip checks. This is used to obtain
the weighted reference pixels from the forward and backward blocks.
There are 5 possible weights:
Weight Description
------ -----------
CL_ME_BIDIR_WEIGHT_QUARTER_INTEL quarter distance from forward and three-quarters from backward
CL_ME_BIDIR_WEIGHT_THIRD_INTEL one-third distance from forward and two-thirds from backward
CL_ME_BIDIR_WEIGHT_HALF_INTEL half distance from forward and backward
CL_ME_BIDIR_WEIGHT_TWO_THIRD_INTEL two-thirds distance from forward and one-third from backward
CL_ME_BIDIR_WEIGHT_THREE_QUARTER_INTEL three-quarters distance from forward and one-quarter from backward
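The numeric values of the weight tokens are consistent with fractions expressed in units of 1/64 (for example, 0x20 = 32 = half of 64). The sketch below makes that reading explicit. This interpretation is an assumption drawn from the token values alone; the text above does not give the blend formula, and the helper name is hypothetical.

```c
#include <assert.h>

#define CL_ME_BIDIR_WEIGHT_QUARTER_INTEL       0x10
#define CL_ME_BIDIR_WEIGHT_THIRD_INTEL         0x15
#define CL_ME_BIDIR_WEIGHT_HALF_INTEL          0x20
#define CL_ME_BIDIR_WEIGHT_TWO_THIRD_INTEL     0x2B
#define CL_ME_BIDIR_WEIGHT_THREE_QUARTER_INTEL 0x30

/* Assumption: the token is a numerator over 64. Returns the fraction. */
static double me_bidir_weight_fraction(unsigned char bidir_weight)
{
    assert(bidir_weight <= 64);
    return bidir_weight / 64.0;
}
```

Under this reading, the third and two-third tokens are the nearest 1/64 steps (21/64 and 43/64) rather than exact thirds.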
predictor_motion_vector_buffer defines an input array of signed
short integer predictor MVs with quarter-pixel resolution. The array
is partitioned into clusters of 4 motion vectors per MB in
contiguous row-major ordering. The buffer layout assumes the maximum
size of 4 predictor MVs per MB even if the count_motion_vector_buffer
array specifies a smaller predictor count. If the value of the
search_cost_penalty argument does not equal
CL_ME_COST_PENALTY_NONE_INTEL, the first predictor MV for each MB is
used as the cost center for cost penalty calculations. If the array
passed to count_motion_vector_buffer argument specifies a predictor
size of zero for all macroblocks this argument can be NULL.
skip_input_mode_buffer defines an input array of unsigned char
integers defining the skip modes for each macroblock. The indices of
the array correspond to the contiguous row-major block layout
of the input frame. There is one unsigned char integer per macroblock.
The following 2-bit skip mode enumeration values are defined for
each macroblock sub-block:
Skip Mode Enumeration Description
--------------------- -----------
CL_ME_FORWARD_INPUT_MODE_INTEL Unidirectional skip MV from forward frame
CL_ME_BACKWARD_INPUT_MODE_INTEL Unidirectional skip MV from backward frame
CL_ME_BIDIRECTION_INPUT_MODE_INTEL Bidirectional skip MV from forward and backward frames
The format of each macroblock unsigned char entry depends on the
skip block type specified in the flags parameter.
Skip Block Type Format
--------------- ------
CL_ME_SKIP_BLOCK_TYPE_16x16_INTEL One 2-bit skip mode enumeration value for one sub-block
CL_ME_SKIP_BLOCK_TYPE_8x8_INTEL Four 2-bit skip mode enumeration values (one for each sub-block component MV) packed into successive two bits of the char
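Packing four 2-bit skip modes into one byte for the 8x8 case can be sketched as below. The bit order (sub-block 0 in the two least significant bits) is an assumption, since the text only says the values occupy successive two-bit fields; the helper names are hypothetical.

```c
#include <assert.h>

#define CL_ME_FORWARD_INPUT_MODE_INTEL     0x1
#define CL_ME_BACKWARD_INPUT_MODE_INTEL    0x2
#define CL_ME_BIDIRECTION_INPUT_MODE_INTEL 0x3

/* Pack four 2-bit skip modes into one byte.
 * Assumption: sub-block i occupies bits (2*i + 1):(2*i). */
static unsigned char me_pack_skip_modes(const unsigned char modes[4])
{
    unsigned packed = 0;
    int i;

    for (i = 0; i < 4; ++i) {
        assert(modes[i] >= 1 && modes[i] <= 3);
        packed |= (unsigned)modes[i] << (2 * i);
    }
    return (unsigned char)packed;
}

/* Extract the 2-bit skip mode of one sub-block from a packed byte. */
static unsigned char me_unpack_skip_mode(unsigned char packed, int sub_block)
{
    assert(sub_block >= 0 && sub_block < 4);
    return (unsigned char)((packed >> (2 * sub_block)) & 0x3);
}
```

For the 16x16 skip block type only the first 2-bit field would carry a mode.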
skip_motion_vector_buffer defines an input array of pairs of signed
short integer skip-check MVs – one component MV for the forward
reference image and the other component MV for the backward reference
image. The buffer layout assumes the MVs for both the forward and
backward reference images are specified. If unidirectional forward
or backward check is specified for a skip-check sub-block, then the
corresponding backward or forward component of the MV pair will
be ignored. The array is partitioned into clusters of 4 sets of
bidirectional pairs of motion vectors per MB, in contiguous row-major
ordering. The value of skip_block_type determines the number of pairs
of MVs for each of the 4 entries:
Value of skip_block_type       Number of MV pairs in each entry
------------------------       --------------------------------
CL_ME_MB_TYPE_16x16_INTEL      1 MV forward/backward pair
                               per entry
CL_ME_MB_TYPE_8x8_INTEL        4 MV forward/backward pairs
                               per MB
The buffer layout assumes the maximum size of 4 MV pair entries per MB,
even if the count_motion_vector_buffer array specifies a smaller
skip-check count. If the array passed to count_motion_vector_buffer
specifies a skip-check size of zero for all macroblocks, no skip check
computation is performed and this argument can be NULL.
search_motion_vector_buffer defines an output array of signed short
integer pairs defining the best search motion vectors per macroblock.
Refer to the description of argument search_motion_vector_buffer in
the first built-in kernel block_advanced_motion_estimate_check_intel
for details.
intra_search_prediction_modes_buffer specifies an output buffer
containing a sequence of signed chars describing the predictor modes
used during motion estimation.
Refer to the description of argument intra_search_prediction_modes in
the first built-in kernel block_advanced_motion_estimate_check_intel
for details.
search_residuals defines an output buffer containing vectors of
unsigned short SAD-adjusted values corresponding to the best
search motion vectors populated in the search_motion_vector_buffer
array.
Refer to the description of argument search_residuals in the first
built-in kernel block_advanced_motion_estimate_check_intel for
details.
skip_residuals defines an output buffer containing vectors of
unsigned short SAD-adjusted values corresponding to the skip-check
MVs defined by skip_motion_vector_buffer. The array is partitioned
into clusters of 4 sets of residual values per MB, in contiguous
row-major ordering. The value of skip_block_type determines
the number of values in each of the 4 entries:
Value of skip_block_type       Number of residuals in each entry
------------------------       ---------------------------------
CL_ME_MB_TYPE_16x16_INTEL      1 residual per entry
CL_ME_MB_TYPE_8x8_INTEL        4 residuals per entry
The buffer layout assumes the maximum size of 4 residual values
per MB; however, the number of valid residual entries corresponds
to the skip-check MV count specified in count_motion_vector_buffer
for each MB. This argument can be NULL.
intra_search_residuals defines an output buffer of unsigned short
SAD-adjusted vectors that correspond to the residual values used
during intra-prediction.
Refer to the description of argument intra_search_residuals in the
first built-in kernel block_advanced_motion_estimate_check_intel
for details.
This kernel is queued for execution using clEnqueueNDRangeKernel(). Several arguments
passed to this function are specific to this kernel:
For both built-in kernels:
work_dim must be 2.
global_work_size represents the width and height of the area of interest to be
processed.
global_work_offset specifies the top-left point of the area of interest.
local_work_size must be NULL.
The count and layout of macroblocks processed in the frame is based on the values
passed to the global_work_size and global_work_offset arguments and not on the
dimensions of the input and reference images. The client must ensure that the data
layout of all arrays passed as arguments defines the correct number of macroblocks.
The width and height of the area of interest specified by global_work_size
and global_work_offset must be less than or equal to the width and height of the
source image.
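The constraint above can be checked on the host before enqueueing. The sketch
below is illustrative only and rests on several assumptions that the text does
not state: that global_work_size and global_work_offset are expressed in
pixels, that macroblocks are 16x16, that partial macroblocks round up, and
that the area of interest (offset plus size) must lie inside the source image.
The names area_of_interest_fits and macroblock_count are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

enum { MB_SIZE = 16 }; /* assumption: 16x16 pixel macroblocks */

/* Returns true if the area of interest lies within the image.
 * Index 0 is the horizontal (width) dimension, index 1 the vertical.
 * Written so that offset + size is never computed, avoiding overflow. */
static bool area_of_interest_fits(const size_t offset[2],
                                  const size_t size[2],
                                  size_t image_width,
                                  size_t image_height)
{
    return size[0] <= image_width  && offset[0] <= image_width  - size[0]
        && size[1] <= image_height && offset[1] <= image_height - size[1];
}

/* Number of macroblocks the per-MB arrays must describe for this
 * area of interest (assumption: partial macroblocks round up). */
static size_t macroblock_count(const size_t size[2])
{
    size_t cols = (size[0] + MB_SIZE - 1) / MB_SIZE;
    size_t rows = (size[1] + MB_SIZE - 1) / MB_SIZE;
    return cols * rows;
}
```

Sizing every per-MB array from the same macroblock count helps keep the buffer
layouts consistent with the ND-range rather than with the image dimensions.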
The clEnqueueNDRangeKernel function returns the usual error codes, augmented with the
following specific error codes for this kernel:
- CL_INVALID_WORK_DIMENSION if work_dim is not 2. This built-in kernel requires a 2D
ND-range.
- CL_INVALID_WORK_GROUP_SIZE if local_work_size is not NULL.
- CL_INVALID_WORK_GROUP_SIZE if the respective values of global_work_size[0] and
global_work_size[1] exceed the width and/or height of input images.
- CL_INVALID_IMAGE_FORMAT_DESCRIPTOR if an image object passed as an argument does not
have a supported format, as listed above.
- CL_INVALID_IMAGE_FORMAT if the image objects passed to src_image and ref_image
arguments do not contain matching formats and sizes.
- CL_INVALID_GLOBAL_OFFSET if the respective values of global_work_offset[0] and
global_work_offset[1] exceed the width and/or height of input images.
- CL_INVALID_KERNEL_ARGS if predictor_motion_vector_buffer is NULL and one or more
predictor MV sizes passed to count_motion_vector_buffer are greater than 0.
- CL_INVALID_KERNEL_ARGS if skip_motion_vector_buffer is NULL and one or more
skip-check MV sizes passed to count_motion_vector_buffer are greater than 0.
- CL_INVALID_BUFFER_SIZE if any of the cl_mem objects passed as arguments has a size
less than the expected size.
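The two CL_INVALID_KERNEL_ARGS conditions above can be pre-checked by the
client. The following is a minimal sketch: the mb_counts struct is a
hypothetical host-side view of the per-macroblock predictor and skip-check
counts and does not represent the actual element format of
count_motion_vector_buffer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical host-side record of the counts for one macroblock. */
typedef struct {
    unsigned char predictor_count;
    unsigned char skip_check_count;
} mb_counts;

/* Returns true if leaving the given buffers NULL is consistent with
 * the counts: a NULL buffer is acceptable only when the matching
 * count is zero for every macroblock. */
static bool null_buffers_allowed(const mb_counts *counts, size_t mb_total,
                                 bool predictor_buffer_is_null,
                                 bool skip_buffer_is_null)
{
    for (size_t i = 0; i < mb_total; ++i) {
        if (predictor_buffer_is_null && counts[i].predictor_count > 0)
            return false;
        if (skip_buffer_is_null && counts[i].skip_check_count > 0)
            return false;
    }
    return true;
}
```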
Interactions with Other Extensions
The advanced motion estimation extension is based on the cl_intel_accelerator and
cl_intel_motion_estimation extensions, and is defined in terms of additions to the
base accelerator and motion estimation extension documents.