extensions/intel/cl_intel_device_side_avc_motion_estimation.txt - external/github.com/KhronosGroup/OpenCL-Registry - Git at Google

 Name String

    cl_intel_device_side_avc_motion_estimation

 Contributors

    Biju George, Intel
    Raghukrishnan Embar, Intel
    Ryan Lei, Intel
    Ben Ashbaugh, Intel
    Bartosz Sochacki, Intel
    Krishna Madaparambil, Intel
    Mateusz Tabaka, Intel

 Contact

    Biju George, Intel (biju.george 'at' intel.com)

 Version

    Version 5, November 9, 2018

 Number

    OpenCL Extension #50

 Status

    First Draft

 Dependencies

    OpenCL 1.2 is required.

    The OpenCL Intel vendor extension cl_intel_subgroups is required. The
    VME built-in functions are an extension of the subgroup functions
    defined in cl_intel_subgroups.

    The built-in functions which perform intra estimation depend on the
    OpenCL Intel vendor extension cl_intel_media_block_io in order to
    read-in the neighboring macroblock edge pixels.

    The built-in functions which perform chroma based intra estimation
    operations depend on the OpenCL Intel vendor extension
    cl_intel_planar_yuv in order to create NV12 source images.

    This extension is written against revision 29 of the OpenCL 2.0 API
    specification, against revision 33 of the OpenCL 2.0 OpenCL C
    specification, and against revision 32 of the OpenCL 2.0 extension
    specification.

 Overview

    Video motion estimation (VME) is defined as of set motion estimation
    operations that are used to determine the motion vectors, intra
    estimation angles and macroblock partitioning combination that best
    describe the transformation to the source macroblock, from blocks in
    one or more previous reference pictures (inter-prediction), or from
    other blocks in the same source picture (intra-prediction). It does
    this by searching for spatial and temporal patterns on the current and
    various forward and backward reference pictures.

    The goal of this extension is to provide programmers with a fine-
    grained interface to the AVC VME media sampler in Intel graphics
    processors. It describes the specification of low-level built-in
    functions, callable from OpenCL kernels, that facilitate the
    programming of the VME media sampler to evaluate specific AVC motion
    estimation operations. If only a coarser-level interface at the level
    of built-in kernels suffices, then the Intel vendor extensions
    cl_intel_motion_estimation and cl_advanced_motion_estimation may be
    considered.

    Built-in functions are defined for all the major operations of the VME
    media sampler. The major operations of the AVC VME media sampler in
    Intel Graphics Processors can be described as follows:

    1. Integer motion estimation (IME)

       Perform motion estimation on a given source macroblock in
       a source image over a single or dual reference window in a
       reference image, at full-pixel resolution, to determine the
       best integer motion vectors and their associated distortions,
       and the best macroblock shape partitioning combination.

    2. Motion estimation refinement (REF)

       Perform refinement operations on the results of IME. The two sub-
       operations are:

       Fractional motion estimation (FME)

       Perform sub-pixel refinement on the results of an IME operation.
       half-pixel (HPEL) or quarter-pixel (QPEL) refinements are performed
       to determine the best sub-pixel motion vectors and their associated
       distortions.

       Bidirectional motion estimation (BME)

       Perform bidirectional refinement on the results of an IME
       operation using two reference images to check if the bidirectional
       mode using two references yields lesser distortions. An FME can
       optionally be performed implicitly as part of a bidirectional
       refinement.

    3. Skip and Intra check (SIC)

       Performs the following two sub-operations:

       Skip check (SKC)

       Compute the pixel distortion of a user-specified shape and motion
       vector combination. The VME media sampler fetches necessary pixels,
       performs fractional and bidirectional filtering (as necessary), and
       then computes the distortion between the derived reference and
       source. The skip decision can optionally be enhanced to include a
       4x4 forward transform, the results of which are compared against a
       user specified threshold to emulate the effects of the forward
       quantization zeroing effect.

       Intra prediction estimation (IPE)

       Perform intra prediction on a given source macroblock to determine
       the best intra prediction modes and the best shape partitioning
       combination.

 New API Enums

     Accepted as arguments to clGetDeviceInfo

     CL_DEVICE_AVC_ME_VERSION_INTEL                      0x410B
     CL_DEVICE_AVC_ME_SUPPORTS_TEXTURE_SAMPLER_USE_INTEL 0x410C
     CL_DEVICE_AVC_ME_SUPPORTS_PREEMPTION_INTEL          0x410D

     Additional valid constant values returned by clGetDeviceInfo:

     CL_AVC_ME_VERSION_1_INTEL                           0x1

 Additions to Chapter 4 of the OpenCL Specification:

     Modify the description of function clGetDeviceInfo(..)

     Table 4.3 is extended to include the following enumeration constants.

     +--------------------------+-----------+----------------------------+
     |cl_device_info            |Return Type|Description                 |
     +--------------------------+-----------+----------------------------+
     |CL_DEVICE_AVC_ME_VERSION_ |cl_uint    |The AVC device-side motion  |
     |INTEL                     |           |estimation API version      |
     |                          |           |number supported by the     |
     |                          |           |device and driver.          |
     +--------------------------+-----------+----------------------------+
     |CL_DEVICE_AVC_ME_SUPPORTS_|cl_bool    |Is CL_TRUE if built-in      |
     |TEXTURE_SAMPLER_USE_INTEL |           |functions using the texture |
     |                          |           |sampler may be used along   |
     |                          |           |with AVC VME built-in       |
     |                          |           |functions using the media   |
     |                          |           |sampler on the device, and  |
     |                          |           |CL_FALSE otherwise.         |
     +--------------------------+-----------+----------------------------+
     |CL_DEVICE_AVC_ME_SUPPORTS_|cl_bool    |Is CL_TRUE if an enqueue of |
     |PREEMPTION_INTEL          |           |an OpenCL kernel that uses  |
     |                          |           |AVC VME built-in functions  |
     |                          |           |supports preemption on the  |
     |                          |           |device, and CL_FALSE        |
     |                          |           |otherwise. (This may be     |
     |                          |           |useful on platforms that    |
     |                          |           |have strict latency         |
     |                          |           |requirements on the GPU).   |
     +--------------------------+-----------+----------------------------+

     This extension requires a value of CL_AVC_ME_VERSION_1_INTEL returned
     by clGetDeviceInfo for CL_DEVICE_AVC_ME_VERSION_INTEL.

 Terms, Acronyms and Definitions

    Adds a new section 6.13.14.X "VME built-in functions" to the OpenCL
    2.0 C specification.

    "
    The following terms, acronyms and definitions are used in and
    provide context for the VME built-in functions.

    Macro-block (MB):
    -----------------
    An image is partitioned into macro-blocks of size 16x16 pixels. It is
    the basic unit of processing for AVC video motion estimation
    operations.

    Shape:
    ------
    A MB may be partitioned into sub-blocks of one of the major shapes. A
    sub-block with an 8x8 major shape may be further independently
    partitioned into sub-blocks of one of the minor shapes. It is
    represented by pre-defined shape enumeration values.

    Major Shapes:
    -------------
    Shapes of 16x16, 16x8, 8x16, or 8x8 partitions of a MB. A 16x16 major
    shape merely indicates that the MB was not further partitioned.

    Minor Shapes:
    -------------
    Shapes of 8x8, 8x4, 4x8, or 4x4 sub-partitions of an 8x8 partition. A
    8x8 minor shape merely indicates that the 8x8 major partition was not
    further sub-partitioned.

    Block:
    ------
    A sub-block of a MB with one of the major or minor shapes.

    Reference Image:
    ----------------
    An image (typically from the previously decoded buffer in an encoder
    pipeline) from which motion estimation predictions are made.

    Source Image:
    -------------
    The current image for which motion estimation predictions are made.

    Source Macro-block Offset:
    --------------------------
    The 2D offset of the top left corner of the source MB in pixel units.
    It is represented by a pair of unsigned 16-bit integers.

    Reference Window Offset:
    ------------------------
    The 2D offset of the top left corner reference search window w.r.t to
    the top left corner of the source MB in pixel units. It is represented
    by a pair of signed 16-bit integers in the range [-2048, 2047].

    Reference identifier:
    ---------------------
    Reference identifiers are associated to pairs of forward(L0)/
    backward(L1) reference image parameters. Up to 16 pairs of reference
    pairs of reference image parameters are permitted, with the permitted
    values of reference identifiers ranging from 0 to 15. The reference
    identifers are assigned in increasing order in which the reference
    image parameter pairs are declared in the OpenCL kernel parameter
    list. See new section 6.13.14.X "VME built-in functions" described
    below for more details.

    Motion Vector (MV):
    -------------------
    A 2D vector used for inter motion estimation that provides an offset
    from the top left corner of a block in the source image to the top
    left corner an identically sized block in the reference image.
    Generally it is used to represent the best match of a block in the
    reference image to a block in the source image. The best match is
    determined as the block minimizing the distortion. MVs are specified
    in QPEL resolution with the 2 LSB representing the fractional part
    of the offset. It is represented by a pair of signed 16-bit signed
    integers.

    Packed Motion Vector:
    --------------------------
    A motion vector represented as a packed 32-bit unsigned integer. The
    lower 16 bits contains the X coordinate and the upper 16 bits contains
    the Y coordinate.

    Bidirectional Motion Vector (BMV):
    ----------------------------------
    A pair of MVs for the forward(L0) and backward(L1) images. Depending
    on how the VME operation is configured only the forward or the
    backward MV or both may be valid.

    Packed Bidirectional Motion Vector:
    -----------------------------------
    A bidirectional MV represented as a packed 64-bit unsigned integer.
    The lower 32-bits contain the forward packed MV, and the upper 32-bits
    contain the backward packed MV.

    Sum Of Absolute Difference (SAD):
    ---------------------------------
    The sum of absolute differences of every full/sub-pixel location in
    the source block w.r.t every corresponding full/sub pixel in the
    reference block as specified by a given MV. The sum of absolute
    differences may be optionally Haar transform adjusted. It is
    represented by an unsigned 16-bit integer value.

    Haar Transform (HAAR):
    ----------------------
    A simple wavelet transform that is used to refine the distortion
    measure of SAD. The per pixel difference goes through a 4x4 Haar
    transform. Then the SAD is replaced by the sum of the absolute
    values of the transform domain coefficients in the distortion. Haar
    transform is used as a coarse estimation of the integer transform.

    Motion Vector Cost Center (CC):
    -------------------------------
    A MV has an associated cost w.r.t a cost center coordinate. The
    further away from the cost center, the larger will be the cost
    associated with the MV. Cost centers are specified in QPEL resolution
    with the 2 LSB representing the fractional part of the offset.

    Motion Vector Cost Center Delta (CCD):
    --------------------------------------
    The 2D offset of the cost center relative to the top left corner of
    the source MB. Cost center deltas are specified in QPEL resolution
    with the 2 LSB representing the fractional part of the offset.
    It is represented by a pair of signed 16-bit integers.

    Packed Motion Vector Cost Center Delta:
    ---------------------------------------
    A motion vector cost center delta represented as a packed 32-bit
    unsigned integer. The lower 16 bits contains the X coordinate and the
    upper 16 bits contains the Y coordinate.

    Bidirectional Motion Vector Cost Center Delta:
    ----------------------------------------------
    A pair of cost center deltas for the forward and backward images.

    Packed Bidirectional Motion Vector Cost Center Delta:
    -----------------------------------------------------
    A packed bidirectional motion vector cost center delta represented as
    a 64-bit unsigned integer. The lower 32-bits contain the forward
    packed CCD, and the upper 32-bits contain the backward packed CCD.

    Motion Vector Cost:
    -------------------
    The MV cost is determined using a cost function described by a cost
    table that is indexed based on power-of-two distances from the user
    specified cost center, with a user specified precision (or unit) of
    the distances from the cost center.

    U4U4 Byte Format:
    -----------------
    Represents a value of (B<<S), where B, called base, is the 4 bit
    LSB of the byte and S, called shift, is the 4 bit MSB of the byte.

    Motion Vector Cost Table:
    -------------------------
    A table which specifies the cost penalties at 8 control points. The
    first 7 control points represent the distances from cost center at
    powers-of-two locations (2^0 to 2^6), and the last control point
    represents the base penalty for distances that are out of range of the
    cost function curve. It is represented by a packed array of 8 U4U4
    unsigned integer values.

    Motion Vector Cost Precision:
    -----------------------------
    The precision (or unit) of the control points in the MV cost table.
    It can be used to control the precision and range of the cost
    function. It is represented by pre-defined cost precision enumeration
    values.

    Shape Cost:
    -----------
    The cost associated with encoding a particular partition shape using
    inter or intra prediction. It is represented by a packed array of 10
    U4U4 unsigned integer values.

    Distortion:
    -----------
    The distortion is the sum of SAD, MV cost, shape cost and multi-
    reference cost for inter estimation, and the sum of SAD, mode cost,
    shape cost and non-dc cost for intra estimation. It is a measure of
    the cost of encoding a block and is represented by an unsigned 16-bit
    integer value.

    Intra Mode:
    -----------
    An intra-prediction angle which provides a prediction for the current
    block from the edge pixels in its neighboring blocks. It is
    represented by pre-defined intra mode enumeration values.

    Intra Mode Cost:
    ----------------
    The cost associated with a computed intra mode for a block w.r.t a
    predicted intra mode based on the computed intra modes for its
    neighboring blocks.

    Mode:
    -----
    The decision whether the inter-prediction or intra-prediction
    minimizes distortion of a given MB.

    Search unit (SU):
    -----------------
    The basic unit of searching. Possible reference search locations are
    grouped in a predefined 4x4 pattern, and all locations within the same
    group must be completely chosen or completely skipped. These
    predefined groups are called search units.

    Search Path (SP):
    -----------------
    The path taken during searching in a reference window. The steps taken
    in a search path are in units of SUs. The search path must lie within
    the defined search window.

    Luma:
    -----
    Luma refers to either the Y-plane of a NV12 image or a regular image
    with the image_channel_order and image_data_type restricted as CL_R
    and CL_UNORM_INT8.

    Chroma:
    ------
    Chroma refer to the UV-plane of a NV12 image.

    Search Window (SW):
    -------------------
    The search area that will be covered during searching. The area of the
    search window is limited to 2K luma pixels.

    Search Window Configuration:
    ----------------------------
    The configuration of a search window which is a combination of the
    search path and search window.

    The pre-defined search window configurations are:

    +--------------+-----------------------------------------------------+
    | EXHAUSTIVE   | 48x40 SW with exhaustive single reference search (or|
    |              | 32x32 dual SW for exhaustive dual-reference search);|
    |              | an exhaustive search means that all SU within the   |
    |              | search window are searched in a spiral pattern with |
    |              | the search center being the middle of the search    |
    |              | window                                              |
    +--------------+-----------------------------------------------------+
    | SMALL        | 28x28 SW with exhaustive search                     |
    +--------------+-----------------------------------------------------+
    | TINY         | 24x24 SW with exhaustive search                     |
    +--------------+-----------------------------------------------------+
    | EXTRA TINY   | 20x20 SW with exhaustive search                     |
    +--------------+-----------------------------------------------------+
    | DIAMOND      | 48x40 SW with diamond single reference search (or   |
    |              | 32x32 dual SW for diamond dual-reference search); a |
    |              | diamond pattern search path is used for the first 16|
    |              | (or 7 per reference for dual reference search) SUs, |
    |              | and then gradient based searching is used for up to |
    |              | a maximum of 57 search unit.                        |
    +--------------+-----------------------------------------------------+
    | LARGE DIAMOND| 48x40 SW with large diamond single reference search |
    |              | (or 32x32 dual SW for large diamond dual-reference  |
    |              | search); a diamond pattern search pattern is used   |
    |              | for the first 32 (or 10 per reference for dual      |
    |              | reference search) SUs, and then gradient based      |
    |              | searching is used for up to a maximum of 57 search  |
    |              | units.                                              |
    +--------------+-----------------------------------------------------+

    Inter Estimation:
    -----------------
    The process of determining motion vectors and shapes that best
    describe the transformation from 2D images from previously decoded
    images in a video sequence to the currently processed image.

    Intra-Prediction Estimation (IPE):
    ----------------------------------
    The process of determining prediction angles and shapes that best
    describe the transformation from neighboring MBs in an image to the
    currently processed MB in the same image.

    Luma Mode:
    ----------
    The prediction angle returned by IPE for the luma component for a
    block. It is represented by an unsigned 8-bit integer with the upper 4
    bits set to zero.

    Integer Motion Estimation (IME):
    -------------------------------
    Inter-motion estimation in integer pixel resolution.

    Fractional Motion Estimation (FME):
    -----------------------------------
    Inter-motion estimation in sub-pixel resolution. The result of integer
    motion estimation on a reference image is used to perform fractional
    refinement.

    Bidirectional Motion Estimation (BME):
    --------------------------------------
    The process of determining if the bi-directional prediction minimizes
    the distortion w.r.t to unidirectional prediction. The results of IME
    on forward(L0) and backward(L1) reference images are used to perform
    bi-directional refinement. BME can be performed in integer or sub-
    pixel resolution. If performed in sub-pixel resolution an implicit FME
    operation is done before performing the BME.

    Refinement (REF):
    -----------------
    A FME and/or BME refinement operation.

    Skip/Spot Check (SKC):
    ----------------------
    The operation determining the distortion associated with a given
    (uni or bidirectional) MV in a reference image(s) w.r.t a source
    image.

    Skip and Intra Check (SIC):
    ---------------------------
    The process of performing both SKC and IPE in the same operation.

    Motion Check or Estimation (MCE):
    ---------------------------------
    A generic IME, REF, or SIC operation.

    Forward Transform (FT):
    -----------------------
    An 8x8 or 4x4 integer transform used to transform the residual to the
    frequency domain.
  "

  New OpenCL C Enums

    Adds the following enums to the OpenCL C language. The corresponding
    API enums have the same name with the prefix "CLK_" replaced with
    "CL_".

    These enums are used as arguments and return values of some of the
    VME built-in function parameters and return values described in the
    new section 6.13.14.X "VME built-in functions".


    Interlaced image field polarity values:
    ---------------------------------------
    #define CLK_AVC_ME_INTERLACED_SCAN_TOP_FIELD_INTEL    0x0
    #define CLK_AVC_ME_INTERLACED_SCAN_BOTTOM_FIELD_INTEL 0x1

    Inter macro-block major shape values:
    -------------------------------------
    #define CLK_AVC_ME_MAJOR_16x16_INTEL 0x0
    #define CLK_AVC_ME_MAJOR_16x8_INTEL  0x1
    #define CLK_AVC_ME_MAJOR_8x16_INTEL  0x2
    #define CLK_AVC_ME_MAJOR_8x8_INTEL   0x3

    Inter macro-block minor shape values:
    -------------------------------------
    #define CLK_AVC_ME_MINOR_8x8_INTEL 0x0
    #define CLK_AVC_ME_MINOR_8x4_INTEL 0x1
    #define CLK_AVC_ME_MINOR_4x8_INTEL 0x2
    #define CLK_AVC_ME_MINOR_4x4_INTEL 0x3

    Inter macro-block major direction values:
    -----------------------------------------
    #define CLK_AVC_ME_MAJOR_FORWARD_INTEL       0x0
    #define CLK_AVC_ME_MAJOR_BACKWARD_INTEL      0x1
    #define CLK_AVC_ME_MAJOR_BIDIRECTIONAL_INTEL 0x2

    Inter (IME) partition mask values:
    ----------------------------------
    #define CLK_AVC_ME_PARTITION_MASK_ALL_INTEL   0x0
    #define CLK_AVC_ME_PARTITION_MASK_16x16_INTEL 0x7E
    #define CLK_AVC_ME_PARTITION_MASK_16x8_INTEL  0x7D
    #define CLK_AVC_ME_PARTITION_MASK_8x16_INTEL  0x7B
    #define CLK_AVC_ME_PARTITION_MASK_8x8_INTEL   0x77
    #define CLK_AVC_ME_PARTITION_MASK_8x4_INTEL   0x6F
    #define CLK_AVC_ME_PARTITION_MASK_4x8_INTEL   0x5F
    #define CLK_AVC_ME_PARTITION_MASK_4x4_INTEL   0x3F

    Slice type values:
    ------------------
    #define CLK_AVC_ME_SLICE_TYPE_PRED_INTEL  0x0
    #define CLK_AVC_ME_SLICE_TYPE_BPRED_INTEL 0x1
    #define CLK_AVC_ME_SLICE_TYPE_INTRA_INTEL 0x2

    Search window configuration:
    ----------------------------
    #define CLK_AVC_ME_SEARCH_WINDOW_EXHAUSTIVE_INTEL    0x0
    #define CLK_AVC_ME_SEARCH_WINDOW_SMALL_INTEL         0x1
    #define CLK_AVC_ME_SEARCH_WINDOW_TINY_INTEL          0x2
    #define CLK_AVC_ME_SEARCH_WINDOW_EXTRA_TINY_INTEL    0x3
    #define CLK_AVC_ME_SEARCH_WINDOW_DIAMOND_INTEL       0x4
    #define CLK_AVC_ME_SEARCH_WINDOW_LARGE_DIAMOND_INTEL 0x5
    #define CLK_AVC_ME_SEARCH_WINDOW_RESERVED0_INTEL     0x6
    #define CLK_AVC_ME_SEARCH_WINDOW_RESERVED1_INTEL     0x7

    SAD adjustment mode:
    --------------------
    #define CLK_AVC_ME_SAD_ADJUST_MODE_NONE_INTEL 0x0
    #define CLK_AVC_ME_SAD_ADJUST_MODE_HAAR_INTEL 0x2

    Pixel resolution:
    -----------------
    #define CLK_AVC_ME_SUBPIXEL_MODE_INTEGER_INTEL 0x0
    #define CLK_AVC_ME_SUBPIXEL_MODE_HPEL_INTEL    0x1
    #define CLK_AVC_ME_SUBPIXEL_MODE_QPEL_INTEL    0x3

    Cost precision values:
    ----------------------
    #define CLK_AVC_ME_COST_PRECISION_QPEL_INTEL 0x0
    #define CLK_AVC_ME_COST_PRECISION_HPEL_INTEL 0x1
    #define CLK_AVC_ME_COST_PRECISION_PEL_INTEL  0x2
    #define CLK_AVC_ME_COST_PRECISION_DPEL_INTEL 0x3

    Inter bidirectional weights:
    ----------------------------
    #define CLK_AVC_ME_BIDIR_WEIGHT_QUARTER_INTEL       0x10
    #define CLK_AVC_ME_BIDIR_WEIGHT_THIRD_INTEL         0x15
    #define CLK_AVC_ME_BIDIR_WEIGHT_HALF_INTEL          0x20
    #define CLK_AVC_ME_BIDIR_WEIGHT_TWO_THIRD_INTEL     0x2B
    #define CLK_AVC_ME_BIDIR_WEIGHT_THREE_QUARTER_INTEL 0x30

    Inter border reached values:
    ----------------------------
    #define CLK_AVC_ME_BORDER_REACHED_LEFT_INTEL   0x0
    #define CLK_AVC_ME_BORDER_REACHED_RIGHT_INTEL  0x2
    #define CLK_AVC_ME_BORDER_REACHED_TOP_INTEL    0x4
    #define CLK_AVC_ME_BORDER_REACHED_BOTTOM_INTEL 0x8

    Intra macro-block shape values:
    -------------------------------
    #define CLK_AVC_ME_INTRA_16x16_INTEL  0x0
    #define CLK_AVC_ME_INTRA_8x8_INTEL    0x1
    #define CLK_AVC_ME_INTRA_4x4_INTEL    0x2

    Inter skip block partition type:
    --------------------------------
    #define CLK_AVC_ME_SKIP_BLOCK_PARTITION_16x16_INTEL 0x0
    #define CLK_AVC_ME_SKIP_BLOCK_PARTITION_8x8_INTEL   0x04000

    Inter skip motion vector mask:
    ------------------------------
    #define CLK_AVC_ME_SKIP_BLOCK_16x16_FORWARD_ENABLE_INTEL   (0x1<<24)
    #define CLK_AVC_ME_SKIP_BLOCK_16x16_BACKWARD_ ENABLE_INTEL (0x2<<24)
    #define CLK_AVC_ME_SKIP_BLOCK_16x16_DUAL_ENABLE_INTEL      (0x3<<24)
    #define CLK_AVC_ME_SKIP_BLOCK_8x8_FORWARD_ENABLE_INTEL     (0x55<<24)
    #define CLK_AVC_ME_SKIP_BLOCK_8x8_BACKWARD_ENABLE_INTEL    (0xAA<<24)
    #define CLK_AVC_ME_SKIP_BLOCK_8x8_DUAL_ENABLE_INTEL        (0xFF<<24)
    #define CLK_AVC_ME_SKIP_BLOCK_8x8_0_FORWARD_ENABLE_INTEL   (0x1<<24)
    #define CLK_AVC_ME_SKIP_BLOCK_8x8_0_BACKWARD_ENABLE_INTEL  (0x2<<24)
    #define CLK_AVC_ME_SKIP_BLOCK_8x8_1_FORWARD_ENABLE_INTEL   (0x1<<26)
    #define CLK_AVC_ME_SKIP_BLOCK_8x8_1_BACKWARD_ENABLE_INTEL  (0x2<<26)
    #define CLK_AVC_ME_SKIP_BLOCK_8x8_2_FORWARD_ENABLE_INTEL   (0x1<<28)
    #define CLK_AVC_ME_SKIP_BLOCK_8x8_2_BACKWARD_ENABLE_INTEL  (0x2<<28)
    #define CLK_AVC_ME_SKIP_BLOCK_8x8_3_FORWARD_ENABLE_INTEL   (0x1<<30)
    #define CLK_AVC_ME_SKIP_BLOCK_8x8_3_BACKWARD_ENABLE_INTEL  (0x2<<30)

    Block based skip type values:
    -----------------------------
    #define CLK_AVC_ME_BLOCK_BASED_SKIP_4x4_INTEL 0x0
    #define CLK_AVC_ME_BLOCK_BASED_SKIP_8x8_INTEL 0x80

    Luma intra partition mask values:
    ---------------------------------
    #define CLK_AVC_ME_INTRA_LUMA_PARTITION_MASK_ALL_INTEL   0x0
    #define CLK_AVC_ME_INTRA_LUMA_PARTITION_MASK_16x16_INTEL 0x6
    #define CLK_AVC_ME_INTRA_LUMA_PARTITION_MASK_8x8_INTEL   0x5
    #define CLK_AVC_ME_INTRA_LUMA_PARTITION_MASK_4x4_INTEL   0x3

    Intra neighbor availability mask values:
    ----------------------------------------
    #define CLK_AVC_ME_INTRA_NEIGHBOR_LEFT_MASK_ENABLE_INTEL        0x60
    #define CLK_AVC_ME_INTRA_NEIGHBOR_UPPER_MASK_ENABLE_INTEL       0x10
    #define CLK_AVC_ME_INTRA_NEIGHBOR_UPPER_RIGHT_MASK_ENABLE_INTEL 0x8
    #define CLK_AVC_ME_INTRA_NEIGHBOR_UPPER_LEFT_MASK_ENABLE_INTEL  0x4

    Luma intra modes:
    -----------------
    #define CLK_AVC_ME_LUMA_PREDICTOR_MODE_VERTICAL_INTEL            0x0
    #define CLK_AVC_ME_LUMA_PREDICTOR_MODE_HORIZONTAL_INTEL          0x1
    #define CLK_AVC_ME_LUMA_PREDICTOR_MODE_DC_INTEL                  0x2
    #define CLK_AVC_ME_LUMA_PREDICTOR_MODE_DIAGONAL_DOWN_LEFT_INTEL  0x3
    #define CLK_AVC_ME_LUMA_PREDICTOR_MODE_DIAGONAL_DOWN_RIGHT_INTEL 0x4
    #define CLK_AVC_ME_LUMA_PREDICTOR_MODE_PLANE_INTEL               0x4
    #define CLK_AVC_ME_LUMA_PREDICTOR_MODE_VERTICAL_RIGHT_INTEL      0x5
    #define CLK_AVC_ME_LUMA_PREDICTOR_MODE_HORIZONTAL_DOWN_INTEL     0x6
    #define CLK_AVC_ME_LUMA_PREDICTOR_MODE_VERTICAL_LEFT_INTEL       0x7
    #define CLK_AVC_ME_LUMA_PREDICTOR_MODE_HORIZONTAL_UP_INTEL       0x8

    Chroma intra modes:
    -------------------
    #define CLK_AVC_ME_CHROMA_PREDICTOR_MODE_DC_INTEL         0x0
    #define CLK_AVC_ME_CHROMA_PREDICTOR_MODE_HORIZONTAL_INTEL 0x1
    #define CLK_AVC_ME_CHROMA_PREDICTOR_MODE_VERTICAL_INTEL   0x2
    #define CLK_AVC_ME_CHROMA_PREDICTOR_MODE_PLANE_INTEL      0x3

    Reference image select values:
    ------------------------------
    #define CLK_AVC_ME_FRAME_FORWARD_INTEL  0x1
    #define CLK_AVC_ME_FRAME_BACKWARD_INTEL 0x2
    #define CLK_AVC_ME_FRAME_DUAL_INTEL     0x3

    VME media sampler type default initialization literal:
    ------------------------------------------------------
    Pre-defined enumeration CLK_AVC_ME_INITIALIZE_INTEL.

    IME payload type default initialization literal:
    ------------------------------------------------
    Pre-defined enumeration CLK_AVC_IME_PAYLOAD_INITIALIZE_INTEL.

    REF payload type default initialization literal:
    ------------------------------------------------
    Pre-defined enumeration CLK_AVC_REF_PAYLOAD_INITIALIZE_INTEL.

    SIC payload type default initialization literal:
    ------------------------------------------------
    Pre-defined enumeration CLK_AVC_SIC_PAYLOAD_INITIALIZE_INTEL.

    IME result type default initialization literal:
    -----------------------------------------------
    Pre-defined enumeration CLK_AVC_IME_RESULT_INITIALIZE_INTEL.

    IME result single reference streamout type default initialization literal:
    --------------------------------------------------------------------------
    Pre-defined enumeration CLK_AVC_IME_RESULT_SINGLE_REFERENCE_STREAMOUT_INITIALIZE_INTEL.

    IME result dual reference streamout type default initialization literal:
    ------------------------------------------------------------------------
    Pre-defined enumeration CLK_AVC_IME_RESULT_DUAL_REFERENCE_STREAMOUT_INITIALIZE_INTEL.

    IME result single reference streamin type default initialization literal:
    -------------------------------------------------------------------------
    Pre-defined enumeration CLK_AVC_IME_RESULT_SINGLE_REFERENCE_STREAMIN_INITIALIZE_INTEL.

    IME result dual reference streamin type default initialization literal:
    -----------------------------------------------------------------------
    Pre-defined enumeration CLK_AVC_IME_RESULT_DUAL_REFERENCE_STREAMIN_INITIALIZE_INTEL.

    REF result type default initialization literal:
    -----------------------------------------------
    Pre-defined enumeration CLK_AVC_REF_RESULT_INITIALIZE_INTEL.

    SIC result type default initialization literal:
    -----------------------------------------------
    Pre-defined enumeration CLK_AVC_SIC_RESULT_INITIALIZE_INTEL.

 New OpenCL C Types

    Adds the following built-in types to the OpenCL C language. Appends
    the following built-in type definitions to Table 6.3 in section 6.1.3
    "Other Built-in Data Types".

    These built-in types are used to declare the types of some of the VME
    built-in function parameters and return values in the new section
    6.13.14.X "VME built-in functions".

    +-----------------------------------+--------------------------------+
    | intel_sub_group_avc_mce_payload_t |This is a parameter and/or      |
    |                                   |return type of a generic AVC MCE|
    |                                   |operation. It is used in MCE    |
    |                                   |built-in functions and          |
    |                                   |represents the payload for a    |
    |                                   |basic IME/REF/SIC operation.    |
    +-----------------------------------+--------------------------------+
    | intel_sub_group_avc_ime_payload_t |This is a parameter and/or      |
    |                                   |return type of a basic AVC IME  |
    |                                   |operation. It is used in IME    |
    |                                   |built-in functions and          |
    |                                   |represents the payload for an   |
    |                                   |IME operation.                  |
    +-----------------------------------+--------------------------------+
    | intel_sub_group_avc_ref_payload_t |This is a parameter and/or      |
    |                                   |return type of an AVC REF       |
    |                                   |operation. It is used in REF    |
    |                                   |built-in functions and          |
    |                                   |represents the payload for a REF|
    |                                   |operation.                      |
    +-----------------------------------+--------------------------------+
    | intel_sub_group_avc_sic_payload_t |This is a parameter and/or      |
    |                                   |return type of an AVC SIC       |
    |                                   |operation. It is used in SIC    |
    |                                   |built-in functions and          |
    |                                   |represents the payload for a SIC|
    |                                   |operation.                      |
    +-----------------------------------+--------------------------------+
    | intel_sub_group_avc_mce_result_t  |This is a parameter and/or      |
    |                                   |return type of a MCE operation. |
    |                                   |It is used in MCE built-in      |
    |                                   |functions and represents the    |
    |                                   |evaluation result for a basic   |
    |                                   |IME/REF/SIC operation.          |
    +-----------------------------------+--------------------------------+
    | intel_sub_group_avc_ime_result_t  |This is a parameter and/or      |
    |                                   |return type of a basic AVC IME  |
    |                                   |operation not using the stream- |
    |                                   |in/streamout functionality.It is|
    |                                   |used in IME built-in functions  |
    |                                   |and represents the result for a |
    |                                   |basic IME evaluation operation. |
    +-----------------------------------+--------------------------------+
    | intel_sub_group_avc_ime_result_   |This is a parameter and/or      |
    | single_reference_streamout_t      |return type of an AVC single    |
    |                                   |reference IME operation using   |
    |                                   |the streamout functionality. It |
    |                                   |is used in specific IME built-  |
    |                                   |in functions and represents the |
    |                                   |evaluation result of an IME     |
    |                                   |streamout operation.            |
    +-----------------------------------+--------------------------------+
    | intel_sub_group_avc_ime_result_   |This is a parameter and/or      |
    | dual_reference_streamout_t        |return type of an AVC dual      |
    |                                   |reference IME operation using   |
    |                                   |the streamout functionality. It |
    |                                   |is used in specific IME built-in|
    |                                   |functions and represents the    |
    |                                   |evaluation result for an IME    |
    |                                   |streamout operation.            |
    +-----------------------------------+--------------------------------+
    | intel_sub_group_avc_ime_          |This is a parameter and/or      |
    | single_reference_streamin_t       |return type of an AVC single IME|
    |                                   |operation using the stream-in   |
    |                                   |reference functionality. It is  |
    |                                   |used in specific IME built-in   |
    |                                   |functions. It represents the    |
    |                                   |additional results from the     |
    |                                   |result of an IME evaluation     |
    |                                   |using the streamout             |
    |                                   |functionality that may be       |
    |                                   |streamed-in in a subsequent IME |
    |                                   |streamin call.                  |
    +-----------------------------------+--------------------------------+
    | intel_sub_group_avc_ime_          |This is a parameter and/or      |
    | dual_reference_streamin_t         |return type of an AVC dual      |
    |                                   |reference IME operation using   |
    |                                   |the streamin functionality. It  |
    |                                   |is used in specific IME built-  |
    |                                   |in functions. It represents the |
    |                                   |additional results from the     |
    |                                   |result of an IME evaluation     |
    |                                   |using the streamout             |
    |                                   |functionality that may be       |
    |                                   |streamed-in in a subsequent IME |
    |                                   |call.                           |
    +-----------------------------------+--------------------------------+
    | intel_sub_group_avc_ref_result_t  |This is a parameter and or/     |
    |                                   |return type of an AVC REF       |
    |                                   |operation. It is used in REF    |
    |                                   |built-in functions and          |
    |                                   |represents the evaluation result|
    |                                   |from a REF evaluation operation.|
    +-----------------------------------+--------------------------------+
    | intel_sub_group_avc_sic_result_t  |This is a parameter and/or      |
    |                                   |return type of an AVC SKC, IPE, |
    |                                   |or SIC operation. It is used in |
    |                                   |SIC built-in functions and      |
    |                                   |represents the evaluation result|
    |                                   |from a SIC evaluation operation.|
    +-----------------------------------+--------------------------------+

    Amends the following built-in type definition in Table 6.3 in section
    6.1.3 "Other Built-in Data Types".

    +---------+----------------------------------------------------------+
    |sampler_t|A texture or media sampler type. Refer to section 6.13.14 |
    |         |for a detailed description the built-in functions that use|
    |         |of this type.                                             |
    |         |                                                          |
    |         |In general, the use of the word "sampler" by itself refers|
    |         |to a texture sampler. A media sampler will be explicitly  |
    |         |refered to as "media sampler".                            |
    |         |                                                          |
    |         |In general, sampler_t parameters of built-in functions    |
    |         |refer to texture sampler parameter. Media sampler         |
    |         |parameters are exclusively used in the VME built-in       |
    |         |functions (described in new section 6.13.14.X "VME        |
    |         |built-in functions").                                     |
    +---------+----------------------------------------------------------+


    Amends the restrictions defined for sampler types in (b) section 6.9
    "Restrictions".

    "...
     The texture sampler type (sampler_t) can only be used as the type of
     a function argument or a variable declared in the program scope or
     the outermost scope of a kernel function. The behavior of a sampler
     variable declared in a non-outermost scope of a kernel function is
     implementation-defined. The media sampler type cannot be used as a
     function argument of a kernel function, but otherwise can be used as
     the type of a function argument or a variable declared in any scope.
     ... "

    Amends the following section 6.13.14.1 "Samplers".

    Replace all strings "sampler(s)" with "texture sampler(s)".

    Appends the following paragraphs in section 6.13.14.1 "Samplers".

    "The VME evaluation phase built-in functions (described in new section
    6.13.14.X "VME built-in functions" take a VME media sampler argument.
    The media sampler can be declared as a constant variable in any scope,
    and must be initialized with the 32-bit unsigned integer constant
    CLK_AVC_ME_INITIALIZE_INTEL."

    Note that VME media samplers declared using the constant qualifier are
    not counted towards the maximum size of constant address space allowed
    per device (i.e. CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE as described in
    table 4.3)

    Amends the following section 6.13.14.2 "Built-in Image Read
    Functions".

    "The following built-in function calls to read images with a texture
     sampler are supported."

    Amends the following section 6.13.14.3 "Built-in Image Sampler-less
    Read Functions".

    "The sampler-less read image functions behave exactly as the
     corresponding read image functions described in section 6.13.14.2
     that take integer coordinates and a texture sampler with filter mode
     set to CLK_FILTER_NEAREST, normalized coordinates set to
     CLK_NORMALIZED_COORDS_FALSE and addressing mode to CLK_ADDRESS_NONE"


 New OpenCL C built-in functions

    Appends the following to the new section 6.13.14.X "VME built-in
    functions".

    "
    The OpenCL C programming language implements the following built-in
    functions to evaluate VME operations as a block operation by all work-
    items in a subgroup. Furthermore they are defined only for subgroup
    size of 16, and thus using these built-in functions in a kernel will
    force a subgroup size of 16. The use of kernel attribute
    intel_reqd_sub_group_size with a value other than 16 as described in
    the Intel vendor extension cl_intel_required_subgroup_size is not
    allowed.

    These built-in functions are defined to operate on a 16x16 MB and must
    be encountered by all work items in a subgroup executing the kernel,
    otherwise the behavior is undefined (i.e. they can only be used only
    in convergent control flow where all the work-items in the subgroups
    are enabled).

    ______________________________________________________________________


    If VME built-in functions are used in an OpenCL kernel, then the
    following additional requirements must be satisfied.

    1. The VME source and reference images must either be 8-bit luma
       2D images, or planar YUV images with NV12 format. If chroma based
       intra estimation operations are being performed, then NV12 source
       images are required, otherwise either 8-bit luma or NV12 images may
       be used. For 8-bit luma images, the image_channel_order and the
       image_data_type are restricted as CL_R and CL_UNORM_INT8
       respectively. NV12 images are described in the Intel vendor
       extension cl_intel_planar_yuv. Images created from buffers are not
       allowed as VME source and reference images.
    2. OpenCL image kernel parameters that are used as VME source and
       reference images in the VME evaluation built-in functions must
       exclusively be used only by VME built-in functions. If the same
       image needs to be used for other image operations within the same
       kernel, then an additional image parameter may be used that is
       bound to the same image object as for the VME image parameter in
       the OpenCL host application for the kernel enqueue API call.
    3. OpenCL image kernel parameters that are used as VME source and
       reference images in the VME evaluation built-in functions must be
       declared in a specific order in the kernel parameter list of a
       kernel that calls any of the VME built-in functions as shown below;
       otherwise it will result in a complier reported error.

          +-----------------------------------------------+
          |   __kernel void vme_custom(                   |
          |      read_only image2d_t src_image,           |
          |      read_only image2d_t fwd0_ref_image,      |
          |      read_only image2d_t bwd0_ref_image,      |
          |      read_only image2d_t fwd1_ref_image,      |
          |      read_only image2d_t bwd1_ref_image,      |
          |      ...                                      |
          |      read_only image2d_t fwd15_ref_image,     |
          |      read_only image2d_t bwd15_ref_image,     |
          |      ...)                                     |
          +-----------------------------------------------+

       The first forward(L0) reference image parameter must immediately
       follow the source image parameter, and the first backward(L1)
       reference image parameter must immediately follow the first forward
       reference image parameter. Up to 16 consecutive pairs of forward
       and backward reference image parameters can be provided, with the
       reference images closer to the source image in timing order
       appearing earlier in the parameter list.
    ______________________________________________________________________


    The VME built-in functions are organized into categories of functions
    based on the major VME operations as follows:

    1. MCE built-in functions
    --------------------------
    A set of generic built-in functions which are common across IME, REF,
    or SIC operations. Certain functions may not be applicable for some
    specific operation as indicated in their descriptions.

    2. IME built-in functions
    -------------------------
    A set of built-in functions to initialize, configure and evaluate an
    integer motion estimation result.

    3. REF built-in functions
    -------------------------
    A set of built-in functions to initialize, configure and evaluate a
    fractional and/or bidirectional refinement operation on the results
    of IME built-in functions.

    4. SIC built-in functions
    -------------------------
    A set of built-in functions to initialize, configure and evaluate a
    skip check or intra estimation operation result.
    ______________________________________________________________________


    A set of ordered phases of built-in functions need to be called to
    evaluate each category of built-in functions. Some phases may not be
    specified for certain function categories, but if present it must be
    called in the correct order, otherwise the behavior is undefined.

    1. Initialization (required phase)
    ---------------------------------
    This creates and initializes the payload for the VME operation and
    may perform some basic configuration that invariably must be set.

    2. Operation configuration (required phase, if present)
    ------------------------------------------------------
    This configures important sets of parameters for the VME operation
    that were not set in the initialization phase in the payload.

    3. Inter reference frame cost configuration (optional phase)
    -----------------------------------------------------------
    This configures the cost assigned to matching blocks in reference
    frames based on timing distance from the source frame.

    4. Inter motion vector/Intra mode cost configuration (optional phase)
    --------------------------------------------------------------------
    This configures the inter motion vector cost function or the intra
    mode cost function to be used for the VME operation in the payload.

    5. Inter/Intra shape cost configuration (optional phase)
    -------------------------------------------------------
    This configures the inter/intra shape cost penalty to be used for the
    VME operation in the payload.

    6. Miscellaneous property configuration (optional phase)
    -------------------------------------------------------
    This configures miscellaneous operation sub-functions in the VME
    operation payload.

    7. Evaluation (required phase)
    -----------------------------
    This phase performs the evaluation of the VME operation using the
    payload configured in the previous phases by issuing it to the VME
    unit for processing.

    8. Result processing (required phase)
    ------------------------------------
    The phase performs the extraction of various motion estimation
    operation component results from the result of the evaluation phase.
    The result components may be distributed among multiple work-items.
    The results components of one VME operation may be used as input to
    another VME operation at phase (1) or phase (7).

    A set of built-in functions are defined for each of the major VME
    operations of IME, REF, and SIC (SKC and IPE). These are categorized
    as IME, REF, and SIC built-in functions. Functions that are common
    across all categories are classified as MCE (motion check and
    estimation) functions. Several restrictions for function argument
    values are stated in the descriptions of the built-in function
    descriptions. In general, if the restrictions are not satisfied the
    behavior is undefined, unless otherwise stated.

    ______________________________________________________________________


    MCE built-in functions
    ----------------------

    A set of generic MCE operations which may be called for IME, REF, or
    SIC operations with the restrictions as stated in their descriptions.
    They can be called only during specific phases of these operations as
    indicated in the description of the built-in functions.

    Multi-reference cost configuration phase functions
    ++++++++++++++++++++++++++++++++++++++++++++++++++

    These built-in functions enable multi-reference image costing. They
    allow for the configuration of the payloads to favorably bias the
    major partitions coming from reference images that are closer to the
    source image, than the ones coming from reference images that are
    further away. The distance of the reference images, in the timing
    order, from the source image is implied based on the order in which
    the reference images are declared in the kernel parameter list.

    +-------------------------------------+-----------------------------+
    |uchar                                |Get the default base         |
    |intel_sub_group_avc_mce_get_default_ |multi-reference cost penalty |
    |inter_base_multi_reference_penalty(  |in U4U4 format when HW       |
    |   uchar slice_type,                 |assisted multi-reference     |
    |   uchar qp )                        |search is used.              |
    |                                     |                             |
    |                                     |The value of slice_type must |
    |                                     |be a valid slice type        |
    |                                     |enumeration value.           |
    |                                     |                             |
    |                                     |The value of qp must be a    |
    |                                     |valid quantization parameter |
    |                                     |value between 0 and 51.      |
    +-------------------------------------+-----------------------------+
    |intel_sub_group_avc_mce_payload_t    |Set the multi-reference base |
    |intel_sub_group_avc_mce_set_inter_   |penalty when HW assisted     |
    |base_multi_reference_penalty(        |multi-reference search is    |
    |   uchar reference_base_penalty,     |performed.                   |
    |   intel_sub_group_avc_mce_payload_t |                             |
    |      payload )                      |Reference major partitions   |
    |                                     |get associated with a penalty|
    |                                     |based on its distance from   |
    |                                     |the source image. The        |
    |                                     |reference_base_penalty is    |
    |                                     |scaled using a scaling factor|
    |                                     |based on the implied distance|
    |                                     |of the reference image from  |
    |                                     |the source image as shown    |
    |                                     |below.                       |
    |                                     |                             |
    |                                     |   0       => 0x             |
    |                                     |   1 to 2  => 1x             |
    |                                     |   3 to 6  => 2x             |
    |                                     |   7 to 15 => 3x             |
    |                                     |                             |
    |                                     |The value of                 |
    |                                     |reference_base_penalty is    |
    |                                     |specified in U4U4 format and |
    |                                     |the integer value must fit   |
    |                                     |within 12 bits.              |
    +-------------------------------------+-----------------------------+

    Inter shape and direction cost configuration phase functions
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

    These built-in functions enable shape costing for inter estimation.
    They allow for the configuration of payloads for the biasing of
    certain shapes over others based on the configured parameters.

    +------------------------------------+-------------------------------+
    |ulong intel_sub_group_avc_mce_get_  |Get the default packed shape   |
    |default_inter_shape_penalty(        |cost for inter estimation in   |
    |   uchar slice_type,                |U4U4 format.                   |
    |   uchar qp )                       |                               |
    |                                    |The value of slice_type must be|
    |                                    |a valid slice type enumeration |
    |                                    |value.                         |
    |                                    |                               |
    |                                    |The value of qp must be a valid|
    |                                    |quantization parameter value   |
    |                                    |between 0 and 51.              |
    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_mce_payload_t   |Set the shape penalty for inter|
    |intel_sub_group_avc_mce_set_inter_  |motion estimation.             |
    |shape_penalty(                      |                               |
    |   ulong packed_shape_penalty,      |The value of packed_shape_cost |
    |   intel_sub_group_avc_mce_payload_t|is an unsigned long integer    |
    |      payload )                     |value with the following bits  |
    |                                    |specifying the shape penalty in|
    |                                    |U4U4 format as follows:        |
    |                                    |                               |
    |                                    |   7:0   => 16x8 and 8x16      |
    |                                    |   15:8  => 8x8                |
    |                                    |   23:16 => 8x4 and 4x8        |
    |                                    |   31:24 => 4x4                |
    |                                    |   39:32 => 16x16              |
    |                                    |   63:40 => must be zero       |
    |                                    |                               |
    |                                    |The U4U4 decoded integer values|
    |                                    |for byte 0 and byte 4 must bit |
    |                                    |fit in 12 bits, while the U4U4 |
    |                                    |decoded integer values for the |
    |                                    |other bytes must fit within 10 |
    |                                    |bits.                          |
    +------------------------------------+-------------------------------+
    |uchar intel_sub_group_avc_mce_get_  |Get the default direction      |
    |default_inter_direction_penalty(    |penalty for inter estimation in|
    |   uchar slice_type,                |U4U4 format.                   |
    |   uchar qp )                       |                               |
    |                                    |The value of slice_type must be|
    |                                    |a valid slice type enumeration |
    |                                    |value.                         |
    |                                    |                               |
    |                                    |The value of qp must be a valid|
    |                                    |quantization parameter value   |
    |                                    |between 0 and 51.              |
    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_mce_payload_t   |Set the direction penalty for  |
    |intel_sub_group_avc_mce_set_        |backward images used in inter  |
    |inter_direction_penalty(            |motion estimation.             |
    |   uchar direction_cost,            |                               |
    |   intel_sub_group_avc_mce_payload_t|The value of direction_cost    |
    |      payload )                     |must be in U4U4 format and its |
    |                                    |decoded integer value must bit |
    |                                    |fit in 12 bits.                |
    +------------------------------------+-------------------------------+

    Intra shape cost configuration phase functions
    ++++++++++++++++++++++++++++++++++++++++++++++

    These built-in functions enable shape costing for intra estimation.
    They allow for the configuration of payloads for biasing of certain
    shapes over others based on the configured parameters. Only the built-
    in function providing the default shape penalty is specified as an MCE
    function. The function which actually configures the payload for the
    intra estimation operation is specified as a SIC built-in function.

    +----------------------------------+--------------------------------+
    |uint intel_sub_group_avc_mce_get_ |Get the default packed luma     |
    |default_intra_luma_shape_penalty( |intra shape penalty in U4U4     |
    |   uchar slice_type,              |format.                         |
    |   uchar qp )                     |                                |
    +----------------------------------+--------------------------------+

    Inter motion vector cost configuration phase functions
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++

    These built-in functions enable motion vector costing for inter
    estimation. The distortion measure is augmented to favor motion
    vectors closer to the cost-center considered in conjunction with the
    primary objective of minimizing the SAD between the source and
    reference blocks.

    +------------------------------------+-------------------------------+
    |uint2 intel_sub_group_avc_mce_get_  |Get the default inter motion   |
    |default_inter_motion_vector_        |vector cost table for the      |
    |cost_table(                         |pre-defined control points in  |
    |   uchar slice_type,                |U4U4 format for the input Qp   |
    |   uchar qp )                       |and slice type.                |
    |                                    |                               |
    |                                    |The value of slice_type must be|
    |                                    |a valid slice type enumeration |
    |                                    |value.                         |
    |                                    |                               |
    |                                    |The value of qp must be a valid|
    |                                    |quantization parameter value   |
    |                                    |between 0 and 51.              |
    +------------------------------------+-------------------------------+
    |uint2 intel_sub_group_avc_mce_get_  |Get the default predefined     |
    |default_high_penalty_cost_table(    |packed U4U4 format high cost   |
    |   void)                            |table for high Qp. This may be |
    |                                    |more appropriate for frame     |
    |                                    |sequences with high motion.    |
    +------------------------------------+-------------------------------+
    |uint2 intel_sub_group_avc_mce_get_  |Get the default predefined     |
    |default_medium_penalty_cost_table(  |packed U4U4 format medium cost |
    |   void)                            |table for medium Qp. This may  |
    |                                    |be more appropriate for frame  |
    |                                    |sequences with normal motion.  |
    +------------------------------------+-------------------------------+
    |uint2 intel_sub_group_avc_mce_get_  |Get the default predefined     |
    |default_low_penalty_cost_table(     |packed U4U4 format low cost    |
    |   void)                            |table for low Qp. This may be  |
    |                                    |more appropriate for frame     |
    |                                    |sequences with low motion.     |
    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_mce_payload_t   |Update the input payload to set|
    |intel_sub_group_avc_mce_set_        |the cost precision along with  |
    |motion_vector_cost_function(        |the cost center and cost table |
    |   ulong packed_cost_center_delta,  |and return it.                 |
    |   uint2 packed_cost_table,         |                               |
    |   uchar cost_precision,            |The value of                   |
    |   intel_sub_group_avc_mce_payload_t|packed_cost_center_delta is the|
    |      payload )                     |packed bidirectional cost      |
    |                                    |center delta value relative to |
    |                                    |the source macroblock, which   |
    |                                    |specifies the 4 bidirectional  |
    |                                    |cost centers of each of the 8x8|
    |                                    |partitions of the reference    |
    |                                    |image.  If only unidirectional |
    |                                    |search is performed then the   |
    |                                    |values of the backward         |
    |                                    |reference cost centers must be |
    |                                    |zero. Work-item 'n' provides   |
    |                                    |the value of cost center       |
    |                                    |'n'. It is specified in QPEL   |
    |                                    |units. For 16x16 partitions    |
    |                                    |work-item '0' provides the cost|
    |                                    |center. For 8x16 partitions    |
    |                                    |work-items '0' and '1' provide |
    |                                    |the cost centers. For 16x8     |
    |                                    |partitions work-items '0' and  |
    |                                    |'2' provide the cost           |
    |                                    |centers. The X and Y           |
    |                                    |coordinates of each cost center|
    |                                    |delta must be in the range     |
    |                                    |[-2048, 2047] and [-512.00 to  |
    |                                    |511.75] respectively, otherwise|
    |                                    |the results are undefined.     |
    |                                    |                               |
    |                                    |The packed cost table specifies|
    |                                    |the cost penalties for         |
    |                                    |pre-defined control points in  |
    |                                    |U4U4 format in the cost        |
    |                                    |function curve. The first 7    |
    |                                    |bytes specify 7 control points |
    |                                    |representing consecutive       |
    |                                    |powers-of-two delta units (2^0 |
    |                                    |to 2^6).  Each delta unit, dx, |
    |                                    |is the distance of a motion    |
    |                                    |vector, mv, from the specified |
    |                                    |cost center, cc                |
    |                                    |(dx=|mv-cc|). The cost penalty |
    |                                    |values at in-between control   |
    |                                    |points are linearly            |
    |                                    |interpolated. The range of the |
    |                                    |cost function is defined to be |
    |                                    |from 2^0 to 2^6 delta          |
    |                                    |units. The 8th byte of the     |
    |                                    |packed cost table specifies the|
    |                                    |penalty base factor (over_cost)|
    |                                    |for dx distances that are      |
    |                                    |out-of-range. The penalty of   |
    |                                    |out-of-range cost dx distances |
    |                                    |is computed as min(over_cost + |
    |                                    |int(dx) - 64, 255).            |
    |                                    |                               |
    |                                    |The value of the cost precision|
    |                                    |parameter must be a valid cost |
    |                                    |precision enumeration value,   |
    |                                    |and specifies the precision of |
    |                                    |the delta units from the cost  |
    |                                    |center, dx. This effectively   |
    |                                    |can be used to control the     |
    |                                    |range of the cost function as  |
    |                                    |follows:                       |
    |                                    |                               |
    |                                    |PRECISION DELTAS PIXEL RANGE   |
    |                                    |--------- ------ -----------   |
    |                                    |                               |
    |                                    |PEL       pixel   0-64         |
    |                                    |DPEL      dual    0-127        |
    |                                    |          pixel                |
    |                                    |HALF      half    0-31         |
    |                                    |          pixel                |
    |                                    |QUARTER   quarter 0-15         |
    |                                    |          pixel                |
    |                                    |                               |
    |                                    |The inter distortion for a     |
    |                                    |block can be described by the  |
    |                                    |following formula:             |
    |                                    |                               |
    |                                    |Distortion =                   |
    |                                    |   SAD (or HAAR) +             |
    |                                    |   MV_Cost_Penalty +           |
    |                                    |   Shape_Penalty +             |
    |                                    |   Direction_Cost +            |
    |                                    |   Multi_Reference_Penalty     |
    +------------------------------------+-------------------------------+

    Intra mode cost configuration phase functions
    +++++++++++++++++++++++++++++++++++++++++++++

    These built-in functions enable mode costing for intra estimation.
    They allow for the configuration of payloads to bias the computed
    intra modes to be closer to their configured neighbor modes. This form
    of costing is similar to the inter motion vector costing. Only the
    built-in functions providing the defaults mode costs are specified as
    MCE functions. The remaining functions which actually configure the
    payload for the intra estimation operation is specified as SIC built-
    in functions.

    +----------------------------------+---------------------------------+
    |uchar intel_sub_group_avc_mce_get_|Get the default intra mode cost  |
    |default_intra_luma_mode_penalty(  |penalty for in U4U4 format when  |
    |   uchar slice_type,              |the estimated mode differs from  |
    |   uchar qp )                     |its predicted mode from its      |
    |                                  |neighbors.                       |
    |                                  |                                 |
    |                                  |The value of slice_type must be a|
    |                                  |valid slice type enumeration     |
    |                                  |value.                           |
    |                                  |                                 |
    |                                  |The value of qp must be a valid  |
    |                                  |quantization parameter value     |
    |                                  |between 0 and 51.                |
    +----------------------------------+---------------------------------+
    |uint intel_sub_group_avc_mce_get_ |Get the default intra non-dc cost|
    |default_non_dc_luma_intra_penalty(|penalty for intra luma estimation|
    |   void )                         |in packed 32-bit integer format. |
    +----------------------------------+---------------------------------+
    |uchar intel_sub_group_avc_mce_get_|Get the default chroma mode base |
    |default_intra_chroma_mode_base_   |penalty in U4U4 format.          |
    |penalty(                          |                                 |
    |   void )                         |                                 |
    +----------------------------------+---------------------------------+

    Miscellaneous property configuration phase functions
    ++++++++++++++++++++++++++++++++++++++++++++++++++++

    These built-in functions enable miscellaneous MCE properties settings.

    +--------------------------------------+-----------------------------+
    |intel_sub_group_avc_mce_payload_t     |Update the input payload to  |
    |intel_sub_group_avc_mce_set_          |enable an AC only HAAR SAD   |
    |ac_only_haar(                         |mode and return it. It       |
    |   intel_sub_group_avc_mce_payload_t  |overrides any previous       |
    |       payload )                      |setting for sad adjustment.  |
    |                                      |                             |
    |                                      |This feature is mainly       |
    |                                      |intended for improved block  |
    |                                      |matching in frame-rate       |
    |                                      |conversion (FRC) kernels.    |
    +--------------------------------------+-----------------------------+
    |intel_sub_group_avc_mce_payload_t     |Update the input payload to  |
    |intel_sub_group_avc_mce_set_source_   |specify the field polarities |
    |interlaced_field_polarity(            |for interlaced source images |
    |   uchar src_field_polarity,          |used for inter or intra      |
    |   intel_sub_group_avc_mce_payload_t  |operations.                  |
    |      payload )                       |                             |
    |                                      |The value of                 |
    |                                      |src_field_polarity must be a |
    |                                      |valid field polarity         |
    |                                      |enumeration value indicating |
    |                                      |the field polarity for the   |
    |                                      |source image.                |
    +--------------------------------------+-----------------------------+
    |intel_sub_group_avc_mce_payload_t     |Update the input payload to  |
    |intel_sub_group_avc_mce_set_single_   |specify the field polarities |
    |reference_interlaced_field_polarity(  |for interlaced reference     |
    |   uchar ref_field_polarity,          |images used for single       |
    |   intel_sub_group_avc_mce_payload_t  |reference inter search or    |
    |      payload )                       |check operation.             |
    |                                      |                             |
    |                                      |The value of                 |
    |                                      |ref_field_polarity must be a |
    |                                      |valid field polarity         |
    |                                      |enumeration value indicating |
    |                                      |the field polarity for the   |
    |                                      |reference image.             |
    +--------------------------------------+-----------------------------+
    |intel_sub_group_avc_mce_payload_t     |Update the input payload to  |
    |intel_sub_group_avc_mce_set_dual_     |specify the field polarities |
    |reference_interlaced_field_polarities(|for interlaced reference     |
    |   uchar fwd_ref_field_polarity,      |images used for dual         |
    |   uchar bwd_ref_field_polarity,      |reference inter search or    |
    |   intel_sub_group_avc_mce_payload_t  |check operation.             |
    |      payload )                       |                             |
    |                                      |The value of                 |
    |                                      |fwd_ref_field_polarity must  |
    |                                      |be a valid field polarity    |
    |                                      |enumeration value indicating |
    |                                      |the field polarity for the   |
    |                                      |forward image.               |
    |                                      |                             |
    |                                      |The value of                 |
    |                                      |bwd_ref_field_polarity must  |
    |                                      |be a valid field polarity    |
    |                                      |enumeration value indicating |
    |                                      |the field polarity for the   |
    |                                      |backward image.              |
    +--------------------------------------+-----------------------------+

    Result processing phase functions
    +++++++++++++++++++++++++++++++++

    These built-in functions facilitate the extraction of components of
    the result from VME unit.

    +-----------------------------------+--------------------------------+
    |ulong intel_sub_group_avc_mce_get_ |Get the MCE packed BMVs result. |
    |motion_vectors(                    |                                |
    |   intel_sub_group_avc_mce_result_t|Up to 16 packed BMVs are        |
    |      result )                     |returned, one per work-item. If |
    |                                   |the MCE search operation's      |
    |                                   |payload was setup for           |
    |                                   |unidirectional search then only |
    |                                   |the forward packed MV will be   |
    |                                   |valid in each BMV, otherwise    |
    |                                   |both packed MVs will be         |
    |                                   |valid. The BMVs have to be      |
    |                                   |selected by their respective    |
    |                                   |work-items based on the result  |
    |                                   |block major and minor shapes.   |
    |                                   |                                |
    |                                   |If the major shape is:          |
    |                                   |   - 16x16, then one BMV is     |
    |                                   |     returned by work-item 0    |
    |                                   |   - 16x8, or 8x16, then two    |
    |                                   |     BMVs are returned by work- |
    |                                   |     items 0 and 8              |
    |                                   |   - 8x8, then four sets of BMVs|
    |                                   |     corresponding to the four  |
    |                                   |     partitions in traditional  |
    |                                   |     Z-order are returned by    |
    |                                   |     work-items in the ranges   |
    |                                   |     [0, 3], [4, 7], [8, 11],   |
    |                                   |     and [12, 15]; the minor    |
    |                                   |     shape will determine       |
    |                                   |     exactly which work-items   |
    |                                   |     in the reserved inclusive  |
    |                                   |     range for the partition    |
    |                                   |     returns the BMVs for that  |
    |                                   |     partition                  |
    |                                   |                                |
    |                                   |If the range of work-items for  |
    |                                   |the 8x8 major partition is [n,  |
    |                                   |n+3] and the minor shape is:    |
    |                                   |   - 8x8, then work-item 'n'    |
    |                                   |     returns the BMV for each   |
    |                                   |     minor partition            |
    |                                   |   - 8x4 or 4x8, then work-items|
    |                                   |     'n' and 'n+2' returns the  |
    |                                   |     BMVs for each minor        |
    |                                   |     partition                  |
    |                                   |   - 4x4, then all work-items in|
    |                                   |     [n, n+3] return the BMVs   |
    |                                   |     for each minor partition   |
    |                                   |     in traditional Z-order     |
    |                                   |                                |
    |                                   |NOTES:                          |
    |                                   |                                |
    |                                   |(1) All sub-block BMVs get      |
    |                                   |replicated for each partition.  |
    |                                   |For example, for a 16x16        |
    |                                   |partition, all smaller sub-block|
    |                                   |BMVs are replicated to the same |
    |                                   |BMV, and for 8x8 partition, each|
    |                                   |8x8 must have its respective    |
    |                                   |sub-block BMVs replicated. This |
    |                                   |is not important to extract the |
    |                                   |component BMVs itself, but is   |
    |                                   |needed if the result of this    |
    |                                   |function is used to initialize  |
    |                                   |the input motion vectors of a   |
    |                                   |REF initialization function.    |
    |                                   |                                |
    |                                   |(2) With interlaced images, the |
    |                                   |MBs for the top field MBs are   |
    |                                   |considered as logically         |
    |                                   |overlapping with the bottom MBs.|
    +-----------------------------------+--------------------------------+
    |ushort intel_sub_group_avc_mce_get_|Get the MCE inter distortions   |
    |inter_distortions(                 |result corresponding to the BMVs|
    |   intel_sub_group_avc_mce_result_t|returned by intel_sub_group_avc_|
    |      result )                     |mce_get_motion_vectors(..). The |
    |                                   |MCE inter directions result     |
    |                                   |returned by intel_sub_group_avc_|
    |                                   |mce_get_inter_directions(..)    |
    |                                   |will specify if the distortion  |
    |                                   |corresponds to the forward MV,  |
    |                                   |backward MV, or the             |
    |                                   |bidirectional MV in the BMV. Up |
    |                                   |to 16 distortions are returned, |
    |                                   |one per work-item.              |
    |                                   |                                |
    |                                   |The distortions have to be      |
    |                                   |selected by their respective    |
    |                                   |work-items based on the result  |
    |                                   |block major and minor shapes    |
    |                                   |just as for the result MVs as   |
    |                                   |described above.                |
    +-----------------------------------+--------------------------------+
    |ushort intel_sub_group_avc_mce_get_|Get the best inter distortion   |
    |best_inter_distortion(             |for the whole MB.               |
    |   intel_sub_group_avc_mce_result_t|                                |
    |      result )                     |                                |
    +-----------------------------------+--------------------------------+
    |uchar intel_sub_group_avc_mce_get_ |Get the MCE inter MB major      |
    |inter_major_shape(                 |partition shape.                |
    |   intel_sub_group_avc_mce_result_t|                                |
    |      result )                     |The returned values are as per  |
    |                                   |the inter-MB major shapes       |
    |                                   |enumeration values.             |
    |                                   |                                |
    |                                   |This can only be called as part |
    |                                   |of an IME or REF operation      |
    |                                   |evaluation.                     |
    +-----------------------------------+--------------------------------+
    |uchar intel_sub_group_avc_mce_get_ |Get the MCE inter MB minor      |
    |inter_minor_shapes(                |partition shapes.               |
    |   intel_sub_group_avc_mce_result_t|                                |
    |      result )                     |It returns a bit field with the |
    |                                   |minor shapes for the 4 8x8      |
    |                                   |sub-partitions in traditional Z |
    |                                   |order. Two bits are reserved for|
    |                                   |each of the four sub-partitions |
    |                                   |in row-major order.  The        |
    |                                   |returned 2-bit values are as per|
    |                                   |the inter-MB minor shapes       |
    |                                   |enumeration values.             |
    |                                   |                                |
    |                                   |This function returns valid     |
    |                                   |results only if the major shape |
    |                                   |is 8x8, otherwise the results   |
    |                                   |are undefined.                  |
    |                                   |                                |
    |                                   |This can only be called as part |
    |                                   |of an IME or REF operation      |
    |                                   |evaluation.                     |
    +-----------------------------------+--------------------------------+
    |uchar intel_sub_group_avc_mce_get_ |Get the MCE inter MB major      |
    |inter_directions(                  |partition directions.           |
    |   intel_sub_group_avc_mce_result_t|                                |
    |      result )                     |It returns a bit field with the |
    |                                   |direction for up to 4 major     |
    |                                   |sub-partitions in traditional Z |
    |                                   |order. Two bits are reserved for|
    |                                   |each of the four                |
    |                                   |sub-partitions. The returned    |
    |                                   |2-bit values are as per the     |
    |                                   |inter-MB major shape direction  |
    |                                   |enumeration values.             |
    |                                   |                                |
    |                                   |If the major partition is:      |
    |                                   |                                |
    |                                   |- 16x16, then bits in the range |
    |                                   |  [0, 1] contains the direction |
    |                                   |- 16x8 or 8x16, then bits in the|
    |                                   |  ranges [0, 1] and [2, 3]      |
    |                                   |  contains the two partitions   |
    |                                   |  directions                    |
    |                                   |- 8x8, then bits in the ranges  |
    |                                   |  [0, 1], [2, 3], [4, 5], and   |
    |                                   |  [6, 7] contains the four      |
    |                                   |  partitions directions         |
    |                                   |                                |
    |                                   |The returned values are as per  |
    |                                   |the inter direction enumeration |
    |                                   |values.                         |
    |                                   |                                |
    |                                   |This can only be called as part |
    |                                   |of an IME or REF operation      |
    |                                   |evaluation.                     |
    +-----------------------------------+--------------------------------+
    |uchar intel_sub_group_avc_mce_get_ |Get the count of motion vectors |
    |inter_motion_vector_count(         |(based on the partitioning      |
    |   intel_sub_group_avc_mce_result_t|decision) returned by the search|
    |      result )                     |operation.                      |
    |                                   |                                |
    |                                   |This can only be called as part |
    |                                   |of an IME or REF operation      |
    |                                   |evaluation.                     |
    +-----------------------------------+--------------------------------+
    |uint intel_sub_group_avc_mce_get_  |Get the MCE inter MB reference  |
    |inter_reference_ids(               |identifiers in a packed integer |
    |   intel_sub_group_avc_mce_result_t|format, with the following bits |
    |      result )                     |specifying the reference        |
    |                                   |identifiers for the major       |
    |                                   |partitions.                     |
    |                                   |                                |
    |                                   |3:0   => Fwd reference block 0  |
    |                                   |7:4   => Bwd reference block 0  |
    |                                   |11:8  => Fwd reference block 1  |
    |                                   |15:12 => Bwd reference block 1  |
    |                                   |19:16 => Fwd reference block 2  |
    |                                   |23:20 => Bwd reference block 2  |
    |                                   |27:24 => Fwd reference block 3  |
    |                                   |31:28 => Bwd reference block 3  |
    |                                   |                                |
    |                                   |The values of each individual   |
    |                                   |4-bit reference identifier range|
    |                                   |from 0 to 15, with each value   |
    |                                   |identifying the distance of     |
    |                                   |ordered pair of forward/backward|
    |                                   |reference images as declared in |
    |                                   |the VME kernel parameter        |
    |                                   |interface list.                 |
    |                                   |                                |
    |                                   |If the dual-reference evaluation|
    |                                   |functions are not used, then the|
    |                                   |values of the backward reference|
    |                                   |identifiers are undefined.      |
    |                                   |                                |
    |                                   |The blocks are numbered using   |
    |                                   |the traditional Z order. For    |
    |                                   |larger block sizes, the         |
    |                                   |sub-block reference identifier  |
    |                                   |pairs are replicated. For       |
    |                                   |example, for a 16x16 block all  |
    |                                   |four pairs of reference         |
    |                                   |identifiers are replicated to   |
    |                                   |the value of the first pair for |
    |                                   |block 0.                        |
    |                                   |                                |
    |                                   |NOTE: Unless HW assisted        |
    |                                   |multi-reference search was      |
    |                                   |performed using the IME         |
    |                                   |streamin/streamout evaluation   |
    |                                   |functions, the individual 4-bit |
    |                                   |reference identifier pair values|
    |                                   |will all be the same (pointing  |
    |                                   |to the same pair for            |
    |                                   |forward/backward reference      |
    |                                   |images).                        |
    +-----------------------------------+--------------------------------+
    |uchar                              |Get the MCE inter MB reference  |
    |intel_sub_group_avc_mce_get_inter_ |field polarities for the        |
    |reference_interlaced_field_        |corresponding reference         |
    |polarities(                        |identifiers returned by         |
    |   uint packed_reference_ids,      |intel_sub_group_avc_            |
    |   uint                            |mce_get_inter_reference_ids(..) |
    |   packed_reference_parameter_     |in a packed integer format, with|
    |   field_polarities,               |the following bits specifying   |
    |   intel_sub_group_avc_mce_result_t|the reference field polarities  |
    |      result )                     |for the major partitions.       |
    |                                   |                                |
    |                                   | 0 : Fwd reference block 0      |
    |                                   | 1 : Fwd reference block 1      |
    |                                   | 2 : Fwd reference block 2      |
    |                                   | 3 : Fwd reference block 3      |
    |                                   | 4 : Bwd reference block 0      |
    |                                   | 5 : Bwd reference block 1      |
    |                                   | 6 : Bwd reference block 2      |
    |                                   | 7 : Bwd reference block 3      |
    |                                   |                                |
    |                                   |If the dual-reference evaluation|
    |                                   |functions are not used, then the|
    |                                   |values of the backward reference|
    |                                   |field polarities are undefined. |
    |                                   |                                |
    |                                   |The blocks are numbered using   |
    |                                   |the traditional Z order. For    |
    |                                   |larger block sizes, the         |
    |                                   |sub-block reference field       |
    |                                   |polarities are replicated. For  |
    |                                   |example, for a 16x16 block all  |
    |                                   |four pairs of reference field   |
    |                                   |polarities are replicated to the|
    |                                   |value of the first pair for     |
    |                                   |block 0.                        |
    |                                   |                                |
    |                                   |The value of                    |
    |                                   |packed_reference_ids is as      |
    |                                   |defined by the return value of  |
    |                                   |intel_sub_group_avc_mce_get_    |
    |                                   |inter_reference_ids(..).        |
    |                                   |                                |
    |                                   |The value of                    |
    |                                   |packed_reference_parameter_     |
    |                                   |field_polarities specifies the  |
    |                                   |packed bit field of field       |
    |                                   |polarities for each of the (up  |
    |                                   |to 16) forward/backward         |
    |                                   |interleaved pairs of reference  |
    |                                   |images in the same order as     |
    |                                   |specified in the kernel         |
    |                                   |parameter list, as used for the |
    |                                   |inter search operation. If less |
    |                                   |than 16 pairs are used then the |
    |                                   |corresponding bit field values  |
    |                                   |are ignored.                    |
    |                                   |                                |
    |                                   |NOTE: An important restriction  |
    |                                   |is that when multiple IME       |
    |                                   |operations are performed for a  |
    |                                   |HW multi-assisted               |
    |                                   |multi-reference search operation|
    |                                   |using the streamin/streamout    |
    |                                   |capabilities, the same reference|
    |                                   |image parameter cannot be used  |
    |                                   |with different polarities in the|
    |                                   |sequence of IME operations used |
    |                                   |for a HW-assisted search        |
    |                                   |operation. In other words, the  |
    |                                   |field polarities for reference  |
    |                                   |image parameters must be used   |
    |                                   |consistently across IME         |
    |                                   |operations used in a HW assisted|
    |                                   |multi-reference search          |
    |                                   |operation.                      |
    +-----------------------------------+--------------------------------+

    ______________________________________________________________________


    IME built-in functions
    ----------------------
    A set of ordered phases of functions are required to be called to
    evaluate an integer motion estimation result.

    Initialization phase functions
    ++++++++++++++++++++++++++++++

    These built-in functions create a properly initialized payload that
    can be used for further configured for evaluating IME operations. This
    is a required initial phase.

    +---------------------------------+----------------------------------+
    |intel_sub_group_avc_ime_payload_t|Return an initialized payload for |
    |intel_sub_group_avc_ime_         |a VME integer search (IME)        |
    |initialize(                      |operation.                        |
    |    ushort2 src_coord,           |                                  |
    |    uchar partition_mask,        |The payload is initialized for    |
    |    uchar sad_adjustment )       |progressive frame operations, and |
    |                                 |the cost configuration values and |
    |                                 |the miscellaneous property values |
    |                                 |are all initialized to zero. The  |
    |                                 |cost configuration and the        |
    |                                 |miscellaneous property            |
    |                                 |configuration phase functions must|
    |                                 |be used to override the initial   |
    |                                 |configurations in the payload.    |
    |                                 |                                  |
    |                                 |The src_coord value represents the|
    |                                 |2D offset of the top left corner  |
    |                                 |of the source MB in pixel units in|
    |                                 |the source image. Source MBs at   |
    |                                 |the image borders are allowed to  |
    |                                 |be partial, but the top-left      |
    |                                 |corner must be within the image.  |
    |                                 |                                  |
    |                                 |If the source image is an         |
    |                                 |interlaced scan image, then the   |
    |                                 |bottom field lines are considered |
    |                                 |as logically overlapping with the |
    |                                 |top field lines (i.e. the top     |
    |                                 |field MBs are considered as       |
    |                                 |logically overlapping with the    |
    |                                 |bottom MBs) for the purposes for  |
    |                                 |specifying the src_coord value.   |
    |                                 |                                  |
    |                                 |The legal values for              |
    |                                 |partition_mask can be composed by |
    |                                 |setting the appropriate bit fields|
    |                                 |specified by partition mask       |
    |                                 |enumeration values using the '&'  |
    |                                 |operator.                         |
    |                                 |                                  |
    |                                 |The legal values for              |
    |                                 |sad_adjustment is specified by its|
    |                                 |respective enumeration values.    |
    |                                 |                                  |
    |                                 |If the sad_adjustment is set to   |
    |                                 |CLK_AVC_ME_SAD_ADJUST_MODE_HAAR_  |
    |                                 |INTEL, a simple wavelet transform,|
    |                                 |Haar transform, is used to refine |
    |                                 |the distortion measure of         |
    |                                 |SAD. Haar transform here is used  |
    |                                 |as a coarse estimation of the     |
    |                                 |integer transform.                |
    +---------------------------------+----------------------------------+

    Configuration phase functions
    +++++++++++++++++++++++++++++

    These built-in functions allow for configuration of the search window.
    A call to either intel_sub_group_avc_ime_set_single_reference(..)  or
    intel_sub_group_avc_ime_set_dual_reference(..) is required. This is a
    required phase immediately following the initialization phase

    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_ime_payload_t   |Update the input payload for a |
    |intel_sub_group_avc_ime_set_        |VME single-reference search    |
    |single_reference(                   |with the configuration for the |
    |   short2 ref_offset,               |reference window search region,|
    |   uchar search_window_config,      |and return it.                 |
    |   intel_sub_group_avc_ime_payload_t|                               |
    |      payload )                     |The 2D ref_offset specifies the|
    |                                    |reference window offset. The X |
    |                                    |and Y coordinates must be in   |
    |                                    |the range [-2048, 2047],       |
    |                                    |otherwise the results are      |
    |                                    |undefined. The reference window|
    |                                    |is allowed to be partially     |
    |                                    |outside the image. Pixel       |
    |                                    |replication is applied to      |
    |                                    |generate out-of-bound reference|
    |                                    |pixels. It is specified in PEL |
    |                                    |units. Results are undefined in|
    |                                    |the reference region is        |
    |                                    |completely outside the image.  |
    |                                    |                               |
    |                                    |If the reference image is an   |
    |                                    |interlaced scan image, then the|
    |                                    |top field lines are considered |
    |                                    |as logically overlapping with  |
    |                                    |the bottom field lines         |
    |                                    |(i.e. the top field MBs are    |
    |                                    |considered as logically        |
    |                                    |overlapping with the bottom    |
    |                                    |MBs) for the purposes for      |
    |                                    |specifying the ref_offset      |
    |                                    |value.                         |
    |                                    |                               |
    |                                    |The parameter                  |
    |                                    |search_window_config must be a |
    |                                    |compile-time constant.         |
    |                                    |                               |
    |                                    |The value of                   |
    |                                    |search_window_config must be   |
    |                                    |one of the unreserved search   |
    |                                    |window configuration           |
    |                                    |enumeration values.            |
    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_ime_payload_t   |Update the input payload for a |
    |intel_sub_group_avc_ime_            |VME dual-reference search with |
    |set_dual_reference(                 |the configurations for the     |
    |   short2 fwd_ref_offset,           |reference window search        |
    |   short2 bwd_ref_offset,           |regions, and return it.        |
    |   uchar search_window_config,      |                               |
    |   intel_sub_group_avc_ime_payload_t|The 2D                         |
    |      payload )                     |fwd_ref_offset/bwd_ref_offset  |
    |                                    |specifies the forward/backward |
    |                                    |reference window offsets. The X|
    |                                    |and Y coordinates must be in   |
    |                                    |the range [-2048, 2047],       |
    |                                    |otherwise the results are      |
    |                                    |undefined. The reference       |
    |                                    |windows are allowed to be      |
    |                                    |partially outside the          |
    |                                    |image. Pixel replication is    |
    |                                    |applied to generate            |
    |                                    |out-of-bound reference         |
    |                                    |pixels. It is specified in PEL |
    |                                    |units.                         |
    |                                    |                               |
    |                                    |If a reference image is an     |
    |                                    |interlaced scan image, then the|
    |                                    |top field lines are considered |
    |                                    |as logically overlapping with  |
    |                                    |the bottom field lines         |
    |                                    |(i.e. the top field MBs are    |
    |                                    |considered as logically        |
    |                                    |overlapping with the bottom    |
    |                                    |MBs) for the purposes for      |
    |                                    |specifying the corresponding   |
    |                                    |fwd_ref_offset and/or          |
    |                                    |bwd_ref_offset values.         |
    |                                    |                               |
    |                                    |The parameter                  |
    |                                    |search_window_config must be a |
    |                                    |compile-time constant and must |
    |                                    |be one of the unreserved search|
    |                                    |window configuration           |
    |                                    |enumeration values.            |
    +------------------------------------+-------------------------------+
    |ushort2 intel_sub_group_avc_ime_    |Get the 2D size of the         |
    |ref_window_size(                    |reference window in pixel      |
    |  uchar search_window_config,       |units.                         |
    |  char dual_ref )                   |                               |
    |                                    |The value of                   |
    |// deprecated                       |search_window_config must be   |
    |ushort2 intel_sub_group_ime_        |one of the unreserved search   |
    |ref_window_size(                    |window configuration           |
    |  uchar search_window_config,       |enumeration values.            |
    |  char dual_ref )                   |                               |
    |                                    |The value of dual_ref must be  |
    |                                    |set to zero for a single       |
    |                                    |reference search window and one|
    |                                    |for a dual-reference search    |
    |                                    |window.                        |
    +------------------------------------+-------------------------------+
    |short2                              |If the input 2D reference      |
    |intel_sub_group_avc_ime_            |window offset, ref_offset,     |
    |adjust_ref_offset(                  |causes the reference window to |
    |   short2 ref_offset,               |be fully out-of-bound of the   |
    |   ushort2 src_coord,               |reference image, adjust it such|
    |   ushort2 ref_window_size,         |that the reference window is   |
    |   ushort2 image_size )             |within bounds of the reference |
    |                                    |image.                         |
    |                                    |                               |
    |                                    |The 2D ref_offset specifies the|
    |                                    |reference window offset. The X |
    |                                    |and Y coordinates must be in   |
    |                                    |the range [-2048, 2047],       |
    |                                    |otherwise the results are      |
    |                                    |undefined.                     |
    |                                    |                               |
    |                                    |The src_coord value represents |
    |                                    |the 2D offset of the top left  |
    |                                    |corner of the source MB in     |
    |                                    |pixel units in the source      |
    |                                    |image. Source MBs at the image |
    |                                    |borders are allowed to be      |
    |                                    |partial. Results are undefined |
    |                                    |if the source MB is completely |
    |                                    |outside the image.             |
    |                                    |                               |
    |                                    |The ref_window_size specifies  |
    |                                    |the 2D size of the reference   |
    |                                    |window in pixel units.         |
    |                                    |                               |
    |                                    |The image_size specifies the 2D|
    |                                    |size of the progressive scan,  |
    |                                    |or top or bottom fields, of the|
    |                                    |interlaced scan image in pixel |
    |                                    |units.                         |
    |                                    |                               |
    |                                    |If the reference image is an   |
    |                                    |interlaced scan image, then the|
    |                                    |bottom field lines are         |
    |                                    |considered as logically        |
    |                                    |overlapping with the top field |
    |                                    |lines the purposes for         |
    |                                    |specifying the image_size      |
    |                                    |value. Since, the actual layout|
    |                                    |of the top and bottom fields in|
    |                                    |the reference image is in an   |
    |                                    |interleaved fashion, the height|
    |                                    |of the top or bottom fields    |
    |                                    |should be exactly half of the  |
    |                                    |actual reference image height. |
    |                                    |                               |
    |                                    |A call to                      |
    |                                    |intel_sub_group_avc_ime_       |
    |                                    |adjust_ref_coord(..)  is       |
    |                                    |optional. It is required only  |
    |                                    |if the reference window offsets|
    |                                    |inputs to                      |
    |                                    |intel_sub_group_avc_ime_set_   |
    |                                    |single_reference(..)  or       |
    |                                    |intel_sub_group_avc_ime_set_   |
    |                                    |dual_reference (..)  is        |
    |                                    |potentially out-of-bounds and  |
    |                                    |need to be adjusted.           |
    +------------------------------------+-------------------------------+

    Payload type conversion functions
    +++++++++++++++++++++++++++++++++

    These are optional built-in functions that may be called following the
    search configuration phase to convert IME payload to MCE payloads and
    vice-versa.

    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_mce_payload_t   |Convert the IME payload to a   |
    |intel_sub_group_avc_ime_convert_    |generic MCE payload.           |
    |to_mce_payload(                     |                               |
    |   intel_sub_group_avc_ime_payload_t|                               |
    |      payload )                     |                               |
    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_ime_payload_t   |Convert the generic MCE payload|
    |intel_sub_group_avc_                |to a IME payload.              |
    |mce_convert_to_ime_payload(         |                               |
    |   intel_sub_group_avc_mce_payload_t|                               |
    |      payload )                     |                               |
    +------------------------------------+-------------------------------+

    Multi-reference cost configuration phase functions
    ++++++++++++++++++++++++++++++++++++++++++++++++++

    These are optional built-in functions that may be called following the
    search configuration phase to enable multi-reference image costing.

    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_ime_payload_t   |This is a wrapper for          |
    |intel_sub_group_avc_ime_set_        |intel_sub_group_avc_mce_       |
    |inter_base_multi_reference_penalty( |set_inter_base_multi_reference_|
    |   uchar reference_base_penalty,    |penalty(..) the payload        |
    |   intel_sub_group_avc_ime_payload_t|conversions with to/from MCE   |
    |      payload )                     |types. See MCE version for     |
    |                                    |description.                   |
    +------------------------------------+-------------------------------+

    Inter shape and direction cost configuration phase functions
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

    These are optional built-in functions that may be called following the
    search configuration phase to enable shape costing.

    +-------------------------------------+------------------------------+
    |intel_sub_group_avc_ime_payload_t    |This is a wrapper for         |
    |intel_sub_group_avc_ime_set_         |intel_sub_group_avc_mce_set_  |
    |inter_shape_penalty(                 |inter_shape_penalty(..)       |
    |   ulong packed_shape_cost,          |with the payload conversions  |
    |   intel_sub_group_avc_ime_payload_t |to/from MCE types. See MCE    |
    |      payload )                      |version for description.      |
    |                                     |                              |
    +-------------------------------------+------------------------------+
    |intel_sub_group_avc_ime_payload_t    |This is a wrapper for         |
    |intel_sub_group_avc_ime_set_         |intel_sub_group_avc_mce_set_  |
    |inter_direction_penalty(             |inter_direction_penalty(..)   |
    |    uchar direction_cost,            |with the payload conversions  |
    |    intel_sub_group_avc_ime_payload_t|to/from MCE types. See MCE    |
    |       payload )                     |version for description.      |
    +-------------------------------------+------------------------------+

    Inter motion vector cost configuration phase functions
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++

    These are optional built-in functions that may be called following the
    search configuration phase to enable motion vector costing.

    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_ime_payload_t   |This is a wrapper for          |
    |intel_sub_group_avc_ime_set_        |intel_sub_group_avc_mce_set_   |
    |motion_vector_cost_function(        |motion_vector_cost_function(..)|
    |   ulong packed_cost_center_delta,  |with the payload conversions   |
    |   uint2 packed_cost_table,         |to/from MCE types. See MCE     |
    |   uchar cost_precision,            |version for description.       |
    |   intel_sub_group_avc_ime_payload_t|                               |
    |      payload )                     |                               |
    +------------------------------------+-------------------------------+

    Miscellaneous property configuration phase functions
    ++++++++++++++++++++++++++++++++++++++++++++++++++++

    These are optional built-in functions that may be called following the
    search configuration phase to enable miscellaneous properties setting
    in the payload.

    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_ime_payload_t   |This is a wrapper for          |
    |intel_sub_group_avc_ime_set_        |intel_sub_group_avc_mce_set_   |
    |source_interlaced_field_polarity(   |source_interlaced_field_       |
    |   uchar src_field_polarity,        |polarity(..)  with the result  |
    |   intel_sub_group_avc_ime_payload_t|conversions to/from MCE        |
    |      payload )                     |types. See MCE version for     |
    |                                    |description.                   |
    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_ime_payload_t   |This is a wrapper for          |
    |intel_sub_group_avc_ime_set_        |intel_sub_group_avc_mce_       |
    |single_reference_interlaced_field_  |set_single_reference_          |
    |polarity(                           |interlaced_field_polarity(..)  |
    |   uchar ref_field_polarity,        |with the result conversions    |
    |   intel_sub_group_avc_ime_payload_t|to/from MCE types. See MCE     |
    |      payload )                     |version for description.       |
    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_ime_payload_t   |This is a wrapper for          |
    |intel_sub_group_avc_ime_set_        |intel_sub_group_avc_mce_set_   |
    |dual_reference_interlaced_field_    |dual_reference_interlaced_     |
    |polarities(                         |field_polarities(..) with the  |
    |   uchar fwd_ref_field_polarity,    |result conversions to/from MCE |
    |   uchar bwd_ref_field_polarity,    |types. See MCE version for     |
    |   intel_sub_group_avc_ime_payload_t|description.                   |
    |      payload )                     |                               |
    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_ime_payload_t   |Specify the maximum number of  |
    |intel_sub_group_avc_ime_set_        |motion vectors allowed for the |
    |max_motion_vector_count(            |current MB. The default setting|
    |   uchar max_motion_vector_count,   |is 32. Any other value may     |
    |   intel_sub_group_avc_ime_payload_t|alter the MB partitioning      |
    |      payload )                     |decision. The IME operation    |
    |                                    |will compute the best allowed  |
    |                                    |partitioning such that the     |
    |                                    |number of sub-block motion     |
    |                                    |vectors will not exceed        |
    |                                    |max_motion_vector_count.       |
    |                                    |                               |
    |                                    |The value of max_motion_vector_|
    |                                    |count(..) specifies the maximum|
    |                                    |number of motion vectors       |
    |                                    |allowed for the current MB. It |
    |                                    |must be in the range [1, 32],  |
    |                                    |otherwise the results are      |
    |                                    |undefined.                     |
    |                                    |                               |
    |                                    |NOTE: This can be used to      |
    |                                    |handle the restriction for     |
    |                                    |certain profiles for AVC in    |
    |                                    |that the maximum number of     |
    |                                    |motion vectors allowed for two |
    |                                    |consecutive MBs can only be 16.|
    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_ime_payload_t   |Update the input payload to    |
    |intel_sub_group_avc_ime_set_        |disable a mix of forward and   |
    |unidirectional_mix_disable(         |backward MVs in the result.    |
    |   intel_sub_group_avc_ime_payload_t|                               |
    |      payload )                     |Default is to enable it.       |
    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_ime_payload_t   |Specifies the threshold value  |
    |intel_sub_group_avc_ime_set_early_  |of a distortion compute of a   |
    |search_termination_threshold(       |16x16 partition of a MB for a  |
    |   uchar threshold,                 |single-reference search, below |
    |   intel_sub_group_avc_ime_payload_t|which no more searching is     |
    |      payload )                     |performed for the MB.          |
    |                                    |                               |
    |                                    |The value of threshold is      |
    |                                    |specified in U4U4 format and   |
    |                                    |the integer value must fit     |
    |                                    |within 14 bits.                |
    |                                    |                               |
    |                                    |The input payload must have    |
    |                                    |been configured for a single-  |
    |                                    |reference search with the 16x16|
    |                                    |partition enabled for this     |
    |                                    |threshold to be set, or else   |
    |                                    |the results are undefined.     |
    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_ime_payload_t   |Set the (16) SAD weights for   |
    |intel_sub_group_avc_ime_set_        |each 4x4 sub-block.            |
    |weighted_sad(                       |                               |
    |   uint packed_sad_weights,         |These values are used to       |
    |   intel_sub_group_avc_ime_payload_t|decrease the SAD magnitude of  |
    |      payload )                     |each 4x4 sub-block by dividing |
    |                                    |the SAD of 4x4 sub-block of the|
    |                                    |source MB by its mapped weight.|
    |                                    |                               |
    |                                    |It requires a partition_mask of|
    |                                    |16x16 and forward search window|
    |                                    |configuration.                 |
    |                                    |                               |
    |                                    |The weighting pattern used is  |
    |                                    |the traditional Z order for    |
    |                                    |each 4x4 block. Weighted-SAD   |
    |                                    |Control Mapping:               |
    |                                    |                               |
    |                                    |   0 1 4 5                     |
    |                                    |   2 3 6 7                     |
    |                                    |   8 9 C D                     |
    |                                    |   A B E F                     |
    |                                    |                               |
    |                                    |Each weight is of 2 bits       |
    |                                    |represented in a packed format |
    |                                    |in packed_sad_weights for each |
    |                                    |of the 4x4 blocks in Z order.  |
    |                                    |                               |
    |                                    |A prior call to                |
    |                                    |intel_sub_group_avc_ime_       |
    |                                    |set_single_reference(..) set up|
    |                                    |for the forward reference image|
    |                                    |is required.                   |
    |                                    |                               |
    |                                    |This feature is mainly intended|
    |                                    |for improved block matching in |
    |                                    |image-rate conversion (FRC)    |
    |                                    |kernels.                       |
    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_ime_payload_t   |This is a wrapper for          |
    |intel_sub_group_avc_ime_set_        |intel_sub_group_avc_mce_set_ac_|
    |ac_only_haar(                       |only_haar(..) with the payload |
    |   intel_sub_group_avc_ime_payload_t|conversions to/from MCE        |
    |      payload )                     |types. See MCE version for     |
    |                                    |description.                   |
    +------------------------------------+-------------------------------+

    Evaluation phase functions
    ++++++++++++++++++++++++++

    These built-in functions perform the evaluation of the IME operation
    configured in the payload with a VME media sampler and return the
    results.

    +-------------------------------------+------------------------------+
    |intel_sub_group_avc_ime_result_t     |Evaluate the basic IME        |
    |intel_sub_group_avc_ime_evaluate_    |operation with a single       |
    |with_single_reference(               |reference and return its      |
    |   read_only image2d_t src_image,    |results. The IME payload must |
    |   read_only image2d_t ref_image,    |have been configured with     |
    |   sampler_t vme_media_sampler,      |intel_sub_group_avc_ime_set_  |
    |   intel_sub_group_avc_ime_payload_t |single_reference(..).         |
    |      payload )                      |                              |
    |                                     |The parameter ref_image must  |
    |                                     |be a valid forward image      |
    |                                     |kernel parameter per the      |
    |                                     |ordering conventions for the  |
    |                                     |kernel parameter list.        |
    +-------------------------------------+------------------------------+
    |intel_sub_group_avc_ime_result_t     |Evaluate the basic IME        |
    |intel_sub_group_avc_ime_evaluate_    |operation with dual reference |
    |with_dual_reference(                 |and return its results. The   |
    |   read_only image2d_t src_image,    |IME payload must have been    |
    |   read_only image2d_t fwd_ref_image,|configured with               |
    |   read_only image2d_t bwd_ref_image,|intel_sub_group_avc_ime_set_  |
    |   sampler_t vme_media_sampler,      |dual_reference(..).           |
    |   intel_sub_group_avc_ime_payload_t |                              |
    |      payload )                      |The parameter fwd_ref_image[  |
    |                                     |bwd_ref_image] must be a valid|
    |                                     |forward[backward] image kernel|
    |                                     |parameter per the ordering    |
    |                                     |conventions for the kernel    |
    |                                     |parameter list.               |
    +-------------------------------------+------------------------------+
    |intel_sub_group_avc_ime_result_      |Evaluate the single reference |
    |single_reference_streamout_t         |IME operation with streamout  |
    |intel_sub_group_avc_ime_evaluate_    |and return its results. The   |
    |with_single_reference_streamout(     |IME payload must have been    |
    |   read_only image2d_t src_image,    |configured with               |
    |   read_only image2d_t ref_image,    |intel_sub_group_avc_ime_set_  |
    |   sampler_t vme_media_sampler,      |single_reference(..).         |
    |   intel_sub_group_avc_ime_payload_t |                              |
    |      payload )                      |The parameter ref_image must  |
    |                                     |be a valid forward image      |
    |                                     |kernel parameter per the      |
    |                                     |ordering conventions for the  |
    |                                     |kernel parameter list.        |
    +-------------------------------------+------------------------------+
    |intel_sub_group_avc_ime_result_      |Evaluate the dual reference   |
    |dual_reference_streamout_t           |IME operation with streamout  |
    |intel_sub_group_avc_ime_evaluate_    |and return its results. The   |
    |with_dual_reference_streamout(       |IME payload must have been    |
    |   read_only image2d_t src_image,    |configured with               |
    |   read_only image2d_t fwd_ref_image,|intel_sub_group_avc_ime_set_  |
    |   read_only image2d_t bwd_ref_image,|dual_reference(..).           |
    |   sampler_t vme_media_sampler,      |                              |
    |   intel_sub_group_avc_ime_payload_t |The parameter fwd_ref_image[  |
    |      payload )                      |bwd_ref_image] must be a valid|
    |                                     |forward[backward] image kernel|
    |                                     |parameter per the ordering    |
    |                                     |conventions for the kernel    |
    |                                     |parameter list.               |
    +-------------------------------------+------------------------------+
    |intel_sub_group_avc_ime_result_t     |Evaluate the single reference |
    |intel_sub_group_avc_ime_evaluate_    |IME operation with streamin   |
    |with_single_reference_streamin(      |and return its results. The   |
    |    read_only image2d_t src_image,   |IME payload must have been    |
    |    read_only image2d_t ref_image,   |configured with               |
    |    sampler_t vme_media_sampler,     |intel_sub_group_avc_ime_set_  |
    |    intel_sub_group_avc_ime_payload_t|single_reference(..).         |
    |       payload,                      |                              |
    |    intel_sub_group_avc_ime_single_  |The parameter ref_image must  |
    |    reference_streamin_t             |be a valid forward image      |
    |       streamin_components )         |kernel parameter per the      |
    |                                     |ordering conventions for the  |
    |                                     |kernel parameter list.        |
    +-------------------------------------+------------------------------+
    |intel_sub_group_avc_ime_result_t     |Evaluate the dual reference   |
    |intel_sub_group_avc_ime_evaluate_    |IME operation with streamin   |
    |with_dual_reference_streamin(        |and return its results. The   |
    |   read_only image2d_t src_image,    |IME payload must have been    |
    |   read_only image2d_t fwd_ref_image,|configured with               |
    |   read_only image2d_t bwd_ref_image,|intel_sub_group_avc_ime_set_  |
    |   sampler_t vme_media_sampler,      |dual_reference(..).           |
    |   intel_sub_group_avc_ime_payload_t |                              |
    |      payload,                       |The parameter fwd_ref_image[  |
    |   intel_sub_group_avc_ime_dual_     |bwd_ref_image] must be a valid|
    |   reference_streamin_t              |forward[backward] image kernel|
    |      streamin_components            |parameter per the ordering    |
    |   )                                 |conventions for the kernel    |
    |                                     |parameter list.               |
    +-------------------------------------+------------------------------+
    |intel_sub_group_avc_ime_result_      |Evaluate the single reference |
    |single_reference_streamout_t         |IME operation with streamin   |
    |intel_sub_group_avc_ime_evaluate_    |and streamout and return its  |
    |with_single_reference_streaminout(   |results. The IME payload must |
    |   read_only image2d_t src_image,    |have been configured with     |
    |   read_only image2d_t ref_image,    |intel_sub_group_avc_ime_set_  |
    |   sampler_t vme_media_sampler,      |single_reference(..).         |
    |   intel_sub_group_avc_ime_payload_t |                              |
    |      payload,                       |The parameter ref_image must  |
    |   intel_sub_group_avc_ime_single_   |be a valid forward image      |
    |   reference_streamin_t              |kernel parameter per the      |
    |      streamin_components            |ordering conventions for the  |
    | )                                   |kernel parameter list.        |
    +-------------------------------------+------------------------------+
    |intel_sub_group_avc_ime_result_      |Evaluate the dual reference   |
    |dual_reference_streamout_t           |IME operation with streamin   |
    |intel_sub_group_avc_ime_evaluate_    |and streamout and return its  |
    |with_dual_reference_streaminout(     |results. The IME payload must |
    |   read_only image2d_t src_image,    |have been configured with     |
    |   read_only image2d_t fwd_ref_image,|intel_sub_                    |
    |   read_only image2d_t bwd_ref_image,|group_avc_ime_set_dual_       |
    |   sampler_t vme_media_sampler,      |reference(..).                |
    |   intel_sub_group_avc_ime_payload_t |                              |
    |      payload,                       |The parameter fwd_ref_image[  |
    |   intel_sub_group_avc_ime_dual_     |bwd_ref_image] must be a valid|
    |   reference_streamin_t              |forward[backward] image kernel|
    |      streamin_components            |parameter per the ordering    |
    | )                                   |conventions for the kernel    |
    |                                     |parameter list.               |
    +-------------------------------------+------------------------------+

    Result type conversion functions
    +++++++++++++++++++++++++++++++++

    These are optional built-in functions that may be called following the
    evaluation phase to convert IME results to MCE results and vice-versa.

    +-----------------------------------+--------------------------------+
    |intel_sub_group_avc_mce_result_t   |Convert the IME result into a   |
    |intel_sub_group_avc_ime_           |MCE result.                     |
    |convert_to_mce_result(             |                                |
    |   intel_sub_group_avc_ime_result_t|                                |
    |      result )                     |                                |
    +-----------------------------------+--------------------------------+
    |intel_sub_group_avc_ime_result_t   |Convert the MCE result into an  |
    |intel_sub_group_avc_mce_           |IME result.                     |
    |convert_to_ime_result(             |                                |
    |   intel_sub_group_avc_mce_result_t|                                |
    |      result )                     |                                |
    +-----------------------------------+--------------------------------+

    Result processing phase functions
    +++++++++++++++++++++++++++++++++

    These built-in functions are called following the evaluation phase to
    extract the various result components from an IME evaluation result.

    +-------------------------------------+------------------------------+
    |intel_sub_group_avc_ime_             |Return the streamed out BMVs  |
    |single_reference_streamin_t          |and distortions from the input|
    |intel_sub_group_avc_ime_             |result from a single reference|
    |get_single_reference_streamin(       |streamout IME operation that  |
    |   intel_sub_group_avc_ime_result_   |can be used as streamin input |
    |   single_reference_streamout_t      |for a subsequent IME          |
    |      result )                       |operation.                    |
    +-------------------------------------+------------------------------+
    |intel_sub_group_avc_ime_             |Return the streamed out BMVs  |
    |dual_reference_streamin_t            |and distortions from the input|
    |intel_sub_group_avc_ime_get_         |result from a dual reference  |
    |dual_reference_streamin(             |streamout IME operation that  |
    |   intel_sub_group_avc_ime_result_   |can be used as streamin input |
    |   dual_reference_streamout_t        |for a subsequent IME          |
    |      result )                       |operation.                    |
    +-------------------------------------+------------------------------+
    |intel_sub_group_avc_ime_result_t     |Strip out the single reference|
    |intel_sub_group_avc_ime_strip_       |streamout BMVs and distortions|
    |single_reference_streamout(          |from the streamout results and|
    |   intel_sub_group_avc_ime_result_   |return the rest.              |
    |   single_reference_streamout_t      |                              |
    |      result )                       |                              |
    +-------------------------------------+------------------------------+
    |intel_sub_group_avc_ime_result_t     |Strip out the dual reference  |
    |intel_sub_group_avc_ime_strip_       |streamout MVs and distortions |
    |dual_reference_streamout(            |from the streamout results and|
    |   intel_sub_group_avc_ime_result_   |return the rest.              |
    |   dual_reference_streamout_t        |                              |
    |      result )                       |                              |
    +-------------------------------------+------------------------------+
    |uint intel_sub_group_avc_ime_get_    |Get the packed motion vectors |
    |streamout_major_shape_motion_vectors(|for the input major shape from|
    |   intel_sub_group_avc_ime_result_   |the IME single reference      |
    |   single_reference_streamout_t      |streamout results.            |
    |      result,                        |                              |
    |   uchar major_shape )               |The parameter major_shape must|
    |                                     |be a valid inter macro-block  |
    |                                     |major shape enumeration value |
    |                                     |and a compile-time constant.  |
    |                                     |                              |
    |                                     |Up to 4 packed MVs are        |
    |                                     |returned, one per             |
    |                                     |work-item. If the major shape |
    |                                     |is:                           |
    |                                     |- 16x6, then one packed MV    |
    |                                     |  is returned by work-item 0  |
    |                                     |- 16x8, or 8x16, then two     |
    |                                     |  packed MVs are returned by  |
    |                                     |  work-items 0 and 1          |
    |                                     |- 8x8, then four packed MVs   |
    |                                     |  are returned by work-items  |
    |                                     |  0 to 3.                     |
    +-------------------------------------+------------------------------+
    |ushort intel_sub_group_avc_ime_get_  |Get the distortions for the   |
    |streamout_major_shape_distortions(   |input major shape from the IME|
    |   intel_sub_group_avc_ime_result_   |single reference streamout    |
    |   single_reference_streamout_t      |results.                      |
    |      result,                        |                              |
    |   uchar major_shape )               |The parameter major_shape must|
    |                                     |be a valid inter macro-block  |
    |                                     |major shape enumeration value |
    |                                     |and a compile-time constant.  |
    |                                     |                              |
    |                                     |Up to 4 distortions are       |
    |                                     |returned, one per work-item in|
    |                                     |the same format as for motion |
    |                                     |vectors as described above.   |
    +-------------------------------------+------------------------------+
    |uchar intel_sub_group_avc_ime_get_   |Get the reference identifiers |
    |streamout_major_shape_reference_ids( |for the input major shape and |
    |   intel_sub_group_avc_ime_result_   |direction from the IME dual   |
    |   single_reference_streamout_t      |reference streamout results.  |
    |      result,                        |                              |
    |   uchar major_shape )               |The parameter major_shape must|
    |                                     |be a valid inter macro-block  |
    |                                     |major shape enumeration value |
    |                                     |and a compile-time constant.  |
    |                                     |                              |
    |                                     |Up to 4 reference identifiers |
    |                                     |are returned, one per         |
    |                                     |work-item in the same format  |
    |                                     |as for motion vectors as      |
    |                                     |described above.              |
    +-------------------------------------+------------------------------+
    |uint intel_sub_group_avc_ime_get_    |Get the packed motion vectors |
    |streamout_major_shape_motion_vectors(|for the input major shape and |
    |   intel_sub_group_avc_ime_result_   |direction from the IME dual   |
    |   dual_reference_streamout_t result,|reference streamout results.  |
    |   uchar major_shape,                |                              |
    |   uchar direction )                 |The parameter major_shape must|
    |                                     |be a valid inter macro-block  |
    |                                     |major shape enumeration value |
    |                                     |and a compile-time constant.  |
    |                                     |                              |
    |                                     |The parameter direction must  |
    |                                     |be a valid unidirectional     |
    |                                     |inter macro-block major       |
    |                                     |direction value and a         |
    |                                     |compile-time constant.        |
    |                                     |                              |
    |                                     |Up to 4 packed MVs are        |
    |                                     |returned, one per work-item in|
    |                                     |the same format as for motion |
    |                                     |vectors for single reference  |
    |                                     |streamout as described above. |
    +-------------------------------------+------------------------------+
    |ushort intel_sub_group_avc_ime_get_  |Get the distortions for the   |
    |streamout_major_shape_distortions(   |input major shape and         |
    |   intel_sub_group_avc_ime_result_   |direction from the IME dual   |
    |   dual_reference_streamout_t result,|reference streamout results.  |
    |   uchar major_shape,                |                              |
    |   uchar direction )                 |The parameter major_shape must|
    |                                     |be a valid inter macro-block  |
    |                                     |major shape enumeration value |
    |                                     |and a compile-time constant.  |
    |                                     |                              |
    |                                     |The parameter direction must  |
    |                                     |be a valid unidirectional     |
    |                                     |inter macro-block major       |
    |                                     |direction value and a         |
    |                                     |compile-time constant.        |
    |                                     |                              |
    |                                     |Up to 4 distortions are       |
    |                                     |returned, one per work-item in|
    |                                     |the same format as for motion |
    |                                     |vectors as described above.   |
    +-------------------------------------+------------------------------+
    |uchar intel_sub_group_avc_ime_get_   |Get the reference identifiers |
    |streamout_major_shape_reference_ids( |for the input major shape and |
    |   intel_sub_group_avc_ime_result_   |direction from the IME dual   |
    |   dual_reference_streamout_t result,|reference streamout results.  |
    |   uchar major_shape,                |                              |
    |   uchar direction )                 |                              |
    +-------------------------------------+------------------------------+
    |uchar intel_sub_group_avc_ime_get_   |Get the bitmask indicating    |
    |border_reached(                      |whether any border of         |
    |   uchar image_select,               |forward/backward reference    |
    |   intel_sub_group_avc_ime_result_t  |image is reached by one or    |
    |      result )                       |more MVs in the winning inter |
    |                                     |shape. The bitmask values are |
    |                                     |as per the inter border       |
    |                                     |reached enumeration values.   |
    |                                     |                              |
    |                                     |The search window must have   |
    |                                     |been configured for a forward |
    |                                     |reference if image_select is  |
    |                                     |set as                        |
    |                                     |CLK_AVC_ME_FRAME_FORWARD_INTEL|
    |                                     |and with a backward reference |
    |                                     |if image_select is set as     |
    |                                     |CLK_AVC_ME_FRAME_BACKWARD_    |
    |                                     |INTEL.                        |
    |                                     |                              |
    |                                     |The value of image_select is  |
    |                                     |either                        |
    |                                     |CLK_AVC_ME_FRAME_FORWARD_INTEL|
    |                                     |or CLK_AVC_ME_FRAME_BACKWARD_ |
    |                                     |INTEL and must be a           |
    |                                     |compile-time constant.        |
    +-------------------------------------+------------------------------+
    |uchar intel_sub_group_avc_ime_get_   |Get the indication that the   |
    |truncated_search_indication(         |search operation was prevented|
    |   intel_sub_group_avc_ime_result_t  |from providing the lowest     |
    |      result )                       |distortion solution due to the|
    |                                     |tighter constraints on the    |
    |                                     |maximum number of MB motion   |
    |                                     |vectors configured for the    |
    |                                     |search operation.             |
    +-------------------------------------+------------------------------+
    |uchar intel_sub_group_avc_ime_get_   |Get the indication that       |
    |unidirectional_early_search_         |unidirectional search         |
    |termination(                         |operation terminated early    |
    |   intel_sub_group_avc_ime_result_t  |because the configured        |
    |      result )                       |distortion threshold was met. |
    +-------------------------------------+------------------------------+
    |uint intel_sub_group_avc_ime_get_    |Get the 16x16 motion vector   |
    |weighting_pattern_minimum_           |corresponding to the minimum  |
    |motion_vector(                       |16x16 distortion when applying|
    |   intel_sub_group_avc_ime_result_t  |the traditional Z-order SAD   |
    |      result )                       |weighting pattern.            |
    |                                     |                              |
    |                                     |This can only be called if a  |
    |                                     |SAD weighting pattern was set |
    |                                     |prior to evaluation using     |
    |                                     |intel_sub_group_avc_ime_set_  |
    |                                     |weighted_sad(..).             |
    |                                     |                              |
    |                                     |The argument for parameter    |
    |                                     |"pattern_id" should be a      |
    |                                     |compile-time constant.        |
    +-------------------------------------+------------------------------+
    |ushort intel_sub_group_avc_ime_get_  |Get the minimum 16x16         |
    |weighting_pattern_minimum_distortion(|distortion when applying the  |
    |   intel_sub_group_avc_ime_result_t  |traditional Z-order SAD       |
    |      result )                       |weighting pattern.            |
    +-------------------------------------+------------------------------+
    |ulong intel_sub_group_avc_ime_get_   |This is a wrapper for         |
    |motion_vectors(                      |intel_sub_group_avc_mce_get_  |
    |   intel_sub_group_avc_ime_result_t  |motion_vectors(..) with the   |
    |      result )                       |result conversions to/from MCE|
    |                                     |types. See MCE version for    |
    |                                     |description.                  |
    +-------------------------------------+------------------------------+
    |ushort intel_sub_group_avc_ime_get_  |This is a wrapper for         |
    |inter_distortions(                   |intel_sub_group_avc_mce_get_  |
    |   intel_sub_group_avc_ime_result_t  |inter_distortions(..) with the|
    |      result )                       |result conversions to/from MCE|
    |                                     |types. See MCE version for    |
    |                                     |description.                  |
    +-------------------------------------+------------------------------+
    |ushort intel_sub_group_avc_ime_get_  |This is a wrapper for         |
    |best_inter_distortion(               |intel_sub_group_avc_mce_get_  |
    |   intel_sub_group_avc_ime_result_t  |best_inter_distortion(..)with |
    |      result )                       |the result conversions to/from|
    |                                     |MCE types. See MCE version for|
    |                                     |description.                  |
    +-------------------------------------+------------------------------+
    |uchar intel_sub_group_avc_ime_get_   |This is a wrapper for         |
    |inter_major_shape(                   |intel_sub_group_avc_mce_get_  |
    |   intel_sub_group_avc_ime_result_t  |inter_major_shape(..) with the|
    |      result )                       |result conversions to/from MCE|
    |                                     |types. See MCE version for    |
    |                                     |description.                  |
    +-------------------------------------+------------------------------+
    |uchar intel_sub_group_avc_ime_get_   |This is a wrapper for         |
    |inter_minor_shapes(                  |intel_sub_group_avc_mce_get_  |
    |   intel_sub_group_avc_ime_result_t  |inter_minor_shapes(..) with   |
    |      result )                       |the result conversions to/from|
    |                                     |MCE types. See MCE version for|
    |                                     |description.                  |
    +-------------------------------------+------------------------------+
    |uchar intel_sub_group_avc_ime_get_   |This is a wrapper for         |
    |inter_directions(                    |intel_sub_group_avc_mce_get_  |
    |   intel_sub_group_avc_ime_result_t  |inter_directions(..)  with the|
    |      result )                       |result conversions to/from MCE|
    |                                     |types. See MCE version for    |
    |                                     |description.                  |
    +-------------------------------------+------------------------------+
    |uchar intel_sub_group_avc_ime_get_   |This is a wrapper for         |
    |inter_motion_vector_count(           |intel_sub_group_avc_mce_get_  |
    |   intel_sub_group_avc_ime_result_t  |inter_motion_vector_count(..) |
    |      result )                       |with the result conversions   |
    |                                     |to/from MCE types. See MCE    |
    |                                     |version for description.      |
    +-------------------------------------+------------------------------+
    |uint intel_sub_group_avc_ime_get_    |This is a wrapper for         |
    |inter_reference_ids(                 |intel_sub_group_avc_mce_get_  |
    |   intel_sub_group_avc_ime_result_t  |inter_reference_ids(..) with  |
    |      result )                       |the result conversions to/from|
    |                                     |MCE types. See MCE version for|
    |                                     |description.                  |
    +-------------------------------------+------------------------------+
    |uchar intel_sub_group_avc_ime_get_   |This is a wrapper for         |
    |inter_reference_interlaced_          |intel_sub_group_avc_mce_get_  |
    |field_polarities(                    |inter_reference_interlaced_   |
    |   uint packed_reference_ids,        |field_polarities(..)  with the|
    |   uint packed_reference_parameter_  |result conversions to/from MCE|
    |   field_polarities,                 |types. See MCE version for    |
    |   intel_sub_group_avc_ime_result_t  |description.                  |
    |      result )                       |                              |
    +-------------------------------------+------------------------------+
    ______________________________________________________________________

    REF built-in functions
    ----------------------

    A set of ordered phases of functions are required to be called to
    evaluate a refinement operation result.

    Initialization phase functions
    ++++++++++++++++++++++++++++++

    These built-in functions create a properly initialized payload that
    can be used for further configured for evaluating REF operations. This
    is a required initial phase. A call to either
    intel_sub_group_avc_fme_initialize or
    intel_sub_group_avc_bme_initialize is required.

    +-----------------------------------+--------------------------------+
    |intel_sub_group_avc_ref_payload_t  |Return an initialized payload   |
    |intel_sub_group_avc_fme_initialize(|for a VME fractional motion     |
    |   ushort2 src_coord,              |estimation operation (FME).     |
    |   ulong motion_vectors,           |                                |
    |   uchar major_shapes,             |The payload is initialized for  |
    |   uchar minor_shapes,             |progressive frame operations,   |
    |   uchar directions,               |and the cost configuration      |
    |   uchar pixel_resolution,         |values and the miscellaneous    |
    |   uchar sad_adjustment )          |property values are all         |
    |                                   |initialized to zero. The cost   |
    |                                   |configuration and the           |
    |                                   |miscellaneous property          |
    |                                   |configuration phase functions   |
    |                                   |must be used to override the    |
    |                                   |initial configurations in the   |
    |                                   |payload.                        |
    |                                   |                                |
    |                                   |The src_coord value represents  |
    |                                   |the 2D offset of the top left   |
    |                                   |corner of the source MB in pixel|
    |                                   |units in the source image.      |
    |                                   |                                |
    |                                   |If the source image is an       |
    |                                   |interlaced scan image, then the |
    |                                   |bottom field lines are          |
    |                                   |considered as logically         |
    |                                   |overlapping with the top field  |
    |                                   |lines (i.e. the top field MBs   |
    |                                   |are considered as logically     |
    |                                   |overlapping with the bottom MBs)|
    |                                   |for the purposes for specifying |
    |                                   |the src_coord value.            |
    |                                   |                                |
    |                                   |The parameter motion_vectors    |
    |                                   |contains the BMVs returned by an|
    |                                   |IME in the same format as       |
    |                                   |returned by                     |
    |                                   |intel_sub_group_avc_mce_get_    |
    |                                   |motion_vectors(..). The MVs are |
    |                                   |in QPEL units. The X and Y      |
    |                                   |coordinates of each MV must be  |
    |                                   |in the range [-2048.00,         |
    |                                   |2047.75), otherwise the results |
    |                                   |are undefined.                  |
    |                                   |                                |
    |                                   |(If this argument value is      |
    |                                   |manually composed, all sub-block|
    |                                   |MVs must be replicated per its  |
    |                                   |format for each partition.  For |
    |                                   |example for 16x16 partition, all|
    |                                   |sub-block MVs must be replicated|
    |                                   |to the same MV, and for 8x8     |
    |                                   |partition, each 8x8 must have   |
    |                                   |its respective sub-block MVs    |
    |                                   |replicated.)                    |
    |                                   |                                |
    |                                   |Legal values and format for     |
    |                                   |major_shapes are as per the     |
    |                                   |return value of                 |
    |                                   |intel_sub_group_avc_mce_get_    |
    |                                   |inter_major_shape(..).          |
    |                                   |                                |
    |                                   |Legal values and format for     |
    |                                   |minor_shapes are as per the     |
    |                                   |return value of                 |
    |                                   |intel_sub_group_avc_mce_get_    |
    |                                   |inter_minor_shapes(..).         |
    |                                   |                                |
    |                                   |Legal values and format for     |
    |                                   |directions are as per the return|
    |                                   |value of                        |
    |                                   |intel_sub_group_avc_mce_get_    |
    |                                   |inter_directions(..).           |
    |                                   |                                |
    |                                   |Legal values for                |
    |                                   |pixel_resolution is either      |
    |                                   |CLK_AVC_ME_SUBPIXEL_MODE_HPEL_  |
    |                                   |INTEL or                        |
    |                                   |CLK_AVC_ME_SUBPIXEL_MODE_QPEL_  |
    |                                   |INTEL.                          |
    |                                   |                                |
    |                                   |The legal values for            |
    |                                   |sad_adjustment is specified by  |
    |                                   |its respective enumeration      |
    |                                   |values.                         |
    +-----------------------------------+--------------------------------+
    |intel_sub_group_avc_ref_payload_t  |Return an initialized payload   |
    |intel_sub_group_avc_bme_initialize(|for a VME bidirectional motion  |
    |   ushort2 src_coord,              |estimation (BME) operation.     |
    |   ulong motion_vectors,           |                                |
    |   uchar major_shapes,             |The payload is initialized for  |
    |   uchar minor_shapes,             |progressive frame operations,   |
    |   uchar directions,               |and the cost configuration      |
    |   uchar pixel_resolution,         |values and the miscellaneous    |
    |   uchar bidirectional_weight,     |property values are all         |
    |   uchar sad_adjustment )          |initialized to zero. The cost   |
    |                                   |configuration and the           |
    |                                   |miscellaneous property          |
    |                                   |configuration phase functions   |
    |                                   |must be used to override the    |
    |                                   |initial configurations in the   |
    |                                   |payload.                        |
    |                                   |                                |
    |                                   |If the specified                |
    |                                   |pixel_resolution is a sub-pixel |
    |                                   |resolution then an implicit FME |
    |                                   |operation is performed with the |
    |                                   |BME operation.                  |
    |                                   |                                |
    |                                   |The src_coord value represents  |
    |                                   |the 2D offset of the top left   |
    |                                   |corner of the source MB in pixel|
    |                                   |units in the source image.      |
    |                                   |                                |
    |                                   |If the source image is an       |
    |                                   |interlaced scan image, then the |
    |                                   |bottom field lines are          |
    |                                   |considered as logically         |
    |                                   |overlapping with the top field  |
    |                                   |lines (i.e. the top field MBs   |
    |                                   |are considered as logically     |
    |                                   |overlapping with the bottom MBs)|
    |                                   |for the purposes for specifying |
    |                                   |the src_coord value.            |
    |                                   |                                |
    |                                   |The parameter motion_vectors    |
    |                                   |contains the BMVs returned by an|
    |                                   |IME in the same format as       |
    |                                   |returned by                     |
    |                                   |intel_sub_group_avc_mce_get_    |
    |                                   |motion_vectors(..).             |
    |                                   |                                |
    |                                   |(If this argument value is      |
    |                                   |manually composed, all sub-block|
    |                                   |MVs must be replicated per its  |
    |                                   |format for each partition.  For |
    |                                   |example for 16x16 partition, all|
    |                                   |sub-block MVs must be replicated|
    |                                   |to the same MV, and for 8x8     |
    |                                   |partition, each 8x8 must have   |
    |                                   |its respective sub-block MVs    |
    |                                   |replicated.)                    |
    |                                   |                                |
    |                                   |Legal values and format for     |
    |                                   |major_shapes is as per the      |
    |                                   |return value of                 |
    |                                   |intel_sub_group_avc_mce_get_    |
    |                                   |inter_major_shape(..).          |
    |                                   |                                |
    |                                   |Legal values and format for     |
    |                                   |minor_shapes is as per the      |
    |                                   |return value of                 |
    |                                   |intel_sub_group_avc_mce_get_    |
    |                                   |inter_minor_shapes(..).         |
    |                                   |                                |
    |                                   |Legal values and format for     |
    |                                   |directions are as per the return|
    |                                   |value of                        |
    |                                   |intel_sub_group_avc_mce_get_    |
    |                                   |inter_directions(..).           |
    |                                   |                                |
    |                                   |Legal values for                |
    |                                   |pixel_resolution is either      |
    |                                   |CLK_AVC_ME_SUBPIXEL_MODE_       |
    |                                   |INTEGER_INTEL,                  |
    |                                   |CLK_AVC_ME_SUBPIXEL_MODE_       |
    |                                   |HPEL_INTEL or                   |
    |                                   |CLK_AVC_ME_SUBPIXEL_MODE_       |
    |                                   |QPEL_INTEL.                     |
    |                                   |                                |
    |                                   |Legal values for                |
    |                                   |bidirectional_weight and sad    |
    |                                   |adjustment is as per their      |
    |                                   |respective enumeration values.  |
    |                                   |                                |
    |                                   |The legal values for            |
    |                                   |sad_adjustment is specified by  |
    |                                   |its respective enumeration      |
    |                                   |values.                         |
    +-----------------------------------+--------------------------------+

    Payload type conversion functions
    +++++++++++++++++++++++++++++++++

    These are optional built-in functions that may be called following the
    initialization phase to convert REF payload to MCE payloads and
    vice-versa.

    +-------------------------------------+------------------------------+
    |intel_sub_group_avc_mce_payload_t    |Convert the REF payload to a  |
    |intel_sub_group_avc_ref_             |generic ME payload.           |
    |convert_to_mce_payload(              |                              |
    |   intel_sub_group_avc_ref_payload_t |                              |
    |      payload )                      |                              |
    +-------------------------------------+------------------------------+
    |intel_sub_group_avc_ref_payload_t    |Convert the generic ME payload|
    |intel_sub_group_avc_mce_             |to an FBR payload.            |
    |convert_to_ref_payload(              |                              |
    |   intel_sub_group_avc_mce_payload_t |                              |
    |      payload )                      |                              |
    +-------------------------------------+------------------------------+

    Multi-reference cost configuration phase functions
    ++++++++++++++++++++++++++++++++++++++++++++++++++

    These are optional built-in functions that may be called following the
    initialization phase to enable multi-reference image costing.

    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_ref_payload_t   |This is a wrapper for          |
    |intel_sub_group_avc_ref_set_        |intel_sub_group_avc_mce_       |
    |inter_base_multi_reference_penalty( |set_inter_base_multi_reference_|
    |   uchar reference_base_penalty,    |penalty(..) the payload        |
    |   intel_sub_group_avc_ref_payload_t|conversions with to/from MCE   |
    |      payload )                     |types. See MCE version for     |
    |                                    |description.                   |
    +------------------------------------+-------------------------------+

    Inter shape and direction cost configuration phase functions
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

    These are optional built-in functions that may be called following the
    initialization phase to enable shape costing.
    +-------------------------------------+------------------------------+
    |intel_sub_group_avc_ref_payload_t    |This is a wrapper for         |
    |intel_sub_group_avc_ref_set_         |intel_sub_group_avc_mce_set_  |
    |inter_shape_penalty(                 |inter_shape_penalty(..) with  |
    |   ulong packed_shape_cost,          |the payload conversions       |
    |   intel_sub_group_avc_ref_payload_t |to/from MCE types. See MCE    |
    |      payload )                      |version for description.      |
    +-------------------------------------+------------------------------+
    |intel_sub_group_avc_ref_payload_t    |This is a wrapper for         |
    |intel_sub_group_avc_ref_set_         |intel_sub_group_avc_mce_set_  |
    |inter_direction_penalty(             |inter_direction_penalty(..)   |
    |   uchar direction_cost,             |with the payload conversions  |
    |   intel_sub_group_avc_ref_payload_t |to/from MCE types. See MCE    |
    |       payload )                     |version for description.      |
    +-------------------------------------+------------------------------+


    Inter motion vector cost configuration phase functions
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++

    These are optional built-in functions that may be called following the
    initialization phase to enable motion vector costing.

    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_ref_payload_t   |This is a wrapper for          |
    |intel_sub_group_avc_ref_set_        |intel_sub_group_avc_mce_set_   |
    |motion_vector_cost_function(        |motion_vector_cost_function    |
    |   ulong packed_cost_center_delta,  |with the payload conversions   |
    |   uint2 packed_cost_table,         |to/from MCE types. See MCE     |
    |   uchar cost_precision,            |version for description.       |
    |   intel_sub_group_avc_ref_payload_t|                               |
    |      payload )                     |                               |
    +------------------------------------+-------------------------------+

    Miscellaneous property configuration phase functions
    ++++++++++++++++++++++++++++++++++++++++++++++++++++

    These are optional built-in functions that may be called following the
    initialization phase to enable miscellaneous properties setting in the
    payload.

    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_ref_payload_t   |This is a wrapper for          |
    |intel_sub_group_avc_ref_set_        |intel_sub_group_avc_mce_set_   |
    |source_interlaced_field_polarity(   |source_interlaced_field_       |
    |   uchar src_field_polarity,        |polarity(..)  with the result  |
    |   intel_sub_group_avc_ref_payload_t|conversions to/from MCE        |
    |      payload )                     |types. See MCE version for     |
    |                                    |description.                   |
    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_ref_payload_t   |This is a wrapper for          |
    |intel_sub_group_avc_ref_set_        |intel_sub_group_avc_mce_set_   |
    |single_reference_interlaced_        |single_reference_              |
    |field_polarity(                     |interlaced_field_polarity(..)  |
    |   uchar ref_field_polarity,        |with the result conversions    |
    |   intel_sub_group_avc_ref_payload_t|to/from MCE types. See MCE     |
    |      payload )                     |version for description.       |
    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_ref_payload_t   |This is a wrapper for          |
    |intel_sub_group_avc_ref_set_        |intel_sub_group_avc_mce_set_   |
    |dual_reference_interlaced_          |dual_reference_interlaced_     |
    |field_polarities(                   |field_polarities(..)  with the |
    |   uchar fwd_ref_field_polarity,    |result conversions to/from MCE |
    |   uchar bwd_ref_field_polarity,    |types. See MCE version for     |
    |   intel_sub_group_avc_ref_payload_t|description.                   |
    |      payload )                     |                               |
    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_ref_payload_t   |Update the input payload to    |
    |intel_sub_group_avc_ref_set_        |disable a mix of bidirectional |
    |bidirectional_mix_disable(          |and unidirectional MVs in the  |
    |   intel_sub_group_avc_ref_payload_t|result.                        |
    |      payload )                     |                               |
    |                                    |Default is to enable it.       |
    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_ref_payload_t   |Update the input payload to do |
    |intel_sub_group_avc_ref_set_        |enable bilinear filter         |
    |bilinear_filter_enable(             |interpolation instead of 4-tap |
    |   intel_sub_group_avc_ref_payload_t|filter interpolation. Default  |
    |      payload )                     |is 4-tap filter interpolation. |
    |                                    |                               |
    |                                    |This should not be called if   |
    |                                    |the payload was initialized    |
    |                                    |with integer pixel resolution. |
    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_ref_payload_t   |This is a wrapper for          |
    |intel_sub_group_avc_ref_set_        |intel_sub_group_avc_mce_set_   |
    |ac_only_haar(                       |ac_only_haar(..) with the      |
    |   intel_sub_group_avc_ref_payload_t|payload conversions to/from MCE|
    |      payload )                     |types. See MCE version for     |
    |                                    |description.                   |
    +------------------------------------+-------------------------------+

    Evaluation phase functions
    ++++++++++++++++++++++++++

    These built-in functions perform the evaluation of the REF operation
    configured in the payload with a VME media sampler and return the
    results.

    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_ref_result_t    |Evaluate the basic REF         |
    |intel_sub_group_avc_ref_            |operation with single reference|
    |evaluate_with_single_reference(     |and return its results.        |
    |   read_only image2d_t src_image,   |                               |
    |   read_only image2d_t ref_image,   |The parameter ref_image must be|
    |   sampler_t vme_media_sampler,     |a valid forward image kernel   |
    |   intel_sub_group_avc_ref_payload_t|parameter per the ordering     |
    |      payload )                     |conventions for the kernel     |
    |                                    |parameter list.                |
    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_ref_result_t    |Evaluate the basic REF         |
    |intel_sub_group_avc_ref_            |operation with dual reference  |
    |evaluate_with_dual_reference(       |and return its results.        |
    |  read_only image2d_t src_image,    |                               |
    |  read_only image2d_t fwd_ref_image,|The parameter                  |
    |  read_only image2d_t bwd_ref_image,|fwd_ref_image[bwd_ref_image]   |
    |  sampler_t vme_media_sampler,      |must be a valid                |
    |  intel_sub_group_avc_ref_payload_t |forward[backward] image kernel |
    |     payload )                      |parameter per the ordering     |
    |                                    |conventions for the kernel     |
    |                                    |parameter list.                |
    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_ref_result_t    |Evaluate the basic REF         |
    |intel_sub_group_avc_ref_            |operation with multi references|
    |evaluate_with_multi_reference(      |and return its results.        |
    |   read_only image2d_t src_image,   |                               |
    |   uint packed_reference_ids,       |A unique pair of reference     |
    |   sampler_t vme_media_sampler,     |identifiers (indicating unique |
    |   intel_sub_group_avc_ref_payload_t|forward/backward reference     |
    |      payload )                     |images) may be specified for   |
    |                                    |each of the allowed major      |
    |                                    |partitions using               |
    |                                    |packed_dual_ref_ids.           |
    |                                    |                               |
    |                                    |The value of                   |
    |                                    |packed_reference_ids is a      |
    |                                    |integer with the following bits|
    |                                    |specifying the values for the  |
    |                                    |pair of reference images for   |
    |                                    |each major partition.          |
    |                                    |                               |
    |                                    |3:0   => Fwd reference block 0 |
    |                                    |7:4   => Bwd reference block 0 |
    |                                    |11:8  => Fwd reference block 1 |
    |                                    |15:12 => Bwd reference block 1 |
    |                                    |19:16 => Fwd reference block 2 |
    |                                    |23:20 => Bwd reference block 2 |
    |                                    |27:24 => Fwd reference block 3 |
    |                                    |31:28 => Bwd reference block 3 |
    |                                    |                               |
    |                                    |A forward[backward] reference  |
    |                                    |idenitifer value of 'n'        |
    |                                    |indicates the forward[backward]|
    |                                    |image from the 'n'th pair of   |
    |                                    |forward/backward reference     |
    |                                    |images, with the value of 'n'  |
    |                                    |ranging from 0 to 15.          |
    |                                    |                               |
    |                                    |If the REF operation is        |
    |                                    |configured with only forward   |
    |                                    |reference images then, the     |
    |                                    |values of the backward         |
    |                                    |reference identifiers are not  |
    |                                    |used.                          |
    |                                    |                               |
    |                                    |The blocks are numbered using  |
    |                                    |the traditional Z order. For   |
    |                                    |larger block sizes, the        |
    |                                    |sub-block reference identifier |
    |                                    |pairs must be replicated. For  |
    |                                    |example, for a 16x16 block, all|
    |                                    |four pair of reference         |
    |                                    |identifiers must be replicated |
    |                                    |to the value of the first pair |
    |                                    |for block 0.                   |
    |                                    |                               |
    |                                    |The value for the              |
    |                                    |packed_reference_ids argument  |
    |                                    |is obtained by calling         |
    |                                    |intel_sub_group_avc_ime_get_   |
    |                                    |inter_reference_ids(..)  for   |
    |                                    |the preceding IME operation's  |
    |                                    |result.                        |
    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_ref_result_t    |Evaluate the basic REF         |
    |intel_sub_group_avc_ref_            |operation with multi references|
    |evaluate_with_multi_reference(      |and return its results.  This  |
    |   read_only image2d_t src_image,   |is used for interlaced source  |
    |   uint packed_reference_ids,       |and reference images.          |
    |   uchar packed_reference_          |                               |
    |   field_polarities,                |A unique pair of reference     |
    |   sampler_t vme_media_sampler,     |identifiers (indicating unique |
    |   intel_sub_group_avc_ref_payload_t|forward/backward reference     |
    |      payload )                     |images) may be specified for   |
    |                                    |each of the allowed major      |
    |                                    |partitions using               |
    |                                    |packed_reference_ids.          |
    |                                    |                               |
    |                                    |The value of                   |
    |                                    |packed_reference_ids is an     |
    |                                    |integer with the following bits|
    |                                    |specifying the values for the  |
    |                                    |pair of reference images for   |
    |                                    |each major partition.          |
    |                                    |                               |
    |                                    |3:0   => Fwd reference block 0 |
    |                                    |7:4   => Bwd reference block 0 |
    |                                    |11:8  => Fwd reference block 1 |
    |                                    |15:12 => Bwd reference block 1 |
    |                                    |19:16 => Fwd reference block 2 |
    |                                    |23:20 => Bwd reference block 2 |
    |                                    |27:24 => Fwd reference block 3 |
    |                                    |31:28 => Bwd reference block 3 |
    |                                    |                               |
    |                                    |A forward[backward] reference  |
    |                                    |idenitifer value of 'n'        |
    |                                    |indicates the forward[backward]|
    |                                    |image from the 'n'th pair of   |
    |                                    |forward/backward reference     |
    |                                    |images, with the value of 'n'  |
    |                                    |ranging from 0 to 15.          |
    |                                    |                               |
    |                                    |If the REF operation is        |
    |                                    |configured with only forward   |
    |                                    |reference images then, the     |
    |                                    |values of the backward         |
    |                                    |reference identifiers are not  |
    |                                    |used.                          |
    |                                    |                               |
    |                                    |The blocks are numbered using  |
    |                                    |the traditional Z order. For   |
    |                                    |larger block sizes, the        |
    |                                    |sub-block reference identifier |
    |                                    |pairs must be replicated. For  |
    |                                    |example, for a 16x16 block, all|
    |                                    |four pair of reference         |
    |                                    |identifiers must be replicated |
    |                                    |to the value of the first pair |
    |                                    |for block 0.                   |
    |                                    |                               |
    |                                    |The value for the              |
    |                                    |packed_reference_ids argument  |
    |                                    |is obtained by calling         |
    |                                    |intel_sub_group_avc_ime_get_   |
    |                                    |inter_reference_ids(..)  for   |
    |                                    |the preceding IME operation's  |
    |                                    |result.                        |
    |                                    |                               |
    |                                    |Reference field polarities for |
    |                                    |forward and backward reference |
    |                                    |images are specified for each  |
    |                                    |of the allowed major partitions|
    |                                    |using packed_reference_field_  |
    |                                    |polarities.                    |
    |                                    |                               |
    |                                    |The value of                   |
    |                                    |packed_reference_field_        |
    |                                    |polarities is an integer with  |
    |                                    |the following bits specifying  |
    |                                    |the reference field polarities |
    |                                    |for the major partitions.      |
    |                                    |                               |
    |                                    |0 : Fwd reference block 0      |
    |                                    |1 : Fwd reference block 1      |
    |                                    |2 : Fwd reference block 2      |
    |                                    |3 : Fwd reference block 3      |
    |                                    |4 : Bwd reference block 0      |
    |                                    |5 : Bwd reference block 1      |
    |                                    |6 : Bwd reference block 2      |
    |                                    |7 : Bwd reference block 3      |
    |                                    |                               |
    |                                    |If the dual-reference          |
    |                                    |evaluation functions are not   |
    |                                    |used, then the values of the   |
    |                                    |backward reference field       |
    |                                    |polarities are not used.       |
    |                                    |                               |
    |                                    |The blocks are numbered using  |
    |                                    |the traditional Z order. For   |
    |                                    |larger block sizes, the        |
    |                                    |sub-block reference field      |
    |                                    |polarities are replicated. For |
    |                                    |example, for a 16x16 block all |
    |                                    |four pairs of reference field  |
    |                                    |polarities are replicated to   |
    |                                    |the value of the first pair for|
    |                                    |block 0.                       |
    +------------------------------------+-------------------------------+

    Result type conversion functions
    +++++++++++++++++++++++++++++++++

    These are optional built-in functions that may be called following the
    evaluation phase to convert REF results to MCE results and vice-versa.

    +-----------------------------------+--------------------------------+
    |intel_sub_group_avc_mce_result_t   |Convert the REF result into a   |
    |intel_sub_group_avc_ref_           |MCE result.                     |
    |convert_to_mce_result(             |                                |
    |   intel_sub_group_avc_ref_result_t|                                |
    |      result )                     |                                |
    +-----------------------------------+--------------------------------+
    |intel_sub_group_avc_ref_result_t   |Convert the MCE result into an  |
    |intel_sub_group_avc_mce_           |REF result.                     |
    |convert_to_ref_result(             |                                |
    |   intel_sub_group_avc_mce_result_t|                                |
    |      result )                     |                                |
    +-----------------------------------+--------------------------------+

    Result processing phase functions
    +++++++++++++++++++++++++++++++++

    These built-in functions are called following the evaluation phase to
    extract the various result components from an REF evaluation result.

    +-------------------------------------+------------------------------+
    |ulong intel_sub_group_avc_ref_get_   |This is a wrapper for         |
    |motion_vectors(                      |intel_sub_group_avc_mce_get_  |
    |   intel_sub_group_avc_ref_result_t  |motion_vectors(..) with the   |
    |      result )                       |result conversions to/from MCE|
    |                                     |types. See MCE version for    |
    |                                     |description.                  |
    +-------------------------------------+------------------------------+
    |ushort intel_sub_group_avc_ref_get_  |This is a wrapper for         |
    |inter_distortions(                   |intel_sub_group_avc_mce_get_  |
    |   intel_sub_group_avc_ref_result_t  |inter_distortions(..) with the|
    |      result )                       |result conversions to/from MCE|
    |                                     |types. See MCE version for    |
    |                                     |description.                  |
    +-------------------------------------+------------------------------+
    |ushort intel_sub_group_avc_ref_get_  |This is a wrapper for         |
    |best_inter_distortion(               |intel_sub_group_avc_mce_get_  |
    |   intel_sub_group_avc_ref_result_t  |best_inter_distortion(..)with |
    |      result )                       |the result conversions to/from|
    |                                     |MCE types. See MCE version for|
    |                                     |description.                  |
    +-------------------------------------+------------------------------+
    |uchar intel_sub_group_avc_ref_get_   |This is a wrapper for         |
    |inter_major_shape(                   |intel_sub_group_avc_mce_get_  |
    |   intel_sub_group_avc_ref_result_t  |inter_major_shape(..) with the|
    |      result )                       |result conversions to/from MCE|
    |                                     |types. See MCE version for    |
    |                                     |description.                  |
    +-------------------------------------+------------------------------+
    |uchar intel_sub_group_avc_ref_get_   |This is a wrapper for         |
    |inter_minor_shapes(                  |intel_sub_group_avc_mce_get_  |
    |   intel_sub_group_avc_ref_result_t  |inter_minor_shapes(..) with   |
    |      result )                       |the result conversions to/from|
    |                                     |MCE types. See MCE version for|
    |                                     |description.                  |
    +-------------------------------------+------------------------------+
    |uchar intel_sub_group_avc_ref_get_   |This is a wrapper for         |
    |inter_directions(                    |intel_sub_group_avc_mce_get_  |
    |   intel_sub_group_avc_ref_result_t  |inter_directions(..)  with the|
    |      result )                       |result conversions to/from MCE|
    |                                     |types. See MCE version for    |
    |                                     |description.                  |
    +-------------------------------------+------------------------------+
    |uchar intel_sub_group_avc_ref_get_   |This is a wrapper for         |
    |inter_motion_vector_count(           |intel_sub_group_avc_mce_get_  |
    |   intel_sub_group_avc_ref_result_t  |inter_motion_vector_count(..) |
    |      result )                       |with the result conversions   |
    |                                     |to/from MCE types. See MCE    |
    |                                     |version for description.      |
    +-------------------------------------+------------------------------+
    |uint intel_sub_group_avc_ref_get_    |This is a wrapper for         |
    |inter_reference_ids(                 |intel_sub_group_avc_mce_get_  |
    |   intel_sub_group_avc_ref_result_t  |inter_reference_ids(..) with  |
    |      result )                       |the result conversions to/from|
    |                                     |MCE types. See MCE version for|
    |                                     |description.                  |
    +-------------------------------------+------------------------------+
    |uchar intel_sub_group_avc_ref_get_   |This is a wrapper for         |
    |inter_reference_interlaced_          |intel_sub_group_avc_mce_get_  |
    |field_polarities(                    |inter_reference_interlaced_   |
    |   uint packed_reference_ids,        |field_polarities(..)  with the|
    |   uint packed_reference_parameter_  |result conversions to/from MCE|
    |   field_polarities,                 |types. See MCE version for    |
    |   intel_sub_group_avc_ref_result_t  |description.                  |
    |      result )                       |                              |
    +-------------------------------------+------------------------------+

    ______________________________________________________________________


    SIC built-in functions
    ----------------------

    A set of ordered phases of functions are required to be called to
    evaluate a skip check or intra estimation operation result.

    Initialization phase functions
    ++++++++++++++++++++++++++++++

    These built-in functions create a properly initialized payload that
    can be used for further configured for evaluating SIC operations. This
    is a required initial phase.

    +-----------------------------------+--------------------------------+
    |intel_sub_group_avc_sic_payload_t  |Return an initialized payload   |
    |intel_sub_group_avc_sic_initialize(|for a VME SIC operation.        |
    |   ushort2 src_coord )             |                                |
    |                                   |                                |
    |                                   |If the source image is an       |
    |                                   |interlaced scan image, then the |
    |                                   |bottom field lines are          |
    |                                   |considered as logically         |
    |                                   |overlapping with the top field  |
    |                                   |lines (i.e. the top field MBs   |
    |                                   |are considered as logically     |
    |                                   |overlapping with the bottom MBs)|
    |                                   |for the purposes for specifying |
    |                                   |the src_coord value.            |
    |                                   |                                |
    |                                   |If the SIC operation is being   |
    |                                   |configured for chroma based     |
    |                                   |intra estimation, then the x and|
    |                                   |y coordinates of src_coord must |
    |                                   |be multiples of 2.              |
    +-----------------------------------+--------------------------------+

    Configuration phase functions
    +++++++++++++++++++++++++++++

    These built-in functions allow for configuration of a skip check
    and/or intra estimation operation. This is a required phase following
    the initialization phase to configure the skip check or intra
    estimation operation. A call to intel_sub_group_avc_sic_configure_
    skc(..) or intel_sub_group_avc_sic_configure_ipe(..) is required. Both
    intel_sub_group_avc_sic_configure_skc(..) and intel_sub_group_avc_
    sic_configure_ipe(..) may be called to initialize the payload to
    perform both skip checks and intra estimation as part for the same SIC
    evaluation, but if called together intel_sub_group_avc_sic_configure_
    skc(..) must be called first in the call sequence.

    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_sic_payload_t   |Configure the SIC payload for  |
    |intel_sub_group_avc_sic_            |(uni or bi-directional) skip   |
    |configure_skc(                      |checks.                        |
    |   uint skip_block_partition_type,  |                               |
    |   uint skip_motion_vector_mask,    |The legal values for           |
    |   ulong motion_vectors,            |skip_block_partition_type must |
    |   uchar bidirectional_weight,      |be one of the specified        |
    |   uchar skip_sad_adjustment,       |partition mask enumeration     |
    |   intel_sub_group_avc_sic_payload_t|values.                        |
    |     payload )                      |                               |
    |                                    |Legal values for               |
    |                                    |skip_motion_vector_mask can be |
    |                                    |composed using the '|' operator|
    |                                    |from the enumeration values    |
    |                                    |defined for it; both           |
    |                                    |unidirectional and             |
    |                                    |bidirectional skip vectors can |
    |                                    |be specified uniquely for each |
    |                                    |major partition (16x16 or 8x8) |
    |                                    |by an appropriate selection of |
    |                                    |the skip motion vector mask    |
    |                                    |enumeration values. If the     |
    |                                    |16x16 skip_block_partition_type|
    |                                    |is specified, then only the    |
    |                                    |16x16 enumeration values may be|
    |                                    |used, else only the 8x8        |
    |                                    |enumeration values may be used.|
    |                                    |                               |
    |                                    |For convenience, one of the    |
    |                                    |following two mechanisms can be|
    |                                    |used to obtain the value for   |
    |                                    |the skip_motion_vector_mask    |
    |                                    |argument.                      |
    |                                    |                               |
    |                                    |If the shape and directions are|
    |                                    |all compile-time constants,    |
    |                                    |then either one of the two     |
    |                                    |following macros may be used to|
    |                                    |set this argument's value for  |
    |                                    |16x16 or 8x8 major partitions  |
    |                                    |respectively:                  |
    |                                    |- CLK_AVC_ME_SKIP_BLOCK_16x16_ |
    |                                    |  INTEL(DIRECTION)             |
    |                                    |- CLK_AVC_ME_SKIP_BLOCK_8x8_   |
    |                                    |  INTEL(DIRECTION0, DIRECTION1,|
    |                                    |  DIRECTION2, DIRECTION3)      |
    |                                    |                               |
    |                                    |The DIRECTION{0-3} argument    |
    |                                    |values must be one of the inter|
    |                                    |macro-block major direction    |
    |                                    |values. The directions are     |
    |                                    |specified for each major       |
    |                                    |partition in traditional       |
    |                                    |Z-order.                       |
    |                                    |                               |
    |                                    |If the shape or directions are |
    |                                    |not compile-time constants,    |
    |                                    |then the helper function       |
    |                                    |intel_sub_group_avc_sic_get_   |
    |                                    |motion_vector_mask(..)  may be |
    |                                    |used to set this argument's    |
    |                                    |value.                         |
    |                                    |                               |
    |                                    |The parameter motion_vectors   |
    |                                    |specifies the input packed     |
    |                                    |BMVs. Either the forward or    |
    |                                    |backward is ignored if the     |
    |                                    |setting in                     |
    |                                    |skip_motion_vector_mask is     |
    |                                    |backward or forward            |
    |                                    |respectively. If the setting is|
    |                                    |bidirectional, then both the   |
    |                                    |forward and backward motion    |
    |                                    |vectors will be used.  If the  |
    |                                    |skip_block_partition_type is   |
    |                                    |16x16, work-item 0 in the      |
    |                                    |subgroup provides the BMV, and |
    |                                    |if the                         |
    |                                    |skip_block_partition_type is   |
    |                                    |8x8, work-items 0 to 4 in the  |
    |                                    |subgroup provide the four      |
    |                                    |BMVs. The MVs are in QPEL      |
    |                                    |units. The X and Y coordinates |
    |                                    |of each MV must be in the range|
    |                                    |[-2048.00, 2047.75] and        |
    |                                    |[-512.00 to 511.75]            |
    |                                    |respectively, otherwise the    |
    |                                    |results are undefined.         |
    |                                    |                               |
    |                                    |Legal values for               |
    |                                    |bidirectional_weight and       |
    |                                    |skip_sad_adjustment are as per |
    |                                    |their respective enumeration   |
    |                                    |values. If the setting is      |
    |                                    |unidirectional, then the       |
    |                                    |bidirectional_weight parameter |
    |                                    |value is ignored and can be set|
    |                                    |to the value '0 '.             |
   +------------------------------------+-------------------------------+
    |intel_sub_group_avc_sic_payload_t   |Return an initialized payload  |
    |intel_sub_group_avc_sic_            |for a VME luma intra prediction|
    |configure_ipe(                      |estimation (IPE) operation.    |
    |   uchar luma_intra_partition_mask, |                               |
    |   uchar                            |The luma_intra_partition_mask  |
    |   intra_neighbour_availabilty,     |can be composed from their     |
    |   uchar left_edge_luma_pixels,     |respective enumeration values  |
    |   uchar                            |using the '&' operator.        |
    |   upper_left_corner_luma_pixel,    |                               |
    |   uchar upper_edge_luma_pixels,    |Legal values for               |
    |   uchar                            |intra_neighbour_availabilty and|
    |   upper_right_edge_luma_pixels,    |intra_sad_adjustment are as per|
    |   uchar intra_sad_adjustment ,     |their respective enumeration   |
    |   intel_sub_group_avc_sic_payload_t|values, based on the intra     |
    |      payload )                     |neighboring macroblock's to    |
    |                                    |consider in intra mode         |
    |                                    |estimation.                    |
    |                                    |                               |
    |                                    |The other parameters specify   |
    |                                    |the neighbor edge pixels for   |
    |                                    |the left, top-left corner, top |
    |                                    |and top right edges with each  |
    |                                    |work-item providing each pixel |
    |                                    |value. These pixels values are |
    |                                    |used to perform the intra mode |
    |                                    |estimation.                    |
    |                                    |                               |
    |                                    |For the left and top edge      |
    |                                    |pixels, successive subgroup    |
    |                                    |work-items 0 to 15 provide the |
    |                                    |successive edge pixels. For the|
    |                                    |top-right edge, successive     |
    |                                    |work-items 0 to 7 provide the  |
    |                                    |successive edge pixels; the    |
    |                                    |pixel values in work-items 8 to|
    |                                    |15 are ignored. The top-left   |
    |                                    |corner pixel is a uniform pixel|
    |                                    |value with each work-item      |
    |                                    |providing the same corner      |
    |                                    |pixel.                         |
    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_sic_payload_t   |Return an initialized payload  |
    |intel_sub_group_avc_sic_            |for a VME luma and chroma intra|
    |configure_ipe(                      |prediction estimation (IPE)    |
    |   uchar luma_intra_partition_mask, |operation.                     |
    |   uchar intra_neighbour_           |                               |
    |   availabilty,                     |The luma_intra_partition_mask  |
    |   uchar left_edge_luma_pixels,     |can be composed from their     |
    |   uchar upper_left_corner_luma_    |respective enumeration values  |
    |   pixel,                           |using the '&' operator. Chroma |
    |   uchar upper_edge_luma_           |intra prediction estimation is |
    |   pixels,                          |performed on the 8x8 CbCr      |
    |   uchar upper_right_edge_luma_     |macroblock.                    |
    |   pixels,                          |                               |
    |   ushort left_edge_chroma_pixels,  |Legal values for               |
    |   ushort upper_left_corner_chroma_ |intra_neighbour_availabilty and|
    |   pixel,                           |intra_sad_adjustment are as per|
    |   ushort upper_edge_chroma_pixels, |their respective enumeration   |
    |   uchar intra_sad_adjustment,      |values, based on the intra     |
    |   intel_sub_group_avc_sic_payload_t|neighboring macroblock's to    |
    |      payload )                     |consider in intra mode         |
    |                                    |estimation.                    |
    |                                    |                               |
    |                                    |The other parameters specify   |
    |                                    |the neighbor luma and chroma   |
    |                                    |edge pixels for the left,      |
    |                                    |top-left corner, top and       |
    |                                    |top-right (luma only) edges    |
    |                                    |with each work-item providing  |
    |                                    |each pixel value. These pixels |
    |                                    |values are used to perform the |
    |                                    |intra mode estimation.         |
    |                                    |                               |
    |                                    |For the left and top luma edge |
    |                                    |pixels successive, subgroup    |
    |                                    |work-items 0 to 15 provide the |
    |                                    |successive edge pixels. For the|
    |                                    |top-right luma edge, successive|
    |                                    |work-items 0 to 7 provide the  |
    |                                    |successive edge pixels; the    |
    |                                    |pixel values in work-items 8 to|
    |                                    |15 are ignored. The top-left   |
    |                                    |corner pixel is a uniform pixel|
    |                                    |value with each work-item      |
    |                                    |providing the same corner      |
    |                                    |pixel.                         |
    |                                    |                               |
    |                                    |For the left and top chroma    |
    |                                    |CbCr pixels, successive        |
    |                                    |subgroup work-items 0 to 7     |
    |                                    |provide the successive CbCr    |
    |                                    |pixels; the pixel values in    |
    |                                    |work-items 8 to 15 are         |
    |                                    |ignored. The top-left corner   |
    |                                    |pixel is a uniform CbCr pixel  |
    |                                    |value with each work-item      |
    |                                    |providing the same corner CbCr |
    |                                    |pixel.                         |
    +------------------------------------+-------------------------------+
    |uint intel_sub_group_avc_sic_get_   |Compose the value for the input|
    |motion_vector_mask(                 |argument                       |
    |   uint skip_block_partition_type,  |skip_motion_vector_mask for the|
    |   uchar direction )                |input skip_block_partition_type|
    |                                    |and direction.                 |
    |                                    |                               |
    |                                    |The legal values for           |
    |                                    |skip_block_partition_type must |
    |                                    |be one of the specified        |
    |                                    |partition mask enumeration     |
    |                                    |values (16x16 or 8x8).         |
    |                                    |                               |
    |                                    |The direction parameter is a   |
    |                                    |bit field with the directions  |
    |                                    |for the 4 8x8 sub-partitions in|
    |                                    |traditional Z order, or for    |
    |                                    |only the 16x16 partition. Two  |
    |                                    |bits are reserved for each of  |
    |                                    |the four sub-partitions in     |
    |                                    |row-major order.  The 2-bit    |
    |                                    |values are as per the inter    |
    |                                    |macro-block major direction    |
    |                                    |values. If the                 |
    |                                    |skip_block_partition_type      |
    |                                    |indicates a 16x16 shape, then  |
    |                                    |only the 1st 2 bits contains   |
    |                                    |the direction, and other bits  |
    |                                    |must be zeroed.                |
    +------------------------------------+-------------------------------+

    Payload type conversion functions
    +++++++++++++++++++++++++++++++++

    These are optional built-in functions that may be called following the
    configuration phase to convert IME payload to MCE payloads and vice-
    versa.

    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_mce_payload_t   |Convert the SIC payload to a   |
    |intel_sub_group_avc_sic_convert_    |generic MCE payload.           |
    |to_mce_payload(                     |                               |
    |   intel_sub_group_avc_sic_payload_t|                               |
    |      payload )                     |                               |
    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_sic_payload_t   |Convert the generic MCE payload|
    |intel_sub_group_avc_                |to a SIC payload.              |
    |mce_convert_to_sic_payload(         |                               |
    |   intel_sub_group_avc_mce_payload_t|                               |
    |      payload )                     |                               |
    +------------------------------------+-------------------------------+

    Multi-reference cost configuration phase functions
    ++++++++++++++++++++++++++++++++++++++++++++++++++

    These are optional built-in functions that may be called following the
    configuration phase to enable multi-reference image costing.

    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_sic_payload_t   |This is a wrapper for          |
    |intel_sub_group_avc_sic_set_        |intel_sub_group_avc_mce_       |
    |inter_base_multi_reference_penalty( |set_inter_base_multi_reference_|
    |   uchar reference_base_penalty,    |penalty(..) the payload        |
    |   intel_sub_group_avc_sic_payload_t|conversions with to/from MCE   |
    |      payload )                     |types. See MCE version for     |
    |                                    |description.                   |
    +------------------------------------+-------------------------------+

    Inter shape and direction cost configuration phase functions
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

    These are optional built-in functions that may be called following the
    configuration phase to enable shape costing.
    +-------------------------------------+------------------------------+
    |intel_sub_group_avc_sic_payload_t    |This is a wrapper for         |
    |intel_sub_group_avc_sic_set_         |intel_sub_group_avc_mce_set_  |
    |inter_shape_penalty(                 |inter_shape_penalty(..) with  |
    |   ulong packed_shape_cost,          |the payload conversions       |
    |   intel_sub_group_avc_sic_payload_t |to/from MCE types. See MCE    |
    |      payload )                      |version for description.      |
    +-------------------------------------+------------------------------+
    |intel_sub_group_avc_sic_payload_t    |This is a wrapper for         |
    |intel_sub_group_avc_sic_set_         |intel_sub_group_avc_mce_set_  |
    |inter_direction_penalty(             |inter_direction_penalty(..)   |
    |    uchar direction_cost,            |with the payload conversions  |
    |    intel_sub_group_avc_sic_payload_t|to/from MCE types. See MCE    |
    |       payload )                     |version for description.      |
    +-------------------------------------+------------------------------+

    Inter motion vector cost configuration phase functions
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++

    These are optional built-in functions that may be called following the
    configuration phase to enable motion vector costing.

    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_sic_payload_t   |This is a wrapper for          |
    |intel_sub_group_avc_sic_set_        |intel_sub_group_avc_mce_set_   |
    |motion_vector_cost_function(        |motion_vector_cost_function(..)|
    |   ulong packed_cost_center_delta,  |with the payload conversions   |
    |   uint2 packed_cost_table,         |to/from MCE types. See MCE     |
    |   uchar cost_precision,            |version for description.       |
    |   intel_sub_group_avc_sic_payload_t|                               |
    |      payload )                     |                               |
    +------------------------------------+-------------------------------+

    Intra shape cost configuration phase functions
    ++++++++++++++++++++++++++++++++++++++++++++++

    These are optional built-in functions that may be called following the
    configuration phase to enable intra shape costing.

    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_sic_payload_t   |Set the luma shape cost penalty|
    |intel_sub_group_avc_sic_set_        |for intra motion estimation.   |
    |intra_luma_shape_penalty(           |                               |
    |   uint packed_shape_cost,          |The value of packed_shape_cost |
    |   intel_sub_group_avc_sic_payload_t|is an integer value with the   |
    |      payload )                     |following bits specifying the  |
    |                                    |shape cost in U4U4 format as   |
    |                                    |follows:                       |
    |                                    |                               |
    |                                    |7:0   : must be zero           |
    |                                    |15:8  : 16x16 cost             |
    |                                    |23:16 : 8x8 cost               |
    |                                    |31:24 : 4x4 cost               |
    |                                    |                               |
    |                                    |The U4U4 decoded integer values|
    |                                    |must bit fit within 12 bits.   |
    +------------------------------------+-------------------------------+

    Intra mode cost configuration phase functions
    +++++++++++++++++++++++++++++++++++++++++++++

    These are optional built-in functions that may be called following the
    configuration phase to enable intra mode costing.

    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_sic_payload_t   |The value of luma_mode_penalty |
    |intel_sub_group_avc_sic_set_        |specifies the penalty to be    |
    |intra_luma_mode_cost_function(      |applied to the estimated luma  |
    |   uchar luma_mode_penalty,         |mode if it differs from its    |
    |   uint luma_packed_neighbor_modes, |predicted luma mode (based on  |
    |   uint luma_packed_non_dc_penalty, |its neighbor intra modes). It  |
    |   intel_sub_group_avc_sic_payload_t|is specified in U4U4 format and|
    |      payload )                     |must bit in 10 bits.           |
    |                                    |                               |
    |                                    |The value of                   |
    |                                    |luma_packed_neighbor_modes     |
    |                                    |specifies the values of the    |
    |                                    |already computed top and left  |
    |                                    |neighbor modes for the         |
    |                                    |bordering 4x4 blocks, with the |
    |                                    |4x4 blocks numbered in the     |
    |                                    |traditional Z-order as shown   |
    |                                    |below.                         |
    |                                    |                               |
    |                                    |    0 1 4 5                    |
    |                                    |    2 3 6 7                    |
    |                                    |    8 9 C D                    |
    |                                    |    A B E F                    |
    |                                    |                               |
    |                                    |the following bits specify the |
    |                                    |neighbor modes.                |
    |                                    |                               |
    |                                    |3:0   => Left neighbor block 5 |
    |                                    |7:4   => Left neighbor block 7 |
    |                                    |11:8  => Left neighbor block D |
    |                                    |15:12 => Left neighbor block F |
    |                                    |19:16 => Top neighbor block A  |
    |                                    |23:20 => Top neighbor block B  |
    |                                    |27:24 => Top neighbor block E  |
    |                                    |31:28 => Top neighbor block F  |
    |                                    |                               |
    |                                    |The value of                   |
    |                                    |luma_packed_non_dc_penalty     |
    |                                    |specifies the penalty to be    |
    |                                    |applied for any computed non-DC|
    |                                    |luma mode for each of the      |
    |                                    |16x16, 8x8, and 4x4 shapes,    |
    |                                    |with the following bits        |
    |                                    |specifying the penalties.      |
    |                                    |                               |
    |                                    |7:0   => Intra16x16 non-dc     |
    |                                    |         penalty               |
    |                                    |15:8  => Intra8x8 non-dc       |
    |                                    |         penalty               |
    |                                    |23:16 => Intra4x4 non-dc       |
    |                                    |         penalty               |
    |                                    |31:24 => Must be zero          |
    |                                    |                               |
    |                                    |The component values are       |
    |                                    |specified in 8-bit integer     |
    |                                    |format.                        |
    |                                    |                               |
    |                                    |The intra distortion for each  |
    |                                    |intra luma block can be        |
    |                                    |described by the following     |
    |                                    |formulas:                      |
    |                                    |                               |
    |                                    |Intra_4x4 SAD (or Haar) +      |
    |                                    |luma_shape_penalty_4x4 +       |
    |                                    |luma_non_dc_4x4_penalty (if not|
    |                                    |DC) +                          |
    |                                    |luma_mode_penalty (if computed |
    |                                    |mode is not the same predicted |
    |                                    |mode from neighbor modes)      |
    |                                    |                               |
    |                                    |Intra_8x8 SAD (or Haar) +      |
    |                                    |luma_shape_penalty_8x8 +       |
    |                                    |luma_non_dc_8x8_penalty (if not|
    |                                    |DC) +                          |
    |                                    |luma_mode_penalty (if computed |
    |                                    |mode is not the same predicted |
    |                                    |mode from neighbor modes )     |
    |                                    |                               |
    |                                    |Intra_16x16 SAD (or Haar) +    |
    |                                    |luma_shape_penalty_16x16 +     |
    |                                    |luma_non_dc_4x4_penalty (if not|
    |                                    |DC)                            |
    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_sic_payload_t   |Set the intra chroma mode cost |
    |intel_sub_group_avc_sic_set_        |function by specifying the     |
    |intra_chroma_mode_cost_function(    |penalty to be applied to the   |
    |   uchar chroma_mode_penalty,       |computed chroma mode.          |
    |   intel_sub_group_avc_sic_payload_t|                               |
    |      payload )                     |The chroma_mode_base_penalty is|
    |                                    |the base penalty to be applied |
    |                                    |to the computed intra chroma   |
    |                                    |modes. This penalty is in U4U4 |
    |                                    |format.                        |
    |                                    |                               |
    |                                    |The U4U4 decoded integer value |
    |                                    |must fit in 12 bits.           |
    |                                    |                               |
    |                                    |The base penalty is scaled     |
    |                                    |based on the computed mode as  |
    |                                    |defined below.                 |
    |                                    |                               |
    |                                    |   DC   : 0x                   |
    |                                    |   HORZ : 1x                   |
    |                                    |   VERT : 1x                   |
    |                                    |   PLANE: 2x                   |
    |                                    |                               |
    |                                    |The intra distortion for each  |
    |                                    |intra 8x8 chroma block can be  |
    |                                    |described by the following     |
    |                                    |formulas:                      |
    |                                    |                               |
    |                                    |distortion =                   |
    |                                    |SAD (or Haar) +                |
    |                                    |chroma_mode_penalty (scaled    |
    |                                    |based on computed mode)        |
    +------------------------------------+-------------------------------+

    Miscellaneous property configuration phase functions
    ++++++++++++++++++++++++++++++++++++++++++++++++++++

    These are optional built-in functions that may be called following the
    configuration phase to enable miscellaneous properties setting in the
    payload.

    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_sic_payload_t   |This is a wrapper for          |
    |intel_sub_group_avc_sic_set_        |intel_sub_group_avc_mce_set_   |
    |source_interlaced_field_polarity(   |source_interlaced_field_       |
    |   uchar src_field_polarity,        |polarity(..)  with the result  |
    |   intel_sub_group_avc_sic_payload_t|conversions to/from MCE        |
    |      payload )                     |types. See MCE version for     |
    |                                    |description.                   |
    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_sic_payload_t   |This is a wrapper for          |
    |intel_sub_group_avc_sic_set_        |intel_sub_group_avc_mce_set_   |
    |single_reference_interlaced_        |single_reference_              |
    |field_polarity(                     |interlaced_field_polarity(..)  |
    |   uchar ref_field_polarity,        |with the result conversions    |
    |   intel_sub_group_avc_sic_payload_t|to/from MCE types. See MCE     |
    |      payload )                     |version for description.       |
    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_sic_payload_t   |This is a wrapper for          |
    |intel_sub_group_avc_sic_set_        |intel_sub_group_avc_mce_set_   |
    |dual_reference_interlaced_          |dual_reference_interlaced_     |
    |field_polarities(                   |field_polarities(..)  with the |
    |   uchar fwd_ref_field_polarity,    |result conversions to/from MCE |
    |   uchar bwd_ref_field_polarity,    |types. See MCE version for     |
    |   intel_sub_group_avc_sic_payload_t|description.                   |
    |      payload )                     |                               |
    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_sic_payload_t   |Update the input payload to do |
    |intel_sub_group_avc_sic_set_        |enable bilinear filter         |
    |skc_bilinear_filter_enable(         |interpolation instead of 4-tap |
    |   intel_sub_group_avc_sic_payload_t|filter interpolation.  Default |
    |      payload )                     |is 4-tap filter interpolation. |
    |                                    |                               |
    |                                    |This should not be called if   |
    |                                    |the payload was initialized    |
    |                                    |with integer pixel resolution. |
    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_sic_payload_t   |Enable skip check forward      |
    |intel_sub_group_avc_sic_set_        |transform with the specified   |
    |skc_forward_transform_enable(       |SAD coefficients thresholds in |
    |   ulong packed_sad_coefficients,   |the frequency domain to        |
    |   intel_sub_group_avc_sic_payload_t|approximate the effects of     |
    |      payload )                     |forward quantization.          |
    |                                    |                               |
    |                                    |The skip decision will be      |
    |                                    |enhanced to include an accurate|
    |                                    |AVC forward transform for skip |
    |                                    |estimation. This feature is in |
    |                                    |addition to the previous SAD or|
    |                                    |HAAR skip estimation. The      |
    |                                    |results of the forward         |
    |                                    |transform are compared one     |
    |                                    |coefficient at a time against a|
    |                                    |user-specified threshold, in   |
    |                                    |the input argument             |
    |                                    |packed_sad_coefficients, to    |
    |                                    |emulate quantization's zeroing |
    |                                    |effect. The user is returned   |
    |                                    |the count of coefficients that |
    |                                    |exceeded their threshold along |
    |                                    |with a sum of the amount       |
    |                                    |exceeded, both grouped at the  |
    |                                    |8x8 block level (i.e. for each |
    |                                    |8x8 block).                    |
    |                                    |                               |
    |                                    |The SAD coefficient threshold  |
    |                                    |matrix for a 4x4 transform as  |
    |                                    |shown in the table below is    |
    |                                    |packed into a 64-bit           |
    |                                    |integer. The low 16 bits       |
    |                                    |contains the larger DC         |
    |                                    |threshold. The coefficient     |
    |                                    |thresholds for the remaining 6 |
    |                                    |AC thresholds in the order of  |
    |                                    |increasing frequency are       |
    |                                    |provided by the successive     |
    |                                    |8-bit bit ranges.              |
    |                                    |                               |
    |                                    |0 (DC) 1 (AC) 2 (AC) 3 (AC)    |
    |                                    |1 (AC) 2 (AC) 3 (AC) 4 (AC)    |
    |                                    |2 (AC) 3 (AC) 4 (AC) 5 (AC)    |
    |                                    |3 (AC) 4 (AC) 5 (AC) 6 (AC)    |
    |                                    |                               |
    |                                    |This is valid only for SKC     |
    |                                    |operations.                    |
    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_sic_payload_t   |The raw skip SAD computed      |
    |intel_sub_group_avc_sic_set_        |during the evaluation phase    |
    |block_based_raw_skip_sad(           |will be the maximal SAD of     |
    |   uchar block_based_skip_type,     |individual 4x4 (or 8x8) blocks,|
    |   intel_sub_group_avc_sic_payload_t|instead of the sum of the      |
    |      payload )                     |entire individual 4x4 block    |
    |                                    |SADs of the MB.                |
    |                                    |                               |
    |                                    |The legal values for           |
    |                                    |block_based_skip_type must be  |
    |                                    |one of the specified block     |
    |                                    |based skip type enumeration    |
    |                                    |values.                        |
    |                                    |                               |
    |                                    |It is valid to call this       |
    |                                    |function only if the payload is|
    |                                    |configured for a skip check    |
    |                                    |operation by a prior call to   |
    |                                    |intel_sub_group_avc_sic_       |
    |                                    |configure_skc(..).             |
    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_sic_payload_t   |This is a wrapper for          |
    |intel_sub_group_avc_sic_set_        |intel_sub_group_avc_mce_set_   |
    |ac_only_haar(                       |ac_only_haar(..) with the      |
    |   intel_sub_group_avc_sic_payload_t|payload conversions to/from MCE|
    |      payload )                     |types. See MCE version for     |
    |                                    |description.                   |
    +------------------------------------+-------------------------------+

    Evaluation phase functions
    ++++++++++++++++++++++++++

    These built-in functions perform the evaluation of the SIC operation
    configured in the payload with a VME media sampler and return the
    results.

    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_sic_result_t    |Evaluate the SIC IPE operation |
    |intel_sub_group_avc_sic_evaluate_   |and return its results.        |
    |ipe(                                |                               |
    |   read_only image2d_t src_image,   |                               |
    |   sampler_t vme_media_sampler,     |                               |
    |   intel_sub_group_avc_sic_payload_t|                               |
    |      payload )                     |                               |
    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_sic_result_t    |Evaluate the SIC operation with|
    |intel_sub_group_avc_sic_evaluate_   |single reference and return its|
    |with_single_reference(              |results.                       |
    |   read_only image2d_t src_image,   |                               |
    |   read_only image2d_t ref_image,   |The parameter ref_image must be|
    |   sampler_t vme_media_sampler,     |a valid forward image kernel   |
    |   intel_sub_group_avc_sic_payload_t|parameter per the ordering     |
    |      payload )                     |conventions for the kernel     |
    |                                    |parameter list.                |
    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_sic_result_t    |Evaluate the SIC operation with|
    |intel_sub_group_avc_sic_evaluate_   |dual and return its results.   |
    |with_dual_reference (               |                               |
    |  read_only image2d_t src_image,    |The parameter                  |
    |  read_only image2d_t fwd_ref_image,|fwd_ref_image[bwd_ref_image]   |
    |  read_only image2d_t bwd_ref_image,|must be a valid                |
    |  sampler_t vme_media_sampler,      |forward[backward] image kernel |
    |  intel_sub_group_avc_sic_payload_t |parameter per the ordering     |
    |     payload )                      |conventions for the kernel     |
    |                                    |parameter list.                |
    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_sic_result_t    |Evaluate the SIC operation with|
    |intel_sub_group_avc_sic_evaluate_   |multi references and return its|
    |with_multi_reference(               |results.                       |
    |   read_only image2d_t src_image,   |                               |
    |   uint packed_reference_ids,       |A pair of unique reference     |
    |   sampler_t vme_media_sampler,     |identifier (indicating unique  |
    |   intel_sub_group_avc_sic_payload_t|forward/backward reference     |
    |      payload )                     |images) may be specified for   |
    |                                    |each of the allowed major      |
    |                                    |partitions (one 16x16 or four  |
    |                                    |8x8) using packed_dual_ref_ids.|
    |                                    |                               |
    |                                    |The value of                   |
    |                                    |packed_reference_ids is a      |
    |                                    |integer with the following bits|
    |                                    |specifying the values for the  |
    |                                    |pair of reference images for   |
    |                                    |each major partition.          |
    |                                    |                               |
    |                                    |3:0   => Fwd reference block 0 |
    |                                    |7:4   => Bwd reference block 0 |
    |                                    |11:8  => Fwd reference block 1 |
    |                                    |15:12 => Bwd reference block 1 |
    |                                    |19:16 => Fwd reference block 2 |
    |                                    |23:20 => Bwd reference block 2 |
    |                                    |27:24 => Fwd reference block 3 |
    |                                    |31:28 => Bwd reference block 3 |
    |                                    |                               |
    |                                    |A forward[backward] reference  |
    |                                    |idenitifer value of 'n'        |
    |                                    |indicates the forward[backward]|
    |                                    |image from the 'n'th pair of   |
    |                                    |forward/backward reference     |
    |                                    |images, with the value of 'n'  |
    |                                    |ranging from 0 to 15.          |
    |                                    |                               |
    |                                    |If the REF operation is        |
    |                                    |configured with only forward   |
    |                                    |reference images then, the     |
    |                                    |values of the backward         |
    |                                    |reference identifiers are not  |
    |                                    |used.                          |
    |                                    |                               |
    |                                    |The blocks are numbered using  |
    |                                    |the traditional Z order. For   |
    |                                    |larger block sizes, the        |
    |                                    |sub-block reference identifier |
    |                                    |pairs must be replicated. For  |
    |                                    |example, for a 16x16 block, all|
    |                                    |four pair of reference         |
    |                                    |identifiers must be replicated |
    |                                    |to the value of the first pair |
    |                                    |for block 0.                   |
    +------------------------------------+-------------------------------+
    |intel_sub_group_avc_sic_result_t    |Evaluate the SIC operation with|
    |intel_sub_group_avc_sic_evaluate_   |multi references and return its|
    |with_multi_reference(               |results. This is used for      |
    |   read_only image2d_t src_image,   |interlaced source and reference|
    |   uint packed_reference_ids,       |images.                        |
    |   uchar packed_reference_field_    |                               |
    |   polarities,                      |A pair of unique reference     |
    |   sampler_t vme_media_sampler,     |identifier (indicating unique  |
    |   intel_sub_group_avc_sic_payload_t|forward/backward reference     |
    |      payload )                     |images) may be specified for   |
    |                                    |each of the allowed major      |
    |                                    |partitions (one 16x16 or four  |
    |                                    |8x8) using packed_dual_ref_ids.|
    |                                    |                               |
    |                                    |The value of                   |
    |                                    |packed_reference_ids is a      |
    |                                    |integer with the following bits|
    |                                    |specifying the values for the  |
    |                                    |pair of reference images for   |
    |                                    |each major partition.          |
    |                                    |                               |
    |                                    |3:0   => Fwd reference block 0 |
    |                                    |7:4   => Bwd reference block 0 |
    |                                    |11:8  => Fwd reference block 1 |
    |                                    |15:12 => Bwd reference block 1 |
    |                                    |19:16 => Fwd reference block 2 |
    |                                    |23:20 => Bwd reference block 2 |
    |                                    |27:24 => Fwd reference block 3 |
    |                                    |31:28 => Bwd reference block 3 |
    |                                    |                               |
    |                                    |A forward[backward] reference  |
    |                                    |idenitifer value of 'n'        |
    |                                    |indicates the forward[backward]|
    |                                    |image from the ' nth' pair of  |
    |                                    |forward/backward reference     |
    |                                    |images, with the value of ' n' |
    |                                    |ranging from 0 to 15.          |
    |                                    |                               |
    |                                    |If the REF operation is        |
    |                                    |configured with only forward   |
    |                                    |reference images then, the     |
    |                                    |values of the backward         |
    |                                    |reference identifiers are not  |
    |                                    |used.                          |
    |                                    |                               |
    |                                    |The blocks are numbered using  |
    |                                    |the traditional Z order. For   |
    |                                    |larger block sizes, the        |
    |                                    |sub-block reference identifier |
    |                                    |pairs must be replicated. For  |
    |                                    |example, for a 16x16 block, all|
    |                                    |four pair of reference         |
    |                                    |identifiers must be replicated |
    |                                    |to the value of the first pair |
    |                                    |for block 0.                   |
    |                                    |                               |
    |                                    |Reference field polarities for |
    |                                    |forward and backward reference |
    |                                    |images are specified for each  |
    |                                    |of the allowed major partitions|
    |                                    |using packed_reference_field_  |
    |                                    |polarities.                    |
    |                                    |                               |
    |                                    |The value of packed_reference_ |
    |                                    |field_polarities is an integer |
    |                                    |with the following bits        |
    |                                    |specifying the reference field |
    |                                    |polarities for the major       |
    |                                    |partitions.                    |
    |                                    |                               |
    |                                    |0 : Fwd reference block 0      |
    |                                    |1 : Fwd reference block 1      |
    |                                    |2 : Fwd reference block 2      |
    |                                    |3 : Fwd reference block 3      |
    |                                    |4 : Bwd reference block 0      |
    |                                    |5 : Bwd reference block 1      |
    |                                    |6 : Bwd reference block 2      |
    |                                    |7 : Bwd reference block 3      |
    |                                    |                               |
    |                                    |If the dual-reference          |
    |                                    |evaluation functions are not   |
    |                                    |used, then the values of the   |
    |                                    |backward reference field       |
    |                                    |polarities are not used.       |
    |                                    |                               |
    |                                    |The blocks are numbered using  |
    |                                    |the traditional Z order. For   |
    |                                    |larger block sizes, the        |
    |                                    |sub-block reference field      |
    |                                    |polarities are replicated. For |
    |                                    |example, for a 16x16 block all |
    |                                    |four pairs of reference field  |
    |                                    |polarities are replicated to   |
    |                                    |the value of the first pair for|
    |                                    |block 0.                       |
    |                                    |                               |
    |                                    |The value for the packed_      |
    |                                    |interlaced_image_reference_    |
    |                                    |field_polarities argument is   |
    |                                    |obtained by calling            |
    |                                    |intel_sub_group_avc_ime_get_   |
    |                                    |inter_reference_interlaced_    |
    |                                    |field_polarities(..)  for the  |
    |                                    |preceding IME operation's      |
    |                                    |result.                        |
    +------------------------------------+-------------------------------+

    Result type conversion functions
    +++++++++++++++++++++++++++++++++

    These are optional built-in functions that may be called following the
    evaluation phase to convert REF results to MCE results and vice-versa.

    +-----------------------------------+--------------------------------+
    |intel_sub_group_avc_mce_result_t   |Convert the SIC result into a   |
    |intel_sub_group_avc_sic_           |MCE result.                     |
    |convert_to_mce_result(             |                                |
    |   intel_sub_group_avc_sic_result_t|                                |
    |      result )                     |                                |
    +-----------------------------------+--------------------------------+
    |intel_sub_group_avc_sic_result_t   |Convert the MCE result into an  |
    |intel_sub_group_avc_mce_           |SIC result.                     |
    |convert_to_sic_result(             |                                |
    |   intel_sub_group_avc_mce_result_t|                                |
    |      result )                     |                                |
    +-----------------------------------+--------------------------------+

    Result processing phase functions
    +++++++++++++++++++++++++++++++++

    These built-in functions are called following the evaluation phase to
    extract the various result components from an SIC evaluation result.

    +-----------------------------------+--------------------------------+
    |uchar intel_sub_group_avc_sic_get_ |Get the best intra shape from   |
    |ipe_luma_shape(                    |the SIC result.                 |
    |   intel_sub_group_avc_sic_result_t|                                |
    |      result)                      |The returned values are as per  |
    |                                   |the intra-MB shapes enumeration |
    |                                   |values.                         |
    +-----------------------------------+--------------------------------+
    |ushort intel_sub_group_avc_sic_get_|Get the best intra luma         |
    |best_ipe_luma_distortion(          |distortion from the SIC result  |
    |   intel_sub_group_avc_sic_result_t|for the shape returned by       |
    |      result)                      |intel_sub_group_avc_sic_get_    |
    |                                   |ipe_luma_shape(..).             |
    +-----------------------------------+--------------------------------+
    |ushort intel_sub_group_avc_sic_get_|Get the best intra chroma       |
    |best_ipe_chroma_distortion(        |distortion for the 8x8 shape    |
    |   intel_sub_group_avc_sic_result_t|from the SIC result.            |
    |      result)                      |                                |
    +-----------------------------------+--------------------------------+
    |ulong intel_sub_group_avc_sic_get_ |Get the packed intra luma modes |
    |packed_ipe_luma_modes(             |for all blocks from the SIC     |
    |   intel_sub_group_avc_sic_result_t|result. There are four bits per |
    |      result)                      |luma mode for a block and legal |
    |                                   |values for luma modes are as per|
    |                                   |its defined enumeration         |
    |                                   |values. The number of blocks is |
    |                                   |based on the result of          |
    |                                   |intel_sub_group_avc_sic_get_    |
    |                                   |ipe_luma_shape(..).             |
    |                                   |                                |
    |                                   |If the luma shape is:           |
    |                                   |- 16x16, then one mode is       |
    |                                   |  returned in bits [0, 3]       |
    |                                   |- 8x8, then four modes          |
    |                                   |  corresponding to the four     |
    |                                   |  partitions are returned by    |
    |                                   |  bits in the ranges [0, 3],    |
    |                                   |  [16, 19], [32, 35], and       |
    |                                   |  [48, 51]; the order of the    |
    |                                   |  four partitions are in the    |
    |                                   |  traditional Z-order           |
    |                                   |- 4x4, then 16 modes (4 bits per|
    |                                   |  mode) are returned of all 16  |
    |                                   |  partitions by all the bits;   |
    |                                   |  the order of the 16 partitions|
    |                                   |  are in the traditional Z-order|
    |                                   |  as shown below:               |
    |                                   |                                |
    |                                   |   0 1 4 5                      |
    |                                   |   2 3 6 7                      |
    |                                   |   8 9 C D                      |
    |                                   |   A B E F                      |
    +-----------------------------------+--------------------------------+
    |uchar intel_sub_group_avc_sic_get_ |Get the intra chroma mode for   |
    |ipe_chroma_mode(                   |the 8x8 block from the SIC      |
    |   intel_sub_group_avc_sic_result_t|result. The legal values for    |
    |      result)                      |chroma modes are as per its     |
    |                                   |defined enumeration values.     |
    +-----------------------------------+--------------------------------+
    |uint intel_sub_group_avc_sic_get_  |Get the packed count of luma    |
    |packed_skc_luma_count_threshold(   |coefficient components that     |
    |   intel_sub_group_avc_sic_result_t|exceeded their transform        |
    |      result)                      |thresholds from the SIC result  |
    |                                   |for each 8x8 partition in       |
    |                                   |traditional Z-order.            |
    |                                   |                                |
    |                                   |The format of the results is as |
    |                                   |follows:                        |
    |                                   |                                |
    |                                   |- count of coefficients that    |
    |                                   |  exceeded their respective     |
    |                                   |  threshold for block 8x8_0 is  |
    |                                   |  returned in bit [0, 7]        |
    |                                   |- count of coefficients that    |
    |                                   |  exceeded their respective     |
    |                                   |  threshold for block 8x8_1 is  |
    |                                   |  returned in bit [8, 15]       |
    |                                   |- count of coefficients that    |
    |                                   |  exceeded their respective     |
    |                                   |  threshold for block 8x8_2 is  |
    |                                   |  returned in bit [16, 23]      |
    |                                   |- count of coefficients that    |
    |                                   |  exceeded their respective     |
    |                                   |  threshold for block 8x8_0 is  |
    |                                   |  returned in bit [24, 31]      |
    |                                   |                                |
    |                                   |The results are only valid if   |
    |                                   |the SIC operation was configured|
    |                                   |with frequency domain SAD       |
    |                                   |transform coefficients using    |
    |                                   |intel_sub_group_avc_sic_set_skc_|
    |                                   |forward_transform_enable(..).   |
    +-----------------------------------+--------------------------------+
    |ulong intel_sub_group_avc_sic_get_ |Get the packed sum of luma      |
    |packed_skc_luma_sum_threshold(     |coefficient components that     |
    |   intel_sub_group_avc_sic_result_t|exceeded their transform        |
    |      result)                      |thresholds from the SIC result  |
    |                                   |for each 8x8 partition in       |
    |                                   |traditional Z-order.            |
    |                                   |                                |
    |                                   |The format of the results is as |
    |                                   |follows:                        |
    |                                   |- sum of coefficients that      |
    |                                   |  exceeded their respective     |
    |                                   |  threshold for block 8x8_0 is  |
    |                                   |  returned in bit [0, 15]       |
    |                                   |- sum of coefficients that      |
    |                                   |  exceeded their respective     |
    |                                   |  threshold for block 8x8_1 is  |
    |                                   |  returned in bit [16,31]       |
    |                                   |- sum of coefficients that      |
    |                                   |  exceeded their respective     |
    |                                   |  threshold for block 8x8_2 is  |
    |                                   |  returned in bit [32,47]       |
    |                                   |- sum of coefficients that      |
    |                                   |  exceeded their respective     |
    |                                   |  threshold for block 8x8_0 is  |
    |                                   |  returned in bit [48, 63]      |
    |                                   |                                |
    |                                   |The results are only valid if   |
    |                                   |the SIC operation was configured|
    |                                   |with frequency domain SAD       |
    |                                   |transform coefficients using    |
    |                                   |intel_sub_group_avc_sic_set_skc_|
    |                                   |forward_transform_enable(..).   |
    +-----------------------------------+--------------------------------+
    |ushort intel_sub_group_avc_sic_get_|Get the skip check raw SAD      |
    |inter_raw_sads(                    |(i.e. without any mode or shape |
    |   intel_sub_group_avc_sic_result_t|costs included) for the entire  |
    |      result )                     |MB if the input payload was note|
    |                                   |configured for block based skip |
    |                                   |checks, otherwise return the    |
    |                                   |maximal SAD of individual 4x4   |
    |                                   |(or 8x8, if the block size for  |
    |                                   |block based skip checking was   |
    |                                   |configured as 8x8) blocks with  |
    |                                   |the MB.                         |
    +-----------------------------------+--------------------------------+
    |ushort intel_sub_group_avc_sic_get_|This is a wrapper for           |
    |inter_distortions(                 |intel_sub_group_avc_mce_get_    |
    |   intel_sub_group_avc_sic_result_t|inter_distortions(..) with the  |
    |      result )                     |result conversions to/from MCE  |
    |                                   |types. See MCE version for      |
    |                                   |description.                    |
    +-----------------------------------+--------------------------------+

    ______________________________________________________________________


    A Complete Example
    ------------------

    +--------------------------------------------------------------------+
    |__kernel __attribute__((reqd_work_group_size(16,1,1))) void         |
    |block_motion_estimate_intel(                                        |
    |    __read_only image2d_t src_img,                                  |
    |    __read_only image2d_t ref_img,                                  |
    |    __global short2* prediction_motion_vector_buffer,               |
    |    __global short2* motion_vector_buffer,                          |
    |    __global ushort* residuals_buffer,                              |
    |    __global uchar2* shapes_buffer,                                 |
    |    int iterations )                                                |
    |{                                                                   |
    |  // Each 16x1 workgroup processes a column of MBs in a image.      |
    |  // The number of workgroups is equal to the number of MB in       |
    |  // a image row. All column MBs are processed in parallel without  |
    |  // handling any MB dependencies.                                  |
    |  int gid_0 = get_group_id(0);                                      |
    |  int gid_1 = 0;                                                    |
    |                                                                    |
    |  // Initialize the inline VME sampler.                             |
    |  const sampler_t vme_sampler = CLK_AVC_ME_INITIALIZE_INTEL;        |
    |                                                                    |
    |  // Process all columns MBs in a loop.                             |
    |  for( int i = 0; i < iterations ; i++, gid_1++ ) {                 |
    |    ushort2 srcCoord = 0;                                           |
    |    short2 refCoord = 0;                                            |
    |    short2 predMV = 0;                                              |
    |                                                                    |
    |    // Compute the source MB coordinates.                           |
    |                                                                    |
    |    srcCoord.x = gid_0 * 16;                                        |
    |    srcCoord.y = gid_1 * 16;                                        |
    |                                                                    |
    |    // Obtain the predictor of the source MB from input predictor   |
    |    // buffer.                                                      |
    |    if( prediction_motion_vector_buffer != NULL ) {                 |
    |      predMV =                                                      |
    |        prediction_motion_vector_buffer[                            |
    |          gid_0 + gid_1 * get_num_groups(0) ];                      |
    |      refCoord.x = predMV.x / 4;                                    |
    |      refCoord.y = predMV.y / 4;                                    |
    |      refCoord.y = refCoord.y & 0xFFFE;                             |
    |    }                                                               |
    |                                                                    |
    |    // Enable the partition mask to allow all shapes and let the    |
    |    // VME unit decide the partitioning for the source MB.          |
    |    uchar partition_mask = CLK_AVC_ME_PARTITION_MASK_ALL_INTEL;     |
    |    uchar sad_adjustment = CLK_AVC_ME_SAD_ADJUST_MODE_NONE_INTEL;   |
    |    uchar pixel_mode = CLK_AVC_ME_SUBPIXEL_MODE_QPEL_INTEL;         |
    |                                                                    |
    |    // Create and initialize the IME payload.                       |
    |    intel_sub_group_avc_ime_payload_t payload =                     |
    |      intel_sub_group_avc_ime_initialize(                           |
    |          srcCoord, partition_mask, sad_adjustment );               |
    |                                                                    |
    |    // Setup the IME for a single reference search.                 |
    |    // The search window is 48x40 with exhaustive search, with it   |
    |    // location specified by the input predictor (refCooord).       |
    |    payload =                                                       |
    |      intel_sub_group_avc_ime_set_single_reference(                 |
    |          refCoord, CLK_AVC_ME_SEARCH_WINDOW_EXHAUSTIVE_INTEL,      |
    |          payload );                                                |
    |                                                                    |
    |    // Set a cost function for the IME operation.                   |
    |    ulong cost_center = 0;                                          |
    |    uint2 packed_cost_table =                                       |
    |      intel_sub_group_avc_mce_get_default_medium_penalty_cost_\     |
    |        table();                                                    |
    |    uchar search_cost_precision =                                   |
    |      CLK_AVC_ME_COST_PRECISION_HPEL_INTEL;                         |
    |    payload =                                                       |
    |      intel_sub_group_avc_ime_set_motion_vector_cost_function(      |
    |        cost_center, packed_cost_table, search_cost_precision,      |
    |        payload );                                                  |
    |                                                                    |
    |    // Evaluate the IME operation with its configured payload.      |
    |    intel_sub_group_avc_ime_result_t result =                       |
    |      intel_sub_group_avc_ime_evaluate_with_single_reference(       |
    |          src_img, ref_img, vme_sampler, payload );                 |
    |                                                                    |
    |    // Extract IME results.                                         |
    |    long mvs = intel_sub_group_avc_ime_get_motion_vectors( result );|
    |    ushort sads =                                                   |
    |      intel_sub_group_avc_ime_get_inter_distortions( result );      |
    |    uchar major_shape =                                             |
    |      intel_sub_group_avc_ime_get_inter_major_shape( result );      |
    |    uchar minor_shapes =                                            |
    |      intel_sub_group_avc_ime_get_inter_minor_shapes( result );     |
    |    uchar2 shapes = { major_shape, minor_shapes };                  |
    |    uchar directions =                                              |
    |      intel_sub_group_avc_ime_get_inter_directions( result );       |
    |                                                                    |
    |    // Perform FME if sub-pixel mode is specified.                  |
    |    // FME is performed on the results of the IME operation.        |
    |    if( pixel_mode != CLK_AVC_ME_SUBPIXEL_MODE_INTEGER_INTEL ) {    |
    |      intel_sub_group_avc_ref_payload_t payload =                   |
    |        intel_sub_group_avc_fme_initialize(                         |
    |            srcCoord, mvs, major_shape, minor_shapes, directions,   |
    |            pixel_mode, sad_adjustment);                            |
    |      intel_sub_group_avc_ref_result_t result =                     |
    |        intel_sub_group_avc_ref_evaluate_with_single_reference(     |
    |            src_img, ref_img, vme_sampler, payload );               |
    |      payload =                                                     |
    |        intel_sub_group_avc_ref_set_motion_vector_cost_function(    |
    |          cost_center, packed_cost_table, search_cost_precision,    |
    |          payload );                                                |
    |      mvs = intel_sub_group_avc_ref_get_motion_vectors( result );   |
    |      sads =                                                        |
    |        intel_sub_group_avc_ref_get_inter_distortions( result );    |
    |    }                                                               |
    |                                                                    |
    |    // Write out the results.                                       |
    |    int index =                                                     |
    |      ( gid_0 * 16 + get_local_id(0) ) +                            |
    |      ( gid_1 * 16 * get_num_groups(0) );                           |
    |    int2 bi_mvs = as_int2( mvs );                                   |
    |    motion_vector_buffer [ index ] = as_short2( bi_mvs.s0 );        |
    |    if( residuals_buffer != NULL ) {                                |
    |      residuals_buffer [ index ] = sads;                            |
    |    }                                                               |
    |    shapes_buffer [gid_0 + gid_1 * get_num_groups(0)] = shapes;     |
    |  }                                                                 |
    |}                                                                   |
    +--------------------------------------------------------------------+

    _____________________________________________________________________
    "

 Revision History

     Version 1 (12/02/2016): First public revision.
     Version 2 (09/18/2016): Fixed typos.
     Version 3 (10/02/2018): Modified definitions of default initialization
                             literals to be pre-defined enumeration literals.
                             Fixed typo in intra cost configuration function.
     Version 4 (10/31/2018): Minor typo fixes.
     Version 5 (11/09/2018): Marked intel_sub_group_ime_ref_window_size
                             function as deprecated.