Merge pull request #21 from bashbaug/cl_intel_media_block_io
cl_intel_media_block_io
diff --git a/extensions/clext.php b/extensions/clext.php
index 955d7d2..7b0b0af 100644
--- a/extensions/clext.php
+++ b/extensions/clext.php
@@ -99,4 +99,6 @@
</li>
<li value=50><a href="extensions/intel/cl_intel_device_side_avc_motion_estimation.txt">cl_intel_device_side_avc_motion_estimation</a>
</li>
+<li value=51><a href="extensions/intel/cl_intel_media_block_io.txt">cl_intel_media_block_io</a>
+</li>
</ol>
diff --git a/extensions/intel/cl_intel_media_block_io.txt b/extensions/intel/cl_intel_media_block_io.txt
new file mode 100644
index 0000000..7c801d2
--- /dev/null
+++ b/extensions/intel/cl_intel_media_block_io.txt
@@ -0,0 +1,609 @@
+Name String
+
+ cl_intel_media_block_io
+
+Contributors
+
+ Biju George, Intel
+ Ben Ashbaugh, Intel
+ Scott Pillow, Intel
+
+Contact
+
+ Biju George, Intel (biju.george 'at' intel.com)
+
+Version
+
+ Version 1, December 2, 2016
+
+Number
+
+ OpenCL Extension #51
+
+Status
+
+ First Draft
+
+Dependencies
+
+ OpenCL 1.2 is required.
+
+ The OpenCL Intel vendor extension cl_intel_subgroups is required. The media
+ block read/write built-in functions are an extension of the subgroup
+ functions defined in cl_intel_subgroups.
+
+ This extension is written against revision 29 of the OpenCL 2.0 API
+ specification, against revision 33 of the OpenCL 2.0 OpenCL C
+ specification, and against revision 32 of the OpenCL 2.0 extension
+ specification.
+
+Overview
+
+ This extension augments the block read/write functionality available in the
+ Intel vendor extensions cl_intel_subgroups and cl_intel_subgroups_short by
+ specifying additional built-in functions that read and write flexible 2D
+ regions of images. These built-in functions allow the width and height of
+ the image regions to be specified explicitly.
+
+ While not required, this extension is most useful when the subgroup size is
+ known at compile-time. The primary use case for this extension is to support
+ the reading of the edge texels (or image elements) of neighboring
+ macro-blocks as described in the Intel vendor extension
+ cl_intel_device_side_avc_motion_estimation. When using the built-in
+ functions from cl_intel_device_side_avc_motion_estimation the subgroup size
+ is implicitly fixed to 16. In other use cases the subgroup size may be fixed
+ using the cl_intel_required_subgroup_size extension, if needed.
+
+New API Enums
+
+ None.
+
+Terms and Definitions
+
+ Texel:
+ ------
+ This refers to an image element (or an image pixel).
+
+ Byte:
+ -----
+ An 8-bit unsigned integer (or cl_uchar).
+
+ Word:
+ -----
+ A 16-bit unsigned integer (or cl_ushort).
+
+ Dword:
+ ------
+ A 32-bit unsigned integer (or cl_uint).
+
+New OpenCL C built-in functions
+
+ Append to Section 6.13.X "Sub Group Read and Write Functions" of the
+ OpenCL 2.0 C Specification, which was added by the cl_intel_subgroups
+ extension.
+
+ These built-in functions must be encountered by all work-items in a subgroup
+ executing the kernel, otherwise the behavior is undefined (i.e. they can be
+ used only in convergent control flow where all the work-items in the
+ subgroup are enabled).
+
+ The following restrictions in Table 6.X apply for allowed sizes for the
+ explicit width and height parameters of the flexible media block read/write
+ built-in functions.
+
+ Table 6.X
+ +-----------+--------------+
+ |   width   |maximum height|
+ |  (bytes)  |    (rows)    |
+ +-----------+--------------+
+ |     4     |      64      |
+ +-----------+--------------+
+ |     8     |      32      |
+ +-----------+--------------+
+ |   12,16   |      16      |
+ +-----------+--------------+
+ |20,24,28,32|      8       |
+ +-----------+--------------+
+
+
+ If OpenCL image kernel parameters are used as input arguments in calls
+ to the flexible media block read/write built-in functions, then these image
+ objects must be used exclusively by the flexible media block read/write
+ built-in functions. If the same image needs to be used for other image
+ operations within the same kernel, then an additional image parameter may be
+ used. The OpenCL host application binds this additional parameter to the
+ same image object as the parameter used in the flexible media block
+ read/write built-in function calls, before the kernel enqueue API call.
+
+ Additionally for images that are read from or written to using the flexible
+ media block read/write built-in functions, the images should be created with
+ the "image_width" value in "cl_image_desc" such that image_width multiplied
+ by the texel size is a multiple of 4.
+
+ The following additional restrictions are imposed for reading 2D images
+ created from buffers with the flexible media block read/write built-in
+ functions.
+
+ 1. The image row pitch is required to be a multiple of 64 bytes, in addition
+ to the CL_DEVICE_IMAGE_PITCH_ALIGNMENT requirements.
+ 2. If the buffer was created using CL_MEM_USE_HOST_PTR with the host
+ application providing the storage bits for the memory object using
+ "host_ptr", then "host_ptr" is required to be 16-byte aligned, in addition
+ to the clCreateBuffer requirements.
+ 3. The maximum height is further restricted to 16 or less.
+
+ The behavior is undefined if the flexible media block read/write built-in
+ functions are used to directly read/write a planar YUV image. Instead they may
+ be indirectly used to read/write a 2D image object representing a single
+ plane from a planar YUV image object. Creating a 2D image object
+ representing a single plane from a planar YUV image object is described in
+ cl_intel_planar_yuv.
+ _____________________________________________________________________________
+
+ Read operations
+ ---------------
+
+ Byte sized read operations
+ ++++++++++++++++++++++++++
+
+ +---------------------------------+----------------------------------------+
+ |uchar                            |Reads a 2D region from an image.        |
+ |intel_sub_group_media_block_read_|                                        |
+ |uc(                              |The 2D source byte offset of the        |
+ |    int2 src_byte_offset,        |top-left corner and the width and height|
+ |    int width,                   |of the region are specified explicitly  |
+ |    int height,                  |in the interface parameters. The source |
+ |    read_only image2d_t image )  |byte x-offset and width must be 4 byte  |
+ |                                 |aligned.                                |
+ +---------------------------------+                                        |
+ |uchar2                           |The width is specified in byte units and|
+ |intel_sub_group_media_block_read_|must be less than or equal to 32. The   |
+ |uc2(                             |width and height of the region must be  |
+ |    int2 src_byte_offset,        |compile-time constants.                 |
+ |    int width,                   |                                        |
+ |    int height,                  |The read-in texels in the 2D region     |
+ |    read_only image2d_t image )  |taken in row-major order are            |
+ |                                 |re-organized as another 2D region with  |
+ +---------------------------------+the byte width equal to the subgroup    |
+ |uchar4                           |size. Then each work-item reads one     |
+ |intel_sub_group_media_block_read_|byte column vector of the re-organized  |
+ |uc4(                             |rectangle, i.e. each column's subsequent|
+ |    int2 src_byte_offset,        |data element's address is strided by the|
+ |    int width,                   |subgroup size.                          |
+ |    int height,                  |                                        |
+ |    read_only image2d_t image )  |The max byte area of the region is      |
+ |                                 |defined as the byte size of the return  |
+ +---------------------------------+type multiplied by the subgroup size. If|
+ |uchar8                           |the byte area of the region is less than|
+ |intel_sub_group_media_block_read_|its max byte area, then the             |
+ |uc8(                             |corresponding tail elements of some of  |
+ |    int2 src_byte_offset,        |the column vectors are undefined.       |
+ |    int width,                   |Conversely, if the byte area of the     |
+ |    int height,                  |region is more than the max byte area,  |
+ |    read_only image2d_t image )  |then the corresponding tail elements of |
+ |                                 |the region are dropped.                 |
+ +---------------------------------+                                        |
+ |uchar16                          |For out-of-bound reads, the read-in     |
+ |intel_sub_group_media_block_read_|texels are replicated from the nearest  |
+ |uc16(                            |edge for byte sized texels. The out-of- |
+ |    int2 src_byte_offset,        |bound behavior is undefined for larger  |
+ |    int width,                   |sized texels with the "_uc"             |
+ |    int height,                  |built-in functions.                     |
+ |    read_only image2d_t image )  |                                        |
+ +---------------------------------+----------------------------------------+
+
+ Word sized read operations
+ ++++++++++++++++++++++++++
+
+ +---------------------------------+----------------------------------------+
+ |ushort                           |Reads a 2D region from an image.        |
+ |intel_sub_group_media_block_read_|                                        |
+ |us(                              |The 2D source byte offset of the        |
+ |    int2 src_byte_offset,        |top-left corner and the width and height|
+ |    int width,                   |of the region are specified explicitly  |
+ |    int height,                  |in the interface parameters. The source |
+ |    read_only image2d_t image )  |byte x-offset and width must be 4 byte  |
+ |                                 |aligned.                                |
+ +---------------------------------+                                        |
+ |ushort2                          |The width is specified in word units and|
+ |intel_sub_group_media_block_read_|must be less than or equal to 16. The   |
+ |us2(                             |width and height of the region must be  |
+ |    int2 src_byte_offset,        |compile-time constants.                 |
+ |    int width,                   |                                        |
+ |    int height,                  |The read-in texels in the 2D region     |
+ |    read_only image2d_t image )  |taken in row-major order are            |
+ |                                 |re-organized as another 2D region with  |
+ +---------------------------------+the word width equal to the subgroup    |
+ |ushort4                          |size. Then each work-item reads one     |
+ |intel_sub_group_media_block_read_|word column vector of the re-organized  |
+ |us4(                             |rectangle, i.e. each column's subsequent|
+ |    int2 src_byte_offset,        |data element's address is strided by the|
+ |    int width,                   |subgroup size multiplied by 2.          |
+ |    int height,                  |                                        |
+ |    read_only image2d_t image )  |The max word area of the region is      |
+ |                                 |defined as the word size of the return  |
+ +---------------------------------+type multiplied by the subgroup size. If|
+ |ushort8                          |the word area of the region is less than|
+ |intel_sub_group_media_block_read_|its max word area, then the             |
+ |us8(                             |corresponding tail elements of some of  |
+ |    int2 src_byte_offset,        |the column vectors are undefined.       |
+ |    int width,                   |Conversely, if the word area of the     |
+ |    int height,                  |region is more than the max word area,  |
+ |    read_only image2d_t image )  |then the corresponding tail elements of |
+ |                                 |the region are dropped.                 |
+ +---------------------------------+                                        |
+ |ushort16                         |For out-of-bound reads, the read-in     |
+ |intel_sub_group_media_block_read_|texels are replicated from the nearest  |
+ |us16(                            |edge for byte and word sized texels. The|
+ |    int2 src_byte_offset,        |out-of-bound behavior is undefined for  |
+ |    int width,                   |larger sized texels with the "_us"      |
+ |    int height,                  |built-in functions.                     |
+ |    read_only image2d_t image )  |                                        |
+ +---------------------------------+----------------------------------------+
+
+
+ Double Word (DWORD) sized read operations
+ +++++++++++++++++++++++++++++++++++++++++
+
+ +---------------------------------+----------------------------------------+
+ |uint                             |Reads a 2D region from an image.        |
+ |intel_sub_group_media_block_read_|                                        |
+ |ui(                              |The 2D source byte offset of the        |
+ |    int2 src_byte_offset,        |top-left corner and the width and height|
+ |    int width,                   |of the region are specified explicitly  |
+ |    int height,                  |in the interface parameters. The source |
+ |    read_only image2d_t image )  |byte x-offset and width must be 4 byte  |
+ |                                 |aligned.                                |
+ +---------------------------------+                                        |
+ |uint2                            |The width is specified in dword units   |
+ |intel_sub_group_media_block_read_|and must be less than or equal to 8. The|
+ |ui2(                             |width and height of the region must be  |
+ |    int2 src_byte_offset,        |compile-time constants.                 |
+ |    int width,                   |                                        |
+ |    int height,                  |The read-in texels in the 2D region     |
+ |    read_only image2d_t image )  |taken in row-major order are            |
+ |                                 |re-organized as another 2D region with  |
+ +---------------------------------+the dword width equal to the subgroup   |
+ |uint4                            |size. Then each work-item reads one     |
+ |intel_sub_group_media_block_read_|dword column vector of the re-organized |
+ |ui4(                             |rectangle, i.e. each column's subsequent|
+ |    int2 src_byte_offset,        |data element's address is strided by    |
+ |    int width,                   |the subgroup size multiplied by 4.      |
+ |    int height,                  |                                        |
+ |    read_only image2d_t image )  |The max dword area of the region is the |
+ |                                 |dword size of the return type multiplied|
+ +---------------------------------+by the subgroup size. If the dword area |
+ |uint8                            |of the region is less than its max dword|
+ |intel_sub_group_media_block_read_|area, then the corresponding tail       |
+ |ui8(                             |elements of some of the column vectors  |
+ |    int2 src_byte_offset,        |are undefined. Conversely, if the dword |
+ |    int width,                   |area of the region is more than the max |
+ |    int height,                  |dword area, then the corresponding tail |
+ |    read_only image2d_t image )  |elements of the region are dropped.     |
+ |                                 |                                        |
+ |                                 |For out-of-bound reads, the read-in     |
+ |                                 |texels are replicated from the nearest  |
+ |                                 |edge for byte, word and dword sized     |
+ |                                 |texels. The out-of-bound behavior is    |
+ |                                 |undefined for larger sized texels with  |
+ |                                 |the "_ui" built-in functions.           |
+ +---------------------------------+----------------------------------------+
+
+ Additional notes on out-of-bound reads:
+ +++++++++++++++++++++++++++++++++++++++
+
+ 1. For an image with byte texels, the boundary byte is replicated. For
+ example, for a boundary dword B0B1B2B3, to replicate the left boundary
+ byte texel, the out-of-bound dwords have the format B0B0B0B0, and that
+ for the right boundary is B3B3B3B3.
+ 2. For an image with word texels, boundary texel replication is on words. For
+ example, for a boundary dword B0B1B2B3, to replicate the left boundary
+ word texel, the out-of-bound dwords have the format B0B1B0B1, and that
+ for the right boundary is B2B3B2B3.
+ 3. For special images with (word texel) YUV packed format as described in
+ the cl_intel_packed_yuv extension, there are two cases depending on the
+ Y location: CL_YUYV_INTEL and CL_UYVY_INTEL. Boundary handling for
+ CL_YVYU_INTEL is the same as that for CL_YUYV_INTEL. Similarly, boundary
+ handling for CL_VYUY_INTEL is the same as that for CL_UYVY_INTEL. For a
+ boundary dword Y0U0Y1V0, to replicate the left boundary, we get Y0U0Y0V0,
+ and to replicate the right boundary, we get Y1U0Y1V0. For a boundary dword
+ U0Y0V0Y1, to replicate the left boundary, we get U0Y0V0Y0, and to
+ replicate the right boundary, we get U0Y1V0Y1.
+ 4. For an image with dword texels, the boundary dword texel is replicated.
+ 5. The behavior is undefined for images with greater than dword sized texels
+ (such as CL_RGBA + CL_FLOAT).
+
+ _____________________________________________________________________________
+
+ Write operations
+ ----------------
+
+ Byte sized write operations
+ +++++++++++++++++++++++++++
+
+ +----------------------------------+---------------------------------------+
+ |void                              |Writes to a 2D region of an image with |
+ |intel_sub_group_media_block_write_|surface formats of byte sized texels.  |
+ |uc(                               |                                       |
+ |    int2 src_byte_offset,         |The 2D source byte offset of the       |
+ |    int width,                    |top-left corner and the width and      |
+ |    int height,                   |height of the region are specified     |
+ |    uchar texel,                  |explicitly in the interface            |
+ |    image2d_t image )             |parameters. The source byte x-offset   |
+ |                                  |and width must be 4 byte aligned.      |
+ +----------------------------------+                                       |
+ |void                              |The width is specified in byte units   |
+ |intel_sub_group_media_block_write_|and must be less than or equal to      |
+ |uc2(                              |32. The width and height of the region |
+ |    int2 src_byte_offset,         |must be compile-time constants.        |
+ |    int width,                    |                                       |
+ |    int height,                   |The 2D region that is written to is    |
+ |    uchar2 texels,                |logically re-organized, taken in       |
+ |    image2d_t image )             |row-major order, as another 2D region  |
+ +----------------------------------+with the byte width equal to the       |
+ |void                              |subgroup size. Then each work-item     |
+ |intel_sub_group_media_block_write_|processes one byte column vector of    |
+ |uc4(                              |the logically re-organized rectangle,  |
+ |    int2 src_byte_offset,         |i.e. each column's subsequent data     |
+ |    int width,                    |element's address is strided by the    |
+ |    int height,                   |subgroup size.                         |
+ |    uchar4 texels,                |                                       |
+ |    image2d_t image )             |The max byte area of the region is     |
+ |                                  |defined as the byte size of the texels |
+ +----------------------------------+argument type multiplied by the        |
+ |void                              |subgroup size. If the byte area of the |
+ |intel_sub_group_media_block_write_|region is less than its max byte area, |
+ |uc8(                              |then the corresponding tail elements of|
+ |    int2 src_byte_offset,         |some of the column vectors will not be |
+ |    int width,                    |included in the written out region.    |
+ |    int height,                   |Conversely, if the byte area of the    |
+ |    uchar8 texels,                |region is more than the max byte area, |
+ |    image2d_t image )             |then the corresponding tail elements of|
+ |                                  |the region are dropped.                |
+ +----------------------------------+                                       |
+ |void                              |Out-of-bound writes are dropped.       |
+ |intel_sub_group_media_block_write_|                                       |
+ |uc16(                             |                                       |
+ |    int2 src_byte_offset,         |                                       |
+ |    int width,                    |                                       |
+ |    int height,                   |                                       |
+ |    uchar16 texels,               |                                       |
+ |    image2d_t image )             |                                       |
+ |                                  |                                       |
+ +----------------------------------+---------------------------------------+
+
+ Word sized write operations
+ +++++++++++++++++++++++++++
+
+ +----------------------------------+---------------------------------------+
+ |void                              |Writes to a 2D region of an image with |
+ |intel_sub_group_media_block_write_|surface formats of word or byte sized  |
+ |us(                               |texels.                                |
+ |    int2 src_byte_offset,         |                                       |
+ |    int width,                    |The 2D source byte offset of the       |
+ |    int height,                   |top-left corner and the width and      |
+ |    ushort texel,                 |height of the region are specified     |
+ |    image2d_t image )             |explicitly in the interface parameters.|
+ |                                  |The source byte x-offset and width must|
+ +----------------------------------+be dword aligned.                      |
+ |void                              |                                       |
+ |intel_sub_group_media_block_write_|The width is specified in word units   |
+ |us2(                              |and must be less than or equal to      |
+ |    int2 src_byte_offset,         |16. The width and height of the region |
+ |    int width,                    |must be compile-time constants.        |
+ |    int height,                   |                                       |
+ |    ushort2 texels,               |The 2D region that is written to is    |
+ |    image2d_t image )             |logically re-organized, taken in       |
+ |                                  |row-major order, as another 2D region  |
+ +----------------------------------+with the word width equal to the       |
+ |void                              |subgroup size. Then each work-item     |
+ |intel_sub_group_media_block_write_|processes one column vector of the     |
+ |us4(                              |logically re-organized rectangle,      |
+ |    int2 src_byte_offset,         |i.e. each column's subsequent data     |
+ |    int width,                    |element's address is strided by the    |
+ |    int height,                   |subgroup size multiplied by 2.         |
+ |    ushort4 texels,               |                                       |
+ |    image2d_t image )             |The max word area of the region is     |
+ |                                  |defined as the word size of the texels |
+ +----------------------------------+argument type multiplied by the        |
+ |void                              |subgroup size. If the word area of the |
+ |intel_sub_group_media_block_write_|region is less than its max word area, |
+ |us8(                              |then the corresponding tail elements of|
+ |    int2 src_byte_offset,         |some of the column vectors will not be |
+ |    int width,                    |included in the written out region.    |
+ |    int height,                   |Conversely, if the word area of the    |
+ |    ushort8 texels,               |region is more than the max word area, |
+ |    image2d_t image )             |then the corresponding tail elements of|
+ |                                  |the region are dropped.                |
+ +----------------------------------+                                       |
+ |void                              |Out-of-bound writes are dropped.       |
+ |intel_sub_group_media_block_write_|                                       |
+ |us16(                             |                                       |
+ |    int2 src_byte_offset,         |                                       |
+ |    int width,                    |                                       |
+ |    int height,                   |                                       |
+ |    ushort16 texels,              |                                       |
+ |    image2d_t image )             |                                       |
+ |                                  |                                       |
+ +----------------------------------+---------------------------------------+
+
+ Double word (DWORD) sized write operations
+ ++++++++++++++++++++++++++++++++++++++++++
+
+ +----------------------------------+---------------------------------------+
+ |void                              |Writes to a 2D region of an image with |
+ |intel_sub_group_media_block_write_|surface formats of dword, word or byte |
+ |ui(                               |sized texels.                          |
+ |    int2 src_byte_offset,         |                                       |
+ |    int width,                    |The 2D source byte offset of the       |
+ |    int height,                   |top-left corner and the width and      |
+ |    uint texels,                  |height of the region are specified     |
+ |    image2d_t image )             |explicitly in the interface            |
+ |                                  |parameters. The source byte x-offset   |
+ +----------------------------------+and width must be 4 byte aligned.      |
+ |void                              |                                       |
+ |intel_sub_group_media_block_write_|The width is specified in dword units  |
+ |ui2(                              |and must be less than or equal to      |
+ |    int2 src_byte_offset,         |8. The width and height of the region  |
+ |    int width,                    |must be compile-time constants.        |
+ |    int height,                   |                                       |
+ |    uint2 texels,                 |The 2D region that is written to is    |
+ |    image2d_t image )             |logically re-organized, taken in       |
+ |                                  |row-major order, as another 2D region  |
+ +----------------------------------+with the dword width equal to the      |
+ |void                              |subgroup size. Then each work-item     |
+ |intel_sub_group_media_block_write_|processes one column vector of the     |
+ |ui4(                              |logically re-organized rectangle,      |
+ |    int2 src_byte_offset,         |i.e. each column's subsequent data     |
+ |    int width,                    |element's address is strided by the    |
+ |    int height,                   |subgroup size multiplied by 4.         |
+ |    uint4 texels,                 |                                       |
+ |    image2d_t image )             |The max dword area of the region is    |
+ |                                  |defined as the dword size of the texels|
+ +----------------------------------+argument type multiplied by the        |
+ |void                              |subgroup size. If the dword area of the|
+ |intel_sub_group_media_block_write_|region is less than its max dword area,|
+ |ui8(                              |then the corresponding tail elements of|
+ |    int2 src_byte_offset,         |some of the column vectors will not be |
+ |    int width,                    |included in the written out region.    |
+ |    int height,                   |Conversely, if the dword area of the   |
+ |    uint8 texels,                 |region is more than the max dword area,|
+ |    image2d_t image )             |then the corresponding tail elements of|
+ |                                  |the region are dropped.                |
+ |                                  |                                       |
+ |                                  |Out-of-bound writes are dropped.       |
+ +----------------------------------+---------------------------------------+
+ _____________________________________________________________________________
+
+Examples
+
+ 1. Reading the vertical left edge of a macroblock in a kernel that uses the
+ device-side VME built-in functions.
+
+ All images are 8-bit images with the image_channel_order and the
+ image_data_type as CL_R and CL_UNORM_INT8 respectively.
+
+ __kernel
+ void vme_intra_estimation_kernel(
+     __read_only image2d_t src_img,
+     __read_only image2d_t ref_img,
+     __read_only image2d_t src_luma_img,
+     ...
+ {
+     ...
+     // Read the left edge for a macro-block.
+     int2 edgeCoord;
+     edgeCoord.x = srcCoord.x - 4;
+     edgeCoord.y = srcCoord.y;
+
+     uint leftLumaEdgeDW =
+         intel_sub_group_media_block_read_ui(
+             edgeCoord,
+             1,  // image region width of 1 dword
+             16, // image region height of 16
+             src_luma_img );
+     leftLumaEdge = as_uchar4( leftLumaEdgeDW ).s3; // byte at srcCoord.x - 1
+     ...
+     intel_sub_group_avc_sic_result_t result;
+     result =
+         intel_sub_group_avc_sic_evaluate_ipe(
+             src_img,
+             vme_sampler,
+             payload );
+     ...
+ }
+
+ Image 2D region:  Subgroup work-items:
+ ++++++++++++++++  ++++++++++++++++++++
+
+ +-+
+ |0|
+ +-+
+ |1|               Subgroup local id: 0 1 2 3 4 5 6 7 8 9 A B C D E F
+ +-+                                 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |2|                                 |0|1|2|3|4|5|6|7|8|9|A|B|C|D|E|F|
+ +-+                                 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |3|
+ +-+
+ |4|
+ +-+
+ |5|               The max dword area of the image region is 1*16 (16).
+ +-+
+ |6|               The 1x16 dword image region is re-organized, taken in
+ +-+               row-major order, as another 2D region within the subgroup
+ |7|               with the dword width equal to the subgroup size of 16,
+ +-+               i.e. a 16x1 region.
+ |8|
+ +-+
+ |9|
+ +-+
+ |A|
+ +-+
+ |B|
+ +-+
+ |C|
+ +-+
+ |D|
+ +-+
+ |E|
+ +-+
+ |F|
+ +-+
+
+ 2. Reading a 16x2 word region from an image.
+
+ All images are 8-bit images with the image_channel_order and the
+ image_data_type as CL_R and CL_UNORM_INT8 respectively.
+
+ __kernel __attribute__((intel_reqd_sub_group_size(8)))
+ void vme_intra_estimation_kernel(
+     __read_only image2d_t src_img,
+     ...
+ {
+     ...
+     // Read the 16x2 word region in a subgroup of size 8.
+
+     int2 srcCoord;
+     ...
+     ushort4 texels =
+         intel_sub_group_media_block_read_us4(
+             srcCoord,
+             16, // image region width of 16 words
+             2,  // image region height of 2
+             src_img );
+     ...
+ }
+
+ Image 2D region:
+ ++++++++++++++++
+
+ The max word area of the region is 4*8 (32).
+
+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
+ |0 |1 |2 |3 |4 |5 |6 |7 |8 |9 |A |B |C |D |E |F |
+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
+ |10|11|12|13|14|15|16|17|18|19|1A|1B|1C|1D|1E|1F|
+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
+
+ Subgroup work-items:
+ ++++++++++++++++++++
+
+ The 16x2 image region is re-organized, taken in row-major order, as another
+ 2D region within the subgroup with the word width equal to the subgroup
+ size of 8, i.e. an 8x4 region. Each work-item processes a column of data
+ items strided by 8 words.
+
+ Subgroup local id: 0  1  2  3  4  5  6  7
+                   +--+--+--+--+--+--+--+--+
+                   |0 |1 |2 |3 |4 |5 |6 |7 |
+                   +--+--+--+--+--+--+--+--+
+                   |8 |9 |A |B |C |D |E |F |
+                   +--+--+--+--+--+--+--+--+
+                   |10|11|12|13|14|15|16|17|
+                   +--+--+--+--+--+--+--+--+
+                   |18|19|1A|1B|1C|1D|1E|1F|
+                   +--+--+--+--+--+--+--+--+
+
+Revision History
+
+ Version 1 (12/02/2016): First public revision.
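
The row-major re-organization described in the read tables and in Example 2 of cl_intel_media_block_io.txt can be modeled on the host. The following is a minimal sketch in plain C, not part of the extension: `model_block_read`, `MAX_SG` and `MAX_VEC` are hypothetical names introduced here for illustration. It assumes that element `i` of the region, taken in row-major order, lands in work-item `i % sg_size` at vector component `i / sg_size`, which is how the 8x4 diagram in Example 2 reads. Components the spec leaves undefined are marked with -1.

```c
#include <assert.h>

/* Hypothetical limits for this model only (not defined by the extension). */
enum { MAX_SG = 32, MAX_VEC = 16 };

/*
 * Host-side model of the re-organization performed by the media block reads.
 * width/height are given in elements (bytes, words or dwords), sg_size is the
 * subgroup size and vec_len is the vector length of the return type.
 * out[lane][comp] receives the row-major index of the region element that the
 * work-item "lane" would see in component "comp", or -1 where the region is
 * smaller than the max area and the component is undefined.
 */
static void model_block_read(int width, int height, int sg_size, int vec_len,
                             int out[MAX_SG][MAX_VEC])
{
    assert(sg_size > 0 && sg_size <= MAX_SG);
    assert(vec_len > 0 && vec_len <= MAX_VEC);

    int area = width * height;  /* elements actually present in the region */

    for (int lane = 0; lane < sg_size; ++lane) {
        for (int comp = 0; comp < vec_len; ++comp) {
            /* Successive components of one work-item are sg_size apart. */
            int i = comp * sg_size + lane;
            out[lane][comp] = (i < area) ? i : -1;
        }
    }
    /* Region elements with index >= vec_len * sg_size are never visited,
     * which models the "tail elements are dropped" case. */
}
```

For the 16x2 word region of Example 2 with a subgroup size of 8 and a `ushort4` return type, this model gives work-item 0 the elements 0x00, 0x08, 0x10 and 0x18, matching the first column of the diagram.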
diff --git a/extensions/registry.py b/extensions/registry.py
index 70453a8..c74cc6d 100644
--- a/extensions/registry.py
+++ b/extensions/registry.py
@@ -146,6 +146,11 @@
'flags' : { 'public' },
'url' : 'extensions/intel/cl_intel_egl_image_yuv.txt',
},
+ 'cl_intel_media_block_io' : {
+ 'number' : 51,
+ 'flags' : { 'public' },
+ 'url' : 'extensions/intel/cl_intel_media_block_io.txt',
+ },
'cl_intel_motion_estimation' : {
'number' : 23,
'flags' : { 'public' },
diff --git a/xml/cl.xml b/xml/cl.xml
index 52a1a9a..6cd7a93 100644
--- a/xml/cl.xml
+++ b/xml/cl.xml
@@ -1163,6 +1163,7 @@
<extension number="48" name="cl_intel_subgroups_short"/>
<extension number="49" name="cl_intel_planar_yuv"/>
<extension number="50" name="cl_intel_device_side_avc_motion_estimation"/>
+ <extension number="51" name="cl_intel_media_block_io"/>
<!-- NOTE: extension numbers are now assigned from
../extensions/registry.py - see ../README.adoc. It is no longer
necessary to reserve extension names and numbers here. -->
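
The size restrictions of Table 6.X in cl_intel_media_block_io.txt lend themselves to a small host-side check before a kernel is built with particular compile-time width and height values. The sketch below is plain C with hypothetical helper names (`media_block_max_height`, `media_block_args_ok`); nothing here is an API of the extension. It assumes that the table's width in bytes equals the built-in's `width` argument multiplied by the element size (1, 2 or 4 bytes for the `_uc`, `_us` and `_ui` variants), and it applies the additional height limit of 16 that the spec lists for 2D images created from buffers.

```c
#include <assert.h>
#include <stdbool.h>

/* Maximum region height (rows) that Table 6.X allows for a region width in
 * bytes, or 0 if the width is not one of the widths listed in the table. */
static int media_block_max_height(int width_bytes)
{
    if (width_bytes <= 0 || width_bytes > 32 || width_bytes % 4 != 0)
        return 0;
    if (width_bytes == 4)
        return 64;
    if (width_bytes == 8)
        return 32;
    if (width_bytes <= 16)   /* 12 or 16 */
        return 16;
    return 8;                /* 20, 24, 28 or 32 */
}

/* Hypothetical pre-flight check for one call site.
 * x_byte_offset: the .x component of the byte offset argument
 * width, height: as passed to the built-in (width in elements)
 * elem_size:     1, 2 or 4 (assumed mapping to _uc, _us, _ui)
 * image_from_buffer: true for a 2D image created from a buffer */
static bool media_block_args_ok(int x_byte_offset, int width, int height,
                                int elem_size, bool image_from_buffer)
{
    int max_h = media_block_max_height(width * elem_size);

    if (image_from_buffer && max_h > 16)
        max_h = 16;

    /* The byte x-offset must be 4 byte aligned; negative offsets are kept
     * valid here because the spec defines out-of-bound reads. */
    return x_byte_offset % 4 == 0 && height >= 1 && height <= max_h;
}
```

Under these assumptions the call in Example 1 (1 dword wide, 16 rows, x offset `srcCoord.x - 4`) passes the check, while a 1 dword by 64 row region would be rejected for an image created from a buffer.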