Merge pull request #44 from b-sumner/master

Add cl_amd_planar_yuv extension
diff --git a/index.php b/index.php
index 4a7eefe..ea6419d 100644
--- a/index.php
+++ b/index.php
@@ -4,16 +4,16 @@
 include_once("../../assets/static_pages/khr_page_top.php");
 ?>
 
-<p> The OpenCL registry contains specifications of the core API and the
-    OpenCL C language; a portable intermediate representation of OpenCL
-    programs; specifications of Khronos- and vendor-approved OpenCL
-    extensions; reference pages for different OpenCL versions; and links to
-    header files corresponding to the specifications, which are now hosted
-    in the <a href="https://www.github.com/KhronosGroup/OpenCL-Headers/">
-    OpenCL-Headers </a> Github repository. </p>
+<p> The OpenCL registry contains specifications for the core API; Khronos-
+    and vendor-approved extensions to the core API; the OpenCL C and C++
+    languages; and the OpenCL SPIR-V Environment. </p>
 
-<p> The OpenCL Standard Portable Intermediate Representation (SPIR) is
-    found in the <a href="../spir/">SPIR Registry</a>. </p>
+<p> In addition, it contains reference pages for different OpenCL versions,
+    and links to the GitHub repository containing unified header files
+    corresponding to the specifications. </p>
+
+<p> The OpenCL Standard Portable Intermediate Representation (SPIR)
+    specification is found in the <a href="../spir/">SPIR Registry</a>. </p>
 
 <p> The SYCL specification is found in the <a href="../sycl/">SYCL
     Registry</a>. </p>
@@ -23,31 +23,42 @@
     href="https://github.com/KhronosGroup/OpenCL-Registry/"> OpenCL-Registry
     </a> Github repository. </li>
 
-<h6> OpenCL Core API and C Language Specification, Headers, and Documentation </h6>
+<h6> OpenCL Specifications and Related Documentation </h6>
 
 <p> The current version of OpenCL is OpenCL 2.2. </p>
 
-<p> The asciidoc source for the OpenCL 2.2 Specifications is available in
+<p> The asciidoctor source for the OpenCL 2.2 Specifications is available in
     the <a href="https://github.com/KhronosGroup/OpenCL-Docs">
     KhronosGroup/OpenCL-Docs </a> Github repository. </p>
 
 <ul>
 <li> OpenCL 2.2 API Specification
-     (<a href="specs/opencl-2.2.pdf">PDF</a>)
-     (<a href="specs/opencl-2.2.html">HTML</a>)
-     (May 12, 2017). </li>
+     (<a href="specs/2.2/html/OpenCL_API.html">HTML</a>)
+     (<a href="specs/2.2/pdf/OpenCL_API.pdf">PDF</a>)
+     (May 12, 2018). </li>
 <li> OpenCL 2.2 Extension Specification
-     (<a href="specs/opencl-2.2-extension.pdf">PDF</a>)
-     (<a href="specs/opencl-2.2-extension.html">HTML</a>)
-     (May 12, 2017). </li>
+     (<a href="specs/2.2/html/OpenCL_Ext.html">HTML</a>)
+     (<a href="specs/2.2/pdf/OpenCL_Ext.pdf">PDF</a>)
+     (May 12, 2018). </li>
 <li> OpenCL 2.2 SPIR-V Environment Specification
-     (<a href="specs/opencl-2.2-environment.pdf">PDF</a>)
-     (<a href="specs/opencl-2.2-environment.html">HTML</a>)
-     (May 12, 2017). </li>
+     (<a href="specs/2.2/html/OpenCL_Env.html">HTML</a>)
+     (<a href="specs/2.2/pdf/OpenCL_Env.pdf">PDF</a>)
+     (May 12, 2018). </li>
+<li> OpenCL C Language Specification
+     (<a href="specs/2.2/html/OpenCL_C.html">HTML</a>)
+     (<a href="specs/2.2/pdf/OpenCL_C.pdf">PDF</a>)
+     (May 12, 2018). </li>
 <li> OpenCL C++ Language Specification
-     (<a href="specs/opencl-2.2-cplusplus.pdf">PDF</a>)
-     (<a href="specs/opencl-2.2-cplusplus.html">HTML</a>)
-     (May 12, 2017). </li>
+     (<a href="specs/2.2/html/OpenCL_Cxx.html">HTML</a>)
+     (<a href="specs/2.2/pdf/OpenCL_Cxx.pdf">PDF</a>)
+     (May 12, 2018). </li>
+<li> OpenCL ICD Installation Guide Specification
+     (<a href="specs/2.2/html/OpenCL_ICD_Installation.html">HTML</a>)
+     (<a href="specs/2.2/pdf/OpenCL_ICD_Installation.pdf">PDF</a>)
+     (May 12, 2018). </li>
+<li> The <a href="https://www.khronos.org/registry/spir-v/"> OpenCL Extended
+     Instruction Set for SPIR-V</a> is included as part of the SPIR-V
+     registry. </li>
 <li> The OpenCL 2.2 Reference Pages are not posted yet. </li>
      <!--
      <a href="http://www.khronos.org/registry/cl/sdk/2.2/docs/man/xhtml/">
@@ -58,6 +69,21 @@
      -->
 <li> <a href="https://www.khronos.org/files/opencl22-reference-guide.pdf">
      OpenCL 2.2 Reference Guide.</a> </li>
+</ul>
+
+<h6> OpenCL Header Files </h6>
+
+<p> C language header files for the OpenCL API are found in the <a
+    href="https://github.com/KhronosGroup/OpenCL-Headers">
+    KhronosGroup/OpenCL-Headers</a> Github repository. In the past, there
+    were different header files for each OpenCL version. The header files
+    have now been unified - a single set of headers supports all OpenCL API
+    versions - and the API version defined by the headers at compilation
+    time is determined as described in the README.md in that repository.
+    </p>
+
+    <!--
+<ul>
 <li> All of the following headers should be present in a directory
      <tt> CL/ </tt> (or <tt> OpenCL/ </tt> on MacOS X). The single
      header file <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl22/CL/opencl.h"> opencl.h </a> includes
@@ -88,10 +114,8 @@
      <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl22/CL/cl_gl_ext.h"> cl_gl_ext.h </a> -
           OpenCL 2.2 Vendor OpenCL/OpenGL Extensions Header File. </li>
      </li>
-<li> The <a href="https://www.khronos.org/registry/spir-v/"> OpenCL Extended
-     Instruction Set for SPIR-V</a> is included as part of the SPIR-V
-     registry.
 </ul>
+    -->
 
 
 <h6> OpenCL ICD and C++ Bindings Headers </h6>
@@ -137,36 +161,6 @@
      OpenCL 2.1 Reference Pages.</a>
 <li> <a href="http://www.khronos.org/registry/cl/sdk/2.1/docs/OpenCL-2.1-refcard.pdf">
      OpenCL 2.1 Quick Reference Card.</a> </li>
-<li> All of the following headers should be present in a directory
-     <tt> CL/ </tt> (or <tt> OpenCL/ </tt> on MacOS X). The single
-     header file <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl21/CL/opencl.h"> opencl.h </a> includes
-     other headers as appropriate for the target platform, and simply
-     including <tt> opencl.h </tt> should be all that most applications
-     need to do.
-
-     <ul>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl21/CL/opencl.h"> opencl.h </a> -
-          OpenCL 2.1 Single Header File for Applications. </li>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl21/CL/cl_platform.h"> cl_platform.h </a> -
-          OpenCL 2.1 Platform-Dependent Macros. </li>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl21/CL/cl.h"> cl.h </a> -
-          OpenCL 2.1 Core API Header File. </li>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl21/CL/cl_ext.h"> cl_ext.h </a> -
-          OpenCL 2.1 Extensions Header File. </li>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl21/CL/cl_egl.h"> cl_egl.h </a> -
-          OpenCL 2.1 Khronos EGL Extensions Header File. </li>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl21/CL/cl_dx9_media_sharing.h"> cl_dx9_media_sharing.h </a> -
-          OpenCL 2.1 Khronos OpenCL/Direct3D 9 Media Sharing Extensions
-          Header File. </li>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl21/CL/cl_d3d10.h"> cl_d3d10.h </a> -
-          OpenCL 2.1 Khronos OpenCL/Direct3D 10 Extensions Header File. </li>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl21/CL/cl_d3d11.h"> cl_d3d11.h </a> -
-          OpenCL 2.1 Khronos OpenCL/Direct3D 11 Extensions Header File. </li>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl21/CL/cl_gl.h"> cl_gl.h </a> -
-          OpenCL 2.1 Khronos OpenCL/OpenGL Extensions Header File. </li>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl21/CL/cl_gl_ext.h"> cl_gl_ext.h </a> -
-          OpenCL 2.1 Vendor OpenCL/OpenGL Extensions Header File. </li>
-     </ul> </li>
 </ul>
 
 <h6> OpenCL 2.0 </h6>
@@ -185,32 +179,6 @@
      OpenCL 2.0 Reference Pages.</a>
 <li> <a href="http://www.khronos.org/registry/cl/sdk/2.0/docs/OpenCL-2.0-refcard.pdf">
      OpenCL 2.0 Quick Reference Card.</a> </li>
-<li> OpenCL 2.0 headers are structured in exactly the same fashion as
-     OpenCL 2.0 headers described above.
-
-     <ul>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl20/CL/opencl.h"> opencl.h </a> -
-          OpenCL 2.0 Single Header File for Applications. </li>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl20/CL/cl_platform.h"> cl_platform.h </a> -
-          OpenCL 2.0 Platform-Dependent Macros. </li>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl20/CL/cl.h"> cl.h </a> -
-          OpenCL 2.0 Core API Header File. </li>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl20/CL/cl_ext.h"> cl_ext.h </a> -
-          OpenCL 2.0 Extensions Header File. </li>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl20/CL/cl_egl.h"> cl_egl.h </a> -
-          OpenCL 2.0 Khronos EGL Extensions Header File. </li>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl20/CL/cl_dx9_media_sharing.h"> cl_dx9_media_sharing.h </a> -
-          OpenCL 2.0 Khronos OpenCL/Direct3D 9 Media Sharing Extensions
-          Header File. </li>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl20/CL/cl_d3d10.h"> cl_d3d10.h </a> -
-          OpenCL 2.0 Khronos OpenCL/Direct3D 10 Extensions Header File. </li>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl20/CL/cl_d3d11.h"> cl_d3d11.h </a> -
-          OpenCL 2.0 Khronos OpenCL/Direct3D 11 Extensions Header File. </li>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl20/CL/cl_gl.h"> cl_gl.h </a> -
-          OpenCL 2.0 Khronos OpenCL/OpenGL Extensions Header File. </li>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl20/CL/cl_gl_ext.h"> cl_gl_ext.h </a> -
-          OpenCL 2.0 Vendor OpenCL/OpenGL Extensions Header File. </li>
-     </ul> </li>
 </ul>
 
 <h6> OpenCL 1.2 </h6>
@@ -226,32 +194,6 @@
      OpenCL 1.2 Reference Pages.</a>
 <li> <a href="http://www.khronos.org/registry/cl/sdk/1.2/docs/OpenCL-1.2-refcard.pdf">
      OpenCL 1.2 Quick Reference Card.</a> </li>
-<li> OpenCL 1.2 headers are structured in exactly the same fashion as
-     OpenCL 2.1 headers described above.
-
-     <ul>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl12/CL/opencl.h"> opencl.h </a> -
-          OpenCL 1.2 Single Header File for Applications. </li>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl12/CL/cl_platform.h"> cl_platform.h </a> -
-          OpenCL 1.2 Platform-Dependent Macros. </li>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl12/CL/cl.h"> cl.h </a> -
-          OpenCL 1.2 Core API Header File. </li>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl12/CL/cl_ext.h"> cl_ext.h </a> -
-          OpenCL 1.2 Extensions Header File. </li>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl12/CL/cl_egl.h"> cl_egl.h </a> -
-          OpenCL 1.2 Khronos EGL Extensions Header File. </li>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl12/CL/cl_dx9_media_sharing.h"> cl_dx9_media_sharing.h </a> -
-          OpenCL 1.2 Khronos OpenCL/Direct3D 9 Media Sharing Extensions
-          Header File. </li>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl12/CL/cl_d3d10.h"> cl_d3d10.h </a> -
-          OpenCL 1.2 Khronos OpenCL/Direct3D 10 Extensions Header File. </li>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl12/CL/cl_d3d11.h"> cl_d3d11.h </a> -
-          OpenCL 1.2 Khronos OpenCL/Direct3D 11 Extensions Header File. </li>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl12/CL/cl_gl.h"> cl_gl.h </a> -
-          OpenCL 1.2 Khronos OpenCL/OpenGL Extensions Header File. </li>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl12/CL/cl_gl_ext.h"> cl_gl_ext.h </a> -
-          OpenCL 1.2 Vendor OpenCL/OpenGL Extensions Header File. </li>
-     </ul> </li>
 </ul>
 
 <h6> OpenCL 1.1 </h6>
@@ -268,25 +210,6 @@
      OpenCL 1.1 Reference Pages.</a> </li>
 <li> <a href="http://www.khronos.org/registry/cl/sdk/1.1/docs/OpenCL-1.1-refcard.pdf">
      OpenCL 1.1 Reference Card.</a> </li>
-<li> OpenCL 1.1 headers are structured in exactly the same fashion as
-     OpenCL 2.1 headers described above.
-
-     <ul>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl11/CL/opencl.h"> opencl.h </a> -
-          OpenCL 1.1 Single Header File for Applications. </li>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl11/CL/cl_platform.h"> cl_platform.h </a> -
-          OpenCL 1.1 Platform-Dependent Macros. </li>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl11/CL/cl.h"> cl.h </a> -
-          OpenCL 1.1 Core API Header File. </li>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl11/CL/cl_ext.h"> cl_ext.h </a> -
-          OpenCL 1.1 Extensions Header File. </li>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl11/CL/cl_d3d10.h"> cl_d3d10.h </a> -
-          OpenCL 1.1 Khronos OpenCL/Direct3D 10 Extensions Header File. </li>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl11/CL/cl_gl.h"> cl_gl.h </a> -
-          OpenCL 1.1 Khronos OpenCL/OpenGL Extensions Header File. </li>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl11/CL/cl_gl_ext.h"> cl_gl_ext.h </a> -
-          OpenCL 1.1 Vendor OpenCL/OpenGL Extensions Header File. </li>
-     </ul> </li>
 </ul>
 
 <h6> OpenCL 1.0 </h6>
@@ -299,25 +222,6 @@
      OpenCL 1.0 Reference Pages.</a> </li>
 <li> <a href="http://www.khronos.org/registry/cl/sdk/1.0/docs/OpenCL-1.0-refcard.pdf">
      OpenCL 1.0 Reference Card.</a> </li>
-<li> OpenCL 1.0 headers are structured in exactly the same fashion as
-     OpenCL 2.1 headers described above.
-
-     <ul>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl10/CL/opencl.h"> opencl.h </a> -
-          OpenCL 1.0 Single Header File for Applications. </li>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl10/CL/cl_platform.h"> cl_platform.h </a> -
-          OpenCL 1.0 Platform-Dependent Macros. </li>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl10/CL/cl.h"> cl.h </a> -
-          OpenCL 1.0 Core API Header File. </li>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl10/CL/cl_ext.h"> cl_ext.h </a> -
-          OpenCL 1.0 Extensions Header File. </li>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl10/CL/cl_d3d10.h"> cl_d3d10.h </a> -
-          OpenCL 1.0 Khronos OpenCL/Direct3D 10 Extensions Header File. </li>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl10/CL/cl_gl.h"> cl_gl.h </a> -
-          OpenCL 1.0 Khronos OpenCL/OpenGL Extensions Header File. </li>
-     <li> <a href="https://raw.githubusercontent.com/KhronosGroup/OpenCL-Headers/master/opencl10/CL/cl_gl_ext.h"> cl_gl_ext.h </a> -
-          OpenCL 1.0 Vendor OpenCL/OpenGL Extensions Header File. </li>
-     </ul> </li>
 </ul>
 
 <h6> Providing Feedback on the Registry </h6>
diff --git a/specs/2.2/html/OpenCL_API.html b/specs/2.2/html/OpenCL_API.html
new file mode 100644
index 0000000..afc7d01
--- /dev/null
+++ b/specs/2.2/html/OpenCL_API.html
@@ -0,0 +1,21451 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+<meta charset="UTF-8">
+<!--[if IE]><meta http-equiv="X-UA-Compatible" content="IE=edge"><![endif]-->
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
+<meta name="generator" content="Asciidoctor 1.5.6.1">
+<meta name="author" content="Khronos OpenCL Working Group">
+<title>The OpenCL&#8482; Specification</title>
+<style>
+/*! normalize.css v2.1.2 | MIT License | git.io/normalize */
+/* ========================================================================== HTML5 display definitions ========================================================================== */
+/** Correct `block` display not defined in IE 8/9. */
+article, aside, details, figcaption, figure, footer, header, hgroup, main, nav, section, summary { display: block; }
+
+/** Correct `inline-block` display not defined in IE 8/9. */
+audio, canvas, video { display: inline-block; }
+
+/** Prevent modern browsers from displaying `audio` without controls. Remove excess height in iOS 5 devices. */
+audio:not([controls]) { display: none; height: 0; }
+
+/** Address `[hidden]` styling not present in IE 8/9. Hide the `template` element in IE, Safari, and Firefox < 22. */
+[hidden], template { display: none; }
+
+script { display: none !important; }
+
+/* ========================================================================== Base ========================================================================== */
+/** 1. Set default font family to sans-serif. 2. Prevent iOS text size adjust after orientation change, without disabling user zoom. */
+html { font-family: sans-serif; /* 1 */ -ms-text-size-adjust: 100%; /* 2 */ -webkit-text-size-adjust: 100%; /* 2 */ }
+
+/** Remove default margin. */
+body { margin: 0; }
+
+/* ========================================================================== Links ========================================================================== */
+/** Remove the gray background color from active links in IE 10. */
+a { background: transparent; }
+
+/** Address `outline` inconsistency between Chrome and other browsers. */
+a:focus { outline: thin dotted; }
+
+/** Improve readability when focused and also mouse hovered in all browsers. */
+a:active, a:hover { outline: 0; }
+
+/* ========================================================================== Typography ========================================================================== */
+/** Address variable `h1` font-size and margin within `section` and `article` contexts in Firefox 4+, Safari 5, and Chrome. */
+h1 { font-size: 2em; margin: 0.67em 0; }
+
+/** Address styling not present in IE 8/9, Safari 5, and Chrome. */
+abbr[title] { border-bottom: 1px dotted; }
+
+/** Address style set to `bolder` in Firefox 4+, Safari 5, and Chrome. */
+b, strong { font-weight: bold; }
+
+/** Address styling not present in Safari 5 and Chrome. */
+dfn { font-style: italic; }
+
+/** Address differences between Firefox and other browsers. */
+hr { -moz-box-sizing: content-box; box-sizing: content-box; height: 0; }
+
+/** Address styling not present in IE 8/9. */
+mark { background: #ff0; color: #000; }
+
+/** Correct font family set oddly in Safari 5 and Chrome. */
+code, kbd, pre, samp { font-family: monospace, serif; font-size: 1em; }
+
+/** Improve readability of pre-formatted text in all browsers. */
+pre { white-space: pre-wrap; }
+
+/** Set consistent quote types. */
+q { quotes: "\201C" "\201D" "\2018" "\2019"; }
+
+/** Address inconsistent and variable font size in all browsers. */
+small { font-size: 80%; }
+
+/** Prevent `sub` and `sup` affecting `line-height` in all browsers. */
+sub, sup { font-size: 75%; line-height: 0; position: relative; vertical-align: baseline; }
+
+sup { top: -0.5em; }
+
+sub { bottom: -0.25em; }
+
+/* ========================================================================== Embedded content ========================================================================== */
+/** Remove border when inside `a` element in IE 8/9. */
+img { border: 0; }
+
+/** Correct overflow displayed oddly in IE 9. */
+svg:not(:root) { overflow: hidden; }
+
+/* ========================================================================== Figures ========================================================================== */
+/** Address margin not present in IE 8/9 and Safari 5. */
+figure { margin: 0; }
+
+/* ========================================================================== Forms ========================================================================== */
+/** Define consistent border, margin, and padding. */
+fieldset { border: 1px solid #c0c0c0; margin: 0 2px; padding: 0.35em 0.625em 0.75em; }
+
+/** 1. Correct `color` not being inherited in IE 8/9. 2. Remove padding so people aren't caught out if they zero out fieldsets. */
+legend { border: 0; /* 1 */ padding: 0; /* 2 */ }
+
+/** 1. Correct font family not being inherited in all browsers. 2. Correct font size not being inherited in all browsers. 3. Address margins set differently in Firefox 4+, Safari 5, and Chrome. */
+button, input, select, textarea { font-family: inherit; /* 1 */ font-size: 100%; /* 2 */ margin: 0; /* 3 */ }
+
+/** Address Firefox 4+ setting `line-height` on `input` using `!important` in the UA stylesheet. */
+button, input { line-height: normal; }
+
+/** Address inconsistent `text-transform` inheritance for `button` and `select`. All other form control elements do not inherit `text-transform` values. Correct `button` style inheritance in Chrome, Safari 5+, and IE 8+. Correct `select` style inheritance in Firefox 4+ and Opera. */
+button, select { text-transform: none; }
+
+/** 1. Avoid the WebKit bug in Android 4.0.* where (2) destroys native `audio` and `video` controls. 2. Correct inability to style clickable `input` types in iOS. 3. Improve usability and consistency of cursor style between image-type `input` and others. */
+button, html input[type="button"], input[type="reset"], input[type="submit"] { -webkit-appearance: button; /* 2 */ cursor: pointer; /* 3 */ }
+
+/** Re-set default cursor for disabled elements. */
+button[disabled], html input[disabled] { cursor: default; }
+
+/** 1. Address box sizing set to `content-box` in IE 8/9. 2. Remove excess padding in IE 8/9. */
+input[type="checkbox"], input[type="radio"] { box-sizing: border-box; /* 1 */ padding: 0; /* 2 */ }
+
+/** 1. Address `appearance` set to `searchfield` in Safari 5 and Chrome. 2. Address `box-sizing` set to `border-box` in Safari 5 and Chrome (include `-moz` to future-proof). */
+input[type="search"] { -webkit-appearance: textfield; /* 1 */ -moz-box-sizing: content-box; -webkit-box-sizing: content-box; /* 2 */ box-sizing: content-box; }
+
+/** Remove inner padding and search cancel button in Safari 5 and Chrome on OS X. */
+input[type="search"]::-webkit-search-cancel-button, input[type="search"]::-webkit-search-decoration { -webkit-appearance: none; }
+
+/** Remove inner padding and border in Firefox 4+. */
+button::-moz-focus-inner, input::-moz-focus-inner { border: 0; padding: 0; }
+
+/** 1. Remove default vertical scrollbar in IE 8/9. 2. Improve readability and alignment in all browsers. */
+textarea { overflow: auto; /* 1 */ vertical-align: top; /* 2 */ }
+
+/* ========================================================================== Tables ========================================================================== */
+/** Remove most spacing between table cells. */
+table { border-collapse: collapse; border-spacing: 0; }
+
+meta.foundation-mq-small { font-family: "only screen and (min-width: 768px)"; width: 768px; }
+
+meta.foundation-mq-medium { font-family: "only screen and (min-width:1280px)"; width: 1280px; }
+
+meta.foundation-mq-large { font-family: "only screen and (min-width:1440px)"; width: 1440px; }
+
+*, *:before, *:after { -moz-box-sizing: border-box; -webkit-box-sizing: border-box; box-sizing: border-box; }
+
+html, body { font-size: 100%; }
+
+body { background: white; color: #222222; padding: 0; margin: 0; font-family: "Helvetica Neue", "Helvetica", Helvetica, Arial, sans-serif; font-weight: normal; font-style: normal; line-height: 1; position: relative; cursor: auto; }
+
+a:hover { cursor: pointer; }
+
+img, object, embed { max-width: 100%; height: auto; }
+
+object, embed { height: 100%; }
+
+img { -ms-interpolation-mode: bicubic; }
+
+#map_canvas img, #map_canvas embed, #map_canvas object, .map_canvas img, .map_canvas embed, .map_canvas object { max-width: none !important; }
+
+.left { float: left !important; }
+
+.right { float: right !important; }
+
+.text-left { text-align: left !important; }
+
+.text-right { text-align: right !important; }
+
+.text-center { text-align: center !important; }
+
+.text-justify { text-align: justify !important; }
+
+.hide { display: none; }
+
+.antialiased { -webkit-font-smoothing: antialiased; }
+
+img { display: inline-block; vertical-align: middle; }
+
+textarea { height: auto; min-height: 50px; }
+
+select { width: 100%; }
+
+object, svg { display: inline-block; vertical-align: middle; }
+
+.center { margin-left: auto; margin-right: auto; }
+
+.spread { width: 100%; }
+
+p.lead, .paragraph.lead > p, #preamble > .sectionbody > .paragraph:first-of-type p { font-size: 1.21875em; line-height: 1.6; }
+
+.subheader, .admonitionblock td.content > .title, .audioblock > .title, .exampleblock > .title, .imageblock > .title, .listingblock > .title, .literalblock > .title, .stemblock > .title, .openblock > .title, .paragraph > .title, .quoteblock > .title, table.tableblock > .title, .verseblock > .title, .videoblock > .title, .dlist > .title, .olist > .title, .ulist > .title, .qlist > .title, .hdlist > .title { line-height: 1.4; color: black; font-weight: 300; margin-top: 0.2em; margin-bottom: 0.5em; }
+
+/* Typography resets */
+div, dl, dt, dd, ul, ol, li, h1, h2, h3, #toctitle, .sidebarblock > .content > .title, h4, h5, h6, pre, form, p, blockquote, th, td { margin: 0; padding: 0; direction: ltr; }
+
+/* Default Link Styles */
+a { color: #0068b0; text-decoration: none; line-height: inherit; }
+a:hover, a:focus { color: #333333; }
+a img { border: none; }
+
+/* Default paragraph styles */
+p { font-family: Noto, sans-serif; font-weight: normal; font-size: 1em; line-height: 1.6; margin-bottom: 0.75em; text-rendering: optimizeLegibility; }
+p aside { font-size: 0.875em; line-height: 1.35; font-style: italic; }
+
+/* Default header styles */
+h1, h2, h3, #toctitle, .sidebarblock > .content > .title, h4, h5, h6 { font-family: Noto, sans-serif; font-weight: normal; font-style: normal; color: black; text-rendering: optimizeLegibility; margin-top: 0.5em; margin-bottom: 0.5em; line-height: 1.2125em; }
+h1 small, h2 small, h3 small, #toctitle small, .sidebarblock > .content > .title small, h4 small, h5 small, h6 small { font-size: 60%; color: #4d4d4d; line-height: 0; }
+
+h1 { font-size: 2.125em; }
+
+h2 { font-size: 1.6875em; }
+
+h3, #toctitle, .sidebarblock > .content > .title { font-size: 1.375em; }
+
+h4 { font-size: 1.125em; }
+
+h5 { font-size: 1.125em; }
+
+h6 { font-size: 1em; }
+
+hr { border: solid #dddddd; border-width: 1px 0 0; clear: both; margin: 1.25em 0 1.1875em; height: 0; }
+
+/* Helpful Typography Defaults */
+em, i { font-style: italic; line-height: inherit; }
+
+strong, b { font-weight: bold; line-height: inherit; }
+
+small { font-size: 60%; line-height: inherit; }
+
+code { font-family: Consolas, "Liberation Mono", Courier, monospace; font-weight: normal; color: #264357; }
+
+/* Lists */
+ul, ol, dl { font-size: 1em; line-height: 1.6; margin-bottom: 0.75em; list-style-position: outside; font-family: Noto, sans-serif; }
+
+ul, ol { margin-left: 1.5em; }
+ul.no-bullet, ol.no-bullet { margin-left: 1.5em; }
+
+/* Unordered Lists */
+ul li ul, ul li ol { margin-left: 1.25em; margin-bottom: 0; font-size: 1em; /* Override nested font-size change */ }
+ul.square li ul, ul.circle li ul, ul.disc li ul { list-style: inherit; }
+ul.square { list-style-type: square; }
+ul.circle { list-style-type: circle; }
+ul.disc { list-style-type: disc; }
+ul.no-bullet { list-style: none; }
+
+/* Ordered Lists */
+ol li ul, ol li ol { margin-left: 1.25em; margin-bottom: 0; }
+
+/* Definition Lists */
+dl dt { margin-bottom: 0.3em; font-weight: bold; }
+dl dd { margin-bottom: 0.75em; }
+
+/* Abbreviations */
+abbr, acronym { text-transform: uppercase; font-size: 90%; color: black; border-bottom: 1px dotted #dddddd; cursor: help; }
+
+abbr { text-transform: none; }
+
+/* Blockquotes */
+blockquote { margin: 0 0 0.75em; padding: 0.5625em 1.25em 0 1.1875em; border-left: 1px solid #dddddd; }
+blockquote cite { display: block; font-size: 0.8125em; color: #5e93b8; }
+blockquote cite:before { content: "\2014 \0020"; }
+blockquote cite a, blockquote cite a:visited { color: #5e93b8; }
+
+blockquote, blockquote p { line-height: 1.6; color: #333333; }
+
+/* Microformats */
+.vcard { display: inline-block; margin: 0 0 1.25em 0; border: 1px solid #dddddd; padding: 0.625em 0.75em; }
+.vcard li { margin: 0; display: block; }
+.vcard .fn { font-weight: bold; font-size: 0.9375em; }
+
+.vevent .summary { font-weight: bold; }
+.vevent abbr { cursor: auto; text-decoration: none; font-weight: bold; border: none; padding: 0 0.0625em; }
+
+@media only screen and (min-width: 768px) { h1, h2, h3, #toctitle, .sidebarblock > .content > .title, h4, h5, h6 { line-height: 1.4; }
+  h1 { font-size: 2.75em; }
+  h2 { font-size: 2.3125em; }
+  h3, #toctitle, .sidebarblock > .content > .title { font-size: 1.6875em; }
+  h4 { font-size: 1.4375em; } }
+/* Tables */
+table { background: white; margin-bottom: 1.25em; border: solid 1px #d8d8ce; }
+table thead, table tfoot { background: -webkit-linear-gradient(top, #add386, #90b66a); font-weight: bold; }
+table thead tr th, table thead tr td, table tfoot tr th, table tfoot tr td { padding: 0.5em 0.625em 0.625em; font-size: inherit; color: white; text-align: left; }
+table tr th, table tr td { padding: 0.5625em 0.625em; font-size: inherit; color: #6d6e71; }
+table tr.even, table tr.alt, table tr:nth-of-type(even) { background: #edf2f2; }
+table thead tr th, table tfoot tr th, table tbody tr td, table tr td, table tfoot tr td { display: table-cell; line-height: 1.4; }
+
+body { -moz-osx-font-smoothing: grayscale; -webkit-font-smoothing: antialiased; tab-size: 4; }
+
+h1, h2, h3, #toctitle, .sidebarblock > .content > .title, h4, h5, h6 { line-height: 1.4; }
+
+a:hover, a:focus { text-decoration: underline; }
+
+.clearfix:before, .clearfix:after, .float-group:before, .float-group:after { content: " "; display: table; }
+.clearfix:after, .float-group:after { clear: both; }
+
+*:not(pre) > code { font-size: inherit; font-style: normal !important; letter-spacing: 0; padding: 0; background-color: white; -webkit-border-radius: 0; border-radius: 0; line-height: inherit; word-wrap: break-word; }
+*:not(pre) > code.nobreak { word-wrap: normal; }
+*:not(pre) > code.nowrap { white-space: nowrap; }
+
+pre, pre > code { line-height: 1.6; color: #264357; font-family: Consolas, "Liberation Mono", Courier, monospace; font-weight: normal; }
+
+em em { font-style: normal; }
+
+strong strong { font-weight: normal; }
+
+.keyseq { color: #333333; }
+
+kbd { font-family: Consolas, "Liberation Mono", Courier, monospace; display: inline-block; color: black; font-size: 0.65em; line-height: 1.45; background-color: #f7f7f7; border: 1px solid #ccc; -webkit-border-radius: 3px; border-radius: 3px; -webkit-box-shadow: 0 1px 0 rgba(0, 0, 0, 0.2), 0 0 0 0.1em white inset; box-shadow: 0 1px 0 rgba(0, 0, 0, 0.2), 0 0 0 0.1em white inset; margin: 0 0.15em; padding: 0.2em 0.5em; vertical-align: middle; position: relative; top: -0.1em; white-space: nowrap; }
+
+.keyseq kbd:first-child { margin-left: 0; }
+
+.keyseq kbd:last-child { margin-right: 0; }
+
+.menuseq, .menuref { color: #000; }
+
+.menuseq b:not(.caret), .menuref { font-weight: inherit; }
+
+.menuseq { word-spacing: -0.02em; }
+.menuseq b.caret { font-size: 1.25em; line-height: 0.8; }
+.menuseq i.caret { font-weight: bold; text-align: center; width: 0.45em; }
+
+b.button:before, b.button:after { position: relative; top: -1px; font-weight: normal; }
+
+b.button:before { content: "["; padding: 0 3px 0 2px; }
+
+b.button:after { content: "]"; padding: 0 2px 0 3px; }
+
+#header, #content, #footnotes, #footer { width: 100%; margin-left: auto; margin-right: auto; margin-top: 0; margin-bottom: 0; max-width: 62.5em; *zoom: 1; position: relative; padding-left: 1.5em; padding-right: 1.5em; }
+#header:before, #header:after, #content:before, #content:after, #footnotes:before, #footnotes:after, #footer:before, #footer:after { content: " "; display: table; }
+#header:after, #content:after, #footnotes:after, #footer:after { clear: both; }
+
+#content { margin-top: 1.25em; }
+
+#content:before { content: none; }
+
+#header > h1:first-child { color: black; margin-top: 2.25rem; margin-bottom: 0; }
+#header > h1:first-child + #toc { margin-top: 8px; border-top: 1px solid #dddddd; }
+#header > h1:only-child, body.toc2 #header > h1:nth-last-child(2) { border-bottom: 1px solid #dddddd; padding-bottom: 8px; }
+#header .details { border-bottom: 1px solid #dddddd; line-height: 1.45; padding-top: 0.25em; padding-bottom: 0.25em; padding-left: 0.25em; color: #5e93b8; display: -ms-flexbox; display: -webkit-flex; display: flex; -ms-flex-flow: row wrap; -webkit-flex-flow: row wrap; flex-flow: row wrap; }
+#header .details span:first-child { margin-left: -0.125em; }
+#header .details span.email a { color: #333333; }
+#header .details br { display: none; }
+#header .details br + span:before { content: "\00a0\2013\00a0"; }
+#header .details br + span.author:before { content: "\00a0\22c5\00a0"; color: #333333; }
+#header .details br + span#revremark:before { content: "\00a0|\00a0"; }
+#header #revnumber { text-transform: capitalize; }
+#header #revnumber:after { content: "\00a0"; }
+
+#content > h1:first-child:not([class]) { color: black; border-bottom: 1px solid #dddddd; padding-bottom: 8px; margin-top: 0; padding-top: 1rem; margin-bottom: 1.25rem; }
+
+#toc { border-bottom: 0 solid #dddddd; padding-bottom: 0.5em; }
+#toc > ul { margin-left: 0.125em; }
+#toc ul.sectlevel0 > li > a { font-style: italic; }
+#toc ul.sectlevel0 ul.sectlevel1 { margin: 0.5em 0; }
+#toc ul { font-family: Noto, sans-serif; list-style-type: none; }
+#toc li { line-height: 1.3334; margin-top: 0.3334em; }
+#toc a { text-decoration: none; }
+#toc a:active { text-decoration: underline; }
+
+#toctitle { color: black; font-size: 1.2em; }
+
+@media only screen and (min-width: 768px) { #toctitle { font-size: 1.375em; }
+  body.toc2 { padding-left: 15em; padding-right: 0; }
+  #toc.toc2 { margin-top: 0 !important; background-color: white; position: fixed; width: 15em; left: 0; top: 0; border-right: 1px solid #dddddd; border-top-width: 0 !important; border-bottom-width: 0 !important; z-index: 1000; padding: 1.25em 1em; height: 100%; overflow: auto; }
+  #toc.toc2 #toctitle { margin-top: 0; margin-bottom: 0.8rem; font-size: 1.2em; }
+  #toc.toc2 > ul { font-size: 0.9em; margin-bottom: 0; }
+  #toc.toc2 ul ul { margin-left: 0; padding-left: 1em; }
+  #toc.toc2 ul.sectlevel0 ul.sectlevel1 { padding-left: 0; margin-top: 0.5em; margin-bottom: 0.5em; }
+  body.toc2.toc-right { padding-left: 0; padding-right: 15em; }
+  body.toc2.toc-right #toc.toc2 { border-right-width: 0; border-left: 1px solid #dddddd; left: auto; right: 0; } }
+@media only screen and (min-width: 1280px) { body.toc2 { padding-left: 20em; padding-right: 0; }
+  #toc.toc2 { width: 20em; }
+  #toc.toc2 #toctitle { font-size: 1.375em; }
+  #toc.toc2 > ul { font-size: 0.95em; }
+  #toc.toc2 ul ul { padding-left: 1.25em; }
+  body.toc2.toc-right { padding-left: 0; padding-right: 20em; } }
+#content #toc { border-style: solid; border-width: 1px; border-color: #e6e6e6; margin-bottom: 1.25em; padding: 1.25em; background: white; -webkit-border-radius: 0; border-radius: 0; }
+#content #toc > :first-child { margin-top: 0; }
+#content #toc > :last-child { margin-bottom: 0; }
+
+#footer { max-width: 100%; background-color: transparent; padding: 1.25em; }
+
+#footer-text { color: black; line-height: 1.44; }
+
+#content { margin-bottom: 0.625em; }
+
+.sect1 { padding-bottom: 0.625em; }
+
+@media only screen and (min-width: 768px) { #content { margin-bottom: 1.25em; }
+  .sect1 { padding-bottom: 1.25em; } }
+.sect1:last-child { padding-bottom: 0; }
+
+.sect1 + .sect1 { border-top: 0 solid #dddddd; }
+
+#content h1 > a.anchor, h2 > a.anchor, h3 > a.anchor, #toctitle > a.anchor, .sidebarblock > .content > .title > a.anchor, h4 > a.anchor, h5 > a.anchor, h6 > a.anchor { position: absolute; z-index: 1001; width: 1.5ex; margin-left: -1.5ex; display: block; text-decoration: none !important; visibility: hidden; text-align: center; font-weight: normal; }
+#content h1 > a.anchor:before, h2 > a.anchor:before, h3 > a.anchor:before, #toctitle > a.anchor:before, .sidebarblock > .content > .title > a.anchor:before, h4 > a.anchor:before, h5 > a.anchor:before, h6 > a.anchor:before { content: "\00A7"; font-size: 0.85em; display: block; padding-top: 0.1em; }
+#content h1:hover > a.anchor, #content h1 > a.anchor:hover, h2:hover > a.anchor, h2 > a.anchor:hover, h3:hover > a.anchor, #toctitle:hover > a.anchor, .sidebarblock > .content > .title:hover > a.anchor, h3 > a.anchor:hover, #toctitle > a.anchor:hover, .sidebarblock > .content > .title > a.anchor:hover, h4:hover > a.anchor, h4 > a.anchor:hover, h5:hover > a.anchor, h5 > a.anchor:hover, h6:hover > a.anchor, h6 > a.anchor:hover { visibility: visible; }
+#content h1 > a.link, h2 > a.link, h3 > a.link, #toctitle > a.link, .sidebarblock > .content > .title > a.link, h4 > a.link, h5 > a.link, h6 > a.link { color: black; text-decoration: none; }
+#content h1 > a.link:hover, h2 > a.link:hover, h3 > a.link:hover, #toctitle > a.link:hover, .sidebarblock > .content > .title > a.link:hover, h4 > a.link:hover, h5 > a.link:hover, h6 > a.link:hover { color: black; }
+
+.audioblock, .imageblock, .literalblock, .listingblock, .stemblock, .videoblock { margin-bottom: 1.25em; }
+
+.admonitionblock td.content > .title, .audioblock > .title, .exampleblock > .title, .imageblock > .title, .listingblock > .title, .literalblock > .title, .stemblock > .title, .openblock > .title, .paragraph > .title, .quoteblock > .title, table.tableblock > .title, .verseblock > .title, .videoblock > .title, .dlist > .title, .olist > .title, .ulist > .title, .qlist > .title, .hdlist > .title { text-rendering: optimizeLegibility; text-align: left; }
+
+table.tableblock > caption.title { white-space: nowrap; overflow: visible; max-width: 0; }
+
+.paragraph.lead > p, #preamble > .sectionbody > .paragraph:first-of-type p { color: black; }
+
+table.tableblock #preamble > .sectionbody > .paragraph:first-of-type p { font-size: inherit; }
+
+.admonitionblock > table { border-collapse: separate; border: 0; background: none; width: 100%; }
+.admonitionblock > table td.icon { text-align: center; width: 80px; }
+.admonitionblock > table td.icon img { max-width: initial; }
+.admonitionblock > table td.icon .title { font-weight: bold; font-family: Noto, sans-serif; text-transform: uppercase; }
+.admonitionblock > table td.content { padding-left: 1.125em; padding-right: 1.25em; border-left: 1px solid #dddddd; color: #5e93b8; }
+.admonitionblock > table td.content > :last-child > :last-child { margin-bottom: 0; }
+
+.exampleblock > .content { border-style: solid; border-width: 1px; border-color: #e6e6e6; margin-bottom: 1.25em; padding: 1.25em; background: white; -webkit-border-radius: 0; border-radius: 0; }
+.exampleblock > .content > :first-child { margin-top: 0; }
+.exampleblock > .content > :last-child { margin-bottom: 0; }
+
+.sidebarblock { border-style: solid; border-width: 1px; border-color: #e6e6e6; margin-bottom: 1.25em; padding: 1.25em; background: white; -webkit-border-radius: 0; border-radius: 0; }
+.sidebarblock > :first-child { margin-top: 0; }
+.sidebarblock > :last-child { margin-bottom: 0; }
+.sidebarblock > .content > .title { color: black; margin-top: 0; }
+
+.exampleblock > .content > :last-child > :last-child, .exampleblock > .content .olist > ol > li:last-child > :last-child, .exampleblock > .content .ulist > ul > li:last-child > :last-child, .exampleblock > .content .qlist > ol > li:last-child > :last-child, .sidebarblock > .content > :last-child > :last-child, .sidebarblock > .content .olist > ol > li:last-child > :last-child, .sidebarblock > .content .ulist > ul > li:last-child > :last-child, .sidebarblock > .content .qlist > ol > li:last-child > :last-child { margin-bottom: 0; }
+
+.literalblock pre, .listingblock pre:not(.highlight), .listingblock pre[class="highlight"], .listingblock pre[class^="highlight "], .listingblock pre.CodeRay, .listingblock pre.prettyprint { background: #eeeeee; }
+.sidebarblock .literalblock pre, .sidebarblock .listingblock pre:not(.highlight), .sidebarblock .listingblock pre[class="highlight"], .sidebarblock .listingblock pre[class^="highlight "], .sidebarblock .listingblock pre.CodeRay, .sidebarblock .listingblock pre.prettyprint { background: #f2f1f1; }
+
+.literalblock pre, .literalblock pre[class], .listingblock pre, .listingblock pre[class] { border: 1px hidden #666666; -webkit-border-radius: 0; border-radius: 0; word-wrap: break-word; padding: 1.25em 1.5625em 1.125em 1.5625em; font-size: 0.8125em; }
+.literalblock pre.nowrap, .literalblock pre[class].nowrap, .listingblock pre.nowrap, .listingblock pre[class].nowrap { overflow-x: auto; white-space: pre; word-wrap: normal; }
+@media only screen and (min-width: 768px) { .literalblock pre, .literalblock pre[class], .listingblock pre, .listingblock pre[class] { font-size: 0.90625em; } }
+@media only screen and (min-width: 1280px) { .literalblock pre, .literalblock pre[class], .listingblock pre, .listingblock pre[class] { font-size: 1em; } }
+
+.literalblock.output pre { color: #eeeeee; background-color: #264357; }
+
+.listingblock pre.highlightjs { padding: 0; }
+.listingblock pre.highlightjs > code { padding: 1.25em 1.5625em 1.125em 1.5625em; -webkit-border-radius: 0; border-radius: 0; }
+
+.listingblock > .content { position: relative; }
+
+.listingblock code[data-lang]:before { display: none; content: attr(data-lang); position: absolute; font-size: 0.75em; top: 0.425rem; right: 0.5rem; line-height: 1; text-transform: uppercase; color: #999; }
+
+.listingblock:hover code[data-lang]:before { display: block; }
+
+.listingblock.terminal pre .command:before { content: attr(data-prompt); padding-right: 0.5em; color: #999; }
+
+.listingblock.terminal pre .command:not([data-prompt]):before { content: "$"; }
+
+table.pyhltable { border-collapse: separate; border: 0; margin-bottom: 0; background: none; }
+
+table.pyhltable td { vertical-align: top; padding-top: 0; padding-bottom: 0; line-height: 1.6; }
+
+table.pyhltable td.code { padding-left: .75em; padding-right: 0; }
+
+pre.pygments .lineno, table.pyhltable td:not(.code) { color: #999; padding-left: 0; padding-right: .5em; border-right: 1px solid #dddddd; }
+
+pre.pygments .lineno { display: inline-block; margin-right: .25em; }
+
+table.pyhltable .linenodiv { background: none !important; padding-right: 0 !important; }
+
+.quoteblock { margin: 0 1em 0.75em 1.5em; display: table; }
+.quoteblock > .title { margin-left: -1.5em; margin-bottom: 0.75em; }
+.quoteblock blockquote, .quoteblock blockquote p { color: #333333; font-size: 1.15rem; line-height: 1.75; word-spacing: 0.1em; letter-spacing: 0; font-style: italic; text-align: justify; }
+.quoteblock blockquote { margin: 0; padding: 0; border: 0; }
+.quoteblock blockquote:before { content: "\201c"; float: left; font-size: 2.75em; font-weight: bold; line-height: 0.6em; margin-left: -0.6em; color: black; text-shadow: 0 1px 2px rgba(0, 0, 0, 0.1); }
+.quoteblock blockquote > .paragraph:last-child p { margin-bottom: 0; }
+.quoteblock .attribution { margin-top: 0.5em; margin-right: 0.5ex; text-align: right; }
+.quoteblock .quoteblock { margin-left: 0; margin-right: 0; padding: 0.5em 0; border-left: 3px solid #5e93b8; }
+.quoteblock .quoteblock blockquote { padding: 0 0 0 0.75em; }
+.quoteblock .quoteblock blockquote:before { display: none; }
+
+.verseblock { margin: 0 1em 0.75em 1em; }
+.verseblock pre { font-family: "Open Sans", "DejaVu Sans", sans-serif; font-size: 1.15rem; color: #333333; font-weight: 300; text-rendering: optimizeLegibility; }
+.verseblock pre strong { font-weight: 400; }
+.verseblock .attribution { margin-top: 1.25rem; margin-left: 0.5ex; }
+
+.quoteblock .attribution, .verseblock .attribution { font-size: 0.8125em; line-height: 1.45; font-style: italic; }
+.quoteblock .attribution br, .verseblock .attribution br { display: none; }
+.quoteblock .attribution cite, .verseblock .attribution cite { display: block; letter-spacing: -0.025em; color: #5e93b8; }
+
+.quoteblock.abstract { margin: 0 0 0.75em 0; display: block; }
+.quoteblock.abstract blockquote, .quoteblock.abstract blockquote p { text-align: left; word-spacing: 0; }
+.quoteblock.abstract blockquote:before, .quoteblock.abstract blockquote p:first-of-type:before { display: none; }
+
+table.tableblock { max-width: 100%; border-collapse: separate; }
+table.tableblock td > .paragraph:last-child p > p:last-child, table.tableblock th > p:last-child, table.tableblock td > p:last-child { margin-bottom: 0; }
+
+table.tableblock, th.tableblock, td.tableblock { border: 0 solid #d8d8ce; }
+
+table.grid-all > thead > tr > .tableblock, table.grid-all > tbody > tr > .tableblock { border-width: 0 1px 1px 0; }
+
+table.grid-all > tfoot > tr > .tableblock { border-width: 1px 1px 0 0; }
+
+table.grid-cols > * > tr > .tableblock { border-width: 0 1px 0 0; }
+
+table.grid-rows > thead > tr > .tableblock, table.grid-rows > tbody > tr > .tableblock { border-width: 0 0 1px 0; }
+
+table.grid-rows > tfoot > tr > .tableblock { border-width: 1px 0 0 0; }
+
+table.grid-all > * > tr > .tableblock:last-child, table.grid-cols > * > tr > .tableblock:last-child { border-right-width: 0; }
+
+table.grid-all > tbody > tr:last-child > .tableblock, table.grid-all > thead:last-child > tr > .tableblock, table.grid-rows > tbody > tr:last-child > .tableblock, table.grid-rows > thead:last-child > tr > .tableblock { border-bottom-width: 0; }
+
+table.frame-all { border-width: 1px; }
+
+table.frame-sides { border-width: 0 1px; }
+
+table.frame-topbot { border-width: 1px 0; }
+
+th.halign-left, td.halign-left { text-align: left; }
+
+th.halign-right, td.halign-right { text-align: right; }
+
+th.halign-center, td.halign-center { text-align: center; }
+
+th.valign-top, td.valign-top { vertical-align: top; }
+
+th.valign-bottom, td.valign-bottom { vertical-align: bottom; }
+
+th.valign-middle, td.valign-middle { vertical-align: middle; }
+
+table thead th, table tfoot th { font-weight: bold; }
+
+tbody tr th { display: table-cell; line-height: 1.4; background: -webkit-linear-gradient(top, #add386, #90b66a); background: linear-gradient(to bottom, #add386, #90b66a); }
+
+tbody tr th, tbody tr th p, tfoot tr th, tfoot tr th p { color: white; font-weight: bold; }
+
+p.tableblock > code:only-child { background: none; padding: 0; }
+
+p.tableblock { font-size: 1em; }
+
+td > div.verse { white-space: pre; }
+
+ol { margin-left: 1.75em; }
+
+ul li ol { margin-left: 1.5em; }
+
+dl dd { margin-left: 1.125em; }
+
+dl dd:last-child, dl dd:last-child > :last-child { margin-bottom: 0; }
+
+ol > li p, ul > li p, ul dd, ol dd, .olist .olist, .ulist .ulist, .ulist .olist, .olist .ulist { margin-bottom: 0.375em; }
+
+ul.checklist, ul.none, ol.none, ul.no-bullet, ol.no-bullet, ol.unnumbered, ul.unstyled, ol.unstyled { list-style-type: none; }
+
+ul.no-bullet, ol.no-bullet, ol.unnumbered { margin-left: 0.625em; }
+
+ul.unstyled, ol.unstyled { margin-left: 0; }
+
+ul.checklist { margin-left: 0.625em; }
+
+ul.checklist li > p:first-child > .fa-square-o:first-child, ul.checklist li > p:first-child > .fa-check-square-o:first-child { width: 1.25em; font-size: 0.8em; position: relative; bottom: 0.125em; }
+
+ul.checklist li > p:first-child > input[type="checkbox"]:first-child { margin-right: 0.25em; }
+
+ul.inline { display: -ms-flexbox; display: -webkit-box; display: flex; -ms-flex-flow: row wrap; -webkit-flex-flow: row wrap; flex-flow: row wrap; list-style: none; margin: 0 0 0.375em -0.75em; }
+
+ul.inline > li { margin-left: 0.75em; }
+
+.unstyled dl dt { font-weight: normal; font-style: normal; }
+
+ol.arabic { list-style-type: decimal; }
+
+ol.decimal { list-style-type: decimal-leading-zero; }
+
+ol.loweralpha { list-style-type: lower-alpha; }
+
+ol.upperalpha { list-style-type: upper-alpha; }
+
+ol.lowerroman { list-style-type: lower-roman; }
+
+ol.upperroman { list-style-type: upper-roman; }
+
+ol.lowergreek { list-style-type: lower-greek; }
+
+.hdlist > table, .colist > table { border: 0; background: none; }
+.hdlist > table > tbody > tr, .colist > table > tbody > tr { background: none; }
+
+td.hdlist1, td.hdlist2 { vertical-align: top; padding: 0 0.625em; }
+
+td.hdlist1 { font-weight: bold; padding-bottom: 0.75em; }
+
+.literalblock + .colist, .listingblock + .colist { margin-top: -0.5em; }
+
+.colist > table tr > td:first-of-type { padding: 0.4em 0.75em 0 0.75em; line-height: 1; vertical-align: top; }
+.colist > table tr > td:first-of-type img { max-width: initial; }
+.colist > table tr > td:last-of-type { padding: 0.25em 0; }
+
+.thumb, .th { line-height: 0; display: inline-block; border: solid 4px white; -webkit-box-shadow: 0 0 0 1px #dddddd; box-shadow: 0 0 0 1px #dddddd; }
+
+.imageblock.left, .imageblock[style*="float: left"] { margin: 0.25em 0.625em 1.25em 0; }
+.imageblock.right, .imageblock[style*="float: right"] { margin: 0.25em 0 1.25em 0.625em; }
+.imageblock > .title { margin-bottom: 0; }
+.imageblock.thumb, .imageblock.th { border-width: 6px; }
+.imageblock.thumb > .title, .imageblock.th > .title { padding: 0 0.125em; }
+
+.image.left, .image.right { margin-top: 0.25em; margin-bottom: 0.25em; display: inline-block; line-height: 0; }
+.image.left { margin-right: 0.625em; }
+.image.right { margin-left: 0.625em; }
+
+a.image { text-decoration: none; display: inline-block; }
+a.image object { pointer-events: none; }
+
+sup.footnote, sup.footnoteref { font-size: 0.875em; position: static; vertical-align: super; }
+sup.footnote a, sup.footnoteref a { text-decoration: none; }
+sup.footnote a:active, sup.footnoteref a:active { text-decoration: underline; }
+
+#footnotes { padding-top: 0.75em; padding-bottom: 0.75em; margin-bottom: 0.625em; }
+#footnotes hr { width: 20%; min-width: 6.25em; margin: -0.25em 0 0.75em 0; border-width: 1px 0 0 0; }
+#footnotes .footnote { padding: 0 0.375em 0 0.225em; line-height: 1.3334; font-size: 0.875em; margin-left: 1.2em; margin-bottom: 0.2em; }
+#footnotes .footnote a:first-of-type { font-weight: bold; text-decoration: none; margin-left: -1.05em; }
+#footnotes .footnote:last-of-type { margin-bottom: 0; }
+#content #footnotes { margin-top: -0.625em; margin-bottom: 0; padding: 0.75em 0; }
+
+.gist .file-data > table { border: 0; background: #fff; width: 100%; margin-bottom: 0; }
+.gist .file-data > table td.line-data { width: 99%; }
+
+div.unbreakable { page-break-inside: avoid; }
+
+.big { font-size: larger; }
+
+.small { font-size: smaller; }
+
+.underline { text-decoration: underline; }
+
+.overline { text-decoration: overline; }
+
+.line-through { text-decoration: line-through; }
+
+.aqua { color: #00bfbf; }
+
+.aqua-background { background-color: #00fafa; }
+
+.black { color: black; }
+
+.black-background { background-color: black; }
+
+.blue { color: #0000bf; }
+
+.blue-background { background-color: #0000fa; }
+
+.fuchsia { color: #bf00bf; }
+
+.fuchsia-background { background-color: #fa00fa; }
+
+.gray { color: #606060; }
+
+.gray-background { background-color: #7d7d7d; }
+
+.green { color: #006000; }
+
+.green-background { background-color: #007d00; }
+
+.lime { color: #00bf00; }
+
+.lime-background { background-color: #00fa00; }
+
+.maroon { color: #600000; }
+
+.maroon-background { background-color: #7d0000; }
+
+.navy { color: #000060; }
+
+.navy-background { background-color: #00007d; }
+
+.olive { color: #606000; }
+
+.olive-background { background-color: #7d7d00; }
+
+.purple { color: #600060; }
+
+.purple-background { background-color: #7d007d; }
+
+.red { color: #bf0000; }
+
+.red-background { background-color: #fa0000; }
+
+.silver { color: #909090; }
+
+.silver-background { background-color: #bcbcbc; }
+
+.teal { color: #006060; }
+
+.teal-background { background-color: #007d7d; }
+
+.white { color: #bfbfbf; }
+
+.white-background { background-color: #fafafa; }
+
+.yellow { color: #bfbf00; }
+
+.yellow-background { background-color: #fafa00; }
+
+span.icon > .fa { cursor: default; }
+a span.icon > .fa { cursor: inherit; }
+
+.admonitionblock td.icon [class^="fa icon-"] { font-size: 2.5em; text-shadow: 1px 1px 2px rgba(0, 0, 0, 0.5); cursor: default; }
+.admonitionblock td.icon .icon-note:before { content: "\f05a"; color: #29475c; }
+.admonitionblock td.icon .icon-tip:before { content: "\f0eb"; text-shadow: 1px 1px 2px rgba(155, 155, 0, 0.8); color: #111; }
+.admonitionblock td.icon .icon-warning:before { content: "\f071"; color: #bf6900; }
+.admonitionblock td.icon .icon-caution:before { content: "\f06d"; color: #bf3400; }
+.admonitionblock td.icon .icon-important:before { content: "\f06a"; color: #bf0000; }
+
+.conum[data-value] { display: inline-block; color: #fff !important; background-color: black; -webkit-border-radius: 100px; border-radius: 100px; text-align: center; font-size: 0.75em; width: 1.67em; height: 1.67em; line-height: 1.67em; font-family: "Open Sans", "DejaVu Sans", sans-serif; font-style: normal; font-weight: bold; }
+.conum[data-value] * { color: #fff !important; }
+.conum[data-value] + b { display: none; }
+.conum[data-value]:after { content: attr(data-value); }
+pre .conum[data-value] { position: relative; top: -0.125em; }
+
+b.conum * { color: inherit !important; }
+
+.conum:not([data-value]):empty { display: none; }
+
+h1, h2, h3, #toctitle, .sidebarblock > .content > .title, h4, h5, h6 { border-bottom: 1px solid #dddddd; }
+
+.sect1 { padding-bottom: 0; }
+
+#toctitle { color: #00406F; font-weight: normal; margin-top: 1.5em; }
+
+.sidebarblock { border-color: #aaa; }
+
+code { -webkit-border-radius: 4px; border-radius: 4px; }
+
+p.tableblock.header { color: #6d6e71; }
+
+.literalblock pre, .listingblock pre { background: #eeeeee; }
+
+</style>
+<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css">
+<style>
+/* Stylesheet for CodeRay to match GitHub theme | MIT License | http://foundation.zurb.com */
+/*pre.CodeRay {background-color:#f7f7f8;}*/
+.CodeRay .line-numbers{border-right:1px solid #d8d8d8;padding:0 0.5em 0 .25em}
+.CodeRay span.line-numbers{display:inline-block;margin-right:.5em;color:rgba(0,0,0,.3)}
+.CodeRay .line-numbers strong{color:rgba(0,0,0,.4)}
+table.CodeRay{border-collapse:separate;border-spacing:0;margin-bottom:0;border:0;background:none}
+table.CodeRay td{vertical-align: top;line-height:1.45}
+table.CodeRay td.line-numbers{text-align:right}
+table.CodeRay td.line-numbers>pre{padding:0;color:rgba(0,0,0,.3)}
+table.CodeRay td.code{padding:0 0 0 .5em}
+table.CodeRay td.code>pre{padding:0}
+.CodeRay .debug{color:#fff !important;background:#000080 !important}
+.CodeRay .annotation{color:#007}
+.CodeRay .attribute-name{color:#000080}
+.CodeRay .attribute-value{color:#700}
+.CodeRay .binary{color:#509}
+.CodeRay .comment{color:#998;font-style:italic}
+.CodeRay .char{color:#04d}
+.CodeRay .char .content{color:#04d}
+.CodeRay .char .delimiter{color:#039}
+.CodeRay .class{color:#458;font-weight:bold}
+.CodeRay .complex{color:#a08}
+.CodeRay .constant,.CodeRay .predefined-constant{color:#008080}
+.CodeRay .color{color:#099}
+.CodeRay .class-variable{color:#369}
+.CodeRay .decorator{color:#b0b}
+.CodeRay .definition{color:#099}
+.CodeRay .delimiter{color:#000}
+.CodeRay .doc{color:#970}
+.CodeRay .doctype{color:#34b}
+.CodeRay .doc-string{color:#d42}
+.CodeRay .escape{color:#666}
+.CodeRay .entity{color:#800}
+.CodeRay .error{color:#808}
+.CodeRay .exception{color:inherit}
+.CodeRay .filename{color:#099}
+.CodeRay .function{color:#900;font-weight:bold}
+.CodeRay .global-variable{color:#008080}
+.CodeRay .hex{color:#058}
+.CodeRay .integer,.CodeRay .float{color:#099}
+.CodeRay .include{color:#555}
+.CodeRay .inline{color:#000}
+.CodeRay .inline .inline{background:#ccc}
+.CodeRay .inline .inline .inline{background:#bbb}
+.CodeRay .inline .inline-delimiter{color:#d14}
+.CodeRay .inline-delimiter{color:#d14}
+.CodeRay .important{color:#555;font-weight:bold}
+.CodeRay .interpreted{color:#b2b}
+.CodeRay .instance-variable{color:#008080}
+.CodeRay .label{color:#970}
+.CodeRay .local-variable{color:#963}
+.CodeRay .octal{color:#40e}
+.CodeRay .predefined{color:#369}
+.CodeRay .preprocessor{color:#579}
+.CodeRay .pseudo-class{color:#555}
+.CodeRay .directive{font-weight:bold}
+.CodeRay .type{font-weight:bold}
+.CodeRay .predefined-type{color:inherit}
+.CodeRay .reserved,.CodeRay .keyword {color:#000;font-weight:bold}
+.CodeRay .key{color:#808}
+.CodeRay .key .delimiter{color:#606}
+.CodeRay .key .char{color:#80f}
+.CodeRay .value{color:#088}
+.CodeRay .regexp .delimiter{color:#808}
+.CodeRay .regexp .content{color:#808}
+.CodeRay .regexp .modifier{color:#808}
+.CodeRay .regexp .char{color:#d14}
+.CodeRay .regexp .function{color:#404;font-weight:bold}
+.CodeRay .string{color:#d20}
+.CodeRay .string .string .string{background:#ffd0d0}
+.CodeRay .string .content{color:#d14}
+.CodeRay .string .char{color:#d14}
+.CodeRay .string .delimiter{color:#d14}
+.CodeRay .shell{color:#d14}
+.CodeRay .shell .delimiter{color:#d14}
+.CodeRay .symbol{color:#990073}
+.CodeRay .symbol .content{color:#a60}
+.CodeRay .symbol .delimiter{color:#630}
+.CodeRay .tag{color:#008080}
+.CodeRay .tag-special{color:#d70}
+.CodeRay .variable{color:#036}
+.CodeRay .insert{background:#afa}
+.CodeRay .delete{background:#faa}
+.CodeRay .change{color:#aaf;background:#007}
+.CodeRay .head{color:#f8f;background:#505}
+.CodeRay .insert .insert{color:#080}
+.CodeRay .delete .delete{color:#800}
+.CodeRay .change .change{color:#66f}
+.CodeRay .head .head{color:#f4f}
+</style>
+<link rel="stylesheet" href="../katex/katex.min.css">
+<script src="../katex/katex.min.js"></script>
+<script src="../katex/contrib/auto-render.min.js"></script>
+    <!-- Use KaTeX to render math once document is loaded, see
+         https://github.com/Khan/KaTeX/tree/master/contrib/auto-render -->
+<script>
+    document.addEventListener("DOMContentLoaded", function () {
+        renderMathInElement(
+            document.body,
+            {
+                delimiters: [
+                    { left: "$$", right: "$$", display: true},
+                    { left: "\\[", right: "\\]", display: true},
+                    { left: "$", right: "$", display: false},
+                    { left: "\\(", right: "\\)", display: false}
+                ]
+            }
+        );
+    });
+</script></head>
+<body class="book toc2 toc-left" style="max-width: 100%;">
+<div id="header">
+<h1>The OpenCL<sup>&#8482;</sup> Specification</h1>
+<div class="details">
+<span id="author" class="author">Khronos OpenCL Working Group</span><br>
+<span id="revnumber">version 2.2-7,</span>
+<span id="revdate">Sat, 12 May 2018 13:21:27 +0000</span>
+<br><span id="revremark">from git branch: master commit: ab6da3001e9eeafaa36c18888ca7eb4ebb9768af</span>
+</div>
+<div id="toc" class="toc2">
+<div id="toctitle">Table of Contents</div>
+<ul class="sectlevel1">
+<li><a href="#_introduction">1. Introduction</a>
+<ul class="sectlevel2">
+<li><a href="#_normative_references">1.1. Normative References</a></li>
+<li><a href="#_version_numbers">1.2. Version Numbers</a></li>
+</ul>
+</li>
+<li><a href="#_glossary">2. Glossary</a></li>
+<li><a href="#_the_opencl_architecture">3. The OpenCL Architecture</a>
+<ul class="sectlevel2">
+<li><a href="#_platform_model">3.1. Platform Model</a></li>
+<li><a href="#_execution_model">3.2. Execution Model</a></li>
+<li><a href="#_memory_model">3.3. Memory Model</a></li>
+<li><a href="#opencl-framework">3.4. The OpenCL Framework</a></li>
+</ul>
+</li>
+<li><a href="#opencl-platform-layer">4. The OpenCL Platform Layer</a>
+<ul class="sectlevel2">
+<li><a href="#_querying_platform_info">4.1. Querying Platform Info</a></li>
+<li><a href="#platform-querying-devices">4.2. Querying Devices</a></li>
+<li><a href="#_partitioning_a_device">4.3. Partitioning a Device</a></li>
+<li><a href="#_contexts">4.4. Contexts</a></li>
+</ul>
+</li>
+<li><a href="#opencl-runtime">5. The OpenCL Runtime</a>
+<ul class="sectlevel2">
+<li><a href="#_command_queues">5.1. Command Queues</a></li>
+<li><a href="#_buffer_objects">5.2. Buffer Objects</a></li>
+<li><a href="#_image_objects">5.3. Image Objects</a></li>
+<li><a href="#_pipes">5.4. Pipes</a></li>
+<li><a href="#_querying_unmapping_migrating_retaining_and_releasing_memory_objects">5.5. Querying, Unmapping, Migrating, Retaining and Releasing Memory Objects</a></li>
+<li><a href="#_shared_virtual_memory">5.6. Shared Virtual Memory</a></li>
+<li><a href="#_sampler_objects">5.7. Sampler Objects</a></li>
+<li><a href="#_program_objects">5.8. Program Objects</a></li>
+<li><a href="#_kernel_objects">5.9. Kernel Objects</a></li>
+<li><a href="#_executing_kernels">5.10. Executing Kernels</a></li>
+<li><a href="#event-objects">5.11. Event Objects</a></li>
+<li><a href="#markers-barriers-waiting-for-events">5.12. Markers, Barriers and Waiting for Events</a></li>
+<li><a href="#_out_of_order_execution_of_kernels_and_memory_object_commands">5.13. Out-of-order Execution of Kernels and Memory Object Commands</a></li>
+<li><a href="#profiling-operations">5.14. Profiling Operations on Memory Objects and Kernels</a></li>
+<li><a href="#_flush_and_finish">5.15. Flush and Finish</a></li>
+</ul>
+</li>
+<li><a href="#_associated_opencl_specification">6. Associated OpenCL specification</a>
+<ul class="sectlevel2">
+<li><a href="#spirv-il">6.1. SPIR-V Intermediate language</a></li>
+<li><a href="#opencl-extensions">6.2. Extensions to OpenCL</a></li>
+<li><a href="#_support_for_earlier_opencl_c_kernel_languages">6.3. Support for earlier OpenCL C kernel languages</a></li>
+</ul>
+</li>
+<li><a href="#opencl-embedded-profile">7. OpenCL Embedded Profile</a></li>
+<li><a href="#_shared_objects_thread_safety">Appendix A: Shared Objects, Thread Safety</a>
+<ul class="sectlevel2">
+<li><a href="#shared-opencl-objects">Shared OpenCL Objects</a></li>
+<li><a href="#_multiple_host_threads">Multiple Host Threads</a></li>
+</ul>
+</li>
+<li><a href="#_portability">Appendix B: Portability</a></li>
+<li><a href="#data-types">Appendix C: Application Data Types</a>
+<ul class="sectlevel2">
+<li><a href="#scalar-data-types">Shared Application Scalar Data Types</a></li>
+<li><a href="#vector-data-types">Supported Application Vector Data Types</a></li>
+<li><a href="#alignment-app-data-types">Alignment of Application Data Types</a></li>
+<li><a href="#_vector_literals">Vector Literals</a></li>
+<li><a href="#vector-components">Vector Components</a></li>
+<li><a href="#_implicit_conversions">Implicit Conversions</a></li>
+<li><a href="#_explicit_casts">Explicit Casts</a></li>
+<li><a href="#_other_operators_and_functions">Other operators and functions</a></li>
+<li><a href="#_application_constant_definitions">Application constant definitions</a></li>
+</ul>
+</li>
+<li><a href="#check-copy-overlap">Appendix D: CL_MEM_COPY_OVERLAP</a></li>
+<li><a href="#_changes">Appendix E: Changes</a>
+<ul class="sectlevel2">
+<li><a href="#_summary_of_changes_from_opencl_1_0">Summary of changes from OpenCL 1.0</a></li>
+<li><a href="#_summary_of_changes_from_opencl_1_1">Summary of changes from OpenCL 1.1</a></li>
+<li><a href="#_summary_of_changes_from_opencl_1_2">Summary of changes from OpenCL 1.2</a></li>
+<li><a href="#_summary_of_changes_from_opencl_2_0">Summary of changes from OpenCL 2.0</a></li>
+<li><a href="#_summary_of_changes_from_opencl_2_1">Summary of changes from OpenCL 2.1</a></li>
+</ul>
+</li>
+</ul>
+</div>
+</div>
+<div id="content">
+<div id="preamble">
+<div class="sectionbody">
+<div style="page-break-after: always;"></div>
+<div class="paragraph">
+<p>Copyright 2008-2018 The Khronos Group.</p>
+</div>
+<div class="paragraph">
+<p>This specification is protected by copyright laws and contains material proprietary
+to the Khronos Group, Inc. Except as described by these terms, it or any components
+may not be reproduced, republished, distributed, transmitted, displayed, broadcast
+or otherwise exploited in any manner without the express prior written permission
+of Khronos Group.</p>
+</div>
+<div class="paragraph">
+<p>Khronos Group grants a conditional copyright license to use and reproduce the
+unmodified specification for any purpose, without fee or royalty, EXCEPT no licenses
+to any patent, trademark or other intellectual property rights are granted under
+these terms. Parties desiring to implement the specification and make use of
+Khronos trademarks in relation to that implementation, and receive reciprocal patent
+license protection under the Khronos IP Policy must become Adopters and confirm the
+implementation as conformant under the process defined by Khronos for this
+specification; see <a href="https://www.khronos.org/adopters" class="bare">https://www.khronos.org/adopters</a>.</p>
+</div>
+<div class="paragraph">
+<p>Khronos Group makes no, and expressly disclaims any, representations or warranties,
+express or implied, regarding this specification, including, without limitation:
+merchantability, fitness for a particular purpose, non-infringement of any
+intellectual property, correctness, accuracy, completeness, timeliness, and
+reliability. Under no circumstances will the Khronos Group, or any of its Promoters,
+Contributors or Members, or their respective partners, officers, directors,
+employees, agents or representatives be liable for any damages, whether direct,
+indirect, special or consequential damages for lost revenues, lost profits, or
+otherwise, arising from or in connection with these materials.</p>
+</div>
+<div class="paragraph">
+<p>Vulkan is a registered trademark and Khronos, OpenXR, SPIR, SPIR-V, SYCL, WebGL,
+WebCL, OpenVX, OpenVG, EGL, COLLADA, glTF, NNEF, OpenKODE, OpenKCAM, StreamInput,
+OpenWF, OpenSL ES, OpenMAX, OpenMAX AL, OpenMAX IL, OpenMAX DL, OpenML and DevU are
+trademarks of the Khronos Group Inc. ASTC is a trademark of ARM Holdings PLC,
+OpenCL is a trademark of Apple Inc. and OpenGL and OpenML are registered trademarks
+and the OpenGL ES and OpenGL SC logos are trademarks of Silicon Graphics
+International used under license by Khronos. All other product names, trademarks,
+and/or company names are used solely for identification and belong to their
+respective owners.</p>
+</div>
+<div style="page-break-after: always;"></div>
+<div class="paragraph">
+<p><strong>Acknowledgements</strong></p>
+</div>
+<div class="paragraph">
+<p>The OpenCL specification is the result of the contributions of many people,
+representing a cross section of the desktop, hand-held, and embedded
+computer industry.
+Following is a partial list of the contributors, including the company that
+they represented at the time of their contribution:</p>
+</div>
+<div class="paragraph">
+<p>Chuck Rose, Adobe<br>
+Eric Berdahl, Adobe<br>
+Shivani Gupta, Adobe<br>
+Bill Licea Kane, AMD<br>
+Ed Buckingham, AMD<br>
+Jan Civlin, AMD<br>
+Laurent Morichetti, AMD<br>
+Mark Fowler, AMD<br>
+Marty Johnson, AMD<br>
+Michael Mantor, AMD<br>
+Norm Rubin, AMD<br>
+Ofer Rosenberg, AMD<br>
+Brian Sumner, AMD<br>
+Victor Odintsov, AMD<br>
+Aaftab Munshi, Apple<br>
+Abe Stephens, Apple<br>
+Alexandre Namaan, Apple<br>
+Anna Tikhonova, Apple<br>
+Chendi Zhang, Apple<br>
+Eric Bainville, Apple<br>
+David Hayward, Apple<br>
+Giridhar Murthy, Apple<br>
+Ian Ollmann, Apple<br>
+Inam Rahman, Apple<br>
+James Shearer, Apple<br>
+MonPing Wang, Apple<br>
+Tanya Lattner, Apple<br>
+Mikael Bourges-Sevenier, Aptina<br>
+Anton Lokhmotov, ARM<br>
+Dave Shreiner, ARM<br>
+Hedley Francis, ARM<br>
+Robert Elliott, ARM<br>
+Scott Moyers, ARM<br>
+Tom Olson, ARM<br>
+Anastasia Stulova, ARM<br>
+Christopher Thompson-Walsh, Broadcom<br>
+Holger Waechtler, Broadcom<br>
+Norman Rink, Broadcom<br>
+Andrew Richards, Codeplay<br>
+Maria Rovatsou, Codeplay<br>
+Alistair Donaldson, Codeplay<br>
+Alastair Murray, Codeplay<br>
+Stephen Frye, Electronic Arts<br>
+Eric Schenk, Electronic Arts<br>
+Daniel Laroche, Freescale<br>
+David Neto, Google<br>
+Robin Grosman, Huawei<br>
+Craig Davies, Huawei<br>
+Brian Horton, IBM<br>
+Brian Watt, IBM<br>
+Gordon Fossum, IBM<br>
+Greg Bellows, IBM<br>
+Joaquin Madruga, IBM<br>
+Mark Nutter, IBM<br>
+Mike Perks, IBM<br>
+Sean Wagner, IBM<br>
+Jon Parr, Imagination Technologies<br>
+Robert Quill, Imagination Technologies<br>
+James McCarthy, Imagination Technologies<br>
+Jon Leech, Independent<br>
+Aaron Kunze, Intel<br>
+Aaron Lefohn, Intel<br>
+Adam Lake, Intel<br>
+Alexey Bader, Intel<br>
+Allen Hux, Intel<br>
+Andrew Brownsword, Intel<br>
+Andrew Lauritzen, Intel<br>
+Bartosz Sochacki, Intel<br>
+Ben Ashbaugh, Intel<br>
+Brian Lewis, Intel<br>
+Geoff Berry, Intel<br>
+Hong Jiang, Intel<br>
+Jayanth Rao, Intel<br>
+Josh Fryman, Intel<br>
+Larry Seiler, Intel<br>
+Mike MacPherson, Intel<br>
+Murali Sundaresan, Intel<br>
+Paul Lalonde, Intel<br>
+Raun Krisch, Intel<br>
+Stephen Junkins, Intel<br>
+Tim Foley, Intel<br>
+Timothy Mattson, Intel<br>
+Yariv Aridor, Intel<br>
+Michael Kinsner, Intel<br>
+Kevin Stevens, Intel<br>
+Jon Leech, Khronos<br>
+Benjamin Bergen, Los Alamos National Laboratory<br>
+Roy Ju, Mediatek<br>
+Bor-Sung Liang, Mediatek<br>
+Rahul Agarwal, Mediatek<br>
+Michal Witaszek, Mobica<br>
+JenqKuen Lee, NTHU<br>
+Amit Rao, NVIDIA<br>
+Ashish Srivastava, NVIDIA<br>
+Bastiaan Aarts, NVIDIA<br>
+Chris Cameron, NVIDIA<br>
+Christopher Lamb, NVIDIA<br>
+Dibyapran Sanyal, NVIDIA<br>
+Guatam Chakrabarti, NVIDIA<br>
+Ian Buck, NVIDIA<br>
+Jaydeep Marathe, NVIDIA<br>
+Jian-Zhong Wang, NVIDIA<br>
+Karthik Raghavan Ravi, NVIDIA<br>
+Kedar Patil, NVIDIA<br>
+Manjunath Kudlur, NVIDIA<br>
+Mark Harris, NVIDIA<br>
+Michael Gold, NVIDIA<br>
+Neil Trevett, NVIDIA<br>
+Richard Johnson, NVIDIA<br>
+Sean Lee, NVIDIA<br>
+Tushar Kashalikar, NVIDIA<br>
+Vinod Grover, NVIDIA<br>
+Xiangyun Kong, NVIDIA<br>
+Yogesh Kini, NVIDIA<br>
+Yuan Lin, NVIDIA<br>
+Mayuresh Pise, NVIDIA<br>
+Allan Tzeng, QUALCOMM<br>
+Alex Bourd, QUALCOMM<br>
+Anirudh Acharya, QUALCOMM<br>
+Andrew Gruber, QUALCOMM<br>
+Andrzej Mamona, QUALCOMM<br>
+Benedict Gaster, QUALCOMM<br>
+Bill Torzewski, QUALCOMM<br>
+Bob Rychlik, QUALCOMM<br>
+Chihong Zhang, QUALCOMM<br>
+Chris Mei, QUALCOMM<br>
+Colin Sharp, QUALCOMM<br>
+David Garcia, QUALCOMM<br>
+David Ligon, QUALCOMM<br>
+Jay Yun, QUALCOMM<br>
+Lee Howes, QUALCOMM<br>
+Richard Ruigrok, QUALCOMM<br>
+Robert J. Simpson, QUALCOMM<br>
+Sumesh Udayakumaran, QUALCOMM<br>
+Vineet Goel, QUALCOMM<br>
+Lihan Bin, QUALCOMM<br>
+Vlad Shimanskiy, QUALCOMM<br>
+Jian Liu, QUALCOMM<br>
+Tasneem Brutch, Samsung<br>
+Yoonseo Choi, Samsung<br>
+Dennis Adams, Sony<br>
+Pär-Anders Aronsson, Sony<br>
+Jim Rasmusson, Sony<br>
+Thierry Lepley, STMicroelectronics<br>
+Anton Gorenko, StreamHPC<br>
+Jakub Szuppe, StreamHPC<br>
+Vincent Hindriksen, StreamHPC<br>
+Alan Ward, Texas Instruments<br>
+Yuan Zhao, Texas Instruments<br>
+Pete Curry, Texas Instruments<br>
+Simon McIntosh-Smith, University of Bristol<br>
+James Price, University of Bristol<br>
+Paul Preney, University of Windsor<br>
+Shane Peelar, University of Windsor<br>
+Brian Hutsell, Vivante<br>
+Mike Cai, Vivante<br>
+Sumeet Kumar, Vivante<br>
+Wei-Lun Kao, Vivante<br>
+Xing Wang, Vivante<br>
+Jeff Fifield, Xilinx<br>
+Hem C. Neema, Xilinx<br>
+Henry Styles, Xilinx<br>
+Ralph Wittig, Xilinx<br>
+Ronan Keryell, Xilinx<br>
+AJ Guillon, YetiWare Inc<br></p>
+</div>
+<div style="page-break-after: always;"></div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_introduction">1. Introduction</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>Modern processor architectures have embraced parallelism as an important
+pathway to increased performance.
+Facing technical challenges with higher clock speeds in a fixed power
+envelope, Central Processing Units (CPUs) now improve performance by adding
+multiple cores.
+Graphics Processing Units (GPUs) have also evolved from fixed function
+rendering devices into programmable parallel processors.
+As today's computer systems often include highly parallel CPUs, GPUs and
+other types of processors, it is important to enable software developers to
+take full advantage of these heterogeneous processing platforms.</p>
+</div>
+<div class="paragraph">
+<p>Creating applications for heterogeneous parallel processing platforms is
+challenging as traditional programming approaches for multi-core CPUs and
+GPUs are very different.
+CPU-based parallel programming models are typically based on standards but
+usually assume a shared address space and do not encompass vector
+operations.
+General purpose GPU programming models address complex memory hierarchies
+and vector operations but are traditionally platform-, vendor- or
+hardware-specific.
+These limitations make it difficult for a developer to access the compute
+power of heterogeneous CPUs, GPUs and other types of processors from a
+single, multi-platform source code base.
+More than ever, there is a need to enable software developers to effectively
+take full advantage of heterogeneous processing platforms, from high
+performance compute servers through desktop computer systems to handheld
+devices, that include a diverse mix of parallel CPUs, GPUs and other
+processors such as DSPs and the Cell/B.E. processor.</p>
+</div>
+<div class="paragraph">
+<p><strong>OpenCL</strong> (Open Computing Language) is an open royalty-free standard for
+general purpose parallel programming across CPUs, GPUs and other processors,
+giving software developers portable and efficient access to the power of
+these heterogeneous processing platforms.</p>
+</div>
+<div class="paragraph">
+<p>OpenCL supports a wide range of applications, ranging from embedded and
+consumer software to HPC solutions, through a low-level, high-performance,
+portable abstraction.
+By creating an efficient, close-to-the-metal programming interface, OpenCL
+will form the foundation layer of a parallel computing ecosystem of
+platform-independent tools, middleware and applications.
+OpenCL is particularly suited to play an increasingly significant role in
+emerging interactive graphics applications that combine general parallel
+compute algorithms with graphics rendering pipelines.</p>
+</div>
+<div class="paragraph">
+<p>OpenCL consists of an API for coordinating parallel computation across
+heterogeneous processors; and a cross-platform intermediate language with a
+well-specified computation environment.
+The OpenCL standard:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>Supports both data- and task-based parallel programming models</p>
+</li>
+<li>
+<p>Utilizes a portable and self-contained intermediate representation with
+support for parallel execution</p>
+</li>
+<li>
+<p>Defines consistent numerical requirements based on IEEE 754</p>
+</li>
+<li>
+<p>Defines a configuration profile for handheld and embedded devices</p>
+</li>
+<li>
+<p>Efficiently interoperates with OpenGL, OpenGL ES and other graphics APIs</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>This document begins with an overview of basic concepts and the architecture
+of OpenCL, followed by a detailed description of its execution model, memory
+model and synchronization support.
+It then discusses the OpenCL platform and runtime API.
+Some examples are given that describe sample compute use-cases and how they
+would be written in OpenCL.
+The specification is divided into a core specification that any OpenCL
+compliant implementation must support; a handheld/embedded profile which
+relaxes the OpenCL compliance requirements for handheld and embedded
+devices; and a set of optional extensions that are likely to move into the
+core specification in later revisions of the OpenCL specification.</p>
+</div>
+<div class="sect2">
+<h3 id="_normative_references">1.1. Normative References</h3>
+<div class="paragraph">
+<p>Normative references are references to external documents or resources
+with which implementers of OpenCL must comply, either in full or in the
+portions specified, as described in this specification.</p>
+</div>
+<div id="iso-c11" class="paragraph">
+<p><em>ISO/IEC 9899:2011 - Information technology - Programming languages - C</em>,
+<a href="https://www.iso.org/standard/57853.html" class="bare">https://www.iso.org/standard/57853.html</a> (final specification),
+<a href="http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1570.pdf" class="bare">http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1570.pdf</a> (last public
+draft).</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_version_numbers">1.2. Version Numbers</h3>
+<div class="paragraph">
+<p>The OpenCL version number follows a <em>major.minor-revision</em> scheme.  When this
+version number is used within the API it generally only includes the
+<em>major.minor</em> components of the version number.</p>
+</div>
+<div class="paragraph">
+<p>A difference in the <em>major</em> or <em>minor</em> version number indicates that some
+amount of new functionality has been added to the specification, and may also
+include behavior changes and bug fixes.
+Functionality may also be deprecated or removed when the <em>major</em> or <em>minor</em>
+version changes.</p>
+</div>
+<div class="paragraph">
+<p>A difference in the <em>revision</em> number indicates small changes to the
+specification, typically to fix a bug or to clarify language.
+When the <em>revision</em> number changes there may be an impact on the behavior of
+existing functionality, but this should not affect backwards compatibility.
+Functionality should not be added or removed when the <em>revision</em> number
+changes.</p>
+</div>
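+<div class="paragraph">
+<p>As an illustration of working with only the <em>major.minor</em> components, the
+following is a minimal sketch in host-side C. It assumes a version string that
+begins with the text "OpenCL", a space, and then the major and minor numbers
+separated by a period; that layout is an assumption made for this example, and
+the type <code>ocl_version</code> and both function names are hypothetical helpers,
+not part of the OpenCL API.</p>
+</div>

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper type: holds only the major.minor components. */
typedef struct {
    int major;
    int minor;
} ocl_version;

/* Parse a string assumed to start with "OpenCL <major>.<minor>".
 * Anything after the minor number is ignored.
 * Returns 1 on success and 0 if the string does not match. */
int parse_opencl_version(const char *s, ocl_version *out)
{
    int major = 0;
    int minor = 0;

    assert(s != NULL && out != NULL);
    if (sscanf(s, "OpenCL %d.%d", &major, &minor) != 2)
        return 0;
    out->major = major;
    out->minor = minor;
    return 1;
}

/* Returns 1 if version v is at least major.minor, otherwise 0. */
int version_at_least(ocl_version v, int major, int minor)
{
    return v.major > major || (v.major == major && v.minor >= minor);
}
```

+<div class="paragraph">
+<p>A caller might use <code>version_at_least(v, 2, 0)</code> to decide whether to rely on
+functionality added in a given <em>major.minor</em> version. The <em>revision</em> component
+is deliberately not parsed, since it should not add or remove functionality.</p>
+</div>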
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_glossary">2. Glossary</h2>
+<div class="sectionbody">
+<div class="dlist">
+<dl>
+<dt class="hdlist1">Application </dt>
+<dd>
+<p>The combination of the program running on the host and OpenCL devices.</p>
+</dd>
+<dt class="hdlist1">Acquire semantics </dt>
+<dd>
+<p>One of the memory order semantics defined for synchronization
+operations.
+Acquire semantics apply to atomic operations that load from memory.
+Given two units of execution, <strong>A</strong> and <strong>B</strong>, acting on a shared atomic
+object <strong>M</strong>, if <strong>A</strong> uses an atomic load of <strong>M</strong> with acquire semantics to
+synchronize-with an atomic store to <strong>M</strong> by <strong>B</strong> that used release
+semantics, then <strong>A</strong>'s atomic load will occur before any subsequent
+operations by <strong>A</strong>.
+Note that the memory orders <em>release</em>, <em>sequentially consistent</em>, and
+<em>acquire_release</em> all include <em>release semantics</em> and effectively pair
+with a load using acquire semantics.</p>
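+<div class="paragraph">
+<p>The pairing described above can be sketched with host-side C11 atomics.
+This is an analogy only: it uses the C11 <code>stdatomic.h</code> interface rather than
+the OpenCL kernel-language atomics, and the <code>mailbox</code> type and its functions
+are hypothetical names introduced for this example. The flag <code>ready</code> plays the
+role of the shared atomic object <strong>M</strong>.</p>
+</div>

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical shared state: ordinary data guarded by an atomic flag. */
typedef struct {
    int payload;       /* plain, non-atomic data */
    atomic_int ready;  /* the shared atomic object M */
} mailbox;

void mailbox_init(mailbox *m)
{
    assert(m != NULL);
    m->payload = 0;
    atomic_init(&m->ready, 0);
}

/* Unit of execution B: write the payload, then store to M with
 * release semantics so the payload write is ordered before the store. */
void mailbox_publish(mailbox *m, int value)
{
    m->payload = value;
    atomic_store_explicit(&m->ready, 1, memory_order_release);
}

/* Unit of execution A: load M with acquire semantics. If the load
 * observes the released store, the later read of payload is ordered
 * after it. Returns 1 and fills *out if data was published, else 0. */
int mailbox_try_read(mailbox *m, int *out)
{
    if (atomic_load_explicit(&m->ready, memory_order_acquire) == 0)
        return 0;
    *out = m->payload;
    return 1;
}
```

+<div class="paragraph">
+<p>In a real program <code>mailbox_publish</code> and <code>mailbox_try_read</code> would run on
+different threads; the sketch leaves thread creation out so that it stays
+self-contained.</p>
+</div>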
+</dd>
+<dt class="hdlist1">Acquire release semantics </dt>
+<dd>
+<p>A memory order semantics for synchronization operations (such as atomic
+operations) that has the properties of both acquire and release memory
+orders.
+It is used with read-modify-write operations.</p>
+</dd>
+<dt class="hdlist1">Atomic operations </dt>
+<dd>
+<p>Operations that at any point, and from any perspective, have either
+occurred completely, or not at all.
+Memory orders associated with atomic operations may constrain the
+visibility of loads and stores with respect to the atomic operations
+(see <em>relaxed semantics</em>, <em>acquire semantics</em>, <em>release semantics</em> or
+<em>acquire release semantics</em>).</p>
+</dd>
+<dt class="hdlist1">Blocking and Non-Blocking Enqueue API calls </dt>
+<dd>
+<p>A <em>non-blocking enqueue API call</em> places a <em>command</em> on a
+<em>command-queue</em> and returns immediately to the host.
+The <em>blocking-mode enqueue API calls</em> do not return to the host until
+the command has completed.</p>
+</dd>
+<dt class="hdlist1">Barrier </dt>
+<dd>
+<p>There are three types of <em>barriers</em>: a command-queue barrier, a
+work-group barrier and a sub-group barrier.</p>
+<div class="openblock">
+<div class="content">
+<div class="ulist">
+<ul>
+<li>
+<p>The OpenCL API provides a function to enqueue a <em>command-queue</em>
+<em>barrier</em> command.
+This <em>barrier</em> command ensures that all previously enqueued commands to
+a command-queue have finished execution before any following <em>commands</em>
+enqueued in the <em>command-queue</em> can begin execution.</p>
+</li>
+<li>
+<p>The OpenCL kernel execution model provides built-in <em>work-group barrier</em>
+functionality.
+This <em>barrier</em> built-in function can be used by a <em>kernel</em> executing on
+a <em>device</em> to perform synchronization between <em>work-items</em> in a
+<em>work-group</em> executing the <em>kernel</em>.
+All the <em>work-items</em> of a <em>work-group</em> must execute the <em>barrier</em>
+construct before any are allowed to continue execution beyond the
+<em>barrier</em>.</p>
+</li>
+<li>
+<p>The OpenCL kernel execution model provides built-in <em>sub-group barrier</em>
+functionality.
+This <em>barrier</em> built-in function can be used by a <em>kernel</em> executing on
+a <em>device</em> to perform synchronization between <em>work-items</em> in a
+<em>sub-group</em> executing the <em>kernel</em>.
+All the <em>work-items</em> of a <em>sub-group</em> must execute the <em>barrier</em>
+construct before any are allowed to continue execution beyond the
+<em>barrier</em>.</p>
+</li>
+</ul>
+</div>
+</div>
+</div>
+</dd>
+<dt class="hdlist1">Buffer Object </dt>
+<dd>
+<p>A memory object that stores a linear collection of bytes.
+Buffer objects are accessible using a pointer in a <em>kernel</em> executing on
+a <em>device</em>.
+Buffer objects can be manipulated by the host using OpenCL API calls.
+A <em>buffer object</em> encapsulates the following information:</p>
+<div class="openblock">
+<div class="content">
+<div class="ulist">
+<ul>
+<li>
+<p>Size in bytes.</p>
+</li>
+<li>
+<p>Properties that describe usage information and which region to allocate
+from.</p>
+</li>
+<li>
+<p>Buffer data.</p>
+</li>
+</ul>
+</div>
+</div>
+</div>
+</dd>
+<dt class="hdlist1">Built-in Kernel </dt>
+<dd>
+<p>A <em>built-in kernel</em> is a <em>kernel</em> that is executed on an OpenCL <em>device</em>
+or <em>custom device</em> by fixed-function hardware or in firmware.
+<em>Applications</em> can query the <em>built-in kernels</em> supported by a <em>device</em>
+or <em>custom device</em>.
+A <em>program object</em> can only contain <em>kernels</em> written in OpenCL C or
+<em>built-in kernels</em> but not both.
+See also <em>Kernel</em> and <em>Program</em>.</p>
+</dd>
+<dt class="hdlist1">Child kernel </dt>
+<dd>
+<p>See <em>Device-side enqueue</em>.</p>
+</dd>
+<dt class="hdlist1">Command </dt>
+<dd>
+<p>The OpenCL operations that are submitted to a <em>command-queue</em> for
+execution.
+For example, OpenCL commands issue kernels for execution on a compute
+device, manipulate memory objects, etc.</p>
+</dd>
+<dt class="hdlist1">Command-queue </dt>
+<dd>
+<p>An object that holds <em>commands</em> that will be executed on a specific
+<em>device</em>.
+The <em>command-queue</em> is created on a specific <em>device</em> in a <em>context</em>.
+<em>Commands</em> to a <em>command-queue</em> are queued in-order but may be executed
+in-order or out-of-order.
+Refer to <em>In-order Execution</em> and <em>Out-of-order Execution</em>.</p>
+</dd>
+<dt class="hdlist1">Command-queue Barrier </dt>
+<dd>
+<p>See <em>Barrier</em>.</p>
+</dd>
+<dt class="hdlist1">Command synchronization </dt>
+<dd>
+<p>Constraints on the order that commands are launched for execution on a
+device defined in terms of the synchronization points that occur between
+commands in host command-queues and between commands in device-side
+command-queues.
+See <em>synchronization points</em>.</p>
+</dd>
+<dt class="hdlist1">Complete </dt>
+<dd>
+<p>The final state in the six state model for the execution of a command.
+The transition into this state is signaled through event objects
+or callback functions associated with a command.</p>
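+<div class="paragraph">
+<p>The six state model can be sketched as a simple enumeration. This glossary
+names only the fifth state (<em>Ended</em>) and the final state (<em>Complete</em>); the names
+used here for the first four states (Queued, Submitted, Ready, Running) follow
+the execution model described elsewhere in the specification. The
+<code>command_state</code> type and its helper functions are hypothetical and are not the
+execution status values defined by the OpenCL API. The sketch also assumes a
+command that advances normally, without abnormal termination.</p>
+</div>

```c
#include <assert.h>

/* Hypothetical enumeration of the six states, in order. */
typedef enum {
    CMD_QUEUED = 0,
    CMD_SUBMITTED,
    CMD_READY,
    CMD_RUNNING,
    CMD_ENDED,     /* fifth state */
    CMD_COMPLETE   /* final state */
} command_state;

/* Advance a command by one state; Complete is terminal. */
command_state command_state_next(command_state s)
{
    assert(s >= CMD_QUEUED && s <= CMD_COMPLETE);
    if (s == CMD_COMPLETE)
        return CMD_COMPLETE;
    return (command_state)(s + 1);
}

/* Returns 1 once the command has reached the final state. */
int command_is_complete(command_state s)
{
    return s == CMD_COMPLETE;
}
```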
+</dd>
+<dt class="hdlist1">Compute Device Memory </dt>
+<dd>
+<p>This refers to one or more memories attached to the compute device.</p>
+</dd>
+<dt class="hdlist1">Compute Unit </dt>
+<dd>
+<p>An OpenCL <em>device</em> has one or more <em>compute units</em>.
+A <em>work-group</em> executes on a single <em>compute unit</em>.
+A <em>compute unit</em> is composed of one or more <em>processing elements</em> and
+<em>local memory</em>.
+A <em>compute unit</em> may also include dedicated texture filter units that
+can be accessed by its processing elements.</p>
+</dd>
+<dt class="hdlist1">Concurrency </dt>
+<dd>
+<p>A property of a system in which a set of tasks can remain
+active and make progress at the same time.
+To utilize concurrent execution when running a program, a programmer
+must identify the concurrency in their problem, expose it within the
+source code, and then exploit it using a notation that supports
+concurrency.</p>
+</dd>
+<dt class="hdlist1">Constant Memory </dt>
+<dd>
+<p>A region of <em>global memory</em> that remains constant during the execution
+of a <em>kernel</em>.
+The <em>host</em> allocates and initializes memory objects placed into
+<em>constant memory</em>.</p>
+</dd>
+<dt class="hdlist1">Context </dt>
+<dd>
+<p>The environment within which the kernels execute and the domain in which
+synchronization and memory management are defined.
+The <em>context</em> includes a set of <em>devices</em>, the memory accessible to
+those <em>devices</em>, the corresponding memory properties and one or more
+<em>command-queues</em> used to schedule execution of one or more <em>kernels</em> or
+operations on <em>memory objects</em>.</p>
+</dd>
+<dt class="hdlist1">Control flow </dt>
+<dd>
+<p>The flow of instructions executed by a work-item.
+Multiple logically related work items may or may not execute the same
+control flow.
+The control flow is said to be <em>converged</em> if all the work-items in the
+set execute the same stream of instructions.
+In a <em>diverged</em> control flow, the work-items in the set execute
+different instructions.
+At a later point, if a diverged control flow becomes converged, it is
+said to be a re-converged control flow.</p>
+</dd>
+<dt class="hdlist1">Converged control flow </dt>
+<dd>
+<p>See <em>Control flow</em>.</p>
+</dd>
+<dt class="hdlist1">Custom Device </dt>
+<dd>
+<p>An OpenCL <em>device</em> that fully implements the OpenCL Runtime but does not
+support <em>programs</em> written in OpenCL C.
+A custom device may be specialized non-programmable hardware that is
+very power efficient and performant for directed tasks or hardware with
+limited programmable capabilities such as specialized DSPs.
+Custom devices are not OpenCL conformant.
+Custom devices may support an online compiler.
+Programs for custom devices can be created using the OpenCL runtime APIs
+that allow OpenCL programs to be created from source (if an online
+compiler is supported) and/or binary, or from <em>built-in kernels</em>
+supported by the <em>device</em>.
+See also <em>Device</em>.</p>
+</dd>
+<dt class="hdlist1">Data Parallel Programming Model </dt>
+<dd>
+<p>Traditionally, this term refers to a programming model where concurrency
+is expressed as instructions from a single program applied to multiple
+elements within a set of data structures.
+The term has been generalized in OpenCL to refer to a model wherein a
+set of instructions from a single program are applied concurrently to
+each point within an abstract domain of indices.</p>
+</dd>
+<dt class="hdlist1">Data race </dt>
+<dd>
+<p>The execution of a program contains a data race if it contains two
+actions in different work items or host threads where (1) one action
+modifies a memory location and the other action reads or modifies the
+same memory location, and (2) at least one of these actions is not
+atomic, or the corresponding memory scopes are not inclusive, and (3)
+the actions are global actions unordered by the global-happens-before
+relation or are local actions unordered by the local-happens-before
+relation.</p>
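+<div class="paragraph">
+<p>The three conditions can be read as a single predicate over a pair of
+actions. The following host-side C sketch encodes that reading under some
+simplifying assumptions: the <code>mem_action</code> type and <code>is_data_race</code> function are
+hypothetical, the caller supplies whether the memory scopes are inclusive, and
+the global and local happens-before relations are collapsed into one flag
+that the caller has already evaluated for the kind of action involved.</p>
+</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one memory action. */
typedef struct {
    int unit;              /* id of the work-item or host thread */
    const void *location;  /* memory location touched */
    bool modifies;         /* true for a write, false for a read */
    bool is_atomic;        /* true if the action is atomic */
} mem_action;

/* Returns true if actions a and b form a data race under the three
 * conditions of the definition. */
bool is_data_race(const mem_action *a, const mem_action *b,
                  bool scopes_inclusive, bool ordered_by_happens_before)
{
    assert(a != NULL && b != NULL);

    /* The actions must come from different work-items or host threads. */
    if (a->unit == b->unit)
        return false;

    /* (1) Same location, and at least one action modifies it. */
    bool conflict = (a->location == b->location) &&
                    (a->modifies || b->modifies);

    /* (2) At least one action is not atomic, or scopes are not inclusive. */
    bool unprotected = !a->is_atomic || !b->is_atomic || !scopes_inclusive;

    /* (3) The actions are unordered by the relevant happens-before relation. */
    bool unordered = !ordered_by_happens_before;

    return conflict && unprotected && unordered;
}
```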
+</dd>
+<dt class="hdlist1">Deprecation </dt>
+<dd>
+<p>Existing features are marked as deprecated if their usage is not
+recommended as that feature is being de-emphasized, superseded and may
+be removed from a future version of the specification.</p>
+</dd>
+<dt class="hdlist1">Device </dt>
+<dd>
+<p>A <em>device</em> is a collection of <em>compute units</em>.
+A <em>command-queue</em> is used to queue <em>commands</em> to a <em>device</em>.
+Examples of <em>commands</em> include executing <em>kernels</em>, or reading and
+writing <em>memory objects</em>.
+OpenCL devices typically correspond to a GPU, a multi-core CPU, and
+other processors such as DSPs and the Cell/B.E.
+processor.</p>
+</dd>
+<dt class="hdlist1">Device-side enqueue </dt>
+<dd>
+<p>A mechanism whereby a kernel-instance is enqueued by a kernel-instance
+running on a device without direct involvement by the host program.
+This produces <em>nested parallelism</em>; i.e. additional levels of
+concurrency are nested inside a running kernel-instance.
+The kernel-instance executing on a device (the <em>parent kernel</em>) enqueues
+a kernel-instance (the <em>child kernel</em>) to a device-side command queue.
+Child and parent kernels execute asynchronously, though a parent kernel
+does not complete until all of its child-kernels have completed.</p>
+</dd>
+<dt class="hdlist1">Diverged control flow </dt>
+<dd>
+<p>See <em>Control flow</em>.</p>
+</dd>
+<dt class="hdlist1">Ended </dt>
+<dd>
+<p>The fifth state in the six state model for the execution of a command.
+The transition into this state occurs when execution of a command has
+ended.
+When a Kernel-enqueue command ends, all of the work-groups associated
+with that command have finished their execution.</p>
+</dd>
+<dt class="hdlist1">Event Object </dt>
+<dd>
+<p>An <em>event object</em> encapsulates the status of an operation such as a
+<em>command</em>.
+It can be used to synchronize operations in a context.</p>
+</dd>
+<dt class="hdlist1">Event Wait List </dt>
+<dd>
+<p>An <em>event wait list</em> is a list of <em>event objects</em> that can be used to
+control when a particular <em>command</em> begins execution.</p>
+</dd>
+<dt class="hdlist1">Fence </dt>
+<dd>
+<p>A memory ordering operation without an associated atomic object.
+A fence can use <em>acquire semantics</em>, <em>release semantics</em>, or
+<em>acquire release semantics</em>.</p>
+</dd>
+<dt class="hdlist1">Framework </dt>
+<dd>
+<p>A software system that contains the set of components to support
+software development and execution.
+A <em>framework</em> typically includes libraries, APIs, runtime systems,
+compilers, etc.</p>
+</dd>
+<dt class="hdlist1">Generic address space </dt>
+<dd>
+<p>An address space that includes the <em>private</em>, <em>local</em>, and <em>global</em>
+address spaces available to a device.
+The generic address space supports conversion of pointers to and from
+private, local and global address spaces, and hence lets a programmer
+write a single function that at compile time can take arguments from any
+of the three named address spaces.</p>
+</dd>
+<dt class="hdlist1">Global Happens before </dt>
+<dd>
+<p>See <em>Happens before</em>.</p>
+</dd>
+<dt class="hdlist1">Global ID </dt>
+<dd>
+<p>A <em>global ID</em> is used to uniquely identify a <em>work-item</em> and is derived
+from the number of <em>global work-items</em> specified when executing a
+<em>kernel</em>.
+The <em>global ID</em> is an N-dimensional value that starts at (0, 0, &#8230;&#8203; 0).
+See also <em>Local ID</em>.</p>
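+<div class="paragraph">
+<p>To make the N-dimensional nature of a <em>global ID</em> concrete, the following
+host-side C sketch enumerates the IDs of a two-dimensional index space. The
+<code>global_id_2d</code> type, both function names, and the choice of letting the first
+dimension vary fastest are assumptions made for illustration; OpenCL does not
+prescribe an order in which work-items execute.</p>
+</div>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 2-dimensional global ID. */
typedef struct {
    size_t x;
    size_t y;
} global_id_2d;

/* Map a linear index in [0, gx * gy) to a 2D global ID, where gx and gy
 * are the numbers of global work-items in each dimension. The first ID
 * is (0, 0); x varies fastest (an arbitrary choice for this sketch). */
global_id_2d global_id_from_linear(size_t linear, size_t gx, size_t gy)
{
    global_id_2d id;

    assert(gx > 0 && gy > 0 && linear < gx * gy);
    id.x = linear % gx;
    id.y = linear / gx;
    return id;
}

/* Inverse mapping: 2D global ID back to the linear index. */
size_t global_id_to_linear(global_id_2d id, size_t gx)
{
    return id.y * gx + id.x;
}
```

+<div class="paragraph">
+<p>For a 4 by 3 index space the IDs run from (0, 0) to (3, 2), so each of the
+12 work-items receives a distinct <em>global ID</em>.</p>
+</div>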
+</dd>
+<dt class="hdlist1">Global Memory </dt>
+<dd>
+<p>A memory region accessible to all <em>work-items</em> executing in a <em>context</em>.
+It is accessible to the <em>host</em> using <em>commands</em> such as read, write and
+map.
+<em>Global memory</em> is included within the <em>generic address space</em> that
+includes the private and local address spaces.</p>
+</dd>
+<dt class="hdlist1">GL share group </dt>
+<dd>
+<p>A <em>GL share group</em> object manages shared OpenGL or OpenGL ES resources
+such as textures, buffers, framebuffers, and renderbuffers and is
+associated with one or more GL context objects.
+The <em>GL share group</em> is typically an opaque object and not directly
+accessible.</p>
+</dd>
+<dt class="hdlist1">Handle </dt>
+<dd>
+<p>An opaque type that references an <em>object</em> allocated by OpenCL.
+Any operation on an <em>object</em> occurs by reference to that object's handle.</p>
+</dd>
+<dt class="hdlist1">Happens before </dt>
+<dd>
+<p>An ordering relationship between operations that execute on multiple
+units of execution.
+If an operation A happens-before operation B then A must occur before B;
+in particular, any value written by A will be visible to B.
+We define two separate happens before relations: <em>global-happens-before</em>
+and <em>local-happens-before</em>.
+These are defined in <a href="#memory-ordering-rules">Memory Model: Memory
+Ordering Rules</a>.</p>
+</dd>
+<dt class="hdlist1">Host </dt>
+<dd>
+<p>The <em>host</em> interacts with the <em>context</em> using the OpenCL API.</p>
+</dd>
+<dt class="hdlist1">Host-thread </dt>
+<dd>
+<p>The unit of execution that executes the statements in the host program.</p>
+</dd>
+<dt class="hdlist1">Host pointer </dt>
+<dd>
+<p>A pointer to memory that is in the virtual address space on the <em>host</em>.</p>
+</dd>
+<dt class="hdlist1">Illegal </dt>
+<dd>
+<p>Behavior of a system that is explicitly not allowed and will be reported
+as an error when encountered by OpenCL.</p>
+</dd>
+<dt class="hdlist1">Image Object </dt>
+<dd>
+<p>A <em>memory object</em> that stores a two- or three-dimensional structured
+array.
+Image data can only be accessed with read and write functions.
+The read functions use a <em>sampler</em>.</p>
+<div class="openblock">
+<div class="content">
+<div class="paragraph">
+<p>The <em>image object</em> encapsulates the following information:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>Dimensions of the image.</p>
+</li>
+<li>
+<p>Description of each element in the image.</p>
+</li>
+<li>
+<p>Properties that describe usage information and which region to allocate
+from.</p>
+</li>
+<li>
+<p>Image data.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>The elements of an image are selected from a list of predefined image
+formats.</p>
+</div>
+</div>
+</div>
+</dd>
+<dt class="hdlist1">Implementation Defined </dt>
+<dd>
+<p>Behavior that is explicitly allowed to vary between conforming
+implementations of OpenCL.
+An OpenCL implementor is required to document the implementation-defined
+behavior.</p>
+</dd>
+<dt class="hdlist1">Independent Forward Progress </dt>
+<dd>
+<p>If an entity supports independent forward progress, then if it is
+otherwise not dependent on any actions due to be performed by any other
+entity (for example it does not wait on a lock held by, and thus that
+must be released by, any other entity), then its execution cannot be
+blocked by the execution of any other entity in the system (it will not
+be starved).
+Work items in a subgroup, for example, typically do not support
+independent forward progress, so one work item in a subgroup may be
+completely blocked (starved) if a different work item in the same
+subgroup enters a spin loop.</p>
+</dd>
+<dt class="hdlist1">In-order Execution </dt>
+<dd>
+<p>A model of execution in OpenCL where the <em>commands</em> in a <em>command-queue</em>
+are executed in order of submission with each <em>command</em> running to
+completion before the next one begins.
+See <em>Out-of-order Execution</em>.</p>
+</dd>
+<dt class="hdlist1">Intermediate Language </dt>
+<dd>
+<p>A lower-level language that may be used to create programs.
+SPIR-V is a required IL for OpenCL 2.2 runtimes.
+Additional ILs may be accepted on an implementation-defined basis.</p>
+</dd>
+<dt class="hdlist1">Kernel </dt>
+<dd>
+<p>A <em>kernel</em> is a function declared in a <em>program</em> and executed on an
+OpenCL <em>device</em>.
+A <em>kernel</em> is identified by the <code>__kernel</code> or <code>kernel</code> qualifier applied to
+any function defined in a <em>program</em>.</p>
+</dd>
+<dt class="hdlist1">Kernel-instance </dt>
+<dd>
+<p>The work carried out by an OpenCL program occurs through the execution
+of kernel-instances on devices.
+The kernel instance is the <em>kernel object</em>, the values associated with
+the arguments to the kernel, and the parameters that define the
+<em>NDRange</em> index space.</p>
+</dd>
+<dt class="hdlist1">Kernel Object </dt>
+<dd>
+<p>A <em>kernel object</em> encapsulates a specific <code>__kernel</code> function declared
+in a <em>program</em> and the argument values to be used when executing this
+<code>__kernel</code> function.</p>
+</dd>
+<dt class="hdlist1">Kernel Language </dt>
+<dd>
+<p>A language that is used to create source code for kernels.
+Supported kernel languages include OpenCL C, OpenCL C++, and the OpenCL
+dialect of SPIR-V.</p>
+</dd>
+<dt class="hdlist1">Launch </dt>
+<dd>
+<p>The transition of a command from the <em>submitted</em> state to the <em>ready</em>
+state.
+See <em>Ready</em>.</p>
+</dd>
+<dt class="hdlist1">Local ID </dt>
+<dd>
+<p>A <em>local ID</em> specifies a unique <em>work-item ID</em> within a given
+<em>work-group</em> that is executing a <em>kernel</em>.
+The <em>local ID</em> is an N-dimensional value that starts at (0, 0, &#8230;&#8203;, 0).
+See also <em>Global ID</em>.</p>
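+<div class="paragraph">
+<p>As a minimal sketch of how a <em>local ID</em> relates to a <em>global ID</em> in one
+dimension, the following plain C helper assumes uniform work-groups and an
+optional global offset.
+The function name and its parameters are hypothetical and are not part of
+the OpenCL API.</p>
+</div>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an OpenCL API): global ID of a work-item in one
 * dimension, assuming every work-group has the same size local_size. */
size_t global_id_from_local(size_t group_id, size_t local_size,
                            size_t local_id, size_t global_offset)
{
    /* Local IDs start at 0 and are unique within their work-group. */
    assert(local_id < local_size);
    return group_id * local_size + local_id + global_offset;
}
```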
+</dd>
+<dt class="hdlist1">Local Memory </dt>
+<dd>
+<p>A memory region associated with a <em>work-group</em> and accessible only by
+<em>work-items</em> in that <em>work-group</em>.
+<em>Local memory</em> is included within the <em>generic address space</em> that
+includes the private and global address spaces.</p>
+</dd>
+<dt class="hdlist1">Marker </dt>
+<dd>
+<p>A <em>command</em> queued in a <em>command-queue</em> that can be used to tag all
+<em>commands</em> queued before the <em>marker</em> in the <em>command-queue</em>.
+The <em>marker</em> command returns an <em>event</em> which can be used by the
+<em>application</em> to queue a wait on the marker event, i.e. to wait for all
+commands queued before the <em>marker</em> command to complete.</p>
+</dd>
+<dt class="hdlist1">Memory Consistency Model </dt>
+<dd>
+<p>Rules that define which values are observed when multiple units of
+execution load data from any shared memory plus the synchronization
+operations that constrain the order of memory operations and define
+synchronization relationships.
+The memory consistency model in OpenCL is based on the memory model from
+the ISO C11 programming language.</p>
+</dd>
+<dt class="hdlist1">Memory Objects </dt>
+<dd>
+<p>A <em>memory object</em> is a handle to a reference counted region of <em>Global
+Memory</em>.
+Also see <em>Buffer Object</em> and <em>Image Object</em>.</p>
+</dd>
+<dt class="hdlist1">Memory Regions (or Pools) </dt>
+<dd>
+<p>A distinct address space in OpenCL.
+<em>Memory regions</em> may overlap in physical memory though OpenCL will treat
+them as logically distinct.
+The <em>memory regions</em> are denoted as <em>private</em>, <em>local</em>, <em>constant</em>, and
+<em>global</em>.</p>
+</dd>
+<dt class="hdlist1">Memory Scopes </dt>
+<dd>
+<p>These memory scopes define a hierarchy of visibilities when analyzing
+the ordering constraints of memory operations.
+They are defined by the values of the <strong>memory_scope</strong> enumeration
+constant.
+Current values are <strong>memory_scope_work_item</strong> (memory constraints only
+apply to a single work-item and in practice apply only to image
+operations), <strong>memory_scope_sub_group</strong> (memory-ordering constraints only
+apply to work-items executing in a sub-group), <strong>memory_scope_work_group</strong>
+(memory-ordering constraints only apply to work-items executing in a
+work-group), <strong>memory_scope_device</strong> (memory-ordering constraints only
+apply to work-items executing on a single device) and
+<strong>memory_scope_all_svm_devices</strong> (memory-ordering constraints only apply
+to work-items executing across multiple devices and when using shared
+virtual memory).</p>
+</dd>
+<dt class="hdlist1">Modification Order </dt>
+<dd>
+<p>All modifications to a particular atomic object M occur in some
+particular <em>total order</em>, called the <em>modification order</em> of M.
+If A and B are modifications of an atomic object M, and A happens-before
+B, then A shall precede B in the modification order of M.
+Note that the modification order of an atomic object M is independent of
+whether M is in local or global memory.</p>
+</dd>
+<dt class="hdlist1">Nested Parallelism </dt>
+<dd>
+<p>See <em>device-side enqueue</em>.</p>
+</dd>
+<dt class="hdlist1">Object </dt>
+<dd>
+<p>Objects are abstract representations of the resources that can be
+manipulated by the OpenCL API.
+Examples include <em>program objects</em>, <em>kernel objects</em>, and <em>memory
+objects</em>.</p>
+</dd>
+<dt class="hdlist1">Out-of-Order Execution </dt>
+<dd>
+<p>A model of execution in which <em>commands</em> placed in the <em>work queue</em> may
+begin and complete execution in any order consistent with constraints
+imposed by <em>event wait lists</em> and <em>command-queue barriers</em>.
+See <em>In-order Execution</em>.</p>
+</dd>
+<dt class="hdlist1">Parent device </dt>
+<dd>
+<p>The OpenCL <em>device</em> which is partitioned to create <em>sub-devices</em>.
+Not all <em>parent devices</em> are <em>root devices</em>.
+A <em>root device</em> might be partitioned and the <em>sub-devices</em> partitioned
+again.
+In this case, the first set of <em>sub-devices</em> would be <em>parent devices</em>
+of the second set, but not the <em>root devices</em>.
+Also see <em>Device</em>, <em>Sub-device</em> and <em>Root device</em>.</p>
+</dd>
+<dt class="hdlist1">Parent kernel </dt>
+<dd>
+<p>See <em>Device-side enqueue</em>.</p>
+</dd>
+<dt class="hdlist1">Pipe </dt>
+<dd>
+<p>The <em>pipe</em> memory object conceptually is an ordered sequence of data
+items.
+A pipe has two endpoints: a write endpoint into which data items are
+inserted, and a read endpoint from which data items are removed.
+At any one time, only one kernel instance may write into a pipe, and
+only one kernel instance may read from a pipe.
+To support the producer-consumer design pattern, one kernel instance
+connects to the write endpoint (the producer) while another kernel
+instance connects to the reading endpoint (the consumer).</p>
+</dd>
+<dt class="hdlist1">Platform </dt>
+<dd>
+<p>The <em>host</em> plus a collection of <em>devices</em> managed by the OpenCL
+<em>framework</em> that allow an application to share <em>resources</em> and execute
+<em>kernels</em> on <em>devices</em> in the <em>platform</em>.</p>
+</dd>
+<dt class="hdlist1">Private Memory </dt>
+<dd>
+<p>A region of memory private to a <em>work-item</em>.
+Variables defined in one <em>work-item</em>'s <em>private memory</em> are not visible
+to another <em>work-item</em>.</p>
+</dd>
+<dt class="hdlist1">Processing Element </dt>
+<dd>
+<p>A virtual scalar processor.
+A work-item may execute on one or more processing elements.</p>
+</dd>
+<dt class="hdlist1">Program </dt>
+<dd>
+<p>An OpenCL <em>program</em> consists of a set of <em>kernels</em>.
+<em>Programs</em> may also contain auxiliary functions called by the
+<code>__kernel</code> functions and constant data.</p>
+</dd>
+<dt class="hdlist1">Program Object </dt>
+<dd>
+<p>A <em>program object</em> encapsulates the following information:</p>
+<div class="openblock">
+<div class="content">
+<div class="ulist">
+<ul>
+<li>
+<p>A reference to an associated <em>context</em>.</p>
+</li>
+<li>
+<p>A <em>program</em> source or binary.</p>
+</li>
+<li>
+<p>The latest successfully built program executable, the list of <em>devices</em>
+for which the program executable is built, the build options used and a
+build log.</p>
+</li>
+<li>
+<p>The number of <em>kernel objects</em> currently attached.</p>
+</li>
+</ul>
+</div>
+</div>
+</div>
+</dd>
+<dt class="hdlist1">Queued </dt>
+<dd>
+<p>The first state in the six-state model for the execution of a command.
+The transition into this state occurs when the command is enqueued into
+a command-queue.</p>
+</dd>
+<dt class="hdlist1">Ready </dt>
+<dd>
+<p>The third state in the six-state model for the execution of a command.
+The transition into this state occurs when prerequisites constraining
+execution of a command have been met; i.e. the command has been
+launched.
+When a kernel-enqueue command is launched, work-groups associated with
+the command are placed in a device's work-pool from which they are
+scheduled for execution.</p>
+</dd>
+<dt class="hdlist1">Re-converged Control Flow </dt>
+<dd>
+<p>See <em>Control flow</em>.</p>
+</dd>
+<dt class="hdlist1">Reference Count </dt>
+<dd>
+<p>The life span of an OpenCL object is determined by its <em>reference
+count</em>, an internal count of the number of references to the object.
+When you create an object in OpenCL, its <em>reference count</em> is set to
+one.
+Subsequent calls to the appropriate <em>retain</em> API (such as
+<strong>clRetainContext</strong>, <strong>clRetainCommandQueue</strong>) increment the <em>reference
+count</em>.
+Calls to the appropriate <em>release</em> API (such as <strong>clReleaseContext</strong>,
+<strong>clReleaseCommandQueue</strong>) decrement the <em>reference count</em>.
+Implementations may also modify the <em>reference count</em>, e.g. to track
+attached objects or to ensure correct operation of in-progress or
+scheduled activities.
+The object becomes inaccessible to host code when the number of
+<em>release</em> operations performed matches the number of <em>retain</em> operations
+plus the allocation of the object.
+At this point the reference count may be zero but this is not
+guaranteed.</p>
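+<div class="paragraph">
+<p>The host-visible part of this rule can be sketched in plain C.
+The type and functions below are hypothetical stand-ins, not OpenCL API
+calls, and they deliberately ignore the extra references an implementation
+may hold internally.</p>
+</div>

```c
#include <assert.h>

/* Hypothetical stand-in for an OpenCL object; not a real API type. */
typedef struct {
    unsigned ref_count;  /* references held by host code */
    int accessible;      /* nonzero while host code may still use the object */
} demo_object;

/* Creating an object sets its reference count to one. */
void demo_create(demo_object *o)
{
    o->ref_count = 1;
    o->accessible = 1;
}

/* Models a retain call: increments the reference count. */
void demo_retain(demo_object *o)
{
    assert(o->accessible);
    o->ref_count++;
}

/* Models a release call: decrements the reference count. */
void demo_release(demo_object *o)
{
    assert(o->accessible && o->ref_count > 0);
    if (--o->ref_count == 0) {
        /* Releases now equal retains plus the creation itself. */
        o->accessible = 0;
    }
}
```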
+</dd>
+<dt class="hdlist1">Relaxed Consistency </dt>
+<dd>
+<p>A memory consistency model in which the contents of memory visible to
+different <em>work-items</em> or <em>commands</em> may be different except at a
+<em>barrier</em> or other explicit synchronization points.</p>
+</dd>
+<dt class="hdlist1">Relaxed Semantics </dt>
+<dd>
+<p>A memory order semantics for atomic operations that implies no order
+constraints.
+The operation is <em>atomic</em> but it has no impact on the order of memory
+operations.</p>
+</dd>
+<dt class="hdlist1">Release Semantics </dt>
+<dd>
+<p>One of the memory order semantics defined for synchronization
+operations.
+Release semantics apply to atomic operations that store to memory.
+Given two units of execution, <strong>A</strong> and <strong>B</strong>, acting on a shared atomic
+object <strong>M</strong>, if <strong>A</strong> uses an atomic store of <strong>M</strong> with release semantics to
+synchronize-with an atomic load to <strong>M</strong> by <strong>B</strong> that used acquire
+semantics, then <strong>A</strong>'s atomic store will occur <em>after</em> any prior
+operations by <strong>A</strong>.
+Note that the memory orders <em>acquire</em>, <em>sequentially consistent</em>, and
+<em>acquire_release</em> all include <em>acquire semantics</em> and effectively pair
+with a store using release semantics.</p>
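+<div class="paragraph">
+<p>Because the OpenCL memory model is based on ISO C11, the pairing of a
+release store with an acquire load can be sketched with host-side C11
+atomics.
+This is an illustration under that assumption, not OpenCL kernel code, and
+all names in it are hypothetical.</p>
+</div>

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static int payload;            /* ordinary data written by A before the store */
static atomic_int ready_flag;  /* plays the role of the atomic object M, starts at 0 */

/* Unit of execution A: write the data, then store to M with release
 * semantics, so the store is ordered after the prior write. */
void producer_publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready_flag, 1, memory_order_release);
}

/* Unit of execution B: load M with acquire semantics. If the load observes
 * A's store, A's earlier write to payload is visible as well. */
bool consumer_try_read(int *out)
{
    if (atomic_load_explicit(&ready_flag, memory_order_acquire) != 1)
        return false;
    *out = payload;
    return true;
}
```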
+</dd>
+<dt class="hdlist1">Remainder work-groups </dt>
+<dd>
+<p>When the work-groups associated with a kernel-instance are defined, the
+sizes of a work-group in each dimension may not evenly divide the size
+of the NDRange in the corresponding dimensions.
+The result is a collection of work-groups on the boundaries of the
+NDRange that are smaller than the base work-group size.
+These are known as <em>remainder work-groups</em>.</p>
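+<div class="paragraph">
+<p>For a single dimension, the split into base-size work-groups and a
+remainder work-group is integer division and modulo.
+The sketch below uses a hypothetical struct and function name to show the
+arithmetic.</p>
+</div>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of how one NDRange dimension splits into
 * work-groups; not an OpenCL API type. */
typedef struct {
    size_t full_groups;     /* number of work-groups with the base size */
    size_t remainder_size;  /* size of the remainder work-group, 0 if none */
} dim_split;

dim_split split_dimension(size_t global_size, size_t base_local_size)
{
    dim_split s;
    assert(base_local_size > 0);
    s.full_groups = global_size / base_local_size;
    s.remainder_size = global_size % base_local_size;
    return s;
}
```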
+</dd>
+<dt class="hdlist1">Running </dt>
+<dd>
+<p>The fourth state in the six-state model for the execution of a command.
+The transition into this state occurs when the execution of the command
+starts.
+When a kernel-enqueue command starts, one or more work-groups associated
+with the command start to execute.</p>
+</dd>
+<dt class="hdlist1">Root device </dt>
+<dd>
+<p>A <em>root device</em> is an OpenCL <em>device</em> that has not been partitioned.
+Also see <em>Device</em>, <em>Parent device</em> and <em>Sub-device</em>.</p>
+</dd>
+<dt class="hdlist1">Resource </dt>
+<dd>
+<p>A class of <em>objects</em> defined by OpenCL.
+An instance of a <em>resource</em> is an <em>object</em>.
+The most common <em>resources</em> are the <em>context</em>, <em>command-queue</em>, <em>program
+objects</em>, <em>kernel objects</em>, and <em>memory objects</em>.
+Computational resources are hardware elements that participate in the
+action of advancing a program counter.
+Examples include the <em>host</em>, <em>devices</em>, <em>compute units</em> and <em>processing
+elements</em>.</p>
+</dd>
+<dt class="hdlist1">Retain, Release </dt>
+<dd>
+<p>The action of incrementing (retain) and decrementing (release) the
+reference count of an OpenCL <em>object</em>.
+This is a bookkeeping function that makes sure the system doesn't
+remove an <em>object</em> before all instances that use this <em>object</em> have
+finished.
+Refer to <em>Reference Count</em>.</p>
+</dd>
+<dt class="hdlist1">Sampler </dt>
+<dd>
+<p>An <em>object</em> that describes how to sample an image when the image is read
+in the <em>kernel</em>.
+The image read functions take a <em>sampler</em> as an argument.
+The <em>sampler</em> specifies the image addressing-mode i.e. how out-of-range
+image coordinates are handled, the filter mode, and whether the input
+image coordinate is a normalized or unnormalized value.</p>
+</dd>
+<dt class="hdlist1">Scope inclusion </dt>
+<dd>
+<p>Two actions <strong>A</strong> and <strong>B</strong> are defined to have an inclusive scope if they
+have the same scope <strong>P</strong> such that: (1) if <strong>P</strong> is
+<strong>memory_scope_sub_group</strong>, and <strong>A</strong> and <strong>B</strong> are executed by work-items
+within the same sub-group, or (2) if <strong>P</strong> is <strong>memory_scope_work_group</strong>,
+and <strong>A</strong> and <strong>B</strong> are executed by work-items within the same work-group,
+or (3) if <strong>P</strong> is <strong>memory_scope_device</strong>, and <strong>A</strong> and <strong>B</strong> are executed by
+work-items on the same device, or (4) if <strong>P</strong> is
+<strong>memory_scope_all_svm_devices</strong>, and <strong>A</strong> and <strong>B</strong> are executed by host
+threads or by work-items on one or more devices that can share SVM
+memory with each other and the host process.</p>
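+<div class="paragraph">
+<p>The four cases can be read as a predicate over the scope and the location
+of each action.
+The sketch below is a simplified model with hypothetical types: it does not
+model host threads, and it assumes that every agent can share SVM memory
+with every other agent.</p>
+</div>

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the memory scopes named in the definition. */
typedef enum {
    SCOPE_SUB_GROUP,
    SCOPE_WORK_GROUP,
    SCOPE_DEVICE,
    SCOPE_ALL_SVM_DEVICES
} demo_scope;

/* Hypothetical location of the work-item that performs an action. */
typedef struct {
    int device;      /* device index */
    int work_group;  /* work-group index on that device */
    int sub_group;   /* sub-group index within that work-group */
} demo_agent;

bool scopes_inclusive(demo_scope scope_a, demo_agent a,
                      demo_scope scope_b, demo_agent b)
{
    bool same_device = a.device == b.device;
    bool same_work_group = same_device && a.work_group == b.work_group;
    bool same_sub_group = same_work_group && a.sub_group == b.sub_group;

    if (scope_a != scope_b)
        return false;  /* both actions must use the same scope P */

    switch (scope_a) {
    case SCOPE_SUB_GROUP:       return same_sub_group;   /* case (1) */
    case SCOPE_WORK_GROUP:      return same_work_group;  /* case (2) */
    case SCOPE_DEVICE:          return same_device;      /* case (3) */
    case SCOPE_ALL_SVM_DEVICES: return true;             /* case (4), SVM sharing assumed */
    }
    return false;
}
```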
+</dd>
+<dt class="hdlist1">Sequenced before </dt>
+<dd>
+<p>A relation between evaluations executed by a single unit of execution.
+Sequenced-before is an asymmetric, transitive, pair-wise relation that
+induces a partial order between evaluations.
+Given any two evaluations A and B, if A is sequenced-before B, then the
+execution of A shall precede the execution of B.</p>
+</dd>
+<dt class="hdlist1">Sequential consistency </dt>
+<dd>
+<p>Sequential consistency interleaves the steps executed by each unit of
+execution.
+Each access to a memory location sees the last assignment to that
+location in that interleaving.</p>
+</dd>
+<dt class="hdlist1">Sequentially consistent semantics </dt>
+<dd>
+<p>One of the memory order semantics defined for synchronization
+operations.
+When using sequentially-consistent synchronization operations, the loads
+and stores within one unit of execution appear to execute in program
+order (i.e., the sequenced-before order), and loads and stores from
+different units of execution appear to be simply interleaved.</p>
+</dd>
+<dt class="hdlist1">Shared Virtual Memory (SVM) </dt>
+<dd>
+<p>An address space exposed to both the host and the devices within a
+context.
+SVM causes addresses to be meaningful between the host and all of the
+devices within a context and therefore supports the use of pointer based
+data structures in OpenCL kernels.
+It logically extends a portion of the global memory into the host
+address space therefore giving work-items access to the host address
+space.
+There are three types of SVM in OpenCL:</p>
+<div class="openblock">
+<div class="content">
+<div class="dlist">
+<dl>
+<dt class="hdlist1"><em>Coarse-Grained buffer SVM</em> </dt>
+<dd>
+<p>Sharing occurs at the granularity of regions of OpenCL buffer memory
+objects.</p>
+</dd>
+<dt class="hdlist1"><em>Fine-Grained buffer SVM</em> </dt>
+<dd>
+<p>Sharing occurs at the granularity of individual loads/stores into bytes
+within OpenCL buffer memory objects.</p>
+</dd>
+<dt class="hdlist1"><em>Fine-Grained system SVM</em> </dt>
+<dd>
+<p>Sharing occurs at the granularity of individual loads/stores into bytes
+occurring anywhere within the host memory.</p>
+</dd>
+</dl>
+</div>
+</div>
+</div>
+</dd>
+<dt class="hdlist1">SIMD </dt>
+<dd>
+<p>Single Instruction Multiple Data.
+A programming model where a <em>kernel</em> is executed concurrently on
+multiple <em>processing elements</em> each with its own data and a shared
+program counter.
+All <em>processing elements</em> execute a strictly identical set of
+instructions.</p>
+</dd>
+<dt class="hdlist1">Specialization constants </dt>
+<dd>
+<p>Specialization is intended for constant objects that will not have known
+constant values until after initial generation of a SPIR-V module.
+Such objects are called specialization constants.
+An application may provide values for the specialization constants that
+will be used when the SPIR-V program is built.
+Specialization constants that do not receive a value from an application
+shall use the default value as defined in the SPIR-V specification.</p>
+</dd>
+<dt class="hdlist1">SPMD </dt>
+<dd>
+<p>Single Program Multiple Data.
+A programming model where a <em>kernel</em> is executed concurrently on
+multiple <em>processing elements</em> each with its own data and its own
+program counter.
+Hence, while all computational resources run the same <em>kernel</em> they
+maintain their own instruction counter and due to branches in a
+<em>kernel</em>, the actual sequence of instructions can be quite different
+across the set of <em>processing elements</em>.</p>
+</dd>
+<dt class="hdlist1">Sub-device </dt>
+<dd>
+<p>An OpenCL <em>device</em> can be partitioned into multiple <em>sub-devices</em>.
+The new <em>sub-devices</em> alias specific collections of compute units within
+the parent <em>device</em>, according to a partition scheme.
+The <em>sub-devices</em> may be used in any situation that their parent
+<em>device</em> may be used.
+Partitioning a <em>device</em> does not destroy the parent <em>device</em>, which may
+continue to be used alongside and intermingled with its child
+<em>sub-devices</em>.
+Also see <em>Device</em>, <em>Parent device</em> and <em>Root device</em>.</p>
+</dd>
+<dt class="hdlist1">Sub-group </dt>
+<dd>
+<p>Sub-groups are an implementation-dependent grouping of work-items within
+a work-group.
+The size and number of sub-groups are implementation-defined.</p>
+</dd>
+<dt class="hdlist1">Sub-group Barrier </dt>
+<dd>
+<p>See <em>Barrier</em>.</p>
+</dd>
+<dt class="hdlist1">Submitted </dt>
+<dd>
+<p>The second state in the six-state model for the execution of a command.
+The transition into this state occurs when the command is flushed from
+the command-queue and submitted for execution on the device.
+Once submitted, a programmer can assume a command will execute once its
+prerequisites have been met.</p>
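+<div class="paragraph">
+<p>The six-state model can be sketched as a small state machine.
+The enumerator and function names below are hypothetical; the fifth and
+sixth states (Ended and Complete) are the ones defined elsewhere in this
+specification.</p>
+</div>

```c
#include <assert.h>

/* Hypothetical enumeration of the six command states, in order. */
typedef enum {
    CMD_QUEUED,
    CMD_SUBMITTED,
    CMD_READY,
    CMD_RUNNING,
    CMD_ENDED,
    CMD_COMPLETE
} cmd_state;

/* Advance a command to its next state; Complete is terminal. */
cmd_state cmd_advance(cmd_state s)
{
    assert(s >= CMD_QUEUED && s <= CMD_COMPLETE);
    return s == CMD_COMPLETE ? s : (cmd_state)(s + 1);
}

/* A launch is the transition from the submitted state to the ready state. */
int is_launch(cmd_state from, cmd_state to)
{
    return from == CMD_SUBMITTED && to == CMD_READY;
}
```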
+</dd>
+<dt class="hdlist1">SVM Buffer </dt>
+<dd>
+<p>A memory allocation enabled to work with <em>Shared Virtual Memory (SVM)</em>.
+Depending on how the SVM buffer is created, it can be a coarse-grained
+or fine-grained SVM buffer.
+Optionally it may be wrapped by a <em>Buffer Object</em>.
+See <em>Shared Virtual Memory (SVM)</em>.</p>
+</dd>
+<dt class="hdlist1">Synchronization </dt>
+<dd>
+<p>Synchronization refers to mechanisms that constrain the order of
+execution and the visibility of memory operations between two or more
+units of execution.</p>
+</dd>
+<dt class="hdlist1">Synchronization operations </dt>
+<dd>
+<p>Operations that define memory order constraints in a program.
+They play a special role in controlling how memory operations in one
+unit of execution (such as a work-item or, when using SVM, a host thread)
+are made visible to another.
+Synchronization operations in OpenCL include <em>atomic operations</em> and
+<em>fences</em>.</p>
+</dd>
+<dt class="hdlist1">Synchronization point </dt>
+<dd>
+<p>A synchronization point between a pair of commands (A and B) assures
+that the results of command A happen-before command B is launched (i.e.
+enters the ready state).</p>
+</dd>
+<dt class="hdlist1">Synchronizes with </dt>
+<dd>
+<p>A relation between operations in two different units of execution that
+defines a memory order constraint in global memory
+(<em>global-synchronizes-with</em>) or local memory
+(<em>local-synchronizes-with</em>).</p>
+</dd>
+<dt class="hdlist1">Task Parallel Programming Model </dt>
+<dd>
+<p>A programming model in which computations are expressed in terms of
+multiple concurrent tasks executing in one or more <em>command-queues</em>.
+The concurrent tasks can be running different <em>kernels</em>.</p>
+</dd>
+<dt class="hdlist1">Thread-safe </dt>
+<dd>
+<p>An OpenCL API call is considered to be <em>thread-safe</em> if the internal
+state as managed by OpenCL remains consistent when called simultaneously
+by multiple <em>host</em> threads.
+OpenCL API calls that are <em>thread-safe</em> allow an application to call
+these functions in multiple <em>host</em> threads without having to implement
+mutual exclusion across these <em>host</em> threads i.e. they are also
+re-entrant-safe.</p>
+</dd>
+<dt class="hdlist1">Undefined </dt>
+<dd>
+<p>The behavior of an OpenCL API call, built-in function used inside a
+<em>kernel</em> or execution of a <em>kernel</em> that is explicitly not defined by
+OpenCL.
+A conforming implementation is not required to specify what occurs when
+an undefined construct is encountered in OpenCL.</p>
+</dd>
+<dt class="hdlist1">Unit of execution </dt>
+<dd>
+<p>A generic term for a process, OS managed thread running on the host (a
+host-thread), kernel-instance, host program, work-item or any other
+executable agent that advances the work associated with a program.</p>
+</dd>
+<dt class="hdlist1">Work-group </dt>
+<dd>
+<p>A collection of related <em>work-items</em> that execute on a single <em>compute
+unit</em>.
+The <em>work-items</em> in the group execute the same <em>kernel-instance</em> and
+share <em>local memory</em> and <em>work-group functions</em>.</p>
+</dd>
+<dt class="hdlist1">Work-group Barrier </dt>
+<dd>
+<p>See <em>Barrier</em>.</p>
+</dd>
+<dt class="hdlist1">Work-group Function </dt>
+<dd>
+<p>A function that carries out collective operations across all the
+work-items in a work-group.
+Available collective operations are a barrier, reduction, broadcast,
+prefix sum, and evaluation of a predicate.
+A work-group function must occur within a <em>converged control flow</em>; i.e.
+all work-items in the work-group must encounter precisely the same
+work-group function.</p>
+</dd>
+<dt class="hdlist1">Work-group Synchronization </dt>
+<dd>
+<p>Constraints on the order of execution for work-items in a single
+work-group.</p>
+</dd>
+<dt class="hdlist1">Work-pool </dt>
+<dd>
+<p>A logical pool associated with a device that holds commands and
+work-groups from kernel-instances that are ready to execute.
+OpenCL does not constrain the order that commands and work-groups are
+scheduled for execution from the work-pool; i.e. a programmer must
+assume that they could be interleaved.
+There is one work-pool per device used by all command-queues associated
+with that device.
+The work-pool may be implemented in any manner as long as it assures
+that work-groups placed in the pool will eventually execute.</p>
+</dd>
+<dt class="hdlist1">Work-item </dt>
+<dd>
+<p>One of a collection of parallel executions of a <em>kernel</em> invoked on a
+<em>device</em> by a <em>command</em>.
+A <em>work-item</em> is executed by one or more <em>processing elements</em> as part
+of a <em>work-group</em> executing on a <em>compute unit</em>.
+A <em>work-item</em> is distinguished from other work-items by its <em>global ID</em>
+or the combination of its <em>work-group</em> ID and its <em>local ID</em> within a
+<em>work-group</em>.</p>
+</dd>
+</dl>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_the_opencl_architecture">3. The OpenCL Architecture</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p><strong>OpenCL</strong> is an open industry standard for programming a heterogeneous
+collection of CPUs, GPUs and other discrete computing devices organized into
+a single platform.
+It is more than a language.
+OpenCL is a framework for parallel programming and includes a language, API,
+libraries and a runtime system to support software development.
+Using OpenCL, for example, a programmer can write general purpose programs
+that execute on GPUs without the need to map their algorithms onto a 3D
+graphics API such as OpenGL or DirectX.</p>
+</div>
+<div class="paragraph">
+<p>The target of OpenCL is expert programmers wanting to write portable yet
+efficient code.
+This includes library writers, middleware vendors, and performance oriented
+application programmers.
+Therefore, OpenCL provides a low-level hardware abstraction plus a framework
+to support programming, and many details of the underlying hardware are
+exposed.</p>
+</div>
+<div class="paragraph">
+<p>To describe the core ideas behind OpenCL, we will use a hierarchy of models:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>Platform Model</p>
+</li>
+<li>
+<p>Memory Model</p>
+</li>
+<li>
+<p>Execution Model</p>
+</li>
+<li>
+<p>Programming Model</p>
+</li>
+</ul>
+</div>
+<div class="sect2">
+<h3 id="_platform_model">3.1. Platform Model</h3>
+<div class="paragraph">
+<p>The <a href="#platform-model-image">Platform model</a> for OpenCL is defined below.
+The model consists of a <strong>host</strong> connected to one or more <strong>OpenCL devices</strong>.
+An OpenCL device is divided into one or more <strong>compute units</strong> (CUs) which are
+further divided into one or more <strong>processing elements</strong> (PEs).
+Computations on a device occur within the processing elements.</p>
+</div>
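+<div class="paragraph">
+<p>The containment hierarchy described above (platform, devices, compute
+units, processing elements) can be sketched with plain C structs.
+These types are hypothetical illustrations and do not correspond to OpenCL
+API types.</p>
+</div>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the platform hierarchy; not OpenCL API types. */
typedef struct {
    size_t num_processing_elements;  /* PEs inside this compute unit */
} demo_compute_unit;

typedef struct {
    const demo_compute_unit *units;  /* compute units of this device */
    size_t num_units;
} demo_device;

typedef struct {
    const demo_device *devices;      /* devices connected to the host */
    size_t num_devices;
} demo_platform;

/* Count every processing element reachable from the platform. */
size_t total_processing_elements(const demo_platform *p)
{
    size_t total = 0;
    assert(p != NULL);
    for (size_t d = 0; d < p->num_devices; ++d)
        for (size_t u = 0; u < p->devices[d].num_units; ++u)
            total += p->devices[d].units[u].num_processing_elements;
    return total;
}
```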
+<div class="paragraph">
+<p>An OpenCL application is implemented as both host code and device kernel
+code.
+The host code portion of an OpenCL application runs on a host processor
+according to the models native to the host platform.
+The OpenCL application host code submits the kernel code as commands from
+the host to OpenCL devices.
+An OpenCL device executes the command's computation on the processing
+elements within the device.</p>
+</div>
+<div class="paragraph">
+<p>An OpenCL device has considerable latitude on how computations are mapped
+onto the device's processing elements.
+When processing elements within a compute unit execute the same sequence of
+statements across the processing elements, the control flow is said to be
+<em>converged</em>.
+Hardware optimized for executing a single stream of instructions over
+multiple processing elements is well suited to converged control flows.
+When the control flow varies from one processing element to another, it is
+said to be <em>diverged</em>.
+While a kernel always begins execution with a converged control flow, due to
+branching statements within a kernel, converged and diverged control flows
+may occur within a single kernel.
+This provides a great deal of flexibility in the algorithms that can be
+implemented with OpenCL.</p>
+</div>
+<div id="platform-model-image" class="imageblock" style="text-align: center">
+<div class="content">
+<img src="" alt="platform model">
+</div>
+<div class="title">Figure 1. Platform Model &#8230;&#8203; one host plus one or more compute devices each with one or more compute units composed of one or more processing elements.</div>
+</div>
+<div class="paragraph">
+<p>Programmers provide programs in the form of SPIR-V source binaries, OpenCL C
+or OpenCL C++ source strings, or implementation-defined binary objects.
+The OpenCL platform provides a compiler to translate program input of any of
+these forms into executable program objects.
+The device code compiler may be <em>online</em> or <em>offline</em>.
+An <em>online</em> <em>compiler</em> is available during host program execution using
+standard APIs.
+An <em>offline compiler</em> is invoked outside of host program control, using
+platform-specific methods.
+The OpenCL runtime allows developers to retrieve a previously compiled device
+program executable, and to load and execute such an executable.</p>
+</div>
+<div class="paragraph">
+<p>OpenCL defines two kinds of platform profiles: a <em>full profile</em> and a
+reduced-functionality <em>embedded profile</em>.
+A full profile platform must provide an online compiler for all its devices.
+An embedded platform may provide an online compiler, but is not required to
+do so.</p>
+</div>
+<div class="paragraph">
+<p>A device may expose special purpose functionality as a <em>built-in function</em>.
+The platform provides APIs for enumerating and invoking the built-in
+functions offered by a device, but otherwise does not define their
+construction or semantics.
+A <em>custom device</em> supports only built-in functions, and cannot be programmed
+via a kernel language.</p>
+</div>
+<div class="paragraph">
+<p>All device types support the OpenCL execution model, the OpenCL memory
+model, and the APIs used in OpenCL to manage devices.</p>
+</div>
+<div class="paragraph">
+<p>The platform model is an abstraction describing how OpenCL views the
+hardware.
+The relationship between the elements of the platform model and the hardware
+in a system may be a fixed property of a device or it may be a dynamic
+feature of a program dependent on how a compiler optimizes code to best
+utilize physical hardware.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_execution_model">3.2. Execution Model</h3>
+<div class="paragraph">
+<p>The OpenCL execution model is defined in terms of two distinct units of
+execution: <strong>kernels</strong> that execute on one or more OpenCL devices and a <strong>host
+program</strong> that executes on the host.
+With regard to OpenCL, the kernels are where the "work" associated with a
+computation occurs.
+This work occurs through <strong>work-items</strong> that execute in groups
+(<strong>work-groups</strong>).</p>
+</div>
+<div class="paragraph">
+<p>A kernel executes within a well-defined context managed by the host.
+The context defines the environment within which kernels execute.
+It includes the following resources:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>Devices</strong>: One or more devices exposed by the OpenCL platform.</p>
+</li>
+<li>
+<p><strong>Kernel Objects</strong>: The OpenCL functions with their associated argument
+values that run on OpenCL devices.</p>
+</li>
+<li>
+<p><strong>Program Objects</strong>: The program source and executable that implement the
+kernels.</p>
+</li>
+<li>
+<p><strong>Memory Objects</strong>: Variables visible to the host and the OpenCL devices.
+Instances of kernels operate on these objects as they execute.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>The host program uses the OpenCL API to create and manage the context.
+Functions from the OpenCL API enable the host to interact with a device
+through a <em>command-queue</em>.
+Each command-queue is associated with a single device.
+The commands placed into the command-queue fall into one of three types:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>Kernel-enqueue commands</strong>: Enqueue a kernel for execution on a device.</p>
+</li>
+<li>
+<p><strong>Memory commands</strong>: Transfer data between the host and device memory,
+between memory objects, or map and unmap memory objects from the host
+address space.</p>
+</li>
+<li>
+<p><strong>Synchronization commands</strong>: Explicit synchronization points that define
+order constraints between commands.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>In addition to commands submitted from the host command-queue, a kernel
+running on a device can enqueue commands to a device-side command queue.
+This results in <em>child kernels</em> enqueued by a kernel executing on a device
+(the <em>parent kernel</em>).
+Regardless of whether the command-queue resides on the host or a device,
+each command passes through six states.</p>
+</div>
+<div class="olist arabic">
+<ol class="arabic">
+<li>
+<p><strong>Queued</strong>: The command is enqueued to a command-queue.
+A command may reside in the queue until it is flushed either explicitly
+(a call to <strong>clFlush</strong>) or implicitly by some other command.</p>
+</li>
+<li>
+<p><strong>Submitted</strong>: The command is flushed from the command-queue and submitted
+for execution on the device.
+Once flushed from the command-queue, a command will execute after any
+prerequisites for execution are met.</p>
+</li>
+<li>
+<p><strong>Ready</strong>: All prerequisites constraining execution of a command have been
+met.
+The command, or for a kernel-enqueue command the collection of work
+groups associated with a command, is placed in a device work-pool from
+which it is scheduled for execution.</p>
+</li>
+<li>
+<p><strong>Running</strong>: Execution of the command starts.
+For the case of a kernel-enqueue command, one or more work-groups
+associated with the command start to execute.</p>
+</li>
+<li>
+<p><strong>Ended</strong>: Execution of a command ends.
+When a kernel-enqueue command ends, all of the work-groups associated
+with that command have finished their execution.
+<em>Immediate side effects</em>, i.e. those associated with the kernel but not
+necessarily with its child kernels, are visible to other units of
+execution.
+These side effects include updates to values in global memory.</p>
+</li>
+<li>
+<p><strong>Complete</strong>: The command and its child commands have finished execution
+and the status of the event object, if any, associated with the command
+is set to CL_COMPLETE.</p>
+</li>
+</ol>
+</div>
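+<div class="paragraph">
+<p>The six states listed above can be modeled as a simple linear state machine.
+The following Python sketch is illustrative only: the names <em>CommandState</em>,
+<em>next_state</em> and <em>lifecycle</em> are hypothetical and are not part of the OpenCL
+API, and the sketch only models the ordinary progression in which a command
+visits every state in order.</p>
+</div>

```python
from enum import IntEnum


class CommandState(IntEnum):
    """The six conceptual command states, in the order they are listed."""
    QUEUED = 0
    SUBMITTED = 1
    READY = 2
    RUNNING = 3
    ENDED = 4
    COMPLETE = 5


def next_state(state):
    """Return the state that follows `state` in the ordinary progression."""
    if state is CommandState.COMPLETE:
        raise ValueError("COMPLETE is the final state")
    return CommandState(state + 1)


def lifecycle():
    """Walk one command from QUEUED to COMPLETE, recording each state."""
    state = CommandState.QUEUED
    visited = [state]
    while state is not CommandState.COMPLETE:
        state = next_state(state)
        visited.append(state)
    return visited


if __name__ == "__main__":
    print(" -> ".join(s.name for s in lifecycle()))
```

+<div class="paragraph">
+<p>The numeric values in this sketch are arbitrary labels for the model and do
+not correspond to the event status constants defined in the OpenCL headers.</p>
+</div>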
+<div class="paragraph">
+<p>The <a href="#profiled-states-image">execution states and the transitions between
+them</a> are summarized below.
+These states and the concept of a device work-pool are conceptual elements
+of the execution model.
+An implementation of OpenCL has considerable freedom in how these are
+exposed to a program.
+Five of the transitions, however, are directly observable through a
+profiling interface.
+These <a href="#profiled-states-image">profiled states</a> are shown below.</p>
+</div>
+<div id="profiled-states-image" class="imageblock" style="text-align: center">
+<div class="content">
+<img src="" alt="profiled states">
+</div>
+<div class="title">Figure 2. The states and transitions between states defined in the OpenCL execution model. A subset of these transitions is exposed through the <a href="#profiling-operations">profiling interface</a>.</div>
+</div>
+<div class="paragraph">
+<p>Commands communicate their status through <em>Event objects</em>.
+Successful completion is indicated by setting the event status associated
+with a command to CL_COMPLETE.
+Unsuccessful completion results in abnormal termination of the command which
+is indicated by setting the event status to a negative value.
+In this case, the command-queue associated with the abnormally terminated
+command and all other command-queues in the same context may no longer be
+available and their behavior is implementation defined.</p>
+</div>
+<div class="paragraph">
+<p>A command submitted to a device will not launch until prerequisites that
+constrain the order of commands have been resolved.
+These prerequisites have three sources:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>They may arise from commands submitted to a command-queue that constrain
+the order in which commands are launched.
+For example, commands that follow a command queue barrier will not
+launch until all commands prior to the barrier are complete.</p>
+</li>
+<li>
+<p>The second source of prerequisites is dependencies between commands
+expressed through events.
+A command may include an optional list of events.
+The command will wait and not launch until all the events in the list
+are in the state CL_COMPLETE.
+By this mechanism, event objects define order constraints between
+commands and coordinate execution between the host and one or more
+devices.</p>
+</li>
+<li>
+<p>The third source of prerequisites can be the presence of non-trivial C
+initializers or C++ constructors for program scope global variables.
+In this case, the OpenCL C/C++ compiler shall generate program
+initialization kernels that perform C initialization or C++
+construction.
+These kernels must be executed by the OpenCL runtime on a device before any
+kernel from the same program can be executed on the same device.
+The ND-range for any program initialization kernel is (1,1,1).
+When multiple programs are linked together, the order of execution of
+program initialization kernels that belong to different programs is
+undefined.</p>
+</li>
+</ul>
+</div>
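+<div class="paragraph">
+<p>The event-based prerequisite described above can be sketched as a small
+host-side model in Python.
+This is a simplified illustration, not an OpenCL implementation: the
+functions <em>can_launch</em> and <em>run_all</em> are hypothetical, each command is
+assumed to produce one event named after itself, and a launched command is
+assumed to run to completion immediately.
+Only the numeric values of CL_COMPLETE and CL_QUEUED are taken from the
+OpenCL headers.</p>
+</div>

```python
CL_COMPLETE = 0  # numeric value of CL_COMPLETE in the OpenCL headers
CL_QUEUED = 3    # numeric value of CL_QUEUED in the OpenCL headers


def can_launch(wait_list, status):
    """True when every event in `wait_list` has reached CL_COMPLETE."""
    return all(status[name] == CL_COMPLETE for name in wait_list)


def run_all(wait_lists):
    """Return one launch order that respects every event wait list.

    `wait_lists` maps a command name to the names of the commands whose
    events it waits on (hypothetical representation for this sketch).
    """
    status = {name: CL_QUEUED for name in wait_lists}
    order = []
    while len(order) != len(wait_lists):
        progressed = False
        for name, deps in wait_lists.items():
            if status[name] != CL_COMPLETE and can_launch(deps, status):
                # Model assumption: the command completes as soon as it launches.
                status[name] = CL_COMPLETE
                order.append(name)
                progressed = True
        if not progressed:
            raise RuntimeError("circular event dependency")
    return order


if __name__ == "__main__":
    print(run_all({"copy_back": ["kernel"], "kernel": ["upload"], "upload": []}))
```

+<div class="paragraph">
+<p>Although the commands are listed in reverse, the wait lists force the upload
+to launch first, then the kernel, then the copy back.</p>
+</div>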
+<div class="paragraph">
+<p>Program clean up may result in the execution of one or more program clean up
+kernels by the OpenCL runtime.
+This is due to the presence of non-trivial C++ destructors for
+program scope variables.
+The ND-range for executing any program clean up kernel is (1,1,1).
+The order of execution of clean up kernels from different programs (that are
+linked together) is undefined.</p>
+</div>
+<div class="paragraph">
+<p>Note that C initializers, C++ constructors, or C++ destructors for program
+scope variables cannot use pointers to coarse grain and fine grain SVM
+allocations.</p>
+</div>
+<div class="paragraph">
+<p>A command may be submitted to a device and yet have no visible side effects
+outside of waiting on and satisfying event dependences.
+Examples include markers, kernels executed over ranges of no work-items or
+copy operations with zero sizes.
+Such commands may pass directly from the <em>ready</em> state to the <em>ended</em> state.</p>
+</div>
+<div class="paragraph">
+<p>Command execution can be blocking or non-blocking.
+Consider a sequence of OpenCL commands.
+For blocking commands, the OpenCL API functions that enqueue commands don&#8217;t
+return until the command has completed.
+Alternatively, OpenCL functions that enqueue non-blocking commands return
+immediately and require that a programmer defines dependencies between
+enqueued commands to ensure that enqueued commands are not launched before
+needed resources are available.
+In both cases, the actual execution of the command may occur asynchronously
+with execution of the host program.</p>
+</div>
+<div class="paragraph">
+<p>Commands within a single command-queue execute relative to each other in one
+of two modes:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>In-order Execution</strong>: Commands and any side effects associated with
+commands appear to the OpenCL application as if they execute in the same
+order they are enqueued to a command-queue.</p>
+</li>
+<li>
+<p><strong>Out-of-order Execution</strong>: Commands execute in any order constrained only
+by explicit synchronization points (e.g. through command queue barriers)
+or explicit dependencies on events.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>Multiple command-queues can be present within a single context.
+Multiple command-queues execute commands independently.
+Event objects visible to the host program can be used to define
+synchronization points between commands in multiple command queues.
+If such synchronization points are established between commands in multiple
+command-queues, an implementation must assure that the command-queues
+progress concurrently and correctly account for the dependencies established
+by the synchronization points.
+For a detailed explanation of synchronization points, see
+<a href="#execution-model-sync">Execution Model: Synchronization</a>.</p>
+</div>
+<div class="paragraph">
+<p>The core of the OpenCL execution model is defined by how the kernels
+execute.
+When a kernel-enqueue command submits a kernel for execution, an index space
+is defined.
+The kernel, the argument values associated with the arguments to the kernel,
+and the parameters that define the index space define a <em>kernel-instance</em>.
+When a kernel-instance executes on a device, the kernel function executes
+for each point in the defined index space.
+Each of these executing kernel functions is called a <em>work-item</em>.
+The work-items associated with a given kernel-instance are managed by the
+device in groups called <em>work-groups</em>.
+These work-groups define a coarse grained decomposition of the index space.
+Work-groups are further divided into <em>sub-groups</em>, which provide an
+additional level of control over execution.</p>
+</div>
+<div class="paragraph">
+<p>Work-items have a global ID based on their coordinates within the index
+space.
+They can also be defined in terms of their work-group and the local ID
+within a work-group.
+The details of this mapping are described in the following section.</p>
+</div>
+<div class="sect3">
+<h4 id="_execution_model_mapping_work_items_onto_an_ndrange">3.2.1. Execution Model: Mapping work-items onto an NDRange</h4>
+<div class="paragraph">
+<p>The index space supported by OpenCL is called an NDRange.
+An NDRange is an N-dimensional index space, where N is one, two or three.
+The NDRange is decomposed into work-groups forming blocks that cover the
+index space.
+An NDRange is defined by three integer arrays of length N:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>The extent of the index space (or global size) in each dimension.</p>
+</li>
+<li>
+<p>An offset index F indicating the initial value of the indices in each
+dimension (zero by default).</p>
+</li>
+<li>
+<p>The size of a work-group (local size) in each dimension.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>Each work-item&#8217;s global ID is an N-dimensional tuple.
+The global ID components are values in the range from F to F plus the
+number of elements in that dimension minus one.</p>
+</div>
+<div class="paragraph">
+<p>If a kernel is created from OpenCL 2.0 or SPIR-V, the size of work-groups in
+an NDRange (the local size) need not be the same for all work-groups.
+In this case, any single dimension for which the global size is not
+divisible by the local size will be partitioned into two regions.
+One region will have work-groups that have the same number of work-items as
+was specified for that dimension by the programmer (the local size).
+The other region will have work-groups with fewer than the number of
+work-items specified by the local size parameter in that dimension (the
+<em>remainder work-groups</em>).
+Work-group sizes could be non-uniform in multiple dimensions, potentially
+producing work-groups of up to 4 different sizes in a 2D range and 8
+different sizes in a 3D range.</p>
+</div>
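+<div class="paragraph">
+<p>The partitioning into full-size and remainder work-groups can be checked with
+a short Python sketch.
+The helper names <em>work_group_sizes</em> and <em>distinct_shapes</em> are hypothetical
+and exist only for this illustration; the sketch assumes that the remainder
+work-groups in a dimension hold exactly the work-items left over after
+dividing the global size by the local size.</p>
+</div>

```python
from itertools import product


def work_group_sizes(global_size, local_size):
    """Sizes of the work-groups along one dimension, remainder last."""
    full, remainder = divmod(global_size, local_size)
    sizes = [local_size] * full
    if remainder:
        sizes.append(remainder)
    return sizes


def distinct_shapes(global_sizes, local_sizes):
    """Set of distinct work-group shapes over all dimensions."""
    per_dim = [
        sorted(set(work_group_sizes(g, s)))
        for g, s in zip(global_sizes, local_sizes)
    ]
    return set(product(*per_dim))


if __name__ == "__main__":
    print(work_group_sizes(10, 4))
    print(len(distinct_shapes((10, 10), (4, 4))))
    print(len(distinct_shapes((10, 10, 10), (4, 4, 4))))
```

+<div class="paragraph">
+<p>With a global size of 10 and a local size of 4 in every dimension, each
+dimension has two possible sizes (4 and 2), which gives the 4 shapes in 2D
+and 8 shapes in 3D mentioned above.</p>
+</div>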
+<div class="paragraph">
+<p>Each work-item is assigned to a work-group and given a local ID to represent
+its position within the work-group.
+A work-item&#8217;s local ID is an N-dimensional tuple with components in the
+range from zero to the size of the work-group in that dimension minus one.</p>
+</div>
+<div class="paragraph">
+<p>Work-groups are assigned IDs similarly.
+The number of work-groups in each dimension is not directly defined but is
+inferred from the local and global NDRanges provided when a kernel-instance
+is enqueued.
+A work-group&#8217;s ID is an N-dimensional tuple with components in the range 0
+to the ceiling of the global size in that dimension divided by the local
+size in the same dimension, minus one.
+As a result, the combination of a work-group ID and the local-ID within a
+work-group uniquely defines a work-item.
+Each work-item is identifiable in two ways: in terms of a global index, and
+in terms of a work-group index plus a local index within a work-group.</p>
+</div>
+<div class="paragraph">
+<p>For example, consider the <a href="#index-space-image">2-dimensional index space</a>
+shown below.
+We input the index space for the work-items (G<sub>x</sub>, G<sub>y</sub>), the size of each
+work-group (S<sub>x</sub>, S<sub>y</sub>) and the global ID offset (F<sub>x</sub>, F<sub>y</sub>).
+The global indices define a G<sub>x</sub> by G<sub>y</sub> index space where the total number
+of work-items is the product of G<sub>x</sub> and G<sub>y</sub>.
+The local indices define an S<sub>x</sub> by S<sub>y</sub> index space where the number of
+work-items in a single work-group is the product of S<sub>x</sub> and S<sub>y</sub>.
+Given the size of each work-group and the total number of work-items we can
+compute the number of work-groups.
+A 2-dimensional index space is used to uniquely identify a work-group.
+Each work-item is identified by its global ID (<em>g</em><sub>x</sub>, <em>g</em><sub>y</sub>) or by the
+combination of the work-group ID (<em>w</em><sub>x</sub>, <em>w</em><sub>y</sub>), the size of each
+work-group (S<sub>x</sub>,S<sub>y</sub>) and the local ID (s<sub>x</sub>, s<sub>y</sub>) inside the work-group
+such that</p>
+</div>
+<div class="dlist">
+<dl>
+<dt class="hdlist1"></dt>
+<dd>
+<p>(g<sub>x</sub> , g<sub>y</sub>) = (w<sub>x</sub> S<sub>x</sub> + s<sub>x</sub> + F<sub>x</sub>, w<sub>y</sub> S<sub>y</sub> + s<sub>y</sub> + F<sub>y</sub>)</p>
+</dd>
+</dl>
+</div>
+<div class="paragraph">
+<p>The number of work-groups can be computed as:</p>
+</div>
+<div class="dlist">
+<dl>
+<dt class="hdlist1"></dt>
+<dd>
+<p>(W<sub>x</sub>, W<sub>y</sub>) = (ceil(G<sub>x</sub> / S<sub>x</sub>), ceil(G<sub>y</sub> / S<sub>y</sub>))</p>
+</dd>
+</dl>
+</div>
+<div class="paragraph">
+<p>Given a global ID and the work-group size, the work-group ID for a work-item
+is computed as:</p>
+</div>
+<div class="dlist">
+<dl>
+<dt class="hdlist1"></dt>
+<dd>
+<p>(w<sub>x</sub>, w<sub>y</sub>) = ( (g<sub>x</sub> &#8722; s<sub>x</sub> &#8722; F<sub>x</sub>) / S<sub>x</sub>, (g<sub>y</sub> &#8722; s<sub>y</sub> &#8722; F<sub>y</sub>) / S<sub>y</sub> )</p>
+</dd>
+</dl>
+</div>
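+<div class="paragraph">
+<p>The three relations above can be exercised with a small Python sketch.
+The function names are hypothetical and chosen for this illustration; the
+tuples w, s, S, F, G and g follow the symbols used in the text, and the
+work-group ID is recovered by inverting the global ID relation, that is by
+subtracting the local ID and the offset from the global ID before dividing
+by the work-group size.</p>
+</div>

```python
def global_id(w, s, S, F):
    """Global ID from work-group ID w, local ID s, local size S, offset F."""
    return tuple(wi * Si + si + Fi for wi, si, Si, Fi in zip(w, s, S, F))


def num_work_groups(G, S):
    """Number of work-groups per dimension: ceil(G / S) in integer math."""
    return tuple(-(-Gi // Si) for Gi, Si in zip(G, S))


def work_group_id(g, s, S, F):
    """Work-group ID recovered from global ID g, local ID s, size S, offset F."""
    return tuple((gi - si - Fi) // Si for gi, si, Si, Fi in zip(g, s, S, F))


if __name__ == "__main__":
    S, F = (4, 3), (10, 20)
    g = global_id((1, 1), (2, 0), S, F)
    print(g)
    print(work_group_id(g, (2, 0), S, F))
    print(num_work_groups((10, 6), S))
```

+<div class="paragraph">
+<p>Integer ceiling division is used for the work-group count so that the result
+stays exact for large global sizes.</p>
+</div>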
+<div id="index-space-image" class="imageblock" style="text-align: center">
+<div class="content">
+<img src="" alt="index space">
+</div>
+<div class="title">Figure 3. An example of an NDRange index space showing work-items, their global IDs and their mapping onto the pair of work-group and local IDs. In this case, we assume that in each dimension, the size of the work-group evenly divides the global NDRange size (i.e. all work-groups have the same size) and that the offset is equal to zero.</div>
+</div>
+<div class="paragraph">
+<p>Within a work-group, work-items may be divided into sub-groups.
+The mapping of work-items to sub-groups is implementation-defined and may be
+queried at runtime.
+While sub-groups may be used in multi-dimensional work-groups, each
+sub-group is 1-dimensional and any given work-item may query which sub-group
+it is a member of.</p>
+</div>
+<div class="paragraph">
+<p>Work-items are mapped into sub-groups through a combination of compile-time
+decisions and the parameters of the dispatch.
+The mapping to sub-groups is invariant for the duration of a kernel&#8217;s
+execution, across dispatches of a given kernel with the same work-group
+dimensions, between dispatches and query operations consistent with the
+dispatch parameterization, and from one work-group to another within the
+dispatch (excluding the trailing edge work-groups in the presence of
+non-uniform work-group sizes).
+In addition, all sub-groups within a work-group will be the same size, apart
+from the sub-group with the maximum index which may be smaller if the size
+of the work-group is not evenly divisible by the size of the sub-groups.</p>
+</div>
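+<div class="paragraph">
+<p>The size rule for sub-groups can be illustrated with a short Python sketch.
+The actual mapping of work-items to sub-groups is implementation-defined, so
+the mapping used here (consecutive linearized local IDs share a sub-group) is
+purely an assumption of this sketch, as are the names <em>sub_group_sizes</em> and
+<em>sub_group_of</em>.</p>
+</div>

```python
def sub_group_sizes(work_group_size, sub_group_size):
    """Sizes of the sub-groups in one work-group; only the last may be smaller."""
    full, remainder = divmod(work_group_size, sub_group_size)
    return [sub_group_size] * full + ([remainder] if remainder else [])


def sub_group_of(linear_local_id, sub_group_size):
    """(sub-group index, index within the sub-group) for one work-item.

    Assumption for this sketch only: work-items are packed into sub-groups
    in order of their linearized local ID. Real implementations may differ.
    """
    return divmod(linear_local_id, sub_group_size)


if __name__ == "__main__":
    print(sub_group_sizes(70, 32))
    print(sub_group_of(33, 32))
```

+<div class="paragraph">
+<p>For a work-group of 70 work-items and a sub-group size of 32, the first two
+sub-groups are full and the sub-group with the maximum index holds the
+remaining 6 work-items.</p>
+</div>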
+<div class="paragraph">
+<p>In the degenerate case, a single sub-group must be supported for each
+work-group.
+In this situation all sub-group scope functions are equivalent to their
+work-group level equivalents.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_execution_model_execution_of_kernel_instances">3.2.2. Execution Model: Execution of kernel-instances</h4>
+<div class="paragraph">
+<p>The work carried out by an OpenCL program occurs through the execution of
+kernel-instances on compute devices.
+To understand the details of OpenCL&#8217;s execution model, we need to consider
+how a kernel object moves from the kernel-enqueue command, into a
+command-queue, executes on a device, and completes.</p>
+</div>
+<div class="paragraph">
+<p>A kernel-object is defined from a function within the program object and a
+collection of arguments connecting the kernel to a set of argument values.
+The host program enqueues a kernel-object to the command queue along with
+the NDRange, and the work-group decomposition.
+These define a <em>kernel-instance</em>.
+In addition, an optional set of events may be defined when the kernel is
+enqueued.
+The events associated with a particular kernel-instance are used to
+constrain when the kernel-instance is launched with respect to other
+commands in the queue or to commands in other queues within the same
+context.</p>
+</div>
+<div class="paragraph">
+<p>A kernel-instance is submitted to a device.
+For an in-order command queue, the kernel instances appear to launch and
+then execute in that same order; we use the term <em>appear</em> to emphasize
+that when there are no dependencies between commands, and hence differences
+in the order that commands execute cannot be observed in a program, an
+implementation can reorder commands even in an in-order command queue.
+For an out-of-order command-queue, kernel-instances wait to be launched
+until:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>Synchronization commands enqueued prior to the kernel-instance are
+satisfied.</p>
+</li>
+<li>
+<p>Each of the events in an optional event list defined when the
+kernel-instance was enqueued is set to CL_COMPLETE.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>Once these conditions are met, the kernel-instance is launched and the
+work-groups associated with the kernel-instance are placed into a pool of
+ready to execute work-groups.
+This pool is called a <em>work-pool</em>.
+The work-pool may be implemented in any manner as long as it assures that
+work-groups placed in the pool will eventually execute.
+The device schedules work-groups from the work-pool for execution on the
+compute units of the device.
+The kernel-enqueue command is complete when all work-groups associated with
+the kernel-instance end their execution, updates to global memory associated
+with a command are visible globally, and the device signals successful
+completion by setting the event associated with the kernel-enqueue command
+to CL_COMPLETE.</p>
+</div>
+<div class="paragraph">
+<p>While a command-queue is associated with only one device, a single device
+may be associated with multiple command-queues all feeding into the single
+work-pool.
+A device may also be associated with command queues associated with
+different contexts within the same platform, again all feeding into the
+single work-pool.
+The device will pull work-groups from the work-pool and execute them on one
+or several compute units in any order; possibly interleaving execution of
+work-groups from multiple commands.
+A conforming implementation may choose to serialize the work-groups so a
+correct algorithm cannot assume that work-groups will execute in parallel.
+There is no safe and portable way to synchronize across the independent
+execution of work-groups since once in the work-pool, they can execute in
+any order.</p>
+</div>
+<div class="paragraph">
+<p>The work-items within a single sub-group execute concurrently but not
+necessarily in parallel (i.e. they are not guaranteed to make independent
+forward progress).
+Therefore, only high-level synchronization constructs (e.g. sub-group
+functions such as barriers) that apply to all the work-items in a sub-group
+are well defined and included in OpenCL.</p>
+</div>
+<div class="paragraph">
+<p>Sub-groups execute concurrently within a given work-group and with
+appropriate device support (see <a href="#platform-querying-devices">Querying
+Devices</a>), may make independent forward progress with respect to each
+other, with respect to host threads and with respect to any entities
+external to the OpenCL system but running on an OpenCL device, even in the
+absence of work-group barrier operations.
+In this situation, sub-groups are able to internally synchronize using
+barrier operations without synchronizing with each other and may perform
+operations that rely on runtime dependencies on operations other sub-groups
+perform.</p>
+</div>
+<div class="paragraph">
+<p>The work-items within a single work-group execute concurrently but are only
+guaranteed to make independent progress in the presence of sub-groups and
+device support.
+In the absence of this capability, only high-level synchronization
+constructs (e.g. work-group functions such as barriers) that apply to all
+the work-items in a work-group are well defined and included in OpenCL for
+synchronization within the work-group.</p>
+</div>
+<div class="paragraph">
+<p>In the absence of synchronization functions (e.g. a barrier), work-items
+within a sub-group may be serialized.
+In the presence of sub-group functions, work-items within a sub-group may
+be serialized before any given sub-group function, between dynamically
+encountered pairs of sub-group functions and between a work-group function
+and the end of the kernel.</p>
+</div>
+<div class="paragraph">
+<p>In the absence of independent forward progress of constituent sub-groups,
+work-items within a work-group may be serialized before, after or between
+work-group synchronization functions.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="device-side-enqueue">3.2.3. Execution Model: Device-side enqueue</h4>
+<div class="paragraph">
+<p>Algorithms may need to generate additional work as they execute.
+In many cases, this additional work cannot be determined statically; so the
+work associated with a kernel only emerges at runtime as the kernel-instance
+executes.
+This capability could be implemented in logic running within the host
+program, but involvement of the host may add significant overhead and/or
+complexity to the application control flow.
+A more efficient approach would be to nest kernel-enqueue commands from
+inside other kernels.
+This <strong>nested parallelism</strong> can be realized by supporting the enqueuing of
+kernels on a device without direct involvement by the host program;
+so-called <strong>device-side enqueue</strong>.</p>
+</div>
+<div class="paragraph">
+<p>Device-side kernel-enqueue commands are similar to host-side kernel-enqueue
+commands.
+The kernel executing on a device (the <strong>parent kernel</strong>) enqueues a
+kernel-instance (the <strong>child kernel</strong>) to a device-side command queue.
+This is an out-of-order command-queue and follows the same behavior as the
+out-of-order command-queues exposed to the host program.
+Commands enqueued to a device side command-queue generate and use events to
+enforce order constraints just as for the command-queue on the host.
+These events, however, are only visible to the parent kernel running on the
+device.
+When these prerequisite events take on the value CL_COMPLETE, the
+work-groups associated with the child kernel are launched into the device&#8217;s
+work-pool.
+The device then schedules them for execution on the compute units of the
+device.
+Child and parent kernels execute asynchronously.
+However, a parent will not indicate that it is complete by setting its event
+to CL_COMPLETE until all child kernels have ended execution and have
+signaled completion by setting any associated events to the value
+CL_COMPLETE.
+Should any child kernel complete with an event status set to a negative
+value (i.e. abnormally terminate), the parent kernel will abnormally
+terminate and propagate the child&#8217;s negative event value as the value of the
+parent&#8217;s event.
+If there are multiple children that have an event status set to a negative
+value, the selection of which child&#8217;s negative event value is propagated is
+implementation-defined.</p>
+</div>
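+<div class="paragraph">
+<p>The completion rule for parent and child kernels can be sketched in Python as
+follows.
+The function <em>parent_event_status</em> is hypothetical and assumes the parent
+kernel itself ended normally.
+Because the choice among several failing children is implementation-defined,
+this sketch arbitrarily propagates the first negative value it finds.</p>
+</div>

```python
CL_COMPLETE = 0  # numeric value of CL_COMPLETE in the OpenCL headers


def parent_event_status(child_statuses):
    """Final event status of a parent kernel once all children have finished.

    `child_statuses` holds the final event status of each child kernel:
    CL_COMPLETE on success, a negative value on abnormal termination.
    """
    failed = [status for status in child_statuses if status < 0]
    if failed:
        # Implementation-defined choice; this sketch picks the first failure.
        return failed[0]
    return CL_COMPLETE


if __name__ == "__main__":
    print(parent_event_status([CL_COMPLETE, CL_COMPLETE]))
    print(parent_event_status([CL_COMPLETE, -5, -14]))
```

+<div class="paragraph">
+<p>A conforming implementation could equally report the other failing child&#8217;s
+value in the second call; portable code should only rely on the parent status
+being negative.</p>
+</div>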
+</div>
+<div class="sect3">
+<h4 id="execution-model-sync">3.2.4. Execution Model: Synchronization</h4>
+<div class="paragraph">
+<p>Synchronization refers to mechanisms that constrain the order of execution
+between two or more units of execution.
+Consider the following three domains of synchronization in OpenCL:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>Work-group synchronization: Constraints on the order of execution for
+work-items in a single work-group</p>
+</li>
+<li>
+<p>Sub-group synchronization: Constraints on the order of execution for
+work-items in a single sub-group</p>
+</li>
+<li>
+<p>Command synchronization: Constraints on the order of commands launched
+for execution</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>Synchronization across all work-items within a single work-group is carried
+out using a <em>work-group function</em>.
+These functions carry out collective operations across all the work-items in
+a work-group.
+Available collective operations are: barrier, reduction, broadcast, prefix
+sum, and evaluation of a predicate.
+A work-group function must occur within a converged control flow; i.e. all
+work-items in the work-group must encounter precisely the same work-group
+function.
+For example, if a work-group function occurs within a loop, the work-items
+must encounter the same work-group function in the same loop iterations.
+All the work-items of a work-group must execute the work-group function and
+complete reads and writes to memory before any are allowed to continue
+execution beyond the work-group function.
+Work-group functions that apply between work-groups are not provided in
+OpenCL since OpenCL does not define forward-progress or ordering relations
+between work-groups, hence collective synchronization operations are not
+well defined.</p>
+</div>
+<div class="paragraph">
+<p>Synchronization across all work-items within a single sub-group is carried
+out using a <em>sub-group function</em>.
+These functions carry out collective operations across all the work-items in
+a sub-group.
+Available collective operations are: barrier, reduction, broadcast, prefix
+sum, and evaluation of a predicate.
+A sub-group function must occur within a converged control flow; i.e. all
+work-items in the sub-group must encounter precisely the same sub-group
+function.
+For example, if a sub-group function occurs within a loop, the work-items
+must encounter the same sub-group function in the same loop iterations.
+All the work-items of a sub-group must execute the sub-group function and
+complete reads and writes to memory before any are allowed to continue
+execution beyond the sub-group function.
+Synchronization between sub-groups must either be performed using work-group
+functions, or through memory operations.
+Memory operations should be used carefully for sub-group synchronization,
+as forward progress of sub-groups relative to each other is only
+supported optionally by OpenCL implementations.</p>
+</div>
+<div class="paragraph">
+<p>Command synchronization is defined in terms of distinct <strong>synchronization
+points</strong>.
+The synchronization points occur between commands in host command-queues and
+between commands in device-side command-queues.
+The synchronization points defined in OpenCL include:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>Launching a command:</strong> A kernel-instance is launched onto a device after
+all events that kernel is waiting-on have been set to CL_COMPLETE.</p>
+</li>
+<li>
+<p><strong>Ending a command:</strong> Child kernels may be enqueued such that they wait
+for the parent kernel to reach the <em>ended</em> state before they can be
+launched.
+In this case, the ending of the parent command defines a synchronization
+point.</p>
+</li>
+<li>
+<p><strong>Completion of a command:</strong> A kernel-instance is complete after all of
+the work-groups in the kernel and all of its child kernels have
+completed.
+This is signaled to the host, a parent kernel or other kernels within
+command queues by setting the value of the event associated with a
+kernel to CL_COMPLETE.</p>
+</li>
+<li>
+<p><strong>Blocking Commands:</strong> A blocking command defines a synchronization point
+between the unit of execution that calls the blocking API function and
+the enqueued command reaching the complete state.</p>
+</li>
+<li>
+<p><strong>Command-queue barrier:</strong> The command-queue barrier ensures that all
+previously enqueued commands have completed before subsequently enqueued
+commands can be launched.</p>
+</li>
+<li>
+<p><strong>clFinish:</strong> This function blocks until all previously enqueued commands
+in the command queue have completed after which <strong>clFinish</strong> defines a
+synchronization point and the <strong>clFinish</strong> function returns.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>A synchronization point between a pair of commands (A and B) assures that
+the results of command A happen-before command B is launched.
+This requires that any updates to memory from command A complete and are
+made available to other commands before the synchronization point completes.
+Likewise, this requires that command B waits until after the synchronization
+point before loading values from global memory.
+The concept of a synchronization point works in a similar fashion for
+commands such as a barrier that apply to two sets of commands.
+All the commands prior to the barrier must complete and make their results
+available to following commands.
+Furthermore, any commands following the barrier must wait for the commands
+prior to the barrier before loading values and continuing their execution.</p>
+</div>
+<div class="paragraph">
+<p>These <em>happens-before</em> relationships are a fundamental part of the OpenCL
+memory model.
+When applied at the level of commands, they are straightforward to define at
+a language level in terms of ordering relationships between different
+commands.
+Ordering memory operations inside different commands, however, requires
+rules more complex than can be captured by the high level concept of a
+synchronization point.
+These rules are described in detail in <a href="#memory-ordering-rules">Memory
+Ordering Rules</a>.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_execution_model_categories_of_kernels">3.2.5. Execution Model: Categories of Kernels</h4>
+<div class="paragraph">
+<p>The OpenCL execution model supports three types of kernels:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>OpenCL kernels</strong> are managed by the OpenCL API as kernel-objects
+associated with kernel functions within program-objects.
+OpenCL kernels are provided via a kernel language.
+All OpenCL implementations must support OpenCL kernels supplied in the
+standard SPIR-V intermediate language with the appropriate environment
+specification, and the OpenCL C programming language defined in earlier
+versions of the OpenCL specification.
+Implementations must also support OpenCL kernels in SPIR-V intermediate
+language.
+SPIR-V binaries may be generated from an OpenCL kernel language or by a
+third party compiler from an alternative input.</p>
+</li>
+<li>
+<p><strong>Native kernels</strong> are accessed through a host function pointer.
+Native kernels are queued for execution along with OpenCL kernels on a
+device and share memory objects with OpenCL kernels.
+For example, these native kernels could be functions defined in
+application code or exported from a library.
+The ability to execute native kernels is optional within OpenCL and the
+semantics of native kernels are implementation-defined.
+The OpenCL API includes functions to query the capabilities of a device
+and determine whether native kernels are supported.</p>
+</li>
+<li>
+<p><strong>Built-in kernels</strong> are tied to a particular device and are not built at
+runtime from source code in a program object.
+The common use of built-in kernels is to expose fixed-function hardware
+or firmware associated with a particular OpenCL device or custom device.
+The semantics of a built-in kernel may be defined outside of OpenCL and
+hence are implementation-defined.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>All three types of kernels are manipulated through the OpenCL command queues
+and must conform to the synchronization points defined in the OpenCL
+execution model.</p>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_memory_model">3.3. Memory Model</h3>
+<div class="paragraph">
+<p>The OpenCL memory model describes the structure, contents, and behavior of
+the memory exposed by an OpenCL platform as an OpenCL program runs.
+The model allows a programmer to reason about values in memory as the host
+program and multiple kernel-instances execute.</p>
+</div>
+<div class="paragraph">
+<p>An OpenCL program defines a context that includes a host, one or more
+devices, command-queues, and memory exposed within the context.
+Consider the units of execution involved with such a program.
+The host program runs as one or more host threads managed by the operating
+system running on the host (the details of which are defined outside of
+OpenCL).
+There may be multiple devices in a single context which all have access to
+memory objects defined by OpenCL.
+On a single device, multiple work-groups may execute in parallel with
+potentially overlapping updates to memory.
+Finally, within a single work-group, multiple work-items concurrently
+execute, once again with potentially overlapping updates to memory.</p>
+</div>
+<div class="paragraph">
+<p>The memory model must precisely define how the values in memory as seen from
+each of these units of execution interact so a programmer can reason about
+the correctness of OpenCL programs.
+We define the memory model in four parts.</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>Memory regions: The distinct memories visible to the host and the
+devices that share a context.</p>
+</li>
+<li>
+<p>Memory objects: The objects defined by the OpenCL API and their
+management by the host and devices.</p>
+</li>
+<li>
+<p>Shared Virtual Memory: A virtual address space exposed to both the host
+and the devices within a context.</p>
+</li>
+<li>
+<p>Consistency Model: Rules that define which values are observed when
+multiple units of execution load data from memory plus the atomic/fence
+operations that constrain the order of memory operations and define
+synchronization relationships.</p>
+</li>
+</ul>
+</div>
+<div class="sect3">
+<h4 id="_memory_model_fundamental_memory_regions">3.3.1. Memory Model: Fundamental Memory Regions</h4>
+<div class="paragraph">
+<p>Memory in OpenCL is divided into two parts.</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>Host Memory:</strong> The memory directly available to the host.
+The detailed behavior of host memory is defined outside of OpenCL.
+Memory objects move between the Host and the devices through functions
+within the OpenCL API or through a shared virtual memory interface.</p>
+</li>
+<li>
+<p><strong>Device Memory:</strong> Memory directly available to kernels executing on
+OpenCL devices.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>Device memory consists of four named address spaces or <em>memory regions</em>:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>Global Memory:</strong> This memory region permits read/write access to all
+work-items in all work-groups running on any device within a context.
+Work-items can read from or write to any element of a memory object.
+Reads and writes to global memory may be cached depending on the
+capabilities of the device.</p>
+</li>
+<li>
+<p><strong>Constant Memory</strong>: A region of global memory that remains constant
+during the execution of a kernel-instance.
+The host allocates and initializes memory objects placed into constant
+memory.</p>
+</li>
+<li>
+<p><strong>Local Memory</strong>: A memory region local to a work-group.
+This memory region can be used to allocate variables that are shared by
+all work-items in that work-group.</p>
+</li>
+<li>
+<p><strong>Private Memory</strong>: A region of memory private to a work-item.
+Variables defined in one work-item&#8217;s private memory are not visible to
+another work-item.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>The <a href="#memory-regions-image">memory regions</a> and their relationship to the
+OpenCL Platform model are summarized below.
+Local and private memories are always associated with a particular device.
+The global and constant memories, however, are shared between all devices
+within a given context.
+An OpenCL device may include a cache to support efficient access to these
+shared memories.</p>
+</div>
+<div class="paragraph">
+<p>To understand memory in OpenCL, it is important to appreciate the
+relationships between these named address spaces.
+The four named address spaces available to a device are disjoint, meaning
+they do not overlap.
+This is a logical relationship, however, and an implementation may choose to
+let these disjoint named address spaces share physical memory.</p>
+</div>
+<div class="paragraph">
+<p>Programmers often need functions callable from kernels where the pointers
+manipulated by those functions can point to multiple named address spaces.
+This saves a programmer from the error-prone and wasteful practice of
+creating multiple copies of functions, one for each named address space.
+Therefore the global, local and private address spaces belong to a single
+<em>generic address space</em>.
+This is closely modeled after the concept of a generic address space used in
+the embedded C standard (ISO/IEC 9899:1999).
+Since they all belong to a single generic address space, the following
+properties are supported for pointers to named address spaces in device
+memory:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>A pointer to the generic address space can be cast to a pointer to a
+global, local or private address space.</p>
+</li>
+<li>
+<p>A pointer to a global, local or private address space can be cast to a
+pointer to the generic address space.</p>
+</li>
+<li>
+<p>A pointer to a global, local or private address space can be implicitly
+converted to a pointer to the generic address space, but the converse is
+not allowed.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>The constant address space is disjoint from the generic address space.</p>
+</div>
+<div class="paragraph">
+<p>The addresses of memory associated with memory objects in Global memory are
+not preserved between kernel instances, between a device and the host, or
+between devices.
+In this regard global memory acts as a global pool of memory objects rather
+than an address space.
+This restriction is relaxed when shared virtual memory (SVM) is used.</p>
+</div>
+<div class="paragraph">
+<p>SVM causes addresses to be meaningful between the host and all of the
+devices within a context, hence supporting the use of pointer-based data
+structures in OpenCL kernels.
+It logically extends a portion of the global memory into the host address
+space giving work-items access to the host address space.
+On platforms with hardware support for a shared address space between the
+host and one or more devices, SVM may also provide a more efficient way to
+share data between devices and the host.
+Details about SVM are presented in <a href="#shared-virtual-memory">Shared Virtual
+Memory</a>.</p>
+</div>
+<div id="memory-regions-image" class="imageblock" style="text-align: center">
+<div class="content">
+<img src="" alt="memory regions">
+</div>
+<div class="title">Figure 4. The named address spaces exposed in an OpenCL Platform. Global and Constant memories are shared between the one or more devices within a context, while local and private memories are associated with a single device. Each device may include an optional cache to support efficient access to its view of the global and constant address spaces.</div>
+</div>
+<div class="paragraph">
+<p>A programmer may use the features of the <a href="#memory-consistency-model">memory
+consistency model</a> to manage safe access to global memory from multiple
+work-items potentially running on one or more devices.
+In addition, when using shared virtual memory (SVM), the memory consistency
+model may also be used to ensure that host threads safely access memory
+locations in the shared memory region.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_memory_model_memory_objects">3.3.2. Memory Model: Memory Objects</h4>
+<div class="paragraph">
+<p>The contents of global memory are <em>memory objects</em>.
+A memory object is a handle to a reference counted region of global memory.
+Memory objects use the OpenCL type <em>cl_mem</em> and fall into three distinct
+classes.</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>Buffer</strong>: A memory object stored as a block of contiguous memory and
+used as a general purpose object to hold data used in an OpenCL program.
+The types of the values within a buffer may be any of the built in types
+(such as int, float), vector types, or user-defined structures.
+The buffer can be manipulated through pointers much as one would with
+any block of memory in C.</p>
+</li>
+<li>
+<p><strong>Image</strong>: An image memory object holds one, two or three dimensional
+images.
+The formats are based on the standard image formats used in graphics
+applications.
+An image is an opaque data structure managed by functions defined in the
+OpenCL API.
+To optimize the manipulation of images stored in the texture memories
+found in many GPUs, OpenCL kernels have traditionally been disallowed
+from both reading and writing a single image.
+In OpenCL 2.0, however, we have relaxed this restriction by providing
+synchronization and fence operations that let programmers properly
+synchronize their code to safely allow a kernel to read and write a
+single image.</p>
+</li>
+<li>
+<p><strong>Pipe</strong>: The <em>pipe</em> memory object conceptually is an ordered sequence of
+data items.
+A pipe has two endpoints: a write endpoint into which data items are
+inserted, and a read endpoint from which data items are removed.
+At any one time, only one kernel instance may write into a pipe, and
+only one kernel instance may read from a pipe.
+To support the producer consumer design pattern, one kernel instance
+connects to the write endpoint (the producer) while another kernel
+instance connects to the reading endpoint (the consumer).</p>
+</li>
+</ul>
+</div>
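+<div class="paragraph">
+<p>The endpoint behavior described for pipes can be pictured with a small
+host-side model. The sketch below is plain C, not the OpenCL pipe API: the
+type <code>pipe_model</code>, the functions <code>pipe_model_write</code> and
+<code>pipe_model_read</code>, and the capacity of four packets are all
+hypothetical names chosen for illustration. It only shows the ordering
+property, namely that items leave the read endpoint in the order they
+entered the write endpoint.</p>
+</div>

```c
#include <stdbool.h>
#include <stddef.h>

#define PIPE_CAPACITY 4   /* hypothetical capacity, in packets */

/* Conceptual model of a pipe: an ordered sequence with two endpoints. */
typedef struct {
    int    packets[PIPE_CAPACITY];
    size_t head;    /* read endpoint: index of the oldest packet */
    size_t count;   /* number of packets currently stored */
} pipe_model;

/* Write endpoint: append one packet; fails when the pipe is full. */
bool pipe_model_write(pipe_model *p, int packet)
{
    if (p->count == PIPE_CAPACITY)
        return false;
    p->packets[(p->head + p->count) % PIPE_CAPACITY] = packet;
    p->count++;
    return true;
}

/* Read endpoint: remove the oldest packet; fails when the pipe is empty. */
bool pipe_model_read(pipe_model *p, int *packet)
{
    if (p->count == 0)
        return false;
    *packet = p->packets[p->head];
    p->head = (p->head + 1) % PIPE_CAPACITY;
    p->count--;
    return true;
}
```

+<div class="paragraph">
+<p>In this model a producer would call only <code>pipe_model_write</code> and a
+consumer only <code>pipe_model_read</code>, mirroring the single-writer,
+single-reader rule stated above. The model is single-threaded and omits the
+synchronization a real implementation needs.</p>
+</div>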
+<div class="paragraph">
+<p>Memory objects are allocated by host APIs.
+The host program can provide the runtime with a pointer to a block of
+contiguous memory to hold the memory object when the object is created
+(CL_MEM_USE_HOST_PTR).
+Alternatively, the physical memory can be managed by the OpenCL runtime and
+not be directly accessible to the host program.</p>
+</div>
+<div class="paragraph">
+<p>Allocation and access to memory objects within the different memory regions
+varies between the host and work-items running on a device.
+This is summarized in the <a href="#memory-regions-table">Memory Regions</a> table,
+which describes whether the kernel or the host can allocate from a memory
+region, the type of allocation (static at compile time vs.
+dynamic at runtime) and the type of access allowed (i.e. whether the kernel
+or the host can read and/or write to a memory region).</p>
+</div>
+<table id="memory-regions-table" class="tableblock frame-all grid-all" style="width: 80%;">
+<caption class="title">Table 1. Memory Regions</caption>
+<colgroup>
+<col style="width: 20%;">
+<col style="width: 20%;">
+<col style="width: 20%;">
+<col style="width: 20%;">
+<col style="width: 20%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top"></th>
+<th class="tableblock halign-left valign-top">Global</th>
+<th class="tableblock halign-left valign-top">Constant</th>
+<th class="tableblock halign-left valign-top">Local</th>
+<th class="tableblock halign-left valign-top">Private</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top" rowspan="2"><p class="tableblock">Host</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Dynamic Allocation</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Dynamic Allocation</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Dynamic Allocation</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">No Allocation</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Read/Write access to buffers and images but not pipes</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Read/Write access</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">No access</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">No access</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top" rowspan="2"><p class="tableblock">Kernel</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Static Allocation for program scope variables</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Static Allocation</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Static Allocation.</p>
+<p class="tableblock">            Dynamic allocation for child kernel</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Static Allocation</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Read/Write access</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Read-only access</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Read/Write access.</p>
+<p class="tableblock">        No access to child&#8217;s local memory.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Read/Write access</p></td>
+</tr>
+</tbody>
+</table>
+<div class="sidebarblock">
+<div class="content">
+<div class="title">Caption</div>
+<div class="paragraph">
+<p>The <a href="#memory-regions-table">Memory Regions</a> table shows the different
+memory regions in OpenCL and how memory objects are allocated and accessed
+by the host and by an executing instance of a kernel.
+For the case of kernels, we distinguish between the behavior of local memory
+with respect to a kernel (self) and its child kernels.</p>
+</div>
+</div>
+</div>
+<div class="paragraph">
+<p>Once allocated, a memory object is made available to kernel-instances
+running on one or more devices.
+In addition to <a href="#shared-virtual-memory">Shared Virtual Memory</a>, there are
+three basic ways to manage the contents of buffers between the host and
+devices.</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>Read/Write/Fill commands</strong>: The data associated with a memory object is
+explicitly read and written between the host and global memory regions
+using commands enqueued to an OpenCL command queue.</p>
+</li>
+<li>
+<p><strong>Map/Unmap commands</strong>: Data from the memory object is mapped into a
+contiguous block of memory accessed through a host accessible pointer.
+The host program enqueues a <em>map</em> command on a block of a memory object
+before it can be safely manipulated by the host program.
+When the host program is finished working with the block of memory, the
+host program enqueues an <em>unmap</em> command to allow a kernel-instance to
+safely read and/or write the buffer.</p>
+</li>
+<li>
+<p><strong>Copy commands:</strong> The data associated with a memory object is copied
+between two buffers, each of which may reside either on the host or on
+the device.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>In all of these cases, the commands to transfer data between devices and the host
+can be blocking or non-blocking operations.
+The OpenCL function call for a blocking memory transfer returns once the
+associated memory resources on the host can be safely reused.
+For a non-blocking memory transfer, the OpenCL function call returns as soon
+as the command is enqueued.</p>
+</div>
+<div class="paragraph">
+<p>Memory objects are bound to a context and hence can appear in multiple
+kernel-instances running on more than one physical device.
+The OpenCL platform must support a large range of hardware platforms
+including systems that do not support a single shared address space in
+hardware; hence the ways memory objects can be shared between
+kernel-instances are restricted.
+The basic principle is that multiple read operations on memory objects from
+multiple kernel-instances that overlap in time are allowed, but mixing
+overlapping reads and writes into the same memory objects from different
+kernel instances is only allowed when fine-grained synchronization is used
+with <a href="#shared-virtual-memory">Shared Virtual Memory</a>.</p>
+</div>
+<div class="paragraph">
+<p>When global memory is manipulated by multiple kernel-instances running on
+multiple devices, the OpenCL runtime system must manage the association of
+memory objects with a given device.
+In most cases the OpenCL runtime will implicitly associate a memory object
+with a device.
+A kernel instance is naturally associated with the command queue to which
+the kernel was submitted.
+Since a command-queue can only access a single device, the queue uniquely
+defines which device is involved with any given kernel-instance; hence
+defining a clear association between memory objects, kernel-instances and
+devices.
+Programmers may anticipate these associations in their programs and
+explicitly manage association of memory objects with devices in order to
+improve performance.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="shared-virtual-memory">3.3.3. Memory Model: Shared Virtual Memory</h4>
+<div class="paragraph">
+<p>OpenCL extends the global memory region into the host memory region through
+a shared virtual memory (SVM) mechanism.
+There are three types of SVM in OpenCL:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>Coarse-Grained buffer SVM</strong>: Sharing occurs at the granularity of
+regions of OpenCL buffer memory objects.
+Consistency is enforced at synchronization points and with map/unmap
+commands to drive updates between the host and the device.
+This form of SVM is similar to non-SVM use of memory; however, it lets
+kernel-instances share pointer-based data structures (such as
+linked-lists) with the host program.
+Program scope global variables are treated as per-device coarse-grained
+SVM for addressing and sharing purposes.</p>
+</li>
+<li>
+<p><strong>Fine-Grained buffer SVM</strong>: Sharing occurs at the granularity of
+individual loads/stores into bytes within OpenCL buffer memory objects.
+Loads and stores may be cached.
+This means consistency is guaranteed at synchronization points.
+If the optional OpenCL atomics are supported, they can be used to
+provide fine-grained control of memory consistency.</p>
+</li>
+<li>
+<p><strong>Fine-Grained system SVM</strong>: Sharing occurs at the granularity of
+individual loads/stores into bytes occurring anywhere within the host
+memory.
+Loads and stores may be cached so consistency is guaranteed at
+synchronization points.
+If the optional OpenCL atomics are supported, they can be used to
+provide fine-grained control of memory consistency.</p>
+</li>
+</ul>
+</div>
+<table id="svm-summary-table" class="tableblock frame-all grid-all spread">
+<caption class="title">Table 2. A summary of shared virtual memory (SVM) options in OpenCL</caption>
+<colgroup>
+<col style="width: 20%;">
+<col style="width: 20%;">
+<col style="width: 20%;">
+<col style="width: 20%;">
+<col style="width: 20%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-center valign-top"></th>
+<th class="tableblock halign-center valign-top">Granularity of sharing</th>
+<th class="tableblock halign-center valign-top">Memory Allocation</th>
+<th class="tableblock halign-center valign-top">Mechanisms to enforce Consistency</th>
+<th class="tableblock halign-center valign-top">Explicit updates between host and device</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Non-SVM buffers</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">OpenCL Memory objects (buffer)</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock"><strong>clCreateBuffer</strong></p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Host synchronization points on the same or between devices.</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">yes, through Map and Unmap commands.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Coarse-Grained buffer SVM</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">OpenCL Memory objects (buffer)</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock"><strong>clSVMAlloc</strong></p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Host synchronization points between devices</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">yes, through Map and Unmap commands.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Fine-Grained buffer SVM</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Bytes within OpenCL Memory objects (buffer)</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock"><strong>clSVMAlloc</strong></p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Synchronization points plus atomics (if supported)</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">No</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Fine-Grained system SVM</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Bytes within Host memory (system)</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Host memory allocation mechanisms (e.g. malloc)</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">Synchronization points plus atomics (if supported)</p></td>
+<td class="tableblock halign-center valign-top"><p class="tableblock">No</p></td>
+</tr>
+</tbody>
+</table>
+<div class="paragraph">
+<p>Coarse-Grained buffer SVM is required in the core OpenCL specification.
+The two finer grained approaches are optional features in OpenCL.
+The various SVM mechanisms to access host memory from the work-items
+associated with a kernel instance are <a href="#svm-summary-table">summarized
+above</a>.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="memory-consistency-model">3.3.4. Memory Model: Memory Consistency Model</h4>
+<div class="paragraph">
+<p>The OpenCL memory model tells programmers what they can expect from an
+OpenCL implementation; which memory operations are guaranteed to happen in
+which order and which memory values each read operation will return.
+The memory model tells compiler writers which restrictions they must follow
+when implementing compiler optimizations; which variables they can cache in
+registers and when they can move reads or writes around a barrier or atomic
+operation.
+The memory model also tells hardware designers about limitations on hardware
+optimizations; for example, when they must flush or invalidate hardware
+caches.</p>
+</div>
+<div class="paragraph">
+<p>The memory consistency model in OpenCL is based on the memory model from the
+ISO C11 programming language.
+To help make the presentation more precise and self-contained, we include
+modified paragraphs taken verbatim from the ISO C11 international standard.
+When a paragraph is taken or modified from the C11 standard, it is
+identified as such along with its original location in the <a href="#iso-c11">C11
+standard</a>.</p>
+</div>
+<div class="paragraph">
+<p>For programmers, the most intuitive model is the <em>sequential consistency</em>
+memory model.
+Sequential consistency interleaves the steps executed by each of the units
+of execution.
+Each access to a memory location sees the last assignment to that location
+in that interleaving.
+While sequential consistency is relatively straightforward for a programmer
+to reason about, implementing sequential consistency is expensive.
+Therefore, OpenCL implements a relaxed memory consistency model; i.e. it is
+possible to write programs where the loads from memory violate sequential
+consistency.
+Fortunately, if a program does not contain any races and if the program only
+uses atomic operations that utilize the sequentially consistent memory order
+(the default memory ordering for OpenCL), OpenCL programs appear to execute
+with sequential consistency.</p>
+</div>
+<div class="paragraph">
+<p>Programmers can to some degree control how the memory model is relaxed by
+choosing the memory order for synchronization operations.
+The precise semantics of synchronization and the memory orders are formally
+defined in <a href="#memory-ordering-rules">Memory Ordering Rules</a>.
+Here, we give a high level description of how these memory orders apply to
+atomic operations on atomic objects shared between units of execution.
+OpenCL memory_order choices are based on those from the ISO C11 standard
+memory model.
+They are specified in certain OpenCL functions through the following
+enumeration constants:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>memory_order_relaxed</strong>: implies no order constraints.
+This memory order can be used safely to increment counters that are
+concurrently incremented, but it doesn&#8217;t guarantee anything about the
+ordering with respect to operations to other memory locations.
+It can also be used, for example, to do ticket allocation and by expert
+programmers implementing lock-free algorithms.</p>
+</li>
+<li>
+<p><strong>memory_order_acquire</strong>: A synchronization operation (fence or atomic)
+that has acquire semantics "acquires" side-effects from a release
+operation that synchronises with it: if an acquire synchronises with a
+release, the acquiring unit of execution will see all side-effects
+preceding that release (and possibly subsequent side-effects). As part
+of carefully-designed protocols, programmers can use an "acquire" to
+safely observe the work of another unit of execution.</p>
+</li>
+<li>
+<p><strong>memory_order_release</strong>: A synchronization operation (fence or atomic
+operation) that has release semantics "releases" side effects to an
+acquire operation that synchronises with it.
+All side effects that precede the release are included in the release.
+As part of carefully-designed protocols, programmers can use a "release"
+to make changes made in one unit of execution visible to other units of
+execution.</p>
+</li>
+</ul>
+</div>
+<div class="admonitionblock note">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-note" title="Note"></i>
+</td>
+<td class="content">
+In general, no acquire must <em>always</em> synchronise with any particular
+release.
+However, synchronisation can be forced by certain executions.
+See <a href="#memory-ordering-fence">Memory Order Rules: Fence Operations</a> for
+detailed rules for when synchronisation must occur.
+</td>
+</tr>
+</table>
+</div>
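+<div class="paragraph">
+<p>Because the OpenCL memory orders are based on those of ISO C11, the
+relaxed, acquire and release orders described above can be illustrated with
+host-side C11 atomics from <code>&lt;stdatomic.h&gt;</code>. The following is a
+minimal sketch, not OpenCL kernel code: the variables <code>payload</code>,
+<code>ready</code> and <code>hits</code> and the functions <code>publish</code>,
+<code>try_consume</code> and <code>poll_count</code> are hypothetical names
+introduced here. It assumes <code>publish</code> and <code>try_consume</code> would
+run in different units of execution.</p>
+</div>

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state: ordinary data guarded by an atomic flag. */
static int         payload;   /* non-atomic data handed from producer to consumer */
static atomic_int  ready;     /* 0 = not published, 1 = published */
static atomic_uint hits;      /* statistics counter, ordering not important */

/* Producer: write the payload, then publish it with a release store. */
void publish(int value)
{
    payload = value;
    /* Release: writes sequenced before this store are visible to a unit of
       execution whose acquire load observes the value 1. */
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Consumer: returns true and fills *out once the payload has been published. */
bool try_consume(int *out)
{
    /* Relaxed: the counter needs atomicity only, no ordering guarantees. */
    atomic_fetch_add_explicit(&hits, 1, memory_order_relaxed);

    /* Acquire: if this load reads 1, the payload write is visible here. */
    if (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        return false;
    *out = payload;
    return true;
}

/* Number of times try_consume has been called so far. */
unsigned poll_count(void)
{
    return atomic_load_explicit(&hits, memory_order_relaxed);
}
```

+<div class="paragraph">
+<p>The non-atomic <code>payload</code> is read only after the acquire load has
+observed the released flag, which is the kind of carefully designed protocol
+the text refers to. Replacing the acquire or release order with
+<code>memory_order_relaxed</code> would remove that guarantee.</p>
+</div>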
+<div class="ulist">
+<ul>
+<li>
+<p><strong>memory_order_acq_rel</strong>: A synchronization operation with acquire-release
+semantics has the properties of both the acquire and release memory
+orders.
+It is typically used to order read-modify-write operations.</p>
+</li>
+<li>
+<p><strong>memory_order_seq_cst</strong>: The loads and stores of each unit of execution
+appear to execute in program (i.e., sequenced-before) order, and the
+loads and stores from different units of execution appear to be simply
+interleaved.</p>
+</li>
+</ul>
+</div>
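+<div class="paragraph">
+<p>The last two memory orders can be sketched the same way with host-side
+C11 atomics. This is an illustration under stated assumptions rather than
+OpenCL kernel code: <code>mailbox</code>, <code>turn</code>,
+<code>swap_mailbox</code> and <code>next_turn</code> are hypothetical names. The
+first function is a read-modify-write that uses acquire-release semantics;
+the second relies on the C11 rule that atomic operations without an explicit
+order argument use <code>memory_order_seq_cst</code>.</p>
+</div>

```c
#include <stdatomic.h>

static atomic_int mailbox;   /* hypothetical slot; 0 means empty */
static atomic_int turn;      /* hypothetical ticket counter */

/* Read-modify-write with acquire-release semantics: store a new value and
   return the previous one. The acquire half observes writes released by the
   previous writer; the release half publishes this caller's earlier writes. */
int swap_mailbox(int value)
{
    return atomic_exchange_explicit(&mailbox, value, memory_order_acq_rel);
}

/* No explicit order: in C11 this operation is sequentially consistent. */
int next_turn(void)
{
    return atomic_fetch_add(&turn, 1);
}
```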
+<div class="paragraph">
+<p>Regardless of which memory_order is specified, resolving constraints on
+memory operations across a heterogeneous platform adds considerable overhead
+to the execution of a program.
+An OpenCL platform may be able to optimize certain operations that depend on
+the features of the memory consistency model by restricting the scope of the
+memory operations.
+Distinct memory scopes are defined by the values of the memory_scope
+enumeration constant:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>memory_scope_work_item</strong>: memory-ordering constraints only apply within
+the work-item<sup>1</sup>.</p>
+<div class="openblock">
+<div class="content">
+<div class="dlist">
+<dl>
+<dt class="hdlist1">1</dt>
+<dd>
+<p>This value for memory_scope can only be used with atomic_work_item_fence
+with flags set to CLK_IMAGE_MEM_FENCE.</p>
+</dd>
+</dl>
+</div>
+</div>
+</div>
+</li>
+<li>
+<p><strong>memory_scope_sub_group</strong>: memory-ordering constraints only apply within
+the sub-group.</p>
+</li>
+<li>
+<p><strong>memory_scope_work_group</strong>: memory-ordering constraints only apply to
+work-items executing within a single work-group.</p>
+</li>
+<li>
+<p><strong>memory_scope_device:</strong> memory-ordering constraints only apply to
+work-items executing on a single device.</p>
+</li>
+<li>
+<p><strong>memory_scope_all_svm_devices</strong>: memory-ordering constraints apply to
+work-items executing across multiple devices and (when using SVM) the
+host.
+A release performed with <strong>memory_scope_all_svm_devices</strong> to a buffer that
+does not have the CL_MEM_SVM_ATOMICS flag set will commit to at least
+<strong>memory_scope_device</strong> visibility, with full synchronization of the
+buffer at a queue synchronization point (e.g. an OpenCL event).</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>These memory scopes define a hierarchy of visibilities when analyzing the
+ordering constraints of memory operations.
+For example, if a programmer knows that a sequence of memory operations will
+only be associated with a collection of work-items from a single work-group
+(and hence will run on a single device), the implementation is spared the
+overhead of managing the memory orders across other devices within the same
+context.
+This can substantially reduce overhead in a program.
+All memory scopes are valid when used on global memory or local memory.
+For local memory, all visibility is constrained to within a given work-group
+and scopes wider than <strong>memory_scope_work_group</strong> carry no additional meaning.</p>
+</div>
+<div class="paragraph">
+<p>In the following subsections (leading up to <a href="#opencl-framework">OpenCL
+Framework</a>), we will explain the synchronization constructs and detailed
+rules needed to use OpenCL&#8217;s relaxed memory models.
+It is important to appreciate, however, that many programs do not benefit
+from relaxed memory models.
+Even expert programmers have a difficult time using atomics and fences to
+write correct programs with relaxed memory models.
+A large number of OpenCL programs can be written using a simplified memory
+model.
+This is accomplished by following these guidelines.</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>Write programs that manage safe sharing of global memory objects through
+the synchronization points defined by the command queues.</p>
+</li>
+<li>
+<p>Restrict low level synchronization inside work-groups to the work-group
+functions such as barrier.</p>
+</li>
+<li>
+<p>If you want sequential consistency behavior with system allocations or
+fine-grain SVM buffers with atomics support, use only
+<strong>memory_order_seq_cst</strong> operations with the scope
+<strong>memory_scope_all_svm_devices</strong>.</p>
+</li>
+<li>
+<p>If you want sequential consistency behavior when not using system
+allocations or fine-grain SVM buffers with atomics support, use only
+<strong>memory_order_seq_cst</strong> operations with the scope <strong>memory_scope_device</strong>
+or <strong>memory_scope_all_svm_devices</strong>.</p>
+</li>
+<li>
+<p>Ensure your program has no races.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>If these guidelines are followed in your OpenCL programs, you can skip the
+detailed rules behind the relaxed memory models and go directly to
+<a href="#opencl-framework">OpenCL Framework</a>.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_memory_model_overview_of_atomic_and_fence_operations">3.3.5. Memory Model: Overview of atomic and fence operations</h4>
+<div class="paragraph">
+<p>The OpenCL 2.0 specification defines a number of <em>synchronization
+operations</em> that are used to define memory order constraints in a program.
+They play a special role in controlling how memory operations in one unit of
+execution (such as work-items or, when using SVM, a host thread) are made
+visible to another.
+There are two types of synchronization operations in OpenCL: <em>atomic
+operations</em> and <em>fences</em>.</p>
+</div>
+<div class="paragraph">
+<p>Atomic operations are indivisible.
+They either occur completely or not at all.
+These operations are used to order memory operations between units of
+execution and hence they are parameterized with the memory_order and
+memory_scope parameters defined by the OpenCL memory consistency model.
+The atomic operations for OpenCL kernel languages are similar to the
+corresponding operations defined by the C11 standard.</p>
+</div>
+<div class="paragraph">
+<p>The OpenCL 2.0 atomic operations apply to variables of an atomic type (a
+subset of those in the C11 standard) including atomic versions of the int,
+uint, long, ulong, float, double, half, intptr_t, uintptr_t, size_t, and
+ptrdiff_t types.
+However, support for some of these atomic types depends on support for the
+corresponding regular types.</p>
+</div>
+<div class="paragraph">
+<p>An atomic operation on one or more memory locations is either an acquire
+operation, a release operation, or both an acquire and release operation.
+An atomic operation without an associated memory location is a fence and can
+be either an acquire fence, a release fence, or both an acquire and release
+fence.
+In addition, there are relaxed atomic operations, which do not have
+synchronization properties, and atomic read-modify-write operations, which
+have special characteristics.
+<a href="#iso-c11">[C11 standard, Section 5.1.2.4, paragraph 5, modified.]</a></p>
+</div>
+<div class="paragraph">
+<p>The orders <strong>memory_order_acquire</strong> (used for reads), <strong>memory_order_release</strong>
+(used for writes), and <strong>memory_order_acq_rel</strong> (used for read-modify-write
+operations) are used for simple communication between units of execution
+using shared variables.
+Informally, executing a <strong>memory_order_release</strong> on an atomic object A makes
+all previous side effects visible to any unit of execution that later
+executes a <strong>memory_order_acquire</strong> on A.
+The orders <strong>memory_order_acquire</strong>, <strong>memory_order_release</strong>, and
+<strong>memory_order_acq_rel</strong> do not provide sequential consistency for race-free
+programs because they will not ensure that atomic stores followed by atomic
+loads become visible to other threads in that order.</p>
+</div>
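+<div class="paragraph">
+<p>The release/acquire handoff described above can be sketched with the
+corresponding C11 host operations.
+This is an illustrative analogue, not kernel code: the names payload, ready,
+producer and run_message_passing are hypothetical, POSIX threads stand in for
+units of execution, and the OpenCL kernel versions of these functions can
+additionally take a memory_scope argument.</p>
+</div>

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical shared state: a plain payload guarded by an atomic flag. */
static int payload;
static atomic_int ready;

static void *producer(void *arg)
{
    (void)arg;
    payload = 42; /* ordinary, non-atomic write */
    /* Release: publishes every side effect sequenced before this store. */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

/* Runs one handoff and returns the payload value the consumer observed
 * (or -1 if the producer thread could not be started). */
int run_message_passing(void)
{
    pthread_t t;
    int seen;

    payload = 0;
    atomic_store_explicit(&ready, 0, memory_order_relaxed);
    if (pthread_create(&t, NULL, producer, NULL) != 0)
        return -1;

    /* Acquire: once the flag reads 1, the write to payload is visible. */
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;
    seen = payload;

    pthread_join(t, NULL);
    return seen;
}
```

+<div class="paragraph">
+<p>Calling run_message_passing() is expected to return 42, because the acquire
+load that reads 1 synchronizes with the release store that wrote it.</p>
+</div>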
+<div id="atomic-fence-orders" class="paragraph">
+<p>The fence operation is atomic_work_item_fence, which includes a memory_order
+argument as well as the memory_scope and cl_mem_fence_flags arguments.
+Depending on the memory_order argument, this operation:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>has no effects, if <strong>memory_order_relaxed</strong>;</p>
+</li>
+<li>
+<p>is an acquire fence, if <strong>memory_order_acquire</strong>;</p>
+</li>
+<li>
+<p>is a release fence, if <strong>memory_order_release</strong>;</p>
+</li>
+<li>
+<p>is both an acquire fence and a release fence, if <strong>memory_order_acq_rel</strong>;</p>
+</li>
+<li>
+<p>is a sequentially-consistent fence with both acquire and release
+semantics, if <strong>memory_order_seq_cst</strong>.</p>
+</li>
+</ul>
+</div>
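+<div class="paragraph">
+<p>The list above can be read as a small lookup table.
+The sketch below encodes it using the C11 memory_order enumeration from
+stdatomic.h; the FENCE_* flags and the function fence_semantics are
+hypothetical names introduced only for illustration.</p>
+</div>

```c
#include <stdatomic.h>

/* Hypothetical bit flags describing what a fence does. */
enum {
    FENCE_NONE    = 0,
    FENCE_ACQUIRE = 1,
    FENCE_RELEASE = 2,
    FENCE_SEQ_CST = 4
};

/* Maps a memory_order argument to the fence semantics in the list above.
 * Returns -1 for orders the list does not mention. */
int fence_semantics(memory_order order)
{
    switch (order) {
    case memory_order_relaxed:
        return FENCE_NONE;
    case memory_order_acquire:
        return FENCE_ACQUIRE;
    case memory_order_release:
        return FENCE_RELEASE;
    case memory_order_acq_rel:
        return FENCE_ACQUIRE | FENCE_RELEASE;
    case memory_order_seq_cst:
        return FENCE_ACQUIRE | FENCE_RELEASE | FENCE_SEQ_CST;
    default:
        /* For example C11 memory_order_consume, which is not listed. */
        return -1;
    }
}
```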
+<div class="paragraph">
+<p>If specified, the cl_mem_fence_flags argument must be CLK_IMAGE_MEM_FENCE,
+CLK_GLOBAL_MEM_FENCE, CLK_LOCAL_MEM_FENCE, or CLK_GLOBAL_MEM_FENCE |
+CLK_LOCAL_MEM_FENCE.</p>
+</div>
+<div class="paragraph">
+<p>The atomic_work_item_fence(CLK_IMAGE_MEM_FENCE) built-in function must be
+used to make sure that sampler-less writes are visible to later reads by the
+same work-item.
+Without use of the atomic_work_item_fence function, write-read coherence on
+image objects is not guaranteed: if a work-item reads from an image to which
+it has previously written without an intervening atomic_work_item_fence, it
+is not guaranteed that those previous writes are visible to the work-item.</p>
+</div>
+<div class="paragraph">
+<p>The synchronization operations in OpenCL can be parameterized by a
+memory_scope.
+Memory scopes control the extent to which an atomic operation or fence is
+visible with respect to the memory model.
+These memory scopes may be used when performing atomic operations and fences
+on global memory and local memory.
+When used on global memory, visibility is bounded by the capabilities of that
+memory.
+When used on a fine-grained non-atomic SVM buffer, a coarse-grained SVM
+buffer, or a non-SVM buffer, operations parameterized with
+<strong>memory_scope_all_svm_devices</strong> will behave as if they were parameterized
+with <strong>memory_scope_device</strong>.
+When used on local memory, visibility is bounded by the work-group and, as a
+result, memory_scope with wider visibility than <strong>memory_scope_work_group</strong>
+will be reduced to <strong>memory_scope_work_group</strong>.</p>
+</div>
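+<div class="paragraph">
+<p>The scope reductions described above can be summarized as a small function.
+This is a simplified host-side model: scope_t, mem_kind_t and effective_scope
+are hypothetical names, and the three memory kinds are an assumed
+classification of the cases named in the paragraph.</p>
+</div>

```c
/* Hypothetical mirror of the memory_scope values, narrowest first. */
typedef enum {
    SCOPE_WORK_ITEM,
    SCOPE_SUB_GROUP,
    SCOPE_WORK_GROUP,
    SCOPE_DEVICE,
    SCOPE_ALL_SVM_DEVICES
} scope_t;

/* Hypothetical classification of the memory an operation targets. */
typedef enum {
    MEM_LOCAL,              /* local memory */
    MEM_GLOBAL_SVM_ATOMICS, /* global memory with SVM atomics support */
    MEM_GLOBAL_OTHER        /* fine-grained non-atomic SVM, coarse-grained
                               SVM, or a non-SVM buffer */
} mem_kind_t;

/* Returns the scope an operation effectively behaves as. */
scope_t effective_scope(scope_t requested, mem_kind_t mem)
{
    /* Local memory: visibility is bounded by the work-group. */
    if (mem == MEM_LOCAL && requested > SCOPE_WORK_GROUP)
        return SCOPE_WORK_GROUP;

    /* Global memory without SVM atomics: all_svm_devices acts as device. */
    if (mem == MEM_GLOBAL_OTHER && requested == SCOPE_ALL_SVM_DEVICES)
        return SCOPE_DEVICE;

    return requested;
}
```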
+<div class="paragraph">
+<p>Two actions <strong>A</strong> and <strong>B</strong> are defined to have an inclusive scope if they have
+the same scope <strong>P</strong> such that:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>P</strong> is <strong>memory_scope_sub_group</strong> and <strong>A</strong> and <strong>B</strong> are executed by
+work-items within the same sub-group.</p>
+</li>
+<li>
+<p><strong>P</strong> is <strong>memory_scope_work_group</strong> and <strong>A</strong> and <strong>B</strong> are executed by
+work-items within the same work-group.</p>
+</li>
+<li>
+<p><strong>P</strong> is <strong>memory_scope_device</strong> and <strong>A</strong> and <strong>B</strong> are executed by work-items
+on the same device when <strong>A</strong> and <strong>B</strong> apply to an SVM allocation or <strong>A</strong>
+and <strong>B</strong> are executed by work-items in the same kernel or one of its
+children when <strong>A</strong> and <strong>B</strong> apply to a cl_mem buffer.</p>
+</li>
+<li>
+<p><strong>P</strong> is <strong>memory_scope_all_svm_devices</strong> and <strong>A</strong> and <strong>B</strong> are executed by
+host threads or by work-items on one or more devices that can share SVM
+memory with each other and the host process.</p>
+</li>
+</ul>
+</div>
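+<div class="paragraph">
+<p>As a rough illustration, the inclusive-scope test can be modeled as a
+predicate over two action descriptions.
+Everything here is an assumption made for the sketch: the action_t record, the
+idea that work-group and sub-group identifiers are globally unique, the
+root_kernel field that is shared by a kernel and its children, and the
+treatment of every device as able to share SVM with the host.</p>
+</div>

```c
#include <stdbool.h>

typedef enum {
    SCOPE_WORK_ITEM,
    SCOPE_SUB_GROUP,
    SCOPE_WORK_GROUP,
    SCOPE_DEVICE,
    SCOPE_ALL_SVM_DEVICES
} scope_t;

/* Hypothetical description of one atomic action. */
typedef struct {
    scope_t scope;
    int device;      /* device executing the work-item */
    int root_kernel; /* same id for a kernel and its child kernels */
    int work_group;  /* assumed globally unique */
    int sub_group;   /* assumed globally unique */
    bool on_svm;     /* true: SVM allocation, false: cl_mem buffer */
} action_t;

bool inclusive_scope(const action_t *a, const action_t *b)
{
    if (a->scope != b->scope)
        return false; /* both actions must name the same scope P */

    switch (a->scope) {
    case SCOPE_SUB_GROUP:
        return a->sub_group == b->sub_group;
    case SCOPE_WORK_GROUP:
        return a->work_group == b->work_group;
    case SCOPE_DEVICE:
        if (a->on_svm && b->on_svm)
            return a->device == b->device;
        if (!a->on_svm && !b->on_svm)
            return a->root_kernel == b->root_kernel;
        return false; /* mixed case: treated conservatively in this model */
    case SCOPE_ALL_SVM_DEVICES:
        return true;  /* assumes all devices share SVM with the host */
    default:
        return false; /* work-item scope is not covered by the list */
    }
}
```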
+</div>
+<div class="sect3">
+<h4 id="memory-ordering-rules">3.3.6. Memory Model: Memory Ordering Rules</h4>
+<div class="paragraph">
+<p>Fundamentally, the issue in a memory model is to understand the orderings in
+time of modifications to objects in memory.
+Modifying an object or calling a function that modifies an object are side
+effects, i.e. changes in the state of the execution environment.
+Evaluation of an expression in general includes both value computations and
+initiation of side effects.
+Value computation for an lvalue expression includes determining the identity
+of the designated object.
+<a href="#iso-c11">[C11 standard, Section 5.1.2.3, paragraph 2, modified.]</a></p>
+</div>
+<div class="paragraph">
+<p>We assume that the OpenCL kernel language and host programming languages
+have a sequenced-before relation between the evaluations executed by a
+single unit of execution.
+This sequenced-before relation is an asymmetric, transitive, pair-wise
+relation between those evaluations, which induces a partial order among
+them.
+Given any two evaluations <strong>A</strong> and <strong>B</strong>, if <strong>A</strong> is sequenced-before <strong>B</strong>, then
+the execution of <strong>A</strong> shall precede the execution of <strong>B</strong>.
+(Conversely, if <strong>A</strong> is sequenced-before <strong>B</strong>, then <strong>B</strong> is sequenced-after
+<strong>A</strong>.) If <strong>A</strong> is not sequenced-before or sequenced-after <strong>B</strong>, then <strong>A</strong> and
+<strong>B</strong> are unsequenced.
+Evaluations <strong>A</strong> and <strong>B</strong> are indeterminately sequenced when <strong>A</strong> is either
+sequenced-before or sequenced-after <strong>B</strong>, but it is unspecified which.
+<a href="#iso-c11">[C11 standard, Section 5.1.2.3, paragraph 3, modified.]</a></p>
+</div>
+<div class="admonitionblock note">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-note" title="Note"></i>
+</td>
+<td class="content">
+Sequenced-before is a partial order of the operations executed by a
+single unit of execution (e.g. a host thread or work-item).
+It generally corresponds to the source program order of those operations,
+and is partial because of the undefined argument evaluation order of OpenCL&#8217;s
+kernel C language.
+</td>
+</tr>
+</table>
+</div>
+<div class="paragraph">
+<p>In an OpenCL kernel language, the value of an object visible to a work-item
+W at a particular point is the initial value of the object, a value stored
+in the object by W, or a value stored in the object by another work-item or
+host thread, according to the rules below.
+Depending on details of the host programming language, the value of an
+object visible to a host thread may also be the value stored in that object
+by another work-item or host thread.
+<a href="#iso-c11">[C11 standard, Section 5.1.2.4, paragraph 2, modified.]</a></p>
+</div>
+<div class="paragraph">
+<p>Two expression evaluations conflict if one of them modifies a memory
+location and the other one reads or modifies the same memory location.
+<a href="#iso-c11">[C11 standard, Section 5.1.2.4, paragraph 4.]</a></p>
+</div>
+<div class="paragraph">
+<p>All modifications to a particular atomic object <strong>M</strong> occur in some particular
+total order, called the modification order of <strong>M</strong>.
+If <strong>A</strong> and <strong>B</strong> are modifications of an atomic object <strong>M</strong>, and <strong>A</strong>
+happens-before <strong>B</strong>, then <strong>A</strong> shall precede <strong>B</strong> in the modification order of
+<strong>M</strong>, which is defined below.
+Note that the modification order of an atomic object <strong>M</strong> is independent of
+whether <strong>M</strong> is in local or global memory.
+<a href="#iso-c11">[C11 standard, Section 5.1.2.4, paragraph 7, modified.]</a></p>
+</div>
+<div class="paragraph">
+<p>A release sequence begins with a release operation <strong>A</strong> on an atomic object
+<strong>M</strong> and is the maximal contiguous sub-sequence of side effects in the
+modification order of <strong>M</strong>, where the first operation is <strong>A</strong> and every
+subsequent operation either is performed by the same work-item or host
+thread that performed the release or is an atomic read-modify-write
+operation.
+<a href="#iso-c11">[C11 standard, Section 5.1.2.4, paragraph 10, modified.]</a></p>
+</div>
+<div class="paragraph">
+<p>OpenCL&#8217;s local and global memories are disjoint.
+Kernels may access both kinds of memory while host threads may only access
+global memory.
+Furthermore, the <em>flags</em> argument of OpenCL&#8217;s work_group_barrier function
+specifies which memory operations the function will make visible: these
+memory operations can be, for example, just the ones to local memory, or the
+ones to global memory, or both.
+Since the visibility of memory operations can be specified for local memory
+separately from global memory, we define two related but independent
+relations, <em>global-synchronizes-with</em> and <em>local-synchronizes-with</em>.
+Certain operations on global memory may global-synchronize-with other
+operations performed by another work-item or host thread.
+An example is a release atomic operation in one work-item that
+global-synchronizes-with an acquire atomic operation in a second work-item.
+Similarly, certain atomic operations on local objects in kernels can
+local-synchronize-with other atomic operations on those local objects.
+<a href="#iso-c11">[C11 standard, Section 5.1.2.4, paragraph 11, modified.]</a></p>
+</div>
+<div class="paragraph">
+<p>We define two separate happens-before relations: global-happens-before and
+local-happens-before.</p>
+</div>
+<div class="paragraph">
+<p>A global memory action <strong>A</strong> global-happens-before a global memory action <strong>B</strong>
+if</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>A</strong> is sequenced before <strong>B</strong>, or</p>
+</li>
+<li>
+<p><strong>A</strong> global-synchronizes-with <strong>B</strong>, or</p>
+</li>
+<li>
+<p>For some global memory action <strong>C</strong>, <strong>A</strong> global-happens-before <strong>C</strong> and <strong>C</strong>
+global-happens-before <strong>B</strong>.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>A local memory action <strong>A</strong> local-happens-before a local memory action <strong>B</strong> if</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>A</strong> is sequenced before <strong>B</strong>, or</p>
+</li>
+<li>
+<p><strong>A</strong> local-synchronizes-with <strong>B</strong>, or</p>
+</li>
+<li>
+<p>For some local memory action <strong>C</strong>, <strong>A</strong> local-happens-before <strong>C</strong> and <strong>C</strong>
+local-happens-before <strong>B</strong>.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>An OpenCL implementation shall ensure that no program execution demonstrates
+a cycle in either the local-happens-before relation or the
+global-happens-before relation.</p>
+</div>
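+<div class="paragraph">
+<p>The recursive definitions above amount to taking the transitive closure of
+the sequenced-before and synchronizes-with edges.
+The toy model below does this with a small Boolean matrix; MAX_ACTIONS, the
+function names, and the four-action example are hypothetical, and the same
+code applies to either the global or the local relation.</p>
+</div>

```c
#include <stdbool.h>

#define MAX_ACTIONS 8 /* hypothetical bound for this toy model */

/* hb[i][j] initially means "i is sequenced before j, or i
 * synchronizes-with j". This extends it in place to its transitive
 * closure, which models the happens-before relation. */
void close_happens_before(bool hb[MAX_ACTIONS][MAX_ACTIONS], int n)
{
    for (int k = 0; k < n; k++)
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                if (hb[i][k] && hb[k][j])
                    hb[i][j] = true;
}

/* After closure, a cycle exists exactly when some action
 * happens-before itself. */
bool has_cycle(bool hb[MAX_ACTIONS][MAX_ACTIONS], int n)
{
    for (int i = 0; i < n; i++)
        if (hb[i][i])
            return true;
    return false;
}

/* Example chain: 0 -sequenced-before-> 1 -synchronizes-with-> 2
 * -sequenced-before-> 3. Returns true if 0 happens-before 3, 3 does
 * not happen-before 0, and the relation is acyclic. */
bool example_orders_0_before_3(void)
{
    bool hb[MAX_ACTIONS][MAX_ACTIONS] = {{false}};

    hb[0][1] = true;
    hb[1][2] = true;
    hb[2][3] = true;
    close_happens_before(hb, 4);
    return hb[0][3] && !hb[3][0] && !has_cycle(hb, 4);
}
```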
+<div class="admonitionblock note">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-note" title="Note"></i>
+</td>
+<td class="content">
+The global- and local-happens-before relations are critical to
+defining what values are read and when data races occur.
+The global-happens-before relation, for example, defines what global memory
+operations definitely happen before what other global memory operations.
+If an operation <strong>A</strong> global-happens-before operation <strong>B</strong> then <strong>A</strong> must occur
+before <strong>B</strong>; in particular, any write done by <strong>A</strong> will be visible to <strong>B</strong>.
+The local-happens-before relation has similar properties for local memory.
+Programmers can use the local- and global-happens-before relations to reason
+about the order of program actions.
+</td>
+</tr>
+</table>
+</div>
+<div class="paragraph">
+<p>A visible side effect <strong>A</strong> on a global object <strong>M</strong> with respect to a value
+computation <strong>B</strong> of <strong>M</strong> satisfies the conditions:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>A</strong> global-happens-before <strong>B</strong>, and</p>
+</li>
+<li>
+<p>there is no other side effect <strong>X</strong> to <strong>M</strong> such that <strong>A</strong>
+global-happens-before <strong>X</strong> and <strong>X</strong> global-happens-before <strong>B</strong>.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>We define visible side effects for local objects <strong>M</strong> similarly.
+The value of a non-atomic scalar object <strong>M</strong>, as determined by evaluation
+<strong>B</strong>, shall be the value stored by the visible side effect <strong>A</strong>.
+<a href="#iso-c11">[C11 standard, Section 5.1.2.4, paragraph 19, modified.]</a></p>
+</div>
+<div class="paragraph">
+<p>The execution of a program contains a data race if it contains two
+conflicting actions <strong>A</strong> and <strong>B</strong> in different units of execution, and</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>(1) at least one of <strong>A</strong> or <strong>B</strong> is not atomic, or <strong>A</strong> and <strong>B</strong> do not have
+inclusive memory scope, and</p>
+</li>
+<li>
+<p>(2) the actions are global actions unordered by the
+global-happens-before relation or are local actions unordered by the
+local-happens-before relation.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>Any such data race results in undefined behavior.
+<a href="#iso-c11">[C11 standard, Section 5.1.2.4, paragraph 25, modified.]</a></p>
+</div>
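+<div class="paragraph">
+<p>The data race definition combines several independent facts about a pair of
+actions.
+The sketch below spells out that combination as a predicate; the
+action_pair_t record and its field names are hypothetical, and each field is
+assumed to have been determined elsewhere.</p>
+</div>

```c
#include <stdbool.h>

/* Hypothetical summary of two actions A and B. */
typedef struct {
    bool conflicting;     /* same location, at least one modifies it */
    bool different_units; /* executed by different units of execution */
    bool a_atomic;        /* A is an atomic operation */
    bool b_atomic;        /* B is an atomic operation */
    bool inclusive_scope; /* A and B have inclusive memory scope */
    bool ordered_by_hb;   /* ordered by global- or local-happens-before,
                             whichever applies to the memory involved */
} action_pair_t;

bool is_data_race(const action_pair_t *p)
{
    if (!p->conflicting || !p->different_units)
        return false;

    /* (1) at least one is not atomic, or the scopes are not inclusive */
    bool cond1 = !p->a_atomic || !p->b_atomic || !p->inclusive_scope;
    /* (2) the actions are unordered by happens-before */
    bool cond2 = !p->ordered_by_hb;

    return cond1 && cond2;
}
```

+<div class="paragraph">
+<p>Note that two atomic operations with inclusive scope never race under this
+predicate, even when they are unordered.</p>
+</div>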
+<div class="paragraph">
+<p>We also define the visible sequence of side effects on local and global
+atomic objects.
+The remaining paragraphs of this subsection define this sequence for a
+global atomic object <strong>M</strong>; the visible sequence of side effects for a local
+atomic object is defined similarly by using the local-happens-before
+relation.</p>
+</div>
+<div class="paragraph">
+<p>The visible sequence of side effects on a global atomic object <strong>M</strong>, with
+respect to a value computation <strong>B</strong> of <strong>M</strong>, is a maximal contiguous
+sub-sequence of side effects in the modification order of <strong>M</strong>, where the
+first side effect is visible with respect to <strong>B</strong>, and for every side effect,
+it is not the case that <strong>B</strong> global-happens-before it.
+The value of <strong>M</strong>, as determined by evaluation <strong>B</strong>, shall be the value stored
+by some operation in the visible sequence of <strong>M</strong> with respect to <strong>B</strong>.
+<a href="#iso-c11">[C11 standard, Section 5.1.2.4, paragraph 22, modified.]</a></p>
+</div>
+<div class="paragraph">
+<p>If an operation <strong>A</strong> that modifies an atomic object <strong>M</strong> global-happens-before
+an operation <strong>B</strong> that modifies <strong>M</strong>, then <strong>A</strong> shall be earlier than <strong>B</strong> in
+the modification order of <strong>M</strong>.
+This requirement is known as write-write coherence.</p>
+</div>
+<div class="paragraph">
+<p>If a value computation <strong>A</strong> of an atomic object <strong>M</strong> global-happens-before a
+value computation <strong>B</strong> of <strong>M</strong>, and <strong>A</strong> takes its value from a side effect <strong>X</strong>
+on <strong>M</strong>, then the value computed by <strong>B</strong> shall either equal the value stored
+by <strong>X</strong>, or be the value stored by a side effect <strong>Y</strong> on <strong>M</strong>, where <strong>Y</strong>
+follows <strong>X</strong> in the modification order of <strong>M</strong>.
+This requirement is known as read-read coherence.
+<a href="#iso-c11">[C11 standard, Section 5.1.2.4, paragraph 22, modified.]</a></p>
+</div>
+<div class="paragraph">
+<p>If a value computation <strong>A</strong> of an atomic object <strong>M</strong> global-happens-before an
+operation <strong>B</strong> on <strong>M</strong>, then <strong>A</strong> shall take its value from a side effect <strong>X</strong>
+on <strong>M</strong>, where <strong>X</strong> precedes <strong>B</strong> in the modification order of <strong>M</strong>.
+This requirement is known as read-write coherence.</p>
+</div>
+<div class="paragraph">
+<p>If a side effect <strong>X</strong> on an atomic object <strong>M</strong> global-happens-before a value
+computation <strong>B</strong> of <strong>M</strong>, then the evaluation <strong>B</strong> shall take its value from
+<strong>X</strong> or from a side effect <strong>Y</strong> that follows <strong>X</strong> in the modification order of
+<strong>M</strong>.
+This requirement is known as write-read coherence.</p>
+</div>
+<div class="sect4">
+<h5 id="_memory_ordering_rules_atomic_operations">Memory Ordering Rules: Atomic Operations</h5>
+<div class="paragraph">
+<p>This and the following sections describe how different program actions in kernel
+C code and the host program contribute to the local- and
+global-happens-before relations.
+This section discusses ordering rules for OpenCL 2.0 atomic operations.</p>
+</div>
+<div class="paragraph">
+<p><a href="#device-side-enqueue">Device-side enqueue</a> defines the enumerated type
+memory_order.</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>For <strong>memory_order_relaxed</strong>, no operation orders memory.</p>
+</li>
+<li>
+<p>For <strong>memory_order_release</strong>, <strong>memory_order_acq_rel</strong>, and
+<strong>memory_order_seq_cst</strong>, a store operation performs a release operation
+on the affected memory location.</p>
+</li>
+<li>
+<p>For <strong>memory_order_acquire</strong>, <strong>memory_order_acq_rel</strong>, and
+<strong>memory_order_seq_cst</strong>, a load operation performs an acquire operation
+on the affected memory location.
+<a href="#iso-c11">[C11 standard, Section 7.17.3, paragraphs 2-4, modified.]</a></p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>Certain built-in functions synchronize with other built-in functions
+performed by another unit of execution.
+This is true for pairs of release and acquire operations under specific
+circumstances.
+An atomic operation <strong>A</strong> that performs a release operation on a global object
+<strong>M</strong> global-synchronizes-with an atomic operation <strong>B</strong> that performs an
+acquire operation on <strong>M</strong> and reads a value written by any side effect in the
+release sequence headed by <strong>A</strong>.
+A similar rule holds for atomic operations on objects in local memory: an
+atomic operation <strong>A</strong> that performs a release operation on a local object <strong>M</strong>
+local-synchronizes-with an atomic operation <strong>B</strong> that performs an acquire
+operation on <strong>M</strong> and reads a value written by any side effect in the release
+sequence headed by <strong>A</strong>.
+<a href="#iso-c11">[C11 standard, Section 5.1.2.4, paragraph 11, modified.]</a></p>
+</div>
+<div class="admonitionblock note">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-note" title="Note"></i>
+</td>
+<td class="content">
+Atomic operations specifying <strong>memory_order_relaxed</strong> are relaxed only
+with respect to memory ordering.
+Implementations must still guarantee that any given atomic access to a
+particular atomic object be indivisible with respect to all other atomic
+accesses to that object.
+</td>
+</tr>
+</table>
+</div>
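+<div class="paragraph">
+<p>A shared counter is a common case where relaxed ordering is sufficient
+because only indivisibility matters.
+The following C11 host sketch uses POSIX threads as stand-ins for units of
+execution; counter, worker, run_relaxed_counter and the two constants are
+hypothetical names.</p>
+</div>

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NUM_THREADS 4
#define INCREMENTS  10000

static atomic_long counter;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        /* Relaxed: no ordering of other memory, but still indivisible. */
        atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
    }
    return NULL;
}

/* Starts the workers, waits for them, and returns the final count. */
long run_relaxed_counter(void)
{
    pthread_t threads[NUM_THREADS];
    int started = 0;

    atomic_store_explicit(&counter, 0, memory_order_relaxed);
    for (int i = 0; i < NUM_THREADS; i++) {
        if (pthread_create(&threads[i], NULL, worker, NULL) != 0)
            break;
        started++;
    }
    /* Joining orders every increment before the final load. */
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    return atomic_load_explicit(&counter, memory_order_relaxed);
}
```

+<div class="paragraph">
+<p>If all threads start successfully, no increment is lost and the result is
+NUM_THREADS * INCREMENTS.</p>
+</div>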
+<div class="paragraph">
+<p>There shall exist a single total order <strong>S</strong> for all <strong>memory_order_seq_cst</strong>
+operations that is consistent with the modification orders for all affected
+locations, as well as the appropriate global-happens-before and
+local-happens-before orders for those locations, such that each
+<strong>memory_order_seq_cst</strong> operation <strong>B</strong> that loads a value from an atomic object
+<strong>M</strong> in global or local memory observes one of the following values:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>the result of the last modification <strong>A</strong> of <strong>M</strong> that precedes <strong>B</strong> in <strong>S</strong>,
+if it exists, or</p>
+</li>
+<li>
+<p>if <strong>A</strong> exists, the result of some modification of <strong>M</strong> in the visible
+sequence of side effects with respect to <strong>B</strong> that is not
+<strong>memory_order_seq_cst</strong> and that does not happen before <strong>A</strong>, or</p>
+</li>
+<li>
+<p>if <strong>A</strong> does not exist, the result of some modification of <strong>M</strong> in the
+visible sequence of side effects with respect to <strong>B</strong> that is not
+<strong>memory_order_seq_cst</strong>.
+<a href="#iso-c11">[C11 standard, Section 7.17.3, paragraph 6, modified.]</a></p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>Let X and Y be two <strong>memory_order_seq_cst</strong> operations.
+If X local-synchronizes-with or global-synchronizes-with Y then X both
+local-synchronizes-with Y and global-synchronizes-with Y.</p>
+</div>
+<div class="paragraph">
+<p>If the total order <strong>S</strong> exists, the following rules hold:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>For an atomic operation <strong>B</strong> that reads the value of an atomic object
+<strong>M</strong>, if there is a <strong>memory_order_seq_cst</strong> fence <strong>X</strong> sequenced-before
+<strong>B</strong>, then <strong>B</strong> observes either the last <strong>memory_order_seq_cst</strong>
+modification of <strong>M</strong> preceding <strong>X</strong> in the total order <strong>S</strong> or a later
+modification of <strong>M</strong> in its modification order.
+<a href="#iso-c11">[C11 standard, Section 7.17.3, paragraph 9.]</a></p>
+</li>
+<li>
+<p>For atomic operations <strong>A</strong> and <strong>B</strong> on an atomic object <strong>M</strong>, where <strong>A</strong>
+modifies <strong>M</strong> and <strong>B</strong> takes its value, if there is a
+<strong>memory_order_seq_cst</strong> fence <strong>X</strong> such that <strong>A</strong> is sequenced-before <strong>X</strong>
+and <strong>B</strong> follows <strong>X</strong> in <strong>S</strong>, then <strong>B</strong> observes either the effects of <strong>A</strong>
+or a later modification of <strong>M</strong> in its modification order.
+<a href="#iso-c11">[C11 standard, Section 7.17.3, paragraph 10.]</a></p>
+</li>
+<li>
+<p>For atomic operations <strong>A</strong> and <strong>B</strong> on an atomic object <strong>M</strong>, where <strong>A</strong>
+modifies <strong>M</strong> and <strong>B</strong> takes its value, if there are
+<strong>memory_order_seq_cst</strong> fences <strong>X</strong> and <strong>Y</strong> such that <strong>A</strong> is
+sequenced-before <strong>X</strong>, <strong>Y</strong> is sequenced-before <strong>B</strong>, and <strong>X</strong> precedes <strong>Y</strong>
+in <strong>S</strong>, then <strong>B</strong> observes either the effects of <strong>A</strong> or a later
+modification of <strong>M</strong> in its modification order.
+<a href="#iso-c11">[C11 standard, Section 7.17.3, paragraph 11.]</a></p>
+</li>
+<li>
+<p>For atomic operations <strong>A</strong> and <strong>B</strong> on an atomic object <strong>M</strong>, if there are
+<strong>memory_order_seq_cst</strong> fences <strong>X</strong> and <strong>Y</strong> such that <strong>A</strong> is
+sequenced-before <strong>X</strong>, <strong>Y</strong> is sequenced-before <strong>B</strong>, and <strong>X</strong> precedes <strong>Y</strong>
+in <strong>S</strong>, then <strong>B</strong> occurs later than <strong>A</strong> in the modification order of <strong>M</strong>.</p>
+</li>
+</ul>
+</div>
+<div class="admonitionblock note">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-note" title="Note"></i>
+</td>
+<td class="content">
+<strong>memory_order_seq_cst</strong> ensures sequential consistency only for a
+program that is (1) free of data races, and (2) exclusively uses
+<strong>memory_order_seq_cst</strong> synchronization operations.
+Any use of weaker ordering will invalidate this guarantee unless extreme
+care is used.
+In particular, <strong>memory_order_seq_cst</strong> fences ensure a total order only for
+the fences themselves.
+Fences cannot, in general, be used to restore sequential consistency for
+atomic operations with weaker ordering specifications.
+</td>
+</tr>
+</table>
+</div>
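+<div class="paragraph">
+<p>The classic store-buffering test illustrates what the single total order
+buys.
+In the C11 host sketch below, each thread stores to one variable and then
+loads the other, all with memory_order_seq_cst; under a single total order at
+least one load must observe the other thread&#8217;s store.
+The names x, y, r1, r2 and both_zero_observed are hypothetical, and POSIX
+threads stand in for units of execution.</p>
+</div>

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static atomic_int x, y;
static int r1, r2; /* read by the caller only after joining the threads */

static void *thread1(void *arg)
{
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_seq_cst);
    r1 = atomic_load_explicit(&y, memory_order_seq_cst);
    return NULL;
}

static void *thread2(void *arg)
{
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_seq_cst);
    r2 = atomic_load_explicit(&x, memory_order_seq_cst);
    return NULL;
}

/* Repeats the test and reports whether both loads ever read 0,
 * an outcome that sequential consistency forbids. */
bool both_zero_observed(int trials)
{
    for (int i = 0; i < trials; i++) {
        pthread_t a, b;

        atomic_store(&x, 0);
        atomic_store(&y, 0);
        if (pthread_create(&a, NULL, thread1, NULL) != 0)
            return false;
        if (pthread_create(&b, NULL, thread2, NULL) != 0) {
            pthread_join(a, NULL);
            return false;
        }
        pthread_join(a, NULL);
        pthread_join(b, NULL);

        if (r1 == 0 && r2 == 0)
            return true;
    }
    return false;
}
```

+<div class="paragraph">
+<p>Replacing the four operations with release stores and acquire loads would
+permit the outcome r1 == 0 and r2 == 0, which is the loss of sequential
+consistency that the note warns about.</p>
+</div>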
+<div class="paragraph">
+<p>Atomic read-modify-write operations should always read the last value (in
+the modification order) stored before the write associated with the
+read-modify-write operation.
+<a href="#iso-c11">[C11 standard, Section 7.17.3, paragraph 12.]</a></p>
+</div>
+<div class="paragraph">
+<p><span class="underline">Implementations should ensure that no "out-of-thin-air" values
+are computed that circularly depend on their own computation.</span></p>
+</div>
+<div class="paragraph">
+<p>Note: Under the rules described above, and independent of the previously
+footnoted C++ issue, it is known that <em>x == y == 42</em> is a valid final state
+in the following problematic example:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">global atomic_int x = ATOMIC_VAR_INIT(<span class="integer">0</span>);
+local atomic_int y = ATOMIC_VAR_INIT(<span class="integer">0</span>);
+
+<span class="label">unit_of_execution_1:</span>
+... [execution not reading or writing x or y, leading up to:]
+<span class="predefined-type">int</span> t = atomic_load_explicit(&amp;y, memory_order_acquire);
+atomic_store_explicit(&amp;x, t, memory_order_release);
+
+<span class="label">unit_of_execution_2:</span>
+... [execution not reading or writing x or y, leading up to:]
+<span class="predefined-type">int</span> t = atomic_load_explicit(&amp;x, memory_order_acquire);
+atomic_store_explicit(&amp;y, t, memory_order_release);</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>This is not useful behavior and implementations should not exploit this
+phenomenon.
+It should be expected that in the future this may be disallowed by
+appropriate updates to the memory model description by the OpenCL committee.</p>
+</div>
+<div class="paragraph">
+<p>Implementations should make atomic stores visible to atomic loads within a
+reasonable amount of time.
+<a href="#iso-c11">[C11 standard, Section 7.17.3, paragraph 16.]</a></p>
+</div>
+<div class="paragraph">
+<p>As long as the following conditions are met, a host program sharing SVM
+memory with a kernel executing on one or more OpenCL devices may use atomic
+and synchronization operations to ensure that its assignments, and those of
+the kernel, are visible to each other:</p>
+</div>
+<div class="olist arabic">
+<ol class="arabic">
+<li>
+<p>Either fine-grained buffer or fine-grained system SVM must be used to
+share memory.
+While coarse-grained buffer SVM allocations may support atomic
+operations, visibility on these allocations is not guaranteed except at
+map and unmap operations.</p>
+</li>
+<li>
+<p>The optional OpenCL 2.0 SVM atomic-controlled visibility specified by
+provision of the CL_MEM_SVM_ATOMICS flag must be supported by the device
+and the flag provided to the SVM buffer on allocation.</p>
+</li>
+<li>
+<p>The host atomic and synchronization operations must be compatible with
+those of an OpenCL kernel language.
+This requires that the size and representation of the data types that
+the host atomic operations act on be consistent with the OpenCL kernel
+language atomic types.</p>
+</li>
+</ol>
+</div>
+<div class="paragraph">
+<p>If these conditions are met, the host operations will apply at
+all_svm_devices scope.</p>
+</div>
+</div>
+<div class="sect4">
+<h5 id="memory-ordering-fence">Memory Ordering Rules: Fence Operations</h5>
+<div class="paragraph">
+<p>This section describes how the OpenCL 2.0 fence operations contribute to the
+local- and global-happens-before relations.</p>
+</div>
+<div class="paragraph">
+<p>Earlier, we introduced synchronization primitives called fences.
+Fences can utilize the acquire memory_order, release memory_order, or both.
+A fence with acquire semantics is called an acquire fence; a fence with
+release semantics is called a release fence.
+The <a href="#atomic-fence-orders">overview of atomic and fence operations</a> section describes the memory orders
+that result in acquire and release fences.</p>
+</div>
+<div class="paragraph">
+<p>A global release fence <strong>A</strong> global-synchronizes-with a global acquire fence
+<strong>B</strong> if there exist atomic operations <strong>X</strong> and <strong>Y</strong>, both operating on some
+global atomic object <strong>M</strong>, such that <strong>A</strong> is sequenced-before <strong>X</strong>, <strong>X</strong>
+modifies <strong>M</strong>, <strong>Y</strong> is sequenced-before <strong>B</strong>, <strong>Y</strong> reads the value written by
+<strong>X</strong> or a value written by any side effect in the hypothetical release
+sequence <strong>X</strong> would head if it were a release operation, and the scopes
+of <strong>A</strong> and <strong>B</strong> are inclusive.
+<a href="#iso-c11">[C11 standard, Section 7.17.4, paragraph 2, modified.]</a></p>
+</div>
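+<div class="paragraph">
+<p>The fence-to-fence rule above can be illustrated with a minimal host-side
+sketch. It is an analogue only: it uses the C11 <code>atomic_thread_fence</code>
+function and POSIX threads rather than OpenCL work-item fences, so scopes and
+the global/local distinction do not appear. The names <code>payload</code>,
+<code>flag</code>, <code>producer</code> and <code>run_fence_handoff</code> are
+hypothetical and chosen for this example.</p>
+</div>

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;      /* ordinary (non-atomic) data to publish */
static atomic_int flag;  /* plays the role of the atomic object M */

/* Producer: write the data, then release fence A, then relaxed store X. */
static void *producer(void *arg)
{
    (void)arg;
    payload = 42;
    atomic_thread_fence(memory_order_release);              /* fence A */
    atomic_store_explicit(&flag, 1, memory_order_relaxed);  /* X modifies M */
    return NULL;
}

/* Consumer: relaxed load Y, then acquire fence B, then read the data.
 * Returns the payload value observed, or -1 if the thread cannot start. */
int run_fence_handoff(void)
{
    pthread_t thread;
    int seen;

    payload = 0;
    atomic_store_explicit(&flag, 0, memory_order_relaxed);
    if (pthread_create(&thread, NULL, producer, NULL) != 0)
        return -1;

    /* Y reads the value written by X and is sequenced-before B. */
    while (atomic_load_explicit(&flag, memory_order_relaxed) != 1)
        ;
    atomic_thread_fence(memory_order_acquire);              /* fence B */
    seen = payload;  /* A synchronizes-with B, so this read sees 42 */

    pthread_join(thread, NULL);
    return seen;
}
```

+<div class="paragraph">
+<p>Neither atomic access carries ordering of its own in this sketch; the
+ordering comes entirely from the two fences. In an OpenCL kernel the
+corresponding fences would additionally need inclusive scopes for the
+relation to hold.</p>
+</div>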
+<div class="paragraph">
+<p>A global release fence <strong>A</strong> global-synchronizes-with an atomic operation <strong>B</strong>
+that performs an acquire operation on a global atomic object <strong>M</strong> if there
+exists an atomic operation <strong>X</strong> such that <strong>A</strong> is sequenced-before <strong>X</strong>, <strong>X</strong>
+modifies <strong>M</strong>, <strong>B</strong> reads the value written by <strong>X</strong> or a value written by any
+side effect in the hypothetical release sequence <strong>X</strong> would head if it were a
+release operation, and the scopes of <strong>A</strong> and <strong>B</strong> are inclusive.
+<a href="#iso-c11">[C11 standard, Section 7.17.4, paragraph 3, modified.]</a></p>
+</div>
+<div class="paragraph">
+<p>An atomic operation <strong>A</strong> that is a release operation on a global atomic
+object <strong>M</strong> global-synchronizes-with a global acquire fence <strong>B</strong> if there
+exists some atomic operation <strong>X</strong> on <strong>M</strong> such that <strong>X</strong> is sequenced-before
+<strong>B</strong> and reads the value written by <strong>A</strong> or a value written by any side effect
+in the release sequence headed by <strong>A</strong>, and the scopes of <strong>A</strong> and <strong>B</strong> are
+inclusive.
+<a href="#iso-c11">[C11 standard, Section 7.17.4, paragraph 4, modified.]</a></p>
+</div>
+<div class="paragraph">
+<p>A local release fence <strong>A</strong> local-synchronizes-with a local acquire fence <strong>B</strong>
+if there exist atomic operations <strong>X</strong> and <strong>Y</strong>, both operating on some local
+atomic object <strong>M</strong>, such that <strong>A</strong> is sequenced-before <strong>X</strong>, <strong>X</strong> modifies <strong>M</strong>,
+<strong>Y</strong> is sequenced-before <strong>B</strong>, <strong>Y</strong> reads the value written by <strong>X</strong> or a
+value written by any side effect in the hypothetical release sequence <strong>X</strong>
+would head if it were a release operation, and the scopes of <strong>A</strong> and <strong>B</strong> are
+inclusive.
+<a href="#iso-c11">[C11 standard, Section 7.17.4, paragraph 2, modified.]</a></p>
+</div>
+<div class="paragraph">
+<p>A local release fence <strong>A</strong> local-synchronizes-with an atomic operation <strong>B</strong>
+that performs an acquire operation on a local atomic object <strong>M</strong> if there
+exists an atomic operation <strong>X</strong> such that <strong>A</strong> is sequenced-before <strong>X</strong>, <strong>X</strong>
+modifies <strong>M</strong>, <strong>B</strong> reads the value written by <strong>X</strong> or a value written by
+any side effect in the hypothetical release sequence <strong>X</strong> would head if it
+were a release operation, and the scopes of <strong>A</strong> and <strong>B</strong> are inclusive.
+<a href="#iso-c11">[C11 standard, Section 7.17.4, paragraph 3, modified.]</a></p>
+</div>
+<div class="paragraph">
+<p>An atomic operation <strong>A</strong> that is a release operation on a local atomic object
+<strong>M</strong> local-synchronizes-with a local acquire fence <strong>B</strong> if there exists some
+atomic operation <strong>X</strong> on <strong>M</strong> such that <strong>X</strong> is sequenced-before <strong>B</strong> and reads
+the value written by <strong>A</strong> or a value written by any side effect in the
+release sequence headed by <strong>A</strong>, and the scopes of <strong>A</strong> and <strong>B</strong> are inclusive.
+<a href="#iso-c11">[C11 standard, Section 7.17.4, paragraph 4, modified.]</a></p>
+</div>
+<div class="paragraph">
+<p>Let <strong>X</strong> and <strong>Y</strong> be two work-item fences that each have both the
+CLK_GLOBAL_MEM_FENCE and CLK_LOCAL_MEM_FENCE flags set.
+<strong>X</strong> global-synchronizes-with <strong>Y</strong> and <strong>X</strong> local-synchronizes-with <strong>Y</strong> if the
+conditions required for <strong>X</strong> to global-synchronize-with <strong>Y</strong> are met, the
+conditions required for <strong>X</strong> to local-synchronize-with <strong>Y</strong> are met, or both
+sets of conditions are met.</p>
+</div>
+</div>
+<div class="sect4">
+<h5 id="_memory_ordering_rules_work_group_functions">Memory Ordering Rules: Work-group Functions</h5>
+<div class="paragraph">
+<p>The OpenCL kernel execution model includes collective operations across the
+work-items within a single work-group.
+These are called work-group functions.
+Besides the work-group barrier function, they include the scan, reduction
+and pipe work-group functions described in the SPIR-V IL specifications.
+We will first discuss the work-group barrier.
+The other work-group functions are discussed afterwards.</p>
+</div>
+<div class="paragraph">
+<p>The barrier function provides a mechanism for a kernel to synchronize the
+work-items within a single work-group: informally, each work-item of the
+work-group must execute the barrier before any are allowed to proceed.
+It also orders memory operations to a specified combination of one or more
+address spaces such as local memory or global memory, in a similar manner to
+a fence.</p>
+</div>
+<div class="paragraph">
+<p>To precisely specify the memory ordering semantics for barrier, we need to
+distinguish between a dynamic and a static instance of the call to a
+barrier.
+A call to a barrier can appear in a loop, for example, and each execution of
+the same static barrier call results in a new dynamic instance of the
+barrier that will independently synchronize a work-group&#8217;s work-items.</p>
+</div>
+<div class="paragraph">
+<p>A work-item executing a dynamic instance of a barrier results in two
+operations, both fences, that are called the entry and exit fences.
+These fences obey all the rules for fences specified elsewhere in this
+chapter as well as the following:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>The entry fence is a release fence with the same flags and scope as
+requested for the barrier.</p>
+</li>
+<li>
+<p>The exit fence is an acquire fence with the same flags and scope as
+requested for the barrier.</p>
+</li>
+<li>
+<p>For each work-item the entry fence is sequenced before the exit fence.</p>
+</li>
+<li>
+<p>If the flags have CLK_GLOBAL_MEM_FENCE set then for each work-item the
+entry fence global-synchronizes-with the exit fence of all other
+work-items in the same work-group.</p>
+</li>
+<li>
+<p>If the flags have CLK_LOCAL_MEM_FENCE set then for each work-item the
+entry fence local-synchronizes-with the exit fence of all other
+work-items in the same work-group.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>The other work-group functions include such functions as work_group_all()
+and work_group_broadcast() and are described in the kernel language and IL
+specifications.
+The use of these work-group functions implies sequenced-before relationships
+between statements within the execution of a single work-item in order to
+satisfy data dependencies.
+For example, a work-item that provides a value to a work-group function must
+behave as if it generates that value before beginning execution of that
+work-group function.
+Furthermore, the programmer must ensure that all work-items in a work-group
+execute the same work-group function call site, or dynamic work-group
+function instance.</p>
+</div>
+</div>
+<div class="sect4">
+<h5 id="_memory_ordering_rules_sub_group_functions">Memory Ordering Rules: Sub-group Functions</h5>
+<div class="paragraph">
+<p>The OpenCL kernel execution model includes collective operations across the
+work-items within a single sub-group.
+These are called sub-group functions.
+Besides the sub-group barrier function, they include the scan, reduction and
+pipe sub-group functions described in the SPIR-V IL specification.
+We will first discuss the sub-group barrier.
+The other sub-group functions are discussed afterwards.</p>
+</div>
+<div class="paragraph">
+<p>The barrier function provides a mechanism for a kernel to synchronize the
+work-items within a single sub-group: informally, each work-item of the
+sub-group must execute the barrier before any are allowed to proceed.
+It also orders memory operations to a specified combination of one or more
+address spaces such as local memory or global memory, in a similar manner to
+a fence.</p>
+</div>
+<div class="paragraph">
+<p>To precisely specify the memory ordering semantics for barrier, we need to
+distinguish between a dynamic and a static instance of the call to a
+barrier.
+A call to a barrier can appear in a loop, for example, and each execution of
+the same static barrier call results in a new dynamic instance of the
+barrier that will independently synchronize a sub-group&#8217;s work-items.</p>
+</div>
+<div class="paragraph">
+<p>A work-item executing a dynamic instance of a barrier results in two
+operations, both fences, that are called the entry and exit fences.
+These fences obey all the rules for fences specified elsewhere in this
+chapter as well as the following:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>The entry fence is a release fence with the same flags and scope as
+requested for the barrier.</p>
+</li>
+<li>
+<p>The exit fence is an acquire fence with the same flags and scope as
+requested for the barrier.</p>
+</li>
+<li>
+<p>For each work-item the entry fence is sequenced before the exit fence.</p>
+</li>
+<li>
+<p>If the flags have CLK_GLOBAL_MEM_FENCE set then for each work-item the
+entry fence global-synchronizes-with the exit fence of all other
+work-items in the same sub-group.</p>
+</li>
+<li>
+<p>If the flags have CLK_LOCAL_MEM_FENCE set then for each work-item the
+entry fence local-synchronizes-with the exit fence of all other
+work-items in the same sub-group.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>The other sub-group functions include such functions as sub_group_all() and
+sub_group_broadcast() and are described in the OpenCL kernel language
+specifications.
+The use of these sub-group functions implies sequenced-before relationships
+between statements within the execution of a single work-item in order to
+satisfy data dependencies.
+For example, a work-item that provides a value to a sub-group function must
+behave as if it generates that value before beginning execution of that
+sub-group function.
+Furthermore, the programmer must ensure that all work-items in a sub-group
+execute the same sub-group function call site, or dynamic sub-group
+function instance.</p>
+</div>
+</div>
+<div class="sect4">
+<h5 id="_memory_ordering_rules_host_side_and_device_side_commands">Memory Ordering Rules: Host-side and Device-side Commands</h5>
+<div class="paragraph">
+<p>This section describes how the OpenCL API functions associated with
+command-queues contribute to happens-before relations.
+There are two types of command-queues and associated API functions in OpenCL
+2.0: <em>host command-queues</em> and <em>device command-queues</em>.
+The interaction of these command-queues with the memory model is for the
+most part equivalent.
+In a few cases, a rule applies only to the host command-queue.
+We will indicate these special cases by specifically denoting the host
+command-queue in the memory ordering rule.
+SVM memory consistency in such instances is implied only with respect to
+synchronizing host commands.</p>
+</div>
+<div class="paragraph">
+<p>Memory ordering rules in this section apply to all memory objects (buffers,
+images and pipes) as well as to SVM allocations where no earlier, and more
+fine-grained, rules apply.</p>
+</div>
+<div class="paragraph">
+<p>In the remainder of this section, we assume that each command <strong>C</strong> enqueued
+onto a command-queue has an associated event object <strong>E</strong> that signals its
+execution status, regardless of whether <strong>E</strong> was returned to the unit of
+execution that enqueued <strong>C</strong>.
+We also distinguish between the API function call that enqueues a command
+<strong>C</strong> and creates an event <strong>E</strong>, the execution of <strong>C</strong>, and the completion of
+<strong>C</strong> (which marks the event <strong>E</strong> as complete).</p>
+</div>
+<div class="paragraph">
+<p>The ordering and synchronization rules for API commands are defined as
+follows:</p>
+</div>
+<div class="olist arabic">
+<ol class="arabic">
+<li>
+<p>If an API function call <strong>X</strong> enqueues a command <strong>C</strong>, then <strong>X</strong>
+global-synchronizes-with <strong>C</strong>.
+For example, a host API function to enqueue a kernel
+global-synchronizes-with the start of that kernel-instance&#8217;s execution,
+so that memory updates sequenced-before the enqueue kernel function call
+will global-happen-before any kernel reads or writes to those same
+memory locations.
+For a device-side enqueue, global memory updates sequenced-before <strong>X</strong>
+happen-before <strong>C</strong> reads or writes to those memory locations only in the
+case of fine-grained SVM.</p>
+</li>
+<li>
+<p>If <strong>E</strong> is an event upon which a command <strong>C</strong> waits, then <strong>E</strong>
+global-synchronizes-with <strong>C</strong>.
+In particular, if <strong>C</strong> waits on an event <strong>E</strong> that is tracking the
+execution status of the command <strong>C1</strong>, then memory operations done by
+<strong>C1</strong> will global-happen-before memory operations done by <strong>C</strong>.
+As an example, assume we have an OpenCL program using coarse-grained SVM
+sharing that enqueues a kernel to a host command-queue to manipulate the
+contents of a region of a buffer that the host thread then accesses
+after the kernel completes.
+To do this, the host thread can call <strong>clEnqueueMapBuffer</strong> to enqueue a
+blocking-mode map command to map that buffer region, specifying that the
+map command must wait on an event signaling the kernel&#8217;s completion.
+When <strong>clEnqueueMapBuffer</strong> returns, any memory operations performed by
+the kernel to that buffer region will global-happen-before subsequent
+memory operations made by the host thread.</p>
+</li>
+<li>
+<p>If a command <strong>C</strong> has an event <strong>E</strong> that signals its completion, then <strong>C</strong>
+global-synchronizes-with <strong>E</strong>.</p>
+</li>
+<li>
+<p>For a command <strong>C</strong> enqueued to a host-side command queue, if <strong>C</strong> has an
+event <strong>E</strong> that signals its completion, then <strong>E</strong> global-synchronizes-with
+an API call <strong>X</strong> that waits on <strong>E</strong>.
+For example, if a host thread or kernel-instance calls the
+wait-for-events function on <strong>E</strong> (e.g. the <strong>clWaitForEvents</strong> function
+called from a host thread), then <strong>E</strong> global-synchronizes-with that
+wait-for-events function call.</p>
+</li>
+<li>
+<p>If commands <strong>C</strong> and <strong>C1</strong> are enqueued in that sequence onto an in-order
+command-queue, then the event (including the event implied between <strong>C</strong>
+and <strong>C1</strong> due to the in-order queue) signaling <strong>C</strong>'s completion
+global-synchronizes-with <strong>C1</strong>.
+Note that in OpenCL 2.0, only a host command-queue can be configured as
+an in-order queue.</p>
+</li>
+<li>
+<p>If an API call enqueues a marker command <strong>C</strong> with an empty list of
+events upon which <strong>C</strong> should wait, then the events of all commands
+enqueued prior to <strong>C</strong> in the command-queue global-synchronize-with <strong>C</strong>.</p>
+</li>
+<li>
+<p>If a host API call enqueues a command-queue barrier command <strong>C</strong> with an
+empty list of events on which <strong>C</strong> should wait, then the events of all
+commands enqueued prior to <strong>C</strong> in the command-queue
+global-synchronize-with <strong>C</strong>.
+In addition, the event signaling the completion of <strong>C</strong>
+global-synchronizes-with all commands enqueued after <strong>C</strong> in the
+command-queue.</p>
+</li>
+<li>
+<p>If a host thread executes a <strong>clFinish</strong> call <strong>X</strong>, then the events of all
+commands enqueued prior to <strong>X</strong> in the command-queue
+global-synchronize-with <strong>X</strong>.</p>
+</li>
+<li>
+<p>The start of a kernel-instance <strong>K</strong> global-synchronizes-with all
+operations in the work items of <strong>K</strong>.
+Note that this includes the execution of any atomic operations by the
+work items in a program using fine-grain SVM.</p>
+</li>
+<li>
+<p>All operations of all work-items of a kernel-instance <strong>K</strong>
+global-synchronize-with the event signaling the completion of <strong>K</strong>.
+Note that this also includes the execution of any atomic operations by
+the work-items in a program using fine-grained SVM.</p>
+</li>
+<li>
+<p>If a callback procedure <strong>P</strong> is registered on an event <strong>E</strong>, then <strong>E</strong>
+global-synchronizes-with all operations of <strong>P</strong>.
+Note that callback procedures are only defined for commands within host
+command-queues.</p>
+</li>
+<li>
+<p>If <strong>C</strong> is a command that waits for the completion of a user event <strong>E</strong>,
+and API function call <strong>X</strong> sets the status of <strong>E</strong> to
+CL_COMPLETE (for example, from a host thread using the
+<strong>clSetUserEventStatus</strong> function), then <strong>X</strong> global-synchronizes-with <strong>C</strong>.</p>
+</li>
+<li>
+<p>If a device enqueues a command <strong>C</strong> with the
+CLK_ENQUEUE_FLAGS_WAIT_KERNEL flag, then the end state of the parent
+kernel instance global-synchronizes-with <strong>C</strong>.</p>
+</li>
+<li>
+<p>If a work-group enqueues a command <strong>C</strong> with the
+CLK_ENQUEUE_FLAGS_WAIT_WORK_GROUP flag, then the end state of the
+work-group global-synchronizes-with <strong>C</strong>.</p>
+</li>
+</ol>
+</div>
+<div class="paragraph">
+<p>When using an out-of-order command queue, a wait on an event or a marker or
+command-queue barrier command can be used to ensure the correct ordering of
+dependent commands.
+In those cases, the wait for the event or the marker or barrier command will
+provide the necessary global-synchronizes-with relation.</p>
+</div>
+<div class="paragraph">
+<p>In this situation:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>access to shared locations or disjoint locations in a single cl_mem
+object when using atomic operations from different kernel instances
+enqueued from the host such that one or more of the atomic operations is
+a write is implementation-defined and correct behavior is not guaranteed
+except at synchronization points.</p>
+</li>
+<li>
+<p>access to shared locations or disjoint locations in a single cl_mem
+object when using atomic operations from different kernel instances
+consisting of a parent kernel and any number of child kernels enqueued
+by that kernel is guaranteed under the memory ordering rules described
+earlier in this section.</p>
+</li>
+<li>
+<p>access to shared locations or disjoint locations in a single program
+scope global variable, coarse-grained SVM allocation or fine-grained SVM
+allocation when using atomic operations from different kernel instances
+enqueued from the host to a single device is guaranteed under the memory
+ordering rules described earlier in this section.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>If fine-grain SVM is used but without support for the OpenCL 2.0 atomic
+operations, then the host and devices can concurrently read the same memory
+locations and can concurrently update non-overlapping memory regions, but
+attempts to update the same memory locations are undefined.
+Memory consistency is guaranteed at the OpenCL synchronization points
+without the need for calls to <strong>clEnqueueMapBuffer</strong> and
+<strong>clEnqueueUnmapMemObject</strong>.
+For fine-grained SVM buffers it is guaranteed that at synchronization points
+only values written by the kernel will be updated.
+No writes to fine-grained SVM buffers can be introduced that were not in the
+original program.</p>
+</div>
+<div class="paragraph">
+<p>In the remainder of this section, we discuss a few points regarding the
+ordering rules for commands with a host command queue.</p>
+</div>
+<div class="paragraph">
+<p>The OpenCL 1.2 standard describes a synchronization point as a
+kernel-instance or host program location where the contents of memory
+visible to different work-items or command-queue commands are the same.
+It also says that waiting on an event and a command-queue barrier are
+synchronization points between commands in command-queues.
+Four of the rules listed above (2, 4, 7, and 8) cover these OpenCL
+synchronization points.</p>
+</div>
+<div class="paragraph">
+<p>A map operation (<strong>clEnqueueMapBuffer</strong> or <strong>clEnqueueMapImage</strong>) performed on a
+non-SVM buffer or a coarse-grained SVM buffer is allowed to overwrite the
+entire target region with the latest runtime view of the data as seen by the
+command with which the map operation synchronizes, whether the values were
+written by the executing kernels or not.
+Any values that were changed within this region by another kernel or host
+thread while the kernel synchronizing with the map operation was executing
+may be overwritten by the map operation.</p>
+</div>
+<div class="paragraph">
+<p>Access to non-SVM cl_mem buffers and coarse-grained SVM allocations is
+ordered at synchronization points between host commands.
+In the presence of an out-of-order command queue or a set of command queues
+mapped to the same device, multiple kernel instances may execute
+concurrently on the same device.</p>
+</div>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="opencl-framework">3.4. The OpenCL Framework</h3>
+<div class="paragraph">
+<p>The OpenCL framework allows applications to use a host and one or more
+OpenCL devices as a single heterogeneous parallel computer system.
+The framework contains the following components:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>OpenCL Platform layer</strong>: The platform layer allows the host program to
+discover OpenCL devices and their capabilities and to create contexts.</p>
+</li>
+<li>
+<p><strong>OpenCL Runtime</strong>: The runtime allows the host program to manipulate
+contexts once they have been created.</p>
+</li>
+<li>
+<p><strong>OpenCL Compiler</strong>: The OpenCL compiler creates program executables that
+contain OpenCL kernels.
+SPIR-V intermediate language, OpenCL C, OpenCL C++, and OpenCL C
+language versions from earlier OpenCL specifications are supported by
+the compiler.
+Other input languages may be supported by some implementations.</p>
+</li>
+</ul>
+</div>
+<div class="sect3">
+<h4 id="_opencl_framework_mixed_version_support">3.4.1. OpenCL Framework: Mixed Version Support</h4>
+<div class="paragraph">
+<p>OpenCL supports devices with different capabilities under a single platform.
+This includes devices which conform to different versions of the OpenCL
+specification.
+There are three version identifiers to consider for an OpenCL system: the
+platform version, the version of a device, and the version(s) of the kernel
+language or IL supported on a device.</p>
+</div>
+<div class="paragraph">
+<p>The platform version indicates the version of the OpenCL runtime that is
+supported.
+This includes all of the APIs that the host can use to interact with
+resources exposed by the OpenCL runtime; including contexts, memory objects,
+devices, and command queues.</p>
+</div>
+<div class="paragraph">
+<p>The device version is an indication of the device&#8217;s capabilities separate
+from the runtime and compiler as represented by the device info returned by
+<strong>clGetDeviceInfo</strong>.
+Examples of attributes associated with the device version are resource
+limits (e.g., minimum size of local memory per compute unit) and extended
+functionality (e.g., list of supported KHR extensions).
+The version returned corresponds to the highest version of the OpenCL
+specification for which the device is conformant, but is not higher than the
+platform version.</p>
+</div>
+<div class="paragraph">
+<p>The language version for a device represents the OpenCL programming language
+features a developer can assume are supported on a given device.
+The version reported is the highest version of the language supported.</p>
+</div>
+<div class="paragraph">
+<p>Backwards compatibility is an important goal for the OpenCL standard.
+Backwards compatibility is expected such that a device will consume earlier
+versions of the SPIR-V and OpenCL C programming languages with the following
+minimum requirements:</p>
+</div>
+<div class="olist arabic">
+<ol class="arabic">
+<li>
+<p>An OpenCL 1.x device must support at least one 1.x version of the OpenCL
+C programming language.</p>
+</li>
+<li>
+<p>An OpenCL 2.0 device must support all the requirements of an OpenCL 1.x
+device in addition to the OpenCL C 2.0 programming language.
+If multiple language versions are supported, the compiler defaults to
+using the highest OpenCL 1.x language version supported for the device
+(typically OpenCL 1.2).
+To utilize the OpenCL 2.0 Kernel programming language, a programmer must
+specifically set the appropriate compiler flag (-cl-std=CL2.0).
+The language version must not be higher than the platform version, but
+may exceed the <a href="#opencl-c-version">device version</a>.</p>
+</li>
+<li>
+<p>An OpenCL 2.1 device must support all the requirements of an OpenCL 2.0
+device in addition to the SPIR-V intermediate language at version 1.0 or
+above.
+Intermediate language versioning is encoded as part of the binary object
+and no flags are required to be passed to the compiler.</p>
+</li>
+<li>
+<p>An OpenCL 2.2 device must support all the requirements of an OpenCL 2.0
+device in addition to the SPIR-V intermediate language at version 1.2 or
+above.
+Intermediate language versioning is encoded as part of the binary object and
+no flags are required to be passed to the compiler.</p>
+</li>
+</ol>
+</div>
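+<div class="paragraph">
+<p>The minimum requirements listed above can be summarized in a small lookup.
+The following is a sketch only: the type <code>sketch_version</code> and both
+functions are hypothetical helpers invented for this illustration and are not
+part of the OpenCL API. It covers only the device versions named in the list
+(1.x, 2.0, 2.1 and 2.2) and reports no requirement for any other version.</p>
+</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical major.minor pair used only by this sketch. */
typedef struct {
    int major;
    int minor;
} sketch_version;

/* True for device versions 2.0, 2.1 and 2.2: rule 2 requires OpenCL C 2.0,
 * and rules 3 and 4 inherit every requirement of an OpenCL 2.0 device. */
bool device_requires_opencl_c_2_0(sketch_version device)
{
    return device.major == 2 && device.minor >= 0 && device.minor <= 2;
}

/* Writes the minimum SPIR-V version a device must accept into *spirv and
 * returns true, or returns false when the list above names no SPIR-V
 * requirement for that device version. */
bool min_required_spirv(sketch_version device, sketch_version *spirv)
{
    assert(spirv != NULL);
    if (device.major == 2 && device.minor == 1) {
        spirv->major = 1;   /* rule 3: SPIR-V 1.0 or above */
        spirv->minor = 0;
        return true;
    }
    if (device.major == 2 && device.minor == 2) {
        spirv->major = 1;   /* rule 4: SPIR-V 1.2 or above */
        spirv->minor = 2;
        return true;
    }
    return false;           /* 1.x and 2.0: no SPIR-V requirement listed */
}
```

+<div class="paragraph">
+<p>A real application would obtain the device and language versions from
+<strong>clGetDeviceInfo</strong> rather than hard-coding them; the sketch only encodes
+the table of requirements.</p>
+</div>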
+</div>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="opencl-platform-layer">4. The OpenCL Platform Layer</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>This section describes the OpenCL platform layer, which implements
+platform-specific features that allow applications to query OpenCL devices
+and device configuration information, and to create OpenCL contexts using one or
+more devices.</p>
+</div>
+<div class="sect2">
+<h3 id="_querying_platform_info">4.1. Querying Platform Info</h3>
+<div class="paragraph">
+<p>The list of platforms available can be obtained using the following
+function.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">cl_int clGetPlatformIDs(cl_uint num_entries,
+                        cl_platform_id *platforms,
+                        cl_uint *num_platforms)</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p><em>num_entries</em> is the number of cl_platform_id entries that can be added to
+<em>platforms</em>.
+If <em>platforms</em> is not <code>NULL</code>, then <em>num_entries</em> must be greater than zero.</p>
+</div>
+<div class="paragraph">
+<p><em>platforms</em> returns a list of OpenCL platforms found.
+The cl_platform_id values returned in <em>platforms</em> can be used to identify a
+specific OpenCL platform.
+If the <em>platforms</em> argument is <code>NULL</code>, this argument is ignored.
+The number of OpenCL platforms returned is the minimum of the value
+specified by <em>num_entries</em> and the number of OpenCL platforms available.</p>
+</div>
+<div class="paragraph">
+<p><em>num_platforms</em> returns the number of OpenCL platforms available.
+If <em>num_platforms</em> is <code>NULL</code>, this argument is ignored.</p>
+</div>
+<div class="paragraph">
+<p><strong>clGetPlatformIDs</strong> returns CL_SUCCESS if the function is executed
+successfully.
+Otherwise, it returns one of the following errors:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>CL_INVALID_VALUE if <em>num_entries</em> is equal to zero and <em>platforms</em> is
+not <code>NULL</code> or if both <em>num_platforms</em> and <em>platforms</em> are <code>NULL</code>.</p>
+</li>
+<li>
+<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
+required by the OpenCL implementation on the host.</p>
+</li>
+</ul>
+</div>
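+<div class="paragraph">
+<p>The argument rules for <strong>clGetPlatformIDs</strong> can be modelled in a few lines.
+This is a sketch of the checks described above, not the implementation of any
+OpenCL runtime: the helper names and the <code>SKETCH_</code> constants are
+hypothetical, and the constants merely mirror the values that the standard
+headers assign to CL_SUCCESS (0) and CL_INVALID_VALUE (-30).</p>
+</div>

```c
#include <stddef.h>

/* Local stand-ins so the sketch compiles without the OpenCL headers. */
enum {
    SKETCH_CL_SUCCESS = 0,
    SKETCH_CL_INVALID_VALUE = -30
};

/* Models the CL_INVALID_VALUE conditions listed for clGetPlatformIDs. */
int check_platform_ids_args(unsigned num_entries,
                            const void *platforms,
                            const unsigned *num_platforms)
{
    /* num_entries is zero although an output array was supplied */
    if (num_entries == 0 && platforms != NULL)
        return SKETCH_CL_INVALID_VALUE;
    /* both output arguments are NULL, so nothing could be returned */
    if (platforms == NULL && num_platforms == NULL)
        return SKETCH_CL_INVALID_VALUE;
    return SKETCH_CL_SUCCESS;
}

/* Number of entries written to platforms: the minimum of num_entries and
 * the number of platforms available, or zero when platforms is NULL. */
unsigned platform_ids_written(unsigned num_entries,
                              unsigned available,
                              const void *platforms)
{
    if (platforms == NULL)
        return 0;
    return num_entries < available ? num_entries : available;
}
```

+<div class="paragraph">
+<p>Under these rules, a call with <em>num_entries</em> equal to zero, <em>platforms</em>
+equal to <code>NULL</code> and a valid <em>num_platforms</em> pointer is accepted, which is
+what allows an application to query the platform count first and size its
+array for a second call.</p>
+</div>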
+<div class="paragraph">
+<p>The function</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">cl_int clGetPlatformInfo(cl_platform_id platform,
+                         cl_platform_info param_name,
+                         size_t param_value_size,
+                         <span class="directive">void</span> *param_value,
+                         size_t *param_value_size_ret)</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>gets specific information about the OpenCL platform.
+The information that can be queried using <strong>clGetPlatformInfo</strong> is specified
+in the <a href="#platform-queries-table">Platform Queries</a> table.</p>
+</div>
+<div class="paragraph">
+<p><em>platform</em> refers to the platform ID returned by <strong>clGetPlatformIDs</strong> or can
+be <code>NULL</code>.
+If <em>platform</em> is <code>NULL</code>, the behavior is implementation-defined.</p>
+</div>
+<div class="paragraph">
+<p><em>param_name</em> is an enumeration constant that identifies the platform
+information being queried.
+It can be one of the following values as specified in the
+<a href="#platform-queries-table">Platform Queries</a> table.</p>
+</div>
+<div class="paragraph">
+<p><em>param_value</em> is a pointer to the memory location where the appropriate values for a
+given <em>param_name</em>, as specified in the <a href="#platform-queries-table">Platform
+Queries</a> table, will be returned.
+If <em>param_value</em> is <code>NULL</code>, it is ignored.</p>
+</div>
+<div class="paragraph">
+<p><em>param_value_size</em> specifies the size in bytes of memory pointed to by
+<em>param_value</em>.
+This size in bytes must be ≥ size of return type specified in the
+<a href="#platform-queries-table">Platform Queries</a> table.</p>
+</div>
+<div class="paragraph">
+<p><em>param_value_size_ret</em> returns the actual size in bytes of data being
+queried by <em>param_name</em>.
+If <em>param_value_size_ret</em> is <code>NULL</code>, it is ignored.</p>
+</div>
+<table id="platform-queries-table" class="tableblock frame-all grid-all spread">
+<caption class="title">Table 3. OpenCL Platform Queries</caption>
+<colgroup>
+<col style="width: 50%;">
+<col style="width: 10%;">
+<col style="width: 40%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top"><strong>cl_platform_info</strong></th>
+<th class="tableblock halign-left valign-top">Return Type</th>
+<th class="tableblock halign-left valign-top">Description</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_PLATFORM_PROFILE</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">char[]<sup>1</sup></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">OpenCL profile string.
+        Returns the profile name supported by the implementation.
+        The profile name returned can be one of the following strings:</p>
+<p class="tableblock">        FULL_PROFILE - if the implementation supports the OpenCL
+        specification (functionality defined as part of the core
+        specification and does not require any extensions to be supported).</p>
+<p class="tableblock">        EMBEDDED_PROFILE - if the implementation supports the OpenCL
+        embedded profile.
+        The embedded profile is defined to be a subset for each version of
+        OpenCL.
+        The embedded profile for OpenCL 2.2 is described in
+        <a href="#opencl-embedded-profile">OpenCL Embedded Profile</a>.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_PLATFORM_VERSION</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">char[]</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">OpenCL version string.
+        Returns the OpenCL version supported by the implementation.
+        This version string has the following format:</p>
+<p class="tableblock">        <em>OpenCL&lt;space&gt;&lt;major_version.minor_version&gt;&lt;space&gt;&lt;platform-specific
+        information&gt;</em></p>
+<p class="tableblock">        The <em>major_version.minor_version</em> value returned will be 2.2.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_PLATFORM_NAME</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">char[]</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Platform name string.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_PLATFORM_VENDOR</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">char[]</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Platform vendor string.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_PLATFORM_EXTENSIONS</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">char[]</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns a space separated list of extension names (the extension
+        names themselves do not contain any spaces) supported by the
+        platform.
+        Each extension that is supported by all devices associated with this
+        platform must be reported here.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_PLATFORM_HOST_TIMER_RESOLUTION</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_ulong</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the resolution of the host timer in nanoseconds as used by
+        <strong>clGetDeviceAndHostTimer</strong>.</p></td>
+</tr>
+</tbody>
+</table>
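+<div class="paragraph">
+<p>As an illustration of the CL_PLATFORM_VERSION format in the table above,
+the sketch below extracts the major and minor numbers from a version string
+that has already been obtained from <strong>clGetPlatformInfo</strong>.
+The helper name <code>parse_opencl_version</code> is hypothetical, and the parsing is
+deliberately lenient (it relies on <code>sscanf</code> and does not validate the
+platform-specific suffix).</p>
+</div>

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Parses "OpenCL <major>.<minor> <platform-specific information>".
   Returns 1 and fills *major and *minor on success, 0 otherwise.
   Hypothetical helper; not part of the OpenCL API. */
int parse_opencl_version(const char *version, unsigned *major, unsigned *minor)
{
    assert(major != NULL && minor != NULL);
    if (version == NULL)
        return 0;
    /* The literal prefix must match; both numbers must convert. */
    return sscanf(version, "OpenCL %u.%u", major, minor) == 2;
}
```

+<div class="paragraph">
+<p>A caller would typically compare the parsed pair against the minimum
+version it needs before using newer entry points.</p>
+</div>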
+<div class="paragraph">
+<p><strong>clGetPlatformInfo</strong> returns CL_SUCCESS if the function is executed
+successfully.
+Otherwise, it returns one of the following errors<sup>2</sup>:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>CL_INVALID_PLATFORM if <em>platform</em> is not a valid platform.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if <em>param_name</em> is not one of the supported values or
+if size in bytes specified by <em>param_value_size</em> is &lt; size of return
+type as specified in the <a href="#platform-queries-table">OpenCL Platform
+Queries</a> table, and <em>param_value</em> is not a <code>NULL</code> value.</p>
+</li>
+<li>
+<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
+required by the OpenCL implementation on the host.</p>
+<div class="dlist">
+<dl>
+<dt class="hdlist1">1</dt>
+<dd>
+<p>A null terminated string is returned by OpenCL query function calls if
+the return type of the information being queried is a char[].</p>
+</dd>
+<dt class="hdlist1">2</dt>
+<dd>
+<p>The OpenCL specification does not describe the order of precedence for
+error codes returned by API calls.</p>
+</dd>
+</dl>
+</div>
+</li>
+</ul>
+</div>
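+<div class="paragraph">
+<p>Because CL_PLATFORM_EXTENSIONS returns a space separated list, a plain
+substring search can give false positives (for example, one extension name
+that is a prefix of another).
+The following sketch matches whole names only.
+The helper name <code>has_extension</code> is hypothetical, and the extension list is
+assumed to have been fetched already with <strong>clGetPlatformInfo</strong>.</p>
+</div>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if `name` appears as a complete, space-delimited token in
   `ext_list`, 0 otherwise. Hypothetical helper; not part of the OpenCL API. */
int has_extension(const char *ext_list, const char *name)
{
    assert(name != NULL);
    size_t len = strlen(name);
    if (ext_list == NULL || len == 0)
        return 0;

    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        /* Accept the hit only if it is bounded by spaces or string ends. */
        int starts = (p == ext_list) || (p[-1] == ' ');
        int ends   = (p[len] == '\0') || (p[len] == ' ');
        if (starts && ends)
            return 1;
        p += len;  /* keep scanning after this partial match */
    }
    return 0;
}
```

+<div class="paragraph">
+<p>The same check applies to any other query that returns a space separated
+list of names.</p>
+</div>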
+</div>
+<div class="sect2">
+<h3 id="platform-querying-devices">4.2. Querying Devices</h3>
+<div class="paragraph">
+<p>The list of devices available on a platform can be obtained using the
+following function<sup>3</sup>.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">cl_int clGetDeviceIDs(cl_platform_id platform,
+                      cl_device_type device_type,
+                      cl_uint num_entries,
+                      cl_device_id * devices,
+                      cl_uint *num_devices)</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p><em>platform</em> refers to the platform ID returned by <strong>clGetPlatformIDs</strong> or can
+be <code>NULL</code>.
+If <em>platform</em> is <code>NULL</code>, the behavior is implementation-defined.</p>
+</div>
+<div class="paragraph">
+<p><em>device_type</em> is a bitfield that identifies the type of OpenCL device.
+The <em>device_type</em> can be used to query specific OpenCL devices or all OpenCL
+devices available.
+The valid values for <em>device_type</em> are specified in the
+<a href="#device-categories-table">Device Categories</a> table.</p>
+</div>
+<div class="dlist">
+<dl>
+<dt class="hdlist1">3</dt>
+<dd>
+<p><strong>clGetDeviceIDs</strong> may return all or a subset of the actual physical
+devices present in the platform and that match <em>device_type</em>.</p>
+</dd>
+</dl>
+</div>
+<table id="device-categories-table" class="tableblock frame-all grid-all spread">
+<caption class="title">Table 4. List of OpenCL Device Categories</caption>
+<colgroup>
+<col style="width: 50%;">
+<col style="width: 50%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top"><strong>cl_device_type</strong></th>
+<th class="tableblock halign-left valign-top">Description</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_DEVICE_TYPE_CPU</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">An OpenCL device that is the host processor.
+    The host processor runs the OpenCL implementations and is a single or
+    multi-core CPU.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_DEVICE_TYPE_GPU</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">An OpenCL device that is a GPU.
+    By this we mean that the device can also be used to accelerate a 3D API
+    such as OpenGL or DirectX.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_DEVICE_TYPE_ACCELERATOR</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Dedicated OpenCL accelerators (for example the IBM CELL Blade).
+    These devices communicate with the host processor using a peripheral
+    interconnect such as PCIe.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_DEVICE_TYPE_CUSTOM</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Dedicated accelerators that do not support programs written in an OpenCL
+    kernel language.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_DEVICE_TYPE_DEFAULT</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">The default OpenCL device in the system.
+    The default device cannot be a <strong>CL_DEVICE_TYPE_CUSTOM</strong> device.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_DEVICE_TYPE_ALL</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">All OpenCL devices available in the system except
+    <strong>CL_DEVICE_TYPE_CUSTOM</strong> devices.</p></td>
+</tr>
+</tbody>
+</table>
+<div class="paragraph">
+<p><em>num_entries</em> is the number of cl_device_id entries that can be added to
+<em>devices</em>.
+If <em>devices</em> is not <code>NULL</code>, <em>num_entries</em> must be greater than zero.</p>
+</div>
+<div class="paragraph">
+<p><em>devices</em> returns a list of OpenCL devices found.
+The cl_device_id values returned in <em>devices</em> can be used to identify a
+specific OpenCL device.
+If the <em>devices</em> argument is <code>NULL</code>, this argument is ignored.
+The number of OpenCL devices returned is the minimum of the value specified
+by <em>num_entries</em> and the number of OpenCL devices whose type matches
+<em>device_type</em>.</p>
+</div>
+<div class="paragraph">
+<p><em>num_devices</em> returns the number of OpenCL devices available that match
+<em>device_type</em>.
+If <em>num_devices</em> is <code>NULL</code>, this argument is ignored.</p>
+</div>
+<div class="paragraph">
+<p><strong>clGetDeviceIDs</strong> returns CL_SUCCESS if the function is executed
+successfully.
+Otherwise, it returns one of the following errors:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>CL_INVALID_PLATFORM if <em>platform</em> is not a valid platform.</p>
+</li>
+<li>
+<p>CL_INVALID_DEVICE_TYPE if <em>device_type</em> is not a valid value.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if <em>num_entries</em> is equal to zero and <em>devices</em> is not
+<code>NULL</code> or if both <em>num_devices</em> and <em>devices</em> are <code>NULL</code>.</p>
+</li>
+<li>
+<p>CL_DEVICE_NOT_FOUND if no OpenCL devices that matched <em>device_type</em> were
+found.</p>
+</li>
+<li>
+<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
+by the OpenCL implementation on the device.</p>
+</li>
+<li>
+<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
+required by the OpenCL implementation on the host.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>The application can query specific capabilities of the OpenCL device(s)
+returned by <strong>clGetDeviceIDs</strong>.
+This can be used by the application to determine which device(s) to use.</p>
+</div>
+<div class="paragraph">
+<p>The function</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">cl_int clGetDeviceInfo(cl_device_id device,
+                       cl_device_info param_name,
+                       size_t param_value_size,
+                       <span class="directive">void</span> *param_value,
+                       size_t *param_value_size_ret)</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>gets specific information about an OpenCL device.</p>
+</div>
+<div class="paragraph">
+<p><em>device</em> may be a device returned by <strong>clGetDeviceIDs</strong> or a sub-device
+created by <strong>clCreateSubDevices</strong>.
+If <em>device</em> is a sub-device, the specific information for the sub-device
+will be returned.
+The information that can be queried using <strong>clGetDeviceInfo</strong> is specified in
+the <a href="#device-queries-table">Device Queries</a> table.</p>
+</div>
+<div class="paragraph">
+<p><em>param_name</em> is an enumeration constant that identifies the device
+information being queried.
+It can be one of the following values as specified in the
+<a href="#device-queries-table">Device Queries</a> table.</p>
+</div>
+<div class="paragraph">
+<p><em>param_value</em> is a pointer to the memory location where the appropriate values for a
+given <em>param_name</em>, as specified in the <a href="#device-queries-table">Device
+Queries</a> table, will be returned.
+If <em>param_value</em> is <code>NULL</code>, it is ignored.</p>
+</div>
+<div class="paragraph">
+<p><em>param_value_size</em> specifies the size in bytes of memory pointed to by
+<em>param_value</em>.
+This size in bytes must be ≥ size of return type specified in the
+<a href="#device-queries-table">Device Queries</a> table.</p>
+</div>
+<div class="paragraph">
+<p><em>param_value_size_ret</em> returns the actual size in bytes of data being
+queried by <em>param_name</em>.
+If <em>param_value_size_ret</em> is <code>NULL</code>, it is ignored.</p>
+</div>
+<table id="device-queries-table" class="tableblock frame-all grid-all spread">
+<caption class="title">Table 5. OpenCL Device Queries</caption>
+<colgroup>
+<col style="width: 30%;">
+<col style="width: 20%;">
+<col style="width: 50%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top"><strong>cl_device_info</strong></th>
+<th class="tableblock halign-left valign-top">Return Type</th>
+<th class="tableblock halign-left valign-top">Description</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_TYPE</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_device_type</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">The OpenCL device type.
+        Currently supported values are:</p>
+<p class="tableblock">        CL_DEVICE_TYPE_CPU, CL_DEVICE_TYPE_GPU, CL_DEVICE_TYPE_ACCELERATOR,
+        CL_DEVICE_TYPE_DEFAULT, a combination of the above types or
+        CL_DEVICE_TYPE_CUSTOM.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_VENDOR_ID</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">A unique device vendor identifier.
+        An example of a unique device identifier could be the PCIe ID.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_MAX_COMPUTE_UNITS</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">The number of parallel compute units on the OpenCL device.
+        A work-group executes on a single compute unit.
+        The minimum value is 1.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Maximum dimensions that specify the global and local work-item IDs
+        used by the data parallel execution model. (Refer to
+        <strong>clEnqueueNDRangeKernel</strong>).
+        The minimum value is 3 for devices that are not of type
+        CL_DEVICE_TYPE_CUSTOM.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_MAX_WORK_ITEM_SIZES</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">size_t []</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Maximum number of work-items that can be specified in each dimension
+        of the work-group to <strong>clEnqueueNDRangeKernel</strong>.</p>
+<p class="tableblock">        Returns <em>n</em> size_t entries, where <em>n</em> is the value returned by the
+        query for CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS.</p>
+<p class="tableblock">        The minimum value is (1, 1, 1) for devices that are not of type
+        CL_DEVICE_TYPE_CUSTOM.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_MAX_WORK_GROUP_SIZE</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">size_t</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Maximum number of work-items in a work-group that a device is
+        capable of executing on a single compute unit, for any given
+        kernel-instance running on the device. (Refer also to
+        <strong>clEnqueueNDRangeKernel</strong> and CL_KERNEL_WORK_GROUP_SIZE).
+        The minimum value is 1.
+        The returned value is an upper limit and will not necessarily
+        maximize performance.
+        This maximum may be larger than supported by a specific kernel
+        (refer to the CL_KERNEL_WORK_GROUP_SIZE query of <strong>clGetKernelWorkGroupInfo</strong>).</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR  <br>
+  CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT <br>
+  CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT   <br>
+  CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG  <br>
+  CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT <br>
+  CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE<br>
+  CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Preferred native vector width size for built-in scalar types that
+        can be put into vectors.
+        The vector width is defined as the number of scalar elements that
+        can be stored in the vector.</p>
+<p class="tableblock">        If double precision is not supported,
+        CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE must return 0.</p>
+<p class="tableblock">        If the <strong>cl_khr_fp16</strong> extension is not supported,
+        CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF must return 0.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR  <br>
+  CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT <br>
+  CL_DEVICE_NATIVE_VECTOR_WIDTH_INT   <br>
+  CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG  <br>
+  CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT <br>
+  CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE<br>
+  CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the native ISA vector width.
+        The vector width is defined as the number of scalar elements that
+        can be stored in the vector.</p>
+<p class="tableblock">        If double precision is not supported,
+        CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE must return 0.</p>
+<p class="tableblock">        If the <strong>cl_khr_fp16</strong> extension is not supported,
+        CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF must return 0.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_MAX_CLOCK_FREQUENCY</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Clock frequency of the device in MHz.
+        The meaning of this value is implementation-defined.
+        For devices with multiple clock domains, the clock frequency for any
+        of the clock domains may be returned.
+        For devices that dynamically change frequency for power or thermal
+        reasons, the returned clock frequency may be any valid frequency.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_ADDRESS_BITS</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">The default compute device address space size of the global address
+        space specified as an unsigned integer value in bits.
+        Currently supported values are 32 or 64 bits.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_MAX_MEM_ALLOC_SIZE</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_ulong</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Max size of memory object allocation in bytes.
+        The minimum value is max(min(1024 × 1024 × 1024, 1/4<sup>th</sup>
+        of CL_DEVICE_GLOBAL_MEM_SIZE), 32 × 1024 × 1024) for
+        devices that are not of type CL_DEVICE_TYPE_CUSTOM.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_IMAGE_SUPPORT</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_bool</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Is CL_TRUE if images are supported by the OpenCL device and CL_FALSE
+        otherwise.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_MAX_READ_IMAGE_ARGS</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Max number of image object arguments of a kernel declared with the
+        read_only qualifier.
+        The minimum value is 128 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_MAX_WRITE_IMAGE_ARGS</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Max number of image object arguments of a kernel declared with the
+        write_only qualifier.
+        The minimum value is 64 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_MAX_READ_WRITE_IMAGE_ARGS<sup>4</sup></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Max number of image object arguments of a kernel declared with the
+        write_only or read_write qualifier.
+        The minimum value is 64 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_IL_VERSION</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">char[]</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">The intermediate languages that can be supported by
+        <strong>clCreateProgramWithIL</strong> for this device.
+        Returns a space-separated list of IL version strings of the form
+        &lt;IL_Prefix&gt;_&lt;Major_Version&gt;.&lt;Minor_Version&gt;.
+        For OpenCL 2.2, SPIR-V is a required IL prefix.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_IMAGE2D_MAX_WIDTH</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">size_t</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Max width of 2D image or 1D image not created from a buffer object
+        in pixels.</p>
+<p class="tableblock">        The minimum value is 16384 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_IMAGE2D_MAX_HEIGHT</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">size_t</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Max height of 2D image in pixels.</p>
+<p class="tableblock">        The minimum value is 16384 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_IMAGE3D_MAX_WIDTH</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">size_t</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Max width of 3D image in pixels.</p>
+<p class="tableblock">        The minimum value is 2048 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_IMAGE3D_MAX_HEIGHT</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">size_t</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Max height of 3D image in pixels.</p>
+<p class="tableblock">        The minimum value is 2048 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_IMAGE3D_MAX_DEPTH</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">size_t</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Max depth of 3D image in pixels.</p>
+<p class="tableblock">        The minimum value is 2048 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_IMAGE_MAX_BUFFER_SIZE</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">size_t</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Max number of pixels for a 1D image created from a buffer object.</p>
+<p class="tableblock">        The minimum value is 65536 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_IMAGE_MAX_ARRAY_SIZE</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">size_t</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Max number of images in a 1D or 2D image array.</p>
+<p class="tableblock">        The minimum value is 2048 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_MAX_SAMPLERS</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Maximum number of samplers that can be used in a kernel.</p>
+<p class="tableblock">        The minimum value is 16 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_IMAGE_PITCH_ALIGNMENT</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">The row pitch alignment size in pixels for 2D images created from a
+        buffer.
+        The value returned must be a power of 2.</p>
+<p class="tableblock">        If the device does not support images, this value must be 0.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">This query should be used when a 2D image is created from a buffer
+        which was created using CL_MEM_USE_HOST_PTR.
+        The value returned must be a power of 2.</p>
+<p class="tableblock">        This query specifies the minimum alignment in pixels of the host_ptr
+        specified to <strong>clCreateBuffer</strong>.</p>
+<p class="tableblock">        If the device does not support images, this value must be 0.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_MAX_PIPE_ARGS</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">The maximum number of pipe objects that can be passed as arguments
+        to a kernel.
+        The minimum value is 16.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_PIPE_MAX_ACTIVE_RESERVATIONS</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">The maximum number of reservations that can be active for a pipe per
+        work-item in a kernel.
+        A work-group reservation is counted as one reservation per
+        work-item.
+        The minimum value is 1.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_PIPE_MAX_PACKET_SIZE</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">The maximum size of pipe packet in bytes.
+        The minimum value is 1024 bytes.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_MAX_PARAMETER_SIZE</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">size_t</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Max size in bytes of all arguments that can be passed to a kernel.</p>
+<p class="tableblock">        The minimum value is 1024 for devices that are not of type
+        CL_DEVICE_TYPE_CUSTOM.
+        For this minimum value, only a maximum of 128 arguments can be
+        passed to a kernel.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_MEM_BASE_ADDR_ALIGN</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Alignment requirement (in bits) for sub-buffer offsets.
+        The minimum value is the size (in bits) of the largest OpenCL
+        built-in data type supported by the device (long16 in FULL profile,
+        long16 or int16 in EMBEDDED profile) for devices that are not of
+        type CL_DEVICE_TYPE_CUSTOM.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_SINGLE_FP_CONFIG<sup>5</sup></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_device_fp_config</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Describes single precision floating-point capability of the device.
+        This is a bit-field that describes one or more of the following
+        values:</p>
+<p class="tableblock">        CL_FP_DENORM - denorms are supported</p>
+<p class="tableblock">        CL_FP_INF_NAN - INF and quiet NaNs are supported.</p>
+<p class="tableblock">        CL_FP_ROUND_TO_NEAREST - round to nearest even rounding mode
+        supported</p>
+<p class="tableblock">        CL_FP_ROUND_TO_ZERO - round to zero rounding mode supported</p>
+<p class="tableblock">        CL_FP_ROUND_TO_INF - round to positive and negative infinity
+        rounding modes supported</p>
+<p class="tableblock">        CL_FP_FMA - IEEE754-2008 fused multiply-add is supported.</p>
+<p class="tableblock">        CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT - divide and sqrt are correctly
+        rounded as defined by the IEEE754 specification.</p>
+<p class="tableblock">        CL_FP_SOFT_FLOAT - Basic floating-point operations (such as
+        addition, subtraction, multiplication) are implemented in software.</p>
+<p class="tableblock">        For the full profile, the mandated minimum floating-point capability
+        for devices that are not of type CL_DEVICE_TYPE_CUSTOM is:
+        CL_FP_ROUND_TO_NEAREST | CL_FP_INF_NAN.</p>
+<p class="tableblock">        For the embedded profile, see section 10.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_DOUBLE_FP_CONFIG<sup>6</sup></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_device_fp_config</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Describes double precision floating-point capability of the OpenCL
+        device.
+        This is a bit-field that describes one or more of the following
+        values:</p>
+<p class="tableblock">        CL_FP_DENORM - denorms are supported</p>
+<p class="tableblock">        CL_FP_INF_NAN - INF and NaNs are supported.</p>
+<p class="tableblock">        CL_FP_ROUND_TO_NEAREST - round to nearest even rounding mode
+        supported.</p>
+<p class="tableblock">        CL_FP_ROUND_TO_ZERO - round to zero rounding mode supported.</p>
+<p class="tableblock">        CL_FP_ROUND_TO_INF - round to positive and negative infinity
+        rounding modes supported.</p>
+<p class="tableblock">        CL_FP_FMA - IEEE754-2008 fused multiply-add is supported.</p>
+<p class="tableblock">        CL_FP_SOFT_FLOAT - Basic floating-point operations (such as
+        addition, subtraction, multiplication) are implemented in software.</p>
+<p class="tableblock">        Double precision is an optional feature so the mandated minimum
+        double precision floating-point capability is 0.</p>
+<p class="tableblock">        If double precision is supported by the device, then the minimum
+        double precision floating-point capability must be:<br>
+        CL_FP_FMA |<br>
+        CL_FP_ROUND_TO_NEAREST |<br>
+        CL_FP_INF_NAN |<br>
+        CL_FP_DENORM.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_GLOBAL_MEM_CACHE_TYPE</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_device_mem_cache_type</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Type of global memory cache supported.
+        Valid values are: CL_NONE, CL_READ_ONLY_CACHE and
+        CL_READ_WRITE_CACHE.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Size of global memory cache line in bytes.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_GLOBAL_MEM_CACHE_SIZE</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_ulong</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Size of global memory cache in bytes.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_GLOBAL_MEM_SIZE</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_ulong</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Size of global device memory in bytes.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_ulong</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Max size in bytes of a constant buffer allocation.
+        The minimum value is 64 KB for devices that are not of type
+        CL_DEVICE_TYPE_CUSTOM.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_MAX_CONSTANT_ARGS</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Max number of arguments declared with the <code>__constant</code> qualifier
+        in a kernel.
+        The minimum value is 8 for devices that are not of type
+        CL_DEVICE_TYPE_CUSTOM.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_MAX_GLOBAL_VARIABLE_SIZE</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">size_t</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">The maximum number of bytes of storage that may be allocated for any
+        single variable in program scope or inside a function in an OpenCL
+        kernel language declared in the global address space.</p>
+<p class="tableblock">        The minimum value is 64 KB.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_GLOBAL_VARIABLE_PREFERRED_TOTAL_SIZE</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">size_t</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Maximum preferred total size, in bytes, of all program variables in
+        the global address space.
+        This is a performance hint.
+        An implementation may place such variables in storage with optimized
+        device access.
+        This query returns the capacity of such storage.
+        The minimum value is 0.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_LOCAL_MEM_TYPE</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_device_local_mem_type</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Type of local memory supported.
+        This can be set to CL_LOCAL implying dedicated local memory storage
+        such as SRAM, or CL_GLOBAL.</p>
+<p class="tableblock">        For custom devices, CL_NONE can also be returned indicating no local
+        memory support.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_LOCAL_MEM_SIZE</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_ulong</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Size of local memory region in bytes.
+        The minimum value is 32 KB for devices that are not of type
+        CL_DEVICE_TYPE_CUSTOM.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_ERROR_CORRECTION_SUPPORT</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_bool</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Is CL_TRUE if the device implements error correction for all
+        accesses to compute device memory (global and constant).
+        Is CL_FALSE if the device does not implement such error correction.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_PROFILING_TIMER_RESOLUTION</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">size_t</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Describes the resolution of the device timer.
+        This is measured in nanoseconds.
+        Refer to <a href="#profiling-operations">Profiling Operations</a> for details.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_ENDIAN_LITTLE</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_bool</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Is CL_TRUE if the OpenCL device is a little endian device and
+        CL_FALSE otherwise.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_AVAILABLE</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_bool</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Is CL_TRUE if the device is available and CL_FALSE otherwise.
+        A device is considered to be available if the device can be expected
+        to successfully execute commands enqueued to the device.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_COMPILER_AVAILABLE</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_bool</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Is CL_FALSE if the implementation does not have a compiler available
+        to compile the program source.</p>
+<p class="tableblock">        Is CL_TRUE if the compiler is available.
+        This can be CL_FALSE for the embedded platform profile only.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_LINKER_AVAILABLE</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_bool</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Is CL_FALSE if the implementation does not have a linker available.
+        Is CL_TRUE if the linker is available.</p>
+<p class="tableblock">        This can be CL_FALSE for the embedded platform profile only.</p>
+<p class="tableblock">        This must be CL_TRUE if CL_DEVICE_COMPILER_AVAILABLE is CL_TRUE.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_EXECUTION_CAPABILITIES</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_device_exec_capabilities</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Describes the execution capabilities of the device.
+        This is a bit-field that describes one or more of the following
+        values:</p>
+<p class="tableblock">        CL_EXEC_KERNEL - The OpenCL device can execute OpenCL kernels.</p>
+<p class="tableblock">        CL_EXEC_NATIVE_KERNEL - The OpenCL device can execute native
+        kernels.</p>
+<p class="tableblock">        The mandated minimum capability is: CL_EXEC_KERNEL.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_QUEUE_ON_HOST_PROPERTIES<sup>7</sup></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_command_queue_properties</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Describes the on host command-queue properties supported by the
+        device.
+        This is a bit-field that describes one or more of the following
+        values:</p>
+<p class="tableblock">        CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE<br>
+        CL_QUEUE_PROFILING_ENABLE</p>
+<p class="tableblock">        These properties are described in the <a href="#queue-properties-table">Queue Properties</a> table.</p>
+<p class="tableblock">        The mandated minimum capability is: CL_QUEUE_PROFILING_ENABLE.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_QUEUE_ON_DEVICE_PROPERTIES</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_command_queue_properties</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Describes the on device command-queue properties supported by the
+        device.
+        This is a bit-field that describes one or more of the following
+        values:</p>
+<p class="tableblock">        CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE<br>
+        CL_QUEUE_PROFILING_ENABLE</p>
+<p class="tableblock">        These properties are described in the <a href="#queue-properties-table">Queue Properties</a> table.</p>
+<p class="tableblock">        The mandated minimum capability is:
+        CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_PROFILING_ENABLE.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_QUEUE_ON_DEVICE_PREFERRED_SIZE</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">The size of the device queue in bytes preferred by the
+        implementation.
+        Applications should use this size for the device queue to ensure
+        good performance.</p>
+<p class="tableblock">        The minimum value is 16 KB.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_QUEUE_ON_DEVICE_MAX_SIZE</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">The maximum size of the device queue in bytes.
+        The minimum value is 256 KB for the full profile and 64 KB for the
+        embedded profile.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_MAX_ON_DEVICE_QUEUES</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">The maximum number of device queues that can be created for this
+        device in a single context.</p>
+<p class="tableblock">        The minimum value is 1.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_MAX_ON_DEVICE_EVENTS</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">The maximum number of events in use by a device queue.
+        These refer to events returned by the <code>enqueue_</code> built-in functions
+        to a device queue or user events returned by the <code>create_user_event</code>
+        built-in function that have not been released.</p>
+<p class="tableblock">        The minimum value is 1024.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_BUILT_IN_KERNELS</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">char[]</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">A semi-colon separated list of built-in kernels supported by the
+        device.
+        An empty string is returned if no built-in kernels are supported by
+        the device.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_PLATFORM</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_platform_id</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">The platform associated with this device.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_NAME</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">char[]</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Device name string.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_VENDOR</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">char[]</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Vendor name string.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DRIVER_VERSION</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">char[]</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">OpenCL software driver version string.
+        Follows a vendor-specific format.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_PROFILE<sup>8</sup></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">char[]</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">OpenCL profile string.
+        Returns the profile name supported by the device.
+        The profile name returned can be one of the following strings:</p>
+<p class="tableblock">        FULL_PROFILE - if the device supports the OpenCL specification
+        (functionality defined as part of the core specification and does
+        not require any extensions to be supported).</p>
+<p class="tableblock">        EMBEDDED_PROFILE - if the device supports the OpenCL embedded
+        profile.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_VERSION</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">char[]</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">OpenCL version string.
+        Returns the OpenCL version supported by the device. This version
+        string has the following format:</p>
+<p class="tableblock">        <em>OpenCL&lt;space&gt;&lt;major_version.minor_version&gt;&lt;space&gt;&lt;vendor-specific
+        information&gt;</em></p>
+<p class="tableblock">        The major_version.minor_version value returned will be 2.2.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_OPENCL_C_VERSION</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">char[]</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">OpenCL C version string.
+        Returns the highest OpenCL C version supported by the compiler for
+        this device that is not of type CL_DEVICE_TYPE_CUSTOM.
+        This version string has the following format:</p>
+<p class="tableblock">        <em>OpenCL&lt;space&gt;C&lt;space&gt;&lt;major_version.minor_version&gt;&lt;space&gt;&lt;vendor-specific
+        information&gt;</em></p>
+<p class="tableblock">        The major_version.minor_version value returned must be 2.0 if
+        CL_DEVICE_VERSION is OpenCL 2.0.</p>
+<p class="tableblock">        The major_version.minor_version value returned must be 1.2 if
+        CL_DEVICE_VERSION is OpenCL 1.2.</p>
+<p class="tableblock">        The major_version.minor_version value returned must be 1.1 if
+        CL_DEVICE_VERSION is OpenCL 1.1.</p>
+<p class="tableblock">        The major_version.minor_version value returned can be 1.0 or 1.1 if
+        CL_DEVICE_VERSION is OpenCL 1.0.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_EXTENSIONS</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">char[]</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns a space separated list of extension names (the extension
+        names themselves do not contain any spaces) supported by the device.
+        The list of extension names returned can be vendor supported
+        extension names and one or more of the following Khronos approved
+        extension names:</p>
+<p class="tableblock">        <strong>cl_khr_int64_base_atomics</strong><br>
+        <strong>cl_khr_int64_extended_atomics</strong><br>
+        <strong>cl_khr_fp16</strong><br>
+        <strong>cl_khr_gl_sharing</strong><br>
+        <strong>cl_khr_gl_event</strong><br>
+        <strong>cl_khr_d3d10_sharing</strong><br>
+        <strong>cl_khr_dx9_media_sharing</strong><br>
+        <strong>cl_khr_d3d11_sharing</strong><br>
+        <strong>cl_khr_gl_depth_images</strong><br>
+        <strong>cl_khr_gl_msaa_sharing</strong><br>
+        <strong>cl_khr_initialize_memory</strong><br>
+        <strong>cl_khr_terminate_context</strong><br>
+        <strong>cl_khr_spir</strong><br>
+        <strong>cl_khr_srgb_image_writes</strong></p>
+<p class="tableblock">        The following approved Khronos extension names must be returned by
+        all devices that support OpenCL C 2.0:</p>
+<p class="tableblock">        <strong>cl_khr_byte_addressable_store</strong><br>
+        <strong>cl_khr_fp64</strong> (for backward compatibility if double precision is
+        supported)<br>
+        <strong>cl_khr_3d_image_writes</strong><br>
+        <strong>cl_khr_image2d_from_buffer</strong><br>
+        <strong>cl_khr_depth_images</strong></p>
+<p class="tableblock">        Please refer to the OpenCL 2.0 Extension Specification for a
+        detailed description of these extensions.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_PRINTF_BUFFER_SIZE</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">size_t</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Maximum size in bytes of the internal buffer that holds the output
+        of printf calls from a kernel.
+        The minimum value for the FULL profile is 1 MB.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_PREFERRED_INTEROP_USER_SYNC</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_bool</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Is CL_TRUE if the device&#8217;s preference is for the user to be
+        responsible for synchronization when sharing memory objects between
+        OpenCL and other APIs such as DirectX.
+        Is CL_FALSE if the device / implementation has a performant path
+        for performing synchronization of memory objects shared between
+        OpenCL and other APIs such as DirectX.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_PARENT_DEVICE</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_device_id</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the cl_device_id of the parent device to which this
+        sub-device belongs.
+        If <em>device</em> is a root-level device, a <code>NULL</code> value is returned.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_PARTITION_MAX_SUB_DEVICES</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the maximum number of sub-devices that can be created when a
+        device is partitioned.</p>
+<p class="tableblock">        The value returned cannot exceed CL_DEVICE_MAX_COMPUTE_UNITS.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_PARTITION_PROPERTIES</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_device_partition_property[]</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the list of partition types supported by <em>device</em>.
+        This is an array of cl_device_partition_property values drawn from
+        the following list:</p>
+<p class="tableblock">        CL_DEVICE_PARTITION_EQUALLY<br>
+        CL_DEVICE_PARTITION_BY_COUNTS<br>
+        CL_DEVICE_PARTITION_BY_AFFINITY_DOMAIN</p>
+<p class="tableblock">        If the device cannot be partitioned (i.e. there is no partitioning
+        scheme supported by the device that will return at least two
+        subdevices), a value of 0 will be returned.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_PARTITION_AFFINITY_DOMAIN</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_device_affinity_domain</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the list of supported affinity domains for partitioning the
+        device using CL_DEVICE_PARTITION_BY_AFFINITY_DOMAIN.
+        This is a bit-field that describes one or more of the following
+        values:</p>
+<p class="tableblock">        CL_DEVICE_AFFINITY_DOMAIN_NUMA<br>
+        CL_DEVICE_AFFINITY_DOMAIN_L4_CACHE<br>
+        CL_DEVICE_AFFINITY_DOMAIN_L3_CACHE<br>
+        CL_DEVICE_AFFINITY_DOMAIN_L2_CACHE<br>
+        CL_DEVICE_AFFINITY_DOMAIN_L1_CACHE<br>
+        CL_DEVICE_AFFINITY_DOMAIN_NEXT_PARTITIONABLE</p>
+<p class="tableblock">        If the device does not support any affinity domains, a value of 0
+        will be returned.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_PARTITION_TYPE</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_device_partition_property[]</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the properties argument specified in <strong>clCreateSubDevices</strong> if
+        device is a sub-device.
+        In the case where the properties argument to <strong>clCreateSubDevices</strong> is
+        CL_DEVICE_PARTITION_BY_AFFINITY_DOMAIN,
+        CL_DEVICE_AFFINITY_DOMAIN_NEXT_PARTITIONABLE, the affinity domain
+        used to perform the partition will be returned.
+        This can be one of the following values:</p>
+<p class="tableblock">        CL_DEVICE_AFFINITY_DOMAIN_NUMA<br>
+        CL_DEVICE_AFFINITY_DOMAIN_L4_CACHE<br>
+        CL_DEVICE_AFFINITY_DOMAIN_L3_CACHE<br>
+        CL_DEVICE_AFFINITY_DOMAIN_L2_CACHE<br>
+        CL_DEVICE_AFFINITY_DOMAIN_L1_CACHE</p>
+<p class="tableblock">        Otherwise, the implementation may either return a
+        <em>param_value_size_ret</em> of 0 (i.e. there is no partition type
+        associated with device) or return a property value of 0 (where 0
+        is used to terminate the partition property list) in the memory that
+        <em>param_value</em> points to.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_REFERENCE_COUNT</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the <em>device</em> reference count.
+        If the device is a root-level device, a reference count of one is
+        returned.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_SVM_CAPABILITIES</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_device_svm_capabilities</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Describes the various shared virtual memory (SVM) allocation
+        types the device supports.
+        Coarse-grain SVM allocations are required to be supported by all
+        OpenCL 2.0 devices.
+        This is a bit-field that describes a combination of the following
+        values:</p>
+<p class="tableblock">        CL_DEVICE_SVM_COARSE_GRAIN_BUFFER - Support for coarse-grain buffer
+        sharing using <strong>clSVMAlloc</strong>.
+        Memory consistency is guaranteed at synchronization points and the
+        host must use calls to <strong>clEnqueueMapBuffer</strong> and
+        <strong>clEnqueueUnmapMemObject</strong>.</p>
+<p class="tableblock">        CL_DEVICE_SVM_FINE_GRAIN_BUFFER - Support for fine-grain buffer
+        sharing using <strong>clSVMAlloc</strong>.
+        Memory consistency is guaranteed at synchronization points without
+        need for <strong>clEnqueueMapBuffer</strong> and <strong>clEnqueueUnmapMemObject</strong>.</p>
+<p class="tableblock">        CL_DEVICE_SVM_FINE_GRAIN_SYSTEM - Support for sharing the host&#8217;s
+        entire virtual memory including memory allocated using <strong>malloc</strong>.
+        Memory consistency is guaranteed at synchronization points.</p>
+<p class="tableblock">        CL_DEVICE_SVM_ATOMICS - Support for the OpenCL 2.0 atomic
+        operations that provide memory consistency across the host and all
+        OpenCL devices supporting fine-grain SVM allocations.</p>
+<p class="tableblock">        The mandated minimum capability is
+        CL_DEVICE_SVM_COARSE_GRAIN_BUFFER.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_PREFERRED_PLATFORM_ATOMIC_ALIGNMENT</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the value representing the preferred alignment in bytes for
+        OpenCL 2.0 fine-grained SVM atomic types.
+        This query can return 0 which indicates that the preferred alignment
+        is aligned to the natural size of the type.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_PREFERRED_GLOBAL_ATOMIC_ALIGNMENT</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the value representing the preferred alignment in bytes for
+        OpenCL 2.0 atomic types to global memory.
+        This query can return 0 which indicates that the preferred alignment
+        is aligned to the natural size of the type.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_PREFERRED_LOCAL_ATOMIC_ALIGNMENT</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the value representing the preferred alignment in bytes for
+        OpenCL 2.0 atomic types to local memory.
+        This query can return 0 which indicates that the preferred alignment
+        is aligned to the natural size of the type.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_MAX_NUM_SUB_GROUPS</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Maximum number of sub-groups in a work-group that a device is
+        capable of executing on a single compute unit, for any given
+        kernel-instance running on the device.
+        The minimum value is 1.
+        (Refer also to <strong>clGetKernelSubGroupInfo</strong>.)</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_SUB_GROUP_INDEPENDENT_FORWARD_PROGRESS</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_bool</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Is CL_TRUE if this device supports independent forward progress of
+        sub-groups, CL_FALSE otherwise.
+        If <strong>cl_khr_subgroups</strong> is supported by the device this must return
+        CL_TRUE.</p></td>
+</tr>
+</tbody>
+</table>
+<div class="dlist">
+<dl>
+<dt class="hdlist1">4</dt>
+<dd>
+<p>NOTE: <strong>CL_DEVICE_MAX_WRITE_IMAGE_ARGS</strong> is only there for backward
+compatibility.
+<strong>CL_DEVICE_MAX_READ_WRITE_IMAGE_ARGS</strong> should be used instead.</p>
+</dd>
+<dt class="hdlist1">5</dt>
+<dd>
+<p>The optional rounding modes should be included as a device capability
+only if they are supported natively.
+All explicit conversion functions with specific rounding modes must
+still operate correctly.</p>
+</dd>
+<dt class="hdlist1">6</dt>
+<dd>
+<p>The optional rounding modes should be included as a device capability
+only if they are supported natively.
+All explicit conversion functions with specific rounding modes must
+still operate correctly.</p>
+</dd>
+<dt class="hdlist1">7</dt>
+<dd>
+<p>CL_DEVICE_QUEUE_PROPERTIES is deprecated and replaced by
+CL_DEVICE_QUEUE_ON_HOST_PROPERTIES.</p>
+</dd>
+<dt class="hdlist1">8</dt>
+<dd>
+<p>The platform profile returns the profile that is implemented by the
+OpenCL framework.
+If the platform profile returned is FULL_PROFILE, the OpenCL framework
+will support devices that are FULL_PROFILE and may also support devices
+that are EMBEDDED_PROFILE.
+The compiler must be available for all devices, i.e.,
+CL_DEVICE_COMPILER_AVAILABLE is CL_TRUE.
+If the platform profile returned is EMBEDDED_PROFILE, then devices that
+are only EMBEDDED_PROFILE are supported.</p>
+</dd>
+</dl>
+</div>
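+<div class="paragraph">
+<p>The two preferred atomic alignment queries in the table above may return 0
+to mean "use the natural size of the type".
+The following is a minimal sketch of how an application might normalize that
+value before using it.
+The helper name <code>effective_atomic_alignment</code> is hypothetical and not part
+of the OpenCL API; <code>queried</code> stands for the cl_uint written by
+<strong>clGetDeviceInfo</strong>.</p>
+</div>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of the OpenCL API).
 * queried   : value returned for CL_DEVICE_PREFERRED_GLOBAL_ATOMIC_ALIGNMENT
 *             or CL_DEVICE_PREFERRED_LOCAL_ATOMIC_ALIGNMENT
 * type_size : natural size, in bytes, of the atomic type being placed */
static size_t effective_atomic_alignment(uint32_t queried, size_t type_size)
{
    /* A result of 0 means "align to the natural size of the type". */
    return queried == 0 ? type_size : (size_t)queried;
}
```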
+<div class="paragraph">
+<p>The device queries described in the <a href="#device-queries-table">Device Queries</a>
+table should return the same information for a root-level device (i.e., a
+device returned by <strong>clGetDeviceIDs</strong>) and any sub-devices created from this
+device, except for the following queries:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>CL_DEVICE_GLOBAL_MEM_CACHE_SIZE</p>
+</li>
+<li>
+<p>CL_DEVICE_BUILT_IN_KERNELS</p>
+</li>
+<li>
+<p>CL_DEVICE_PARENT_DEVICE</p>
+</li>
+<li>
+<p>CL_DEVICE_PARTITION_TYPE</p>
+</li>
+<li>
+<p>CL_DEVICE_REFERENCE_COUNT</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p><strong>clGetDeviceInfo</strong> returns CL_SUCCESS if the function is executed
+successfully.
+Otherwise, it returns one of the following errors:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>CL_INVALID_DEVICE if <em>device</em> is not valid.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if <em>param_name</em> is not one of the supported values;
+if the size in bytes specified by <em>param_value_size</em> is less than the size of
+the return type specified in the <a href="#device-queries-table">Device Queries</a> table
+and <em>param_value</em> is not a <code>NULL</code> value; or if <em>param_name</em> is a value
+that is available as an extension and the corresponding extension is not
+supported by the device.</p>
+</li>
+<li>
+<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
+by the OpenCL implementation on the device.</p>
+</li>
+<li>
+<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
+required by the OpenCL implementation on the host.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>The function</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">cl_int clGetDeviceAndHostTimer(cl_device_id device,
+                               cl_ulong* device_timestamp,
+                               cl_ulong* host_timestamp)</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>returns a reasonably synchronized pair of timestamps from the device timer
+and the host timer as seen by <em>device</em>.
+Implementations may need to execute this query with a high latency in order
+to provide reasonable synchronization of the timestamps.
+The host timestamp and device timestamp returned by this function and
+<strong>clGetHostTimer</strong> each have an implementation-defined timebase.
+The timestamps will always be in their respective timebases regardless of
+which query function is used.
+The timestamp returned from <strong>clGetEventProfilingInfo</strong> for an event on a
+device and a device timestamp queried from the same device will always be in
+the same timebase.</p>
+</div>
+<div class="paragraph">
+<p><em>device</em> is a device returned by <strong>clGetDeviceIDs</strong>.</p>
+</div>
+<div class="paragraph">
+<p><em>device_timestamp</em> will be updated with the value of the device timer in
+nanoseconds.
+The resolution of the timer is the same as the device profiling timer
+returned by <strong>clGetDeviceInfo</strong> and the CL_DEVICE_PROFILING_TIMER_RESOLUTION
+query.</p>
+</div>
+<div class="paragraph">
+<p><em>host_timestamp</em> will be updated with the value of the host timer in
+nanoseconds at the closest possible point in time to that at which
+<em>device_timestamp</em> was returned.
+The resolution of the timer may be queried via <strong>clGetPlatformInfo</strong> and the
+flag CL_PLATFORM_HOST_TIMER_RESOLUTION.</p>
+</div>
+<div class="paragraph">
+<p><strong>clGetDeviceAndHostTimer</strong> returns CL_SUCCESS, with time values in
+<em>device_timestamp</em> and <em>host_timestamp</em>, if the function is executed successfully.
+Otherwise, it returns one of the following errors:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>CL_INVALID_DEVICE if <em>device</em> is not a valid OpenCL device.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if <em>host_timestamp</em> or <em>device_timestamp</em> is <code>NULL</code>.</p>
+</li>
+<li>
+<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
+by the OpenCL implementation on the device.</p>
+</li>
+<li>
+<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
+required by the OpenCL implementation on the host.</p>
+</li>
+</ul>
+</div>
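+<div class="paragraph">
+<p>Because a device timestamp and a timestamp from <strong>clGetEventProfilingInfo</strong>
+share a timebase, one synchronized pair can be used to express profiling
+timestamps in host time.
+The following is a minimal sketch of that arithmetic only.
+The type <code>timer_pair</code> and the function <code>device_to_host_ns</code> are hypothetical
+names, and the sketch assumes that both timers advance at the same rate after
+the pair is sampled, which real hardware does not guarantee.</p>
+</div>

```c
#include <stdint.h>

/* Hypothetical container for one synchronized pair of timestamps, as would
 * be written by clGetDeviceAndHostTimer. Both values are in nanoseconds. */
typedef struct {
    uint64_t device_ns;
    uint64_t host_ns;
} timer_pair;

/* Map a device timestamp (for example one obtained from
 * clGetEventProfilingInfo) into the host timebase.
 * Assumption: no drift between the two timers since the pair was sampled. */
static uint64_t device_to_host_ns(timer_pair sync, uint64_t device_ts)
{
    /* Unsigned arithmetic wraps modulo 2^64, so the result is also correct
     * when device_ts is earlier than sync.device_ns. */
    return sync.host_ns + (device_ts - sync.device_ns);
}
```

+<div class="paragraph">
+<p>An application that needs long-running correlation would likely re-sample the
+pair periodically rather than rely on a single offset.</p>
+</div>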
+<div class="paragraph">
+<p>The function</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">cl_int clGetHostTimer(cl_device_id device,
+                      cl_ulong* host_timestamp)</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>returns the current value of the host clock as seen by <em>device</em>.
+This value is in the same timebase as the <em>host_timestamp</em> returned from
+<strong>clGetDeviceAndHostTimer</strong>.
+The implementation will return with as low a latency as possible to allow a
+correlation with a subsequent application sampled time.
+The host timestamp and device timestamp returned by this function and
+<strong>clGetDeviceAndHostTimer</strong> each have an implementation-defined timebase.
+The timestamps will always be in their respective timebases regardless of
+which query function is used.
+The timestamp returned from <strong>clGetEventProfilingInfo</strong> for an event on a
+device and a device timestamp queried from the same device will always be in
+the same timebase.</p>
+</div>
+<div class="paragraph">
+<p><em>device</em> is a device returned by <strong>clGetDeviceIDs</strong>.</p>
+</div>
+<div class="paragraph">
+<p><em>host_timestamp</em> will be updated with the value of the current timer in
+nanoseconds.
+The resolution of the timer may be queried via <strong>clGetPlatformInfo</strong> and the
+flag CL_PLATFORM_HOST_TIMER_RESOLUTION.</p>
+</div>
+<div class="paragraph">
+<p><strong>clGetHostTimer</strong> returns CL_SUCCESS, with a time value in
+<em>host_timestamp</em>, if the function is executed successfully.
+Otherwise, it returns one of the following errors:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>CL_INVALID_DEVICE if <em>device</em> is not a valid OpenCL device.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if <em>host_timestamp</em> is <code>NULL</code>.</p>
+</li>
+<li>
+<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
+by the OpenCL implementation on the device.</p>
+</li>
+<li>
+<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
+required by the OpenCL implementation on the host.</p>
+</li>
+</ul>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_partitioning_a_device">4.3. Partitioning a Device</h3>
+<div class="paragraph">
+<p>The function</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">cl_int clCreateSubDevices(cl_device_id in_device,
+                          <span class="directive">const</span> cl_device_partition_property *properties,
+                          cl_uint num_devices,
+                          cl_device_id *out_devices,
+                          cl_uint *num_devices_ret)</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>creates an array of sub-devices that each reference a non-intersecting set
+of compute units within <em>in_device</em>, according to a partition scheme given by
+<em>properties</em>.
+The output sub-devices may be used in every way that the root (or parent)
+device can be used, including creating contexts, building programs, further
+calls to <strong>clCreateSubDevices</strong> and creating command-queues.
+When a command-queue is created against a sub-device, the commands enqueued
+on the queue are executed only on the sub-device.</p>
+</div>
+<div class="paragraph">
+<p><em>in_device</em> is the device to be partitioned.</p>
+</div>
+<div class="paragraph">
+<p><em>properties</em> specifies how <em>in_device</em> is to be partitioned, described by a
+partition name and its corresponding value.
+Each partition name is immediately followed by the corresponding desired
+value.
+The list is terminated with 0.
+The list of supported partitioning schemes is described in the
+<a href="#subdevice-partition-table">Subdevice Partition</a> table.
+Only one of the listed partitioning schemes can be specified in
+<em>properties</em>.</p>
+</div>
+<table id="subdevice-partition-table" class="tableblock frame-all grid-all spread">
+<caption class="title">Table 6. <em>List of supported partition schemes by</em> <strong>clCreateSubDevices</strong></caption>
+<colgroup>
+<col style="width: 30%;">
+<col style="width: 20%;">
+<col style="width: 50%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top"><strong>cl_device_partition_property enum</strong></th>
+<th class="tableblock halign-left valign-top">Partition value</th>
+<th class="tableblock halign-left valign-top">Description</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_DEVICE_PARTITION_EQUALLY</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Split the aggregate device into as many smaller aggregate devices as
+        can be created, each containing <em>n</em> compute units.
+        The value <em>n</em> is passed as the value accompanying this property.
+        If <em>n</em> does not divide evenly into
+        CL_DEVICE_PARTITION_MAX_COMPUTE_UNITS, then the remaining compute
+        units are not used.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_DEVICE_PARTITION_BY_COUNTS</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">This property is followed by a
+        CL_DEVICE_PARTITION_BY_COUNTS_LIST_END terminated list of compute
+        unit counts.
+        For each non-zero count <em>m</em> in the list, a sub-device is created
+        with <em>m</em> compute units in it.
+        CL_DEVICE_PARTITION_BY_COUNTS_LIST_END is defined to be 0.</p>
+<p class="tableblock">        The number of non-zero count entries in the list may not exceed
+        CL_DEVICE_PARTITION_MAX_SUB_DEVICES.</p>
+<p class="tableblock">        The total number of compute units specified may not exceed
+        CL_DEVICE_PARTITION_MAX_COMPUTE_UNITS.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_DEVICE_PARTITION_BY_AFFINITY_DOMAIN</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_device_affinity_domain</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Split the device into smaller aggregate devices containing one or
+        more compute units that all share part of a cache hierarchy.
+        The value accompanying this property may be drawn from the following
+        list:</p>
+<p class="tableblock">        CL_DEVICE_AFFINITY_DOMAIN_NUMA - Split the device into sub-devices
+        comprised of compute units that share a NUMA node.</p>
+<p class="tableblock">        CL_DEVICE_AFFINITY_DOMAIN_L4_CACHE - Split the device into
+        sub-devices comprised of compute units that share a level 4 data
+        cache.</p>
+<p class="tableblock">        CL_DEVICE_AFFINITY_DOMAIN_L3_CACHE - Split the device into
+        sub-devices comprised of compute units that share a level 3 data
+        cache.</p>
+<p class="tableblock">        CL_DEVICE_AFFINITY_DOMAIN_L2_CACHE - Split the device into
+        sub-devices comprised of compute units that share a level 2 data
+        cache.</p>
+<p class="tableblock">        CL_DEVICE_AFFINITY_DOMAIN_L1_CACHE - Split the device into
+        sub-devices comprised of compute units that share a level 1 data
+        cache.</p>
+<p class="tableblock">        CL_DEVICE_AFFINITY_DOMAIN_NEXT_PARTITIONABLE - Split the device
+        along the next partitionable affinity domain.
+        The implementation shall find the first level along which the device
+        or sub-device may be further subdivided in the order NUMA, L4, L3,
+        L2, L1, and partition the device into sub-devices comprised of
+        compute units that share memory subsystems at this level.</p>
+<p class="tableblock">        The user may determine what happened by calling
+        <strong>clGetDeviceInfo</strong>(CL_DEVICE_PARTITION_TYPE) on the sub-devices.</p></td>
+</tr>
+</tbody>
+</table>
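+<div class="paragraph">
+<p>The CL_DEVICE_PARTITION_EQUALLY row above amounts to an integer division.
+The following is a minimal sketch of that arithmetic, not of the API call
+itself.
+The function names are hypothetical, <code>max_units</code> stands for the value of
+CL_DEVICE_PARTITION_MAX_COMPUTE_UNITS queried from the device, and returning 0
+for an unusable <em>n</em> is an assumption of the sketch rather than documented
+behavior.</p>
+</div>

```c
#include <stdint.h>

/* Hypothetical helper: number of sub-devices produced when a device with
 * max_units compute units is split into groups of n units each. */
static uint32_t equal_partition_count(uint32_t max_units, uint32_t n)
{
    if (n == 0 || n > max_units)
        return 0;          /* sketch assumption: no sub-device can be formed */
    return max_units / n;  /* integer division drops the remainder */
}

/* Hypothetical helper: compute units left unused by the same split. */
static uint32_t equal_partition_unused(uint32_t max_units, uint32_t n)
{
    if (n == 0)
        return max_units;
    return max_units % n;
}
```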
+<div class="paragraph">
+<p><em>num_devices</em> is the size of memory pointed to by <em>out_devices</em> specified as
+the number of cl_device_id entries.</p>
+</div>
+<div class="paragraph">
+<p><em>out_devices</em> is the buffer where the OpenCL sub-devices will be returned.
+If <em>out_devices</em> is <code>NULL</code>, this argument is ignored.
+If <em>out_devices</em> is not <code>NULL</code>, <em>num_devices</em> must be greater than or equal
+to the number of sub-devices that <em>in_device</em> may be partitioned into according
+to the partitioning scheme specified in <em>properties</em>.</p>
+</div>
+<div class="paragraph">
+<p><em>num_devices_ret</em> returns the number of sub-devices that <em>in_device</em> may be
+partitioned into according to the partitioning scheme specified in
+<em>properties</em>.
+If <em>num_devices_ret</em> is <code>NULL</code>, it is ignored.</p>
+</div>
+<div class="paragraph">
+<p><strong>clCreateSubDevices</strong> returns CL_SUCCESS if the partition is created
+successfully.
+Otherwise, it returns one of the following errors:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>CL_INVALID_DEVICE if <em>in_device</em> is not valid.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if values specified in <em>properties</em> are not valid or if
+values specified in <em>properties</em> are valid but not supported by the
+device.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if <em>out_devices</em> is not <code>NULL</code> and <em>num_devices</em> is
+less than the number of sub-devices created by the partition scheme.</p>
+</li>
+<li>
+<p>CL_DEVICE_PARTITION_FAILED if the partition name is supported by the
+implementation but <em>in_device</em> could not be further partitioned.</p>
+</li>
+<li>
+<p>CL_INVALID_DEVICE_PARTITION_COUNT if the partition name specified in
+<em>properties</em> is CL_DEVICE_PARTITION_BY_COUNTS and the number of
+sub-devices requested exceeds CL_DEVICE_PARTITION_MAX_SUB_DEVICES or the
+total number of compute units requested exceeds
+CL_DEVICE_PARTITION_MAX_COMPUTE_UNITS for <em>in_device</em>, or the number of
+compute units requested for one or more sub-devices is less than zero or
+the number of sub-devices requested exceeds
+CL_DEVICE_PARTITION_MAX_COMPUTE_UNITS for <em>in_device</em>.</p>
+</li>
+<li>
+<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
+by the OpenCL implementation on the device.</p>
+</li>
+<li>
+<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
+required by the OpenCL implementation on the host.</p>
+</li>
+</ul>
+</div>
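+<div class="paragraph">
+<p>The limits on a CL_DEVICE_PARTITION_BY_COUNTS list described above can be
+checked on the host before the call is made.
+The following is a minimal sketch of such a pre-check.
+The enum <code>counts_check</code> and the function <code>check_counts</code> are hypothetical and
+do not correspond to OpenCL error codes; <code>max_sub_devices</code> and <code>max_units</code>
+stand for the values of CL_DEVICE_PARTITION_MAX_SUB_DEVICES and
+CL_DEVICE_PARTITION_MAX_COMPUTE_UNITS queried from the device.</p>
+</div>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes for this sketch only. */
enum counts_check {
    COUNTS_OK = 0,
    COUNTS_NEGATIVE,
    COUNTS_TOO_MANY_SUB_DEVICES,
    COUNTS_TOO_MANY_UNITS
};

/* counts is a 0-terminated list of compute-unit counts, mirroring the
 * CL_DEVICE_PARTITION_BY_COUNTS_LIST_END convention (defined to be 0). */
static enum counts_check check_counts(const intptr_t *counts,
                                      uint32_t max_sub_devices,
                                      uint32_t max_units)
{
    uint64_t entries = 0;
    uint64_t total = 0;

    for (size_t i = 0; counts[i] != 0; ++i) {
        if (counts[i] < 0)
            return COUNTS_NEGATIVE;
        entries += 1;                 /* one sub-device per non-zero count */
        total += (uint64_t)counts[i]; /* compute units requested so far */
    }
    if (entries > max_sub_devices)
        return COUNTS_TOO_MANY_SUB_DEVICES;
    if (total > max_units)
        return COUNTS_TOO_MANY_UNITS;
    return COUNTS_OK;
}
```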
+<div class="paragraph">
+<p>A few examples that describe how to specify partition properties in the
+<em>properties</em> argument to <strong>clCreateSubDevices</strong> are given below:</p>
+</div>
+<div class="paragraph">
+<p>To partition a device containing 16 compute units into two sub-devices, each
+containing 8 compute units, pass the following in <em>properties</em>:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">{ CL_DEVICE_PARTITION_EQUALLY, <span class="integer">8</span>, <span class="integer">0</span> }</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>To partition a device with four compute units into two sub-devices, with one
+sub-device containing 3 compute units and the other sub-device 1 compute
+unit, pass the following in the <em>properties</em> argument:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">{ CL_DEVICE_PARTITION_BY_COUNTS,
+  <span class="integer">3</span>, <span class="integer">1</span>, CL_DEVICE_PARTITION_BY_COUNTS_LIST_END, <span class="integer">0</span> }</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>To split a device along the outermost cache line (if any), pass the
+following in the <em>properties</em> argument:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">{ CL_DEVICE_PARTITION_BY_AFFINITY_DOMAIN,
+  CL_DEVICE_AFFINITY_DOMAIN_NEXT_PARTITIONABLE,
+  <span class="integer">0</span> }</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The function</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">cl_int clRetainDevice(cl_device_id device)</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>increments the <em>device</em> reference count if <em>device</em> is a valid sub-device
+created by a call to <strong>clCreateSubDevices</strong>.
+If <em>device</em> is a root-level device, i.e., a cl_device_id returned by
+<strong>clGetDeviceIDs</strong>, the <em>device</em> reference count remains unchanged.
+<strong>clRetainDevice</strong> returns CL_SUCCESS if the function is executed successfully
+or the device is a root-level device.
+Otherwise, it returns one of the following errors:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>CL_INVALID_DEVICE if <em>device</em> is not a valid sub-device created by a
+call to <strong>clCreateSubDevices</strong>.</p>
+</li>
+<li>
+<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
+by the OpenCL implementation on the device.</p>
+</li>
+<li>
+<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
+required by the OpenCL implementation on the host.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>The function</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">cl_int clReleaseDevice(cl_device_id device)</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>decrements the <em>device</em> reference count if <em>device</em> is a valid sub-device
+created by a call to <strong>clCreateSubDevices</strong>.
+If <em>device</em> is a root-level device, i.e., a cl_device_id returned by
+<strong>clGetDeviceIDs</strong>, the <em>device</em> reference count remains unchanged.
+<strong>clReleaseDevice</strong> returns CL_SUCCESS if the function is executed
+successfully.
+Otherwise, it returns one of the following errors:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>CL_INVALID_DEVICE if <em>device</em> is not a valid sub-device created by a
+call to <strong>clCreateSubDevices</strong>.</p>
+</li>
+<li>
+<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
+by the OpenCL implementation on the device.</p>
+</li>
+<li>
+<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
+required by the OpenCL implementation on the host.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>After the <em>device</em> reference count becomes zero and all the objects attached
+to <em>device</em> (such as command-queues) are released, the <em>device</em> object is
+deleted.
+Using this function to release a reference that was not obtained by creating
+the object or by calling <strong>clRetainDevice</strong> causes undefined behavior.</p>
+</div>
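+<div class="paragraph">
+<p>The retain and release rules above treat root-level devices and sub-devices
+differently.
+The following is a small model of those rules, written without the OpenCL
+headers so that it stands on its own.
+The type <code>model_device</code> and both functions are hypothetical; they illustrate
+the documented counting behavior and say nothing about how an implementation
+stores its reference counts.</p>
+</div>

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a device handle; not an OpenCL type. */
typedef struct {
    bool     is_sub_device;  /* true if created by clCreateSubDevices */
    uint32_t ref_count;
} model_device;

/* Mirrors the documented clRetainDevice behavior. */
static void model_retain(model_device *d)
{
    if (d->is_sub_device)
        d->ref_count += 1;   /* root-level devices are left unchanged */
}

/* Mirrors the documented clReleaseDevice behavior. Returns true when the
 * count has reached zero, i.e. the device becomes eligible for deletion
 * once all objects attached to it have also been released. */
static bool model_release(model_device *d)
{
    if (!d->is_sub_device)
        return false;        /* root-level devices are left unchanged */
    if (d->ref_count == 0)
        return false;        /* over-release is undefined in OpenCL; the
                                model simply refuses to go below zero */
    d->ref_count -= 1;
    return d->ref_count == 0;
}
```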
+</div>
+<div class="sect2">
+<h3 id="_contexts">4.4. Contexts</h3>
+<div class="paragraph">
+<p>The function</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">cl_context clCreateContext(<span class="directive">const</span> cl_context_properties *properties,
+                           cl_uint num_devices,
+                           <span class="directive">const</span> cl_device_id *devices,
+                           <span class="directive">void</span>(CL_CALLBACK *pfn_notify)
+                               (<span class="directive">const</span> <span class="predefined-type">char</span> *errinfo,
+                               <span class="directive">const</span> <span class="directive">void</span> *private_info,
+                               size_t cb,
+                               <span class="directive">void</span> *user_data),
+                           <span class="directive">void</span> *user_data,
+                           cl_int *errcode_ret)</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>creates an OpenCL context.
+An OpenCL context is created with one or more devices.
+Contexts are used by the OpenCL runtime for managing objects such as
+command-queues, memory, program and kernel objects and for executing kernels
+on one or more devices specified in the context.</p>
+</div>
+<div class="paragraph">
+<p><em>properties</em> specifies a list of context property names and their
+corresponding values.
+Each property name is immediately followed by the corresponding desired
+value.
+The list is terminated with 0.
+The list of supported properties is described in the
+<a href="#context-properties-table">Context Properties</a> table.
+<em>properties</em> can be <code>NULL</code> in which case the platform that is selected is
+implementation-defined.</p>
+</div>
+<table id="context-properties-table" class="tableblock frame-all grid-all spread">
+<caption class="title">Table 7. <em>List of supported properties by</em> <strong>clCreateContext</strong></caption>
+<colgroup>
+<col style="width: 34%;">
+<col style="width: 33%;">
+<col style="width: 33%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top"><strong>cl_context_properties enum</strong></th>
+<th class="tableblock halign-left valign-top">Property value</th>
+<th class="tableblock halign-left valign-top">Description</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_CONTEXT_PLATFORM</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_platform_id</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Specifies the platform to use.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_CONTEXT_INTEROP_USER_SYNC</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_bool</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Specifies whether the user is responsible for synchronization
+        between OpenCL and other APIs.
+        Please refer to the specific sections in the OpenCL 2.0 extension
+        specification that describe sharing with other APIs for restrictions
+        on using this flag.</p>
+<p class="tableblock">        If CL_CONTEXT_INTEROP_USER_SYNC is not specified, a default of
+        CL_FALSE is assumed.</p></td>
+</tr>
+</tbody>
+</table>
+<div class="paragraph">
+<p><em>num_devices</em> is the number of devices specified in the <em>devices</em> argument.</p>
+</div>
+<div class="paragraph">
+<p><em>devices</em> is a pointer to a list of unique devices<sup>9</sup> returned by
+<strong>clGetDeviceIDs</strong> or sub-devices created by <strong>clCreateSubDevices</strong> for a
+platform.</p>
+</div>
+<div class="dlist">
+<dl>
+<dt class="hdlist1">9</dt>
+<dd>
+<p>Duplicate devices specified in <em>devices</em> are ignored.</p>
+</dd>
+</dl>
+</div>
+<div class="paragraph">
+<p><em>pfn_notify</em> is a callback function that can be registered by the
+application.
+This callback function will be used by the OpenCL implementation to report
+information on errors during context creation as well as errors that occur
+at runtime in this context.
+This callback function may be called asynchronously by the OpenCL
+implementation.
+It is the application's responsibility to ensure that the callback function
+is thread-safe.
+The parameters to this callback function are:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><em>errinfo</em> is a pointer to an error string.</p>
+</li>
+<li>
+<p><em>private_info</em> and <em>cb</em> represent a pointer to binary data that is
+returned by the OpenCL implementation that can be used to log additional
+information helpful in debugging the error.</p>
+</li>
+<li>
+<p><em>user_data</em> is a pointer to user supplied data.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>If <em>pfn_notify</em> is <code>NULL</code>, no callback function is registered.</p>
+</div>
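+<div class="paragraph">
+<p>Since <em>pfn_notify</em> may be called asynchronously, any state it touches
+should be safe to update from several threads.
+The following is a minimal sketch of one way to do that with C11 atomics.
+The type <code>notify_state</code> and the function <code>on_context_error</code> are hypothetical
+application names, and CL_CALLBACK is defined as empty only so that the sketch
+compiles without the OpenCL headers.</p>
+</div>

```c
#include <stdatomic.h>
#include <stddef.h>

/* CL_CALLBACK normally comes from the OpenCL headers; define it as empty
 * here so the sketch compiles on its own. */
#ifndef CL_CALLBACK
#define CL_CALLBACK
#endif

/* Hypothetical application state passed through user_data. */
typedef struct {
    atomic_uint error_count;
} notify_state;

/* Has the parameter list described for pfn_notify. Only an atomic operation
 * touches shared state, so concurrent invocations do not race. */
static void CL_CALLBACK on_context_error(const char *errinfo,
                                         const void *private_info,
                                         size_t cb,
                                         void *user_data)
{
    (void)errinfo;       /* a real handler would likely log this string */
    (void)private_info;  /* implementation-specific binary data */
    (void)cb;
    if (user_data != NULL)
        atomic_fetch_add(&((notify_state *)user_data)->error_count, 1u);
}
```

+<div class="paragraph">
+<p>Under these assumptions, the application would pass <code>on_context_error</code> as
+<em>pfn_notify</em> and a pointer to a <code>notify_state</code> as <em>user_data</em>.</p>
+</div>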
+<div class="admonitionblock note">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-note" title="Note"></i>
+</td>
+<td class="content">
+There are a number of cases where error notifications need to be
+delivered due to an error that occurs outside a context.
+Such notifications may not be delivered through the <em>pfn_notify</em> callback.
+Where these notifications go is implementation-defined.
+</td>
+</tr>
+</table>
+</div>
+<div class="paragraph">
+<p><em>user_data</em> will be passed as the <em>user_data</em> argument when <em>pfn_notify</em> is
+called.
+<em>user_data</em> can be <code>NULL</code>.</p>
+</div>
+<div class="paragraph">
+<p><em>errcode_ret</em> will return an appropriate error code.
+If <em>errcode_ret</em> is <code>NULL</code>, no error code is returned.</p>
+</div>
+<div class="paragraph">
+<p><strong>clCreateContext</strong> returns a valid non-zero context and <em>errcode_ret</em> is set
+to CL_SUCCESS if the context is created successfully.
+Otherwise, it returns a <code>NULL</code> value with the following error values
+returned in <em>errcode_ret</em>:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>CL_INVALID_PLATFORM if <em>properties</em> is <code>NULL</code> and no platform could be
+selected or if platform value specified in <em>properties</em> is not a valid
+platform.</p>
+</li>
+<li>
+<p>CL_INVALID_PROPERTY if context property name in <em>properties</em> is not a
+supported property name, if the value specified for a supported property
+name is not valid, or if the same property name is specified more than
+once.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if <em>devices</em> is <code>NULL</code>.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if <em>num_devices</em> is equal to zero.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if <em>pfn_notify</em> is <code>NULL</code> but <em>user_data</em> is not
+<code>NULL</code>.</p>
+</li>
+<li>
+<p>CL_INVALID_DEVICE if <em>devices</em> contains an invalid device.</p>
+</li>
+<li>
+<p>CL_DEVICE_NOT_AVAILABLE if a device in <em>devices</em> is currently not
+available even though the device was returned by <strong>clGetDeviceIDs</strong>.</p>
+</li>
+<li>
+<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
+by the OpenCL implementation on the device.</p>
+</li>
+<li>
+<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
+required by the OpenCL implementation on the host.</p>
+</li>
+</ul>
+</div>
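+<div class="paragraph">
+<p>One of the CL_INVALID_PROPERTY conditions above is a property name that
+appears more than once in <em>properties</em>.
+The following is a minimal sketch of how an application might check its own
+list for that condition before calling <strong>clCreateContext</strong>.
+The typedef <code>ctx_prop</code> is a stand-in for cl_context_properties, the function
+name is hypothetical, and the sketch assumes a well-formed list in which every
+name is followed by exactly one value.</p>
+</div>

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for cl_context_properties, which the OpenCL headers define as an
 * integer type wide enough to hold a pointer. */
typedef intptr_t ctx_prop;

/* Hypothetical helper: returns 1 if any property name appears more than once
 * in a 0-terminated name/value list, 0 otherwise. A NULL list is treated as
 * empty. Names sit at even indices and values at odd indices, so a value of
 * 0 (such as CL_FALSE) is never mistaken for the terminator. */
static int has_duplicate_property(const ctx_prop *props)
{
    if (props == NULL)
        return 0;
    for (size_t i = 0; props[i] != 0; i += 2)
        for (size_t j = i + 2; props[j] != 0; j += 2)
            if (props[i] == props[j])
                return 1;
    return 0;
}
```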
+<div class="paragraph">
+<p>The function<sup>10</sup></p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">cl_context clCreateContextFromType(<span class="directive">const</span> cl_context_properties *properties,
+                                   cl_device_type device_type,
+                                   <span class="directive">void</span>(CL_CALLBACK *pfn_notify)
+                                       (<span class="directive">const</span> <span class="predefined-type">char</span> *errinfo,
+                                        <span class="directive">const</span> <span class="directive">void</span> *private_info,
+                                        size_t cb,
+                                        <span class="directive">void</span> *user_data),
+                                   <span class="directive">void</span> *user_data,
+                                   cl_int *errcode_ret)</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>creates an OpenCL context from a device type that identifies the specific
+device(s) to use.
+Only devices that are returned by <strong>clGetDeviceIDs</strong> for <em>device_type</em> are
+used to create the context.
+The context does not reference any sub-devices that may have been created
+from these devices.</p>
+</div>
+<div class="paragraph">
+<p><em>properties</em> specifies a list of context property names and their
+corresponding values.
+Each property name is immediately followed by the corresponding desired
+value.
+The list of supported properties is described in the
+<a href="#context-properties-table">Context Properties</a> table.
+<em>properties</em> can also be <code>NULL</code> in which case the platform that is selected
+is implementation-defined.</p>
+</div>
+<div class="paragraph">
+<p><em>device_type</em> is a bit-field that identifies the type of device and is
+described in the <a href="#device-categories-table">Device Categories</a> table.</p>
+</div>
+<div class="paragraph">
+<p><em>pfn_notify</em> and <em>user_data</em> are described in <strong>clCreateContext</strong>.</p>
+</div>
+<div class="paragraph">
+<p><em>errcode_ret</em> will return an appropriate error code.
+If <em>errcode_ret</em> is <code>NULL</code>, no error code is returned.</p>
+</div>
+<div class="paragraph">
+<p><strong>clCreateContextFromType</strong> returns a valid non-zero context and <em>errcode_ret</em>
+is set to CL_SUCCESS if the context is created successfully.
+Otherwise, it returns a <code>NULL</code> value with the following error values
+returned in <em>errcode_ret</em>:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>CL_INVALID_PLATFORM if <em>properties</em> is <code>NULL</code> and no platform could be
+selected or if platform value specified in <em>properties</em> is not a valid
+platform.</p>
+</li>
+<li>
+<p>CL_INVALID_PROPERTY if context property name in <em>properties</em> is not a
+supported property name, if the value specified for a supported property
+name is not valid, or if the same property name is specified more than
+once.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if <em>pfn_notify</em> is <code>NULL</code> but <em>user_data</em> is not
+<code>NULL</code>.</p>
+</li>
+<li>
+<p>CL_INVALID_DEVICE_TYPE if <em>device_type</em> is not a valid value.</p>
+</li>
+<li>
+<p>CL_DEVICE_NOT_AVAILABLE if no devices that match <em>device_type</em> and
+property values specified in <em>properties</em> are currently available.</p>
+</li>
+<li>
+<p>CL_DEVICE_NOT_FOUND if no devices that match <em>device_type</em> and property
+values specified in <em>properties</em> were found.</p>
+</li>
+<li>
+<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
+by the OpenCL implementation on the device.</p>
+</li>
+<li>
+<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
+required by the OpenCL implementation on the host.</p>
+<div class="dlist">
+<dl>
+<dt class="hdlist1">10</dt>
+<dd>
+<p><strong>clCreateContextFromType</strong> may return all or a subset of the actual
+physical devices present in the platform that match <em>device_type</em>.</p>
+</dd>
+</dl>
+</div>
+</li>
+</ul>
+</div>
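+<div class="paragraph">
+<p>As an illustration of the property list layout and of the duplicate-name
+rule for CL_INVALID_PROPERTY above, the following minimal sketch walks a
+zero-terminated name/value list.
+The helpers <code>count_property_pairs</code> and <code>has_duplicate_name</code> are hypothetical
+and not part of the OpenCL API; <code>sketch_property</code> is a stand-in for
+<code>cl_context_properties</code> so that the sketch compiles without the OpenCL
+headers, and the property names used with it are arbitrary placeholders.</p>
+</div>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for cl_context_properties (assumption: an integer wide enough
   to hold a pointer), so the sketch needs no OpenCL headers. */
typedef intptr_t sketch_property;

/* Count the name/value pairs in a zero-terminated list.
   A NULL list is treated as "no properties given". */
static size_t count_property_pairs(const sketch_property *props)
{
    size_t n = 0;
    if (props == NULL)
        return 0;
    while (props[2 * n] != 0)   /* names sit at even indices */
        ++n;
    return n;
}

/* Return 1 if any property name occurs more than once, 0 otherwise. */
static int has_duplicate_name(const sketch_property *props)
{
    size_t n = count_property_pairs(props);
    for (size_t i = 0; i < n; ++i)
        for (size_t j = i + 1; j < n; ++j)
            if (props[2 * i] == props[2 * j])
                return 1;
    return 0;
}
```

+<div class="paragraph">
+<p>An implementation would additionally have to check each name and value
+against the Context Properties table; the sketch covers only the list
+structure.</p>
+</div>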
+<div class="paragraph">
+<p>The function</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">cl_int clRetainContext(cl_context context)</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>increments the <em>context</em> reference count.
+<strong>clRetainContext</strong> returns CL_SUCCESS if the function is executed
+successfully.
+Otherwise, it returns one of the following errors:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>CL_INVALID_CONTEXT if <em>context</em> is not a valid OpenCL context.</p>
+</li>
+<li>
+<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
+by the OpenCL implementation on the device.</p>
+</li>
+<li>
+<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
+required by the OpenCL implementation on the host.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p><strong>clCreateContext</strong> and <strong>clCreateContextFromType</strong> perform an implicit retain.
+This is very helpful for third-party libraries, which typically get a
+context passed to them by the application.
+However, it is possible that the application may delete the context without
+informing the library.
+Allowing functions to attach to (i.e. retain) and release a context solves
+the problem of a context that is still used by a library no longer being valid.</p>
+</div>
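+<div class="paragraph">
+<p>The retain/release discipline described above can be modelled with a plain
+counter.
+This is only a toy model under stated assumptions: <code>toy_context</code>,
+<code>toy_create</code>, <code>toy_retain</code> and <code>toy_release</code> are hypothetical names, not
+OpenCL entry points, and the model ignores the additional requirement that
+objects attached to a context be released before the context is deleted.</p>
+</div>

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a reference-counted object; not an OpenCL type. */
typedef struct toy_context {
    unsigned ref_count;
} toy_context;

/* Creation performs the implicit retain: the creator owns one reference. */
static toy_context *toy_create(void)
{
    toy_context *ctx = malloc(sizeof *ctx);
    if (ctx != NULL)
        ctx->ref_count = 1;
    return ctx;
}

/* A library that is handed the object attaches to it by retaining it. */
static void toy_retain(toy_context *ctx)
{
    ctx->ref_count++;
}

/* Drop one reference. Returns 1 if this call released the last reference
   and freed the object, 0 if other holders remain. */
static int toy_release(toy_context *ctx)
{
    assert(ctx->ref_count > 0);
    if (--ctx->ref_count == 0) {
        free(ctx);
        return 1;
    }
    return 0;
}
```

+<div class="paragraph">
+<p>In this model, an application that creates the object and a library that
+retains it can each release their own reference in either order; the object
+is freed only by whichever release happens last.</p>
+</div>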
+<div class="paragraph">
+<p>The function</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">cl_int clReleaseContext(cl_context context)</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>decrements the <em>context</em> reference count.
+<strong>clReleaseContext</strong> returns CL_SUCCESS if the function is executed
+successfully.
+Otherwise, it returns one of the following errors:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>CL_INVALID_CONTEXT if <em>context</em> is not a valid OpenCL context.</p>
+</li>
+<li>
+<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
+by the OpenCL implementation on the device.</p>
+</li>
+<li>
+<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
+required by the OpenCL implementation on the host.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>After the <em>context</em> reference count becomes zero and all the objects
+attached to <em>context</em> (such as memory objects, command-queues) are released,
+the <em>context</em> is deleted.
+Using this function to release a reference that was not obtained by creating
+the object or by calling <strong>clRetainContext</strong> causes undefined behavior.</p>
+</div>
+<div class="paragraph">
+<p>The function</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">cl_int clGetContextInfo(cl_context context,
+                        cl_context_info param_name,
+                        size_t param_value_size,
+                        <span class="directive">void</span> *param_value,
+                        size_t *param_value_size_ret)</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>can be used to query information about a context.</p>
+</div>
+<div class="paragraph">
+<p><em>context</em> specifies the OpenCL context being queried.</p>
+</div>
+<div class="paragraph">
+<p><em>param_name</em> is an enumeration constant that specifies the information to
+query.</p>
+</div>
+<div class="paragraph">
+<p><em>param_value</em> is a pointer to memory where the appropriate result being
+queried is returned.
+If <em>param_value</em> is <code>NULL</code>, it is ignored.</p>
+</div>
+<div class="paragraph">
+<p><em>param_value_size</em> specifies the size in bytes of memory pointed to by
+<em>param_value</em>.
+This size must be greater than or equal to the size of return type as
+described in the <a href="#context-info-table">Context Attributes</a> table.</p>
+</div>
+<div class="paragraph">
+<p><em>param_value_size_ret</em> returns the actual size in bytes of data being
+queried by <em>param_name</em>.
+If <em>param_value_size_ret</em> is <code>NULL</code>, it is ignored.</p>
+</div>
+<div class="paragraph">
+<p>The list of supported <em>param_name</em> values and the information returned in
+<em>param_value</em> by <strong>clGetContextInfo</strong> is described in the
+<a href="#context-info-table">Context Attributes</a> table.</p>
+</div>
+<table id="context-info-table" class="tableblock frame-all grid-all spread">
+<caption class="title">Table 8. List of supported param_names by <strong>clGetContextInfo</strong></caption>
+<colgroup>
+<col style="width: 34%;">
+<col style="width: 33%;">
+<col style="width: 33%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top"><strong>cl_context_info</strong></th>
+<th class="tableblock halign-left valign-top">Return Type</th>
+<th class="tableblock halign-left valign-top">Information returned in param_value</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_CONTEXT_REFERENCE_COUNT</strong><sup>11</sup></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Return the <em>context</em> reference count.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_CONTEXT_NUM_DEVICES</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Return the number of devices in <em>context</em>.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_CONTEXT_DEVICES</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_device_id[]</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Return the list of devices and sub-devices in <em>context</em>.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_CONTEXT_PROPERTIES</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_context_properties[]</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Return the properties argument specified in <strong>clCreateContext</strong> or
+        <strong>clCreateContextFromType</strong>.</p>
+<p class="tableblock">        If the <em>properties</em> argument specified in <strong>clCreateContext</strong> or
+        <strong>clCreateContextFromType</strong> used to create <em>context</em> is not <code>NULL</code>, the
+        implementation must return the values specified in the properties
+        argument.</p>
+<p class="tableblock">        If the <em>properties</em> argument specified in <strong>clCreateContext</strong> or
+        <strong>clCreateContextFromType</strong> used to create <em>context</em> is <code>NULL</code>, the
+        implementation may either return a <em>param_value_size_ret</em> of 0
+        (i.e. there is no context property value to be returned) or return
+        a context property value of 0 (where 0 is used to terminate the
+        context properties list) in the memory that <em>param_value</em> points
+        to.</p></td>
+</tr>
+</tbody>
+</table>
+<div class="dlist">
+<dl>
+<dt class="hdlist1">11</dt>
+<dd>
+<p>The reference count returned should be considered immediately stale.
+It is unsuitable for general use in applications.
+This feature is provided for identifying memory leaks.</p>
+</dd>
+</dl>
+</div>
+<div class="paragraph">
+<p><strong>clGetContextInfo</strong> returns CL_SUCCESS if the function is executed
+successfully.
+Otherwise, it returns one of the following errors:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>CL_INVALID_CONTEXT if <em>context</em> is not a valid context.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if <em>param_name</em> is not one of the supported values or
+if size in bytes specified by <em>param_value_size</em> is &lt; size of return
+type as specified in the <a href="#context-info-table">Context Attributes</a>
+table and <em>param_value</em> is not a <code>NULL</code> value.</p>
+</li>
+<li>
+<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
+by the OpenCL implementation on the device.</p>
+</li>
+<li>
+<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
+required by the OpenCL implementation on the host.</p>
+</li>
+</ul>
+</div>
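+<div class="paragraph">
+<p>The interplay of <em>param_value</em>, <em>param_value_size</em> and
+<em>param_value_size_ret</em> described above can be sketched as a small helper.
+This is a model of the size rule only, not the implementation of
+<strong>clGetContextInfo</strong>: <code>sketch_return_info</code> and the two <code>SKETCH_</code> status
+codes are hypothetical, and their numeric values are placeholders rather than
+the values of CL_SUCCESS and CL_INVALID_VALUE.</p>
+</div>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder status codes for the sketch (not the OpenCL constants). */
#define SKETCH_SUCCESS        0
#define SKETCH_INVALID_VALUE (-1)

/* Hand back `value_size` bytes of `value` under the query rules:
   - a NULL param_value is ignored (nothing is copied, no size check);
   - a non-NULL param_value with too small a size is an error;
   - a non-NULL param_value_size_ret receives the actual size. */
static int sketch_return_info(const void *value, size_t value_size,
                              size_t param_value_size, void *param_value,
                              size_t *param_value_size_ret)
{
    if (param_value != NULL) {
        if (param_value_size < value_size)
            return SKETCH_INVALID_VALUE;
        memcpy(param_value, value, value_size);
    }
    if (param_value_size_ret != NULL)
        *param_value_size_ret = value_size;
    return SKETCH_SUCCESS;
}
```

+<div class="paragraph">
+<p>These rules are what make the common two-step pattern possible: a first
+call with a <code>NULL</code> <em>param_value</em> obtains the required size, and a second call
+with a sufficiently large buffer obtains the data.</p>
+</div>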
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="opencl-runtime">5. The OpenCL Runtime</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>This section describes the API calls that manage OpenCL objects such as
+command-queues, memory objects, program objects, and kernel objects for kernel
+functions in a program, as well as the calls that enqueue commands to a
+command-queue, such as executing a kernel or reading or writing a memory
+object.</p>
+</div>
+<div class="sect2">
+<h3 id="_command_queues">5.1. Command Queues</h3>
+<div class="paragraph">
+<p>OpenCL objects such as memory, program and kernel objects are created using
+a context.
+Operations on these objects are performed using a command-queue.
+The command-queue can be used to queue a set of operations (referred to as
+commands) in order.
+Having multiple command-queues allows applications to queue multiple
+independent commands without requiring synchronization.
+This works without synchronization only as long as these objects are not shared.
+Sharing of objects across multiple command-queues will require the
+application to perform appropriate synchronization.
+This is described in <a href="#shared-opencl-objects">Shared OpenCL Objects</a>.</p>
+</div>
+<div class="paragraph">
+<p>The function</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">cl_command_queue clCreateCommandQueueWithProperties(
+    cl_context context,
+    cl_device_id device,
+    <span class="directive">const</span> cl_queue_properties *properties,
+    cl_int *errcode_ret)</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>creates a host or device command-queue on a specific device.</p>
+</div>
+<div class="paragraph">
+<p><em>context</em> must be a valid OpenCL context.</p>
+</div>
+<div class="paragraph">
+<p><em>device</em> must be a device or sub-device associated with <em>context</em>.
+It can either be in the list of devices and sub-devices specified when
+<em>context</em> is created using <strong>clCreateContext</strong> or be a root device with the
+same device type as specified when <em>context</em> is created using
+<strong>clCreateContextFromType</strong>.</p>
+</div>
+<div class="paragraph">
+<p><em>properties</em> specifies a list of properties for the command-queue and their
+corresponding values.
+Each property name is immediately followed by the corresponding desired
+value.
+The list is terminated with 0.
+The list of supported properties is described in the table below.
+If a supported property and its value are not specified in <em>properties</em>, its
+default value will be used.
+<em>properties</em> can be <code>NULL</code> in which case the default values for supported
+command-queue properties will be used.</p>
+</div>
+<table id="queue-properties-table" class="tableblock frame-all grid-all spread">
+<caption class="title">Table 9. List of supported cl_queue_properties values and description</caption>
+<colgroup>
+<col style="width: 34%;">
+<col style="width: 33%;">
+<col style="width: 33%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top"><strong>Queue Properties</strong></th>
+<th class="tableblock halign-left valign-top">Property Value</th>
+<th class="tableblock halign-left valign-top">Description</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_QUEUE_PROPERTIES</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_bitfield</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">This is a bitfield and can be set to a combination of the following
+        values:</p>
+<p class="tableblock">        CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE - Determines whether the
+        commands queued in the command-queue are executed in-order or
+        out-of-order.
+        If set, the commands in the command-queue are executed out-of-order.
+        Otherwise, commands are executed in-order.</p>
+<p class="tableblock">        CL_QUEUE_PROFILING_ENABLE - Enable or disable profiling of commands
+        in the command-queue.
+        If set, the profiling of commands is enabled.
+        Otherwise profiling of commands is disabled.</p>
+<p class="tableblock">        CL_QUEUE_ON_DEVICE - Indicates that this is a device queue.
+        If CL_QUEUE_ON_DEVICE is set,
+        CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE<sup>1</sup> must also be set.</p>
+<p class="tableblock">        CL_QUEUE_ON_DEVICE_DEFAULT<sup>2</sup> - Indicates that this is the default
+        device queue.
+        This can only be used with CL_QUEUE_ON_DEVICE.</p>
+<p class="tableblock">        If CL_QUEUE_PROPERTIES is not specified, an in-order host command
+        queue is created for the specified device.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_QUEUE_SIZE</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Specifies the size of the device queue in bytes.</p>
+<p class="tableblock">        This can only be specified if CL_QUEUE_ON_DEVICE is set in
+        CL_QUEUE_PROPERTIES.
+        This must be a value ≤ CL_DEVICE_QUEUE_ON_DEVICE_MAX_SIZE.</p>
+<p class="tableblock">        For best performance, this should be ≤
+        CL_DEVICE_QUEUE_ON_DEVICE_PREFERRED_SIZE.</p>
+<p class="tableblock">        If CL_QUEUE_SIZE is not specified, the device queue is created with
+        CL_DEVICE_QUEUE_ON_DEVICE_PREFERRED_SIZE as the size of the queue.</p></td>
+</tr>
+</tbody>
+</table>
+<div class="dlist">
+<dl>
+<dt class="hdlist1">1</dt>
+<dd>
+<p>Only out-of-order device queues are supported.</p>
+</dd>
+<dt class="hdlist1">2</dt>
+<dd>
+<p>The application must create the default device queue if any kernels
+containing calls to get_default_queue are enqueued.
+There can only be one default device queue for each device within a
+context.
+<strong>clCreateCommandQueueWithProperties</strong> with CL_QUEUE_PROPERTIES set to
+CL_QUEUE_ON_DEVICE | CL_QUEUE_ON_DEVICE_DEFAULT will return the default
+device queue that has already been created and increment its retain
+count by 1.</p>
+</dd>
+</dl>
+</div>
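+<div class="paragraph">
+<p>The two dependencies between the CL_QUEUE_PROPERTIES bits stated in the
+table above (CL_QUEUE_ON_DEVICE requires
+CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, and CL_QUEUE_ON_DEVICE_DEFAULT
+requires CL_QUEUE_ON_DEVICE) can be expressed as a small bit test.
+The sketch below is illustrative only: <code>queue_flags_consistent</code> is a
+hypothetical helper, and the <code>SKETCH_</code> constants are local stand-ins whose
+bit positions are assumed to mirror the Khronos headers.</p>
+</div>

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the queue property bits; the bit positions are an
   assumption and are defined here so the sketch compiles on its own. */
#define SKETCH_QUEUE_OUT_OF_ORDER      (UINT64_C(1) << 0)
#define SKETCH_QUEUE_PROFILING         (UINT64_C(1) << 1)
#define SKETCH_QUEUE_ON_DEVICE         (UINT64_C(1) << 2)
#define SKETCH_QUEUE_ON_DEVICE_DEFAULT (UINT64_C(1) << 3)

/* Return 1 if the combination satisfies both dependency rules, else 0. */
static int queue_flags_consistent(uint64_t flags)
{
    /* A device queue must be out-of-order. */
    if ((flags & SKETCH_QUEUE_ON_DEVICE) &&
        !(flags & SKETCH_QUEUE_OUT_OF_ORDER))
        return 0;
    /* The default-device-queue bit is only meaningful on a device queue. */
    if ((flags & SKETCH_QUEUE_ON_DEVICE_DEFAULT) &&
        !(flags & SKETCH_QUEUE_ON_DEVICE))
        return 0;
    return 1;
}
```

+<div class="paragraph">
+<p>A value of 0 passes the check, which corresponds to the in-order host
+command-queue that is created when no properties are given.</p>
+</div>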
+<div class="paragraph">
+<p><em>errcode_ret</em> will return an appropriate error code.
+If <em>errcode_ret</em> is <code>NULL</code>, no error code is returned.</p>
+</div>
+<div class="paragraph">
+<p><strong>clCreateCommandQueueWithProperties</strong> returns a valid non-zero command-queue
+and <em>errcode_ret</em> is set to CL_SUCCESS if the command-queue is created
+successfully.
+Otherwise, it returns a <code>NULL</code> value with one of the following error values
+returned in <em>errcode_ret</em>:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>CL_INVALID_CONTEXT if <em>context</em> is not a valid context.</p>
+</li>
+<li>
+<p>CL_INVALID_DEVICE if <em>device</em> is not a valid device or is not associated
+with <em>context</em>.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if values specified in <em>properties</em> are not valid.</p>
+</li>
+<li>
+<p>CL_INVALID_QUEUE_PROPERTIES if values specified in <em>properties</em> are
+valid but are not supported by the device.</p>
+</li>
+<li>
+<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
+by the OpenCL implementation on the device.</p>
+</li>
+<li>
+<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
+required by the OpenCL implementation on the host.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>The function</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">cl_int clSetDefaultDeviceCommandQueue(cl_context context,
+                                      cl_device_id device,
+                                      cl_command_queue command_queue)</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>replaces the default command queue on the <em>device</em>.</p>
+</div>
+<div class="paragraph">
+<p><strong>clSetDefaultDeviceCommandQueue</strong> returns CL_SUCCESS if the function is
+executed successfully.
+Otherwise, it returns one of the following errors:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>CL_INVALID_CONTEXT if <em>context</em> is not a valid context.</p>
+</li>
+<li>
+<p>CL_INVALID_DEVICE if <em>device</em> is not a valid device or is not associated
+with <em>context</em>.</p>
+</li>
+<li>
+<p>CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not a valid command-queue
+for <em>device</em>.</p>
+</li>
+<li>
+<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
+by the OpenCL implementation on the device.</p>
+</li>
+<li>
+<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
+required by the OpenCL implementation on the host.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p><strong>clSetDefaultDeviceCommandQueue</strong> may be used to replace a default device
+command queue created with <strong>clCreateCommandQueueWithProperties</strong> and the
+CL_QUEUE_ON_DEVICE_DEFAULT flag.</p>
+</div>
+<div class="paragraph">
+<p>The function</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">cl_int clRetainCommandQueue(cl_command_queue command_queue)</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>increments the <em>command_queue</em> reference count.
+<strong>clRetainCommandQueue</strong> returns CL_SUCCESS if the function is executed
+successfully.
+Otherwise, it returns one of the following errors:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not a valid
+command-queue.</p>
+</li>
+<li>
+<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
+by the OpenCL implementation on the device.</p>
+</li>
+<li>
+<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
+required by the OpenCL implementation on the host.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p><strong>clCreateCommandQueueWithProperties</strong> performs an implicit retain.
+This is very helpful for third-party libraries, which typically get a
+command-queue passed to them by the application.
+However, it is possible that the application may delete the command-queue
+without informing the library.
+Allowing functions to attach to (i.e. retain) and release a command-queue
+solves the problem of a command-queue that is still used by a library no longer
+being valid.</p>
+</div>
+<div class="paragraph">
+<p>The function</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">cl_int clReleaseCommandQueue(cl_command_queue command_queue)</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>decrements the <em>command_queue</em> reference count.
+<strong>clReleaseCommandQueue</strong> returns CL_SUCCESS if the function is executed
+successfully.
+Otherwise, it returns one of the following errors:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not a valid
+command-queue.</p>
+</li>
+<li>
+<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
+by the OpenCL implementation on the device.</p>
+</li>
+<li>
+<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
+required by the OpenCL implementation on the host.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>After the <em>command_queue</em> reference count becomes zero and all commands
+queued to <em>command_queue</em> have finished (e.g. kernel-instances, memory
+object updates, etc.), the command-queue is deleted.</p>
+</div>
+<div class="paragraph">
+<p><strong>clReleaseCommandQueue</strong> performs an implicit flush to issue any previously
+queued OpenCL commands in <em>command_queue</em>.
+Using this function to release a reference that was not obtained by creating
+the object or by calling <strong>clRetainCommandQueue</strong> causes undefined behavior.</p>
+</div>
+<div class="paragraph">
+<p>The function</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">cl_int clGetCommandQueueInfo(cl_command_queue command_queue,
+                             cl_command_queue_info param_name,
+                             size_t param_value_size,
+                             <span class="directive">void</span> *param_value,
+                             size_t *param_value_size_ret)</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>can be used to query information about a command-queue.</p>
+</div>
+<div class="paragraph">
+<p><em>command_queue</em> specifies the command-queue being queried.</p>
+</div>
+<div class="paragraph">
+<p><em>param_name</em> specifies the information to query.</p>
+</div>
+<div class="paragraph">
+<p><em>param_value</em> is a pointer to memory where the appropriate result being
+queried is returned.
+If <em>param_value</em> is <code>NULL</code>, it is ignored.</p>
+</div>
+<div class="paragraph">
+<p><em>param_value_size</em> is used to specify the size in bytes of memory pointed to
+by <em>param_value</em>.
+This size must be ≥ size of return type as described in the
+<a href="#command-queue-param-table">Command Queue Parameter</a> table.
+If <em>param_value</em> is <code>NULL</code>, it is ignored.</p>
+</div>
+<div class="paragraph">
+<p><em>param_value_size_ret</em> returns the actual size in bytes of data being
+queried by <em>param_name</em>.
+If <em>param_value_size_ret</em> is <code>NULL</code>, it is ignored.</p>
+</div>
+<div class="paragraph">
+<p>The list of supported <em>param_name</em> values and the information returned in
+<em>param_value</em> by <strong>clGetCommandQueueInfo</strong> is described in the
+<a href="#command-queue-param-table">Command Queue Parameter</a> table.</p>
+</div>
+<table id="command-queue-param-table" class="tableblock frame-all grid-all spread">
+<caption class="title">Table 10. List of supported param_names by <strong>clGetCommandQueueInfo</strong></caption>
+<colgroup>
+<col style="width: 34%;">
+<col style="width: 33%;">
+<col style="width: 33%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top"><strong>cl_command_queue_info</strong></th>
+<th class="tableblock halign-left valign-top">Return Type</th>
+<th class="tableblock halign-left valign-top">Information returned in param_value</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_QUEUE_CONTEXT</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_context</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Return the context specified when the command-queue is created.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_QUEUE_DEVICE</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_device_id</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Return the device specified when the command-queue is created.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_QUEUE_REFERENCE_COUNT</strong><sup>3</sup></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Return the command-queue reference count.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_QUEUE_PROPERTIES</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_command_queue_properties</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Return the currently specified properties for the command-queue.
+        These properties are specified by the value associated with the
+        CL_QUEUE_PROPERTIES property passed in the <em>properties</em> argument of
+        <strong>clCreateCommandQueueWithProperties</strong>.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_QUEUE_SIZE</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Return the currently specified size for the device command-queue.
+        This query is only supported for device command queues.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_QUEUE_DEVICE_DEFAULT</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">cl_command_queue</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Return the current default command queue for the underlying device.</p></td>
+</tr>
+</tbody>
+</table>
+<div class="dlist">
+<dl>
+<dt class="hdlist1">3</dt>
+<dd>
+<p>The reference count returned should be considered immediately stale.
+It is unsuitable for general use in applications.
+This feature is provided for identifying memory leaks.</p>
+</dd>
+</dl>
+</div>
+<div class="paragraph">
+<p><strong>clGetCommandQueueInfo</strong> returns CL_SUCCESS if the function is executed
+successfully.
+Otherwise, it returns one of the following errors:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not a valid
+command-queue.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if <em>param_name</em> is not one of the supported values or
+if size in bytes specified by <em>param_value_size</em> is &lt; size of return
+type as specified in the <a href="#command-queue-param-table">Command Queue
+Parameter</a> table, and <em>param_value</em> is not a <code>NULL</code> value.</p>
+</li>
+<li>
+<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
+by the OpenCL implementation on the device.</p>
+</li>
+<li>
+<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
+required by the OpenCL implementation on the host.</p>
+</li>
+</ul>
+</div>
+<div class="admonitionblock note">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-note" title="Note"></i>
+</td>
+<td class="content">
+<div class="paragraph">
+<p>It is possible that one or more devices become unavailable after a context and
+command-queues that use them have been created and commands have
+been queued to those command-queues.
+In this case the behavior of OpenCL API calls that use this context (and
+its command-queues) is implementation-defined.
+The user callback function, if specified when the context is created, can be
+used to record the information supplied in the <em>errinfo</em> and <em>private_info</em>
+arguments passed to the callback function when the device becomes
+unavailable.</p>
+</div>
+</td>
+</tr>
+</table>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_buffer_objects">5.2. Buffer Objects</h3>
+<div class="paragraph">
+<p>A <em>buffer</em> object stores a one-dimensional collection of elements.
+Elements of a <em>buffer</em> object can be a scalar data type (such as an int or
+float), a vector data type, or a user-defined structure.</p>
+</div>
+<div class="sect3">
+<h4 id="_creating_buffer_objects">5.2.1. Creating Buffer Objects</h4>
+<div class="paragraph">
+<p>A <strong>buffer object</strong> is created using the following function</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">cl_mem clCreateBuffer(cl_context context,
+                      cl_mem_flags flags,
+                      size_t size,
+                      <span class="directive">void</span> *host_ptr,
+                      cl_int *errcode_ret)</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p><em>context</em> is a valid OpenCL context used to create the buffer object.</p>
+</div>
+<div class="paragraph">
+<p><em>flags</em> is a bit-field that is used to specify allocation and usage
+information such as the memory arena that should be used to allocate the
+buffer object and how it will be used.
+The <a href="#memory-flags-table">Memory Flags</a> table describes the possible values
+for <em>flags</em>.
+If the value specified for <em>flags</em> is 0, the default, CL_MEM_READ_WRITE,
+is used.</p>
+</div>
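+<div class="paragraph">
+<p>The default for a <em>flags</em> value of 0, together with the mutual-exclusion and
+<em>host_ptr</em> rules listed in the Memory Flags table, can be checked with a few
+bit tests.
+The sketch below is a hedged illustration, not part of the API:
+<code>effective_mem_flags</code> and <code>mem_flags_consistent</code> are hypothetical helpers, the
+<code>SKETCH_</code> constants are local stand-ins whose bit positions are assumed to
+mirror the Khronos headers, and only the six flags named here are covered.</p>
+</div>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for cl_mem_flags bits (bit positions are an assumption). */
#define SKETCH_MEM_READ_WRITE     (UINT64_C(1) << 0)
#define SKETCH_MEM_WRITE_ONLY     (UINT64_C(1) << 1)
#define SKETCH_MEM_READ_ONLY      (UINT64_C(1) << 2)
#define SKETCH_MEM_USE_HOST_PTR   (UINT64_C(1) << 3)
#define SKETCH_MEM_ALLOC_HOST_PTR (UINT64_C(1) << 4)
#define SKETCH_MEM_COPY_HOST_PTR  (UINT64_C(1) << 5)

/* A flags value of 0 selects the default, read-write access. */
static uint64_t effective_mem_flags(uint64_t flags)
{
    return flags == 0 ? SKETCH_MEM_READ_WRITE : flags;
}

/* Return 1 if flags and host_ptr satisfy the rules modelled here, else 0. */
static int mem_flags_consistent(uint64_t flags, const void *host_ptr)
{
    uint64_t access = flags & (SKETCH_MEM_READ_WRITE | SKETCH_MEM_WRITE_ONLY |
                               SKETCH_MEM_READ_ONLY);

    /* At most one access flag: clearing the lowest set bit must leave 0. */
    if (access & (access - 1))
        return 0;
    /* USE_HOST_PTR excludes both ALLOC_HOST_PTR and COPY_HOST_PTR. */
    if ((flags & SKETCH_MEM_USE_HOST_PTR) &&
        (flags & (SKETCH_MEM_ALLOC_HOST_PTR | SKETCH_MEM_COPY_HOST_PTR)))
        return 0;
    /* USE_HOST_PTR and COPY_HOST_PTR are valid only with a non-NULL host_ptr. */
    if ((flags & (SKETCH_MEM_USE_HOST_PTR | SKETCH_MEM_COPY_HOST_PTR)) &&
        host_ptr == NULL)
        return 0;
    return 1;
}
```

+<div class="paragraph">
+<p>The sketch does not model every error condition of <strong>clCreateBuffer</strong>; it
+covers only the combinations spelled out for these six flags.</p>
+</div>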
+<table id="memory-flags-table" class="tableblock frame-all grid-all spread">
+<caption class="title">Table 11. List of supported cl_mem_flags values</caption>
+<colgroup>
+<col style="width: 50%;">
+<col style="width: 50%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top"><strong>cl_mem_flags</strong></th>
+<th class="tableblock halign-left valign-top">Description</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MEM_READ_WRITE</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">This flag specifies that the memory object will be read
+    and written by a kernel.
+    This is the default.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MEM_WRITE_ONLY</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">This flag specifies that the memory object will be
+    written but not read by a kernel.</p>
+<p class="tableblock">    Reading from a buffer or image object created with CL_MEM_WRITE_ONLY
+    inside a kernel is undefined.</p>
+<p class="tableblock">    CL_MEM_READ_WRITE and CL_MEM_WRITE_ONLY are mutually exclusive.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MEM_READ_ONLY</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">This flag specifies that the memory object is a
+    read-only memory object when used inside a kernel.</p>
+<p class="tableblock">    Writing to a buffer or image object created with CL_MEM_READ_ONLY inside
+    a kernel is undefined.</p>
+<p class="tableblock">    CL_MEM_READ_ONLY is mutually exclusive with CL_MEM_READ_WRITE and
+    CL_MEM_WRITE_ONLY.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MEM_USE_HOST_PTR</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">This flag is valid only if host_ptr is not <code>NULL</code>.
+    If specified, it indicates that the application wants the OpenCL
+    implementation to use memory referenced by host_ptr as the storage bits
+    for the memory object.</p>
+<p class="tableblock">    The contents of the memory pointed to by host_ptr at the time of the
+    clCreateBuffer call define the initial contents of the buffer object.</p>
+<p class="tableblock">    OpenCL implementations are allowed to cache the buffer contents pointed
+    to by host_ptr in device memory.
+    This cached copy can be used when kernels are executed on a device.</p>
+<p class="tableblock">    The result of OpenCL commands that operate on multiple buffer objects
+    created with the same host_ptr or from overlapping host or SVM regions
+    is considered to be undefined.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MEM_ALLOC_HOST_PTR</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">This flag specifies that the application wants the OpenCL implementation
+    to allocate memory from host accessible memory.</p>
+<p class="tableblock">    CL_MEM_ALLOC_HOST_PTR and CL_MEM_USE_HOST_PTR are mutually exclusive.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MEM_COPY_HOST_PTR</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">This flag is valid only if host_ptr is not <code>NULL</code>.
+    If specified, it indicates that the application wants the OpenCL
+    implementation to allocate memory for the memory object and copy the
+    data from memory referenced by host_ptr.
+    The implementation will copy the memory immediately and host_ptr is
+    available for reuse by the application when the <strong>clCreateBuffer</strong> or
+    <strong>clCreateImage</strong> operation returns.</p>
+<p class="tableblock">    CL_MEM_COPY_HOST_PTR and CL_MEM_USE_HOST_PTR are mutually exclusive.</p>
+<p class="tableblock">    CL_MEM_COPY_HOST_PTR can be used with CL_MEM_ALLOC_HOST_PTR to
+    initialize the contents of the cl_mem object allocated using
+    host-accessible (e.g. PCIe) memory.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MEM_HOST_WRITE_ONLY</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">This flag specifies that the host will only write to the memory object
+    (using OpenCL APIs that enqueue a write or a map for write).
+    This can be used to optimize write access from the host (e.g. enable
+    write-combined allocations for memory objects for devices that
+    communicate with the host over a system bus such as PCIe).</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MEM_HOST_READ_ONLY</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">This flag specifies that the host will only read
+    the memory object (using OpenCL APIs that enqueue a read or a map for
+    read).</p>
+<p class="tableblock">    CL_MEM_HOST_WRITE_ONLY and CL_MEM_HOST_READ_ONLY are mutually exclusive.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MEM_HOST_NO_ACCESS</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">This flag specifies that the host will not read or
+    write the memory object.</p>
+<p class="tableblock">    CL_MEM_HOST_WRITE_ONLY or CL_MEM_HOST_READ_ONLY and
+    CL_MEM_HOST_NO_ACCESS are mutually exclusive.</p></td>
+</tr>
+</tbody>
+</table>
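The mutual-exclusion rules in the table above can be sketched as a small host-side check. This is only an illustration: the `SK_`-prefixed names and the helper functions are hypothetical, and the bit values are assumed to mirror the `cl_mem_flags` constants in `CL/cl.h` so that the sketch builds without the OpenCL headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the cl_mem_flags bits (assumed to match CL/cl.h). */
typedef uint64_t sk_mem_flags;
#define SK_MEM_READ_WRITE      (1u << 0)
#define SK_MEM_WRITE_ONLY      (1u << 1)
#define SK_MEM_READ_ONLY       (1u << 2)
#define SK_MEM_USE_HOST_PTR    (1u << 3)
#define SK_MEM_ALLOC_HOST_PTR  (1u << 4)
#define SK_MEM_COPY_HOST_PTR   (1u << 5)
#define SK_MEM_HOST_WRITE_ONLY (1u << 7)
#define SK_MEM_HOST_READ_ONLY  (1u << 8)
#define SK_MEM_HOST_NO_ACCESS  (1u << 9)

/* True if at most one bit of 'group' is set in 'flags'. */
static bool at_most_one(sk_mem_flags flags, sk_mem_flags group)
{
    sk_mem_flags v = flags & group;
    return (v & (v - 1)) == 0;
}

/* Checks only the mutual-exclusion rules listed in the flags table. */
bool sk_flags_are_consistent(sk_mem_flags flags)
{
    /* Kernel access: READ_WRITE, WRITE_ONLY and READ_ONLY exclude each other. */
    if (!at_most_one(flags, SK_MEM_READ_WRITE | SK_MEM_WRITE_ONLY | SK_MEM_READ_ONLY))
        return false;
    /* Host access: HOST_WRITE_ONLY, HOST_READ_ONLY and HOST_NO_ACCESS exclude each other. */
    if (!at_most_one(flags, SK_MEM_HOST_WRITE_ONLY | SK_MEM_HOST_READ_ONLY | SK_MEM_HOST_NO_ACCESS))
        return false;
    /* USE_HOST_PTR cannot be combined with ALLOC_HOST_PTR or COPY_HOST_PTR. */
    if ((flags & SK_MEM_USE_HOST_PTR) &&
        (flags & (SK_MEM_ALLOC_HOST_PTR | SK_MEM_COPY_HOST_PTR)))
        return false;
    return true;
}
```

Note that `SK_MEM_ALLOC_HOST_PTR | SK_MEM_COPY_HOST_PTR` passes the check, matching the table's statement that the two can be used together. An empty flag set also passes, since CL_MEM_READ_WRITE is the default.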
+<div class="paragraph">
+<p><em>size</em> is the size in bytes of the buffer memory object to be allocated.</p>
+</div>
+<div class="paragraph">
+<p><em>host_ptr</em> is a pointer to the buffer data that may already be allocated by
+the application.
+The size of the buffer that <em>host_ptr</em> points to must be ≥ <em>size</em> bytes.</p>
+</div>
+<div class="paragraph">
+<p>The user is responsible for ensuring that data passed into and out of OpenCL
+buffers is natively aligned relative to the start of the buffer as per
+kernel language or IL requirements.
+OpenCL buffers created with CL_MEM_USE_HOST_PTR need to provide an
+appropriately aligned host memory pointer that is aligned to the data types
+used to access these buffers in kernels.</p>
+</div>
+<div class="paragraph">
+<p><em>errcode_ret</em> will return an appropriate error code.
+If <em>errcode_ret</em> is <code>NULL</code>, no error code is returned.</p>
+</div>
+<div class="paragraph">
+<p>If <strong>clCreateBuffer</strong> is called with CL_MEM_USE_HOST_PTR set in its <em>flags</em>
+argument, the contents of the memory pointed to by <em>host_ptr</em> at the time
+of the <strong>clCreateBuffer</strong> call define the initial contents of the
+buffer object.</p>
+</div>
+<div class="paragraph">
+<p>If <strong>clCreateBuffer</strong> is called with a pointer returned by <strong>clSVMAlloc</strong> as its
+<em>host_ptr</em> argument, and CL_MEM_USE_HOST_PTR is set in its <em>flags</em> argument,
+<strong>clCreateBuffer</strong> will succeed and return a valid non-zero buffer object as
+long as the <em>size</em> argument to <strong>clCreateBuffer</strong> is no larger than the <em>size</em>
+argument passed in the original <strong>clSVMAlloc</strong> call.
+The new buffer object returned has the shared memory as the underlying
+storage.
+Locations in the buffer's underlying shared memory can be operated on using
+atomic operations to the device's level of support as defined in the memory
+model.</p>
+</div>
+<div class="paragraph">
+<p><strong>clCreateBuffer</strong> returns a valid non-zero buffer object and <em>errcode_ret</em> is
+set to CL_SUCCESS if the buffer object is created successfully.
+Otherwise, it returns a <code>NULL</code> value with one of the following error values
+returned in <em>errcode_ret</em>:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>CL_INVALID_CONTEXT if <em>context</em> is not a valid context.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if values specified in <em>flags</em> are not valid as defined
+in the <a href="#memory-flags-table">Memory Flags</a> table.</p>
+</li>
+<li>
+<p>CL_INVALID_BUFFER_SIZE if <em>size</em> is 0<sup>4</sup>.</p>
+</li>
+<li>
+<p>CL_INVALID_HOST_PTR if <em>host_ptr</em> is <code>NULL</code> and CL_MEM_USE_HOST_PTR or
+CL_MEM_COPY_HOST_PTR is set in <em>flags</em>, or if <em>host_ptr</em> is not <code>NULL</code>
+but neither CL_MEM_COPY_HOST_PTR nor CL_MEM_USE_HOST_PTR is set in <em>flags</em>.</p>
+</li>
+<li>
+<p>CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
+memory for the buffer object.</p>
+</li>
+<li>
+<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
+by the OpenCL implementation on the device.</p>
+</li>
+<li>
+<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
+required by the OpenCL implementation on the host.</p>
+<div class="dlist">
+<dl>
+<dt class="hdlist1">4</dt>
+<dd>
+<p>Implementations may return CL_INVALID_BUFFER_SIZE if size is greater
+than CL_DEVICE_MAX_MEM_ALLOC_SIZE value specified in the
+<a href="#device-queries-table">Device Queries</a> table for all devices in
+context.</p>
+</dd>
+</dl>
+</div>
+</li>
+</ul>
+</div>
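The size and host_ptr conditions in the error list above can be sketched as a pre-flight check an application might run before calling clCreateBuffer. The `sk_` names and the status enum are hypothetical, the two flag bits are assumed to mirror `CL/cl.h`, and the order in which the checks are made is an arbitrary choice; the list above does not define one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the flag bits (assumed to match CL/cl.h). */
#define SK_MEM_USE_HOST_PTR  (1u << 3)
#define SK_MEM_COPY_HOST_PTR (1u << 5)

/* Hypothetical status codes for this sketch only. */
typedef enum {
    SK_OK,
    SK_INVALID_BUFFER_SIZE,
    SK_INVALID_HOST_PTR
} sk_status;

sk_status sk_check_create_buffer_args(uint64_t flags, size_t size,
                                      const void *host_ptr)
{
    /* host_ptr is required exactly when USE_HOST_PTR or COPY_HOST_PTR is set. */
    int needs_ptr = (flags & (SK_MEM_USE_HOST_PTR | SK_MEM_COPY_HOST_PTR)) != 0;

    if (size == 0)
        return SK_INVALID_BUFFER_SIZE;
    if (needs_ptr && host_ptr == NULL)
        return SK_INVALID_HOST_PTR;   /* flag set but no pointer supplied */
    if (!needs_ptr && host_ptr != NULL)
        return SK_INVALID_HOST_PTR;   /* pointer supplied but no flag uses it */
    return SK_OK;
}
```

The sketch does not cover the implementation-dependent case in footnote 4, where a size larger than CL_DEVICE_MAX_MEM_ALLOC_SIZE may also be rejected.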
+<div class="paragraph">
+<p>The function</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">cl_mem clCreateSubBuffer(cl_mem buffer,
+                         cl_mem_flags flags,
+                         cl_buffer_create_type buffer_create_type,
+                         <span class="directive">const</span> <span class="directive">void</span> *buffer_create_info,
+                         cl_int *errcode_ret)</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>can be used to create a new buffer object (referred to as a sub-buffer
+object) from an existing buffer object.</p>
+</div>
+<div class="paragraph">
+<p><em>buffer</em> must be a valid buffer object and cannot be a sub-buffer object.</p>
+</div>
+<div class="paragraph">
+<p><em>flags</em> is a bit-field that is used to specify allocation and usage
+information about the sub-buffer memory object being created and is
+described in the <a href="#memory-flags-table">Memory Flags</a> table.
+If the CL_MEM_READ_WRITE, CL_MEM_READ_ONLY or CL_MEM_WRITE_ONLY values are
+not specified in <em>flags</em>, they are inherited from the corresponding memory
+access qualifiers associated with <em>buffer</em>.
+The CL_MEM_USE_HOST_PTR, CL_MEM_ALLOC_HOST_PTR and CL_MEM_COPY_HOST_PTR
+values cannot be specified in <em>flags</em> but are inherited from the
+corresponding memory access qualifiers associated with <em>buffer</em>.
+If CL_MEM_COPY_HOST_PTR is specified in the memory access qualifier values
+associated with <em>buffer</em> it does not imply any additional copies when the
+sub-buffer is created from <em>buffer</em>.
+If the CL_MEM_HOST_WRITE_ONLY, CL_MEM_HOST_READ_ONLY or
+CL_MEM_HOST_NO_ACCESS values are not specified in <em>flags</em>, they are
+inherited from the corresponding memory access qualifiers associated with
+<em>buffer</em>.</p>
+</div>
+<div class="paragraph">
+<p><em>buffer_create_type</em> and <em>buffer_create_info</em> describe the type of buffer
+object to be created.
+The list of supported values for <em>buffer_create_type</em> and corresponding
+descriptor that <em>buffer_create_info</em> points to is described in the
+<a href="#subbuffer-create-info-table">SubBuffer Attributes</a> table.</p>
+</div>
+<table id="subbuffer-create-info-table" class="tableblock frame-all grid-all spread">
+<caption class="title">Table 12. List of supported names and values in <strong>clCreateSubBuffer</strong></caption>
+<colgroup>
+<col style="width: 50%;">
+<col style="width: 50%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top"><strong>cl_buffer_create_type</strong></th>
+<th class="tableblock halign-left valign-top">Description</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_BUFFER_CREATE_TYPE_REGION</strong></p></td>
+<td class="tableblock halign-left valign-top"><div><div class="paragraph">
+<p>Create a buffer object that represents a
+    specific region in buffer.</p>
+</div>
+<div class="openblock">
+<div class="content">
+<div class="paragraph">
+<p>buffer_create_info is a pointer to the following structure:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c"><span class="keyword">typedef</span> <span class="keyword">struct</span> _cl_buffer_region {
+    size_t origin;
+    size_t size;
+} cl_buffer_region;</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>(<em>origin</em>, <em>size</em>) defines the offset and size in bytes in buffer.</p>
+</div>
+<div class="paragraph">
+<p>If buffer is created with CL_MEM_USE_HOST_PTR, the host_ptr associated with
+the buffer object returned is <em>host_ptr + origin</em>.</p>
+</div>
+<div class="paragraph">
+<p>The buffer object returned references the data store allocated for buffer
+and points to a specific region given by (origin, size) in this data store.</p>
+</div>
+<div class="paragraph">
+<p>CL_INVALID_VALUE is returned in errcode_ret if the region specified by
+(origin, size) is out of bounds in buffer.</p>
+</div>
+<div class="paragraph">
+<p>CL_INVALID_BUFFER_SIZE is returned in errcode_ret if size is 0.</p>
+</div>
+<div class="paragraph">
+<p>CL_MISALIGNED_SUB_BUFFER_OFFSET is returned in errcode_ret if there are no
+devices in context associated with buffer for which the origin value is
+aligned to the CL_DEVICE_MEM_BASE_ADDR_ALIGN value.</p>
+</div>
+</div>
+</div></div></td>
+</tr>
+</tbody>
+</table>
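The region rules for CL_BUFFER_CREATE_TYPE_REGION can be sketched as follows. The `sk_` names are hypothetical, and the alignment is passed in bytes for a single device: the sketch assumes the caller has already queried CL_DEVICE_MEM_BASE_ADDR_ALIGN (which is understood to be reported in bits) and divided by 8. The specification only reports a misalignment when no device in the context accepts the origin, so a real check would repeat the last test per device.

```c
#include <stddef.h>

/* Hypothetical status codes for this sketch only. */
typedef enum {
    SK_REGION_OK,
    SK_REGION_INVALID_BUFFER_SIZE,  /* size is 0 */
    SK_REGION_INVALID_VALUE,        /* (origin, size) out of bounds */
    SK_REGION_MISALIGNED            /* origin not aligned for the device */
} sk_region_status;

sk_region_status sk_check_region(size_t origin, size_t size,
                                 size_t parent_size, size_t align_bytes)
{
    if (size == 0)
        return SK_REGION_INVALID_BUFFER_SIZE;
    /* Written with a subtraction so that origin + size cannot overflow. */
    if (origin > parent_size || size > parent_size - origin)
        return SK_REGION_INVALID_VALUE;
    /* align_bytes == 0 means "alignment unknown, skip the test" in this sketch. */
    if (align_bytes != 0 && origin % align_bytes != 0)
        return SK_REGION_MISALIGNED;
    return SK_REGION_OK;
}
```

For a 1024-byte parent buffer and a 128-byte alignment, the region (256, 128) passes, while (200, 128) is reported as misaligned and (1000, 128) as out of bounds.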
+<div class="paragraph">
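+<p><strong>clCreateSubBuffer</strong> returns a valid non-zero buffer object and
+<em>errcode_ret</em> is set to CL_SUCCESS if the function is executed successfully.
+Otherwise, it returns a <code>NULL</code> value with one of the following errors
+returned in <em>errcode_ret</em>:</p>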
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>CL_INVALID_MEM_OBJECT if <em>buffer</em> is not a valid buffer object or is a
+sub-buffer object.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if <em>buffer</em> was created with CL_MEM_WRITE_ONLY and
+<em>flags</em> specifies CL_MEM_READ_WRITE or CL_MEM_READ_ONLY, or if <em>buffer</em>
+was created with CL_MEM_READ_ONLY and <em>flags</em> specifies
+CL_MEM_READ_WRITE or CL_MEM_WRITE_ONLY, or if <em>flags</em> specifies
+CL_MEM_USE_HOST_PTR or CL_MEM_ALLOC_HOST_PTR or CL_MEM_COPY_HOST_PTR.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if <em>buffer</em> was created with CL_MEM_HOST_WRITE_ONLY and
+<em>flags</em> specify CL_MEM_HOST_READ_ONLY, or if <em>buffer</em> was created with
+CL_MEM_HOST_READ_ONLY and <em>flags</em> specify CL_MEM_HOST_WRITE_ONLY, or if
+<em>buffer</em> was created with CL_MEM_HOST_NO_ACCESS and <em>flags</em> specify
+CL_MEM_HOST_READ_ONLY or CL_MEM_HOST_WRITE_ONLY.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if value specified in <em>buffer_create_type</em> is not
+valid.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if value(s) specified in <em>buffer_create_info</em> (for a
+given <em>buffer_create_type</em>) is not valid or if <em>buffer_create_info</em> is
+<code>NULL</code>.</p>
+</li>
+<li>
+<p>CL_INVALID_BUFFER_SIZE if <em>size</em> is 0.</p>
+</li>
+<li>
+<p>CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
+memory for the sub-buffer object.</p>
+</li>
+<li>
+<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
+by the OpenCL implementation on the device.</p>
+</li>
+<li>
+<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
+required by the OpenCL implementation on the host.</p>
+</li>
+</ul>
+</div>
+<div class="admonitionblock note">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-note" title="Note"></i>
+</td>
+<td class="content">
+<div class="paragraph">
+<p>Concurrent reading from, writing to and copying between both a buffer object
+and its sub-buffer object(s) is undefined.
+Concurrent reading from, writing to and copying between overlapping
+sub-buffer objects created with the same buffer object is undefined.
+Only reading from both a buffer object and its sub-buffer objects or reading
+from multiple overlapping sub-buffer objects is defined.</p>
+</div>
+</td>
+</tr>
+</table>
+</div>
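Since the note above makes concurrent writes to overlapping sub-buffers undefined, an application may want to test two (origin, size) regions for overlap before enqueueing such commands. A minimal sketch, with a hypothetical function name, treating each region as the half-open byte range [origin, origin + size):

```c
#include <stdbool.h>
#include <stddef.h>

/* True if the byte ranges [origin_a, origin_a + size_a) and
 * [origin_b, origin_b + size_b) share at least one byte. */
bool sk_regions_overlap(size_t origin_a, size_t size_a,
                        size_t origin_b, size_t size_b)
{
    if (size_a == 0 || size_b == 0)
        return false;               /* an empty range overlaps nothing */
    /* The range that starts first overlaps the other one exactly when the
     * distance between the two origins is smaller than its size.
     * Using a difference avoids overflow in origin + size. */
    if (origin_a <= origin_b)
        return origin_b - origin_a < size_a;
    return origin_a - origin_b < size_b;
}
```

Adjacent regions such as (0, 64) and (64, 64) do not overlap under this definition.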
+</div>
+<div class="sect3">
+<h4 id="_reading_writing_and_copying_buffer_objects">5.2.2. Reading, Writing and Copying Buffer Objects</h4>
+<div class="paragraph">
+<p>The following functions enqueue commands to read from a buffer object to
+host memory or write to a buffer object from host memory.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">cl_int clEnqueueReadBuffer(cl_command_queue command_queue,
+                           cl_mem buffer,
+                           cl_bool blocking_read,
+                           size_t offset,
+                           size_t size,
+                           <span class="directive">void</span> *ptr,
+                           cl_uint num_events_in_wait_list,
+                           <span class="directive">const</span> cl_event *event_wait_list,
+                           cl_event *event)</code></pre>
+</div>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">cl_int clEnqueueWriteBuffer(cl_command_queue command_queue,
+                            cl_mem buffer,
+                            cl_bool blocking_write,
+                            size_t offset,
+                            size_t size,
+                            <span class="directive">const</span> <span class="directive">void</span> *ptr,
+                            cl_uint num_events_in_wait_list,
+                            <span class="directive">const</span> cl_event *event_wait_list,
+                            cl_event *event)</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p><em>command_queue</em> is a valid host command-queue in which the read / write
+command will be queued.
+<em>command_queue</em> and <em>buffer</em> must be created with the same OpenCL context.</p>
+</div>
+<div class="paragraph">
+<p><em>buffer</em> refers to a valid buffer object.</p>
+</div>
+<div class="paragraph">
+<p><em>blocking_read</em> and <em>blocking_write</em> indicate if the read and write
+operations are <em>blocking</em> or <em>non-blocking</em>.</p>
+</div>
+<div class="paragraph">
+<p>If <em>blocking_read</em> is CL_TRUE i.e. the read command is blocking,
+<strong>clEnqueueReadBuffer</strong> does not return until the buffer data has been read
+and copied into memory pointed to by <em>ptr</em>.</p>
+</div>
+<div class="paragraph">
+<p>If <em>blocking_read</em> is CL_FALSE i.e. the read command is non-blocking,
+<strong>clEnqueueReadBuffer</strong> queues a non-blocking read command and returns.
+The contents of the buffer that <em>ptr</em> points to cannot be used until the
+read command has completed.
+The <em>event</em> argument returns an event object which can be used to query the
+execution status of the read command.
+When the read command has completed, the contents of the buffer that <em>ptr</em>
+points to can be used by the application.</p>
+</div>
+<div class="paragraph">
+<p>If <em>blocking_write</em> is CL_TRUE, the OpenCL implementation copies the data
+referred to by <em>ptr</em> and enqueues the write operation in the command-queue.
+The memory pointed to by <em>ptr</em> can be reused by the application after the
+<strong>clEnqueueWriteBuffer</strong> call returns.</p>
+</div>
+<div class="paragraph">
+<p>If <em>blocking_write</em> is CL_FALSE, the OpenCL implementation will use <em>ptr</em> to
+perform a non-blocking write.
+As the write is non-blocking the implementation can return immediately.
+The memory pointed to by <em>ptr</em> cannot be reused by the application simply
+because the call has returned.
+The <em>event</em> argument returns an event object which can be used to query the
+execution status of the write command.
+When the write command has completed, the memory pointed to by <em>ptr</em> can
+then be reused by the application.</p>
+</div>
+<div class="paragraph">
+<p><em>offset</em> is the offset in bytes in the buffer object to read from or write
+to.</p>
+</div>
+<div class="paragraph">
+<p><em>size</em> is the size in bytes of data being read or written.</p>
+</div>
+<div class="paragraph">
+<p><em>ptr</em> is a pointer to the buffer in host memory where data is to be read
+into or written from.</p>
+</div>
+<div class="paragraph">
+<p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need to
+complete before this particular command can be executed.
+If <em>event_wait_list</em> is <code>NULL</code>, then this particular command does not wait
+on any event to complete.
+If <em>event_wait_list</em> is <code>NULL</code>, <em>num_events_in_wait_list</em> must be 0.
+If <em>event_wait_list</em> is not <code>NULL</code>, the list of events pointed to by
+<em>event_wait_list</em> must be valid and <em>num_events_in_wait_list</em> must be
+greater than 0.
+The events specified in <em>event_wait_list</em> act as synchronization points.
+The context associated with events in <em>event_wait_list</em> and <em>command_queue</em>
+must be the same.
+The memory associated with <em>event_wait_list</em> can be reused or freed after
+the function returns.</p>
+</div>
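The pairing rule between event_wait_list and num_events_in_wait_list described above reduces to "both empty or both non-empty". A minimal sketch with a hypothetical function name; the wait list is taken as an opaque pointer so that no OpenCL headers are needed, and the validity of the individual events is not checked here:

```c
#include <stdbool.h>
#include <stddef.h>

/* True if the (count, list) pair is well-formed:
 *   list == NULL  requires  count == 0
 *   list != NULL  requires  count  > 0                       */
bool sk_wait_list_args_ok(unsigned int num_events_in_wait_list,
                          const void *event_wait_list)
{
    return (event_wait_list == NULL) == (num_events_in_wait_list == 0);
}
```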
+<div class="paragraph">
+<p><em>event</em> returns an event object that identifies this particular read / write
+command and can be used to query or queue a wait for this particular command
+to complete.
+<em>event</em> can be <code>NULL</code> in which case it will not be possible for the
+application to query the status of this command or queue a wait for this
+command to complete.
+If the <em>event_wait_list</em> and the <em>event</em> arguments are not <code>NULL</code>, the
+<em>event</em> argument should not refer to an element of the <em>event_wait_list</em>
+array.</p>
+</div>
+<div class="paragraph">
+<p><strong>clEnqueueReadBuffer</strong> and <strong>clEnqueueWriteBuffer</strong> return CL_SUCCESS if the
+function is executed successfully.
+Otherwise, they return one of the following errors:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not a valid host
+command-queue.</p>
+</li>
+<li>
+<p>CL_INVALID_CONTEXT if the context associated with <em>command_queue</em> and
+<em>buffer</em> are not the same or if the context associated with
+<em>command_queue</em> and events in <em>event_wait_list</em> are not the same.</p>
+</li>
+<li>
+<p>CL_INVALID_MEM_OBJECT if <em>buffer</em> is not a valid buffer object.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if the region being read or written specified by
+(<em>offset</em>, <em>size</em>) is out of bounds or if <em>ptr</em> is a <code>NULL</code> value.</p>
+</li>
+<li>
+<p>CL_INVALID_EVENT_WAIT_LIST if <em>event_wait_list</em> is <code>NULL</code> and
+<em>num_events_in_wait_list</em> &gt; 0, or <em>event_wait_list</em> is not <code>NULL</code> and
+<em>num_events_in_wait_list</em> is 0, or if event objects in <em>event_wait_list</em>
+are not valid events.</p>
+</li>
+<li>
+<p>CL_MISALIGNED_SUB_BUFFER_OFFSET if <em>buffer</em> is a sub-buffer object and
+<em>offset</em> specified when the sub-buffer object is created is not aligned
+to the CL_DEVICE_MEM_BASE_ADDR_ALIGN value for the device associated with
+<em>command_queue</em>.</p>
+</li>
+<li>
+<p>CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST if the read and write
+operations are blocking and the execution status of any of the events in
+<em>event_wait_list</em> is a negative integer value.</p>
+</li>
+<li>
+<p>CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
+memory for the data store associated with <em>buffer</em>.</p>
+</li>
+<li>
+<p>CL_INVALID_OPERATION if <strong>clEnqueueReadBuffer</strong> is called on <em>buffer</em>
+which has been created with CL_MEM_HOST_WRITE_ONLY or
+CL_MEM_HOST_NO_ACCESS.</p>
+</li>
+<li>
+<p>CL_INVALID_OPERATION if <strong>clEnqueueWriteBuffer</strong> is called on <em>buffer</em>
+which has been created with CL_MEM_HOST_READ_ONLY or
+CL_MEM_HOST_NO_ACCESS.</p>
+</li>
+<li>
+<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
+by the OpenCL implementation on the device.</p>
+</li>
+<li>
+<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
+required by the OpenCL implementation on the host.</p>
+</li>
+</ul>
+</div>
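Two of the conditions in the list above, the (offset, size) bounds check and the host-access restrictions, can be sketched as plain predicates. The `sk_` names are hypothetical and the flag bits are assumed to mirror `CL/cl.h`; the sketch only illustrates the rules and is not how any implementation is known to check them.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the host-access flag bits (assumed to match CL/cl.h). */
#define SK_MEM_HOST_WRITE_ONLY (1u << 7)
#define SK_MEM_HOST_READ_ONLY  (1u << 8)
#define SK_MEM_HOST_NO_ACCESS  (1u << 9)

/* A host read is refused for HOST_WRITE_ONLY and HOST_NO_ACCESS buffers. */
bool sk_host_may_read(uint64_t flags)
{
    return (flags & (SK_MEM_HOST_WRITE_ONLY | SK_MEM_HOST_NO_ACCESS)) == 0;
}

/* A host write is refused for HOST_READ_ONLY and HOST_NO_ACCESS buffers. */
bool sk_host_may_write(uint64_t flags)
{
    return (flags & (SK_MEM_HOST_READ_ONLY | SK_MEM_HOST_NO_ACCESS)) == 0;
}

/* True if [offset, offset + size) lies inside a buffer of buffer_size bytes.
 * The subtraction form avoids overflow in offset + size. */
bool sk_range_in_bounds(size_t offset, size_t size, size_t buffer_size)
{
    return offset <= buffer_size && size <= buffer_size - offset;
}
```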
+<div class="paragraph">
+<p>The following functions enqueue commands to read a 2D or 3D rectangular
+region from a buffer object to host memory or write a 2D or 3D rectangular
+region to a buffer object from host memory.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">cl_int clEnqueueReadBufferRect(cl_command_queue command_queue,
+                               cl_mem buffer,
+                               cl_bool blocking_read,
+                               <span class="directive">const</span> size_t *buffer_origin,
+                               <span class="directive">const</span> size_t *host_origin,
+                               <span class="directive">const</span> size_t *region,
+                               size_t buffer_row_pitch,
+                               size_t buffer_slice_pitch,
+                               size_t host_row_pitch,
+                               size_t host_slice_pitch,
+                               <span class="directive">void</span> *ptr,
+                               cl_uint num_events_in_wait_list,
+                               <span class="directive">const</span> cl_event *event_wait_list,
+                               cl_event *event)</code></pre>
+</div>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">cl_int clEnqueueWriteBufferRect(cl_command_queue command_queue,
+                                cl_mem buffer,
+                                cl_bool blocking_write,
+                                <span class="directive">const</span> size_t *buffer_origin,
+                                <span class="directive">const</span> size_t *host_origin,
+                                <span class="directive">const</span> size_t *region,
+                                size_t buffer_row_pitch,
+                                size_t buffer_slice_pitch,
+                                size_t host_row_pitch,
+                                size_t host_slice_pitch,
+                                <span class="directive">const</span> <span class="directive">void</span> *ptr,
+                                cl_uint num_events_in_wait_list,
+                                <span class="directive">const</span> cl_event *event_wait_list,
+                                cl_event *event)</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p><em>command_queue</em> is a valid host command-queue in which the read /
+write command will be queued.
+<em>command_queue</em> and <em>buffer</em> must be created with the same OpenCL context.</p>
+</div>
+<div class="paragraph">
+<p><em>buffer</em> refers to a valid buffer object.</p>
+</div>
+<div class="paragraph">
+<p><em>blocking_read</em> and <em>blocking_write</em> indicate if the read and write
+operations are <em>blocking</em> or <em>non-blocking</em>.</p>
+</div>
+<div class="paragraph">
+<p>If <em>blocking_read</em> is CL_TRUE i.e. the read command is blocking,
+<strong>clEnqueueReadBufferRect</strong> does not return until the buffer data has been
+read and copied into memory pointed to by <em>ptr</em>.</p>
+</div>
+<div class="paragraph">
+<p>If <em>blocking_read</em> is CL_FALSE i.e. the read command is non-blocking,
+<strong>clEnqueueReadBufferRect</strong> queues a non-blocking read command and returns.
+The contents of the buffer that <em>ptr</em> points to cannot be used until the
+read command has completed.
+The <em>event</em> argument returns an event object which can be used to query the
+execution status of the read command.
+When the read command has completed, the contents of the buffer that <em>ptr</em>
+points to can be used by the application.</p>
+</div>
+<div class="paragraph">
+<p>If <em>blocking_write</em> is CL_TRUE, the OpenCL implementation copies the data
+referred to by <em>ptr</em> and enqueues the write operation in the command-queue.
+The memory pointed to by <em>ptr</em> can be reused by the application after the
+<strong>clEnqueueWriteBufferRect</strong> call returns.</p>
+</div>
+<div class="paragraph">
+<p>If <em>blocking_write</em> is CL_FALSE, the OpenCL implementation will use <em>ptr</em> to
+perform a non-blocking write.
+As the write is non-blocking the implementation can return immediately.
+The memory pointed to by <em>ptr</em> cannot be reused by the application simply
+because the call has returned.
+The <em>event</em> argument returns an event object which can be used to query the
+execution status of the write command.
+When the write command has completed, the memory pointed to by <em>ptr</em> can
+then be reused by the application.</p>
+</div>
+<div class="paragraph">
+<p><em>buffer_origin</em> defines the (<em>x</em>, <em>y</em>, <em>z</em>) offset in the memory region
+associated with <em>buffer</em>.
+For a 2D rectangle region, the <em>z</em> value given by <em>buffer_origin</em>[2] should
+be 0.
+The offset in bytes is computed as <em>buffer_origin</em>[2] ×
+<em>buffer_slice_pitch</em> + <em>buffer_origin</em>[1] × <em>buffer_row_pitch</em> + 
+<em>buffer_origin</em>[0].</p>
+</div>
+<div class="paragraph">
+<p><em>host_origin</em> defines the (<em>x</em>, <em>y</em>, <em>z</em>) offset in the memory region
+pointed to by <em>ptr</em>.
+For a 2D rectangle region, the <em>z</em> value given by <em>host_origin</em>[2] should be
+0.
+The offset in bytes is computed as <em>host_origin</em>[2] ×
+<em>host_slice_pitch</em> + <em>host_origin</em>[1] × <em>host_row_pitch</em> + 
+<em>host_origin</em>[0].</p>
+</div>
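The byte-offset formula used for both buffer_origin and host_origin can be written out directly. A minimal sketch with a hypothetical function name, assuming the row and slice pitches passed in are the effective, non-zero values (a pitch of 0 has to be replaced by its computed default before calling):

```c
#include <stddef.h>

/* offset = origin[2] * slice_pitch + origin[1] * row_pitch + origin[0]
 * origin[0] is in bytes, origin[1] in rows, origin[2] in slices. */
size_t sk_rect_offset(const size_t origin[3],
                      size_t row_pitch, size_t slice_pitch)
{
    return origin[2] * slice_pitch + origin[1] * row_pitch + origin[0];
}
```

For example, with a row pitch of 64 bytes and a slice pitch of 640 bytes, the origin (4, 2, 1) maps to byte offset 1 × 640 + 2 × 64 + 4 = 772.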
+<div class="paragraph">
+<p><em>region</em> defines the (<em>width</em> in bytes, <em>height</em> in rows, <em>depth</em> in slices)
+of the 2D or 3D rectangle being read or written.
+For a 2D rectangle copy, the <em>depth</em> value given by <em>region</em>[2] should be 1.
+The values in region cannot be 0.</p>
+</div>
+<div class="paragraph">
+<p><em>buffer_row_pitch</em> is the length of each row in bytes to be used for the
+memory region associated with <em>buffer</em>.
+If <em>buffer_row_pitch</em> is 0, <em>buffer_row_pitch</em> is computed as <em>region</em>[0].</p>
+</div>
+<div class="paragraph">
+<p><em>buffer_slice_pitch</em> is the length of each 2D slice in bytes to be used for
+the memory region associated with <em>buffer</em>.
+If <em>buffer_slice_pitch</em> is 0, <em>buffer_slice_pitch</em> is computed as
+<em>region</em>[1] × <em>buffer_row_pitch</em>.</p>
+</div>
+<div class="paragraph">
+<p><em>host_row_pitch</em> is the length of each row in bytes to be used for the
+memory region pointed to by <em>ptr</em>.
+If <em>host_row_pitch</em> is 0, <em>host_row_pitch</em> is computed as <em>region</em>[0].</p>
+</div>
+<div class="paragraph">
+<p><em>host_slice_pitch</em> is the length of each 2D slice in bytes to be used for
+the memory region pointed to by <em>ptr</em>.
+If <em>host_slice_pitch</em> is 0, <em>host_slice_pitch</em> is computed as <em>region</em>[1]
+× <em>host_row_pitch</em>.</p>
+</div>
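The defaulting rules for the four pitch arguments follow one pattern: a row pitch of 0 becomes region[0], and a slice pitch of 0 becomes region[1] times the effective row pitch. A minimal sketch, with a hypothetical struct and function name, that applies to either the buffer or the host side:

```c
#include <stddef.h>

/* Hypothetical holder for the effective pitches of one side (buffer or host). */
typedef struct {
    size_t row_pitch;
    size_t slice_pitch;
} sk_pitches;

/* Replaces zero pitches by their computed defaults.
 * region[0] is the width in bytes, region[1] the height in rows. */
sk_pitches sk_resolve_pitches(const size_t region[3],
                              size_t row_pitch, size_t slice_pitch)
{
    sk_pitches p;
    p.row_pitch = row_pitch != 0 ? row_pitch : region[0];
    /* The slice default uses the effective row pitch, not the raw argument. */
    p.slice_pitch = slice_pitch != 0 ? slice_pitch : region[1] * p.row_pitch;
    return p;
}
```

With region = (16, 4, 2) and both pitches 0, the effective values are a row pitch of 16 and a slice pitch of 64. Passing a row pitch of 32 with a slice pitch of 0 yields a slice pitch of 128.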
+<div class="paragraph">
+<p><em>ptr</em> is a pointer to the buffer in host memory where data is to be read
+into or written from.</p>
+</div>
+<div class="paragraph">
+<p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need to
+complete before this particular command can be executed.
+If <em>event_wait_list</em> is <code>NULL</code>, then this particular command does not wait
+on any event to complete.
+If <em>event_wait_list</em> is <code>NULL</code>, <em>num_events_in_wait_list</em> must be 0.
+If <em>event_wait_list</em> is not <code>NULL</code>, the list of events pointed to by
+<em>event_wait_list</em> must be valid and <em>num_events_in_wait_list</em> must be
+greater than 0.
+The events specified in <em>event_wait_list</em> act as synchronization points.
+The context associated with events in <em>event_wait_list</em> and <em>command_queue</em>
+must be the same.
+The memory associated with <em>event_wait_list</em> can be reused or freed after
+the function returns.</p>
+</div>
+<div class="paragraph">
+<p><em>event</em> returns an event object that identifies this particular read / write
+command and can be used to query or queue a wait for this particular command
+to complete.
+<em>event</em> can be <code>NULL</code> in which case it will not be possible for the
+application to query the status of this command or queue a wait for this
+command to complete.
+If the <em>event_wait_list</em> and the <em>event</em> arguments are not <code>NULL</code>, the
+<em>event</em> argument should not refer to an element of the <em>event_wait_list</em>
+array.</p>
+</div>
+<div class="paragraph">
+<p><strong>clEnqueueReadBufferRect</strong> and <strong>clEnqueueWriteBufferRect</strong> return CL_SUCCESS
+if the function is executed successfully.
+Otherwise, they return one of the following errors:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not a valid host
+command-queue.</p>
+</li>
+<li>
+<p>CL_INVALID_CONTEXT if the context associated with <em>command_queue</em> and
+<em>buffer</em> are not the same or if the context associated with
+<em>command_queue</em> and events in <em>event_wait_list</em> are not the same.</p>
+</li>
+<li>
+<p>CL_INVALID_MEM_OBJECT if <em>buffer</em> is not a valid buffer object.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if the region being read or written specified by
+(<em>buffer_origin</em>, <em>region</em>, <em>buffer_row_pitch</em>, <em>buffer_slice_pitch</em>) is
+out of bounds.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if <em>ptr</em> is a <code>NULL</code> value.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if any <em>region</em> array element is 0.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if <em>buffer_row_pitch</em> is not 0 and is less than
+<em>region</em>[0].</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if <em>host_row_pitch</em> is not 0 and is less than
+<em>region</em>[0].</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if <em>buffer_slice_pitch</em> is not 0 and is less than
+<em>region</em>[1] × <em>buffer_row_pitch</em> or if <em>buffer_slice_pitch</em> is not 0
+and is not a multiple of <em>buffer_row_pitch</em>.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if <em>host_slice_pitch</em> is not 0 and is less than
+<em>region</em>[1] × <em>host_row_pitch</em> or if <em>host_slice_pitch</em> is not 0 and
+is not a multiple of <em>host_row_pitch</em>.</p>
+</li>
+<li>
+<p>CL_INVALID_EVENT_WAIT_LIST if <em>event_wait_list</em> is <code>NULL</code> and
+<em>num_events_in_wait_list</em> &gt; 0, or <em>event_wait_list</em> is not <code>NULL</code> and
+<em>num_events_in_wait_list</em> is 0, or if event objects in <em>event_wait_list</em>
+are not valid events.</p>
+</li>
+<li>
+<p>CL_MISALIGNED_SUB_BUFFER_OFFSET if <em>buffer</em> is a sub-buffer object and
+<em>offset</em> specified when the sub-buffer object is created is not aligned
+to the CL_DEVICE_MEM_BASE_ADDR_ALIGN value for the device associated with
+<em>command_queue</em>.</p>
+</li>
+<li>
+<p>CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST if the read and write
+operations are blocking and the execution status of any of the events in
+<em>event_wait_list</em> is a negative integer value.</p>
+</li>
+<li>
+<p>CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
+memory for data store associated with <em>buffer</em>.</p>
+</li>
+<li>
+<p>CL_INVALID_OPERATION if <strong>clEnqueueReadBufferRect</strong> is called on <em>buffer</em>
+which has been created with CL_MEM_HOST_WRITE_ONLY or
+CL_MEM_HOST_NO_ACCESS.</p>
+</li>
+<li>
+<p>CL_INVALID_OPERATION if <strong>clEnqueueWriteBufferRect</strong> is called on <em>buffer</em>
+which has been created with CL_MEM_HOST_READ_ONLY or
+CL_MEM_HOST_NO_ACCESS.</p>
+</li>
+<li>
+<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
+by the OpenCL implementation on the device.</p>
+</li>
+<li>
+<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
+required by the OpenCL implementation on the host.</p>
+</li>
+</ul>
+</div>
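+<div class="paragraph">
+<p>The pitch-related CL_INVALID_VALUE rules above can be pre-checked on the host
+before enqueueing. The following is a minimal sketch, not part of the OpenCL
+API: the helper name <code>rect_pitches_are_valid</code> is hypothetical, it treats the
+"less than" and "multiple of" slice-pitch requirements as independent
+conditions (as the corresponding <strong>clEnqueueCopyBufferRect</strong> rules do), and
+it assumes the products do not overflow <code>size_t</code>.</p>
+</div>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an OpenCL entry point.
 * Returns 1 if (region, row_pitch, slice_pitch) pass the pitch rules
 * listed above, 0 otherwise. A pitch of 0 selects the default value. */
int rect_pitches_are_valid(const size_t region[3],
                           size_t row_pitch,
                           size_t slice_pitch)
{
    assert(region != NULL);

    /* No region array element may be 0. */
    if (region[0] == 0 || region[1] == 0 || region[2] == 0)
        return 0;

    /* A non-zero row pitch must be at least region[0] bytes. */
    if (row_pitch != 0 && row_pitch < region[0])
        return 0;

    /* When row_pitch is 0, it is computed as region[0]. */
    size_t effective_row_pitch = (row_pitch != 0) ? row_pitch : region[0];

    if (slice_pitch != 0) {
        /* Must cover region[1] rows ... */
        if (slice_pitch < region[1] * effective_row_pitch)
            return 0;
        /* ... and be a whole number of rows. */
        if (slice_pitch % effective_row_pitch != 0)
            return 0;
    }
    return 1;
}
```

+<div class="paragraph">
+<p>The same check would be applied once with the buffer pitches and once with
+the host pitches, since both pairs follow the same rules.</p>
+</div>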
+<div class="admonitionblock note">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-note" title="Note"></i>
+</td>
+<td class="content">
+<div class="paragraph">
+<p>Calling <strong>clEnqueueReadBuffer</strong> to read a region of the buffer object with the
+<em>ptr</em> argument value set to <em>host_ptr</em> + <em>offset</em>, where <em>host_ptr</em> is a
+pointer to the memory region specified when the buffer object being read is
+created with CL_MEM_USE_HOST_PTR, must meet the following requirements in
+order to avoid undefined behavior:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>All commands that use this buffer object or a memory object (buffer or
+image) created from this buffer object have finished execution before
+the read command begins execution.</p>
+</li>
+<li>
+<p>The buffer object or memory objects created from this buffer object are
+not mapped.</p>
+</li>
+<li>
+<p>The buffer object or memory objects created from this buffer object are
+not used by any command-queue until the read command has finished
+execution.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>Calling <strong>clEnqueueReadBufferRect</strong> to read a region of the buffer object with
+the <em>ptr</em> argument value set to <em>host_ptr</em> and with equal <em>host_origin</em> and
+<em>buffer_origin</em> values, where <em>host_ptr</em> is a pointer to the
+memory region specified when the buffer object being read is created with
+CL_MEM_USE_HOST_PTR, must meet the same requirements given above for
+<strong>clEnqueueReadBuffer</strong>.</p>
+</div>
+<div class="paragraph">
+<p>Calling <strong>clEnqueueWriteBuffer</strong> to update the latest bits in a region of the
+buffer object with the <em>ptr</em> argument value set to <em>host_ptr</em> + <em>offset</em>,
+where <em>host_ptr</em> is a pointer to the memory region specified when the buffer
+object being written is created with CL_MEM_USE_HOST_PTR, must meet the
+following requirements in order to avoid undefined behavior:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>The host memory region given by (<em>host_ptr</em> + <em>offset</em>, <em>cb</em>) contains
+the latest bits when the enqueued write command begins execution.</p>
+</li>
+<li>
+<p>The buffer object or memory objects created from this buffer object are
+not mapped.</p>
+</li>
+<li>
+<p>The buffer object or memory objects created from this buffer object are
+not used by any command-queue until the write command has finished
+execution.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>Calling <strong>clEnqueueWriteBufferRect</strong> to update the latest bits in a region of
+the buffer object with the <em>ptr</em> argument value set to <em>host_ptr</em> and with
+equal <em>host_origin</em> and <em>buffer_origin</em> values, where <em>host_ptr</em> is a
+pointer to the memory region specified when the buffer object being written
+is created with CL_MEM_USE_HOST_PTR, must meet the following requirements in
+order to avoid undefined behavior:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>The host memory region given by (<em>buffer_origin</em>, <em>region</em>) contains the
+latest bits when the enqueued write command begins execution.</p>
+</li>
+<li>
+<p>The buffer object or memory objects created from this buffer object are
+not mapped.</p>
+</li>
+<li>
+<p>The buffer object or memory objects created from this buffer object are
+not used by any command-queue until the write command has finished
+execution.</p>
+</li>
+</ul>
+</div>
+</td>
+</tr>
+</table>
+</div>
+<div class="paragraph">
+<p>The function</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">cl_int clEnqueueCopyBuffer(cl_command_queue command_queue,
+                           cl_mem src_buffer,
+                           cl_mem dst_buffer,
+                           size_t src_offset,
+                           size_t dst_offset,
+                           size_t size,
+                           cl_uint num_events_in_wait_list,
+                           <span class="directive">const</span> cl_event *event_wait_list,
+                           cl_event *event)</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>enqueues a command to copy a buffer object identified by <em>src_buffer</em> to
+another buffer object identified by <em>dst_buffer</em>.</p>
+</div>
+<div class="paragraph">
+<p><em>command_queue</em> refers to a host command-queue in which the copy command
+will be queued.
+The OpenCL context associated with <em>command_queue</em>, <em>src_buffer</em> and
+<em>dst_buffer</em> must be the same.</p>
+</div>
+<div class="paragraph">
+<p><em>src_offset</em> refers to the offset in bytes at which to begin copying data
+from <em>src_buffer</em>.</p>
+</div>
+<div class="paragraph">
+<p><em>dst_offset</em> refers to the offset in bytes at which to begin copying data
+into <em>dst_buffer</em>.</p>
+</div>
+<div class="paragraph">
+<p><em>size</em> refers to the size in bytes to copy.</p>
+</div>
+<div class="paragraph">
+<p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need to
+complete before this particular command can be executed.
+If <em>event_wait_list</em> is <code>NULL</code>, then this particular command does not wait
+on any event to complete.
+If <em>event_wait_list</em> is <code>NULL</code>, <em>num_events_in_wait_list</em> must be 0.
+If <em>event_wait_list</em> is not <code>NULL</code>, the list of events pointed to by
+<em>event_wait_list</em> must be valid and <em>num_events_in_wait_list</em> must be
+greater than 0.
+The events specified in <em>event_wait_list</em> act as synchronization points.
+The context associated with events in <em>event_wait_list</em> and <em>command_queue</em>
+must be the same.
+The memory associated with <em>event_wait_list</em> can be reused or freed after
+the function returns.</p>
+</div>
+<div class="paragraph">
+<p><em>event</em> returns an event object that identifies this particular copy command
+and can be used to query or queue a wait for this particular command to
+complete.
+<em>event</em> can be <code>NULL</code> in which case it will not be possible for the
+application to query the status of this command or queue a wait for this
+command to complete.
+<strong>clEnqueueBarrierWithWaitList</strong> can be used instead.
+If the <em>event_wait_list</em> and the <em>event</em> arguments are not <code>NULL</code>, the
+<em>event</em> argument should not refer to an element of the <em>event_wait_list</em>
+array.</p>
+</div>
+<div class="paragraph">
+<p><strong>clEnqueueCopyBuffer</strong> returns CL_SUCCESS if the function is executed
+successfully.
+Otherwise, it returns one of the following errors:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not a valid host
+command-queue.</p>
+</li>
+<li>
+<p>CL_INVALID_CONTEXT if the context associated with <em>command_queue</em>,
+<em>src_buffer</em> and <em>dst_buffer</em> are not the same or if the context
+associated with <em>command_queue</em> and events in <em>event_wait_list</em> are not
+the same.</p>
+</li>
+<li>
+<p>CL_INVALID_MEM_OBJECT if <em>src_buffer</em> or <em>dst_buffer</em> is not a valid
+buffer object.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if <em>src_offset</em>, <em>dst_offset</em>, <em>size</em>, <em>src_offset</em>
++  <em>size</em> or <em>dst_offset</em> + <em>size</em> require accessing elements
+outside the <em>src_buffer</em> and <em>dst_buffer</em> buffer objects respectively.</p>
+</li>
+<li>
+<p>CL_INVALID_EVENT_WAIT_LIST if <em>event_wait_list</em> is <code>NULL</code> and
+<em>num_events_in_wait_list</em> &gt; 0, or <em>event_wait_list</em> is not <code>NULL</code> and
+<em>num_events_in_wait_list</em> is 0, or if event objects in <em>event_wait_list</em>
+are not valid events.</p>
+</li>
+<li>
+<p>CL_MISALIGNED_SUB_BUFFER_OFFSET if <em>src_buffer</em> is a sub-buffer object
+and <em>offset</em> specified when the sub-buffer object is created is not
+aligned to the CL_DEVICE_MEM_BASE_ADDR_ALIGN value for the device associated
+with <em>command_queue</em>.</p>
+</li>
+<li>
+<p>CL_MISALIGNED_SUB_BUFFER_OFFSET if <em>dst_buffer</em> is a sub-buffer object
+and <em>offset</em> specified when the sub-buffer object is created is not
+aligned to the CL_DEVICE_MEM_BASE_ADDR_ALIGN value for the device associated
+with <em>command_queue</em>.</p>
+</li>
+<li>
+<p>CL_MEM_COPY_OVERLAP if <em>src_buffer</em> and <em>dst_buffer</em> are the same buffer
+or sub-buffer object and the source and destination regions overlap or
+if <em>src_buffer</em> and <em>dst_buffer</em> are different sub-buffers of the same
+associated buffer object and they overlap.
+The regions overlap if <em>src_offset</em> ≤ <em>dst_offset</em> ≤
+<em>src_offset</em> + <em>size</em> - 1 or if <em>dst_offset</em> ≤ <em>src_offset</em> ≤
+<em>dst_offset</em> + <em>size</em> - 1.</p>
+</li>
+<li>
+<p>CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
+memory for data store associated with <em>src_buffer</em> or <em>dst_buffer</em>.</p>
+</li>
+<li>
+<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
+by the OpenCL implementation on the device.</p>
+</li>
+<li>
+<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
+required by the OpenCL implementation on the host.</p>
+</li>
+</ul>
+</div>
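+<div class="paragraph">
+<p>The overlap condition given for CL_MEM_COPY_OVERLAP compares two byte ranges
+of equal length, each ending at its offset + <em>size</em> - 1. A minimal sketch of
+that test is shown below; the helper name <code>copy_regions_overlap</code> is
+hypothetical, and the sketch assumes both offsets are relative to the same
+underlying buffer object and that <em>size</em> is greater than 0.</p>
+</div>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an OpenCL entry point.
 * Returns 1 if the byte ranges [src_offset, src_offset + size - 1] and
 * [dst_offset, dst_offset + size - 1] share at least one byte. */
int copy_regions_overlap(size_t src_offset, size_t dst_offset, size_t size)
{
    assert(size > 0); /* size - 1 would wrap around for size == 0 */

    return (src_offset <= dst_offset && dst_offset <= src_offset + size - 1) ||
           (dst_offset <= src_offset && src_offset <= dst_offset + size - 1);
}
```

+<div class="paragraph">
+<p>For two different sub-buffers of the same buffer object, an application
+would first add each sub-buffer's own origin to the offsets before comparing
+them.</p>
+</div>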
+<div class="paragraph">
+<p>The function</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">cl_int clEnqueueCopyBufferRect(cl_command_queue command_queue,
+                               cl_mem src_buffer,
+                               cl_mem dst_buffer,
+                               <span class="directive">const</span> size_t *src_origin,
+                               <span class="directive">const</span> size_t *dst_origin,
+                               <span class="directive">const</span> size_t *region,
+                               size_t src_row_pitch,
+                               size_t src_slice_pitch,
+                               size_t dst_row_pitch,
+                               size_t dst_slice_pitch,
+                               cl_uint num_events_in_wait_list,
+                               <span class="directive">const</span> cl_event *event_wait_list,
+                               cl_event *event)</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>enqueues a command to copy a 2D or 3D rectangular region from the buffer
+object identified by <em>src_buffer</em> to a 2D or 3D region in the buffer object
+identified by <em>dst_buffer</em>.
+Copying begins at the source offset and destination offset which are
+computed as described below in the description for <em>src_origin</em> and
+<em>dst_origin</em>.
+Each byte of the region&#8217;s width is copied from the source offset to the
+destination offset.
+After copying each width, the source and destination offsets are incremented
+by their respective source and destination row pitches.
+After copying each 2D rectangle, the source and destination offsets are
+incremented by their respective source and destination slice pitches.</p>
+</div>
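+<div class="paragraph">
+<p>The copy order described above can be modelled on the host with plain
+arrays. The sketch below is illustrative only: <code>model_copy_rect</code> is a
+hypothetical name, it is not how an OpenCL implementation is required to
+perform the copy, and it assumes that all pitches are already non-zero and
+that the source and destination arrays do not overlap.</p>
+</div>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical host-side model of a rectangular copy.
 * src_offset and dst_offset are the starting byte offsets; all pitches
 * are in bytes and must already be non-zero. */
void model_copy_rect(const unsigned char *src, unsigned char *dst,
                     size_t src_offset, size_t dst_offset,
                     const size_t region[3],
                     size_t src_row_pitch, size_t src_slice_pitch,
                     size_t dst_row_pitch, size_t dst_slice_pitch)
{
    assert(src != NULL && dst != NULL && region != NULL);

    for (size_t z = 0; z < region[2]; ++z) {        /* each 2D slice */
        for (size_t y = 0; y < region[1]; ++y) {    /* each row      */
            const unsigned char *s =
                src + src_offset + z * src_slice_pitch + y * src_row_pitch;
            unsigned char *d =
                dst + dst_offset + z * dst_slice_pitch + y * dst_row_pitch;
            /* Copy one width (region[0] bytes) of the rectangle. */
            memcpy(d, s, region[0]);
        }
    }
}
```

+<div class="paragraph">
+<p>Because each row is addressed through its own pitch, the source and
+destination layouts may differ, for example when extracting a small tile from
+a wide source into a tightly packed destination.</p>
+</div>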
+<div class="admonitionblock note">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-note" title="Note"></i>
+</td>
+<td class="content">
+<div class="paragraph">
+<p>If <em>src_buffer</em> and <em>dst_buffer</em> are the same buffer object, <em>src_row_pitch</em>
+must equal <em>dst_row_pitch</em> and <em>src_slice_pitch</em> must equal
+<em>dst_slice_pitch</em>.</p>
+</div>
+</td>
+</tr>
+</table>
+</div>
+<div class="paragraph">
+<p><em>command_queue</em> refers to the host command-queue in which the copy command
+will be queued.
+The OpenCL context associated with <em>command_queue</em>, <em>src_buffer</em> and
+<em>dst_buffer</em> must be the same.</p>
+</div>
+<div class="paragraph">
+<p><em>src_origin</em> defines the (<em>x</em>, <em>y</em>, <em>z</em>) offset in the memory region
+associated with <em>src_buffer</em>.
+For a 2D rectangle region, the <em>z</em> value given by <em>src_origin</em>[2] should be
+0.
+The offset in bytes is computed as <em>src_origin</em>[2] × <em>src_slice_pitch</em>
++ <em>src_origin</em>[1] × <em>src_row_pitch</em> + <em>src_origin</em>[0].</p>
+</div>
+<div class="paragraph">
+<p><em>dst_origin</em> defines the (<em>x</em>, <em>y</em>, <em>z</em>) offset in the memory region
+associated with <em>dst_buffer</em>.
+For a 2D rectangle region, the <em>z</em> value given by <em>dst_origin</em>[2] should be
+0.
+The offset in bytes is computed as <em>dst_origin</em>[2] × <em>dst_slice_pitch</em>
++ <em>dst_origin</em>[1] × <em>dst_row_pitch</em> + <em>dst_origin</em>[0].</p>
+</div>
+<div class="paragraph">
+<p><em>region</em> defines the (<em>width</em> in bytes, <em>height</em> in rows, <em>depth</em> in slices)
+of the 2D or 3D rectangle being copied.
+For a 2D rectangle, the <em>depth</em> value given by <em>region</em>[2] should be 1.
+The values in region cannot be 0.</p>
+</div>
+<div class="paragraph">
+<p><em>src_row_pitch</em> is the length of each row in bytes to be used for the memory
+region associated with <em>src_buffer</em>.
+If <em>src_row_pitch</em> is 0, <em>src_row_pitch</em> is computed as <em>region</em>[0].</p>
+</div>
+<div class="paragraph">
+<p><em>src_slice_pitch</em> is the length of each 2D slice in bytes to be used for the
+memory region associated with <em>src_buffer</em>.
+If <em>src_slice_pitch</em> is 0, <em>src_slice_pitch</em> is computed as <em>region</em>[1]
+× <em>src_row_pitch</em>.</p>
+</div>
+<div class="paragraph">
+<p><em>dst_row_pitch</em> is the length of each row in bytes to be used for the memory
+region associated with <em>dst_buffer</em>.
+If <em>dst_row_pitch</em> is 0, <em>dst_row_pitch</em> is computed as <em>region</em>[0].</p>
+</div>
+<div class="paragraph">
+<p><em>dst_slice_pitch</em> is the length of each 2D slice in bytes to be used for the
+memory region associated with <em>dst_buffer</em>.
+If <em>dst_slice_pitch</em> is 0, <em>dst_slice_pitch</em> is computed as <em>region</em>[1]
+× <em>dst_row_pitch</em>.</p>
+</div>
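+<div class="paragraph">
+<p>The offset formula and the zero-pitch defaults described above apply equally
+to the source and destination arguments. They can be combined as in the
+following minimal sketch; <code>rect_byte_offset</code> is a hypothetical helper name,
+and the sketch assumes the arithmetic does not overflow <code>size_t</code>.</p>
+</div>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an OpenCL entry point.
 * Computes origin[2] * slice_pitch + origin[1] * row_pitch + origin[0],
 * substituting the default pitches when 0 is passed. */
size_t rect_byte_offset(const size_t origin[3],
                        const size_t region[3],
                        size_t row_pitch,
                        size_t slice_pitch)
{
    assert(origin != NULL && region != NULL);

    if (row_pitch == 0)
        row_pitch = region[0];               /* default: width in bytes */
    if (slice_pitch == 0)
        slice_pitch = region[1] * row_pitch; /* default: height x row pitch */

    return origin[2] * slice_pitch + origin[1] * row_pitch + origin[0];
}
```

+<div class="paragraph">
+<p>For a 2D rectangle, <em>origin</em>[2] is 0, so the slice pitch term drops out and
+only the row pitch affects the result.</p>
+</div>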
+<div class="paragraph">
+<p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need to
+complete before this particular command can be executed.
+If <em>event_wait_list</em> is <code>NULL</code>, then this particular command does not wait
+on any event to complete.
+If <em>event_wait_list</em> is <code>NULL</code>, <em>num_events_in_wait_list</em> must be 0.
+If <em>event_wait_list</em> is not <code>NULL</code>, the list of events pointed to by
+<em>event_wait_list</em> must be valid and <em>num_events_in_wait_list</em> must be
+greater than 0.
+The events specified in <em>event_wait_list</em> act as synchronization points.
+The context associated with events in <em>event_wait_list</em> and <em>command_queue</em>
+must be the same.
+The memory associated with <em>event_wait_list</em> can be reused or freed after
+the function returns.</p>
+</div>
+<div class="paragraph">
+<p><em>event</em> returns an event object that identifies this particular copy command
+and can be used to query or queue a wait for this particular command to
+complete.
+<em>event</em> can be <code>NULL</code> in which case it will not be possible for the
+application to query the status of this command or queue a wait for this
+command to complete.
+<strong>clEnqueueBarrierWithWaitList</strong> can be used instead.
+If the <em>event_wait_list</em> and the <em>event</em> arguments are not <code>NULL</code>, the
+<em>event</em> argument should not refer to an element of the <em>event_wait_list</em>
+array.</p>
+</div>
+<div class="paragraph">
+<p><strong>clEnqueueCopyBufferRect</strong> returns CL_SUCCESS if the function is executed
+successfully.
+Otherwise, it returns one of the following errors:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not a valid host
+command-queue.</p>
+</li>
+<li>
+<p>CL_INVALID_CONTEXT if the context associated with <em>command_queue</em>,
+<em>src_buffer</em> and <em>dst_buffer</em> are not the same or if the context
+associated with <em>command_queue</em> and events in <em>event_wait_list</em> are not
+the same.</p>
+</li>
+<li>
+<p>CL_INVALID_MEM_OBJECT if <em>src_buffer</em> or <em>dst_buffer</em> is not a valid
+buffer object.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if (<em>src_origin, region, src_row_pitch,
+src_slice_pitch</em>) or (<em>dst_origin, region, dst_row_pitch,
+dst_slice_pitch</em>) require accessing elements outside the <em>src_buffer</em>
+and <em>dst_buffer</em> buffer objects respectively.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if any <em>region</em> array element is 0.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if <em>src_row_pitch</em> is not 0 and is less than
+<em>region</em>[0].</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if <em>dst_row_pitch</em> is not 0 and is less than
+<em>region</em>[0].</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if <em>src_slice_pitch</em> is not 0 and is less than
+<em>region</em>[1] × <em>src_row_pitch</em> or if <em>src_slice_pitch</em> is not 0 and
+is not a multiple of <em>src_row_pitch</em>.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if <em>dst_slice_pitch</em> is not 0 and is less than
+<em>region</em>[1] × <em>dst_row_pitch</em> or if <em>dst_slice_pitch</em> is not 0 and
+is not a multiple of <em>dst_row_pitch</em>.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if <em>src_buffer</em> and <em>dst_buffer</em> are the same buffer
+object and <em>src_slice_pitch</em> is not equal to <em>dst_slice_pitch</em> or
+<em>src_row_pitch</em> is not equal to <em>dst_row_pitch</em>.</p>
+</li>
+<li>
+<p>CL_INVALID_EVENT_WAIT_LIST if <em>event_wait_list</em> is <code>NULL</code> and
+<em>num_events_in_wait_list</em> &gt; 0, or <em>event_wait_list</em> is not <code>NULL</code> and
+<em>num_events_in_wait_list</em> is 0, or if event objects in <em>event_wait_list</em>
+are not valid events.</p>
+</li>
+<li>
+<p>CL_MEM_COPY_OVERLAP if <em>src_buffer</em> and <em>dst_buffer</em> are the same buffer
+or sub-buffer object and the source and destination regions overlap or
+if <em>src_buffer</em> and <em>dst_buffer</em> are different sub-buffers of the same
+associated buffer object and they overlap.
+Refer to <a href="#check-copy-overlap">CL_MEM_COPY_OVERLAP</a> for details on how
+to determine if source and destination regions overlap.</p>
+</li>
+<li>
+<p>CL_MISALIGNED_SUB_BUFFER_OFFSET if <em>src_buffer</em> is a sub-buffer object
+and <em>offset</em> specified when the sub-buffer object is created is not
+aligned to the CL_DEVICE_MEM_BASE_ADDR_ALIGN value for the device associated
+with <em>command_queue</em>.</p>
+</li>
+<li>
+<p>CL_MISALIGNED_SUB_BUFFER_OFFSET if <em>dst_buffer</em> is a sub-buffer object
+and <em>offset</em> specified when the sub-buffer object is created is not
+aligned to the CL_DEVICE_MEM_BASE_ADDR_ALIGN value for the device associated
+with <em>command_queue</em>.</p>
+</li>
+<li>
+<p>CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
+memory for data store associated with <em>src_buffer</em> or <em>dst_buffer</em>.</p>
+</li>
+<li>
+<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
+by the OpenCL implementation on the device.</p>
+</li>
+<li>
+<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
+required by the OpenCL implementation on the host.</p>
+</li>
+</ul>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_filling_buffer_objects">5.2.3. Filling Buffer Objects</h4>
+<div class="paragraph">
+<p>The function</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">cl_int clEnqueueFillBuffer(cl_command_queue command_queue,
+                           cl_mem buffer,
+                           <span class="directive">const</span> <span class="directive">void</span> *pattern,
+                           size_t pattern_size,
+                           size_t offset,
+                           size_t size,
+                           cl_uint num_events_in_wait_list,
+                           <span class="directive">const</span> cl_event *event_wait_list,
+                           cl_event *event)</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>enqueues a command to fill a buffer object with a pattern of a given pattern
+size.
+The usage information, which indicates whether the memory object can be read
+or written by a kernel and/or the host and which is given by the cl_mem_flags
+argument value specified when <em>buffer</em> is created, is ignored by
+<strong>clEnqueueFillBuffer</strong>.</p>
+</div>
+<div class="paragraph">
+<p><em>command_queue</em> refers to the host command-queue in which the fill command
+will be queued.
+The OpenCL context associated with <em>command_queue</em> and <em>buffer</em> must be the
+same.</p>
+</div>
+<div class="paragraph">
+<p><em>buffer</em> is a valid buffer object.</p>
+</div>
+<div class="paragraph">
+<p><em>pattern</em> is a pointer to the data pattern of size <em>pattern_size</em> in bytes.
+<em>pattern</em> will be used to fill a region in <em>buffer</em> that starts at <em>offset</em>
+and is <em>size</em> bytes long.
+The data pattern must be a scalar or vector integer or floating-point data
+type supported by OpenCL as described in <a href="#scalar-data-types">Shared
+Application Scalar Data Types</a> and <a href="#vector-data-types">Supported
+Application Vector Data Types</a>.
+For example, if <em>buffer</em> is to be filled with a pattern of <code>float4</code> values,
+then <em>pattern</em> will be a pointer to a <code>cl_float4</code> value and <em>pattern_size</em>
+will be <code>sizeof(cl_float4)</code>.
+The maximum value of <em>pattern_size</em> is the size of the largest integer or
+floating-point vector data type supported by the OpenCL device.
+The memory associated with <em>pattern</em> can be reused or freed after the
+function returns.</p>
+</div>
+<div class="paragraph">
+<p><em>offset</em> is the location in bytes of the region being filled in <em>buffer</em> and
+must be a multiple of <em>pattern_size</em>.</p>
+</div>
+<div class="paragraph">
+<p><em>size</em> is the size in bytes of region being filled in <em>buffer</em> and must be a
+multiple of <em>pattern_size</em>.</p>
+</div>
+<div class="paragraph">
+<p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need to
+complete before this particular command can be executed.
+If <em>event_wait_list</em> is <code>NULL</code>, then this particular command does not wait
+on any event to complete.
+If <em>event_wait_list</em> is <code>NULL</code>, <em>num_events_in_wait_list</em> must be 0.
+If <em>event_wait_list</em> is not <code>NULL</code>, the list of events pointed to by
+<em>event_wait_list</em> must be valid and <em>num_events_in_wait_list</em> must be
+greater than 0.
+The events specified in <em>event_wait_list</em> act as synchronization points.
+The context associated with events in <em>event_wait_list</em> and <em>command_queue</em>
+must be the same.
+The memory associated with <em>event_wait_list</em> can be reused or freed after
+the function returns.</p>
+</div>
+<div class="paragraph">
+<p><em>event</em> returns an event object that identifies this particular command and
+can be used to query or queue a wait for this particular command to
+complete.
+<em>event</em> can be <code>NULL</code> in which case it will not be possible for the
+application to query the status of this command or queue a wait for this
+command to complete.
+<strong>clEnqueueBarrierWithWaitList</strong> can be used instead.
+If the <em>event_wait_list</em> and the <em>event</em> arguments are not <code>NULL</code>, the
+<em>event</em> argument should not refer to an element of the <em>event_wait_list</em>
+array.</p>
+</div>
+<div class="paragraph">
+<p><strong>clEnqueueFillBuffer</strong> returns CL_SUCCESS if the function is executed
+successfully.
+Otherwise, it returns one of the following errors:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not a valid host
+command-queue.</p>
+</li>
+<li>
+<p>CL_INVALID_CONTEXT if the context associated with <em>command_queue</em> and
+<em>buffer</em> are not the same or if the context associated with
+<em>command_queue</em> and events in <em>event_wait_list</em> are not the same.</p>
+</li>
+<li>
+<p>CL_INVALID_MEM_OBJECT if <em>buffer</em> is not a valid buffer object.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if <em>offset</em> or <em>offset</em> + <em>size</em> requires accessing
+elements outside the <em>buffer</em> object.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if <em>pattern</em> is <code>NULL</code> or if <em>pattern_size</em> is 0 or if
+<em>pattern_size</em> is not one of { 1, 2, 4, 8, 16, 32, 64, 128 }.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if <em>offset</em> or <em>size</em> is not a multiple of
+<em>pattern_size</em>.</p>
+</li>
+<li>
+<p>CL_INVALID_EVENT_WAIT_LIST if <em>event_wait_list</em> is <code>NULL</code> and
+<em>num_events_in_wait_list</em> &gt; 0, or <em>event_wait_list</em> is not <code>NULL</code> and
+<em>num_events_in_wait_list</em> is 0, or if event objects in <em>event_wait_list</em>
+are not valid events.</p>
+</li>
+<li>
+<p>CL_MISALIGNED_SUB_BUFFER_OFFSET if <em>buffer</em> is a sub-buffer object and
+<em>offset</em> specified when the sub-buffer object is created is not aligned to
+the CL_DEVICE_MEM_BASE_ADDR_ALIGN value for the device associated with
+<em>command_queue</em>.</p>
+</li>
+<li>
+<p>CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
+memory for data store associated with <em>buffer</em>.</p>
+</li>
+<li>
+<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
+by the OpenCL implementation on the device.</p>
+</li>
+<li>
+<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
+required by the OpenCL implementation on the host.</p>
+</li>
+</ul>
+</div>
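+<div class="paragraph">
+<p>The argument rules for <strong>clEnqueueFillBuffer</strong> and the effect of the fill can
+be sketched on the host as follows. Both function names are hypothetical, the
+<code>buffer_size</code> parameter stands in for the size of the buffer object, and the
+per-device limit on <em>pattern_size</em> is not checked. This is an illustration of
+the rules above, not an implementation of the command.</p>
+</div>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pre-check, not an OpenCL entry point.
 * Returns 1 if the arguments pass the CL_INVALID_VALUE rules above. */
int fill_args_are_valid(const void *pattern, size_t pattern_size,
                        size_t offset, size_t size, size_t buffer_size)
{
    if (pattern == NULL)
        return 0;
    /* pattern_size must be one of { 1, 2, 4, 8, 16, 32, 64, 128 }:
     * a power of two that is no larger than 128. */
    if (pattern_size == 0 || pattern_size > 128 ||
        (pattern_size & (pattern_size - 1)) != 0)
        return 0;
    /* offset and size must each be a multiple of pattern_size. */
    if (offset % pattern_size != 0 || size % pattern_size != 0)
        return 0;
    /* The region [offset, offset + size) must lie inside the buffer. */
    if (offset > buffer_size || size > buffer_size - offset)
        return 0;
    return 1;
}

/* Hypothetical host-side model of the fill: repeats the pattern over
 * size bytes starting at offset. Arguments must already be valid. */
void model_fill(unsigned char *buffer, const void *pattern,
                size_t pattern_size, size_t offset, size_t size)
{
    assert(buffer != NULL && pattern != NULL && pattern_size != 0);

    for (size_t pos = offset; pos < offset + size; pos += pattern_size)
        memcpy(buffer + pos, pattern, pattern_size);
}
```

+<div class="paragraph">
+<p>Requiring both <em>offset</em> and <em>size</em> to be multiples of <em>pattern_size</em> means
+the region always holds a whole number of pattern copies, which is why the
+model never needs to write a partial pattern.</p>
+</div>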
+</div>
+<div class="sect3">
+<h4 id="_mapping_buffer_objects">5.2.4. Mapping Buffer Objects</h4>
+<div class="paragraph">
+<p>The function</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c"><span class="directive">void</span> *clEnqueueMapBuffer(cl_command_queue command_queue,
+                         cl_mem buffer,
+                         cl_bool blocking_map,
+                         cl_map_flags map_flags,
+                         size_t offset,
+                         size_t size,
+                         cl_uint num_events_in_wait_list,
+                         <span class="directive">const</span> cl_event *event_wait_list,
+                         cl_event *event,
+                         cl_int *errcode_ret)</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>enqueues a command to map a region of the buffer object given by <em>buffer</em>
+into the host address space and returns a pointer to this mapped region.</p>
+</div>
+<div class="paragraph">
+<p><em>command_queue</em> must be a valid host command-queue.</p>
+</div>
+<div class="paragraph">
+<p><em>blocking_map</em> indicates if the map operation is <em>blocking</em> or
+<em>non-blocking</em>.</p>
+</div>
+<div class="paragraph">
+<p>If <em>blocking_map</em> is CL_TRUE, <strong>clEnqueueMapBuffer</strong> does not return until the
+specified region in <em>buffer</em> is mapped into the host address space and the
+application can access the contents of the mapped region using the pointer
+returned by <strong>clEnqueueMapBuffer</strong>.</p>
+</div>
+<div class="paragraph">
+<p>If <em>blocking_map</em> is CL_FALSE, i.e. the map operation is non-blocking, the
+pointer to the mapped region returned by <strong>clEnqueueMapBuffer</strong> cannot be used
+until the map command has completed.
+The <em>event</em> argument returns an event object which can be used to query the
+execution status of the map command.
+When the map command is completed, the application can access the contents
+of the mapped region using the pointer returned by <strong>clEnqueueMapBuffer</strong>.</p>
+</div>
+<div class="paragraph">
+<p><em>map_flags</em> is a bit-field and is described in the
+<a href="#memory-map-flags-table">Memory Map Flags</a> table.</p>
+</div>
+<div class="paragraph">
+<p><em>buffer</em> is a valid buffer object.
+The OpenCL context associated with <em>command_queue</em> and <em>buffer</em> must be the
+same.</p>
+</div>
+<div class="paragraph">
+<p><em>offset</em> and <em>size</em> are the offset in bytes and the size of the region in
+the buffer object that is being mapped.</p>
+</div>
+<div class="paragraph">
+<p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need to
+complete before this particular command can be executed.
+If <em>event_wait_list</em> is <code>NULL</code>, then this particular command does not wait
+on any event to complete.
+If <em>event_wait_list</em> is <code>NULL</code>, <em>num_events_in_wait_list</em> must be 0.
+If <em>event_wait_list</em> is not <code>NULL</code>, the list of events pointed to by
+<em>event_wait_list</em> must be valid and <em>num_events_in_wait_list</em> must be
+greater than 0.
+The events specified in <em>event_wait_list</em> act as synchronization points.
+The context associated with events in <em>event_wait_list</em> and <em>command_queue</em>
+must be the same.
+The memory associated with <em>event_wait_list</em> can be reused or freed after
+the function returns.</p>
+</div>
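+<div class="paragraph">
+<p>The consistency rule between <em>event_wait_list</em> and
+<em>num_events_in_wait_list</em> described above can be sketched as a small
+host-side check. This is only an illustration: <code>check_wait_list</code> and the
+<code>WAIT_LIST_*</code> result names are hypothetical and not part of the OpenCL API,
+and the sketch does not model whether each entry is a valid event or whether
+the contexts match.</p>
+</div>

```c
#include <stddef.h>

/* Hypothetical result codes for this sketch only; real code would report
   the error constants defined in CL/cl.h. */
typedef enum { WAIT_LIST_OK, WAIT_LIST_INVALID } wait_list_check;

/* Checks only the NULL/count consistency rule:
   - a NULL list must come with a count of 0
   - a non-NULL list must come with a count greater than 0 */
wait_list_check check_wait_list(const void *event_wait_list,
                                unsigned num_events_in_wait_list)
{
    if (event_wait_list == NULL && num_events_in_wait_list > 0)
        return WAIT_LIST_INVALID;
    if (event_wait_list != NULL && num_events_in_wait_list == 0)
        return WAIT_LIST_INVALID;
    return WAIT_LIST_OK;
}
```

+<div class="paragraph">
+<p>For example, <code>check_wait_list(NULL, 0)</code> yields <code>WAIT_LIST_OK</code>, while
+<code>check_wait_list(NULL, 1)</code> yields <code>WAIT_LIST_INVALID</code>.</p>
+</div>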
+<div class="paragraph">
+<p><em>event</em> returns an event object that identifies this particular command and
+can be used to query or queue a wait for this particular command to
+complete.
+<em>event</em> can be <code>NULL</code> in which case it will not be possible for the
+application to query the status of this command or queue a wait for this
+command to complete.
+If the <em>event_wait_list</em> and the <em>event</em> arguments are not <code>NULL</code>, the
+<em>event</em> argument should not refer to an element of the <em>event_wait_list</em>
+array.</p>
+</div>
+<div class="paragraph">
+<p><em>errcode_ret</em> will return an appropriate error code.
+If <em>errcode_ret</em> is <code>NULL</code>, no error code is returned.</p>
+</div>
+<div class="paragraph">
+<p><strong>clEnqueueMapBuffer</strong> will return a pointer to the mapped region and
+set <em>errcode_ret</em> to CL_SUCCESS if the function is executed successfully.</p>
+</div>
+<div class="paragraph">
+<p>A <code>NULL</code> pointer is returned otherwise with one of the following error
+values returned in <em>errcode_ret</em>:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not a valid host
+command-queue.</p>
+</li>
+<li>
+<p>CL_INVALID_CONTEXT if the context associated with <em>command_queue</em> and
+<em>buffer</em> are not the same or if the context associated with
+<em>command_queue</em> and events in <em>event_wait_list</em> are not the same.</p>
+</li>
+<li>
+<p>CL_INVALID_MEM_OBJECT if <em>buffer</em> is not a valid buffer object.</p>
+</li>
+<li>
+<p>CL_INVALID_VALUE if the region being mapped, given by (<em>offset</em>, <em>size</em>),
+is out of bounds, or if <em>size</em> is 0, or if values specified in <em>map_flags</em>
+are not valid.</p>
+</li>
+<li>
+<p>CL_INVALID_EVENT_WAIT_LIST if <em>event_wait_list</em> is <code>NULL</code> and
+<em>num_events_in_wait_list</em> &gt; 0, or <em>event_wait_list</em> is not <code>NULL</code> and
+<em>num_events_in_wait_list</em> is 0, or if event objects in <em>event_wait_list</em>
+are not valid events.</p>
+</li>
+<li>
+<p>CL_MISALIGNED_SUB_BUFFER_OFFSET if <em>buffer</em> is a sub-buffer object and the
+<em>offset</em> specified when the sub-buffer object was created is not aligned
+to the CL_DEVICE_MEM_BASE_ADDR_ALIGN value for the device associated with
+<em>command_queue</em>.</p>
+</li>
+<li>
+<p>CL_MAP_FAILURE if there is a failure to map the requested region into
+the host address space.
+This error cannot occur for buffer objects created with
+CL_MEM_USE_HOST_PTR or CL_MEM_ALLOC_HOST_PTR.</p>
+</li>
+<li>
+<p>CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST if the map operation is
+blocking and the execution status of any of the events in
+<em>event_wait_list</em> is a negative integer value.</p>
+</li>
+<li>
+<p>CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
+memory for the data store associated with <em>buffer</em>.</p>
+</li>
+<li>
+<p>CL_INVALID_OPERATION if <em>buffer</em> has been created with
+CL_MEM_HOST_WRITE_ONLY or CL_MEM_HOST_NO_ACCESS and CL_MAP_READ is set
+in <em>map_flags</em> or if <em>buffer</em> has been created with
+CL_MEM_HOST_READ_ONLY or CL_MEM_HOST_NO_ACCESS and CL_MAP_WRITE or
+CL_MAP_WRITE_INVALIDATE_REGION is set in <em>map_flags</em>.</p>
+</li>
+<li>
+<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
+by the OpenCL implementation on the device.</p>
+</li>
+<li>
+<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
+required by the OpenCL implementation on the host.</p>
+</li>
+<li>
+<p>CL_INVALID_OPERATION if mapping would lead to overlapping regions being
+mapped for writing.</p>
+</li>
+</ul>
+</div>
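+<div class="paragraph">
+<p>The argument checks behind CL_INVALID_VALUE and the host-access case of
+CL_INVALID_OPERATION listed above can be sketched as follows. This is a
+minimal model, not how any implementation is required to work: the function
+<code>check_map_request</code>, the flag names and the result names are hypothetical
+stand-ins for the constants in the OpenCL headers, and the order in which
+the checks are made is a choice of this sketch.</p>
+</div>

```c
#include <stddef.h>

/* Hypothetical stand-ins for the cl_map_flags bits. */
enum {
    MAP_READ                    = 1u << 0,
    MAP_WRITE                   = 1u << 1,
    MAP_WRITE_INVALIDATE_REGION = 1u << 2
};

/* Hypothetical stand-ins for the host-access bits given at buffer creation. */
enum {
    HOST_WRITE_ONLY = 1u << 0,
    HOST_READ_ONLY  = 1u << 1,
    HOST_NO_ACCESS  = 1u << 2
};

/* Hypothetical result codes for this sketch only. */
typedef enum { MAP_OK, MAP_INVALID_VALUE, MAP_INVALID_OPERATION } map_check;

map_check check_map_request(size_t buffer_size, unsigned host_access,
                            unsigned map_flags, size_t offset, size_t size)
{
    const unsigned known = MAP_READ | MAP_WRITE | MAP_WRITE_INVALIDATE_REGION;

    /* Zero size or region out of bounds (written to avoid overflow). */
    if (size == 0 || offset > buffer_size || size > buffer_size - offset)
        return MAP_INVALID_VALUE;

    /* Flag bits outside the known set. */
    if (map_flags & ~known)
        return MAP_INVALID_VALUE;

    /* Mapping for reading a buffer the host may not read. */
    if ((map_flags & MAP_READ) &&
        (host_access & (HOST_WRITE_ONLY | HOST_NO_ACCESS)))
        return MAP_INVALID_OPERATION;

    /* Mapping for writing a buffer the host may not write. */
    if ((map_flags & (MAP_WRITE | MAP_WRITE_INVALIDATE_REGION)) &&
        (host_access & (HOST_READ_ONLY | HOST_NO_ACCESS)))
        return MAP_INVALID_OPERATION;

    return MAP_OK;
}
```

+<div class="paragraph">
+<p>For a 1024-byte buffer, a request with <em>offset</em> 512 and <em>size</em> 513 is
+rejected as out of bounds, and a read mapping of a buffer marked
+<code>HOST_WRITE_ONLY</code> is rejected as an invalid operation.</p>
+</div>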
+<div class="paragraph">
+<p>The pointer returned maps a region starting at <em>offset</em> and is at least
+<em>size</em> bytes in size.
+The result of a memory access outside this region is undefined.</p>
+</div>
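+<div class="paragraph">
+<p>Because only <em>size</em> bytes starting at the returned pointer are guaranteed
+to be mapped, host code may want to guard its accesses. A minimal sketch,
+assuming a hypothetical helper <code>access_in_mapped_region</code> (not an OpenCL
+function) that takes positions relative to the returned pointer:</p>
+</div>

```c
#include <stddef.h>

/* Returns 1 if reading or writing `len` bytes at byte position `pos`,
   measured from the pointer returned by the map call, stays inside a
   mapped region of `size` bytes; returns 0 otherwise.
   The comparison is arranged so that it cannot overflow. */
int access_in_mapped_region(size_t size, size_t pos, size_t len)
{
    return len <= size && pos <= size - len;
}
```

+<div class="paragraph">
+<p>With a 16-byte mapping, a 4-byte access at position 12 is inside the
+region, while the same access at position 13 is not.</p>
+</div>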
+<div class="paragraph">
+<p>If the buffer object is created with CL_MEM_USE_HOST_PTR set in <em>mem_flags</em>,
+the following will be true:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>The <em>host_ptr</em> specified in <strong>clCreateBuffer</strong> is guaranteed to contain
+the latest bits in the region being mapped when the <strong>clEnqueueMapBuffer</strong>
+command has completed.</p>
+</li>
+<li>
+<p>The pointer value returned by <strong>clEnqueueMapBuffer</strong> will be derived from
+the <em>host_ptr</em> specified when the buffer object is created.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>Mapped buffer objects are unmapped using <strong>clEnqueueUnmapMemObject</strong>.
+This is described in <a href="#unmapping-mapped-memory">Unmapping Mapped Memory
+Objects</a>.</p>
+</div>
+<table id="memory-map-flags-table" class="tableblock frame-all grid-all spread">
+<caption class="title">Table 13. List of supported cl_map_flags values</caption>
+<colgroup>
+<col style="width: 50%;">
+<col style="width: 50%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top"><strong>cl_map_flags</strong></th>
+<th class="tableblock halign-left valign-top">Description</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MAP_READ</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">This flag specifies that the region being mapped in the memory object is
+    being mapped for reading.</p>
+<p class="tableblock">    The pointer returned by <strong>clEnqueueMapBuffer</strong> (<strong>clEnqueueMapImage</strong>) is
+    guaranteed to contain the latest bits in the region being mapped when
+    the <strong>clEnqueueMapBuffer</strong> (<strong>clEnqueueMapImage</strong>) command has completed.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left vali