specs/opencl-2.2.html - external/github.com/KhronosGroup/OpenCL-Registry - Git at Google

 <!DOCTYPE html>
 <html lang="en">
 <head>
 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
 <meta name="generator" content="AsciiDoc 8.6.9">
 <title>The OpenCL Specification</title>
 <style type="text/css">
 /* Shared CSS for AsciiDoc xhtml11 and html5 backends */

 /* Default font. */
 body {
   font-family: Georgia,serif;
 }

 /* Title font. */
 h1, h2, h3, h4, h5, h6,
 div.title, caption.title,
 thead, p.table.header,
 #toctitle,
 #author, #revnumber, #revdate, #revremark,
 #footer {
   font-family: Arial,Helvetica,sans-serif;
 }

 body {
   margin: 1em 5% 1em 5%;
 }

 a {
   color: blue;
   text-decoration: underline;
 }
 a:visited {
   color: fuchsia;
 }

 em {
   font-style: italic;
   color: navy;
 }

 strong {
   font-weight: bold;
   color: #083194;
 }

 h1, h2, h3, h4, h5, h6 {
   color: #527bbd;
   margin-top: 1.2em;
   margin-bottom: 0.5em;
   line-height: 1.3;
 }

 h1, h2, h3 {
   border-bottom: 2px solid silver;
 }
 h2 {
   padding-top: 0.5em;
 }
 h3 {
   float: left;
 }
 h3 + * {
   clear: left;
 }
 h5 {
   font-size: 1.0em;
 }

 div.sectionbody {
   margin-left: 0;
 }

 hr {
   border: 1px solid silver;
 }

 p {
   margin-top: 0.5em;
   margin-bottom: 0.5em;
 }

 ul, ol, li > p {
   margin-top: 0;
 }
 ul > li     { color: #aaa; }
 ul > li > * { color: black; }

 .monospaced, code, pre {
   font-family: "Courier New", Courier, monospace;
   font-size: inherit;
   color: navy;
   padding: 0;
   margin: 0;
 }
 pre {
   white-space: pre-wrap;
 }

 #author {
   color: #527bbd;
   font-weight: bold;
   font-size: 1.1em;
 }
 #email {
 }
 #revnumber, #revdate, #revremark {
 }

 #footer {
   font-size: small;
   border-top: 2px solid silver;
   padding-top: 0.5em;
   margin-top: 4.0em;
 }
 #footer-text {
   float: left;
   padding-bottom: 0.5em;
 }
 #footer-badges {
   float: right;
   padding-bottom: 0.5em;
 }

 #preamble {
   margin-top: 1.5em;
   margin-bottom: 1.5em;
 }
 div.imageblock, div.exampleblock, div.verseblock,
 div.quoteblock, div.literalblock, div.listingblock, div.sidebarblock,
 div.admonitionblock {
   margin-top: 1.0em;
   margin-bottom: 1.5em;
 }
 div.admonitionblock {
   margin-top: 2.0em;
   margin-bottom: 2.0em;
   margin-right: 10%;
   color: #606060;
 }

 div.content { /* Block element content. */
   padding: 0;
 }

 /* Block element titles. */
 div.title, caption.title {
   color: #527bbd;
   font-weight: bold;
   text-align: left;
   margin-top: 1.0em;
   margin-bottom: 0.5em;
 }
 div.title + * {
   margin-top: 0;
 }

 td div.title:first-child {
   margin-top: 0.0em;
 }
 div.content div.title:first-child {
   margin-top: 0.0em;
 }
 div.content + div.title {
   margin-top: 0.0em;
 }

 div.sidebarblock > div.content {
   background: #ffffee;
   border: 1px solid #dddddd;
   border-left: 4px solid #f0f0f0;
   padding: 0.5em;
 }

 div.listingblock > div.content {
   border: 1px solid #dddddd;
   border-left: 5px solid #f0f0f0;
   background: #f8f8f8;
   padding: 0.5em;
 }

 div.quoteblock, div.verseblock {
   padding-left: 1.0em;
   margin-left: 1.0em;
   margin-right: 10%;
   border-left: 5px solid #f0f0f0;
   color: #888;
 }

 div.quoteblock > div.attribution {
   padding-top: 0.5em;
   text-align: right;
 }

 div.verseblock > pre.content {
   font-family: inherit;
   font-size: inherit;
 }
 div.verseblock > div.attribution {
   padding-top: 0.75em;
   text-align: left;
 }
 /* DEPRECATED: Pre version 8.2.7 verse style literal block. */
 div.verseblock + div.attribution {
   text-align: left;
 }

 div.admonitionblock .icon {
   vertical-align: top;
   font-size: 1.1em;
   font-weight: bold;
   text-decoration: underline;
   color: #527bbd;
   padding-right: 0.5em;
 }
 div.admonitionblock td.content {
   padding-left: 0.5em;
   border-left: 3px solid #dddddd;
 }

 div.exampleblock > div.content {
   border-left: 3px solid #dddddd;
   padding-left: 0.5em;
 }

 div.imageblock div.content { padding-left: 0; }
 span.image img { border-style: none; vertical-align: text-bottom; }
 a.image:visited { color: white; }

 dl {
   margin-top: 0.8em;
   margin-bottom: 0.8em;
 }
 dt {
   margin-top: 0.5em;
   margin-bottom: 0;
   font-style: normal;
   color: navy;
 }
 dd > *:first-child {
   margin-top: 0.1em;
 }

 ul, ol {
     list-style-position: outside;
 }
 ol.arabic {
   list-style-type: decimal;
 }
 ol.loweralpha {
   list-style-type: lower-alpha;
 }
 ol.upperalpha {
   list-style-type: upper-alpha;
 }
 ol.lowerroman {
   list-style-type: lower-roman;
 }
 ol.upperroman {
   list-style-type: upper-roman;
 }

 div.compact ul, div.compact ol,
 div.compact p, div.compact p,
 div.compact div, div.compact div {
   margin-top: 0.1em;
   margin-bottom: 0.1em;
 }

 tfoot {
   font-weight: bold;
 }
 td > div.verse {
   white-space: pre;
 }

 div.hdlist {
   margin-top: 0.8em;
   margin-bottom: 0.8em;
 }
 div.hdlist tr {
   padding-bottom: 15px;
 }
 dt.hdlist1.strong, td.hdlist1.strong {
   font-weight: bold;
 }
 td.hdlist1 {
   vertical-align: top;
   font-style: normal;
   padding-right: 0.8em;
   color: navy;
 }
 td.hdlist2 {
   vertical-align: top;
 }
 div.hdlist.compact tr {
   margin: 0;
   padding-bottom: 0;
 }

 .comment {
   background: yellow;
 }

 .footnote, .footnoteref {
   font-size: 0.8em;
 }

 span.footnote, span.footnoteref {
   vertical-align: super;
 }

 #footnotes {
   margin: 20px 0 20px 0;
   padding: 7px 0 0 0;
 }

 #footnotes div.footnote {
   margin: 0 0 5px 0;
 }

 #footnotes hr {
   border: none;
   border-top: 1px solid silver;
   height: 1px;
   text-align: left;
   margin-left: 0;
   width: 20%;
   min-width: 100px;
 }

 div.colist td {
   padding-right: 0.5em;
   padding-bottom: 0.3em;
   vertical-align: top;
 }
 div.colist td img {
   margin-top: 0.3em;
 }

 @media print {
   #footer-badges { display: none; }
 }

 #toc {
   margin-bottom: 2.5em;
 }

 #toctitle {
   color: #527bbd;
   font-size: 1.1em;
   font-weight: bold;
   margin-top: 1.0em;
   margin-bottom: 0.1em;
 }

 div.toclevel0, div.toclevel1, div.toclevel2, div.toclevel3, div.toclevel4 {
   margin-top: 0;
   margin-bottom: 0;
 }
 div.toclevel2 {
   margin-left: 2em;
   font-size: 0.9em;
 }
 div.toclevel3 {
   margin-left: 4em;
   font-size: 0.9em;
 }
 div.toclevel4 {
   margin-left: 6em;
   font-size: 0.9em;
 }

 span.aqua { color: aqua; }
 span.black { color: black; }
 span.blue { color: blue; }
 span.fuchsia { color: fuchsia; }
 span.gray { color: gray; }
 span.green { color: green; }
 span.lime { color: lime; }
 span.maroon { color: maroon; }
 span.navy { color: navy; }
 span.olive { color: olive; }
 span.purple { color: purple; }
 span.red { color: red; }
 span.silver { color: silver; }
 span.teal { color: teal; }
 span.white { color: white; }
 span.yellow { color: yellow; }

 span.aqua-background { background: aqua; }
 span.black-background { background: black; }
 span.blue-background { background: blue; }
 span.fuchsia-background { background: fuchsia; }
 span.gray-background { background: gray; }
 span.green-background { background: green; }
 span.lime-background { background: lime; }
 span.maroon-background { background: maroon; }
 span.navy-background { background: navy; }
 span.olive-background { background: olive; }
 span.purple-background { background: purple; }
 span.red-background { background: red; }
 span.silver-background { background: silver; }
 span.teal-background { background: teal; }
 span.white-background { background: white; }
 span.yellow-background { background: yellow; }

 span.big { font-size: 2em; }
 span.small { font-size: 0.6em; }

 span.underline { text-decoration: underline; }
 span.overline { text-decoration: overline; }
 span.line-through { text-decoration: line-through; }

 div.unbreakable { page-break-inside: avoid; }


 /*
  * xhtml11 specific
  *
  * */

 div.tableblock {
   margin-top: 1.0em;
   margin-bottom: 1.5em;
 }
 div.tableblock > table {
   border: 3px solid #527bbd;
 }
 thead, p.table.header {
   font-weight: bold;
   color: #527bbd;
 }
 p.table {
   margin-top: 0;
 }
 /* Because the table frame attribute is overriden by CSS in most browsers. */
 div.tableblock > table[frame="void"] {
   border-style: none;
 }
 div.tableblock > table[frame="hsides"] {
   border-left-style: none;
   border-right-style: none;
 }
 div.tableblock > table[frame="vsides"] {
   border-top-style: none;
   border-bottom-style: none;
 }


 /*
  * html5 specific
  *
  * */

 table.tableblock {
   margin-top: 1.0em;
   margin-bottom: 1.5em;
 }
 thead, p.tableblock.header {
   font-weight: bold;
   color: #527bbd;
 }
 p.tableblock {
   margin-top: 0;
 }
 table.tableblock {
   border-width: 3px;
   border-spacing: 0px;
   border-style: solid;
   border-color: #527bbd;
   border-collapse: collapse;
 }
 th.tableblock, td.tableblock {
   border-width: 1px;
   padding: 4px;
   border-style: solid;
   border-color: #527bbd;
 }

 table.tableblock.frame-topbot {
   border-left-style: hidden;
   border-right-style: hidden;
 }
 table.tableblock.frame-sides {
   border-top-style: hidden;
   border-bottom-style: hidden;
 }
 table.tableblock.frame-none {
   border-style: hidden;
 }

 th.tableblock.halign-left, td.tableblock.halign-left {
   text-align: left;
 }
 th.tableblock.halign-center, td.tableblock.halign-center {
   text-align: center;
 }
 th.tableblock.halign-right, td.tableblock.halign-right {
   text-align: right;
 }

 th.tableblock.valign-top, td.tableblock.valign-top {
   vertical-align: top;
 }
 th.tableblock.valign-middle, td.tableblock.valign-middle {
   vertical-align: middle;
 }
 th.tableblock.valign-bottom, td.tableblock.valign-bottom {
   vertical-align: bottom;
 }


 /*
  * manpage specific
  *
  * */

 body.manpage h1 {
   padding-top: 0.5em;
   padding-bottom: 0.5em;
   border-top: 2px solid silver;
   border-bottom: 2px solid silver;
 }
 body.manpage h2 {
   border-style: none;
 }
 body.manpage div.sectionbody {
   margin-left: 3em;
 }

 @media print {
   body.manpage div#toc { display: none; }
 }


 @media screen {
   body {
     max-width: 50em; /* approximately 80 characters wide */
     margin-left: 16em;
   }

   #toc {
     position: fixed;
     top: 0;
     left: 0;
     bottom: 0;
     width: 13em;
     padding: 0.5em;
     padding-bottom: 1.5em;
     margin: 0;
     overflow: auto;
     border-right: 3px solid #f8f8f8;
     background-color: white;
   }

   #toc .toclevel1 {
     margin-top: 0.5em;
   }

   #toc .toclevel2 {
     margin-top: 0.25em;
     display: list-item;
     color: #aaaaaa;
   }

   #toctitle {
     margin-top: 0.5em;
   }
 }
 </style>
 <script type="text/javascript">
 /*<![CDATA[*/
 var asciidoc = {  // Namespace.

 /////////////////////////////////////////////////////////////////////
 // Table Of Contents generator
 /////////////////////////////////////////////////////////////////////

 /* Author: Mihai Bazon, September 2002
  * http://students.infoiasi.ro/~mishoo
  *
  * Table Of Content generator
  * Version: 0.4
  *
  * Feel free to use this script under the terms of the GNU General Public
  * License, as long as you do not remove or alter this notice.
  */

  /* modified by Troy D. Hanson, September 2006. License: GPL */
  /* modified by Stuart Rackham, 2006, 2009. License: GPL */

 // toclevels = 1..4.
 toc: function (toclevels) {

   function getText(el) {
     var text = "";
     for (var i = el.firstChild; i != null; i = i.nextSibling) {
       if (i.nodeType == 3 /* Node.TEXT_NODE */) // IE doesn't speak constants.
         text += i.data;
       else if (i.firstChild != null)
         text += getText(i);
     }
     return text;
   }

   function TocEntry(el, text, toclevel) {
     this.element = el;
     this.text = text;
     this.toclevel = toclevel;
   }

   function tocEntries(el, toclevels) {
     var result = new Array;
     var re = new RegExp('[hH]([1-'+(toclevels+1)+'])');
     // Function that scans the DOM tree for header elements (the DOM2
     // nodeIterator API would be a better technique but not supported by all
     // browsers).
     var iterate = function (el) {
       for (var i = el.firstChild; i != null; i = i.nextSibling) {
         if (i.nodeType == 1 /* Node.ELEMENT_NODE */) {
           var mo = re.exec(i.tagName);
           if (mo && (i.getAttribute("class") || i.getAttribute("className")) != "float") {
             result[result.length] = new TocEntry(i, getText(i), mo[1]-1);
           }
           iterate(i);
         }
       }
     }
     iterate(el);
     return result;
   }

   var toc = document.getElementById("toc");
   if (!toc) {
     return;
   }

   // Delete existing TOC entries in case we're reloading the TOC.
   var tocEntriesToRemove = [];
   var i;
   for (i = 0; i < toc.childNodes.length; i++) {
     var entry = toc.childNodes[i];
     if (entry.nodeName.toLowerCase() == 'div'
      && entry.getAttribute("class")
      && entry.getAttribute("class").match(/^toclevel/))
       tocEntriesToRemove.push(entry);
   }
   for (i = 0; i < tocEntriesToRemove.length; i++) {
     toc.removeChild(tocEntriesToRemove[i]);
   }

   // Rebuild TOC entries.
   var entries = tocEntries(document.getElementById("content"), toclevels);
   for (var i = 0; i < entries.length; ++i) {
     var entry = entries[i];
     if (entry.element.id == "")
       entry.element.id = "_toc_" + i;
     var a = document.createElement("a");
     a.href = "#" + entry.element.id;
     a.appendChild(document.createTextNode(entry.text));
     var div = document.createElement("div");
     div.appendChild(a);
     div.className = "toclevel" + entry.toclevel;
     toc.appendChild(div);
   }
   if (entries.length == 0)
     toc.parentNode.removeChild(toc);
 },


 /////////////////////////////////////////////////////////////////////
 // Footnotes generator
 /////////////////////////////////////////////////////////////////////

 /* Based on footnote generation code from:
  * http://www.brandspankingnew.net/archive/2005/07/format_footnote.html
  */

 footnotes: function () {
   // Delete existing footnote entries in case we're reloading the footnodes.
   var i;
   var noteholder = document.getElementById("footnotes");
   if (!noteholder) {
     return;
   }
   var entriesToRemove = [];
   for (i = 0; i < noteholder.childNodes.length; i++) {
     var entry = noteholder.childNodes[i];
     if (entry.nodeName.toLowerCase() == 'div' && entry.getAttribute("class") == "footnote")
       entriesToRemove.push(entry);
   }
   for (i = 0; i < entriesToRemove.length; i++) {
     noteholder.removeChild(entriesToRemove[i]);
   }

   // Rebuild footnote entries.
   var cont = document.getElementById("content");
   var spans = cont.getElementsByTagName("span");
   var refs = {};
   var n = 0;
   for (i=0; i<spans.length; i++) {
     if (spans[i].className == "footnote") {
       n++;
       var note = spans[i].getAttribute("data-note");
       if (!note) {
         // Use [\s\S] in place of . so multi-line matches work.
         // Because JavaScript has no s (dotall) regex flag.
         note = spans[i].innerHTML.match(/\s*\[([\s\S]*)]\s*/)[1];
         spans[i].innerHTML =
           "[<a id='_footnoteref_" + n + "' href='#_footnote_" + n +
           "' title='View footnote' class='footnote'>" + n + "</a>]";
         spans[i].setAttribute("data-note", note);
       }
       noteholder.innerHTML +=
         "<div class='footnote' id='_footnote_" + n + "'>" +
         "<a href='#_footnoteref_" + n + "' title='Return to text'>" +
         n + "</a>. " + note + "</div>";
       var id =spans[i].getAttribute("id");
       if (id != null) refs["#"+id] = n;
     }
   }
   if (n == 0)
     noteholder.parentNode.removeChild(noteholder);
   else {
     // Process footnoterefs.
     for (i=0; i<spans.length; i++) {
       if (spans[i].className == "footnoteref") {
         var href = spans[i].getElementsByTagName("a")[0].getAttribute("href");
         href = href.match(/#.*/)[0];  // Because IE return full URL.
         n = refs[href];
         spans[i].innerHTML =
           "[<a href='#_footnote_" + n +
           "' title='View footnote' class='footnote'>" + n + "</a>]";
       }
     }
   }
 },

 install: function(toclevels) {
   var timerId;

   function reinstall() {
     asciidoc.footnotes();
     if (toclevels) {
       asciidoc.toc(toclevels);
     }
   }

   function reinstallAndRemoveTimer() {
     clearInterval(timerId);
     reinstall();
   }

   timerId = setInterval(reinstall, 500);
   if (document.addEventListener)
     document.addEventListener("DOMContentLoaded", reinstallAndRemoveTimer, false);
   else
     window.onload = reinstallAndRemoveTimer;
 }

 }
 asciidoc.install(3);
 /*]]>*/
 </script>
     <script type="text/x-mathjax-config">
 	MathJax.Hub.Config({
 	    MathML: { extensions: ["content-mathml.js"] },
 	    tex2jax: { inlineMath: [['$','$'], ['\\(','\\)']] }
 	});
     </script>
     <script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
     </script>
 </head>
 <body class="book">
 <div id="header">
 <h1>The OpenCL Specification</h1>
 <span id="author">Khronos OpenCL Working Group</span><br>
 <span id="revnumber">version v2.2-3</span>
 <div id="toc">
   <div id="toctitle">Table of Contents</div>
   <noscript><p><b>JavaScript must be enabled in your browser to display the table of contents.</b></p></noscript>
 </div>
 </div>
 <div id="content">
 <div id="preamble">
 <div class="sectionbody">
 <div class="paragraph"><p>Copyright 2008-2017 The Khronos Group.</p></div>
 <div class="paragraph"><p>This specification is protected by copyright laws and contains material proprietary
 to the Khronos Group, Inc. Except as described by these terms, it or any components
 may not be reproduced, republished, distributed, transmitted, displayed, broadcast
 or otherwise exploited in any manner without the express prior written permission
 of Khronos Group.</p></div>
 <div class="paragraph"><p>Khronos Group grants a conditional copyright license to use and reproduce the
 unmodified specification for any purpose, without fee or royalty, EXCEPT no licenses
 to any patent, trademark or other intellectual property rights are granted under
 these terms. Parties desiring to implement the specification and make use of
 Khronos trademarks in relation to that implementation, and receive reciprocal patent
 license protection under the Khronos IP Policy must become Adopters and confirm the
 implementation as conformant under the process defined by Khronos for this
 specification; see <a href="https://www.khronos.org/adopters">https://www.khronos.org/adopters</a>.</p></div>
 <div class="paragraph"><p>Khronos Group makes no, and expressly disclaims any, representations or warranties,
 express or implied, regarding this specification, including, without limitation:
 merchantability, fitness for a particular purpose, non-infringement of any
 intellectual property, correctness, accuracy, completeness, timeliness, and
 reliability. Under no circumstances will the Khronos Group, or any of its Promoters,
 Contributors or Members, or their respective partners, officers, directors,
 employees, agents or representatives be liable for any damages, whether direct,
 indirect, special or consequential damages for lost revenues, lost profits, or
 otherwise, arising from or in connection with these materials.</p></div>
 <div class="paragraph"><p>Vulkan is a registered trademark and Khronos, OpenXR, SPIR, SPIR-V, SYCL, WebGL,
 WebCL, OpenVX, OpenVG, EGL, COLLADA, glTF, NNEF, OpenKODE, OpenKCAM, StreamInput,
 OpenWF, OpenSL ES, OpenMAX, OpenMAX AL, OpenMAX IL, OpenMAX DL, OpenML and DevU are
 trademarks of the Khronos Group Inc. ASTC is a trademark of ARM Holdings PLC,
 OpenCL is a trademark of Apple Inc. and OpenGL and OpenML are registered trademarks
 and the OpenGL ES and OpenGL SC logos are trademarks of Silicon Graphics
 International used under license by Khronos. All other product names, trademarks,
 and/or company names are used solely for identification and belong to their
 respective owners.</p></div>
 <div style="page-break-after:always"></div>
 <div class="paragraph"><p><strong>Acknowledgements</strong></p></div>
 <div class="paragraph"><p>The OpenCL specification is the result of the contributions of many
 people, representing a cross section of the desktop, hand-held, and
 embedded computer industry. Following is a partial list of the
 contributors, including the company that they represented at the time of
 their contribution:</p></div>
 <div class="paragraph"><p>Chuck Rose, Adobe<br>
 Eric Berdahl, Adobe<br>
 Shivani Gupta, Adobe<br>
 Bill Licea Kane, AMD<br>
 Ed Buckingham, AMD<br>
 Jan Civlin, AMD<br>
 Laurent Morichetti, AMD<br>
 Mark Fowler, AMD<br>
 Marty Johnson, AMD<br>
 Michael Mantor, AMD<br>
 Norm Rubin, AMD<br>
 Ofer Rosenberg, AMD<br>
 Brian Sumner, AMD<br>
 Victor Odintsov, AMD<br>
 Aaftab Munshi, Apple<br>
 Abe Stephens, Apple<br>
 Alexandre Namaan, Apple<br>
 Anna Tikhonova, Apple<br>
 Chendi Zhang, Apple<br>
 Eric Bainville, Apple<br>
 David Hayward, Apple<br>
 Giridhar Murthy, Apple<br>
 Ian Ollmann, Apple<br>
 Inam Rahman, Apple<br>
 James Shearer, Apple<br>
 MonPing Wang, Apple<br>
 Tanya Lattner, Apple<br>
 Mikael Bourges-Sevenier, Aptina<br>
 Anton Lokhmotov, ARM<br>
 Dave Shreiner, ARM<br>
 Hedley Francis, ARM<br>
 Robert Elliott, ARM<br>
 Scott Moyers, ARM<br>
 Tom Olson, ARM<br>
 Anastasia Stulova, ARM<br>
 Christopher Thompson-Walsh, Broadcom<br>
 Holger Waechtler, Broadcom<br>
 Norman Rink, Broadcom<br>
 Andrew Richards, Codeplay<br>
 Maria Rovatsou, Codeplay<br>
 Alistair Donaldson, Codeplay<br>
 Alastair Murray, Codeplay<br>
 Stephen Frye, Electronic Arts<br>
 Eric Schenk, Electronic Arts<br>
 Daniel Laroche, Freescale<br>
 David Neto, Google<br>
 Robin Grosman, Huawei<br>
 Craig Davies, Huawei<br>
 Brian Horton, IBM<br>
 Brian Watt, IBM<br>
 Gordon Fossum, IBM<br>
 Greg Bellows, IBM<br>
 Joaquin Madruga, IBM<br>
 Mark Nutter, IBM<br>
 Mike Perks, IBM<br>
 Sean Wagner, IBM<br>
 Jon Parr, Imagination Technologies<br>
 Robert Quill, Imagination Technologies<br>
 James McCarthy, Imagination Technologie<br>
 Aaron Kunze, Intel<br>
 Aaron Lefohn, Intel<br>
 Adam Lake, Intel<br>
 Alexey Bader, Intel<br>
 Allen Hux, Intel<br>
 Andrew Brownsword, Intel<br>
 Andrew Lauritzen, Intel<br>
 Bartosz Sochacki, Intel<br>
 Ben Ashbaugh, Intel<br>
 Brian Lewis, Intel<br>
 Geoff Berry, Intel<br>
 Hong Jiang, Intel<br>
 Jayanth Rao, Intel<br>
 Josh Fryman, Intel<br>
 Larry Seiler, Intel<br>
 Mike MacPherson, Intel<br>
 Murali Sundaresan, Intel<br>
 Paul Lalonde, Intel<br>
 Raun Krisch, Intel<br>
 Stephen Junkins, Intel<br>
 Tim Foley, Intel<br>
 Timothy Mattson, Intel<br>
 Yariv Aridor, Intel<br>
 Michael Kinsner, Intel<br>
 Kevin Stevens, Intel<br>
 Jon Leech, Khronos<br>
 Benjamin Bergen, Los Alamos National Laboratory<br>
 Roy Ju, Mediatek<br>
 Bor-Sung Liang, Mediatek<br>
 Rahul Agarwal, Mediatek<br>
 Michal Witaszek, Mobica<br>
 JenqKuen Lee, NTHU<br>
 Amit Rao, NVIDIA<br>
 Ashish Srivastava, NVIDIA<br>
 Bastiaan Aarts, NVIDIA<br>
 Chris Cameron, NVIDIA<br>
 Christopher Lamb, NVIDIA<br>
 Dibyapran Sanyal, NVIDIA<br>
 Guatam Chakrabarti, NVIDIA<br>
 Ian Buck, NVIDIA<br>
 Jaydeep Marathe, NVIDIA<br>
 Jian-Zhong Wang, NVIDIA<br>
 Karthik Raghavan Ravi, NVIDIA<br>
 Kedar Patil, NVIDIA<br>
 Manjunath Kudlur, NVIDIA<br>
 Mark Harris, NVIDIA<br>
 Michael Gold, NVIDIA<br>
 Neil Trevett, NVIDIA<br>
 Richard Johnson, NVIDIA<br>
 Sean Lee, NVIDIA<br>
 Tushar Kashalikar, NVIDIA<br>
 Vinod Grover, NVIDIA<br>
 Xiangyun Kong, NVIDIA<br>
 Yogesh Kini, NVIDIA<br>
 Yuan Lin, NVIDIA<br>
 Mayuresh Pise, NVIDIA<br>
 Allan Tzeng, QUALCOMM<br>
 Alex Bourd, QUALCOMM<br>
 Anirudh Acharya, QUALCOMM<br>
 Andrew Gruber, QUALCOMM<br>
 Andrzej Mamona, QUALCOMM<br>
 Benedict Gaster, QUALCOMM<br>
 Bill Torzewski, QUALCOMM<br>
 Bob Rychlik, QUALCOMM<br>
 Chihong Zhang, QUALCOMM<br>
 Chris Mei, QUALCOMM<br>
 Colin Sharp, QUALCOMM<br>
 David Garcia, QUALCOMM<br>
 David Ligon, QUALCOMM<br>
 Jay Yun, QUALCOMM<br>
 Lee Howes, QUALCOMM<br>
 Richard Ruigrok, QUALCOMM<br>
 Robert J. Simpson, QUALCOMM<br>
 Sumesh Udayakumaran, QUALCOMM<br>
 Vineet Goel, QUALCOMM<br>
 Lihan Bin, QUALCOMM<br>
 Vlad Shimanskiy, QUALCOMM<br>
 Jian Liu, QUALCOMM<br>
 Tasneem Brutch, Samsung<br>
 Yoonseo Choi, Samsung<br>
 Dennis Adams, Sony<br>
 Pr-Anders Aronsson, Sony<br>
 Jim Rasmusson, Sony<br>
 Thierry Lepley, STMicroelectronics<br>
 Anton Gorenko, StreamComputing<br>
 Jakub Szuppe, StreamComputing<br>
 Vincent Hindriksen, StreamComputing<br>
 Alan Ward, Texas Instruments<br>
 Yuan Zhao, Texas Instruments<br>
 Pete Curry, Texas Instruments<br>
 Simon McIntosh-Smith, University of Bristol<br>
 James Price, University of Bristol<br>
 Paul Preney, University of Windsor<br>
 Shane Peelar, University of Windsor<br>
 Brian Hutsell, Vivante<br>
 Mike Cai, Vivante<br>
 Sumeet Kumar, Vivante<br>
 Wei-Lun Kao, Vivante<br>
 Xing Wang, Vivante<br>
 Jeff Fifield, Xilinx<br>
 Hem C. Neema, Xilinx<br>
 Henry Styles, Xilinx<br>
 Ralph Wittig, Xilinx<br>
 Ronan Keryell, Xilinx<br>
 AJ Guillon, YetiWare Inc<br></p></div>
 <div style="page-break-after:always"></div>
 </div>
 </div>
 <div class="sect1">
 <h2 id="_introduction">1. Introduction</h2>
 <div class="sectionbody">
 <div class="paragraph"><p>Modern processor architectures have embraced parallelism as an important
 pathway to increased performance. Facing technical challenges with
 higher clock speeds in a fixed power envelope, Central Processing Units
 (CPUs) now improve performance by adding multiple cores. Graphics
 Processing Units (GPUs) have also evolved from fixed function rendering
 devices into programmable parallel processors. As todays computer
 systems often include highly parallel CPUs, GPUs and other types of
 processors, it is important to enable software developers to take full
 advantage of these heterogeneous processing platforms.
 <br>
 <br>
 Creating applications for heterogeneous parallel processing platforms is
 challenging as traditional programming approaches for multi-core CPUs
 and GPUs are very different. CPU-based parallel programming models are
 typically based on standards but usually assume a shared address space
 and do not encompass vector operations. General purpose GPU
 programming models address complex memory hierarchies and vector
 operations but are traditionally platform-, vendor- or
 hardware-specific. These limitations make it difficult for a developer
 to access the compute power of heterogeneous CPUs, GPUs and other types
 of processors from a single, multi-platform source code base. More than
 ever, there is a need to enable software developers to effectively take
 full advantage of heterogeneous processing platforms  from high
 performance compute servers, through desktop computer systems to
 handheld devices - that include a diverse mix of parallel CPUs, GPUs and
 other processors such as DSPs and the Cell/B.E. processor.
 <br>
 <br>
 <strong>OpenCL</strong> (Open Computing Language) is an open royalty-free standard for
 general purpose parallel programming across CPUs, GPUs and other
 processors, giving software developers portable and efficient access to
 the power of these heterogeneous processing platforms.
 <br>
 <br>
 OpenCL supports a wide range of applications, ranging from embedded and
 consumer software to HPC solutions, through a low-level,
 high-performance, portable abstraction. By creating an efficient,
 close-to-the-metal programming interface, OpenCL will form the
 foundation layer of a parallel computing ecosystem of
 platform-independent tools, middleware and applications. OpenCL is
 particularly suited to play an increasingly significant role in emerging
 interactive graphics applications that combine general parallel compute
 algorithms with graphics rendering pipelines.
 <br>
 <br>
 OpenCL consists of an API for coordinating parallel computation across
 heterogeneous processors; and a cross-platform intermediate language
 with a well-specified computation environment. The OpenCL standard:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 Supports both data- and
 task-based parallel programming models
 </p>
 </li>
 <li>
 <p>
 Utilizes a portable and
 self-contained intermediate representation with support for parallel
 execution
 </p>
 </li>
 <li>
 <p>
 Defines consistent
 numerical requirements based on IEEE 754
 </p>
 </li>
 <li>
 <p>
 Defines a configuration
 profile for handheld and embedded devices
 </p>
 </li>
 <li>
 <p>
 Efficiently interoperates
 with OpenGL, OpenGL ES and other graphics APIs
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>This document begins with an overview of basic concepts and the
 architecture of OpenCL, followed by a detailed description of its
 execution model, memory model and synchronization support. It then
 discusses the OpenCL__platform and runtime API. Some examples are given
 that describe sample compute use-cases and how they would be written in
 OpenCL. The specification is divided into a core specification that any
 OpenCL compliant implementation must support; a handheld/embedded
 profile which relaxes the OpenCL compliance requirements for handheld
 and embedded devices; and a set of optional extensions that are likely
 to move into the core specification in later revisions of the OpenCL
 specification.</p></div>
 </div>
 </div>
 <div class="sect1">
 <h2 id="_glossary">2. Glossary</h2>
 <div class="sectionbody">
 <div class="paragraph"><p><strong>Application</strong>: The combination of the program running on the host and
 OpenCL devices.
 <br>
 <br>
 <strong>Acquire semantics</strong>: One of the memory order semantics defined for
 synchronization operations.  Acquire semantics apply to atomic
 operations that load from memory.  Given two units of execution, <strong>A</strong> and
 <strong>B</strong>, acting on a shared atomic object <strong>M</strong>, if <strong>A</strong> uses an atomic load of
 <strong>M</strong> with acquire semantics to synchronize-with an atomic store to <strong>M</strong> by
 <strong>B</strong> that used release semantics, then <strong>A</strong>'s atomic load will occur before
 any subsequent operations by <strong>A</strong>.  Note that the memory orders
 <em>release</em>, <em>sequentially consistent</em>, and <em>acquire_release</em> all include
 <em>release semantics</em> and effectively pair with a load using acquire
 semantics.
 <br>
 <br>
 <strong>Acquire release semantics</strong>: A memory order semantics for
 synchronization operations (such as atomic operations) that has the
 properties of both acquire and release memory orders. It is used with
 read-modify-write operations.
 <br>
 <br>
 <strong>Atomic operations</strong>: Operations that at any point, and from any
 perspective, have either occurred completely, or not at all. Memory
 orders associated with atomic operations may constrain the visibility of
 loads and stores with respect to the atomic operations (see <em>relaxed
 semantics</em>, <em>acquire semantics</em>, <em>release semantics</em> or <em>acquire release
 semantics</em>).
 <br>
 <br>
 <strong>Blocking and Non-Blocking Enqueue API calls</strong>: A <em>non-blocking enqueue
 API call</em> places a <em>command</em> on a <em>command-queue</em> and returns
 immediately to the host. The <em>blocking-mode enqueue API calls</em> do not
 return to the host until the command has completed.
 <br>
 <br>
 <strong>Barrier</strong>: There are three types of <em>barriers</em>  a command-queue barrier,
 a work-group barrier and a sub-group barrier.</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 The OpenCL API provides a
 function to enqueue a <em>command-queue</em> <em>barrier</em> command. This <em>barrier</em>
 command ensures that all previously enqueued commands to a command-queue
 have finished execution before any following <em>commands</em> enqueued in the
 <em>command-queue</em> can begin execution.
 </p>
 </li>
 <li>
 <p>
 The OpenCL kernel
 execution model provides built-in <em>work-group barrier</em> functionality.
 This <em>barrier</em> built-in function can be used by a <em>kernel</em> executing on
 a <em>device</em> to perform synchronization between <em>work-items</em> in a
 <em>work-group</em> executing the <em>kernel</em>. All the <em>work-items</em> of a
 <em>work-group</em> must execute the <em>barrier</em> construct before any are allowed
 to continue execution beyond the <em>barrier</em>.
 </p>
 </li>
 <li>
 <p>
 The OpenCL kernel
 execution model provides built-in <em>sub-group barrier</em> functionality.
 This <em>barrier</em> built-in function can be used by a <em>kernel</em> executing on
 a <em>device</em> to perform synchronization between <em>work-items</em> in a
 <em>sub-group</em> executing the <em>kernel</em>. All the <em>work-items</em> of a
 <em>sub-group</em> must execute the <em>barrier</em> construct before any are allowed
 to continue execution beyond the <em>barrier</em>.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p><strong>Buffer Object</strong>: A memory object that stores a linear collection of
 bytes. Buffer objects are accessible using a pointer in a <em>kernel</em>
 executing on a <em>device</em>. Buffer objects can be manipulated by the host
 using OpenCL API calls. A <em>buffer object</em> encapsulates the following
 information:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 Size in bytes.
 </p>
 </li>
 <li>
 <p>
 Properties that describe
 usage information and which region to allocate from.
 </p>
 </li>
 <li>
 <p>
 Buffer data.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p><strong>Built-in Kernel</strong>: A <em>built-in kernel</em> is a <em>kernel</em> that is executed on
 an OpenCL <em>device</em> or <em>custom device</em> by fixed-function hardware or in
 firmware. <em>Applications</em> can query the <em>built-in kernels</em> supported by
 a <em>device</em> or <em>custom device</em>. A <em>program object</em> can only contain
 <em>kernels</em> written in OpenCL C or <em>built-in kernels</em> but not both. See
 also <em>Kernel</em> and <em>Program</em>.
 <br>
 <br>
 <strong>Child kernel</strong>: see <em>device-side enqueue.</em>
 <br>
 <br>
 <strong>Command</strong>: The OpenCL operations that are submitted to a <em>command-queue</em>
 for execution. For example, OpenCL commands issue kernels for execution
 on a compute device, manipulate memory objects, etc.
 <br>
 <br>
 <strong>Command-queue</strong>: An object that holds <em>commands</em> that will be executed on
 a specific <em>device</em>. The <em>command-queue</em> is created on a specific
 <em>device</em> in a <em>context</em>. <em>Commands</em> to a <em>command-queue</em> are queued
 in-order but may be executed in-order or out-of-order. <em>Refer to
 In-order Execution_and_Out-of-order Execution</em>.
 <br>
 <br>
 <strong>Command-queue Barrier</strong>. See <em>Barrier</em>.
 <br>
 <br>
 <strong>Command synchronization</strong>: Constraints on the order that commands are
 launched for execution on a device defined in terms of the
 synchronization points that occur between commands in host
 command-queues and between commands in device-side command-queues. See
 <em>synchronization points</em>.
 <br>
 <br>
 <strong>Complete</strong>: The final state in the six state model for the execution of
 a command. The transition into this state occurs is signaled through
 event objects or callback functions associated with a command.
 <br>
 <br>
 <strong>Compute Device Memory</strong>: This refers to one or more memories attached
 to the compute device.
 <br>
 <br>
 <strong>Compute Unit</strong>: An OpenCL <em>device</em> has one or more <em>compute units</em>. A
 <em>work-group</em> executes on a single <em>compute unit</em>. A <em>compute unit</em> is
 composed of one or more <em>processing elements</em> and <em>local memory</em>. A
 <em>compute unit</em> may also include dedicated texture filter units that can
 be accessed by its processing elements.
 <br>
 <br>
 <strong>Concurrency</strong>: A property of a system in which a set of tasks in a system
 can remain active and make progress at the same time. To utilize
 concurrent execution when running a program, a programmer must identify
 the concurrency in their problem, expose it within the source code, and
 then exploit it using a notation that supports concurrency.
 <br>
 <br>
 <strong>Constant Memory</strong>: A region of <em>global memory</em> that remains constant
 during the execution of a <em>kernel</em>. The <em>host</em> allocates and
 initializes memory objects placed into <em>constant memory</em>.</p></div>
 <div class="paragraph"><p><strong>Context</strong>: The environment within which the kernels execute and the
 domain in which synchronization and memory management is defined. The
 <em>context</em> includes a set of <em>devices</em>, the memory accessible to those
 <em>devices</em>, the corresponding memory properties and one or more
 <em>command-queues</em> used to schedule execution of a <em>kernel(s)</em> or
 operations on <em>memory objects</em>.
 <br>
 <br>
 <strong>Control flow</strong>: The flow of instructions executed by a work-item.
 Multiple logically related work items may or may not execute the same
 control flow. The control flow is said to be <em>converged</em> if all the
 work-items in the set execution the same stream of instructions. In a
 <em>diverged</em> control flow, the work-items in the set execute different
 instructions. At a later point, if a diverged control flow becomes
 converged, it is said to be a re-converged control flow.
 <br>
 <br>
 <strong>Converged control flow</strong>: see <strong>control flow</strong>.
 <br>
 <br>
 <strong>Custom Device</strong>: An OpenCL <em>device</em> that fully implements the OpenCL
 Runtime but does not support <em>programs</em> written in OpenCL C.  A custom
 device may be specialized non-programmable hardware that is very power
 efficient and performant for directed tasks or hardware with limited
 programmable capabilities such as specialized DSPs. Custom devices are
 not OpenCL conformant. Custom devices may support an online compiler.
 Programs for custom devices can be created using the OpenCL runtime APIs
 that allow OpenCL programs to be created from source (if an online
 compiler is supported) and/or binary, or from <em>built-in
 kernels_supported by the _device</em>.  See also <em>Device</em>.
 <br>
 <br>
 <strong>Data Parallel Programming Model</strong>: Traditionally, this term refers to a
 programming model where concurrency is expressed as instructions from a
 single program applied to multiple elements within a set of data
 structures.  The term has been generalized in OpenCL to refer to a model
 wherein a set of  instructions from a single program are applied
 concurrently to each point within an abstract domain of indices.
 <br>
 <br>
 <strong>Data race</strong>: The execution of a program contains a data race if it
 contains two actions in different work items or host threads where (1)
 one action modifies a memory location and the other action reads or
 modifies the same memory location, and (2) at least one of these actions
 is not atomic, or the corresponding memory scopes are not inclusive, and
 (3) the actions are global actions unordered by the
 global-happens-before relation or are local actions unordered by the
 local-happens before relation.
 <br>
 <br>
 <strong>Deprecation</strong>: existing features are marked as deprecated if their usage is not recommended as that feature is being de-emphasized, superseded and may be removed from a future version of the specification.[BA2]
 <br>
 <br>
 <strong>Device</strong>: A <em>device</em> is a collection of <em>compute units</em>. A
 <em>command-queue</em> is used to queue <em>commands</em> to a <em>device</em>. Examples of
 <em>commands</em> include executing <em>kernels</em>, or reading and writing <em>memory
 objects</em>. OpenCL devices typically correspond to a GPU, a multi-core
 CPU, and other processors such as DSPs and the Cell/B.E. processor.
 <br>
 <br>
 <strong>Device-side enqueue</strong>: A mechanism whereby a kernel-instance is enqueued
 by a kernel-instance running on a device without direct involvement by
 the host program. This produces <em>nested parallelism</em>; i.e. additional
 levels of concurrency are nested inside a running kernel-instance. The
 kernel-instance executing on a device (the <em>parent kernel</em>) enqueues a
 kernel-instance (the <em>child kernel</em>) to a device-side command queue.
 Child and parent kernels execute asynchronously though a parent kernel
 does not complete until all of its child-kernels have completed.
 <br>
 <br>
 <strong>Diverged control flow</strong>: see <em>control flow</em>.
 <br>
 <br>
 <strong>Ended</strong>: The fifth state in the six state model for the execution of a
 command. The transition into this state occurs when execution of a
 command has ended. When a Kernel-enqueue command ends, all of the
 work-groups associated with that command have finished their execution.
 <br>
 <br>
 <strong>Event Object</strong>: An <em>event</em> <em>object_encapsulates the status of an
 operation such as a _command</em>. It can be used to synchronize operations
 in a context.
 <br>
 <br>
 <strong>Event Wait List</strong>: An <em>event wait list</em> is a list of <em>event objects</em> that
 can be used to control when a particular <em>command</em> begins execution.
 <br>
 <br>
 <strong>Fence</strong>: A memory ordering operation without an associated atomic
 object. A fence can use the <em>acquire semantics, release semantics</em>, or
 <em>acquire release semantics</em>.
 <br>
 <br>
 <strong>Framework</strong>: A software system that contains the set of components to
 support software development and execution. A <em>framework</em> typically
 includes libraries, APIs, runtime systems, compilers, etc.
 <br>
 <br>
 <strong>Generic address space</strong>: An address space that include the <em>private</em>,
 <em>local</em>, and <em>global</em> address spaces available to a device. The generic
 address space supports conversion of pointers to and from private, local
 and global address spaces, and hence lets a programmer write a single
 function that at compile time can take arguments from any of the three
 named address spaces.
 <br>
 <br>
 <strong>Global Happens before</strong>: see <em>happens before</em>.
 <br>
 <br>
 <strong>Global ID</strong>: A <em>global ID</em> is used to uniquely identify a <em>work-item</em> and
 is derived from the number of <em>global work-items</em> specified when
 executing a <em>kernel</em>. The <em>global ID</em> is a N-dimensional value that
 starts at (0, 0,  0). See also <em>Local ID</em>.
 <br>
 <br>
 <strong>Global Memory</strong>: A memory region accessible to all <em>work-items</em> executing
 in a <em>context</em>. It is accessible to the <em>host</em> using <em>commands</em> such as
 read, write and map. <em>Global memory</em> is included within the <em>generic
 address space</em> that includes the private and local address spaces.
 <br>
 <br>
 <strong>GL share group</strong>: A <em>GL share group</em> object manages shared OpenGL or
 OpenGL ES resources
 such as textures, buffers, framebuffers, and renderbuffers and is
 associated with one or more GL context objects. The <em>GL share group</em> is
 typically an opaque object and not directly accessible.
 <br>
 <br>
 <strong>Handle</strong>: An opaque type that references an <em>object</em> allocated by
 OpenCL. Any operation on an <em>object</em> occurs by reference to that
 objects handle.
 <br>
 <br>
 <strong>Happens before</strong>: An ordering relationship between operations that
 execute on multiple units of execution. If an operation A happens-before
 operation B then A must occur before B; in particular, any value written
 by A will be visible to B.We define two separate happens before
 relations: <em>global-happens-before</em> and <em>local-happens-before</em>. These are
 defined in section 3.3.6.
 <br>
 <br>
 <strong>Host</strong>: The <em>host</em> interacts with the <em>context</em> using the OpenCL API.
 <br>
 <br>
 <strong>Host-thread</strong>: the unit of execution that executes the statements in the
 Host program.
 <br>
 <br>
 <strong>Host pointer</strong>: A pointer to memory that is in the virtual address space
 on the <em>host</em>.
 <br>
 <br>
 <strong>Illegal</strong>: Behavior of a system that is explicitly not allowed and will
 be reported as an error when encountered by OpenCL.
 <br>
 <br>
 <strong>Image Object</strong>: A <em>memory object</em> that stores a two- or three-
 dimensional structured array. Image data can only be accessed with read
 and write functions. The read functions use a <em>sampler</em>.
 <br>
 <br>
 The <em>image object</em> encapsulates the following information:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 Dimensions of the image.
 </p>
 </li>
 <li>
 <p>
 Description of each
 element in the image.
 </p>
 </li>
 <li>
 <p>
 Properties that describe
 usage information and which region to allocate from.
 </p>
 </li>
 <li>
 <p>
 Image data.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The elements of an image are selected from a list of predefined image
 formats.
 <br>
 <br>
 <strong>Implementation Defined</strong>: Behavior that is explicitly allowed to vary
 between conforming implementations of OpenCL. An OpenCL implementor is
 required to document the implementation-defined behavior.
 <br>
 <br>
 <strong>Independent Forward Progress</strong>: If an entity supports independent forward
 progress, then if it is otherwise not dependent on any actions due to be
 performed by any other entity (for example it does not wait on a lock
 held by, and thus that must be released by, any other entity), then its
 execution cannot be blocked by the execution of any other entity in the
 system (it will not be starved). Work items in a subgroup, for example,
 typically do not support independent forward progress, so one work item
 in a subgroup may be completely blocked (starved) if a different work
 item in the same subgroup enters a spin loop.
 <br>
 <br>
 <strong>In-order Execution</strong>: A model of execution in OpenCL where the <em>commands</em>
 in a <em>command-queue_ are executed in order of submission with each
 _command</em> running to completion before the next one begins. See
 Out-of-order Execution.
 <br>
 <br>
 <strong>Intermediate Language</strong>: A lower-level language that may be used to
 create programs. SPIR-V is a required IL for OpenCL 2.2 runtimes.
 Additional ILs may be accepted on an implementation-defined basis.
 <br>
 <br>
 <strong>Kernel</strong>: A <em>kernel</em> is a function declared in a <em>program</em> and executed
 on an OpenCL <em>device</em>. A <em>kernel</em> is identified by the kernel or
 kernel qualifier applied to any function defined in a <em>program</em>.
 <br>
 <br>
 <strong>Kernel-instance</strong>: The work carried out by an OpenCL program occurs
 through the execution of kernel-instances on devices. The kernel
 instance is the <em>kernel object</em>, the values associated with the
 arguments to the kernel, and the parameters that define the <em>NDRange</em>
 index space.
 <br>
 <br>
 <strong>Kernel Object</strong>: A <em>kernel object</em> encapsulates a specific <em>kernel
 function declared in a <em>program</em> and the argument values to be used when
 executing this </em>kernel function.
 <br>
 <br>
 <strong>Kernel Language</strong>: A language that is used to create source code for kernel.
 Supported kernel languages include OpenCL C, OpenCL C++, and OpenCL dialect of SPIR-V.
 <br>
 <br>
 <strong>Launch</strong>: The transition of a command from the <em>submitted</em> state to the
 <em>ready</em> state. See <em>Ready</em>.
 <br>
 <br>
 <strong>Local ID</strong>: A <em>local ID</em> specifies a unique <em>work-item ID</em> within a given
 <em>work-group</em> that is executing a <em>kernel</em>. The <em>local ID</em> is a
 N-dimensional value that starts at (0, 0,  0). See also <em>Global ID</em>.
 <br>
 <br>
 <strong>Local Memory</strong>: A memory region associated with a <em>work-group</em> and
 accessible only by <em>work-items</em> in that <em>work-group</em>. <em>Local memory</em> is
 included within the <em>generic address space</em> that includes the private
 and global address spaces.
 <br>
 <br>
 <strong>Marker</strong>: A <em>command</em> queued in a <em>command-queue</em> that can be used to
 tag all <em>commands</em> queued before the <em>marker</em> in the <em>command-queue</em>.
 The <em>marker</em> command returns an <em>event</em> which can be used by the
 <em>application</em> to queue a wait on the marker event i.e. wait for all
 commands queued before the <em>marker</em> command to complete.
 <br>
 <br>
 <strong>Memory Consistency Model</strong>: Rules that define which values are observed
 when multiple units of execution load data from any shared memory plus
 the synchronization operations that constrain the order of memory
 operations and define synchronization relationships. The memory
 consistency model in OpenCL is based on the memory model from the ISO
 C11 programming language.
 <br>
 <br>
 <strong>Memory Objects</strong>: A <em>memory object</em> is a handle to a reference counted
 region of <em>global memory</em>. Also see_Buffer Object_and_Image Object_.
 <br>
 <br>
 <strong>Memory Regions (or Pools)</strong>: A distinct address space in OpenCL. <em>Memory
 regions</em> may overlap in physical memory though OpenCL will treat them as
 logically distinct. The <em>memory regions</em> are denoted as <em>private</em>,
 <em>local</em>, <em>constant,</em> and <em>global</em>.
 <br>
 <br>
 <strong>Memory Scopes</strong>: These memory scopes define a hierarchy of visibilities
 when analyzing the ordering constraints of memory operations. They are
 defined by the values of the memory_scope enumeration constant. Current
 values are <strong>memory_scope_work_item</strong>(memory constraints only apply to a
 single work-item and in practice apply only to image operations)<strong>,
 memory_scope_sub_group</strong> (memory-ordering constraints only apply to
 work-items executing in a sub-group), <strong>memory_scope_work_group</strong>
 (memory-ordering constraints only apply to work-items executing in a
 work-group), <strong>memory_scope_device</strong> (memory-ordering constraints only
 apply to work-items executing on a single device) and
 <strong>memory_scope_all_svm_devices</strong> (memory-ordering constraints only apply
 to work-items executing across multiple devices and when using shared
 virtual memory).
 <br>
 <br>
 <strong>Modification Order</strong>:All modifications to a particular atomic object M
 occur in some particular <strong>total order</strong>, called the <strong>modification
 order</strong> of M. If A and B are modifications of an atomic object M, and A
 happens-before B, then A shall precede B in the modification order of M.
 Note that the modification order of an atomic object M is independent of
 whether M is in local or global memory.
 <br>
 <br>
 <strong>Nested Parallelism</strong>: See <em>device-side enqueue</em>.
 <br>
 <br>
 <strong>Object</strong>: Objects are abstract representation of the resources that can
 be manipulated by the OpenCL API. Examples include <em>program objects</em>,
 <em>kernel objects</em>, and <em>memory objects</em>.
 <br>
 <br>
 <strong>Out-of-Order Execution</strong>: A model of execution in which <em>commands</em> placed
 in the <em>work queue</em> may begin and complete execution in any order
 consistent with constraints imposed by <em>event wait
 lists_and_command-queue barrier</em>. See <em>In-order Execution</em>.
 <br>
 <br>
 <strong>Parent device</strong>: The OpenCL <em>device</em> which is partitioned to create
 <em>sub-devices</em>. Not all <em>parent devices_are _root devices</em>. A <em>root
 device</em> might be partitioned and the <em>sub-devices</em> partitioned again.
 In this case, the first set of <em>sub-devices</em> would be <em>parent devices</em>
 of the second set, but not the <em>root devices</em>. Also see <em>device</em>,
 <em>parent device</em> and <em>root device</em>.
 <br>
 <br>
 <strong>Parent kernel</strong>: see <em>device-side enqueue</em>.
 <br>
 <br>
 <strong>Pipe</strong>: The <em>pipe</em> memory object conceptually is an ordered sequence of
 data items. A pipe has two endpoints: a write endpoint into which data
 items are inserted, and a read endpoint from which data items are
 removed. At any one time, only one kernel instance may write into a
 pipe, and only one kernel instance may read from a pipe. To support the
 producer consumer design pattern, one kernel instance connects to the
 write endpoint (the producer) while another kernel instance connects to
 the reading endpoint (the consumer).
 <br>
 <br>
 <strong>Platform</strong>: The <em>host</em> plus a collection of <em>devices</em> managed by the
 OpenCL <em>framework</em> that allow an application to share <em>resources</em> and
 execute <em>kernels</em> on <em>devices</em> in the <em>platform</em>.
 <br>
 <br>
 <strong>Private Memory</strong>: A region of memory private to a <em>work-item</em>. Variables
 defined in one <em>work-items</em> <em>private memory</em> are not visible to another
 <em>work-item</em>.
 <br>
 <br>
 <strong>Processing Element</strong>: A virtual scalar processor. A work-item may
 execute on one or more processing elements.
 <br>
 <br>
 <strong>Program</strong>: An OpenCL <em>program</em> consists of a set of <em>kernels</em>.
 <em>Programs</em> may also contain auxiliary functions called by the <em>_kernel
 functions and constant data.
 <br>
 <br>
 <strong>Program Object</strong>: A _program object</em> encapsulates the following
 information:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 A reference to an
 associated <em>context</em>.
 </p>
 </li>
 <li>
 <p>
 A <em>program</em> source or
 binary.
 </p>
 </li>
 <li>
 <p>
 The latest successfully
 built program executable, the list of <em>devices</em> for which the program
 executable is built, the build options used and a build log.
 </p>
 </li>
 <li>
 <p>
 The number of <em>kernel
 objects</em> currently attached.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p><strong>Queued</strong>: The first state in the six state model for the execution of a
 command. The transition into this state occurs when the command is
 enqueued into a command-queue.
 <br>
 <br>
 <strong>Ready</strong>: The third state in the six state model for the execution of a
 command. The transition into this state occurs when pre-requisites
 constraining execution of a command have been met; i.e. the command has
 been launched. When a Kernel-enqueue command is launched, work-groups
 associated with the command are placed in a devices work-pool from
 which they are scheduled for execution.
 <br>
 <br>
 <strong>Re-converged Control Flow</strong>: see <em>control flow</em>.
 <br>
 <br>
 <strong>Reference Count</strong>: The life span of an OpenCL object is determined by its
 <em>reference count_an internal count of the number of references to the
 object. When you create an object in OpenCL, its _reference count</em> is
 set to one. Subsequent calls to the appropriate <em>retain</em> API (such as
 clRetainContext, clRetainCommandQueue) increment the <em>reference count</em>.
 Calls to the appropriate <em>release</em> API (such as clReleaseContext,
 clReleaseCommandQueue) decrement the <em>reference count</em>.
 Implementations may also modify the <em>reference count</em>, e.g. to track
 attached objects or to ensure correct operation of in-progress or
 scheduled activities. The object becomes inaccessible to host code when
 the number of <em>release</em> operations performed matches the number of
 <em>retain</em> operations plus the allocation of the object. At this point the
 reference count may be zero but this is not guaranteed.
 <br>
 <br>
 <strong>Relaxed Consistency</strong>: A memory consistency model in which the contents
 of memory visible to different <em>work-items</em> or <em>commands</em> may be
 different except at a <em>barrier</em> or other explicit synchronization
 points.
 <br>
 <br>
 <strong>Relaxed Semantics</strong>: A memory order semantics for atomic operations that
 implies no order constraints. The operation is <em>atomic</em> but it has no
 impact on the order of memory operations.
 <br>
 <br>
 <strong>Release Semantics</strong>: One of the memory order semantics defined for
 synchronization operations.  Release semantics apply to atomic
 operations that store to memory.  Given two units of execution, <strong>A</strong> and
 <strong>B</strong>, acting on a shared atomic object <strong>M</strong>, if <strong>A</strong> uses an atomic store
 of <strong>M</strong> with release semantics to synchronize-with an atomic load to <strong>M</strong>
 by <strong>B*that used acquire semantics, then *A*s atomic store will occur
 <em>after</em> any prior operations by *A</strong>. Note that the memory orders
 <em>acquire</em>, <em>sequentialy consistent</em>, and <em>acquire_release</em> all include
 <em>acquire semantics</em> and effectively pair with a store using release
 semantics.
 <br>
 <br>
 <strong>Remainder work-groups</strong>: When the work-groups associated with a
 kernel-instance are defined, the sizes of a work-group in each dimension
 may not evenly divide the size of the NDRange in the corresponding
 dimensions. The result is a collection of work-groups on the boundaries
 of the NDRange that are smaller than the base work-group size. These are
 known as <em>remainder work-groups</em>.
 <br>
 <br>
 <strong>Running</strong>: The fourth state in the six state model for the execution of
 a command. The transition into this state occurs when the execution of
 the command starts. When a Kernel-enqueue command starts, one or more
 work-groups associated with the command start to execute.
 <br>
 <br>
 <strong>Root device</strong>: A <em>root device</em> is an OpenCL <em>device</em> that has not been
 partitioned. Also see <em>device</em>, <em>parent device</em> and <em>root device</em>.
 <br>
 <br>
 <strong>Resource</strong>: A class of <em>objects</em> defined by OpenCL. An instance of a
 <em>resource</em> is an <em>object</em>. The most common <em>resources</em> are the
 <em>context</em>, <em>command-queue</em>, <em>program objects</em>, <em>kernel objects</em>, and
 <em>memory objects</em>. Computational resources are hardware elements that
 participate in the action of advancing a program counter. Examples
 include the <em>host</em>, <em>devices</em>, <em>compute units</em> and <em>processing
 elements</em>.
 <br>
 <br>
 <strong>Retain</strong>, Release: The action of incrementing (retain) and decrementing
 (release) the reference count using an OpenCL <em>object</em>. This is a book
 keeping functionality to make sure the system doesnt remove an <em>object</em>
 before all instances that use this <em>object</em> have finished. Refer to
 <em>Reference Count</em>.
 <br>
 <br>
 <strong>Sampler</strong>: An <em>object</em> that describes how to sample an image when the
 image is read in the <em>kernel</em>. The image read functions take a
 <em>sampler</em> as an argument. The <em>sampler</em> specifies the image
 addressing-mode i.e. how out-of-range image coordinates are handled, the
 filter mode, and whether the input image coordinate is a normalized or
 unnormalized value.
 <br>
 <br>
 <strong>Scope inclusion</strong>: Two actions <strong>A</strong> and <strong>B</strong> are defined to have an
 inclusive scope if they have the same scope <strong>P</strong> such that: (1) if <strong>P</strong> is
 memory_scope_sub_group, and <strong>A</strong> and <strong>B</strong> are executed by work-items
 within the same sub-group, or (2) if <strong>P</strong> is memory_scope_work_group, and
 <strong>A</strong> and <strong>B</strong> are executed by work-items within the same work-group, or
 (3) if <strong>P</strong> is memory_scope_device, and <strong>A</strong> and <strong>B</strong> are executed by
 work-items on the same device, or (4) if <strong>P</strong> is
 memory_scope_all_svm_devices, if <strong>A</strong> and <strong>B</strong> are executed by host
 threads or by work-items on one or more devices that can share SVM
 memory with each other and the host process.
 <br>
 <br>
 <strong>Sequenced before</strong>: A relation between evaluations executed by a single
 unit of execution. Sequenced-before is an asymmetric, transitive,
 pair-wise relation that induces a partial order between evaluations.
 Given any two evaluations A and B, if A is sequenced-before B, then the
 execution of A shall precede the execution of B.
 <br>
 <br>
 <strong>Sequential consistency</strong>: Sequential consistency interleaves the steps
 executed by each unit of execution. Each access to a memory location
 sees the last assignment to that location in that interleaving.
 <br>
 <br>
 <strong>Sequentially consistent semantics</strong>: One of the memory order semantics
 defined for synchronization operations. When using
 sequentially-consistent synchronization operations, the loads and stores
 within one unit of execution appear to execute in program order (i.e.,
 the sequenced-before order), and loads and stores from different units
 of execution appear to be simply interleaved.
 <br>
 <br>
 <strong>Shared Virtual Memory (SVM)</strong>: An address space exposed to both the host
 and the devices within a context. SVM causes addresses to be meaningful
 between the host and all of the devices within a context and therefore
 supports the use of pointer based data structures in OpenCL kernels. It
 logically extends a portion of the global memory into the host address
 space therefore giving work-items access to the host address space.
 There are three types of SVM in OpenCL <strong>Coarse-Grained buffer SVM</strong>:
 Sharing occurs at the granularity of regions of OpenCL buffer memory
 objects. <strong>Fine-Grained buffer SVM</strong>: Sharing occurs at the granularity
 of individual loads/stores into bytes within OpenCL buffer memory
 objects. <strong>Fine-Grained system SVM</strong>: Sharing occurs at the granularity of
 individual loads/stores into bytes occurring anywhere within the host
 memory.
 <br>
 <br>
 <strong>SIMD</strong>: Single Instruction Multiple Data. A programming model where a
 <em>kernel</em> is executed concurrently on multiple <em>processing elements</em> each
 with its own data and a shared program counter. All <em>processing
 elements</em> execute a strictly identical set of instructions.
 <br>
 <br>
 <strong>Specialization constants</strong>: Specialization is intended for constant
 objects that will not have known constant values until after initial
 generation of a SPIR-V module. Such objects are called specialization
 constants. Application might provide values for
 the specialization constants that will be used when SPIR-V program is
 built. Specialization constants that do not receive a value from an
 application shall use default value as defined in SPIR-V specification.
 <br>
 <br>
 <strong>SPMD</strong>: Single Program Multiple Data. A programming model where a
 <em>kernel</em> is executed concurrently on multiple <em>processing elements</em> each
 with its own data and its own program counter. Hence, while all
 computational resources run the same <em>kernel</em> they maintain their own
 instruction counter and due to branches in a <em>kernel</em>, the actual
 sequence of instructions can be quite different across the set of
 <em>processing elements</em>.
 <br>
 <br>
 <strong>Sub-device</strong>: An OpenCL <em>device</em> can be partitioned into multiple
 <em>sub-devices</em>. The new <em>sub-devices</em> alias specific collections of
 compute units within the parent <em>device</em>, according to a partition
 scheme. The <em>sub-devices</em> may be used in any situation that their
 parent <em>device</em> may be used. Partitioning a <em>device</em> does not destroy
 the parent <em>device</em>, which may continue to be used along side and
 intermingled with its child <em>sub-devices</em>. Also see <em>device</em>, <em>parent
 device</em> and <em>root device</em>.
 <br>
 <br>
 <strong>Sub-group</strong>: Sub-groups are an implementation-dependent grouping of
 work-items within a work-group.   The size and number of sub-groups is
 implementation-defined.
 <br>
 <br>
 <strong>Sub-group Barrier</strong>. See <em>Barrier</em>.
 <br>
 <br>
 <strong>Submitted</strong>: The second state in the six state model for the execution
 of a command. The transition into this state occurs when the command is
 flushed from the command-queue and submitted for execution on the
 device. Once submitted, a programmer can assume a command will execute
 once its prerequisites have been met.
 <br>
 <br>
 <strong>SVM Buffer</strong>: A memory allocation enabled to work with Shared Virtual
 Memory (SVM). Depending on how the SVM buffer is created, it can be a
 coarse-grained or fine-grained SVM buffer. Optionally it may be wrapped
 by a Buffer Object. See <em>Shared Virtual Memory (SVM)</em>.
 <br>
 <br>
 <strong>Synchronization</strong>: Synchronization refers to mechanisms that constrain
 the order of execution and the visibility of memory operations between
 two or more units of execution.
 <br>
 <br>
 <strong>Synchronization operations</strong>: Operations that define memory order
 constraints in a program. They play a special role in controlling how
 memory operations in one unit of execution (such as work-items or, when
 using SVM a host thread) are made visible to another. Synchronization
 operations in OpenCL include <em>atomic operations</em> and <em>fences</em>.
 <br>
 <br>
 <strong>Synchronization point</strong>: A synchronization point between a pair of
 commands (A and B) assures that results of command A happens-before
 command B is launched (i.e. enters the ready state) .
 <br>
 <br>
 <strong>Synchronizes with</strong>: A relation between operations in two different
 units of execution that defines a memory order constraint in global
 memory (<em>global-synchronizes-with</em>) or local memory
 (<em>local-synchronizes-with</em>).
 <br>
 <br>
 <strong>Task Parallel Programming Model</strong>: A programming model in which
 computations are expressed in terms of multiple concurrent tasks
 executing in one or more <em>command-queues</em>. The concurrent tasks can be
 running different <em>kernels</em>.
 <br>
 <br>
 <strong>Thread-safe</strong>: An OpenCL API call is considered to be <em>thread-safe</em> if
 the internal state as managed by OpenCL remains consistent when called
 simultaneously by multiple <em>host</em> threads. OpenCL API calls that are
 <em>thread-safe</em> allow an application to call these functions in multiple
 <em>host</em> threads without having to implement mutual exclusion across these
 <em>host</em> threads i.e. they are also re-entrant-safe.
 <br>
 <br>
 <strong>Undefined</strong>: The behavior of an OpenCL API call, built-in function used
 inside a <em>kernel</em> or execution of a <em>kernel</em> that is explicitly not
 defined by OpenCL. A conforming implementation is not required to
 specify what occurs when an undefined construct is encountered in
 OpenCL.
 <br>
 <br>
 <strong>Unit of execution</strong>: a generic term for a process, OS managed thread
 running on the host (a host-thread), kernel-instance, host program,
 work-item or any other executable agent that advances the work
 associated with a program.
 <br>
 <br>
 <strong>Work-group</strong>: A collection of related <em>work-items</em> that execute on a
 single <em>compute unit</em>. The <em>work-items</em> in the group execute the same
 <em>kernel-instance</em> and share <em>local</em> <em>memory</em> and <em>work-group functions</em>.
 <br>
 <br>
 <strong>Work-group Barrier</strong>. See <em>Barrier</em>.
 <br>
 <br>
 <strong>Work-group Function</strong>: A function that carries out collective operations
 across all the work-items in a work-group. Available collective
 operations are a barrier, reduction, broadcast, prefix sum, and
 evaluation of a predicate. A work-group function must occur within a
 <em>converged control flow</em>; i.e. all work-items in the work-group must
 encounter precisely the same work-group function.
 <br>
 <br>
 <strong>Work-group Synchronization</strong>: Constraints on the order of execution for
 work-items in a single work-group.
 <br>
 <br>
 <strong>Work-pool</strong>: A logical pool associated with a device that holds commands
 and work-groups from kernel-instances that are ready to execute. OpenCL
 does not constrain the order that commands and work-groups are scheduled
 for execution from the work-pool; i.e. a programmer must assume that
 they could be interleaved. There is one work-pool per device used by
 all command-queues associated with that device. The work-pool may be
 implemented in any manner as long as it assures that work-groups placed
 in the pool will eventually execute.
 <br>
 <br>
 <strong>Work-item</strong>: One of a collection of parallel executions of a <em>kernel</em>
 invoked on a <em>device</em> by a <em>command</em>. A <em>work-item</em> is executed by one
 or more <em>processing elements</em> as part of a <em>work-group</em> executing on a
 <em>compute unit</em>. A <em>work-item</em> is distinguished from other work-items by
 its <em>global ID</em> or the combination of its <em>work-group</em> ID and its <em>local
 ID</em> within a <em>work-group</em>.</p></div>
 <div class="paragraph"><p> </p></div>
 </div>
 </div>
 <div class="sect1">
 <h2 id="_the_opencl_architecture">3. The OpenCL Architecture</h2>
 <div class="sectionbody">
 <div class="paragraph"><p><strong>OpenCL</strong> is an open industry standard for programming a heterogeneous
 collection of CPUs, GPUs and other discrete computing devices organized
 into a single platform. It is more than a language. OpenCL is a
 framework for parallel programming and includes a language, API,
 libraries and a runtime system to support software development. Using
 OpenCL, for example, a programmer can write general purpose programs
 that execute on GPUs without the need to map their algorithms onto a 3D
 graphics API such as OpenGL or DirectX.
 <br>
 <br>
 The target of OpenCL is expert programmers wanting to write portable yet
 efficient code. This includes library writers, middleware vendors, and
 performance oriented application programmers. Therefore OpenCL provides
 a low-level hardware abstraction plus a framework to support programming
 and many details of the underlying hardware are exposed.
 <br>
 <br>
 To describe the core ideas behind OpenCL, we will use a hierarchy of
 models:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 Platform Model
 </p>
 </li>
 <li>
 <p>
 Memory Model
 </p>
 </li>
 <li>
 <p>
 Execution Model
 </p>
 </li>
 <li>
 <p>
 Programming Model
 </p>
 </li>
 </ul></div>
 <div class="sect2">
 <h3 id="_platform_model">3.1. Platform Model</h3>
 <div class="paragraph"><p>The Platform model for OpenCL is defined in <em>figure 3.1</em>. The model
 consists of a <strong>host</strong> connected to one or more <strong>OpenCL devices</strong>. An OpenCL
 device is divided into one or more <strong>compute units</strong> (CUs) which are further
 divided into one or more <strong>processing elements</strong> (PEs). Computations on a
 device occur within the processing elements.
 <br>
 <br>
 An OpenCL application is implemented as both host code and device kernel
 code.  The host code portion of an OpenCL application runs on a host
 processor according to the models native to the host platform. The
 OpenCL application host code submits the kernel code as commands from
 the host to OpenCL devices.  An OpenCL device executes the commands
 computation on the processing elements within the device.
 <br>
 <br>
 An OpenCL device has considerable latitude on how computations are
 mapped onto the devices processing elements.  When processing elements
 within a compute unit execute the same sequence of statements across the
 processing elements, the control flow is said to be <em>converged.</em>
 Hardware optimized for executing a single stream of instructions over
 multiple processing elements is well suited to converged control
 flows. When the control flow varies from one processing element to
 another, it is said to be <em>diverged.</em> While a kernel always begins
 execution with a converged control flow, due to branching statements
 within a kernel, converged and diverged control flows may occur within a
 single kernel. This provides a great deal of flexibility in the
 algorithms that can be implemented with OpenCL.
 <br>
 <br></p></div>
 <div class="paragraph"><p><span class="image">
 <img src="opencl22-API_files/image004_new.png" alt="opencl22-API_files/image004_new.png" width="320" height="180">
 </span></p></div>
 <div class="paragraph"><p><strong>Figure 3.1</strong>: <em>Platform model &#8230; one host plus one or more compute devices each
 with one or more compute units composed of one or more processing elements</em>.
 <br>
 <br>
 Programmers provide programs in the form of SPIR-V source binaries,
 OpenCL C or OpenCL C++ source strings or implementation-defined binary objects. The
 OpenCL platform provides a compiler to translate program input of either
 form into executable program objects. The device code compiler may be
 <em>online</em> or <em>offline</em>. An <em>online</em> <em>compiler</em> is available during host
 program execution using standard APIs. An <em>offline compiler</em> is
 invoked outside of host program control, using platform-specific
 methods. The OpenCL runtime allows developers to get a previously
 compiled device program executable and be able to load and execute a
 previously compiled device program executable.
 <br>
 <br>
 OpenCL defines two kinds of platform profiles: a <em>full profile</em> and a
 reduced-functionality <em>embedded profile</em>. A full profile platform must
 provide an online compiler for all its devices. An embedded platform
 may provide an online compiler, but is not required to do so.
 <br>
 <br>
 A device may expose special purpose functionality as a <em>built-in
 function</em>. The platform provides APIs for enumerating and invoking the
 built-in functions offered by a device, but otherwise does not define
 their construction or semantics. A <em>custom device</em> supports only
 built-in functions, and cannot be programmed via a kernel language.
 <br>
 <br>
 All device types support the OpenCL execution model, the OpenCL memory
 model, and the APIs used in OpenCL to manage devices.
 <br>
 <br>
 The platform model is an abstraction describing how OpenCL views the
 hardware. The relationship between the elements of the platform model
 and the hardware in a system may be a fixed property of a device or it
 may be a dynamic feature of a program dependent on how a compiler
 optimizes code to best utilize physical hardware.</p></div>
 </div>
 <div class="sect2">
 <h3 id="_execution_model">3.2. Execution Model</h3>
 <div class="paragraph"><p>The OpenCL execution model is defined in terms of two distinct units of
 execution: <strong>kernels</strong> that execute on one or more OpenCL devices and a
 <strong>host program</strong> that executes on the host. With regard to OpenCL, the
 kernels are where the "work" associated with a computation occurs. This
 work occurs through <strong>work-items</strong> that execute in groups (<strong>work-groups</strong>).
 <br>
 <br>
 A kernel executes within a well-defined context managed by the host.
 The context defines the environment within which kernels execute. It
 includes the following resources:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 <strong>Devices</strong>: One or
 more devices exposed by the OpenCL platform.
 </p>
 </li>
 <li>
 <p>
 <strong>Kernel Objects</strong>:The
 OpenCL functions with their associated argument values that run on
 OpenCL devices.
 </p>
 </li>
 <li>
 <p>
 <strong>Program Objects</strong>:The
 program source and executable that implement the kernels.
 </p>
 </li>
 <li>
 <p>
 <strong>Memory
 Objects</strong>:Variables visible to the host and the OpenCL devices.
 Instances of kernels operate on these objects as they execute.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The host program uses the OpenCL API to create and manage the context.
 Functions from the OpenCL API enable the host to interact with a device
 through a <em>command-queue</em>. Each command-queue is associated with a
 single device. The commands placed into the command-queue fall into
 one of three types:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 <strong>Kernel-enqueue commands</strong>:
 Enqueue a kernel for execution on a device.
 </p>
 </li>
 <li>
 <p>
 <strong>Memory commands</strong>:
 Transfer data between the host and device memory, between memory
 objects, or map and unmap memory objects from the host address space.
 </p>
 </li>
 <li>
 <p>
 <strong>Synchronization
 commands</strong>: Explicit synchronization points that define order constraints
 between commands.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>In addition to commands submitted from the host command-queue, a kernel
 running on a device can enqueue commands to a device-side command queue.
 This results in <em>child kernels</em> enqueued by a kernel executing on a
 device (the <em>parent kernel</em>). Regardless of whether the command-queue
 resides on the host or a device, each command passes through six states.</p></div>
 <div class="olist arabic"><ol class="arabic">
 <li>
 <p>
 <strong>Queued</strong>: The command is enqueued to a command-queue. A
 command may reside in the queue until it is flushed either explicitly (a
 call to clFlush) or implicitly by some other command.
 </p>
 </li>
 <li>
 <p>
 <strong>Submitted</strong>: The command is flushed from the command-queue and
 submitted for execution on the device. Once flushed from the
 command-queue, a command will execute after any prerequisites for
 execution are met.
 </p>
 </li>
 <li>
 <p>
 <strong>Ready</strong>: All prerequisites constraining execution of a command
 have been met. The command, or for a kernel-enqueue command the
 collection of work groups associated with a command, is placed in a
 device work-pool from which it is scheduled for execution.
 </p>
 </li>
 <li>
 <p>
 <strong>Running</strong>: Execution of the command starts. For the case of a
 kernel-enqueue command, one or more work-groups associated with the
 command start to execute.
 </p>
 </li>
 <li>
 <p>
 <strong>Ended</strong>: Execution of a command ends. When a Kernel-enqueue
 command ends, all of the work-groups associated with that command have
 finished their execution. <em>Immediate side effects</em>, i.e. those
 associated with the kernel but not necessarily with its child kernels,
 are visible to other units of execution. These side effects include
 updates to values in global memory.
 </p>
 </li>
 <li>
 <p>
 <strong>Complete</strong>: The command and its child commands have finished
 execution and the status of the event object, if any, associated with
 the command is set to CL_COMPLETE.
 </p>
 </li>
 </ol></div>
 <div class="paragraph"><p>The execution states and the transitions between them are summarized in
 Figure 3-2. These states and the concept of a device work-pool are
 conceptual elements of the execution model. An implementation of OpenCL
 has considerable freedom in how these are exposed to a program. Five of
 the transitions, however, are directly observable through a profiling
 interface. These profiled states are shown in Figure 3-2.</p></div>
 <div class="paragraph"><p><span class="image">
 <img src="opencl22-API_files/image006.jpg" alt="image">
 </span></p></div>
 <div class="paragraph"><p><strong>Figure 3-2: The states and transitions between states defined in the
 OpenCL execution model. A subset of these transitions is exposed
 through the profiling interface (see section 5.14).</strong></p></div>
 <div class="paragraph"><p>Commands communicate their status through <em>Event objects</em>. Successful
 completion is indicated by setting the event status associated with a
 command to CL_COMPLETE. Unsuccessful completion results in abnormal
 termination of the command which is indicated by setting the event
 status to a negative value. In this case, the command-queue associated
 with the abnormally terminated command and all other command-queues in
 the same context may no longer be available and their behavior is
 implementation defined.
 <br>
 <br>
 A command submitted to a device will not launch until prerequisites that
 constrain the order of commands have been resolved. These
 prerequisites have three sources:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 They may arise from
 commands submitted to a command-queue that constrain the order in which
 commands are launched. For example, commands that follow a command queue
 barrier will not launch until all commands prior to the barrier are
 complete.
 </p>
 </li>
 <li>
 <p>
 The second source of
 prerequisites is dependencies between commands expressed through events.
 A command may include an optional list of events. The command will wait
 and not launch until all the events in the list are in the state CL
 COMPLETE. By this mechanism, event objects define order constraints
 between commands and coordinate execution between the host and one or
 more devices.
 </p>
 </li>
 <li>
 <p>
 The third source of
 prerequisities can be the presence of non-trivial C initializers or C<span class="monospaced">
 constructors for program scope global variables. In this case, OpenCL
 C/C</span> compiler shall generate program initialization kernels that
 perform C initialization or C<span class="monospaced"> construction. These kernels must be
 executed by OpenCL runtime on a device before any kernel from the same
 program can be executed on the same device. The ND-range for any program
 initialization kernel is (1,1,1). When multiple programs are linked
 together, the order of execution of program initialization kernels
 that belong to different programs is undefined.
 <br>
 <br>
 Program clean up may result in the execution of one or more program
 clean up kernels by the OpenCL runtime. This is due to the presence of
 non-trivial C\</span> destructors for program scope variables. The ND-range
 for executing any program clean up kernel is (1,1,1). The order of
 execution of clean up kernels from different programs (that are linked
 together) is undefined.
 <br>
 <br>
 Note that C initializers, C<span class="monospaced"> constructors, or C</span> destructors for
 program scope variables cannot use pointers to coarse grain and fine
 grain SVM allocations.
 <br>
 <br>
 A command may be submitted to a device and yet have no visible side effects
 outside of waiting on and satisfying event dependences. Examples include
 markers, kernels executed over ranges of no work-items or copy
 operations with zero sizes. Such commands may pass directly from the
 <em>ready</em> state to the <em>ended</em> state.
 <br>
 <br>
 Command execution can be blocking or non-blocking. Consider a sequence
 of OpenCL commands. For blocking commands, the OpenCL API functions
 that enqueue commands don&#8217;t return until the command has completed.
 Alternatively, OpenCL functions that enqueue non-blocking commands
 return immediately and require that a programmer defines dependencies
 between enqueued commands to ensure that enqueued commands are not
 launched before needed resources are available. In both cases, the
 actual execution of the command may occur asynchronously with execution
 of the host program.
 <br>
 <br>
 Commands within a single command-queue execute relative to each other in
 one of two modes:
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p> </p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 <strong>In-order Execution</strong>:
 Commands and any side effects associated with commands appear to the
 OpenCL application as if they execute in the same order they are
 enqueued to a command-queue.
 </p>
 </li>
 <li>
 <p>
 <strong>Out-of-order Execution</strong>:
 Commands execute in any order constrained only by explicit
 synchronization points (e.g. through command queue barriers) or explicit
 dependencies on events.
 <br>
 <br>
 Multiple command-queues can be present within a single context.
 Multiple command-queues execute commands independently. Event objects
 visible to the host program can be used to define synchronization points
 between commands in multiple command queues. If such synchronization
 points are established between commands in multiple command-queues, an
 implementation must assure that the command-queues progress concurrently
 and correctly account for the dependencies established by the
 synchronization points. For a detailed explanation of synchronization
 points, see section 3.2.4.
 <br>
 <br>
 The core of the OpenCL execution model is defined by how the kernels
 execute. When a kernel-enqueue command submits a kernel for execution,
 an index space is defined. The kernel, the argument values associated
 with the arguments to the kernel, and the parameters that define the
 index space define a <em>kernel-instance</em>. When a kernel-instance executes
 on a device, the kernel function executes for each point in the defined
 index space. Each of these executing kernel functions is called a
 <em>work-item</em>. The work-items associated with a given kernel-instance are
 managed by the device in groups called <em>work-groups</em>. These work-groups
 define a coarse grained decomposition of the Index space. Work-groups
 are further divided into <em>sub-groups</em>, which provide an additional level
 of control over execution.
 <br>
 <br>
 Work-items have a global ID based on their coordinates within the Index
 space. They can also be defined in terms of their work-group and the
 local ID within a work-group. The details of this mapping are described
 in the following section.
 </p>
 </li>
 </ul></div>
 <div class="sect3">
 <h4 id="_execution_model_mapping_work_items_onto_an_ndrange">3.2.1. Execution Model: Mapping work-items onto an NDRange</h4>
 <div class="paragraph"><p>The index space supported by OpenCL is called an NDRange. An NDRange is
 an N-dimensional index space, where N is one, two or three. The NDRange
 is decomposed into work-groups forming blocks that cover the Index
 space. An NDRange is defined by three integer arrays of length N:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 The extent of the index
 space (or global size) in each dimension.
 </p>
 </li>
 <li>
 <p>
 An offset index F
 indicating the initial value of the indices in each dimension (zero by
 default).
 </p>
 </li>
 <li>
 <p>
 The size of a work-group
 (local size) in each dimension.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>Each work-items global ID is an N-dimensional tuple. The global ID
 components are values in the range from F, to F plus the number of
 elements in that dimension minus one.
 <br>
 <br>
 If a kernel is created from OpenCL C 2.0 or SPIR-V, the size of work-groups
 in an NDRange (the local size) need not be the same for all work-groups.
 In this case, any single dimension for which the global size is not
 divisible by the local size will be partitioned into two regions. One
 region will have work-groups that have the same number of work items as
 was specified for that dimension by the programmer (the local size). The
 other region will have work-groups with less than the number of work
 items specified by the local size parameter in that dimension (the
 <em>remainder work-groups</em>). Work-group sizes could be non-uniform in
 multiple dimensions, potentially producing work-groups of up to 4
 different sizes in a 2D range and 8 different sizes in a 3D range.
 <br>
 <br>
 Each work-item is assigned to a work-group and given a local ID to
 represent its position within the work-group. A work-item&#8217;s local ID is
 an N-dimensional tuple with components in the range from zero to the
 size of the work-group in that dimension minus one.
 <br>
 <br>
 Work-groups are assigned IDs similarly. The number of work-groups in
 each dimension is not directly defined but is inferred from the local
 and global NDRanges provided when a kernel-instance is enqueued. A
 work-group&#8217;s ID is an N-dimensional tuple with components in the range 0
 to the ceiling of the global size in that dimension divided by the local
 size in the same dimension. As a result, the combination of a
 work-group ID and the local-ID within a work-group uniquely defines a
 work-item. Each work-item is identifiable in two ways; in terms of a
 global index, and in terms of a work-group index plus a local index
 within a work group.
 <br>
 <br>
 For example, consider the 2-dimensional index space in figure 3-3. We
 input the index space for the work-items (G<sub>x</sub>, G<sub>y</sub>), the size of each
 work-group (S<sub>x</sub>, S<sub>y</sub>) and the global ID offset (F<sub>x</sub>, F<sub>y</sub>). The
 global indices define an G<sub>x</sub>by G<sub>y</sub> index space where the total number
 of work-items is the product of G<sub>x</sub> and G<sub>y</sub>. The local indices define
 an S<sub>x</sub> by S<sub>y</sub> index space where the number of work-items in a single
 work-group is the product of S<sub>x</sub> and S<sub>y</sub>. Given the size of each
 work-group and the total number of work-items we can compute the number
 of work-groups. A 2-dimensional index space is used to uniquely identify
 a work-group. Each work-item is identified by its global ID (<em>g</em><sub>x</sub>,
 <em>g</em><sub>y</sub>) or by the combination of the work-group ID (<em>w</em><sub>x</sub>, <em>w</em><sub>y</sub>), the
 size of each work-group (S<sub>x</sub>,S<sub>y</sub>) and the local ID (s<sub>x</sub>, s<sub>y</sub>) inside
 the work-group such that
 <br></p></div>
 <div class="paragraph"><p>&#160; &#160; &#160; &#160; (g<sub>x</sub> , g<sub>y</sub>) = (w<sub>x</sub> * S<sub>x</sub> + s<sub>x</sub> + F<sub>x</sub>, w<sub>y</sub> * S<sub>y</sub> + s<sub>y</sub> + F<sub>y</sub>)
 <br>
 <br>
 The number of work-groups can be computed as:
 <br></p></div>
 <div class="paragraph"><p>&#160; &#160; &#160; &#160; (W<sub>x</sub>, W<sub>y</sub>) = (ceil(G<sub>x</sub> / S<sub>x</sub>),ceil( G<sub>y</sub> / S<sub>y</sub>))
 <br>
 <br>
 Given a global ID and the work-group size, the work-group ID for a
 work-item is computed as:
 <br></p></div>
 <div class="paragraph"><p>&#160; &#160; &#160; &#160; (w<sub>x</sub>, w<sub>y</sub>) = ( (g<sub>x</sub>  s<sub>x</sub>  F<sub>x</sub>) / S<sub>x</sub>, (g<sub>y</sub>  s<sub>y</sub>  F<sub>y</sub>) /
 S<sub>y</sub> )</p></div>
 <div class="paragraph"><p><span class="image">
 <img src="opencl22-API_files/image007.jpg" alt="image">
 </span></p></div>
 <div class="paragraph"><p><strong>Figure 3-3: An example of an NDRange index space showing work-items,
 their global IDs and their mapping onto the pair of work-group and local
 IDs. In this case, we assume that in each dimension, the size of the
 work-group evenly divides the global NDRange size (i.e. all work-groups
 have the same size) and that the offset is equal to zero.</strong>
 <br>
 <br>
 Within a work-group work-items may be divided into sub-groups. The
 mapping of work-items to sub-groups is implementation-defined and may be
 queried at runtime. While sub-groups may be used in multi-dimensional
 work-groups, each sub-group is 1-dimensional and any given work-item may
 query which sub-group it is a member of.
 <br>
 <br>
 Work items are mapped into sub-groups through a combination of
 compile-time decisions and the parameters of the dispatch. The mapping
 to sub-groups is invariant for the duration of a kernels execution,
 across dispatches of a given kernel with the same work-group dimensions,
 between dispatches and query operations consistent with the dispatch
 parameterization, and from one work-group to another within the dispatch
 (excluding the trailing edge work-groups in the presence of non-uniform
 work-group sizes). In addition, all sub-groups within a work-group will
 be the same size, apart from the sub-group with the maximum index which
 may be smaller if the size of the work-group is not evenly divisible by
 the size of the sub-groups.
 <br>
 <br>
 In the degenerate case, a single sub-group must be supported for each
 work-group. In this situation all sub-group scope functions are
 equivalent to their work-group level equivalents.</p></div>
 </div>
 <div class="sect3">
 <h4 id="_execution_model_execution_of_kernel_instances">3.2.2. Execution Model: Execution of kernel-instances</h4>
 <div class="paragraph"><p>The work carried out by an OpenCL program occurs through the execution
 of kernel-instances on compute devices. To understand the details of
 OpenCLs execution model, we need to consider how a kernel object moves
 from the kernel-enqueue command, into a command-queue, executes on a
 device, and completes.
 <br>
 <br>
 A kernel-object is defined from a function within the program object and
 a collection of arguments connecting the kernel to a set of argument
 values. The host program enqueues a kernel-object to the command queue
 along with the NDRange, and the work-group decomposition. These define
 a <em>kernel-instance</em>. In addition, an optional set of events may be
 defined when the kernel is enqueued. The events associated with a
 particular kernel-instance are used to constrain when the
 kernel-instance is launched with respect to other commands in the queue
 or to commands in other queues within the same context.
 <br>
 <br>
 A kernel-instance is submitted to a device. For an in-order command
 queue, the kernel instances appear to launch and then execute in that
 same order; where we use the term appear to emphasize that when there
 are no dependencies between commands and hence differences in the order
 that commands execute cannot be observed in a program, an implementation
 can reorder commands even in an in-order command queue. For an out of
 order command-queue, kernel-instances wait to be launched until:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 Synchronization commands
 enqueued prior to the kernel-instance are satisfied.
 </p>
 </li>
 <li>
 <p>
 Each of the events in an
 optional event list defined when the kernel-instance was enqueued are
 set to CL_COMPLETE.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>Once these conditions are met, the kernel-instance is launched and the
 work-groups associated with the kernel-instance are placed into a pool
 of ready to execute work-groups. This pool is called a <em>work-pool</em>.
 The work-pool may be implemented in any manner as long as it assures
 that work-groups placed in the pool will eventually execute. The
 device schedules work-groups from the work-pool for execution on the
 compute units of the device. The kernel-enqueue command is complete when
 all work-groups associated with the kernel-instance end their execution,
 updates to global memory associated with a command are visible globally,
 and the device signals successful completion by setting the event
 associated with the kernel-enqueue command to CL_COMPLETE.
 <br>
 <br>
 While a command-queue is associated with only one device, a single
 device may be associated with multiple command-queues all feeding into
 the single work-pool. A device may also be associated with command
 queues associated with different contexts within the same platform,
 again all feeding into the single work-pool. The device will pull
 work-groups from the work-pool and execute them on one or several
 compute units in any order; possibly interleaving execution of
 work-groups from multiple commands. A conforming implementation may
 choose to serialize the work-groups so a correct algorithm cannot assume
 that work-groups will execute in parallel. There is no safe and
 portable way to synchronize across the independent execution of
 work-groups since once in the work-pool, they can execute in any order.
 <br>
 <br>
 The work-items within a single sub-group execute concurrently but not
 necessarily in parallel (i.e. they are not guaranteed to make
 independent forward progress). Therefore, only high-level
 synchronization constructs (e.g. sub-group functions such as barriers)
 that apply to all the work-items in a sub-group are well defined and
 included in OpenCL.
 <br>
 <br>
 Sub-groups execute concurrently within a given work-group and with
 appropriate device support (<em>see Section__4.2</em>) may make independent
 forward progress with respect to each other, with respect to host
 threads and with respect to any entities external to the OpenCL system
 but running on an OpenCL device, even in the absence of work-group
 barrier operations. In this situation, sub-groups are able to internally
 synchronize using barrier operations without synchronizing with each
 other and may perform operations that rely on runtime dependencies on
 operations other sub-groups perform.
 <br>
 <br>
 The work-items within a single work-group execute concurrently but are
 only guaranteed to make independent progress in the presence of
 sub-groups and device support. In the absence of this capability, only
 high-level synchronization constructs (e.g. work-group functions such as
 barriers) that apply to all the work-items in a work-group are well
 defined and included in OpenCL for synchronization within the
 work-group.
 <br>
 <br>
 In the absence of synchronization functions (e.g. a barrier), work-items
 within a sub-group may be serialized. In the presence of sub -group
 functions, work-items within a sub -group may be serialized before any
 given sub -group function, between dynamically encountered pairs of sub
 -group functions and between a work-group function and the end of the
 kernel.
 <br>
 <br>
 In the absence of independent forward progress of constituent
 sub-groups, work-items within a work-group may be serialized before,
 after or between work-group synchronization functions.</p></div>
 </div>
 <div class="sect3">
 <h4 id="_execution_model_device_side_enqueue">3.2.3. Execution Model: Device-side enqueue</h4>
 <div class="paragraph"><p>Algorithms may need to generate additional work as they execute. In
 many cases, this additional work cannot be determined statically; so the
 work associated with a kernel only emerges at runtime as the
 kernel-instance executes. This capability could be implemented in logic
 running within the host program, but involvement of the host may add
 significant overhead and/or complexity to the application control
 flow. A more efficient approach would be to nest kernel-enqueue
 commands from inside other kernels. This <strong>nested parallelism</strong> can be
 realized by supporting the enqueuing of kernels on a device without
 direct involvement by the host program; so-called <strong>device-side
 enqueue</strong>.
 <br>
 <br>
 Device-side kernel-enqueue commands are similar to host-side
 kernel-enqueue commands. The kernel executing on a device (the <strong>parent
 kernel</strong>) enqueues a kernel-instance (the <strong>child kernel</strong>) to a
 device-side command queue. This is an out-of-order command-queue and
 follows the same behavior as the out-of-order command-queues exposed to
 the host program. Commands enqueued to a device side command-queue
 generate and use events to enforce order constraints just as for the
 command-queue on the host. These events, however, are only visible to
 the parent kernel running on the device. When these prerequisite
 events take on the value CL_COMPLETE, the work-groups associated with
 the child kernel are launched into the devices work pool. The device
 then schedules them for execution on the compute units of the device.
 Child and parent kernels execute asynchronously. However, a parent will
 not indicate that it is complete by setting its event to CL_COMPLETE
 until all child kernels have ended execution and have signaled
 completion by setting any associated events to the value CL_COMPLETE.
 Should any child kernel complete with an event status set to a negative
 value (i.e. abnormally terminate), the parent kernel will abnormally
 terminate and propagate the childs negative event value as the value of
 the parents event. If there are multiple children that have an event
 status set to a negative value, the selection of which childs negative
 event value is propagated is implementation-defined.</p></div>
 </div>
 <div class="sect3">
 <h4 id="_execution_model_synchronization">3.2.4. Execution Model: Synchronization</h4>
 <div class="paragraph"><p>Synchronization refers to mechanisms that constrain the order of
 execution between two or more units of execution. Consider the
 following three domains of synchronization in OpenCL:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 Work-group
 synchronization: Constraints on the order of execution for work-items in
 a single work-group
 </p>
 </li>
 <li>
 <p>
 Sub-group synchronization:
 Contraints on the order of execution for work-items in a single
 sub-group
 </p>
 </li>
 <li>
 <p>
 Command synchronization:
 Constraints on the order of commands launched for execution
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>Synchronization across all work-items within a single work-group is
 carried out using a <em>work-group function</em>. These functions carry out
 collective operations across all the work-items in a work-group.
 Available collective operations are: barrier, reduction, broadcast,
 prefix sum, and evaluation of a predicate. A work-group function must
 occur within a converged control flow; i.e. all work-items in the
 work-group must encounter precisely the same work-group function. For
 example, if a work-group function occurs within a loop, the work-items
 must encounter the same work-group function in the same loop
 iterations. All the work-items of a work-group must execute the
 work-group function and complete reads and writes to memory before any
 are allowed to continue execution beyond the work-group function.
 Work-group functions that apply between work-groups are not provided in
 OpenCL since OpenCL does not define forward-progress or ordering
 relations between work-groups, hence collective synchronization
 operations are not well defined.
 <br>
 <br>
 Synchronization across all work-items within a single sub-group is
 carried out using a <em>sub-group function</em>. These functions carry out
 collective operations across all the work-items in a sub-group.
 Available collective operations are: barrier, reduction, broadcast,
 prefix sum, and evaluation of a predicate. A sub-group function must
 occur within a converged control flow; i.e. all work-items in the
 sub-group must encounter precisely the same sub-group function. For
 example, if a work-group function occurs within a loop, the work-items
 must encounter the same sub-group function in the same loop iterations.
 All the work-items of a sub-group must execute the sub-group function
 and complete reads and writes to memory before any are allowed to
 continue execution beyond the sub-group function. Synchronization
 between sub-groups must either be performed using work-group functions,
 or through memory operations. Using memory operations for sub-group
 synchronization should be used carefully as forward progress of
 sub-groups relative to each other is only supported optionally by OpenCL
 implementations.
 <br>
 <br>
 Command synchronization is defined in terms of distinct <strong>synchronization
 points</strong>. The synchronization points occur between commands in host
 command-queues and between commands in device-side command-queues. The
 synchronization points defined in OpenCL include:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 <strong>Launching a command:</strong> A
 kernel-instance is launched onto a device after all events that kernel
 is waiting-on have been set to CL_COMPLETE.
 </p>
 </li>
 <li>
 <p>
 <strong>Ending a command:</strong> Child
 kernels may be enqueued such that they wait for the parent kernel to
 reach the <em>end</em> state before they can be launched. In this case, the
 ending of the parent command defines a synchronization point.
 </p>
 </li>
 <li>
 <p>
 <strong>Completion of a command:</strong>
 A kernel-instance is complete after all of the work-groups in the kernel
 and all of its child kernels have completed. This is signaled to the
 host, a parent kernel or other kernels within command queues by setting
 the value of the event associated with a kernel to CL_COMPLETE.
 </p>
 </li>
 <li>
 <p>
 <strong>Blocking Commands:</strong> A
 blocking command defines a synchronization point between the unit of
 execution that calls the blocking API function and the enqueued command
 reaching the complete state.
 </p>
 </li>
 <li>
 <p>
 <strong>Command-queue barrier:</strong>
 The command-queue barrier ensures that all previously enqueued commands
 have completed before subsequently enqueued commands can be launched.
 </p>
 </li>
 <li>
 <p>
 <strong>clFinish:</strong> This function
 blocks until all previously enqueued commands in the command queue have
 completed after which clFinish defines a synchronization point and the
 clFinish function returns.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>A synchronization point between a pair of commands (A and B) assures
 that results of command A happens-before command B is launched. This
 requires that any updates to memory from command A complete and are made
 available to other commands before the synchronization point completes.
 Likewise, this requires that command B waits until after the
 synchronization point before loading values from global memory. The
 concept of a synchronization point works in a similar fashion for
 commands such as a barrier that apply to two sets of commands. All the
 commands prior to the barrier must complete and make their results
 available to following commands. Furthermore, any commands following
 the barrier must wait for the commands prior to the barrier before
 loading values and continuing their execution.
 <br>
 <br>
 These <em>happens-before</em> relationships are a fundamental part of the
 OpenCL memory model. When applied at the level of commands, they are
 straightforward to define at a language level in terms of ordering
 relationships between different commands. Ordering memory operations
 inside different commands, however, requires rules more complex than can
 be captured by the high level concept of a synchronization point.
 These rules are described in detail in section 3.3.6.</p></div>
 </div>
 <div class="sect3">
 <h4 id="_execution_model_categories_of_kernels">3.2.5. Execution Model: Categories of Kernels</h4>
 <div class="paragraph"><p>The OpenCL execution model supports three types of kernels:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 <strong>OpenCL kernels</strong> are
 managed by the OpenCL API as kernel-objects associated with kernel
 functions within program-objects. OpenCL kernels are provided via a
 kernel language.
 All OpenCL implementations must support OpenCL kernels supplied in the
 standard SPIR-V intermediate language with the appropriate environment
 specification, and the OpenCL C programming language defined in earlier
 versions of the OpenCL specification. Implementations must also support
 OpenCL kernels in
 SPIR-V intermediate language.  SPIR-V binaries nay be
 generated from an
 OpenCL kernel language or by a third party compiler from an
 alternative input.
 </p>
 </li>
 <li>
 <p>
 <strong>Native kernels</strong> are
 accessed through a host function pointer. Native kernels are queued for
 execution along with OpenCL kernels on a device and share memory objects
 with OpenCL kernels. For example, these native kernels could be
 functions defined in application code or exported from a library. The
 ability to execute native kernels is optional within OpenCL and the
 semantics of native kernels are implementation-defined. The OpenCL API
 includes functions to query capabilities of a device(s) and determine if
 this capability is supported.
 </p>
 </li>
 <li>
 <p>
 <strong>Built-in kernels</strong> are tied
 to particular device and are not built at runtime from source code in a
 program object. The common use of built in kernels is to expose
 fixed-function hardware or firmware associated with a particular OpenCL
 device or custom device. The semantics of a built-in kernel may be
 defined outside of OpenCL and hence are implementation defined.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>All three types of kernels are manipulated through the OpenCL command
 queues and must conform to the synchronization points defined in the
 OpenCL execution model.</p></div>
 </div>
 </div>
 <div class="sect2">
 <h3 id="_memory_model">3.3. Memory Model</h3>
 <div class="paragraph"><p>The OpenCL memory model describes the structure, contents, and behavior
 of the memory exposed by an OpenCL platform as an OpenCL program runs.
 The model allows a programmer to reason about values in memory as the
 host program and multiple kernel-instances execute.
 <br>
 <br>
 An OpenCL program defines a context that includes a host, one or more
 devices, command-queues, and memory exposed within the context.
 Consider the units of execution involved with such a program. The host
 program runs as one or more host threads managed by the operating system
 running on the host (the details of which are defined outside of
 OpenCL). There may be multiple devices in a single context which all
 have access to memory objects defined by OpenCL. On a single device,
 multiple work-groups may execute in parallel with potentially
 overlapping updates to memory. Finally, within a single work-group,
 multiple work-items concurrently execute, once again with potentially
 overlapping updates to memory.
 <br>
 <br>
 The memory model must precisely define how the values in memory as seen
 from each of these units of execution interact so a programmer can
 reason about the correctness of OpenCL programs. We define the memory
 model in four parts.</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 Memory regions: The
 distinct memories visible to the host and the devices that share a
 context.
 </p>
 </li>
 <li>
 <p>
 Memory objects: The
 objects defined by the OpenCL API and their management by the host and
 devices.
 </p>
 </li>
 <li>
 <p>
 Shared Virtual Memory: A
 virtual address space exposed to both the host and the devices within a
 context.
 </p>
 </li>
 <li>
 <p>
 Consistency Model: Rules
 that define which values are observed when multiple units of execution
 load data from memory plus the atomic/fence operations that constrain
 the order of memory operations and define synchronization relationships.
 </p>
 </li>
 </ul></div>
 <div class="sect3">
 <h4 id="_memory_model_fundamental_memory_regions">3.3.1. Memory Model: Fundamental Memory Regions</h4>
 <div class="paragraph"><p>Memory in OpenCL is divided into two parts.</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 <strong>Host Memory:</strong> The memory
 directly available to the host. The detailed behavior of host memory is
 defined outside of OpenCL. Memory objects move between the Host and the
 devices through functions within the OpenCL API or through a shared
 virtual memory interface.
 </p>
 </li>
 <li>
 <p>
 <strong>Device Memory:</strong> Memory
 directly available to kernels executing on OpenCL devices.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>Device memory consists of four named address spaces or <em>memory regions</em>:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 <strong>Global Memory:</strong> This
 memory region permits read/write access to all work-items in all
 work-groups running on any device within a context. Work-items can read
 from or write to any element of a memory object. Reads and writes to
 global memory may be cached depending on the capabilities of the device.
 </p>
 </li>
 <li>
 <p>
 <strong>Constant Memory</strong>: A
 region of global memory that remains constant during the execution of a
 kernel-instance. The host allocates and initializes memory objects
 placed into constant memory.
 </p>
 </li>
 <li>
 <p>
 <strong>Local Memory</strong>: A memory
 region local to a work-group. This memory region can be used to allocate
 variables that are shared by all work-items in that work-group.
 </p>
 </li>
 <li>
 <p>
 <strong>Private Memory</strong>: A region
 of memory private to a work-item. Variables defined in one work-items
 private memory are not visible to another work-item.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>The memory regions and their relationship to the OpenCL Platform model
 are summarized in figure 3-4. Local and private memories are always
 associated with a particular device. The global and constant memories,
 however, are shared between all devices within a given context. An
 OpenCL device may include a cache to support efficient access to these
 shared memories
 <br>
 <br>
 To understand memory in OpenCL, it is important to appreciate the
 relationships between these named address spaces.   The four named
 address spaces available to a device are disjoint meaning they do not
 overlap.   This is a logical relationship, however, and an
 implementation may choose to let these disjoint named address spaces
 share physical memory.
 <br>
 <br>
 Programmers often need functions callable from kernels where the
 pointers manipulated by those functions can point to multiple named
 address spaces. This saves a programmer from the error-prone and
 wasteful practice of creating multiple copies of functions; one for each
 named address space. Therefore the global, local and private address
 spaces belong to a single <em>generic address space</em>. This is closely
 modeled after the concept of a generic address space used in the
 embedded C standard (ISO/IEC 9899:1999). Since they all belong to a
 single generic address space, the following properties are supported for
 pointers to named address spaces in device memory:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 A pointer to the generic
 address space can be cast to a pointer to a global, local or private
 address space
 </p>
 </li>
 <li>
 <p>
 A pointer to a global,
 local or private address space can be cast to a pointer to the generic
 address space.
 </p>
 </li>
 <li>
 <p>
 A pointer to a global,
 local or private address space can be implicitly converted to a pointer
 to the generic address space, but the converse is not allowed.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>The constant address space is disjoint from the generic address space.
 <br>
 <br>
 The addresses of memory associated with memory objects in Global memory
 are not preserved between kernel instances, between a device and the
 host, and between devices. In this regard global memory acts as a global
 pool of memory objects rather than an address space. This restriction is
 relaxed when shared virtual memory (SVM) is used.
 <br>
 <br>
 SVM causes addresses to be meaningful between the host and all of the
 devices within a context hence supporting the use of pointer based data
 structures in OpenCL kernels. It logically extends a portion of the
 global memory into the host address space giving work-items access to
 the host address space. On platforms with hardware support for a shared
 address space between the host and one or more devices, SVM may also
 provide a more efficient way to share data between devices and the host.
 Details about SVM are presented in section 3.3.3.</p></div>
 <div class="paragraph"><p><span class="image">
 <img src="opencl22-API_files/image008.jpg" alt="image">
 </span></p></div>
 <div class="paragraph"><p><strong>Figure 3-4: The named address spaces exposed in an OpenCL Platform.
 Global and Constant memories are shared between the one or more devices
 within a context, while local and private memories are associated with a
 single device. Each device may include an optional cache to support
 efficient access to their view of the global and constant address
 spaces.</strong></p></div>
 <div class="paragraph"><p>A programmer may use the features of the memory consistency model
 (section 3.3.4) to manage safe access to global memory from multiple
 work-items potentially running on one or more devices. In addition, when
 using shared virtual memory (SVM), the memory consistency model may also
 be used to ensure that host threads safely access memory locations in
 the shared memory region.</p></div>
 </div>
 <div class="sect3">
 <h4 id="_memory_model_memory_objects">3.3.2. Memory Model: Memory Objects</h4>
 <div class="paragraph"><p>The contents of global memory are <em>memory objects</em>. A memory object is
 a handle to a reference counted region of global memory. Memory objects
 use the OpenCL type <em>cl_mem</em> and fall into three distinct classes.</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 <strong>Buffer</strong>: A memory object
 stored as a block of contiguous memory and used as a general purpose
 object to hold data used in an OpenCL program. The types of the values
 within a buffer may be any of the built in types (such as int, float),
 vector types, or user-defined structures. The buffer can be
 manipulated through pointers much as one would with any block of memory
 in C.
 </p>
 </li>
 <li>
 <p>
 <strong>Image</strong>: An image memory
 object holds one, two or three dimensional images. The formats are
 based on the standard image formats used in graphics applications. An
 image is an opaque data structure managed by functions defined in the
 OpenCL API. To optimize the manipulation of images stored in the
 texture memories found in many GPUs, OpenCL kernels have traditionally
 been disallowed from both reading and writing a single image. In OpenCL
 2.0, however, we have relaxed this restriction by providing
 synchronization and fence operations that let programmers properly
 synchronize their code to safely allow a kernel to read and write a
 single image.
 </p>
 </li>
 <li>
 <p>
 <strong>Pipe</strong>: The <em>pipe</em> memory
 object conceptually is an ordered sequence of data items. A pipe has
 two endpoints: a write endpoint into which data items are inserted, and
 a read endpoint from which data items are removed. At any one time,
 only one kernel instance may write into a pipe, and only one kernel
 instance may read from a pipe. To support the producer consumer design
 pattern, one kernel instance connects to the write endpoint (the
 producer) while another kernel instance connects to the reading endpoint
 (the consumer).
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>Memory objects are allocated by host APIs. The host program can provide
 the runtime with a pointer to a block of continuous memory to hold the
 memory object when the object is created (CL_MEM_USE_HOST_PTR).
 Alternatively, the physical memory can be managed by the OpenCL runtime
 and not be directly accessible to the host program.
 <br>
 <br>
 Allocation and access to memory objects within the different memory
 regions varies between the host and work-items running on a device.
 This is summarized in table 3.1 which__describes whether the kernel or
 the host can allocate from a memory region, the type of allocation
 (static at compile time vs. dynamic at runtime) and the type of access
 allowed (i.e. whether the kernel or the host can read and/or write to a
 memory region).</p></div>
 <div style="page-break-after:always"></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:80%;
 ">
 <col style="width:20%;">
 <col style="width:20%;">
 <col style="width:20%;">
 <col style="width:20%;">
 <col style="width:20%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Global</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Constant</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Local</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Private</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" rowspan="2" ><p class="tableblock">Host</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Dynamic Allocation</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Dynamic Allocation</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Dynamic Allocation</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">No Allocation</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Read/Write access to buffers and images but not pipes</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Read/Write access</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">No access</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">No access</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" rowspan="2" ><p class="tableblock">Kernel</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Static Allocation for program scope variables</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Static Allocation</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Static Allocation. Dynamic allocation for child kernel</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Static Allocation</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Read/Write access</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Read-only access</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Read/Write access. No access to child&#8217;s local memory.</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Read/Write access</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p><strong>Table 3 1: The different memory regions in
 OpenCL and how memory objects are allocated and accessed by the host and
 by an executing instance of a kernel. For the case of kernels, we
 distinguish between the behavior of local memory with respect to a
 kernel (self) and its child kernels.</strong></p></div>
 <div class="paragraph"><p>Once allocated, a memory object is made available to kernel-instances
 running on one or more devices. In addition to shared virtual memory
 (section 3.3.3) there are three basic ways to manage the contents of
 buffers between the host and devices.</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 <strong>Read/Write/Fill
 commands</strong>: The data associated with a memory object is explicitly read
 and written between the host and global memory regions using commands
 enqueued to an OpenCL command queue.
 </p>
 </li>
 <li>
 <p>
 <strong>Map/Unmap commands</strong>: Data
 from the memory object is mapped into a contiguous block of memory
 accessed through a host accessible pointer. The host program enqueues a
 <em>map</em> command on block of a memory object before it can be safely
 manipulated by the host program. When the host program is finished
 working with the block of memory, the host program enqueues an <em>unmap</em>
 command to allow a kernel-instance to safely read and/or write the
 buffer.**
 </p>
 </li>
 <li>
 <p>
 <strong>Copy commands:</strong> The data
 associated with a memory object is copied between two buffers, each of
 which may reside either on the host or on the device.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>In both cases, the commands to transfer data between devices and the
 host can be blocking or non-blocking operations. The OpenCL function
 call for a blocking memory transfer returns once the associated memory
 resources on the host can be safely reused. For a non-blocking memory
 transfer, the OpenCL function call returns as soon as the command is
 enqueued.
 <br>
 <br>
 Memory objects are bound to a context and hence can appear in multiple
 kernel-instances running on more than one physical device. The OpenCL
 platform must support a large range of hardware platforms including
 systems that do not support a single shared address space in hardware;
 hence the ways memory objects can be shared between kernel-instances is
 restricted. The basic principle is that multiple read operations on
 memory objects from multiple kernel-instances that overlap in time are
 allowed, but mixing overlapping reads and writes into the same memory
 objects from different kernel instances is only allowed when fine
 grained synchronization is used with shared virtual memory (see section
 3.3.3).
 <br>
 <br>
 When global memory is manipulated by multiple kernel-instances running
 on multiple devices, the OpenCL runtime system must manage the
 association of memory objects with a given device. In most cases the
 OpenCL runtime will implicitly associate a memory object with a device.
 A kernel instance is naturally associated with the command queue to
 which the kernel was submitted. Since a command-queue can only access a
 single device, the queue uniquely defines which device is involved with
 any given kernel-instance; hence defining a clear association between
 memory objects, kernel-instances and devices. Programmers may
 anticipate these associations in their programs and explicitly manage
 association of memory objects with devices in order to improve
 performance.</p></div>
 </div>
 <div class="sect3">
 <h4 id="_memory_model_shared_virtual_memory">3.3.3. Memory Model: Shared Virtual Memory</h4>
 <div class="paragraph"><p>OpenCL extends the global memory region into the host memory region
 through a shared virtual memory (SVM) mechanism. There are three types
 of SVM in OpenCL</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 <strong>Coarse-Grained buffer
 SVM</strong>: Sharing occurs at the granularity of regions of OpenCL buffer
 memory objects. Consistency is enforced at synchronization points and
 with map/unmap commands to drive updates between the host and the
 device. This form of SVM is similar to non-SVM use of memory; however,
 it lets kernel-instances share pointer-based data structures (such as
 linked-lists) with the host program. Program scope global variables are
 treated as per-device coarse-grained SVM for addressing and sharing
 purposes.
 </p>
 </li>
 <li>
 <p>
 <strong>Fine-Grained buffer
 SVM</strong>: Sharing occurs at the granularity of individual loads/stores into
 bytes within OpenCL buffer memory objects. Loads and stores may be
 cached. This means consistency is guaranteed at synchronization points.
 If the optional OpenCL atomics are supported, they can be used to
 provide fine-grained control of memory consistency.
 </p>
 </li>
 <li>
 <p>
 <strong>Fine-Grained system SVM</strong>:
 Sharing occurs at the granularity of individual loads/stores into bytes
 occurring anywhere within the host memory. Loads and stores may be
 cached so consistency is guaranteed at synchronization points. If the
 optional OpenCL atomics are supported, they can be used to provide
 fine-grained control of memory consistency.
 </p>
 </li>
 </ul></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <caption class="title">Table 1. <strong>A summary of shared virtual memory (SVM) options in OpenCL</strong></caption>
 <col style="width:20%;">
 <col style="width:20%;">
 <col style="width:20%;">
 <col style="width:20%;">
 <col style="width:20%;">
 <tbody>
 <tr>
 <td class="tableblock halign-center valign-top" ><p class="tableblock"></p></td>
 <td class="tableblock halign-center valign-top" ><p class="tableblock">Granularity of sharing</p></td>
 <td class="tableblock halign-center valign-top" ><p class="tableblock">Memory Allocation</p></td>
 <td class="tableblock halign-center valign-top" ><p class="tableblock">Mechanisms to enforce Consistency</p></td>
 <td class="tableblock halign-center valign-top" ><p class="tableblock">Explicit updates
 between host and device</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-center valign-top" ><p class="tableblock">Non-SVM buffers</p></td>
 <td class="tableblock halign-center valign-top" ><p class="tableblock">OpenCL Memory objects(buffer)</p></td>
 <td class="tableblock halign-center valign-top" ><p class="tableblock">clCreateBuffer</p></td>
 <td class="tableblock halign-center valign-top" ><p class="tableblock">Host synchronization points on the same or between
 devices.</p></td>
 <td class="tableblock halign-center valign-top" ><p class="tableblock">yes, through Map and Unmap commands.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-center valign-top" ><p class="tableblock">Coarse-Grained buffer SVM</p></td>
 <td class="tableblock halign-center valign-top" ><p class="tableblock">OpenCL Memory objects (buffer)</p></td>
 <td class="tableblock halign-center valign-top" ><p class="tableblock">clSVMAlloc</p></td>
 <td class="tableblock halign-center valign-top" ><p class="tableblock">Host synchronization points
 between devices</p></td>
 <td class="tableblock halign-center valign-top" ><p class="tableblock">yes, through Map and Unmap commands.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-center valign-top" ><p class="tableblock">Fine Grained buffer SVM</p></td>
 <td class="tableblock halign-center valign-top" ><p class="tableblock">Bytes within OpenCL Memory objects (buffer)</p></td>
 <td class="tableblock halign-center valign-top" ><p class="tableblock">clSVMAlloc</p></td>
 <td class="tableblock halign-center valign-top" ><p class="tableblock">Synchronization points plus atomics (if supported)</p></td>
 <td class="tableblock halign-center valign-top" ><p class="tableblock">No</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-center valign-top" ><p class="tableblock">Fine-Grained system SVM</p></td>
 <td class="tableblock halign-center valign-top" ><p class="tableblock">Bytes within Host memory (system)</p></td>
 <td class="tableblock halign-center valign-top" ><p class="tableblock">Host memory allocation mechanisms (e.g. malloc)</p></td>
 <td class="tableblock halign-center valign-top" ><p class="tableblock">Synchronization points plus atomics (if
 supported)</p></td>
 <td class="tableblock halign-center valign-top" ><p class="tableblock">No</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p>Coarse-Grained buffer SVM is required in the core OpenCL specification.
 The two finer grained approaches are optional features in OpenCL. The
 various SVM mechanisms to access host memory from the work-items
 associated with a kernel instance are summarized in table 3-2.</p></div>
 </div>
 <div class="sect3">
 <h4 id="_memory_model_memory_consistency_model">3.3.4. Memory Model: Memory Consistency Model</h4>
 <div class="paragraph"><p>The OpenCL memory model tells programmers what they can expect from an
 OpenCL implementation; which memory operations are guaranteed to happen
 in which order and which memory values each read operation will return.
 The memory model tells compiler writers which restrictions they must
 follow when implementing compiler optimizations; which variables they
 can cache in registers and when they can move reads or writes around a
 barrier or atomic operation. The memory model also tells hardware
 designers about limitations on hardware optimizations; for example, when
 they must flush or invalidate hardware caches.
 <br>
 <br>
 The memory consistency model in OpenCL is based on the memory model from
 the ISO C11 programming language. To help make the presentation more
 precise and self-contained, we include modified paragraphs taken
 verbatim from the ISO C11 international standard. When a paragraph is
 taken or modified from the C11 standard, it is identified as such along
 with its original location in the C11 standard.
 <br>
 <br>
 For programmers, the most intuitive model is the <em>sequential
 consistency</em> memory model. Sequential consistency interleaves the steps
 executed by each of the units of execution. Each access to a memory
 location sees the last assignment to that location in that
 interleaving. While sequential consistency is relatively
 straightforward for a programmer to reason about, implementing
 sequential consistency is expensive. Therefore, OpenCL implements a
 relaxed memory consistency model; i.e. it is possible to write programs
 where the loads from memory violate sequential consistency. Fortunately,
 if a program does not contain any races and if the program only uses
 atomic operations that utilize the sequentially consistent memory order
 (the default memory ordering for OpenCL), OpenCL programs appear to
 execute with sequential consistency.
 <br>
 <br>
 Programmers can to some degree control how the memory model is relaxed by choosing the memory order for synchronization operations. The precise semantics of synchronization and the memory orders are formally defined in section 3.3.6. Here, we give a high level description of how these memory orders apply to atomic operations on atomic objects shared between units of execution. OpenCL memory_order choices are based on those from the ANSI C11 standard memory model. They are specified in certain OpenCL functions through the following enumeration constants:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 <strong>memory_order_relaxed</strong>:
 implies no order constraints. This memory order can be used safely to
 increment counters that are concurrently incremented, but it doesnt
 guarantee anything about the ordering with respect to operations to
 other memory locations. It can also be used, for example, to do ticket
 allocation and by expert programmers implementing lock-free algorithms.
 </p>
 </li>
 <li>
 <p>
 <strong>memory_order_acquire</strong>: A
 synchronization operation (fence or atomic) that has acquire semantics
 "acquires" side-effects from a release operation that synchronises with
 it: if an acquire synchronises with a release, the acquiring unit of
 execution will see all side-effects preceding that release (and possibly
 subsequent side-effects.) As part of carefully-designed protocols,
 programmers can use an "acquire" to safely observe the work of another
 unit of execution.
 </p>
 </li>
 <li>
 <p>
 <strong>memory_order_release</strong>: A
 synchronization operation (fence or atomic operation) that has release
 semantics "releases" side effects to an acquire operation that
 synchronises with it. All side effects that precede the release are
 included in the release. As part of carefully-designed protocols,
 programmers can use a "release" to make changes made in one unit of
 execution visible to other units of execution.
 </p>
 </li>
 </ul></div>
 <div class="admonitionblock">
 <table><tr>
 <td class="icon">
 <div class="title">Note</div>
 </td>
 <td class="content">In general, no acquire must <em>always</em> synchronise with any
 particular release. However, synchronisation can be forced by certain
 executions. See 3.3.6.2 for detailed rules for when synchronisation
 must occur.</td>
 </tr></table>
 </div>
 <div class="ulist"><ul>
 <li>
 <p>
 <strong>memory_order_acq_rel</strong>: A
 synchronization operation with acquire-release semantics has the
 properties of both the acquire and release memory orders. It is
 typically used to order read-modify-write operations.
 </p>
 </li>
 <li>
 <p>
 <strong>memory_order_seq_cst</strong>:
 The loads and stores of each unit of execution appear to execute in
 program (i.e., sequenced-before) order, and the loads and stores from
 different units of execution appear to be simply interleaved.
 <br>
 <br>
 Regardless of which memory_order is specified, resolving constraints on
 memory operations across a heterogeneous platform adds considerable
 overhead to the execution of a program. An OpenCL platform may be able
 to optimize certain operations that depend on the features of the memory
 consistency model by restricting the scope of the memory operations.
 Distinct memory scopes are defined by the values of the memory_scope
 enumeration constant:
 </p>
 </li>
 <li>
 <p>
 <strong>memory_scope_work_item</strong>:
 memory-ordering constraints only apply within the
 work-item.<span class="footnote"><br>[This value for memory_scope can only be used with atomic_work_item_fence with flags set
 to LCK_IMAGE_MEM_FENCE.]<br></span>.
 </p>
 </li>
 <li>
 <p>
 <strong>memory_scope_sub_group</strong>:memory-ordering constraints only apply within
 the sub-group.
 </p>
 </li>
 <li>
 <p>
 <strong>memory_scope_work_group</strong>:
 memory-ordering constraints only apply to work-items executing within a
 single work-group.
 </p>
 </li>
 <li>
 <p>
 <strong>memory_scope_device:</strong>
 memory-ordering constraints only apply to work-items executing on a
 single device
 </p>
 </li>
 <li>
 <p>
 <strong>memory_scope_all_svm_devices</strong>: memory-ordering constraints apply to
 work-items executing across multiple devices and (when using SVM) the
 host. A release performed with <strong>memory_scope_all_svm_devices</strong> to a
 buffer that does not have the CL_MEM_SVM_ATOMICS flag set will commit to
 at least <strong>memory_scope_device</strong> visibility, with full synchronization of
 the buffer at a queue synchronization point (e.g. an OpenCL event).
 <br>
 <br>
 These memory scopes define a hierarchy of visibilities when analyzing
 the ordering constraints of memory operations. For example if a
 programmer knows that a sequence of memory operations will only be
 associated with a collection of work-items from a single work-group (and
 hence will run on a single device), the implementation is spared the
 overhead of managing the memory orders across other devices within the
 same context. This can substantially reduce overhead in a program. All
 memory scopes are valid when used on global memory or local memory. For
 local memory, all visibility is constrained to within a given work-group
 and scopes wider than <strong>memory_scope_work_group</strong> carry no additional
 meaning.
 <br>
 <br>
 In the following subsections (leading up to section 3.4), we will
 explain the synchronization constructs and detailed rules needed to use
 OpenCLs relaxed memory models. It is important to appreciate,
 however, that many programs do not benefit from relaxed memory models.
 Even expert programmers have a difficult time using atomics and fences
 to write correct programs with relaxed memory models. A large number of
 OpenCL programs can be written using a simplified memory model. This is
 accomplished by following these guidelines.
 </p>
 </li>
 <li>
 <p>
 Write programs that manage
 safe sharing of global memory objects through the synchronization points
 defined by the command queues.
 </p>
 </li>
 <li>
 <p>
 Restrict low level
 synchronization inside work-groups to the work-group functions such as
 barrier.
 </p>
 </li>
 <li>
 <p>
 If you want sequential
 consistency behavior with system allocations or fine-grain SVM buffers
 with atomics support, use only memory_order_seq_cst operations with the
 scope memory_scope_all_svm_devices.
 </p>
 </li>
 <li>
 <p>
 If you want sequential
 consistency behavior when not using system allocations or fine-grain SVM
 buffers with atomics support, use only memory_order_seq_cst operations
 with the scope memory_scope_device or memory_scope_all_svm_devices.
 </p>
 </li>
 <li>
 <p>
 Ensure your program has no
 races.
 <br>
 <br>
 If these guidelines are followed in your OpenCL programs, you can skip
 the detailed rules behind the relaxed memory models and go directly to
 section 3.4.
 </p>
 </li>
 </ul></div>
 </div>
 <div class="sect3">
 <h4 id="_memory_model_overview_of_atomic_and_fence_operations">3.3.5. Memory Model: Overview of atomic and fence operations</h4>
 <div class="paragraph"><p>The OpenCL 2.0 specification defines a number of <em>synchronization
 operations</em> that are used to define memory order constraints in a
 program. They play a special role in controlling how memory operations
 in one unit of execution (such as work-items or, when using SVM a host
 thread) are made visible to another. There are two types of
 synchronization operations in OpenCL; <em>atomic operations</em> and <em>fences</em>.
 <br>
 <br>
 Atomic operations are indivisible. They either occur completely or not
 at all. These operations are used to order memory operations between
 units of execution and hence they are parameterized with the
 memory_order and memory_scope parameters defined by the OpenCL memory
 consistency model. The atomic operations for OpenCL kernel languages
 are similar to the corresponding operations defined
 by the C11 standard.
 <br>
 <br>
 The OpenCL 2.0 atomic operations apply to variables of an atomic type (a
 subset of those in the C11 standard) including atomic versions of the
 int, uint, long, ulong, float, double, half, intptr_t, uintptr_t,
 size_t, and ptrdiff_t types. However, support for some of these atomic
 types depends on support for the corresponding regular types.
 <br>
 <br>
 An atomic operation on one or more memory locations is either an acquire
 operation, a release operation, or both an acquire and release
 operation. An atomic operation without an associated memory location is
 a fence and can be either an acquire fence, a release fence, or both an
 acquire and release fence. In addition, there are relaxed atomic
 operations, which do not have synchronization properties, and atomic
 read-modify-write operations, which have special characteristics. [C11
 standard, Section 5.1.2.4, paragraph 5, modified]
 <br>
 <br>
 The orders memory_order_acquire (used for reads), memory_order_release
 (used for writes), and memory_order_acq_rel (used for read-modify-write
 operations) are used for simple communication between units of execution
 using shared variables. Informally, executing a memory_order_release on
 an atomic object A makes all previous side effects visible to any unit
 of execution that later executes a memory_order_acquire on A. The orders
 memory_order_acquire, memory_order_release, and memory_order_acq_rel do
 not provide sequential consistency for race-free programs because they
 will not ensure that atomic stores followed by atomic loads become
 visible to other threads in that order.
 <br>
 <br>
 The fence operation is atomic_work_item_fence, which includes a
 memory_order argument as well as the memory_scope and cl_mem_fence_flags
 arguments. Depending on the memory_order argument, this operation:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 has no effects, if
 memory_order_relaxed;
 </p>
 </li>
 <li>
 <p>
 is an acquire fence, if
 memory_order_acquire;
 </p>
 </li>
 <li>
 <p>
 is a release fence, if
 memory_order_release;
 </p>
 </li>
 <li>
 <p>
 is both an acquire fence
 and a release fence, if memory_order_acq_rel;
 </p>
 </li>
 <li>
 <p>
 is a
 sequentially-consistent fence with both acquire and release semantics,
 if memory_order_seq_cst.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>If specified, the cl_mem_fence_flags argument must be
 CLK_IMAGE_MEM_FENCE, CLK_GLOBAL_MEM_FENCE, CLK_LOCAL_MEM_FENCE, or
 CLK_GLOBAL_MEM_FENCE | CLK_LOCAL_MEM_FENCE.
 <br>
 <br>
 The atomic_work_item_fence(CLK_IMAGE_MEM_FENCE) built-in function must
 be
 <br>
 <br>
 used to make sure that sampler-less writes are visible to later reads by
 the same work-item. Without use of the atomic_work_item_fence function,
 write-read coherence on image objects is not guaranteed: if a work-item
 reads from an image to which it has previously written without an
 intervening atomic_work_item_fence, it is not guaranteed that those
 previous writes are visible to the work-item.
 <br>
 <br>
 The synchronization operations in OpenCL can be parameterized by a
 memory_scope. Memory scopes control the extent that an atomic operation
 or fence is visible with respect to the memory model. These memory
 scopes may be used when performing atomic operations and fences on
 global memory and local memory. When used on global memory visibility
 is bounded by the capabilities of that memory. When used on a
 fine-grained non-atomic SVM buffer, a coarse-grained SVM buffer, or a
 non-SVM buffer, operations parameterized with
 memory_scope_all_svm_devices will behave as if they were parameterized
 with memory_scope_device. When used on local memory, visibility is
 bounded by the work-group and, as a result, memory_scope with wider
 visibility than memory_scope_work_group will be reduced to
 memory_scope_work_group.</p></div>
 <div class="paragraph"><p>Two actions <strong>A</strong> and <strong>B</strong> are defined to have an inclusive scope if they
 have the same scope <strong>P</strong> such that:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 <strong>P</strong> is memory_scope_sub_group and <strong>A</strong> and <strong>B</strong> are executed by
 work-items within the same sub-group.
 </p>
 </li>
 <li>
 <p>
 <strong>P</strong> is memory_scope_work_group and <strong>A</strong> and <strong>B</strong> are executed by
 work-items within the same work-group.
 </p>
 </li>
 <li>
 <p>
 <strong>P</strong> is memory_scope_device and <strong>A</strong> and <strong>B</strong> are executed by
 work-items on the same device when <strong>A</strong> and <strong>B</strong> apply to an SVM
 allocation or <strong>A</strong> and <strong>B</strong> are executed by work-items in the same kernel
 or one of its children when <strong>A</strong> and <strong>B</strong> apply to a cl_mem buffer.
 </p>
 </li>
 <li>
 <p>
 <strong>P</strong> is memory_scope_all_svm_devices if <strong>A</strong> and <strong>B</strong> are
 executed by host threads or by work-items on one or more devices that
 can share SVM memory with each other and the host process.
 </p>
 </li>
 </ul></div>
 </div>
 <div class="sect3">
 <h4 id="_memory_model_memory_ordering_rules">3.3.6. Memory Model: Memory Ordering Rules</h4>
 <div class="paragraph"><p>Fundamentally, the issue in a memory model is to understand the
 orderings in time of modifications to objects in memory. Modifying an
 object or calling a function that modifies an object are side effects,
 i.e. changes in the state of the execution environment. Evaluation of an
 expression in general includes both value computations and initiation of
 side effects. Value computation for an lvalue expression includes
 determining the identity of the designated object. [C11 standard,
 Section 5.1.2.3, paragraph 2, modified]
 <br>
 <br>
 We assume that the OpenCL kernel language and host
 programming languages have a sequenced-before relation between the
 evaluations executed by a single unit of execution. This
 sequenced-before relation is an asymmetric, transitive, pair-wise
 relation between those evaluations, which induces a partial order among
 them. Given any two evaluations <strong>A</strong> and <strong>B</strong>, if <strong>A</strong> is sequenced-before
 <strong>B</strong>, then the execution of <strong>A</strong> shall precede the execution of <strong>B</strong>.
 (Conversely, if <strong>A</strong> is sequenced-before <strong>B</strong>, then <strong>B</strong> is sequenced-after
 <strong>A</strong>.) If <strong>A</strong> is not sequenced-before or sequenced-after <strong>B</strong>, then <strong>A</strong>
 and <strong>B</strong> are unsequenced. Evaluations <strong>A</strong> and <strong>B</strong> are indeterminately
 sequenced when <strong>A</strong> is either sequenced-before or sequenced-after <strong>B</strong>,
 but it is unspecified which. [C11 standard, Section 5.1.2.3, paragraph
 3, modified]
 <br>
 <br>
 NOTE: sequenced-before is a partial order of the operations executed by
 a single unit of execution (e.g. a host thread or work-item). It
 generally corresponds to the source program order of those operations,
 and is partial because of the undefined argument evaluation order of
 OpenCLs kernel C language.
 <br>
 <br>
 In an OpenCL kernel language, the value of an object
 visible to a work-item W at a particular point is the initial value of
 the object, a value stored in the object by W, or a value stored in the
 object by another work-item or host thread, according to the rules
 below. Depending on details of the host programming language, the value
 of an object visible to a host thread may also be the value stored in
 that object by another work-item or host thread. [C11 standard, Section
 5.1.2.4, paragraph 2, modified]
 <br>
 <br>
 Two expression evaluations conflict if one of them modifies a memory
 location and the other one reads or modifies the same memory location. [C11 standard, Section 5.1.2.4, paragraph 4]
 <br>
 <br>
 All modifications to a particular atomic object <strong>M</strong> occur in some
 particular total order, called the modification order of <strong>M</strong>. If <strong>A</strong> and
 <strong>B</strong> are modifications of an atomic object <strong>M</strong>, and <strong>A</strong> happens-before
 <strong>B</strong>, then <strong>A</strong> shall precede <strong>B</strong> in the modification order of <strong>M</strong>, which
 is defined below. Note that the modification order of an atomic object
 <strong>M</strong> is independent of whether <strong>M</strong> is in local or global memory. [C11
 standard, Section 5.1.2.4, paragraph 7, modified]
 <br>
 <br>
 A release sequence begins with a release operation <strong>A</strong> on an atomic
 object <strong>M</strong> and is the maximal contiguous sub-sequence of side effects
 in the modification order of <strong>M</strong>, where the first operation is <strong>A</strong> and
 every subsequent operation either is performed by the same work-item or
 host thread that performed the release or is an atomic
 read-modify-write operation. [C11 standard, Section 5.1.2.4, paragraph
 10, modified]
 <br>
 <br>
 OpenCLs local and global memories are disjoint. Kernels may access both
 kinds of memory while host threads may only access global memory.
 Furthermore, the <em>flags</em> argument of OpenCLs work_group_barrier
 function specifies which memory operations the function will make
 visible: these memory operations can be, for example, just the ones to
 local memory, or the ones to global memory, or both. Since the
 visibility of memory operations can be specified for local memory
 separately from global memory, we define two related but independent
 relations, <em>global-synchronizes-with</em> and <em>local-synchronizes-with</em>.
 Certain operations on global memory may global-synchronize-with other
 operations performed by another work-item or host thread. An example is
 a release atomic operation in one work- item that
 global-synchronizes-with an acquire atomic operation in a second
 work-item. Similarly, certain atomic operations on local objects in
 kernels can local-synchronize- with other atomic operations on those
 local objects. [C11 standard, Section 5.1.2.4, paragraph 11, modified]
 <br>
 <br>
 We define two separate happens-before relations: global-happens-before
 and local-happens-before.</p></div>
 <div class="paragraph"><p>A global memory action <strong>A</strong> global-happens-before a global memory action
 *B*__if</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 <strong>A</strong> is sequenced before
 <strong>B</strong>, or
 </p>
 </li>
 <li>
 <p>
 <strong>A</strong>
 global-synchronizes-with <strong>B</strong>, or
 </p>
 </li>
 <li>
 <p>
 For some global memory
 action <strong>C</strong>, <strong>A</strong> global-happens-before <strong>C</strong> and <strong>C</strong> global-happens-before
 <strong>B</strong>.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>A local memory action <strong>A</strong> local-happens-before a local memory action
 *B*__if</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 <strong>A</strong> is sequenced before
 <strong>B</strong>, or
 </p>
 </li>
 <li>
 <p>
 <strong>A</strong>
 local-synchronizes-with <strong>B</strong>, or
 </p>
 </li>
 <li>
 <p>
 For some local memory
 action <strong>C</strong>, <strong>A</strong> local-happens-before <strong>C</strong> and <strong>C</strong> local-happens-before
 <strong>B</strong>.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>An OpenCL implementation shall ensure that no program execution
 demonstrates a cycle in either the local-happens-before relation or
 the global-happens-before relation.
 <br>
 <br>
 NOTE: The global- and local-happens-before relations are critical to
 defining what values are read and when data races occur. The
 global-happens-before relation, for example, defines what global memory
 operations definitely happen before what other global memory operations.
 If an operation <strong>A</strong> global-happens-before operation <strong>B</strong> then <strong>A</strong> must
 occur before <strong>B</strong>; in particular, any write done by <strong>A</strong> will be visible
 to <strong>B</strong>. The local-happens-before relation has similar properties for
 local memory. Programmers can use the local- and global-happens-before
 relations to reason about the order of program actions.
 <br>
 <br>
 A visible side effect <strong>A</strong> on a global object <strong>M</strong> with respect to a value
 computation <strong>B</strong> of <strong>M</strong> satisfies the conditions:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 <strong>A</strong> global-happens-before
 <strong>B</strong>, and
 </p>
 </li>
 <li>
 <p>
 there is no other side
 effect <strong>X</strong> to <strong>M</strong> such that <strong>A</strong> global-happens-before <strong>X</strong> and <strong>X</strong>
 global-happens-before <strong>B</strong>.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>We define visible side effects for local objects <strong>M</strong> similarly. The
 value of a non-atomic scalar object <strong>M</strong>, as determined by evaluation
 <strong>B</strong>, shall be the value stored by the visible side effect <strong>A</strong>. [C11
 standard, Section 5.1.2.4, paragraph 19, modified]
 <br>
 <br>
 The execution of a program contains a data race if it contains two
 conflicting actions A and B in different units of execution, and</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 (1) at least one of <strong>A</strong> or
 <strong>B</strong> is not atomic, or <strong>A</strong> and <strong>B</strong> do not have inclusive memory scope,
 and
 </p>
 </li>
 <li>
 <p>
 (2) the actions are global
 actions unordered by the global-happens-before relation or are local
 actions unordered by the local-happens-before relation.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>Any such data race results in undefined behavior. [C11 standard, Section
 5.1.2.4, paragraph 25, modified]
 <br>
 <br>
 We also define the visible sequence of side effects on local and global
 atomic objects. The remaining paragraphs of this subsection define this
 sequence for a global atomic object <strong>M</strong>; the visible sequence of side
 effects for a local atomic object is defined similarly by using the
 local-happens-before relation.
 <br>
 <br>
 The visible sequence of side effects on a global atomic object <strong>M</strong>, with
 respect to a value computation <strong>B</strong> of <strong>M</strong>, is a maximal contiguous
 sub-sequence of side effects in the modification order of <strong>M</strong>, where the
 first side effect is visible with respect to <strong>B</strong>, and for every side
 effect, it is not the case that <strong>B</strong> global-happens-before it. The value
 of*M*, as determined by evaluation <strong>B</strong>, shall be the value stored by
 some operation in the visible sequence of <strong>M</strong> with respect to <strong>B</strong>. [C11
 standard, Section 5.1.2.4, paragraph 22, modified]
 <br>
 <br>
 If an operation <strong>A</strong> that modifies an atomic object <strong>M</strong> global-happens
 before an operation <strong>B</strong> that modifies <strong>M</strong>, then <strong>A</strong> shall be earlier
 than <strong>B</strong> in the modification order of <strong>M</strong>. This requirement is known as
 write-write coherence.
 <br>
 <br>
 If a value computation <strong>A</strong> of an atomic object <strong>M</strong> global-happens-before
 a value computation <strong>B</strong> of <strong>M</strong>, and <strong>A</strong> takes its value from a side
 effect <strong>X</strong> on <strong>M</strong>, then the value computed by <strong>B</strong> shall either equal the
 value stored by <strong>X</strong>, or be the value stored by a side effect <strong>Y</strong> on <strong>M</strong>,
 where <strong>Y</strong> follows <strong>X</strong> in the modification order of <strong>M</strong>. This requirement
 is known as read-read coherence. [C11 standard, Section 5.1.2.4,
 paragraph 22, modified]
 <br>
 <br>
 If a value computation <strong>A</strong> of an atomic object <strong>M</strong> global-happens-before
 an operation <strong>B</strong> on <strong>M</strong>, then <strong>A</strong> shall take its value from a side
 effect <strong>X</strong> on <strong>M</strong>, where <strong>X</strong> precedes <strong>B</strong> in the modification order of
 <strong>M</strong>. This requirement is known as read-write coherence.
 <br>
 <br>
 If a side effect <strong>X</strong> on an atomic object <strong>M</strong> global-happens-before a
 value computation <strong>B</strong> of <strong>M</strong>, then the evaluation <strong>B</strong> shall take its
 value from <strong>X</strong> or from a side effect <strong>Y</strong> that follows <strong>X*in the
 modification order of *M</strong>. This requirement is known as write-read
 coherence.</p></div>
 <div class="sect4">
 <h5 id="_memory_ordering_rules_atomic_operations">Memory Ordering Rules: Atomic Operations</h5>
 <div class="paragraph"><p>This and following sections describe how different program actions in
 kernel C code and the host program contribute to the local- and
 global-happens-before relations. This section discusses ordering rules
 for OpenCL 2.0s atomic operations.</p></div>
 <div class="paragraph"><p>Section 3.2.3 defined the enumerated type memory_order.</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 For memory_order_relaxed,
 no operation orders memory.
 </p>
 </li>
 <li>
 <p>
 For memory_order_release,
 memory_order_acq_rel, and memory_order_seq_cst, a store operation
 performs a release operation on the affected memory location.
 </p>
 </li>
 <li>
 <p>
 For memory_order_acquire,
 memory_order_acq_rel, and memory_order_seq_cst, a load operation
 performs an acquire operation on the affected memory location. [C11
 standard, Section 7.17.3, paragraphs 2-4, modified]
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>Certain built-in functions synchronize with other built-in functions
 performed by another unit of execution. This is true for pairs of
 release and acquire operations under specific circumstances. An atomic
 operation <strong>A</strong> that performs a release operation on a global object <strong>M</strong>
 global-synchronizes-with an atomic operation <strong>B</strong> that performs an
 acquire operation on <strong>M</strong> and reads a value written by any side effect in
 the release sequence headed by <strong>A</strong>. A similar rule holds for atomic
 operations on objects in local memory: an atomic operation <strong>A</strong> that
 performs a release operation on a local object <strong>M</strong>
 local-synchronizes-with an atomic operation <strong>B</strong> that performs an acquire
 operation on <strong>M</strong> and reads a value written by any side effect in the
 release sequence headed by <strong>A</strong>. [C11 standard, Section 5.1.2.4,
 paragraph 11, modified]
 <br>
 <br>
 NOTE: Atomic operations specifying memory_order_relaxed are relaxed only
 with respect to memory ordering. Implementations must still guarantee
 that any given atomic access to a particular atomic object be
 indivisible with respect to all other atomic accesses to that object.
 <br>
 <br>
 There shall exist a single total order <strong>S</strong> for all memory_order_seq_cst
 operations that is consistent with the modification orders for all
 affected locations, as well as the appropriate global-happens-before and
 local-happens-before orders for those locations, such that each
 memory_order_seq operation <strong>B</strong> that loads a value from an atomic object
 <strong>M</strong> in global or local memory observes one of the following values:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 the result of the last
 modification <strong>A</strong> of <strong>M</strong> that precedes <strong>B</strong> in <strong>S</strong>, if it exists, or
 </p>
 </li>
 <li>
 <p>
 if <strong>A</strong> exists, the result
 of some modification of <strong>M</strong> in the visible sequence of side effects
 with respect to <strong>B</strong> that is not memory_order_seq_cst and that does not
 happen before <strong>A</strong>, or
 </p>
 </li>
 <li>
 <p>
 if <strong>A</strong> does not exist, the
 result of some modification of <strong>M</strong> in the visible sequence of side
 effects with respect to <strong>B</strong> that is not memory_order_seq_cst. [C11
 standard, Section 7.17.3, paragraph 6, modified]
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>Let X and Y be two memory_order_seq_cst operations. If X
 local-synchronizes-with or global-synchronizes-with Y then X both
 local-synchronizes-with Y and global-synchronizes-with Y.
 <br>
 <br>
 If the total order <strong>S</strong> exists, the following rules hold:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 For an atomic operation
 <strong>B</strong> that reads the value of an atomic object <strong>M</strong>, if there is a
 memory_order_seq_cst fence <strong>X</strong> sequenced-before <strong>B</strong>, then <strong>B</strong> observes
 either the last memory_order_seq_cst modification of <strong>M</strong> preceding <strong>X</strong>
 in the total order <strong>S</strong> or a later modification of <strong>M</strong> in its
 modification order. [C11 standard, Section 7.17.3, paragraph 9]
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p> </p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 For atomic operations <strong>A</strong>
 and <strong>B</strong> on an atomic object <strong>M</strong>, where <strong>A</strong> modifies <strong>M</strong> and <strong>B</strong> takes
 its value, if there is a memory_order_seq_cst fence <strong>X</strong> such that <strong>A</strong> is
 sequenced-before <strong>X</strong> and <strong>B</strong> follows <strong>X</strong> in <strong>S</strong>, then <strong>B</strong> observes
 either the effects of <strong>A</strong> or a later modification of <strong>M</strong> in its
 modification order. [C11 standard, Section 7.17.3, paragraph 10]
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p> </p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 For atomic operations <strong>A</strong>
 and <strong>B</strong> on an atomic object <strong>M</strong>, where <strong>A*modifies *M</strong> and <strong>B</strong> takes its
 value, if there are memory_order_seq_cst fences <strong>X</strong> and <strong>Y</strong> such that
 <strong>A</strong> is sequenced-before <strong>X</strong>, <strong>Y</strong> is sequenced-before <strong>B</strong>, and <strong>X</strong>
 precedes <strong>Y</strong> in <strong>S</strong>, then <strong>B</strong> observes either the effects of <strong>A</strong> or a
 later modification of <strong>M</strong> in its modification order. [C11 standard,
 Section 7.17.3, paragraph 11]
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p> </p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 For atomic operations <strong>A</strong>
 and <strong>B</strong> on an atomic object <strong>M</strong>, if there are memory_order_seq_cst
 fences <strong>X</strong> and <strong>Y</strong> such that <strong>A</strong> is sequenced-before <strong>X</strong>, <strong>Y</strong> is
 sequenced-before <strong>B</strong>, and <strong>X</strong> precedes <strong>Y</strong> in <strong>S</strong>, then <strong>B</strong> occurs later
 than <strong>A</strong> in the modification order of <strong>M</strong>.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p> </p></div>
 <div class="admonitionblock">
 <table><tr>
 <td class="icon">
 <div class="title">Note</div>
 </td>
 <td class="content">memory_order_seq_cst ensures sequential consistency only for a
 program that is (1) free of data races, and (2) exclusively uses
 memory_order_seq_cst synchronization operations. Any use of weaker
 ordering will invalidate this guarantee unless extreme care is used. In
 particular, memory_order_seq_cst fences ensure a total order only for
 the fences themselves. Fences cannot, in general, be used to restore
 sequential consistency for atomic operations with weaker ordering
 specifications.
 <br>
 <br>
 Atomic read-modify-write operations should always read the last value
 (in the modification order) stored before the write associated with the
 read-modify-write operation. [C11 standard, Section 7.17.3, paragraph
 12]</td>
 </tr></table>
 </div>
 <div class="paragraph"><p><span class="underline">Implementations should ensure that no "out-of-thin-air" values are computed that circularly depend on their own computation.</span></p></div>
 <div class="paragraph"><p>Note: Under the rules described above, and independent to the previously
 footnoted C++ issue, it is known that <em>x == y == 42</em> is a valid final
 state in the following problematic example:</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>global atomic_int x = ATOMIC_VAR_INIT(0);

 local atomic_int y = ATOMIC_VAR_INIT(0);


 unit_of_execution_1:

 ... [execution not reading or writing x or y, leading up to:]

 int t = atomic_load_explicit(&amp;y, memory_order_acquire);

 atomic_store_explicit(&amp;x, t, memory_order_release);


 unit_of_execution_2:

 ... [execution not reading or writing x or y, leading up to:]

 int t = atomic_load_explicit(&amp;x, memory_order_acquire);

 atomic_store_explicit(&amp;y, t,
 memory_order_release);link:#_msocom_6[[BA6]] </pre>
 </div></div>
 <div class="paragraph"><p>This is not useful behavior and implementations should not exploit this
 phenomenon. It should be expected that in the future this may be
 disallowed by appropriate updates to the memory model description by the
 OpenCL committee.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>Implementations should make atomic stores visible to atomic loads within
 a reasonable amount of time. [C11 standard, Section 7.17.3, paragraph
 16]</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>As long as the following conditions are met, a host program sharing SVM
 memory with a kernel executing on one or more OpenCL devices may use
 atomic and synchronization operations to ensure that its assignments,
 and those of the kernel, are visible to each other:</p></div>
 <div class="olist arabic"><ol class="arabic">
 <li>
 <p>
 Either fine-grained buffer or fine-grained system SVM must be
 used to share memory. While coarse-grained buffer SVM allocations may
 support atomic operations, visibility on these allocations is not
 guaranteed except at map and unmap operations.
 </p>
 </li>
 <li>
 <p>
 The optional OpenCL 2.0 SVM atomic-controlled visibility
 specified by provision of the CL_MEM_SVM_ATOMICS flag must be supported
 by the device and the flag provided to the SVM buffer on allocation.
 </p>
 </li>
 <li>
 <p>
 The host atomic and synchronization operations must be
 compatible with those of an OpenCL kernel language. This
 requires that the size and representation of the data types that the
 host atomic operations act on be consistent with the OpenCL kernel language atomic
 types.
 </p>
 </li>
 </ol></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>If these conditions are met, the host operations will apply at
 all_svm_devices scope.</p></div>
 </div>
 <div class="sect4">
 <h5 id="_memory_ordering_rules_fence_operations">Memory Ordering Rules: Fence Operations</h5>
 <div class="paragraph"><p>This section describes how the OpenCL 2.0 fence operations contribute to
 the local- and global-happens-before relations.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>Earlier, we introduced synchronization primitives called fences. Fences
 can utilize the acquire memory_order, release memory_order, or both. A
 fence with acquire semantics is called an acquire fence; a fence with
 release semantics is called a release fence.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>A global release fence <strong>A</strong> global-synchronizes-with a global acquire
 fence <strong>B</strong> if there exist atomic operations <strong>X</strong> and <strong>Y</strong>, both operating
 on some global atomic object <strong>M</strong>, such that <strong>A</strong> is sequenced-before <strong>X</strong>,
 <strong>X</strong> modifies <strong>M</strong>, <strong>Y</strong> is sequenced-before <strong>B</strong>, <strong>Y</strong> reads the value
 written by <strong>X</strong> or a value written by any side effect in the hypothetical
 release sequence <strong>X</strong> would head if it were a release operation, and that
 the scopes of <strong>A</strong>, <strong>B</strong> are inclusive. [C11 standard, Section 7.17.4,
 paragraph 2, modified.]</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>A global release fence <strong>A</strong> global-synchronizes-with an atomic operation
 <strong>B</strong> that performs an acquire operation on a global atomic object <strong>M</strong> if
 there exists an atomic operation <strong>X</strong> such that <strong>A</strong> is sequenced-before
 <strong>X</strong>, <strong>X</strong> modifies <strong>M</strong>, <strong>B</strong> reads the value written by <strong>X</strong> or a value
 written by any side effect in the hypothetical release sequence <strong>X</strong>
 would head if it were a release operation, and the scopes of <strong>A</strong> and <strong>B</strong>
 are inclusive. [C11 standard, Section 7.17.4, paragraph 3, modified.]</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>An atomic operation <strong>A</strong> that is a release operation on a global atomic
 object <strong>M</strong> global-synchronizes-with a global acquire fence <strong>B</strong> if there
 exists some atomic operation <strong>X</strong> on <strong>M</strong> such that <strong>X</strong> is
 sequenced-before <strong>B</strong> and reads the value written by <strong>A</strong> or a value
 written by any side effect in the release sequence headed by <strong>A</strong>, and
 the scopes of <strong>A</strong> and <strong>B</strong> are inclusive. [C11 standard, Section 7.17.4,
 paragraph 4, modified.]</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>A local release fence <strong>A</strong> local-synchronizes-with a local acquire fence
 <strong>B</strong> if there exist atomic operations <strong>X</strong> and <strong>Y</strong>, both operating on some
 local atomic object <strong>M</strong>, such that <strong>A</strong> is sequenced-before <strong>X</strong>, <strong>X</strong>
 modifies <strong>M</strong>, <strong>Y</strong> is sequenced-before <strong>B</strong>, and <strong>Y</strong> reads the value
 written by <strong>X</strong> or a value written by any side effect in the hypothetical
 release sequence <strong>X</strong> would head if it were a</p></div>
 <div class="paragraph"><p>release operation, and the scopes of <strong>A</strong> and <strong>B</strong> are inclusive. [C11
 standard, Section 7.17.4, paragraph 2, modified.]</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>A local release fence <strong>A</strong> local-synchronizes-with an atomic operation
 <strong>B</strong> that performs an acquire operation on a local atomic object <strong>M</strong> if
 there exists an atomic operation <strong>X</strong> such that <strong>A</strong> is sequenced-before
 <strong>X</strong>, <strong>X</strong> modifies <strong>M</strong>, and <strong>B</strong> reads the value written by <strong>X</strong> or a value
 written by any side effect in the hypothetical release sequence <strong>X</strong>
 would head if it were a release operation, and</p></div>
 <div class="paragraph"><p>the scopes of <strong>A</strong> and <strong>B</strong> are inclusive. [C11 standard, Section 7.17.4,
 paragraph 3, modified.]</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>An atomic operation <strong>A</strong> that is a release operation on a local atomic
 object <strong>M</strong> local-synchronizes-with a local acquire fence <strong>B</strong> if there
 exists some atomic operation <strong>X</strong> on <strong>M</strong> such that <strong>X</strong> is
 sequenced-before <strong>B</strong> and reads the value written by <strong>A</strong> or a value
 written by any side effect in the release sequence headed by <strong>A</strong>, and
 the scopes of <strong>A</strong> and <strong>B</strong> are inclusive. [C11 standard, Section 7.17.4,
 paragraph 4, modified.]</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>Let <strong>X</strong> and <strong>Y</strong> be two work item fences that each have both the
 CLK_GLOBAL_MEM_FENCE and CLK_LOCAL_MEM_FENCE flags set. <strong>X</strong>
 global-synchronizes-with <strong>Y</strong> and <strong>X</strong> local synchronizes with <strong>Y</strong> if the
 conditions required for <strong>X</strong> to global-synchronize with <strong>Y</strong> are met, the
 conditions required for <strong>X</strong> to local-synchronize-with <strong>Y</strong> are met, or
 both sets of conditions are met.</p></div>
 </div>
 <div class="sect4">
 <h5 id="_memory_ordering_rules_work_group_functions">Memory Ordering Rules: Work-group Functions</h5>
 <div class="paragraph"><p>The OpenCL kernel execution model includes collective operations across
 the work-items within a single work-group. These are called work-group
 functions. Besides the work-group barrier function, they include the
 scan, reduction and pipe work-group functions described in the SPIR-V IL
 specifications . We will first discuss the work-group barrier. The other
 work-group functions are discussed afterwards.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>The barrier function provides a mechanism for a kernel to synchronize
 the work-items within a single work-group: informally, each work-item of
 the work-group must execute the barrier before any are allowed to
 proceed. It also orders memory operations to a specified combination of
 one or more address spaces such as local memory or global memory, in a
 similar manner to a fence.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>To precisely specify the memory ordering semantics for barrier, we need
 to distinguish between a dynamic and a static instance of the call to a
 barrier. A call to a barrier can appear in a loop, for example, and each
 execution of the same static barrier call results in a new dynamic
 instance of the barrier that will independently synchronize a
 work-groups work-items.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>A work-item executing a dynamic instance of a barrier results in two
 operations, both fences, that are called the entry and exit fences.
 These fences obey all the rules for fences specified elsewhere in this
 chapter as well as the following:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 The entry fence is a
 release fence with the same flags and scope as requested for the
 barrier.
 </p>
 </li>
 <li>
 <p>
 The exit fence is an
 acquire fence with the same flags and scope as requested for the
 barrier.
 </p>
 </li>
 <li>
 <p>
 For each work-item the
 entry fence is sequenced before the exit fence.
 </p>
 </li>
 <li>
 <p>
 If the flags have
 CLK_GLOBAL_MEM_FENCE set then for each work-item the entry fence
 global-synchronizes-with the exit fence of all other work-items in the
 same work-group.
 </p>
 </li>
 <li>
 <p>
 If the flags have
 CLK_LOCAL_MEM_FENCE set then for each work-item the entry fence
 local-synchronizes-with the exit fence of all other work-items in the
 same work-group.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>The other work-group functions include such functions as
 work_group_all() and work_group_broadcast() and are described in the
 kernel language and IL specifications. The use of these work-group
 functions implies sequenced-before relationships between statements
 within the execution of a single work-item in order to satisfy data
 dependencies. For example, a work item that provides a value to a
 work-group function must behave as if it generates that value before
 beginning execution of that work-group function. Furthermore, the
 programmer must ensure that all work items in a work group must execute
 the same work-group function call site, or dynamic work-group function
 instance.</p></div>
 </div>
 <div class="sect4">
 <h5 id="_memory_ordering_rules_sub_group_functions">Memory Ordering Rules: Sub-group Functions</h5>
 <div class="paragraph"><p>The OpenCL kernel execution model includes collective operations across
 the work-items within a single sub-group. These are called sub-group
 functions. Besides the sub-group-barrier function, they include the
 scan, reduction and pipe sub-group functions described in the SPIR-V IL
 specification. We will first discuss the sub-group barrier. The other
 sub-group functions are discussed afterwards.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>The barrier function provides a mechanism for a kernel to synchronize
 the work-items within a single sub-group: informally, each work-item of
 the sub-group must execute the barrier before any are allowed to
 proceed. It also orders memory operations to a specified combination of
 one or more address spaces such as local memory or global memory, in a
 similar manner to a fence.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>To precisely specify the memory ordering semantics for barrier, we need
 to distinguish between a dynamic and a static instance of the call to a
 barrier. A call to a barrier can appear in a loop, for example, and each
 execution of the same static barrier call results in a new dynamic
 instance of the barrier that will independently synchronize a
 sub-groups work-items.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>A work-item executing a dynamic instance of a barrier results in two
 operations, both fences, that are called the entry and exit fences.
 These fences obey all the rules for fences specified elsewhere in this
 chapter as well as the following:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 The entry fence is a
 release fence with the same flags and scope as requested for the
 barrier.
 </p>
 </li>
 <li>
 <p>
 The exit fence is an
 acquire fence with the same flags and scope as requested for the
 barrier.
 </p>
 </li>
 <li>
 <p>
 For each work-item the
 entry fence is sequenced before the exit fence.
 </p>
 </li>
 <li>
 <p>
 If the flags have
 CLK_GLOBAL_MEM_FENCE set then for each work-item the entry fence
 global-synchronizes-with the exit fence of all other work-items in the
 same sub-group.
 </p>
 </li>
 <li>
 <p>
 If the flags have
 CLK_LOCAL_MEM_FENCE set then for each work-item the entry fence
 local-synchronizes-with the exit fence of all other work-items in the
 same sub-group.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>The other sub-group functions include such functions as sub_group_all()
 and sub_group_broadcast() and are described in OpenCL kernel
 languages specifications. The use of these sub-group functions
 implies sequenced-before relationships between statements within the
 execution of a single work-item in order to satisfy data dependencies.
 For example, a work item that provides a value to a sub-group function
 must behave as if it generates that value before beginning execution of
 that sub-group function. Furthermore, the programmer must ensure that
 all work items in a sub-group must execute the same sub-group function
 call site, or dynamic sub-group function instance.</p></div>
 </div>
 <div class="sect4">
 <h5 id="_memory_ordering_rules_host_side_and_device_side_commands">Memory Ordering Rules: Host-side and Device-side Commands</h5>
 <div class="paragraph"><p>This section describes how the OpenCL API functions associated with
 command-queues contribute to happens-before relations. There are two
 types of command queues and associated API functions in OpenCL 2.0;
 <em>host command-queues</em> and <em>device command-queues</em>. The interaction of
 these command queues with the memory model are for the most part
 equivalent. In a few cases, the rules only applies to the host
 command-queue. We will indicate these special cases by specifically
 denoting the host command-queue in the memory ordering rule. SVM memory
 consistency in such instances is implied only with respect to
 synchronizing host commands.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>Memory ordering rules in this section apply to all memory objects
 (buffers, images and pipes) as well as to SVM allocations where no
 earlier, and more fine-grained, rules apply.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>In the remainder of this section, we assume that each command <strong>C</strong>
 enqueued onto a command-queue has an associated event object <strong>E</strong> that
 signals its execution status, regardless of whether <strong>E*was returned to
 the unit of execution that enqueued *C</strong>. We also distinguish between
 the API function call that enqueues a command <strong>C</strong> and creates an event
 <strong>E</strong>, the execution of <strong>C</strong>, and the completion of <strong>C</strong>(which marks the
 event <strong>E</strong> as complete).</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>The ordering and synchronization rules for API commands are defined as
 following:</p></div>
 <div class="olist arabic"><ol class="arabic">
 <li>
 <p>
 If an API function call <strong>X</strong> enqueues a command <strong>C</strong>, then <strong>X</strong>
 global-synchronizes-with <strong>C</strong>. For example, a host API function to
 enqueue a kernel global-synchronizes-with the start of that
 kernel-instances execution, so that memory updates sequenced-before the
 enqueue kernel function call will global-happen-before any kernel reads
 or writes to those same memory locations. For a device-side enqueue,
 global memory updates sequenced before <strong>X</strong> happens-before <strong>C</strong> reads or
 writes to those memory locations only in the case of fine-grained SVM.
 </p>
 </li>
 <li>
 <p>
 If <strong>E</strong> is an event upon which a command <strong>C</strong> waits, then <strong>E</strong>
 global-synchronizes-with <strong>C</strong>. In particular, if <strong>C</strong> waits on an event
 <strong>E</strong> that is tracking the execution status of the command <strong>C1</strong>, then
 memory operations done by <strong>C1</strong> will global-happen-before memory
 operations done by <strong>C</strong>. As an example, assume we have an OpenCL program
 using coarse-grain SVM sharing that enqueues a kernel to a host
 command-queue to manipulate the contents of a region of a buffer that
 the host thread then accesses after the kernel completes. To do this,
 the host thread can call clEnqueueMapBuffer to enqueue a blocking-mode
 map command to map that buffer region, specifying that the map command
 must wait on an event signaling the kernels completion. When
 clEnqueueMapBuffer returns, any memory operations performed by the
 kernel to that buffer region will global- happen-before subsequent
 memory operations made by the host thread.
 </p>
 </li>
 <li>
 <p>
 If a command <strong>C</strong> has an event <strong>E</strong> that signals its completion,
 then <strong>C</strong> global- synchronizes-with <strong>E</strong>.
 </p>
 </li>
 <li>
 <p>
 For a command <strong>C</strong> enqueued to a host-side command queue, if <strong>C</strong>
 has an event <strong>E</strong> that signals its completion, then <strong>E</strong> global-
 synchronizes-with an API call <strong>X</strong> that waits on <strong>E</strong>. For example, if a
 host thread or kernel-instance calls the wait-for-events function on
 <strong>E</strong>(e.g. the clWaitForEvents function called from a host thread)<strong>,*then
 *E</strong> global-synchronizes-with that wait-for-events function call.
 </p>
 </li>
 <li>
 <p>
 If commands <strong>C</strong> and <strong>C1</strong> are enqueued in that sequence onto an
 in-order command-queue, then the event (including the event implied
 between <strong>C</strong> and <strong>C1*due to the in-order queue) signaling *C*s
 completion global-synchronizes-with *C1</strong>. Note that in OpenCL 2.0, only
 a host command-queue can be configured as an in-order queue.
 </p>
 </li>
 <li>
 <p>
 If an API call enqueues a marker command <strong>C</strong> with an empty list
 of events upon which <strong>C</strong> should wait, then the events of all commands
 enqueued prior to <strong>C</strong> in the command-queue global-synchronize-with*C*.
 </p>
 </li>
 <li>
 <p>
 If a host API call enqueues a command-queue barrier command <strong>C</strong>
 with an empty list of events on which <strong>C</strong> should wait, then the events
 of all commands enqueued prior to <strong>C</strong> in the command-queue
 global-synchronize-with <strong>C</strong>. In addition, the event signaling the
 completion of <strong>C</strong> global-synchronizes-with all commands enqueued after
 <strong>C</strong> in the command-queue.
 </p>
 </li>
 <li>
 <p>
 If a host thread executes a clFinish call <strong>X</strong>, then the events
 of all commands enqueued prior to <strong>X</strong> in the command-queue
 global-synchronizes-with <strong>X</strong>.
 </p>
 </li>
 <li>
 <p>
 The start of a kernel-instance <strong>K</strong> global-synchronizes-with all
 operations in the work items of <strong>K</strong>. Note that this includes the
 execution of any atomic operations by the work items in a program using
 fine-grain SVM.
 </p>
 </li>
 <li>
 <p>
 All operations of all work items of a kernel-instance <strong>K</strong>
 global-synchronizes-with the event signaling the completion of <strong>K</strong>. Note
 that this also includes the execution of any atomic operations by the
 work items in a program using fine-grain SVM.
 </p>
 </li>
 <li>
 <p>
 If a callback procedure <strong>P</strong> is registered on an event <strong>E</strong>, then <strong>E</strong>
 global-synchronizes-with all operations of <strong>P</strong>. Note that callback
 procedures are only defined for commands within host command-queues.
 </p>
 </li>
 <li>
 <p>
 If <strong>C</strong> is a command that waits for an event <strong>E</strong>'s completion, and
 API function call <strong>X</strong> sets the status of a user event <strong>E</strong>'s status to
 CL_COMPLETE (for example, from a host thread using a
 clSetUserEventStatus function), then <strong>X</strong> global-synchronizes-with <strong>C</strong>.
 </p>
 </li>
 <li>
 <p>
 If a device enqueues a command <strong>C</strong> with the
 CLK_ENQUEUE_FLAGS_WAIT_KERNEL flag, then the end state of the parent
 kernel instance global-synchronizes with <strong>C</strong>.
 </p>
 </li>
 <li>
 <p>
 If a work-group enqueues a command <strong>C</strong> with the
 CLK_ENQUEUE_FLAGS_WAIT_WORK_GROUP flag, then the end state of the
 work-group global-synchronizes with <strong>C</strong>.
 </p>
 </li>
 </ol></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>When using an out-of-order command queue, a wait on an event or a marker
 or command-queue barrier command can be used to ensure the correct
 ordering of dependent commands. In those cases, the wait for the event
 or the marker or barrier command will provide the necessary
 global-synchronizes-with relation.</p></div>
 <div class="paragraph"><p>In this situation:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 access to shared locations
 or disjoint locations in a single cl_mem object when using atomic
 operations from different kernel instances enqueued from the host such
 that one or more of the atomic operations is a write is
 implementation-defined and correct behavior is not guaranteed except
 at synchronization points.
 </p>
 </li>
 <li>
 <p>
 access to shared locations
 or disjoint locations in a single cl_mem object when using atomic
 operations from different kernel instances consisting of a parent kernel
 and any number of child kernels enqueued by that kernel is guaranteed
 under the memory ordering rules described earlier in this section.
 </p>
 </li>
 <li>
 <p>
 access to shared locations
 or disjoint locations in a single program scope global variable,
 coarse-grained SVM allocation or fine-grained SVM allocation when using
 atomic operations from different kernel instances enqueued from the host
 to a single device is guaranteed under the memory ordering rules
 described earlier in this section.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>If fine-grain SVM is used but without support for the OpenCL 2.0 atomic
 operations, then the host and devices can concurrently read the same
 memory locations and can concurrently update non-overlapping memory
 regions, but attempts to update the same memory locations are
 undefined. Memory consistency is guaranteed at the OpenCL
 synchronization points without the need for calls to clEnqueueMapBuffer
 and clEnqueueUnmapMemObject. For fine-grained SVM buffers it is
 guaranteed that at synchronization points only values written by the
 kernel will be updated. No writes to fine-grained SVM buffers can be
 introduced that were not in the original program.
  </p></div>
 <div class="paragraph"><p>In the remainder of this section, we discuss a few points regarding the
 ordering rules for commands with a host command queue.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>The OpenCL 1.2 standard describes a synchronization point as a
 kernel-instance or host program location where the contents of memory
 visible to different work-items or command-queue commands are the same.
 It also says that waiting on an event and a command-queue barrier are
 synchronization points between commands in command- queues. Four of the
 rules listed above (2, 4, 7, and 8) cover these OpenCL synchronization
 points.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>A map operation (clEnqueueMapBuffer or clEnqueueMapImage) performed on a
 non-SVM buffer or a coarse-grained SVM buffer is allowed to overwrite
 the entire target region with the latest runtime view of the data as
 seen by the command with which the map operation synchronizes, whether
 the values were written by the executing kernels or not. Any values
 that were changed within this region by another kernel or host thread
 while the kernel synchronizing with the map operation was executing may
 be overwritten by the map operation.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>Access to non-SVM cl_mem buffers and coarse-grained SVM allocations is
 ordered at synchronization points between host commands. In the presence
 of an out-of-order command queue or a set of command queues mapped to
 the same device, multiple kernel instances may execute concurrently on
 the same device.</p></div>
 </div>
 </div>
 </div>
 <div class="sect2">
 <h3 id="_the_opencl_framework">3.4. The OpenCL Framework</h3>
 <div class="paragraph"><p>The OpenCL framework allows applications to use a host and one or more
 OpenCL devices as a single heterogeneous parallel computer system. The
 framework contains the following components:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 OpenCL Platform layer: The
 platform layer allows the host program to discover OpenCL devices and
 their capabilities and to create contexts.
 </p>
 </li>
 <li>
 <p>
 OpenCL Runtime: The
 runtime allows the host program to manipulate contexts once they have
 been created.
 </p>
 </li>
 <li>
 <p>
 OpenCL Compiler: The
 OpenCL compiler creates program executables that contain OpenCL kernels.
 SPIR-V intermediate language, OpenCL C, OpenCL C++, and OpenCL C language versions from earlier
 OpenCL specifications are supported by the compiler. Other input
 languages may be supported by some implementations.
 </p>
 </li>
 </ul></div>
 <div class="sect3">
 <h4 id="_opencl_framework_mixed_version_support">3.4.1. OpenCL Framework: Mixed Version Support</h4>
 <div class="paragraph"><p>OpenCL supports devices with different capabilities under a single
 platform. This includes devices which conform to different versions of
 the OpenCL specification. There are three version identifiers to
 consider for an OpenCL system: the platform version, the version of a
 device, and the version(s) of the kernel language or IL supported on a
 device.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>The platform version indicates the version of the OpenCL runtime that is
 supported. This includes all of the APIs that the host can use to
 interact with resources exposed by the OpenCL runtime; including
 contexts, memory objects, devices, and command queues.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>The device version is an indication of the device&#8217;s capabilities
 separate from the runtime and compiler as represented by the device info
 returned by <strong>clGetDeviceInfo</strong>. Examples of attributes associated with
 the device version are resource limits (e.g., minimum size of local
 memory per compute unit) and extended functionality (e.g., list of
 supported KHR extensions). The version returned corresponds to the
 highest version of the OpenCL specification for which the device is
 conformant, but is not higher than the platform version.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>The language version for a device represents the OpenCL programming
 language features a developer can assume are supported on a given
 device. The version reported is the highest version of the language
 supported.</p></div>
 <div class="paragraph"><p>Backwards compatibility is an important goal for the OpenCL standard.
 Backwards compatibility is expected such that a device will consume
 earlier versions of the SPIR-V and OpenCL C programming languages with
 the following minimum requirements:</p></div>
 <div class="olist arabic"><ol class="arabic">
 <li>
 <p>
 An OpenCL 1.x device must support at least one 1.x version of
 the OpenCL C programming language.
 </p>
 </li>
 <li>
 <p>
 An OpenCL 2.0 device must support all the requirements of an
 OpenCL 1.x device in addition to the OpenCL C 2.0 programming language.
 If multiple language versions are supported, the compiler defaults to
 using the highest OpenCL 1.x language version supported for the device
 (typically OpenCL 1.2). To utilize the OpenCL 2.0 Kernel programming
 language, a programmer must specifically set the appropriate compiler
 flag (-cl-std=CL2.0). The language version must not be higher than the
 platform version, but may exceed the device version (see section
 5.8.4.5).
 </p>
 </li>
 <li>
 <p>
 An OpenCL 2.1 device must support all the requirements of an
 OpenCL 2.0 device in addition to the SPIR-V intermediate language at
 version 1.0 or above. Intermediate language versioning is encoded as
 part of the binary object and no flags are required to be passed to the
 compiler.
 </p>
 </li>
 <li>
 <p>
 An OpenCL 2.2 device must support all the requirements of an
 OpenCL 2.0 device in addition to the SPIR-V intermediate language at
 version 1.2 or above. Intermediate language
 is encoded as a part of the binary
 object and no flags are required to be passed to the compiler.
 </p>
 </li>
 </ol></div>
 </div>
 </div>
 </div>
 </div>
 <div class="sect1">
 <h2 id="_the_opencl_platform_layer">4. The OpenCL Platform Layer</h2>
 <div class="sectionbody">
 <div class="paragraph"><p>This section describes the OpenCL platform layer which implements
 platform-specific features that allow applications to query OpenCL
 devices, device configuration information, and to create OpenCL contexts
 using one or more devices.</p></div>
 <div class="sect2">
 <h3 id="_querying_platform_info">4.1. Querying Platform Info</h3>
 <div class="paragraph"><p>The list of platforms available can be obtained using the following
 function.</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clGetPlatformIDs(cl_uint num_entries,
                         cl_platform_id *platforms,
                         cl_uint *num_platforms)</pre>
 </div></div>
 <div class="paragraph"><p><em>num_entries</em> is the number of cl_platform_id entries that can be added
 to <em>platforms._If _platforms</em> is not NULL, the <em>num_entries</em> must be
 greater than zero.
 <br>
 <br>
 <em>platforms</em> returns a list of OpenCL platforms found. The
 cl_platform_id values returned in <em>platforms_can be used to identify a
 specific OpenCL platform. If _platforms</em> argument is NULL, this
 argument is ignored. The number of OpenCL platforms returned is the
 minimum of the value specified by <em>num_entries</em> or the number of OpenCL
 platforms available.
 <br>
 <br>
 <em>num_platforms</em> returns the number of OpenCL platforms available. If
 <em>num_platforms</em> is NULL, this argument is ignored.
 <br>
 <br>
 <strong>clGetPlatformIDs</strong> returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 CL_INVALID_VALUE if
 <em>num_entries</em> is equal to zero and <em>platforms</em> is not NULL or if both
 <em>num_platforms</em> and <em>platforms</em> are NULL.
 </p>
 </li>
 <li>
 <p>
 CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clGetPlatformInfo(cl_platform_id platform,
                          cl_platform_info param_name,
                          size_t param_value_size,
                          void *param_value,
                          size_t *param_value_size_ret)</pre>
 </div></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>gets specific information about the OpenCL platform. The information
 that can be queried using <strong>clGetPlatformInfo</strong> is specified in <em>table
 4.1</em>.
 <br>
 <br>
 <em>platform</em> refers to the platform ID returned by <strong>clGetPlatformIDs</strong> or
 can be NULL. If <em>platform</em> is NULL, the behavior is
 implementation-defined.
 <br>
 <br>
 <em>param_name</em> is an enumeration constant that identifies the platform
 information being queried. It can be one of the following values as
 specified in <em>table 4.1</em>.
 <br>
 <br>
 <em>param_value</em> is a pointer to memory location where appropriate values
 for a given <em>param_name</em> as specified in <em>table</em> <em>4.1</em> will be
 returned. If <em>param_value</em> is NULL, it is ignored.
 <br>
 <br>
 <em>param_value_size</em> specifies the size in bytes of memory pointed to by
 <em>param_value</em>. This size in bytes must be &gt;= size of return type
 specified in <em>table 4.1.</em>
 <br>
 <br>
 <em>param_value_size_ret</em> returns the actual size in bytes of data being
 queried by <em>param_name</em>. If <em>param_value_size_ret</em> is NULL, it is
 ignored.</p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <caption class="title">Table 2. <em>OpenCL Platform Queries</em></caption>
 <col style="width:50%;">
 <col style="width:10%;">
 <col style="width:40%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>cl_platform_info</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Return Type</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Description</strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_PLATFORM_PROFILE</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">char[]<span class="footnote"><br>[A null terminated string is returned by OpenCL query function calls if the return type of the information being queried is a char[].]<br></span></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">OpenCL profile string. Returns the
 profile name supported by the
 implementation. The profile name
 returned can be one of the following
 strings:
 <br>
 <br>
 FULL_PROFILE – if the implementation
 supports the OpenCL specification
 (functionality defined as part of the core
 specification and does not require any
 extensions to be supported).
 <br>
 <br>
 EMBEDDED_PROFILE - if the
 CL_PLATFORM_VERSION
 char[]
 implementation supports the OpenCL
 embedded profile. The embedded profile
 is defined to be a subset for each version
 of OpenCL. The embedded profile for
 OpenCL 2.2 is described in <em>section 7</em>.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_PLATFORM_VERSION</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">char[]</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">OpenCL version string. Returns the
 OpenCL version supported by the
 implementation. This version string has
 the following format:
 <br>
 <br>
 <em>OpenCL&lt;space&gt;&lt;major_version.minor_
 version&gt;&lt;space&gt;&lt;platform-specific
 information&gt;</em>
 <br>
 <br>
 The <em>major_version.minor_version</em> value
 returned will be 2.2.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_PLATFORM_NAME</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">char[]</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Platform name string.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_PLATFORM_VENDOR</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">char[]</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Platform vendor string.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_PLATFORM_EXTENSIONS</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">char[]</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Returns a space separated list of extension
 names (the extension names themselves
 do not contain any spaces) supported by
 the platform. Each extension that is
 supported by all devices associated with
 this platform must be reported here.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_PLATFORM_HOST_TIMER_RESOLUTION</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_ulong</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Returns the resolution
 of the host timer in nanoseconds as used by <strong>clGetDeviceAndHostTimer</strong>.</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p><strong>clGetPlatformInfo</strong> returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following
 errors.<span class="footnote"><br>[The OpenCL specification does not describe the order of precedence for error codes returned by API calls.]<br></span>:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 CL_INVALID_PLATFORM if
 <em>platform</em> is not a valid platform.
 </p>
 </li>
 <li>
 <p>
 CL_INVALID_VALUE if
 <em>param_name</em> is not one of the supported values or if size in bytes
 specified by <em>param_value_size</em> is &lt; size of return type as specified in
 <em>table 4.1</em> and <em>param_value</em> is not a NULL value.
 </p>
 </li>
 <li>
 <p>
 CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 </div>
 <div class="sect2">
 <h3 id="_querying_devices">4.2. Querying Devices</h3>
 <div class="paragraph"><p>The list of devices available on a platform can be obtained using the
 following function.<span class="footnote"><br>[<strong>clGetDeviceIDs</strong> may returnal all or a subset of the actual physical devices present in the platform and that maths <em>device_type</em>]<br></span>.</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clGetDeviceIDs(cl_platform_id platform,
                       cl_device_type device_type,
                       cl_uint num_entries,
                       cl_device_id * devices,
                       cl_uint *num_devices)</pre>
 </div></div>
 <div class="paragraph"><p><em>platform</em> refers to the platform ID returned by <strong>clGetPlatformIDs</strong> or
 can be NULL. If <em>platform</em> is NULL, the behavior is
 implementation-defined.</p></div>
 <div class="paragraph"><p><em>device_type</em> is a bitfield that identifies the type of OpenCL device.
 The <em>device_type</em> can be used to query specific OpenCL devices or all
 OpenCL devices available. The valid values for <em>device_type</em> are
 specified in <em>table 4.2</em>.</p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <col style="width:50%;">
 <col style="width:50%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>cl_device_type</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Description</strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_DEVICE_TYPE_CPU</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">An OpenCL device that is the host processor. The
 host processor runs the OpenCL implementations and is a single or
 multi-core CPU.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_DEVICE_TYPE_GPU</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">An OpenCL device that is a GPU. By this we mean
 that the device can also be used to accelerate a 3D API such as OpenGL
 or DirectX.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_DEVICE_TYPE_ACCELERATOR</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Dedicated OpenCL accelerators (for
 example the IBM CELL Blade). These devices communicate with the host
 processor using a peripheral interconnect such as PCIe.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_DEVICE_TYPE_CUSTOM</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Dedicated accelerators that do not support
 programs written in an OpenCL kernel language,</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_DEVICE_TYPE_DEFAULT</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">The default OpenCL device in the system.
 The default device cannot be a <strong>CL_DEVICE_TYPE_CUSTOM</strong> device.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_DEVICE_TYPE_ALL</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">All OpenCL devices available in the system except
 <strong>CL_DEVICE_TYPE_CUSTOM</strong> devices..</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p><em>num_entries</em> is the number of cl_device_id entries that can be added to
 <em>devices._If _devices</em> is not NULL, the <em>num_entries</em> must be greater
 than zero.
 <br>
 <br>
 <em>devices</em> returns a list of OpenCL devices found. The cl_device_id
 values returned in <em>devices_can be used to identify a specific OpenCL
 device. If _devices</em> argument is NULL, this argument is ignored. The
 number of OpenCL devices returned is the minimum of the value specified
 by <em>num_entries</em> or the number of OpenCL devices whose type matches
 <em>device_type</em>.
 <br>
 <br>
 <em>num_devices</em> returns the number of OpenCL devices available that match
 <em>device_type</em>. If <em>num_devices</em> is NULL, this argument is ignored.</p></div>
 <div class="paragraph"><p><strong>clGetDeviceIDs</strong> returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 CL_INVALID_PLATFORM if
 <em>platform</em> is not a valid platform.
 </p>
 </li>
 <li>
 <p>
 CL_INVALID_DEVICE_TYPE if
 <em>device_type</em> is not a valid value.
 </p>
 </li>
 <li>
 <p>
 CL_INVALID_VALUE if
 <em>num_entries</em> is equal to zero and <em>devices</em> is not NULL or if both
 <em>num_devices</em> and <em>devices</em> are NULL.
 </p>
 </li>
 <li>
 <p>
 CL_DEVICE_NOT_FOUND if no
 OpenCL devices that matched <em>device_type</em> were found.
 </p>
 </li>
 <li>
 <p>
 CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
 CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The application can query specific capabilities of the OpenCL device(s)
 returned by <strong>clGetDeviceIDs</strong>. This can be used by the application to
 determine which device(s) to use.</p></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clGetDeviceInfo(cl_device_id device,
                        cl_device_info param_name,
                        size_t param_value_size,
                        void *param_value,
                        size_t *param_value_size_ret)</pre>
 </div></div>
 <div class="paragraph"><p>gets specific information about an OpenCL device.
 <br>
 <br>
 <em>device</em> may be a device returned by <strong>clGetDeviceIDs</strong> or a sub-device
 created by <strong>clCreateSubDevices</strong>. If <em>device</em> is a sub-device, the
 specific information for the sub-device will be returned. The
 information that can be queried using <strong>clGetDeviceInfo</strong> is specified in
 <em>table 4.3</em>.
 <br>
 <br>
 <em>param_name</em> is an enumeration constant that identifies the device
 information being queried. It can be one of the following values as
 specified in <em>table 4.3</em>.
 <br>
 <br>
 <em>param_value</em> is a pointer to memory location where appropriate values
 for a given <em>param_name</em> as specified in <em>table</em> <em>4.3</em> will be
 returned. If <em>param_value</em> is NULL, it is ignored.
 <br>
 <br>
 <em>param_value_size</em> specifies the size in bytes of memory pointed to by
 <em>param_value</em>. This size in bytes must be &gt;= size of return type
 specified in <em>table 4.3.</em>
 <br>
 <br>
 <em>param_value_size_ret</em> returns the actual size in bytes of data being
 queried by <em>param_name</em>. If <em>param_value_size_ret</em> is NULL, it is
 ignored.</p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <caption class="title">Table 3. <em>OpenCL Device Queries</em></caption>
 <col style="width:30%;">
 <col style="width:20%;">
 <col style="width:50%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>cl_device_info</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Return Type</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Description</strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_TYPE</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_device_type</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">The OpenCL device type. Currently supported values are:
 <br>
 <br>
 CL_DEVICE_TYPE_CPU,
 CL_DEVICE_TYPE_GPU, CL_DEVICE_TYPE_ACCELERATOR, CL_DEVICE_TYPE_DEFAULT, a combination of the above types or
 CL_DEVICE_TYPE_CUSTOM.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_VENDOR_ID</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">A unique device vendor identifier. An example of a unique device identifier could be the PCIe ID.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_MAX_ COMPUTE_UNITS</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">The number of parallel compute
 units on the OpenCL device. A work-group executes on a single compute
 unit. The minimum value is 1.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEIVCE_MAX_ WORK_ITEM_DIMENSIONS</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Maximum dimensions that specify the global and
 local work-item IDs used by the data parallel execution model. (Refer to
 <strong>clEnqueueNDRangeKernel</strong>). The minimum value is 3 for devices that are
 not of type CL_DEVICE_TYPE_CUSTOM.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_MAX_ WORK_ITEM_SIZES</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t []</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Maximum number of work-items that can
 be specified in each dimension of the work-group to <strong>clEnqueueNDRangeKernel</strong>.
 <br>
 <br>
 Returns n size_t entries, where n is the
 value returned by the query for
 CL_DEVICE_MAX_WORK_ITEM_DIMEN
 SIONS.
 <br>
 <br>
 The minimum value is (1, 1, 1) for devices
 that are not of type
 CL_DEVICE_TYPE_CUSTOM .</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_MAX_ WORK_GROUP_SIZE</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Maximum number of work-items in a
 work-group that a device is capable of
 executing on a single compute unit, for any
 given kernel-instance running on the
 device. (Refer also to
 <strong>clEnqueueNDRangeKernel and CL_KERNEL_WORK_GROUP_SIZE</strong> ). The minimum value is 1.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_PREFERRED_ VECTOR_WIDTH_CHAR
 <br>
 <br>
 CL_DEVICE_PREFERRED_
 VECTOR_WIDTH_SHORT
 <br>
 <br>
 CL_DEVICE_PREFERRED_
 VECTOR_WIDTH_INT
 <br>
 <br>
 CL_DEVICE_PREFERRED_
 VECTOR_WIDTH_LONG
 <br>
 <br>
 CL_DEVICE_PREFERRED_
 VECTOR_WIDTH_FLOAT
 <br>
 <br>
 CL_DEVICE_PREFERRED_
 VECTOR_WIDTH_DOUBLE
 <br>
 <br>
 CL_DEVICE_PREFERRED_
 VECTOR_WIDTH_HALF</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Preferred native vector width size for built-
 in scalar types that can be put into vectors.
 The vector width is defined as the number
 of scalar elements that can be stored in the
 vector.
 <br>
 <br>
 If double precision is not supported,CL_DEVICE_PREFERRED_VECTOR_WIDTH_
 DOUBLE must return 0.
 <br>
 <br>
 If the <strong>cl_khr_fp16</strong> extension is not supported,
 CL_DEVICE_PREFERRED_VECTOR_WIDTH_
 HALF must return 0.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_NATIVE_ VECTOR_WIDTH_CHAR
 <br>
 <br>
 CL_DEVICE_NATIVE_
 VECTOR_WIDTH_SHORT
 <br>
 <br>
 CL_DEVICE_NATIVE_
 VECTOR_WIDTH_INT
 <br>
 <br>
 CL_DEVICE_NATIVE_
 VECTOR_WIDTH_LONG
 <br>
 <br>
 CL_DEVICE_NATIVE_
 VECTOR_WIDTH_FLOAT
 <br>
 <br>
 CL_DEVICE_NATIVE_
 VECTOR_WIDTH_DOUBLE
 <br>
 <br>
 CL_DEVICE_NATIVE_
 VECTOR_WIDTH_HALF</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Returns the native ISA vector width. The
 vector width is defined as the number of
 scalar elements that can be stored in the
 vector.
 <br>
 <br>
 If double precision is not supported,
 CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE must return 0.
 <br>
 <br>
 If the <strong>cl_khr_fp16</strong> extension is not supported,
 CL_DEVICE_NATIVE_VECTOR_WIDTH_
 HALF must return 0.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_MAX_ CLOCK_FREQUENCY</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Clock frequency of the device in MHz. The
 meaning of this value is implementation-
 defined. For devices with multiple clock
 domains, the clock frequency for any of the
 clock domains may be returned. For
 devices that dynamically change frequency
 for power or thermal reasons, the returned
 clock frequency may be any valid
 frequency.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_ADDRESS_BITS</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">The default compute device address
 space size of the global address space specified as an unsigned integer
 value in bits. Currently supported values are 32 or 64 bits.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_MAX_ MEM_ALLOC_SIZE</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_ulong</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Max size of memory object
 allocation in bytes. The minimum value is max (min(1024*1024*1024,
 1/4<sup>th</sup> of <strong>CL_DEVICE_GLOBAL_MEM_SIZE</strong>), 32*1024*1024) for devices that
 are not of type CL_DEVICE_TYPE_CUSTOM.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_IMAGE_ SUPPORT</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_bool</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Is CL_TRUE if images are supported
 by the OpenCL device and CL_FALSE otherwise.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_MAX_ READ_IMAGE_ARGS</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Max number of image objects
 arguments of a kernel declared with the read_only qualifier. The
 minimum value is 128 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_MAX_ WRITE_IMAGE_ARGS</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Max number of image objects
 arguments of a kernel declared with the write_only qualifier. The
 minimum value is 64 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_MAX_ READ_WRITE_IMAGE_ARGS<span class="footnote"><br>[NOTE: <strong>CL_DEVICE_MAX_WRITE_IMAGE_ARGS</strong> is only there for backward compatibility.
 <strong>CL_DEVICE_MAX_READ_WRITE_IMAGE_ARGS</strong> should be used instead.]<br></span></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Max number of image objects arguments
 of a kernel declared with the
 write_only or read_write qualifier.
 The minimum value is 64 if
 CL_DEVICE_IMAGE_SUPPORT is
 CL_TRUE .</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_IL_VERSION</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">char[]</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">The intermediate languages that can be
 supported by <strong>clCreateProgramWithIL</strong> for this device. Returns a
 space-separated list of IL version strings of the form
 &lt;IL_Prefix&gt;_&lt;Major_Version&gt;.&lt;Minor_Version&gt;. For OpenCL 2.2, SPIR-V is
 a required IL prefix.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_IMAGE2D_ MAX_WIDTH</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Max width of 2D image or 1D image not
 created from a buffer object in pixels.
 <br>
 <br>
 The minimum value is 16384 if
 CL_DEVICE_IMAGE_SUPPORT is
 CL_TRUE .</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_IMAGE2D_ MAX_HEIGHT</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Max height of 2D image in pixels.
 <br>
 <br>
 The minimum value is 16384 if
 CL_DEVICE_IMAGE_SUPPORT is
 CL_TRUE .</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_IMAGE3D_ MAX_WIDTH</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Max width of 3D image in pixels.
 <br>
 <br>
 The minimum value is 2048 if
 CL_DEVICE_IMAGE_SUPPORT is
 CL_TRUE .</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_IMAGE3D_ MAX_HEIGHT</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Max height of 3D image in pixels.
 <br>
 <br>
 The minimum value is 2048 if
 CL_DEVICE_IMAGE_SUPPORT is
 CL_TRUE</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_IMAGE3D_ MAX_DEPTH</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Max depth of 3D image in pixels.
 <br>
 <br>
 The minimum value is 2048 if
 CL_DEVICE_IMAGE_SUPPORT is
 CL_TRUE</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_IMAGE_ MAX_BUFFER_SIZE</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Max number of pixels for a 1D image
 created from a buffer object.
 <br>
 <br>
 The minimum value is 65536 if
 CL_DEVICE_IMAGE_SUPPORT is
 CL_TRUE .</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_IMAGE_ MAX_ARRAY_SIZE</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Max number of images in a 1D or 2D
 image array.
 <br>
 <br>
 The minimum value is 2048 if
 CL_DEVICE_IMAGE_SUPPORT is
 CL_TRUE .</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_MAX_ SAMPLERS</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Maximum number of samplers that can be
 used in a kernel.
 <br>
 <br>
 The minimum value is 16 if
 CL_DEVICE_IMAGE_SUPPORT is
 CL_TRUE .</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_IMAGE_ PITCH_ALIGNMENT</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">The row pitch alignment size in pixels for
 2D images created from a buffer. The
 value returned must be a power of 2.
 <br>
 <br>
 If the device does not support images, this
 value must be 0.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_IMAGE_ BASE_ADDRESS_ ALIGNMENT</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">This query should be used when a 2D
 image is created from a buffer which was
 created using CL_MEM_USE_HOST_PTR. The value returned must be a power of 2.
 <br>
 <br>
 This query specifies the minimum
 alignment in pixels of the host_ptr
 specified to <strong>clCreateBuffer</strong>.
 <br>
 <br>
 If the device does not support images, this
 value must be 0.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_MAX_ PIPE_ARGS</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">The maximum number of pipe objects
 that can be passed as arguments to a kernel. The minimum value is 16.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_PIPE_ MAX_ACTIVE_RESERVATIONS</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">The maximum number of reservations that can be
 active for a pipe per work-item in a kernel. A work-group reservation
 is counted as one reservation per work-item. The minimum value is 1.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_PIPE_ MAX_PACKET_SIZE</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">The maximum size of pipe
 packet in bytes. The minimum value is 1024 bytes.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_MAX_ PARAMETER_SIZE</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Max size in bytes of all arguments that can
 be passed to a kernel.
 <br>
 <br>
 The minimum value is 1024 for devices
 that are not of type
 CL_DEVICE_TYPE_CUSTOM . For this
 minimum value, only a maximum of 128
 arguments can be passed to a kernel</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_MEM_ BASE_ADDR_ALIGN</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Alignment requirement (in
 bits) for sub-buffer offsets. The minimum value is the size (in bits) of
 the largest OpenCL built-in data type supported by the device (long16 in
 FULL profile,
 long16 or int16 in EMBEDDED profile) for devices that are
 not of type CL_DEVICE_TYPE_CUSTOM.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_SINGLE_ FP_CONFIG<span class="footnote"><br>[The optional rounding modes should be included as a device capability only if it is supported natively. All explicit
 conversion functions with specific rounding modes must still operate correctly.]<br></span></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_device_
  fp_config</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Describes single precision floating-point
 capability of the device. This is a bit-field
 that describes one or more of the following values:
 CL_FP_DENORM – denorms are supported
 <br>
 <br>
 CL_FP_INF_NAN – INF and quiet NaNs are
 supported.
 <br>
 <br>
 CL_FP_ROUND_TO_NEAREST– round to
 nearest even rounding mode supported
 <br>
 <br>
 CL_FP_ROUND_TO_ZERO – round to zero
 rounding mode supported
 <br>
 <br>
 CL_FP_ROUND_TO_INF – round to positive
 and negative infinity rounding modes
 supported
 <br>
 <br>
 CL_FP_FMA – IEEE754-2008 fused multiply-
 add is supported.
 <br>
 <br>
 CL_FP_CORRECTLY_ROUNDED_DIVIDE
 _SQRT – divide and sqrt are correctly rounded
 as defined by the IEEE754 specification.
 <br>
 <br>
 CL_FP_SOFT_FLOAT – Basic floating-point
 operations (such as addition, subtraction,
 multiplication) are implemented in software.
 <br>
 <br>
 For the full profile, the mandated minimum
 floating-point capability for devices that
 are not of type
 CL_DEVICE_TYPE_CUSTOM is:
 CL_FP_ROUND_TO_NEAREST |
 CL_FP_INF_NAN.
 <br>
 <br>
 For the embedded profile, see section 10.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_DOUBLE_ FP_CONFIG<span class="footnote"><br>[The optional rounding modes should be included as a device capability only if it is supported natively. All explicit
 conversion functions with specific rounding modes must still operate correctly.]<br></span></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_device_
  fp_config</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Describes double precision floating-point
 capability of the OpenCL device. This is a
 bit-field that describes one or more of the
 following values:
 <br>
 <br>
 CL_FP_DENORM – denorms are supported
 <br>
 <br>
 CL_FP_INF_NAN – INF and NaNs are
 supported.
 <br>
 <br>
 CL_FP_ROUND_TO_NEAREST – round to
 nearest even rounding mode supported.
 <br>
 <br>
 CL_FP_ROUND_TO_ZERO – round to zero
 rounding mode supported.
 <br>
 <br>
 CL_FP_ROUND_TO_INF – round to
 positive and negative infinity rounding
 modes supported.
 <br>
 <br>
 CP_FP_FMA – IEEE754-2008 fused
 multiply-add is supported.
 <br>
 <br>
 CL_FP_SOFT_FLOAT – Basic floating-point
 operations (such as addition, subtraction,
 multiplication) are implemented in software.
 Double precision is an optional feature so
 the mandated minimum double precision
 floating-point capability is 0.
 If double precision is supported by the
 device, then the minimum double precision
 floating-point capability must be:
 CL_FP_FMA |
 CL_FP_ROUND_TO_NEAREST |
 CL_FP_INF_NAN |
 CL_FP_DENORM .</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_GLOBAL_ MEM_CACHE_TYPE</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_device_mem_
 cache_type</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Type of global memory cache supported.
 Valid values are:
 CL_NONE,
 CL_READ_ONLY_CACHE and
 CL_READ_WRITE_CACHE .</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_GLOBAL_ MEM_CACHELINE_SIZE</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Size of global memory
 cache line in bytes.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_GLOBAL_ MEM_CACHE_
 SIZE</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_ulong</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Size of global memory cache in
 bytes.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_GLOBAL_ MEM_SIZE</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_ulong</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Size of global device memory in
 bytes.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_MAX_ CONSTANT_BUFFER_SIZE</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_ulong</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Max size in bytes of a constant
 buffer allocation. The minimum value is 64 KB for devices that are not
 of type CL_DEVICE_TYPE_CUSTOM.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_MAX_ CONSTANT_ARGS</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Max number of arguments
 declared with the __constant qualifier in a kernel. The minimum value
 is 8 for devices that are not of type CL_DEVICE_TYPE_CUSTOM.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_MAX_ GLOBAL_VARIABLE_SIZE</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">The maximum number of bytes of storage
 that may be allocated for any single
 variable in program scope or inside a
 function in an OpenCL kernel language declared in the
 global address space.
 <br>
 <br>
 The minimum value is 64 KB.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_GLOBAL_ VARIABLE_PREFERRED_ TOTAL_SIZE</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Maximum
 preferred total size, in bytes, of all program variables in the global
 address space. This is a performance hint. An implementation may place
 such variables in storage with optimized device access. This query
 returns the capacity of such storage. The minimum value is 0.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_LOCAL_ MEM_TYPE</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_device_ local_mem_type</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Type of local memory supported. This can
 be set to CL_LOCAL implying dedicated
 local memory storage such as SRAM , or
 CL_GLOBAL .
 <br>
 <br>
 For custom devices, CL_NONE can also be
 returned indicating no local memory
 support.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_LOCAL_ MEM_SIZE</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_ulong</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Size of local memory region in
 bytes. The minimum value is 32 KB for devices that are not of type
 CL_DEVICE_TYPE_CUSTOM.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_ERROR_ CORRECTION_SUPPORT</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_bool</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Is CL_TRUE if the device implements
 error correction for all accesses to compute device memory (global and
 constant). Is CL_FALSE if the device does not implement such error
 correction.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_PROFILING_ TIMER_RESOLUTION</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Describes the resolution of device
 timer. This is measured in nanoseconds. Refer to <em>section 5.14</em> for
 details.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_ENDIAN_LITTLE</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_bool</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Is CL_TRUE if the OpenCL device is a
 little endian device and CL_FALSE
 otherwise</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_AVAILABLE</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_bool</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Is CL_TRUE if the device is available
 and CL_FALSE otherwise. A device is considered to be available if the
 device can be expected to successfully execute commands enqueued to the
 device.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_COMPILER_ AVAILABLE</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_bool</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Is CL_FALSE if the implementation does
 not have a compiler available to compile
 the program source.
 <br>
 <br>
 Is CL_TRUE if the compiler is available.
 This can be CL_FALSE for the embedded
 platform profile only.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_LINKER_ AVAILABLE</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_bool</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Is CL_FALSE if the implementation does
 not have a linker available.
 Is CL_TRUE if the linker is available.
 <br>
 <br>
 This can be CL_FALSE for the embedded
 platform profile only.
 <br>
 <br>
 This must be CL_TRUE if
 CL_DEVICE_COMPILER_AVAILABLE
 is
 CL_TRUE .</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_EXECUTION_ CAPABILITIES</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_device_exec_ capabilities</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Describes the execution capabilities of the
 device. This is a bit-field that describes
 one or more of the following values:
 <br>
 <br>
 CL_EXEC_KERNEL –
 The OpenCL device
 can execute OpenCL kernels.
 <br>
 <br>
 CL_EXEC_NATIVE_KERNEL – The OpenCL
 device can execute native kernels.
 <br>
 <br>
 The mandated minimum capability is:
 CL_EXEC_KERNEL .</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_QUEUE_ ON_HOST_PROPERTIES<span class="footnote"><br>[CL_DEVICE_QUEUE_PROPERTIES is deprecated and replaced by
 CL_DEVICE_QUEUE_ON_HOST_PROPERTIES.]<br></span></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_command_ queue_properties</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Describes the on host command-queue
 properties supported by the device. This is
 a bit-field that describes one or more of the
 following values:
 <br>
 <br>
 CL_QUEUE_OUT_OF_ORDER_EXEC_
 MODE_ENABLE
 <br>
 <br>
 CL_QUEUE_PROFILING_ENABLE
 <br>
 <br>
 These properties are described in table 5.1.
 <br>
 <br>
 The mandated minimum capability is:
 CL_QUEUE_PROFILING_ENABLE .</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_QUEUE_ ON_DEVICE_PROPERTIES</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_command_ queue_properties</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Describes the on device command-queue properties supported by the device. This is
 a bit-field that describes one or more of the
 following values:
 <br>
 <br>
 CL_QUEUE_OUT_OF_ORDER_EXEC_
 MODE_ENABLE
 <br>
 <br>
 CL_QUEUE_PROFILING_ENABLE
 <br>
 <br>
 These properties are described in table 5.1.
 The mandated minimum capability is:
 CL_QUEUE_OUT_OF_ORDER_EXEC_
 MODE_ENABLE |
 CL_QUEUE_PROFILING_ENABLE .</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_QUEUE_ ON_DEVICE_PREFERRED_ SIZE</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">The size of the device queue in bytes
 preferred by the implementation.
 Applications should use this size for the
 device queue to ensure good performance.
 <br>
 <br>
 The minimum value is 16 KB</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_QUEUE_ ON_DEVICE_MAX_SIZE</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">The max. size of the device queue in bytes.
 <br>
 <br>
 The minimum value is 256 KB for the full
 profile and 64 KB for the embedded profile</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_MAX_ ON_DEVICE_QUEUES</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">The maximum number of device queues
 that can be created for this device in a
 single context.
 <br>
 <br>
 The minimum value is 1.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_MAX_ ON_DEVICE_EVENTS</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">The maximum number of events in use by
 a device queue. These refer to events
 returned by the enqueue_ built-in
 functions to a device queue or user events
 returned by the create_user_event
 built-in function that have not been
 released.
 <br>
 <br>
 The minimum value is 1024.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_BUILT_IN_ KERNELS</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">char[]</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">A semi-colon separated list of
 built-in kernels supported by the device. An empty string is returned
 if no built-in kernels are supported by the device.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_PLATFORM</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_platform_id</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">The platform associated with this
 device.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_NAME</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">char[]</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Device name string.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_VENDOR</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">char[]</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Vendor name string.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DRIVER_VERSION</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">char[]</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">OpenCL software driver version string.
 Follows a vendor-specific format.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_PROFILE<span class="footnote"><br>[The platform profile returns the profile that is implemented by the OpenCL framework. If the platform profile
 returned is FULL_PROFILE, the OpenCL framework will support devices that are FULL_PROFILE and may also
 support devices that are EMBEDDED_PROFILE. The compiler must be available for all devices i.e.
 CL_DEVICE_COMPILER_AVAILABLE is CL_TRUE. If the platform profile returned is
 EMBEDDED_PROFILE, then devices that are only EMBEDDED_PROFILE are supported.]<br></span></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">char[]</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">[OpenCL profile string. Returns the profile
 name supported by the device. The profile
 name returned can be one of the following
 strings:
 <br>
 <br>
 FULL_PROFILE – if the device supports
 the OpenCL specification (functionality
 defined as part of the core specification and
 does not require any extensions to be
 supported).
 <br>
 <br>
 EMBEDDED_PROFILE - if the device supports the OpenCL embedded profile.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_VERSION</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">char[]</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">OpenCL version string. Returns the
 OpenCL version supported by the device.
 This version string has the following
 format:
 <br>
 <br>
 <em>OpenCL&lt;space&gt;&lt;major_version.minor_v
 ersion&gt;&lt;space&gt;&lt;vendor-specific
 information&gt;</em>
 <br>
 <br>
 The major_version.minor_version value
 returned will be 2.2.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_OPENCL_C_ VERSION</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">char[]</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">OpenCL C version string. Returns the
 highest OpenCL C version supported by
 the compiler for this device that is not of
 type CL_DEVICE_TYPE_CUSTOM . This
 version string has the following format:
 <br>
 <br>
 <em>OpenCL&lt;space&gt;C&lt;space&gt;&lt;major_versio
 n.minor_version&gt;&lt;space&gt;&lt;vendor-
 specific information&gt;</em>
 <br>
 <br>
 The major_version.minor_version value
 returned must be 2.0 if
 CL_DEVICE_VERSION is OpenCL 2.0.
 <br>
 <br>
 The major_version.minor_version value
 returned must be 1.2 if
 CL_DEVICE_VERSION is OpenCL 1.2.
 <br>
 <br>
 The major_version.minor_version value
 returned must be 1.1 if
 CL_DEVICE_VERSION is OpenCL 1.1.
 <br>
 <br>
 The major_version.minor_version value
 returned can be 1.0 or 1.1 if
 CL_DEVICE_VERSION is OpenCL 1.0.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_EXTENSIONS</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">char[]</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Returns a space separated list of extension
 names (the extension names themselves do
 not contain any spaces) supported by the
 device. The list of extension names
 returned can be vendor supported extension
 names and one or more of the following
 Khronos approved extension names:
 <br>
 <br>
 <strong>cl_khr_int64_base_atomics
 cl_khr_int64_extended_atomics
 cl_khr_fp16
 cl_khr_gl_sharing
 cl_khr_gl_event
 cl_khr_d3d10_sharing
 cl_khr_dx9_media_sharing
 cl_khr_d3d11_sharing
 cl_khr_gl_depth_images
 cl_khr_gl_msaa_sharing
 cl_khr_initialize_memory
 cl_khr_terminate_context
 cl_khr_spir
 cl_khr_srgb_image_writes</strong>
 <br>
 <br>
 <strong>The following approved Khronos extension
 names must be returned by all devices that
 support OpenCL C 2.0:</strong>
 <br>
 <br>
 <strong>cl_khr_byte_addressable_store
 cl_khr_fp64 (for backward compatibility if
 double precision is supported)
 cl_khr_3d_image_writes
 cl_khr_image2d_from_buffer
 cl_khr_depth_images</strong>
 <br>
 <br>
 Please refer to the OpenCL 2.0 Extension
 Specification for a detailed description of
 these extensions.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_PRINTF_ BUFFER_SIZE</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Maximum size in bytes of the
 internal buffer that holds the output of printf calls from a kernel.
 The minimum value for the FULL profile is 1 MB.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_PREFERRED_ INTEROP_USER_SYNC</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_bool</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Is CL_TRUE if the devices preference is for the
 user to be responsible for synchronization, when sharing memory objects
 between OpenCL and other APIs such as DirectX, CL_FALSE if the device /
 implementation has a performant path for performing synchronization of
 memory object shared between OpenCL and other APIs such as DirectX.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_PARENT_ DEVICE</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_device_id</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Returns the cl_device_id of
 the parent device to which this sub-device belongs. If <em>device</em> is a
 root-level device, a NULL value is returned.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_PARTITION_ MAX_SUB_DEVICES</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Returns the maximum number of sub-
 devices that can be created when a device
 is partitioned.
 <br>
 <br>
 The value returned cannot exceed
 CL_DEVICE_MAX_COMPUTE_UNITS .</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_PARTITION_ PROPERTIES</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_device_partition_ property[]</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Returns the list of partition types supported
 by <em>device</em>. The is an array of
 cl_device_partition_property values drawn
 from the following list:
 <br>
 <br>
 CL_DEVICE_PARTITION_EQUALLY
 CL_DEVICE_PARTITION_BY_COUNTS
 CL_DEVICE_PARTITION_BY_AFFINITY_DOMAIN
 <br>
 <br>
 If the device cannot be partitioned (i.e.
 there is no partitioning scheme supported
 by the device that will return at least two
 subdevices), a value of 0 will be returned.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_PARTITION_ AFFINITY_DOMAIN</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_device_affinity_ domain</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Returns the list of supported affinity
 domains for partitioning the device using
 CL_DEVICE_PARTITION_BY_AFFINITY_DOMAIN .
 This is a bit-field that describes one or
 more of the following values:
 <br>
 <br>
 CL_DEVICE_AFFINITY_DOMAIN_NUMA
 CL_DEVICE_AFFINITY_DOMAIN_L4_CACHE
 CL_DEVICE_AFFINITY_DOMAIN_L3_CACHE
 CL_DEVICE_AFFINITY_DOMAIN_L2_CACHE
 CL_DEVICE_AFFINITY_DOMAIN_L1_CACHE
 CL_DEVICE_AFFINITY_DOMAIN_NEXT_PARTITI
 ONABLE
 <br>
 <br>
 If the device does not support any affinity
 domains, a value of 0 will be returned.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_PARTITION_ TYPE</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_device_partition_ property[]</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Returns the properties argument specified
 in <strong>clCreateSubDevices</strong> if device is a sub-
 device. In the case where the properties
 argument to <strong>clCreateSubDevices</strong> is
 CL_DEVICE_PARTITION_BY_AFFINITY_DOMAIN ,
 CL_DEVICE_AFFINITY_DOMAIN_NEXT_PARTITI
 ONABLE , the affinity domain used to
 perform the partition will be returned. This
 can be one of the following values:
 <br>
 <br>
 CL_DEVICE_AFFINITY_DOMAIN_NUMA
 CL_DEVICE_AFFINITY_DOMAIN_L4_CACHE
 CL_DEVICE_AFFINITY_DOMAIN_L3_CACHE
 CL_DEVICE_AFFINITY_DOMAIN_L2_CACHE
 CL_DEVICE_AFFINITY_DOMAIN_L1_CACHE
 <br>
 <br>
 Otherwise the implementation may either
 return a <em>param_value_size_ret</em> of 0 i.e.
 there is no partition type associated with
 device or can return a property value of 0
 (where 0 is used to terminate the partition
 property list) in the memory that
 <em>param_value</em> points to.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_REFERENCE_ COUNT</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Returns the <em>device</em> reference
 count. If the device is a root-level device, a reference count of one
 is returned.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_SVM_ CAPABILITIES</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_device_svm_ capabilities</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Describes the various shared virtual
 memory (a.k.a. SVM) memory allocation
 types the device supports. Coarse-grain
 SVM allocations are required to be
 supported by all OpenCL 2.0 devices. This
 is a bit-field that describes a combination
 of the following values:
 <br>
 <br>
 CL_DEVICE_SVM_COARSE_GRAIN_
 BUFFER – Support for coarse-grain buffer
 sharing using <strong>clSVMAlloc</strong>. Memory
 consistency is guaranteed at
 synchronization points and the host must
 use calls to <strong>clEnqueueMapBuffer</strong> and
 <strong>clEnqueueUnmapMemObject</strong>.
 <br>
 <br>
 CL_DEVICE_SVM_FINE_GRAIN_BUFFER
 – Support for fine-grain buffer sharing
 using <strong>clSVMAlloc</strong>. Memory consistency
 is guaranteed at synchronization points
 without need for <strong>clEnqueueMapBuffer</strong>
 and <strong>clEnqueueUnmapMemObject</strong>.
 <br>
 <br>
 CL_DEVICE_SVM_FINE_GRAIN_SYSTEM
 – Support for sharing the host’s entire
 virtual memory including memory
 allocated using <strong>malloc</strong>. Memory
 consistency is guaranteed at
 synchronization points.
 <br>
 <br>
 CL_DEVICE_SVM_ATOMICS – Support
 for the OpenCL 2.0 atomic operations that
 provide memory consistency across the
 host and all OpenCL devices supporting
 fine-grain SVM allocations.
 <br>
 <br>
 The mandated minimum capability is
 CL_DEVICE_SVM_COARSE_GRAIN_BUFFER.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_PREFERRED_ PLATFORM_ATOMIC_ ALIGNMENT</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Returns the value representing the
 preferred alignment in bytes for OpenCL 2.0 fine-grained SVM atomic
 types. This query can return 0 which indicates that the preferred
 alignment is aligned to the natural size of the type.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_PREFERRED_ GLOBAL_ATOMIC_ ALIGNMENT</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Returns the value representing the
 preferred alignment in bytes for OpenCL 2.0 atomic types to global
 memory. This query can return 0 which indicates that the preferred
 alignment is aligned to the natural size of the type.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_PREFERRED_ LOCAL_ATOMIC_ ALIGNMENT</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Returns the
 value representing the preferred alignment in bytes for OpenCL 2.0
 atomic types to local memory. This query can return 0 which indicates
 that the preferred alignment is aligned to the natural size of the type.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_MAX_ NUM_SUB_GROUPS</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Maximum number of sub-groups
 in a work-group that a device is capable of executing on a single
 compute unit, for any given kernel-instance running on the device. The
 minimum value is 1. (Refer also to <strong>clGetKernelSubGroupInfo</strong>.)</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEVICE_SUB_ GROUP_INDEPENDENT_ FORWARD_PROGRESS</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_bool</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Is CL_TRUE
 if this device supports independent forward progress of sub-groups,
 CL_FALSE otherwise. If cl_khr_subgroups is supported by the device this
 must return CL_TRUE.</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p>The device queries described in <em>table 4.3</em> should return the same
 information for a root-level device i.e. a device returned by
 <strong>clGetDeviceIDs</strong> and any sub-devices created from this device except for
 the following queries:</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="literalblock">
 <div class="content monospaced">
 <pre>CL_DEVICE_GLOBAL_MEM_CACHE_SIZE
 CL_DEVICE_BUILT_IN_KERNELS
 CL_DEVICE_PARENT_DEVICE
 CL_DEVICE_PARTITION_TYPE
 CL_DEVICE_REFERENCE_COUNT</pre>
 </div></div>
 <div class="paragraph"><p><strong>clGetDeviceInfo</strong> returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors:</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 CL_INVALID_DEVICE if
 <em>device</em> is not valid.
 </p>
 </li>
 <li>
 <p>
 CL_INVALID_VALUE if
 <em>param_name</em> is not one of the supported values or if size in bytes
 specified by <em>param_value_size_is &lt; size of return type as specified in
 _table 4.3</em> and <em>param_value</em> is not a NULL value or if <em>param_name</em> is
 a value that is available as an extension and the corresponding
 extension is not supported by the device.
 </p>
 </li>
 <li>
 <p>
 CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
 CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clGetDeviceAndHostTimer(cl_device_id device,
                                cl_ulong* device_timestamp,
                                cl_ulong* host_timestamp)</pre>
 </div></div>
 <div class="paragraph"><p>Returns a reasonably synchronized pair of timestamps from the device
 timer and the host timer as seen by <em>device</em>. Implementations may need
 to execute this query with a high latency in order to provide reasonable
 synchronization of the timestamps. The host timestamp and device
 timestamp returned by this function and <strong>clGetHostTimer</strong> each have an
 implementation deﬁned timebase. The timestamps will always be in their
 respective timebases regardless of which query function is used. The
 timestamp returned from <strong>clGetEventProﬁlingInfo</strong> for an event on a
 device and a device timestamp queried from the same device will always
 be in the same timebase.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p><em>device</em> is a device returned by <strong>clGetDeviceIDs</strong>.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p><em>device_timestamp</em> will be updated with the value of the device timer in
 nanoseconds. The resolution of the timer is the same as the device
 profiling timer returned by *clGetDeviceInfo*and the
 CL_DEVICE_PROFILING_TIMER_RESOLUTION query.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p><em>host_timestamp</em> will be updated with the value of the host timer in
 nanoseconds at the closest possible point in time to that at which
 <em>device_timer</em> was returned. The resolution of the timer may be queried
 via <strong>clGetPlatformInfo</strong> and the flag CL_PLATFORM_HOST_TIMER_RESOLUTION.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p><strong>clGetDeviceAndHostTimer</strong> will return CL_SUCCESS with a time value in
 <em>host_timestamp</em> if provided. Otherwise, it returns one of the following
 errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 CL_INVALID_DEVICE if
 <em>device</em> is not a valid OpenCL device.
 </p>
 </li>
 <li>
 <p>
 CL_INVALID_VALUE if
 <em>host_timestamp</em> or _device_timestamp_is NULL.
 </p>
 </li>
 <li>
 <p>
 CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
 CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clGetHostTimer(cl_device_id device,
                       cl_ulong* host_timestamp)</pre>
 </div></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>Return the current value of the host clock as seen by <em>device</em>. This
 value is in the same timebase as the host_timestamp returned from
 <strong>clGetDeviceAndHostTimer</strong>. The implementation will return with as low a
 latency as possible to allow a correlation with a subsequent application
 sampled time. The host timestamp and device timestamp returned by this
 function and <strong>clGetDeviceAndHostTimer</strong> each have an implementation
 defined timebase. The timestamps will always be in their respective
 timebases regardless of which query function is used. The timestamp
 returned from <strong>clGetEventProfilingInfo</strong> for an event on a device and a
 device timestamp queried from the same device will always be in the same
 timebase.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p><em>device</em> is a device returned by <strong>clGetDeviceIDs</strong>.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p><em>host_timestamp</em> will be updated with the value of the current timer in
 nanoseconds. The resolution of the timer may be queried via
 <strong>clGetPlatformInfo</strong> and the flag CL_PLATFORM_HOST_TIMER_RESOLUTION.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p><strong>clGetHostTimer</strong> will return CL_SUCCESS with a time value in
 <em>host_timestamp</em> if provided. Otherwise, it returns one of the following
 errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 CL_INVALID_DEVICE if
 <em>device</em> is not a valid OpenCL device.
 </p>
 </li>
 <li>
 <p>
 CL_INVALID_VALUE if
 _host_timestamp_is NULL.
 </p>
 </li>
 <li>
 <p>
 CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
 CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 </div>
 <div class="sect2">
 <h3 id="_partitioning_a_device">4.3. Partitioning a Device</h3>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clCreateSubDevices(cl_device_id in_device,
                           const cl_device_partition_property *properties,
                           cl_uint num_devices,
                           cl_device_id *out_devices,
                           cl_uint *num_devices_ret)</pre>
 </div></div>
 <div class="paragraph"><p>creates an array of sub-devices that each reference a non-intersecting
 set of compute units within in_device, according to a partition scheme
 given by <em>properties</em>. The output sub-devices may be used in every way
 that the root (or parent) device can be used, including creating
 contexts, building programs, further calls to <strong>clCreateSubDevices</strong> and
 creating command-queues. When a command-queue is created against a
 sub-device, the commands enqueued on the queue are executed only on the
 sub-device.</p></div>
 <div class="paragraph"><p><em>in_device</em> is the device to be partitioned.</p></div>
 <div class="paragraph"><p><em>properties</em> specifies how <em>in_device</em> is to be partition described by a
 partition name and its corresponding value. Each partition name is
 immediately followed by the corresponding desired value. The list is
 terminated with 0. The list of supported partitioning schemes is
 described in <em>table 4.4</em>. Only one of the listed partitioning schemes
 can be specified in <em>properties</em>.</p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <caption class="title">Table 4. <em>List of supported partition schemes by</em> <strong>clCreateSubDevices</strong></caption>
 <col style="width:30%;">
 <col style="width:20%;">
 <col style="width:50%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>cl_device_partition_property enum</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Partition value</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Description</strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_DEVICE_PARTITION_ EQUALLY</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Split the aggregate device
 into as many smaller aggregate devices as can be created, each
 containing <em>n</em> compute units. The value <em>n</em> is passed as the value
 accompanying this property. If <em>n</em> does not divide evenly into
 CL_DEVICE_PARTITION_MAX_COMPUTE_UNITS, then the remaining compute units
 are not used.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_DEVICE_PARTITION_ BY_COUNTS</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">This property is followed by a
 CL_DEVICE_PARTITION_BY_COUNTS_LIST_END
 terminated list of compute unit counts. For each non-
 zero count <em>m</em> in the list, a sub-device is created with
 <em>m</em> compute units in it.
 CL_DEVICE_PARTITION_BY_COUNTS_LIST_END
 is defined to be 0.
 <br>
 <br>
 The number of non-zero count entries in the list may
 not exceed
 CL_DEVICE_PARTITION_MAX_SUB_DEVICES.
 <br>
 <br>
 The total number of compute units specified may not
 exceed
 CL_DEVICE_PARTITION_MAX_COMPUTE_UNITS .</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_DEVICE_PARTITION_ BY_AFFINITY_DOMAIN</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_device_affinity_ domain</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Split the device into smaller aggregate devices
 containing one or more compute units that all share
 part of a cache hierarchy. The value accompanying
 this property may be drawn from the following list:
 <br>
 <br>
 CL_DEVICE_AFFINITY_DOMAIN_NUMA – Split the
 device into sub-devices comprised of compute units
 that share a NUMA node.
 <br>
 <br>
 CL_DEVICE_AFFINITY_DOMAIN_L4_CACHE –
 Split the device into sub-devices comprised of
 compute units that share a level 4 data cache.
 <br>
 <br>
 CL_DEVICE_AFFINITY_DOMAIN_L3_CACHE –
 Split the device into sub-devices comprised of
 compute units that share a level 3 data cache.
 <br>
 <br>
 CL_DEVICE_AFFINITY_DOMAIN_L2_CACHE –
 Split the device into sub-devices comprised of
 compute units that share a level 2 data cache.
 <br>
 <br>
 CL_DEVICE_AFFINITY_DOMAIN_L1_CACHE –
 Split the device into sub-devices comprised of
 compute units that share a level 1 data cache.
 <br>
 <br>
 CL_DEVICE_AFFINITY_ DOMAIN_NEXT_ PARTITIO
 NABLE – Split the device along the next partitionable
 affinity domain. The implementation shall find the
 first level along which the device or sub-device may
 be further subdivided in the order NUMA, L4, L3,
 L2, L1, and partition the device into sub-devices
 comprised of compute units that share memory
 subsystems at this level.
 <br>
 <br>
 The user may determine what happened by calling
 clGetDeviceInfo( CL_DEVICE_PARTITION_TYPE )
 on the sub-devices.</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p><em>num_devices</em> is the size of memory pointed to by <em>out_devices</em>
 specified as the number of cl_device_id entries.</p></div>
 <div class="paragraph"><p><em>out_devices</em> is the buffer where the OpenCL sub-devices will be
 returned. If <em>out_devices_is NULL, this argument is ignored. If
 _out_devices</em> is not NULL, <em>num_devices</em> must be greater than or equal
 to the number of sub-devices that <em>device</em> may be partitioned into
 according to the partitioning scheme specified in <em>properties</em>.</p></div>
 <div class="paragraph"><p><em>num_devices_ret</em> returns the number of sub-devices that <em>device</em> may be
 partitioned into according to the partitioning scheme specified in
 <em>properties</em>. If <em>num_devices_ret</em> is NULL, it is ignored.</p></div>
 <div class="paragraph"><p><strong>clCreateSubDevices</strong> returns CL_SUCCESS if the partition is created
 successfully. Otherwise, it returns a NULL value with the following
 error values returned in <em>errcode_ret</em>:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 CL_INVALID_DEVICE if
 <em>in_device</em> is not valid.
 </p>
 </li>
 <li>
 <p>
 CL_INVALID_VALUE if values
 specified in <em>properties</em> are not valid or if values specified in
 <em>properties</em> are valid but not supported by the device.
 </p>
 </li>
 <li>
 <p>
 CL_INVALID_VALUE if
 <em>out_devices</em> is not NULL and <em>num_devices</em> is less than the number of
 sub-devices created by the partition scheme.
 </p>
 </li>
 <li>
 <p>
 CL_DEVICE_PARTITION_FAILED
 if the partition name is supported by the implementation but in_device
 could not be further partitioned.
 </p>
 </li>
 <li>
 <p>
 CL_INVALID_DEVICE_PARTITION_COUNT if the partition name specified in
 <em>properties</em> is CL_DEVICE_PARTITION_BY_COUNTS and the number of
 sub-devices requested exceeds CL_DEVICE_PARTITION_MAX_SUB_DEVICES or the
 total number of compute units requested exceeds
 CL_DEVICE_PARTITION_MAX_COMPUTE_UNITS for <em>in_device</em>, or the number of
 compute units requested for one or more sub-devices is less than zero or
 the number of sub-devices requested exceeds
 CL_DEVICE_PARTITION_MAX_COMPUTE_UNITS for <em>in_device</em>.
 </p>
 </li>
 <li>
 <p>
 CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
 CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>A few examples that describe how to specify partition properties in
 <em>properties</em> argument to <strong>clCreateSubDevices</strong> are given below:</p></div>
 <div class="paragraph"><p>To partition a device containing 16 compute units into two sub-devices,
 each containing 8 compute units, pass the following in <em>properties</em>:</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>{ CL_DEVICE_PARTITION_EQUALLY, 8, 0 }</pre>
 </div></div>
 <div class="paragraph"><p>To partition a device with four compute units into two sub-devices with
 one sub-device containing 3 compute units and the other sub-device 1
 compute unit, pass the following in properties argument:</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>{ CL_DEVICE_PARTITION_BY_COUNTS,
   3, 1, CL_DEVICE_PARTITION_BY_COUNTS_LIST_END, 0 }</pre>
 </div></div>
 <div class="paragraph"><p>To split a device along the outermost cache line (if any), pass the
 following in properties argument:</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>{ CL_DEVICE_PARTITION_BY_AFFINITY_DOMAIN,
   CL_DEVICE_AFFINITY_DOMAIN_NEXT_PARTITIONABLE,
   0 }</pre>
 </div></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clRetainDevice(cl_device_id device)</pre>
 </div></div>
 <div class="paragraph"><p>increments the <em>device</em> reference count if <em>device</em> is a valid
 sub-device created by a call to <strong>clCreateSubDevices</strong>. If <em>device</em> is a
 root level device i.e. a cl_device_id returned by <strong>clGetDeviceIDs</strong>, the
 <em>device</em> reference count remains unchanged. <strong>clRetainDevice</strong> returns
 CL_SUCCESS if the function is executed successfully or the device is a
 root-level device. Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 CL_INVALID_DEVICE if
 <em>device</em> is not a valid sub-device created by a call to
 <strong>clCreateSubDevices</strong>.
 </p>
 </li>
 <li>
 <p>
 CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.

 </p>
 </li>
 <li>
 <p>
 CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clReleaseDevice(cl_device_id device)</pre>
 </div></div>
 <div class="paragraph"><p>decrements the <em>device</em> reference count if device is a valid sub-device
 created by a call to <strong>clCreateSubDevices</strong>. If <em>device</em> is a root level
 device i.e. a cl_device_id returned by <strong>clGetDeviceIDs</strong>, the <em>device</em>
 reference count remains unchanged. <strong>clReleaseDevice</strong> returns CL_SUCCESS
 if the function is executed successfully. Otherwise, it returns one of
 the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 CL_INVALID_DEVICE if <em>device</em> is not a valid sub-device created by a call to
 <strong>clCreateSubDevices</strong>.
 </p>
 </li>
 <li>
 <p>
 CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
 CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>After the <em>device</em> reference count becomes zero and all the objects
 attached to <em>device</em> (such as command-queues) are released, the <em>device</em>
 object is deleted. Using this function to release a reference that was
 not obtained by creating the object or by calling <strong>clRetainDevice</strong>
 causes undefined behavior.</p></div>
 </div>
 <div class="sect2">
 <h3 id="_contexts">4.4. Contexts</h3>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_context clCreateContext(const cl_context_properties *properties,
                            cl_uint num_devices,
                            const cl_device_id *devices,
                            void(CL_CALLBACK *pfn_notify)
                                (const char *errinfo,
                                const void *private_info,
                                size_t cb,
                                void *user_data),
                            void *user_data,
                            cl_int *errcode_ret)</pre>
 </div></div>
 <div class="paragraph"><p>creates an OpenCL context. An OpenCL context is created with one or
 more devices. Contexts are used by the OpenCL runtime for managing
 objects such as command-queues, memory, program and kernel objects and
 for executing kernels on one or more devices specified in the context.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p><em>properties_specifies a list of context property names and their
 corresponding values. Each property name is immediately followed by the
 corresponding desired value. The list is terminated with 0. The list of
 supported properties is described in _table 4.5.</em> <em>properties</em> can be
 NULL in which case the platform that is selected is
 implementation-defined.</p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <caption class="title">Table 5. <em>List of supported properties by</em> <strong>clCreateContext</strong></caption>
 <col style="width:34%;">
 <col style="width:33%;">
 <col style="width:33%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>cl_context_properties enum</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Property value</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Description</strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_CONTEXT_PLATFORM</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_platform_id</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Specifies the platform to use.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_CONTEXT_INTEROP_ USER_SYNC</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_bool</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Specifies whether the user is
 responsible for synchronization
 between OpenCL and other APIs.
 Please refer to the specific sections
 in the OpenCL 2.0 extension
 specification that describe sharing
 with other APIs for restrictions on
 using this flag.
 <br>
 <br>
 If CL_CONTEXT_INTEROP_USER_
 SYNC is not specified, a default of
 CL_FALSE is assumed.</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p><em>num_devices</em> is the number of devices specified in the <em>devices</em>
 argument.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p><em>devices</em> is a pointer to a list of unique deviceslink<span class="footnote"><br>[Duplicate devices specified in <em>devices</em> are ignored.]<br></span>
 returned by <strong>clGetDeviceIDs</strong> or sub-devices created by
 <strong>clCreateSubDevices</strong> for a platform.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p><em>pfn_notify</em> is a callback function that can be registered by the
 application. This callback function will be used by the OpenCL
 implementation to report information on errors during context creation
 as well as errors that occur at runtime in this context. This callback
 function may be called asynchronously by the OpenCL implementation. It
 is the applications responsibility to ensure that the callback function
 is thread-safe. The parameters to this callback function are:</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 <em>errinfo</em> is a pointer to
 an error string.
 </p>
 </li>
 <li>
 <p>
 <em>private_info</em> and <em>cb</em>
 represent a pointer to binary data that is returned by the OpenCL
 implementation that can be used to log additional information helpful in
 debugging the error.
 </p>
 </li>
 <li>
 <p>
 <em>user_data</em> is a pointer
 to user supplied data.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>If <em>pfn_notify</em> is NULL, no callback function is registered.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="admonitionblock">
 <table><tr>
 <td class="icon">
 <div class="title">Note</div>
 </td>
 <td class="content">There are a number of cases where error notifications need to be
 delivered due to an error that occurs outside a context. Such
 notifications may not be delivered through the <em>pfn_notify</em> callback.
  Where these notifications go is implementation-defined.</td>
 </tr></table>
 </div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p><em>user_data</em> will be passed as the <em>user_data</em> argument when <em>pfn_notify</em>
 is called. <em>user_data</em> can be NULL.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p><em>errcode_ret</em> will return an appropriate error code. If <em>errcode_ret</em>
 is NULL, no error code is returned.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p><strong>clCreateContext</strong> returns a valid non-zero context and <em>errcode_ret</em> is
 set to CL_SUCCESS if the context is created successfully. Otherwise, it
 returns a NULL value with the following error values returned in
 <em>errcode_ret</em>:</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 CL_INVALID_PLATFORM if
 <em>properties_is NULL and no platform could be selected or if platform
 value specified in _properties</em> is not a valid platform.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p> </p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 CL_INVALID_PROPERTY if
 context property name in <em>properties</em> is not a supported property name,
 if the value specified for a supported property name is not valid, or if
 the same property name is specified more than once.
 </p>
 </li>
 <li>
 <p>
 CL_INVALID_VALUE if
 _devices_is NULL.
 </p>
 </li>
 <li>
 <p>
 CL_INVALID_VALUE if
 _num_devices_is equal to zero.
 </p>
 </li>
 <li>
 <p>
 CL_INVALID_VALUE if
 <em>pfn_notify</em> is NULL but <em>user_data</em> is not NULL.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p> </p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 CL_INVALID_DEVICE if
 <em>devices</em> contains an invalid device.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p> </p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 CL_DEVICE_NOT_AVAILABLE if
 a device in <em>devices</em> is currently not available even though the device
 was returned by <strong>clGetDeviceIDs</strong>.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p> </p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p> </p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The function<span class="footnote"><br>[<strong>clCreateContextfromType</strong> may return all or a subset of the actual physical devices present in the platform and
 that match device_type.]<br></span></p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_context clCreateContextFromType(const cl_context_properties *properties,
                                    cl_device_type device_type,
                                    void(CL_CALLBACK *pfn_notify)
                                        (const char *errinfo,
                                         const void *private_info,
                                         size_t cb,
                                         void *user_data),
                                    void *user_data,
                                    cl_int *errcode_ret)</pre>
 </div></div>
 <div class="paragraph"><p>creates an OpenCL context from a device type that identifies the
 specific device(s) to use. Only devices that are returned by
 <strong>clGetDeviceIDs</strong> for <em>device_type</em> are used to create the context. The
 context does not reference any sub-devices that may have been created
 from these devices.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p><em>properties_specifies a list of context property names and their
 corresponding values. Each property name is immediately followed by the
 corresponding desired value. The list of supported properties is
 described in _table 4.5</em>. <em>properties</em> can also be NULL in which case
 the platform that is selected is implementation-defined.</p></div>
 <div class="paragraph"><p><em>device_type</em> is a bit-field that identifies the type of device and is
 described in <em>table 4.2</em> in <em>section 4.2</em>.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p><em>pfn_notify</em> and <em>user_data</em> are described in <strong>clCreateContext</strong>.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p><em>errcode_ret</em> will return an appropriate error code. If <em>errcode_ret</em>
 is NULL, no error code is returned.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p><strong>clCreateContextFromType</strong> returns a valid non-zero context and
 <em>errcode_ret</em> is set to CL_SUCCESS if the context is created
 successfully. Otherwise, it returns a NULL value with the following
 error values returned in <em>errcode_ret</em>:</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 CL_INVALID_PLATFORM if
 <em>properties_is NULL and no platform could be selected or if platform
 value specified in _properties</em> is not a valid platform.
 </p>
 </li>
 <li>
 <p>
 CL_INVALID_PROPERTY if
 context property name in <em>properties</em> is not a supported property name,
 if the value specified for a supported property name is not valid, or if
 the same property name is specified more than once.
 </p>
 </li>
 <li>
 <p>
 CL_INVALID_VALUE if
 <em>pfn_notify</em> is NULL but <em>user_data</em> is not NULL.
 </p>
 </li>
 <li>
 <p>
 CL_INVALID_DEVICE_TYPE if
 <em>device_type</em> is not a valid value.
 </p>
 </li>
 <li>
 <p>
 CL_DEVICE_NOT_AVAILABLE if
 no devices that match <em>device_type</em> and property values specified in
 <em>properties</em> are currently available.
 </p>
 </li>
 <li>
 <p>
 CL_DEVICE_NOT_FOUND if no
 devices that match <em>device_type</em> and property values specified in
 <em>properties</em> were found.
 </p>
 </li>
 <li>
 <p>
 CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
 CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clRetainContext(cl_context context)</pre>
 </div></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>increments the <em>context</em> reference count. <strong>clRetainContext</strong> returns
 CL_SUCCESS if the function is executed successfully. Otherwise, it
 returns one of the following errors:</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 CL_INVALID_CONTEXT if
 <em>context</em> is not a valid OpenCL context.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p> </p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p> </p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p><strong>clCreateContext*and*clCreateContextFromType</strong> perform an implicit
 retain. This is very helpful for 3<sup>rd</sup> party libraries, which typically
 get a context passed to them by the application. However, it is
 possible that the application may delete the context without informing
 the library. Allowing functions to attach to (i.e. retain) and release
 a context solves the problem of a context being used by a library no
 longer being valid.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clReleaseContext(cl_context context)</pre>
 </div></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>decrements the <em>context</em> reference count. <strong>clReleaseContext</strong> returns
 CL_SUCCESS if the function is executed successfully. Otherwise, it
 returns one of the following errors:</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 CL_INVALID_CONTEXT if
 <em>context</em> is not a valid OpenCL context.
 </p>
 </li>
 <li>
 <p>
 CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
 CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>After the <em>context</em> reference count becomes zero and all the objects
 attached to <em>context</em> (such as memory objects, command-queues) are
 released, the <em>context</em> is deleted. Using this function to release a
 reference that was not obtained by creating the object or by calling
 *clRetainContext*causes undefined behavior.
 The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clGetContextInfo(cl_context context,
                         cl_context_info param_name,
                         size_t param_value_size,
                         void *param_value,
                         size_t *param_value_size_ret)</pre>
 </div></div>
 <div class="paragraph"><p>can be used to query information about a context.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p><em>context</em> specifies the OpenCL context being queried.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p><em>param_name</em> is an enumeration constant that specifies the information
 to query.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p><em>param_value</em> is a pointer to memory where the appropriate result being
 queried is returned. If <em>param_value</em> is NULL, it is ignored.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p><em>param_value_size</em> specifies the size in bytes of memory pointed to by
 <em>param_value</em>. This size must be greater than or equal to the size of
 return type as described in <em>table 4.6</em>.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p><em>param_value_size_ret</em> returns the actual size in bytes of data being
 queried by <em>param_name</em>. If <em>param_value_size_ret</em> is NULL, it is
 ignored.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>The list of supported <em>param_name_values and the information returned in
 _param_value</em> by <strong>clGetContextInfo</strong> is described in <em>table 4.6</em>.</p></div>
 <div class="paragraph"><div class="title">List of supported param_names</div><p>by <strong>clGetContextInfo</strong></p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <col style="width:34%;">
 <col style="width:33%;">
 <col style="width:33%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>cl_context_info</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Return Type</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Information returned in param_value</strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_CONTEXT_ REFERENCE_COUNT</strong> <span class="footnote"><br>[The reference count returned should be considered immediately stale. It is unsuitable for general use in
 applications. This feature is provided for identifying memory leaks.]<br></span></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return the <em>context</em> reference
 count.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_CONTEXT_NUM_DEVICES</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return the number of devices in <em>context</em>.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_CONTEXT_DEVICES</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_device_id[]</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return the list of devices and
 sub-devices in <em>context</em>.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_CONTEXT_PROPERTIES</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_context_properties[]</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return the properties argument
 specified in <strong>clCreateContext</strong> or
 <strong>clCreateContextFromType</strong>.
 <br>
 <br>
 If the <em>properties</em> argument specified
 in <strong>clCreateContext</strong> or
 <strong>clCreateContextFromType</strong> used
 to create <em>context</em> is not NULL , the
 implementation must return the
 values specified in the properties
 argument.
 <br>
 <br>
 If the <em>properties</em> argument specified
 in <strong>clCreateContext</strong> or
 <strong>clCreateContextFromType</strong> used
 to create <em>context</em> is NULL , the
 implementation may return either a
 <em>param_value_size_ret</em> of 0 i.e. there
 is no context property value to be
 returned or can return a context
 property value of 0 (where 0 is used
 to terminate the context properties
 list) in the memory that
 <em>param_value</em> points to.</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p><strong>clGetContextInfo</strong> returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors:</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 CL_INVALID_CONTEXT if
 <em>context</em> is not a valid context.
 </p>
 </li>
 <li>
 <p>
 CL_INVALID_VALUE if
 <em>param_name</em> is not one of the supported values or if size in bytes
 specified by <em>param_value_size_is &lt; size of return type as specified in
 _table 4.6</em> and <em>param_value</em> is not a NULL value.
 </p>
 </li>
 <li>
 <p>
 CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
 CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p> </p></div>
 </div>
 </div>
 </div>
 <div class="sect1">
 <h2 id="_the_opencl_runtime">5. The OpenCL Runtime</h2>
 <div class="sectionbody">
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>In this section we describe the API calls that manage OpenCL objects
 such as command-queues, memory objects, program objects, kernel objects
 for kernel functions in a program and calls that allow you to enqueue
 commands to a command-queue such as executing a kernel, reading, or
 writing a memory object.</p></div>
 <div class="sect2">
 <h3 id="_command_queues">5.1. Command Queues</h3>
 <div class="paragraph"><p>OpenCL objects such as memory, program and kernel objects are created
 using a context. Operations on these objects are performed using a
 command-queue. The command-queue can be used to queue a set of
 operations (referred to as commands) in order. Having multiple
 command-queues allows applications to queue multiple independent
 commands without requiring synchronization. Note that this should work
 as long as these objects are not being shared. Sharing of objects
 across multiple command-queues will require the application to perform
 appropriate synchronization. This is described in <em>Appendix A</em>.</p></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_command_queue clCreateCommandQueueWithProperties(
     cl_context context,
     cl_device_id device,
     const cl_queue_properties *properties,
     cl_int *errcode_ret)</pre>
 </div></div>
 <div class="paragraph"><p>creates a host or device command-queue on a specific device.</p></div>
 <div class="paragraph"><p><em>context</em> must be a valid OpenCL context.</p></div>
 <div class="paragraph"><p><em>device</em> must be a device or sub-device associated with <em>context</em>. It
 can either be in the list of devices and sub-devices specified when
 <em>context</em> is created using <strong>clCreateContext  or
 be a root device with the same device type as specified when <em>context</em>
 is created using *clCreateContextFromType</strong>.</p></div>
 <div class="paragraph"><p><em>properties</em> specifies a list of properties for the command-queue and
 their corresponding values. Each property name is immediately followed
 by the corresponding desired value. The list is terminated with 0. The
 list of supported properties is described in the table below_._ If a
 supported property and its value is not specified in <em>properties</em>, its
 default value will be used. <em>properties</em> can be NULL in which case the
 default values for supported command-queue properties will be used.
  </p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <caption class="title">Table 6. <em>List of supported cl_queue_properties values and description</em></caption>
 <col style="width:34%;">
 <col style="width:33%;">
 <col style="width:33%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Queue Properties</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Property Value</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Description</strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_QUEUE_PROPERTIES</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_bitfield</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">This is a bitfield and can be set to a
 CL_QUEUE_OUT_OF_ORDER_ EXEC_MODE_ENABLE
  – Determines whether the
 commands queued in the command-queue are executed in-order or out-of-order. If
 set, the commands in the command-queue are executed out-of-order. Otherwise,
 commands are executed in-order.
 <br>
 <br>
 CL_QUEUE_PROFILING_ENABLE – Enable or disable profiling of commands in
 the command-queue. If set, the profiling of commands is enabled. Otherwise profiling
 of commands is disabled.
 <br>
 <br>
 CL_QUEUE_ON_DEVICE – Indicates that this is a device queue. If
 <br>
 <br>
 CL_QUEUE_ON_DEVICE is set, CL_QUEUE_OUT_OF_ORDER_ EXEC_MODE_ENABLE<span class="footnote"><br>[Only out-of-order device queues are supported.]<br></span>: must also be set.
 <br>
 <br>
 CL_QUEUE_ ON_DEVICE_DEFAULT<span class="footnote"><br>[The application must create the default device queue if any kernels containing calls to get_default_queue are
 enqueued. There can only be one default device queue for each device within a context.
 clCreateCommandQueueWithProperties with CL_QUEUE_PROPERTIES set to CL_QUEUE_ON_DEVICE or
 CL_QUEUE_ON_DEVICE_DEFAULT will return the default device queue that has already been created and
 increment its retain count by 1.]<br></span>:–indicates that this is the default device
 queue. This can only be used with CL_QUEUE_ON_DEVICE.
 <br>
 <br>
 If CL_QUEUE_PROPERTIES is not specified an in-order host command queue
 is created for the specified device</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_QUEUE_SIZE</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Specifies the size of the device queue in bytes.
 <br>
 <br>
 This can only be specified if CL_QUEUE_ON_DEVICE is set in CL_QUEUE_PROPERTIES.
 This must be a value &#8656; CL_DEVICE_QUEUE_ ON_DEVICE_MAX_SIZE.
 <br>
 <br>
 For best performance, this should be &#8656; CL_DEVICE_QUEUE_ ON_DEVICE_PREFERRED_SIZE.
 <br>
 <br>
 If CL_QUEUE_SIZE is not specified, the device queue is created with
 CL_DEVICE_QUEUE_ ON_DEVICE_PREFERRED_SIZE as the size of the queue.</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p>
 <em>errcode_ret</em> will return an appropriate error code. If <em>errcode_ret</em>
 is NULL, no error code is returned.</p></div>
 <div class="paragraph"><p><strong>clCreateCommandQueueWithProperties</strong> returns a valid non-zero
 command-queue and <em>errcode_ret</em> is set to CL_SUCCESS if the
 command-queue is created successfully. Otherwise, it returns a NULL
 value with one of the following error values returned in <em>errcode_ret</em>:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 CL_INVALID_CONTEXT if
 _context_is not a valid context.
 </p>
 </li>
 <li>
 <p>
 CL_INVALID_DEVICE if
 <em>device_is not a valid device or is not associated with _context</em>.
 </p>
 </li>
 <li>
 <p>
 CL_INVALID_VALUE if values
 specified in <em>properties</em> are not valid.
 </p>
 </li>
 <li>
 <p>
 CL_INVALID_QUEUE_PROPERTIES if values specified in <em>properties</em> are
 valid but are not supported by the device.
 </p>
 </li>
 <li>
 <p>
 CL_OUT_OF_RESOURCES if there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
 CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clSetDefaultDeviceCommandQueue(cl_context context,
                                       cl_device_id device,
                                       cl_command_queue command_queue)</pre>
 </div></div>
 <div class="paragraph"><p>replaces the default command queue on the <em>device</em>.</p></div>
 <div class="paragraph"><p><strong>clSetDefaultDeviceCommandQueue</strong> returns CL_SUCCESS if the function is
 executed successfully. Otherwise, it returns one of the following
 errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 CL_INVALID_CONTEXT if <em>context</em> is not a valid context.
 </p>
 </li>
 <li>
 <p>
 CL_INVALID_DEVICE if <em>device</em> is not a valid device or is not associated with <em>context</em>.
 </p>
 </li>
 <li>
 <p>
 CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not a valid command-queue for <em>device</em>.
 </p>
 </li>
 <li>
 <p>
 CL_OUT_OF_RESOURCES if there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
 CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p><strong>clSetDefaultDeviceCommandQueue</strong> may be used to replace a default device
 command queue created with <strong>clCreateCommandQueueWithProperties</strong> and the
 CL_QUEUE_ON_DEVICE_DEFAULT flag.</p></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clRetainCommandQueue(cl_command_queue command_queue)</pre>
 </div></div>
 <div class="paragraph"><p>increments the <em>command_queue</em> reference count. <strong>clRetainCommandQueue</strong>
 returns CL_SUCCESS if the function is executed successfully. Otherwise,
 it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not a valid command-queue.
 </p>
 </li>
 <li>
 <p>
 CL_OUT_OF_RESOURCES if there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
 CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p><strong>clCreateCommandQueueWithProperties</strong> performs an implicit retain. This
 is very helpful for 3<sup>rd</sup> party libraries, which typically get a
 command-queue passed to them by the application. However, it is
 possible that the application may delete the command-queue without
 informing the library. Allowing functions to attach to (i.e. retain)
 and release a command-queue solves the problem of a command-queue being
 used by a library no longer being valid.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clReleaseCommandQueue(cl_command_queue command_queue)</pre>
 </div></div>
 <div class="paragraph"><p>decrements the <em>command_queue</em> reference count. <strong>clReleaseCommandQueue</strong>
 returns CL_SUCCESS if the function is executed successfully. Otherwise,
 it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not a valid command-queue.
 </p>
 </li>
 <li>
 <p>
 CL_OUT_OF_RESOURCES if there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
 CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>After the <em>command_queue</em> reference count becomes zero and all commands
 queued to <em>command_queue</em> have finished (eg. kernel-instances, memory
 object updates etc.), the command-queue is deleted.</p></div>
 <div class="paragraph"><p><strong>clReleaseCommandQueue</strong> performs an implicit flush to issue any
 previously queued OpenCL commands in <em>command_queue</em>. Using this
 function to release a reference that was not obtained by creating the
 object or by calling <strong>clRetainCommandQueue</strong> causes undefined behavior.</p></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clGetCommandQueueInfo(cl_command_queue command_queue,
                              cl_command_queue_info param_name,
                              size_t param_value_size,
                              void *param_value,
                              size_t *param_value_size_ret)</pre>
 </div></div>
 <div class="paragraph"><p>can be used to query information about a command-queue.</p></div>
 <div class="paragraph"><p><em>command_queue</em> specifies the command-queue being queried.</p></div>
 <div class="paragraph"><p><em>param_name</em> specifies the information to query.</p></div>
 <div class="paragraph"><p><em>param_value</em> is a pointer to memory where the appropriate result being
 queried is returned. If <em>param_value</em> is NULL, it is ignored.</p></div>
 <div class="paragraph"><p><em>param_value_size</em> is used to specify the size in bytes of memory
 pointed to by <em>param_value</em>. This size must be &gt;= size of return type
 as described in <em>table 5.2</em>. If <em>param_value</em> is NULL, it is ignored.</p></div>
 <div class="paragraph"><p><em>param_value_size_ret</em> returns the actual size in bytes of data being
 queried by <em>param_name</em>. If <em>param_value_size_ret</em> is NULL, it is
 ignored.</p></div>
 <div class="paragraph"><p>The list of supported <em>param_name_values and the information returned in
 _param_value</em> by <strong>clGetCommandQueueInfo</strong> is described in <em>table 5.2</em>.</p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <caption class="title">Table 7. <em>List of supported param_names by clGetCommandQueueInfo</em></caption>
 <col style="width:34%;">
 <col style="width:33%;">
 <col style="width:33%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>cl_command_queue_info</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Return Type</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Information returned in
 param_value</strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_QUEUE_CONTEXT</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_context</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return the context specified when the
 command-queue is created.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_QUEUE_DEVICE</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_device_id</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return the device specified when the
 command-queue is created.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">*CL_QUEUE_REFERENCE_COUNT<span class="footnote"><br>[The reference count returned should be considered immediately stale. It is unsuitable for general use in
 applications. This feature is provided for identifying memory leaks.]<br></span>:</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return the command-queue reference count.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_QUEUE_PROPERTIES</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_command_queue_properties</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return the currently specified properties for the
 command-queue. These properties are specified by the value associated
 with the CL_COMMAND_QUEUE_ PROPERTIES passed in <em>properties</em> argument in
 <strong>clCreateCommandQueueWithProperties.</strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_QUEUE_SIZE</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return the currently specified size for the
 device command-queue. This query is only supported for device command
 queues.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_QUEUE_DEVICE_DEFAULT</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_command_queue</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return the current default
 command queue for the underlying device.</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p><strong>clGetCommandQueueInfo</strong> returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not a valid command-queue.
 </p>
 </li>
 <li>
 <p>
 CL_INVALID_VALUE if <em>param_name</em> is not one of the supported values or if size in bytes
 specified by <em>param_value_size_is &lt; size of return type as specified in
 _table 5.2</em> and <em>param_value</em> is not a NULL value.
 </p>
 </li>
 <li>
 <p>
 CL_OUT_OF_RESOURCES if there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
 CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p><strong>NOTE</strong></p></div>
 <div class="paragraph"><p>It is possible that a device(s) becomes unavailable after a context and
 command-queues that use this device(s) have been created and commands
 have been queued to command-queues. In this case the behavior of OpenCL
 API calls that use this context (and command-queues) are considered to
 be implementation-defined. The user callback function, if specified,
 when the context is created can be used to record appropriate
 information in the <em>errinfo</em>, <em>private_info</em> arguments passed to the
 callback function when the device becomes unavailable.</p></div>
 </div>
 <div class="sect2">
 <h3 id="_buffer_objects">5.2. Buffer Objects</h3>
 <div class="paragraph"><p>A <em>buffer</em> object stores a one-dimensional collection of elements.
 Elements of a <em>buffer</em> object can be a scalar data type (such as an int,
 float), vector data type, or a user-defined structure.</p></div>
 <div class="sect3">
 <h4 id="_creating_buffer_objects">5.2.1. Creating Buffer Objects</h4>
 <div class="paragraph"><p>A <strong>buffer object</strong> is created using the following function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_mem clCreateBuffer(cl_context context,
                       cl_mem_flags flags,
                       size_t size,
                       void *host_ptr,
                       cl_int *errcode_ret)</pre>
 </div></div>
 <div class="paragraph"><p><em>context</em> is a valid OpenCL context used to create the buffer object.</p></div>
 <div class="paragraph"><p><em>flags</em> is a bit-field that is used to specify allocation and usage
 information such as the memory arena that should be used to allocate the
 buffer object and how it will be used. <em>Table 5.3</em> describes the
 possible values for <em>flags</em>. If value specified for <em>flags</em> is 0, the
 default is used which is CL_MEM_READ_WRITE.</p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <caption class="title">Table 8. <em>List of supported cl_mem_flags values</em></caption>
 <col style="width:50%;">
 <col style="width:50%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>cl_mem_flags</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Description</strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_MEM_READ_WRITE</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">This flag specifies that the memory object will be read
 and written by a kernel. This is the default.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_MEM_WRITE_ONLY</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">This flag specifies that the memory object will be
 written but not read by a kernel.
 <br>
 <br>
 Reading from a buffer or image object created with
 CL_MEM_WRITE_ONLY inside a kernel is undefined.
 <br>
 <br>
 CL_MEM_READ_WRITE and
 CL_MEM_WRITE_ONLY are mutually exclusive.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_MEM_READ_ONLY</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">This flag specifies that the memory object is a readonly memory object when used inside a kernel.
 <br>
 <br>
 Writing to a buffer or image object created with
 CL_MEM_READ_ONLY inside a kernel is undefined.
 <br>
 <br>
 CL_MEM_READ_WRITE or CL_MEM_WRITE_ONLY
 and CL_MEM_READ_ONLY are mutually exclusive.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_MEM_USE_HOST_PTR</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">This flag is valid only if host_ptr is not NULL. If
 specified, it indicates that the application wants the
 OpenCL implementation to use memory referenced by
 host_ptr as the storage bits for the memory object.
 <br>
 <br>
 OpenCL implementations are allowed to cache the
 buffer contents pointed to by host_ptr in device
 memory. This cached copy can be used when kernels
 are executed on a device.
 <br>
 <br>
 The result of OpenCL commands that operate on
 multiple buffer objects created with the same  host_ptr
 or from overlapping host or SVM regions is
 considered to be undefined.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_MEM_ALLOC_HOST_PTR</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">This flag specifies that the application wants the
 OpenCL implementation to allocate memory from
 host accessible memory.
 <br>
 <br>
 CL_MEM_ALLOC_HOST_PTR and
 CL_MEM_USE_HOST_PTR are mutually exclusive.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_MEM_COPY_HOST_PTR</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">This flag is valid only if host_ptr is not NULL. If
 specified, it indicates that the application wants the
 OpenCL implementation to allocate memory for the
 memory object and copy the data from memory
 referenced by host_ptr.  The implementation will copy
 the memory immediately and host_ptr is available for
 reuse by the application when the clCreateBuffer or
 clCreateImage operation returns.
 <br>
 <br>
 CL_MEM_COPY_HOST_PTR and
 CL_MEM_USE_HOST_PTR are mutually exclusive.
 <br>
 <br>
 CL_MEM_COPY_HOST_PTR can be used with
 CL_MEM_ALLOC_HOST_PTR to initialize the
 contents of the cl_mem object allocated using hostaccessible (e.g. PCIe) memory.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_MEM_HOST_WRITE_ONLY</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">This flag specifies that the host will only
 write to the memory object (using OpenCL APIs that enqueue a write or a
 map for write). This can be used to optimize write access from the host
 (e.g. enable write-combined allocations for memory objects for devices
 that communicate with the host over a system bus such as PCIe).</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_MEM_HOST_READ_ONLY</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">This flag specifies that the host will only read the
 memory object (using OpenCL APIs that enqueue a
 read or a map for read).
 <br>
 <br>
 CL_MEM_HOST_WRITE_ONLY and
 CL_MEM_HOST_READ_ONLY are mutually
 exclusive.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_MEM_HOST_NO_ACCESS</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">This flag specifies that the host will not read or write
 the memory object.
 <br>
 <br>
 CL_MEM_HOST_WRITE_ONLY or
 CL_MEM_HOST_READ_ONLY and
 CL_MEM_HOST_NO_ACCESS are mutually
 exclusive.</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p><em>size</em> is the size in bytes of the buffer memory object to be allocated.</p></div>
 <div class="paragraph"><p><em>host_ptr</em> is a pointer to the buffer data that may already be allocated
 by the application. The size of the buffer that <em>host_ptr</em> points to
 must be &gt;= <em>size</em> bytes.</p></div>
 <div class="paragraph"><p>The user is responsible for ensuring that data passed into and out of
 OpenCL images are natively aligned relative to the start of the buffer
 as per kernel language or IL requirements. OpenCL buffers created with
 CL_MEM_USE_HOST_PTR need to provide an appropriately aligned host memory
 pointer that is aligned to the data types used to access these buffers
 in a kernel(s).</p></div>
 <div class="paragraph"><p><em>errcode_ret</em> will return an appropriate error code. If <em>errcode_ret</em>
 is NULL, no error code is returned.</p></div>
 <div class="paragraph"><p>If <strong>clCreateBuffer</strong> is called with a pointer returned by <strong>clSVMAlloc</strong> as
 its <em>host_ptr</em> argument, and CL_MEM_USE_HOST_PTR is set in its <em>flags</em>
 argument, <strong>clCreateBuffer</strong> will succeed and return a valid non-zero
 buffer object as long as the <em>size</em> argument to <strong>clCreateBuffer</strong> is no
 larger than the <em>size</em> argument passed in the original <strong>clSVMAlloc</strong>
 call. The new buffer object returned has the shared memory as the
 underlying storage. Locations in the buffers underlying shared memory
 can be operated on using atomic operations to the devices level of
 support as defined in the memory model.</p></div>
 <div class="paragraph"><p><strong>clCreateBuffer</strong> returns a valid non-zero buffer object and
 <em>errcode_ret</em> is set to CL_SUCCESS if the buffer object is created
 successfully. Otherwise, it returns a NULL value with one of the
 following error values returned in <em>errcode_ret</em>:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 CL_INVALID_CONTEXT if _context_is not a valid context.
 </p>
 </li>
 <li>
 <p>
 CL_INVALID_VALUE if values specified in <em>flags_are not valid as defined in _table 5.3</em>.
 </p>
 </li>
 <li>
 <p>
 CL_INVALID_BUFFER_SIZE if <em>size</em> is 0<span class="footnote"><br>[Implementations may return CL_INVALID_BUFFER_SIZE if size is greater than
 CL_DEVICE_MAX_MEM_ALLOC_SIZE value specified in <em>table 4.3</em> for all devices in context. ]<br></span>:.
 </p>
 </li>
 <li>
 <p>
 CL_INVALID_HOST_PTR if <em>host_ptr</em> is NULL and CL_MEM_USE_HOST_PTR or CL_MEM_COPY_HOST_PTR are
 set in <em>flags</em> or if <em>host_ptr</em> is not NULL but CL_MEM_COPY_HOST_PTR or
 CL_MEM_USE_HOST_PTR are not set in <em>flags</em>.
 </p>
 </li>
 <li>
 <p>
 CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
 memory for buffer object.
 </p>
 </li>
 <li>
 <p>
 CL_OUT_OF_RESOURCES if there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
 CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_mem clCreateSubBuffer(cl_mem buffer,
                          cl_mem_flags flags,
                          cl_buffer_create_type buffer_create_type,
                          const void *buffer_create_info,
                          cl_int *errcode_ret)</pre>
 </div></div>
 <div class="paragraph"><p>can be used to create a new buffer object (referred to as a sub-buffer
 object) from an existing buffer object.</p></div>
 <div class="paragraph"><p><em>buffer</em> must be a valid buffer object and cannot be a sub-buffer
 object.</p></div>
 <div class="paragraph"><p><em>flags</em> is a bit-field that is used to specify allocation and usage
 information about the sub-buffer memory object being created and is
 described in <em>table 5.3</em>. If the CL_MEM_READ_WRITE, CL_MEM_READ_ONLY or
 CL_MEM_WRITE_ONLY values are not specified in <em>flags</em>, they are
 inherited from the corresponding memory access qualifers associated with
 <em>buffer</em>. The CL_MEM_USE_HOST_PTR, CL_MEM_ALLOC_HOST_PTR and
 CL_MEM_COPY_HOST_PTR values cannot be specified in <em>flags</em> but are
 inherited from the corresponding memory access qualifiers associated
 with <em>buffer</em>. If CL_MEM_COPY_HOST_PTR is specified in the memory
 access qualifier values associated with <em>buffer</em> it does not imply any
 additional copies when the sub-buffer is created from <em>buffer</em>. If the
 CL_MEM_HOST_WRITE_ONLY, CL_MEM_HOST_READ_ONLY or CL_MEM_HOST_NO_ACCESS
 values are not specified in <em>flags</em>, they are inherited from the
 corresponding memory access qualifiers associated with <em>buffer</em>.</p></div>
 <div class="paragraph"><p><em>buffer_create_type_and _buffer_create_info</em> describe the type of buffer
 object to be created. The list of supported values for
 <em>buffer_create_type</em> and corresponding descriptor that
 <em>buffer_create_info</em> points to is described in <em>table 5.4</em>.</p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <caption class="title">Table 9. <em>List of supported names and values in clCreateSubBuffer</em></caption>
 <col style="width:50%;">
 <col style="width:50%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>cl_buffer_create_type</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Description</strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_BUFFER_CREATE_TYPE_REGION</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Create a buffer object that represents a specific
 region in buffer.
 <br>
 <br>
 buffer_create_info is a pointer to the following
 structure:
 typedef struct _cl_buffer_region {
 size_t origin;
 size_t size;
 } cl_buffer_region;
 <br>
 <br>
 (origin, size) defines the offset and size in bytes in
 buffer.
 <br>
 <br>
 If buffer is created with
 CL_MEM_USE_HOST_PTR, the host_ptr
 associated with the buffer object returned is
 host_ptr + origin.
 <br>
 <br>
 The buffer object returned references the data store
 allocated for buffer and points to a specific region
 given by (origin, size) in this data store.
 <br>
 <br>
 CL_INVALID_VALUE is returned in errcode_ret if
 the region specified by (origin, size) is out of
 bounds in buffer.
 <br>
 <br>
 CL_INVALID_BUFFER_SIZE if size is 0.
 <br>
 <br>
 CL_MISALIGNED_SUB_BUFFER_OFFSET is
 returned in errcode_ret if there are no devices in
 context associated with buffer for which the origin
 value is aligned to the
 CL_DEVICE_MEM_BASE_ADDR_ALIGN value.</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p><strong>clCreateSubBuffer</strong> returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors in
 <em>errcode_ret</em>:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 CL_INVALID_MEM_OBJECT if
 <em>buffer</em> is not a valid buffer object or is a sub-buffer object.
 </p>
 </li>
 <li>
 <p>
 CL_INVALID_VALUE if
 <em>buffer</em> was created with CL_MEM_WRITE_ONLY and <em>flags</em> specifies
 CL_MEM_READ_WRITE or CL_MEM_READ_ONLY, or if <em>buffer</em> was created with
 CL_MEM_READ_ONLY and <em>flags</em> specifies CL_MEM_READ_WRITE or
 CL_MEM_WRITE_ONLY, or if <em>flags</em> specifies CL_MEM_USE_HOST_PTR or
 CL_MEM_ALLOC_HOST_PTR or CL_MEM_COPY_HOST_PTR.
 </p>
 </li>
 <li>
 <p>
 CL_INVALID_VALUE if
 <em>buffer</em> was created with CL_MEM_HOST_WRITE_ONLY and <em>flags</em> specify
 CL_MEM_HOST_READ_ONLY, or if <em>buffer</em> was created with
 CL_MEM_HOST_READ_ONLY and <em>flags</em> specify CL_MEM_HOST_WRITE_ONLY, or if
 <em>buffer</em> was created with CL_MEM_HOST_NO_ACCESS and <em>flags</em> specify
 CL_MEM_HOST_READ_ONLY or CL_MEM_HOST_WRITE_ONLY.
 </p>
 </li>
 <li>
 <p>
 CL_INVALID_VALUE if value
 specified in _buffer_create_type_is not valid.
 </p>
 </li>
 <li>
 <p>
 CL_INVALID_VALUE if
 value(s) specified in <em>buffer_create_info</em> (for a given
 <em>buffer_create_type</em>) is not valid or if <em>buffer_create_info</em> is NULL.
 </p>
 </li>
 <li>
 <p>
 CL_INVALID_BUFFER_SIZE if <em>size</em> is 0.
 </p>
 </li>
 <li>
 <p>
 CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
 memory for sub-buffer object.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>NOTE:</p></div>
 <div class="paragraph"><p>Concurrent reading from, writing to and copying between both a buffer
 object and its sub-buffer object(s) is undefined. Concurrent reading
 from, writing to and copying between overlapping sub-buffer objects
 created with the same buffer object is undefined. Only reading from
 both a buffer object and its sub-buffer objects or reading from multiple
 overlapping sub-buffer objects is defined.</p></div>
 </div>
 <div class="sect3">
 <h4 id="_reading_writing_and_copying_buffer_objects">5.2.2. Reading, Writing and Copying Buffer Objects</h4>
 <div class="paragraph"><p>The following functions enqueue commands to read from a buffer object to
 host memory or write to a buffer object from host memory.</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clEnqueueReadBuffer(cl_command_queue command_queue,
                            cl_mem buffer,
                            cl_bool blocking_read,
                            size_t offset,
                            size_t size,
                            void *ptr,
                            cl_uint num_events_in_wait_list,
                            const cl_event *event_wait_list,
                            cl_event *event)</pre>
 </div></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clEnqueueWriteBuffer(cl_command_queue command_queue,
                             cl_mem buffer,
                             cl_bool blocking_write,
                             size_t offset,
                             size_t size,
                             const void *ptr,
                             cl_uint num_events_in_wait_list,
                             const cl_event *event_wait_list,
                             cl_event *event)</pre>
 </div></div>
 <div class="paragraph"><p><em>command_queue</em> is a valid host command-queue in which the read / write
 command will be queued. <em>command_queue</em> and <em>buffer</em> must be created
 with the same OpenCL context.</p></div>
 <div class="paragraph"><p><em>buffer</em> refers to a valid buffer object.</p></div>
 <div class="paragraph"><p><em>blocking_read</em> and <em>blocking_write</em> indicate if the read and write
 operations are <em>blocking</em> or <em>non-blocking</em>.</p></div>
 <div class="paragraph"><p>If <em>blocking_read</em> is CL_TRUE i.e. the read command is blocking,
 <strong>clEnqueueReadBuffer</strong> does not return until the buffer data has been
 read and copied into memory pointed to by <em>ptr</em>.</p></div>
 <div class="paragraph"><p>If <em>blocking_read</em> is CL_FALSE i.e. the read command is non-blocking,
 <strong>clEnqueueReadBuffer</strong> queues a non-blocking read command and returns.
 The contents of the buffer that <em>ptr</em> points to cannot be used until the
 read command has completed. The <em>event</em> argument returns an event
 object which can be used to query the execution status of the read
 command. When the read command has completed, the contents of the
 buffer that _ptr_points to__can be used by the application.</p></div>
 <div class="paragraph"><p>If <em>blocking_write_is CL_TRUE, the OpenCL implementation copies the data
 referred to by _ptr</em> and enqueues the write operation in the
 command-queue. The memory pointed to by <em>ptr</em> can be reused by the
 application after the <strong>clEnqueueWriteBuffer</strong> call returns.</p></div>
 <div class="paragraph"><p>If <em>blocking_write</em> is CL_FALSE, the OpenCL implementation will use
 <em>ptr</em> to perform a non-blocking write. As the write is non-blocking the
 implementation can return immediately. The memory pointed to by <em>ptr</em>
 cannot be reused by the application after the call returns. The <em>event</em>
 argument returns an event object which can be used to query the
 execution status of the write command. When the write command has
 completed, the memory pointed to by <em>ptr</em> can then be reused by the
 application.</p></div>
 <div class="paragraph"><p><em>offset</em> is the offset in bytes in the buffer object to read from or
 write to.</p></div>
 <div class="paragraph"><p><em>size</em> is the size in bytes of data being read or written.</p></div>
 <div class="paragraph"><p><em>ptr</em> is the pointer to buffer in host memory where data is to be read
 into or to be written from.</p></div>
 <div class="paragraph"><p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need
 to complete before this particular command can be executed. If
 <em>event_wait_list</em> is NULL, then this particular command does not wait on
 any event to complete. If <em>event_wait_list</em> is NULL,
 <em>num_events_in_wait_list</em> must be 0. If <em>event_wait_list</em> is not NULL,
 the list of events pointed to by <em>event_wait_list</em> must be valid and
 <em>num_events_in_wait_list</em> must be greater than 0. The events specified
 in <em>event_wait_list</em> act as synchronization points. The context
 associated with events in <em>event_wait_list</em> and <em>command_queue</em> must be
 the same. The memory associated with <em>event_wait_list</em> can be reused or
 freed after the function returns.</p></div>
 <div class="paragraph"><p><em>event</em> returns an event object that identifies this particular read /
 write command and can be used to query or queue a wait for this
 particular command to complete. <em>event</em> can be NULL in which case it
 will not be possible for the application to query the status of this
 command or queue a wait for this command to complete. If the
 <em>event_wait_list</em> and the <em>event</em> arguments are not NULL, the <em>event</em>
 argument should not refer to an element of the <em>event_wait_list</em> array.</p></div>
 <div class="paragraph"><p><strong>clEnqueueReadBuffer</strong> and <strong>clEnqueueWriteBuffer</strong> return CL_SUCCESS if
 the function is executed successfully. Otherwise, it returns one of the
 following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not a valid host command-queue.
 </p>
 </li>
 <li>
 <p>
 CL_INVALID_CONTEXT if the context associated with <em>command_queue</em> and <em>buffer</em> are not the same or
 if the context associated with <em>command_queue</em> and events in
 <em>event_wait_list</em> are not the same.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_MEM_OBJECT if
 <em>buffer</em> is not a valid buffer object.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if the
 region being read or written specified by (<em>offset</em>, <em>size</em>) is out of
 bounds or if <em>ptr</em> is a NULL value.
 </p>
 </li>
 <li>
 <p>
 CL_INVALID_EVENT_WAIT_LIST
 if <em>event_wait_list</em> is NULL and <em>num_events_in_wait_list</em> &gt; 0, or
 <em>event_wait_list</em> is not NULL and <em>num_events_in_wait_list</em> is 0, or if
 event objects in <em>event_wait_list</em> are not valid events.
 </p>
 </li>
 <li>
 <p>
 CL_MISALIGNED_SUB_BUFFER_OFFSET if <em>buffer</em> is a sub-buffer object and
 <em>offset</em> specified when the sub-buffer object is created is not aligned
 to CL_DEVICE_MEM_BASE_ADDR_ALIGN value for device associated with
 <em>queue</em>.
 </p>
 </li>
 <li>
 <p>
      CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST if the read and write
 operations are blocking and the execution status of any of the events in
 <em>event_wait_list</em> is a negative integer value.
 </p>
 </li>
 <li>
 <p>
      CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
 memory for data store associated with <em>buffer</em>.
 </p>
 </li>
 <li>
 <p>
      CL_INVALID_OPERATION if
 <strong>clEnqueueReadBuffer</strong> is called on <em>buffer</em> which has been created with
 CL_MEM_HOST_WRITE_ONLY or CL_MEM_HOST_NO_ACCESS.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_OPERATION if
 <strong>clEnqueueWriteBuffer</strong> is called on <em>buffer</em> which has been created with
 CL_MEM_HOST_READ_ONLY or CL_MEM_HOST_NO_ACCESS.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The following functions enqueue commands to read a 2D or 3D rectangular
 region from a buffer object to host memory or write a 2D or 3D
 rectangular region to a buffer object from host memory.</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clEnqueueReadBufferRect(cl_command_queue command_queue,
                                cl_mem buffer,
                                cl_bool blocking_read,
                                const size_t *buffer_origin,
                                const size_t *host_origin,
                                const size_t *region,
                                size_t buffer_row_pitch,
                                size_t buffer_slice_pitch,
                                size_t host_row_pitch,
                                size_t host_slice_pitch,
                                void *ptr,
                                cl_uint num_events_in_wait_list,
                                const cl_event *event_wait_list,
                                cl_event *event)</pre>
 </div></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clEnqueueWriteBufferRect(cl_command_queue command_queue,
                                 cl_mem buffer,
                                 cl_bool blocking_write,
                                 const size_t *buffer_origin,
                                 const size_t *host_origin,
                                 const size_t *region,
                                 size_t buffer_row_pitch,
                                 size_t buffer_slice_pitch,
                                 size_t host_row_pitch,
                                 size_t host_slice_pitch,
                                 const void *ptr,
                                 cl_uint num_events_in_wait_list,
                                 const cl_event *event_wait_list,
                                 cl_event *event)</pre>
 </div></div>
 <div class="paragraph"><p><em>command_queue</em> refers is a valid host command-queue in which the read /
 write command will be queued. <em>command_queue</em> and <em>buffer</em> must be
 created with the same OpenCL context.</p></div>
 <div class="paragraph"><p><em>buffer</em> refers to a valid buffer object.</p></div>
 <div class="paragraph"><p><em>blocking_read</em> and <em>blocking_write</em> indicate if the read and write
 operations are <em>blocking</em> or <em>non-blocking</em>.</p></div>
 <div class="paragraph"><p>If <em>blocking_read</em> is CL_TRUE i.e. the read command is blocking,
 <strong>clEnqueueReadBufferRect</strong> does not return until the buffer data has been
 read and copied into memory pointed to by <em>ptr</em>.</p></div>
 <div class="paragraph"><p>If <em>blocking_read</em> is CL_FALSE i.e. the read command is non-blocking,
 <strong>clEnqueueReadBufferRect</strong> queues a non-blocking read command and
 returns. The contents of the buffer that <em>ptr</em> points to cannot be used
 until the read command has completed. The <em>event</em> argument returns an
 event object which can be used to query the execution status of the read
 command. When the read command has completed, the contents of the
 buffer that _ptr_points to__can be used by the application.</p></div>
 <div class="paragraph"><p>If <em>blocking_write_is CL_TRUE, the OpenCL implementation copies the data
 referred to by _ptr</em> and enqueues the write operation in the
 command-queue. The memory pointed to by <em>ptr</em> can be reused by the
 application after the <strong>clEnqueueWriteBufferRect</strong> call returns.</p></div>
 <div class="paragraph"><p>If <em>blocking_write</em> is CL_FALSE, the OpenCL implementation will use
 <em>ptr</em> to perform a non-blocking write. As the write is non-blocking the
 implementation can return immediately. The memory pointed to by <em>ptr</em>
 cannot be reused by the application after the call returns. The <em>event</em>
 argument returns an event object which can be used to query the
 execution status of the write command. When the write command has
 completed, the memory pointed to by <em>ptr</em> can then be reused by the
 application.</p></div>
 <div class="paragraph"><p><em>buffer_origin</em> defines the (<em>x</em>, <em>y</em>, <em>z</em>) offset in the memory region
 associated with <em>buffer</em>. For a 2D rectangle region, the <em>z</em> value
 given by <em>buffer_origin</em>[2] should be 0. The offset in bytes is
 computed as <em>buffer_origin</em>[2] * <em>buffer_slice_pitch</em><br>
 <em>buffer_origin</em>[1] * <em>buffer_row_pitch</em> + <em>buffer_origin</em>[0].</p></div>
 <div class="paragraph"><p><em>host_origin</em> defines the (<em>x</em>, <em>y</em>, <em>z</em>) offset in the memory region
 pointed to by <em>ptr</em>. For a 2D rectangle region, the <em>z</em> value given by
 <em>host_origin</em>[2] should be 0. The offset in bytes is computed as
 <em>host_origin</em>[2] * <em>host_slice_pitch</em> + <em>host_origin</em>[1] *
 <em>host_row_pitch</em> + <em>host_origin</em>[0].</p></div>
 <div class="paragraph"><p><em>region_defines the_</em>(<em>width</em> in bytes, <em>height</em> in rows_,_ <em>depth_in
 slices) of the 2D or 3D rectangle being read or written. For a 2D
 rectangle copy, the _depth</em> value given by <em>region</em>[2] should be 1. The
 values in region cannot be 0.</p></div>
 <div class="paragraph"><p><em>buffer_row_pitch</em> is the length of each row in bytes to be used for the
 memory region associated with <em>buffer</em>. If <em>buffer_row_pitch</em> is 0,
 <em>buffer_row_pitch</em> is computed as <em>region</em>[0].</p></div>
 <div class="paragraph"><p><em>buffer_slice_pitch</em> is the length of each 2D slice in bytes to be used
 for the memory region associated with <em>buffer</em>. If <em>buffer_slice_pitch</em>
 is 0, <em>buffer_slice_pitch</em> is computed as <em>region</em>[1] *
 <em>buffer_row_pitch</em>.</p></div>
 <div class="paragraph"><p><em>host_row_pitch</em> is the length of each row in bytes to be used for the
 memory region pointed to by <em>ptr</em>. If <em>host_row_pitch</em> is 0,
 <em>host_row_pitch</em> is computed as <em>region</em>[0].</p></div>
 <div class="paragraph"><p><em>host_slice_pitch</em> is the length of each 2D slice in bytes to be used
 for the memory region pointed to by <em>ptr</em>. If <em>host_slice_pitch</em> is 0,
 <em>host_slice_pitch</em> is computed as <em>region</em>[1] * <em>host_row_pitch</em>.</p></div>
 <div class="paragraph"><p><em>ptr</em> is the pointer to buffer in host memory where data is to be read
 into or to be written from.</p></div>
 <div class="paragraph"><p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need
 to complete before this particular command can be executed. If
 <em>event_wait_list</em> is NULL, then this particular command does not wait on
 any event to complete. If <em>event_wait_list</em> is NULL,
 <em>num_events_in_wait_list</em> must be 0. If <em>event_wait_list</em> is not NULL,
 the list of events pointed to by <em>event_wait_list</em> must be valid and
 <em>num_events_in_wait_list</em> must be greater than 0. The events specified
 in <em>event_wait_list</em> act as synchronization points. The context
 associated with events in <em>event_wait_list</em> and <em>command_queue</em> must be
 the same. The memory associated with <em>event_wait_list</em> can be reused or
 freed after the function returns.</p></div>
 <div class="paragraph"><p><em>event</em> returns an event object that identifies this particular read /
 write command and can be used to query or queue a wait for this
 particular command to complete. <em>event</em> can be NULL in which case it
 will not be possible for the application to query the status of this
 command or queue a wait for this command to complete. If the
 <em>event_wait_list</em> and the <em>event</em> arguments are not NULL, the <em>event</em>
 argument should not refer to an element of the <em>event_wait_list</em> array.</p></div>
 <div class="paragraph"><p><strong>clEnqueueReadBufferRect</strong> and <strong>clEnqueueWriteBufferRect</strong> return
 CL_SUCCESS if the function is executed successfully. Otherwise, it
 returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_COMMAND_QUEUE
 if <em>command_queue</em> is not a valid host command-queue.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_CONTEXT if the
 context associated with <em>command_queue</em> and <em>buffer</em> are not the same or
 if the context associated with <em>command_queue</em> and events in
 <em>event_wait_list</em> are not the same.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_MEM_OBJECT if
 <em>buffer</em> is not a valid buffer object.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if the
 region being read or written specified by (<em>buffer_origin</em>, <em>region,
 buffer_row_pitch, buffer_slice_pitch</em>) is out of bounds.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if <em>ptr</em>
 is a NULL value.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if any
 <em>region</em> array element is 0.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>buffer_row_pitch</em> is not 0 and is less than <em>region</em>[0].
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>host_row_pitch</em> is not 0 and is less than <em>region</em>[0].
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>buffer_slice_pitch</em> is not 0 and is less than <em>region</em>[1] *
 <em>buffer_row_pitch</em> and not a multiple of <em>buffer_row_pitch</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>host_slice_pitch</em> is not 0 and is less than <em>region</em>[1] *
 <em>host_row_pitch</em> and not a multiple of <em>host_row_pitch</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_EVENT_WAIT_LIST
 if <em>event_wait_list</em> is NULL and <em>num_events_in_wait_list</em> &gt; 0, or
 <em>event_wait_list</em> is not NULL and <em>num_events_in_wait_list</em> is 0, or if
 event objects in <em>event_wait_list</em> are not valid events.
 </p>
 </li>
 <li>
 <p>
 CL_MISALIGNED_SUB_BUFFER_OFFSET if <em>buffer</em> is a sub-buffer object and
 <em>offset</em> specified when the sub-buffer object is created is not aligned
 to CL_DEVICE_MEM_BASE_ADDR_ALIGN value for device associated with
 <em>queue</em>.
 </p>
 </li>
 <li>
 <p>
      CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST if the read and write
 operations are blocking and the execution status of any of the events in
 <em>event_wait_list</em> is a negative integer value.

 </p>
 </li>
 <li>
 <p>
      CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
 memory for data store associated with <em>buffer</em>.
 </p>
 </li>
 <li>
 <p>
      CL_INVALID_OPERATION if
 <strong>clEnqueueReadBufferRect</strong> is called on <em>buffer</em> which has been created
 with CL_MEM_HOST_WRITE_ONLY or CL_MEM_HOST_NO_ACCESS.
 </p>
 </li>
 <li>
 <p>
      CL_INVALID_OPERATION if
 <strong>clEnqueueWriteBufferRect</strong> is called on <em>buffer</em> which has been created
 with CL_MEM_HOST_READ_ONLY or CL_MEM_HOST_NO_ACCESS.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>NOTE:</p></div>
 <div class="paragraph"><p>Calling <strong>clEnqueueReadBuffer</strong> to read a region of the buffer object with
 the <em>ptr</em> argument value set to <em>host_ptr</em> + <em>offset</em>, where <em>host_ptr</em>
 is a pointer to the memory region specified when the buffer object being
 read is created with CL_MEM_USE_HOST_PTR, must meet the following
 requirements in order to avoid undefined behavior:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 All commands that use this buffer object or a memory object (buffer or
 image) created from this buffer object have finished execution before
 the read command begins execution.
 </p>
 </li>
 <li>
 <p>
 The buffer object or memory objects created from this buffer object
 are not mapped.
 </p>
 </li>
 <li>
 <p>
 The buffer object or memory objects created from this buffer object
 are not used by any command-queue until the read command has finished
 execution.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>Calling <strong>clEnqueueReadBufferRect</strong> to read a region of the buffer object
 with the <em>ptr</em> argument value set to <em>host_ptr</em> and <em>host_origin</em>,
 <em>buffer_origin</em> values are the same, where <em>host_ptr</em> is a pointer to
 the memory region specified when the buffer object being read is created
 with CL_MEM_USE_HOST_PTR, must meet the same requirements given above
 for <strong>clEnqueueReadBuffer</strong>.</p></div>
 <div class="paragraph"><p>Calling <strong>clEnqueueWriteBuffer</strong> to update the latest bits in a region of
 the buffer object with the <em>ptr</em> argument value set to <em>host_ptr</em><br>
 <em>offset</em>, where <em>host_ptr</em> is a pointer to the memory region specified
 when the buffer object being written is created with
 CL_MEM_USE_HOST_PTR, must meet the following requirements in order to
 avoid undefined behavior:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 The host memory region given by (<em>host_ptr</em> + <em>offset</em>, <em>cb</em>) contains
 the latest bits when the enqueued write command begins execution.
 </p>
 </li>
 <li>
 <p>
 The buffer object or memory objects created from this buffer object
 are not mapped.
 </p>
 </li>
 <li>
 <p>
 The buffer object or memory objects created from this buffer object
 are not used by any command-queue until the write command has finished
 execution.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>Calling*clEnqueueWriteBufferRect* to update the latest bits in a region
 of the buffer object with the <em>ptr</em> argument value set to <em>host_ptr</em> and
 <em>host_origin</em>, <em>buffer_origin</em> values are the same, where <em>host_ptr</em> is
 a pointer to the memory region specified when the buffer object being
 written is created with CL_MEM_USE_HOST_PTR, must meet the following
 requirements in order to avoid undefined behavior:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 The host memory region given by (<em>buffer_origin region</em>) contains the
 latest bits when the enqueued write command begins execution.
 </p>
 </li>
 <li>
 <p>
 The buffer object or memory objects created from this buffer object
 are not mapped.
 </p>
 </li>
 <li>
 <p>
 The buffer object or memory objects created from this buffer object
 are not used by any command-queue until the write command has finished
 execution.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clEnqueueCopyBuffer(cl_command_queue command_queue,
                            cl_mem src_buffer,
                            cl_mem dst_buffer,
                            size_t src_offset,
                            size_t dst_offset,
                            size_t size,
                            cl_uint num_events_in_wait_list,
                            const cl_event *event_wait_list,
                            cl_event *event)</pre>
 </div></div>
 <div class="paragraph"><p>enqueues a command to copy a buffer object identified by <em>src_buffer</em> to
 another buffer object identified by <em>dst_buffer</em>.</p></div>
 <div class="paragraph"><p><em>command_queue</em> refers to a host command-queue in which the copy command
 will be queued. The OpenCL context associated with <em>command_queue</em>,
 <em>src_buffer</em> and <em>dst_buffer</em> must be the same.</p></div>
 <div class="paragraph"><p><em>src_offset</em> refers to the offset where to begin copying data from
 <em>src_buffer</em>.</p></div>
 <div class="paragraph"><p><em>dst_offset</em> refers to the offset where to begin copying data into
 <em>dst_buffer</em>.</p></div>
 <div class="paragraph"><p><em>size</em> refers to the size in bytes to copy.</p></div>
 <div class="paragraph"><p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need
 to complete before this particular command can be executed. If
 <em>event_wait_list</em> is NULL, then this particular command does not wait on
 any event to complete. If <em>event_wait_list</em> is NULL,
 <em>num_events_in_wait_list</em> must be 0. If <em>event_wait_list</em> is not NULL,
 the list of events pointed to by <em>event_wait_list</em> must be valid and
 <em>num_events_in_wait_list</em> must be greater than 0. The events specified
 in <em>event_wait_list</em> act as synchronization points. The context
 associated with events in <em>event_wait_list</em> and <em>command_queue</em> must be
 the same. The memory associated with <em>event_wait_list</em> can be reused or
 freed after the function returns.</p></div>
 <div class="paragraph"><p><em>event</em> returns an event object that identifies this particular copy
 command and can be used to query or queue a wait for this particular
 command to complete. <em>event</em> can be NULL in which case it will not be
 possible for the application to query the status of this command or
 queue a wait for this command to complete.
 <strong>clEnqueueBarrierWithWaitList</strong> can be used instead. If the
 <em>event_wait_list</em> and the <em>event</em> arguments are not NULL, the <em>event</em>
 argument should not refer to an element of the <em>event_wait_list</em> array.</p></div>
 <div class="paragraph"><p><strong>clEnqueueCopyBuffer</strong> returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_COMMAND_QUEUE
 if <em>command_queue</em> is not a valid host command-queue.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_CONTEXT if the
 context associated with <em>command_queue</em>, <em>src_buffer</em> and <em>dst_buffer</em>
 are not the same or if the context associated with <em>command_queue</em> and
 events in <em>event_wait_list</em> are not the same.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_MEM_OBJECT if
 <em>src_buffer</em> and <em>dst_buffer</em> are not valid buffer objects.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>src_offset</em>, <em>dst_offset</em>, <em>size</em>, <em>src_offset</em> + <em>size</em> or
 <em>dst_offset</em> + <em>size</em> require accessing elements outside the
 <em>src_buffer</em> and <em>dst_buffer</em> buffer objects respectively.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_EVENT_WAIT_LIST
 if <em>event_wait_list</em> is NULL and <em>num_events_in_wait_list</em> &gt; 0, or
 <em>event_wait_list</em> is not NULL and <em>num_events_in_wait_list</em> is 0, or if
 event objects in <em>event_wait_list</em> are not valid events.
 </p>
 </li>
 <li>
 <p>

 CL_MISALIGNED_SUB_BUFFER_OFFSET if <em>src_buffer</em> is a sub-buffer object
 and <em>offset</em> specified when the sub-buffer object is created is not
 aligned to CL_DEVICE_MEM_BASE_ADDR_ALIGN value for device associated
 with <em>queue</em>.
 </p>
 </li>
 <li>
 <p>

 CL_MISALIGNED_SUB_BUFFER_OFFSET if <em>dst_buffer</em> is a sub-buffer object
 and <em>offset</em> specified when the sub-buffer object is created is not
 aligned to CL_DEVICE_MEM_BASE_ADDR_ALIGN value for device associated
 with <em>queue</em>.
 </p>
 </li>
 <li>
 <p>
       CL_MEM_COPY_OVERLAP if
 <em>src_buffer</em> and <em>dst_buffer</em> are the same buffer or sub-buffer object
 and the source and destination regions overlap or if <em>src_buffer</em> and
 <em>dst_buffer</em> are different sub-buffers of the same associated buffer
 object and they overlap. The regions overlap if <em>src_offset</em> &#8656;
 <em>dst_offset</em> &#8656; <em>src_offset</em> + <em>size</em>  1 or if <em>dst_offset</em> &#8656;
 <em>src_offset</em> &#8656; <em>dst_offset</em> + <em>size</em>  1.
 </p>
 </li>
 <li>
 <p>
      CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
 memory for data store associated with <em>src_buffer</em> or <em>dst_buffer</em>.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clEnqueueCopyBufferRect(cl_command_queue command_queue,
                                cl_mem src_buffer,
                                cl_mem dst_buffer,
                                const size_t *src_origin,
                                const size_t *dst_origin,
                                const size_t *region,
                                size_t src_row_pitch,
                                size_t src_slice_pitch,
                                size_t dst_row_pitch,
                                size_t dst_slice_pitch,
                                cl_uint num_events_in_wait_list,
                                const cl_event *event_wait_list,
                                cl_event *event)</pre>
 </div></div>
 <div class="paragraph"><p>enqueues a command to copy a 2D or 3D rectangular region from the buffer
 object identified by <em>src_buffer_to a 2D or 3D region in the buffer
 object identified by _dst_buffer</em>. Copying begins at the source offset
 and destination offset which are computed as described below in the
 description for <em>src_origin</em> and <em>dst_origin</em>. Each byte of the
 region&#8217;s width is copied from the source offset to the destination
 offset. After copying each width, the source and destination offsets
 are incremented by their respective source and destination row pitches.
 After copying each 2D rectangle, the source and destination offsets are
 incremented by their respective source and destination slice pitches.</p></div>
 <div class="admonitionblock">
 <table><tr>
 <td class="icon">
 <div class="title">Note</div>
 </td>
 <td class="content">If <em>src_buffer</em> and <em>dst_buffer</em> are the same buffer object,
 <em>src_row_pitch</em> must equal <em>dst_row_pitch</em> and <em>src_slice_pitch</em> must
 equal <em>dst_slice_pitch</em>.</td>
 </tr></table>
 </div>
 <div class="paragraph"><p><em>command_queue</em> refers to the host command-queue in which the copy
 command will be queued. The OpenCL context associated with
 <em>command_queue</em>, <em>src_buffer</em> and <em>dst_buffer</em> must be the same.</p></div>
 <div class="paragraph"><p><em>src_origin</em> defines the (<em>x</em>, <em>y</em>, <em>z</em>) offset in the memory region
 associated with <em>src_buffer</em>. For a 2D rectangle region, the <em>z</em> value
 given by <em>src_origin</em>[2] should be 0. The offset in bytes is computed
 as <em>src_origin</em>[2] * <em>src_slice_pitch</em> + <em>src_origin</em>[1] *
 <em>src_row_pitch</em> + <em>src_origin</em>[0].</p></div>
 <div class="paragraph"><p><em>dst_origin</em> defines the (<em>x</em>, <em>y</em>, <em>z</em>) offset in the memory region
 associated with <em>dst_buffer</em>. For a 2D rectangle region, the <em>z</em> value
 given by <em>dst_origin</em>[2] should be 0. The offset in bytes is computed
 as <em>dst_origin</em>[2] * <em>dst_slice_pitch</em> + <em>dst_origin</em>[1] *
 <em>dst_row_pitch</em> + <em>dst_origin</em>[0].</p></div>
 <div class="paragraph"><p><em>region_defines the_</em>(<em>width</em> in bytes, <em>height</em> in rows_,_ <em>depth_in
 slices) of the 2D or 3D rectangle being copied. For a 2D rectangle, the
 _depth</em> value given by <em>region</em>[2] should be 1. The values in region
 cannot be 0.</p></div>
 <div class="paragraph"><p><em>src_row_pitch</em> is the length of each row in bytes to be used for the
 memory region associated with <em>src_buffer</em>. If <em>src_row_pitch</em> is 0,
 <em>src_row_pitch</em> is computed as <em>region</em>[0].</p></div>
 <div class="paragraph"><p><em>src_slice_pitch</em> is the length of each 2D slice in bytes to be used for
 the memory region associated with <em>src_buffer</em>. If <em>src_slice_pitch</em> is
 0, <em>src_slice_pitch</em> is computed as <em>region</em>[1] * <em>src_row_pitch</em>.</p></div>
 <div class="paragraph"><p><em>dst_row_pitch</em> is the length of each row in bytes to be used for the
 memory region associated with <em>dst_buffer</em>. If <em>dst_row_pitch</em> is 0,
 <em>dst_row_pitch</em> is computed as <em>region</em>[0].</p></div>
 <div class="paragraph"><p><em>dst_slice_pitch</em> is the length of each 2D slice in bytes to be used for
 the memory region associated with <em>dst_buffer</em>. If <em>dst_slice_pitch</em> is
 0, <em>dst_slice_pitch</em> is computed as <em>region</em>[1] * <em>dst_row_pitch</em>.</p></div>
 <div class="paragraph"><p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need
 to complete before this particular command can be executed. If
 <em>event_wait_list</em> is NULL, then this particular command does not wait on
 any event to complete. If <em>event_wait_list</em> is NULL,
 <em>num_events_in_wait_list</em> must be 0. If <em>event_wait_list</em> is not NULL,
 the list of events pointed to by <em>event_wait_list</em> must be valid and
 <em>num_events_in_wait_list</em> must be greater than 0. The events specified
 in <em>event_wait_list</em> act as synchronization points. The context
 associated with events in <em>event_wait_list</em> and <em>command_queue</em> must be
 the same. The memory associated with <em>event_wait_list</em> can be reused or
 freed after the function returns.</p></div>
 <div class="paragraph"><p><em>event</em> returns an event object that identifies this particular copy
 command and can be used to query or queue a wait for this particular
 command to complete. <em>event</em> can be NULL in which case it will not be
 possible for the application to query the status of this command or
 queue a wait for this command to complete.
 <strong>clEnqueueBarrierWithWaitList</strong> can be used instead. If the
 <em>event_wait_list</em> and the <em>event</em> arguments are not NULL, the <em>event</em>
 argument should not refer to an element of the <em>event_wait_list</em> array.</p></div>
 <div class="paragraph"><p><strong>clEnqueueCopyBufferRect</strong> returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_COMMAND_QUEUE
 if <em>command_queue</em> is not a valid host command-queue.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_CONTEXT if the
 context associated with <em>command_queue</em>, <em>src_buffer</em> and <em>dst_buffer</em>
 are not the same or if the context associated with <em>command_queue</em> and
 events in <em>event_wait_list</em> are not the same.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_MEM_OBJECT if
 <em>src_buffer</em> and <em>dst_buffer</em> are not valid buffer objects.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 (<em>src_origin, region, src_row_pitch, src_slice_pitch</em>) or (<em>dst_origin,
 region, dst_row_pitch, dst_slice_pitch</em>) require accessing elements
 outside the <em>src_buffer</em> and <em>dst_buffer</em> buffer objects respectively.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if any
 <em>region</em> array element is 0.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>src_row_pitch</em> is not 0 and is less than <em>region</em>[0].
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>dst_row_pitch</em> is not 0 and is less than <em>region</em>[0].
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>src_slice_pitch</em> is not 0 and is less than <em>region</em>[1] *
 <em>src_row_pitch</em> or if <em>src_slice_pitch</em> is not 0 and is not a multiple
 of <em>src_row_pitch</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>dst_slice_pitch</em> is not 0 and is less than <em>region</em>[1] *
 <em>dst_row_pitch</em> or if <em>dst_slice_pitch</em> is not 0 and is not a multiple
 of <em>dst_row_pitch</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>src_buffer</em> and <em>dst_buffer</em> are the same buffer object and
 <em>src_slice_pitch</em> is not equal to <em>dst_slice_pitch</em> and <em>src_row_pitch</em>
 is not equal to <em>dst_row_pitch</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_EVENT_WAIT_LIST
 if <em>event_wait_list</em> is NULL and <em>num_events_in_wait_list</em> &gt; 0, or
 <em>event_wait_list</em> is not NULL and <em>num_events_in_wait_list</em> is 0, or if
 event objects in <em>event_wait_list</em> are not valid events.
 </p>
 </li>
 <li>
 <p>
       CL_MEM_COPY_OVERLAP if
 <em>src_buffer</em> and <em>dst_buffer</em> are the same buffer or sub-buffer object
 and the source and destination regions overlap or if <em>src_buffer</em> and
 <em>dst_buffer</em> are different sub-buffers of the same associated buffer
 object and they overlap. Refer to Appendix D for details on how to
 determine if source and destination regions overlap.
 </p>
 </li>
 <li>
 <p>

 CL_MISALIGNED_SUB_BUFFER_OFFSET if <em>src_buffer</em> is a sub-buffer object
 and <em>offset</em> specified when the sub-buffer object is created is not
 aligned to CL_DEVICE_MEM_BASE_ADDR_ALIGN value for device associated
 with <em>queue</em>.
 </p>
 </li>
 <li>
 <p>
      CL_MISALIGNED_SUB_BUFFER_OFFSET if <em>dst_buffer</em> is a sub-buffer object
 and <em>offset</em> specified when the sub-buffer object is created is not
 aligned to CL_DEVICE_MEM_BASE_ADDR_ALIGN value for device associated
 with <em>queue</em>.
 </p>
 </li>
 <li>
 <p>
      CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
 memory for data store associated with <em>src_buffer</em> or <em>dst_buffer</em>.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 </div>
 <div class="sect3">
 <h4 id="_filling_buffer_objects">5.2.3. Filling Buffer Objects</h4>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clEnqueueFillBuffer(cl_command_queue command_queue,
                            cl_mem buffer,
                            const void *pattern,
                            size_t pattern_size,
                            size_t offset,
                            size_t size,
                            cl_uint num_events_in_wait_list,
                            const cl_event *event_wait_list,
                            cl_event *event)</pre>
 </div></div>
 <div class="paragraph"><p>enqueues a command to fill a buffer object with a pattern of a given
 pattern size. The usage information which indicates whether the memory
 object can be read or written by a kernel and/or the host and is given
 by the cl_mem_flags argument value specified when <em>buffer</em> is created is
 ignored by <strong>clEnqueueFillBuffer</strong>.</p></div>
 <div class="paragraph"><p><em>command_queue</em> refers to the host command-queue in which the fill
 command will be queued. The OpenCL context associated with
 <em>command_queue</em> and <em>buffer</em> must be the same.</p></div>
 <div class="paragraph"><p><em>buffer</em> is a valid buffer object.</p></div>
 <div class="paragraph"><p><em>pattern</em> is a pointer to the data pattern of size <em>pattern_size</em> in
 bytes. <em>pattern</em> will be used to fill a region in <em>buffer</em> starting at
 <em>offset</em> and is <em>size</em> bytes in size. The data pattern must be a scalar
 or vector integer or floating-point data type supported by OpenCL as
 described in <em>sections 6.1.1</em> and <em>6.1.2</em>. For example, if <em>buffer</em> is
 to be filled with a pattern of float4 values, then <em>pattern</em> will be a
 pointer to a cl_float4 value and <em>pattern_size</em> will be
 sizeof(cl_float4). The maximum value of <em>pattern_size</em> is the size of
 the largest integer or floating-point vector data type supported by the
 OpenCL device. The memory associated with <em>pattern</em> can be reused or
 freed after the function returns.</p></div>
 <div class="paragraph"><p><em>offset</em> is the location in bytes of the region being filled in <em>buffer</em>
 and must be a multiple of <em>pattern_size</em>.</p></div>
 <div class="paragraph"><p><em>size</em> is the size in bytes of region being filled in <em>buffer</em> and must
 be a multiple of <em>pattern_size</em>.</p></div>
 <div class="paragraph"><p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need
 to complete before this particular command can be executed. If
 <em>event_wait_list</em> is NULL, then this particular command does not wait on
 any event to complete. If <em>event_wait_list</em> is NULL,
 <em>num_events_in_wait_list</em> must be 0. If <em>event_wait_list</em> is not NULL,
 the list of events pointed to by <em>event_wait_list</em> must be valid and
 <em>num_events_in_wait_list</em> must be greater than 0. The events specified
 in <em>event_wait_list</em> act as synchronization points. The context
 associated with events in <em>event_wait_list</em> and <em>command_queue</em> must be
 the same. The memory associated with <em>event_wait_list</em> can be reused or
 freed after the function returns.</p></div>
 <div class="paragraph"><p><em>event</em> returns an event object that identifies this particular command
 and can be used to query or queue a wait for this particular command to
 complete. <em>event</em> can be NULL in which case it will not be possible for
 the application to query the status of this command or queue a wait for
 this command to complete. <strong>clEnqueueBarrierWithWaitList</strong> can be used
 instead. If the <em>event_wait_list</em> and the <em>event</em> arguments are not
 NULL, the <em>event</em> argument should not refer to an element of the
 <em>event_wait_list</em> array.</p></div>
 <div class="paragraph"><p><strong>clEnqueueFillBuffer</strong> returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_COMMAND_QUEUE
 if <em>command_queue</em> is not a valid host command-queue.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_CONTEXT if the
 context associated with <em>command_queue</em> and <em>buffer</em> are not the same or
 if the context associated with <em>command_queue</em> and events in
 <em>event_wait_list</em> are not the same.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_MEM_OBJECT if
 <em>buffer</em> is not a valid buffer object.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>offset</em> or <em>offset</em> + <em>size</em> require accessing elements outside the
 <em>buffer</em> buffer object respectively.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>pattern</em> is NULL or if <em>pattern_size</em> is 0 or if <em>pattern_size</em> is not
 one of {1, 2, 4, 8, 16, 32, 64, 128}.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>offset</em> and <em>size</em> are not a multiple of <em>pattern_size</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_EVENT_WAIT_LIST
 if <em>event_wait_list</em> is NULL and <em>num_events_in_wait_list</em> &gt; 0, or
 <em>event_wait_list</em> is not NULL and <em>num_events_in_wait_list</em> is 0, or if
 event objects in <em>event_wait_list</em> are not valid events.
 </p>
 </li>
 <li>
 <p>

 CL_MISALIGNED_SUB_BUFFER_OFFSET if <em>buffer</em> is a sub-buffer object and
 offset specified when the sub-buffer object is created is not aligned to
 CL_DEVICE_MEM_BASE_ADDR_ALIGN value for device associated with <em>queue</em>.
 </p>
 </li>
 <li>
 <p>

 CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
 memory for data store associated with <em>buffer</em>.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 </div>
 <div class="sect3">
 <h4 id="_mapping_buffer_objects">5.2.4. Mapping Buffer Objects</h4>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>void clEnqueueMapBuffer(cl_command_queue command_queue,
                         cl_mem buffer,
                         cl_bool blocking_map,
                         cl_map_flags map_flags,
                         size_t offset,
                         size_t size,
                         cl_uint num_events_in_wait_list,
                         const cl_event *event_wait_list,
                         cl_event *event,
                         cl_int *errcode_ret)</pre>
 </div></div>
 <div class="paragraph"><p>enqueues a command to map a region of the buffer object given by
 <em>buffer</em> into the host address space and returns a pointer to this
 mapped region.</p></div>
 <div class="paragraph"><p><em>command_queue</em> must be a valid host command-queue.</p></div>
 <div class="paragraph"><p><em>blocking_map</em> indicates if the map operation is <em>blocking</em> or
 <em>non-blocking</em>.</p></div>
 <div class="paragraph"><p>If <em>blocking_map</em> is CL_TRUE, <strong>clEnqueueMapBuffer</strong> does not return until
 the specified region in <em>buffer</em> is mapped into the host address space
 and the application can access the contents of the mapped region using
 the pointer returned by <strong>clEnqueueMapBuffer</strong>.</p></div>
 <div class="paragraph"><p>If <em>blocking_map</em> is CL_FALSE i.e. map operation is non-blocking, the
 pointer to the mapped region returned by <strong>clEnqueueMapBuffer</strong> cannot be
 used until the map command has completed. The <em>event</em> argument returns
 an event object which can be used to query the execution status of the
 map command. When the map command is completed, the application can
 access the contents of the mapped region using the pointer returned by
 <strong>clEnqueueMapBuffer</strong>.</p></div>
 <div class="paragraph"><p><em>map_flags</em> is a bit-field and is described in <em>table 5.5</em>.</p></div>
 <div class="paragraph"><p><em>buffer</em> is a valid buffer object. The OpenCL context associated with
 <em>command_queue</em> and <em>buffer</em> must be the same.</p></div>
 <div class="paragraph"><p><em>offset</em> and <em>size</em> are the offset in bytes and the size of the region
 in the buffer object that is being mapped.</p></div>
 <div class="paragraph"><p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need
 to complete before this particular command can be executed. If
 <em>event_wait_list</em> is NULL, then this particular command does not wait on
 any event to complete. If <em>event_wait_list</em> is NULL,
 <em>num_events_in_wait_list</em> must be 0. If <em>event_wait_list</em> is not NULL,
 the list of events pointed to by <em>event_wait_list</em> must be valid and
 <em>num_events_in_wait_list</em> must be greater than 0. The events specified
 in <em>event_wait_list</em> act as synchronization points. The context
 associated with events in <em>event_wait_list</em> and <em>command_queue</em> must be
 the same. The memory associated with <em>event_wait_list</em> can be reused or
 freed after the function returns.</p></div>
 <div class="paragraph"><p><em>event</em> returns an event object that identifies this particular command
 and can be used to query or queue a wait for this particular command to
 complete. <em>event</em> can be NULL in which case it will not be possible for
 the application to query the status of this command or queue a wait for
 this command to complete. If the <em>event_wait_list</em> and the <em>event</em>
 arguments are not NULL, the <em>event</em> argument should not refer to an
 element of the <em>event_wait_list</em> array.</p></div>
 <div class="paragraph"><p><em>errcode_ret</em> will return an appropriate error code. If <em>errcode_ret</em>
 is NULL, no error code is returned.</p></div>
 <div class="paragraph"><p><strong>clEnqueueMapBuffer</strong> will return a pointer to the mapped region. The
 <em>errcode_ret</em> is set to CL_SUCCESS.</p></div>
 <div class="paragraph"><p>A NULL pointer is returned otherwise with one of the following error
 values returned in <em>errcode_ret</em>:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_COMMAND_QUEUE
 if _command_queue_is not a valid host command-queue.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_CONTEXT if
 context associated with <em>command_queue_and _buffer</em> are not the same or
 if the context associated with <em>command_queue</em> and events in
 <em>event_wait_list</em> are not the same.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_MEM_OBJECT if
 <em>buffer</em> is not a valid buffer object.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if region
 being mapped given by (<em>offset</em>, <em>size</em>) is out of bounds or if <em>size</em>
 is 0 or if values specified in _map_flags_are not valid.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_EVENT_WAIT_LIST
 if <em>event_wait_list</em> is NULL and <em>num_events_in_wait_list</em> &gt; 0, or
 <em>event_wait_list</em> is not NULL and <em>num_events_in_wait_list</em> is 0, or if
 event objects in <em>event_wait_list</em> are not valid events.
 </p>
 </li>
 <li>
 <p>
      CL_MISALIGNED_SUB_BUFFER_OFFSET if <em>buffer</em> is a sub-buffer object and
 <em>offset</em> specified when the sub-buffer object is created is not aligned
 to CL_DEVICE_MEM_BASE_ADDR_ALIGN value for the device associated with
 <em>queue</em>.
 </p>
 </li>
 <li>
 <p>
       CL_MAP_FAILURE if there is
 a failure to map the requested region into the host address space. This
 error cannot occur for buffer objects created with CL_MEM_USE_HOST_PTR
 or CL_MEM_ALLOC_HOST_PTR.
 </p>
 </li>
 <li>
 <p>
      CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST if the map operation is
 blocking and the execution status of any of the events in
 <em>event_wait_list</em> is a negative integer value.
 </p>
 </li>
 <li>
 <p>
      CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
 memory for data store associated with <em>buffer</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_OPERATION if
 buffer_ has been created with CL_MEM_HOST_WRITE_ONLY or
 CL_MEM_HOST_NO_ACCESS and CL_MAP_READ is set in <em>map_flags</em> or if
 <em>buffer</em> has been created with CL_MEM_HOST_READ_ONLY or
 CL_MEM_HOST_NO_ACCESS and CL_MAP_WRITE or CL_MAP_WRITE_INVALIDATE_REGION
 is set in <em>map_flags</em>.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_OPERATION if
 mapping would lead to overlapping regions being mapped for writing.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The pointer returned maps a region starting at <em>offset</em> and is at least
 <em>size</em> bytes in size. The result of a memory access outside this region
 is undefined.</p></div>
 <div class="paragraph"><p>If the buffer object is created with CL_MEM_USE_HOST_PTR set in
 <em>mem_flags</em>, the following will be true:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       The <em>host_ptr</em> specified
 in <strong>clCreateBuffer</strong>  to contain the latest bits in the
 region being mapped when the <strong>clEnqueueMapBuffer</strong> command has completed.
 </p>
 </li>
 <li>
 <p>
       The pointer value returned
 by <strong>clEnqueueMapBuffer</strong> will be derived from the <em>host_ptr</em> specified
 when the buffer object is created.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>Mapped buffer objects are unmapped using <strong>clEnqueueUnmapMemObject</strong>.
 This is described in <em>section 5.5.2</em>.</p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <caption class="title">Table 10. <em>List of supported cl_map_flags values</em></caption>
 <col style="width:50%;">
 <col style="width:50%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>cl_map_flags</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Description</strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_MAP_READ</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">This flag specifies that the region being mapped
 in the memory object is being mapped for
 reading.
 <br>
 <br>
 The pointer returned by clEnqueueMapBuffer (clEnqueueMapImage) is guaranteed
 to contain the latest bits in the region being
 mapped when the clEnqueueMapBuffer (clEnqueueMapImage) command has completed.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_MAP_WRITE</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">This flag specifies that the region being mapped
 in the memory object is being mapped for
 writing.
 <br>
 <br>
 The pointer returned by
 clEnqueueMap{Buffer</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Image} is guaranteed
 to contain the latest bits in the region being
 mapped when the clEnqueueMapBuffer (clEnqueueMapImage) command has completed</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_MAP_WRITE_INVALIDATE_REGION</strong></p></td>
 </tr>
 </tbody>
 </table>
 </div>
 </div>
 <div class="sect2">
 <h3 id="_image_objects">5.3. Image Objects</h3>
 <div class="paragraph"><p>An <em>image</em> object is used to store a one-, two- or three- dimensional
 texture, frame-buffer or image. The elements of an image object are
 selected from a list of predefined image formats. The minimum number of
 elements in a memory object is one.</p></div>
 <div class="sect3">
 <h4 id="_creating_image_objects">5.3.1. Creating Image Objects</h4>
 <div class="paragraph"><p>A <strong>1D image</strong>,<strong>1D image buffer, 1D image array</strong>,<strong>2D image</strong>,<strong>2D image
 array and 3D image object</strong> can be created using the following function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_mem clCreateImage(cl_context context,
                      cl_mem_flags flags,
                      const cl_image_format *image_format,
                      const cl_image_desc *image_desc,
                      void  *host_ptr,
                      cl_int *errcode_ret)</pre>
 </div></div>
 <div class="paragraph"><p><em>context</em> is a valid OpenCL context on which the image object is to be
 created.</p></div>
 <div class="paragraph"><p><em>flags</em> is a bit-field that is used to specify allocation and usage
 information about the image memory object being created and is described
 in <em>table 5.3</em>.</p></div>
 <div class="paragraph"><p>For all image types except CL_MEM_OBJECT_IMAGE1D_BUFFER, if value
 specified for <em>flags</em> is 0, the default is used which is
 CL_MEM_READ_WRITE.</p></div>
 <div class="paragraph"><p>For CL_MEM_OBJECT_IMAGE1D_BUFFER image type, or an image created from
 another memory object (image or buffer), if the CL_MEM_READ_WRITE,
 CL_MEM_READ_ONLY or CL_MEM_WRITE_ONLY values are not specified in
 <em>flags</em>, they are inherited from the corresponding memory access
 qualifers associated with <em>mem_object</em>. The CL_MEM_USE_HOST_PTR,
 CL_MEM_ALLOC_HOST_PTR and CL_MEM_COPY_HOST_PTR values cannot be
 specified in <em>flags</em> but are inherited from the corresponding memory
 access qualifiers associated with <em>mem_object</em>. If CL_MEM_COPY_HOST_PTR
 is specified in the memory access qualifier values associated with
 <em>mem_object</em> it does not imply any additional copies when the image is
 created from <em>mem_object</em>. If the CL_MEM_HOST_WRITE_ONLY,
 CL_MEM_HOST_READ_ONLY or CL_MEM_HOST_NO_ACCESS values are not specified
 in <em>flags</em>, they are inherited from the corresponding memory access
 qualifiers associated with <em>mem_object</em>.</p></div>
 <div class="paragraph"><p><em>image_format</em> is a pointer to a structure that describes format
 properties of the image to be allocated. A 1D image buffer or 2D image
 can belink:#<em>msocom_12<a id="BA12"></a>  created from a buffer by specifying a
 buffer object in the _image_desc&#8594;mem_object</em>. A 2D image can be
 created from another 2D image object by specifyging an image object in
 the <em>image_desc&#8594;mem_object</em>. Refer to <em>section 5.3.1.1</em> for a detailed
 description of the image format descriptor.</p></div>
 <div class="paragraph"><p><em>image_desc</em> is a pointer to a structure that describes type and
 dimensions of the image to be allocated. Refer to <em>section 5.3.1.2</em> for
 a detailed description of the image descriptor.</p></div>
 <div class="paragraph"><p><em>host_ptr</em> is a pointer to the image data that may already be allocated
 by the application. It is only used to initialize the image, and can be
 freed after the call to <strong>clCreateImage</strong>. Refer to table below for a
 description of how large the buffer that <em>host_ptr</em> points to must be.</p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <col style="width:50%;">
 <col style="width:50%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Image Type</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Size of buffer that <em>host_ptr</em> points to</strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_MEM_OBJECT_IMAGE1D</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">&gt;= image_row_pitch</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_MEM_OBJECT_IMAGE1D_BUFFER</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">&gt;= image_row_pitch</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_MEM_OBJECT_IMAGE2D</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">&gt;= image_row_pitch * image_height</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_MEM_OBJECT_IMAGE3D</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">&gt;= image_slice_pitch * image_depth</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_MEM_OBJECT_IMAGE1D_ARRAY</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">&gt;= image_slice_pitch * image_array_size</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_MEM_OBJECT_IMAGE2D_ARRAY</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">&gt;= image_slice_pitch * image_array_size</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p>For a 3D image or 2D image array, the image data specified by <em>host_ptr</em>
 is stored as a linear sequence of adjacent 2D image slices or 2D images
 respectively. Each 2D image is a linear sequence of adjacent
 scanlines. Each scanline is a linear sequence of image elements.</p></div>
 <div class="paragraph"><p>For a 2D image, the image data specified by <em>host_ptr</em> is stored as a
 linear sequence of adjacent scanlines. Each scanline is a linear
 sequence of image elements.</p></div>
 <div class="paragraph"><p>For a 1D image array, the image data specified by <em>host_ptr</em> is stored
 as a linear sequence of adjacent 1D images. Each 1D image is stored as
 a single scanline which is a linear sequence of adjacent elements.</p></div>
 <div class="paragraph"><p>For 1D image or 1D image buffer, the image data specified by <em>host_ptr</em>
 is stored as a single scanline which is a linear sequence of adjacent
 elements.</p></div>
 <div class="paragraph"><p>Image elements are stored according to their image format as described
 in section 5.3.1.1</p></div>
 <div class="paragraph"><p><em>errcode_ret</em> will return an appropriate error code. If <em>errcode_ret</em>
 is NULL, no error code is returned.</p></div>
 <div class="paragraph"><p><strong>clCreateImage</strong> returns a valid non-zero image object created and the
 <em>errcode_ret</em> is set to CL_SUCCESS if the image object is created
 successfully. Otherwise, it returns a NULL value with one of the
 following error values returned in <em>errcode_ret</em>:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_CONTEXT if
 _context_is not a valid context.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if values
 specified in _flags_are not valid.
 </p>
 </li>
 <li>
 <p>
  CL_INVALID_IMAGE_FORMAT_DESCRIPTOR if values specified in
 <em>image_format_are not valid or if _image_format</em> is NULL.
 </p>
 </li>
 <li>
 <p>
  CL_INVALID_IMAGE_FORMAT_DESCRIPTOR if a 2D image is created from a
 buffer and the row pitch and base address alignment does not follow the
 rules described for creating a 2D image from a buffer.
 </p>
 </li>
 <li>
 <p>
   CL_INVALID_IMAGE_FORMAT_DESCRIPTOR if a 2D image is created from a 2D
 image object and the rules described above are not followed.
 </p>
 </li>
 <li>
 <p>
 CL_INVALID_IMAGE_DESCRIPTOR if values specified in <em>image_desc_are not
 valid or if _image_desc</em> is NULL.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_IMAGE_SIZE if
 image dimensions specified in <em>image_desc</em> exceed the maximum image
 dimensions described in <em>table 4.3</em> for all devices in_context_.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_HOST_PTR if
 <em>host_ptr</em> is NULL and CL_MEM_USE_HOST_PTR or CL_MEM_COPY_HOST_PTR are
 set in <em>flags</em> or if <em>host_ptr</em> is not NULL but CL_MEM_COPY_HOST_PTR or
 CL_MEM_USE_HOST_PTR are not set in <em>flags</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if an
 image is being created from another memory object (buffer or image)
 under one of the following circumstances: 1) <em>mem_object</em> was created
 with CL_MEM_WRITE_ONLY and <em>flags</em> specifies CL_MEM_READ_WRITE or
 CL_MEM_READ_ONLY, 2) <em>mem_object</em> was created with CL_MEM_READ_ONLY and
 <em>flags</em> specifies CL_MEM_READ_WRITE or CL_MEM_WRITE_ONLY, 3) <em>flags</em>
 specifies CL_MEM_USE_HOST_PTR or CL_MEM_ALLOC_HOST_PTR or
 CL_MEM_COPY_HOST_PTR.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if an
 image is being created from another memory object (buffer or image) and
 <em>mem_object</em> object was created with CL_MEM_HOST_WRITE_ONLY and <em>flags</em>
 specifies CL_MEM_HOST_READ_ONLY, or if <em>mem_object</em> was created with
 CL_MEM_HOST_READ_ONLY and <em>flags</em> specifies CL_MEM_HOST_WRITE_ONLY, or
 if <em>mem_object</em> was created with CL_MEM_HOST_NO_ACCESS and_flags_
 specifies CL_MEM_HOST_READ_ONLY or CL_MEM_HOST_WRITE_ONLY.
 </p>
 </li>
 <li>
 <p>
 CL_IMAGE_FORMAT_NOT_SUPPORTED if the <em>image_format</em> is not supported.
 </p>
 </li>
 <li>
 <p>
 CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
 memory for image object.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_OPERATION if
 there are no devices in <em>context</em> that support images (i.e.
 CL_DEVICE_IMAGE_SUPPORT specified in <em>table 4.3</em> is CL_FALSE).
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="sect4">
 <h5 id="_image_format_descriptor">Image Format Descriptor</h5>
 <div class="paragraph"><p>The image format descriptor structure is defined as</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>typedef struct cl_image_format {
   cl_channel_order image_channel_order;
   cl_channel_type image_channel_data_type;
 } cl_image_format;</pre>
 </div></div>
 <div class="paragraph"><p>image_channel_order specifies the number of channels and the channel
 layout i.e. the memory layout in which channels are stored in the
 image. Valid values are described in <em>table 5.6.</em></p></div>
 <div class="paragraph"><p>image_channel_data_type describes the size of the channel data type.
 The list of supported values is described in <em>table 5.7</em>. The number of
 bits per element determined by the image_channel_data_type and
 image_channel_order must be a power of two.</p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <caption class="title">Table 11. <em>List of supported Image Channel Order Values</em></caption>
 <col style="width:100%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Enum values that can be specified in channel_order</strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_R</strong>, <strong>CL_Rx</strong> or <strong>CL_A</strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_INTENSITY</strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_LUMINANCE</strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_DEPTH</strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_RG</strong>, <strong>CL_RGx</strong> or <strong>CL_RA</strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_RGB</strong> or <strong>CL_RGBx</strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_RGBA</strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_sRGB, CL_sRGBx, CL_sRGBA, CL_sBGRA</strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_ARGB, CL_BGRA, CL_ABGR</strong></p></td>
 </tr>
 </tbody>
 </table>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <caption class="title">Table 12. <em>List of supported Image Channel Data Types</em></caption>
 <col style="width:50%;">
 <col style="width:50%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Image Channel Data Type</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Description</strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_SNORM_INT8</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Each channel component is a normalized signed 8-bit
 integer value</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_SNORM_INT16</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Each channel component is a normalized signed 16-bit
 integer value</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_UNORM_INT8</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Each channel component is a normalized unsigned 8-bit
 integer value</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_UNORM_INT16</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Each channel component is a normalized unsigned
 16-bit integer value</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_UNORM_SHORT_565</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Represents a normalized 5-6-5 3-channel RGB
 image. The channel order must be CL_RGB or CL_RGBx.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_UNORM_SHORT_555</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Represents a normalized x-5-5-5 4-channel xRGB
 image. The channel order must be CL_RGB or CL_RGBx.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_UNORM_INT_101010</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Represents a normalized x-10-10-10 4-channel
 xRGB image. The channel order must be CL_RGB or CL_RGBx.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_UNORM_INT_101010_2</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Represents a normalized 10-10-10-2
 four-channel RGBA image. The channel order must be CL_RGBA.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_SIGNED_INT8</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Each channel component is an unnormalized signed
 8-bit integer value</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_SIGNED_INT16</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Each channel component is an unnormalized signed
 16-bit integer value</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_SIGNED_INT32</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Each channel component is an unnormalized signed
 32-bit integer value</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_UNSIGNED_INT8</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Each channel component is an unnormalized unsigned
 8-bit integer value</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_UNSIGNED_INT16</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Each channel component is an unnormalized unsigned
 16-bit integer value</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_UNSIGNED_INT32</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Each channel component is an unnormalized unsigned
 32-bit integer value</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_HALF_FLOAT</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Each channel component is a 16-bit half-float value</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_FLOAT</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Each channel component is a single precision floating-point value</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p>For example, to specify a normalized unsigned 8-bit / channel RGBA
 image, image_channel_order = CL_RGBA, and__image_channel_data_type =
 CL_UNORM_INT8. The memory layout of this image format is described
 below:</p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:60%;
 ">
 <col style="width:10%;">
 <col style="width:10%;">
 <col style="width:10%;">
 <col style="width:10%;">
 <col style="width:60%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">R</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">G</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">B</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">A</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">&#8230;</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p>with the corresponding byte offsets</p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:60%;
 ">
 <col style="width:10%;">
 <col style="width:10%;">
 <col style="width:10%;">
 <col style="width:10%;">
 <col style="width:60%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">0</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">1</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">2</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">3</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">&#8230;</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p>Similar, if image_channel_order = CL_RGBA and image_channel_data_type =
 CL_SIGNED_INT16, the memory layout of this image format is described
 below:</p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:60%;
 ">
 <col style="width:10%;">
 <col style="width:10%;">
 <col style="width:10%;">
 <col style="width:10%;">
 <col style="width:60%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">R</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">G</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">B</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">A</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">&#8230;</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p>with the corresponding byte offsets</p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:60%;
 ">
 <col style="width:10%;">
 <col style="width:10%;">
 <col style="width:10%;">
 <col style="width:10%;">
 <col style="width:60%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">0</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">2</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">4</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">6</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">&#8230;</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p>image_channel_data_type values of CL_UNORM_SHORT_565,
 CL_UNORM_SHORT_555, CL_UNORM_INT_101010 and CL_UNORM_INT_101010_2 are
 special cases of packed image formats where the channels of each element
 are packed into a single unsigned short or unsigned int. For these
 special packed image formats, the channels are normally packed with the
 first channel in the most significant bits of the bitfield, and successive
 channels occupying progressively less significant locations. For
 CL_UNORM_SHORT_565, R is in bits 15:11, G is in bits 10:5 and B is in
 bits 4:0. For CL_UNORM_SHORT_555, bit 15 is undefined, R is in bits
 14:10, G in bits 9:5 and B in bits 4:0. For CL_UNORM_INT_101010, bits
 31:30 are undefined, R is in bits 29:20, G in bits 19:10 and B in bits
 9:0. For CL_UNORM_INT_101010_2, R is in bits 31:22, G in bits 21:12, B
 in bits 11:2 and A in bits 1:0.</p></div>
 <div class="paragraph"><p>OpenCL implementations must maintain the minimum precision specified by
 the number of bits in image_channel_data_type. If the image format
 specified by image_channel_order, and image_channel_data_type cannot be
 supported by the OpenCL implementation, then the call to <strong>clCreateImage</strong>
 will return a NULL memory object.</p></div>
 </div>
 <div class="sect4">
 <h5 id="_image_descriptor">Image Descriptor</h5>
 <div class="paragraph"><p>The image descriptor structure describes the type and dimensions of the
 image or image array and is defined as:</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>typedef struct cl_image_desc {
     cl_mem_object_type image_type,
     size_t  image_width;
     size_t  image_height;
     size_t image_depth;
     size_t image_array_size;
     size_t image_row_pitch;
     size_t image_slice_pitch;
     cl_uint num_mip_levels;
     cl_uint num_samples;
     cl_mem mem_object;
 } cl_image_desc;</pre>
 </div></div>
 <div class="paragraph"><p>image_type describes the image type and must be either
 CL_MEM_OBJECT_IMAGE1D, CL_MEM_OBJECT_IMAGE1D_BUFFER,
 CL_MEM_OBJECT_IMAGE1D_ARRAY, CL_MEM_OBJECT_IMAGE2D,
 CL_MEM_OBJECT_IMAGE2D_ARRAY or CL_MEM_OBJECT_IMAGE3D.</p></div>
 <div class="paragraph"><p>image_width is the width of the image in pixels. For a 2D image and
 image array, the image width must be a value &gt;= 1 and &#8656;
 CL_DEVICE_IMAGE2D_MAX_WIDTH. For a 3D image, the image width must be a
 value &gt;=1 and &#8656; CL_DEVICE_IMAGE3D_MAX_WIDTH. For a 1D image buffer,
 the image width must be a value &gt;=1 and &#8656;
 CL_DEVICE_IMAGE_MAX_BUFFER_SIZE. For a 1D image and 1D image array, the
 image width must be a value &gt;=1 and &#8656; CL_DEVICE_IMAGE2D_MAX_WIDTH.</p></div>
 <div class="paragraph"><p>image_height is height of the image in pixels. This is only used if the
 image is a 2D or 3D image, or a 2D image array. For a 2D image or image
 array, the image height must be a value &gt;=1 and &#8656;
 CL_DEVICE_IMAGE2D_MAX_HEIGHT. For a 3D image, the image height must be
 a value &gt;=1 and &#8656; CL_DEVICE_IMAGE3D_MAX_HEIGHT.</p></div>
 <div class="paragraph"><p>image_depth is the depth of the image in pixels. This is only used if
 the image is a 3D image and must be a value &gt;= 1 and &#8656;
 CL_DEVICE_IMAGE3D_MAX_DEPTH.</p></div>
 <div class="paragraph"><p>image_array_size<span class="footnote"><br>[Note that reading and writing 2D image arrays from a kernel with image_array_size =1 may be lower
 performance than 2D images]<br></span>: is the number of images in the image
 array. This is only used if the image is a 1D or 2D image array. The
 values for image_array_size, if specified, must be a value &gt;= 1 and &#8656;
 CL_DEVICE_IMAGE_MAX_ARRAY_SIZE.</p></div>
 <div class="paragraph"><p>image_row_pitch is the scan-line pitch in bytes. This must be 0 if
 <em>host_ptr</em> is NULL and can be either 0 or &gt;= image_width * size of
 element in bytes if <em>host_ptr</em> is not NULL. If <em>host_ptr</em> is not NULL
 and image_row_pitch__= 0, image_row_pitch is calculated as image_width *
 size of element in bytes. If image_row_pitch is not 0, it must be a
 multiple of the image element size in bytes. For a 2D image created
 from a buffer, the pitch specified (or computed if pitch specified is 0)
 must be a multiple of the maximum of the
 CL_DEVICE_IMAGE_PITCH_ALIGNMENT value for all devices in the context
 associated with image_desc&#8594;mem_object and that support images.</p></div>
 <div class="paragraph"><p>image_slice_pitch is the size in bytes of each 2D slice in the 3D image
 or the size in bytes of each image in a 1D or 2D image array. This must
 be 0 if <em>host_ptr</em> is NULL. If <em>host_ptr</em> is not NULL,
 image_slice_pitch can be either 0 or &gt;= image_row_pitch * image_height
 for a 2D image array or 3D image and can be either 0 or &gt;=
 image_row_pitch for a 1D image array. If <em>host_ptr</em> is not NULL and
 image_slice_pitch<em>= 0, image_slice_pitch is calculated as
 image_row_pitch * image_height for a 2D image array or 3D image and
 image_row_pitch for a 1D image array. If image_slice_pitch</em>is not 0,
 it must be a multiple of the image_row_pitch.</p></div>
 <div class="paragraph"><p>num_mip_levels and num_samples must be 0.</p></div>
 <div class="paragraph"><p>mem_object may refer to a valid buffer or image memory object.
 mem_object can be a buffer memory object if image_type is
 CL_MEM_OBJECT_IMAGE1D_BUFFER or
 CL_MEM_OBJECT_IMAGE2D<span class="footnote"><br>[To create a 2D image from a buffer object that share the data store between the image and buffer object]<br></span>:. mem_object can be an image
 object if image_type is CL_MEM_OBJECT_IMAGE2D<span class="footnote"><br>[To create an image object from another image object that share the data store between these image objects.]<br></span>:.
 Otherwise it must be NULL. The image pixels are taken from the memory
 objects data store. When the contents of the specified memory objects
 data store are modified, those changes are reflected in the contents of
 the image object and vice-versa at corresponding synchronization points.</p></div>
 <div class="paragraph"><p>For a 1D image buffer create from a buffer object, the image_width *
 size of element in bytes must be &#8656; size of the buffer object. The
 image data in the buffer object is stored as a single scanline which is
 a linear sequence of adjacent elements. </p></div>
 <div class="paragraph"><p>For a 2D image created from a buffer object, the image_row_pitch *
 image_height must be &#8656; size of the buffer object specified by
 mem_object. The image data in the buffer object is stored as a linear
 sequence of adjacent scanlines. Each scanline is a linear sequence of
 image elements padded to image_row_pitch bytes. </p></div>
 <div class="paragraph"><p>For an image object created from another image object, the values
 specified in the image descriptor except for mem_object must match the
 image descriptor information associated with mem_object. </p></div>
 <div class="paragraph"><p>Image elements are stored according to their image format as described
 in section 5.3.1.1. </p></div>
 <div class="paragraph"><p>If the buffer object specified by mem_object is created with
 CL_MEM_USE_HOST_PTR, the <em>host_ptr</em> specified to <strong>clCreateBuffer</strong> must
 be aligned to the minimum of the
 <strong>CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT</strong> value for all devices in the
 context associated with the buffer specified by mem_object and that
 support images.</p></div>
 <div class="paragraph"><p>Creating a 2D image object from another 2D image object allows users to
 create a new image object that shares the image data store with
 mem_object but views the pixels in the image with a different channel
 order. The restrictions are:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       all the values specified
 in image_desc except for mem_object must match the image descriptor
 information associated with mem_object.
 </p>
 </li>
 <li>
 <p>
       The <em>image_desc</em> used for
 creation of <em>mem_object</em> may not be equivalent to image descriptor
 information associated with mem_object. To ensure the values in
 <em>image_desc</em> will match one can query mem_object for associated
 information using <strong>clGetImageInfo</strong> function described in section 5.3.7.
 </p>
 </li>
 <li>
 <p>
       the channel data type
 specified in image_format must match the channel data type associated
 with mem_object. The channel order values<span class="footnote"><br>[This allows developers to create a sRGB view of the image from a linear RGB view or vice-versa i.e. the pixels
 stored in the image can be accessed as linear RGB or sRGB values.  ]<br></span>: supported
 are:
 </p>
 </li>
 </ul></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <col style="width:50%;">
 <col style="width:50%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>image_channel_order specified in image_format</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>image channel order
 of mem_object</strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_sBGRA</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_BGRA</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_BGRA</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_sBGRA</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_sRGBA</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_RGBA</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_RGBA</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_sRGBA</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_sRGB</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_RGB</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_RGB</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_sRGB</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_sRGBx</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_RGBx</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_RGBx</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_sRGBx</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEPTH</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_R</p></td>
 </tr>
 </tbody>
 </table>
 <div class="ulist"><ul>
 <li>
 <p>
       the channel order
 specified must have the same number of channels as the channel order of
 mem_object.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>NOTE:</p></div>
 <div class="paragraph"><p>Concurrent reading from, writing to and copying between both a buffer
 object and 1D image buffer or 2D image object associated with the buffer
 object is undefined. Only reading from both a buffer object and 1D
 image buffer or 2D image object associated with the buffer object is
 defined.</p></div>
 <div class="paragraph"><p>Writing to an image created from a buffer and then reading from this
 buffer in a kernel even if appropriate synchronization operations (such
 as a barrier) are performed between the writes and reads is undefined.
 Similarly, writing to the buffer and reading from the image created from
 this buffer with appropriate synchronization between the writes and
 reads is undefined.</p></div>
 </div>
 </div>
 <div class="sect3">
 <h4 id="_querying_list_of_supported_image_formats">5.3.2. Querying List of Supported Image Formats</h4>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int   clGetSupportedImageFormats(cl_context context,
                                     cl_mem_flags flags,
                                     cl_mem_object_type image_type,
                                     cl_uint num_entries,
                                     cl_image_format *image_formats,
                                     cl_uint *num_image_formats)</pre>
 </div></div>
 <div class="paragraph"><p>can be used to get the list of image formats supported by an OpenCL
 implementation when the following information about an image memory
 object is specified:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       Context
 </p>
 </li>
 <li>
 <p>
       Image type  1D, 2D, or 3D
 image, 1D image buffer, 1D or 2D image array.
 </p>
 </li>
 <li>
 <p>
       Image object allocation
 information
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p><strong>clGetSupportedImageFormats</strong> returns a union of image formats supported
 by all devices in the context.</p></div>
 <div class="paragraph"><p><em>context</em> is a valid OpenCL context on which the image object(s) will be
 created.</p></div>
 <div class="paragraph"><p><em>flags</em> is a bit-field that is used to specify allocation and usage
 information about the image memory object being queried and is described
 in <em>table 5.3</em>. To get a list of supported image formats that can be
 read from or written to by a kernel, <em>flags</em> must be set to
 CL_MEM_READ_WRITE (get a list of images that can be read from and
 written to by different kernel instances when correctly ordered by event
 dependencies), CL_MEM_READ_ONLY (list of images that can be read from by
 a kernel) or CL_MEM_WRITE_ONLY (list of images that can be written to by
 a kernel). To get a list of supported image formats that can be both
 read from and written to by the same kernel instance, <em>flags</em> must be
 set to CL_MEM_KERNEL_READ_AND_WRITE. Please see section 5.3.2.2 for
 clarification.</p></div>
 <div class="paragraph"><p><em>image_type</em> describes the image type and must be either
 CL_MEM_OBJECT_IMAGE1D, CL_MEM_OBJECT_IMAGE1D_BUFFER,
 CL_MEM_OBJECT_IMAGE2D, CL_MEM_OBJECT_IMAGE3D,
 CL_MEM_OBJECT_IMAGE1D_ARRAY or CL_MEM_OBJECT_IMAGE2D_ARRAY.

 <em>num_entries</em> specifies the number of entries that can be returned in
 the memory location given by <em>image_formats</em>.</p></div>
 <div class="paragraph"><p><em>image_formats</em> is a pointer to a memory location where the list of
 supported image formats are returned. Each entry describes a
 <em>cl_image_format</em> structure supported by the OpenCL implementation. If
 <em>image_formats</em> is NULL, it is ignored.

 <em>num_image_formats</em> is the actual number of supported image formats for
 a specific <em>context</em> and values specified by <em>flags</em>. If
 <em>num_image_formats</em> is NULL, it is ignored.</p></div>
 <div class="paragraph"><p><strong>clGetSupportedImageFormats</strong> returns CL_SUCCESS if the function is
 executed successfully. Otherwise, it returns one of the following
 errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_CONTEXT if
 <em>context</em> is not a valid context.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>flags</em> or <em>image_type</em> are not valid, or if <em>num_entries</em> is 0 and
 <em>image_formats</em> is not NULL.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.

 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>If CL_DEVICE_IMAGE_SUPPORT specified in <em>table 4.3</em> is CL_TRUE, the
 values assigned to CL_DEVICE_MAX_READ_IMAGE_ARGS,
 CL_DEVICE_MAX_WRITE_IMAGE_ARGS, CL_DEVICE_IMAGE2D_MAX_WIDTH,
 CL_DEVICE_IMAGE2D_MAX_HEIGHT, CL_DEVICE_IMAGE3D_MAX_WIDTH,
 CL_DEVICE_IMAGE3D_MAX_HEIGHT, CL_DEVICE_IMAGE3D_MAX_DEPTH and
 CL_DEVICE_MAX_SAMPLERS by the implementation must be greater than or
 equal to the minimum values specified in <em>table 4.3</em>.</p></div>
 <div class="sect4">
 <h5 id="_minimum_list_of_supported_image_formats">Minimum List of Supported Image Formats</h5>
 <div class="paragraph"><p>For 1D, 1D image from buffer, 2D, 3D image objects, 1D and 2D image
 array objects, the mandated minimum list of image formats that can be
 read from and written to by different kernel instances when correctly
 ordered by event dependencies and that must be supported by all devices
 that support images is described in <em>table 5.8</em>.</p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <caption class="title">Table 13. <em>Min. list of supported image formats kernel read or write</em></caption>
 <col style="width:34%;">
 <col style="width:33%;">
 <col style="width:33%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>num_channels</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>channel_order</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>channel_data_type</strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">1</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_R</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_UNORM_INT8
 CL_UNORM_INT16
 CL_SNORM_INT8
 CL_SNORM_INT16
 CL_SIGNED_INT8
 CL_SIGNED_INT16
 CL_SIGNED_INT32
 CL_UNSIGNED_INT8
 CL_UNSIGNED_INT16
 CL_UNSIGNED_INT32
 CL_HALF_FLOAT
 CL_FLOAT</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">1</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_DEPTH<span class="footnote"><br>[CL_DEPTH channel order is supported only for 2D image and 2D image array objects.]<br></span>:</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_UNORM_INT16 CL_FLOAT</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">2</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_RG</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_UNORM_INT8
 CL_UNORM_INT16
 CL_SNORM_INT8
 CL_SNORM_INT16
 CL_SIGNED_INT8
 CL_SIGNED_INT16
 CL_SIGNED_INT32
 CL_UNSIGNED_INT8
 CL_UNSIGNED_INT16
 CL_UNSIGNED_INT32
 CL_HALF_FLOAT
 CL_FLOAT</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">4</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_RGBA</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_UNORM_INT8
 CL_UNORM_INT16
 CL_SNORM_INT8
 CL_SNORM_INT16
 CL_SIGNED_INT8
 CL_SIGNED_INT16  CL_SIGNED_INT32
 CL_UNSIGNED_INT8
 CL_UNSIGNED_INT16
 CL_UNSIGNED_INT32
 CL_HALF_FLOAT
 CL_FLOAT</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">4</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_BGRA</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_UNORM_INT8</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">4</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_sRGBA</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_UNORM_INT8<span class="footnote"><br>[sRGB channel order support is not required for 1D image buffers. Writes to images with sRGB channel orders
 requires device support of the cl_khr_srgb_image_writes extension.]<br></span>:]</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p>For 1D, 1D image from buffer, 2D, 3D image objects, 1D and 2D image
 array objects, the mandated minimum list of image formats that can be
 read from and written to by the same kernel instance and that must be
 supported by all devices that support images is described in <em>table
 5.9</em>.</p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <caption class="title">Table 14. <em>Min. list of supported image formats kernel read and write</em></caption>
 <col style="width:34%;">
 <col style="width:33%;">
 <col style="width:33%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>num_channels</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>channel_order</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>channel_data_type</strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">1</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_R</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_UNORM_INT8
 CL_SIGNED_INT8
 CL_SIGNED_INT16
 CL_SIGNED_INT32
 CL_UNSIGNED_INT8
 CL_UNSIGNED_INT16
 CL_UNSIGNED_INT32
 CL_HALF_FLOAT
 CL_FLOAT</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">4</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_RGBA</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_UNORM_INT8
 CL_SIGNED_INT8
 CL_SIGNED_INT16
 CL_SIGNED_INT32
 CL_UNSIGNED_INT8
 CL_UNSIGNED_INT16
 CL_UNSIGNED_INT32
 CL_HALF_FLOAT
 CL_FLOAT</p></td>
 </tr>
 </tbody>
 </table>
 </div>
 <div class="sect4">
 <h5 id="_image_format_mapping_to_opencl_kernel_language_image_access_qualifiers">Image format mapping to OpenCL kernel language image access qualifiers</h5>
 <div class="paragraph"><p>Image arguments to kernels may have the read_only, write_only or
 read_write qualifier. Not all image formats supported by the device and
 platform are valid to be passed to all of these access qualifiers. For
 each access qualifier, only images whose format is in the list of
 formats returned by clGetSupportedImageFormats with the given flag
 arguments in <em>table 5.9</em> are permitted. It is not valid to pass an image
 supporting writing as both a read_only image and a write_only image
 parameter, or to a read_write image parameter and any other image
 parameter.</p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <caption class="title">Table 15. <em>Mapping from format flags passed to clGetSupportedImageFormats to OpenCL kernel language image access qualifiers</em></caption>
 <col style="width:50%;">
 <col style="width:50%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Access Qualifier</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>cl_mem_flags</strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>read_only</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_MEM_READ_ONLY, CL_MEM_READ_WRITE, CL_MEM_KERNEL_READ_AND_WRITE</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>write_only</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_MEM_WRITE_ONLY, CL_MEM_READ_WRITE, CL_MEM_KERNEL_READ_AND_WRITE</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>read_write</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_MEM_KERNEL_READ_AND_WRITE</p></td>
 </tr>
 </tbody>
 </table>
 </div>
 </div>
 <div class="sect3">
 <h4 id="_reading_writing_and_copying_image_objects">5.3.3. Reading, Writing and Copying Image Objects</h4>
 <div class="paragraph"><p>The following functions enqueue commands to read from an image or image
 array object to host memory or write to an image or image array object
 from host memory.</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clEnqueueReadImage(cl_command_queue command_queue,
                           cl_mem image,
                           cl_bool blocking_read,
                           const size_t *origin,
                           const size_t *region,
                           size_t row_pitch,
                           size_t slice_pitch,
                           void *ptr,
                           cl_uint num_events_in_wait_list,
                           const cl_event *event_wait_list,
                           cl_event *event)</pre>
 </div></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clEnqueueWriteImage(cl_command_queue command_queue,
                            cl_mem image,
                            cl_bool blocking_write,
                            const size_t *origin,
                            const size_t *region,
                            size_t input_row_pitch,
                            size_t input_slice_pitch,
                            const void *ptr,
                            cl_uint num_events_in_wait_list,
                            const cl_event *event_wait_list,
                            cl_event *event)</pre>
 </div></div>
 <div class="paragraph"><p><em>command_queue</em> refers to the host command-queue in which the read /
 write command will be queued. <em>command_queue</em> and <em>image</em> must be
 created with the same OpenCL context.</p></div>
 <div class="paragraph"><p><em>image</em> refers to a valid image or image array object.</p></div>
 <div class="paragraph"><p><em>blocking_read</em> and <em>blocking_write</em> indicate if the read and write
 operations are <em>blocking</em> or <em>non-blocking</em>.</p></div>
 <div class="paragraph"><p>If <em>blocking_read</em> is CL_TRUE i.e. the read command is blocking,
 <strong>clEnqueueReadImage</strong> does not return until the buffer data has been read
 and copied into memory pointed to by <em>ptr</em>.</p></div>
 <div class="paragraph"><p>If <em>blocking_read</em> is CL_FALSE i.e. the read command is non-blocking,
 <strong>clEnqueueReadImage</strong> queues a non-blocking read command and returns. The
 contents of the buffer that <em>ptr</em> points to cannot be used until the
 read command has completed. The <em>event</em> argument returns an event
 object which can be used to query the execution status of the read
 command. When the read command has completed, the contents of the
 buffer that _ptr_points to__can be used by the application.</p></div>
 <div class="paragraph"><p>If <em>blocking_write_is CL_TRUE, the OpenCL implementation copies the data
 referred to by _ptr</em> and enqueues the write command in the
 command-queue. The memory pointed to by <em>ptr</em> can be reused by the
 application after the <strong>clEnqueueWriteImage</strong> call returns.</p></div>
 <div class="paragraph"><p>If <em>blocking_write</em> is CL_FALSE, the OpenCL implementation will use
 <em>ptr</em> to perform a non-blocking write. As the write is non-blocking the
 implementation can return immediately. The memory pointed to by <em>ptr</em>
 cannot be reused by the application after the call returns. The <em>event</em>
 argument returns an event object which can be used to query the
 execution status of the write command. When the write command has
 completed, the memory pointed to by <em>ptr</em> can then be reused by the
 application.</p></div>
 <div class="paragraph"><p><em>origin</em> defines the (<em>x</em>, <em>y</em>, <em>z</em>) offset in pixels in the 1D, 2D or
 3D image, the (<em>x</em>, <em>y</em>) offset and the image index in the 2D image
 array or the (<em>x</em>) offset and the image index in the 1D image array. If
 <em>image</em> is a 2D image object, <em>origin</em>[2] must be 0. If <em>image</em> is a 1D
 image or 1D image buffer object, <em>origin</em>[1] and <em>origin</em>[2] must be 0.
 If <em>image</em> is a 1D image array object, <em>origin</em>[2] must be 0. If
 <em>image</em> is a 1D image array object, <em>origin</em>[1] describes the image
 index in the 1D image array. If <em>image</em> is a 2D image array object,
 <em>origin</em>[2] describes the image index in the 2D image array.</p></div>
 <div class="paragraph"><p><em>region_defines the_</em>(<em>width</em>, <em>height,</em> <em>depth</em>) in pixels of the 1D,
 2D or 3D rectangle, the (<em>width</em>, <em>height</em>) in pixels of the 2D
 rectangle and the number of images of a 2D image array or the (<em>width</em>)
 in pixels of the 1D rectangle and the number of images of a 1D image
 array. If <em>image</em> is a 2D image object, <em>region</em>[2] must be 1. If
 <em>image</em> is a 1D image or 1D image buffer object, <em>region</em>[1] and
 <em>region</em>[2] must be 1. If <em>image</em> is a 1D image array object,
 <em>region</em>[2] must be 1. The values in <em>region</em> cannot be 0.</p></div>
 <div class="paragraph"><p><em>row_pitch</em> in <strong>clEnqueueReadImage</strong> and <em>input_row_pitch</em> in
 <strong>clEnqueueWriteImage</strong> is the length of each row in bytes. This value
 must be greater than or equal to the element size in bytes * <em>width</em>.
 If <em>row_pitch</em> (or <em>input_row_pitch</em>) is set to 0, the appropriate row
 pitch is calculated based on the size of each element in bytes
 multiplied by <em>width</em>.</p></div>
 <div class="paragraph"><p><em>slice_pitch</em> in <strong>clEnqueueReadImage</strong> and <em>input_slice_pitch</em> in
 <strong>clEnqueueWriteImage</strong> is the size in bytes of the 2D slice of the 3D
 region of a 3D image or each image of a 1D or 2D image array being read
 or written respectively. This must be 0 if <em>image</em> is a 1D or 2D
 image. Otherwise this value must be greater than or equal to
 <em>row_pitch</em> * <em>height</em>. If <em>slice_pitch</em> (or <em>input_slice_pitch</em>) is
 set to 0, the appropriate slice pitch is calculated based on the
 <em>row_pitch</em> * <em>height</em>.</p></div>
 <div class="paragraph"><p><em>ptr</em> is the pointer to a buffer in host memory where image data is to be read from or to be written to. The alignment requirements for ptr are specified in section C.3.</p></div>
 <div class="paragraph"><p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need
 to complete before this particular command can be executed. If
 <em>event_wait_list</em> is NULL, then this particular command does not wait on
 any event to complete. If <em>event_wait_list</em> is NULL,
 <em>num_events_in_wait_list</em> must be 0. If <em>event_wait_list</em> is not NULL,
 the list of events pointed to by <em>event_wait_list</em> must be valid and
 <em>num_events_in_wait_list</em> must be greater than 0. The events specified
 in <em>event_wait_list</em> act as synchronization points. The context
 associated with events in <em>event_wait_list</em> and <em>command_queue</em> must be
 the same. The memory associated with <em>event_wait_list</em> can be reused or
 freed after the function returns.</p></div>
 <div class="paragraph"><p><em>event</em> returns an event object that identifies this particular read /
 write command and can be used to query or queue a wait for this
 particular command to complete. <em>event</em> can be NULL in which case it
 will not be possible for the application to query the status of this
 command or queue a wait for this command to complete. If the
 <em>event_wait_list</em> and the <em>event</em> arguments are not NULL, the <em>event</em>
 argument should not refer to an element of the <em>event_wait_list</em> array.</p></div>
 <div class="paragraph"><p><strong>clEnqueueReadImage</strong> and <strong>clEnqueueWriteImage</strong> return CL_SUCCESS if the
 function is executed successfully. Otherwise, it returns one of the
 following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_COMMAND_QUEUE
 if <em>command_queue</em> is not a valid host command-queue.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_CONTEXT if the
 context associated with <em>command_queue</em> and <em>image</em> are not the same or
 if the context associated with <em>command_queue</em> and events in
 <em>event_wait_list</em> are not the same.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_MEM_OBJECT if
 i_mage_ is not a valid image object.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if the
 region being read or written specified by <em>origin</em> and <em>region</em> is out
 of bounds or if <em>ptr</em> is a NULL value.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if values
 in <em>origin</em> and <em>region</em> do not follow rules described in the argument
 description for <em>origin</em> and <em>region</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_EVENT_WAIT_LIST
 if <em>event_wait_list</em> is NULL and <em>num_events_in_wait_list</em> &gt; 0, or
 <em>event_wait_list</em> is not NULL and <em>num_events_in_wait_list</em> is 0, or if
 event objects in <em>event_wait_list</em> are not valid events.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_IMAGE_SIZE if
 image dimensions (image width, height, specified or compute row and/or
 slice pitch) for <em>image</em> are not supported by device associated with
 <em>queue</em>.
 </p>
 </li>
 <li>
 <p>
      CL_IMAGE_FORMAT_NOT_SUPPORTED if image format (image channel order and
 data type) for <em>image</em> are not supported by device associated with
 <em>queue</em>.
 </p>
 </li>
 <li>
 <p>
      CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
 memory for data store associated with <em>image</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_OPERATION if
 the device associated with <em>command_queue</em> does not support images (i.e.
 CL_DEVICE_IMAGE_SUPPORT specified in <em>table 4.3</em> is CL_FALSE).
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_OPERATION if
 <strong>clEnqueueReadImage</strong> is called on <em>image</em> which has been created with
 CL_MEM_HOST_WRITE_ONLY or CL_MEM_HOST_NO_ACCESS.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_OPERATION if
 <strong>clEnqueueWriteImage</strong> is called on <em>image</em> which has been created with
 CL_MEM_HOST_READ_ONLY or CL_MEM_HOST_NO_ACCESS.
 </p>
 </li>
 <li>
 <p>
      CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST if the read and write
 operations are blocking and the execution status of any of the events in
 <em>event_wait_list</em> is a negative integer value.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>NOTE:</p></div>
 <div class="paragraph"><p>Calling <strong>clEnqueueReadImage</strong> to read a region of the <em>image</em> with the
 <em>ptr</em> argument value set to <em>host_ptr</em> + (<em>origin[2]</em>*<em>image slice pitch
 + origin[1]</em>*<em>image row pitch + origin[0]</em>*<em>bytes per pixel</em>)<em>,</em> where
 <em>host_ptr</em> is a pointer to the memory region specified when the <em>image</em>
 being read is created with CL_MEM_USE_HOST_PTR, must meet the following
 requirements in order to avoid undefined behavior:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 All commands that use this image object have finished execution before
 the read command begins execution.
 </p>
 </li>
 <li>
 <p>
 The <em>row_pitch</em> and <em>slice_pitch</em> argument values in
 <strong>clEnqueueReadImage</strong> must be set to the image row pitch and slice pitch.
 </p>
 </li>
 <li>
 <p>
 The image object is not mapped.
 </p>
 </li>
 <li>
 <p>
 The image object is not used by any command-queue until the read
 command has finished execution.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>Calling <strong>clEnqueueWriteImage</strong> to update the latest bits in a region of
 the <em>image</em> with the <em>ptr</em> argument value set to <em>host_ptr</em><br>
 (<em>origin[2]</em>*<em>image slice pitch + origin[1]</em>*<em>image row pitch<br>
 origin[0]</em>*<em>bytes per pixel</em>), where <em>host_ptr</em> is a pointer to the
 memory region specified when the <em>image</em> being written is created with
 CL_MEM_USE_HOST_PTR, must meet the following requirements in order to
 avoid undefined behavior:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 The host memory region being written contains the latest bits when the
 enqueued write command begins execution.
 </p>
 </li>
 <li>
 <p>
 The <em>input_row_pitch</em> and <em>input_slice_pitch</em> argument values in
 <strong>clEnqueueWriteImage</strong> must be set to the image row pitch and slice
 pitch.
 </p>
 </li>
 <li>
 <p>
 The image object is not mapped.
 </p>
 </li>
 <li>
 <p>
 The image object is not used by any command-queue until the write
 command has finished execution.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clEnqueueCopyImage(cl_command_queue command_queue,
                           cl_mem src_image,
                           cl_mem dst_image,
                           const size_t *src_origin,
                           const size_t *dst_origin,
                           const size_t *region,
                           cl_uint num_events_in_wait_list,
                           const cl_event *event_wait_list,
                           cl_event *event)</pre>
 </div></div>
 <div class="paragraph"><p>enqueues a command to copy image objects. <em>src_image</em> and <em>dst_image</em>
 can be 1D, 2D, 3D image or a 1D, 2D image array objects. It is
 possible to copy subregions between any combinations of source and
 destination types, provided that the dimensions of the subregions are
 the same e.g., one can copy a rectangular region from a 2D image to a
 slice of a 3D image.</p></div>
 <div class="paragraph"><p><em>command_queue</em> refers to the host command-queue in which the copy
 command will be queued. The OpenCL context associated with
 <em>command_queue</em>, <em>src_image</em> and <em>dst_image</em> must be the same.</p></div>
 <div class="paragraph"><p><em>src_origin</em> defines the (<em>x</em>, <em>y</em>, <em>z</em>) offset in pixels in the 1D, 2D
 or 3D image, the (<em>x</em>, <em>y</em>) offset and the image index in the 2D image
 array or the (<em>x</em>) offset and the image index in the 1D image array. If
 <em>image</em> is a 2D image object, <em>src_origin</em>[2] must be 0. If <em>src_image</em>
 is a 1D image object, <em>src_origin</em>[1] and <em>src_origin</em>[2] must be 0. If
 <em>src_image</em> is a 1D image array object, <em>src_origin</em>[2] must be 0. If
 <em>src_image</em> is a 1D image array object, <em>src_origin</em>[1] describes the
 image index in the 1D image array. If <em>src_image</em> is a 2D image array
 object, <em>src_origin</em>[2] describes the image index in the 2D image array.</p></div>
 <div class="paragraph"><p><em>dst_origin</em> defines the (<em>x</em>, <em>y</em>, <em>z</em>) offset in pixels in the 1D, 2D
 or 3D image, the (<em>x</em>, <em>y</em>) offset and the image index in the 2D image
 array or the (<em>x</em>) offset and the image index in the 1D image array. If
 <em>dst_image</em> is a 2D image object, <em>dst_origin</em>[2] must be 0. If
 <em>dst_image</em> is a 1D image or 1D image buffer object, <em>dst_origin</em>[1] and
 <em>dst_origin</em>[2] must be 0. If <em>dst_image</em> is a 1D image array object,
 <em>dst_origin</em>[2] must be 0. If <em>dst_image</em> is a 1D image array object,
 <em>dst_origin</em>[1] describes the image index in the 1D image array. If
 <em>dst_image</em> is a 2D image array object, <em>dst_origin</em>[2] describes the
 image index in the 2D image array.</p></div>
 <div class="paragraph"><p><em>region_defines the_</em>(<em>width</em>, <em>height,</em> <em>depth</em>) in pixels of the 1D,
 2D or 3D rectangle, the (<em>width</em>, <em>height</em>) in pixels of the 2D
 rectangle and the number of images of a 2D image array or the (<em>width</em>)
 in pixels of the 1D rectangle and the number of images of a 1D image
 array. If <em>src_image</em> or <em>dst_image</em> is a 2D image object, <em>region</em>[2]
 must be 1. If <em>src_image</em> or <em>dst_image</em> is a 1D image or 1D image
 buffer object, <em>region</em>[1] and <em>region</em>[2] must be 1. If <em>src_image</em> or
 <em>dst_image</em> is a 1D image array object, <em>region</em>[2] must be 1. The
 values in <em>region</em> cannot be 0.</p></div>
 <div class="paragraph"><p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need
 to complete before this particular command can be executed. If
 <em>event_wait_list</em> is NULL, then this particular command does not wait on
 any event to complete. If <em>event_wait_list</em> is NULL,
 <em>num_events_in_wait_list</em> must be 0. If <em>event_wait_list</em> is not NULL,
 the list of events pointed to by <em>event_wait_list</em> must be valid and
 <em>num_events_in_wait_list</em> must be greater than 0. The events specified
 in <em>event_wait_list</em> act as synchronization points. The context
 associated with events in <em>event_wait_list</em> and <em>command_queue</em> must be
 the same. The memory associated with <em>event_wait_list</em> can be reused or
 freed after the function returns.</p></div>
 <div class="paragraph"><p><em>event</em> returns an event object that identifies this particular copy
 command and can be used to query or queue a wait for this particular
 command to complete. <em>event</em> can be NULL in which case it will not be
 possible for the application to query the status of this command or
 queue a wait for this command to complete.
 <strong>clEnqueueBarrierWithWaitList</strong> can be used instead. If the
 <em>event_wait_list</em> and the <em>event</em> arguments are not NULL, the <em>event</em>
 argument should not refer to an element of the <em>event_wait_list</em> array.</p></div>
 <div class="paragraph"><p>It is currently a requirement that the <em>src_image</em> and <em>dst_image</em> image
 memory objects for <strong>clEnqueueCopyImage</strong> must have the exact same image
 format (i.e. the cl_image_format descriptor specified when <em>src_image</em>
 and <em>dst_image</em> are created must match).</p></div>
 <div class="paragraph"><p><strong>clEnqueueCopyImage</strong> returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_COMMAND_QUEUE
 if <em>command_queue</em> is not a valid host command-queue.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_CONTEXT if the
 context associated with <em>command_queue</em>, <em>src_image</em> and <em>dst_image</em> are
 not the same or if the context associated with <em>command_queue</em> and
 events in <em>event_wait_list</em> are not the same.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_MEM_OBJECT if
 <em>src_image</em> and <em>dst_image</em> are not valid image objects.
 </p>
 </li>
 <li>
 <p>
       CL_IMAGE_FORMAT_MISMATCH
 if <em>src_image</em> and <em>dst_image</em> do not use the same image format.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if the 2D
 or 3D rectangular region specified by <em>src_origin</em> and <em>src_origin</em><br>
 <em>region</em> refers to a region outside <em>src_image</em>, or if the 2D or 3D
 rectangular region specified by <em>dst_origin</em> and <em>dst_origin</em> + <em>region</em>
 refers to a region outside <em>dst_image</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if values
 in <em>src_origin</em>, <em>dst_origin</em> and <em>region</em> do not follow rules described
 in the argument description for <em>src_origin</em>, <em>dst_origin</em> and <em>region</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_EVENT_WAIT_LIST
 if <em>event_wait_list</em> is NULL and <em>num_events_in_wait_list</em> &gt; 0, or
 <em>event_wait_list</em> is not NULL and <em>num_events_in_wait_list</em> is 0, or if
 event objects in <em>event_wait_list</em> are not valid events.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_IMAGE_SIZE if
 image dimensions (image width, height, specified or compute row and/or
 slice pitch) for <em>src_image</em> or <em>dst_image</em> are not supported by device
 associated with <em>queue</em>.
 </p>
 </li>
 <li>
 <p>
      CL_IMAGE_FORMAT_NOT_SUPPORTED if image format (image channel order and
 data type) for <em>src_image</em> or <em>dst_image</em> are not supported by device
 associated with <em>queue</em>.
 </p>
 </li>
 <li>
 <p>
      CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
 memory for data store associated with <em>src_image</em> or <em>dst_image</em>.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_OPERATION if
 the device associated with <em>command_queue</em> does not support images (i.e.
 CL_DEVICE_IMAGE_SUPPORT specified in <em>table 4.3</em> is CL_FALSE).
 </p>
 </li>
 <li>
 <p>
       CL_MEM_COPY_OVERLAP if
 <em>src_image</em> and <em>dst_image</em> are the same image object and the source and
 destination regions overlap.
 </p>
 </li>
 </ul></div>
 </div>
 <div class="sect3">
 <h4 id="_filling_image_objects">5.3.4. Filling Image Objects</h4>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clEnqueueFillImage*(cl_command_queue command_queue,
                            cl_mem image,
                            const void *fill_color,
                            const size_t *origin,
                            const size_t *region,
                            cl_uint num_events_in_wait_list,
                            const cl_event *event_wait_list,
                            cl_event *event)</pre>
 </div></div>
 <div class="paragraph"><p>enqueues a command to fill an image object with a specified color. The
 usage information which indicates whether the memory object can be read
 or written by a kernel and/or the host and is given by the cl_mem_flags
 argument value specified when <em>image</em> is created is ignored by
 <strong>clEnqueueFillImage</strong>.</p></div>
 <div class="paragraph"><p><em>command_queue</em> refers to the host command-queue in which the fill
 command will be queued. The OpenCL context associated with
 <em>command_queue</em> and <em>image</em> must be the same.</p></div>
 <div class="paragraph"><p><em>image</em> is a valid image object.</p></div>
 <div class="paragraph"><p><em>fill_color</em> is the color used to fill the image. The fill color is a
 single floating point value if the channel order is CL_DEPTH. Otherwise,
 the fill color is a four component RGBA floating-point color value if
 the <em>image</em> channel data type is not an unnormalized signed or unsigned
 integer type, is a four component signed integer value if the <em>image</em>
 channel data type is an unnormalized signed integer type and is a four
 component unsigned integer value if the <em>image</em> channel data type is an
 unnormalized unsigned integer type. The fill color will be converted to
 the appropriate image channel format and order associated with <em>image</em>
 as described in <em>sections 6.12.14</em> and <em>8.3</em>.</p></div>
 <div class="paragraph"><p><em>origin</em> defines the (<em>x</em>, <em>y</em>, <em>z</em>) offset in pixels in the 1D, 2D or
 3D image, the (<em>x</em>, <em>y</em>) offset and the image index in the 2D image
 array or the (<em>x</em>) offset and the image index in the 1D image array. If
 <em>image</em> is a 2D image object, <em>origin</em>[2] must be 0. If <em>image</em> is a 1D
 image or 1D image buffer object, <em>origin</em>[1] and <em>origin</em>[2] must be 0.
 If <em>image</em> is a 1D image array object, <em>origin</em>[2] must be 0. If
 <em>image</em> is a 1D image array object, <em>origin</em>[1] describes the image
 index in the 1D image array. If <em>image</em> is a 2D image array object,
 <em>origin</em>[2] describes the image index in the 2D image array.</p></div>
 <div class="paragraph"><p><em>region_defines the_</em>(<em>width</em>, <em>height,</em> <em>depth</em>) in pixels of the 1D,
 2D or 3D rectangle, the (<em>width</em>, <em>height</em>) in pixels of the 2D
 rectangle and the number of images of a 2D image array or the (<em>width</em>)
 in pixels of the 1D rectangle and the number of images of a 1D image
 array. If <em>image</em> is a 2D image object, <em>region</em>[2] must be 1. If
 <em>image</em> is a 1D image or 1D image buffer object, <em>region</em>[1] and
 <em>region</em>[2] must be 1. If <em>image</em> is a 1D image array object,
 <em>region</em>[2] must be 1. The values in <em>region</em> cannot be 0.</p></div>
 <div class="paragraph"><p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need
 to complete before this particular command can be executed. If
 <em>event_wait_list</em> is NULL, then this particular command does not wait on
 any event to complete. If <em>event_wait_list</em> is NULL,
 <em>num_events_in_wait_list</em> must be 0. If <em>event_wait_list</em> is not NULL,
 the list of events pointed to by <em>event_wait_list</em> must be valid and
 <em>num_events_in_wait_list</em> must be greater than 0. The events specified
 in <em>event_wait_list</em> act as synchronization points. The context
 associated with events in <em>event_wait_list</em> and <em>command_queue</em> must be
 the same. The memory associated with <em>event_wait_list</em> can be reused or
 freed after the function returns.</p></div>
 <div class="paragraph"><p><em>event</em> returns an event object that identifies this particular command
 and can be used to query or queue a wait for this particular command to
 complete. <em>event</em> can be NULL in which case it will not be possible for
 the application to query the status of this command or queue a wait for
 this command to complete. <strong>clEnqueueBarrierWithWaitList</strong> can be used
 instead. If the <em>event_wait_list</em> and the <em>event</em> arguments are not
 NULL, the <em>event</em> argument should not refer to an element of the
 <em>event_wait_list</em> array.</p></div>
 <div class="paragraph"><p><strong>clEnqueueFillImage</strong> returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_COMMAND_QUEUE
 if <em>command_queue</em> is not a valid host command-queue.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_CONTEXT if the
 context associated with <em>command_queue</em> and <em>image</em> are not the same or
 if the context associated with <em>command_queue</em> and events in
 <em>event_wait_list</em> are not the same.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_MEM_OBJECT if
 <em>image</em> is not a valid image object.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>fill_color</em> is NULL.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if the
 region being filled as specified by <em>origin</em> and <em>region</em> is out of
 bounds or if <em>ptr</em> is a NULL value.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if values
 in <em>origin</em> and <em>region</em> do not follow rules described in the argument
 description for <em>origin</em> and <em>region</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_EVENT_WAIT_LIST
 if <em>event_wait_list</em> is NULL and <em>num_events_in_wait_list</em> &gt; 0, or
 <em>event_wait_list</em> is not NULL and <em>num_events_in_wait_list</em> is 0, or if
 event objects in <em>event_wait_list</em> are not valid events.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_IMAGE_SIZE if
 image dimensions (image width, height, specified or compute row and/or
 slice pitch) for <em>image</em> are not supported by device associated with
 <em>queue</em>.
 </p>
 </li>
 <li>
 <p>
      CL_IMAGE_FORMAT_NOT_SUPPORTED if image format (image channel order and
 data type) for <em>image</em> are not supported by device associated with
 <em>queue</em>.
 </p>
 </li>
 <li>
 <p>
      CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
 memory for data store associated with <em>image</em>.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 </div>
 <div class="sect3">
 <h4 id="_copying_between_image_and_buffer_objects">5.3.5. Copying between Image and Buffer Objects</h4>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clEnqueueCopyImageToBuffer(cl_command_queue command_queue,
                                   cl_mem src_image,
                                   cl_mem dst_buffer,
                                   const size_t *src_origin,
                                   const size_t *region,
                                   size_t dst_offset,
                                   cl_uint num_events_in_wait_list,
                                   const cl_event *event_wait_list,
                                   cl_event *event)</pre>
 </div></div>
 <div class="paragraph"><p>enqueues a command to copy an image object to a buffer object.</p></div>
 <div class="paragraph"><p><em>command_queue</em> must be a valid host command-queue. The OpenCL context
 associated with <em>command_queue</em>, <em>src_image</em> and <em>dst_buffer</em> must be
 the same.</p></div>
 <div class="paragraph"><p><em>src_image</em> is a valid image object.</p></div>
 <div class="paragraph"><p><em>dst_buffer</em> is a valid buffer object.</p></div>
 <div class="paragraph"><p><em>src_origin</em> defines the (<em>x</em>, <em>y</em>, <em>z</em>) offset in pixels in the 1D, 2D
 or 3D image, the (<em>x</em>, <em>y</em>) offset and the image index in the 2D image
 array or the (<em>x</em>) offset and the image index in the 1D image array. If
 <em>src_image</em> is a 2D image object, <em>src_origin</em>[2] must be 0. If
 <em>src_image</em> is a 1D image or 1D image buffer object, <em>src_origin</em>[1] and
 <em>src_origin</em>[2] must be 0. If <em>src_image</em> is a 1D image array object,
 <em>src_origin</em>[2] must be 0. If <em>src_image</em> is a 1D image array object,
 <em>src_origin</em>[1] describes the image index in the 1D image array. If
 <em>src_image</em> is a 2D image array object, <em>src_origin</em>[2] describes the
 image index in the 2D image array.</p></div>
 <div class="paragraph"><p><em>region_defines the_</em>(<em>width</em>, <em>height,</em> <em>depth</em>) in pixels of the 1D,
 2D or 3D rectangle, the (<em>width</em>, <em>height</em>) in pixels of the 2D
 rectangle and the number of images of a 2D image array or the (<em>width</em>)
 in pixels of the 1D rectangle and the number of images of a 1D image
 array. If <em>src_image</em> is a 2D image object, <em>region</em>[2] must be 1. If
 <em>src_image</em> is a 1D image or 1D image buffer object, <em>region</em>[1] and
 <em>region</em>[2] must be 1. If <em>src_image</em> is a 1D image array object,
 <em>region</em>[2] must be 1. The values in <em>region</em> cannot be 0.</p></div>
 <div class="paragraph"><p><em>dst_offset</em> refers to the offset where to begin copying data into
 <em>dst_buffer</em>. The size in bytes of the region to be copied referred to
 as <em>dst_cb</em> is computed as <em>width</em> * <em>height</em> * <em>depth</em> * <em>bytes/image
 element</em> if <em>src_image</em> is a 3D image object, is computed as <em>width</em> *
 <em>height</em> * <em>bytes/image element</em> if <em>src_image</em> is a 2D image, is
 computed as <em>width</em> * <em>height</em> * <em>arraysize</em> * <em>bytes/image element</em> if
 <em>src_image</em> is a 2D image array object, is computed as <em>width</em> *
 <em>bytes/image element</em> if <em>src_image</em> is a 1D image or 1D image buffer
 object and is computed as <em>width</em> * <em>arraysize</em> * <em>bytes/image element</em>
 if <em>src_image</em> is a 1D image array object.</p></div>
 <div class="paragraph"><p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need
 to complete before this particular command can be executed. If
 <em>event_wait_list</em> is NULL, then this particular command does not wait on
 any event to complete. If <em>event_wait_list</em> is NULL,
 <em>num_events_in_wait_list</em> must be 0. If <em>event_wait_list</em> is not NULL,
 the list of events pointed to by <em>event_wait_list</em> must be valid and
 <em>num_events_in_wait_list</em> must be greater than 0. The events specified
 in <em>event_wait_list</em> act as synchronization points. The context
 associated with events in <em>event_wait_list</em> and <em>command_queue</em> must be
 the same. The memory associated with <em>event_wait_list</em> can be reused or
 freed after the function returns.</p></div>
 <div class="paragraph"><p><em>event</em> returns an event object that identifies this particular copy
 command and can be used to query or queue a wait for this particular
 command to complete. <em>event</em> can be NULL in which case it will not be
 possible for the application to query the status of this command or
 queue a wait for this command to complete.
 <strong>clEnqueueBarrierWithWaitList</strong> can be used instead. If the
 <em>event_wait_list</em> and the <em>event</em> arguments are not NULL, the <em>event</em>
 argument should not refer to an element of the <em>event_wait_list</em> array.</p></div>
 <div class="paragraph"><p><strong>clEnqueueCopyImageToBuffer</strong> returns CL_SUCCESS if the function is
 executed successfully. Otherwise, it returns one of the following
 errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_COMMAND_QUEUE
 if <em>command_queue</em> is not a valid host command-queue.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_CONTEXT if the
 context associated with <em>command_queue</em>, <em>src_image</em> and <em>dst_buffer</em>
 are not the same or if the context associated with <em>command_queue</em> and
 events in <em>event_wait_list</em> are not the same.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_MEM_OBJECT if
 <em>src_image</em> is not a valid image object or <em>dst_buffer</em> is not a valid
 buffer object or if <em>src_image</em> is a 1D image buffer object created from
 <em>dst_buffer</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if the
 1D, 2D or 3D rectangular region specified by <em>src_origin</em> and
 <em>src_origin</em> + <em>region</em> refers to a region outside <em>src_image</em>, or if
 the region specified by <em>dst_offset</em> and <em>dst_offset</em> + <em>dst_cb</em> to a
 region outside <em>dst_buffer</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if values
 in <em>src_origin</em> and <em>region</em> do not follow rules described in the
 argument description for <em>src_origin</em> and <em>region</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_EVENT_WAIT_LIST
 if <em>event_wait_list</em> is NULL and <em>num_events_in_wait_list</em> &gt; 0, or
 <em>event_wait_list</em> is not NULL and <em>num_events_in_wait_list</em> is 0, or if
 event objects in <em>event_wait_list</em> are not valid events.
 </p>
 </li>
 <li>
 <p>
      CL_MISALIGNED_SUB_BUFFER_OFFSET if <em>dst_buffer</em> is a sub-buffer object
 and <em>offset</em> specified when the sub-buffer object is created is not
 aligned to CL_DEVICE_MEM_BASE_ADDR_ALIGN value for device associated
 with <em>queue</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_IMAGE_SIZE if
 image dimensions (image width, height, specified or compute row and/or
 slice pitch) for <em>src_image</em> are not supported by device associated with
 <em>queue</em>.
 </p>
 </li>
 <li>
 <p>
      CL_IMAGE_FORMAT_NOT_SUPPORTED if image format (image channel order and
 data type) for <em>src_image</em> are not supported by device associated with
 <em>queue</em>.
 </p>
 </li>
 <li>
 <p>
      CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
 memory for data store associated with <em>src_image</em> or <em>dst_buffer</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_OPERATION if
 the device associated with <em>command_queue</em> does not support images (i.e.
 CL_DEVICE_IMAGE_SUPPORT specified in <em>table 4.3</em> is CL_FALSE).
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clEnqueueCopyBufferToImage(cl_command_queue command_queue,
                                   cl_mem src_buffer,
                                   cl_mem dst_image,
                                   size_t src_offset,
                                   const size_t *dst_origin,
                                   const size_t *region,
                                   cl_uint num_events_in_wait_list,
                                   const cl_event *event_wait_list,
                                   cl_event *event)</pre>
 </div></div>
 <div class="paragraph"><p>enqueues a command to copy a buffer object to an image object.</p></div>
 <div class="paragraph"><p><em>command_queue</em> must be a valid host command-queue. The OpenCL context
 associated with <em>command_queue</em>, <em>src_buffer</em> and <em>dst_image</em> must be
 the same.</p></div>
 <div class="paragraph"><p><em>src_buffer</em> is a valid buffer object.</p></div>
 <div class="paragraph"><p><em>dst_image</em> is a valid image object.</p></div>
 <div class="paragraph"><p><em>src_offset</em> refers to the offset where to begin copying data from
 <em>src_buffer</em>.</p></div>
 <div class="paragraph"><p><em>dst_origin</em> defines the (<em>x</em>, <em>y</em>, <em>z</em>) offset in pixels in the 1D, 2D
 or 3D image, the (<em>x</em>, <em>y</em>) offset and the image index in the 2D image
 array or the (<em>x</em>) offset and the image index in the 1D image array. If
 <em>dst_image</em> is a 2D image object, <em>dst_origin</em>[2] must be 0. If
 <em>dst_image</em> is a 1D image or 1D image buffer object, <em>dst_origin</em>[1] and
 <em>dst_origin</em>[2] must be 0. If <em>dst_image</em> is a 1D image array object,
 <em>dst_origin</em>[2] must be 0. If <em>dst_image</em> is a 1D image array object,
 <em>dst_origin</em>[1] describes the image index in the 1D image array. If
 <em>dst_image</em> is a 2D image array object, <em>dst_origin</em>[2] describes the
 image index in the 2D image array.</p></div>
 <div class="paragraph"><p><em>region_defines the_</em>(<em>width</em>, <em>height,</em> <em>depth</em>) in pixels of the 1D,
 2D or 3D rectangle, the (<em>width</em>, <em>height</em>) in pixels of the 2D
 rectangle and the number of images of a 2D image array or the (<em>width</em>)
 in pixels of the 1D rectangle and the number of images of a 1D image
 array. If <em>dst_image</em> is a 2D image object, <em>region</em>[2] must be 1. If
 <em>dst_image</em> is a 1D image or 1D image buffer object, <em>region</em>[1] and
 <em>region</em>[2] must be 1. If <em>dst_image</em> is a 1D image array object,
 <em>region</em>[2] must be 1. The values in <em>region</em> cannot be 0.</p></div>
 <div class="paragraph"><p>The size in bytes of the region to be copied from <em>src_buffer</em> referred
 to as <em>src_cb</em> is computed as <em>width</em> * <em>height</em> * <em>depth</em> *
 <em>bytes/image element</em> if <em>dst_image</em> is a 3D image object, is computed
 as <em>width</em> * <em>height</em> * <em>bytes/image element</em> if <em>dst_image</em> is a 2D
 image, is computed as <em>width</em> * <em>height</em> * <em>arraysize</em> * <em>bytes/image
 element</em> if <em>dst_image</em> is a 2D image array object, is computed as
 <em>width</em> * <em>bytes/image element</em> if <em>dst_image</em> is a 1D image or 1D image
 buffer object and is computed as <em>width</em> * <em>arraysize</em> * <em>bytes/image
 element</em> if <em>dst_image</em> is a 1D image array object.</p></div>
 <div class="paragraph"><p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need
 to complete before this particular command can be executed. If
 <em>event_wait_list</em> is NULL, then this particular command does not wait on
 any event to complete. If <em>event_wait_list</em> is NULL,
 <em>num_events_in_wait_list</em> must be 0. If <em>event_wait_list</em> is not NULL,
 the list of events pointed to by <em>event_wait_list</em> must be valid and
 <em>num_events_in_wait_list</em> must be greater than 0. The events specified
 in <em>event_wait_list</em> act as synchronization points. The context
 associated with events in <em>event_wait_list</em> and <em>command_queue</em> must be
 the same. The memory associated with <em>event_wait_list</em> can be reused or
 freed after the function returns.</p></div>
 <div class="paragraph"><p><em>event</em> returns an event object that identifies this particular copy
 command and can be used to query or queue a wait for this particular
 command to complete. <em>event</em> can be NULL in which case it will not be
 possible for the application to query the status of this command or
 queue a wait for this command to complete.
 <strong>clEnqueueBarrierWithWaitList</strong> can be used instead. If the
 <em>event_wait_list</em> and the <em>event</em> arguments are not NULL, the <em>event</em>
 argument should not refer to an element of the <em>event_wait_list</em> array.</p></div>
 <div class="paragraph"><p><strong>clEnqueueCopyBufferToImage</strong> returns CL_SUCCESS if the function is
 executed successfully. Otherwise, it returns one of the following
 errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_COMMAND_QUEUE
 if <em>command_queue</em> is not a valid host command-queue.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_CONTEXT if the
 context associated with <em>command_queue</em>, <em>src_buffer</em> and <em>dst_image</em>
 are not the same or if the context associated with <em>command_queue</em> and
 events in <em>event_wait_list</em> are not the same.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_MEM_OBJECT if
 <em>src_buffer</em> is not a valid buffer object or <em>dst_image</em> is not a valid
 image object or if <em>dst_image</em> is a 1D image buffer object created from
 <em>src_buffer</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if the
 1D, 2D or 3D rectangular region specified by <em>dst_origin</em> and
 <em>dst_origin</em> + <em>region</em> refer to a region outside <em>dst_image</em>, or if the
 region specified by <em>src_offset</em> and <em>src_offset</em> + <em>src_cb</em> refer to a
 region outside <em>src_buffer</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if values
 in <em>dst_origin</em> and <em>region</em> do not follow rules described in the
 argument description for <em>dst_origin</em> and <em>region</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_EVENT_WAIT_LIST
 if <em>event_wait_list</em> is NULL and <em>num_events_in_wait_list</em> &gt; 0, or
 <em>event_wait_list</em> is not NULL and <em>num_events_in_wait_list</em> is 0, or if
 event objects in <em>event_wait_list</em> are not valid events.
 </p>
 </li>
 <li>
 <p>
      CL_MISALIGNED_SUB_BUFFER_OFFSET if <em>src_buffer</em> is a sub-buffer object
 and <em>offset</em> specified when the sub-buffer object is created is not
 aligned to CL_DEVICE_MEM_BASE_ADDR_ALIGN value for device associated
 with <em>queue</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_IMAGE_SIZE if
 image dimensions (image width, height, specified or compute row and/or
 slice pitch) for <em>dst_image</em> are not supported by device associated with
 <em>queue</em>.
 </p>
 </li>
 <li>
 <p>
      CL_IMAGE_FORMAT_NOT_SUPPORTED if image format (image channel order and
 data type) for <em>dst_image</em> are not supported by device associated with
 <em>queue</em>.
 </p>
 </li>
 <li>
 <p>
      CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
 memory for data store associated with <em>src_buffer</em> or <em>dst_image</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_OPERATION if
 the device associated with <em>command_queue</em> does not support images (i.e.
 CL_DEVICE_IMAGE_SUPPORT specified in <em>table 4.3</em> is CL_FALSE).
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 </div>
 <div class="sect3">
 <h4 id="_mapping_image_objects">5.3.6. Mapping Image Objects</h4>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>void clEnqueueMapImage(cl_command_queue command_queue,
                        cl_mem image,
                        cl_bool blocking_map,
                        cl_map_flags map_flags,
                        const size_t *origin,
                        const size_t *region,
                        size_t *image_row_pitch,
                        size_t *image_slice_pitch,
                        cl_uint num_events_in_wait_list,
                        const cl_event *event_wait_list,
                        cl_event *event,
                        cl_int *errcode_ret)</pre>
 </div></div>
 <div class="paragraph"><p>enqueues a command to map a region in the image object given by <em>image</em>
 into the host address space and returns a pointer to this mapped region.</p></div>
 <div class="paragraph"><p><em>command_queue</em> must be a valid host command-queue.</p></div>
 <div class="paragraph"><p><em>image</em> is a valid image object. The OpenCL context associated with
 <em>command_queue</em> and <em>image</em> must be the same.</p></div>
 <div class="paragraph"><p><em>blocking_map</em> indicates if the map operation is <em>blocking</em> or
 <em>non-blocking</em>.</p></div>
 <div class="paragraph"><p>If <em>blocking_map</em> is CL_TRUE, <strong>clEnqueueMapImage</strong> does not return until
 the specified region in <em>image</em> is mapped into the host address space
 and the application can access the contents of the mapped region using
 the pointer returned by <strong>clEnqueueMapImage</strong>.</p></div>
 <div class="paragraph"><p>If <em>blocking_map</em> is CL_FALSE i.e. map operation is non-blocking, the
 pointer to the mapped region returned by <strong>clEnqueueMapImage</strong> cannot be
 used until the map command has completed. The <em>event</em> argument returns
 an event object which can be used to query the execution status of the
 map command. When the map command is completed, the application can
 access the contents of the mapped region using the pointer returned by
 <strong>clEnqueueMapImage</strong>.</p></div>
 <div class="paragraph"><p><em>map_flags</em> is a bit-field and is described in <em>table 5.5</em>.</p></div>
 <div class="paragraph"><p><em>origin</em> defines the (<em>x</em>, <em>y</em>, <em>z</em>) offset in pixels in the 1D, 2D or
 3D image, the (<em>x</em>, <em>y</em>) offset and the image index in the 2D image
 array or the (<em>x</em>) offset and the image index in the 1D image array. If
 <em>image</em> is a 2D image object, <em>origin</em>[2] must be 0. If <em>image</em> is a 1D
 image or 1D image buffer object, <em>origin</em>[1] and <em>origin</em>[2] must be 0.
 If <em>image</em> is a 1D image array object, <em>origin</em>[2] must be 0. If
 <em>image</em> is a 1D image array object, <em>origin</em>[1] describes the image
 index in the 1D image array. If <em>image</em> is a 2D image array object,
 <em>origin</em>[2] describes the image index in the 2D image array.</p></div>
 <div class="paragraph"><p><em>region_defines the_</em>(<em>width</em>, <em>height,</em> <em>depth</em>) in pixels of the 1D,
 2D or 3D rectangle, the (<em>width</em>, <em>height</em>) in pixels of the 2D
 rectangle and the number of images of a 2D image array or the (<em>width</em>)
 in pixels of the 1D rectangle and the number of images of a 1D image
 array. If <em>image</em> is a 2D image object, <em>region</em>[2] must be 1. If
 <em>image</em> is a 1D image or 1D image buffer object, <em>region</em>[1] and
 <em>region</em>[2] must be 1. If <em>image</em> is a 1D image array object,
 <em>region</em>[2] must be 1. The values in <em>region</em> cannot be 0.</p></div>
 <div class="paragraph"><p><em>image_row_pitch</em> returns the scan-line pitch in bytes for the mapped
 region. This must be a non-NULL value.</p></div>
 <div class="paragraph"><p><em>image_slice_pitch</em> returns the size in bytes of each 2D slice of a 3D
 image or the size of each 1D or 2D image in a 1D or 2D image array for
 the mapped region. For a 1D and 2D image, zero is returned if this
 argument is not NULL. For a 3D image, 1D and 2D image array,
 <em>image_slice_pitch</em> must be a non-NULL value.</p></div>
 <div class="paragraph"><p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need
 to complete before <strong>clEnqueueMapImage</strong> can be executed. If
 <em>event_wait_list</em> is NULL, then <strong>clEnqueueMapImage</strong> does not wait on any
 event to complete. If <em>event_wait_list</em> is NULL,
 <em>num_events_in_wait_list</em> must be 0. If <em>event_wait_list</em> is not NULL,
 the list of events pointed to by <em>event_wait_list</em> must be valid and
 <em>num_events_in_wait_list</em> must be greater than 0. The events specified
 in <em>event_wait_list</em> act as synchronization points. The context
 associated with events in <em>event_wait_list</em> and <em>command_queue</em> must be
 the same. The memory associated with <em>event_wait_list</em> can be reused or
 freed after the function returns.</p></div>
 <div class="paragraph"><p><em>event</em> returns an event object that identifies this particular command
 and can be used to query or queue a wait for this particular command to
 complete. <em>event</em> can be NULL in which case it will not be possible for
 the application to query the status of this command or queue a wait for
 this command to complete. If the <em>event_wait_list</em> and the <em>event</em>
 arguments are not NULL, the <em>event</em> argument should not refer to an
 element of the <em>event_wait_list</em> array.</p></div>
 <div class="paragraph"><p><em>errcode_ret</em> will return an appropriate error code. If <em>errcode_ret</em>
 is NULL, no error code is returned.</p></div>
 <div class="paragraph"><p><strong>clEnqueueMapImage</strong> will return a pointer to the mapped region. The
 <em>errcode_ret</em> is set to CL_SUCCESS.</p></div>
 <div class="paragraph"><p>A NULL pointer is returned otherwise with one of the following error
 values returned in <em>errcode_ret</em>:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_COMMAND_QUEUE
 if _command_queue_is not a valid host command-queue.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_CONTEXT if
 context associated with <em>command_queue_and _image</em> are not the same or
 if context associated with <em>command_queue</em> and events in
 <em>event_wait_list</em> are not the same.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_MEM_OBJECT if
 <em>image</em> is not a valid image object.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if region
 being mapped given by (<em>origin</em>, <em>origin+region</em>) is out of bounds or if
 values specified in _map_flags_are not valid.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if values
 in <em>origin</em> and <em>region</em> do not follow rules described in the argument
 description for <em>origin</em> and <em>region</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>image_row_pitch</em> is NULL.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>image</em> is a 3D image, 1D or 2D image array object and
 <em>image_slice_pitch</em> is NULL.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_EVENT_WAIT_LIST
 if <em>event_wait_list</em> is NULL and <em>num_events_in_wait_list</em> &gt; 0, or
 <em>event_wait_list</em> is not NULL and <em>num_events_in_wait_list</em> is 0, or if
 event objects in <em>event_wait_list</em> are not valid events.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_IMAGE_SIZE if
 image dimensions (image width, height, specified or compute row and/or
 slice pitch) for <em>image</em> are not supported by device associated with
 <em>queue</em>.
 </p>
 </li>
 <li>
 <p>
      CL_IMAGE_FORMAT_NOT_SUPPORTED if image format (image channel order and
 data type) for <em>image</em> are not supported by device associated with
 <em>queue</em>.
 </p>
 </li>
 <li>
 <p>
       CL_MAP_FAILURE if there is
 a failure to map the requested region into the host address space. This
 error cannot occur for image objects created with CL_MEM_USE_HOST_PTR or
 CL_MEM_ALLOC_HOST_PTR.
 </p>
 </li>
 <li>
 <p>
      CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST if the map operation is
 blocking and the execution status of any of the events in
 <em>event_wait_list</em> is a negative integer value.
 </p>
 </li>
 <li>
 <p>
      CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
 memory for data store associated with <em>image</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_OPERATION if
 the device associated with <em>command_queue</em> does not support images (i.e.
 CL_DEVICE_IMAGE_SUPPORT specified in <em>table 4.3</em> is CL_FALSE).
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_OPERATION if
 <em>image</em> has been created with CL_MEM_HOST_WRITE_ONLY or
 CL_MEM_HOST_NO_ACCESS and CL_MAP_READ is set in <em>map_flags</em> or if
 <em>image</em> has been created with CL_MEM_HOST_READ_ONLY or
 CL_MEM_HOST_NO_ACCESS and CL_MAP_WRITE or CL_MAP_WRITE_INVALIDATE_REGION
 is set in <em>map_flags</em>.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_OPERATION if
 mapping would lead to overlapping regions being mapped for writing.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The pointer returned maps a 1D, 2D or 3D region starting at <em>origin</em> and
 is at least <em>region[0]</em> pixels in size for a 1D image, 1D image buffer
 or 1D image array, (<em>image_row_pitch * region[1])</em> pixels in size for a
 2D image or 2D image array, and (<em>image_slice_pitch * region[2])</em> pixels
 in size for a 3D image. The result of a memory access outside this
 region is undefined.</p></div>
 <div class="paragraph"><p>If the image object is created with CL_MEM_USE_HOST_PTR set in
 <em>mem_flags</em>, the following will be true:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       The <em>host_ptr</em> specified
 in <strong>clCreateImage</strong> is guaranteed to contain the latest bits in the
 region being mapped when the <strong>clEnqueueMapImage</strong> command has completed.
 </p>
 </li>
 <li>
 <p>
       The pointer value returned
 by <strong>clEnqueueMapImage</strong> will be derived from the <em>host_ptr</em> specified
 when the image object is created.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>Mapped image objects are unmapped using <strong>clEnqueueUnmapMemObject</strong>. This
 is described in <em>section 5.5.2</em>.</p></div>
 </div>
 <div class="sect3">
 <h4 id="_image_object_queries">5.3.7. Image Object Queries</h4>
 <div class="paragraph"><p>To get information that is common to all memory objects, use the
 <strong>clGetMemObjectInfo</strong> function described in <em>section 5.5.5</em>.</p></div>
 <div class="paragraph"><p>To get information specific to an image object created with
 <strong>clCreateImage</strong>, use the following function
  </p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clGetImageInfo(cl_mem image,
                       cl_image_info param_name,
                       size_t param_value_size,
                       void *param_value,
                       size_t *param_value_size_ret)</pre>
 </div></div>
 <div class="paragraph"><p><em>image</em> specifies the image object being queried.</p></div>
 <div class="paragraph"><p><em>param_name</em> specifies the information to query. The list of supported
 <em>param_name</em> types and the information returned in <em>param_value</em> by
 <strong>clGetImageInfo</strong> is described in <em>table 5.10</em>.</p></div>
 <div class="paragraph"><p><em>param_value</em> is a pointer to memory where the appropriate result being
 queried is returned. If <em>param_value</em> is NULL, it is ignored.</p></div>
 <div class="paragraph"><p><em>param_value_size</em> is used to specify the size in bytes of memory
 pointed to by <em>param_value</em>. This size must be &gt;= size of return type
 as described in <em>table 5.10</em>.</p></div>
 <div class="paragraph"><p><em>param_value_size_ret</em> returns the actual size in bytes of data being
 queried by <em>param_name</em>. If <em>param_value_size_ret</em> is NULL, it is
 ignored.</p></div>
 <div class="paragraph"><p><strong>clGetImageInfo</strong> returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>param_name</em> is not valid, or if size in bytes specified by
 <em>param_value_size</em> is &lt; size of return type as described in <em>table
 5.10_and _param_value</em> is not NULL.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_MEM_OBJECT if
 <em>image</em> is a not a valid image object.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <caption class="title">Table 16. <em>List of supported param_names by clGetImageInfo</em></caption>
 <col style="width:34%;">
 <col style="width:33%;">
 <col style="width:33%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>cl_image_info</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Return type</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Info. returned in <em>param_value</em></strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_IMAGE_FORMAT</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_image_format</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return image format descriptor
 specified when <em>image</em> is created with <strong>clCreateImage</strong>.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_IMAGE_ELEMENT_SIZE</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return size of each element of the
 image memory object given by <em>image</em> in bytes. An element is made up of
 <em>n</em> channels. The value of <em>n</em> is given in <em>cl_image_format</em>
 descriptor.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_IMAGE_ROW_PITCH</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return calculated row pitch in bytes of a
 row of elements of the image object given by <em>image</em>.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_IMAGE_SLICE_PITCH</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return calculated slice pitch in bytes
 of a 2D slice for the 3D image object or size of each image in a 1D or
 2D image array given by <em>image</em>. For a 1D image, 1D image buffer and 2D
 image object return 0.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_IMAGE_WIDTH</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return width of the image in pixels.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_IMAGE_HEIGHT</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return height of the image in pixels. For a
 1D image, 1D image buffer and 1D image array object, height = 0.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_IMAGE_DEPTH</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return depth of the image in pixels. For a
 1D image, 1D image buffer, 2D image or 1D and 2D image array object,
 depth = 0.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_IMAGE_ARRAY_SIZE</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return number of images in the image
 array. If <em>image</em> is not an image array, 0 is returned.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_IMAGE_NUM_MIP_<br>
  LEVELS</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return num_mip_levels associated with <em>image</em>.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_IMAGE_NUM_SAMPLES</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return num_samples associated with
 <em>image</em>.</p></td>
 </tr>
 </tbody>
 </table>
 </div>
 </div>
 <div class="sect2">
 <h3 id="_pipes">5.4. Pipes</h3>
 <div class="paragraph"><p>A <em>pipe</em> is a memory object that stores data organized as a FIFO. Pipe
 objects can only be accessed using built-in functions that read from and
 write to a pipe. Pipe objects are not accessible from the host. A pipe
 object encapsulates the following information:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       Packet size in bytes
 </p>
 </li>
 <li>
 <p>
       Maximum capacity in packets
 </p>
 </li>
 <li>
 <p>
       Information about the number of packets currently in the pipe
 </p>
 </li>
 <li>
 <p>
       Data packets
 </p>
 </li>
 </ul></div>
 <div class="sect3">
 <h4 id="_creating_pipe_objects">5.4.1. Creating Pipe Objects</h4>
 <div class="paragraph"><p>A <strong>pipe object</strong> is created using the following function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_mem clCreatePipe(cl_context context,
                     cl_mem_flags flags,
                     cl_uint pipe_packet_size,
                     cl_uint pipe_max_packets,
                     const cl_pipe_properties *properties,
                     cl_int *errcode_ret)</pre>
 </div></div>
 <div class="paragraph"><p><em>context</em> is a valid OpenCL context used to create the pipe object.</p></div>
 <div class="paragraph"><p><em>flags</em> is a bit-field that is used to specify allocation and usage
 information such as the memory arena that should be used to allocate the
 pipe object and how it will be used. <em>Table 5.3</em> describes the possible
 values for <em>flags</em>. Only CL_MEM_READ_WRITE and CL_MEM_HOST_NO_ACCESS
 can be specified when creating a pipe object. If the value specified for
 <em>flags</em> is 0, the default is used which is CL_MEM_READ_WRITE |
 CL_MEM_HOST_NO_ACCESS.</p></div>
 <div class="paragraph"><p><em>pipe_packet_size</em> is the size in bytes of a pipe packet.</p></div>
 <div class="paragraph"><p><em>pipe_max_packets</em> specifies the pipe capacity by specifying the maximum
 number of packets the pipe can hold.</p></div>
 <div class="paragraph"><p><em>properties</em> specifies a list of properties for the pipe and their
 corresponding values. Each property name is immediately followed by the
 corresponding desired value. The list is terminated with 0. In OpenCL
 2.2, <em>properties</em> must be NULL.</p></div>
 <div class="paragraph"><p><em>errcode_ret</em> will return an appropriate error code. If <em>errcode_ret</em>
 is NULL, no error code is returned.</p></div>
 <div class="paragraph"><p><strong>clCreatePipe</strong> returns a valid non-zero pipe object and <em>errcode_ret</em> is
 set to CL_SUCCESS if the pipe object is created successfully.
 Otherwise, it returns a NULL value with one of the following error
 values returned in <em>errcode_ret</em>:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_CONTEXT if
 _context_is not a valid context.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if values
 specified in _flags_are not as defined above.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>properties</em> is not NULL.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_PIPE_SIZE if
 <em>pipe_packet_size</em> is 0 or the <em>pipe_packet_size</em> exceeds
 CL_DEVICE_PIPE_MAX_PACKET_SIZE value specified in <em>table 4.3</em> for all
 devices in <em>context_or if _pipe_max_packets</em> is 0.
 </p>
 </li>
 <li>
 <p>
      CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
 memory for the pipe object.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>Pipes follow the same memory consistency model as defined for buffer and
 image objects. The pipe state i.e. contents of the pipe across
 kernel-instances (on the same or different devices) is enforced at a
 synchronization point.</p></div>
 </div>
 <div class="sect3">
 <h4 id="_pipe_object_queries">5.4.2. Pipe Object Queries</h4>
 <div class="paragraph"><p>To get information that is common to all memory objects, use the
 <strong>clGetMemObjectInfo</strong> function described in <em>section 5.5.5</em>.</p></div>
 <div class="paragraph"><p>To get information specific to a pipe object created with
 <strong>clCreatePipe</strong>, use the following function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clGetPipeInfo(cl_mem pipe,
                      cl_pipe_info param_name,
                      size_t param_value_size,
                      void *param_value,
                      size_t *param_value_size_ret)</pre>
 </div></div>
 <div class="paragraph"><p><em>pipe</em> specifies the pipe object being queried.</p></div>
 <div class="paragraph"><p><em>param_name</em> specifies the information to query. The list of supported
 <em>param_name</em> types and the information returned in <em>param_value</em> by
 <strong>clGetPipeInfo</strong> is described in <em>table 5.11</em>.</p></div>
 <div class="paragraph"><p><em>param_value</em> is a pointer to memory where the appropriate result being
 queried is returned. If <em>param_value</em> is NULL, it is ignored.</p></div>
 <div class="paragraph"><p><em>param_value_size</em> is used to specify the size in bytes of memory
 pointed to by <em>param_value</em>. This size must be &gt;= size of return type
 as described in <em>table 5.11</em>.</p></div>
 <div class="paragraph"><p><em>param_value_size_ret</em> returns the actual size in bytes of data being
 queried by <em>param_name</em>. If <em>param_value_size_ret</em> is NULL, it is
 ignored.</p></div>
 <div class="paragraph"><p><strong>clGetPipeInfo</strong> returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>param_name</em> is not valid, or if size in bytes specified by
 <em>param_value_size</em> is &lt; size of return type as described in <em>table
 5.11_and _param_value</em> is not NULL.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_MEM_OBJECT if
 <em>pipe</em> is a not a valid pipe object.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <caption class="title">Table 17. <em>List of supported param_names by clGetPipeInfo</em></caption>
 <col style="width:34%;">
 <col style="width:33%;">
 <col style="width:33%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>cl_pipe_info</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Return type</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Info. returned in <em>param_value</em></strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_PIPE_PACKET_SIZE</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return pipe packet size specified when
 <em>pipe</em> is created with <strong>clCreatePipe</strong>.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_PIPE_MAX_PACKETS</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return max. number of packets specified
 when <em>pipe</em> is created with <strong>clCreatePipe</strong>.</p></td>
 </tr>
 </tbody>
 </table>
 </div>
 </div>
 <div class="sect2">
 <h3 id="_handling_memory_objects">5.5. Handling Memory Objects</h3>
 <div class="sect3">
 <h4 id="_retaining_and_releasing_memory_objects">5.5.1. Retaining and Releasing Memory Objects</h4>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clRetainMemObject(cl_mem memobj)</pre>
 </div></div>
 <div class="paragraph"><p>increments the <em>memobj</em> reference count. <strong>clRetainMemObject</strong> returns
 CL_SUCCESS if the function is executed successfully. Otherwise, it
 returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_MEM_OBJECT if
 <em>memobj</em> is not a valid memory object (buffer or image object).
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p><strong>clCreateBuffer</strong>, <strong>clCreateSubBuffer</strong>, <strong>clCreateImage</strong> and
 <strong>clCreatePipe</strong> perform an implicit retain.</p></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clReleaseMemObject(cl_mem memobj)</pre>
 </div></div>
 <div class="paragraph"><p>decrements the <em>memobj</em> reference count. <strong>clReleaseMemObject</strong> returns
 CL_SUCCESS if the function is executed successfully. Otherwise, it
 returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_MEM_OBJECT if
 <em>memobj</em> is not a valid memory object.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>After the <em>memobj</em> reference count becomes zero and commands queued for
 execution on a command-queue(s) that use <em>memobj</em> have finished, the
 memory object is deleted. If <em>memobj</em> is a buffer object, <em>memobj</em>
 cannot be deleted until all sub-buffer objects associated with <em>memobj</em>
 are deleted. Using this function to release a reference that was not
 obtained by creating the object or by calling <strong>clRetainMemObject</strong> causes
 undefined behavior.</p></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clSetMemObjectDestructorCallback
        (cl_mem memobj,
         void (CL_CALLBACK *pfn_notify)(cl_mem memobj,void *user_data),
         void *user_data)</pre>
 </div></div>
 <div class="paragraph"><p>registers a user callback function with a memory object. Each call to
 <strong>clSetMemObjectDestructorCallback</strong> registers the specified user callback
 function on a callback stack associated with <em>memobj</em>. The registered
 user callback functions are called in the reverse order in which they
 were registered. The user callback functions are called and then the
 memory objects resources are freed and the memory object is deleted.
 This provides a mechanism for the application (and libraries) using
 <em>memobj</em> to be notified when the memory referenced by <em>host_ptr</em>,
 specified when the memory object is created and used as the storage bits
 for the memory object, can be reused or freed.</p></div>
 <div class="paragraph"><p><em>memobj</em> is a valid memory object.</p></div>
 <div class="paragraph"><p><em>pfn_notify</em> is the callback function that can be registered by the
 application. This callback function may be called asynchronously by the
 OpenCL implementation. It is the applications responsibility to ensure
 that the callback function is thread-safe. The parameters to this
 callback function are:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       <em>memobj_is the memory
 object being deleted. When the user callback is called by the
 implementation, this memory object is not longer valid. _memobj</em> is
 only provided for reference purposes.
 </p>
 </li>
 <li>
 <p>
       _user_data_is a pointer to
 user supplied data.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>_user_data_will be passed as the _user_data_argument when _pfn_notify_is
 called. _user_data_can be NULL.</p></div>
 <div class="paragraph"><p><strong>clSetMemObjectDestructorCallback</strong> returns CL_SUCCESS if the function is
 executed successfully. Otherwise, it returns one of the following
 errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_MEM_OBJECT if
 <em>memobj</em> is not a valid memory object.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>pfn_notify</em> is NULL.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="admonitionblock">
 <table><tr>
 <td class="icon">
 <div class="title">Note</div>
 </td>
 <td class="content">When the user callback function is called by the implementation,
 the contents of the memory region pointed to by <em>host_ptr</em> (if the
 memory object is created with CL_MEM_USE_HOST_PTR) are undefined. The
 callback function is typically used by the application to either free or
 reuse the memory region pointed to by <em>host_ptr</em>.</td>
 </tr></table>
 </div>
 <div class="paragraph"><p>The behavior of calling expensive system routines, OpenCL API calls to
 create contexts or command-queues, or blocking OpenCL operations from
 the following list below, in a callback is undefined.</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 <strong>clFinish</strong>,
 </p>
 </li>
 <li>
 <p>
 <strong>clWaitForEvents</strong>,
 </p>
 </li>
 <li>
 <p>
 blocking calls to <strong>clEnqueueReadBuffer</strong>, <strong>clEnqueueReadBufferRect</strong>,
 </p>
 </li>
 <li>
 <p>
 <strong>clEnqueueWriteBuffer</strong>,<strong>clEnqueueWriteBufferRect,</strong>
 </p>
 </li>
 <li>
 <p>
 blocking calls to <strong>clEnqueueReadImage</strong> and *clEnqueueWriteImage, *
 </p>
 </li>
 <li>
 <p>
 blocking calls to <strong>clEnqueueMapBuffer,</strong>
 </p>
 </li>
 <li>
 <p>
 <strong>clEnqueueMapImage</strong>,
 </p>
 </li>
 <li>
 <p>
 blocking calls to <strong>clBuildProgram</strong>, <strong>clCompileProgram</strong> or <strong>clLinkProgram</strong>
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>If an application needs to wait for completion of a routine from the
 above list in a callback, please use the non-blocking form of the
 function, and assign a completion callback to it to do the remainder of
 your work.  Note that when a callback (or other code) enqueues commands
 to a command-queue, the commands are not required to begin execution
 until the queue is flushed. In standard usage, blocking enqueue calls
 serve this role by implicitly flushing the queue. Since blocking calls
 are not permitted in callbacks, those callbacks that enqueue commands on
 a command queue should either call <strong>clFlush</strong> on the queue before
 returning or arrange for <strong>clFlush</strong> to be called later on another thread.</p></div>
 <div class="paragraph"><p>The user callback function may not call OpenCL APIs with the memory
 object for which the callback function is invoked and for such cases the
 behavior of OpenCL APIs is considered to be undefined.</p></div>
 </div>
 <div class="sect3">
 <h4 id="_unmapping_mapped_memory_objects">5.5.2. Unmapping Mapped Memory Objects</h4>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clEnqueueUnmapMemObject(cl_command_queue command_queue,
                                cl_mem memobj,
                                void *mapped_ptr,
                                cl_uint num_events_in_wait_list,
                                const cl_event *event_wait_list,
                                cl_event *event)</pre>
 </div></div>
 <div class="paragraph"><p>enqueues a command to unmap a previously mapped region of a memory
 object. Reads or writes from the host using the pointer returned by
 <strong>clEnqueueMapBuffer</strong> or *clEnqueueMapImage*are considered to be complete.</p></div>
 <div class="paragraph"><p><em>command_queue</em> must be a valid host command-queue.</p></div>
 <div class="paragraph"><p><em>memobj</em> is a valid memory (buffer or image) object. The OpenCL context
 associated with <em>command_queue</em> and <em>memobj</em> must be the same.</p></div>
 <div class="paragraph"><p><em>mapped_ptr</em> is the host address returned by a previous call to
 <strong>clEnqueueMapBuffer</strong>, or <strong>clEnqueueMapImage</strong> for <em>memobj</em>.</p></div>
 <div class="paragraph"><p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need
 to complete before <strong>clEnqueueUnmapMemObject</strong> can be executed. If
 <em>event_wait_list</em> is NULL, then <strong>clEnqueueUnmapMemObject</strong> does not wait
 on any event to complete. If <em>event_wait_list</em> is NULL,
 <em>num_events_in_wait_list</em> must be 0. If <em>event_wait_list</em> is not NULL,
 the list of events pointed to by <em>event_wait_list</em> must be valid and
 <em>num_events_in_wait_list</em> must be greater than 0. The events specified
 in <em>event_wait_list</em> act as synchronization points. The context
 associated with events in <em>event_wait_list</em> and <em>command_queue</em> must be
 the same. The memory associated with <em>event_wait_list</em> can be reused or
 freed after the function returns.</p></div>
 <div class="paragraph"><p><em>event</em> returns an event object that identifies this particular command
 and can be used to query or queue a wait for this particular command to
 complete. <em>event</em> can be NULL in which case it will not be possible for
 the application to query the status of this command or queue a wait for
 this command to complete. <strong>clEnqueueBarrierWithWaitList</strong> can be used
 instead. If the <em>event_wait_list</em> and the <em>event</em> arguments are not
 NULL, the <em>event</em> argument should not refer to an element of the
 <em>event_wait_list</em> array.</p></div>
 <div class="paragraph"><p><strong>clEnqueueUnmapMemObject</strong> returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_COMMAND_QUEUE
 if <em>command_queue</em> is not a valid host command-queue.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_MEM_OBJECT if
 <em>memobj</em> is not a valid memory object or is a pipe object.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>mapped_ptr</em> is not a valid pointer returned by
 <strong>clEnqueueMapBuffer</strong> or <strong>clEnqueueMapImage</strong> for <em>memobj</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_EVENT_WAIT_LIST
 if <em>event_wait_list</em> is NULL and <em>num_events_in_wait_list</em> &gt; 0, or if
 <em>event_wait_list</em> is not NULL and <em>num_events_in_wait_list</em> is 0, or if
 event objects in <em>event_wait_list</em> are not valid events.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_CONTEXT if
 context associated with <em>command_queue</em> and <em>memobj</em> are not the same or
 if the context associated with <em>command_queue</em> and events in
 <em>event_wait_list</em> are not the same.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p><strong>clEnqueueMapBuffer</strong> and <strong>clEnqueueMapImage</strong> increment the mapped count
 of the memory object. The initial mapped count value of the memory
 object is zero. Multiple calls to
 <strong>clEnqueueMapBuffer</strong>, or <strong>clEnqueueMapImage</strong> on the same memory object
 will increment this mapped count by appropriate number of calls.
 *clEnqueueUnmapMemObject*decrements the mapped count of the memory
 object.</p></div>
 <div class="paragraph"><p><strong>clEnqueueMapBuffer</strong>, and <strong>clEnqueueMapImage</strong> act as synchronization
 points for a region of the buffer object being mapped.</p></div>
 </div>
 <div class="sect3">
 <h4 id="_accessing_mapped_regions_of_a_memory_object">5.5.3. Accessing mapped regions of a memory object</h4>
 <div class="paragraph"><p>This section describes the behavior of OpenCL commands that access
 mapped regions of a memory object.</p></div>
 <div class="paragraph"><p>The contents of the region of a memory object and associated memory
 objects (sub-buffer objects or 1D image buffer objects that overlap this
 region) mapped for writing (i.e. CL_MAP_WRITE or
 CL_MAP_WRITE_INVALIDATE_REGION is set in <em>map_flags</em> argument to
 <strong>clEnqueueMapBuffer</strong>, or <strong>clEnqueueMapImage</strong>) are considered to be
 undefined until this region is unmapped.</p></div>
 <div class="paragraph"><p>Multiple commands in command-queues can map a region or overlapping
 regions of a memory object and associated memory objects (sub-buffer
 objects or 1D image buffer objects that overlap this region) for reading
 (i.e. <em>map_flags</em> = CL_MAP_READ). The contents of the regions of a
 memory object mapped for reading can also be read by kernels and other
 OpenCL commands (such as <strong>clEnqueueCopyBuffer</strong>) executing on a
 device(s).</p></div>
 <div class="paragraph"><p>Mapping (and unmapping) overlapped regions in a memory object and/or
 associated memory objects (sub-buffer objects or 1D image buffer objects
 that overlap this region) for writing is an error and will result in
 CL_INVALID_OPERATION error returned by <strong>clEnqueueMapBuffer</strong>, or
 <strong>clEnqueueMapImage</strong>.</p></div>
 <div class="paragraph"><p>If a memory object is currently mapped for writing, the application must
 ensure that the memory object is unmapped before any enqueued kernels or
 commands that read from or write to this memory object or any of its
 associated memory objects (sub-buffer or 1D image buffer objects) or its
 parent object (if the memory object is a sub-buffer or 1D image buffer
 object) begin execution; otherwise the behavior is undefined.</p></div>
 <div class="paragraph"><p>If a memory object is currently mapped for reading, the application must
 ensure that the memory object is unmapped before any enqueued kernels or
 commands that write to this memory object or any of its associated
 memory objects (sub-buffer or 1D image buffer objects) or its parent
 object (if the memory object is a sub-buffer or 1D image buffer object)
 begin execution; otherwise the behavior is undefined.</p></div>
 <div class="paragraph"><p>A memory object is considered as mapped if there are one or more active
 mappings for the memory object irrespective of whether the mapped
 regions span the entire memory object.</p></div>
 <div class="paragraph"><p>Accessing the contents of the memory region referred to by the mapped
 pointer that has been unmapped is undefined.</p></div>
 <div class="paragraph"><p>The mapped pointer returned by <strong>clEnqueueMapBuffer</strong> or
 <strong>clEnqueueMapImage</strong> can be used as <em>ptr</em> argument value to
 <strong>clEnqueue{Read | Write}Buffer</strong>, <strong>clEnqeue{Read | Write}BufferRect</strong>,
 <strong>clEnqueue{Read | Write}Image</strong> provided the rules described above are
 adhered to.</p></div>
 </div>
 <div class="sect3">
 <h4 id="_migrating_memory_objects">5.5.4. Migrating Memory Objects</h4>
 <div class="paragraph"><p>This section describes a mechanism for assigning which device an OpenCL
 memory object resides. A user may wish to have more explicit control
 over the location of their memory objects on creation. This could be
 used to:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       Ensure that an object is
 allocated on a specific device prior to usage.
 </p>
 </li>
 <li>
 <p>
       Preemptively migrate an
 object from one device to another.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clEnqueueMigrateMemObjects(cl_command_queue command_queue,
                                   cl_uint num_mem_objects,
                                   const cl_mem *mem_objects,
                                   cl_mem_migration_flags flags,
                                   cl_uint num_events_in_wait_list
                                   const cl_event *event_wait_list,
                                   cl_event *event)</pre>
 </div></div>
 <div class="paragraph"><p>enqueues a command to indicate which device a set of memory objects
 should be associated with. Typically, memory objects are implicitly
 migrated to a device for which enqueued commands, using the memory
 object, are targeted. <strong>clEnqueueMigrateMemObjects</strong> allows this
 migration to be explicitly performed ahead of the dependent commands.
 This allows a user to preemptively change the association of a memory
 object, through regular command queue scheduling, in order to prepare
 for another upcoming command. This also permits an application to
 overlap the placement of memory objects with other unrelated operations
 before these memory objects are needed potentially hiding transfer
 latencies. Once the event, returned from <strong>clEnqueueMigrateMemObjects</strong>,
 has been marked CL_COMPLETE the memory objects specified in
 <em>mem_objects</em> have been successfully migrated to the device associated
 with <em>command_queue</em>. The migrated memory object shall remain resident
 on the device until another command is enqueued that either implicitly
 or explicitly migrates it away.</p></div>
 <div class="paragraph"><p><strong>clEnqueueMigrateMemObjects</strong> can also be used to direct the initial
 placement of a memory object, after creation, possibly avoiding the
 initial overhead of instantiating the object on the first enqueued
 command to use it.</p></div>
 <div class="paragraph"><p>The user is responsible for managing the event dependencies, associated
 with this command, in order to avoid overlapping access to memory
 objects. Improperly specified event dependencies passed to
 <strong>clEnqueueMigrateMemObjects</strong> could result in undefined results.</p></div>
 <div class="paragraph"><p><em>command_queue</em> is a valid host command-queue. The specified set of
 memory objects in <em>mem_objects</em> will be migrated to the OpenCL device
 associated with <em>command_queue</em> or to the host if the
 CL_MIGRATE_MEM_OBJECT_HOST has been specified.</p></div>
 <div class="paragraph"><p><em>num_mem_objects_is the number of memory objects specified in
 _mem_objects</em>.</p></div>
 <div class="paragraph"><p><em>mem_objects</em> is a pointer to a list of memory objects.</p></div>
 <div class="paragraph"><p><em>flags</em> is a bit-field that is used to specify migration options. The
 <em>table 5.12</em> describes the possible values for flags.</p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <caption class="title">Table 18. <em>Supported values for cl_mem_migration_flags</em></caption>
 <col style="width:50%;">
 <col style="width:50%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>cl_mem_migration flags</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Description</strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_MIGRATE_MEM_OBJECT_HOST</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">This flag indicates that the specified
 set of memory objects are to be migrated to the host, regardless of the
 target command-queue.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_MIGRATE_MEM_OBJECT_<br>
  CONTENT_UNDEFINED</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">This flag indicates that the contents of the set of
 memory objects are undefined after migration. The specified set of
 memory objects are migrated to the device associated with
 _command_queue_without incurring the overhead of migrating their
 contents.</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need
 to complete before this particular command can be executed. If
 <em>event_wait_list</em> is NULL, then this particular command does not wait on
 any event to complete. If <em>event_wait_list</em> is NULL,
 <em>num_events_in_wait_list</em> must be 0. If <em>event_wait_list</em> is not NULL,
 the list of events pointed to by <em>event_wait_list</em> must be valid and
 <em>num_events_in_wait_list</em> must be greater than 0. The events specified
 in <em>event_wait_list</em> act as synchronization points. The context
 associated with events in <em>event_wait_list</em> and <em>command_queue</em> must be
 the same. The memory associated with <em>event_wait_list</em> can be reused or
 freed after the function returns.</p></div>
 <div class="paragraph"><p><em>event</em> returns an event object that identifies this particular command
 and can be used to query or queue a wait for this particular command to
 complete. <em>event</em> can be NULL in which case it will not be possible for
 the application to query the status of this command or queue a wait for
 this command to complete. If the <em>event_wait_list</em> and the <em>event</em>
 arguments are not NULL, the <em>event</em> argument should not refer to an
 element of the <em>event_wait_list</em> array.</p></div>
 <div class="paragraph"><p><strong>clEnqueueMigrateMemObjects</strong> return CL_SUCCESS if the function is
 executed successfully. Otherwise, it returns one of the following
 errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_COMMAND_QUEUE
 if <em>command_queue</em> is not a valid host command-queue.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_CONTEXT if the
 context associated with <em>command_queue</em> and memory objects in
 <em>mem_objects</em> are not the same or if the context associated with
 <em>command_queue</em> and events in <em>event_wait_list</em> are not the same.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_MEM_OBJECT if
 any of the memory objects in <em>mem_objects</em> is not a valid memory object.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>num_mem_objects</em> is zero or if <em>mem_objects</em> is NULL.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>flags</em> is not 0 or is not any of the values described in the table
 above.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_EVENT_WAIT_LIST
 if <em>event_wait_list</em> is NULL and <em>num_events_in_wait_list</em> &gt; 0, or
 <em>event_wait_list</em> is not NULL and <em>num_events_in_wait_list</em> is 0, or if
 event objects in <em>event_wait_list</em> are not valid events.
 </p>
 </li>
 <li>
 <p>
      CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
 memory for the specified set of memory objects in <em>mem_objects</em>.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 </div>
 <div class="sect3">
 <h4 id="_memory_object_queries">5.5.5. Memory Object Queries</h4>
 <div class="paragraph"><p>To get information that is common to all memory objects (buffer and
 image objects), use the following function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clGetMemObjectInfo(cl_mem memobj,
                           cl_mem_info param_name,
                           size_t param_value_size,
                           void *param_value,
                           size_t *param_value_size_ret)</pre>
 </div></div>
 <div class="paragraph"><p><em>memobj</em> specifies the memory object being queried.</p></div>
 <div class="paragraph"><p><em>param_name</em> specifies the information to query. The list of supported
 <em>param_name</em> types and the information returned in <em>param_value</em> by
 <strong>clGetMemObjectInfo</strong> is described in <em>table 5.13</em>.</p></div>
 <div class="paragraph"><p><em>param_value</em> is a pointer to memory where the appropriate result being
 queried is returned. If <em>param_value</em> is NULL, it is ignored.</p></div>
 <div class="paragraph"><p><em>param_value_size</em> is used to specify the size in bytes of memory
 pointed to by <em>param_value</em>. This size must be &gt;= size of return type
 as described in <em>table 5.13</em>.</p></div>
 <div class="paragraph"><p><em>param_value_size_ret</em> returns the actual size in bytes of data being
 queried by <em>param_name</em>. If <em>param_value_size_ret</em> is NULL, it is
 ignored.</p></div>
 <div class="paragraph"><p><strong>clGetMemObjectInfo</strong> returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>param_name</em> is not valid, or if size in bytes specified by
 <em>param_value_size</em> is &lt; size of return type as described in <em>table 5.13</em>
 and <em>param_value</em> is not NULL.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_MEM_OBJECT if
 <em>memobj</em> is a not a valid memory object.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.

 </p>
 </li>
 </ul></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <caption class="title">Table 19. <em>List of supported param_names by clGetMemObjectInfo</em></caption>
 <col style="width:34%;">
 <col style="width:33%;">
 <col style="width:33%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>cl_mem_info</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Return type</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Info. returned in <em>param_value</em></strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_MEM_TYPE</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_mem_object_type</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Returns one of the following values:
 <br>
 <br>
 CL_MEM_OBJECT_BUFFER if memobj
 is created with clCreateBuffer or
 clCreateSubBuffer.
 <br>
 <br>
 cl_image_desc.image_type argument
 value if memobj is created with
 clCreateImage.
 <br>
 <br>
 CL_MEM_OBJECT_PIPE if memobj is
 created with clCreatePipe.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_MEM_FLAGS</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_mem_flags</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return the flags argument value specified
 when memobj is created with
 clCreateBuffer,
 clCreateSubBuffer,
 clCreateImage or
 clCreatePipe.
 <br>
 <br>
 If memobj is a sub-buffer the memory
 access qualifiers inherited from parent
 buffer is also returned.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_MEM_SIZE</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return actual size of the data store associated
 with <em>memobj</em> in bytes.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_MEM_HOST_PTR</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">void *</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">If memobj is created with
 clCreateBuffer or clCreateImage and
 CL_MEM_USE_HOST_PTR is specified
 in mem_flags, return the host_ptr
 argument value specified when memobj
 is created. Otherwise a NULL value is
 returned.
 <br>
 <br>
 If memobj is created with
 clCreateSubBuffer, return the host_ptr
 + origin value specified when memobj is
 created.  host_ptr is the argument value
 specified to clCreateBuffer and
 CL_MEM_USE_HOST_PTR is specified
 in mem_flags for memory object from
 which memobj is created.  Otherwise a
 NULL value is returned.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">*CL_MEM_MAP_COUNT*<span class="footnote"><br>[The map count returned should be considered immediately stale. It is unsuitable for general use in applications.
 This feature is provided for debugging.]<br></span>:</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Map count.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">*CL_MEM_REFERENCE_COUNT*<span class="footnote"><br>[The reference count returned should be considered immediately stale. It is unsuitable for general use in
 applications. This feature is provided for identifying memory leaks.]<br></span>:</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return <em>memobj</em> reference count.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_MEM_CONTEXT</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_context</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return context specified when memory
 object is created. If <em>memobj</em> is created using <strong>clCreateSubBuffer</strong>,
 the context associated with the memory object specified as the <em>buffer</em>
 argument to <strong>clCreateSubBuffer</strong> is returned.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_MEM_ASSOCIATED_ MEMOBJECT</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_mem</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return memory object from which
 memobj is created. This returns the memory object specified
 as buffer argument to
 clCreateSubBuffer if memobj is a subbuffer object created using
 clCreateSubBuffer.
 <br>
 <br>
 This returns the mem_object specified in
 cl_image_desc if memobj is an image
 object.
 <br>
 <br>
 Otherwise a NULL value is returned.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_MEM_OFFSET</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return offset if memobj is a sub-buffer
 object created using clCreateSubBuffer.
 <br>
 <br>
 This return 0 if memobj is not a subbuffer object.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_MEM_USES_SVM_ POINTER</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_bool</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return CL_TRUE if <em>memobj</em> is a buffer object that
 was created with CL_MEM_USE_HOST_PTR or is a sub-buffer object of a
 buffer object that was created with CL_MEM_USE_HOST_PTR and the
 <em>host_ptr</em> specified when the buffer object was created is a SVM
 pointer; otherwise returns CL_FALSE.</p></td>
 </tr>
 </tbody>
 </table>
 </div>
 </div>
 <div class="sect2">
 <h3 id="_shared_virtual_memory">5.6. Shared Virtual Memory</h3>
 <div class="paragraph"><p>OpenCL 2.2 adds support for shared virtual memory (a.k.a. SVM). SVM
 allows the host and kernels executing on devices to directly share
 complex, pointer-containing data structures such as trees and linked
 lists. It also eliminates the need to marshal data between the host and
 devices. As a result, SVM substantially simplifies OpenCL programming
 and may improve performance.</p></div>
 <div class="sect3">
 <h4 id="_svm_sharing_granularity_coarse_and_fine_grained_sharing">5.6.1. SVM sharing granularity: coarse- and fine- grained sharing</h4>
 <div class="paragraph"><p>OpenCL maintains memory consistency in a coarse-grained fashion in
 regions of buffers. We call this coarse-grained sharing. Many platforms
 such as those with integrated CPU-GPU processors and ones using the
 SVM-related PCI-SIG IOMMU services can do better, and can support
 sharing at a granularity smaller than a buffer. We call this
 fine-grained sharing. OpenCL 2.0 requires that the host and all OpenCL
 2.2 devices support coarse-grained sharing at a minimum.</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       Coarse-grained sharing:
 Coarse-grain sharing may be used for memory and virtual pointer sharing
 between multiple devices as well as between the host and one or more
 devices. The shared memory region is a memory buffer allocated using
 <strong>clSVMAlloc</strong>. Memory consistency is guaranteed at synchronization points
 and the host can use calls to <strong>clEnqueueSVMMap</strong> and <strong>clEnqueueSVMUnmap</strong>
 or create a cl_mem buffer object using the SVM pointer and use OpenCLs
 existing host API functions <strong>clEnqueueMapBuffer</strong> and
 <strong>clEnqueueUnmapMemObject</strong> to update regions of the buffer. What
 coarse-grain buffer SVM adds to OpenCLs earlier buffer support are the
 ability to share virtual memory pointers and a guarantee that concurrent
 access to the same memory allocation from multiple kernels on a single
 device is valid. The coarse-grain buffer SVM provides a memory
 consistency model similar to the global memory consistency model
 described in <em>sections 3.3.1</em> and <em>3.4.3</em> of the OpenCL 1.2
 specification. This memory consistency applies to the regions of buffers
 being shared in a coarse-grained fashion. It is enforced at the
 synchronization points between commands enqueued to command queues in a
 single context with the additional consideration that multiple kernels
 concurrently running on the same device may safely share the data.
 </p>
 </li>
 <li>
 <p>
       Fine-grained sharing:
 Shared virtual memory where memory consistency is maintained at a
 granularity smaller than a buffer. How fine-grained SVM is used depends
 on whether the device supports SVM atomic operations.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>o   If SVM atomic operations are supported, they provide memory
 consistency for loads and stores by the host and kernels executing on
 devices supporting SVM. This means that the host and devices can
 concurrently read and update the same memory. The consistency provided
 by SVM atomics is in addition to the consistency provided at
 synchronization points. There is no need for explicit calls to
 <strong>clEnqueueSVMMap</strong> and <strong>clEnqueueSVMUnmap</strong> or <strong>clEnqueueMapBuffer</strong> and
 <strong>clEnqueueUnmapMemObject</strong> on a cl_mem buffer object created using the
 SVM pointer.</p></div>
 <div class="paragraph"><p>o   If SVM atomic operations are not supported, the host and devices can
 concurrently read the same memory locations and can concurrently update
 non-overlapping memory regions, but attempts to update the same memory
 locations are undefined. Memory consistency is guaranteed at
 synchronization points without the need for explicit calls to to
 <strong>clEnqueueSVMMap</strong> and <strong>clEnqueueSVMUnmap</strong> or <strong>clEnqueueMapBuffer</strong> and
 <strong>clEnqueueUnmapMemObject</strong> on a cl_mem buffer object created using the
 SVM pointer.</p></div>
 <div class="paragraph"><p>There are two kinds of fine-grain sharing support. Devices may support
 either fine-grain buffer sharing or fine-grain system sharing.</p></div>
 <div class="paragraph"><p>o   Fine-grain buffer sharing provides fine-grain SVM only within
 buffers and is an extension of coarse-grain sharing. To support
 fine-grain buffer sharing in an OpenCL context, all devices in the
 context must support CL_DEVICE_SVM_FINE_GRAIN_BUFFER.</p></div>
 <div class="paragraph"><p>o   Fine-grain system sharing enables fine-grain sharing of the hosts
 entire virtual memory, including memory regions allocated by the system
 <strong>malloc</strong> API. OpenCL buffer objects are unnecessary and programmers can
 pass pointers allocated using <strong>malloc</strong> to OpenCL kernels.</p></div>
 <div class="paragraph"><p>As an illustration of fine-grain SVM using SVM atomic operations to
 maintain memory consistency, consider the following example. The host
 and a set of devices can simultaneously access and update a shared
 work-queue data structure holding work-items to be done. The host can
 use atomic operations to insert new work-items into the queue at the
 same time as the devices using similar atomic operations to remove
 work-items for processing.</p></div>
 <div class="paragraph"><p>It is the programmers responsibility to ensure that no host code or
 executing kernels attempt to access a shared memory region after that
 memory is freed. We require the SVM implementation to work with either
 32- or 64- bit host applications subject to the following requirement:
 the address space size must be the same for the host and all OpenCL
 devices in the context.</p></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>void* clSVMAlloc(cl_context context,
                  cl_svm_mem_flags flags,
                  size_t size,
                  cl_uint alignment)</pre>
 </div></div>
 <div class="paragraph"><p>allocates a shared virtual memory buffer (referred to as a SVM buffer)
 that can be shared by the host and all devices in an OpenCL context that
 support shared virtual memory.</p></div>
 <div class="paragraph"><p><em>context</em> is a valid OpenCL context used to create the SVM buffer.</p></div>
 <div class="paragraph"><p><em>flags</em> is a bit-field that is used to specify allocation and usage
 information. <em>Table 5.14</em> describes the possible values for <em>flags</em>.</p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <caption class="title">Table 20. <em>List of supported cl_svm_mem_flags_values</em></caption>
 <col style="width:50%;">
 <col style="width:50%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>cl_svm_mem_flags</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Description</strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_MEM_READ_WRITE</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">This flag specifies that the SVM buffer will be read
 and written by a kernel. This is the default.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_MEM_WRITE_ONLY</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">This flag specifies that the SVM buffer will be written
 but not read by a kernel.
 <br>
 <br>
 Reading from a SVM buffer created with
 CL_MEM_WRITE_ONLY inside a kernel is undefined.
 <br>
 <br>
 CL_MEM_READ_WRITE and
 CL_MEM_WRITE_ONLY are mutually exclusive.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_MEM_READ_ONLY</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">This flag specifies that the SVM buffer object is a
 read-only memory object when used inside a kernel.
 <br>
 <br>
 Writing to a SVM buffer created with
 CL_MEM_READ_ONLY inside a kernel is undefined.
 <br>
 <br>
 CL_MEM_READ_WRITE or CL_MEM_WRITE_ONLY
 and CL_MEM_READ_ONLY are mutually exclusive.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_MEM_SVM_FINE_GRAIN_<br>
  BUFFER</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">This specifies that the application wants the OpenCL
 implementation to do a fine-grained allocation.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_MEM_SVM_ATOMICS</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">This flag is valid only if
 CL_MEM_SVM_FINE_GRAIN_BUFFER is specified in flags. It is used to
 indicate that SVM atomic operations can control visibility of memory
 accesses in this SVM buffer.</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p>If CL_MEM_SVM_FINE_GRAIN_BUFFER is not specified, the buffer can be
 created as a coarse grained SVM allocation. Similarly, if
 CL_MEM_SVM_ATOMICS is not specified, the buffer can be created without
 support for SVM atomic operations (refer to an OpenCL kernel
 language specifications).</p></div>
 <div class="paragraph"><p><em>size</em> is the size in bytes of the SVM buffer to be allocated.</p></div>
 <div class="paragraph"><p><em>alignment</em> is the minimum alignment in bytes that is required for the
 newly created buffers memory region. It must be a power of two up to
 the largest data type supported by the OpenCL device. For the full
 profile, the largest data type is long16. For the embedded profile, it
 is long16 if the device supports 64-bit integers; otherwise it is
 int16. If alignment is 0, a default alignment will be used that is
 equal to the size of largest data type supported by the OpenCL
 implementation.</p></div>
 <div class="paragraph"><p><strong>clSVMAlloc</strong> returns a valid non-NULL shared virtual memory address if
 the SVM buffer is successfully allocated. Otherwise, like <strong>malloc</strong>, it
 returns a NULL pointer value. <strong>clSVMAlloc</strong> will fail if</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       <em>context</em> is not a valid context.
 </p>
 </li>
 <li>
 <p>
       <em>flags</em> does not contain
 CL_MEM_SVM_FINE_GRAIN_BUFFER but does contain CL_MEM_SVM_ATOMICS.
 </p>
 </li>
 <li>
 <p>
       Values specified in
 <em>flags</em> do not follow rules described for supported values in <em>table
 5.14</em>.
 </p>
 </li>
 <li>
 <p>
      CL_MEM_SVM_FINE_GRAIN_BUFFER or CL_MEM_SVM_ATOMICS is specified in
 <em>flags</em> and these are not supported by at least one device in <em>context</em>.
 </p>
 </li>
 <li>
 <p>
       The values specified in
 <em>flags</em> are not valid i.e. dont match those defined in <em>table 5.14</em>.
 </p>
 </li>
 <li>
 <p>
       <em>size</em> is 0 or &gt;
 CL_DEVICE_MAX_MEM_ALLOC_SIZE value for any device in <em>context</em>.
 </p>
 </li>
 <li>
 <p>
       <em>alignment</em> is not a power
 of two or the OpenCL implementation cannot support the specified
 alignment for at least one device in <em>context</em>.
 </p>
 </li>
 <li>
 <p>
       There was a failure to
 allocate resources.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>Calling <strong>clSVMAlloc</strong> does not itself provide consistency for the shared
 memory region. When the host cant use the SVM atomic operations, it
 must rely on OpenCLs guaranteed memory consistency at synchronization
 points.</p></div>
 <div class="paragraph"><p>For SVM to be used efficiently, the host and any devices sharing a
 buffer containing virtual memory pointers should have the same
 endianness. If the context passed to <strong>clSVMAlloc</strong> has devices with
 mixed endianness and the OpenCL implementation is unable to implement
 SVM because of that mixed endianness, <strong>clSVMAlloc</strong> will fail and return
 NULL.</p></div>
 <div class="paragraph"><p>Although SVM is generally not supported for image objects,
 <strong>clCreateImage</strong> may create an image from a buffer (a 1D image from a
 buffer or a 2D image from buffer) if the buffer specified in its image
 description parameter is a SVM buffer. Such images have a linear memory
 representation so their memory can be shared using SVM. However, fine
 grained sharing and atomics are not supported for image reads and writes
 in a kernel.</p></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>void clSVMFree(cl_context context,
                void * svm_pointer)</pre>
 </div></div>
 <div class="paragraph"><p>frees a shared virtual memory buffer allocated using <strong>clSVMAlloc</strong>.</p></div>
 <div class="paragraph"><p><em>context</em> is a valid OpenCL context used to create the SVM buffer.</p></div>
 <div class="paragraph"><p><em>svm_pointer</em> must be the value returned by a call to <strong>clSVMAlloc</strong>. If
 a NULL pointer is passed in <em>svm_pointer</em>, no action occurs.</p></div>
 <div class="paragraph"><p>Note that <strong>clSVMFree</strong> does not wait for previously enqueued commands
 that may be using <em>svm_pointer</em> to finish before freeing
 <em>svm_pointer</em>. It is the responsibility of the application to make
 sure that enqueued commands that use <em>svm_pointer</em> have finished before
 freeing <em>svm_pointer</em>. This can be done by enqueuing a blocking
 operation such as <strong>clFinish</strong>, <strong>clWaitForEvents</strong>, <strong>clEnqueueReadBuffer</strong>
 or by registering a callback with the events associated with enqueued
 commands and when the last enqueued comamnd has finished freeing
 <em>svm_pointer</em>.</p></div>
 <div class="paragraph"><p>The behavior of using <em>svm_pointer</em> after it has been freed is
 undefined. In addition, if a buffer object is created using
 <strong>clCreateBuffer</strong> with <em>svm_pointer</em>, the buffer object must first be
 released before the <em>svm_pointer</em> is freed.</p></div>
 <div class="paragraph"><p>The <strong>clEnqueueSVMFree</strong> API can also be used to enqueue a callback to
 free the shared virtual memory buffer allocated using <strong>clSVMAlloc</strong> or a
 shared system memory pointer.</p></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clEnqueueSVMFree(cl_command_queue command_queue,
                         cl_uint num_svm_pointers,
                         void *svm_pointers[],
                         void (CL_CALLBACK *_pfn_free_func_)
                             (cl_command_queue queue,
                             cl_uint num_svm_pointers,
                             void *svm_pointers[],
                             void *user_data),
                         void *user_data,
                         cl_uint num_events_in_wait_list,
                         const cl_event *event_wait_list,
                         cl_event *event)</pre>
 </div></div>
 <div class="paragraph"><p>enqueues a command to free the shared virtual memory allocated using
 <strong>clSVMAlloc</strong> or a shared system memory pointer.</p></div>
 <div class="paragraph"><p><em>command_queue</em> is a valid host command-queue.</p></div>
 <div class="paragraph"><p><em>svm_pointers</em> and <em>num_svm_pointers</em> specify shared virtual memory
 pointers to be freed. Each pointer in <em>svm_pointers</em> that was allocated
 using <strong>clSVMAlloc</strong> must have been allocated from the same context from
 which <em>command_queue</em> was created. The memory associated with
 <em>svm_pointers</em> can be reused or freed after the function returns.</p></div>
 <div class="paragraph"><p><em>pfn_free_func</em> specifies the callback function to be called to free the
 SVM pointers. <em>pfn_free_func</em> takes four arguments: <em>queue</em> which is
 the command queue in which <strong>clEnqueueSVMFree</strong> was enqueued, the count
 and list of SVM pointers to free and <em>user_data</em> which is a pointer to
 user specified data. If <em>pfn_free_func</em> is NULL, all pointers
 specified in <em>svm_pointers</em> must be allocated using <strong>clSVMAlloc</strong> and the
 OpenCL implementation will free these SVM pointers. <em>pfn_free_func</em>
 must be a valid callback function if any SVM pointer to be freed is a
 shared system memory pointer i.e. not allocated using <strong>clSVMAlloc</strong>. If
 <em>pfn_free_func</em> is a valid callback function, the OpenCL implementation
 will call <em>pfn_free_func</em> to free all the SVM pointers specified in
 <em>svm_pointers</em>.</p></div>
 <div class="paragraph"><p><em>user_data</em> will be passed as the user_data argument when
 <em>pfn_free_func</em> is called. <em>user_data</em> can be NULL.</p></div>
 <div class="paragraph"><p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need
 to complete before <strong>clEnqueueSVMFree</strong> can be executed. If
 <em>event_wait_list</em> is NULL, then <strong>clEnqueueSVMFree</strong> does not wait on any
 event to complete. If <em>event_wait_list</em> is NULL,
 <em>num_events_in_wait_list</em> must be 0. If <em>event_wait_list</em> is not NULL,
 the list of events pointed to by <em>event_wait_list</em> must be valid and
 <em>num_events_in_wait_list</em> must be greater than 0. The events specified
 in <em>event_wait_list</em> act as synchronization points. The context
 associated with events in <em>event_wait_list</em> and <em>command_queue</em> must be
 the same. The memory associated with <em>event_wait_list</em> can be reused or
 freed after the function returns.</p></div>
 <div class="paragraph"><p><em>event</em> returns an event object that identifies this particular command
 and can be used to query or queue a wait for this particular command to
 complete. <em>event</em> can be NULL in which case it will not be possible for
 the application to query the status of this command or queue a wait for
 this command to complete. If the <em>event_wait_list</em> and the <em>event</em>
 arguments are not NULL, the <em>event</em> argument should not refer to an
 element of the <em>event_wait_list</em> array.</p></div>
 <div class="paragraph"><p><strong>clEnqueueSVMFree</strong> returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_COMMAND_QUEUE
 if <em>command_queue</em> is not a valid host command-queue.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>num_svm_pointers</em> is 0 and <em>svm_pointers_is non-NULL, _or</em> if
 <em>svm_pointers</em> is NULL and _num_svm_pointers_is not 0.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_EVENT_WAIT_LIST
 if <em>event_wait_list</em> is NULL and <em>num_events_in_wait_list</em> &gt; 0, or
 <em>event_wait_list</em> is not NULL and <em>num_events_in_wait_list</em> is 0, or if
 event objects in <em>event_wait_list</em> are not valid events.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The following function enqueues a command to do a memcpy operation.</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clEnqueueSVMMemcpy(cl_command_queue command_queue,
                           cl_bool blocking_copy,
                           void *dst_ptr,
                           const void *src_ptr,
                           size_t size,
                           cl_uint num_events_in_wait_list,
                           const cl_event *event_wait_list,
                           cl_event *event)</pre>
 </div></div>
 <div class="paragraph"><p><em>command_queue</em> refers to the host command-queue in which the read /
 write command will be queued. If either <em>dst_ptr</em> or <em>src_ptr</em> is
 allocated using clSVMAlloc then the OpenCL context allocated against
 must match that of <em>command_queue</em>.</p></div>
 <div class="paragraph"><p><em>blocking_copy</em> indicates if the copy operation is <em>blocking</em> or
 <em>non-blocking</em>.</p></div>
 <div class="paragraph"><p>If <em>blocking_copy</em> is CL_TRUE i.e. the copy command is blocking,
 <strong>clEnqueueSVMMemcpy</strong> does not return until the buffer data has been
 copied into memory pointed to by <em>dst_ptr</em>.</p></div>
 <div class="paragraph"><p>If <em>blocking_copy</em> is CL_FALSE i.e. the copy command is non-blocking,
 <strong>clEnqueueSVMMemcpy</strong> queues a non-blocking copy command and returns.
 The contents of the buffer that <em>dst_ptr</em> point to cannot be used until
 the copy command has completed. The <em>event</em> argument returns an event
 object which can be used to query the execution status of the read
 command. When the copy command has completed, the contents of the
 buffer that _dst_ptr_points to__can be used by the application.</p></div>
 <div class="paragraph"><p><em>size</em> is the size in bytes of data being copied.</p></div>
 <div class="paragraph"><p><em>dst_ptr</em> is the pointer to a host or SVM memory allocation where data
 is copied to.</p></div>
 <div class="paragraph"><p><em>src_ptr</em> is the pointer to a host or SVM memory allocation where data
 is copied from.</p></div>
 <div class="paragraph"><p>If the memory allocation(s) containing <em>dst_ptr</em> and/or <em>src_ptr</em> are
 allocated using <strong>clSVMAlloc</strong> and either is not allocated from the same
 context from which <em>command_queue</em> was created the behavior is
 undefined.</p></div>
 <div class="paragraph"><p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need
 to complete before this particular command can be executed. If
 <em>event_wait_list</em> is NULL, then this particular command does not wait on
 any event to complete. If <em>event_wait_list</em> is NULL,
 <em>num_events_in_wait_list</em> must be 0. If <em>event_wait_list</em> is not NULL,
 the list of events pointed to by <em>event_wait_list</em> must be valid and
 <em>num_events_in_wait_list</em> must be greater than 0. The events specified
 in <em>event_wait_list</em> act as synchronization points. The context
 associated with events in <em>event_wait_list</em> and <em>command_queue</em> must be
 the same. The memory associated with <em>event_wait_list</em> can be reused or
 freed after the function returns.</p></div>
 <div class="paragraph"><p><em>event</em> returns an event object that identifies this particular read /
 write command and can be used to query or queue a wait for this
 particular command to complete. <em>event</em> can be NULL in which case it
 will not be possible for the application to query the status of this
 command or queue a wait for this command to complete. If the
 <em>event_wait_list</em> and the <em>event</em> arguments are not NULL, the <em>event</em>
 argument should not refer to an element of the <em>event_wait_list</em> array.</p></div>
 <div class="paragraph"><p><strong>clEnqueueSVMMemcpy</strong> returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_COMMAND_QUEUE
 if <em>command_queue</em> is not a valid host command-queue.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_CONTEXT if the
 context associated with <em>command_queue</em> and events in <em>event_wait_list</em>
 are not the same.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_EVENT_WAIT_LIST
 if <em>event_wait_list</em> is NULL and <em>num_events_in_wait_list</em> &gt; 0, or
 <em>event_wait_list</em> is not NULL and <em>num_events_in_wait_list</em> is 0, or if
 event objects in <em>event_wait_list</em> are not valid events.
 </p>
 </li>
 <li>
 <p>
   CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST if the copy operation is
 blocking and the execution status of any of the events in
 <em>event_wait_list</em> is a negative integer value.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>dst_ptr</em> or <em>src_ptr</em> are NULL.
 </p>
 </li>
 <li>
 <p>
       CL_MEM_COPY_OVERLAP if the
 values specified for <em>dst_ptr</em>, <em>src_ptr</em> and <em>size</em> result in an
 overlapping copy.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clEnqueueSVMMemFill(cl_command_queue command_queue,
                            void *svm_ptr,
                            const void *pattern,
                            size_t pattern_size,
                            size_t size,
                            cl_uint num_events_in_wait_list,
                            const cl_event *event_wait_list,
                            cl_event *event)</pre>
 </div></div>
 <div class="paragraph"><p>enqueues a command to fill a region in memory with a pattern of a given
 pattern size.</p></div>
 <div class="paragraph"><p><em>command_queue</em> refers to the host command-queue in which the fill
 command will be queued. The OpenCL context associated with
 <em>command_queue</em> and SVM pointer referred to by <em>svm_ptr</em> must be the
 same.</p></div>
 <div class="paragraph"><p><em>svm_ptr</em> is a pointer to a memory region that will be filled with
 <em>pattern</em>. It must be aligned to <em>pattern_size</em> bytes. If <em>svm_ptr</em> is
 allocated using <strong>clSVMAlloc</strong> then it must be allocated from the same
 context from which <em>command_queue</em> was created. Otherwise the behavior
 is undefined.</p></div>
 <div class="paragraph"><p><em>pattern</em> is a pointer to the data pattern of size <em>pattern_size</em> in
 bytes. <em>pattern</em> will be used to fill a region in <em>buffer</em> starting at
 <em>svm_ptr</em> and is <em>size</em> bytes in size. The data pattern must be a
 scalar or vector integer or floating-point data type supported by OpenCL
 as described in <em>sections 6.1.1</em> and <em>6.1.2</em>. For example, if region
 pointed to by <em>svm_ptr</em> is to be filled with a pattern of float4 values,
 then <em>pattern</em> will be a pointer to a cl_float4 value and <em>pattern_size</em>
 will be sizeof(cl_float4). The maximum value of <em>pattern_size</em> is the
 size of the largest integer or floating-point vector data type supported
 by the OpenCL device. The memory associated with <em>pattern</em> can be
 reused or freed after the function returns.</p></div>
 <div class="paragraph"><p><em>size</em> is the size in bytes of region being filled starting with
 <em>svm_ptr</em> and must be a multiple of <em>pattern_size</em>.</p></div>
 <div class="paragraph"><p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need
 to complete before this particular command can be executed. If
 <em>event_wait_list</em> is NULL, then this particular command does not wait on
 any event to complete. If <em>event_wait_list</em> is NULL,
 <em>num_events_in_wait_list</em> must be 0. If <em>event_wait_list</em> is not NULL,
 the list of events pointed to by <em>event_wait_list</em> must be valid and
 <em>num_events_in_wait_list</em> must be greater than 0. The events specified
 in <em>event_wait_list</em> act as synchronization points. The context
 associated with events in <em>event_wait_list</em> and <em>command_queue</em> must be
 the same. The memory associated with <em>event_wait_list</em> can be reused or
 freed after the function returns.</p></div>
 <div class="paragraph"><p><em>event</em> returns an event object that identifies this particular command
 and can be used to query or queue a wait for this particular command to
 complete. <em>event</em> can be NULL in which case it will not be possible for
 the application to query the status of this command or queue a wait for
 this command to complete. <strong>clEnqueueBarrierWithWaitList</strong> can be used
 instead. If the <em>event_wait_list</em> and the <em>event</em> arguments are not
 NULL, the <em>event</em> argument should not refer to an element of the
 <em>event_wait_list</em> array.</p></div>
 <div class="paragraph"><p><strong>clEnqueueSVMMemFill</strong> returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_COMMAND_QUEUE
 if <em>command_queue</em> is not a valid host command-queue.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_CONTEXT if the
 context associated with <em>command_queue</em> and events in <em>event_wait_list</em>
 are not the same.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>svm_ptr</em> is NULL.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>svm_ptr</em> is not aligned to <em>pattern_size</em> bytes.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>pattern</em> is NULL or if <em>pattern_size</em> is 0 or if <em>pattern_size</em> is not
 one of {1, 2, 4, 8, 16, 32, 64, 128}.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if <em>size</em>
 is not a multiple of <em>pattern_size</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_EVENT_WAIT_LIST
 if <em>event_wait_list</em> is NULL and <em>num_events_in_wait_list</em> &gt; 0, or
 <em>event_wait_list</em> is not NULL and <em>num_events_in_wait_list</em> is 0, or if
 event objects in <em>event_wait_list</em> are not valid events.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clEnqueueSVMMap(cl_command_queue command_queue,
                        cl_bool blocking_map,
                        cl_map_flags map_flags,
                        void *svm_ptr,
                        size_t size,
                        cl_uint num_events_in_wait_list,
                        const cl_event *event_wait_list,
                        cl_event *event)</pre>
 </div></div>
 <div class="paragraph"><p>enqueues a command that will allow the host to update a region of a SVM
 buffer. Note that since we are enqueuing a command with a SVM buffer,
 the region is already mapped in the host address space.</p></div>
 <div class="paragraph"><p><em>command_queue</em> must be a valid host command-queue.</p></div>
 <div class="paragraph"><p><em>blocking_map</em> indicates if the map operation is <em>blocking</em> or
 <em>non-blocking</em>.</p></div>
 <div class="paragraph"><p>If <em>blocking_map</em> is CL_TRUE, <strong>clEnqueueSVMMap</strong> does not return until
 the application can access the contents of the SVM region specified by
 <em>svm_ptr</em> and <em>size</em> on the host.</p></div>
 <div class="paragraph"><p>If <em>blocking_map</em> is CL_FALSE i.e. map operation is non-blocking, the
 region specified by <em>svm_ptr</em> and <em>size</em> cannot be used until the map
 command has completed. The <em>event</em> argument returns an event object
 which can be used to query the execution status of the map command.
 When the map command is completed, the application can access the
 contents of the region specified by <em>svm_ptr</em> and <em>size</em>.</p></div>
 <div class="paragraph"><p><em>map_flags</em> is a bit-field and is described in <em>table 5.5</em>.</p></div>
 <div class="paragraph"><p><em>svm_ptr</em> and <em>size</em> are a pointer to a memory region and size in bytes
 that will be updated by the host. If <em>svm_ptr</em> is allocated using
 <strong>clSVMAlloc</strong> then it must be allocated from the same context from which
 <em>command_queue</em> was created. Otherwise the behavior is undefined.</p></div>
 <div class="paragraph"><p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need
 to complete before this particular command can be executed. If
 <em>event_wait_list</em> is NULL, then this particular command does not wait on
 any event to complete. If <em>event_wait_list</em> is NULL,
 <em>num_events_in_wait_list</em> must be 0. If <em>event_wait_list</em> is not NULL,
 the list of events pointed to by <em>event_wait_list</em> must be valid and
 <em>num_events_in_wait_list</em> must be greater than 0. The events specified
 in <em>event_wait_list</em> act as synchronization points. The context
 associated with events in <em>event_wait_list</em> and <em>command_queue</em> must be
 the same. The memory associated with <em>event_wait_list</em> can be reused or
 freed after the function returns.</p></div>
 <div class="paragraph"><p><em>event</em> returns an event object that identifies this particular command
 and can be used to query or queue a wait for this particular command to
 complete. <em>event</em> can be NULL in which case it will not be possible for
 the application to query the status of this command or queue a wait for
 this command to complete. <strong>clEnqueueBarrierWithWaitList</strong> can be used
 instead. If the <em>event_wait_list</em> and the <em>event</em> arguments are not
 NULL, the <em>event</em> argument should not refer to an element of the
 <em>event_wait_list</em> array.</p></div>
 <div class="paragraph"><p><strong>clEnqueueSVMMap</strong> returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_COMMAND_QUEUE
 if _command_queue_is not a valid host command-queue.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_CONTEXT if
 context associated with <em>command_queue</em> and events in <em>event_wait_list</em>
 are not the same.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>svm_ptr</em> is NULL.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if <em>size</em>
 is 0 or if values specified in _map_flags_are not valid.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_EVENT_WAIT_LIST
 if <em>event_wait_list</em> is NULL and <em>num_events_in_wait_list</em> &gt; 0, or
 <em>event_wait_list</em> is not NULL and <em>num_events_in_wait_list</em> is 0, or if
 event objects in <em>event_wait_list</em> are not valid events.
 </p>
 </li>
 <li>
 <p>
   CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST if the map operation is
 blocking and the execution status of any of the events in
 <em>event_wait_list</em> is a negative integer value.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clEnqueueSVMUnmap(cl_command_queue command_queue,
                          void *svm_ptr,
                          cl_uint num_events_in_wait_list,
                          const cl_event *event_wait_list,
                          cl_event *event)</pre>
 </div></div>
 <div class="paragraph"><p>enqueues a command to indicate that the host has completed updating the
 region given by <em>svm_ptr</em> and which was specified in a previous call to
 <strong>clEnqueueSVMMap</strong>.</p></div>
 <div class="paragraph"><p><em>command_queue</em> must be a valid host command-queue.</p></div>
 <div class="paragraph"><p><em>svm_ptr</em> is a pointer that was specified in a previous call to
 <strong>clEnqueueSVMMap</strong>. If <em>svm_ptr</em> is allocated using <strong>clSVMAlloc</strong> then it
 must be allocated from the same context from which <em>command_queue</em> was
 created. Otherwise the behavior is undefined.</p></div>
 <div class="paragraph"><p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need
 to complete before <strong>clEnqueueSVMUnmap</strong> can be executed. If
 <em>event_wait_list</em> is NULL, then <strong>clEnqueueUnmap</strong> does not wait on any
 event to complete. If <em>event_wait_list</em> is NULL,
 <em>num_events_in_wait_list</em> must be 0. If <em>event_wait_list</em> is not NULL,
 the list of events pointed to by <em>event_wait_list</em> must be valid and
 <em>num_events_in_wait_list</em> must be greater than 0. The events specified
 in <em>event_wait_list</em> act as synchronization points. The context
 associated with events in <em>event_wait_list</em> and <em>command_queue</em> must be
 the same. The memory associated with <em>event_wait_list</em> can be reused or
 freed after the function returns.</p></div>
 <div class="paragraph"><p><em>event</em> returns an event object that identifies this particular command
 and can be used to query or queue a wait for this particular command to
 complete. <em>event</em> can be NULL in which case it will not be possible for
 the application to query the status of this command or queue a wait for
 this command to complete. <strong>clEnqueueBarrierWithWaitList</strong> can be used
 instead. If the <em>event_wait_list</em> and the <em>event</em> arguments are not
 NULL, the <em>event</em> argument should not refer to an element of the
 <em>event_wait_list</em> array.</p></div>
 <div class="paragraph"><p><strong>clEnqueueSVMUnmap</strong> returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_COMMAND_QUEUE
 if <em>command_queue</em> is not a valid host command-queue.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_CONTEXT if
 context associated with <em>command_queue</em> and events in <em>event_wait_list</em>
 are not the same.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>svm_ptr</em> is NULL.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_EVENT_WAIT_LIST
 if <em>event_wait_list</em> is NULL and <em>num_events_in_wait_list</em> &gt; 0, or if
 <em>event_wait_list</em> is not NULL and <em>num_events_in_wait_list</em> is 0, or if
 event objects in <em>event_wait_list</em> are not valid events.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p><strong>clEnqueueSVMMap</strong>  and <strong>clEnqueueSVMUnmap</strong> act as synchronization points
 for the region of the SVM buffer specified in these calls.</p></div>
 <div class="paragraph"><p>NOTE:</p></div>
 <div class="paragraph"><p>If a coarse-grained SVM buffer is currently mapped for writing, the
 application must ensure that the SVM buffer is unmapped before any
 enqueued kernels or commands that read from or write to this SVM buffer
 or any of its associated cl_mem buffer objects begin execution;
 otherwise the behavior is undefined.</p></div>
 <div class="paragraph"><p>If a coarse-grained SVM buffer is currently mapped for reading, the
 application must ensure that the SVM buffer is unmapped before any
 enqueued kernels or commands that write to this memory object or any of
 its associated cl_mem buffer objects begin execution; otherwise the
 behavior is undefined.</p></div>
 <div class="paragraph"><p>A SVM buffer is considered as mapped if there are one or more active
 mappings for the SVM buffer irrespective of whether the mapped regions
 span the entire SVM buffer.</p></div>
 <div class="paragraph"><p>The above note does not apply to fine-grained SVM buffers (fine-grained
 buffers allocated using <strong>clSVMAlloc</strong> or fine-grained system
 allocations).</p></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clEnqueueSVMMigrateMem(cl_command_queue command_queue,
                               cl_uint num_svm_pointers,
                               const void **svm_pointers,
                               const size_t *sizes,
                               cl_mem_migration_flags flags,
                               cl_uint num_events_in_wait_list,
                               const cl_event *event_wait_list,
                               cl_event *event)</pre>
 </div></div>
 <div class="paragraph"><p>enqueues a command to indicate which device a set of ranges of SVM
 allocations should be associated with. Once the event returned by
 <strong>clEnqueueSVMMigrateMem</strong> has become CL_COMPLETE, the ranges specified by
 svm pointers and sizes have been successfully migrated to the device
 associated with command queue.</p></div>
 <div class="paragraph"><p>The user is responsible for managing the event dependencies associated
 with this command in order to avoid overlapping access to SVM
 allocations. Improperly specified event dependencies passed to
 <strong>clEnqueueSVMMigrateMem</strong> could result in undefined results.</p></div>
 <div class="paragraph"><p><em>command_queue</em> is a valid host command queue. The specified set of
 allocation ranges will be migrated to the OpenCL device associated with
 <em>command_queue</em>.</p></div>
 <div class="paragraph"><p><em>num_svm_pointers</em> is the number of pointers in the specified
 <em>svm_pointers</em> array, and the number of sizes in the <em>sizes</em> array, if
 _sizes_is not NULL.</p></div>
 <div class="paragraph"><p><em>svm_pointers</em> is a pointer to an array of pointers. Each pointer in
 this array must be within an allocation produced by a call to
 <strong>clSVMAlloc</strong>.</p></div>
 <div class="paragraph"><p><em>sizes</em> is an array of sizes. The pair <em>svm_pointers</em>[i] and <em>sizes</em>[i]
 together define the starting address and number of bytes in a range to
 be migrated. <em>sizes</em> may be NULL indicating that every allocation
 containing any <em>svm_pointer</em>[i] is to be migrated. Also, if <em>sizes</em>[i]
 is zero, then the entire allocation containing <em>svm_pointer</em>[i] is
 migrated.</p></div>
 <div class="paragraph"><p><em>flags</em> is a bit-field that is used to specify migration options. <em>Table
 5.12</em> describes the possible values for <em>flags</em>.</p></div>
 <div class="paragraph"><p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need
 to complete before this particular command can be executed. If
 <em>event_wait_list</em> is NULL, then this particular command does not wait on
 any event to complete. If <em>event_wait_list</em> is NULL,
 <em>num_events_in_wait_list</em> must be 0. If <em>event_wait_list</em> is not NULL,
 the list of events pointed to by <em>event_wait_list</em> must be valid and
 <em>num_events_in_wait_list</em> must be greater than 0. The events specified
 in <em>event_wait_list</em> act as synchronization points. The context
 associated with events in <em>event_wait_list</em> and <em>command_queue</em> must be
 the same. The memory associated with <em>event_wait_list</em> can be reused or
 freed after the function returns.</p></div>
 <div class="paragraph"><p><em>event</em> returns an event object that identifies this particular command
 and can be used to query or queue a wait for this particular command to
 complete. <em>event</em> can be NULL in which case it will not be possible for
 the application to query the status of this command or queue another
 command that waits for this command to complete. If the
 <em>event_wait_list</em> and <em>event</em> arguments are not NULL, the <em>event</em>
 argument should not refer to an element of the <em>event_wait_list</em> array.</p></div>
 <div class="paragraph"><p><strong>clEnqueueSVMMigrateMem</strong> returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_COMMAND_QUEUE
 if <em>command_queue</em> is not a valid host command-queue.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_CONTEXT if
 context associated with <em>command_queue</em> and events in <em>event_wait_list</em>
 are not the same.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 _num_svm_pointers is zero_or_svm_pointers_is NULL.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>sizes</em><em><span class="i">is non-zero</span></em>range [<em>svm_pointers</em>[i],
 <em>svm_pointers</em>[i]+<em>sizes</em>[i]) is not contained within an existing
 <strong>clSVMAlloc</strong> allocation.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_EVENT_WAIT_LIST
 if <em>event_wait_list</em> is NULL and <em>num_events_in_wait_list</em> &gt; 0, or if
 <em>event_wait_list</em> is not NULL and <em>num_events_in_wait_list</em> is 0, or if
 event objects in <em>event_wait_list</em> are not valid events.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 </div>
 <div class="sect3">
 <h4 id="_memory_consistency_for_svm_allocations">5.6.2. Memory consistency for SVM allocations</h4>
 <div class="paragraph"><p>To ensure memory consistency in SVM allocations, the program can rely on
 the guaranteed memory consistency at synchronization points. This
 consistency support already exists in OpenCL 1.x and can be used for
 coarse-grained SVM allocations or for fine-grained buffer SVM
 allocations; what SVM adds is the ability to share pointers between the
 host and all SVM devices.</p></div>
 <div class="paragraph"><p>In addition, sub-buffers can also be used to ensure that each device
 gets a consistent view of a SVM buffers memory when it is shared by
 multiple devices. For example, assume that two devices share a SVM
 pointer. The host can create a cl_mem buffer object using
 <strong>clCreateBuffer</strong> with CL_MEM_USE_HOST_PTR and <em>host_ptr</em> set to the SVM
 pointer and then create two disjoint sub-buffers with starting virtual
 addresses <em>sb1_ptr</em> and <em>sb2_ptr</em>. These pointers (<em>sb1_ptr</em> and
 <em>sb2_ptr</em>) can be passed to kernels executing on the two devices.
 <strong>clEnqueueMapBuffer</strong> and <strong>clEnqueueUnmapMemObject</strong> and the existing
 access rules for memory objects (in <em>section 5.5.3</em>) can be used to
 ensure consistency for buffer regions (<em>sb1_ptr</em> and <em>sb2_ptr</em>) read and
 written by these kernels.</p></div>
 <div class="paragraph"><p>When the host and devices are able to use SVM atomic operations (i.e.
 CL_DEVICE_SVM_ATOMICS is set in CL_DEVICE_SVM_CAPABILITIES), these
 atomic operations can be used to provide memory consistency at a fine
 grain in a shared memory region. The effect of these operations is
 visible to the host and all devices with which that memory is shared.</p></div>
 </div>
 </div>
 <div class="sect2">
 <h3 id="_sampler_objects">5.7. Sampler Objects</h3>
 <div class="paragraph"><p>A sampler object describes how to sample an image when the image is read
 in the kernel. The built-in functions to read from an image in a kernel
 take a sampler as an argument. The sampler arguments to the image read
 function can be sampler objects created using OpenCL functions and
 passed as argument values to the kernel or can be samplers declared
 inside a kernel. In this section we discuss how sampler objects are
 created using OpenCL functions.</p></div>
 <div class="sect3">
 <h4 id="_creating_sampler_objects">5.7.1. Creating Sampler Objects</h4>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_sampler clCreateSamplerWithProperties(cl_context context,
                                          const cl_sampler_properties *sampler_properties,
                                          cl_int *errcode_ret)</pre>
 </div></div>
 <div class="paragraph"><p>creates a sampler object.</p></div>
 <div class="paragraph"><p><em>context</em> must be a valid OpenCL context.</p></div>
 <div class="paragraph"><p><em>sampler_properties</em> specifies a list of sampler property names and
 their corresponding values. Each sampler property name is immediately
 followed by the corresponding desired value. The list is terminated
 with 0. The list of supported properties is described in <em>table 5.15</em>.
 If a supported property and its value is not specified in
 <em>sampler_properties</em>, its default value will be used.
 <em>sampler_properties</em> can be NULL in which case the default values for
 supported sampler properties will be used.</p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <caption class="title">Table 21. <em>List of supported cl_sampler_properties values and description</em></caption>
 <col style="width:34%;">
 <col style="width:33%;">
 <col style="width:33%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>cl_sampler_properties <br>
  enum</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Property Value</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Description</strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_SAMPLER_NORMALIZED_ COORDS</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_bool</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">A boolean value that specifies
 whether the image coordinates
 specified are normalized or not.
 <br>
 <br>
 The default value (i.e. the value used
 if this property is not specified in
 sampler_properties) is CL_TRUE.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_SAMPLER_ADDRESSING_ MODE</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_addressing_ + mode</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Specifies how out-of-range image
 coordinates are handled when reading
 from an image.
 <br>
 <br>
 Valid values are:
 <br>
 <br>
 CL_ADDRESS_MIRRORED_REPEAT
 CL_ADDRESS_REPEAT
 CL_ADDRESS_CLAMP_
 CL_ADDRESS_CLAMP
 CL_ADDRESS_NONE
 <br>
 <br>
 The default is
 CL_ADDRESS_CLAMP.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_SAMPLER_FILTER_MODE</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_filter_mode</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Specifies the type of filter that must
 be applied when reading an image.
 Valid values are:
 <br>
 <br>
 CL_FILTER_NEAREST
 CL_FILTER_LINEAR
 <br>
 <br>
 The default value is
 CL_FILTER_NEAREST.</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p><em>errcode_ret</em> will return an appropriate error code. If <em>errcode_ret</em>
 is NULL, no error code is returned.</p></div>
 <div class="paragraph"><p><strong>clCreateSamplerWithProperties</strong> returns a valid non-zero sampler object
 and <em>errcode_ret</em> is set to CL_SUCCESS if the sampler object is created
 successfully. Otherwise, it returns a NULL value with one of the
 following error values returned in <em>errcode_ret</em>:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_CONTEXT if
 _context_is not a valid context.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if the
 property name in <em>sampler_properties</em> is not a supported property name,
 if the value specified for a supported property name is not valid, or if
 the same property name is specified more than once.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_OPERATION if
 images are not supported by any device associated with <em>context</em> (i.e.
 CL_DEVICE_IMAGE_SUPPORT specified in <em>table 4.3</em> is CL_FALSE).
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clRetainSampler(cl_sampler sampler)</pre>
 </div></div>
 <div class="paragraph"><p>increments the <em>sampler</em> reference count.
 <strong>clCreateSamplerWithProperties</strong> performs an implicit retain.
 <strong>clRetainSampler</strong> returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_SAMPLER if
 <em>sampler</em> is not a valid sampler object.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clReleaseSampler(cl_sampler sampler)</pre>
 </div></div>
 <div class="paragraph"><p>decrements the <em>sampler</em> reference count. The sampler object is deleted
 after the reference count becomes zero and commands queued for execution
 on a command-queue(s) that use <em>sampler</em> have finished.
 <strong>clReleaseSampler</strong> returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_SAMPLER if
 <em>sampler</em> is not a valid sampler object.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>Using this function to release a reference that was not obtained by
 creating the object or by calling <strong>clRetainSampler</strong> causes undefined
 behavior.</p></div>
 </div>
 <div class="sect3">
 <h4 id="_sampler_object_queries">5.7.2. Sampler Object Queries</h4>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clGetSamplerInfo(cl_sampler sampler,
                         cl_sampler_info param_name,
                         size_t param_value_size,
                         void *param_value,
                         size_t *param_value_size_ret)</pre>
 </div></div>
 <div class="paragraph"><p>returns information about the sampler object.</p></div>
 <div class="paragraph"><p><em>sampler</em> specifies the sampler being queried.</p></div>
 <div class="paragraph"><p><em>param_name</em> specifies the information to query. The list of supported
 <em>param_name</em> types and the information returned in <em>param_value</em> by
 <strong>clGetSamplerInfo</strong> is described in <em>table 5.16</em>.</p></div>
 <div class="paragraph"><p><em>param_value</em> is a pointer to memory where the appropriate result being
 queried is returned. If <em>param_value</em> is NULL, it is ignored.</p></div>
 <div class="paragraph"><p><em>param_value_size</em> is used to specify the size in bytes of memory
 pointed to by <em>param_value</em>. This size must be &gt;= size of return type
 as described in <em>table 5.16.</em></p></div>
 <div class="paragraph"><p><em>param_value_size_ret</em> returns the actual size in bytes of data being
 queried by <em>param_name</em>. If <em>param_value_size_ret</em> is NULL, it is
 ignored.
  </p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <caption class="title">Table 22. <em>clGetSamplerInfo</em> <em>parameter queries</em></caption>
 <col style="width:34%;">
 <col style="width:33%;">
 <col style="width:33%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>cl_sampler_info</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Return Type</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Info. returned in <em>param_value</em></strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">*CL_SAMPLER_REFERENCE_ COUNT*<span class="footnote"><br>[The reference count returned should be considered immediately stale. It is unsuitable for general use in
 applications. This feature is provided for identifying memory leaks.]<br></span>:</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return the <em>sampler</em> reference
 count.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_SAMPLER_CONTEXT</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_context</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return the context specified when the
 sampler is created.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_SAMPLER_NORMALIZED_ COORDS</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_bool</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return the normalized coords value
 associated with <em>sampler</em>.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_SAMPLER_ADDRESSING_ MODE</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_addressing_mode</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return the
 addressing mode value associated with <em>sampler</em>.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_SAMPLER_FILTER_MODE</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_filter_mode</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return the filter mode value
 associated with <em>sampler</em>.</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p><strong>clGetSamplerInfo</strong> returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>param_name</em> is not valid, or if size in bytes specified by
 <em>param_value_size</em> is &lt; size of return type as described in <em>table
 5.16_and _param_value</em> is not NULL.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_SAMPLER if
 <em>sampler</em> is a not a valid sampler object.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 </div>
 </div>
 <div class="sect2">
 <h3 id="_program_objects">5.8. Program Objects</h3>
 <div class="paragraph"><p>An OpenCL program consists of a set of kernels that are identified as
 functions declared with the <em>kernel qualifier in the program source.
 OpenCL programs may also contain auxiliary functions and constant data
 that can be used by </em>kernel functions. The program executable can be
 generated <em>online</em> or <em>offline</em> by the OpenCL compiler for the
 appropriate target device(s).</p></div>
 <div class="paragraph"><p>A program object encapsulates the following information:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       An associated context.
 </p>
 </li>
 <li>
 <p>
       A program source or binary.
 </p>
 </li>
 <li>
 <p>
       The latest successfully
 built program executable, library or compiled binary, the list of
 devices for which the program executable, library or compiled binary is
 built, the build options used and a build log.
 </p>
 </li>
 <li>
 <p>
       The number of kernel objects currently attached.
 </p>
 </li>
 </ul></div>
 <div class="sect3">
 <h4 id="_creating_program_objects">5.8.1. Creating Program Objects</h4>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_program clCreateProgramWithSource(cl_context context,
                                      cl_uint count,
                                      const char **strings,
                                      const size_t *lengths,
                                      cl_int *errcode_ret)</pre>
 </div></div>
 <div class="paragraph"><p>creates a program object for a context, and loads the source code
 specified by the text strings in the <em>strings</em> array into the program
 object. The devices associated with the program object are the devices
 associated with <em>context</em>. The source code specified by <em>strings</em> is
 either an OpenCL C program source, header or implementation-defined
 source for custom devices that support an online compiler. OpenCL C++ is
 not supported as an online-compiled kernel language through this
 interface.</p></div>
 <div class="paragraph"><p><em>context</em> must be a valid OpenCL context.</p></div>
 <div class="paragraph"><p><em>strings</em> is an array of <em>count</em> pointers to optionally null-terminated
 character strings that make up the source code.</p></div>
 <div class="paragraph"><p>The <em>lengths</em> argument is an array with the number of chars in each
 string (the string length). If an element in <em>lengths</em> is zero, its
 accompanying string is null-terminated. If <em>lengths</em> is NULL, all
 strings in the <em>strings</em> argument are considered null-terminated. Any
 length value passed in that is greater than zero excludes the null
 terminator in its count.</p></div>
 <div class="paragraph"><p><em>errcode_ret</em> will return an appropriate error code. If <em>errcode_ret</em>
 is NULL, no error code is returned.</p></div>
 <div class="paragraph"><p><strong>clCreateProgramWithSource</strong> returns a valid non-zero program object and
 <em>errcode_ret</em> is set to CL_SUCCESS if the program object is created
 successfully. Otherwise, it returns a NULL value with one of the
 following error values returned in <em>errcode_ret</em>:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_CONTEXT if
 _context_is not a valid context.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>count</em> is zero or if <em>strings</em> or any entry in <em>strings</em> is NULL.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_program clCreateProgramWithIL(cl_context context,
                                  const void *il,
                                  size_t length,
                                  cl_int *errcode_ret)</pre>
 </div></div>
 <div class="paragraph"><p>creates a program object for a context, and loads the IL pointed to by
 _il_and with length in bytes _length_into the program object.</p></div>
 <div class="paragraph"><p><em>context</em> must be a valid OpenCL context.</p></div>
 <div class="paragraph"><p><em>il_is a pointer to a _length</em>-byte block of memory containing SPIR-V or
 an implementation-defined intermediate language.</p></div>
 <div class="paragraph"><p><em>errcode_ret</em> will return an appropriate error code. If <em>errcode_ret</em>
 is NULL, no error code is returned.</p></div>
 <div class="paragraph"><p><strong>clCreateProgramWithIL</strong> returns a valid non-zero program object and
 <em>errcode_ret</em> is set to CL_SUCCESS if the program object is created
 successfully. Otherwise, it returns a NULL value with one of the
 following error values returned in <em>errcode_ret</em>:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_CONTEXT if
 _context_is not a valid context.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if <em>il</em>
 is NULL or if _length_is zero.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if the
 <em>length</em>-byte memory pointed to by <em>il</em> does not contain well-formed
 intermediate language input that can be consumed by the OpenCL runtime.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_program clCreateProgramWithBinary(cl_context context,
                                      cl_uint num_devices,
                                      const cl_device_id *device_list,
                                      const size_t *lengths,
                                      const unsigned char **binaries,
                                      cl_int *binary_status,
                                      cl_int *errcode_ret)</pre>
 </div></div>
 <div class="paragraph"><p>creates a program object for a context, and loads the binary bits
 specified by <em>binary</em> into the program object.</p></div>
 <div class="paragraph"><p><em>context</em> must be a valid OpenCL context.</p></div>
 <div class="paragraph"><p><em>device_list</em> is a pointer to a list of devices that are in <em>context</em>.
 <em>device_list</em> must be a non-NULL value. The binaries are loaded for
 devices specified in this list.</p></div>
 <div class="paragraph"><p><em>num_devices_is the number of devices listed in _device_list</em>.</p></div>
 <div class="paragraph"><p>The devices associated with the program object will be the list of
 devices specified by <em>device_list</em>. The list of devices specified by
 <em>device_list</em> must be devices associated with <em>context</em>.</p></div>
 <div class="paragraph"><p><em>lengths</em> is an array of the size in bytes of the program binaries to be
 loaded for devices specified by <em>device_list</em>.</p></div>
 <div class="paragraph"><p><em>binaries</em> is an array of pointers to program binaries to be loaded for
 devices specified by <em>device_list</em>. For each device given by
 <em>device_list</em>[i], the pointer to the program binary for that device is
 given by <em>binaries</em>[i] and the length of this corresponding binary is
 given by <em>lengths</em>[i]. <em>lengths</em>[i] cannot be zero and <em>binaries</em>[i]
 cannot be a NULL pointer.</p></div>
 <div class="paragraph"><p>The program binaries specified by <em>binaries</em> contain the bits that
 describe one of the following:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       a program executable to be
 run on the device(s) associated with <em>context</em>,
 </p>
 </li>
 <li>
 <p>
       a compiled program for
 device(s) associated with <em>context</em>, or
 </p>
 </li>
 <li>
 <p>
       a library of compiled
 programs for device(s) associated with <em>context</em>.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The program binary can consist of either or both:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       Device-specific code and/or,
 </p>
 </li>
 <li>
 <p>
       Implementation-specific
 intermediate representation (IR) which will be converted to the
 device-specific code.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p><em>binary_status</em> returns whether the program binary for each device
 specified in <em>device_list</em> was loaded successfully or not. It is an
 array of <em>num_devices</em> entries and returns CL_SUCCESS in
 <em>binary_status[i]</em> if binary was successfully loaded for device
 specified by <em>device_list[i]</em>; otherwise returns CL_INVALID_VALUE if
 <em>lengths[i]</em> is zero or if <em>binaries[i]</em> is a NULL value or
 CL_INVALID_BINARY in <em>binary_status[i]</em> if program binary is not a valid
 binary for the specified device. If <em>binary_status</em> is NULL, it is
 ignored.</p></div>
 <div class="paragraph"><p><em>errcode_ret</em> will return an appropriate error code. If <em>errcode_ret</em>
 is NULL, no error code is returned.</p></div>
 <div class="paragraph"><p><strong>clCreateProgramWithBinary</strong> returns a valid non-zero program object and
 <em>errcode_ret</em> is set to CL_SUCCESS if the program object is created
 successfully. Otherwise, it returns a NULL value with one of the
 following error values returned in <em>errcode_ret</em>:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_CONTEXT if
 _context_is not a valid context.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>device_list</em> is NULL or <em>num_devices</em> is zero.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_DEVICE if
 OpenCL devices listed in <em>device_list</em> are not in the list of devices
 associated with <em>context</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>lengths</em> or <em>binaries</em> are NULL or if any entry in <em>lengths</em>[i] is zero
 or <em>binaries</em>[i] is NULL.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_BINARY if an
 invalid program binary was encountered for any device. <em>binary_status</em>
 will return specific status for each device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>OpenCL allows applications to create a program object using the program
 source or binary and build appropriate program executables. This can be
 very useful as it allows applications to load program source and then
 compile and link to generate a program executable online on its first
 instance for appropriate OpenCL devices in the system. These
 executables can now be queried and cached by the application.
 The cached executables can be read
 and loaded by the application, which can help significantly reduce the
 application initialization time.</p></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_program clCreateProgramWithBuiltInKernels(cl_context context,
                                              cl_uint num_devices,
                                              const cl_device_id *device_list,
                                              const char *kernel_names,
                                              cl_int *errcode_ret)</pre>
 </div></div>
 <div class="paragraph"><p>creates a program object for a context, and loads the information
 related to the built-in kernels into a program object.</p></div>
 <div class="paragraph"><p><em>context</em> must be a valid OpenCL context.</p></div>
 <div class="paragraph"><p><em>num_devices_is the number of devices listed in _device_list</em>.</p></div>
 <div class="paragraph"><p><em>device_list</em> is a pointer to a list of devices that are in <em>context</em>.
 <em>device_list</em> must be a non-NULL value. The built-in kernels are
 loaded for devices specified in this list.</p></div>
 <div class="paragraph"><p>The devices associated with the program object will be the list of
 devices specified by <em>device_list</em>. The list of devices specified by
 <em>device_list</em> must be devices associated with <em>context</em>.</p></div>
 <div class="paragraph"><p><em>kernel_names</em> is a semi-colon separated list of built-in kernel names.</p></div>
 <div class="paragraph"><p><strong>clCreateProgramWithBuiltInKernels</strong> returns a valid non-zero program
 object and <em>errcode_ret</em> is set to CL_SUCCESS if the program object is
 created successfully. Otherwise, it returns a NULL value with one of
 the following error values returned in <em>errcode_ret</em>:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_CONTEXT if
 _context_is not a valid context.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>device_list</em> is NULL or <em>num_devices</em> is zero.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>kernel_names</em> is NULL or <em>kernel_names</em> contains a kernel name that is
 not supported by any of the devices in <em>device_list</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_DEVICE if
 devices listed in <em>device_list</em> are not in the list of devices
 associated with <em>context</em>.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 </div>
 <div class="sect3">
 <h4 id="_retaining_and_releasing_program_objects">5.8.2. Retaining and Releasing Program Objects</h4>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clRetainProgram(cl_program program)</pre>
 </div></div>
 <div class="paragraph"><p>increments the <em>program</em> reference count. All <strong>clCreateProgram</strong> APIs
 do an implicit retain. <strong>clRetainProgram</strong> returns CL_SUCCESS if the
 function is executed successfully. Otherwise, it returns one of the
 following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_PROGRAM if
 <em>program</em> is not a valid program object.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clReleaseProgram(cl_program program)</pre>
 </div></div>
 <div class="paragraph"><p>decrements the <em>program</em> reference count. The program object is deleted
 after all kernel objects associated with <em>program</em> have been deleted and
 the <em>program</em> reference count becomes zero. <strong>clReleaseProgram</strong> returns
 CL_SUCCESS if the function is executed successfully. Otherwise, it
 returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_PROGRAM if
 <em>program</em> is not a valid program object.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>Using this function to release a reference that was not obtained by
 creating the object or by calling <strong>clRetainProgram</strong> causes undefined
 behavior.</p></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clSetProgramReleaseCallback(cl_program program,
                                    void (CL_CALLBACK *pfn_notify)
                                        (cl_program prog,
                                        void *user_data),
                                    void *user_data)</pre>
 </div></div>
 <div class="paragraph"><p>registers a user callback function with a program object. Each call to
 <strong>clSetProgramReleaseCallback</strong> registers the specified user callback
 function on a callback stack associated with program. The registered
 user callback functions are called in the reverse order in which they
 were registered. The user callback functions are called after
 destructors (if any) for program scope global variables (if any) are
 called and before the program is released. This provides a mechanism for
 the application (and libraries) to be notified when destructors are
 complete.</p></div>
 <div class="paragraph"><p><em>program</em> is a valid program object</p></div>
 <div class="paragraph"><p><em>pfn_notify</em> is the callback function that can be registered by the
 application. This callback function may be called asynchronously by the
 OpenCL implementation. It is the applications responsibility to ensure
 that the callback function is thread safe. The parameters to this
 callback function are:</p></div>
 <div class="paragraph"><p><em>prog</em> is the program object whose destructors are being called. When
 the user callback is called by the implementation, this program object
 is not longer valid. prog is only provided for reference purposes.</p></div>
 <div class="paragraph"><p><em>user_data</em> is a pointer to user supplied data. <em>user_data</em> will be
 passed as the <em>user_data</em> argument when <em>pfn_notify</em> is called. user
 data can be NULL.</p></div>
 <div class="paragraph"><p><strong>clSetProgramReleaseCallback</strong> returns CL_SUCCESS if the function is
 executed successfully. Otherwise, it returns one of the following
 errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_PROGRAM if
 <em>program</em> is not a valid program object.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>pfn_notify</em> is NULL.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 </div>
 <div class="sect3">
 <h4 id="_setting_spir_v_specialization_constants">5.8.3. Setting SPIR-V specialization constants</h4>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clSetProgramSpecializationConstant(cl_program program,
                                           cl_uint spec_id,
                                           size_t spec_size,
                                           const void *spec_value)</pre>
 </div></div>
 <div class="paragraph"><p>sets the values of a SPIR-V specialization constants.</p></div>
 <div class="paragraph"><p><em>program</em> must be a valid OpenCL program created from a SPIR-V module.</p></div>
 <div class="paragraph"><p><em>spec</em> id_ identifies the SPIR-V specialization constant whose value
 will be set.</p></div>
 <div class="paragraph"><p><em>spec_size</em> specifies the size in bytes of the data pointed to by
 <em>spec_value</em>. This should be 1 for boolean constants. For all other
 constant types this should match the size of the specialization constant
 in the SPIR-V module.</p></div>
 <div class="paragraph"><p><em>spec_value</em> is a pointer to the memory location that contains the value
 of the specialization constant. The data pointed to by <em>spec_value</em> are
 copied and can be safely reused by the application after
 <strong>clSetProgramSpecializationConstant</strong> returns. This specialization value
 will be used by subsequent calls to <strong>clBuildProgram</strong> until another call
 to <strong>clSetProgramSpecializationConstant</strong> changes it. If a specialization
 constant is a boolean constant, _spec value_should be a pointer to a
 cl_uchar value. A value of zero will set the specialization constant to
 false; any other value will set it to true.</p></div>
 <div class="paragraph"><p>Calling this function multiple times for the same specialization
 constant shall cause the last provided value to override any previously
 specified value. The values are used by a subsequent <strong>clBuildProgram</strong>
 call for the <em>program</em>.</p></div>
 <div class="paragraph"><p>Application is not required to provide values for every specialization
 constant contained in SPIR-V module. SPIR-V provides default values for
 all specialization constants.</p></div>
 <div class="paragraph"><p><strong>clSetProgramSpecializationConstant</strong> returns CL_SUCCESS if the function
 is executed successfully.</p></div>
 <div class="paragraph"><p>Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_PROGRAM if
 <em>program</em> is not a valid program object created from a SPIR-V module.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_SPEC_ID if
 <em>spec_id</em> is not a valid specialization constant ID
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>spec_size</em> does not match the size of the specialization constant in
 the SPIR-V module, or if <em>spec_value</em> is NULL.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 </div>
 <div class="sect3">
 <h4 id="_building_program_executables">5.8.4. Building Program Executables</h4>
 <div class="paragraph"><p>
 The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clBuildProgram(cl_program program,
                       cl_uint num_devices,
                       const cl_device_id *device_list,
                       const char *options,
                       void (CL_CALLBACK *pfn_notify)
                           (cl_program program,
                           void *user_data),
                       void *user_data)</pre>
 </div></div>
 <div class="paragraph"><p>builds (compiles &amp; links) a program executable from the program source
 or binary for all the devices or a specific device(s) in the OpenCL
 context associated with <em>program</em>. OpenCL allows program executables to
 be built using the source or the binary. <strong>clBuildProgram</strong> must be
 called for <em>program</em> created using <strong>clCreateProgramWithSource</strong>,
 <strong>clCreateProgramWithIL</strong> or <strong>clCreateProgramWithBinary</strong> to build the
 program executable for one or more devices associated with <em>program</em>.
 If <em>program</em> is created with <strong>clCreateProgramWithBinary</strong>, then the
 program binary must be an executable binary (not a compiled binary or
 library).</p></div>
 <div class="paragraph"><p>The executable binary can be queried using <strong>clGetProgramInfo</strong>(<em>program</em>,
 CL_PROGRAM_BINARIES, ) and can be specified to
 <strong>clCreateProgramWithBinary</strong> to create a new program object.</p></div>
 <div class="paragraph"><p><em>program</em> is the program object.</p></div>
 <div class="paragraph"><p><em>device_list</em> is a pointer to a list of devices associated with
 <em>program</em>. If <em>device_list</em> is a NULL value, the program executable is
 built for all devices associated with <em>program</em> for which a source or
 binary has been loaded. If <em>device_list</em> is a non-NULL value, the
 program executable is built for devices specified in this list for which
 a source or binary has been loaded.</p></div>
 <div class="paragraph"><p><em>num_devices_is the number of devices listed in _device_list</em>.</p></div>
 <div class="paragraph"><p><em>options</em> is a pointer to a null-terminated string of characters that
 describes the build options to be used for building the program
 executable. The list of supported options is described <em>in section 5.8.6</em>.
 If the program was created using clCreateProgramWithBinary and <em>options</em> is
 a NULL pointer,
 the program will be built as if options were the same as when the program binary
 was originally built.
 If the program was created using clCreateProgramWithBinary and <em>options</em> string contains
 anything other than the same options in the same order (whitespace ignored) as when
 the program binary was originally built, then the behavior is implementation defined.</p></div>
 <div class="paragraph"><p><em>pfn_notify</em> is a function pointer to a notification routine. The
 notification routine is a callback function that an application can
 register and which will be called when the program executable has been
 built (successfully or unsuccessfully). If <em>pfn_notify</em> is not NULL,
 <strong>clBuildProgram</strong> does not need to wait for the build to complete and can
 return immediately once the build operation can begin. The build
 operation can begin if the context, program whose sources are being
 compiled and linked, list of devices and build options specified are all
 valid and appropriate host and device resources needed to perform the
 build are available. If <em>pfn_notify</em> is NULL, <strong>clBuildProgram</strong> does not
 return until the build has completed. This callback function may be
 called asynchronously by the OpenCL implementation. It is the
 applications responsibility to ensure that the callback function is
 thread-safe.</p></div>
 <div class="paragraph"><p><em>user_data</em> will be passed as an argument when <em>pfn_notify</em> is called.
 <em>user_data</em> can be NULL.</p></div>
 <div class="paragraph"><p><strong>clBuildProgram</strong> returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_PROGRAM if
 <em>program</em> is not a valid program object.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>device_list</em> is NULL and <em>num_devices</em> is greater than zero, or if
 <em>device_list</em> is not NULL and <em>num_devices</em> is zero.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>pfn_notify</em> is NULL but <em>user_data</em> is not NULL.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_DEVICE if
 OpenCL devices listed in <em>device_list</em> are not in the list of devices
 associated with <em>program</em>
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_BINARY if
 <em>program</em> is created with <strong>clCreateProgramWithBinary</strong> and devices listed
 in <em>device_list</em> do not have a valid program binary loaded.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_BUILD_OPTIONS
 if the build options specified by <em>options</em> are invalid.
 </p>
 </li>
 <li>
 <p>
       CL_COMPILER_NOT_AVAILABLE
 if <em>program</em> is created with <strong>clCreateProgramWithSource</strong> and a compiler
 is not available i.e. CL_DEVICE_COMPILER_AVAILABLE specified in <em>table
 4.3</em> is set to CL_FALSE.
 </p>
 </li>
 <li>
 <p>
       CL_BUILD_PROGRAM_FAILURE
 if there is a failure to build the program executable. This error will
 be returned if <strong>clBuildProgram</strong> does not return until the build has
 completed.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_OPERATION if
 the build of a program executable for any of the devices listed in
 <em>device_list</em> by a previous call to <strong>clBuildProgram</strong> for <em>program</em> has
 not completed.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_OPERATION if
 there are kernel objects attached to <em>program</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_OPERATION if
 <em>program</em> was not created with <strong>clCreateProgramWithSource,
 clCreateProgramWithIL</strong> or <strong>clCreateProgramWithBinary</strong>.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 </div>
 <div class="sect3">
 <h4 id="_separate_compilation_and_linking_of_programs">5.8.5. Separate Compilation and Linking of Programs</h4>
 <div class="paragraph"><p>OpenCL programs are compiled and linked to support the following:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       Separate compilation and
 link stages. Program sources can be compiled to generate a compiled
 binary object and linked in a separate stage with other compiled program
 objects to the program exectuable.
 </p>
 </li>
 <li>
 <p>
       Embedded headers. In
 OpenCL 1.0 and 1.1, the I build option could be used to specify the
 list of directories to be searched for headers files that are included
 by a program source(s). OpenCL 1.2 extends this by allowing the header
 sources to come from program objects instead of just header files.
 </p>
 </li>
 <li>
 <p>
       Libraries. The linker can
 be used to link compiled objects and libraries into a program executable
 or to create a library of compiled binaries.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clCompileProgram(cl_program program,
                         cl_uint num_devices,
                         const cl_device_id *device_list,
                         const char *options,
                         cl_uint num_input_headers,
                         const cl_program *input_headers,
                         const char **header_include_names,
                         void (CL_CALLBACK *pfn_notify)
                             (cl_program program,
                             void *user_data),
                         void *user_data)</pre>
 </div></div>
 <div class="paragraph"><p>compiles a programs source for all the devices or a specific device(s)
 in the OpenCL context associated with <em>program</em>. The pre-processor runs
 before the program sources are compiled. The compiled binary is built
 for all devices associated with <em>program</em> or the list of devices
 specified. The compiled binary can be queried using
 <strong>clGetProgramInfo</strong>(<em>program</em>, CL_PROGRAM_BINARIES, ) and can be passed
 to <strong>clCreateProgramWithBinary</strong> to create a new program object.</p></div>
 <div class="paragraph"><p><em>program</em> is the program object that is the compilation target.</p></div>
 <div class="paragraph"><p><em>device_list</em> is a pointer to a list of devices associated with
 <em>program</em>. If <em>device_list</em> is a NULL value, the compile is performed
 for all devices associated with <em>program</em>. If <em>device_list</em> is a
 non-NULL value, the compile is performed for devices specified in this
 list.</p></div>
 <div class="paragraph"><p><em>num_devices_is the number of devices listed in _device_list</em>.</p></div>
 <div class="paragraph"><p><em>options</em> is a pointer to a null-terminated string of characters that
 describes the compilation options to be used for building the program
 executable. Certain options are ignored when program is created with IL.
 The list of supported options is as described <em>in section 5.8.4</em>.</p></div>
 <div class="paragraph"><p><em>num_input_headers</em> specifies the number of programs that describe
 headers in the array referenced by <em>input_headers</em>.</p></div>
 <div class="paragraph"><p><em>input_headers</em> is an array of program embedded headers created with
 <strong>clCreateProgramWithSource</strong>.</p></div>
 <div class="paragraph"><p><em>header_include_names</em> is an array that has a one to one correspondence
 with <em>input_headers</em>. Each entry in <em>header_include_names</em> specifies
 the include name used by source in <em>program</em> that comes from an embedded
 header. The corresponding entry in <em>input_headers</em> identifies the
 program object which contains the header source to be used. The
 embedded headers are first searched before the headers in the list of
 directories specified by the I compile option (as described in <em>section
 5.8.4.1</em>). If multiple entries in <em>header_include_names</em> refer to the
 same header name, the first one encountered will be used.</p></div>
 <div class="paragraph"><p>If <em>program</em> was created using clCreateProgramWithIL, then
 <em>num_input_headers</em>, <em>input_headers</em>, and <em>header_include_names</em> are
 ignored.</p></div>
 <div class="paragraph"><p>For example, consider the following program source:</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre> #include &lt;foo.h&gt;
  #include &lt;mydir/myinc.h&gt;

  __kernel void

  image_filter (int n, int m,

   __constant float *filter_weights,

   __read_only image2d_t src_image,

   __write_only image2d_t dst_image)

  {

  ...

  }</pre>
 </div></div>
 <div class="paragraph"><p>This kernel includes two headers foo.h and mydir/myinc.h. The following
 describes how these headers can be passed as embedded headers in program
 objects:
  </p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre> cl_program foo_pg = clCreateProgramWithSource(context,

  1, &amp;foo_header_src, NULL, &amp;err);

  cl_program myinc_pg = clCreateProgramWithSource(context,

  1, &amp;myinc_header_src, NULL, &amp;err);

  // lets assume the program source described above is given

  // by program_A and is loaded via clCreateProgramWithSource

  cl_program input_headers[2] = \{ foo_pg, myinc_pg };

  char * input_header_names[2] = \{ foo.h, mydir/myinc.h };

  clCompileProgram(program_A,

  0, NULL, // num_devices &amp; device_list

  NULL, // compile_options

  2, // num_input_headers

  input_headers,

  input_header_names,

  NULL, NULL); // pfn_notify &amp; user_data</pre>
 </div></div>
 <div class="paragraph"><p><em>pfn_notify</em> is a function pointer to a notification routine. The
 notification routine is a callback function that an application can
 register and which will be called when the program executable has been
 built (successfully or unsuccessfully). If <em>pfn_notify</em> is not NULL,
 <strong>clCompileProgram</strong> does not need to wait for the compiler to complete
 and can return immediately once the compilation can begin. The
 compilation can begin if the context, program whose sources are being
 compiled, list of devices, input headers, programs that describe input
 headers and compiler options specified are all valid and appropriate
 host and device resources needed to perform the compile are available.
 If <em>pfn_notify</em> is NULL, <strong>clCompileProgram</strong> does not return until the
 compiler has completed. This callback function may be called
 asynchronously by the OpenCL implementation. It is the applications
 responsibility to ensure that the callback function is thread-safe.</p></div>
 <div class="paragraph"><p><em>user_data</em> will be passed as an argument when <em>pfn_notify</em> is called.
 <em>user_data</em> can be NULL.</p></div>
 <div class="paragraph"><p><strong>clCompileProgram</strong> returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_PROGRAM if
 <em>program</em> is not a valid program object.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>device_list</em> is NULL and <em>num_devices</em> is greater than zero, or if
 <em>device_list</em> is not NULL and <em>num_devices</em> is zero.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>num_input_headers</em> is zero and <em>header_include_names</em> or
 <em>input_headers</em> are not NULL or if <em>num_input_headers</em> is not zero and
 <em>header_include_names</em> or <em>input_headers</em> are NULL.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>pfn_notify</em> is NULL but <em>user_data</em> is not NULL.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_DEVICE if
 OpenCL devices listed in <em>device_list</em> are not in the list of devices
 associated with <em>program</em>
 </p>
 </li>
 <li>
 <p>
      CL_INVALID_COMPILER_OPTIONS if the compiler options specified by
 <em>options</em> are invalid.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_OPERATION if
 the compilation or build of a program executable for any of the devices
 listed in <em>device_list</em> by a previous call to <strong>clCompileProgram</strong> or
 <strong>clBuildProgram</strong> for <em>program</em> has not completed.
 </p>
 </li>
 <li>
 <p>
       CL_COMPILER_NOT_AVAILABLE
 if a compiler is not available i.e. CL_DEVICE_COMPILER_AVAILABLE
 specified in <em>table 4.3</em> is set to CL_FALSE.
 </p>
 </li>
 <li>
 <p>
       CL_COMPILE_PROGRAM_FAILURE
 if there is a failure to compile the program source. This error will be
 returned if <strong>clCompileProgram</strong> does not return until the compile has
 completed.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_OPERATION if
 there are kernel objects attached to <em>program</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_OPERATION if
 <em>program</em> has no source or IL available, i.e. it has not been created
 with <strong>clCreateProgramWithSource</strong> or <strong>clCreateProgramWithIL</strong>.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_program clLinkProgram(cl_context context,
                          cl_uint num_devices,
                          const cl_device_id *device_list,
                          const char *options,
                          cl_uint num_input_programs,
                          const cl_program *input_programs,
                          void (CL_CALLBACK *pfn_notify)
                              (cl_program program,
                              void *user_data),
                          void *user_data,
                          cl_int *errcode_ret)</pre>
 </div></div>
 <div class="paragraph"><p>links a set of compiled program objects and libraries for all the
 devices or a specific device(s) in the OpenCL context and creates a
 library or executable. <strong>clLinkProgram</strong> creates a new program object
 which contains the library or executable. The library or executable
 binary can be queried using <strong>clGetProgramInfo</strong>(<em>program</em>,
 CL_PROGRAM_BINARIES, ) and can be specified to
 <strong>clCreateProgramWithBinary</strong> to create a new program object.</p></div>
 <div class="paragraph"><p>The devices associated with the returned program object will be the list
 of devices specified by_device_list_ or if <em>device_list</em> is NULL it will
 be the list of devices associated with <em>context</em>.</p></div>
 <div class="paragraph"><p><em>context</em> must be a valid OpenCL context.</p></div>
 <div class="paragraph"><p><em>device_list</em> is a pointer to a list of devices that are in <em>context</em>.
 If <em>device_list</em> is a NULL value, the link is performed for all devices
 associated with <em>context</em> for which a compiled object is available. If
 <em>device_list</em> is a non-NULL value, the link is performed for devices
 specified in this list for which a compiled object is available.</p></div>
 <div class="paragraph"><p><em>num_devices_is the number of devices listed in _device_list</em>.</p></div>
 <div class="paragraph"><p><em>options</em> is a pointer to a null-terminated string of characters that
 describes the link options to be used for building the program
 executable. The list of supported options is as described <em>in section
 5.8.7</em>.
 If the program was created using clCreateProgramWithBinary and <em>options</em> is
 a NULL pointer,
 the program will be linked as if options were the same as when the program binary
 was originally built.
 If the program was created using clCreateProgramWithBinary and <em>options</em> string contains
 anything other than the same options in the same order (whitespace ignored)
 as when the program binary
 was originally built, then the behavior is implementation defined.</p></div>
 <div class="paragraph"><p><em>num_input_programs</em> specifies the number of programs in array
 referenced by <em>input_programs</em>.</p></div>
 <div class="paragraph"><p><em>input_programs</em> is an array of program objects that are compiled
 binaries or libraries that are to be linked to create the program
 executable. For each device in <em>device_list</em> or if <em>device_list</em> is
 NULL the list of devices associated with context, the following cases
 occur:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       All programs specified by
 <em>input_programs</em> contain a compiled binary or library for the device.
 In this case, a link is performed to generate a program executable for
 this device.
 </p>
 </li>
 <li>
 <p>
       None of the programs
 contain a compiled binary or library for that device. In this case, no
 link is performed and there will be no program executable generated for
 this device.
 </p>
 </li>
 <li>
 <p>
       All other cases will
 return a CL_INVALID_OPERATION error.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p><em>pfn_notify</em> is a function pointer to a notification routine. The
 notification routine is a callback function that an application can
 register and which will be called when the program executable has been
 built (successfully or unsuccessfully).</p></div>
 <div class="paragraph"><p>If <em>pfn_notify</em> is not NULL, <strong>clLinkProgram</strong> does not need to wait for
 the linker to complete and can return immediately once the linking
 operation can begin. Once the linker has completed, the <em>pfn_notify</em>
 callback function is called which returns the program object returned by
 <strong>clLinkProgram</strong>. The application can query the link status and log for
 this program object. This callback function may be called asynchronously
 by the OpenCL implementation. It is the applications responsibility to
 ensure that the callback function is thread-safe.</p></div>
 <div class="paragraph"><p>If <em>pfn_notify</em> is NULL, <strong>clLinkProgram</strong> does not return until the
 linker has completed.</p></div>
 <div class="paragraph"><p><em>user_data</em> will be passed as an argument when <em>pfn_notify</em> is called.
 <em>user_data</em> can be NULL.</p></div>
 <div class="paragraph"><p>The linking operation can begin if the context, list of devices, input
 programs and linker options specified are all valid and appropriate host
 and device resources needed to perform the link are available. If the
 linking operation can begin, <strong>clLinkProgram</strong> returns a valid non-zero
 program object.</p></div>
 <div class="paragraph"><p>If <em>pfn_notify</em> is NULL, the <em>errcode_ret</em> will be set to CL_SUCCESS if
 the link operation was successful and CL_LINK_FAILURE if there is a
 failure to link the compiled binaries and/or libraries.</p></div>
 <div class="paragraph"><p>If <em>pfn_notify</em> is not NULL, <strong>clLinkProgram</strong> does not have to wait until
 the linker to complete and can return CL_SUCCESS in <em>errcode_ret</em> if the
 linking operation can begin. The <em>pfn_notify</em> callback function will
 return a CL_SUCCESS or CL_LINK_FAILURE if the linking operation was
 successful or not.</p></div>
 <div class="paragraph"><p>Otherwise <strong>clLinkProgram</strong> returns a NULL program object with an
 appropriate error in <em>errcode_ret</em>. The application should query the
 linker status of this program object to check if the link was successful
 or not. The list of errors that can be returned are:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_CONTEXT if
 <em>context</em> is not a valid context.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>device_list</em> is NULL and <em>num_devices</em> is greater than zero, or if
 <em>device_list</em> is not NULL and <em>num_devices</em> is zero.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>num_input_programs</em> is zero and <em>input_programs</em> is NULL__or
 if_num_input_programs_ is zero and <em>input_programs</em> is not NULL or if
 <em>num_input_programs</em> is not zero and <em>input_programs</em> is NULL.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_PROGRAM if
 programs specified in <em>input_programs</em> are not valid program objects.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>pfn_notify</em> is NULL but <em>user_data</em> is not NULL.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_DEVICE if
 OpenCL devices listed in <em>device_list</em> are not in the list of devices
 associated with <em>context</em>
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_LINKER_OPTIONS
 if the linker options specified by <em>options</em> are invalid.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_OPERATION if
 the compilation or build of a program executable for any of the devices
 listed in <em>device_list</em> by a previous call to <strong>clCompileProgram</strong> or
 <strong>clBuildProgram</strong> for <em>program</em> has not completed.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_OPERATION if
 the rules for devices containing compiled binaries or libraries as
 described in <em>input_programs</em> argument above are not followed.
 </p>
 </li>
 <li>
 <p>
       CL_LINKER_NOT_AVAILABLE if
 a linker is not available i.e. CL_DEVICE_LINKER_AVAILABLE specified in
 <em>table 4.3</em> is set to CL_FALSE.
 </p>
 </li>
 <li>
 <p>
       CL_LINK_PROGRAM_FAILURE if
 there is a failure to link the compiled binaries and/or libraries.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 </div>
 <div class="sect3">
 <h4 id="_compiler_options">5.8.6. Compiler Options</h4>
 <div class="paragraph"><p>The compiler options are categorized as pre-processor options, options
 for math intrinsics, options that control optimization and miscellaneous
 options. This specification defines a standard set of options that must
 be supported by the compiler when building program executables online or
 offline from OpenCL C/C++ or, where relevant, from an IL. These may be
 extended by a set of vendor- or platform-specific options.</p></div>
 <div class="sect4">
 <h5 id="_preprocessor_options">Preprocessor options</h5>
 <div class="paragraph"><p>These options control the OpenCL C/C++ preprocessor which is run on each
 program source before actual compilation. These options are ignored for
 programs created with IL.</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>-D _name_</pre>
 </div></div>
 <div class="paragraph"><p>Predefine <em>name</em> as a macro, with definition 1.</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>-D _name_=_definition_</pre>
 </div></div>
 <div class="paragraph"><p>The contents of <em>definition</em> are tokenized and processed as if they
 appeared during</p></div>
 <div class="paragraph"><p>translation phase three in a &#8216;#define&#8217; directive. In particular, the
 definition will be</p></div>
 <div class="paragraph"><p>truncated by embedded newline characters.</p></div>
 <div class="paragraph"><p>-D options are processed in the order they are given in the <em>options</em>
 argument to <strong>clBuildProgram</strong> or <strong>clCompileProgram</strong>. Note that a space is
 required between the -D option and the symbol it defines, otherwise
 behavior is implementation defined.</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>-I _dir_</pre>
 </div></div>
 <div class="paragraph"><p>Add the directory <em>dir</em> to the list of directories to be searched for
 header files.  <em>dir</em> can optionally be enclosed in double quotes.</p></div>
 <div class="paragraph"><p>This option is not portable due to its dependency on host file system and host operating
 system. It is supported for backwards compatibility with previous OpenCL versions.
 Developers are encouraged to create and use explicit header objects by means of
 clCompileProgram followed by clLinkProgram.</p></div>
 </div>
 <div class="sect4">
 <h5 id="_math_intrinsics_options">Math Intrinsics Options</h5>
 <div class="paragraph"><p>These options control compiler behavior regarding floating-point
 arithmetic. These options trade off between speed and correctness.</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>-cl-single-precision-constant</pre>
 </div></div>
 <div class="paragraph"><p>Treat double precision floating-point constant as single precision
 constant. This option is ignored for programs created with IL.</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>-cl-denorms-are-zero</pre>
 </div></div>
 <div class="paragraph"><p>This option controls how single precision and double precision
 denormalized numbers are handled. If specified as a build option, the
 single precision denormalized numbers may be flushed to zero; double
 precision denormalized numbers may also be flushed to zero if the
 optional extension for double precision is supported. This is intended
 to be a performance hint and the OpenCL compiler can choose not to flush
 denorms to zero if the device supports single precision (or double
 precision) denormalized numbers.</p></div>
 <div class="paragraph"><p>This option is ignored for single precision numbers if the device does
 not support single precision denormalized numbers i.e. CL_FP_DENORM bit
 is not set in CL_DEVICE_SINGLE_FP_CONFIG.</p></div>
 <div class="paragraph"><p>This option is ignored for double precision numbers if the device does
 not support double precision or if it does support double precision but
 not double precision denormalized numbers i.e. CL_FP_DENORM bit is not
 set in CL_DEVICE_DOUBLE_FP_CONFIG.</p></div>
 <div class="paragraph"><p>This flag only applies for scalar and vector single precision
 floating-point variables and computations on these floating-point
 variables inside a program. It does not apply to reading from or
 writing to image objects.</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>-cl-fp32-correctly-rounded-divide-sqrt</pre>
 </div></div>
 <div class="paragraph"><p>The -cl-fp32-correctly-rounded-divide-sqrt build option to
 <strong>clBuildProgram</strong> or</p></div>
 <div class="paragraph"><p><strong>clCompileProgram</strong> allows an application to specify that single
 precision floating-point</p></div>
 <div class="paragraph"><p>divide (x/y and 1/x) and sqrt used in the program source are correctly
 rounded. If</p></div>
 <div class="paragraph"><p>this build option is not specified, te minimum numerical accuracy of
 single precision</p></div>
 <div class="paragraph"><p>floating-point divide and sqrt are as defined in the SPIR-V OpenCL
 environment specification.</p></div>
 <div class="paragraph"><p>This build option can only be specified if the
 CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT is set in CL_DEVICE_SINGLE_FP_CONFIG
 (as defined in <em>table 4.3</em>) for devices that the program is being
 build. <strong>clBuildProgram</strong> or <strong>clCompileProgram</strong> will fail to compile the
 program for a device if the -cl-fp32-correctly-rounded-divide-sqrt
 option is specified and CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT is not set
 for the device.</p></div>
 </div>
 <div class="sect4">
 <h5 id="_optimization_options">Optimization Options</h5>
 <div class="paragraph"><p>These options control various sorts of optimizations. Turning on
 optimization flags makes the compiler attempt to improve the performance
 and/or code size at the expense of compilation time and possibly the
 ability to debug the program.</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>-cl-opt-disable</pre>
 </div></div>
 <div class="literalblock">
 <div class="content monospaced">
 <pre> This option disables all optimizations. The default is
 optimizations are enabled.</pre>
 </div></div>
 <div class="paragraph"><p>The following options control compiler behavior regarding floating-point
 arithmetic. These options trade off between performance and correctness
 and must be specifically enabled. These options are not turned on by
 default since it can result in incorrect output for programs which
 depend on an exact implementation of IEEE 754 rules/specifications for
 math functions.</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>-cl-mad-enable</pre>
 </div></div>
 <div class="paragraph"><p>Allow a * b + c to be replaced by a mad. The mad computes a * b + c
 with</p></div>
 <div class="paragraph"><p>reduced accuracy. For example, some OpenCL devices implement mad as
 truncate the</p></div>
 <div class="paragraph"><p>result of a * b before adding it to c.</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>-cl-no-signed-zeros</pre>
 </div></div>
 <div class="paragraph"><p>Allow optimizations for floating-point arithmetic that ignore the
 signedness of zero.</p></div>
 <div class="paragraph"><p>IEEE 754 arithmetic specifies the distinct behavior of +0.0 and -0.0
 values, which then prohibits simplification of expressions such as x+0.0
 or 0.0*x (even with -cl-finite-math only). This option implies that the
 sign of a zero result isn&#8217;t significant.</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>-cl-unsafe-math-optimizations</pre>
 </div></div>
 <div class="paragraph"><p>Allow optimizations for floating-point arithmetic that (a) assume that
 arguments and results are valid, (b) may violate IEEE 754 standard and
 (c) may violate the OpenCL numerical compliance requirements as defined
 in the SPIR-V OpenCL environment specification for single precision and
 double precision floating-point, and edge case behavior in the SPIR-V
 OpenCL environment specification. This option includes the
 -cl-no-signed-zeros and -cl-mad-enable options.</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>-cl-finite-math-only</pre>
 </div></div>
 <div class="paragraph"><p>Allow optimizations for floating-point arithmetic that assume that
 arguments and results</p></div>
 <div class="paragraph"><p>are not NaNs, +Inf, -Inf. This option may violate the OpenCL numerical
 compliance requirements for single precision and double precision
 floating-point, as well as edge case behavior. The original and modified
 values are defined in the SPIR-V OpenCL environment specification</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>-cl-fast-relaxed-math</pre>
 </div></div>
 <div class="paragraph"><p>Sets the optimization options -cl-finite-math-only and
 -cl-unsafe-math-optimizations.</p></div>
 <div class="paragraph"><p>This allows optimizations for floating-point arithmetic that may violate
 the IEEE 754
 standard and the OpenCL numerical compliance requirements for single
 precision and double precision floating-point, as well as floating point
 edge case behavior. This option also relaxes the precision of commonly
 used math functions. This option causes the preprocessor macro
 <em>FAST_RELAXED_MATH</em> to be defined in the OpenCL program. The original
 and modified values are defined in the SPIR-V OpenCL environment
 specification</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>-cl-uniform-work-group-size</pre>
 </div></div>
 <div class="paragraph"><p>This requires that the global work-size be a multiple of the work-group
 size specified to
 <strong>clEnqueueNDRangeKernel</strong>. Allow optimizations that are made possible by
 this restriction.</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>-cl-no-subgroup-ifp</pre>
 </div></div>
 <div class="paragraph"><p>This indicates that kernels in this program do not require subgroups to
 make independent forward progress. Allows optimizations that are made
 possible by this restriction. This option has no effect for devices
 that do not support independent forward progress for subgroups.</p></div>
 </div>
 <div class="sect4">
 <h5 id="_options_to_request_or_suppress_warnings">Options to Request or Suppress Warnings</h5>
 <div class="paragraph"><p>Warnings are diagnostic messages that report constructions which are not
 inherently erroneous but which are risky or suggest there may have been
 an error. The following language-independent options do not enable
 specific warnings but control the kinds of diagnostics produced by the
 OpenCL compiler. These options are ignored for programs created with IL.</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>-w</pre>
 </div></div>
 <div class="paragraph"><p>Inhibit all warning messages.</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>-Werror</pre>
 </div></div>
 <div class="paragraph"><p>Make all warnings into errors.</p></div>
 </div>
 <div class="sect4">
 <h5 id="_options_controlling_the_opencl_c_version">Options Controlling the OpenCL C version</h5>
 <div class="paragraph"><p>The following option controls the version of OpenCL C that the compiler
 accepts. These options are ignored for programs created with IL.</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>-cl-std=</pre>
 </div></div>
 <div class="paragraph"><p>Determine the OpenCL C language version to use. A value for this
 option must be provided. Valid values are:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 CL1.1  Support all OpenCL C programs that use the OpenCL C language
 features defined in <em>section 6</em> of the OpenCL 1.1 specification.
 </p>
 </li>
 <li>
 <p>
 CL1.2  Support all OpenCL C programs that use the OpenCL C language
 features defined in <em>section 6</em> of the OpenCL 1.2 specification.
 </p>
 </li>
 <li>
 <p>
 CL2.0  Support all OpenCL C programs that use the OpenCL C language
 features defined in <em>section 6</em> OpenCL C 2.0 specification.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>Calls to <strong>clBuildProgram</strong> or <strong>clCompileProgram</strong> with the -cl-std=CL1.1
 option <strong>will fail</strong> to compile the program for any devices with
 CL_DEVICE_OPENCL_C_VERSION = OpenCL C 1.0.</p></div>
 <div class="paragraph"><p>Calls to <strong>clBuildProgram</strong> or <strong>clCompileProgram</strong> with the -cl-std=CL1.2
 option <strong>will fail</strong> to compile the program for any devices with
 CL_DEVICE_OPENCL_C_VERSION = OpenCL C 1.0.</p></div>
 <div class="paragraph"><p>Calls to <strong>clBuildProgram</strong> or <strong>clCompileProgram</strong> with the -cl-std=CL2.0
 option <strong>will fail</strong> to compile the program for any devices with
 CL_DEVICE_OPENCL_C_VERSION = OpenCL C 1.0, OpenCL C 1.1 or OpenCL C 1.2.</p></div>
 <div class="paragraph"><p>If the cl-std build option is not specified, the highest OpenCL C 1.x
 language version supported by each device is used when compiling the
 program for each device. Applications are required to specify the
 cl-std=CL2.0 option if they want to compile or build their programs
 with OpenCL C 2.0.</p></div>
 </div>
 <div class="sect4">
 <h5 id="_options_for_querying_kernel_argument_information">Options for Querying Kernel Argument Information</h5>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>-cl-kernel-arg-info</pre>
 </div></div>
 <div class="paragraph"><p>This option allows the compiler to store information about the arguments
 of a kernel(s) in</p></div>
 <div class="paragraph"><p>the program executable. The argument information stored includes the
 argument name,</p></div>
 <div class="paragraph"><p>its type, the address space and access qualifiers used. Refer to
 description of</p></div>
 <div class="paragraph"><p><strong>clGetKernelArgInfo</strong> on how to query this information.</p></div>
 </div>
 <div class="sect4">
 <h5 id="_options_for_debugging_your_program">Options for debugging your program</h5>
 <div class="paragraph"><p>The following option is available.</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>-g</pre>
 </div></div>
 <div class="paragraph"><p>This option can currently be used to generate additional errors for the
 built-in functions that allow you to enqueue commands on a device (refer
 to OpenCL kernel languages specifications).</p></div>
 </div>
 </div>
 <div class="sect3">
 <h4 id="_linker_options">5.8.7. Linker Options</h4>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>This specification defines a standard set of linker options that must be
 supported by the OpenCL C compiler when linking compiled programs online
 or offline. These linker options are categorized as library linking
 options and program linking options. These may be extended by a set of
 vendor- or platform-specific options.</p></div>
 <div class="sect4">
 <h5 id="_library_linking_options">Library Linking Options</h5>
 <div class="paragraph"><p>The following options can be specified when creating a library of
 compiled binaries.</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>-create-library</pre>
 </div></div>
 <div class="paragraph"><p>Create a library of compiled binaries specified in <em>input_programs</em>
 argument to <strong>clLinkProgram</strong>.</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>-enable-link-options</pre>
 </div></div>
 <div class="paragraph"><p>Allows the linker to modify the library behavior based on one or more
 link options</p></div>
 <div class="paragraph"><p>(described in <em>section 5.8.5.2</em>) when this library is linked with a
 program executable. This</p></div>
 <div class="paragraph"><p>option must be specified with the create-library option.</p></div>
 </div>
 <div class="sect4">
 <h5 id="_program_linking_options">Program Linking Options</h5>
 <div class="paragraph"><p>The following options can be specified when linking a program
 executable.</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>-cl-denorms-are-zero
 -cl-no-signed-zeroes
 -cl-unsafe-math-optimizations
 -cl-finite-math-only
 -cl-fast-relaxed-math
 -cl-no-subgroup-ifp</pre>
 </div></div>
 <div class="paragraph"><p>The options are described in <em>section 5.8.4.2</em> and <em>section 5.8.4.3</em>.
 The linker may apply these options to all compiled program objects
 specified to <strong>clLinkProgram</strong>. The linker may apply these options only
 to libraries which were created with the enable-link-option.</p></div>
 </div>
 </div>
 <div class="sect3">
 <h4 id="_unloading_the_opencl_compiler">5.8.8. Unloading the OpenCL Compiler</h4>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int  clUnloadPlatformCompiler(cl_platform_id platform)</pre>
 </div></div>
 <div class="paragraph"><p>allows the implementation to release the resources allocated by the
 OpenCL compiler for <em>platform</em>. This is a hint from the application and
 does not guarantee that the compiler will not be used in the future or
 that the compiler will actually be unloaded by the implementation.
 Calls to <strong>clBuildProgram</strong>, <strong>clCompileProgram</strong> or <strong>clLinkProgram</strong> after
 <strong>clUnloadPlatformCompiler</strong> will reload the compiler, if necessary, to
 build the appropriate program executable.</p></div>
 <div class="paragraph"><p><strong>clUnloadPlatformCompiler</strong> returns CL_SUCCESS if the function is
 executed successfully. Otherwise, it returns one of the following
 errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_PLATFORM if
 <em>platform</em> is not a valid platform.
 </p>
 </li>
 </ul></div>
 </div>
 <div class="sect3">
 <h4 id="_program_object_queries">5.8.9. Program Object Queries</h4>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clGetProgramInfo(cl_program program,
                         cl_program_info param_name,
                         size_t param_value_size,
                         void *param_value,
                         size_t *param_value_size_ret)</pre>
 </div></div>
 <div class="paragraph"><p>returns information about the program object.</p></div>
 <div class="paragraph"><p><em>program</em> specifies the program object being queried.</p></div>
 <div class="paragraph"><p><em>param_name</em> specifies the information to query. The list of supported
 <em>param_name</em> types and the information returned in <em>param_value</em> by
 <strong>clGetProgramInfo</strong> is described in <em>table 5.17</em>.</p></div>
 <div class="paragraph"><p><em>param_value</em> is a pointer to memory where the appropriate result being
 queried is returned. If <em>param_value</em> is NULL, it is ignored.</p></div>
 <div class="paragraph"><p><em>param_value_size</em> is used to specify the size in bytes of memory
 pointed to by <em>param_value</em>. This size must be &gt;= size of return type
 as described in <em>table 5.17.</em></p></div>
 <div class="paragraph"><p><em>param_value_size_ret</em> returns the actual size in bytes of data being
 queried by <em>param_name</em>. If <em>param_value_size_ret</em> is NULL, it is
 ignored.</p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <caption class="title">Table 23. <em>clGetProgramInfo parameter queries</em></caption>
 <col style="width:34%;">
 <col style="width:33%;">
 <col style="width:33%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>cl_program_info</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Return Type</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Info. returned in <em>param_value</em></strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">*CL_PROGRAM_REFERENCE_ COUNT*<span class="footnote"><br>[The reference count returned should be considered immediately stale. It is unsuitable for general use in
 applications. This feature is provided for identifying memory leaks.]<br></span>:</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return the <em>program</em> reference count.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_PROGRAM_CONTEXT</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_context</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return the context specified when the
 program object is created</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"> </p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"> </p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"> </p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_PROGRAM_NUM_DEVICES</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return the number of devices
 associated with <em>program</em>.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_PROGRAM_DEVICES</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_device_id[]</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return the list of devices
 associated with the program object. This can be the devices associated
 with context on which the program object has been created or can be a
 subset of devices that are specified when a progam object is created
 using <strong>clCreateProgramWithBinary</strong>.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"> </p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"> </p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"> </p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_PROGRAM_SOURCE</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">char[]</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return the program source code specified
 by clCreateProgramWithSource. The
 source string returned is a concatenation of
 all source strings specified to
 clCreateProgramWithSource with a null
 terminator. The concatenation strips any
 nulls in the original source strings.
 <br>
 <br>
 If program is created using
 clCreateProgramWithBinary,
 clCreateProgramWithIL or
 clCreateProgramWithBuiltinKernels, a
 null string or the appropriate program
 source code is returned depending on
 whether or not the program source code is
 stored in the binary.
 <br>
 <br>
 The actual number of characters that
 represents the program source code
 including the null terminator is returned in
 param_value_size_ret.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"> </p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"> </p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"> </p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_PROGRAM_IL</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">char[]</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Returns the program IL for programs created
 with clCreateProgramWithIL.
 <br>
 <br>
 If program is created with
 clCreateProgramWithSource,
 clCreateProgramWithBinary or
 clCreateProgramWithBuiltinKernels the
 memory pointed to by param_value will be
 unchanged and param_value_size_retwill be
 set to 0.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"> </p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"> </p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"> </p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_PROGRAM_BINARY_SIZES</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t[]</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Returns an array that contains the size in
 bytes of the program binary (could be an
 executable binary, compiled binary or
 library binary) for each device associated
 with program. The size of the array is the
 number of devices associated with
 program. If a binary is not available for a
 device(s), a size of zero is returned.
 <br>
 <br>
 If program is created using
 clCreateProgramWithBuiltinKernels,
 the implementation may return zero in any
 entries of the returned array.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_PROGRAM_BINARIES</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">unsigned <br>
  char *[]</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return the program binaries (could be an
 executable binary, compiled binary or
 library binary) for all devices associated
 with program.   For each device in
 program, the binary returned can be the
 binary specified for the device when
 program is created with
 clCreateProgramWithBinary or it can be
 the executable binary generated by
 clBuildProgram or clLinkProgram.  If
 program is created with
 clCreateProgramWithSource or
 clCreateProgramWithIL, the binary
 returned is the binary generated by
 clBuildProgram, clCompileProgram or
 clLinkProgram.  The bits returned can be
 an implementation-specific intermediate
 representation (a.k.a. IR) or device specific
 executable bits or both. The decision on
 which information is returned in the binary
 is up to the OpenCL implementation.
 <br>
 <br>
 param_value points to an array of n
 pointers allocated by the caller, where n is
 the number of devices associated with
 program. The buffer sizes needed to
 allocate the memory that these n pointers
 refer to can be queried using the
 CL_PROGRAM_BINARY_SIZES query as
 described in this table.
 <br>
 <br>
 Each entry in this array is used by the
 implementation as the location in memory
 where to copy the program binary for a
 specific device, if there is a binary
 available.  To find out which device the
 program binary in the array refers to, use
 the CL_PROGRAM_DEVICES query to get
 the list of devices. There is a one-to-one
 correspondence between the array of n
 pointers returned by
 CL_PROGRAM_BINARIES and array of
 devices returned by
 CL_PROGRAM_DEVICES.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"> </p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"> </p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_PROGRAM_NUM_KERNELS</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Returns the number of kernels
 declared in <em>program</em> that can be created with <strong>clCreateKernel</strong>. This
 information is only available after a successful program executable has
 been built for at least one device in the list of devices associated
 with <em>program</em>.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_PROGRAM_KERNEL_ NAMES</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">char[]</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Returns a semi-colon separated list
 of kernel names in <em>program</em> that can be created with <strong>clCreateKernel</strong>.
 This information is only available after a successful program executable
 has been built for at least one device in the list of devices associated
 with <em>program</em>.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"> </p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"> </p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_PROGRAM_SCOPE_ GLOBAL_CTORS_PRESENT</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_bool</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">This indicates that
 the <em>program</em> object contains non-trivial constructor(s) that will be
 executed by runtime before any kernel from the program is executed.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_PROGRAM_SCOPE_ GLOBAL_DTORS_PRESENT</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_bool</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">This indicates that
 the program object contains non-trivial destructor(s) that will be
 executed by runtime when <em>program</em> is
 destroyed. </p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p><strong>clGetProgramInfo</strong> returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>param_name</em> is not valid, or if size in bytes specified by
 <em>param_value_size</em> is &lt; size of return type as described in <em>table
 5.17_and _param_value</em> is not NULL.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_PROGRAM if
 <em>program</em> is a not a valid program object.
 </p>
 </li>
 <li>
 <p>

 CL_INVALID_PROGRAM_EXECUTABLE if <em>param_name</em> is CL_PROGRAM_NUM_KERNELS
 or CL_PROGRAM_KERNEL_NAMES and a successful program executable has not
 been built for at least one device in the list of devices associated
 with <em>program</em>.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clGetProgramBuildInfo(cl_program program,
                              cl_device_id device,
                              cl_program_build_info param_name,
                              size_t param_value_size,
                              void *param_value,
                              size_t *param_value_size_ret)</pre>
 </div></div>
 <div class="paragraph"><p>returns build information for each device in the program object.</p></div>
 <div class="paragraph"><p><em>program</em> specifies the program object being queried.</p></div>
 <div class="paragraph"><p><em>device</em> specifies the device for which build information is being
 queried. <em>device</em> must be a valid device associated with <em>program</em>.</p></div>
 <div class="paragraph"><p><em>param_name</em> specifies the information to query. The list of supported
 <em>param_name</em> types and the information returned in <em>param_value</em> by
 <strong>clGetProgramBuildInfo</strong> is described in <em>table 5.18</em>.</p></div>
 <div class="paragraph"><p><em>param_value</em> is a pointer to memory where the appropriate result being
 queried is returned. If <em>param_value</em> is NULL, it is ignored.</p></div>
 <div class="paragraph"><p><em>param_value_size</em> is used to specify the size in bytes of memory
 pointed to by <em>param_value</em>. This size must be &gt;= size of return type
 as described in <em>table 5.18.</em></p></div>
 <div class="paragraph"><p><em>param_value_size_ret</em> returns the actual size in bytes of data being
 queried by <em>param_name</em>. If <em>param_value_size_ret</em> is NULL, it is
 ignored.</p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <caption class="title">Table 24. <em>clGetProgramBuildInfo</em> <em>parameter queries.</em></caption>
 <col style="width:34%;">
 <col style="width:33%;">
 <col style="width:33%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>cl_program_build_info</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Return Type</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Info. returned in
 <em>param_value</em></strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_PROGRAM_BUILD_<br>
 STATUS</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_build_status</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Returns the build, compile or link status,
 whichever was performed last on program for
 device.
 <br>
 <br>
 This can be one of the following:
 <br>
 <br>
 CL_BUILD_NONE. The build status returned if
 no clBuildProgram, clCompileProgram or
 clLinkProgram has been performed on the
 specified program object for device.
 <br>
 <br>
 CL_BUILD_ERROR. The build status returned
 if clBuildProgram, clCompileProgram or
 clLinkProgram whichever was performed last
 on the specified program object for device
 generated an error.
 <br>
 <br>
 CL_BUILD_SUCCESS. The build status
 returned if clBuildProgram,
 clCompileProgram or clLinkProgram
 whichever was performed last on the specified
 program object for device was successful.
 <br>
 <br>
 CL_BUILD_IN_PROGRESS. The build status
 returned if clBuildProgram,
 clCompileProgram or clLinkProgram
 whichever was performed last on the specified
 program object for device has not finished.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_PROGRAM_BUILD_<br>
 OPTIONS</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">char[]</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return the build, compile or link options
 specified by the options argument in
 clBuildProgram, clCompileProgram or
 clLinkProgram, whichever was performed last
 on program for device.
 <br>
 <br>
 If build status of program for device is
 CL_BUILD_NONE, an empty string is returned.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_PROGRAM_BUILD_<br>
 LOG</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">char[]</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return the build or compile log for
 clBuildProgram or clCompileProgram
 whichever was performed last on program for
 device.
 <br>
 <br>
 If build status of program for device is
 CL_BUILD_NONE, an empty string is returned.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">CL_PROGRAM_BINARY_<br>
 TYPE</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_program_<br>
  binary_type</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return the program binary type for device.
 This can be one of the following values:
 <br>
 <br>
 CL_PROGRAM_BINARY_TYPE_NONE – There
 is no binary associated with device.
 <br>
 <br>
 CL_PROGRAM_BINARY_TYPE_
 COMPILED_OBJECT – A compiled binary is
 associated with device. This is the case if
 program was created using
 clCreateProgramWithSource and compiled
 using clCompileProgram or a compiled binary
 is loaded using clCreateProgramWithBinary.
 <br>
 <br>
 CL_PROGRAM_BINARY_TYPE_
 LIBRARY – A library binary is associated with
 device. This is the case if program was created
 by clLinkProgram which is called with the –
 create-library link option or if a library binary is
 loaded using clCreateProgramWithBinary.
 <br>
 <br>
 CL_PROGRAM_BINARY_TYPE_
 EXECUTABLE – An executable binary is
 associated with device. This is the case if
 program was created by clLinkProgram
 without the –create-library link option or
 program was created by clBuildProgram or an
 executable binary is loaded using
 clCreateProgramWithBinary.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_PROGRAM_BUILD_<br>
  GLOBAL_VARIABLE_<br>
  TOTAL_SIZE</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">The total amount of storage, in bytes, used by
 program variables in the global address space.</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p><strong>clGetProgramBuildInfo</strong> returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_DEVICE if
 <em>device</em> is not in the list of devices associated with <em>program</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>param_name</em> is not valid, or if size in bytes specified by
 <em>param_value_size</em> is &lt; size of return type as described in <em>table 5.18</em>
 and <em>param_value</em> is not NULL.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_PROGRAM if
 <em>program</em> is a not a valid program object.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>NOTE:</p></div>
 <div class="paragraph"><p>A program binary (compiled binary, library binary or executable binary)
 built for a parent device can be used by all its sub-devices. If a
 program binary has not been built for a sub-device, the program binary
 associated with the parent device will be used.</p></div>
 <div class="paragraph"><p>A program binary for a device specified with <strong>clCreateProgramWithBinary</strong>
 or queried using <strong>clGetProgramInfo</strong> can be used as the binary for the
 associated root device, and all sub-devices created from the root-level
 device or sub-devices thereof.</p></div>
 </div>
 </div>
 <div class="sect2">
 <h3 id="_kernel_objects">5.9. Kernel Objects</h3>
 <div class="paragraph"><p>A kernel is a function declared in a program. A kernel is identified
 by the <em>kernel qualifier applied to any function in a program. A
 kernel object encapsulates the specific </em>kernel function declared in a
 program and the argument values to be used when executing this __kernel
 function.</p></div>
 <div class="sect3">
 <h4 id="_creating_kernel_objects">5.9.1. Creating Kernel Objects</h4>
 <div class="paragraph"><p>To create a kernel object, use the function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_kernel clCreateKernel(cl_program program,
                          const char *kernel_name,
                          cl_int *errcode_ret)</pre>
 </div></div>
 <div class="paragraph"><p><em>program</em> is a program object with a successfully built executable.</p></div>
 <div class="paragraph"><p><em>kernel_name</em> is a function name in the program declared with the
 __kernel qualifier.</p></div>
 <div class="paragraph"><p><em>errcode_ret</em> will return an appropriate error code. If <em>errcode_ret</em>
 is NULL, no error code is returned.</p></div>
 <div class="paragraph"><p><strong>clCreateKernel</strong> returns a valid non-zero kernel object and
 <em>errcode_ret</em> is set to CL_SUCCESS if the kernel object is created
 successfully. Otherwise, it returns a NULL value with one of the
 following error values returned in <em>errcode_ret</em>:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_PROGRAM if
 _program_is not a valid program object.
 </p>
 </li>
 <li>
 <p>
   CL_INVALID_PROGRAM_EXECUTABLE if there is no successfully built
 executable for <em>program</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_KERNEL_NAME if
 <em>kernel_name</em> is not found in <em>program</em>.
 </p>
 </li>
 <li>
 <p>
   CL_INVALID_KERNEL_DEFINITION if the function definition for <em>_kernel
 function given by _kernel_name</em> such as the number of arguments, the
 argument types are not the same for all devices for which the <em>program</em>
 executable has been built.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>kernel_name</em> is NULL.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clCreateKernelsInProgram(cl_program program,
                                 cl_uint num_kernels,
                                 cl_kernel *kernels,
                                 cl_uint *num_kernels_ret)</pre>
 </div></div>
 <div class="paragraph"><p>creates kernel objects for all kernel functions in <em>program</em>. Kernel
 objects are not created for any <em>_kernel functions in _program</em> that do
 not have the same function definition across all devices for which a
 program executable has been successfully built.</p></div>
 <div class="paragraph"><p><em>program</em> is a program object with a successfully built executable.</p></div>
 <div class="paragraph"><p><em>num_kernels</em> is the size of memory pointed to by <em>kernels</em> specified as
 the number of cl_kernel entries.</p></div>
 <div class="paragraph"><p><em>kernels</em> is the buffer where the kernel objects for kernels in
 <em>program</em> will be returned. If <em>kernels</em> is NULL, it is ignored. If
 <em>kernels</em> is not NULL, <em>num_kernels</em> must be greater than or equal to
 the number of kernels in <em>program</em>.</p></div>
 <div class="paragraph"><p><em>num_kernels_ret</em> is the number of kernels in <em>program</em>. If
 <em>num_kernels_ret</em> is NULL, it is ignored.</p></div>
 <div class="paragraph"><p><strong>clCreateKernelsInProgram</strong> will return CL_SUCCESS if the kernel objects
 were successfully allocated. Otherwise, it returns one of the following
 errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_PROGRAM if
 <em>program</em> is not a valid program object.
 </p>
 </li>
 <li>
 <p>
      CL_INVALID_PROGRAM_EXECUTABLE if there is no successfully built
 executable for any device in <em>program</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>kernels</em> is not NULL and <em>num_kernels</em> is less than the number of
 kernels in <em>program</em>.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>Kernel objects can only be created once you have a program object with a
 valid program source or binary loaded into the program object and the
 program executable has been successfully built for one or more devices
 associated with program. No changes to the program executable are
 allowed while there are kernel objects associated with a program
 object. This means that calls to <strong>clBuildProgram</strong> and
 <strong>clCompileProgram</strong> return CL_INVALID_OPERATION if there are kernel
 objects attached to a program object. The OpenCL context associated
 with <em>program</em> will be the context associated with <em>kernel</em>. The list
 of devices associated with <em>program</em> are the devices associated with
 <em>kernel</em>. Devices associated with a program object for which a valid
 program executable has been built can be used to execute kernels
 declared in the program object.</p></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clRetainKernel(cl_kernel kernel)</pre>
 </div></div>
 <div class="paragraph"><p>increments the <em>kernel</em> reference count. <strong>clRetainKernel</strong> returns
 CL_SUCCESS if the function is executed successfully. Otherwise, it
 returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_KERNEL if
 <em>kernel</em> is not a valid kernel object.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p><strong>clCreateKernel</strong> or <strong>clCreateKernelsInProgram</strong> do an implicit retain.</p></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clReleaseKernel(cl_kernel kernel)</pre>
 </div></div>
 <div class="paragraph"><p>decrements the <em>kernel</em> reference count. <strong>clReleaseKernel</strong> returns
 CL_SUCCESS if the function is executed successfully. Otherwise, it
 returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_KERNEL if
 <em>kernel</em> is not a valid kernel object.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The kernel object is deleted once the number of instances that are
 retained to <em>kernel</em> become zero and the kernel object is no longer
 needed by any enqueued commands that use <em>kernel</em>. Using this function
 to release a reference that was not obtained by creating the object or
 by calling <strong>clRetainKernel</strong> causes undefined behavior.</p></div>
 </div>
 <div class="sect3">
 <h4 id="_setting_kernel_arguments">5.9.2. Setting Kernel Arguments</h4>
 <div class="paragraph"><p>To execute a kernel, the kernel arguments must be set.</p></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int  clSetKernelArg(cl_kernel kernel,
                        cl_uint arg_index,
                        size_t arg_size,
                        const void *arg_value)</pre>
 </div></div>
 <div class="paragraph"><p>is used to set the argument value for a specific argument of a kernel.</p></div>
 <div class="paragraph"><p><em>kernel</em> is a valid kernel object.</p></div>
 <div class="paragraph"><p><em>arg_index</em> is the argument index. Arguments to the kernel are referred
 by indices that go from 0 for the leftmost argument to <em>n</em> - 1, where
 <em>n</em> is the total number of arguments declared by a kernel.</p></div>
 <div class="paragraph"><p>For example, consider the following kernel:</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre> kernel void image_filter (int n,
                            int m,
                            constant float *filter_weights,
                            read_only image2d_t src_image,
                            write_only image2d_t dst_image)
  {
  ...
  }</pre>
 </div></div>
 <div class="paragraph"><p>Argument index values for image_filter will be 0 for n, 1 for m, 2 for
 filter_weights, 3 for src_image and 4 for dst_image.</p></div>
 <div class="paragraph"><p><em>arg_value</em> is a pointer to data that should be used as the argument
 value for argument specified by <em>arg_index</em>. The argument data pointed
 to by_arg_value_ is copied and the <em>arg_value</em> pointer can therefore be
 reused by the application after <strong>clSetKernelArg</strong> returns. The argument
 value specified is the value used by all API calls that enqueue <em>kernel</em>
 (<strong>clEnqueueNDRangeKernel</strong>) until the argument value is changed by a call
 to <strong>clSetKernelArg</strong> for <em>kernel</em>.</p></div>
 <div class="paragraph"><p>If the argument is a memory object (buffer, pipe, image or image array),
 the <em>arg_value</em> entry will be a pointer to the appropriate buffer, pipe,
 image or image array object. The memory object must be created with the
 context associated with the kernel object. If the argument is a buffer
 object, the <em>arg_value</em> pointer can be NULL or point to a NULL value in
 which case a NULL value will be used as the value for the argument
 declared as a pointer to global or constant memory in the kernel. If
 the argument is declared with the local qualifier, the <em>arg_value</em> entry
 must be NULL. If the argument is of type <em>sampler_t</em>, the <em>arg_value</em>
 entry must be a pointer to the sampler object. If the argument is of
 type <em>queue_t</em>, the <em>arg_value</em> entry must be a pointer to the device
 queue object.</p></div>
 <div class="paragraph"><p>If the argument is declared to be a pointer of a built-in scalar or
 vector type, or a user defined structure type in the global or constant
 address space, the memory object specified as argument value must be a
 buffer object (or NULL). If the argument is declared with the constant
 qualifier, the size in bytes of the memory object cannot exceed
 CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE and the number of arguments declared
 as pointers to <em>constant</em> memory cannot exceed
 CL_DEVICE_MAX_CONSTANT_ARGS.</p></div>
 <div class="paragraph"><p>The memory object specified as argument value must be a pipe object if
 the argument is declared with the <em>pipe</em> qualifier.</p></div>
 <div class="paragraph"><p>The memory object specified as argument value must be a 2D image object
 if the argument is declared to be of type <em>image2d_t</em>. The memory
 object specified as argument value must be a 2D image object with image
 channel order = CL_DEPTH if the argument is declared to be of type
 <em>image2d_depth_t</em>. The memory object specified as argument value must
 be a 3D image object if argument is declared to be of type <em>image3d_t</em>.
 The memory object specified as argument value must be a 1D image object
 if the argument is declared to be of type <em>image1d_t</em>. The memory
 object specified as argument value must be a 1D image buffer object if
 the argument is declared to be of type <em>image1d_buffer_t</em>. The memory
 object specified as argument value must be a 1D image array object if
 argument is declared to be of type <em>image1d_array_t</em>. The memory object
 specified as argument value must be a 2D image array object if argument
 is declared to be of type <em>image2d_array_t</em>. The memory object
 specified as argument value must be a 2D image array object with image
 channel order = CL_DEPTH if argument is declared to be of type
 <em>image2d_array_depth_t</em>.</p></div>
 <div class="paragraph"><p>For all other kernel arguments, the <em>arg_value</em> entry must be a pointer
 to the actual data to be used as argument value.</p></div>
 <div class="paragraph"><p><em>arg_size</em> specifies the size of the argument value. If the argument is
 a memory object, the size is the size of the memory object. For
 arguments declared with the local qualifier, the size specified will be
 the size in bytes of the buffer that must be allocated for the local
 argument. If the argument is of type <em>sampler_t</em>, the <em>arg_size</em> value
 must be equal to sizeof(cl_sampler). If the argument is of type
 <em>queue_t</em>, the <em>arg_size</em> value must be equal to
 sizeof(cl_command_queue). For all other arguments, the size will be the
 size of argument type.</p></div>
 <div class="admonitionblock">
 <table><tr>
 <td class="icon">
 <div class="title">Note</div>
 </td>
 <td class="content">A kernel object does not update the reference count for objects
 such as memory, sampler objects specified as argument values by
 <strong>clSetKernelArg</strong>, Users may not rely on a kernel object to retain
 objects specified as argument values to the kernel<span class="footnote"><br>[Implementations shall not allow cl_kernel objects to hold reference counts to  cl_kernel arguments, because no
 mechanism is provided for the user to tell the kernel to release that ownership right.  If the kernel holds ownership rights on kernel args, that would make it impossible for the user to tell with certainty when he may safel y release
 user allocated resources associated with OpenCL objects such as the cl_mem backing store used with
 CL_MEM_USE_HOST_PTR.]<br></span>:.</td>
 </tr></table>
 </div>
 <div class="paragraph"><p><strong>clSetKernelArg</strong> returns CL_SUCCESS if the function was executed
 successfully. Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_KERNEL if
 <em>kernel</em> is not a valid kernel object.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_ARG_INDEX if
 <em>arg_index</em> is not a valid argument index.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_ARG_VALUE if
 <em>arg_value</em> specified is not a valid value.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_MEM_OBJECT for
 an argument declared to be a memory object when the specified
 <em>arg_value</em> is not a valid memory object.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_SAMPLER for an
 argument declared to be of type <em>sampler_t</em> when the specified
 <em>arg_value</em> is not a valid sampler object.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_DEVICE_QUEUE
 for an argument declared to be of type <em>queue_t</em> when the specified
 <em>arg_value</em> is not a valid device queue object.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_ARG_SIZE if
 <em>arg_size</em> does not match the size of the data type for an argument that
 is not a memory object or if the argument is a memory object and
 <em>arg_size</em> != sizeof(cl_mem) or if <em>arg_size</em> is zero and the argument
 is declared with the local qualifier or if the argument is a sampler and
 <em>arg_size</em> != sizeof(cl_sampler).
 </p>
 </li>
 <li>
 <p>
      CL_MAX_SIZE_RESTRICTION_EXCEEDED if the size in bytes of the memory
 object (if the argument was declared with constant qualifier) or
 <em>arg_size</em> (if the argument was declared with local qualifier) exceed
 the maximum size restriction that was set with
 the optional language attribute. The optional attribute can be
 cl::max_size defined in OpenCL 2.2 C++ Kernel Languange specification or
 SpvDecorationMaxByteOffset defined in SPIR-V 1.2 Specification.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_ARG_VALUE if
 the argument is an image declared with the read_only qualifier and
 <em>arg_value</em> refers to an image object created with <em>cl_mem_flags</em> of
 CL_MEM_WRITE or if the image argument is declared with the write_only
 qualifier and <em>arg_value</em> refers to an image object created with
 <em>cl_mem_flags</em> of CL_MEM_READ.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int  clSetKernelArgSVMPointer(cl_kernel kernel,
                                  cl_uint arg_index,
                                  const void *arg_value)</pre>
 </div></div>
 <div class="paragraph"><p>is used to set a SVM pointer as the argument value for a specific
 argument of a kernel.</p></div>
 <div class="paragraph"><p><em>kernel</em> is a valid kernel object.</p></div>
 <div class="paragraph"><p><em>arg_index</em> is the argument index. Arguments to the kernel are referred
 by indices that go from 0 for the leftmost argument to <em>n</em> - 1, where
 <em>n</em> is the total number of arguments declared by a kernel.</p></div>
 <div class="paragraph"><p><em>arg_value</em> is the SVM pointer that should be used as the argument value
 for argument specified by <em>arg_index</em>. The SVM pointer specified is the
 value used by all API calls that enqueue <em>kernel</em>
 (<strong>clEnqueueNDRangeKernel</strong>) until the argument value is changed by a call
 to <strong>clSetKernelArgSVMPointer</strong> for <em>kernel</em>. The SVM pointer can only be
 used for arguments that are declared to be a pointer to global or
 constant memory. The SVM pointer value must be aligned according to the
 arguments type. For example, if the argument is declared to be global
 float4 p, the SVM pointer value passed for p must be at a minimum
 aligned to a float4. The SVM pointer value specified as the argument
 value can be the pointer returned by <strong>clSVMAlloc</strong> or can be a pointer offset into the SVM region.</p></div>
 <div class="paragraph"><p><strong>clSetKernelArgSVMPointer</strong> returns CL_SUCCESS if the function was
 executed successfully. Otherwise, it returns one of the following
 errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_KERNEL if
 <em>kernel</em> is not a valid kernel object.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_ARG_INDEX if
 <em>arg_index</em> is not a valid argument index.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_ARG_VALUE if
 <em>arg_value</em> specified is not a valid value.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int  clSetKernelExecInfo(cl_kernel kernel,
                             cl_kernel_exec_info param_name,
                             size_t param_value_size,
                             const void *param_value)</pre>
 </div></div>
 <div class="paragraph"><p>can be used to pass additional information other than argument values to
 a kernel.</p></div>
 <div class="paragraph"><p><em>kernel</em> specifies the kernel object being queried.</p></div>
 <div class="paragraph"><p><em>param_name</em> specifies the information to be passed to kernel. The list
 of supported <em>param_name</em> types and the corresponding values passed in
 <em>param_value</em> is described in <em>table 5.19</em>.</p></div>
 <div class="paragraph"><p><em>param_value_size</em> specifies the size in bytes of the memory pointed to
 by <em>param_value</em>.</p></div>
 <div class="paragraph"><p><em>param_value</em> is a pointer to memory where the appropriate values
 determined by <em>param_name</em> are specified.</p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <caption class="title">Table 25. <em>clSetKernelExecInfo</em> <em>parameter values.</em></caption>
 <col style="width:34%;">
 <col style="width:33%;">
 <col style="width:33%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>cl_kernel_exec_info</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Type</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Description</strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_KERNEL_EXEC_INFO_ SVM_PTRS</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">void *[]</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">SVM pointers must reference
 locations contained entirely within
 buffers that are passed to kernel as
 arguments, or that are passed through
 the execution information.
 <br>
 <br>
 Non-argument SVM buffers must be
 specified by passing pointers to those
 buffers via clSetKernelExecInfo for
 coarse-grain and fine-grain buffer
 SVM allocations but not for finegrain system SVM allocations.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_KERNEL_EXEC_INFO_ SVM_FINE_GRAIN_SYSTEM</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_bool</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">This flag
 indicates whether the kernel uses pointers that are fine grain system
 SVM allocations. These fine grain system SVM pointers may be passed as
 arguments or defined in SVM buffers that are passed as arguments to
 <em>kernel</em>.</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p><strong>clSetKernelExecInfo</strong> returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_KERNEL if
 <em>kernel</em> is a not a valid kernel object.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>param_name</em> is not valid, if <em>param_value</em> is NULL or if the size
 specified by <em>param_value_size</em> is not valid.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_OPERATION if
 <em>param_name</em> = CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM and
 <em>param_value</em> = CL_TRUE but no devices in context associated with
 <em>kernel</em> support fine-grain system SVM allocations.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>NOTES</p></div>
 <div class="paragraph"><p>Coarse-grain or fine-grain buffer SVM pointers used by a kernel
 which are not passed as a kernel arguments must be specified using
 <strong>clSetKernelExecInfo</strong> with CL_KERNEL_EXEC_INFO_SVM_PTRS. For example,
 if SVM buffer A contains a pointer to another SVM buffer B, and the
 kernel dereferences that pointer, then a pointer to B must either be
 passed as an argument in the call to that kernel or it must be made
 available to the kernel using <strong>clSetKernelExecInfo</strong>. For example, we
 might pass extra SVM pointers as follows:
  </p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>clSetKernelExecInfo(kernel,
                     CL_KERNEL_EXEC_INFO_SVM_PTRS,
                     num_ptrs * sizeof(void *),
                     extra_svm_ptr_list);</pre>
 </div></div>
 <div class="paragraph"><p>Here num_ptrs specifies the number of additional SVM pointers while
 extra_svm_ptr_list specifies a pointer to memory containing those SVM
 pointers.</p></div>
 <div class="paragraph"><p>When calling <strong>clSetKernelExecInfo</strong> with CL_KERNEL_EXEC_INFO_SVM_PTRS to
 specify pointers to non-argument SVM buffers as extra arguments to a
 kernel, each of these pointers can be the SVM pointer returned by
 <strong>clSVMAlloc</strong> or can be a pointer + offset into the SVM region. It is
 sufficient to provide one pointer for each SVM buffer used.</p></div>
 <div class="paragraph"><p>CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM is used to indicate
 whether SVM pointers used by a kernel will refer to system allocations
 or not.</p></div>
 <div class="paragraph"><p>CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM = CL_FALSE indicates that the
 OpenCL implementation may assume that system pointers are not passed as
 kernel arguments and are not stored inside SVM allocations passed as
 kernel arguments.</p></div>
 <div class="paragraph"><p>CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM = CL_TRUE indicates that the
 OpenCL implementation must assume that system pointers might be passed
 as kernel arguments and/or stored inside SVM allocations passed as
 kernel arguments. In this case, if the device to which the kernel is
 enqueued does not support system SVM pointers, <strong>clEnqueueNDRangeKernel</strong>
 will return a CL_INVALID_OPERATION error. If none of the devices in the
 context associated with kernel support fine-grain system SVM
 allocations, <strong>clSetKernelExecInfo</strong> will return a CL_INVALID_OPERATION
 error.</p></div>
 <div class="paragraph"><p>If <strong>clSetKernelExecInfo</strong> has not been called with a value for
 <strong>CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM</strong>, the default value is used
 for this kernel attribute. The default value depends on whether the
 device on which the kernel is enqueued supports fine-grain system SVM
 allocations. If so, the default value used is CL_TRUE (system pointers
 might be passed); otherwise, the default is CL_FALSE.</p></div>
 <div class="paragraph"><p>A call to <strong>clSetKernelExecInfo</strong> for a given value of <em>param_name</em>
 replaces any prior value passed for that value of <em>param_name</em>. Only one
 <em>param_value</em> will be stored for each value of <em>param_name</em>.
  </p></div>
 </div>
 <div class="sect3">
 <h4 id="_copying_kernel_objects">5.9.3. Copying Kernel Objects</h4>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_kernel clCloneKernel(cl_kernel source_kernel,
                         cl_int *errcode_ret)</pre>
 </div></div>
 <div class="paragraph"><p>is used to make a shallow copy of the kernel object, its arguments and
 any information passed to the kernel object using <strong>clSetKernelExecInfo</strong>.
 If the kernel object was ready to be enqueued before copying it, the
 clone of the kernel object is ready to enqueue.</p></div>
 <div class="paragraph"><p><em>source_kernel</em> is a valid cl_kernel object that will be copied.
 _source_kernel_will not be modified in any way by this function.</p></div>
 <div class="paragraph"><p><em>errcode_ret</em> will be assigned an appropriate error code. If
 _errcode_ret_is NULL, no error code is returned.</p></div>
 <div class="paragraph"><p><strong>clCloneKernel</strong> returns a valid non-zero kernel object and <em>errcode_ret</em>
 is set to CL_SUCCESS if the kernel is successfully copied. Otherwise it
 returns a NULL value with one of the following error values returned in
 <em>errcode_ret</em>:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_KERNEL if
 <em>kernel</em> is not a valid kernel object.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The returned kernel object is an exact copy of <em>source_kernel</em>, with one
 caveat: the reference count on the returned kernel object is set as if
 it had been returned by <strong>clCreateKernel</strong>. The reference count of
 <em>source_kernel will</em> not be changed.</p></div>
 <div class="paragraph"><p>The resulting kernel will be in the same state as if <strong>clCreateKernel</strong> is
 called to create the resultant kernel with the same arguments as those
 used to create <em>source_kernel</em>, the latest call to <strong>clSetKernelArg</strong> or
 <strong>clSetKernelArgSVMPointer</strong> for each argument index applied to kernel and
 the last call to <strong>clSetKernelExecInfo</strong> for each value of the param name
 parameter are applied to the new kernel object.</p></div>
 <div class="paragraph"><p>All arguments of the new kernel object must be intact and it may be
 correctly used in the same situations as kernel except those that assume
 a pre-existing reference count. Setting arguments on the new kernel
 object will not affect <em>source_kernel</em> except insofar as the argument
 points to a shared underlying entity and in that situation behavior is
 as if two kernel objects had been created and the same argument applied
 to each. Only the data stored in the kernel object is copied; data
 referenced by the kernels arguments are not copied. For example, if a
 buffer or pointer argument is set on a kernel object, the pointer is
 copied but the underlying memory allocation is not.</p></div>
 </div>
 <div class="sect3">
 <h4 id="_kernel_object_queries">5.9.4. Kernel Object Queries</h4>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clGetKernelInfo*(cl_kernel kernel,
                         cl_kernel_info param_name,
                         size_t param_value_size,
                         void *param_value,
                         size_t *param_value_size_ret)</pre>
 </div></div>
 <div class="paragraph"><p>returns information about the kernel object.</p></div>
 <div class="paragraph"><p><em>kernel</em> specifies the kernel object being queried.</p></div>
 <div class="paragraph"><p><em>param_name</em> specifies the information to query. The list of supported
 <em>param_name</em> types and the information returned in <em>param_value</em> by
 <strong>clGetKernelInfo</strong> is described in <em>table 5.20</em>.</p></div>
 <div class="paragraph"><p><em>param_value</em> is a pointer to memory where the appropriate result being
 queried is returned. If <em>param_value</em> is NULL, it is ignored.</p></div>
 <div class="paragraph"><p><em>param_value_size</em> is used to specify the size in bytes of memory
 pointed to by <em>param_value</em>. This size must be &gt;= size of return type
 as described in <em>table 5.20</em>.</p></div>
 <div class="paragraph"><p><em>param_value_size_ret</em> returns the actual size in bytes of data being
 queried by <em>param_name</em>. If <em>param_value_size_ret</em> is NULL, it is
 ignored.
  </p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <caption class="title">Table 26. <em>clGetKernelInfo</em> <em>parameter queries.</em></caption>
 <col style="width:34%;">
 <col style="width:33%;">
 <col style="width:33%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>cl_kernel_info</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Return Type</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Info. returned in <em>param_value</em></strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_KERNEL_FUNCTION_NAME</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">char[]</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return the kernel function name.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_KERNEL_NUM_ARGS</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return the number of arguments to
 kernel.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">*CL_KERNEL_REFERENCE_ COUNT*<span class="footnote"><br>[The reference count returned should be considered immediately stale. It is unsuitable for general use in
 applications. This feature is provided for identifying memory leaks. ]<br></span>:</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return the <em>kernel</em> reference
 count.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_KERNEL_CONTEXT</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_context</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return the context associated with
 <em>kernel</em>.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_KERNEL_PROGRAM</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_program</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return the program object associated
 with kernel.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_KERNEL_ATTRIBUTES</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">char[]</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Returns any attributes specified using
 the <em>attribute</em> OpenCL Cqualifier  (or using an OpenCL C++ qualifier syntax [[]] ) with the
 kernel function declaration in the
 program source. These attributes
 include attributes described in the
 earlier OpenCL C kernel language
 specifications and other attributes
 supported by an implementation.
 <br>
 <br>
 Attributes are returned as they were
 declared inside <em>attribute</em>&#8230;,
 with any surrounding whitespace and
 embedded newlines removed. When
 multiple attributes are present, they
 are returned as a single, space
 delimited string.
 <br>
 <br>
 For kernels not created from OpenCL
 C source and the
 clCreateProgramWithSource API
 call the string returned from this
 query will be empty.</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p><strong>clGetKernelInfo</strong> returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>param_name</em> is not valid, or if size in bytes specified by
 <em>param_value_size</em> is &lt; size of return type as described in <em>table
 5.20_and _param_value</em> is not NULL.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_KERNEL if
 <em>kernel</em> is a not a valid kernel object.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clGetKernelWorkGroupInfo(cl_kernel kernel,
                                 cl_device_id device,
                                 cl_kernel_work_group_info param_name,
                                 size_t param_value_size,
                                 void *param_value,
                                 size_t *param_value_size_ret)</pre>
 </div></div>
 <div class="paragraph"><p>returns information about the kernel object that may be specific to a
 device.</p></div>
 <div class="paragraph"><p><em>kernel</em> specifies the kernel object being queried.</p></div>
 <div class="paragraph"><p><em>device</em> identifies a specific device in the list of devices associated
 with <em>kernel</em>. The list of devices is the list of devices in the OpenCL
 context that is associated with <em>kernel</em>. If the list of devices
 associated with <em>kernel</em> is a single device, <em>device</em> can be a NULL
 value.</p></div>
 <div class="paragraph"><p><em>param_name</em> specifies the information to query. The list of supported
 <em>param_name</em> types and the information returned in <em>param_value</em> by
 <strong>clGetKernelWorkGroupInfo</strong> is described in <em>table 5.21</em>.</p></div>
 <div class="paragraph"><p><em>param_value</em> is a pointer to memory where the appropriate result being
 queried is returned. If <em>param_value</em> is NULL, it is ignored.</p></div>
 <div class="paragraph"><p><em>param_value_size</em> is used to specify the size in bytes of memory
 pointed to by <em>param_value</em>. This size must be &gt;= size of return type
 as described in <em>table 5.21</em>.</p></div>
 <div class="paragraph"><p><em>param_value_size_ret</em> returns the actual size in bytes of data being
 queried by <em>param_name</em>. If <em>param_value_size_ret</em> is NULL, it is
 ignored.</p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <caption class="title">Table 27. <em>clGetKernelWorkGroupInfo</em> <em>parameter queries.</em></caption>
 <col style="width:34%;">
 <col style="width:33%;">
 <col style="width:33%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>cl_kernel_work_group_info</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Return Type</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Info. returned in
 <em>param_value</em></strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_KERNEL_GLOBAL_ WORK_SIZE</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t[3]</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">This provides a mechanism for the
 application to query the maximum global size that can be used to execute a kernel
 (i.e. global_work_size argument to
 clEnqueueNDRangeKernel) on a custom
 device given by device or a built-in kernel
 on an OpenCL device given by device.
 <br>
 <br>
 If device is not a custom device and kernel
 is not a built-in kernel,
 clGetKernelWorkGroupInfo returns the
 error CL_INVALID_VALUE.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_KERNEL_WORK_ GROUP_SIZE</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">This provides a mechanism for the
 application to query the maximum workgroup size that can be used to execute the
 kernel on a specific device given by
 device. The OpenCL implementation uses
 the resource requirements of the kernel
 (register usage etc.) to determine what this
 work-group size should be.
 <br>
 <br>
 As a result and unlike
 CL_DEVICE_MAX_WORK_GROUP_
 SIZE this value may vary from one kernel
 to another as well as one device to
 another.
 <br>
 <br>
 CL_KERNEL_WORK_GROUP_SIZE
 will be less than or equal to
 CL_DEVICE_MAX_WORK_GROUP_SI
 ZE for a given kernel object.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_KERNEL_COMPILE_<br>
  WORK_GROUP_SIZE</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t[3]</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Returns the work-group size specified in
 the kernel source or IL.
 <br>
 <br>
 If the work-group size is not specified in
 the kernel source or IL, (0, 0, 0) is
 returned.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_KERNEL_LOCAL_<br>
 MEM_SIZE</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_ulong</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Returns the amount of local memory in
 bytes being used by a kernel. This
 includes local memory that may be needed
 by an implementation to execute the
 kernel, variables declared inside the kernel
 with the <em>local address qualifier and
 local memory to be allocated for
 arguments to the kernel declared as
 pointers with the </em>local address
 qualifier and whose size is specified with
 clSetKernelArg.
 <br>
 <br>
 If the local memory size, for any pointer
 argument to the kernel declared with the
 __local address qualifier, is not
 specified, its size is assumed to be 0.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_KERNEL_PREFERRED_<br>
 WORK_GROUP_SIZE_MULTIPLE</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Returns the preferred multiple of
 work-group size for launch. This is a performance hint. Specifying a
 work-group size that is not a multiple of the value returned by this
 query as the value of the local work size argument to
 <strong>clEnqueueNDRangeKernel</strong> will not fail to enqueue the kernel for
 execution unless the work-group size specified is larger than the device
 maximum.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_KERNEL_PRIVATE_<br>
 MEM_SIZE</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_ulong</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Returns the minimum amount of
 private memory, in bytes, used by each work-item in the kernel. This
 value may include any private memory needed by an implementation to
 execute the kernel, including that used by the language built-ins and
 variable declared inside the kernel with the __private qualifier.</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p><strong>clGetKernelWorkGroupInfo</strong> returns CL_SUCCESS if the function is
 executed successfully. Otherwise, it returns one of the following
 errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_DEVICE if
 <em>device</em> is not in the list of devices associated with <em>kernel</em> or if
 <em>device</em> is NULL but there is more than one device associated with
 <em>kernel</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>param_name</em> is not valid, or if size in bytes specified by
 <em>param_value_size</em> is &lt; size of return type as described in <em>table
 5.21_and _param_value</em> is not NULL.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>param_name</em> is CL_KERNEL_GLOBAL_WORK_SIZE and <em>device</em> is not a custom
 device and <em>kernel</em> is not a built-in kernel.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_KERNEL if
 <em>kernel</em> is a not a valid kernel object.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clGetKernelSubGroupInfo(cl_kernel kernel,
                                cl_device_id device,
                                cl_kernel_sub_group_info param_name,
                                size_t input_value_size,
                                const void *input_value,
                                size_t param_value_size,
                                void *param_value,
                                size_t *param_value_size_ret)</pre>
 </div></div>
 <div class="paragraph"><p>returns information about the kernel object.</p></div>
 <div class="paragraph"><p><em>kernel</em> specifies the kernel object being queried.</p></div>
 <div class="paragraph"><p><em>device</em> identifies a specific device in the list of devices associated
 with <em>kernel</em>. The list of devices is the list of devices in the OpenCL
 context that is associated with <em>kernel</em>. If the list of devices
 associated with <em>kernel</em> is a single device, <em>device</em> can be a NULL
 value.</p></div>
 <div class="paragraph"><p><em>param_name</em> specifies the information to query. The list of supported
 <em>param_name</em> types and the information returned in <em>param_value</em> by
 <strong>clGetKernelSubGroupInfo</strong> is described in <em>table 5.22</em>.</p></div>
 <div class="paragraph"><p><em>input_value_size</em> is used to specify the size in bytes of memory
 pointed to by <em>input_value</em>. This size must be == size of input type as
 described in the table below.</p></div>
 <div class="paragraph"><p><em>input_value</em> is a pointer to memory where the appropriate
 parameterization of the query is passed from. If <em>input_value</em> is
 NULL, it is ignored.</p></div>
 <div class="paragraph"><p><em>param_value</em> is a pointer to memory where the appropriate result being
 queried is returned. If <em>param_value</em> is NULL, it is ignored.</p></div>
 <div class="paragraph"><p><em>param_value_size</em> is used to specify the size in bytes of memory
 pointed to by <em>param_value</em>. This size must be &gt;= size of return type
 as described in <em>table 5.22</em>.</p></div>
 <div class="paragraph"><p><em>param_value_size_ret</em> returns the actual size in bytes of data being
 queried by <em>param_name</em>. If <em>param_value_size_ret</em> is NULL, it is
 ignored.</p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <caption class="title">Table 28. <em>clGetKernelSubGroupInfo</em> <em>parameter queries.</em></caption>
 <col style="width:25%;">
 <col style="width:25%;">
 <col style="width:25%;">
 <col style="width:25%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>cl_kernel_sub_group_info</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Input Type</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Return Type</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Info.
 returned in <em>param_value</em></strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_KERNEL_MAX_ SUB_GROUP_SIZE_ FOR_NDRANGE</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t *</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Returns the maximum sub-group size
 for this kernel.  All sub-groups must
 be the same size, while the last subgroup in any work-group (i.e. the subgroup with the maximum index) could
 be the same or smaller size.
 <br>
 <br>
 The input_value must be an array of
 size_t values corresponding to the
 local work size parameter of the
 intended dispatch. The number of
 dimensions in the ND-range will be
 inferred from the value specified for
 input_value_size.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_KERNEL_SUB_ GROUP_COUNT_ FOR_NDRANGE</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t *</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Returns the number of sub-groups that
 will be present in each work-group for
 a given local work size.  All workgroups, apart from the last work-group
 in each dimension in the presence of
 non-uniform work-group sizes, will
 have the same number of sub-groups.
 <br>
 <br>
 The input_value must be an array of
 size_t values corresponding to the
 local work size parameter of the
 intended dispatch. The number of
 dimensions in the ND-range will be
 inferred from the value specified for
 input_value_size.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_KERNEL_LOCAL_ SIZE_FOR_SUB_ GROUP_COUNT</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t[]</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Returns
 the local size that will generate the requested number of sub-groups for
 the kernel. The output array must be an array of size_t values
 corresponding to the local size parameter. Any returned work-group will
 have one dimension. Other dimensions inferred from the value specified
 for param_value_size will be filled with the value 1. The returned value
 will produce an exact number of sub-groups and result in no partial
 groups for an executing kernel except in the case where the last
 work-group in a dimension has a size different from that of the other
 groups. If no work-group size can accommodate the requested number of
 sub-groups, 0 will be returned in each element of the return array.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_KERNEL_MAX_ NUM_SUB_GROUPS</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">ignored</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">This provides a
 mechanism for the application to query the maximum number of sub-groups
 that may make up each work-group to execute a kernel on a specific device
 given by device. The OpenCL implementation uses the resource
 requirements of the kernel (register usage etc.) to determine what this
 work-group size should be. The returned value may be used to compute a
 work-group size to enqueue the kernel with to give a round number of
 sub-groups for an enqueue.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_KERNEL_ COMPILE_NUM_ SUB_GROUPS</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">ignored</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Returns the number
 of sub-groups specified in the kernel source or IL. If the sub-group
 count is not specified using the above attribute then 0 is returned.</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p><strong>clGetKernelSubGroupInfo</strong> returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_DEVICE if
 <em>device</em> is not in the list of devices associated with <em>kernel</em> or if
 <em>device</em> is NULL but there is more than one device associated with
 <em>kernel</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>param_name</em> is not valid, or if size in bytes specified by
 <em>param_value_size</em> is &lt; size of return type as described in <em>table 5.22</em>
 and <em>param_value</em> is not NULL.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>param_name</em> is CL_KERNEL_SUB_GROUP_SIZE_FOR_NDRANGE and the size in
 bytes specified by <em>input_value_size</em> is not valid or if <em>input_value</em>
 is NULL.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_KERNEL if
 <em>kernel</em> is a not a valid kernel object.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clGetKernelArgInfo(cl_kernel kernel,
                           cl_uint arg_indx,
                           cl_kernel_arg_info param_name,
                           size_t param_value_size,
                           void *param_value,
                           size_t *param_value_size_ret)</pre>
 </div></div>
 <div class="paragraph"><p>returns information about the arguments of a kernel.</p></div>
 <div class="paragraph"><p>Kernel argument information is only available if the program object
 associated with <em>kernel</em> is created with <strong>clCreateProgramWithSource</strong> and
 the program executable was built with the -cl-kernel-arg-info option
 specified in options argument to <strong>clBuildProgram</strong> or <strong>clCompileProgram</strong>.</p></div>
 <div class="paragraph"><p><em>kernel</em> specifies the kernel object being queried.</p></div>
 <div class="paragraph"><p><em>arg_indx</em> is the argument index. Arguments to the kernel are referred
 by indices that go from 0 for the leftmost argument to <em>n</em> - 1, where
 <em>n</em> is the total number of arguments declared by a kernel.</p></div>
 <div class="paragraph"><p><em>param_name</em> specifies the argument information to query. The list of
 supported <em>param_name</em> types and the information returned in
 <em>param_value_by <strong>clGetKernelArgInfo</strong> is described in _table__5.23</em>.</p></div>
 <div class="paragraph"><p><em>param_value</em> is a pointer to memory where the appropriate result being
 queried is returned. If <em>param_value</em> is NULL, it is ignored.</p></div>
 <div class="paragraph"><p><em>param_value_size</em> is used to specify the size in bytes of memory
 pointed to by <em>param_value</em>. This size must be &gt; size of return type as
 described in <em>table__5.23</em>.</p></div>
 <div class="paragraph"><p><em>param_value_size ret</em> returns the actual size in bytes of data being
 queried by <em>param_name</em>. If <em>param_value_size_ret</em> is NULL, it is
 ignored.</p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <caption class="title">Table 29. <em>clGetKernelArgInfo</em> <em>parameter queries.</em></caption>
 <col style="width:34%;">
 <col style="width:33%;">
 <col style="width:33%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>cl_kernel_arg_info</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Return Type</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Info. returned in <em>param_value</em></strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_KERNEL_ARG_<br>
  ADDRESS_QUALIFIER</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_kernel_arg_<br>
  address_qualifier</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Returns the address qualifier specified for the
 argument given by arg_indx. This can be one of the
 following values:
 CL_KERNEL_ARG_ADDRESS_ GLOBAL
 CL_KERNEL_ARG_ADDRESS_ LOCAL
 CL_KERNEL_ARG_ADDRESS_ CONSTANT
 CL_KERNEL_ARG_ADDRESS_ PRIVATE
 <br>
 <br>
 If no address qualifier is specified, the default
 address qualifier which is
 CL_KERNEL_ARG_ADDRESS_PRIVATE is
 returned.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_KERNEL_ARG_<br>
  ACCESS_QUALIFIER</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_kernel_arg_<br>
  access_qualifier</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Returns the access qualifier specified for the
 argument given by arg_indx. This can be one of the
 following values:
 CL_KERNEL_ARG_ACCESS_ READ_ONLY
 CL_KERNEL_ARG_ACCESS_ WRITE_ONLY
 CL_KERNEL_ARG_ACCESS_ READ_WRITE
 CL_KERNEL_ARG_ACCESS_ NONE
 <br>
 <br>
 If argument is not an image type and is not declared
 with the pipe qualifier,
 CL_KERNEL_ARG_ACCESS_NONE is returned. If
 argument is an image type, the access qualifier
 specified or the default access qualifier is returned.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_KERNEL_ARG_TYPE_<br>
  NAME</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">char[]</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Returns the type name specified for the argument given
 by <em>arg_indx</em>. The type name returned will be the argument type name as
 it was declared with any whitespace removed. If argument type name is
 an unsigned scalar type (i.e. unsigned char, unsigned short, unsigned
 int, unsigned long), uchar, ushort, uint and ulong will be returned.
 The argument type name returned does not include any type qualifiers.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_KERNEL_ARG_TYPE_<br>
  QUALIFIER</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_kernel_arg_<br>
  type_qualifier</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Returns the type qualifier specified for the argument
 given by arg_indx.  The returned value can be:
 CL_KERNEL_ARG_TYPE_ CONST
 CL_KERNEL_ARG_TYPE_ RESTRICT
 CL_KERNEL_ARG_TYPE_ VOLATILE, a
 combination of the above enums,
 CL_KERNEL_ARG_TYPE_PIPE or
 CL_KERNEL_ARG_TYPE_NONE
 <br>
 <br>
 NOTE: CL_KERNEL_ARG_ TYPE_VOLATILE is
 returned if the argument is a pointer and the
 referenced type is declared with the volatile
 qualifier.  For example, a kernel argument declared
 as global int volatile *x returns
 CL_KERNEL_ARG_TYPE_ VOLATILE but
 a kernel argument declared as global int *
 volatile x does not.  Similarly,
 CL_KERNEL_ARG_TYPE_CONST is returned if the
 argument is a pointer and the referenced type is
 declared with the restrict or const qualifier.  For
 example, a kernel argument declared as global
 int const *x returns
 CL_KERNEL_ARG_TYPE_CONST but
 a kernel argument declared as global int *
 const x does not.
 CL_KERNEL_ARG_TYPE_ RESTRICT will be returned
 if the pointer type is marked restrict. For example,
 global int * restrict x returns
 CL_KERNEL_ARG_TYPE_ RESTRICT.
 <br>
 <br>
 If the argument is declared with the constant address
 space qualifier, the
 CL_KERNEL_ARG_TYPE_CONST type qualifier
 will be set.
 <br>
 <br>
 CL_KERNEL_ARG_TYPE_NONE is returned for all parameters passed by value.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_KERNEL_ARG_NAME</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">char[]</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Returns the name specified for the
 argument given by <em>arg_indx</em>.</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p><strong>clGetKernelArgInfo</strong> returns CL SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_ARG_INDEX if
 <em>arg_indx</em> is not a valid argument index.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>param_name</em> is not valid, or if size in bytes specified by
 <em>param_value</em> size is &lt; size of return type as described in_table 5.23_
 and <em>param_value</em> is not NULL.
 </p>
 </li>
 <li>
 <p>

 CL_KERNEL_ARG_INFO_NOT_AVAILABLE if the argument information is not
 available for kernel.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_KERNEL if
 <em>kernel</em> is a not a valid kernel object.
 </p>
 </li>
 </ul></div>
 </div>
 </div>
 <div class="sect2">
 <h3 id="_executing_kernels">5.10. Executing Kernels</h3>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clEnqueueNDRangeKernel(cl_command_queue command_queue,
                               cl_kernel kernel,
                               cl_uint work_dim,
                               const size_t *global_work_offset,
                               const size_t *global_work_size,
                               const size_t *local_work_size,
                               cl_uint num_events_in_wait_list,
                               const cl_event *event_wait_list,
                               cl_event *event)</pre>
 </div></div>
 <div class="paragraph"><p>enqueues a command to execute a kernel on a device.</p></div>
 <div class="paragraph"><p><em>command_queue</em> is a valid host command-queue. The kernel will be
 queued for execution on the device associated with <em>command_queue</em>.</p></div>
 <div class="paragraph"><p><em>kernel</em> is a valid kernel object. The OpenCL context associated with
 <em>kernel</em> and <em>command-queue</em> must be the same.</p></div>
 <div class="paragraph"><p><em>work_dim</em> is the number of dimensions used to specify the global
 work-items and work-items in the work-group. <em>work_dim</em> must be
 greater than zero and less than or equal to
 CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS. If <em>global_work_size_is NULL, or the
 value in any passed dimension is 0 then the kernel command will
 trivially succeed after its event dependencies are satisfied and
 subsequently update its completion event. The behavior in this situation
 is similar to that of an enqueued marker, except that unlike a marker,
 an enqueued kernel with no events passed to _event_wait_list</em> may run at
 any time.</p></div>
 <div class="paragraph"><p><em>global_work_offset</em> can be used to specify an array of <em>work_dim</em>
 unsigned values that describe the offset used to calculate the global ID
 of a work-item. If <em>global_work_offset</em> is NULL, the global IDs start
 at offset (0, 0, 0).</p></div>
 <div class="paragraph"><p><em>global_work_size</em> points to an array of <em>work_dim</em> unsigned values that
 describe the number of global work-items in <em>work_dim</em> dimensions that
 will execute the kernel function. The total number of global work-items
 is computed as <em>global_work_size</em>[0] * &#8230; * <em>global_work_size</em>[<em>work_dim</em> - 1].</p></div>
 <div class="paragraph"><p><em>local_work_size</em> points to an array of <em>work_dim</em> unsigned values that
 describe the number of work-items that make up a work-group (also
 referred to as the size of the work-group) that will execute the kernel
 specified by <em>kernel</em>. The total number of work-items in a work-group
 is computed as <em>local_work_size</em>[0] * &#8230;  * <em>local_work_size</em>[<em>work_dim</em>
 - 1]. The total number of work-items in the work-group must be less than
 or equal to the CL_KERNEL_WORK_GROUP_SIZE value specified in <em>table
 5.21</em> and the number of work-items specified in <em>local_work_size</em>[0], &#8230;,
 <em>local_work_size</em>[<em>work_dim</em> - 1] must be less than or equal to the
 corresponding values specified by CL_DEVICE_MAX_WORK_ITEM_SIZES[0], &#8230;,
 CL_DEVICE_MAX_WORK_ITEM_SIZES[<em>work_dim</em> -1]. The explicitly specified
 <em>local_work_size</em> will be used to determine how to break the global
 work-items specified by <em>global_work_size</em> into appropriate work-group
 instances.</p></div>
 <div class="paragraph"><p>Enabling non-uniform work-groups requires the <em>kernel</em>'s program to be
 compiled without the -cl-uniform-work-group-size flag. If the program
 was created with clCreateProgramWithSource, non-uniform work-groups
 are enabled only if the program was compiled with the -cl-std=CL2.0
 flag and without the -cl-uniform-work-group-size flag.
 If the program was created using clLinkProgram and any of the linked
 programs were compiled in a way that only supports uniform work-group
 sizes, the linked program only supports uniform work group sizes. If
 <em>local_work_size</em> is specified and the OpenCL <em>kernel</em> is compiled without
 non-uniform work-groups enabled, the values specified in
 <em>global_work_size</em>[0], &#8230;, <em>global_work_size</em>[<em>work_dim</em> - 1] must be evenly
 divisible by the corresponding values specified in <em>local_work_size</em>[0],
 &#8230;, <em>local_work_size</em>[<em>work_dim</em> - 1].</p></div>
 <div class="paragraph"><p>If non-uniform work-groups are enabled for the kernel, any single
 dimension for which the global size is not divisible by the local size
 will be partitioned into two regions. One region will have work-groups
 that have the same number of work items as was specified by the local
 size parameter in that dimension. The other region will have work-groups
 with less than the number of work items specified by the local size
 parameter in that dimension. The global IDs and group IDs of the work
 items in the first region will be numerically lower than those in the
 second, and the second region will be at most one work-group wide in
 that dimension. Work-group sizes could be non-uniform in multiple
 dimensions, potentially producing work-groups of up to 4 different sizes
 in a 2D range and 8 different sizes in a 3D range.</p></div>
 <div class="paragraph"><p>If <em>local_work_size_is NULL and the kernel is compiled without support
 for non-uniform work-groups, the OpenCL runtime will implement the
 ND-range with uniform work-group sizes. If _local_work_size</em> is NULL and
 non-uniform-work-groups are enabled, the OpenCL runtime is free to
 implement the ND-range using uniform or non-uniform work-group sizes,
 regardless of the divisibility of the global work size. If the ND-range
 is implemented using non-uniform work-group sizes, the work-group sizes,
 global IDs and group IDs will follow the same pattern as described in
 above paragraph.</p></div>
 <div class="paragraph"><p>The work-group size to be used for <em>kernel</em> can also be specified in the
 program source or intermediate language. In this case the size of work
 group specified by <em>local_work_size</em> must match the value specified in
 the program source.</p></div>
 <div class="paragraph"><p>These work-group instances are executed in parallel across multiple
 compute units or concurrently on the same compute unit.</p></div>
 <div class="paragraph"><p>Each work-item is uniquely identified by a global identifier. The
 global ID, which can be read inside the kernel, is computed using the
 value given by <em>global_work_size_and_global_work_offset</em>. In addition,
 a work-item is also identified within a work-group by a unique local
 ID. The local ID, which can also be read by the kernel, is computed
 using the value given by <em>local_work_size</em>. The starting local ID is
 always (0, 0,  0).</p></div>
 <div class="paragraph"><p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need
 to complete before this particular command can be executed. If
 <em>event_wait_list</em> is NULL, then this particular command does not wait on
 any event to complete. If <em>event_wait_list</em> is NULL,
 <em>num_events_in_wait_list</em> must be 0. If <em>event_wait_list</em> is not NULL,
 the list of events pointed to by <em>event_wait_list</em> must be valid and
 <em>num_events_in_wait_list</em> must be greater than 0. The events specified
 in <em>event_wait_list</em> act as synchronization points. The context
 associated with events in <em>event_wait_list</em> and <em>command_queue</em> must be
 the same. The memory associated with <em>event_wait_list</em> can be reused or
 freed after the function returns.</p></div>
 <div class="paragraph"><p><em>event</em> returns an event object that identifies this particular
 kernel-instance. Event objects are unique and can be used to identify a
 particular kernel-instance later on. If <em>event</em> is NULL, no event will
 be created for this kernel-instance and therefore it will not be
 possible for the application to query or queue a wait for this
 particular kernel-instance. If the <em>event_wait_list</em> and the <em>event</em>
 arguments are not NULL, the <em>event</em> argument should not refer to an
 element of the <em>event_wait_list</em> array.</p></div>
 <div class="paragraph"><p><strong>clEnqueueNDRangeKernel</strong> returns CL_SUCCESS if the kernel-instance was
 successfully queued. Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
      CL_INVALID_PROGRAM_EXECUTABLE if there is no successfully built program
 executable available for device associated with <em>command_queue</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_COMMAND_QUEUE
 if <em>command_queue</em> is not a valid host command-queue.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_KERNEL if
 <em>kernel</em> is not a valid kernel object.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_CONTEXT if
 context associated with <em>command_queue</em> and <em>kernel</em> are not the same or
 if the context associated with <em>command_queue</em> and events in
 <em>event_wait_list</em> are not the same.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_KERNEL_ARGS if
 the kernel argument values have not been specified or if a kernel
 argument declared to be a pointer to a type does not point to a named
 address space.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_WORK_DIMENSION
 if <em>work_dim</em> is not a valid value (i.e. a value between 1 and 3).
 </p>
 </li>
 <li>
 <p>
      CL_INVALID_GLOBAL_WORK_SIZE if any of the values specified in
 <em>global_work_size</em>[0],  <em>global_work_size</em>[<em>work_dim</em>  1] exceed the
 maximum value representable by size_t on the device on which the
 kernel-instance will be enqueued.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_GLOBAL_OFFSET
 if the value specified in <em>global_work_size</em> + the corresponding values
 in <em>global_work_offset</em> for any dimensions is greater than the maximum
 value representable by size t on the device on which the kernel-instance
 will be enqueued.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_WORK_GROUP_SIZE
 if <em>local_work_size</em> is specified and does not match the required
 work-group size for <em>kernel</em> in the program source.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_WORK_GROUP_SIZE
 if <em>local_work_size</em> is specified and is not consistent with the
 required number of sub-groups for <em>kernel</em> in the program source.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_WORK_GROUP_SIZE
 if <em>local_work_size</em> is specified and the total number of work-items in
 the work-group computed as <em>local_work_size</em>[0] *
 <em>local_work_size</em>[<em>work_dim</em>  1] is greater than the value specified by
 CL_KERNEL_WORK_GROUP_SIZE in <em>table 5.21</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_WORK_GROUP_SIZE
 if the program was compiled with cl-uniform-work-group-size and the
 number of work-items specified by <em>global_work_size</em> is not evenly
 divisible by size of work-group given by <em>local_work_size</em> or by the
 required work-group size specified in the kernel source.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_WORK_ITEM_SIZE
 if the number of work-items specified in any of <em>local_work_size</em>[0],
 <em>local_work_size</em>[<em>work_dim</em>  1] is greater than the corresponding
 values specified by CL_DEVICE_MAX_WORK_ITEM_SIZES[0], .
 CL_DEVICE_MAX_WORK_ITEM_SIZES[<em>work_dim</em> 1].
 </p>
 </li>
 <li>
 <p>
      CL_MISALIGNED_SUB_BUFFER_OFFSET if a sub-buffer object is specified as
 the value for an argument that is a buffer object and the <em>offset</em>
 specified when the sub-buffer object is created is not aligned to
 CL_DEVICE_MEM_BASE_ADDR_ALIGN value for device associated with <em>queue</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_IMAGE_SIZE if
 an image object is specified as an argument value and the image
 dimensions (image width, height, specified or compute row and/or slice
 pitch) are not supported by device associated with <em>queue</em>.
 </p>
 </li>
 <li>
 <p>
     CL_IMAGE_FORMAT_NOT_SUPPORTED if an image object is specified as an
 argument value and the image format (image channel order and data type)
 is not supported by device associated with <em>queue</em>.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to queue the execution instance of <em>kernel</em> on the
 command-queue because of insufficient resources needed to execute the
 kernel. For example, the explicitly specified <em>local_work_size</em> causes
 a failure to execute the kernel because of insufficient resources such
 as registers or local memory. Another example would be the number of
 read-only image args used in <em>kernel</em> exceed the
 CL_DEVICE_MAX_READ_IMAGE_ARGS value for device or the number of
 write-only and read-write image args used in <em>kernel</em> exceed the
 CL_DEVICE_MAX_READ_WRITE_IMAGE_ARGS value for device or the number of
 samplers used in <em>kernel</em> exceed CL_DEVICE_MAX_SAMPLERS for device.
 </p>
 </li>
 <li>
 <p>
     CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
 memory for data store associated with image or buffer objects specified
 as arguments to <em>kernel</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_EVENT_WAIT_LIST
 if <em>event_wait_list</em> is NULL and <em>num_events_in_wait_list</em> &gt; 0, or
 <em>event_wait_list</em> is not NULL and <em>num_events_in_wait_list</em> is 0, or if
 event objects in <em>event_wait_list</em> are not valid events.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_OPERATION if
 SVM pointers are passed as arguments to a kernel and the device does not
 support SVM or if system pointers are passed as arguments to a kernel
 and/or stored inside SVM allocations passed as kernel arguments and the
 device does not support fine grain system SVM allocations.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clEnqueueNativeKernel(cl_command_queue command_queue,
                              void (CL_CALLBACK *user_func)(void *),
                              void *args,
                              size_t cb_args,
                              cl_uint num_mem_objects,
                              const cl_mem *mem_list,
                              const void **args_mem_loc,
                              cl_uint num_events_in_wait_list,
                              const cl_event *event_wait_list,
                              cl_event *event)</pre>
 </div></div>
 <div class="paragraph"><p>enqueues a command to execute a native C/C++ function not compiled using
 the OpenCL compiler.</p></div>
 <div class="paragraph"><p><em>command_queue</em> is a valid host command-queue. A native user function
 can only be executed on a command-queue created on a device that has
 CL_EXEC_NATIVE_KERNEL capability set in CL_DEVICE_EXECUTION_CAPABILITIES
 as specified in <em>table 4.3</em>.</p></div>
 <div class="paragraph"><p><em>user_func</em> is a pointer to a host-callable user function.</p></div>
 <div class="paragraph"><p><em>args</em> is a pointer to the args list that <em>user_func</em> should be called
 with.</p></div>
 <div class="paragraph"><p><em>cb_args</em> is the size in bytes of the args list that <em>args</em> points to.</p></div>
 <div class="paragraph"><p>The data pointed to by <em>args</em> and <em>cb_args</em> bytes in size will be copied
 and a pointer to this copied region will be passed to <em>user_func</em>. The
 copy needs to be done because the memory objects (cl_mem values) that
 <em>args</em> may contain need to be modified and replaced by appropriate
 pointers to global memory. When <strong>clEnqueueNativeKernel</strong> returns, the
 memory region pointed to by <em>args</em> can be reused by the application.</p></div>
 <div class="paragraph"><p><em>num_mem_objects</em> is the number of buffer objects that are passed in
 <em>args</em>.</p></div>
 <div class="paragraph"><p><em>mem_list</em> is a list of valid buffer objects, if <em>num_mem_objects</em> &gt; 0.
 The buffer object values specified in <em>mem_list</em> are memory object
 handles (cl_mem values) returned by <strong>clCreateBuffer</strong> or NULL.</p></div>
 <div class="paragraph"><p><em>args_mem_loc</em> is a pointer to appropriate locations that <em>args</em> points
 to where memory object handles (cl_mem values) are stored. Before the
 user function is executed, the memory object handles are replaced by
 pointers to global memory.</p></div>
 <div class="paragraph"><p><em>event_wait_list, num_events_in_wait_list and</em> <em>event</em> are as described
 in <strong>clEnqueueNDRangeKernel</strong>.</p></div>
 <div class="paragraph"><p><strong>clEnqueueNativeKernel</strong> returns CL_SUCCESS if the user function
 execution instance was successfully queued. Otherwise, it returns one
 of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_COMMAND_QUEUE
 if <em>command_queue</em> is not a valid host command-queue.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_CONTEXT if
 context associated with <em>command_queue</em> and events in <em>event_wait_list</em>
 are not the same.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>user_func</em> is NULL.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if <em>args</em>
 is a NULL value and <em>cb_args</em> &gt; 0, or if <em>args</em> is a NULL value and
 <em>num_mem_objects</em> &gt; 0.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>args</em> is not NULL and <em>cb_args</em> is 0.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>num_mem_objects</em> &gt; 0 and <em>mem_list</em> or <em>args_mem_loc</em> are NULL.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>num_mem_objects</em> = 0 and <em>mem_list</em> or <em>args_mem_loc</em> are not NULL.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_OPERATION if
 the device associated with <em>command_queue</em> cannot execute the native
 kernel.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_MEM_OBJECT if
 one or more memory objects specified in <em>mem_list</em> are not valid or are
 not buffer objects.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to queue the execution instance of <em>kernel</em> on the
 command-queue because of insufficient resources needed to execute the
 kernel.
 </p>
 </li>
 <li>
 <p>
      CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
 memory for data store associated with buffer objects specified as
 arguments to <em>kernel</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_EVENT_WAIT_LIST
 if <em>event_wait_list</em> is NULL and <em>num_events_in_wait_list</em> &gt; 0, or
 <em>event_wait_list</em> is not NULL and <em>num_events_in_wait_list</em> is 0, or if
 event objects in <em>event_wait_list</em> are not valid events.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_OPERATION if
 SVM pointers are passed as arguments to a kernel and the device does not
 support SVM or if system pointers are passed as arguments to a kernel
 and/or stored inside SVM allocations passed as kernel arguments and the
 device does not support fine grain system SVM allocations.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>NOTE:</p></div>
 <div class="paragraph"><p>The total number of read-only images specified as arguments to a kernel
 cannot exceed CL_DEVICE_MAX_READ_IMAGE_ARGS. Each image array argument
 to a kernel declared with the read_only qualifier counts as one image.
 The total number of write-only images specified as arguments to a kernel
 cannot exceed CL_DEVICE_MAX_WRITE_IMAGE_ARGS. Each image array argument to a kernel
 declared with the write_only qualifier counts as one image.</p></div>
 <div class="paragraph"><p>The total number of read-write images specified as arguments to a kernel
 cannot exceed CL_DEVICE_MAX_READ_WRITE_IMAGE_ARGS. Each image array
 argument to a kernel declared with the read_write qualifier counts as
 one image.</p></div>
 </div>
 <div class="sect2">
 <h3 id="_event_objects">5.11. Event Objects</h3>
 <div class="paragraph"><p>Event objects can be used to refer to a kernel-instance command
 (<strong>clEnqueueNDRangeKernel, clEnqueueNativeKernel</strong>), read, write, map and
 copy commands on memory objects (<strong>clEnqueue{Read|Write|Map}Buffer,
 clEnqueueUnmapMemObject</strong>, <strong>clEnqueue{Read|Write}BufferRect</strong>,
 <strong>clEnqueue{Read|Write|Map}Image</strong>, <strong>clEnqueueCopy{Buffer|Image}</strong>,
 <strong>clEnqueueCopyBufferRect</strong>, <strong>clEnqueueCopyBufferToImage</strong>,
 <strong>clEnqueueCopyImageToBuffer),</strong> <strong>clEnqueueSVMMemcpy</strong>,
 <strong>clEnqueueSVMMemFill</strong>, <strong>clEnqueueSVMMap</strong>, <strong>clEnqueueSVMUnmap</strong>,
 <strong>clEnqueueSVMFree</strong>,
 <strong>clEnqueueMarkerWithWaitList</strong>,<strong>clEnqueueBarrierWithWaitList</strong>(refer to
 <em>section 5.12</em>) or user events.</p></div>
 <div class="paragraph"><p>An event object can be used to track the execution status of a command.
 The API calls that enqueue commands to a command-queue create a new
 event object that is returned in the <em>event</em> argument. In case of an
 error enqueuing the command in the command-queue the event argument does
 not return an event object.</p></div>
 <div class="paragraph"><p>The execution status of an enqueued command at any given point in time
 can be one of the following:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_QUEUED  This indicates
 that the command has been enqueued in a command-queue. This is the
 initial state of all events except user events.
 </p>
 </li>
 <li>
 <p>
       CL_SUBMITTED  This is the
 initial state for all user events. For all other events, this indicates
 that the command has been submitted by the host to the device.
 </p>
 </li>
 <li>
 <p>
       CL_RUNNING  This
 indicates that the device has started executing this command. In order
 for the execution status of an enqueued command to change from
 CL_SUBMITTED to CL_RUNNING, all events that this command is waiting on
 must have completed successfully i.e. their execution status must be
 CL_COMPLETE.
 </p>
 </li>
 <li>
 <p>
       CL_COMPLETE  This
 indicates that the command has successfully completed.
 </p>
 </li>
 <li>
 <p>
       Error code  The error
 code is a negative integer value and indicates that the command was
 abnormally terminated. Abnormal termination may occur for a number of
 reasons such as a bad memory access.
 </p>
 </li>
 </ul></div>
 <div class="admonitionblock">
 <table><tr>
 <td class="icon">
 <div class="title">Note</div>
 </td>
 <td class="content">A command is considered to be complete if its execution status is
 CL_COMPLETE or is a negative integer value.</td>
 </tr></table>
 </div>
 <div class="paragraph"><p>If the execution of a command is terminated, the command-queue
 associated with this terminated command, and the associated context (and
 all other command-queues in this context) may no longer be available.
 The behavior of OpenCL API calls that use this context (and
 command-queues associated with this context) are now considered to be
 implementation-defined. The user registered callback function specified
 when context is created can be used to report appropriate error
 information.</p></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_event clCreateUserEvent(cl_context context,
                            cl_int *errcode_ret)</pre>
 </div></div>
 <div class="paragraph"><p>creates a user event object. User events allow applications to enqueue
 commands that wait on a user event to finish before the command is
 executed by the device.</p></div>
 <div class="paragraph"><p><em>context</em> must be a valid OpenCL context.</p></div>
 <div class="paragraph"><p><em>errcode_ret</em> will return an appropriate error code. If <em>errcode_ret</em>
 is NULL, no error code is returned.</p></div>
 <div class="paragraph"><p><strong>clCreateUserEvent</strong> returns a valid non-zero event object and
 <em>errcode_ret_is set to CL_SUCCESS if the user event object is created
 successfully. Otherwise, it returns a NULL value with one of the
 following error values returned in _errcode_ret</em>:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_CONTEXT if
 _context_is not a valid context.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The execution status of the user event object created is set to
 CL_SUBMITTED.</p></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clSetUserEventStatus(cl_event event,
                             cl_int execution_status)</pre>
 </div></div>
 <div class="paragraph"><p>sets the execution status of a user event object.</p></div>
 <div class="paragraph"><p><em>event</em> is a user event object created using <strong>clCreateUserEvent</strong>.</p></div>
 <div class="paragraph"><p><em>execution_status</em> specifies the new execution status to be set and can
 be CL_COMPLETE or a negative integer value to indicate an error. A
 negative integer value causes all enqueued commands that wait on this
 user event to be terminated. <strong>clSetUserEventStatus</strong> can only be called
 once to change the execution status of <em>event</em>.</p></div>
 <div class="paragraph"><p><strong>clSetUserEventStatus</strong> returns CL_SUCCESS if the function was executed
 successfully. Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_EVENT if
 <em>event</em> is not a valid user event object.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if the
 <em>execution_status</em> is not CL_COMPLETE or a negative integer value.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_OPERATION if
 the <em>execution_status</em> for <em>event</em> has already been changed by a
 previous call to <strong>clSetUserEventStatus</strong>.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="admonitionblock">
 <table><tr>
 <td class="icon">
 <div class="title">Note</div>
 </td>
 <td class="content">If there are enqueued commands with user events in the
 <em>event_wait_list</em> argument of <strong>clEnqueue</strong> commands, the user must
 ensure that the status of these user events being waited on are set
 using <strong>clSetUserEventStatus</strong> before any OpenCL APIs that release OpenCL
 objects except for event objects are called; otherwise the behavior is
 undefined.</td>
 </tr></table>
 </div>
 <div class="paragraph"><p>For example, the following code sequence will result in undefined
 behavior of <strong>clReleaseMemObject</strong>.</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>ev1 = clCreateUserEvent(ctx, NULL);

 clEnqueueWriteBuffer(cq, buf1, CL_FALSE, ..., 1, &amp;ev1, NULL);

 clEnqueueWriteBuffer(cq, buf2, CL_FALSE,...);

 clReleaseMemObject(buf2);

 clSetUserEventStatus(ev1, CL_COMPLETE);</pre>
 </div></div>
 <div class="paragraph"><p>The following code sequence, however, works correctly.</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>ev1 = clCreateUserEvent(ctx, NULL);

 clEnqueueWriteBuffer(cq, buf1, CL_FALSE, ...,1, &amp;ev1, NULL);

 clEnqueueWriteBuffer(cq, buf2, CL_FALSE,...);

 clSetUserEventStatus(ev1, CL_COMPLETE);

 clReleaseMemObject(buf2);</pre>
 </div></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clWaitForEvents(cl_uint num_events,
                        const cl_event *event_list)</pre>
 </div></div>
 <div class="paragraph"><p>waits on the host thread for commands identified by event objects in
 <em>event_list</em> to complete. A command is considered complete if its
 execution status is CL_COMPLETE or a negative value. The events
 specified in <em>event_list</em> act as synchronization points.</p></div>
 <div class="paragraph"><p><strong>clWaitForEvents</strong> returns CL_SUCCESS if the execution status of all
 events in <em>event_list</em> is CL_COMPLETE. Otherwise, it returns one of the
 following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>num_events</em> is zero or <em>event_list</em> is NULL.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_CONTEXT if
 events specified in <em>event_list</em> do not belong to the same context.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_EVENT if event
 objects specified in <em>event_list</em> are not valid event objects.
 </p>
 </li>
 <li>
 <p>
   CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST if the execution status of
 any of the events in <em>event_list</em> is a negative integer value.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clGetEventInfo(cl_event event,
                       cl_event_info param_name,
                       size_t param_value_size,
                       void *param_value,
                       size_t *param_value_size_ret)</pre>
 </div></div>
 <div class="paragraph"><p>returns information about the event object.</p></div>
 <div class="paragraph"><p><em>event</em> specifies the event object being queried.</p></div>
 <div class="paragraph"><p><em>param_name</em> specifies the information to query. The list of supported
 <em>param_name</em> types and the information returned in <em>param_value</em> by
 <strong>clGetEventInfo</strong> is described in <em>table 5.24</em>.</p></div>
 <div class="paragraph"><p><em>param_value</em> is a pointer to memory where the appropriate result being
 queried is returned. If <em>param_value</em> is NULL, it is ignored.</p></div>
 <div class="paragraph"><p><em>param_value_size</em> is used to specify the size in bytes of memory
 pointed to by <em>param_value</em>. This size must be &gt;= size of return type
 as described in <em>table 5.24</em>.</p></div>
 <div class="paragraph"><p><em>param_value_size_ret</em> returns the actual size in bytes of data being
 queried by <em>param_name</em>. If <em>param_value_size_ret</em> is NULL, it is
 ignored.</p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <caption class="title">Table 30. <em>clGetEventInfo</em> <em>parameter queries.</em></caption>
 <col style="width:30%;">
 <col style="width:33%;">
 <col style="width:37%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>cl_event_info</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Return Type</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Info. returned in <em>param_value</em></strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_EVENT_COMMAND_<br>
  QUEUE</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_command_<br>
  queue</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return the command-queue associated with <em>event</em>. For user
 event objects, a NULL value is returned.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_EVENT_CONTEXT</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_context</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return the context associated with
 <em>event</em>.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_EVENT_COMMAND_<br>
  TYPE</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_command_type</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return the command associated with event.
 Can be one of the following values:
 CL_COMMAND_NDRANGE_KERNEL
 CL_COMMAND_NATIVE_KERNEL
 CL_COMMAND_READ_BUFFER
 CL_COMMAND_WRITE_BUFFER
 CL_COMMAND_COPY_BUFFER
 CL_COMMAND_READ_IMAGE
 CL_COMMAND_WRITE_IMAGE
 CL_COMMAND_COPY_IMAGE
 CL_COMMAND_COPY_BUFFER_ TO_IMAGE
 CL_COMMAND_COPY_IMAGE_ TO_BUFFER
 CL_COMMAND_MAP_BUFFER
 CL_COMMAND_MAP_IMAGE
 CL_COMMAND_UNMAP_MEM_ OBJECT
 CL_COMMAND_MARKER
 CL_COMMAND_ACQUIRE_ GL_OBJECTS
 CL_COMMAND_RELEASE_ GL_OBJECTS
 CL_COMMAND_READ_ BUFFER_RECT
 CL_COMMAND_WRITE_ BUFFER_RECT
 CL_COMMAND_COPY_ BUFFER_RECT
 CL_COMMAND_USER
 CL_COMMAND_BARRIER
 CL_COMMAND_MIGRATE_ MEM_OBJECTS
 CL_COMMAND_FILL_BUFFER
 CL_COMMAND_FILL_IMAGE
 CL_COMMAND_SVM_FREE
 CL_COMMAND_SVM_MEMCPY
 CL_COMMAND_SVM_MEMFILL
 CL_COMMAND_SVM_MAP
 CL_COMMAND_SVM_UNMAP</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">*CL_EVENT_COMMAND_ EXECUTION_STATUS*<span class="footnote"><br>[The error code values are negative, and event state values are positive.  The event state values are ordered from
 the largest value (CL_QUEUED) for the first or initial state to the smallest value  (CL_COMPLETE or negative
 integer value) for the last or complete state. The value of CL_COMPLETE and CL_SUCCESS are the same.]<br></span>:</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_int</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return the execution status of the command
 identified by event.
 Valid values are:
 <br>
 <br>
 CL_QUEUED (command has been enqueued n the command-queue),
 <br>
 <br>
 CL_SUBMITTED (enqueued command has
 been submitted by the host to the device
 associated with the command-queue),
 <br>
 <br>
 CL_RUNNING (device is currently executing
 this command),
 <br>
 <br>
 CL_COMPLETE (the command has
 completed), or
 <br>
 <br>
 Error code given by a negative integer value.
 (command was abnormally terminated – this
 may be caused by a bad memory access etc.).
 These error codes come from the same set of
 error codes that are returned from the
 platform or runtime API calls as return
 values or errcode_ret values.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">*CL_EVENT_REFERENCE_ COUNT*<span class="footnote"><br>[The reference count returned should be considered immediately stale. It is unsuitable for general use in
 applications. This feature is provided for identifying memory leaks.]<br></span>:</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Return the <em>event</em> reference
 count.</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p>Using <strong>clGetEventInfo</strong> to determine if a command identified by <em>event</em>
 has finished execution (i.e. CL_EVENT_COMMAND_EXECUTION_STATUS returns
 CL_COMPLETE) is not a synchronization point. There are no guarantees
 that the memory objects being modified by command associated with
 <em>event</em> will be visible to other enqueued commands.</p></div>
 <div class="paragraph"><p><strong>clGetEventInfo</strong> returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>param_name</em> is not valid, or if size in bytes specified by
 <em>param_value_size</em> is &lt; size of return type as described in <em>table 5.23</em>
 and <em>param_value</em> is not NULL.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 information to query given in <em>param_name</em> cannot be queried for
 <em>event</em>.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_EVENT if
 <em>event</em> is a not a valid event object.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clSetEventCallback(cl_event event,
                           cl_int command_exec_callback_type,
                           void (CL_CALLBACK *pfn_event_notify)(
                               cl_event event,
                               cl_int event_command_exec_status,
                               void *user_data),
                           void *user_data)</pre>
 </div></div>
 <div class="paragraph"><p>registers a user callback function for a specific command execution
 status. The registered callback function will be called when the
 execution status of command associated with <em>event</em> changes to an
 execution status equal to or past the status specified by
 <em>command_exec_status</em>.</p></div>
 <div class="paragraph"><p>Each call to <strong>clSetEventCallback</strong> registers the specified user callback
 function on a callback stack associated with <em>event</em>. The order in
 which the registered user callback functions are called is undefined.</p></div>
 <div class="paragraph"><p><em>event</em> is a valid event object.</p></div>
 <div class="paragraph"><p><em>command_exec_callback_type</em> specifies the command execution status for
 which the callback is registered. The command execution callback
 values for which a callback can be registered are: CL_SUBMITTED,
 CL_RUNNING or CL_COMPLETE<span class="footnote"><br>[The callback function registered for a command_exec_callback_type value of CL_COMPLETE will be called
 when the command has completed successfully or is abnormally terminated.]<br></span>:. There is no guarantee that
 the callback functions registered for various execution status values
 for an event will be called in the exact order that the execution status
 of a command changes. Furthermore, it should be noted that receiving a
 call back for an event with a status other than CL_COMPLETE, in no way
 implies that the memory model or execution model as defined by the
 OpenCL specification has changed. For example, it is not valid to assume
 that a corresponding memory transfer has completed unless the event is
 in a state CL_COMPLETE.</p></div>
 <div class="paragraph"><p><em>pfn_event_notify</em> is the event callback function that can be registered
 by the application. This callback function may be called asynchronously
 by the OpenCL implementation. It is the applications responsibility to
 ensure that the callback function is thread-safe. The parameters to
 this callback function are:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       <em>event</em> is the event
 object for which the callback function is invoked.
 </p>
 </li>
 <li>
 <p>
      <em>event_command_ _exec_status</em> is equal to the
 <em>command_exec_callback_type</em> used while registering the callback. Refer
 to table 5.23 for the command execution status values. If the callback
 is called as the result of the command associated with event being
 abnormally terminated, an appropriate error code for the error that
 caused the termination will be passed to <em>event_command_exec_status</em>
 instead.
 </p>
 </li>
 <li>
 <p>
       <em>user_data</em> is a pointer to user supplied data.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p><em>user_data</em> will be passed as the <em>user_data</em> argument when <em>pfn_notify</em>
 is called. <em>user_data</em> can be NULL.</p></div>
 <div class="paragraph"><p>All callbacks registered for an event object must be called. All
 enqueued callbacks shall be called before the event object is
 destroyed. Callbacks must return promptly.   The behavior of calling
 expensive system routines, OpenCL API calls to create contexts or
 command-queues, or blocking OpenCL operations from the following list
 below, in a callback is undefined.</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 <strong>clFinish</strong>,
 </p>
 </li>
 <li>
 <p>
 <strong>clWaitForEvents</strong>,
 </p>
 </li>
 <li>
 <p>
 blocking calls to <strong>clEnqueueReadBuffer</strong>, <strong>clEnqueueReadBufferRect</strong>,
 </p>
 </li>
 <li>
 <p>
 <strong>clEnqueueWriteBuffer</strong>, <strong>clEnqueueWriteBufferRect</strong>,
 </p>
 </li>
 <li>
 <p>
 blocking calls to <strong>clEnqueueReadImage</strong> and <strong>clEnqueueWriteImage</strong>,
 </p>
 </li>
 <li>
 <p>
 blocking calls to <strong>clEnqueueMapBuffer</strong> and <strong>clEnqueueMapImage</strong>,
 </p>
 </li>
 <li>
 <p>
 blocking calls to <strong>clBuildProgram</strong>, <strong>clCompileProgram</strong> or <strong>clLinkProgram</strong>,
 </p>
 </li>
 <li>
 <p>
 blocking calls to <strong>clEnqueueSVMMemcpy</strong> or <strong>clEnqueueSVMMap</strong>
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>If an application needs to wait for completion of a routine from the
 above list in a callback, please use the non-blocking form of the
 function, and assign a completion callback to it to do the remainder of
 your work.  Note that when a callback (or other code) enqueues commands
 to a command-queue, the commands are not required to begin execution
 until the queue is flushed. In standard usage, blocking enqueue calls
 serve this role by implicitly flushing the queue. Since blocking calls
 are not permitted in callbacks, those callbacks that enqueue commands on
 a command queue should either call <strong>clFlush</strong> on the queue before
 returning or arrange for <strong>clFlush</strong> to be called later on another thread.</p></div>
 <div class="paragraph"><p><strong>clSetEventCallback</strong> returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_EVENT if
 <em>event</em> is not a valid event object.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>pfn_event_notify</em> is NULL or if <em>command_exec_callback_type</em> is not
 CL_SUBMITTED, CL_RUNNING or CL_COMPLETE.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clRetainEvent(cl_event event)</pre>
 </div></div>
 <div class="paragraph"><p>increments the <em>event</em> reference count. The OpenCL commands that return
 an event perform an implicit retain.</p></div>
 <div class="paragraph"><p><strong>clRetainEvent</strong> returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_EVENT if
 <em>event</em> is not a valid event object.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>To release an event, use the following function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clReleaseEvent(cl_event event)</pre>
 </div></div>
 <div class="paragraph"><p>decrements the <em>event</em> reference count.</p></div>
 <div class="paragraph"><p><strong>clReleaseEvent</strong> returns CL_SUCCESS if the function is executed
 successfully. Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_EVENT if
 <em>event</em> is not a valid event object.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The event object is deleted once the reference count becomes zero, the
 specific command identified by this event has completed (or terminated)
 and there are no commands in the command-queues of a context that
 require a wait for this event to complete. Using this function to
 release a reference that was not obtained by creating the object or by
 calling <strong>clRetainEvent</strong> causes undefined behavior.</p></div>
 <div class="admonitionblock">
 <table><tr>
 <td class="icon">
 <div class="title">Note</div>
 </td>
 <td class="content">Developers should be careful when releasing their last reference
 count on events created by <strong>clCreateUserEvent</strong> that have not yet been
 set to status of CL_COMPLETE or an error. If the user event was used in
 the event_wait_list argument passed to a <strong>clEnqueue</strong> API or another
 application host thread is waiting for it in <strong>clWaitForEvents</strong>, those
 commands and host threads will continue to wait for the event status to
 reach CL_COMPLETE or error, even after the application has released the
 object. Since in this scenario the application has released its last
 reference count to the user event, it would be in principle no longer
 valid for the application to change the status of the event to unblock
 all the other machinery. As a result the waiting tasks will wait
 forever, and associated events, cl_mem objects, command queues and
 contexts are likely to leak. In-order command queues caught up in this
 deadlock may cease to do any work.</td>
 </tr></table>
 </div>
 </div>
 <div class="sect2">
 <h3 id="_markers_barriers_and_waiting_for_events">5.12. Markers, Barriers and Waiting for Events</h3>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clEnqueueMarkerWithWaitList(cl_command_queue command_queue,
                                    cl_uint num_events_in_wait_list,
                                    const cl_event *event_wait_list,
                                    cl_event *event)</pre>
 </div></div>
 <div class="paragraph"><p>enqueues a marker command which waits for either a list of events to
 complete, or if the list is empty it waits for all commands previously
 enqueued in <em>command_queue</em> to complete before it completes. This
 command returns an <em>event</em> which can be waited on, i.e. this event can
 be waited on to insure that all events either in the <em>event_wait_list</em>
 or all previously enqueued commands, queued before this command to
 <em>command_queue</em>, have completed.</p></div>
 <div class="paragraph"><p><em>command_queue</em> is a valid host command-queue.</p></div>
 <div class="paragraph"><p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need
 to complete before this particular command can be executed.</p></div>
 <div class="paragraph"><p>If <em>event_wait_list</em> is NULL, <em>num_events_in_wait_list</em> must be 0. If
 <em>event_wait_list</em> is not NULL, the list of events pointed to by
 <em>event_wait_list</em> must be valid and <em>num_events_in_wait_list</em> must be
 greater than 0. The events specified in <em>event_wait_list</em> act as
 synchronization points. The context associated with events in
 <em>event_wait_list</em> and <em>command_queue</em> must be the same. The memory
 associated with <em>event_wait_list</em> can be reused or freed after the
 function returns.</p></div>
 <div class="paragraph"><p>If <em>event_wait_list</em> is NULL, then this particular command waits until
 all previous enqueued commands to <em>command_queue</em> have completed.</p></div>
 <div class="paragraph"><p><em>event</em> returns an event object that identifies this particular
 command. Event objects are unique and can be used to identify this
 marker command later on. <em>event</em> can be NULL in which case it will not
 be possible for the application to query the status of this command or
 queue a wait for this command to complete. If the <em>event_wait_list</em> and
 the <em>event</em> arguments are not NULL, the <em>event</em> argument should not
 refer to an element of the <em>event_wait_list</em> array.</p></div>
 <div class="paragraph"><p><strong>clEnqueueMarkerWithWaitList</strong> returns CL_SUCCESS if the function is
 successfully executed. Otherwise, it returns one of the following
 errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_COMMAND_QUEUE
 if <em>command_queue</em> is not a valid host command-queue.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_CONTEXT if
 context associated with <em>command_queue</em> and events in <em>event_wait_list</em>
 are not the same.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_EVENT_WAIT_LIST
 if <em>event_wait_list</em> is NULL and <em>num_events_in_wait_list</em> &gt; 0, or
 <em>event_wait_list</em> is not NULL and <em>num_events_in_wait_list</em> is 0, or if
 event objects in <em>event_wait_list</em> are not valid events. <br>
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clEnqueueBarrierWithWaitList(cl_command_queue command_queue,
                                     cl_uint num_events_in_wait_list,
                                     const cl_event *event_wait_list,
                                     cl_event *event)</pre>
 </div></div>
 <div class="paragraph"><p>enqueues a barrier command which waits for either a list of events to
 complete, or if the list is empty it waits for all commands previously
 enqueued in <em>command_queue</em> to complete before it completes. This
 command blocks command execution, that is, any following commands
 enqueued after it do not execute until it completes. This command
 returns an <em>event</em> which can be waited on, i.e. this event can be waited
 on to insure that all events either in the <em>event_wait_list</em> or all
 previously enqueued commands, queued before this command to
 <em>command_queue</em>, have completed</p></div>
 <div class="paragraph"><p><em>command_queue</em> is a valid host command-queue.</p></div>
 <div class="paragraph"><p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need
 to complete before this particular command can be executed.</p></div>
 <div class="paragraph"><p>If <em>event_wait_list</em> is NULL, <em>num_events_in_wait_list</em> must be 0. If
 <em>event_wait_list</em> is not NULL, the list of events pointed to by
 <em>event_wait_list</em> must be valid and <em>num_events_in_wait_list</em> must be
 greater than 0. The events specified in <em>event_wait_list</em> act as
 synchronization points. The context associated with events in
 <em>event_wait_list</em> and <em>command_queue</em> must be the same. The memory
 associated with <em>event_wait_list</em> can be reused or freed after the
 function returns.</p></div>
 <div class="paragraph"><p>If <em>event_wait_list</em> is NULL, then this particular command waits until
 all previous enqueued commands to <em>command_queue</em> have completed.</p></div>
 <div class="paragraph"><p><em>event</em> returns an event object that identifies this particular
 command. Event objects are unique and can be used to identify this
 barrier command later on. <em>event</em> can be NULL in which case it will not
 be possible for the application to query the status of this command or
 queue a wait for this command to complete. If the <em>event_wait_list</em> and
 the <em>event</em> arguments are not NULL, the <em>event</em> argument should not
 refer to an element of the <em>event_wait_list</em> array.</p></div>
 <div class="paragraph"><p><strong>clEnqueueBarrierWithWaitList</strong> returns CL_SUCCESS if the function is
 successfully executed. Otherwise, it returns one of the following
 errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_COMMAND_QUEUE
 if <em>command_queue</em> is not a valid host command-queue.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_CONTEXT if
 context associated with <em>command_queue</em> and events in <em>event_wait_list</em>
 are not the same.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_EVENT_WAIT_LIST
 if <em>event_wait_list</em> is NULL and <em>num_events_in_wait_list</em> &gt; 0, or
 <em>event_wait_list</em> is not NULL and <em>num_events_in_wait_list</em> is 0, or if
 event objects in <em>event_wait_list</em> are not valid events.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 </div>
 <div class="sect2">
 <h3 id="_out_of_order_execution_of_kernels_and_memory_object_commands">5.13. Out-of-order Execution of Kernels and Memory Object Commands</h3>
 <div class="paragraph"><p>The OpenCL functions that are submitted to a command-queue are enqueued
 in the order the calls are made but can be configured to execute
 in-order or out-of-order. The _properties_argument in
 <strong>clCreateCommandQueueWithProperties</strong> can be used to specify the
 execution order.</p></div>
 <div class="paragraph"><p>If the CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE property of a
 command-queue is not set, the commands enqueued to a command-queue
 execute in order. For example, if an application calls
 <strong>clEnqueueNDRangeKernel</strong> to execute kernel A followed by a
 <strong>clEnqueueNDRangeKernel</strong> to execute kernel B, the application can assume
 that kernel A finishes first and then kernel B is executed. If the
 memory objects output by kernel A are inputs to kernel B then kernel B
 will see the correct data in memory objects produced by execution of
 kernel A. If the CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE property of a
 command-queue is set, then there is no guarantee that kernel A will
 finish before kernel B starts execution.</p></div>
 <div class="paragraph"><p>Applications can configure the commands enqueued to a command-queue to
 execute out-of-order by setting the
 CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE property of the command-queue.
 This can be specified when the command-queue is created. In
 out-of-order execution mode there is no guarantee that the enqueued
 commands will finish execution in the order they were queued. As there
 is no guarantee that kernels will be executed in order, i.e. based on
 when the <strong>clEnqueueNDRangeKernel</strong> calls are made within a
 command-queue, it is therefore possible that an earlier
 <strong>clEnqueueNDRangeKernel</strong> call to execute kernel A identified by event A
 may execute and/or finish later than a <strong>clEnqueueNDRangeKernel</strong> call to
 execute kernel B which was called by the application at a later point in
 time. To guarantee a specific order of execution of kernels, a wait on
 a particular event (in this case event A) can be used. The wait for
 event A can be specified in the <em>event_wait_list</em> argument to
 <strong>clEnqueueNDRangeKernel</strong> for kernel B.</p></div>
 <div class="paragraph"><p>In addition, a marker (<strong>clEnqueueMarkerWithWaitList</strong>) or a barrier
 (<strong>clEnqueueBarrierWithWaitList</strong>) command can be enqueued to the
 command-queue. The marker command ensures that previously enqueued
 commands identified by the list of events to wait for (or all previous
 commands) have finished. A barrier command is similar to a marker
 command, but additionally guarantees that no later-enqueued commands
 will execute until the waited-for commands have executed.</p></div>
 <div class="paragraph"><p>Similarly, commands to read, write, copy or map memory objects that are
 enqueued after <strong>clEnqueueNDRangeKernel</strong> or <strong>clEnqueueNativeKernel</strong>
 commands are not guaranteed to wait for kernels scheduled for execution
 to have completed (if the CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
 property is set). To ensure correct ordering of commands, the event
 object returned by <strong>clEnqueueNDRangeKernel</strong> or <strong>clEnqueueNativeKernel</strong>
 can be used to enqueue a wait for event or a barrier command can be
 enqueued that must complete before reads or writes to the memory
 object(s) occur.</p></div>
 </div>
 <div class="sect2">
 <h3 id="_profiling_operations_on_memory_objects_and_kernels">5.14. Profiling Operations on Memory Objects and Kernels</h3>
 <div class="paragraph"><p>This section describes profiling of OpenCL functions that are enqueued
 as commands to a command-queue. The specific
 functions<span class="footnote"><br>[clEnqueueAcquireGLObjects and clEnqueueReleaseGLObjects defined in section 9.6.6 of the
 OpenCL 2.0 Extension Specification are also included.]<br></span>: being referred to are:
 <strong>clEnqueue{Read|Write|Map}Buffer, clEnqueue{Read|Write}BufferRect,
 clEnqueue{Read|Write|Map}Image,
 clEnqueueUnmapMemObject</strong>,<strong>clEnqueueSVMMemcpy</strong>, <strong>clEnqueueSVMMemFill</strong>,
 <strong>clEnqueueSVMMap</strong>, <strong>clEnqueueSVMUnmap</strong>,
 <strong>clEnqueueSVMFree</strong>,<strong>clEnqueueCopyBuffer, clEnqueueCopyBufferRect,
 clEnqueueCopyImage, clEnqueueCopyImageToBuffer,
 clEnqueueCopyBufferToImage,</strong> <strong>clEnqueueNDRangeKernel</strong> and
 <strong>clEnqueueNativeKernel</strong>. These enqueued commands are identified by
 unique event objects.</p></div>
 <div class="paragraph"><p>Event objects can be used to capture profiling information that measure
 execution time of a command. Profiling of OpenCL commands can be
 enabled either by using a command-queue created with
 CL_QUEUE_PROFILING_ENABLE flag set in _properties_argument to
 <strong>clCreateCommandQueueWithProperties</strong>.</p></div>
 <div class="paragraph"><p>If profiling is enabled, the function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clGetEventProfilingInfo(cl_event event,
                                cl_profiling_info param_name,
                                size_t param_value_size,
                                void *param_value,
                                size_t *param_value_size_ret)</pre>
 </div></div>
 <div class="paragraph"><p>returns profiling information for the command associated with event.</p></div>
 <div class="paragraph"><p><em>event</em> specifies the event object.</p></div>
 <div class="paragraph"><p><em>param_name</em> specifies the profiling data to query. The list of
 supported <em>param_name</em> types and the information returned in
 <em>param_value</em> by <strong>clGetEventProfilingInfo</strong> is described in <em>table 5.25</em></p></div>
 <div class="paragraph"><p><em>param_value</em> is a pointer to memory where the appropriate result being
 queried is returned. If <em>param_value</em> is NULL, it is ignored.</p></div>
 <div class="paragraph"><p><em>param_value_size</em> is used to specify the size in bytes of memory
 pointed to by <em>param_value</em>. This size must be &gt;= size of return type
 as described in <em>table 5.25</em>.</p></div>
 <div class="paragraph"><p><em>param_value_size_ret</em> returns the actual size in bytes of data being
 queried by <em>param_name</em>. If <em>param_value_size_ret</em> is NULL, it is
 ignored.</p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <caption class="title">Table 31. <em>clGetEventProfilingInfo</em> <em>parameter queries.</em></caption>
 <col style="width:34%;">
 <col style="width:33%;">
 <col style="width:33%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>cl_profiling_info</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Return Type</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Info. returned in <em>param_value</em></strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_PROFILING_COMMAND_ QUEUED</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_ulong</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">A 64-bit value that describes the
 current device time counter in
 nanoseconds when the command
 identified by event is enqueued in a
 command-queue by the host.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_PROFILING_COMMAND_ SUBMIT</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_ulong</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">A 64-bit value that describes the
 current device time counter in
 nanoseconds when the command
 identified by event that has been
 enqueued is submitted by the host to
 the device associated with the
 command-queue.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_PROFILING_COMMAND_ START</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_ulong</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">A 64-bit value that describes the
 current device time counter in
 nanoseconds when the command
 identified by event starts execution on
 the device.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_PROFILING_COMMAND_ END</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_ulong</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">A 64-bit value that describes the
 current device time counter in
 nanoseconds when the command
 identified by event has finished
 execution on the device.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_PROFILING_COMMAND_ COMPLETE</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_ulong</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">A 64-bit value that describes the current device
 time counter in nanoseconds when the command identified by event and any
 child commands enqueued by this command on the device have finished
 execution.</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p>
 The unsigned 64-bit values returned can be used to measure the time in
 nano-seconds consumed by OpenCL commands.</p></div>
 <div class="paragraph"><p>OpenCL devices are required to correctly track time across changes in
 device frequency and power states. The
 CL_DEVICE_PROFILING_TIMER_RESOLUTION specifies the resolution of the
 timer i.e. the number of nanoseconds elapsed before the timer is
 incremented.</p></div>
 <div class="paragraph"><p><strong>clGetEventProfilingInfo</strong> returns CL_SUCCESS if the function is executed
 successfully and the profiling information has been recorded.
 Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
      CL_PROFILING_INFO_NOT_AVAILABLE if the CL_QUEUE_PROFILING_ENABLE flag is
 not set for the command-queue, if the execution status of the command
 identified by <em>event</em> is not CL_COMPLETE or if <em>event</em> refers to the
 <strong>clEnqueueSVMFree</strong> command or is a user event object.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_VALUE if
 <em>param_name</em> is not valid, or if size in bytes specified by
 <em>param_value_size</em> is &lt; size of return type as described in <em>table 5.25</em>
 and <em>param_value</em> is not NULL.
 </p>
 </li>
 <li>
 <p>
       CL_INVALID_EVENT if
 <em>event</em> is a not a valid event object.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 </div>
 <div class="sect2">
 <h3 id="_flush_and_finish">5.15. Flush and Finish</h3>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clFlush(cl_command_queue command_queue)</pre>
 </div></div>
 <div class="paragraph"><p>issues all previously queued OpenCL commands in <em>command_queue_to the
 device associated with_command_queue</em>. <strong>clFlush</strong> only guarantees that
 all queued commands to <em>command_queue</em> will eventually be submitted to
 the appropriate device. There is no guarantee that they will be
 complete after <strong>clFlush</strong> returns.</p></div>
 <div class="paragraph"><p><strong>clFlush</strong> returns CL_SUCCESS if the function call was executed
 successfully. Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_COMMAND_QUEUE
 if <em>command_queue</em> is not a valid host command-queue.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>Any blocking commands queued in a command-queue and
 <strong>clReleaseCommandQueue</strong> perform an implicit flush of the command-queue.
 These blocking commands are <strong>clEnqueueReadBuffer</strong>,
 <strong>clEnqueueReadBufferRect</strong>, <strong>clEnqueueReadImage</strong>, with <em>blocking_read</em>
 set to CL_TRUE;
 <strong>clEnqueueWriteBuffer</strong>,<strong>clEnqueueWriteBufferRect</strong>,<strong>clEnqueueWriteImage</strong>
 with <em>blocking_write</em> set to CL_TRUE; <strong>clEnqueueMapBuffer</strong>,
 <strong>clEnqueueMapImage</strong> with <em>blocking_map</em> set to CL_TRUE;
 <strong>clEnqueueSVMMemcpy</strong> with <em>blocking_copy</em> set to CL_TRUE;
 <strong>clEnqueueSVMMap</strong> with <em>blocking_map</em> set to CL_TRUE or
 <strong>clWaitForEvents</strong>.</p></div>
 <div class="paragraph"><p>To use event objects that refer to commands enqueued in a command-queue
 as event objects to wait on by commands enqueued in a different
 command-queue, the application must call a <strong>clFlush</strong> or any blocking
 commands that perform an implicit flush of the command-queue where the
 commands that refer to these event objects are enqueued.</p></div>
 <div class="paragraph"><p>The function</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_int clFinish(cl_command_queue command_queue)</pre>
 </div></div>
 <div class="paragraph"><p>blocks until all previously queued OpenCL commands in <em>command_queue</em>
 are issued to the associated device and have completed. <strong>clFinish</strong> does
 not return until all previously queued commands in <em>command_queue</em> have
 been processed and completed. <strong>clFinish</strong> is also a synchronization
 point.</p></div>
 <div class="paragraph"><p><strong>clFinish</strong> returns CL_SUCCESS if the function call was executed
 successfully. Otherwise, it returns one of the following errors:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
       CL_INVALID_COMMAND_QUEUE
 if <em>command_queue</em> is not a valid host command-queue.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_RESOURCES if
 there is a failure to allocate resources required by the OpenCL
 implementation on the device.
 </p>
 </li>
 <li>
 <p>
       CL_OUT_OF_HOST_MEMORY if
 there is a failure to allocate resources required by the OpenCL
 implementation on the host.
 </p>
 </li>
 </ul></div>
 </div>
 </div>
 </div>
 <div class="sect1">
 <h2 id="_associated_opencl_specification">6. Associated OpenCL specification</h2>
 <div class="sectionbody">
 <div class="sect2">
 <h3 id="_spir_v_intermediate_language">6.1. SPIR-V Intermediate language</h3>
 <div class="paragraph"><p>The OpenCL 2.2 specification requires support for the SPIR-V
 intermediate language that allows offline, or linked online, compilation
 to a binary format that may be consumed by the <strong>clCreateProgramWithIL</strong>
 interface.</p></div>
 <div class="paragraph"><p>The OpenCL specification includes a specification for the SPIR-V 1.2
 intermediate language as a cross-platform input language. In addition,
 platform vendors may support their own IL if this is appropriate. The
 OpenCL runtime will return a list of supported IL versions using the
 <strong>CL_DEVICE_IL_VERSION</strong> parameter to the *clGetDeviceInfo*query.</p></div>
 </div>
 <div class="sect2">
 <h3 id="_extensions_to_opencl">6.2. Extensions to OpenCL</h3>
 <div class="paragraph"><p>In addition to the specification of core features, OpenCL provides a
 number of extensions to the API, kernel language or intermediate
 representation. These features are defined in the OpenCL 2.2 extensions
 specification document.</p></div>
 <div class="paragraph"><p>Extensions defined against earlier versions of the OpenCL
 specifications, whether the API or language specification, are defined
 in the matching versions of the extension specification document.</p></div>
 </div>
 <div class="sect2">
 <h3 id="_support_for_earlier_opencl_c_kernel_languages">6.3. Support for earlier OpenCL C kernel languages</h3>
 <div class="paragraph"><p>The OpenCL C kernel language is not defined in the OpenCL 2.2
 specification. New language features are described in the OpenCL C++
 specification as well as the SPIR-V 1.2 specification and in kernel
 languages that target it. A kernel language defined by any of the OpenCL
 1.0, OpenCL 1.1, OpenCL 1.2 and OpenCL 2.0 kernel language
 specifications as well as kernels language extensions defined by the
 matching versions of OpenCL extension specifications are valid to pass
 to <strong>clCreateProgramWithSource</strong> executing against an OpenCL 2.2 runtime.</p></div>
 </div>
 </div>
 </div>
 <div class="sect1">
 <h2 id="_opencl_embedded_profile">7. OpenCL Embedded Profile</h2>
 <div class="sectionbody">
 <div class="paragraph"><p>The OpenCL 2.2specification describes the feature requirements for
 desktop platforms. This section describes the OpenCL 2.2embedded
 profile that allows us to target a subset of the OpenCL 2.2specification
 for handheld and embedded platforms. The optional extensions defined in
 the OpenCL 2.2Extension Specification apply to both profiles.</p></div>
 <div class="paragraph"><p>The OpenCL 2.2 embedded profile has the following restrictions:</p></div>
 <div class="olist arabic"><ol class="arabic">
 <li>
 <p>
 64 bit integers i.e. long, ulong including the appropriate vector
 data types and operations on 64-bit integers are optional. The
 *cles_khr_int64<span class="footnote"><br>[Note that the performance of 64-bit integer arithmetic
 can vary significantly between embedded devices.]<br></span>: extension string will be reported
 if the embedded profile implementation supports 64-bit integers.
 </p>
 </li>
 <li>
 <p>
 [line-through] Support for 3D images is optional.
 </p>
 </li>
 <li>
 <p>
 If [line-through] <strong>CL_DEVICE_IMAGE3D_MAX_WIDTH</strong>, [line-through] <strong>CL_DEVICE_IMAGE3D_MAX_HEIGHT</strong>
 and [line-through] <strong>CL_DEVICE_IMAGE3D_MAX_DEPTH</strong> are zero, the call to
 <strong>clCreateImage</strong> in the embedded profile will fail to create the 3D
 image. The <em>errcode_ret</em> argument in <strong>clCreateImage</strong>
 returns [line-through] CL_INVALID_OPERATION. Declaring arguments of
 type*[line-through]*image3d_t in a kernel will result in a compilation
 error.
 </p>
 </li>
 </ol></div>
 <div class="paragraph"><p>If
 [line-through] <strong>CL_DEVICE_IMAGE3D_MAX_WIDTH</strong>, [line-through] <strong>CL_DEVICE_IMAGE3D_HEIGHT</strong> and
 [line-through] CL_DEVICE_IMAGE3D_MAX_DEPTH &gt; 0, 3D images are supported by the OpenCL embedded profile implementation.
 [line-through] <strong>clCreateImage</strong> will work as defined by the OpenCL specification. The
 [line-through] image3d_t data type can be used in a kernel(s).</p></div>
 <div class="olist arabic"><ol class="arabic">
 <li>
 <p>
 [line-through] Support for 2D image array writes is optional. If the <strong>cles_khr_2d_image_array_writes</strong> extension is supported by the
 [line-through] embedded profile, writes to 2D image arrays are supported.
 </p>
 </li>
 <li>
 <p>
 [line-through] Image and image arrays created with an
 [line-through] image_channel_data_type value of
 [line-through] CL_FLOAT or
 [line-through] CL_HALF_FLOAT can only be used with samplers that use a filter mode of
 [line-through] CL_FILTER_NEAREST. The values returned by <strong>read_imagef</strong> and *read_imageh*<span class="footnote"><br>[If cl_khr_fp16 extension is supported.]<br></span>: for 2D and 3D images if
 [line-through] image_channel_data_type value is
 [line-through] CL_FLOAT or
 [line-through] CL_HALF_FLOAT and sampler with
 [line-through] filter_mode =CL_FILTER_LINEAR are undefined.
 </p>
 </li>
 <li>
 <p>
 The mandated minimum single precision floating-point capability
 given by CL_DEVICE_SINGLE_FP_CONFIG is CL_FP_ROUND_TO_ZERO or
 CL_FP_ROUND_TO_NEAREST. If CL_FP_ROUND_TO_NEAREST is supported, the
 default rounding mode will be round to nearest even; otherwise the
 default rounding mode will be round to zero. <br>
 </p>
 </li>
 <li>
 <p>
 The single precision floating-point operations (addition,
 subtraction and multiplication) shall be correctly rounded. Zero
 results may always be positive 0.0. The accuracy of division and sqrt
 are given in the SPIR-V OpenCL environment specification.
 </p>
 </li>
 </ol></div>
 <div class="paragraph"><p>If CL_FP_INF_NAN is not set in CL_DEVICE_SINGLE_FP_CONFIG, and one of
 the operands or the result of addition, subtraction, multiplication or
 division would signal the overflow or invalid exception (see IEEE 754
 specification), the value of the result is implementation-defined.
 Likewise, single precision comparison operators (&lt;, &gt;, &#8656;, &gt;=, ==, !=)
 return implementation-defined values when one or more operands is a
 NaN.</p></div>
 <div class="paragraph"><p>In all cases, conversions (see the SPIR-V OpenCL environment
 specification) shall be correctly rounded as described for the
 FULL_PROFILE, including those that consume or produce an INF or NaN.
 The built-in math functions shall behave as described for the
 FULL_PROFILE, including edge case behavior but with slightly different
 accuracy rules. Edge case behavior and accuracy rules are described in
 the SPIR-V OpenCL environment specification. <br></p></div>
 <div class="dlist"><dl>
 <dt class="hdlist1">
 NOTE
 </dt>
 <dd>
 <p>
 If addition, subtraction and multiplication have default round
 to zero rounding mode, then <strong>fract</strong>, <strong>fma</strong> and <strong>fdim</strong> shall produce the
 correctly rounded result for round to zero rounding mode.
 </p>
 </dd>
 </dl></div>
 <div class="paragraph"><p>This relaxation of the requirement to adhere to IEEE 754 requirements
 for basic floating-point operations, though extremely undesirable, is to
 provide flexibility for embedded devices that have lot stricter
 requirements on hardware area budgets.</p></div>
 <div class="olist arabic"><ol class="arabic">
 <li>
 <p>
 Denormalized numbers for the half data type which may be generated
 when converting a float to a half using variants of the <strong>vstore_half</strong>
 function or when converting from a half to a float using variants of the
 <strong>vload_half</strong> function can be flushed to zero. The SPIR-V environment
 specification for details.
 </p>
 </li>
 <li>
 <p>
 The precision of conversions from CL_UNORM_INT8, CL_SNORM_INT8,
 CL_UNORM_INT16, CL_SNORM_INT16, CL_UNORM_INT_101010 and
 CL_UNORM_INT_101010_2 to float is &#8656; 2 ulp for the embedded profile
 instead of &#8656; 1.5 ulp as defined in the full profile. The exception
 cases described in the full profile and given below apply to the
 embedded profile.
 </p>
 </li>
 </ol></div>
 <div class="paragraph"><p>For CL_UNORM_INT8</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 0 must convert to 0.0f and
 </p>
 </li>
 <li>
 <p>
 255 must convert to 1.0f
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>For CL_UNORM_INT16</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 0 must convert to 0.0f and
 </p>
 </li>
 <li>
 <p>
 65535 must convert to 1.0f
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>For CL_SNORM_INT8</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 -128 and -127 must convert to -1.0f,
 </p>
 </li>
 <li>
 <p>
 0 must convert to 0.0f and
 </p>
 </li>
 <li>
 <p>
 127 must convert to 1.0f
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>For CL_SNORM_INT16</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 -32768 and -32767 must convert to -1.0f,
 </p>
 </li>
 <li>
 <p>
 0 must convert to 0.0f and
 </p>
 </li>
 <li>
 <p>
 32767 must convert to 1.0f
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>For CL_UNORM_INT_101010</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 0 must convert to 0.0f and
 </p>
 </li>
 <li>
 <p>
 1023 must convert to 1.0f
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>For CL_UNORM_INT_101010_2</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 0  must convert to 0.0f and
 </p>
 </li>
 <li>
 <p>
 1023 must convert to 1.0f (for RGB)
 </p>
 </li>
 <li>
 <p>
 3 must convert to 1.0f (for A)
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The following optional extensions defined in the OpenCL 2.2 Extension
 Specification are available to the embedded profile:</p></div>
 <div class="paragraph"><p><strong>cl_khr_int64_base_atomics</strong></p></div>
 <div class="paragraph"><p><strong>cl_khr_int64_extended_atomics</strong></p></div>
 <div class="paragraph"><p><strong>cl_khr_fp16</strong></p></div>
 <div class="paragraph"><p><strong>cles_khr_int64</strong>.</p></div>
 <div class="paragraph"><p>If double precision is supported i.e. CL_DEVICE_DOUBLE_FP_CONFIG is not
 zero, then cles_khr_int64 must also be supported.</p></div>
 <div class="paragraph"><p>CL_PLATFORM_PROFILE defined in <em>table 4.1</em> will return the string
 EMBEDDED_PROFILE if the OpenCL implementation supports the embedded
 profile only.</p></div>
 <div class="paragraph"><p>The minimum maximum values specified in <em>table 4.3</em> that have been
 modified for the OpenCL embedded profile are listed below:</p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <col style="width:34%;">
 <col style="width:33%;">
 <col style="width:33%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>cl_device_info</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Return Type</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Description</strong></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_DEVICE_MAX_READ_ IMAGE_ARGS</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Max number of image objects
 arguments of a kernel declared with the read_only qualifier. The
 minimum value is 8 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_DEVICE_MAX_WRITE_ IMAGE_ARGS</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Max number of image objects
 arguments of a kernel declared with the write_only qualifier. The
 minimum value is 8 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_DEVICE_MAX_READ_ WRITE_IMAGE_ARGS</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Max number of image
 objects arguments of a kernel declared with the write_only or read_write
 qualifier. The minimum value is 8 if CL_DEVICE_IMAGE_SUPPORT is
 CL_TRUE.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_DEVICE_IMAGE2D_ MAX_WIDTH</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Max width of 2D image in
 pixels. The minimum value is 2048 if CL_DEVICE_IMAGE_SUPPORT is
 CL_TRUE.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_DEVICE_IMAGE2D_ MAX_HEIGHT</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Max height of 2D image in
 pixels. The minimum value is 2048 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_DEVICE_IMAGE3D_ MAX_WIDTH</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Max width of 3D image in
 pixels. The minimum value is 2048 if CL_DEVICE_IMAGE_SUPPORT is
 CL_TRUE.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_DEVICE_IMAGE3D_ MAX_HEIGHT</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Max height of 3D image in
 pixels. The minimum value is 2048.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_DEVICE_IMAGE3D_ MAX_DEPTH</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Max depth of 3D image in pixels.
 The minimum value is 2048.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_DEVICE_IMAGE_ MAX_BUFFER_SIZE</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Max number of pixels for a 1D image
 created from a buffer object.
 <br>
 <br>
 The minimum value is 2048 if
 CL_DEVICE_IMAGE_SUPPORT is
 CL_TRUE.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_DEVICE_IMAGE_ MAX_ARRAY_SIZE</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Max number of images in a 1D or 2D
 image array.
 <br>
 <br>
 The minimum value is 256 if
 CL_DEVICE_IMAGE_SUPPORT is
 CL_TRUE.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_DEVICE_MAX_SAMPLERS</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Maximum number of samplers that
 can be used in a kernel.
 <br>
 <br>
 The minimum value is 8 if
 CL_DEVICE_IMAGE_SUPPORT is
 CL_TRUE.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_DEVICE_MAX_ PARAMETER_SIZE</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Max size in bytes of all
 arguments that can be passed to a kernel. The minimum value is 256
 bytes for devices that are not of type CL_DEVICE_TYPE_CUSTOM.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_DEVICE_SINGLE_ FP_CONFIG</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_device_<br>
  fp_config</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Describes single precision floatingpoint capability of the device. This is
 a bit-field that describes one or more
 of the following values:
 <br>
 <br>
 CL_FP_DENORM – denorms are supported
 <br>
 <br>
 CL_FP_INF_NAN – INF and quiet NaNs are
 supported.
 <br>
 <br>
 CL_FP_ROUND_TO_NEAREST– round to
 nearest even rounding mode supported
 <br>
 <br>
 CL_FP_ROUND_TO_ZERO – round to zero
 rounding mode supported
 <br>
 <br>
 CL_FP_ROUND_TO_INF – round to positive
 and negative infinity rounding modes
 supported
 <br>
 <br>
 CL_FP_FMA – IEEE754-2008 fused
 multiply-add is supported.
 <br>
 <br>
 CL_FP_CORRECTLY_ ROUNDED_DIVIDE
 _SQRT – divide and sqrt are correctly
 rounded as defined by the IEEE754
 specification.
 <br>
 <br>
 CL_FP_SOFT_FLOAT –  Basic floatingpoint operations (such as addition, subtraction,
 multiplication) are implemented in software.
 <br>
 <br>
 The mandated minimum floating-point
 capability is:
 CL_FP_ROUND_TO_ZERO or
 CL_FP_ROUND_TO_NEAREST
 for devices that are not of type
 CL_DEVICE_TYPE_CUSTOM.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_DEVICE_MAX_CONSTANT_ BUFFER_SIZE</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_ulong</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Max size in bytes
 of a constant buffer allocation. The minimum value is 1 KB for devices
 that are not of type CL_DEVICE_TYPE_CUSTOM.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_DEVICE_MAX_CONSTANT_ ARGS</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Max number of arguments
 declared with the __constant qualifier in a kernel. The minimum value
 is 4 for devices that are not of type CL_DEVICE_TYPE_CUSTOM.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_DEVICE_LOCAL_MEM_ SIZE</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_ulong</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Size of local memory arena in
 bytes. The minimum value is 1 KB for devices that are not of type
 CL_DEVICE_TYPE_CUSTOM.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_DEVICE_COMPILER_ AVAILABLE</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_bool</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Is CL_FALSE if the implementation
 does not have a compiler available to
 compile the program source.
 <br>
 <br>
 Is CL_TRUE if the compiler is
 available.
 This can be CL_FALSE for the
 embedded platform profile only.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_DEVICE_LINKER_ AVAILABLE</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_bool</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Is CL_FALSE if the implementation
 does not have a linker available.
 Is CL_TRUE if the linker is available.
 <br>
 <br>
 This can be CL_FALSE for the
 embedded platform profile only.
 <br>
 <br>
 This must be CL_TRUE if
 CL_DEVICE_COMPILER_ AVAILABLE is CL_TRUE.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_DEVICE_QUEUE_ON_ DEVICE_MAX_SIZE</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">cl_uint</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">The max. size of the
 device queue in bytes. The minimum value is 64 KB for the embedded
 profile</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_DEVICE_PRINTF_ BUFFER_SIZE</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">size_t</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Maximum size in bytes of the
 internal buffer that holds the output of printf calls from a kernel.
 The minimum value for the EMBEDDED profile is 1 KB.</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p>If CL_DEVICE_IMAGE_SUPPORT specified in <em>table 4.3</em> is CL_TRUE, the
 values assigned to CL_DEVICE_MAX_READ_IMAGE_ARGS,
 CL_DEVICE_MAX_WRITE_IMAGE_ARGS, CL_DEVICE_IMAGE2D_MAX_WIDTH,
 CL_DEVICE_IMAGE2D_MAX_HEIGHT, CL_DEVICE_IMAGE3D_MAX_WIDTH,
 CL_DEVICE_IMAGE3D_MAX_HEIGHT, CL_DEVICE_IMAGE3D_MAX_DEPTH and
 CL_DEVICE_MAX_SAMPLERS by the implementation must be greater than or
 equal to the minimum values specified in the embedded profile version of
 <em>table 4.3</em> given above.</p></div>
 </div>
 </div>
 <div class="sect1">
 <h2 id="_appendix_a">8. Appendix A</h2>
 <div class="sectionbody">
 <div class="sect2">
 <h3 id="_a_1_shared_opencl_objects">8.1. A.1 Shared OpenCL Objects</h3>
 <div class="paragraph"><p>This section describes which objects can be shared across multiple
 command-queues created within a host process.</p></div>
 <div class="paragraph"><p>OpenCL memory objects, program objects and kernel objects are created
 using a context and can be shared across multiple command-queues created
 using the same context. Event objects can be created when a command is
 queued to a command-queue. These event objects can be shared across
 multiple command-queues created using the same context.</p></div>
 <div class="paragraph"><p>The application needs to implement appropriate synchronization across
 threads on the host processor to ensure that the changes to the state of
 a shared object (such as a command-queue object, memory object, program
 or kernel object) happen in the correct order (deemed correct by the
 application) when multiple command-queues in multiple threads are making
 changes to the state of a shared object.</p></div>
 <div class="paragraph"><p>A command-queue can cache changes to the state of a memory object on the
 device associated with the command-queue. To synchronize changes to a
 memory object across command-queues, the application must do the
 following:</p></div>
 <div class="paragraph"><p>In the command-queue that includes commands that modify the state of a
 memory object, the application must do the following:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 Get appropriate event objects for commands that modify the state of the shared memory object.
 </p>
 </li>
 <li>
 <p>
 Call the <strong>clFlush</strong> (or <strong>clFinish</strong>) API to issue any outstanding commands from this command-queue.

 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>In the command-queue that wants to synchronize to the latest state of a
 memory object, commands queued by the application must use the
 appropriate event objects that represent commands that modify the state
 of the shared memory object as event objects to wait on. This is to
 ensure that commands that use this shared memory object complete in the
 previous command-queue before the memory objects are used by commands
 executing in this command-queue.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>The results of modifying a shared resource in one command-queue while it
 is being used by another command-queue are undefined.
  </p></div>
 </div>
 <div class="sect2">
 <h3 id="_a_2_multiple_host_threads">8.2. A.2 Multiple Host Threads</h3>
 <div class="paragraph"><p>All OpenCL API calls are thread-safe<span class="footnote"><br>[Please refer to the OpenCL glossary
 for the OpenCL definition of thread -safe. This definition may be different
 from usage of the term in other contexts.]<br></span>: except those that
 modify the state of cl_kernel objects: <strong>clSetKernelArg,
 clSetKernelArgSVMPointer, clSetKernelExecInfo*and *clCloneKernel</strong>.
 <strong>clSetKernelArg</strong> , <strong>clSetKernelArgSVMPointer</strong>, <strong>clSetKernelExecInfo</strong>
 and <strong>clCloneKernel*are safe to call from any host thread, and safe to
 call re-entrantly so long as concurrent calls to any combination of
 these API calls operate on different cl_kernel objects. The state of
 the cl_kernel object is undefined if *clSetKernelArg</strong>,
 <strong>clSetKernelArgSVMPointer</strong>, <strong>clSetKernelExecInfo</strong> or *clCloneKernel*are
 called from multiple host threads on the same cl_kernel object at the
 same time<span class="footnote"><br>[There is an inherent race condition in the design of OpenCL that occurs between setting a kernel argument and
 using the kernel with clEnqueueNDRangeKernel. Another host thread might change the kernel arguments between
 when a host thread sets the kernel arguments and then enqueues the kernel, causing the wrong kernel arguments to
 be enqueued. Rather than attempt to share cl_kernel objects among multiple host threads, applications are strongly
 encouraged to make additional cl_kernel objects for kernel functions for each host thread.]<br></span>:. Please note that there are additional
 limitations as to which OpenCL APIs may be called from OpenCL callback
 functions&#8201;&#8212;&#8201;please see <em>section 5.11</em>.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>The behavior of OpenCL APIs called from an interrupt or signal handler
 is implementation-defined</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>The OpenCL implementation should be able to create multiple
 command-queues for a given OpenCL context and multiple OpenCL contexts
 in an application running on the host processor.</p></div>
 </div>
 </div>
 </div>
 <div class="sect1">
 <h2 id="_appendix_b_portability">9. Appendix B Portability</h2>
 <div class="sectionbody">
 <div class="paragraph"><p>OpenCL is designed to be portable to other architectures and hardware
 designs. OpenCL has used at its core a C99 based programming language
 and follows rules based on that heritage. Floating-point arithmetic is
 based on the <strong>IEEE-754</strong> and <strong>IEEE-754-2008</strong> standards. The memory
 objects, pointer qualifiers and weakly ordered memory are designed to
 provide maximum compatibility with discrete memory architectures
 implemented by OpenCL devices.   Command-queues and barriers allow for
 synchronization between the host and OpenCL devices. The design,
 capabilities and limitations of OpenCL are very much a reflection of the
 capabilities of underlying hardware. </p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>Unfortunately, there are a number of areas where idiosyncrasies of one
 hardware platform may allow it to do some things that do not work on
 another. By virtue of the rich operating system resident on the CPU, on
 some implementations the kernels executing on a CPU may be able to call
 out to system services whereas the same calls on the GPU will likely
 fail for now. Since there is some advantage to having these services
 available for debugging purposes, implementations can use the OpenCL
 extension mechanism to implement these services.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>Likewise, the heterogeneity of computing architectures might mean that a
 particular loop construct might execute at an acceptable speed on the
 CPU but very poorly on a GPU, for example. CPUs are designed in general
 to work well on latency sensitive algorithms on single threaded tasks,
 whereas common GPUs may encounter extremely long latencies, potentially
 orders of magnitude worse. A developer interested in writing portable
 code may find that it is necessary to test his design on a diversity of
 hardware designs to make sure that key algorithms are structured in a
 way that works well on a diversity of hardware.  We suggest favoring
 more work-items over fewer. It is anticipated that over the coming
 months and years experience will produce a set of best practices that
 will help foster a uniformly favorable experience on a diversity of
 computing devices.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>Of somewhat more concern is the topic of endianness. Since a majority
 of devices supported by the initial implementation of OpenCL are
 little-endian, developers need to make sure that their kernels are
 tested on both big-endian and little-endian devices to ensure source
 compatibility with OpenCL devices now and in the future. The endian
 attribute qualifier is supported by the SPIR-V IL to allow developers to
 specify whether the data uses the endianness of the host or the OpenCL
 device. This allows the OpenCL compiler to do appropriate
 endian-conversion on load and store operations from or to this data.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>We also describe how endianness can leak into an implementation causing
 kernels to produce unintended results:</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>When a big-endian vector machine (e.g. AltiVec, CELL SPE) loads a
 vector, the order of the data is retained. That is both the order of
 the bytes within each element and the order of the elements in the
 vector are the same as in memory.  When a little-endian vector machine
 (e.g. SSE) loads a vector, the order of the data in register (where all
 the work is done) is reversed. *Both* the order of the bytes within
 each element and the order of the elements with respect to one another
 in the vector are reversed. </p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>Memory:</p></div>
 <div class="paragraph"><p>uint4 a =</p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <col style="width:25%;">
 <col style="width:25%;">
 <col style="width:25%;">
 <col style="width:25%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">0x00010203</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">0x04050607</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">0x08090A0B</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">0x0C0D0E0F</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p>In register (big-endian):</p></div>
 <div class="paragraph"><p>uint4 a =</p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <col style="width:25%;">
 <col style="width:25%;">
 <col style="width:25%;">
 <col style="width:25%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">0x00010203</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">0x04050607</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">0x08090A0B</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">0x0C0D0E0F</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>In register (little-endian):</p></div>
 <div class="paragraph"><p>uint4 a =</p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <col style="width:25%;">
 <col style="width:25%;">
 <col style="width:25%;">
 <col style="width:25%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">0x0F0E0D0C</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">0x0B0A0908</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">0x07060504</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">0x03020100</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p>This allows little-endian machines to use a single vector load to load
 little-endian data, regardless of how large each piece of data is in the
 vector. That is the transformation is equally valid whether that vector
 was a uchar16 or a ulong2.  Of course, as is well known, little-endian
 machines actually<span class="footnote"><br>[Note that we are talking about the programming model here. In reality, little endian systems might choose to
 simply address their bytes from "the right" or reverse the "order" of the bits in the byte. Either of these choices
 would mean that no big swap would need to occur in hardware.]<br></span>: store their data in reverse byte
 order to compensate for the little-endian storage format of the array
 elements:</p></div>
 <div class="paragraph"><p>Memory (big-endian):</p></div>
 <div class="paragraph"><p>uint4 a =</p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <col style="width:25%;">
 <col style="width:25%;">
 <col style="width:25%;">
 <col style="width:25%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">0x00010203</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">0x04050607</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">0x08090A0B</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">0x0C0D0E0F</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p>Memory (little-endian):</p></div>
 <div class="paragraph"><p>uint4 a =</p></div>
 <div class="paragraph"><p> </p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <col style="width:25%;">
 <col style="width:25%;">
 <col style="width:25%;">
 <col style="width:25%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">0x03020100</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">0x07060504</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">0x0B0A0908</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">0x0F0E0D0C</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>Once that data is loaded into a vector, we end up with this:</p></div>
 <div class="paragraph"><p>In register (big-endian):</p></div>
 <div class="paragraph"><p>uint4 a =  </p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <col style="width:25%;">
 <col style="width:25%;">
 <col style="width:25%;">
 <col style="width:25%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">0x00010203</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">0x04050607</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">0x08090A0B</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">0x0C0D0E0F</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>In register (little-endian):</p></div>
 <div class="paragraph"><p>uint4 a =</p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <col style="width:25%;">
 <col style="width:25%;">
 <col style="width:25%;">
 <col style="width:25%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">0x0C0D0E0F</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">0x08090A0B</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">0x04050607</p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">0x00010203</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>That is, in the process of correcting the endianness of the bytes within
 each element, the machine ends up reversing the order that the elements
 appear in the vector with respect to each other within the vector.
 0x00010203 appears at the left of the big-endian vector and at the
 right of the little-endian vector.</p></div>
 <div class="paragraph"><p>When the host and device have different endianness, the developer must
 ensure that kernel argument values are processed correctly. The
 implementation may or may not automatically convert endianness of kernel
 arguments. Developers should consult vendor documentation for guidance
 on how to handle kernel arguments in these situations.</p></div>
 <div class="paragraph"><p>OpenCL provides a consistent programming model across architectures by
 numbering elements according to their order in memory. Concepts such as
 even/odd and high/low follow accordingly.  Once the data is loaded into
 registers, we find that element 0 is at the left of the big-endian
 vector and element 0 is at the right of the little-endian vector:</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>float x[4];

 float4 v = vload4( 0, x );</pre>
 </div></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>Big-endian:</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>v contains { x[0], x[1], x[2], x[3] }</pre>
 </div></div>
 <div class="paragraph"><p>Little-endian:</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>v contains { x[3], x[2], x[1], x[0] }</pre>
 </div></div>
 <div class="paragraph"><p>The compiler is aware that this swap occurs and references elements
 accordingly. So long as we refer to them by a numeric index such as
 .s0123456789abcdef or by descriptors such as .xyzw, .hi, .lo, .even and
 .odd, everything works transparently. Any ordering reversal is undone
 when the data is stored back to memory. The developer should be able to
 work with a big endian programming model and ignore the element ordering
 problem in the vector &#8230; for most problems. This mechanism relies on
 the fact that we can rely on a consistent element numbering. Once we
 change numbering system, for example by conversion-free casting (using
 as_type_n_) a vector to another vector of the same size but a different
 number of elements, then we get different results on different
 implementations depending on whether the system is big- endian, or
 little-endian or indeed has no vector unit at all. (Thus, the behavior
 of bitcasts to vectors of different numbers of elements is
 implementation-defined, see section 1.2.4 of OpenCL 2.0C specification)</p></div>
 <div class="paragraph"><p>An example follows:</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>float x[4] = { 0.0f, 1.0f, 2.0f, 3.0f };

 float4 v = vload4( 0, x );

 uint4 y = as_uint4(v);  legal, portable

 ushort8 z = as_ushort8(v);  legal, not portable

  element size changed</pre>
 </div></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>Big-endian:</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>v contains { 0.0f, 1.0f, 2.0f, 3.0f }</pre>
 </div></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>y contains { 0x00000000, 0x3f800000, 0x40000000, 0x40400000 }</pre>
 </div></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>z contains { 0x0000, 0x0000, *0x3f80*, 0x0000, 0x4000, 0x0000, 0x4040, 0x0000 }</pre>
 </div></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>z.z is 0x3f80</pre>
 </div></div>
 <div class="paragraph"><p>Little-endian:</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>v contains { 3.0f, 2.0f, 1.0f, 0.0f }</pre>
 </div></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>y contains { 0x40400000, 0x40000000, 0x3f800000, 0x00000000 }</pre>
 </div></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>z contains { 0x4040, 0x0000, 0x4000, 0x0000, 0x3f80, *0x0000*, 0x0000, 0x0000 }</pre>
 </div></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>z.z is 0</pre>
 </div></div>
 <div class="paragraph"><p>Here, the value in z.z is not the same between big- and little-endian
 vector machines </p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>OpenCL could have made it illegal to do a conversion free cast that
 changes the number of elements in the name of portability. However,
 while OpenCL provides a common set of operators drawing from the set
 that are typically found on vector machines, it can not provide access
 to everything every ISA may offer in a consistent uniform portable
 manner. Many vector ISAs provide special purpose instructions that
 greatly accelerate specific operations such as DCT, SAD, or 3D
 geometry. It is not intended for OpenCL to be so heavy handed that
 time-critical performance sensitive algorithms can not be written by
 knowledgeable developers to perform at near peak performance. Developers
 willing to throw away portability should be able to use the
 platform-specific instructions in their code. For this reason, OpenCL is
 designed to allow traditional vector C language programming extensions,
 such as the AltiVec C Programming Interface or the Intel C programming
 interfaces (such as those found in emmintrin.h) to be used directly in
 OpenCL with OpenCL data types as an extension to OpenCL.  As these
 interfaces rely on the ability to do conversion-free casts that change
 the number of elements in the vector to function properly, OpenCL allows
 them too. </p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>As a general rule, any operation that operates on vector types in
 segments that are not the same size as the vector element size may break
 on other hardware with different endianness or different vector
 architecture.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>Examples might include:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 Combining two uchar8&#8217;s containing high and low bytes of a ushort, to make a ushort8 using
 .even and .odd operators  (please use <strong>upsample()</strong> for this)
 </p>
 </li>
 <li>
 <p>
 Any bitcast that changes
 the number of elements in the vector. (Operations on the new type are
 non-portable.)
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>* Swizzle operations that
 change the order of data using chunk sizes that are not the same as the
 element size</p></div>
 <div class="paragraph"><p>Examples of operations that are portable:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 Combining two uint8&#8217;s to
 make a uchar16 using .even and .odd operators. For example to
 interleave left and right audio streams.
 </p>
 </li>
 <li>
 <p>
 Any bitcast that does not
 change the number of elements (e.g.  (float4) unit4&#8201;&#8212;&#8201;we define the
 storage format for floating-point types)
 </p>
 </li>
 <li>
 <p>
 Swizzle operations that
 swizzle elements of the same size as the elements of the vector.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>OpenCL has made some additions to C to make application behavior more
 dependable than C. Most notably in a few cases OpenCL defines the
 behavior of some operations that are undefined in C99:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 OpenCL provides convert_
 operators for conversion between all types. C99 does not define what
 happens when a floating-point type is converted to integer type and the
 floating-point value lies outside the representable range of the integer
 type after rounding. When the <em>sat variant of the conversion is used,
 the float shall be converted to the nearest representable integer value.
  Similarly, OpenCL also makes recommendations about what should happen
 with NaN. Hardware manufacturers that provide the saturated conversion
 in hardware may use the saturated conversion hardware for both the
 saturated and non-saturated versions of the OpenCL convert</em> operator.
 OpenCL does not define what happens for the non-saturated conversions
 when floating-point operands are outside the range representable
 integers after rounding.
 </p>
 </li>
 <li>
 <p>
 The format of half, float,
 and double types is defined to be the binary16, binary32 and binary64
 formats in the draft IEEE-754 standard. (The latter two are identical to
 the existing IEEE-754 standard.) You may depend on the positioning and
 meaning of the bits in these types.
 </p>
 </li>
 <li>
 <p>
 OpenCL defines behavior
 for oversized shift values. Shift operations that shift greater than or
 equal to the number of bits in the first operand reduce the shift value
 modulo the number of bits in the element. For example,  if we shift an
 int4 left by 33 bits, OpenCL treats this as shift left by 33%32 = 1 bit.
 </p>
 </li>
 <li>
 <p>
 A number of edge cases for
 math library functions are more rigorously defined than in C99. Please
 see _section 3.5_of the OpenCL 2.0 C specification.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p> </p></div>
 </div>
 </div>
 <div class="sect1">
 <h2 id="_appendix_c_application_data_types">10. Appendix C  Application Data Types</h2>
 <div class="sectionbody">
 <div class="paragraph"><p>This section documents the provided host application types and constant
 definitions. The documented material describes the commonly defined
 data structures, types and constant values available to all platforms
 and architectures. The addition of these details demonstrates our
 commitment to maintaining a portable programming environment and
 potentially deters changes to the supplied headers.</p></div>
 <div class="sect2">
 <h3 id="_c_1_shared_application_scalar_data_types">10.1. C.1 Shared Application Scalar Data Types</h3>
 <div class="paragraph"><p>The following application scalar types are provided for application
 convenience.</p></div>
 <div class="paragraph"><p>cl_char</p></div>
 <div class="paragraph"><p>cl_uchar</p></div>
 <div class="paragraph"><p>cl_short</p></div>
 <div class="paragraph"><p>cl_ushort</p></div>
 <div class="paragraph"><p>cl_int</p></div>
 <div class="paragraph"><p>cl_uint</p></div>
 <div class="paragraph"><p>cl_long</p></div>
 <div class="paragraph"><p>cl_ulong</p></div>
 <div class="paragraph"><p>cl_half</p></div>
 <div class="paragraph"><p>cl_float</p></div>
 <div class="paragraph"><p>cl_double</p></div>
 <div class="paragraph"><p> </p></div>
 </div>
 <div class="sect2">
 <h3 id="_c_2_supported_application_vector_data_types">10.2. C.2 Supported Application Vector Data Types</h3>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>Application vector types are unions used to create vectors of the above
 application scalar types. The following application vector types are
 provided for application convenience.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>cl_char_n_</p></div>
 <div class="paragraph"><p>cl_uchar_n_</p></div>
 <div class="paragraph"><p>cl_short_n_</p></div>
 <div class="paragraph"><p>cl_ushort_n_</p></div>
 <div class="paragraph"><p>cl_int_n_</p></div>
 <div class="paragraph"><p>cl_uint_n_</p></div>
 <div class="paragraph"><p>cl_long_n_</p></div>
 <div class="paragraph"><p>cl_ulong_n_</p></div>
 <div class="paragraph"><p>cl_half_n_</p></div>
 <div class="paragraph"><p>cl_float_n_</p></div>
 <div class="paragraph"><p>cl_double_n_</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p><em>n</em> can be 2, 3, 4, 8 or 16.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>The application scalar and vector data types are defined in the
 <strong>cl_platform.h</strong> header file.</p></div>
 <div class="paragraph"><p> </p></div>
 </div>
 <div class="sect2">
 <h3 id="_c_3_alignment_of_application_data_types">10.3. C.3 Alignment of Application Data Types</h3>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>The user is responsible for ensuring that pointers passed into and out
 of OpenCL kernels are natively aligned relative to the data type of the
 parameter as defined in the kernel language and SPIR-V specifications.
 This implies that OpenCL buffers created with CL_MEM_USE_HOST_PTR need
 to provide an appropriately aligned host memory pointer that is aligned
 to the data types used to access these buffers in a kernel(s), that SVM
 allocations must correctly align and that pointers into SVM allocations
 must also be correctly aligned. The user is also responsible for
 ensuring image data passed is aligned to the granularity of the data
 representing a single pixel (e.g. image_num_channels *
 sizeof(image_channel_data_type)) except for CL_RGB and CL_RGBx images
 where the data must be aligned to the granularity of a single channel in
 a pixel (i.e. sizeof(image_channel_data_type)). This implies that OpenCL
 images created with CL_MEM_USE_HOST_PTR must align correctly. The image
 alignment value can be queried using the
 CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT query. In addition,</p></div>
 <div class="paragraph"><p>source pointers for clEnqueueWriteImage and other operations that copy
 to the OpenCL runtime, as well as destination pointers for
 clEnqueueReadImage and other operations that copy from the OpenCL
 runtime must follow the same alignment rules.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>OpenCL makes no requirement about the alignment of OpenCL application
 defined data types outside of buffers and images, except that the
 underlying vector primitives (e.g. <em>_cl_float4) where defined shall be
 directly accessible as such using appropriate named fields in the
 cl_type union (see _section C.5</em>). Nevertheless, it is recommended that
 the <strong>cl_platform.h</strong> header should attempt to naturally align OpenCL
 defined application data types (e.g. cl_float4) according to their type.</p></div>
 </div>
 <div class="sect2">
 <h3 id="_c_4_vector_literals">10.4. C.4 Vector Literals</h3>
 <div class="paragraph"><p>Application vector literals may be used in assignments of individual
 vector components. Literal usage follows the convention of the
 underlying application compiler.</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>cl_float2 foo = { .s[1] = 2.0f };

 cl_int8 bar = {{ 2, 4, 6, 8, 10, 12, 14, 16 }};</pre>
 </div></div>
 </div>
 <div class="sect2">
 <h3 id="_c_5_vector_components">10.5. C.5 Vector Components</h3>
 <div class="paragraph"><p>The components of application vector types can be addressed using the
 &lt;vector_name&gt;.s[&lt;index&gt;] notation.</p></div>
 <div class="paragraph"><p>For example:</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>foo.s[0] = 1.0f; // Sets the 1st vector component of foo

 pos.s[6] = 2; // Sets the 7th vector component of bar</pre>
 </div></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>In some cases vector components may also be accessed using the following
 notations. These notations are not guaranteed to be supported on all
 implementations, so their use should be accompanied by a check of the
 corresponding preprocessor symbol.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="sect3">
 <h4 id="_c_5_1_named_vector_components_notation">10.5.1. C.5.1 Named vector components notation</h4>
 <div class="paragraph"><p>Vector data type components may be accessed using the .sN, .sn or .xyzw
 field naming convention, similar to how they are used within the OpenCL
 language. Use of the .xyzw field naming convention only allows
 accessing of the first 4 component fields. Support of these notations
 is identified by the CL_HAS_NAMED_VECTOR_FIELDS preprocessor symbol.
 For example:</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>#ifdef CL_HAS_NAMED_VECTOR_FIELDS

 cl_float4 foo;

 cl_int16 bar;

 foo.x = 1.0f; // Set first component

 foo.s0 = 1.0f; // Same as above

 bar.z = 3;  // Set third component

 bar.se = 11;  // Same as bar.s[0xe]

 bar.sD = 12;  // Same as bar.s[0xd]

 #endif</pre>
 </div></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>Unlike the OpenCL language type usage of named vector fields, only one
 component field may be accessed at a time. This restriction prevents
 the ability to swizzle or replicate components as is possible with the
 OpenCL language types. Attempting to access beyond the number of
 components for a type also results in a failure.</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>foo.xy  // illegal - illegal field name combination

 bar.s1234 // illegal - illegal field name combination

 foo.s7  // illegal - no component s7</pre>
 </div></div>
 <div class="paragraph"><p> </p></div>
 </div>
 <div class="sect3">
 <h4 id="_c_5_2_high_low_vector_component_notation">10.5.2. C.5.2 High/Low vector component notation</h4>
 <div class="paragraph"><p>Vector data type components may be accessed using the .hi and .lo
 notation similar to that supported within the language types. Support
 of this notation is identified by the CL_HAS_HI_LO_VECTOR_FIELDS
 preprocessor symbol. For example:</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>#ifdef CL_HAS_HI_LO_VECTOR_FIELDS

 cl_float4 foo;

 cl_float2 new_hi = 2.0f, new_lo = 4.0f;

 foo.hi = new_hi;

 foo.lo = new_lo;

 #endif</pre>
 </div></div>
 </div>
 <div class="sect3">
 <h4 id="_c_5_3_native_vector_type_notation">10.5.3. C.5.3 Native vector type notation</h4>
 <div class="paragraph"><p>Certain native vector types are defined for providing a mapping of
 vector types to architecturally builtin vector types. Unlike the above
 described application vector types, these native types are supported on
 a limited basis depending on the supporting architecture and compiler.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>These types are not unions, but rather convenience mappings to the
 underlying architectures' builtin vector types. The native types share
 the name of their application counterparts but are preceded by a double
 underscore "__".</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>For example, <em>cl_float4 is the native builtin vector type equivalent of
 the cl_float4 application vector type. The </em>cl_float4 type may provide
 direct access to the architectural builtin __m128 or vector float type,
 whereas the cl_float4 is treated as a union.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>In addition, the above described application data types may have native
 vector data type members for access convenience. The native components
 are accessed using the .vN sub-vector notation, where N is the number of
 elements in the sub-vector. In cases where the native type is a subset
 of a larger type (more components), the notation becomes an index based
 array of the sub-vector type.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>Support of the native vector types is identified by a <em>CL_TYPEN</em>
 preprocessor symbol matching the native type name. For example:</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>#ifdef __CL_FLOAT4__ // Check for native cl_float4 type

 cl_float8 foo;

 __cl_float4 bar; // Use of native type

 bar = foo.v4[1]; // Access the second native float4 vector

 #endif</pre>
 </div></div>
 <div class="paragraph"><p> </p></div>
 </div>
 </div>
 <div class="sect2">
 <h3 id="_c_6_implicit_conversions">10.6. C.6 Implicit Conversions</h3>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>Implicit conversions between application vector types are not supported.</p></div>
 <div class="paragraph"><p> </p></div>
 </div>
 <div class="sect2">
 <h3 id="_c_7_explicit_casts">10.7. C.7 Explicit Casts</h3>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>Explicit casting of application vector types (cl_typen) is not
 supported. Explicit casting of native vector types (__cl_typen) is
 defined by the external compiler.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p> </p></div>
 </div>
 <div class="sect2">
 <h3 id="_c_8_other_operators_and_functions">10.8. C.8 Other operators and functions</h3>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>The behavior of standard operators and function on both application
 vector types (cl_typen) and native vector types (__cl_typen) is defined
 by the external compiler.</p></div>
 <div class="paragraph"><p> </p></div>
 </div>
 <div class="sect2">
 <h3 id="_c_9_application_constant_definitions">10.9. C.9 Application constant definitions</h3>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>In addition to the above application type definitions, the following
 literal defintions are also available.</p></div>
 <div class="paragraph"><p> </p></div>
 <table class="tableblock frame-all grid-all"
 style="
 width:100%;
 ">
 <col style="width:50%;">
 <col style="width:50%;">
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_CHAR_BIT</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Bit width of a character</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_SCHAR_MAX</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Maximum value of a type cl_char</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_SCHAR_MIN</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Minimum value of a type cl_char</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_CHAR_MAX</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Maximum value of a type cl_char</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_CHAR_MIN</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Minimum value of a type cl_char</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_UCHAR_MAX</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Maximum value of a type cl_uchar</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_SHORT_MAX</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Maximum value of a type cl_short</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_SHORT_MIN</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Minimum value of a type cl_short</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_USHORT_MAX</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Maximum value of a type cl_ushort</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_INT_MAX</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Maximum value of a type cl_int</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_INT_MIN</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Minimum value of a type cl_int</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_UINT_MAX</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Maximum value of a type cl_uint</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_LONG_MAX</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Maximum value of a type cl_long</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_LONG_MIN</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Minimum value of a type cl_long</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_ULONG_MAX</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Maximum value of a type cl_ulong</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"> </p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"> </p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_FLT_DIAG</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Number of decimal digits of precision for the type
 cl_float</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_FLT_MANT_DIG</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Number of digits in the mantissa of type cl_float</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_FLT_MAX_10_EXP</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Maximum positive integer such that 10 raised to
 this power minus one can be represented as a normalized floating-point
 number of type cl_float</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_FLT_MAX_EXP</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Maximum exponent value of type cl_float</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_FLT_MIN_10_EXP</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Minimum negative integer such that 10 raised to
 this power minus one can be represented as a normalized floating-point
 number of type cl_float</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_FLT_MIN_EXP</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Minimum exponent value of type cl_float</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_FLT_RADIX</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Base value of type cl_float</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_FLT_MAX</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Maximum value of type cl_float</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_FLT_MIN</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Minimum value of type cl_float</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_FLT_EPSILON</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Minimum positive floating-point number of type
 cl_float such that 1.0 + CL_FLT_EPSILON != 1 is true.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_DBL_DIG</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Number of decimal digits of precision for the type
 cl_double</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_DBL_MANT_DIG</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Number of digits in the mantissa of type cl_double</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_DBL_MAX_10_EXP</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Maximum positive integer such that 10 raised to
 this power minus one can be represented as a normalized floating-point
 number of type cl_double</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_DBL_MAX_EXP</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Maximum exponent value of type cl_double</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_DBL_MIN_10_EXP</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Minimum negative integer such that 10 raised to
 this power minus one can be represented as a normalized floating-point
 number of type cl_double</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_DBL_MIN_EXP</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Minimum exponent value of type cl_double</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_DBL_RADIX</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Base value of type cl_double</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_DBL_MAX</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Maximum value of type cl_double</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_DBL_MIN</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Minimum value of type cl_double</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_DBL_EPSILON</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Minimum positive floating-point number of type
 cl_double such that 1.0 + CL_DBL_EPSILON != 1 is true.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_NAN</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Macro expanding to a value representing NaN</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_HUGE_VALF</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Largest representative value of type cl_float</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_HUGE_VAL</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Largest representative value of type cl_double</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_MAXFLOAT</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Maximum value of type cl_float</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>CL_INFINITY</strong></p></td>
 <td class="tableblock halign-left valign-top" ><p class="tableblock">Macro expanding to a value represnting infinity</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>These literal definitions are defined in the <strong>cl_platform.h</strong> header.</p></div>
 <div class="paragraph"><p> </p></div>
 </div>
 </div>
 </div>
 <div class="sect1">
 <h2 id="_appendix_d_cl_mem_copy_overlap">11. Appendix D CL_MEM_COPY_OVERLAP</h2>
 <div class="sectionbody">
 <div class="paragraph"><p>The following code describes how to determine if there is overlap
 between the source and destination rectangles specified to
 <strong>clEnqueueCopyBufferRect</strong> provided the source and destination buffers
 refer to the same buffer object.</p></div>
 <div class="listingblock">
 <div class="content monospaced">
 <pre>unsigned int

 check_copy_overlap(const size_t src_origin[],
                    const size_t dst_origin[],
                    const size_t region[],
                    const size_t row_pitch,
                    const size_t slice_pitch )
 {

   const size_t slice_size = (region[1] - 1) * row_pitch + region[0];
   const size_t block_size = (region[2] - 1) * slice_pitch + slice_size;
   const size_t src_start = src_origin[2] * slice_pitch +
   src_origin[1] * row_pitch +
   src_origin[0];
   const size_t src_end = src_start + block_size;
   const size_t dst_start = dst_origin[2] * slice_pitch
                            + dst_origin[1] * row_pitch
                            +  dst_origin[0];

   F  const size_t dst_end = dst_start + block_size;

   /* No overlap if dst ends before src starts or if src ends

    * before dst starts.

    */

   if( (dst_end &lt;= src_start) || (src_end &lt;= dst_start) ){
     return 0;
   }

   /* No overlap if region[0] for dst or src fits in the gap

    * between region[0] and row_pitch.

    */

   {

     const size_t src_dx = src_origin[0] % row_pitch;
     const size_t dst_dx = dst_origin[0] % row_pitch;

     if( ((dst_dx &gt;= src_dx + region[0]) &amp;&amp;

         (dst_dx + region[0] &lt;= src_dx + row_pitch)) ||

         ((src_dx &gt;= dst_dx + region[0]) &amp;&amp;

        (src_dx + region[0] &lt;= dst_dx + row_pitch)) )

       {
         return 0;
       }

   }


   /* No overlap if region[1] for dst or src fits in the gap

    * between region[1] and slice_pitch.

    */

    {

       const size_t src_dy =

         (src_origin[1] * row_pitch + src_origin[0]) % slice_pitch;

       const size_t dst_dy =

         (dst_origin[1] * row_pitch + dst_origin[0]) % slice_pitch;


       if( ((dst_dy &gt;= src_dy + slice_size) &amp;&amp;

           (dst_dy + slice_size &lt;= src_dy + slice_pitch)) ||

           ((src_dy &gt;= dst_dy + slice_size) &amp;&amp;

           (src_dy + slice_size &lt;= dst_dy + slice_pitch)) ) {
         return 0;

       }

    }


  /* Otherwise src and dst overlap. */

  return 1;

 }</pre>
 </div></div>
 </div>
 </div>
 <div class="sect1">
 <h2 id="_appendix_e_changes">12. Appendix E  Changes</h2>
 <div class="sectionbody">
 <div class="sect2">
 <h3 id="_e_1_summary_of_changes_from_opencl_1_0">12.1. E.1 Summary of changes from OpenCL 1.0</h3>
 <div class="paragraph"><p>The following features are added to the OpenCL 1.1 platform layer and
 runtime (<em>sections 4 and 5</em>):</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 Following queries to <em>table 4.3</em>
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>o   CL_DEVICE_NATIVE_VECTOR_WIDTH_{CHAR | SHORT | INT | LONG | FLOAT |
 DOUBLE | HALF}</p></div>
 <div class="paragraph"><p>o   CL_DEVICE_HOST_UNIFIED_MEMORY</p></div>
 <div class="paragraph"><p>o   CL_DEVICE_OPENCL_C_VERSION</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 CL_CONTEXT_NUM_DEVICES to
 the list of queries specified to <strong>clGetContextInfo</strong>.
 </p>
 </li>
 <li>
 <p>
 Optional image formats: CL_Rx, CL_RGx and CL_RGBx.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p><strong> Support for sub-buffer objects  ability to create a buffer object that refers to a specific
 region in another buffer object using *clCreateSubBuffer</strong>.</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 <strong>clEnqueueReadBufferRect</strong>, <strong>clEnqueueWriteBufferRect</strong> and <strong>clEnqueueCopyBufferRect</strong> APIs to read
 from, write to and copy a rectangular region of a buffer object
 respectively.
 </p>
 </li>
 <li>
 <p>
 <strong>clSetMemObjectDestructorCallback</strong> API to allow a user to register a
 callback function that will be called when the memory object is deleted
 and its resources freed.
 </p>
 </li>
 <li>
 <p>
 Options that control the
 OpenCL C version used when building a program executable. These are
 described in <em>section 5.8.4.5</em>.
 </p>
 </li>
 <li>
 <p>
 CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE to the list of queries
 specified to <strong>clGetKernelWorkGroupInfo</strong>.
 </p>
 </li>
 <li>
 <p>
 Support for user events.
 User events allow applications to enqueue commands that wait on a user
 event to finish before the command is executed by the device. Following
 new APIs are added*- clCreateUserEvent*and <strong>clSetUserEventStatus</strong>.
 </p>
 </li>
 <li>
 <p>
 <strong>clSetEventCallback</strong> API to register a callback function for a specific command execution status.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The following modifications are made to the OpenCL 1.1 platform layer
 and runtime (<em>sections 4 and 5</em>):</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 Following queries in <em>table 4.3</em>
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>o   CL_DEVICE_MAX_PARAMETER_SIZE from 256 to 1024 bytes</p></div>
 <div class="paragraph"><p>o   CL_DEVICE_LOCAL_MEM_SIZE from 16 KB to 32 KB.</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 The <em>global_work_offset</em> argument in <strong>clEnqueueNDRangeKernel</strong> can be a non-NULL value.
 </p>
 </li>
 <li>
 <p>
 All API calls except <strong>clSetKernelArg</strong> are thread-safe.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The following features are added to the OpenCL C programming language
 (<em>section 6</em>) in OpenCL 1.1:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 3-component vector data types.
 </p>
 </li>
 <li>
 <p>
 New built-in functions
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>o   <strong>get_global_offset</strong> work-item function defined in section <em>6.12.1</em>.</p></div>
 <div class="paragraph"><p>o   <strong>minmag</strong>, <strong>maxmag</strong> math functions defined in section <em>6.12.2</em>.</p></div>
 <div class="paragraph"><p>o   <strong>clamp</strong> integer function defined in <em>section 6.12.3</em>.</p></div>
 <div class="paragraph"><p>o   (vector, scalar) variant of integer functions <strong>min</strong> and <strong>max</strong> in
 <em>section 6.12.3</em>.</p></div>
 <div class="paragraph"><p>o   <strong>async_work_group_strided_copy</strong> defined in section <em>6.12.10</em>.</p></div>
 <div class="paragraph"><p>o   <strong>vec_step</strong>, <strong>shuffle</strong> and <strong>shuffle2</strong> defined in section <em>6.12.12</em>.</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 <strong>cl_khr_byte_addressable_store</strong> extension is a core feature.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p><strong>     *cl_khr_global_int32_base_atomics</strong>,
 <strong>cl_khr_global_int32_extended_atomics</strong>,
 <strong>cl_khr_local_int32_base_atomics</strong> and
 <strong>cl_khr_local_int32_extended_atomics*extensions are core features. The
 built-in atomic function names are changed to use the *atomic_</strong> prefix
 instead of <strong>atom_</strong>.</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 Macros CL_VERSION_1_0 and CL_VERSION_1_1.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The following features in OpenCL 1.0 are deprecated (see glossary) in
 OpenCL 1.1:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 The <strong>clSetCommandQueueProperty</strong> API is no longer supported in OpenCL 1.1.
 </p>
 </li>
 <li>
 <p>
 The <em>ROUNDING_MODE</em> macro is no longer supported in OpenCL C 1.1.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p><strong>      The cl-strict-aliasing
 option that can be specified in <em>options</em> argument to *clBuildProgram</strong>
 is no longer supported in OpenCL 1.1.</p></div>
 <div class="paragraph"><p>The following new extensions are added to <em>section 9</em> in OpenCL 1.1:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 <strong>cl_khr_gl_event</strong> for creating a CL event object from a GL sync object.
 </p>
 </li>
 <li>
 <p>
 <strong>cl_khr_d3d10_sharing</strong> for sharing memory objects with Direct3D 10.

 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The following modifications are made to the OpenCL ES Profile described
 in <em>section 10</em> in OpenCL 1.1:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 64-bit integer support is optional.
 </p>
 </li>
 </ul></div>
 </div>
 <div class="sect2">
 <h3 id="_e_2_summary_of_changes_from_opencl_1_1">12.2. E.2 Summary of changes from OpenCL 1.1</h3>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p>The following features are added to the OpenCL 1.2 platform layer and
 runtime (<em>sections 4 and 5</em>):</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 Custom devices and built-in kernels are supported.
 </p>
 </li>
 <li>
 <p>
 Device partitioning that
 allows a device to be partitioned based on a number of partitioning
 schemes supported by the device.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>* Extend <em>cl_mem_flags</em> to
 describe how the host accesses the data in a cl_mem object.</p></div>
 <div class="paragraph"><p><strong> *clEnqueueFillBuffer</strong> and
 <strong>clEnqueueFillImage</strong> to support filling a buffer with a pattern or an
 image with a color.</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 Add
 CL_MAP_WRITE_INVALIDATE_REGION to <em>cl_map_flags</em>. Appropriate
 clarification to the behavior of CL_MAP_WRITE has been added to the
 spec.
 </p>
 </li>
 <li>
 <p>
 New image types: 1D image,
 1D image from a buffer object, 1D image array and 2D image arrays.
 </p>
 </li>
 <li>
 <p>
 <strong>clCreateImage</strong> to create
 an image object.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p><strong> *clEnqueueMigrateMemObjects</strong> API that allows a developer to have
 explicit control over the location of memory objects or to migrate a
 memory object from one device to another.</p></div>
 <div class="paragraph"><p>* Support separate compilation and linking of programs.</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 Additional queries to get
 the number of kernels and kernel names in a program have been added to
 <strong>clGetProgramInfo</strong>.
 </p>
 </li>
 <li>
 <p>
 Additiional queries to get
 the compile and link status and options have been added to
 <strong>clGetProgramBuildInfo</strong>.
 </p>
 </li>
 <li>
 <p>
 <strong>clGetKernelArgInfo</strong> API
 that returns information about the arguments of a kernel.
 </p>
 </li>
 <li>
 <p>
 <strong>clEnqueueMarkerWithWaitList</strong> and <strong>clEnqueueBarrierWithWaitList</strong> APIs.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The following features are added to the OpenCL C programming language
 (<em>section 6</em>) in OpenCL 1.2:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 Double-precision is now an optional core feature instead of an extension.
 </p>
 </li>
 <li>
 <p>
 New built in image types: <strong>image1d_t</strong> , <strong>image1d_array_t</strong> and <strong>image2d_array_t</strong> .
 </p>
 </li>
 <li>
 <p>
 New built-in functions
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>o   Functions to read from and write to a 1D image, 1D and 2D image
 arrays described in <em>sections 6.12.14.2</em>, <em>6.12.14.3</em> and <em>6.12.14.4</em>.</p></div>
 <div class="paragraph"><p>o   Sampler-less image read functions described in <em>section 6.12.14.3</em>.</p></div>
 <div class="paragraph"><p>o   <strong>popcount</strong> integer function described in <em>section 6.12.3</em>.</p></div>
 <div class="paragraph"><p>o   <strong>printf</strong> function described in <em>section 6.12.13</em>.</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 Storage class specifiers extern and static as described in <em>section 6.8</em>.
 </p>
 </li>
 <li>
 <p>
 Macros CL_VERSION_1_2 and <em>OPENCL_C_VERSION</em>.

 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The following APIs in OpenCL 1.1 are deprecated (see glossary) in OpenCL
 1.2:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 <strong>clEnqueueMarker</strong>, <strong>clEnqueueBarrier</strong> and <strong>clEnqueueWaitForEvents</strong>
 </p>
 </li>
 <li>
 <p>
 <strong>clCreateImage2D</strong> and <strong>clCreateImage3D</strong>
 </p>
 </li>
 <li>
 <p>
 <strong>clUnloadCompiler*and*clGetExtensionFunctionAddress</strong>
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p><strong> *clCreateFromGLTexture2D</strong> and <strong>clCreateFromGLTexture3D</strong>
  </p></div>
 <div class="paragraph"><p>The following queries are deprecated (see glossary) in OpenCL 1.2:</p></div>
 <div class="paragraph"><p><strong> *CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE</strong> in <em>table 4.3</em> queried using <strong>clGetDeviceInfo</strong>.</p></div>
 <div class="paragraph"><p> </p></div>
 </div>
 <div class="sect2">
 <h3 id="_e_3_summary_of_changes_from_opencl_1_2">12.3. E.3 Summary of changes from OpenCL 1.2</h3>
 <div class="paragraph"><p>The following features are added to the OpenCL 2.0 platform layer and
 runtime (<em>sections 4 and 5</em>):</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 Shared virtual memory.
 </p>
 </li>
 <li>
 <p>
 Device queues used to enqueue kernels on the device.
 </p>
 </li>
 <li>
 <p>
 Pipes.
 </p>
 </li>
 <li>
 <p>
 Images  support for 2D image from buffer, depth images and sRGB images.

 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The following modifications are made to the OpenCL 2.0 platform layer
 and runtime (sections 4 and 5):
 <strong>      All API calls except
 *clSetKernelArg</strong>, <strong>clSetKernelArgSVMPointer</strong> and <strong>clSetKernelExecInfo</strong>
 are thread-safe.
  </p></div>
 <div class="paragraph"><p>The following features are added to the OpenCL C programming language
 (<em>section 6</em>) in OpenCL 2.0:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 Clang Blocks.
 </p>
 </li>
 <li>
 <p>
 Kernels enqueing kernels to a device queue.
 </p>
 </li>
 <li>
 <p>
 Program scope variables in global address space.
 </p>
 </li>
 <li>
 <p>
 Generic address space.
 </p>
 </li>
 <li>
 <p>
 C1x atomics.
 </p>
 </li>
 <li>
 <p>
 New built-in functions   (sections 6.13.9, 6.13.11, 6.13.15 and 6.14).
 </p>
 </li>
 <li>
 <p>
 Support images with the read_write qualifier.
 </p>
 </li>
 <li>
 <p>
 3D image writes are a core feature.
 </p>
 </li>
 <li>
 <p>
 The CL_VERSION_2_0 macro.

 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The following APIs are deprecated (see glossary) in OpenCL 2.0:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 <strong>clCreateCommandQueue</strong> , <strong>clCreateSampler</strong> and <strong>clEnqueueTask</strong>
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The following queries are deprecated (see glossary) in OpenCL 2.0:</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 <strong>CL_DEVICE_HOST_UNIFIED_MEMORY</strong> in <em>table 4.3</em> queried using
 <strong>clGetDeviceInfo</strong>.
 </p>
 </li>
 <li>
 <p>
 <strong>CL_IMAGE_BUFFER</strong> in <em>table 5.10</em> is deprecated.
 </p>
 </li>
 <li>
 <p>
 <strong>CL_DEVICE_QUEUE_PROPERTIES*is replaced by *CL_DEVICE_QUEUE_ON_HOST_PROPERTIES</strong>.
 </p>
 </li>
 <li>
 <p>
 The explicit memory fence functions defined in section 6.12.9 of the OpenCL 1.2 specification.
 </p>
 </li>
 <li>
 <p>
 The OpenCL 1.2 atomic built-in functions for 32-bit integer and floating-point data types
 defined in section 6.12.11 of the OpenCL 1.2 specification.
 </p>
 </li>
 </ul></div>
 </div>
 <div class="sect2">
 <h3 id="_e_4_summary_of_changes_from_opencl_2_0">12.4. E.4 Summary of changes from OpenCL 2.0</h3>
 <div class="paragraph"><p>The following features are added to the OpenCL 2.1 platform layer and
 runtime (<em>sections 4 and 5</em>):</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 <strong>clGetKernelSubGroupInfo</strong> API call.
 </p>
 </li>
 <li>
 <p>
 <strong>CL_KERNEL_MAX_NUM_SUB_GROUPS</strong>, <strong>CL_KERNEL_COMPILE_NUM_SUB_GROUPS</strong> additions to table 5.21 of the API specification.
 </p>
 </li>
 <li>
 <p>
 <strong>clCreateProgramWithIL</strong> API call.
 </p>
 </li>
 <li>
 <p>
 <strong>clGetHostTimer</strong> and <strong>clGetDeviceAndHostTimer</strong> API calls.
 </p>
 </li>
 <li>
 <p>
 <strong>clEnqueueSVMMigrateMem</strong> API call.
 </p>
 </li>
 <li>
 <p>
 <strong>clCloneKernel</strong> API call.
 </p>
 </li>
 <li>
 <p>
 <strong>clSetDefaultDeviceCommandQueue</strong> API call.
 </p>
 </li>
 <li>
 <p>
 <strong>CL_PLATFORM_HOST_TIMER_RESOLUTION</strong> added to table 4.1 of the API specification.
 </p>
 </li>
 <li>
 <p>
 <strong>CL_DEVICE_IL_VERSION</strong>, <strong>CL_DEVICE_MAX_NUM_SUB_GROUPS</strong>, <strong>CL_DEVICE_SUB_GROUP_INDEPENDENT_FORWARD_PROGRESS</strong> added
 to table 4.3 of the API specification.
 </p>
 </li>
 <li>
 <p>
 *CL_PROGRAM_IL*to table 5.17 of the API specification.
 </p>
 </li>
 <li>
 <p>
 <strong>CL_QUEUE_DEVICE_DEFAULT</strong> added to table 5.2 of the API specification.
 </p>
 </li>
 <li>
 <p>
 Added table 5.22 to the API specification with the enums: <strong>CL_KERNEL_MAX_SUB_GROUP_SIZE_FOR_NDRANGE</strong> ,
 <strong>CL_KERNEL_SUB_GROUP_COUNT_FOR_NDRANGE</strong> and <strong>CL_KERNEL_LOCAL_SIZE_FOR_SUB_GROUP_COUNT</strong>
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The following modifications are made to the OpenCL 2.1 platform layer
 and runtime (sections 4 and 5):</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 All API calls except <strong>clSetKernelArg</strong> , <strong>clSetKernelArgSVMPointer</strong> , <strong>clSetKernelExecInfo</strong> and
 *clCloneKernel*are thread-safe.
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The OpenCL C kernel language is no longer chapter 6. The OpenCL C kernel
 language is not updated for OpenCL 2.1. The OpenCL 2.0 kernel language
 will still be consumed by OpenCL 2.1 runtimes.</p></div>
 <div class="paragraph"><p>The SPIR-V IL specification has been added.</p></div>
 </div>
 <div class="sect2">
 <h3 id="_e_5_summary_of_changes_from_opencl_2_1">12.5. E.5 Summary of changes from OpenCL 2.1</h3>
 <div class="paragraph"><p>The following changes have been made to the OpenCL 2.2 execution model
 (section 3)</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 Added the third prerequisite (executing non-trivial constructors for program scope global variables).
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The following features are added to the OpenCL 2.2 platform layer and
 runtime (<em>sections 4 and 5</em>):</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 <strong>clSetProgramSpecializationConstant</strong> API call
 </p>
 </li>
 <li>
 <p>
 <strong>clSetProgramReleaseCallback</strong> API call
 </p>
 </li>
 <li>
 <p>
 Queries for  CL_PROGRAM_SCOPE_GLOBAL_CTORS_PRESENT, CL_PROGRAM_SCOPE_GLOBAL_DTORS_PRESENT
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>The following modifications are made to the OpenCL 2.2 platform layer
 and runtime (section 4 and 5):</p></div>
 <div class="ulist"><ul>
 <li>
 <p>
 Modified description of CL_DEVICE_MAX_CLOCK_FREQUENCY query.
 </p>
 </li>
 <li>
 <p>
 Added a new error code CL_MAX_SIZE_RESTRICTION_EXCEEDED to <strong>clSetKernelArg</strong> API call
 </p>
 </li>
 </ul></div>
 <div class="paragraph"><p>Added definition of Deprecation and Specialization constants to the
 glossary.</p></div>
 <div class="paragraph"><p> </p></div>
 <div class="paragraph"><p> </p></div>
 </div>
 </div>
 </div>
 </div>
 <div id="footnotes"><hr></div>
 <div id="footer">
 <div id="footer-text">
 Version v2.2-3<br>
 Last updated
  2017-05-12 11:30:12 PDT
 </div>
 </div>
 </body>
 </html>