<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<!--[if IE]><meta http-equiv="X-UA-Compatible" content="IE=edge"><![endif]-->
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="generator" content="Asciidoctor 1.5.5">
<meta name="author" content="Khronos OpenCL Working Group">
<title>The OpenCL&#8482; Specification</title>
<style>
/*! normalize.css v2.1.2 | MIT License | git.io/normalize */
/* ========================================================================== HTML5 display definitions ========================================================================== */
/** Correct `block` display not defined in IE 8/9. */
article, aside, details, figcaption, figure, footer, header, hgroup, main, nav, section, summary { display: block; }
/** Correct `inline-block` display not defined in IE 8/9. */
audio, canvas, video { display: inline-block; }
/** Prevent modern browsers from displaying `audio` without controls. Remove excess height in iOS 5 devices. */
audio:not([controls]) { display: none; height: 0; }
/** Address `[hidden]` styling not present in IE 8/9. Hide the `template` element in IE, Safari, and Firefox < 22. */
[hidden], template { display: none; }
script { display: none !important; }
/* ========================================================================== Base ========================================================================== */
/** 1. Set default font family to sans-serif. 2. Prevent iOS text size adjust after orientation change, without disabling user zoom. */
html { font-family: sans-serif; /* 1 */ -ms-text-size-adjust: 100%; /* 2 */ -webkit-text-size-adjust: 100%; /* 2 */ }
/** Remove default margin. */
body { margin: 0; }
/* ========================================================================== Links ========================================================================== */
/** Remove the gray background color from active links in IE 10. */
a { background: transparent; }
/** Address `outline` inconsistency between Chrome and other browsers. */
a:focus { outline: thin dotted; }
/** Improve readability when focused and also mouse hovered in all browsers. */
a:active, a:hover { outline: 0; }
/* ========================================================================== Typography ========================================================================== */
/** Address variable `h1` font-size and margin within `section` and `article` contexts in Firefox 4+, Safari 5, and Chrome. */
h1 { font-size: 2em; margin: 0.67em 0; }
/** Address styling not present in IE 8/9, Safari 5, and Chrome. */
abbr[title] { border-bottom: 1px dotted; }
/** Address style set to `bolder` in Firefox 4+, Safari 5, and Chrome. */
b, strong { font-weight: bold; }
/** Address styling not present in Safari 5 and Chrome. */
dfn { font-style: italic; }
/** Address differences between Firefox and other browsers. */
hr { -moz-box-sizing: content-box; box-sizing: content-box; height: 0; }
/** Address styling not present in IE 8/9. */
mark { background: #ff0; color: #000; }
/** Correct font family set oddly in Safari 5 and Chrome. */
code, kbd, pre, samp { font-family: monospace, serif; font-size: 1em; }
/** Improve readability of pre-formatted text in all browsers. */
pre { white-space: pre-wrap; }
/** Set consistent quote types. */
q { quotes: "\201C" "\201D" "\2018" "\2019"; }
/** Address inconsistent and variable font size in all browsers. */
small { font-size: 80%; }
/** Prevent `sub` and `sup` affecting `line-height` in all browsers. */
sub, sup { font-size: 75%; line-height: 0; position: relative; vertical-align: baseline; }
sup { top: -0.5em; }
sub { bottom: -0.25em; }
/* ========================================================================== Embedded content ========================================================================== */
/** Remove border when inside `a` element in IE 8/9. */
img { border: 0; }
/** Correct overflow displayed oddly in IE 9. */
svg:not(:root) { overflow: hidden; }
/* ========================================================================== Figures ========================================================================== */
/** Address margin not present in IE 8/9 and Safari 5. */
figure { margin: 0; }
/* ========================================================================== Forms ========================================================================== */
/** Define consistent border, margin, and padding. */
fieldset { border: 1px solid #c0c0c0; margin: 0 2px; padding: 0.35em 0.625em 0.75em; }
/** 1. Correct `color` not being inherited in IE 8/9. 2. Remove padding so people aren't caught out if they zero out fieldsets. */
legend { border: 0; /* 1 */ padding: 0; /* 2 */ }
/** 1. Correct font family not being inherited in all browsers. 2. Correct font size not being inherited in all browsers. 3. Address margins set differently in Firefox 4+, Safari 5, and Chrome. */
button, input, select, textarea { font-family: inherit; /* 1 */ font-size: 100%; /* 2 */ margin: 0; /* 3 */ }
/** Address Firefox 4+ setting `line-height` on `input` using `!important` in the UA stylesheet. */
button, input { line-height: normal; }
/** Address inconsistent `text-transform` inheritance for `button` and `select`. All other form control elements do not inherit `text-transform` values. Correct `button` style inheritance in Chrome, Safari 5+, and IE 8+. Correct `select` style inheritance in Firefox 4+ and Opera. */
button, select { text-transform: none; }
/** 1. Avoid the WebKit bug in Android 4.0.* where (2) destroys native `audio` and `video` controls. 2. Correct inability to style clickable `input` types in iOS. 3. Improve usability and consistency of cursor style between image-type `input` and others. */
button, html input[type="button"], input[type="reset"], input[type="submit"] { -webkit-appearance: button; /* 2 */ cursor: pointer; /* 3 */ }
/** Re-set default cursor for disabled elements. */
button[disabled], html input[disabled] { cursor: default; }
/** 1. Address box sizing set to `content-box` in IE 8/9. 2. Remove excess padding in IE 8/9. */
input[type="checkbox"], input[type="radio"] { box-sizing: border-box; /* 1 */ padding: 0; /* 2 */ }
/** 1. Address `appearance` set to `searchfield` in Safari 5 and Chrome. 2. Address `box-sizing` set to `border-box` in Safari 5 and Chrome (include `-moz` to future-proof). */
input[type="search"] { -webkit-appearance: textfield; /* 1 */ -moz-box-sizing: content-box; -webkit-box-sizing: content-box; /* 2 */ box-sizing: content-box; }
/** Remove inner padding and search cancel button in Safari 5 and Chrome on OS X. */
input[type="search"]::-webkit-search-cancel-button, input[type="search"]::-webkit-search-decoration { -webkit-appearance: none; }
/** Remove inner padding and border in Firefox 4+. */
button::-moz-focus-inner, input::-moz-focus-inner { border: 0; padding: 0; }
/** 1. Remove default vertical scrollbar in IE 8/9. 2. Improve readability and alignment in all browsers. */
textarea { overflow: auto; /* 1 */ vertical-align: top; /* 2 */ }
/* ========================================================================== Tables ========================================================================== */
/** Remove most spacing between table cells. */
table { border-collapse: collapse; border-spacing: 0; }
meta.foundation-mq-small { font-family: "only screen and (min-width: 768px)"; width: 768px; }
meta.foundation-mq-medium { font-family: "only screen and (min-width:1280px)"; width: 1280px; }
meta.foundation-mq-large { font-family: "only screen and (min-width:1440px)"; width: 1440px; }
*, *:before, *:after { -moz-box-sizing: border-box; -webkit-box-sizing: border-box; box-sizing: border-box; }
html, body { font-size: 100%; }
body { background: white; color: #222222; padding: 0; margin: 0; font-family: "Helvetica Neue", "Helvetica", Helvetica, Arial, sans-serif; font-weight: normal; font-style: normal; line-height: 1; position: relative; cursor: auto; }
a:hover { cursor: pointer; }
img, object, embed { max-width: 100%; height: auto; }
object, embed { height: 100%; }
img { -ms-interpolation-mode: bicubic; }
#map_canvas img, #map_canvas embed, #map_canvas object, .map_canvas img, .map_canvas embed, .map_canvas object { max-width: none !important; }
.left { float: left !important; }
.right { float: right !important; }
.text-left { text-align: left !important; }
.text-right { text-align: right !important; }
.text-center { text-align: center !important; }
.text-justify { text-align: justify !important; }
.hide { display: none; }
.antialiased { -webkit-font-smoothing: antialiased; }
img { display: inline-block; vertical-align: middle; }
textarea { height: auto; min-height: 50px; }
select { width: 100%; }
object, svg { display: inline-block; vertical-align: middle; }
.center { margin-left: auto; margin-right: auto; }
.spread { width: 100%; }
p.lead, .paragraph.lead > p, #preamble > .sectionbody > .paragraph:first-of-type p { font-size: 1.21875em; line-height: 1.6; }
.subheader, .admonitionblock td.content > .title, .audioblock > .title, .exampleblock > .title, .imageblock > .title, .listingblock > .title, .literalblock > .title, .stemblock > .title, .openblock > .title, .paragraph > .title, .quoteblock > .title, table.tableblock > .title, .verseblock > .title, .videoblock > .title, .dlist > .title, .olist > .title, .ulist > .title, .qlist > .title, .hdlist > .title { line-height: 1.4; color: black; font-weight: 300; margin-top: 0.2em; margin-bottom: 0.5em; }
/* Typography resets */
div, dl, dt, dd, ul, ol, li, h1, h2, h3, #toctitle, .sidebarblock > .content > .title, h4, h5, h6, pre, form, p, blockquote, th, td { margin: 0; padding: 0; direction: ltr; }
/* Default Link Styles */
a { color: #0068b0; text-decoration: none; line-height: inherit; }
a:hover, a:focus { color: #333333; }
a img { border: none; }
/* Default paragraph styles */
p { font-family: Noto, sans-serif; font-weight: normal; font-size: 1em; line-height: 1.6; margin-bottom: 0.75em; text-rendering: optimizeLegibility; }
p aside { font-size: 0.875em; line-height: 1.35; font-style: italic; }
/* Default header styles */
h1, h2, h3, #toctitle, .sidebarblock > .content > .title, h4, h5, h6 { font-family: Noto, sans-serif; font-weight: normal; font-style: normal; color: black; text-rendering: optimizeLegibility; margin-top: 0.5em; margin-bottom: 0.5em; line-height: 1.2125em; }
h1 small, h2 small, h3 small, #toctitle small, .sidebarblock > .content > .title small, h4 small, h5 small, h6 small { font-size: 60%; color: #4d4d4d; line-height: 0; }
h1 { font-size: 2.125em; }
h2 { font-size: 1.6875em; }
h3, #toctitle, .sidebarblock > .content > .title { font-size: 1.375em; }
h4 { font-size: 1.125em; }
h5 { font-size: 1.125em; }
h6 { font-size: 1em; }
hr { border: solid #dddddd; border-width: 1px 0 0; clear: both; margin: 1.25em 0 1.1875em; height: 0; }
/* Helpful Typography Defaults */
em, i { font-style: italic; line-height: inherit; }
strong, b { font-weight: bold; line-height: inherit; }
small { font-size: 60%; line-height: inherit; }
code { font-family: Consolas, "Liberation Mono", Courier, monospace; font-weight: normal; color: #264357; }
/* Lists */
ul, ol, dl { font-size: 1em; line-height: 1.6; margin-bottom: 0.75em; list-style-position: outside; font-family: Noto, sans-serif; }
ul, ol { margin-left: 1.5em; }
ul.no-bullet, ol.no-bullet { margin-left: 1.5em; }
/* Unordered Lists */
ul li ul, ul li ol { margin-left: 1.25em; margin-bottom: 0; font-size: 1em; /* Override nested font-size change */ }
ul.square li ul, ul.circle li ul, ul.disc li ul { list-style: inherit; }
ul.square { list-style-type: square; }
ul.circle { list-style-type: circle; }
ul.disc { list-style-type: disc; }
ul.no-bullet { list-style: none; }
/* Ordered Lists */
ol li ul, ol li ol { margin-left: 1.25em; margin-bottom: 0; }
/* Definition Lists */
dl dt { margin-bottom: 0.3em; font-weight: bold; }
dl dd { margin-bottom: 0.75em; }
/* Abbreviations */
abbr, acronym { text-transform: uppercase; font-size: 90%; color: black; border-bottom: 1px dotted #dddddd; cursor: help; }
abbr { text-transform: none; }
/* Blockquotes */
blockquote { margin: 0 0 0.75em; padding: 0.5625em 1.25em 0 1.1875em; border-left: 1px solid #dddddd; }
blockquote cite { display: block; font-size: 0.8125em; color: #5e93b8; }
blockquote cite:before { content: "\2014 \0020"; }
blockquote cite a, blockquote cite a:visited { color: #5e93b8; }
blockquote, blockquote p { line-height: 1.6; color: #333333; }
/* Microformats */
.vcard { display: inline-block; margin: 0 0 1.25em 0; border: 1px solid #dddddd; padding: 0.625em 0.75em; }
.vcard li { margin: 0; display: block; }
.vcard .fn { font-weight: bold; font-size: 0.9375em; }
.vevent .summary { font-weight: bold; }
.vevent abbr { cursor: auto; text-decoration: none; font-weight: bold; border: none; padding: 0 0.0625em; }
@media only screen and (min-width: 768px) { h1, h2, h3, #toctitle, .sidebarblock > .content > .title, h4, h5, h6 { line-height: 1.4; }
h1 { font-size: 2.75em; }
h2 { font-size: 2.3125em; }
h3, #toctitle, .sidebarblock > .content > .title { font-size: 1.6875em; }
h4 { font-size: 1.4375em; } }
/* Tables */
table { background: white; margin-bottom: 1.25em; border: solid 1px #d8d8ce; }
table thead, table tfoot { background: -webkit-linear-gradient(top, #add386, #90b66a); background: linear-gradient(to bottom, #add386, #90b66a); font-weight: bold; }
table thead tr th, table thead tr td, table tfoot tr th, table tfoot tr td { padding: 0.5em 0.625em 0.625em; font-size: inherit; color: white; text-align: left; }
table tr th, table tr td { padding: 0.5625em 0.625em; font-size: inherit; color: #6d6e71; }
table tr.even, table tr.alt, table tr:nth-of-type(even) { background: #edf2f2; }
table thead tr th, table tfoot tr th, table tbody tr td, table tr td, table tfoot tr td { display: table-cell; line-height: 1.4; }
body { -moz-osx-font-smoothing: grayscale; -webkit-font-smoothing: antialiased; tab-size: 4; }
h1, h2, h3, #toctitle, .sidebarblock > .content > .title, h4, h5, h6 { line-height: 1.4; }
a:hover, a:focus { text-decoration: underline; }
.clearfix:before, .clearfix:after, .float-group:before, .float-group:after { content: " "; display: table; }
.clearfix:after, .float-group:after { clear: both; }
*:not(pre) > code { font-size: inherit; font-style: normal !important; letter-spacing: 0; padding: 0; background-color: white; -webkit-border-radius: 0; border-radius: 0; line-height: inherit; word-wrap: break-word; }
*:not(pre) > code.nobreak { word-wrap: normal; }
*:not(pre) > code.nowrap { white-space: nowrap; }
pre, pre > code { line-height: 1.6; color: #264357; font-family: Consolas, "Liberation Mono", Courier, monospace; font-weight: normal; }
em em { font-style: normal; }
strong strong { font-weight: normal; }
.keyseq { color: #333333; }
kbd { font-family: Consolas, "Liberation Mono", Courier, monospace; display: inline-block; color: black; font-size: 0.65em; line-height: 1.45; background-color: #f7f7f7; border: 1px solid #ccc; -webkit-border-radius: 3px; border-radius: 3px; -webkit-box-shadow: 0 1px 0 rgba(0, 0, 0, 0.2), 0 0 0 0.1em white inset; box-shadow: 0 1px 0 rgba(0, 0, 0, 0.2), 0 0 0 0.1em white inset; margin: 0 0.15em; padding: 0.2em 0.5em; vertical-align: middle; position: relative; top: -0.1em; white-space: nowrap; }
.keyseq kbd:first-child { margin-left: 0; }
.keyseq kbd:last-child { margin-right: 0; }
.menuseq, .menuref { color: #000; }
.menuseq b:not(.caret), .menuref { font-weight: inherit; }
.menuseq { word-spacing: -0.02em; }
.menuseq b.caret { font-size: 1.25em; line-height: 0.8; }
.menuseq i.caret { font-weight: bold; text-align: center; width: 0.45em; }
b.button:before, b.button:after { position: relative; top: -1px; font-weight: normal; }
b.button:before { content: "["; padding: 0 3px 0 2px; }
b.button:after { content: "]"; padding: 0 2px 0 3px; }
#header, #content, #footnotes, #footer { width: 100%; margin-left: auto; margin-right: auto; margin-top: 0; margin-bottom: 0; max-width: 62.5em; *zoom: 1; position: relative; padding-left: 1.5em; padding-right: 1.5em; }
#header:before, #header:after, #content:before, #content:after, #footnotes:before, #footnotes:after, #footer:before, #footer:after { content: " "; display: table; }
#header:after, #content:after, #footnotes:after, #footer:after { clear: both; }
#content { margin-top: 1.25em; }
#content:before { content: none; }
#header > h1:first-child { color: black; margin-top: 2.25rem; margin-bottom: 0; }
#header > h1:first-child + #toc { margin-top: 8px; border-top: 1px solid #dddddd; }
#header > h1:only-child, body.toc2 #header > h1:nth-last-child(2) { border-bottom: 1px solid #dddddd; padding-bottom: 8px; }
#header .details { border-bottom: 1px solid #dddddd; line-height: 1.45; padding-top: 0.25em; padding-bottom: 0.25em; padding-left: 0.25em; color: #5e93b8; display: -ms-flexbox; display: -webkit-flex; display: flex; -ms-flex-flow: row wrap; -webkit-flex-flow: row wrap; flex-flow: row wrap; }
#header .details span:first-child { margin-left: -0.125em; }
#header .details span.email a { color: #333333; }
#header .details br { display: none; }
#header .details br + span:before { content: "\00a0\2013\00a0"; }
#header .details br + span.author:before { content: "\00a0\22c5\00a0"; color: #333333; }
#header .details br + span#revremark:before { content: "\00a0|\00a0"; }
#header #revnumber { text-transform: capitalize; }
#header #revnumber:after { content: "\00a0"; }
#content > h1:first-child:not([class]) { color: black; border-bottom: 1px solid #dddddd; padding-bottom: 8px; margin-top: 0; padding-top: 1rem; margin-bottom: 1.25rem; }
#toc { border-bottom: 0 solid #dddddd; padding-bottom: 0.5em; }
#toc > ul { margin-left: 0.125em; }
#toc ul.sectlevel0 > li > a { font-style: italic; }
#toc ul.sectlevel0 ul.sectlevel1 { margin: 0.5em 0; }
#toc ul { font-family: Noto, sans-serif; list-style-type: none; }
#toc li { line-height: 1.3334; margin-top: 0.3334em; }
#toc a { text-decoration: none; }
#toc a:active { text-decoration: underline; }
#toctitle { color: black; font-size: 1.2em; }
@media only screen and (min-width: 768px) { #toctitle { font-size: 1.375em; }
body.toc2 { padding-left: 15em; padding-right: 0; }
#toc.toc2 { margin-top: 0 !important; background-color: white; position: fixed; width: 15em; left: 0; top: 0; border-right: 1px solid #dddddd; border-top-width: 0 !important; border-bottom-width: 0 !important; z-index: 1000; padding: 1.25em 1em; height: 100%; overflow: auto; }
#toc.toc2 #toctitle { margin-top: 0; margin-bottom: 0.8rem; font-size: 1.2em; }
#toc.toc2 > ul { font-size: 0.9em; margin-bottom: 0; }
#toc.toc2 ul ul { margin-left: 0; padding-left: 1em; }
#toc.toc2 ul.sectlevel0 ul.sectlevel1 { padding-left: 0; margin-top: 0.5em; margin-bottom: 0.5em; }
body.toc2.toc-right { padding-left: 0; padding-right: 15em; }
body.toc2.toc-right #toc.toc2 { border-right-width: 0; border-left: 1px solid #dddddd; left: auto; right: 0; } }
@media only screen and (min-width: 1280px) { body.toc2 { padding-left: 20em; padding-right: 0; }
#toc.toc2 { width: 20em; }
#toc.toc2 #toctitle { font-size: 1.375em; }
#toc.toc2 > ul { font-size: 0.95em; }
#toc.toc2 ul ul { padding-left: 1.25em; }
body.toc2.toc-right { padding-left: 0; padding-right: 20em; } }
#content #toc { border-style: solid; border-width: 1px; border-color: #e6e6e6; margin-bottom: 1.25em; padding: 1.25em; background: white; -webkit-border-radius: 0; border-radius: 0; }
#content #toc > :first-child { margin-top: 0; }
#content #toc > :last-child { margin-bottom: 0; }
#footer { max-width: 100%; background-color: transparent; padding: 1.25em; }
#footer-text { color: black; line-height: 1.44; }
#content { margin-bottom: 0.625em; }
.sect1 { padding-bottom: 0.625em; }
@media only screen and (min-width: 768px) { #content { margin-bottom: 1.25em; }
.sect1 { padding-bottom: 1.25em; } }
.sect1:last-child { padding-bottom: 0; }
.sect1 + .sect1 { border-top: 0 solid #dddddd; }
#content h1 > a.anchor, h2 > a.anchor, h3 > a.anchor, #toctitle > a.anchor, .sidebarblock > .content > .title > a.anchor, h4 > a.anchor, h5 > a.anchor, h6 > a.anchor { position: absolute; z-index: 1001; width: 1.5ex; margin-left: -1.5ex; display: block; text-decoration: none !important; visibility: hidden; text-align: center; font-weight: normal; }
#content h1 > a.anchor:before, h2 > a.anchor:before, h3 > a.anchor:before, #toctitle > a.anchor:before, .sidebarblock > .content > .title > a.anchor:before, h4 > a.anchor:before, h5 > a.anchor:before, h6 > a.anchor:before { content: "\00A7"; font-size: 0.85em; display: block; padding-top: 0.1em; }
#content h1:hover > a.anchor, #content h1 > a.anchor:hover, h2:hover > a.anchor, h2 > a.anchor:hover, h3:hover > a.anchor, #toctitle:hover > a.anchor, .sidebarblock > .content > .title:hover > a.anchor, h3 > a.anchor:hover, #toctitle > a.anchor:hover, .sidebarblock > .content > .title > a.anchor:hover, h4:hover > a.anchor, h4 > a.anchor:hover, h5:hover > a.anchor, h5 > a.anchor:hover, h6:hover > a.anchor, h6 > a.anchor:hover { visibility: visible; }
#content h1 > a.link, h2 > a.link, h3 > a.link, #toctitle > a.link, .sidebarblock > .content > .title > a.link, h4 > a.link, h5 > a.link, h6 > a.link { color: black; text-decoration: none; }
#content h1 > a.link:hover, h2 > a.link:hover, h3 > a.link:hover, #toctitle > a.link:hover, .sidebarblock > .content > .title > a.link:hover, h4 > a.link:hover, h5 > a.link:hover, h6 > a.link:hover { color: black; }
.audioblock, .imageblock, .literalblock, .listingblock, .stemblock, .videoblock { margin-bottom: 1.25em; }
.admonitionblock td.content > .title, .audioblock > .title, .exampleblock > .title, .imageblock > .title, .listingblock > .title, .literalblock > .title, .stemblock > .title, .openblock > .title, .paragraph > .title, .quoteblock > .title, table.tableblock > .title, .verseblock > .title, .videoblock > .title, .dlist > .title, .olist > .title, .ulist > .title, .qlist > .title, .hdlist > .title { text-rendering: optimizeLegibility; text-align: left; }
table.tableblock > caption.title { white-space: nowrap; overflow: visible; max-width: 0; }
.paragraph.lead > p, #preamble > .sectionbody > .paragraph:first-of-type p { color: black; }
table.tableblock #preamble > .sectionbody > .paragraph:first-of-type p { font-size: inherit; }
.admonitionblock > table { border-collapse: separate; border: 0; background: none; width: 100%; }
.admonitionblock > table td.icon { text-align: center; width: 80px; }
.admonitionblock > table td.icon img { max-width: initial; }
.admonitionblock > table td.icon .title { font-weight: bold; font-family: Noto, sans-serif; text-transform: uppercase; }
.admonitionblock > table td.content { padding-left: 1.125em; padding-right: 1.25em; border-left: 1px solid #dddddd; color: #5e93b8; }
.admonitionblock > table td.content > :last-child > :last-child { margin-bottom: 0; }
.exampleblock > .content { border-style: solid; border-width: 1px; border-color: #e6e6e6; margin-bottom: 1.25em; padding: 1.25em; background: white; -webkit-border-radius: 0; border-radius: 0; }
.exampleblock > .content > :first-child { margin-top: 0; }
.exampleblock > .content > :last-child { margin-bottom: 0; }
.sidebarblock { border-style: solid; border-width: 1px; border-color: #e6e6e6; margin-bottom: 1.25em; padding: 1.25em; background: white; -webkit-border-radius: 0; border-radius: 0; }
.sidebarblock > :first-child { margin-top: 0; }
.sidebarblock > :last-child { margin-bottom: 0; }
.sidebarblock > .content > .title { color: black; margin-top: 0; }
.exampleblock > .content > :last-child > :last-child, .exampleblock > .content .olist > ol > li:last-child > :last-child, .exampleblock > .content .ulist > ul > li:last-child > :last-child, .exampleblock > .content .qlist > ol > li:last-child > :last-child, .sidebarblock > .content > :last-child > :last-child, .sidebarblock > .content .olist > ol > li:last-child > :last-child, .sidebarblock > .content .ulist > ul > li:last-child > :last-child, .sidebarblock > .content .qlist > ol > li:last-child > :last-child { margin-bottom: 0; }
.literalblock pre, .listingblock pre:not(.highlight), .listingblock pre[class="highlight"], .listingblock pre[class^="highlight "], .listingblock pre.CodeRay, .listingblock pre.prettyprint { background: #eeeeee; }
.sidebarblock .literalblock pre, .sidebarblock .listingblock pre:not(.highlight), .sidebarblock .listingblock pre[class="highlight"], .sidebarblock .listingblock pre[class^="highlight "], .sidebarblock .listingblock pre.CodeRay, .sidebarblock .listingblock pre.prettyprint { background: #f2f1f1; }
.literalblock pre, .literalblock pre[class], .listingblock pre, .listingblock pre[class] { border: 1px hidden #666666; -webkit-border-radius: 0; border-radius: 0; word-wrap: break-word; padding: 1.25em 1.5625em 1.125em 1.5625em; font-size: 0.8125em; }
.literalblock pre.nowrap, .literalblock pre[class].nowrap, .listingblock pre.nowrap, .listingblock pre[class].nowrap { overflow-x: auto; white-space: pre; word-wrap: normal; }
@media only screen and (min-width: 768px) { .literalblock pre, .literalblock pre[class], .listingblock pre, .listingblock pre[class] { font-size: 0.90625em; } }
@media only screen and (min-width: 1280px) { .literalblock pre, .literalblock pre[class], .listingblock pre, .listingblock pre[class] { font-size: 1em; } }
.literalblock.output pre { color: #eeeeee; background-color: #264357; }
.listingblock pre.highlightjs { padding: 0; }
.listingblock pre.highlightjs > code { padding: 1.25em 1.5625em 1.125em 1.5625em; -webkit-border-radius: 0; border-radius: 0; }
.listingblock > .content { position: relative; }
.listingblock code[data-lang]:before { display: none; content: attr(data-lang); position: absolute; font-size: 0.75em; top: 0.425rem; right: 0.5rem; line-height: 1; text-transform: uppercase; color: #999; }
.listingblock:hover code[data-lang]:before { display: block; }
.listingblock.terminal pre .command:before { content: attr(data-prompt); padding-right: 0.5em; color: #999; }
.listingblock.terminal pre .command:not([data-prompt]):before { content: "$"; }
table.pyhltable { border-collapse: separate; border: 0; margin-bottom: 0; background: none; }
table.pyhltable td { vertical-align: top; padding-top: 0; padding-bottom: 0; line-height: 1.6; }
table.pyhltable td.code { padding-left: .75em; padding-right: 0; }
pre.pygments .lineno, table.pyhltable td:not(.code) { color: #999; padding-left: 0; padding-right: .5em; border-right: 1px solid #dddddd; }
pre.pygments .lineno { display: inline-block; margin-right: .25em; }
table.pyhltable .linenodiv { background: none !important; padding-right: 0 !important; }
.quoteblock { margin: 0 1em 0.75em 1.5em; display: table; }
.quoteblock > .title { margin-left: -1.5em; margin-bottom: 0.75em; }
.quoteblock blockquote, .quoteblock blockquote p { color: #333333; font-size: 1.15rem; line-height: 1.75; word-spacing: 0.1em; letter-spacing: 0; font-style: italic; text-align: justify; }
.quoteblock blockquote { margin: 0; padding: 0; border: 0; }
.quoteblock blockquote:before { content: "\201c"; float: left; font-size: 2.75em; font-weight: bold; line-height: 0.6em; margin-left: -0.6em; color: black; text-shadow: 0 1px 2px rgba(0, 0, 0, 0.1); }
.quoteblock blockquote > .paragraph:last-child p { margin-bottom: 0; }
.quoteblock .attribution { margin-top: 0.5em; margin-right: 0.5ex; text-align: right; }
.quoteblock .quoteblock { margin-left: 0; margin-right: 0; padding: 0.5em 0; border-left: 3px solid #5e93b8; }
.quoteblock .quoteblock blockquote { padding: 0 0 0 0.75em; }
.quoteblock .quoteblock blockquote:before { display: none; }
.verseblock { margin: 0 1em 0.75em 1em; }
.verseblock pre { font-family: "Open Sans", "DejaVu Sans", sans-serif; font-size: 1.15rem; color: #333333; font-weight: 300; text-rendering: optimizeLegibility; }
.verseblock pre strong { font-weight: 400; }
.verseblock .attribution { margin-top: 1.25rem; margin-left: 0.5ex; }
.quoteblock .attribution, .verseblock .attribution { font-size: 0.8125em; line-height: 1.45; font-style: italic; }
.quoteblock .attribution br, .verseblock .attribution br { display: none; }
.quoteblock .attribution cite, .verseblock .attribution cite { display: block; letter-spacing: -0.025em; color: #5e93b8; }
.quoteblock.abstract { margin: 0 0 0.75em 0; display: block; }
.quoteblock.abstract blockquote, .quoteblock.abstract blockquote p { text-align: left; word-spacing: 0; }
.quoteblock.abstract blockquote:before, .quoteblock.abstract blockquote p:first-of-type:before { display: none; }
table.tableblock { max-width: 100%; border-collapse: separate; }
table.tableblock td > .paragraph:last-child p > p:last-child, table.tableblock th > p:last-child, table.tableblock td > p:last-child { margin-bottom: 0; }
table.tableblock, th.tableblock, td.tableblock { border: 0 solid #d8d8ce; }
table.grid-all > thead > tr > .tableblock, table.grid-all > tbody > tr > .tableblock { border-width: 0 1px 1px 0; }
table.grid-all > tfoot > tr > .tableblock { border-width: 1px 1px 0 0; }
table.grid-cols > * > tr > .tableblock { border-width: 0 1px 0 0; }
table.grid-rows > thead > tr > .tableblock, table.grid-rows > tbody > tr > .tableblock { border-width: 0 0 1px 0; }
table.grid-rows > tfoot > tr > .tableblock { border-width: 1px 0 0 0; }
table.grid-all > * > tr > .tableblock:last-child, table.grid-cols > * > tr > .tableblock:last-child { border-right-width: 0; }
table.grid-all > tbody > tr:last-child > .tableblock, table.grid-all > thead:last-child > tr > .tableblock, table.grid-rows > tbody > tr:last-child > .tableblock, table.grid-rows > thead:last-child > tr > .tableblock { border-bottom-width: 0; }
table.frame-all { border-width: 1px; }
table.frame-sides { border-width: 0 1px; }
table.frame-topbot { border-width: 1px 0; }
th.halign-left, td.halign-left { text-align: left; }
th.halign-right, td.halign-right { text-align: right; }
th.halign-center, td.halign-center { text-align: center; }
th.valign-top, td.valign-top { vertical-align: top; }
th.valign-bottom, td.valign-bottom { vertical-align: bottom; }
th.valign-middle, td.valign-middle { vertical-align: middle; }
table thead th, table tfoot th { font-weight: bold; }
tbody tr th { display: table-cell; line-height: 1.4; background: -webkit-linear-gradient(top, #add386, #90b66a); background: linear-gradient(to bottom, #add386, #90b66a); }
tbody tr th, tbody tr th p, tfoot tr th, tfoot tr th p { color: white; font-weight: bold; }
p.tableblock > code:only-child { background: none; padding: 0; }
p.tableblock { font-size: 1em; }
td > div.verse { white-space: pre; }
ol { margin-left: 1.75em; }
ul li ol { margin-left: 1.5em; }
dl dd { margin-left: 1.125em; }
dl dd:last-child, dl dd:last-child > :last-child { margin-bottom: 0; }
ol > li p, ul > li p, ul dd, ol dd, .olist .olist, .ulist .ulist, .ulist .olist, .olist .ulist { margin-bottom: 0.375em; }
ul.checklist, ul.none, ol.none, ul.no-bullet, ol.no-bullet, ol.unnumbered, ul.unstyled, ol.unstyled { list-style-type: none; }
ul.no-bullet, ol.no-bullet, ol.unnumbered { margin-left: 0.625em; }
ul.unstyled, ol.unstyled { margin-left: 0; }
ul.checklist { margin-left: 0.625em; }
ul.checklist li > p:first-child > .fa-square-o:first-child, ul.checklist li > p:first-child > .fa-check-square-o:first-child { width: 1.25em; font-size: 0.8em; position: relative; bottom: 0.125em; }
ul.checklist li > p:first-child > input[type="checkbox"]:first-child { margin-right: 0.25em; }
ul.inline { display: -ms-flexbox; display: -webkit-box; display: flex; -ms-flex-flow: row wrap; -webkit-flex-flow: row wrap; flex-flow: row wrap; list-style: none; margin: 0 0 0.375em -0.75em; }
ul.inline > li { margin-left: 0.75em; }
.unstyled dl dt { font-weight: normal; font-style: normal; }
ol.arabic { list-style-type: decimal; }
ol.decimal { list-style-type: decimal-leading-zero; }
ol.loweralpha { list-style-type: lower-alpha; }
ol.upperalpha { list-style-type: upper-alpha; }
ol.lowerroman { list-style-type: lower-roman; }
ol.upperroman { list-style-type: upper-roman; }
ol.lowergreek { list-style-type: lower-greek; }
.hdlist > table, .colist > table { border: 0; background: none; }
.hdlist > table > tbody > tr, .colist > table > tbody > tr { background: none; }
td.hdlist1, td.hdlist2 { vertical-align: top; padding: 0 0.625em; }
td.hdlist1 { font-weight: bold; padding-bottom: 0.75em; }
.literalblock + .colist, .listingblock + .colist { margin-top: -0.5em; }
.colist > table tr > td:first-of-type { padding: 0.4em 0.75em 0 0.75em; line-height: 1; vertical-align: top; }
.colist > table tr > td:first-of-type img { max-width: initial; }
.colist > table tr > td:last-of-type { padding: 0.25em 0; }
.thumb, .th { line-height: 0; display: inline-block; border: solid 4px white; -webkit-box-shadow: 0 0 0 1px #dddddd; box-shadow: 0 0 0 1px #dddddd; }
.imageblock.left, .imageblock[style*="float: left"] { margin: 0.25em 0.625em 1.25em 0; }
.imageblock.right, .imageblock[style*="float: right"] { margin: 0.25em 0 1.25em 0.625em; }
.imageblock > .title { margin-bottom: 0; }
.imageblock.thumb, .imageblock.th { border-width: 6px; }
.imageblock.thumb > .title, .imageblock.th > .title { padding: 0 0.125em; }
.image.left, .image.right { margin-top: 0.25em; margin-bottom: 0.25em; display: inline-block; line-height: 0; }
.image.left { margin-right: 0.625em; }
.image.right { margin-left: 0.625em; }
a.image { text-decoration: none; display: inline-block; }
a.image object { pointer-events: none; }
sup.footnote, sup.footnoteref { font-size: 0.875em; position: static; vertical-align: super; }
sup.footnote a, sup.footnoteref a { text-decoration: none; }
sup.footnote a:active, sup.footnoteref a:active { text-decoration: underline; }
#footnotes { padding-top: 0.75em; padding-bottom: 0.75em; margin-bottom: 0.625em; }
#footnotes hr { width: 20%; min-width: 6.25em; margin: -0.25em 0 0.75em 0; border-width: 1px 0 0 0; }
#footnotes .footnote { padding: 0 0.375em 0 0.225em; line-height: 1.3334; font-size: 0.875em; margin-left: 1.2em; margin-bottom: 0.2em; }
#footnotes .footnote a:first-of-type { font-weight: bold; text-decoration: none; margin-left: -1.05em; }
#footnotes .footnote:last-of-type { margin-bottom: 0; }
#content #footnotes { margin-top: -0.625em; margin-bottom: 0; padding: 0.75em 0; }
.gist .file-data > table { border: 0; background: #fff; width: 100%; margin-bottom: 0; }
.gist .file-data > table td.line-data { width: 99%; }
div.unbreakable { page-break-inside: avoid; }
.big { font-size: larger; }
.small { font-size: smaller; }
.underline { text-decoration: underline; }
.overline { text-decoration: overline; }
.line-through { text-decoration: line-through; }
.aqua { color: #00bfbf; }
.aqua-background { background-color: #00fafa; }
.black { color: black; }
.black-background { background-color: black; }
.blue { color: #0000bf; }
.blue-background { background-color: #0000fa; }
.fuchsia { color: #bf00bf; }
.fuchsia-background { background-color: #fa00fa; }
.gray { color: #606060; }
.gray-background { background-color: #7d7d7d; }
.green { color: #006000; }
.green-background { background-color: #007d00; }
.lime { color: #00bf00; }
.lime-background { background-color: #00fa00; }
.maroon { color: #600000; }
.maroon-background { background-color: #7d0000; }
.navy { color: #000060; }
.navy-background { background-color: #00007d; }
.olive { color: #606000; }
.olive-background { background-color: #7d7d00; }
.purple { color: #600060; }
.purple-background { background-color: #7d007d; }
.red { color: #bf0000; }
.red-background { background-color: #fa0000; }
.silver { color: #909090; }
.silver-background { background-color: #bcbcbc; }
.teal { color: #006060; }
.teal-background { background-color: #007d7d; }
.white { color: #bfbfbf; }
.white-background { background-color: #fafafa; }
.yellow { color: #bfbf00; }
.yellow-background { background-color: #fafa00; }
span.icon > .fa { cursor: default; }
a span.icon > .fa { cursor: inherit; }
.admonitionblock td.icon [class^="fa icon-"] { font-size: 2.5em; text-shadow: 1px 1px 2px rgba(0, 0, 0, 0.5); cursor: default; }
.admonitionblock td.icon .icon-note:before { content: "\f05a"; color: #29475c; }
.admonitionblock td.icon .icon-tip:before { content: "\f0eb"; text-shadow: 1px 1px 2px rgba(155, 155, 0, 0.8); color: #111; }
.admonitionblock td.icon .icon-warning:before { content: "\f071"; color: #bf6900; }
.admonitionblock td.icon .icon-caution:before { content: "\f06d"; color: #bf3400; }
.admonitionblock td.icon .icon-important:before { content: "\f06a"; color: #bf0000; }
.conum[data-value] { display: inline-block; color: #fff !important; background-color: black; -webkit-border-radius: 100px; border-radius: 100px; text-align: center; font-size: 0.75em; width: 1.67em; height: 1.67em; line-height: 1.67em; font-family: "Open Sans", "DejaVu Sans", sans-serif; font-style: normal; font-weight: bold; }
.conum[data-value] * { color: #fff !important; }
.conum[data-value] + b { display: none; }
.conum[data-value]:after { content: attr(data-value); }
pre .conum[data-value] { position: relative; top: -0.125em; }
b.conum * { color: inherit !important; }
.conum:not([data-value]):empty { display: none; }
h1, h2, h3, #toctitle, .sidebarblock > .content > .title, h4, h5, h6 { border-bottom: 1px solid #dddddd; }
.sect1 { padding-bottom: 0; }
#toctitle { color: #00406F; font-weight: normal; margin-top: 1.5em; }
.sidebarblock { border-color: #aaa; }
code { -webkit-border-radius: 4px; border-radius: 4px; }
p.tableblock.header { color: #6d6e71; }
.literalblock pre, .listingblock pre { background: #eeeeee; }
</style>
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.min.css">
<style>
/* Stylesheet for CodeRay to match GitHub theme | MIT License | http://foundation.zurb.com */
/*pre.CodeRay {background-color:#f7f7f8;}*/
.CodeRay .line-numbers{border-right:1px solid #d8d8d8;padding:0 0.5em 0 .25em}
.CodeRay span.line-numbers{display:inline-block;margin-right:.5em;color:rgba(0,0,0,.3)}
.CodeRay .line-numbers strong{color:rgba(0,0,0,.4)}
table.CodeRay{border-collapse:separate;border-spacing:0;margin-bottom:0;border:0;background:none}
table.CodeRay td{vertical-align: top;line-height:1.45}
table.CodeRay td.line-numbers{text-align:right}
table.CodeRay td.line-numbers>pre{padding:0;color:rgba(0,0,0,.3)}
table.CodeRay td.code{padding:0 0 0 .5em}
table.CodeRay td.code>pre{padding:0}
.CodeRay .debug{color:#fff !important;background:#000080 !important}
.CodeRay .annotation{color:#007}
.CodeRay .attribute-name{color:#000080}
.CodeRay .attribute-value{color:#700}
.CodeRay .binary{color:#509}
.CodeRay .comment{color:#998;font-style:italic}
.CodeRay .char{color:#04d}
.CodeRay .char .content{color:#04d}
.CodeRay .char .delimiter{color:#039}
.CodeRay .class{color:#458;font-weight:bold}
.CodeRay .complex{color:#a08}
.CodeRay .constant,.CodeRay .predefined-constant{color:#008080}
.CodeRay .color{color:#099}
.CodeRay .class-variable{color:#369}
.CodeRay .decorator{color:#b0b}
.CodeRay .definition{color:#099}
.CodeRay .delimiter{color:#000}
.CodeRay .doc{color:#970}
.CodeRay .doctype{color:#34b}
.CodeRay .doc-string{color:#d42}
.CodeRay .escape{color:#666}
.CodeRay .entity{color:#800}
.CodeRay .error{color:#808}
.CodeRay .exception{color:inherit}
.CodeRay .filename{color:#099}
.CodeRay .function{color:#900;font-weight:bold}
.CodeRay .global-variable{color:#008080}
.CodeRay .hex{color:#058}
.CodeRay .integer,.CodeRay .float{color:#099}
.CodeRay .include{color:#555}
.CodeRay .inline{color:#000}
.CodeRay .inline .inline{background:#ccc}
.CodeRay .inline .inline .inline{background:#bbb}
.CodeRay .inline .inline-delimiter{color:#d14}
.CodeRay .inline-delimiter{color:#d14}
.CodeRay .important{color:#555;font-weight:bold}
.CodeRay .interpreted{color:#b2b}
.CodeRay .instance-variable{color:#008080}
.CodeRay .label{color:#970}
.CodeRay .local-variable{color:#963}
.CodeRay .octal{color:#40e}
.CodeRay .predefined{color:#369}
.CodeRay .preprocessor{color:#579}
.CodeRay .pseudo-class{color:#555}
.CodeRay .directive{font-weight:bold}
.CodeRay .type{font-weight:bold}
.CodeRay .predefined-type{color:inherit}
.CodeRay .reserved,.CodeRay .keyword {color:#000;font-weight:bold}
.CodeRay .key{color:#808}
.CodeRay .key .delimiter{color:#606}
.CodeRay .key .char{color:#80f}
.CodeRay .value{color:#088}
.CodeRay .regexp .delimiter{color:#808}
.CodeRay .regexp .content{color:#808}
.CodeRay .regexp .modifier{color:#808}
.CodeRay .regexp .char{color:#d14}
.CodeRay .regexp .function{color:#404;font-weight:bold}
.CodeRay .string{color:#d20}
.CodeRay .string .string .string{background:#ffd0d0}
.CodeRay .string .content{color:#d14}
.CodeRay .string .char{color:#d14}
.CodeRay .string .delimiter{color:#d14}
.CodeRay .shell{color:#d14}
.CodeRay .shell .delimiter{color:#d14}
.CodeRay .symbol{color:#990073}
.CodeRay .symbol .content{color:#a60}
.CodeRay .symbol .delimiter{color:#630}
.CodeRay .tag{color:#008080}
.CodeRay .tag-special{color:#d70}
.CodeRay .variable{color:#036}
.CodeRay .insert{background:#afa}
.CodeRay .delete{background:#faa}
.CodeRay .change{color:#aaf;background:#007}
.CodeRay .head{color:#f8f;background:#505}
.CodeRay .insert .insert{color:#080}
.CodeRay .delete .delete{color:#800}
.CodeRay .change .change{color:#66f}
.CodeRay .head .head{color:#f4f}
</style>
<link rel="stylesheet" href="../katex/katex.min.css">
<script src="../katex/katex.min.js"></script>
<script src="../katex/contrib/auto-render.min.js"></script>
<!-- Use KaTeX to render math once document is loaded, see
https://github.com/Khan/KaTeX/tree/master/contrib/auto-render -->
<script>
document.addEventListener("DOMContentLoaded", function () {
renderMathInElement(
document.body,
{
delimiters: [
{ left: "$$", right: "$$", display: true},
{ left: "\\[", right: "\\]", display: true},
{ left: "$", right: "$", display: false},
{ left: "\\(", right: "\\)", display: false}
]
}
);
});
</script></head>
<body class="book toc2 toc-left" style="max-width: 100;">
<div id="header">
<h1>The OpenCL<sup>&#8482;</sup> Specification</h1>
<div class="details">
<span id="author" class="author">Khronos OpenCL Working Group</span><br>
<span id="revnumber">version 2.2-8,</span>
<span id="revdate">Mon, 08 Oct 2018 16:49:19 +0000</span>
<br><span id="revremark">from git branch: master commit: b3cab22fcff5d8c17869907c983e259ddd7ce788</span>
</div>
<div id="toc" class="toc2">
<div id="toctitle">Table of Contents</div>
<ul class="sectlevel1">
<li><a href="#_introduction">1. Introduction</a>
<ul class="sectlevel2">
<li><a href="#_normative_references">1.1. Normative References</a></li>
<li><a href="#_version_numbers">1.2. Version Numbers</a></li>
</ul>
</li>
<li><a href="#_glossary">2. Glossary</a></li>
<li><a href="#_the_opencl_architecture">3. The OpenCL Architecture</a>
<ul class="sectlevel2">
<li><a href="#_platform_model">3.1. Platform Model</a></li>
<li><a href="#_execution_model">3.2. Execution Model</a></li>
<li><a href="#_memory_model">3.3. Memory Model</a></li>
<li><a href="#opencl-framework">3.4. The OpenCL Framework</a></li>
</ul>
</li>
<li><a href="#opencl-platform-layer">4. The OpenCL Platform Layer</a>
<ul class="sectlevel2">
<li><a href="#_querying_platform_info">4.1. Querying Platform Info</a></li>
<li><a href="#platform-querying-devices">4.2. Querying Devices</a></li>
<li><a href="#_partitioning_a_device">4.3. Partitioning a Device</a></li>
<li><a href="#_contexts">4.4. Contexts</a></li>
</ul>
</li>
<li><a href="#opencl-runtime">5. The OpenCL Runtime</a>
<ul class="sectlevel2">
<li><a href="#_command_queues">5.1. Command Queues</a></li>
<li><a href="#_buffer_objects">5.2. Buffer Objects</a></li>
<li><a href="#_image_objects">5.3. Image Objects</a></li>
<li><a href="#_pipes">5.4. Pipes</a></li>
<li><a href="#_querying_unmapping_migrating_retaining_and_releasing_memory_objects">5.5. Querying, Unmapping, Migrating, Retaining and Releasing Memory Objects</a></li>
<li><a href="#_shared_virtual_memory">5.6. Shared Virtual Memory</a></li>
<li><a href="#_sampler_objects">5.7. Sampler Objects</a></li>
<li><a href="#_program_objects">5.8. Program Objects</a></li>
<li><a href="#_kernel_objects">5.9. Kernel Objects</a></li>
<li><a href="#_executing_kernels">5.10. Executing Kernels</a></li>
<li><a href="#event-objects">5.11. Event Objects</a></li>
<li><a href="#markers-barriers-waiting-for-events">5.12. Markers, Barriers and Waiting for Events</a></li>
<li><a href="#_out_of_order_execution_of_kernels_and_memory_object_commands">5.13. Out-of-order Execution of Kernels and Memory Object Commands</a></li>
<li><a href="#profiling-operations">5.14. Profiling Operations on Memory Objects and Kernels</a></li>
<li><a href="#_flush_and_finish">5.15. Flush and Finish</a></li>
</ul>
</li>
<li><a href="#_associated_opencl_specification">6. Associated OpenCL specification</a>
<ul class="sectlevel2">
<li><a href="#spirv-il">6.1. SPIR-V Intermediate language</a></li>
<li><a href="#opencl-extensions">6.2. Extensions to OpenCL</a></li>
<li><a href="#_support_for_earlier_opencl_c_kernel_languages">6.3. Support for earlier OpenCL C kernel languages</a></li>
</ul>
</li>
<li><a href="#opencl-embedded-profile">7. OpenCL Embedded Profile</a></li>
<li><a href="#_shared_objects_thread_safety">Appendix A: Shared Objects, Thread Safety</a>
<ul class="sectlevel2">
<li><a href="#shared-opencl-objects">Shared OpenCL Objects</a></li>
<li><a href="#_multiple_host_threads">Multiple Host Threads</a></li>
</ul>
</li>
<li><a href="#_portability">Appendix B: Portability</a></li>
<li><a href="#data-types">Appendix C: Application Data Types</a>
<ul class="sectlevel2">
<li><a href="#scalar-data-types">Shared Application Scalar Data Types</a></li>
<li><a href="#vector-data-types">Supported Application Vector Data Types</a></li>
<li><a href="#alignment-app-data-types">Alignment of Application Data Types</a></li>
<li><a href="#_vector_literals">Vector Literals</a></li>
<li><a href="#vector-components">Vector Components</a></li>
<li><a href="#_implicit_conversions">Implicit Conversions</a></li>
<li><a href="#_explicit_casts">Explicit Casts</a></li>
<li><a href="#_other_operators_and_functions">Other operators and functions</a></li>
<li><a href="#_application_constant_definitions">Application constant definitions</a></li>
</ul>
</li>
<li><a href="#check-copy-overlap">Appendix D: CL_MEM_COPY_OVERLAP</a></li>
<li><a href="#_changes">Appendix E: Changes</a>
<ul class="sectlevel2">
<li><a href="#_summary_of_changes_from_opencl_1_0">Summary of changes from OpenCL 1.0</a></li>
<li><a href="#_summary_of_changes_from_opencl_1_1">Summary of changes from OpenCL 1.1</a></li>
<li><a href="#_summary_of_changes_from_opencl_1_2">Summary of changes from OpenCL 1.2</a></li>
<li><a href="#_summary_of_changes_from_opencl_2_0">Summary of changes from OpenCL 2.0</a></li>
<li><a href="#_summary_of_changes_from_opencl_2_1">Summary of changes from OpenCL 2.1</a></li>
</ul>
</li>
</ul>
</div>
</div>
<div id="content">
<div id="preamble">
<div class="sectionbody">
<div style="page-break-after: always;"></div>
<div class="paragraph">
<p>Copyright 2008-2018 The Khronos Group.</p>
</div>
<div class="paragraph">
<p>This specification is protected by copyright laws and contains material proprietary
to the Khronos Group, Inc. Except as described by these terms, it or any components
may not be reproduced, republished, distributed, transmitted, displayed, broadcast
or otherwise exploited in any manner without the express prior written permission
of Khronos Group.</p>
</div>
<div class="paragraph">
<p>Khronos Group grants a conditional copyright license to use and reproduce the
unmodified specification for any purpose, without fee or royalty, EXCEPT no licenses
to any patent, trademark or other intellectual property rights are granted under
these terms. Parties desiring to implement the specification and make use of
Khronos trademarks in relation to that implementation, and receive reciprocal patent
license protection under the Khronos IP Policy must become Adopters and confirm the
implementation as conformant under the process defined by Khronos for this
specification; see <a href="https://www.khronos.org/adopters" class="bare">https://www.khronos.org/adopters</a>.</p>
</div>
<div class="paragraph">
<p>Khronos Group makes no, and expressly disclaims any, representations or warranties,
express or implied, regarding this specification, including, without limitation:
merchantability, fitness for a particular purpose, non-infringement of any
intellectual property, correctness, accuracy, completeness, timeliness, and
reliability. Under no circumstances will the Khronos Group, or any of its Promoters,
Contributors or Members, or their respective partners, officers, directors,
employees, agents or representatives be liable for any damages, whether direct,
indirect, special or consequential damages for lost revenues, lost profits, or
otherwise, arising from or in connection with these materials.</p>
</div>
<div class="paragraph">
<p>Vulkan is a registered trademark and Khronos, OpenXR, SPIR, SPIR-V, SYCL, WebGL,
WebCL, OpenVX, OpenVG, EGL, COLLADA, glTF, NNEF, OpenKODE, OpenKCAM, StreamInput,
OpenWF, OpenSL ES, OpenMAX, OpenMAX AL, OpenMAX IL, OpenMAX DL, OpenML and DevU are
trademarks of the Khronos Group Inc. ASTC is a trademark of ARM Holdings PLC,
OpenCL is a trademark of Apple Inc. and OpenGL and OpenML are registered trademarks
and the OpenGL ES and OpenGL SC logos are trademarks of Silicon Graphics
International used under license by Khronos. All other product names, trademarks,
and/or company names are used solely for identification and belong to their
respective owners.</p>
</div>
<div style="page-break-after: always;"></div>
<div class="paragraph">
<p><strong>Acknowledgements</strong></p>
</div>
<div class="paragraph">
<p>The OpenCL specification is the result of the contributions of many people,
representing a cross section of the desktop, hand-held, and embedded
computer industry.
Following is a partial list of the contributors, including the company that
they represented at the time of their contribution:</p>
</div>
<div class="paragraph">
<p>Chuck Rose, Adobe<br>
Eric Berdahl, Adobe<br>
Shivani Gupta, Adobe<br>
Bill Licea Kane, AMD<br>
Ed Buckingham, AMD<br>
Jan Civlin, AMD<br>
Laurent Morichetti, AMD<br>
Mark Fowler, AMD<br>
Marty Johnson, AMD<br>
Michael Mantor, AMD<br>
Norm Rubin, AMD<br>
Ofer Rosenberg, AMD<br>
Brian Sumner, AMD<br>
Victor Odintsov, AMD<br>
Aaftab Munshi, Apple<br>
Abe Stephens, Apple<br>
Alexandre Namaan, Apple<br>
Anna Tikhonova, Apple<br>
Chendi Zhang, Apple<br>
Eric Bainville, Apple<br>
David Hayward, Apple<br>
Giridhar Murthy, Apple<br>
Ian Ollmann, Apple<br>
Inam Rahman, Apple<br>
James Shearer, Apple<br>
MonPing Wang, Apple<br>
Tanya Lattner, Apple<br>
Mikael Bourges-Sevenier, Aptina<br>
Anton Lokhmotov, ARM<br>
Dave Shreiner, ARM<br>
Hedley Francis, ARM<br>
Robert Elliott, ARM<br>
Scott Moyers, ARM<br>
Tom Olson, ARM<br>
Anastasia Stulova, ARM<br>
Christopher Thompson-Walsh, Broadcom<br>
Holger Waechtler, Broadcom<br>
Norman Rink, Broadcom<br>
Andrew Richards, Codeplay<br>
Maria Rovatsou, Codeplay<br>
Alistair Donaldson, Codeplay<br>
Alastair Murray, Codeplay<br>
Stephen Frye, Electronic Arts<br>
Eric Schenk, Electronic Arts<br>
Daniel Laroche, Freescale<br>
David Neto, Google<br>
Robin Grosman, Huawei<br>
Craig Davies, Huawei<br>
Brian Horton, IBM<br>
Brian Watt, IBM<br>
Gordon Fossum, IBM<br>
Greg Bellows, IBM<br>
Joaquin Madruga, IBM<br>
Mark Nutter, IBM<br>
Mike Perks, IBM<br>
Sean Wagner, IBM<br>
Jon Parr, Imagination Technologies<br>
Robert Quill, Imagination Technologies<br>
James McCarthy, Imagination Technologies<br>
Jon Leech, Independent<br>
Aaron Kunze, Intel<br>
Aaron Lefohn, Intel<br>
Adam Lake, Intel<br>
Alexey Bader, Intel<br>
Allen Hux, Intel<br>
Andrew Brownsword, Intel<br>
Andrew Lauritzen, Intel<br>
Bartosz Sochacki, Intel<br>
Ben Ashbaugh, Intel<br>
Brian Lewis, Intel<br>
Geoff Berry, Intel<br>
Hong Jiang, Intel<br>
Jayanth Rao, Intel<br>
Josh Fryman, Intel<br>
Larry Seiler, Intel<br>
Mike MacPherson, Intel<br>
Murali Sundaresan, Intel<br>
Paul Lalonde, Intel<br>
Raun Krisch, Intel<br>
Stephen Junkins, Intel<br>
Tim Foley, Intel<br>
Timothy Mattson, Intel<br>
Yariv Aridor, Intel<br>
Michael Kinsner, Intel<br>
Kevin Stevens, Intel<br>
Benjamin Bergen, Los Alamos National Laboratory<br>
Roy Ju, Mediatek<br>
Bor-Sung Liang, Mediatek<br>
Rahul Agarwal, Mediatek<br>
Michal Witaszek, Mobica<br>
JenqKuen Lee, NTHU<br>
Amit Rao, NVIDIA<br>
Ashish Srivastava, NVIDIA<br>
Bastiaan Aarts, NVIDIA<br>
Chris Cameron, NVIDIA<br>
Christopher Lamb, NVIDIA<br>
Dibyapran Sanyal, NVIDIA<br>
Guatam Chakrabarti, NVIDIA<br>
Ian Buck, NVIDIA<br>
Jaydeep Marathe, NVIDIA<br>
Jian-Zhong Wang, NVIDIA<br>
Karthik Raghavan Ravi, NVIDIA<br>
Kedar Patil, NVIDIA<br>
Manjunath Kudlur, NVIDIA<br>
Mark Harris, NVIDIA<br>
Michael Gold, NVIDIA<br>
Neil Trevett, NVIDIA<br>
Richard Johnson, NVIDIA<br>
Sean Lee, NVIDIA<br>
Tushar Kashalikar, NVIDIA<br>
Vinod Grover, NVIDIA<br>
Xiangyun Kong, NVIDIA<br>
Yogesh Kini, NVIDIA<br>
Yuan Lin, NVIDIA<br>
Mayuresh Pise, NVIDIA<br>
Allan Tzeng, QUALCOMM<br>
Alex Bourd, QUALCOMM<br>
Anirudh Acharya, QUALCOMM<br>
Andrew Gruber, QUALCOMM<br>
Andrzej Mamona, QUALCOMM<br>
Benedict Gaster, QUALCOMM<br>
Bill Torzewski, QUALCOMM<br>
Bob Rychlik, QUALCOMM<br>
Chihong Zhang, QUALCOMM<br>
Chris Mei, QUALCOMM<br>
Colin Sharp, QUALCOMM<br>
David Garcia, QUALCOMM<br>
David Ligon, QUALCOMM<br>
Jay Yun, QUALCOMM<br>
Lee Howes, QUALCOMM<br>
Richard Ruigrok, QUALCOMM<br>
Robert J. Simpson, QUALCOMM<br>
Sumesh Udayakumaran, QUALCOMM<br>
Vineet Goel, QUALCOMM<br>
Lihan Bin, QUALCOMM<br>
Vlad Shimanskiy, QUALCOMM<br>
Jian Liu, QUALCOMM<br>
Tasneem Brutch, Samsung<br>
Yoonseo Choi, Samsung<br>
Dennis Adams, Sony<br>
Pär-Anders Aronsson, Sony<br>
Jim Rasmusson, Sony<br>
Thierry Lepley, STMicroelectronics<br>
Anton Gorenko, StreamHPC<br>
Jakub Szuppe, StreamHPC<br>
Vincent Hindriksen, StreamHPC<br>
Alan Ward, Texas Instruments<br>
Yuan Zhao, Texas Instruments<br>
Pete Curry, Texas Instruments<br>
Simon McIntosh-Smith, University of Bristol<br>
James Price, University of Bristol<br>
Paul Preney, University of Windsor<br>
Shane Peelar, University of Windsor<br>
Brian Hutsell, Vivante<br>
Mike Cai, Vivante<br>
Sumeet Kumar, Vivante<br>
Wei-Lun Kao, Vivante<br>
Xing Wang, Vivante<br>
Jeff Fifield, Xilinx<br>
Hem C. Neema, Xilinx<br>
Henry Styles, Xilinx<br>
Ralph Wittig, Xilinx<br>
Ronan Keryell, Xilinx<br>
AJ Guillon, YetiWare Inc<br></p>
</div>
<div style="page-break-after: always;"></div>
</div>
</div>
<div class="sect1">
<h2 id="_introduction">1. Introduction</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Modern processor architectures have embraced parallelism as an important
pathway to increased performance.
Facing technical challenges with higher clock speeds in a fixed power
envelope, Central Processing Units (CPUs) now improve performance by adding
multiple cores.
Graphics Processing Units (GPUs) have also evolved from fixed function
rendering devices into programmable parallel processors.
As today's computer systems often include highly parallel CPUs, GPUs and
other types of processors, it is important to enable software developers to
take full advantage of these heterogeneous processing platforms.</p>
</div>
<div class="paragraph">
<p>Creating applications for heterogeneous parallel processing platforms is
challenging as traditional programming approaches for multi-core CPUs and
GPUs are very different.
CPU-based parallel programming models are typically based on standards but
usually assume a shared address space and do not encompass vector
operations.
General purpose GPU programming models address complex memory hierarchies
and vector operations but are traditionally platform-, vendor- or
hardware-specific.
These limitations make it difficult for a developer to access the compute
power of heterogeneous CPUs, GPUs and other types of processors from a
single, multi-platform source code base.
More than ever, there is a need to enable software developers to take full
advantage of heterogeneous processing platforms, from high performance
compute servers, through desktop computer systems, to handheld devices,
that include a diverse mix of parallel CPUs, GPUs and other processors
such as DSPs and the Cell/B.E. processor.</p>
</div>
<div class="paragraph">
<p><strong>OpenCL</strong> (Open Computing Language) is an open royalty-free standard for
general purpose parallel programming across CPUs, GPUs and other processors,
giving software developers portable and efficient access to the power of
these heterogeneous processing platforms.</p>
</div>
<div class="paragraph">
<p>OpenCL supports a wide range of applications, ranging from embedded and
consumer software to HPC solutions, through a low-level, high-performance,
portable abstraction.
By creating an efficient, close-to-the-metal programming interface, OpenCL
will form the foundation layer of a parallel computing ecosystem of
platform-independent tools, middleware and applications.
OpenCL is particularly suited to play an increasingly significant role in
emerging interactive graphics applications that combine general parallel
compute algorithms with graphics rendering pipelines.</p>
</div>
<div class="paragraph">
<p>OpenCL consists of an API for coordinating parallel computation across
heterogeneous processors and a cross-platform intermediate language with a
well-specified computation environment.
The OpenCL standard:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Supports both data- and task-based parallel programming models</p>
</li>
<li>
<p>Utilizes a portable and self-contained intermediate representation with
support for parallel execution</p>
</li>
<li>
<p>Defines consistent numerical requirements based on IEEE 754</p>
</li>
<li>
<p>Defines a configuration profile for handheld and embedded devices</p>
</li>
<li>
<p>Efficiently interoperates with OpenGL, OpenGL ES and other graphics APIs</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>This document begins with an overview of basic concepts and the architecture
of OpenCL, followed by a detailed description of its execution model, memory
model and synchronization support.
It then discusses the OpenCL platform and runtime API.
Some examples are given that describe sample compute use-cases and how they
would be written in OpenCL.
The specification is divided into a core specification that any OpenCL
compliant implementation must support; a handheld/embedded profile which
relaxes the OpenCL compliance requirements for handheld and embedded
devices; and a set of optional extensions that are likely to move into the
core specification in later revisions of the OpenCL specification.</p>
</div>
<div class="sect2">
<h3 id="_normative_references">1.1. Normative References</h3>
<div class="paragraph">
<p>Normative references are references to external documents or resources
with which implementers of OpenCL must comply, in whole or in specified
portions, as described in this specification.</p>
</div>
<div id="iso-c11" class="paragraph">
<p><em>ISO/IEC 9899:2011 - Information technology - Programming languages - C</em>,
<a href="https://www.iso.org/standard/57853.html" class="bare">https://www.iso.org/standard/57853.html</a> (final specification),
<a href="http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1570.pdf" class="bare">http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1570.pdf</a> (last public
draft).</p>
</div>
</div>
<div class="sect2">
<h3 id="_version_numbers">1.2. Version Numbers</h3>
<div class="paragraph">
<p>The OpenCL version number follows a <em>major.minor-revision</em> scheme. When this
version number is used within the API it generally only includes the
<em>major.minor</em> components of the version number.</p>
</div>
<div class="paragraph">
<p>A difference in the <em>major</em> or <em>minor</em> version number indicates that some
amount of new functionality has been added to the specification, and may also
include behavior changes and bug fixes.
Functionality may also be deprecated or removed when the <em>major</em> or <em>minor</em>
version changes.</p>
</div>
<div class="paragraph">
<p>A difference in the <em>revision</em> number indicates small changes to the
specification, typically to fix a bug or to clarify language.
When the <em>revision</em> number changes there may be an impact on the behavior of
existing functionality, but this should not affect backwards compatibility.
Functionality should not be added or removed when the <em>revision</em> number
changes.</p>
</div>
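<div class="paragraph">
<p>As an illustration of the scheme described above, the following is a minimal
sketch in C that splits a string of the form <em>major.minor-revision</em> (for
example <code>2.2-8</code>) into its components and checks whether two versions
differ only in their <em>revision</em>.
The names <code>cl_spec_version</code>, <code>parse_spec_version</code> and
<code>same_functionality</code> are hypothetical helpers introduced for this
example only; they are not part of the OpenCL API.</p>
</div>

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper type for this sketch; not part of the OpenCL API. */
typedef struct {
    int major;
    int minor;
    int revision; /* -1 when the string carries no "-revision" suffix */
} cl_spec_version;

/* Parses "major.minor" or "major.minor-revision".
 * Returns 1 on success and 0 on a parse failure. */
static int parse_spec_version(const char *text, cl_spec_version *out)
{
    int consumed = 0;
    int tail = 0;

    assert(text != NULL && out != NULL);

    out->revision = -1;
    if (sscanf(text, "%d.%d%n", &out->major, &out->minor, &consumed) != 2 ||
        out->major < 0 || out->minor < 0)
        return 0;
    if (text[consumed] == '\0')
        return 1;               /* "major.minor" only, as used within the API */
    if (text[consumed] != '-')
        return 0;
    if (sscanf(text + consumed + 1, "%d%n", &out->revision, &tail) != 1 ||
        out->revision < 0)
        return 0;
    return text[consumed + 1 + tail] == '\0';
}

/* Two versions that differ only in revision should expose the same
 * functionality, because a revision change does not add or remove any. */
static int same_functionality(const cl_spec_version *a, const cl_spec_version *b)
{
    return a->major == b->major && a->minor == b->minor;
}
```

<div class="paragraph">
<p>Under these assumptions, <code>2.2-7</code> and <code>2.2-8</code> compare as
having the same functionality, while <code>2.1</code> and <code>2.2-8</code> do
not.</p>
</div>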
</div>
</div>
</div>
<div class="sect1">
<h2 id="_glossary">2. Glossary</h2>
<div class="sectionbody">
<div class="dlist">
<dl>
<dt class="hdlist1">Application </dt>
<dd>
<p>The combination of the program running on the host and OpenCL devices.</p>
</dd>
<dt class="hdlist1">Acquire semantics </dt>
<dd>
<p>One of the memory order semantics defined for synchronization
operations.
Acquire semantics apply to atomic operations that load from memory.
Given two units of execution, <strong>A</strong> and <strong>B</strong>, acting on a shared atomic
object <strong>M</strong>, if <strong>A</strong> uses an atomic load of <strong>M</strong> with acquire semantics to
synchronize-with an atomic store to <strong>M</strong> by <strong>B</strong> that used release
semantics, then <strong>A</strong>'s atomic load will occur before any subsequent
operations by <strong>A</strong>.
Note that the memory orders <em>release</em>, <em>sequentially consistent</em>, and
<em>acquire_release</em> all include <em>release semantics</em> and effectively pair
with a load using acquire semantics.</p>
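<div class="paragraph">
<p>The pairing of an acquire load with a release store can be sketched on the
host with C11 atomics.
This is an analogy only: two POSIX threads stand in for the units of execution
<strong>A</strong> and <strong>B</strong>, and the names <code>payload</code>,
<code>ready</code> and <code>run_message_passing</code> are hypothetical and
not part of OpenCL.</p>
</div>

```c
#include <pthread.h>
#include <stdatomic.h>

static int payload;        /* ordinary, non-atomic data published by B */
static atomic_int ready;   /* plays the role of the shared atomic object M */

/* Unit of execution B: write the payload, then store to M with release
 * semantics so the payload write is ordered before the store. */
static void *producer(void *arg)
{
    (void)arg;
    payload = 42;
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

/* Unit of execution A: load M with acquire semantics until B's store is
 * observed; the read of payload is ordered after that load. */
static void *consumer(void *arg)
{
    int *seen = arg;
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0) {
        /* spin until the release store becomes visible */
    }
    *seen = payload;
    return NULL;
}

/* Runs A and B once; returns the payload value A observed, or -1 if a
 * thread could not be created. */
static int run_message_passing(void)
{
    pthread_t a, b;
    int seen = -1;

    payload = 0;
    atomic_store(&ready, 0);
    if (pthread_create(&a, NULL, consumer, &seen) != 0)
        return -1;
    if (pthread_create(&b, NULL, producer, NULL) != 0) {
        /* Let the consumer leave its loop before giving up. */
        atomic_store_explicit(&ready, 1, memory_order_release);
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(b, NULL);
    pthread_join(a, NULL);
    return seen;
}
```

<div class="paragraph">
<p>Because the load in <code>consumer</code> synchronizes-with the store in
<code>producer</code>, the read of <code>payload</code> that follows the load
observes the value written before the store.</p>
</div>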
</dd>
<dt class="hdlist1">Acquire release semantics </dt>
<dd>
<p>A memory order semantics for synchronization operations (such as atomic
operations) that has the properties of both acquire and release memory
orders.
It is used with read-modify-write operations.</p>
</dd>
<dt class="hdlist1">Atomic operations </dt>
<dd>
<p>Operations that at any point, and from any perspective, have either
occurred completely, or not at all.
Memory orders associated with atomic operations may constrain the
visibility of loads and stores with respect to the atomic operations
(see <em>relaxed semantics</em>, <em>acquire semantics</em>, <em>release semantics</em> or
<em>acquire release semantics</em>).</p>
</dd>
<dt class="hdlist1">Blocking and Non-Blocking Enqueue API calls </dt>
<dd>
<p>A <em>non-blocking enqueue API call</em> places a <em>command</em> on a
<em>command-queue</em> and returns immediately to the host.
The <em>blocking-mode enqueue API calls</em> do not return to the host until
the command has completed.</p>
</dd>
<dt class="hdlist1">Barrier </dt>
<dd>
<p>There are three types of <em>barriers</em>: a command-queue barrier, a
work-group barrier and a sub-group barrier.</p>
<div class="openblock">
<div class="content">
<div class="ulist">
<ul>
<li>
<p>The OpenCL API provides a function to enqueue a <em>command-queue</em>
<em>barrier</em> command.
This <em>barrier</em> command ensures that all previously enqueued commands to
a command-queue have finished execution before any following <em>commands</em>
enqueued in the <em>command-queue</em> can begin execution.</p>
</li>
<li>
<p>The OpenCL kernel execution model provides built-in <em>work-group barrier</em>
functionality.
This <em>barrier</em> built-in function can be used by a <em>kernel</em> executing on
a <em>device</em> to perform synchronization between <em>work-items</em> in a
<em>work-group</em> executing the <em>kernel</em>.
All the <em>work-items</em> of a <em>work-group</em> must execute the <em>barrier</em>
construct before any are allowed to continue execution beyond the
<em>barrier</em>.</p>
</li>
<li>
<p>The OpenCL kernel execution model provides built-in <em>sub-group barrier</em>
functionality.
This <em>barrier</em> built-in function can be used by a <em>kernel</em> executing on
a <em>device</em> to perform synchronization between <em>work-items</em> in a
<em>sub-group</em> executing the <em>kernel</em>.
All the <em>work-items</em> of a <em>sub-group</em> must execute the <em>barrier</em>
construct before any are allowed to continue execution beyond the
<em>barrier</em>.</p>
</li>
</ul>
</div>
</div>
</div>
</dd>
<dt class="hdlist1">Buffer Object </dt>
<dd>
<p>A memory object that stores a linear collection of bytes.
Buffer objects are accessible using a pointer in a <em>kernel</em> executing on
a <em>device</em>.
Buffer objects can be manipulated by the host using OpenCL API calls.
A <em>buffer object</em> encapsulates the following information:</p>
<div class="openblock">
<div class="content">
<div class="ulist">
<ul>
<li>
<p>Size in bytes.</p>
</li>
<li>
<p>Properties that describe usage information and which region to allocate
from.</p>
</li>
<li>
<p>Buffer data.</p>
</li>
</ul>
</div>
</div>
</div>
</dd>
<dt class="hdlist1">Built-in Kernel </dt>
<dd>
<p>A <em>built-in kernel</em> is a <em>kernel</em> that is executed on an OpenCL <em>device</em>
or <em>custom device</em> by fixed-function hardware or in firmware.
<em>Applications</em> can query the <em>built-in kernels</em> supported by a <em>device</em>
or <em>custom device</em>.
A <em>program object</em> can only contain <em>kernels</em> written in OpenCL C or
<em>built-in kernels</em> but not both.
See also <em>Kernel</em> and <em>Program</em>.</p>
</dd>
<dt class="hdlist1">Child kernel </dt>
<dd>
<p>See <em>Device-side enqueue</em>.</p>
</dd>
<dt class="hdlist1">Command </dt>
<dd>
<p>The OpenCL operations that are submitted to a <em>command-queue</em> for
execution.
For example, OpenCL commands issue kernels for execution on a compute
device, manipulate memory objects, etc.</p>
</dd>
<dt class="hdlist1">Command-queue </dt>
<dd>
<p>An object that holds <em>commands</em> that will be executed on a specific
<em>device</em>.
The <em>command-queue</em> is created on a specific <em>device</em> in a <em>context</em>.
<em>Commands</em> to a <em>command-queue</em> are queued in-order but may be executed
in-order or out-of-order.
Refer to <em>In-order Execution</em> and <em>Out-of-order Execution</em>.</p>
</dd>
<dt class="hdlist1">Command-queue Barrier </dt>
<dd>
<p>See <em>Barrier</em>.</p>
</dd>
<dt class="hdlist1">Command synchronization </dt>
<dd>
<p>Constraints on the order in which commands are launched for execution on a
device defined in terms of the synchronization points that occur between
commands in host command-queues and between commands in device-side
command-queues.
See <em>synchronization points</em>.</p>
</dd>
<dt class="hdlist1">Complete </dt>
<dd>
<p>The final state in the six state model for the execution of a command.
The transition into this state is signaled through event objects
or callback functions associated with a command.</p>
</dd>
<dt class="hdlist1">Compute Device Memory </dt>
<dd>
<p>This refers to one or more memories attached to the compute device.</p>
</dd>
<dt class="hdlist1">Compute Unit </dt>
<dd>
<p>An OpenCL <em>device</em> has one or more <em>compute units</em>.
A <em>work-group</em> executes on a single <em>compute unit</em>.
A <em>compute unit</em> is composed of one or more <em>processing elements</em> and
<em>local memory</em>.
A <em>compute unit</em> may also include dedicated texture filter units that
can be accessed by its processing elements.</p>
</dd>
<dt class="hdlist1">Concurrency </dt>
<dd>
<p>A property of a system in which a set of tasks can remain
active and make progress at the same time.
To utilize concurrent execution when running a program, a programmer
must identify the concurrency in their problem, expose it within the
source code, and then exploit it using a notation that supports
concurrency.</p>
</dd>
<dt class="hdlist1">Constant Memory </dt>
<dd>
<p>A region of <em>global memory</em> that remains constant during the execution
of a <em>kernel</em>.
The <em>host</em> allocates and initializes memory objects placed into
<em>constant memory</em>.</p>
</dd>
<dt class="hdlist1">Context </dt>
<dd>
<p>The environment within which the kernels execute and the domain in which
synchronization and memory management are defined.
The <em>context</em> includes a set of <em>devices</em>, the memory accessible to
those <em>devices</em>, the corresponding memory properties and one or more
<em>command-queues</em> used to schedule execution of <em>kernels</em> or
operations on <em>memory objects</em>.</p>
</dd>
<dt class="hdlist1">Control flow </dt>
<dd>
<p>The flow of instructions executed by a work-item.
Multiple logically related work-items may or may not execute the same
control flow.
The control flow is said to be <em>converged</em> if all the work-items in the
set execute the same stream of instructions.
In a <em>diverged</em> control flow, the work-items in the set execute
different instructions.
At a later point, if a diverged control flow becomes converged, it is
said to be a re-converged control flow.</p>
</dd>
<dt class="hdlist1">Converged control flow </dt>
<dd>
<p>See <em>Control flow</em>.</p>
</dd>
<dt class="hdlist1">Custom Device </dt>
<dd>
<p>An OpenCL <em>device</em> that fully implements the OpenCL Runtime but does not
support <em>programs</em> written in OpenCL C.
A custom device may be specialized non-programmable hardware that is
very power efficient and performant for directed tasks or hardware with
limited programmable capabilities such as specialized DSPs.
Custom devices are not OpenCL conformant.
Custom devices may support an online compiler.
Programs for custom devices can be created using the OpenCL runtime APIs
that allow OpenCL programs to be created from source (if an online
compiler is supported) and/or binary, or from <em>built-in kernels</em>
supported by the <em>device</em>.
See also <em>Device</em>.</p>
</dd>
<dt class="hdlist1">Data Parallel Programming Model </dt>
<dd>
<p>Traditionally, this term refers to a programming model where concurrency
is expressed as instructions from a single program applied to multiple
elements within a set of data structures.
The term has been generalized in OpenCL to refer to a model wherein a
set of instructions from a single program are applied concurrently to
each point within an abstract domain of indices.</p>
</dd>
<dt class="hdlist1">Data race </dt>
<dd>
<p>The execution of a program contains a data race if it contains two
actions in different work-items or host threads where (1) one action
modifies a memory location and the other action reads or modifies the
same memory location, and (2) at least one of these actions is not
atomic, or the corresponding memory scopes are not inclusive, and (3)
the actions are global actions unordered by the global-happens-before
relation or are local actions unordered by the local-happens-before
relation.</p>
</dd>
<dt class="hdlist1">Deprecation </dt>
<dd>
<p>Existing features are marked as deprecated if their usage is not
recommended as that feature is being de-emphasized, superseded and may
be removed from a future version of the specification.</p>
</dd>
<dt class="hdlist1">Device </dt>
<dd>
<p>A <em>device</em> is a collection of <em>compute units</em>.
A <em>command-queue</em> is used to queue <em>commands</em> to a <em>device</em>.
Examples of <em>commands</em> include executing <em>kernels</em>, or reading and
writing <em>memory objects</em>.
OpenCL devices typically correspond to a GPU, a multi-core CPU, and
other processors such as DSPs and the Cell/B.E.
processor.</p>
</dd>
<dt class="hdlist1">Device-side enqueue </dt>
<dd>
<p>A mechanism whereby a kernel-instance is enqueued by a kernel-instance
running on a device without direct involvement by the host program.
This produces <em>nested parallelism</em>; i.e. additional levels of
concurrency are nested inside a running kernel-instance.
The kernel-instance executing on a device (the <em>parent kernel</em>) enqueues
a kernel-instance (the <em>child kernel</em>) to a device-side command queue.
Child and parent kernels execute asynchronously, though a parent kernel
does not complete until all of its child kernels have completed.</p>
</dd>
<dt class="hdlist1">Diverged control flow </dt>
<dd>
<p>See <em>Control flow</em>.</p>
</dd>
<dt class="hdlist1">Ended </dt>
<dd>
<p>The fifth state in the six state model for the execution of a command.
The transition into this state occurs when execution of a command has
ended.
When a Kernel-enqueue command ends, all of the work-groups associated
with that command have finished their execution.</p>
</dd>
<dt class="hdlist1">Event Object </dt>
<dd>
<p>An <em>event object</em> encapsulates the status of an operation such as a
<em>command</em>.
It can be used to synchronize operations in a context.</p>
</dd>
<dt class="hdlist1">Event Wait List </dt>
<dd>
<p>An <em>event wait list</em> is a list of <em>event objects</em> that can be used to
control when a particular <em>command</em> begins execution.</p>
</dd>
<dt class="hdlist1">Fence </dt>
<dd>
<p>A memory ordering operation without an associated atomic object.
A fence can use <em>acquire semantics</em>, <em>release semantics</em>, or
<em>acquire release semantics</em>.</p>
</dd>
<dt class="hdlist1">Framework </dt>
<dd>
<p>A software system that contains the set of components to support
software development and execution.
A <em>framework</em> typically includes libraries, APIs, runtime systems,
compilers, etc.</p>
</dd>
<dt class="hdlist1">Generic address space </dt>
<dd>
<p>An address space that includes the <em>private</em>, <em>local</em>, and <em>global</em>
address spaces available to a device.
The generic address space supports conversion of pointers to and from
private, local and global address spaces, and hence lets a programmer
write a single function that at compile time can take arguments from any
of the three named address spaces.</p>
</dd>
<dt class="hdlist1">Global Happens before </dt>
<dd>
<p>See <em>Happens before</em>.</p>
</dd>
<dt class="hdlist1">Global ID </dt>
<dd>
<p>A <em>global ID</em> is used to uniquely identify a <em>work-item</em> and is derived
from the number of <em>global work-items</em> specified when executing a
<em>kernel</em>.
The <em>global ID</em> is an N-dimensional value that starts at (0, 0, &#8230;&#8203;, 0).
See also <em>Local ID</em>.</p>
</dd>
<dt class="hdlist1">Global Memory </dt>
<dd>
<p>A memory region accessible to all <em>work-items</em> executing in a <em>context</em>.
It is accessible to the <em>host</em> using <em>commands</em> such as read, write and
map.
<em>Global memory</em> is included within the <em>generic address space</em> that
includes the private and local address spaces.</p>
</dd>
<dt class="hdlist1">GL share group </dt>
<dd>
<p>A <em>GL share group</em> object manages shared OpenGL or OpenGL ES resources
such as textures, buffers, framebuffers, and renderbuffers and is
associated with one or more GL context objects.
The <em>GL share group</em> is typically an opaque object and not directly
accessible.</p>
</dd>
<dt class="hdlist1">Handle </dt>
<dd>
<p>An opaque type that references an <em>object</em> allocated by OpenCL.
Any operation on an <em>object</em> occurs by reference to that object's handle.</p>
</dd>
<dt class="hdlist1">Happens before </dt>
<dd>
<p>An ordering relationship between operations that execute on multiple
units of execution.
If an operation A happens-before operation B then A must occur before B;
in particular, any value written by A will be visible to B.
We define two separate happens before relations: <em>global-happens-before</em>
and <em>local-happens-before</em>.
These are defined in <a href="#memory-ordering-rules">Memory Model: Memory
Ordering Rules</a>.</p>
</dd>
<dt class="hdlist1">Host </dt>
<dd>
<p>The <em>host</em> interacts with the <em>context</em> using the OpenCL API.</p>
</dd>
<dt class="hdlist1">Host-thread </dt>
<dd>
<p>The unit of execution that executes the statements in the host program.</p>
</dd>
<dt class="hdlist1">Host pointer </dt>
<dd>
<p>A pointer to memory that is in the virtual address space on the <em>host</em>.</p>
</dd>
<dt class="hdlist1">Illegal </dt>
<dd>
<p>Behavior of a system that is explicitly not allowed and will be reported
as an error when encountered by OpenCL.</p>
</dd>
<dt class="hdlist1">Image Object </dt>
<dd>
<p>A <em>memory object</em> that stores a two- or three-dimensional structured
array.
Image data can only be accessed with read and write functions.
The read functions use a <em>sampler</em>.</p>
<div class="openblock">
<div class="content">
<div class="paragraph">
<p>The <em>image object</em> encapsulates the following information:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Dimensions of the image.</p>
</li>
<li>
<p>Description of each element in the image.</p>
</li>
<li>
<p>Properties that describe usage information and which region to allocate
from.</p>
</li>
<li>
<p>Image data.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>The elements of an image are selected from a list of predefined image
formats.</p>
</div>
</div>
</div>
</dd>
<dt class="hdlist1">Implementation Defined </dt>
<dd>
<p>Behavior that is explicitly allowed to vary between conforming
implementations of OpenCL.
An OpenCL implementor is required to document the implementation-defined
behavior.</p>
</dd>
<dt class="hdlist1">Independent Forward Progress </dt>
<dd>
<p>If an entity supports independent forward progress, then if it is
otherwise not dependent on any actions due to be performed by any other
entity (for example it does not wait on a lock held by, and thus that
must be released by, any other entity), then its execution cannot be
blocked by the execution of any other entity in the system (it will not
be starved).
Work-items in a sub-group, for example, typically do not support
independent forward progress, so one work-item in a sub-group may be
completely blocked (starved) if a different work-item in the same
sub-group enters a spin loop.</p>
</dd>
<dt class="hdlist1">In-order Execution </dt>
<dd>
<p>A model of execution in OpenCL where the <em>commands</em> in a <em>command-queue</em>
are executed in order of submission with each <em>command</em> running to
completion before the next one begins.
See <em>Out-of-order Execution</em>.</p>
</dd>
<dt class="hdlist1">Intermediate Language </dt>
<dd>
<p>A lower-level language that may be used to create programs.
SPIR-V is a required IL for OpenCL 2.2 runtimes.
Additional ILs may be accepted on an implementation-defined basis.</p>
</dd>
<dt class="hdlist1">Kernel </dt>
<dd>
<p>A <em>kernel</em> is a function declared in a <em>program</em> and executed on an
OpenCL <em>device</em>.
A <em>kernel</em> is identified by the <code>__kernel</code> or <code>kernel</code> qualifier applied to
any function defined in a <em>program</em>.</p>
</dd>
<dt class="hdlist1">Kernel-instance </dt>
<dd>
<p>The work carried out by an OpenCL program occurs through the execution
of kernel-instances on devices.
The kernel-instance is the <em>kernel object</em>, the values associated with
the arguments to the kernel, and the parameters that define the
<em>NDRange</em> index space.</p>
</dd>
<dt class="hdlist1">Kernel Object </dt>
<dd>
<p>A <em>kernel object</em> encapsulates a specific <code>__kernel</code> function declared
in a <em>program</em> and the argument values to be used when executing this
<code>__kernel</code> function.</p>
</dd>
<dt class="hdlist1">Kernel Language </dt>
<dd>
<p>A language that is used to create source code for kernels.
Supported kernel languages include OpenCL C, OpenCL C++, and the OpenCL
dialect of SPIR-V.</p>
</dd>
<dt class="hdlist1">Launch </dt>
<dd>
<p>The transition of a command from the <em>submitted</em> state to the <em>ready</em>
state.
See <em>Ready</em>.</p>
</dd>
<dt class="hdlist1">Local ID </dt>
<dd>
<p>A <em>local ID</em> specifies a unique <em>work-item ID</em> within a given
<em>work-group</em> that is executing a <em>kernel</em>.
The <em>local ID</em> is an N-dimensional value that starts at (0, 0, &#8230;&#8203;, 0).
See also <em>Global ID</em>.</p>
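<div class="paragraph">
<p>As an illustration of how the two IDs relate, the sketch below computes a
global ID from a local ID in one dimension. It assumes uniform work-group
sizes and uses the relation from the OpenCL execution model (work-group index
times work-group size, plus local ID, plus global offset). The function and
parameter names are hypothetical and are not part of the OpenCL API.</p>
</div>

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative helper (hypothetical name): global ID of a work-item in one
 * dimension, assuming every work-group has the same size local_size. */
size_t global_id_1d(size_t group_id, size_t local_size,
                    size_t local_id, size_t global_offset)
{
    /* A local ID always lies in the range [0, local_size). */
    assert(local_id < local_size);
    return group_id * local_size + local_id + global_offset;
}
```

<div class="paragraph">
<p>For example, with a work-group size of 64 and no offset, the work-item with
local ID 5 in work-group 2 has global ID 2 * 64 + 5 = 133.</p>
</div>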
</dd>
<dt class="hdlist1">Local Memory </dt>
<dd>
<p>A memory region associated with a <em>work-group</em> and accessible only by
<em>work-items</em> in that <em>work-group</em>.
<em>Local memory</em> is included within the <em>generic address space</em> that
includes the private and global address spaces.</p>
</dd>
<dt class="hdlist1">Marker </dt>
<dd>
<p>A <em>command</em> queued in a <em>command-queue</em> that can be used to tag all
<em>commands</em> queued before the <em>marker</em> in the <em>command-queue</em>.
The <em>marker</em> command returns an <em>event</em> which can be used by the
<em>application</em> to queue a wait on the marker event, i.e. to wait for all
commands queued before the <em>marker</em> command to complete.</p>
</dd>
<dt class="hdlist1">Memory Consistency Model </dt>
<dd>
<p>Rules that define which values are observed when multiple units of
execution load data from any shared memory plus the synchronization
operations that constrain the order of memory operations and define
synchronization relationships.
The memory consistency model in OpenCL is based on the memory model from
the ISO C11 programming language.</p>
</dd>
<dt class="hdlist1">Memory Objects </dt>
<dd>
<p>A <em>memory object</em> is a handle to a reference counted region of <em>Global
Memory</em>.
Also see <em>Buffer Object</em> and <em>Image Object</em>.</p>
</dd>
<dt class="hdlist1">Memory Regions (or Pools) </dt>
<dd>
<p>A distinct address space in OpenCL.
<em>Memory regions</em> may overlap in physical memory though OpenCL will treat
them as logically distinct.
The <em>memory regions</em> are denoted as <em>private</em>, <em>local</em>, <em>constant</em>, and
<em>global</em>.</p>
</dd>
<dt class="hdlist1">Memory Scopes </dt>
<dd>
<p>These memory scopes define a hierarchy of visibilities when analyzing
the ordering constraints of memory operations.
They are defined by the values of the <strong>memory_scope</strong> enumeration
constant.
Current values are <strong>memory_scope_work_item</strong> (memory constraints only
apply to a single work-item and in practice apply only to image
operations), <strong>memory_scope_sub_group</strong> (memory-ordering constraints only
apply to work-items executing in a sub-group), <strong>memory_scope_work_group</strong>
(memory-ordering constraints only apply to work-items executing in a
work-group), <strong>memory_scope_device</strong> (memory-ordering constraints only
apply to work-items executing on a single device) and
<strong>memory_scope_all_svm_devices</strong> (memory-ordering constraints only apply
to work-items executing across multiple devices and when using shared
virtual memory).</p>
</dd>
<dt class="hdlist1">Modification Order </dt>
<dd>
<p>All modifications to a particular atomic object M occur in some
particular <em>total order</em>, called the <em>modification order</em> of M.
If A and B are modifications of an atomic object M, and A happens-before
B, then A shall precede B in the modification order of M.
Note that the modification order of an atomic object M is independent of
whether M is in local or global memory.</p>
</dd>
<dt class="hdlist1">Nested Parallelism </dt>
<dd>
<p>See <em>device-side enqueue</em>.</p>
</dd>
<dt class="hdlist1">Object </dt>
<dd>
<p>Objects are abstract representations of the resources that can be
manipulated by the OpenCL API.
Examples include <em>program objects</em>, <em>kernel objects</em>, and <em>memory
objects</em>.</p>
</dd>
<dt class="hdlist1">Out-of-Order Execution </dt>
<dd>
<p>A model of execution in which <em>commands</em> placed in the <em>command-queue</em> may
begin and complete execution in any order consistent with constraints
imposed by <em>event wait lists</em> and <em>command-queue barriers</em>.
See <em>In-order Execution</em>.</p>
</dd>
<dt class="hdlist1">Parent device </dt>
<dd>
<p>The OpenCL <em>device</em> which is partitioned to create <em>sub-devices</em>.
Not all <em>parent devices</em> are <em>root devices</em>.
A <em>root device</em> might be partitioned and the <em>sub-devices</em> partitioned
again.
In this case, the first set of <em>sub-devices</em> would be <em>parent devices</em>
of the second set, but not the <em>root devices</em>.
Also see <em>Device</em>, <em>Sub-device</em> and <em>Root device</em>.</p>
</dd>
<dt class="hdlist1">Parent kernel </dt>
<dd>
<p>See <em>Device-side enqueue</em>.</p>
</dd>
<dt class="hdlist1">Pipe </dt>
<dd>
<p>The <em>pipe</em> memory object conceptually is an ordered sequence of data
items.
A pipe has two endpoints: a write endpoint into which data items are
inserted, and a read endpoint from which data items are removed.
At any one time, only one kernel instance may write into a pipe, and
only one kernel instance may read from a pipe.
To support the producer-consumer design pattern, one kernel instance
connects to the write endpoint (the producer) while another kernel
instance connects to the read endpoint (the consumer).</p>
</dd>
<dt class="hdlist1">Platform </dt>
<dd>
<p>The <em>host</em> plus a collection of <em>devices</em> managed by the OpenCL
<em>framework</em> that allow an application to share <em>resources</em> and execute
<em>kernels</em> on <em>devices</em> in the <em>platform</em>.</p>
</dd>
<dt class="hdlist1">Private Memory </dt>
<dd>
<p>A region of memory private to a <em>work-item</em>.
Variables defined in one <em>work-item</em>'s <em>private memory</em> are not visible
to another <em>work-item</em>.</p>
</dd>
<dt class="hdlist1">Processing Element </dt>
<dd>
<p>A virtual scalar processor.
A work-item may execute on one or more processing elements.</p>
</dd>
<dt class="hdlist1">Program </dt>
<dd>
<p>An OpenCL <em>program</em> consists of a set of <em>kernels</em>.
<em>Programs</em> may also contain auxiliary functions called by the
<code>__kernel</code> functions and constant data.</p>
</dd>
<dt class="hdlist1">Program Object </dt>
<dd>
<p>A <em>program object</em> encapsulates the following information:</p>
<div class="openblock">
<div class="content">
<div class="ulist">
<ul>
<li>
<p>A reference to an associated <em>context</em>.</p>
</li>
<li>
<p>A <em>program</em> source or binary.</p>
</li>
<li>
<p>The latest successfully built program executable, the list of <em>devices</em>
for which the program executable is built, the build options used and a
build log.</p>
</li>
<li>
<p>The number of <em>kernel objects</em> currently attached.</p>
</li>
</ul>
</div>
</div>
</div>
</dd>
<dt class="hdlist1">Queued </dt>
<dd>
<p>The first state in the six state model for the execution of a command.
The transition into this state occurs when the command is enqueued into
a command-queue.</p>
</dd>
<dt class="hdlist1">Ready </dt>
<dd>
<p>The third state in the six state model for the execution of a command.
The transition into this state occurs when prerequisites constraining
execution of a command have been met; i.e. the command has been
launched.
When a kernel-enqueue command is launched, work-groups associated with
the command are placed in a device's work-pool from which they are
scheduled for execution.</p>
</dd>
<dt class="hdlist1">Re-converged Control Flow </dt>
<dd>
<p>See <em>Control flow</em>.</p>
</dd>
<dt class="hdlist1">Reference Count </dt>
<dd>
<p>The life span of an OpenCL object is determined by its <em>reference
count</em>, an internal count of the number of references to the object.
When you create an object in OpenCL, its <em>reference count</em> is set to
one.
Subsequent calls to the appropriate <em>retain</em> API (such as
<strong>clRetainContext</strong>, <strong>clRetainCommandQueue</strong>) increment the <em>reference
count</em>.
Calls to the appropriate <em>release</em> API (such as <strong>clReleaseContext</strong>,
<strong>clReleaseCommandQueue</strong>) decrement the <em>reference count</em>.
Implementations may also modify the <em>reference count</em>, e.g. to track
attached objects or to ensure correct operation of in-progress or
scheduled activities.
The object becomes inaccessible to host code when the number of
<em>release</em> operations performed matches the number of <em>retain</em> operations
plus the allocation of the object.
At this point the reference count may be zero but this is not
guaranteed.</p>
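<div class="paragraph">
<p>The host-visible part of this rule can be sketched with a toy model. The
type and function names below are hypothetical and do not call the OpenCL API.
The model tracks only references held by host code, so it deliberately ignores
any additional references an implementation may hold internally.</p>
</div>

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical names): counts only host-held references. */
typedef struct {
    unsigned host_refs;   /* creation counts as the first reference */
    bool     accessible;  /* false once releases match retains plus creation */
} toy_object;

toy_object toy_create(void)
{
    toy_object o = { 1u, true };
    return o;
}

void toy_retain(toy_object *o)
{
    assert(o->accessible);   /* retaining an inaccessible object is an error */
    o->host_refs++;
}

void toy_release(toy_object *o)
{
    assert(o->accessible);
    if (--o->host_refs == 0) {
        o->accessible = false;   /* host code must not use the object again */
    }
}
```

<div class="paragraph">
<p>In this model, one <code>toy_create</code> followed by one <code>toy_retain</code> needs two
<code>toy_release</code> calls before the object becomes inaccessible to host code.</p>
</div>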
</dd>
<dt class="hdlist1">Relaxed Consistency </dt>
<dd>
<p>A memory consistency model in which the contents of memory visible to
different <em>work-items</em> or <em>commands</em> may be different except at a
<em>barrier</em> or other explicit synchronization points.</p>
</dd>
<dt class="hdlist1">Relaxed Semantics </dt>
<dd>
<p>A memory order semantics for atomic operations that implies no order
constraints.
The operation is <em>atomic</em> but it has no impact on the order of memory
operations.</p>
</dd>
<dt class="hdlist1">Release Semantics </dt>
<dd>
<p>One of the memory order semantics defined for synchronization
operations.
Release semantics apply to atomic operations that store to memory.
Given two units of execution, <strong>A</strong> and <strong>B</strong>, acting on a shared atomic
object <strong>M</strong>, if <strong>A</strong> uses an atomic store of <strong>M</strong> with release semantics to
synchronize-with an atomic load to <strong>M</strong> by <strong>B</strong> that used acquire
semantics, then <strong>A</strong>'s atomic store will occur <em>after</em> any prior
operations by <strong>A</strong>.
Note that the memory orders <em>acquire</em>, <em>sequentially consistent</em>, and
<em>acquire_release</em> all include <em>acquire semantics</em> and effectively pair
with a store using release semantics.</p>
</dd>
<dt class="hdlist1">Remainder work-groups </dt>
<dd>
<p>When the work-groups associated with a kernel-instance are defined, the
sizes of a work-group in each dimension may not evenly divide the size
of the NDRange in the corresponding dimensions.
The result is a collection of work-groups on the boundaries of the
NDRange that are smaller than the base work-group size.
These are known as <em>remainder work-groups</em>.</p>
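<div class="paragraph">
<p>The arithmetic behind this definition can be sketched for a single
dimension. The helper names below are hypothetical. The code only shows how a
work-group size that does not evenly divide the NDRange size leaves one
smaller work-group at the boundary.</p>
</div>

```c
#include <assert.h>
#include <stddef.h>

/* Number of work-groups along one dimension, counting a trailing
 * remainder work-group when local_size does not divide global_size. */
size_t num_work_groups(size_t global_size, size_t local_size)
{
    assert(local_size > 0);
    return (global_size + local_size - 1) / local_size;
}

/* Size of the remainder work-group along one dimension, or 0 when the
 * work-group size divides the NDRange size evenly. */
size_t remainder_work_group_size(size_t global_size, size_t local_size)
{
    assert(local_size > 0);
    return global_size % local_size;
}
```

<div class="paragraph">
<p>For an NDRange of 1000 work-items and a base work-group size of 64, this
gives 15 full work-groups plus one remainder work-group of 40 work-items.</p>
</div>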
</dd>
<dt class="hdlist1">Running </dt>
<dd>
<p>The fourth state in the six state model for the execution of a command.
The transition into this state occurs when the execution of the command
starts.
When a Kernel-enqueue command starts, one or more work-groups associated
with the command start to execute.</p>
</dd>
<dt class="hdlist1">Root device </dt>
<dd>
<p>A <em>root device</em> is an OpenCL <em>device</em> that has not been partitioned.
Also see <em>Device</em>, <em>Parent device</em> and <em>Sub-device</em>.</p>
</dd>
<dt class="hdlist1">Resource </dt>
<dd>
<p>A class of <em>objects</em> defined by OpenCL.
An instance of a <em>resource</em> is an <em>object</em>.
The most common <em>resources</em> are the <em>context</em>, <em>command-queue</em>, <em>program
objects</em>, <em>kernel objects</em>, and <em>memory objects</em>.
Computational resources are hardware elements that participate in the
action of advancing a program counter.
Examples include the <em>host</em>, <em>devices</em>, <em>compute units</em> and <em>processing
elements</em>.</p>
</dd>
<dt class="hdlist1">Retain, Release </dt>
<dd>
<p>The action of incrementing (retain) and decrementing (release) the
reference count of an OpenCL <em>object</em>.
This is a bookkeeping function that ensures the system doesn't
remove an <em>object</em> before all instances that use this <em>object</em> have
finished.
Refer to <em>Reference Count</em>.</p>
</dd>
<dt class="hdlist1">Sampler </dt>
<dd>
<p>An <em>object</em> that describes how to sample an image when the image is read
in the <em>kernel</em>.
The image read functions take a <em>sampler</em> as an argument.
The <em>sampler</em> specifies the image addressing-mode i.e. how out-of-range
image coordinates are handled, the filter mode, and whether the input
image coordinate is a normalized or unnormalized value.</p>
</dd>
<dt class="hdlist1">Scope inclusion </dt>
<dd>
<p>Two actions <strong>A</strong> and <strong>B</strong> are defined to have an inclusive scope if they
have the same scope <strong>P</strong> such that: (1) if <strong>P</strong> is
<strong>memory_scope_sub_group</strong>, and <strong>A</strong> and <strong>B</strong> are executed by work-items
within the same sub-group, or (2) if <strong>P</strong> is <strong>memory_scope_work_group</strong>,
and <strong>A</strong> and <strong>B</strong> are executed by work-items within the same work-group,
or (3) if <strong>P</strong> is <strong>memory_scope_device</strong>, and <strong>A</strong> and <strong>B</strong> are executed by
work-items on the same device, or (4) if <strong>P</strong> is
<strong>memory_scope_all_svm_devices</strong>, and <strong>A</strong> and <strong>B</strong> are executed by host
threads or by work-items on one or more devices that can share SVM
memory with each other and the host process.</p>
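<div class="paragraph">
<p>The four cases can be read as a predicate over two actions. The sketch
below is a simplified model with hypothetical names: each action records its
scope and the device, work-group and sub-group of the work-item that executed
it. It assumes work-group IDs are unique per device, sub-group IDs are unique
per work-group, and all modeled devices can share SVM memory. Host threads are
not modeled.</p>
</div>

```c
#include <stdbool.h>

typedef enum {
    SCOPE_SUB_GROUP,
    SCOPE_WORK_GROUP,
    SCOPE_DEVICE,
    SCOPE_ALL_SVM_DEVICES
} toy_scope;

/* Where an action was executed (hypothetical model, not an OpenCL type). */
typedef struct {
    toy_scope scope;
    int device;
    int work_group;
    int sub_group;
} toy_action;

bool scopes_inclusive(toy_action a, toy_action b)
{
    if (a.scope != b.scope) {
        return false;                 /* both actions must use the same scope P */
    }
    switch (a.scope) {
    case SCOPE_SUB_GROUP:             /* case (1): same sub-group */
        return a.device == b.device && a.work_group == b.work_group &&
               a.sub_group == b.sub_group;
    case SCOPE_WORK_GROUP:            /* case (2): same work-group */
        return a.device == b.device && a.work_group == b.work_group;
    case SCOPE_DEVICE:                /* case (3): same device */
        return a.device == b.device;
    case SCOPE_ALL_SVM_DEVICES:       /* case (4): assumes devices share SVM */
        return true;
    }
    return false;
}
```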
</dd>
<dt class="hdlist1">Sequenced before </dt>
<dd>
<p>A relation between evaluations executed by a single unit of execution.
Sequenced-before is an asymmetric, transitive, pair-wise relation that
induces a partial order between evaluations.
Given any two evaluations A and B, if A is sequenced-before B, then the
execution of A shall precede the execution of B.</p>
</dd>
<dt class="hdlist1">Sequential consistency </dt>
<dd>
<p>Sequential consistency interleaves the steps executed by each unit of
execution.
Each access to a memory location sees the last assignment to that
location in that interleaving.</p>
</dd>
<dt class="hdlist1">Sequentially consistent semantics </dt>
<dd>
<p>One of the memory order semantics defined for synchronization
operations.
When using sequentially-consistent synchronization operations, the loads
and stores within one unit of execution appear to execute in program
order (i.e., the sequenced-before order), and loads and stores from
different units of execution appear to be simply interleaved.</p>
</dd>
<dt class="hdlist1">Shared Virtual Memory (SVM) </dt>
<dd>
<p>An address space exposed to both the host and the devices within a
context.
SVM causes addresses to be meaningful between the host and all of the
devices within a context and therefore supports the use of pointer based
data structures in OpenCL kernels.
It logically extends a portion of the global memory into the host
address space therefore giving work-items access to the host address
space.
There are three types of SVM in OpenCL:</p>
<div class="openblock">
<div class="content">
<div class="dlist">
<dl>
<dt class="hdlist1"><em>Coarse-Grained buffer SVM</em> </dt>
<dd>
<p>Sharing occurs at the granularity of regions of OpenCL buffer memory
objects.</p>
</dd>
<dt class="hdlist1"><em>Fine-Grained buffer SVM</em> </dt>
<dd>
<p>Sharing occurs at the granularity of individual loads/stores into bytes
within OpenCL buffer memory objects.</p>
</dd>
<dt class="hdlist1"><em>Fine-Grained system SVM</em> </dt>
<dd>
<p>Sharing occurs at the granularity of individual loads/stores into bytes
occurring anywhere within the host memory.</p>
</dd>
</dl>
</div>
</div>
</div>
</dd>
<dt class="hdlist1">SIMD </dt>
<dd>
<p>Single Instruction Multiple Data.
A programming model where a <em>kernel</em> is executed concurrently on
multiple <em>processing elements</em> each with its own data and a shared
program counter.
All <em>processing elements</em> execute a strictly identical set of
instructions.</p>
</dd>
<dt class="hdlist1">Specialization constants </dt>
<dd>
<p>Specialization is intended for constant objects that will not have known
constant values until after initial generation of a SPIR-V module.
Such objects are called specialization constants.
An application may provide values for the specialization constants that
will be used when the SPIR-V program is built.
Specialization constants that do not receive a value from an application
shall use the default value as defined in the SPIR-V specification.</p>
</dd>
<dt class="hdlist1">SPMD </dt>
<dd>
<p>Single Program Multiple Data.
A programming model where a <em>kernel</em> is executed concurrently on
multiple <em>processing elements</em> each with its own data and its own
program counter.
Hence, while all computational resources run the same <em>kernel</em> they
maintain their own instruction counter and due to branches in a
<em>kernel</em>, the actual sequence of instructions can be quite different
across the set of <em>processing elements</em>.</p>
</dd>
<dt class="hdlist1">Sub-device </dt>
<dd>
<p>An OpenCL <em>device</em> can be partitioned into multiple <em>sub-devices</em>.
The new <em>sub-devices</em> alias specific collections of compute units within
the parent <em>device</em>, according to a partition scheme.
The <em>sub-devices</em> may be used in any situation that their parent
<em>device</em> may be used.
Partitioning a <em>device</em> does not destroy the parent <em>device</em>, which may
continue to be used alongside and intermingled with its child
<em>sub-devices</em>.
Also see <em>Device</em>, <em>Parent device</em> and <em>Root device</em>.</p>
</dd>
<dt class="hdlist1">Sub-group </dt>
<dd>
<p>Sub-groups are an implementation-dependent grouping of work-items within
a work-group.
The size and number of sub-groups are implementation-defined.</p>
</dd>
<dt class="hdlist1">Sub-group Barrier </dt>
<dd>
<p>See <em>Barrier</em>.</p>
</dd>
<dt class="hdlist1">Submitted </dt>
<dd>
<p>The second state in the six-state model for the execution of a command.
The transition into this state occurs when the command is flushed from
the command-queue and submitted for execution on the device.
Once submitted, a programmer can assume a command will execute once its
prerequisites have been met.</p>
</dd>
<dt class="hdlist1">SVM Buffer </dt>
<dd>
<p>A memory allocation enabled to work with <em>Shared Virtual Memory (SVM)</em>.
Depending on how the SVM buffer is created, it can be a coarse-grained
or fine-grained SVM buffer.
Optionally it may be wrapped by a <em>Buffer Object</em>.
See <em>Shared Virtual Memory (SVM)</em>.</p>
</dd>
<dt class="hdlist1">Synchronization </dt>
<dd>
<p>Synchronization refers to mechanisms that constrain the order of
execution and the visibility of memory operations between two or more
units of execution.</p>
</dd>
<dt class="hdlist1">Synchronization operations </dt>
<dd>
<p>Operations that define memory order constraints in a program.
They play a special role in controlling how memory operations in one
unit of execution (such as a work-item or, when using SVM, a host thread)
are made visible to another.
Synchronization operations in OpenCL include <em>atomic operations</em> and
<em>fences</em>.</p>
</dd>
<dt class="hdlist1">Synchronization point </dt>
<dd>
<p>A synchronization point between a pair of commands (A and B) assures
that the results of command A happen-before command B is launched (i.e.
enters the ready state).</p>
</dd>
<dt class="hdlist1">Synchronizes with </dt>
<dd>
<p>A relation between operations in two different units of execution that
defines a memory order constraint in global memory
(<em>global-synchronizes-with</em>) or local memory
(<em>local-synchronizes-with</em>).</p>
</dd>
<dt class="hdlist1">Task Parallel Programming Model </dt>
<dd>
<p>A programming model in which computations are expressed in terms of
multiple concurrent tasks executing in one or more <em>command-queues</em>.
The concurrent tasks can be running different <em>kernels</em>.</p>
</dd>
<dt class="hdlist1">Thread-safe </dt>
<dd>
<p>An OpenCL API call is considered to be <em>thread-safe</em> if the internal
state as managed by OpenCL remains consistent when called simultaneously
by multiple <em>host</em> threads.
OpenCL API calls that are <em>thread-safe</em> allow an application to call
these functions in multiple <em>host</em> threads without having to implement
mutual exclusion across these <em>host</em> threads; i.e. they are also
re-entrant-safe.</p>
</dd>
<dt class="hdlist1">Undefined </dt>
<dd>
<p>The behavior of an OpenCL API call, built-in function used inside a
<em>kernel</em> or execution of a <em>kernel</em> that is explicitly not defined by
OpenCL.
A conforming implementation is not required to specify what occurs when
an undefined construct is encountered in OpenCL.</p>
</dd>
<dt class="hdlist1">Unit of execution </dt>
<dd>
<p>A generic term for a process, OS managed thread running on the host (a
host-thread), kernel-instance, host program, work-item or any other
executable agent that advances the work associated with a program.</p>
</dd>
<dt class="hdlist1">Work-group </dt>
<dd>
<p>A collection of related <em>work-items</em> that execute on a single <em>compute
unit</em>.
The <em>work-items</em> in the group execute the same <em>kernel-instance</em> and
share <em>local</em> <em>memory</em> and <em>work-group functions</em>.</p>
</dd>
<dt class="hdlist1">Work-group Barrier </dt>
<dd>
<p>See <em>Barrier</em>.</p>
</dd>
<dt class="hdlist1">Work-group Function </dt>
<dd>
<p>A function that carries out collective operations across all the
work-items in a work-group.
Available collective operations are a barrier, reduction, broadcast,
prefix sum, and evaluation of a predicate.
A work-group function must occur within a <em>converged control flow</em>; i.e.
all work-items in the work-group must encounter precisely the same
work-group function.</p>
</dd>
<dt class="hdlist1">Work-group Synchronization </dt>
<dd>
<p>Constraints on the order of execution for work-items in a single
work-group.</p>
</dd>
<dt class="hdlist1">Work-pool </dt>
<dd>
<p>A logical pool associated with a device that holds commands and
work-groups from kernel-instances that are ready to execute.
OpenCL does not constrain the order that commands and work-groups are
scheduled for execution from the work-pool; i.e. a programmer must
assume that they could be interleaved.
There is one work-pool per device used by all command-queues associated
with that device.
The work-pool may be implemented in any manner as long as it assures
that work-groups placed in the pool will eventually execute.</p>
</dd>
<dt class="hdlist1">Work-item </dt>
<dd>
<p>One of a collection of parallel executions of a <em>kernel</em> invoked on a
<em>device</em> by a <em>command</em>.
A <em>work-item</em> is executed by one or more <em>processing elements</em> as part
of a <em>work-group</em> executing on a <em>compute unit</em>.
A <em>work-item</em> is distinguished from other work-items by its <em>global ID</em>
or the combination of its <em>work-group</em> ID and its <em>local ID</em> within a
<em>work-group</em>.</p>
</dd>
</dl>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_the_opencl_architecture">3. The OpenCL Architecture</h2>
<div class="sectionbody">
<div class="paragraph">
<p><strong>OpenCL</strong> is an open industry standard for programming a heterogeneous
collection of CPUs, GPUs and other discrete computing devices organized into
a single platform.
It is more than a language.
OpenCL is a framework for parallel programming and includes a language, API,
libraries and a runtime system to support software development.
Using OpenCL, for example, a programmer can write general purpose programs
that execute on GPUs without the need to map their algorithms onto a 3D
graphics API such as OpenGL or DirectX.</p>
</div>
<div class="paragraph">
<p>The target of OpenCL is expert programmers wanting to write portable yet
efficient code.
This includes library writers, middleware vendors, and performance oriented
application programmers.
Therefore, OpenCL provides a low-level hardware abstraction plus a framework
to support programming, and many details of the underlying hardware are
exposed.</p>
</div>
<div class="paragraph">
<p>To describe the core ideas behind OpenCL, we will use a hierarchy of models:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Platform Model</p>
</li>
<li>
<p>Memory Model</p>
</li>
<li>
<p>Execution Model</p>
</li>
<li>
<p>Programming Model</p>
</li>
</ul>
</div>
<div class="sect2">
<h3 id="_platform_model">3.1. Platform Model</h3>
<div class="paragraph">
<p>The <a href="#platform-model-image">Platform model</a> for OpenCL is defined below.
The model consists of a <strong>host</strong> connected to one or more <strong>OpenCL devices</strong>.
An OpenCL device is divided into one or more <strong>compute units</strong> (CUs) which are
further divided into one or more <strong>processing elements</strong> (PEs).
Computations on a device occur within the processing elements.</p>
</div>
<div class="paragraph">
<p>An OpenCL application is implemented as both host code and device kernel
code.
The host code portion of an OpenCL application runs on a host processor
according to the models native to the host platform.
The OpenCL application host code submits the kernel code as commands from
the host to OpenCL devices.
An OpenCL device executes the command&#8217;s computation on the processing
elements within the device.</p>
</div>
<div class="paragraph">
<p>An OpenCL device has considerable latitude on how computations are mapped
onto the device&#8217;s processing elements.
When processing elements within a compute unit execute the same sequence of
statements across the processing elements, the control flow is said to be
<em>converged</em>.
Hardware optimized for executing a single stream of instructions over
multiple processing elements is well suited to converged control flows.
When the control flow varies from one processing element to another, it is
said to be <em>diverged</em>.
While a kernel always begins execution with a converged control flow, due to
branching statements within a kernel, converged and diverged control flows
may occur within a single kernel.
This provides a great deal of flexibility in the algorithms that can be
implemented with OpenCL.</p>
</div>
<div id="platform-model-image" class="imageblock" style="text-align: center">
<div class="content">
<img src="" alt="platform model">
</div>
<div class="title">Figure 1. Platform Model &#8230;&#8203; one host plus one or more compute devices each with one or more compute units composed of one or more processing elements.</div>
</div>
<div class="paragraph">
<p>Programmers provide programs in the form of SPIR-V source binaries, OpenCL C
or OpenCL C++ source strings or implementation-defined binary objects.
The OpenCL platform provides a compiler to translate program input of either
form into executable program objects.
The device code compiler may be <em>online</em> or <em>offline</em>.
An <em>online</em> <em>compiler</em> is available during host program execution using
standard APIs.
An <em>offline compiler</em> is invoked outside of host program control, using
platform-specific methods.
The OpenCL runtime allows developers to retrieve a previously compiled device
program executable and to load and execute it later.</p>
</div>
<div class="paragraph">
<p>OpenCL defines two kinds of platform profiles: a <em>full profile</em> and a
reduced-functionality <em>embedded profile</em>.
A full profile platform must provide an online compiler for all its devices.
An embedded platform may provide an online compiler, but is not required to
do so.</p>
</div>
<div class="paragraph">
<p>A device may expose special purpose functionality as a <em>built-in function</em>.
The platform provides APIs for enumerating and invoking the built-in
functions offered by a device, but otherwise does not define their
construction or semantics.
A <em>custom device</em> supports only built-in functions, and cannot be programmed
via a kernel language.</p>
</div>
<div class="paragraph">
<p>All device types support the OpenCL execution model, the OpenCL memory
model, and the APIs used in OpenCL to manage devices.</p>
</div>
<div class="paragraph">
<p>The platform model is an abstraction describing how OpenCL views the
hardware.
The relationship between the elements of the platform model and the hardware
in a system may be a fixed property of a device or it may be a dynamic
feature of a program dependent on how a compiler optimizes code to best
utilize physical hardware.</p>
</div>
</div>
<div class="sect2">
<h3 id="_execution_model">3.2. Execution Model</h3>
<div class="paragraph">
<p>The OpenCL execution model is defined in terms of two distinct units of
execution: <strong>kernels</strong> that execute on one or more OpenCL devices and a <strong>host
program</strong> that executes on the host.
With regard to OpenCL, the kernels are where the "work" associated with a
computation occurs.
This work occurs through <strong>work-items</strong> that execute in groups
(<strong>work-groups</strong>).</p>
</div>
<div class="paragraph">
<p>A kernel executes within a well-defined context managed by the host.
The context defines the environment within which kernels execute.
It includes the following resources:</p>
</div>
<div class="ulist">
<ul>
<li>
<p><strong>Devices</strong>: One or more devices exposed by the OpenCL platform.</p>
</li>
<li>
<p><strong>Kernel Objects</strong>: The OpenCL functions with their associated argument
values that run on OpenCL devices.</p>
</li>
<li>
<p><strong>Program Objects</strong>: The program source and executable that implement the
kernels.</p>
</li>
<li>
<p><strong>Memory Objects</strong>: Variables visible to the host and the OpenCL devices.
Instances of kernels operate on these objects as they execute.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>The host program uses the OpenCL API to create and manage the context.
Functions from the OpenCL API enable the host to interact with a device
through a <em>command-queue</em>.
Each command-queue is associated with a single device.
The commands placed into the command-queue fall into one of three types:</p>
</div>
<div class="ulist">
<ul>
<li>
<p><strong>Kernel-enqueue commands</strong>: Enqueue a kernel for execution on a device.</p>
</li>
<li>
<p><strong>Memory commands</strong>: Transfer data between the host and device memory,
between memory objects, or map and unmap memory objects from the host
address space.</p>
</li>
<li>
<p><strong>Synchronization commands</strong>: Explicit synchronization points that define
order constraints between commands.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>In addition to commands submitted from the host command-queue, a kernel
running on a device can enqueue commands to a device-side command queue.
This results in <em>child kernels</em> enqueued by a kernel executing on a device
(the <em>parent kernel</em>).
Regardless of whether the command-queue resides on the host or a device,
each command passes through six states.</p>
</div>
<div class="olist arabic">
<ol class="arabic">
<li>
<p><strong>Queued</strong>: The command is enqueued to a command-queue.
A command may reside in the queue until it is flushed either explicitly
(a call to <strong>clFlush</strong>) or implicitly by some other command.</p>
</li>
<li>
<p><strong>Submitted</strong>: The command is flushed from the command-queue and submitted
for execution on the device.
Once flushed from the command-queue, a command will execute after any
prerequisites for execution are met.</p>
</li>
<li>
<p><strong>Ready</strong>: All prerequisites constraining execution of a command have been
met.
The command, or for a kernel-enqueue command the collection of work
groups associated with a command, is placed in a device work-pool from
which it is scheduled for execution.</p>
</li>
<li>
<p><strong>Running</strong>: Execution of the command starts.
For the case of a kernel-enqueue command, one or more work-groups
associated with the command start to execute.</p>
</li>
<li>
<p><strong>Ended</strong>: Execution of a command ends.
When a kernel-enqueue command ends, all of the work-groups associated
with that command have finished their execution.
<em>Immediate side effects</em>, i.e. those associated with the kernel but not
necessarily with its child kernels, are visible to other units of
execution.
These side effects include updates to values in global memory.</p>
</li>
<li>
<p><strong>Complete</strong>: The command and its child commands have finished execution
and the status of the event object, if any, associated with the command
is set to CL_COMPLETE.</p>
</li>
</ol>
</div>
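<div class="paragraph">
<p>As an illustration only, the six states listed above can be modeled as an ordered enumeration.
The sketch below is plain Python rather than OpenCL API code, and the names CommandState, next_state and lifecycle are hypothetical; none of them is part of the OpenCL API.
It models only the normal step-by-step progression of a command.</p>
</div>
<div class="listingblock">
<div class="content">
<pre>
```python
from enum import IntEnum


class CommandState(IntEnum):
    """Hypothetical names for the six states, numbered in the order listed."""
    QUEUED = 1
    SUBMITTED = 2
    READY = 3
    RUNNING = 4
    ENDED = 5
    COMPLETE = 6


def next_state(state):
    """Return the state that follows `state` in the normal sequence."""
    if state is CommandState.COMPLETE:
        raise ValueError("COMPLETE is the final state")
    return CommandState(state + 1)


def lifecycle():
    """List every state a command passes through, from QUEUED to COMPLETE."""
    states = [CommandState.QUEUED]
    while states[-1] is not CommandState.COMPLETE:
        states.append(next_state(states[-1]))
    return states
```
</pre>
</div>
</div>
<div class="paragraph">
<p>In this sketch, calling lifecycle() yields the states in the order Queued, Submitted, Ready, Running, Ended, Complete.</p>
</div>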
<div class="paragraph">
<p>The <a href="#profiled-states-image">execution states and the transitions between
them</a> are summarized below.
These states and the concept of a device work-pool are conceptual elements
of the execution model.
An implementation of OpenCL has considerable freedom in how these are
exposed to a program.
Five of the transitions, however, are directly observable through a
profiling interface.
These <a href="#profiled-states-image">profiled states</a> are shown below.</p>
</div>
<div id="profiled-states-image" class="imageblock" style="text-align: center">
<div class="content">
<img src="" alt="profiled states">
</div>
<div class="title">Figure 2. The states and transitions between states defined in the OpenCL execution model. A subset of these transitions is exposed through the <a href="#profiling-operations">profiling interface</a>.</div>
</div>
<div class="paragraph">
<p>Commands communicate their status through <em>Event objects</em>.
Successful completion is indicated by setting the event status associated
with a command to CL_COMPLETE.
Unsuccessful completion results in abnormal termination of the command which
is indicated by setting the event status to a negative value.
In this case, the command-queue associated with the abnormally terminated
command and all other command-queues in the same context may no longer be
available and their behavior is implementation defined.</p>
</div>
<div class="paragraph">
<p>A command submitted to a device will not launch until prerequisites that
constrain the order of commands have been resolved.
These prerequisites have three sources:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>They may arise from commands submitted to a command-queue that constrain
the order in which commands are launched.
For example, commands that follow a command queue barrier will not
launch until all commands prior to the barrier are complete.</p>
</li>
<li>
<p>The second source of prerequisites is dependencies between commands
expressed through events.
A command may include an optional list of events.
The command will wait and not launch until all the events in the list
are in the state CL_COMPLETE.
By this mechanism, event objects define order constraints between
commands and coordinate execution between the host and one or more
devices.</p>
</li>
<li>
<p>The third source of prerequisites can be the presence of non-trivial C
initializers or C++ constructors for program scope global variables.
In this case, the OpenCL C/C++ compiler shall generate program
initialization kernels that perform C initialization or C++
construction.
These kernels must be executed by OpenCL runtime on a device before any
kernel from the same program can be executed on the same device.
The ND-range for any program initialization kernel is (1,1,1).
When multiple programs are linked together, the order of execution of
program initialization kernels that belong to different programs is
undefined.</p>
</li>
</ul>
</div>
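<div class="paragraph">
<p>The event-based prerequisite described above can be pictured as a simple predicate over the statuses of the events in a command&#8217;s wait list.
The following is a minimal Python sketch, not OpenCL API code: the numeric values mirror the execution-status constants in the OpenCL headers, while the function name can_launch and the representation of a wait list as a list of integers are assumptions made for illustration.</p>
</div>
<div class="listingblock">
<div class="content">
<pre>
```python
# Execution-status values as defined in the OpenCL headers.
CL_COMPLETE = 0
CL_RUNNING = 1
CL_SUBMITTED = 2
CL_QUEUED = 3


def can_launch(wait_list_statuses):
    """True when every event in the (possibly empty) wait list is CL_COMPLETE.

    A negative status marks abnormal termination; this sketch simply
    treats such an event as not complete.
    """
    return all(status == CL_COMPLETE for status in wait_list_statuses)
```
</pre>
</div>
</div>
<div class="paragraph">
<p>An empty wait list imposes no event prerequisite, so can_launch([]) is true, whereas a list containing CL_RUNNING keeps the command from launching.</p>
</div>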
<div class="paragraph">
<p>Program clean up may result in the execution of one or more program clean up
kernels by the OpenCL runtime.
This is due to the presence of non-trivial C++ destructors for
program scope variables.
The ND-range for executing any program clean up kernel is (1,1,1).
The order of execution of clean up kernels from different programs (that are
linked together) is undefined.</p>
</div>
<div class="paragraph">
<p>Note that C initializers, C++ constructors, or C++ destructors for program
scope variables cannot use pointers to coarse-grained and fine-grained SVM
allocations.</p>
</div>
<div class="paragraph">
<p>A command may be submitted to a device and yet have no visible side effects
outside of waiting on and satisfying event dependencies.
Examples include markers, kernels executed over ranges of no work-items or
copy operations with zero sizes.
Such commands may pass directly from the <em>ready</em> state to the <em>ended</em> state.</p>
</div>
<div class="paragraph">
<p>Command execution can be blocking or non-blocking.
Consider a sequence of OpenCL commands.
For blocking commands, the OpenCL API functions that enqueue commands don&#8217;t
return until the command has completed.
Alternatively, OpenCL functions that enqueue non-blocking commands return
immediately and require the programmer to define dependencies between
enqueued commands to ensure that enqueued commands are not launched before
needed resources are available.
In both cases, the actual execution of the command may occur asynchronously
with execution of the host program.</p>
</div>
<div class="paragraph">
<p>Commands within a single command-queue execute relative to each other in one
of two modes:</p>
</div>
<div class="ulist">
<ul>
<li>
<p><strong>In-order Execution</strong>: Commands and any side effects associated with
commands appear to the OpenCL application as if they execute in the same
order they are enqueued to a command-queue.</p>
</li>
<li>
<p><strong>Out-of-order Execution</strong>: Commands execute in any order constrained only
by explicit synchronization points (e.g. through command queue barriers)
or explicit dependencies on events.</p>
</li>
</ul>
</div>
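<div class="paragraph">
<p>The difference between the two modes can be sketched with a toy model in Python.
This is not how an implementation schedules commands; it only shows that an in-order queue preserves enqueue order, while an out-of-order queue may pick any order that respects event dependencies.
The function names and the representation of a command as a (name, wait list) pair are hypothetical.</p>
</div>
<div class="listingblock">
<div class="content">
<pre>
```python
def in_order_schedule(commands):
    """In-order queue: commands appear to run in the order they were enqueued."""
    return [name for name, _waits in commands]


def out_of_order_schedule(commands):
    """Out-of-order queue: return one valid order, constrained only by events.

    `commands` is a list of (name, wait_list) pairs in enqueue order, where
    wait_list names the commands whose events must be complete first.
    Later-enqueued commands are tried first to show that enqueue order
    alone imposes no constraint.
    """
    pending = list(reversed(commands))
    complete = []
    while pending:
        for entry in pending:
            name, waits = entry
            if all(w in complete for w in waits):
                complete.append(name)
                pending.remove(entry)
                break
        else:
            raise ValueError("wait lists contain a cycle or an unknown command")
    return complete
```
</pre>
</div>
</div>
<div class="paragraph">
<p>For commands A, B (waiting on A) and C enqueued in that order, the in-order model returns A, B, C, while this out-of-order model returns C, A, B: C may run first, but B still follows A.</p>
</div>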
<div class="paragraph">
<p>Multiple command-queues can be present within a single context.
Multiple command-queues execute commands independently.
Event objects visible to the host program can be used to define
synchronization points between commands in multiple command queues.
If such synchronization points are established between commands in multiple
command-queues, an implementation must assure that the command-queues
progress concurrently and correctly account for the dependencies established
by the synchronization points.
For a detailed explanation of synchronization points, see
<a href="#execution-model-sync">Execution Model: Synchronization</a>.</p>
</div>
<div class="paragraph">
<p>The core of the OpenCL execution model is defined by how the kernels
execute.
When a kernel-enqueue command submits a kernel for execution, an index space
is defined.
The kernel, the argument values associated with the arguments to the kernel,
and the parameters that define the index space define a <em>kernel-instance</em>.
When a kernel-instance executes on a device, the kernel function executes
for each point in the defined index space.
Each of these executing kernel functions is called a <em>work-item</em>.
The work-items associated with a given kernel-instance are managed by the
device in groups called <em>work-groups</em>.
These work-groups define a coarse-grained decomposition of the index space.
Work-groups are further divided into <em>sub-groups</em>, which provide an
additional level of control over execution.</p>
</div>
<div class="paragraph">
<p>Work-items have a global ID based on their coordinates within the index
space.
They can also be defined in terms of their work-group and the local ID
within a work-group.
The details of this mapping are described in the following section.</p>
</div>
<div class="sect3">
<h4 id="_execution_model_mapping_work_items_onto_an_ndrange">3.2.1. Execution Model: Mapping work-items onto an NDRange</h4>
<div class="paragraph">
<p>The index space supported by OpenCL is called an NDRange.
An NDRange is an N-dimensional index space, where N is one, two or three.
The NDRange is decomposed into work-groups forming blocks that cover the
index space.
An NDRange is defined by three integer arrays of length N:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>The extent of the index space (or global size) in each dimension.</p>
</li>
<li>
<p>An offset index F indicating the initial value of the indices in each
dimension (zero by default).</p>
</li>
<li>
<p>The size of a work-group (local size) in each dimension.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>Each work-item&#8217;s global ID is an N-dimensional tuple.
The global ID components are values in the range from F to F plus the
number of elements in that dimension minus one.</p>
</div>
<div class="paragraph">
<p>If a kernel is created from OpenCL 2.0 or SPIR-V, the size of work-groups in
an NDRange (the local size) need not be the same for all work-groups.
In this case, any single dimension for which the global size is not
divisible by the local size will be partitioned into two regions.
One region will have work-groups that have the same number of work-items as
was specified for that dimension by the programmer (the local size).
The other region will have work-groups with fewer than the number of
work-items specified by the local size parameter in that dimension (the
<em>remainder work-groups</em>).
Work-group sizes could be non-uniform in multiple dimensions, potentially
producing work-groups of up to 4 different sizes in a 2D range and 8
different sizes in a 3D range.</p>
</div>
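<div class="paragraph">
<p>The count of distinct work-group sizes can be checked with a short calculation.
The following Python sketch is illustrative only; the function names work_group_sizes and distinct_shapes are hypothetical, and the sketch assumes that the remainder work-group comes last in each dimension.</p>
</div>
<div class="listingblock">
<div class="content">
<pre>
```python
def work_group_sizes(global_size, local_size):
    """Work-group sizes along one dimension.

    Full work-groups come first; a smaller remainder work-group follows
    when local_size does not divide global_size.
    """
    full, remainder = divmod(global_size, local_size)
    sizes = [local_size] * full
    if remainder:
        sizes.append(remainder)
    return sizes


def distinct_shapes(global_sizes, local_sizes):
    """Set of distinct N-dimensional work-group shapes in the NDRange."""
    shapes = {()}
    for g, s in zip(global_sizes, local_sizes):
        per_dim = set(work_group_sizes(g, s))
        shapes = {shape + (size,) for shape in shapes for size in per_dim}
    return shapes
```
</pre>
</div>
</div>
<div class="paragraph">
<p>With a global size of 10 and a local size of 4 in each dimension, every dimension splits into work-groups of sizes 4, 4 and 2, which gives 4 distinct shapes in a 2D range and 8 in a 3D range.</p>
</div>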
<div class="paragraph">
<p>Each work-item is assigned to a work-group and given a local ID to represent
its position within the work-group.
A work-item&#8217;s local ID is an N-dimensional tuple with components in the
range from zero to the size of the work-group in that dimension minus one.</p>
</div>
<div class="paragraph">
<p>Work-groups are assigned IDs similarly.
The number of work-groups in each dimension is not directly defined but is
inferred from the local and global NDRanges provided when a kernel-instance
is enqueued.
A work-group&#8217;s ID is an N-dimensional tuple with components in the range 0
to the ceiling of the global size in that dimension divided by the local
size in the same dimension, minus one.
As a result, the combination of a work-group ID and the local ID within a
work-group uniquely defines a work-item.
Each work-item is identifiable in two ways: in terms of a global index, and
in terms of a work-group index plus a local index within a work-group.</p>
</div>
<div class="paragraph">
<p>For example, consider the <a href="#index-space-image">2-dimensional index space</a>
shown below.
We input the index space for the work-items (G<sub>x</sub>, G<sub>y</sub>), the size of each
work-group (S<sub>x</sub>, S<sub>y</sub>) and the global ID offset (F<sub>x</sub>, F<sub>y</sub>).
The global indices define a G<sub>x</sub> by G<sub>y</sub> index space where the total number
of work-items is the product of G<sub>x</sub> and G<sub>y</sub>.
The local indices define an S<sub>x</sub> by S<sub>y</sub> index space where the number of
work-items in a single work-group is the product of S<sub>x</sub> and S<sub>y</sub>.
Given the size of each work-group and the total number of work-items we can
compute the number of work-groups.
A 2-dimensional index space is used to uniquely identify a work-group.
Each work-item is identified by its global ID (<em>g</em><sub>x</sub>, <em>g</em><sub>y</sub>) or by the
combination of the work-group ID (<em>w</em><sub>x</sub>, <em>w</em><sub>y</sub>), the size of each
work-group (S<sub>x</sub>,S<sub>y</sub>) and the local ID (s<sub>x</sub>, s<sub>y</sub>) inside the work-group
such that</p>
</div>
<div class="dlist">
<dl>
<dt class="hdlist1"></dt>
<dd>
<p>(g<sub>x</sub> , g<sub>y</sub>) = (w<sub>x</sub> S<sub>x</sub> + s<sub>x</sub> + F<sub>x</sub>, w<sub>y</sub> S<sub>y</sub> + s<sub>y</sub> + F<sub>y</sub>)</p>
</dd>
</dl>
</div>
<div class="paragraph">
<p>The number of work-groups can be computed as:</p>
</div>
<div class="dlist">
<dl>
<dt class="hdlist1"></dt>
<dd>
<p>(W<sub>x</sub>, W<sub>y</sub>) = (ceil(G<sub>x</sub> / S<sub>x</sub>), ceil(G<sub>y</sub> / S<sub>y</sub>))</p>
</dd>
</dl>
</div>
<div class="paragraph">
<p>Given a global ID, a local ID, the offset and the work-group size, the
work-group ID for a work-item is computed as:</p>
</div>
<div class="dlist">
<dl>
<dt class="hdlist1"></dt>
<dd>
<p>(w<sub>x</sub>, w<sub>y</sub>) = ( (g<sub>x</sub> - s<sub>x</sub> - F<sub>x</sub>) / S<sub>x</sub>, (g<sub>y</sub> - s<sub>y</sub> - F<sub>y</sub>) / S<sub>y</sub> )</p>
</dd>
</dl>
</div>
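<div class="paragraph">
<p>The relations between global IDs, work-group IDs and local IDs can be checked numerically.
The following Python sketch applies them per dimension to tuples of any length; the function names are hypothetical, and the variable names follow the symbols used in the text (G, S, F, g, w, s).
It assumes uniform work-group sizes when recovering the work-group ID.</p>
</div>
<div class="listingblock">
<div class="content">
<pre>
```python
def global_id(group_id, local_id, local_size, offset):
    """g = w * S + s + F, applied per dimension."""
    return tuple(w * S + s + F
                 for w, s, S, F in zip(group_id, local_id, local_size, offset))


def num_work_groups(global_size, local_size):
    """W = ceil(G / S), applied per dimension using integer arithmetic."""
    return tuple((G + S - 1) // S for G, S in zip(global_size, local_size))


def work_group_id(gid, local_id, local_size, offset):
    """w = (g - s - F) / S, applied per dimension (the division is exact)."""
    return tuple((g - s - F) // S
                 for g, s, S, F in zip(gid, local_id, local_size, offset))
```
</pre>
</div>
</div>
<div class="paragraph">
<p>For a 16 by 16 index space with work-groups of size 4 by 8 and a zero offset, there are 4 by 2 work-groups, and the work-item with work-group ID (2, 1) and local ID (3, 0) has global ID (11, 8).
Applying work_group_id to that global ID recovers (2, 1).</p>
</div>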
<div id="index-space-image" class="imageblock" style="text-align: center">
<div class="content">
<img src="" alt="index space">
</div>
<div class="title">Figure 3. An example of an NDRange index space showing work-items, their global IDs and their mapping onto the pair of work-group and local IDs. In this case, we assume that in each dimension, the size of the work-group evenly divides the global NDRange size (i.e. all work-groups have the same size) and that the offset is equal to zero.</div>
</div>
<div class="paragraph">
<p>Within a work-group, work-items may be divided into sub-groups.
The mapping of work-items to sub-groups is implementation-defined and may be
queried at runtime.
While sub-groups may be used in multi-dimensional work-groups, each
sub-group is 1-dimensional and any given work-item may query which sub-group
it is a member of.</p>
</div>
<div class="paragraph">
<p>Work-items are mapped into sub-groups through a combination of compile-time
decisions and the parameters of the dispatch.
The mapping to sub-groups is invariant for the duration of a kernel&#8217;s
execution, across dispatches of a given kernel with the same work-group
dimensions, between dispatches and query operations consistent with the
dispatch parameterization, and from one work-group to another within the
dispatch (excluding the trailing edge work-groups in the presence of
non-uniform work-group sizes).
In addition, all sub-groups within a work-group will be the same size, apart
from the sub-group with the maximum index which may be smaller if the size
of the work-group is not evenly divisible by the size of the sub-groups.</p>
</div>
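<div class="paragraph">
<p>The size rule for sub-groups can be illustrated with a small calculation.
The sub-group size itself is chosen by the implementation, so the value passed in below is purely an assumption, as is the function name sub_group_layout; the Python sketch only shows the arithmetic of the rule that every sub-group but the last has the same size.</p>
</div>
<div class="listingblock">
<div class="content">
<pre>
```python
def sub_group_layout(work_group_size, sub_group_size):
    """Return (number of sub-groups, size of the sub-group with the maximum index)."""
    count = -(-work_group_size // sub_group_size)  # ceiling division
    last = work_group_size - (count - 1) * sub_group_size
    return count, last
```
</pre>
</div>
</div>
<div class="paragraph">
<p>A work-group of 64 work-items with an assumed sub-group size of 16 gives 4 sub-groups of 16, while a work-group of 70 gives 5 sub-groups, the last of which holds 6 work-items.</p>
</div>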
<div class="paragraph">
<p>In the degenerate case, a single sub-group must be supported for each
work-group.
In this situation all sub-group scope functions are equivalent to their
work-group level equivalents.</p>
</div>
</div>
<div class="sect3">
<h4 id="_execution_model_execution_of_kernel_instances">3.2.2. Execution Model: Execution of kernel-instances</h4>
<div class="paragraph">
<p>The work carried out by an OpenCL program occurs through the execution of
kernel-instances on compute devices.
To understand the details of OpenCL&#8217;s execution model, we need to consider
how a kernel object moves from the kernel-enqueue command, into a
command-queue, executes on a device, and completes.</p>
</div>
<div class="paragraph">
<p>A kernel object is defined from a function within the program object and a
collection of arguments connecting the kernel to a set of argument values.
The host program enqueues a kernel object to the command-queue along with
the NDRange and the work-group decomposition.
These define a <em>kernel-instance</em>.
In addition, an optional set of events may be defined when the kernel is
enqueued.
The events associated with a particular kernel-instance are used to
constrain when the kernel-instance is launched with respect to other
commands in the queue or to commands in other queues within the same
context.</p>
</div>
<div class="paragraph">
<p>A kernel-instance is submitted to a device.
For an in-order command-queue, the kernel-instances appear to launch and
then execute in the order in which they were enqueued.
We use the term "appear" to emphasize that when there are no dependencies
between commands, and hence differences in the order that commands execute
cannot be observed in a program, an implementation can reorder commands
even in an in-order command-queue.
For an out-of-order command-queue, kernel-instances wait to be launched
until:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Synchronization commands enqueued prior to the kernel-instance are
satisfied.</p>
</li>
<li>
<p>Each of the events in an optional event list defined when the
kernel-instance was enqueued is set to CL_COMPLETE.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>Once these conditions are met, the kernel-instance is launched and the
work-groups associated with the kernel-instance are placed into a pool of
ready-to-execute work-groups.
This pool is called a <em>work-pool</em>.
The work-pool may be implemented in any manner as long as it assures that
work-groups placed in the pool will eventually execute.
The device schedules work-groups from the work-pool for execution on the
compute units of the device.
The kernel-enqueue command is complete when all work-groups associated with
the kernel-instance end their execution, updates to global memory associated
with a command are visible globally, and the device signals successful
completion by setting the event associated with the kernel-enqueue command
to CL_COMPLETE.</p>
</div>
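<div class="paragraph">
<p>The launch conditions above can be illustrated from the host side.
The following is a minimal sketch, not a definitive implementation: the
function name <code>enqueue_dependent</code> and its parameters are hypothetical,
both command-queues are assumed to belong to the same context, and the
kernel arguments are assumed to be set already.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-c" data-lang="c">#include &lt;CL/cl.h&gt;

/* Hypothetical helper: launch `consumer` only after `producer` completes. */
cl_int enqueue_dependent(cl_command_queue q_a, cl_command_queue q_b,
                         cl_kernel producer, cl_kernel consumer,
                         size_t global_size, size_t local_size)
{
    cl_event produced = NULL;
    cl_event consumed = NULL;
    cl_int err;

    /* The kernel-object, the NDRange and the work-group decomposition
     * given here together define the kernel-instance. */
    err = clEnqueueNDRangeKernel(q_a, producer, 1, NULL,
                                 &amp;global_size, &amp;local_size,
                                 0, NULL, &amp;produced);
    if (err != CL_SUCCESS) return err;

    /* An event waited on from another queue requires the producing
     * queue to be flushed. */
    err = clFlush(q_a);

    /* The event wait list constrains when the second kernel-instance
     * may be launched. */
    if (err == CL_SUCCESS) {
        err = clEnqueueNDRangeKernel(q_b, consumer, 1, NULL,
                                     &amp;global_size, &amp;local_size,
                                     1, &amp;produced, &amp;consumed);
    }
    if (err == CL_SUCCESS) {
        /* Returns once the consumer event reaches CL_COMPLETE
         * (or an error state). */
        err = clWaitForEvents(1, &amp;consumed);
        clReleaseEvent(consumed);
    }
    clReleaseEvent(produced);
    return err;
}</code></pre>
</div>
</div>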
<div class="paragraph">
<p>While a command-queue is associated with only one device, a single device
may be associated with multiple command-queues all feeding into the single
work-pool.
A device may also be associated with command queues associated with
different contexts within the same platform, again all feeding into the
single work-pool.
The device will pull work-groups from the work-pool and execute them on one
or several compute units in any order; possibly interleaving execution of
work-groups from multiple commands.
A conforming implementation may choose to serialize the work-groups so a
correct algorithm cannot assume that work-groups will execute in parallel.
There is no safe and portable way to synchronize across the independent
execution of work-groups since once in the work-pool, they can execute in
any order.</p>
</div>
<div class="paragraph">
<p>The work-items within a single sub-group execute concurrently but not
necessarily in parallel (i.e. they are not guaranteed to make independent
forward progress).
Therefore, only high-level synchronization constructs (e.g. sub-group
functions such as barriers) that apply to all the work-items in a sub-group
are well defined and included in OpenCL.</p>
</div>
<div class="paragraph">
<p>Sub-groups execute concurrently within a given work-group and, with
appropriate device support (see <a href="#platform-querying-devices">Querying
Devices</a>), may make independent forward progress with respect to each
other, with respect to host threads and with respect to any entities
external to the OpenCL system but running on an OpenCL device, even in the
absence of work-group barrier operations.
In this situation, sub-groups are able to internally synchronize using
barrier operations without synchronizing with each other and may perform
operations that rely on runtime dependencies on operations other sub-groups
perform.</p>
</div>
<div class="paragraph">
<p>The work-items within a single work-group execute concurrently but are only
guaranteed to make independent progress in the presence of sub-groups and
device support.
In the absence of this capability, only high-level synchronization
constructs (e.g. work-group functions such as barriers) that apply to all
the work-items in a work-group are well defined and included in OpenCL for
synchronization within the work-group.</p>
</div>
<div class="paragraph">
<p>In the absence of synchronization functions (e.g. a barrier), work-items
within a sub-group may be serialized.
In the presence of sub-group functions, work-items within a sub-group may
be serialized before any given sub-group function, between dynamically
encountered pairs of sub-group functions and between a work-group function
and the end of the kernel.</p>
</div>
<div class="paragraph">
<p>In the absence of independent forward progress of constituent sub-groups,
work-items within a work-group may be serialized before, after or between
work-group synchronization functions.</p>
</div>
</div>
<div class="sect3">
<h4 id="device-side-enqueue">3.2.3. Execution Model: Device-side enqueue</h4>
<div class="paragraph">
<p>Algorithms may need to generate additional work as they execute.
In many cases, this additional work cannot be determined statically, so the
work associated with a kernel only emerges at runtime as the kernel-instance
executes.
This capability could be implemented in logic running within the host
program, but involvement of the host may add significant overhead and/or
complexity to the application control flow.
A more efficient approach would be to nest kernel-enqueue commands from
inside other kernels.
This <strong>nested parallelism</strong> can be realized by supporting the enqueuing of
kernels on a device without direct involvement by the host program;
so-called <strong>device-side enqueue</strong>.</p>
</div>
<div class="paragraph">
<p>Device-side kernel-enqueue commands are similar to host-side kernel-enqueue
commands.
The kernel executing on a device (the <strong>parent kernel</strong>) enqueues a
kernel-instance (the <strong>child kernel</strong>) to a device-side command queue.
This is an out-of-order command-queue and follows the same behavior as the
out-of-order command-queues exposed to the host program.
Commands enqueued to a device-side command-queue generate and use events to
enforce order constraints just as for the command-queue on the host.
These events, however, are only visible to the parent kernel running on the
device.
When these prerequisite events take on the value CL_COMPLETE, the
work-groups associated with the child kernel are launched into the device&#8217;s
work-pool.
The device then schedules them for execution on the compute units of the
device.
Child and parent kernels execute asynchronously.
However, a parent will not indicate that it is complete by setting its event
to CL_COMPLETE until all child kernels have ended execution and have
signaled completion by setting any associated events to the value
CL_COMPLETE.
Should any child kernel complete with an event status set to a negative
value (i.e. abnormally terminate), the parent kernel will abnormally
terminate and propagate the child&#8217;s negative event value as the value of the
parent&#8217;s event.
If there are multiple children that have an event status set to a negative
value, the selection of which child&#8217;s negative event value is propagated is
implementation-defined.</p>
</div>
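<div class="paragraph">
<p>As an illustration, a parent kernel written in OpenCL C 2.0 might enqueue
a child kernel as a block.
This is a sketch under stated assumptions: the kernel name <code>parent</code> and
its arguments are hypothetical, and the host is assumed to have created a
default device-side command-queue (CL_QUEUE_ON_DEVICE together with
CL_QUEUE_ON_DEVICE_DEFAULT) beforehand.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-c" data-lang="c">/* OpenCL C 2.0 kernel source (hypothetical example). */
kernel void parent(global int *data, int n)
{
    /* Let a single work-item enqueue the child work. */
    if (get_global_id(0) == 0) {
        queue_t q = get_default_queue();
        ndrange_t range = ndrange_1D((size_t)n);

        /* The block is the child kernel. With WAIT_KERNEL the child is
         * not launched until the parent reaches the end state. */
        int status = enqueue_kernel(q, CLK_ENQUEUE_FLAGS_WAIT_KERNEL, range,
                                    ^{ data[get_global_id(0)] *= 2; });

        if (status != CLK_SUCCESS) {
            data[0] = -1;  /* hypothetical error marker for this sketch */
        }
    }
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>The event associated with the parent on the host is not set to CL_COMPLETE
until the child has also completed, so the host needs no separate wait for
the child.</p>
</div>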
</div>
<div class="sect3">
<h4 id="execution-model-sync">3.2.4. Execution Model: Synchronization</h4>
<div class="paragraph">
<p>Synchronization refers to mechanisms that constrain the order of execution
between two or more units of execution.
Consider the following three domains of synchronization in OpenCL:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Work-group synchronization: Constraints on the order of execution for
work-items in a single work-group</p>
</li>
<li>
<p>Sub-group synchronization: Constraints on the order of execution for
work-items in a single sub-group</p>
</li>
<li>
<p>Command synchronization: Constraints on the order of commands launched
for execution</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>Synchronization across all work-items within a single work-group is carried
out using a <em>work-group function</em>.
These functions carry out collective operations across all the work-items in
a work-group.
Available collective operations are: barrier, reduction, broadcast, prefix
sum, and evaluation of a predicate.
A work-group function must occur within a converged control flow; i.e. all
work-items in the work-group must encounter precisely the same work-group
function.
For example, if a work-group function occurs within a loop, the work-items
must encounter the same work-group function in the same loop iterations.
All the work-items of a work-group must execute the work-group function and
complete reads and writes to memory before any are allowed to continue
execution beyond the work-group function.
Work-group functions that apply between work-groups are not provided in
OpenCL since OpenCL does not define forward-progress or ordering relations
between work-groups, hence collective synchronization operations are not
well defined.</p>
</div>
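<div class="paragraph">
<p>The converged control flow requirement can be seen in a small OpenCL C 2.0
reduction.
This is only a sketch: the kernel name <code>partial_sum</code> is hypothetical, the
work-group size is assumed to be a power of two, and <code>scratch</code> is assumed
to hold one element per work-item of the work-group.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-c" data-lang="c">/* OpenCL C 2.0 kernel source (hypothetical example). */
kernel void partial_sum(global const float *in,
                        global float *partial,
                        local float *scratch)
{
    size_t lid = get_local_id(0);
    size_t lsz = get_local_size(0);

    scratch[lid] = in[get_global_id(0)];
    /* Every work-item of the work-group reaches this barrier. */
    work_group_barrier(CLK_LOCAL_MEM_FENCE);

    /* The loop bounds are uniform across the work-group, so all
     * work-items encounter the barrier in the same iterations. */
    for (size_t stride = lsz / 2; stride &gt; 0; stride /= 2) {
        if (lid &lt; stride) {
            scratch[lid] += scratch[lid + stride];
        }
        /* The barrier sits outside the divergent branch. */
        work_group_barrier(CLK_LOCAL_MEM_FENCE);
    }

    if (lid == 0) {
        partial[get_group_id(0)] = scratch[0];
    }
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>Placing the barrier inside the <code>if (lid &lt; stride)</code> branch would violate the
requirement, because only some work-items would encounter it.</p>
</div>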
<div class="paragraph">
<p>Synchronization across all work-items within a single sub-group is carried
out using a <em>sub-group function</em>.
These functions carry out collective operations across all the work-items in
a sub-group.
Available collective operations are: barrier, reduction, broadcast, prefix
sum, and evaluation of a predicate.
A sub-group function must occur within a converged control flow; i.e. all
work-items in the sub-group must encounter precisely the same sub-group
function.
For example, if a sub-group function occurs within a loop, the work-items
must encounter the same sub-group function in the same loop iterations.
All the work-items of a sub-group must execute the sub-group function and
complete reads and writes to memory before any are allowed to continue
execution beyond the sub-group function.
Synchronization between sub-groups must either be performed using work-group
functions, or through memory operations.
Memory operations should be used carefully for sub-group synchronization,
as forward progress of sub-groups relative to each other is only
supported optionally by OpenCL implementations.</p>
</div>
<div class="paragraph">
<p>Command synchronization is defined in terms of distinct <strong>synchronization
points</strong>.
The synchronization points occur between commands in host command-queues and
between commands in device-side command-queues.
The synchronization points defined in OpenCL include:</p>
</div>
<div class="ulist">
<ul>
<li>
<p><strong>Launching a command:</strong> A kernel-instance is launched onto a device after
all events that kernel is waiting-on have been set to CL_COMPLETE.</p>
</li>
<li>
<p><strong>Ending a command:</strong> Child kernels may be enqueued such that they wait
for the parent kernel to reach the <em>end</em> state before they can be
launched.
In this case, the ending of the parent command defines a synchronization
point.</p>
</li>
<li>
<p><strong>Completion of a command:</strong> A kernel-instance is complete after all of
the work-groups in the kernel and all of its child kernels have
completed.
This is signaled to the host, a parent kernel or other kernels within
command queues by setting the value of the event associated with a
kernel to CL_COMPLETE.</p>
</li>
<li>
<p><strong>Blocking Commands:</strong> A blocking command defines a synchronization point
between the unit of execution that calls the blocking API function and
the enqueued command reaching the complete state.</p>
</li>
<li>
<p><strong>Command-queue barrier:</strong> The command-queue barrier ensures that all
previously enqueued commands have completed before subsequently enqueued
commands can be launched.</p>
</li>
<li>
<p><strong>clFinish:</strong> This function blocks until all previously enqueued commands
in the command queue have completed after which <strong>clFinish</strong> defines a
synchronization point and the <strong>clFinish</strong> function returns.</p>
</li>
</ul>
</div>
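<div class="paragraph">
<p>Several of these synchronization points can be combined in host code.
The sketch below is illustrative only: the function name <code>run_two_phases</code>
and its parameters are hypothetical, and kernel arguments are assumed to be
set already.
It is written so that it also behaves as intended on an out-of-order
command-queue.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-c" data-lang="c">#include &lt;CL/cl.h&gt;

/* Hypothetical helper: run two kernels in sequence, then read a result. */
cl_int run_two_phases(cl_command_queue queue,
                      cl_kernel phase1, cl_kernel phase2,
                      cl_mem result, void *host_dst, size_t bytes,
                      size_t global_size)
{
    cl_event phase2_done = NULL;
    cl_int err;

    err = clEnqueueNDRangeKernel(queue, phase1, 1, NULL,
                                 &amp;global_size, NULL, 0, NULL, NULL);
    if (err != CL_SUCCESS) return err;

    /* Command-queue barrier: with an empty wait list, all previously
     * enqueued commands complete before later commands are launched. */
    err = clEnqueueBarrierWithWaitList(queue, 0, NULL, NULL);
    if (err != CL_SUCCESS) return err;

    err = clEnqueueNDRangeKernel(queue, phase2, 1, NULL,
                                 &amp;global_size, NULL, 0, NULL, &amp;phase2_done);
    if (err != CL_SUCCESS) return err;

    /* Blocking command: the call returns after the read has completed.
     * The wait list orders the read after phase2. */
    err = clEnqueueReadBuffer(queue, result, CL_TRUE, 0, bytes, host_dst,
                              1, &amp;phase2_done, NULL);
    clReleaseEvent(phase2_done);
    if (err != CL_SUCCESS) return err;

    /* clFinish: blocks until everything enqueued so far has completed. */
    return clFinish(queue);
}</code></pre>
</div>
</div>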
<div class="paragraph">
<p>A synchronization point between a pair of commands (A and B) assures that
the results of command A <em>happen before</em> command B is launched.
This requires that any updates to memory from command A complete and are
made available to other commands before the synchronization point completes.
Likewise, this requires that command B waits until after the synchronization
point before loading values from global memory.
The concept of a synchronization point works in a similar fashion for
commands such as a barrier that apply to two sets of commands.
All the commands prior to the barrier must complete and make their results
available to following commands.
Furthermore, any commands following the barrier must wait for the commands
prior to the barrier before loading values and continuing their execution.</p>
</div>
<div class="paragraph">
<p>These <em>happens-before</em> relationships are a fundamental part of the OpenCL
memory model.
When applied at the level of commands, they are straightforward to define at
a language level in terms of ordering relationships between different
commands.
Ordering memory operations inside different commands, however, requires
rules more complex than can be captured by the high level concept of a
synchronization point.
These rules are described in detail in <a href="#memory-ordering-rules">Memory
Ordering Rules</a>.</p>
</div>
</div>
<div class="sect3">
<h4 id="_execution_model_categories_of_kernels">3.2.5. Execution Model: Categories of Kernels</h4>
<div class="paragraph">
<p>The OpenCL execution model supports three types of kernels:</p>
</div>
<div class="ulist">
<ul>
<li>
<p><strong>OpenCL kernels</strong> are managed by the OpenCL API as kernel-objects
associated with kernel functions within program-objects.
OpenCL kernels are provided via a kernel language.
All OpenCL implementations must support OpenCL kernels supplied in the
standard SPIR-V intermediate language with the appropriate environment
specification, and the OpenCL C programming language defined in earlier
versions of the OpenCL specification.
SPIR-V binaries may be generated from an OpenCL kernel language or by a
third party compiler from an alternative input.</p>
</li>
<li>
<p><strong>Native kernels</strong> are accessed through a host function pointer.
Native kernels are queued for execution along with OpenCL kernels on a
device and share memory objects with OpenCL kernels.
For example, these native kernels could be functions defined in
application code or exported from a library.
The ability to execute native kernels is optional within OpenCL and the
semantics of native kernels are implementation-defined.
The OpenCL API includes functions to query the capabilities of a device
and determine if this capability is supported.</p>
</li>
<li>
<p><strong>Built-in kernels</strong> are tied to a particular device and are not built at
runtime from source code in a program object.
The common use of built-in kernels is to expose fixed-function hardware
or firmware associated with a particular OpenCL device or custom device.
The semantics of a built-in kernel may be defined outside of OpenCL and
hence are implementation-defined.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>All three types of kernels are manipulated through the OpenCL command queues
and must conform to the synchronization points defined in the OpenCL
execution model.</p>
</div>
</div>
</div>
<div class="sect2">
<h3 id="_memory_model">3.3. Memory Model</h3>
<div class="paragraph">
<p>The OpenCL memory model describes the structure, contents, and behavior of
the memory exposed by an OpenCL platform as an OpenCL program runs.
The model allows a programmer to reason about values in memory as the host
program and multiple kernel-instances execute.</p>
</div>
<div class="paragraph">
<p>An OpenCL program defines a context that includes a host, one or more
devices, command-queues, and memory exposed within the context.
Consider the units of execution involved with such a program.
The host program runs as one or more host threads managed by the operating
system running on the host (the details of which are defined outside of
OpenCL).
There may be multiple devices in a single context which all have access to
memory objects defined by OpenCL.
On a single device, multiple work-groups may execute in parallel with
potentially overlapping updates to memory.
Finally, within a single work-group, multiple work-items concurrently
execute, once again with potentially overlapping updates to memory.</p>
</div>
<div class="paragraph">
<p>The memory model must precisely define how the values in memory as seen from
each of these units of execution interact so a programmer can reason about
the correctness of OpenCL programs.
We define the memory model in four parts.</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Memory regions: The distinct memories visible to the host and the
devices that share a context.</p>
</li>
<li>
<p>Memory objects: The objects defined by the OpenCL API and their
management by the host and devices.</p>
</li>
<li>
<p>Shared Virtual Memory: A virtual address space exposed to both the host
and the devices within a context.</p>
</li>
<li>
<p>Consistency Model: Rules that define which values are observed when
multiple units of execution load data from memory plus the atomic/fence
operations that constrain the order of memory operations and define
synchronization relationships.</p>
</li>
</ul>
</div>
<div class="sect3">
<h4 id="_memory_model_fundamental_memory_regions">3.3.1. Memory Model: Fundamental Memory Regions</h4>
<div class="paragraph">
<p>Memory in OpenCL is divided into two parts.</p>
</div>
<div class="ulist">
<ul>
<li>
<p><strong>Host Memory:</strong> The memory directly available to the host.
The detailed behavior of host memory is defined outside of OpenCL.
Memory objects move between the Host and the devices through functions
within the OpenCL API or through a shared virtual memory interface.</p>
</li>
<li>
<p><strong>Device Memory:</strong> Memory directly available to kernels executing on
OpenCL devices.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>Device memory consists of four named address spaces or <em>memory regions</em>:</p>
</div>
<div class="ulist">
<ul>
<li>
<p><strong>Global Memory:</strong> This memory region permits read/write access to all
work-items in all work-groups running on any device within a context.
Work-items can read from or write to any element of a memory object.
Reads and writes to global memory may be cached depending on the
capabilities of the device.</p>
</li>
<li>
<p><strong>Constant Memory</strong>: A region of global memory that remains constant
during the execution of a kernel-instance.
The host allocates and initializes memory objects placed into constant
memory.</p>
</li>
<li>
<p><strong>Local Memory</strong>: A memory region local to a work-group.
This memory region can be used to allocate variables that are shared by
all work-items in that work-group.</p>
</li>
<li>
<p><strong>Private Memory</strong>: A region of memory private to a work-item.
Variables defined in one work-item&#8217;s private memory are not visible to
another work-item.</p>
</li>
</ul>
</div>
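<div class="paragraph">
<p>In OpenCL C the four memory regions appear as address space qualifiers.
The kernel below is a hypothetical sketch (the name <code>scale</code> and its
arguments are assumptions) that touches each region once; the host is
assumed to size the <code>local</code> argument to one float per work-item.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-c" data-lang="c">/* OpenCL C 2.0 kernel source (hypothetical example). */
kernel void scale(global float *data,      /* global: buffer contents        */
                  constant float *coeffs,  /* constant: read-only in kernels */
                  local float *tile)       /* local: shared by a work-group  */
{
    size_t gid = get_global_id(0);
    size_t lid = get_local_id(0);

    /* Automatic variables live in private memory. */
    float x = data[gid];

    tile[lid] = x * coeffs[0];
    work_group_barrier(CLK_LOCAL_MEM_FENCE);

    /* Read the value written by a neighboring work-item of the same
     * work-group through local memory. */
    size_t next = (lid + 1) % get_local_size(0);
    data[gid] = tile[next];
}</code></pre>
</div>
</div>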
<div class="paragraph">
<p>The <a href="#memory-regions-image">memory regions</a> and their relationship to the
OpenCL Platform model are summarized below.
Local and private memories are always associated with a particular device.
The global and constant memories, however, are shared between all devices
within a given context.
An OpenCL device may include a cache to support efficient access to these
shared memories.</p>
</div>
<div class="paragraph">
<p>To understand memory in OpenCL, it is important to appreciate the
relationships between these named address spaces.
The four named address spaces available to a device are disjoint, meaning
they do not overlap.
This is a logical relationship, however, and an implementation may choose to
let these disjoint named address spaces share physical memory.</p>
</div>
<div class="paragraph">
<p>Programmers often need functions callable from kernels where the pointers
manipulated by those functions can point to multiple named address spaces.
This saves a programmer from the error-prone and wasteful practice of
creating multiple copies of functions; one for each named address space.
Therefore the global, local and private address spaces belong to a single
<em>generic address space</em>.
This is closely modeled after the concept of a generic address space used in
the embedded C standard (ISO/IEC 9899:1999).
Since they all belong to a single generic address space, the following
properties are supported for pointers to named address spaces in device
memory:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>A pointer to the generic address space can be cast to a pointer to a
global, local or private address space.</p>
</li>
<li>
<p>A pointer to a global, local or private address space can be cast to a
pointer to the generic address space.</p>
</li>
<li>
<p>A pointer to a global, local or private address space can be implicitly
converted to a pointer to the generic address space, but the converse is
not allowed.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>The constant address space is disjoint from the generic address space.</p>
</div>
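<div class="paragraph">
<p>A short OpenCL C 2.0 sketch of these pointer conversions follows.
The names <code>sum3</code> and <code>demo</code> are hypothetical, and <code>g</code> is assumed to point
to at least three elements.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-c" data-lang="c">/* OpenCL C 2.0 source (hypothetical example). */

/* A pointer parameter without an address space qualifier points to the
 * generic address space, so one function serves several named spaces. */
float sum3(const float *p)
{
    return p[0] + p[1] + p[2];
}

kernel void demo(global const float *g, global float *out)
{
    float priv[3] = {1.0f, 2.0f, 3.0f};

    /* Implicit conversions from a named address space to generic. */
    float from_global  = sum3(g);
    float from_private = sum3(priv);

    /* Going back from generic to a named address space is explicit.
     * to_global() yields NULL if the pointer does not refer to global
     * memory. */
    const float *generic = g;
    const global float *back = to_global(generic);

    out[get_global_id(0)] =
        (back != NULL) ? from_global + from_private : 0.0f;

    /* A `constant` pointer could not be passed to sum3(): the constant
     * address space is disjoint from the generic address space. */
}</code></pre>
</div>
</div>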
<div class="paragraph">
<p>The addresses of memory associated with memory objects in Global memory are
not preserved between kernel instances, between a device and the host, or
between devices.
In this regard global memory acts as a global pool of memory objects rather
than an address space.
This restriction is relaxed when shared virtual memory (SVM) is used.</p>
</div>
<div class="paragraph">
<p>SVM causes addresses to be meaningful between the host and all of the
devices within a context, hence supporting the use of pointer-based data
structures in OpenCL kernels.
It logically extends a portion of the global memory into the host address
space giving work-items access to the host address space.
On platforms with hardware support for a shared address space between the
host and one or more devices, SVM may also provide a more efficient way to
share data between devices and the host.
Details about SVM are presented in <a href="#shared-virtual-memory">Shared Virtual
Memory</a>.</p>
</div>
<div id="memory-regions-image" class="imageblock" style="text-align: center">
<div class="content">
<img src="" alt="memory regions">
</div>
<div class="title">Figure 4. The named address spaces exposed in an OpenCL Platform. Global and Constant memories are shared between the one or more devices within a context, while local and private memories are associated with a single device. Each device may include an optional cache to support efficient access to its view of the global and constant address spaces.</div>
</div>
<div class="paragraph">
<p>A programmer may use the features of the <a href="#memory-consistency-model">memory
consistency model</a> to manage safe access to global memory from multiple
work-items potentially running on one or more devices.
In addition, when using shared virtual memory (SVM), the memory consistency
model may also be used to ensure that host threads safely access memory
locations in the shared memory region.</p>
</div>
</div>
<div class="sect3">
<h4 id="_memory_model_memory_objects">3.3.2. Memory Model: Memory Objects</h4>
<div class="paragraph">
<p>The contents of global memory are <em>memory objects</em>.
A memory object is a handle to a reference counted region of global memory.
Memory objects use the OpenCL type <em>cl_mem</em> and fall into three distinct
classes.</p>
</div>
<div class="ulist">
<ul>
<li>
<p><strong>Buffer</strong>: A memory object stored as a block of contiguous memory and
used as a general purpose object to hold data used in an OpenCL program.
The types of the values within a buffer may be any of the built-in types
(such as int, float), vector types, or user-defined structures.
The buffer can be manipulated through pointers much as one would with
any block of memory in C.</p>
</li>
<li>
<p><strong>Image</strong>: An image memory object holds one-, two- or three-dimensional
images.
The formats are based on the standard image formats used in graphics
applications.
An image is an opaque data structure managed by functions defined in the
OpenCL API.
To optimize the manipulation of images stored in the texture memories
found in many GPUs, OpenCL kernels have traditionally been disallowed
from both reading and writing a single image.
In OpenCL 2.0, however, we have relaxed this restriction by providing
synchronization and fence operations that let programmers properly
synchronize their code to safely allow a kernel to read and write a
single image.</p>
</li>
<li>
<p><strong>Pipe</strong>: The <em>pipe</em> memory object conceptually is an ordered sequence of
data items.
A pipe has two endpoints: a write endpoint into which data items are
inserted, and a read endpoint from which data items are removed.
At any one time, only one kernel instance may write into a pipe, and
only one kernel instance may read from a pipe.
To support the producer-consumer design pattern, one kernel instance
connects to the write endpoint (the producer) while another kernel
instance connects to the reading endpoint (the consumer).</p>
</li>
</ul>
</div>
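<div class="paragraph">
<p>The producer-consumer use of a pipe might look as follows in OpenCL C 2.0.
This is a hedged sketch: the kernel names are hypothetical, the host is
assumed to have created the pipe with <strong>clCreatePipe</strong> using a packet size
of <code>sizeof(int)</code>, and the consumer is assumed to be enqueued so that it
runs after the producer has completed.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-c" data-lang="c">/* OpenCL C 2.0 kernel source (hypothetical example). */

/* Connects to the write endpoint of the pipe. */
kernel void producer(global const int *src, write_only pipe int out)
{
    int v = src[get_global_id(0)];

    /* write_pipe returns 0 on success and a negative value otherwise,
     * for example when the pipe is full. This sketch drops the packet
     * in that case. */
    (void)write_pipe(out, &amp;v);
}

/* Connects to the read endpoint of the pipe. */
kernel void consumer(read_only pipe int in, global int *dst)
{
    int v = 0;

    /* read_pipe returns 0 on success and a negative value when no
     * packet is available. */
    if (read_pipe(in, &amp;v) == 0) {
        dst[get_global_id(0)] = v;
    } else {
        dst[get_global_id(0)] = -1;  /* hypothetical "no data" marker */
    }
}</code></pre>
</div>
</div>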
<div class="paragraph">
<p>Memory objects are allocated by host APIs.
The host program can provide the runtime with a pointer to a block of
contiguous memory to hold the memory object when the object is created
(CL_MEM_USE_HOST_PTR).
Alternatively, the physical memory can be managed by the OpenCL runtime and
not be directly accessible to the host program.</p>
</div>
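<div class="paragraph">
<p>Both allocation styles can be sketched with <strong>clCreateBuffer</strong>.
The helper name <code>make_buffers</code> and its parameters are hypothetical;
<code>host_block</code> is assumed to point to at least <code>bytes</code> bytes that remain
valid for the lifetime of the first buffer.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-c" data-lang="c">#include &lt;CL/cl.h&gt;

/* Hypothetical helper: create one host-backed and one runtime-managed
 * buffer of the same size. */
cl_int make_buffers(cl_context ctx, void *host_block, size_t bytes,
                    cl_mem *host_backed, cl_mem *runtime_managed)
{
    cl_int err;

    /* The application supplies the storage for the memory object. */
    *host_backed = clCreateBuffer(ctx,
                                  CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
                                  bytes, host_block, &amp;err);
    if (err != CL_SUCCESS) return err;

    /* The OpenCL runtime manages the storage; the host reaches the
     * contents only through read/write, map/unmap or copy commands. */
    *runtime_managed = clCreateBuffer(ctx, CL_MEM_READ_WRITE,
                                      bytes, NULL, &amp;err);
    if (err != CL_SUCCESS) {
        clReleaseMemObject(*host_backed);
        *host_backed = NULL;
    }
    return err;
}</code></pre>
</div>
</div>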
<div class="paragraph">
<p>Allocation and access to memory objects within the different memory regions
varies between the host and work-items running on a device.
This is summarized in the <a href="#memory-regions-table">Memory Regions</a> table,
which describes whether the kernel or the host can allocate from a memory
region, the type of allocation (static at compile time vs.
dynamic at runtime) and the type of access allowed (i.e. whether the kernel
or the host can read and/or write to a memory region).</p>
</div>
<table id="memory-regions-table" class="tableblock frame-all grid-all" style="width: 80%;">
<caption class="title">Table 1. Memory Regions</caption>
<colgroup>
<col style="width: 20%;">
<col style="width: 20%;">
<col style="width: 20%;">
<col style="width: 20%;">
<col style="width: 20%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top"></th>
<th class="tableblock halign-left valign-top">Global</th>
<th class="tableblock halign-left valign-top">Constant</th>
<th class="tableblock halign-left valign-top">Local</th>
<th class="tableblock halign-left valign-top">Private</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top" rowspan="2"><p class="tableblock">Host</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Dynamic Allocation</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Dynamic Allocation</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Dynamic Allocation</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">No Allocation</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">Read/Write access to buffers and images but not pipes</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Read/Write access</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">No access</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">No access</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top" rowspan="2"><p class="tableblock">Kernel</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Static Allocation for program scope variables</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Static Allocation</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Static Allocation.
</p><p class="tableblock"> Dynamic allocation for child kernel</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Static Allocation</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">Read/Write access</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Read-only access</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Read/Write access.
</p><p class="tableblock"> No access to child&#8217;s local memory.</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Read/Write access</p></td>
</tr>
</tbody>
</table>
<div class="sidebarblock">
<div class="content">
<div class="title">Caption</div>
<div class="paragraph">
<p>The <a href="#memory-regions-table">Memory Regions</a> table shows the different
memory regions in OpenCL and how memory objects are allocated and accessed
by the host and by an executing instance of a kernel.
For the case of kernels, we distinguish between the behavior of local memory
with respect to a kernel (self) and its child kernels.</p>
</div>
</div>
</div>
<div class="paragraph">
<p>Once allocated, a memory object is made available to kernel-instances
running on one or more devices.
In addition to <a href="#shared-virtual-memory">Shared Virtual Memory</a>, there are
three basic ways to manage the contents of buffers between the host and
devices.</p>
</div>
<div class="ulist">
<ul>
<li>
<p><strong>Read/Write/Fill commands</strong>: The data associated with a memory object is
explicitly read and written between the host and global memory regions
using commands enqueued to an OpenCL command queue.</p>
</li>
<li>
<p><strong>Map/Unmap commands</strong>: Data from the memory object is mapped into a
contiguous block of memory accessed through a host accessible pointer.
The host program enqueues a <em>map</em> command on a block of a memory object
before it can be safely manipulated by the host program.
When the host program is finished working with the block of memory, the
host program enqueues an <em>unmap</em> command to allow a kernel-instance to
safely read and/or write the buffer.</p>
</li>
<li>
<p><strong>Copy commands:</strong> The data associated with a memory object is copied
between two buffers, each of which may reside either on the host or on
the device.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>With Read/Write/Map, the commands
can be blocking or non-blocking operations.
The OpenCL function call for a blocking memory transfer returns once the
command (memory transfer) has completed. At this point the associated memory
resources on the host can be safely reused, and subsequent operations on the host are
guaranteed to observe the completed transfer.
For a non-blocking memory transfer, the OpenCL function call returns as soon
as the command is enqueued.</p>
</div>
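<div class="paragraph">
<p>The difference between blocking and non-blocking commands can be sketched
as follows.
The helper name <code>upload_and_patch</code> is hypothetical, and <code>buf</code> is assumed to
be a buffer of at least <code>count</code> floats.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-c" data-lang="c">#include &lt;CL/cl.h&gt;

/* Hypothetical helper: upload data, then adjust one element on the host. */
cl_int upload_and_patch(cl_command_queue queue, cl_mem buf,
                        const float *src, size_t count)
{
    size_t bytes = count * sizeof(float);
    cl_event written = NULL;
    cl_int err;

    /* Non-blocking write: the call returns as soon as the command is
     * enqueued, so `src` must not be modified until the event completes. */
    err = clEnqueueWriteBuffer(queue, buf, CL_FALSE, 0, bytes, src,
                               0, NULL, &amp;written);
    if (err != CL_SUCCESS) return err;

    /* Blocking map that waits for the write: once the call returns, the
     * mapped pointer is valid and `src` may be reused. */
    float *p = (float *)clEnqueueMapBuffer(queue, buf, CL_TRUE,
                                           CL_MAP_READ | CL_MAP_WRITE,
                                           0, bytes, 1, &amp;written,
                                           NULL, &amp;err);
    clReleaseEvent(written);
    if (err != CL_SUCCESS) return err;

    if (count &gt; 0) {
        p[0] += 1.0f;
    }

    /* The unmap command hands the region back so that kernel-instances
     * can safely read and write the buffer again. */
    return clEnqueueUnmapMemObject(queue, buf, p, 0, NULL, NULL);
}</code></pre>
</div>
</div>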
<div class="paragraph">
<p>Memory objects are bound to a context and hence can appear in multiple
kernel-instances running on more than one physical device.
The OpenCL platform must support a large range of hardware platforms
including systems that do not support a single shared address space in
hardware; hence the ways memory objects can be shared between
kernel-instances are restricted.
The basic principle is that multiple read operations on memory objects from
multiple kernel-instances that overlap in time are allowed, but mixing
overlapping reads and writes into the same memory objects from different
kernel instances is only allowed when fine-grained synchronization is used
with <a href="#shared-virtual-memory">Shared Virtual Memory</a>.</p>
</div>
<div class="paragraph">
<p>When global memory is manipulated by multiple kernel-instances running on
multiple devices, the OpenCL runtime system must manage the association of
memory objects with a given device.
In most cases the OpenCL runtime will implicitly associate a memory object
with a device.
A kernel instance is naturally associated with the command queue to which
the kernel was submitted.
Since a command-queue can only access a single device, the queue uniquely
defines which device is involved with any given kernel-instance; hence
defining a clear association between memory objects, kernel-instances and
devices.
Programmers may anticipate these associations in their programs and
explicitly manage association of memory objects with devices in order to
improve performance.</p>
</div>
</div>
<div class="sect3">
<h4 id="shared-virtual-memory">3.3.3. Memory Model: Shared Virtual Memory</h4>
<div class="paragraph">
<p>OpenCL extends the global memory region into the host memory region through
a shared virtual memory (SVM) mechanism.
There are three types of SVM in OpenCL:</p>
</div>
<div class="ulist">
<ul>
<li>
<p><strong>Coarse-Grained buffer SVM</strong>: Sharing occurs at the granularity of
regions of OpenCL buffer memory objects.
Consistency is enforced at synchronization points and with map/unmap
commands to drive updates between the host and the device.
This form of SVM is similar to non-SVM use of memory; however, it lets
kernel-instances share pointer-based data structures (such as
linked-lists) with the host program.
Program scope global variables are treated as per-device coarse-grained
SVM for addressing and sharing purposes.</p>
</li>
<li>
<p><strong>Fine-Grained buffer SVM</strong>: Sharing occurs at the granularity of
individual loads/stores into bytes within OpenCL buffer memory objects.
Loads and stores may be cached, so consistency is guaranteed only at
synchronization points.
If the optional OpenCL atomics are supported, they can be used to
provide fine-grained control of memory consistency.</p>
</li>
<li>
<p><strong>Fine-Grained system SVM</strong>: Sharing occurs at the granularity of
individual loads/stores into bytes occurring anywhere within the host
memory.
Loads and stores may be cached, so consistency is guaranteed only at
synchronization points.
If the optional OpenCL atomics are supported, they can be used to
provide fine-grained control of memory consistency.</p>
</li>
</ul>
</div>
<table id="svm-summary-table" class="tableblock frame-all grid-all spread">
<caption class="title">Table 2. A summary of shared virtual memory (SVM) options in OpenCL</caption>
<colgroup>
<col style="width: 20%;">
<col style="width: 20%;">
<col style="width: 20%;">
<col style="width: 20%;">
<col style="width: 20%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-center valign-top"></th>
<th class="tableblock halign-center valign-top">Granularity of sharing</th>
<th class="tableblock halign-center valign-top">Memory Allocation</th>
<th class="tableblock halign-center valign-top">Mechanisms to enforce Consistency</th>
<th class="tableblock halign-center valign-top">Explicit updates between host and device</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-center valign-top"><p class="tableblock">Non-SVM buffers</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">OpenCL Memory objects (buffer)</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock"><strong>clCreateBuffer</strong></p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">Host synchronization points on the same or between devices.</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">Yes, through Map and Unmap commands.</p></td>
</tr>
<tr>
<td class="tableblock halign-center valign-top"><p class="tableblock">Coarse-Grained buffer SVM</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">OpenCL Memory objects (buffer)</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock"><strong>clSVMAlloc</strong></p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">Host synchronization points between devices</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">Yes, through Map and Unmap commands.</p></td>
</tr>
<tr>
<td class="tableblock halign-center valign-top"><p class="tableblock">Fine-Grained buffer SVM</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">Bytes within OpenCL Memory objects (buffer)</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock"><strong>clSVMAlloc</strong></p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">Synchronization points plus atomics (if supported)</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">No</p></td>
</tr>
<tr>
<td class="tableblock halign-center valign-top"><p class="tableblock">Fine-Grained system SVM</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">Bytes within Host memory (system)</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">Host memory allocation mechanisms (e.g. malloc)</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">Synchronization points plus atomics (if supported)</p></td>
<td class="tableblock halign-center valign-top"><p class="tableblock">No</p></td>
</tr>
</tbody>
</table>
<div class="paragraph">
<p>Coarse-Grained buffer SVM is required in the core OpenCL specification.
The two finer grained approaches are optional features in OpenCL.
The various SVM mechanisms to access host memory from the work-items
associated with a kernel instance are <a href="#svm-summary-table">summarized
above</a>.</p>
</div>
</div>
<div class="sect3">
<h4 id="memory-consistency-model">3.3.4. Memory Model: Memory Consistency Model</h4>
<div class="paragraph">
<p>The OpenCL memory model tells programmers what they can expect from an
OpenCL implementation: which memory operations are guaranteed to happen in
which order and which memory values each read operation will return.
The memory model tells compiler writers which restrictions they must follow
when implementing compiler optimizations: which variables they can cache in
registers and when they can move reads or writes around a barrier or atomic
operation.
The memory model also tells hardware designers about limitations on hardware
optimizations; for example, when they must flush or invalidate hardware
caches.</p>
</div>
<div class="paragraph">
<p>The memory consistency model in OpenCL is based on the memory model from the
ISO C11 programming language.
To help make the presentation more precise and self-contained, we include
paragraphs taken verbatim or with modifications from the ISO C11
international standard.
When a paragraph is taken or modified from the C11 standard, it is
identified as such along with its original location in the <a href="#iso-c11">C11
standard</a>.</p>
</div>
<div class="paragraph">
<p>For programmers, the most intuitive model is the <em>sequential consistency</em>
memory model.
Sequential consistency interleaves the steps executed by each of the units
of execution.
Each access to a memory location sees the last assignment to that location
in that interleaving.
While sequential consistency is relatively straightforward for a programmer
to reason about, implementing sequential consistency is expensive.
Therefore, OpenCL implements a relaxed memory consistency model; i.e. it is
possible to write programs where the loads from memory violate sequential
consistency.
Fortunately, if a program does not contain any races and if the program only
uses atomic operations that utilize the sequentially consistent memory order
(the default memory ordering for OpenCL), OpenCL programs appear to execute
with sequential consistency.</p>
</div>
<div class="paragraph">
<p>Programmers can to some degree control how the memory model is relaxed by
choosing the memory order for synchronization operations.
The precise semantics of synchronization and the memory orders are formally
defined in <a href="#memory-ordering-rules">Memory Ordering Rules</a>.
Here, we give a high level description of how these memory orders apply to
atomic operations on atomic objects shared between units of execution.
OpenCL memory_order choices are based on those from the ISO C11 standard
memory model.
They are specified in certain OpenCL functions through the following
enumeration constants:</p>
</div>
<div class="ulist">
<ul>
<li>
<p><strong>memory_order_relaxed</strong>: implies no order constraints.
This memory order can be used safely to increment counters that are
concurrently incremented, but it doesn&#8217;t guarantee anything about the
ordering with respect to operations to other memory locations.
It can also be used, for example, to do ticket allocation and by expert
programmers implementing lock-free algorithms.</p>
</li>
<li>
<p><strong>memory_order_acquire</strong>: A synchronization operation (fence or atomic)
that has acquire semantics "acquires" side-effects from a release
operation that synchronizes with it: if an acquire synchronizes with a
release, the acquiring unit of execution will see all side-effects
preceding that release (and possibly subsequent side-effects). As part
of carefully-designed protocols, programmers can use an "acquire" to
safely observe the work of another unit of execution.</p>
</li>
<li>
<p><strong>memory_order_release</strong>: A synchronization operation (fence or atomic
operation) that has release semantics "releases" side effects to an
acquire operation that synchronizes with it.
All side effects that precede the release are included in the release.
As part of carefully-designed protocols, programmers can use a "release"
to make changes made in one unit of execution visible to other units of
execution.</p>
</li>
</ul>
</div>
<div class="admonitionblock note">
<table>
<tr>
<td class="icon">
<i class="fa icon-note" title="Note"></i>
</td>
<td class="content">
In general, an acquire is not required to <em>always</em> synchronize with any
particular release.
However, synchronization can be forced by certain executions.
See <a href="#memory-ordering-fence">Memory Ordering Rules: Fence Operations</a> for
detailed rules for when synchronization must occur.
</td>
</tr>
</table>
</div>
<div class="ulist">
<ul>
<li>
<p><strong>memory_order_acq_rel</strong>: A synchronization operation with acquire-release
semantics has the properties of both the acquire and release memory
orders.
It is typically used to order read-modify-write operations.</p>
</li>
<li>
<p><strong>memory_order_seq_cst</strong>: The loads and stores of each unit of execution
appear to execute in program (i.e., sequenced-before) order, and the
loads and stores from different units of execution appear to be simply
interleaved.</p>
</li>
</ul>
</div>
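<div class="paragraph">
<p>As a rough host-side illustration of the <strong>memory_order_relaxed</strong> counter use case listed above, the following sketch increments one shared counter from several threads.
It assumes a POSIX host with pthreads and ISO C11 atomics, which stand in here for work-items and the OpenCL C atomic functions; the names worker and run_relaxed_counter are hypothetical and not part of any OpenCL API.</p>
</div>

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NUM_THREADS 4
#define INCREMENTS_PER_THREAD 100000

/* Shared counter; every access below is atomic, so there is no data race. */
static atomic_int counter;

/* Each thread increments the counter with no ordering constraints. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS_PER_THREAD; ++i)
        atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
    return NULL;
}

/* Runs the workers and returns the final count. */
int run_relaxed_counter(void)
{
    pthread_t threads[NUM_THREADS];

    atomic_store(&counter, 0);
    for (int t = 0; t < NUM_THREADS; ++t) {
        int rc = pthread_create(&threads[t], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < NUM_THREADS; ++t)
        pthread_join(threads[t], NULL);

    /* Joining the threads orders all increments before this load. */
    return atomic_load_explicit(&counter, memory_order_relaxed);
}
```

<div class="paragraph">
<p>The relaxed order is sufficient in this sketch because only the final total matters; nothing is inferred about other memory locations from the counter value.</p>
</div>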
<div class="paragraph">
<p>Regardless of which memory_order is specified, resolving constraints on
memory operations across a heterogeneous platform adds considerable overhead
to the execution of a program.
An OpenCL platform may be able to optimize certain operations that depend on
the features of the memory consistency model by restricting the scope of the
memory operations.
Distinct memory scopes are defined by the values of the memory_scope
enumeration constant:</p>
</div>
<div class="ulist">
<ul>
<li>
<p><strong>memory_scope_work_item</strong>: memory-ordering constraints only apply within
the work-item<sup>1</sup>.</p>
<div class="openblock">
<div class="content">
<div class="dlist">
<dl>
<dt class="hdlist1">1</dt>
<dd>
<p>This value for memory_scope can only be used with atomic_work_item_fence
with flags set to CLK_IMAGE_MEM_FENCE.</p>
</dd>
</dl>
</div>
</div>
</div>
</li>
<li>
<p><strong>memory_scope_sub_group</strong>: memory-ordering constraints only apply within
the sub-group.</p>
</li>
<li>
<p><strong>memory_scope_work_group</strong>: memory-ordering constraints only apply to
work-items executing within a single work-group.</p>
</li>
<li>
<p><strong>memory_scope_device</strong>: memory-ordering constraints only apply to
work-items executing on a single device.</p>
</li>
<li>
<p><strong>memory_scope_all_svm_devices</strong>: memory-ordering constraints apply to
work-items executing across multiple devices and (when using SVM) the
host.
A release performed with <strong>memory_scope_all_svm_devices</strong> to a buffer that
does not have the CL_MEM_SVM_ATOMICS flag set will commit to at least
<strong>memory_scope_device</strong> visibility, with full synchronization of the
buffer at a queue synchronization point (e.g. an OpenCL event).</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>These memory scopes define a hierarchy of visibilities when analyzing the
ordering constraints of memory operations.
For example, if a programmer knows that a sequence of memory operations will
only be associated with a collection of work-items from a single work-group
(and hence will run on a single device), the implementation is spared the
overhead of managing the memory orders across other devices within the same
context.
This can substantially reduce overhead in a program.
All memory scopes are valid when used on global memory or local memory.
For local memory, all visibility is constrained to within a given work-group
and scopes wider than <strong>memory_scope_work_group</strong> carry no additional meaning.</p>
</div>
<div class="paragraph">
<p>In the following subsections (leading up to <a href="#opencl-framework">OpenCL
Framework</a>), we will explain the synchronization constructs and detailed
rules needed to use OpenCL&#8217;s relaxed memory models.
It is important to appreciate, however, that many programs do not benefit
from relaxed memory models.
Even expert programmers have a difficult time using atomics and fences to
write correct programs with relaxed memory models.
A large number of OpenCL programs can be written using a simplified memory
model.
This is accomplished by following these guidelines:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Write programs that manage safe sharing of global memory objects through
the synchronization points defined by the command queues.</p>
</li>
<li>
<p>Restrict low level synchronization inside work-groups to the work-group
functions such as barrier.</p>
</li>
<li>
<p>If you want sequentially consistent behavior with system allocations or
fine-grained SVM buffers with atomics support, use only
<strong>memory_order_seq_cst</strong> operations with the scope
<strong>memory_scope_all_svm_devices</strong>.</p>
</li>
<li>
<p>If you want sequentially consistent behavior when not using system
allocations or fine-grained SVM buffers with atomics support, use only
<strong>memory_order_seq_cst</strong> operations with the scope <strong>memory_scope_device</strong>
or <strong>memory_scope_all_svm_devices</strong>.</p>
</li>
<li>
<p>Ensure your program has no races.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>If these guidelines are followed in your OpenCL programs, you can skip the
detailed rules behind the relaxed memory models and go directly to
<a href="#opencl-framework">OpenCL Framework</a>.</p>
</div>
</div>
<div class="sect3">
<h4 id="_memory_model_overview_of_atomic_and_fence_operations">3.3.5. Memory Model: Overview of atomic and fence operations</h4>
<div class="paragraph">
<p>The OpenCL 2.0 specification defines a number of <em>synchronization
operations</em> that are used to define memory order constraints in a program.
They play a special role in controlling how memory operations in one unit of
execution (such as work-items or, when using SVM, a host thread) are made
visible to another.
There are two types of synchronization operations in OpenCL: <em>atomic
operations</em> and <em>fences</em>.</p>
</div>
<div class="paragraph">
<p>Atomic operations are indivisible.
They either occur completely or not at all.
These operations are used to order memory operations between units of
execution and hence they are parameterized with the memory_order and
memory_scope parameters defined by the OpenCL memory consistency model.
The atomic operations for OpenCL kernel languages are similar to the
corresponding operations defined by the C11 standard.</p>
</div>
<div class="paragraph">
<p>The OpenCL 2.0 atomic operations apply to variables of an atomic type (a
subset of those in the C11 standard) including atomic versions of the int,
uint, long, ulong, float, double, half, intptr_t, uintptr_t, size_t, and
ptrdiff_t types.
However, support for some of these atomic types depends on support for the
corresponding regular types.</p>
</div>
<div class="paragraph">
<p>An atomic operation on one or more memory locations is either an acquire
operation, a release operation, or both an acquire and release operation.
An atomic operation without an associated memory location is a fence and can
be either an acquire fence, a release fence, or both an acquire and release
fence.
In addition, there are relaxed atomic operations, which do not have
synchronization properties, and atomic read-modify-write operations, which
have special characteristics.
<a href="#iso-c11">[C11 standard, Section 5.1.2.4, paragraph 5, modified.]</a></p>
</div>
<div class="paragraph">
<p>The orders <strong>memory_order_acquire</strong> (used for reads), <strong>memory_order_release</strong>
(used for writes), and <strong>memory_order_acq_rel</strong> (used for read-modify-write
operations) are used for simple communication between units of execution
using shared variables.
Informally, executing a <strong>memory_order_release</strong> on an atomic object A makes
all previous side effects visible to any unit of execution that later
executes a <strong>memory_order_acquire</strong> on A.
The orders <strong>memory_order_acquire</strong>, <strong>memory_order_release</strong>, and
<strong>memory_order_acq_rel</strong> do not provide sequential consistency for race-free
programs because they will not ensure that atomic stores followed by atomic
loads become visible to other units of execution in that order.</p>
</div>
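<div class="paragraph">
<p>The informal release/acquire rule above can be sketched with ISO C11 atomics on the host.
This is only an illustration under assumptions: pthreads stand in for units of execution, C11 atomics stand in for the OpenCL C atomic functions, and the names payload, ready, producer, consumer and run_message_passing are hypothetical.</p>
</div>

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;       /* plain, non-atomic data */
static atomic_int ready;  /* atomic flag, playing the role of object A */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;  /* side effect sequenced before the release */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

static void *consumer(void *arg)
{
    int *out = arg;

    /* Spin until the acquire load reads the value stored by the release. */
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;
    *out = payload;  /* the earlier write to payload is visible here */
    return NULL;
}

/* Returns the value the consumer observed, or -1 if a thread could not start. */
int run_message_passing(void)
{
    pthread_t p, c;
    int seen = -1;

    payload = 0;
    atomic_store(&ready, 0);
    if (pthread_create(&p, NULL, producer, NULL) != 0)
        return -1;
    if (pthread_create(&c, NULL, consumer, &seen) != 0) {
        pthread_join(p, NULL);
        return -1;
    }
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

<div class="paragraph">
<p>If both operations on ready used <strong>memory_order_relaxed</strong> instead, the read of payload would no longer be ordered after the write and the program would contain a data race.</p>
</div>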
<div id="atomic-fence-orders" class="paragraph">
<p>The fence operation is atomic_work_item_fence, which includes a memory_order
argument as well as the memory_scope and cl_mem_fence_flags arguments.
Depending on the memory_order argument, this operation:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>has no effects, if <strong>memory_order_relaxed</strong>;</p>
</li>
<li>
<p>is an acquire fence, if <strong>memory_order_acquire</strong>;</p>
</li>
<li>
<p>is a release fence, if <strong>memory_order_release</strong>;</p>
</li>
<li>
<p>is both an acquire fence and a release fence, if <strong>memory_order_acq_rel</strong>;</p>
</li>
<li>
<p>is a sequentially-consistent fence with both acquire and release
semantics, if <strong>memory_order_seq_cst</strong>.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>If specified, the cl_mem_fence_flags argument must be CLK_IMAGE_MEM_FENCE,
CLK_GLOBAL_MEM_FENCE, CLK_LOCAL_MEM_FENCE, or CLK_GLOBAL_MEM_FENCE |
CLK_LOCAL_MEM_FENCE.</p>
</div>
<div class="paragraph">
<p>The atomic_work_item_fence(CLK_IMAGE_MEM_FENCE) built-in function must be
used to make sure that sampler-less writes are visible to later reads by the
same work-item.
Without use of the atomic_work_item_fence function, write-read coherence on
image objects is not guaranteed: if a work-item reads from an image to which
it has previously written without an intervening atomic_work_item_fence, it
is not guaranteed that those previous writes are visible to the work-item.</p>
</div>
<div class="paragraph">
<p>The synchronization operations in OpenCL can be parameterized by a
memory_scope.
Memory scopes control the extent to which an atomic operation or fence is
visible with respect to the memory model.
These memory scopes may be used when performing atomic operations and fences
on global memory and local memory.
When used on global memory, visibility is bounded by the capabilities of that
memory.
When used on a fine-grained non-atomic SVM buffer, a coarse-grained SVM
buffer, or a non-SVM buffer, operations parameterized with
<strong>memory_scope_all_svm_devices</strong> will behave as if they were parameterized
with <strong>memory_scope_device</strong>.
When used on local memory, visibility is bounded by the work-group and, as a
result, memory_scope with wider visibility than <strong>memory_scope_work_group</strong>
will be reduced to <strong>memory_scope_work_group</strong>.</p>
</div>
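<div class="paragraph">
<p>The scope reduction rules in the preceding paragraph can be modeled as a small function.
The sketch below is a host-side model only: the enumerations model_scope and model_memory and the function effective_scope are hypothetical names introduced for illustration and are not OpenCL types or functions.</p>
</div>

```c
#include <assert.h>

/* Hypothetical model of the memory scopes, ordered from narrowest to widest. */
typedef enum {
    MODEL_SCOPE_WORK_ITEM,
    MODEL_SCOPE_SUB_GROUP,
    MODEL_SCOPE_WORK_GROUP,
    MODEL_SCOPE_DEVICE,
    MODEL_SCOPE_ALL_SVM_DEVICES
} model_scope;

/* Hypothetical classification of the memory an operation applies to. */
typedef enum {
    MODEL_MEM_LOCAL,
    MODEL_MEM_NON_SVM_BUFFER,
    MODEL_MEM_COARSE_SVM_BUFFER,
    MODEL_MEM_FINE_SVM_BUFFER_NO_ATOMICS,
    MODEL_MEM_FINE_SVM_BUFFER_ATOMICS
} model_memory;

/* Returns the scope an operation effectively has on the given memory. */
model_scope effective_scope(model_scope requested, model_memory memory)
{
    assert(requested <= MODEL_SCOPE_ALL_SVM_DEVICES);

    switch (memory) {
    case MODEL_MEM_LOCAL:
        /* Local memory: visibility never extends past the work-group. */
        return requested > MODEL_SCOPE_WORK_GROUP ? MODEL_SCOPE_WORK_GROUP
                                                  : requested;
    case MODEL_MEM_NON_SVM_BUFFER:
    case MODEL_MEM_COARSE_SVM_BUFFER:
    case MODEL_MEM_FINE_SVM_BUFFER_NO_ATOMICS:
        /* all_svm_devices behaves as device scope on these buffers. */
        return requested == MODEL_SCOPE_ALL_SVM_DEVICES ? MODEL_SCOPE_DEVICE
                                                        : requested;
    case MODEL_MEM_FINE_SVM_BUFFER_ATOMICS:
        return requested;
    }
    return requested;
}
```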
<div class="paragraph">
<p>Two actions <strong>A</strong> and <strong>B</strong> are defined to have an inclusive scope if they have
the same scope <strong>P</strong> such that:</p>
</div>
<div class="ulist">
<ul>
<li>
<p><strong>P</strong> is <strong>memory_scope_sub_group</strong> and <strong>A</strong> and <strong>B</strong> are executed by
work-items within the same sub-group.</p>
</li>
<li>
<p><strong>P</strong> is <strong>memory_scope_work_group</strong> and <strong>A</strong> and <strong>B</strong> are executed by
work-items within the same work-group.</p>
</li>
<li>
<p><strong>P</strong> is <strong>memory_scope_device</strong> and <strong>A</strong> and <strong>B</strong> are executed by work-items
on the same device when <strong>A</strong> and <strong>B</strong> apply to an SVM allocation or <strong>A</strong>
and <strong>B</strong> are executed by work-items in the same kernel or one of its
children when <strong>A</strong> and <strong>B</strong> apply to a cl_mem buffer.</p>
</li>
<li>
<p><strong>P</strong> is <strong>memory_scope_all_svm_devices</strong> and <strong>A</strong> and <strong>B</strong> are executed by
host threads or by work-items on one or more devices that can share SVM
memory with each other and the host process.</p>
</li>
</ul>
</div>
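<div class="paragraph">
<p>The inclusive scope definition above can be read as a predicate over two actions.
The following sketch is one possible model, not a normative restatement: the model_action structure and all of its fields are hypothetical, and it assumes that work-group and sub-group identifiers are globally unique and that a kernel and its child kernels share one kernel_family identifier.</p>
</div>

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    MODEL_SCOPE_WORK_ITEM,
    MODEL_SCOPE_SUB_GROUP,
    MODEL_SCOPE_WORK_GROUP,
    MODEL_SCOPE_DEVICE,
    MODEL_SCOPE_ALL_SVM_DEVICES
} model_scope;

/* Hypothetical description of one action and the unit that executes it. */
typedef struct {
    model_scope scope;
    bool by_host_thread; /* executed by a host thread, not a work-item */
    int device;          /* device running the work-item */
    int work_group;      /* assumed globally unique */
    int sub_group;       /* assumed globally unique */
    int kernel_family;   /* shared by a kernel and its child kernels */
    bool targets_svm;    /* true: SVM allocation, false: cl_mem buffer */
    bool can_share_svm;  /* unit can share SVM with the other units */
} model_action;

/* True when a and b have an inclusive scope under the rules listed above. */
bool inclusive_scope(const model_action *a, const model_action *b)
{
    assert(a && b);
    if (a->scope != b->scope)
        return false;

    bool both_work_items = !a->by_host_thread && !b->by_host_thread;

    switch (a->scope) {
    case MODEL_SCOPE_SUB_GROUP:
        return both_work_items && a->sub_group == b->sub_group;
    case MODEL_SCOPE_WORK_GROUP:
        return both_work_items && a->work_group == b->work_group;
    case MODEL_SCOPE_DEVICE:
        if (!both_work_items)
            return false;
        if (a->targets_svm && b->targets_svm)
            return a->device == b->device;
        if (!a->targets_svm && !b->targets_svm)
            return a->kernel_family == b->kernel_family;
        return false;
    case MODEL_SCOPE_ALL_SVM_DEVICES:
        return a->can_share_svm && b->can_share_svm;
    case MODEL_SCOPE_WORK_ITEM:
        return false; /* not covered by the definition above */
    }
    return false;
}
```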
</div>
<div class="sect3">
<h4 id="memory-ordering-rules">3.3.6. Memory Model: Memory Ordering Rules</h4>
<div class="paragraph">
<p>Fundamentally, the issue in a memory model is to understand the orderings in
time of modifications to objects in memory.
Modifying an object or calling a function that modifies an object are side
effects, i.e. changes in the state of the execution environment.
Evaluation of an expression in general includes both value computations and
initiation of side effects.
Value computation for an lvalue expression includes determining the identity
of the designated object.
<a href="#iso-c11">[C11 standard, Section 5.1.2.3, paragraph 2, modified.]</a></p>
</div>
<div class="paragraph">
<p>We assume that the OpenCL kernel language and host programming languages
have a sequenced-before relation between the evaluations executed by a
single unit of execution.
This sequenced-before relation is an asymmetric, transitive, pair-wise
relation between those evaluations, which induces a partial order among
them.
Given any two evaluations <strong>A</strong> and <strong>B</strong>, if <strong>A</strong> is sequenced-before <strong>B</strong>, then
the execution of <strong>A</strong> shall precede the execution of <strong>B</strong>.
(Conversely, if <strong>A</strong> is sequenced-before <strong>B</strong>, then <strong>B</strong> is sequenced-after
<strong>A</strong>.) If <strong>A</strong> is not sequenced-before or sequenced-after <strong>B</strong>, then <strong>A</strong> and
<strong>B</strong> are unsequenced.
Evaluations <strong>A</strong> and <strong>B</strong> are indeterminately sequenced when <strong>A</strong> is either
sequenced-before or sequenced-after <strong>B</strong>, but it is unspecified which.
<a href="#iso-c11">[C11 standard, Section 5.1.2.3, paragraph 3, modified.]</a></p>
</div>
<div class="admonitionblock note">
<table>
<tr>
<td class="icon">
<i class="fa icon-note" title="Note"></i>
</td>
<td class="content">
Sequenced-before is a partial order of the operations executed by a
single unit of execution (e.g. a host thread or work-item).
It generally corresponds to the source program order of those operations,
and is partial because of the undefined argument evaluation order of OpenCL&#8217;s
kernel C language.
</td>
</tr>
</table>
</div>
<div class="paragraph">
<p>In an OpenCL kernel language, the value of an object visible to a work-item
W at a particular point is the initial value of the object, a value stored
in the object by W, or a value stored in the object by another work-item or
host thread, according to the rules below.
Depending on details of the host programming language, the value of an
object visible to a host thread may also be the value stored in that object
by another work-item or host thread.
<a href="#iso-c11">[C11 standard, Section 5.1.2.4, paragraph 2, modified.]</a></p>
</div>
<div class="paragraph">
<p>Two expression evaluations conflict if one of them modifies a memory
location and the other one reads or modifies the same memory location.
<a href="#iso-c11">[C11 standard, Section 5.1.2.4, paragraph 4.]</a></p>
</div>
<div class="paragraph">
<p>All modifications to a particular atomic object <strong>M</strong> occur in some particular
total order, called the modification order of <strong>M</strong>.
If <strong>A</strong> and <strong>B</strong> are modifications of an atomic object <strong>M</strong>, and <strong>A</strong>
happens-before <strong>B</strong>, then <strong>A</strong> shall precede <strong>B</strong> in the modification order of
<strong>M</strong>, which is defined below.
Note that the modification order of an atomic object <strong>M</strong> is independent of
whether <strong>M</strong> is in local or global memory.
<a href="#iso-c11">[C11 standard, Section 5.1.2.4, paragraph 7, modified.]</a></p>
</div>
<div class="paragraph">
<p>A release sequence begins with a release operation <strong>A</strong> on an atomic object
<strong>M</strong> and is the maximal contiguous sub-sequence of side effects in the
modification order of <strong>M</strong>, where the first operation is <strong>A</strong> and every
subsequent operation either is performed by the same work-item or host
thread that performed the release or is an atomic read-modify-write
operation.
<a href="#iso-c11">[C11 standard, Section 5.1.2.4, paragraph 10, modified.]</a></p>
</div>
<div class="paragraph">
<p>OpenCL&#8217;s local and global memories are disjoint.
Kernels may access both kinds of memory while host threads may only access
global memory.
Furthermore, the <em>flags</em> argument of OpenCL&#8217;s work_group_barrier function
specifies which memory operations the function will make visible: these
memory operations can be, for example, just the ones to local memory, or the
ones to global memory, or both.
Since the visibility of memory operations can be specified for local memory
separately from global memory, we define two related but independent
relations, <em>global-synchronizes-with</em> and <em>local-synchronizes-with</em>.
Certain operations on global memory may global-synchronize-with other
operations performed by another work-item or host thread.
An example is a release atomic operation in one work-item that
global-synchronizes-with an acquire atomic operation in a second work-item.
Similarly, certain atomic operations on local objects in kernels can
local-synchronize-with other atomic operations on those local objects.
<a href="#iso-c11">[C11 standard, Section 5.1.2.4, paragraph 11, modified.]</a></p>
</div>
<div class="paragraph">
<p>We define two separate happens-before relations: global-happens-before and
local-happens-before.</p>
</div>
<div class="paragraph">
<p>A global memory action <strong>A</strong> global-happens-before a global memory action <strong>B</strong>
if</p>
</div>
<div class="ulist">
<ul>
<li>
<p><strong>A</strong> is sequenced-before <strong>B</strong>, or</p>
</li>
<li>
<p><strong>A</strong> global-synchronizes-with <strong>B</strong>, or</p>
</li>
<li>
<p>For some global memory action <strong>C</strong>, <strong>A</strong> global-happens-before <strong>C</strong> and <strong>C</strong>
global-happens-before <strong>B</strong>.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>A local memory action <strong>A</strong> local-happens-before a local memory action <strong>B</strong> if</p>
</div>
<div class="ulist">
<ul>
<li>
<p><strong>A</strong> is sequenced-before <strong>B</strong>, or</p>
</li>
<li>
<p><strong>A</strong> local-synchronizes-with <strong>B</strong>, or</p>
</li>
<li>
<p>For some local memory action <strong>C</strong>, <strong>A</strong> local-happens-before <strong>C</strong> and <strong>C</strong>
local-happens-before <strong>B</strong>.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>An OpenCL implementation shall ensure that no program execution demonstrates
a cycle in either the local-happens-before relation or the
global-happens-before relation.</p>
</div>
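<div class="paragraph">
<p>The recursive definitions above amount to taking the transitive closure of the union of the sequenced-before and synchronizes-with edges.
The sketch below computes that closure for a small, fixed-size set of actions using Warshall&#8217;s algorithm and checks for a cycle; it is a host-side model for reasoning only, and the action_graph type and function names are hypothetical.</p>
</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_ACTIONS 8

/* rel[i][j] is true when action i is related to action j. */
typedef struct {
    int count;
    bool rel[MAX_ACTIONS][MAX_ACTIONS];
} action_graph;

void graph_init(action_graph *g, int count)
{
    assert(count >= 0 && count <= MAX_ACTIONS);
    memset(g, 0, sizeof *g);
    g->count = count;
}

/* Records a sequenced-before or global-synchronizes-with edge. */
void add_edge(action_graph *g, int from, int to)
{
    assert(from >= 0 && from < g->count && to >= 0 && to < g->count);
    g->rel[from][to] = true;
}

/* Computes global-happens-before as the transitive closure of the edges. */
void global_happens_before(const action_graph *base, action_graph *hb)
{
    *hb = *base;
    for (int k = 0; k < hb->count; ++k)
        for (int i = 0; i < hb->count; ++i)
            for (int j = 0; j < hb->count; ++j)
                if (hb->rel[i][k] && hb->rel[k][j])
                    hb->rel[i][j] = true;
}

/* A cycle exists exactly when some action happens-before itself. */
bool has_cycle(const action_graph *hb)
{
    for (int i = 0; i < hb->count; ++i)
        if (hb->rel[i][i])
            return true;
    return false;
}
```

<div class="paragraph">
<p>For example, with actions 0 (write data), 1 (release store), 2 (acquire load) and 3 (read data), the edges 0 to 1 and 2 to 3 are sequenced-before edges and 1 to 2 is a global-synchronizes-with edge; the closure then relates 0 to 3. The local-happens-before relation can be modeled the same way using local-synchronizes-with edges.</p>
</div>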
<div class="admonitionblock note">
<table>
<tr>
<td class="icon">
<i class="fa icon-note" title="Note"></i>
</td>
<td class="content">
The global- and local-happens-before relations are critical to
defining what values are read and when data races occur.
The global-happens-before relation, for example, defines what global memory
operations definitely happen before what other global memory operations.
If an operation <strong>A</strong> global-happens-before operation <strong>B</strong> then <strong>A</strong> must occur
before <strong>B</strong>; in particular, any write done by <strong>A</strong> will be visible to <strong>B</strong>.
The local-happens-before relation has similar properties for local memory.
Programmers can use the local- and global-happens-before relations to reason
about the order of program actions.
</td>
</tr>
</table>
</div>
<div class="paragraph">
<p>A visible side effect <strong>A</strong> on a global object <strong>M</strong> with respect to a value
computation <strong>B</strong> of <strong>M</strong> satisfies the conditions:</p>
</div>
<div class="ulist">
<ul>
<li>
<p><strong>A</strong> global-happens-before <strong>B</strong>, and</p>
</li>
<li>
<p>there is no other side effect <strong>X</strong> to <strong>M</strong> such that <strong>A</strong>
global-happens-before <strong>X</strong> and <strong>X</strong> global-happens-before <strong>B</strong>.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>We define visible side effects for local objects <strong>M</strong> similarly.
The value of a non-atomic scalar object <strong>M</strong>, as determined by evaluation
<strong>B</strong>, shall be the value stored by the visible side effect <strong>A</strong>.
<a href="#iso-c11">[C11 standard, Section 5.1.2.4, paragraph 19, modified.]</a></p>
</div>
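<div class="paragraph">
<p>The two conditions for a visible side effect can be checked mechanically once the global-happens-before relation is known.
The following sketch assumes the relation has already been transitively closed and stored in a small matrix; the execution structure and is_visible_side_effect are hypothetical names used only for illustration.</p>
</div>

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ACTIONS 8

/* Hypothetical record of one execution, restricted to a single object M. */
typedef struct {
    int count;
    bool hb[MAX_ACTIONS][MAX_ACTIONS]; /* i global-happens-before j (closed) */
    bool writes_m[MAX_ACTIONS];        /* action i is a side effect on M */
} execution;

/* True when side effect a on M is visible to value computation b of M. */
bool is_visible_side_effect(const execution *e, int a, int b)
{
    assert(a >= 0 && a < e->count && b >= 0 && b < e->count);

    /* First condition: a is a side effect on M that happens-before b. */
    if (!e->writes_m[a] || !e->hb[a][b])
        return false;

    /* Second condition: no other side effect on M lies between a and b. */
    for (int x = 0; x < e->count; ++x) {
        if (x == a || !e->writes_m[x])
            continue;
        if (e->hb[a][x] && e->hb[x][b])
            return false;
    }
    return true;
}
```

<div class="paragraph">
<p>With two writes to <strong>M</strong> ordered one after the other and both ordered before a read, only the later write is visible to the read under this check.</p>
</div>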
<div class="paragraph">
<p>The execution of a program contains a data race if it contains two
conflicting actions <strong>A</strong> and <strong>B</strong> in different units of execution, and</p>
</div>
<div class="ulist">
<ul>
<li>
<p>(1) at least one of <strong>A</strong> or <strong>B</strong> is not atomic, or <strong>A</strong> and <strong>B</strong> do not have
inclusive memory scope, and</p>
</li>
<li>
<p>(2) the actions are global actions unordered by the
global-happens-before relation or are local actions unordered by the
local-happens-before relation.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>Any such data race results in undefined behavior.
<a href="#iso-c11">[C11 standard, Section 5.1.2.4, paragraph 25, modified.]</a></p>
</div>
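<div class="paragraph">
<p>The data race definition above combines four ingredients: different units of execution, a conflict, condition (1) and condition (2).
The sketch below expresses it as a predicate. It is a simplified model under assumptions: the mem_action structure is hypothetical, and the caller supplies whether the two actions have an inclusive scope and whether they are ordered by the happens-before relation that matches their memory region (global or local).</p>
</div>

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of one memory action. */
typedef struct {
    int unit;       /* id of the work-item or host thread */
    int location;   /* id of the memory location accessed */
    bool is_write;  /* true if the action modifies the location */
    bool is_atomic; /* true if the action is an atomic operation */
} mem_action;

/* Two actions conflict if they touch the same location and one modifies it. */
bool conflicts(const mem_action *a, const mem_action *b)
{
    return a->location == b->location && (a->is_write || b->is_write);
}

/* True when the pair (a, b) constitutes a data race under the definition. */
bool is_data_race(const mem_action *a, const mem_action *b,
                  bool have_inclusive_scope, bool ordered_by_happens_before)
{
    assert(a && b);

    if (a->unit == b->unit || !conflicts(a, b))
        return false;

    /* Condition (1): a non-atomic action, or no inclusive scope. */
    bool condition1 = !a->is_atomic || !b->is_atomic || !have_inclusive_scope;
    /* Condition (2): unordered by the relevant happens-before relation. */
    bool condition2 = !ordered_by_happens_before;

    return condition1 && condition2;
}
```

<div class="paragraph">
<p>Note that two atomic operations on the same location can still race in this model when their scopes are not inclusive, which mirrors the second half of condition (1).</p>
</div>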
<div class="paragraph">
<p>We also define the visible sequence of side effects on local and global
atomic objects.
The remaining paragraphs of this subsection define this sequence for a
global atomic object <strong>M</strong>; the visible sequence of side effects for a local
atomic object is defined similarly by using the local-happens-before
relation.</p>
</div>
<div class="paragraph">
<p>The visible sequence of side effects on a global atomic object <strong>M</strong>, with
respect to a value computation <strong>B</strong> of <strong>M</strong>, is a maximal contiguous
sub-sequence of side effects in the modification order of <strong>M</strong>, where the
first side effect is visible with respect to <strong>B</strong>, and for every side effect,
it is not the case that <strong>B</strong> global-happens-before it.
The value of <strong>M</strong>, as determined by evaluation <strong>B</strong>, shall be the value stored
by some operation in the visible sequence of <strong>M</strong> with respect to <strong>B</strong>.
<a href="#iso-c11">[C11 standard, Section 5.1.2.4, paragraph 22, modified.]</a></p>
</div>
<div class="paragraph">
<p>If an operation <strong>A</strong> that modifies an atomic object <strong>M</strong> global-happens-before
an operation <strong>B</strong> that modifies <strong>M</strong>, then <strong>A</strong> shall be earlier than <strong>B</strong> in
the modification order of <strong>M</strong>.
This requirement is known as write-write coherence.</p>
</div>
<div class="paragraph">
<p>If a value computation <strong>A</strong> of an atomic object <strong>M</strong> global-happens-before a
value computation <strong>B</strong> of <strong>M</strong>, and <strong>A</strong> takes its value from a side effect <strong>X</strong>
on <strong>M</strong>, then the value computed by <strong>B</strong> shall either equal the value stored
by <strong>X</strong>, or be the value stored by a side effect <strong>Y</strong> on <strong>M</strong>, where <strong>Y</strong>
follows <strong>X</strong> in the modification order of <strong>M</strong>.
This requirement is known as read-read coherence.
<a href="#iso-c11">[C11 standard, Section 5.1.2.4, paragraph 22, modified.]</a></p>
</div>
<div class="paragraph">
<p>If a value computation <strong>A</strong> of an atomic object <strong>M</strong> global-happens-before an
operation <strong>B</strong> on <strong>M</strong>, then <strong>A</strong> shall take its value from a side effect <strong>X</strong>
on <strong>M</strong>, where <strong>X</strong> precedes <strong>B</strong> in the modification order of <strong>M</strong>.
This requirement is known as read-write coherence.</p>
</div>
<div class="paragraph">
<p>If a side effect <strong>X</strong> on an atomic object <strong>M</strong> global-happens-before a value
computation <strong>B</strong> of <strong>M</strong>, then the evaluation <strong>B</strong> shall take its value from
<strong>X</strong> or from a side effect <strong>Y</strong> that follows <strong>X</strong> in the modification order of
<strong>M</strong>.
This requirement is known as write-read coherence.</p>
</div>
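<div class="paragraph">
<p>The coherence requirements above can be observed in a small host-side sketch.
It assumes plain C11 atomics (stdatomic.h) rather than OpenCL C, so the relation involved is the C11 happens-before relation rather than global-happens-before, and the function name <strong>coherence_demo</strong> is hypothetical.
Within a single thread, operations that are sequenced-before one another are also ordered by happens-before, which is enough to trigger the rules:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">#include &lt;assert.h&gt;
#include &lt;stdatomic.h&gt;

/* Hypothetical helper: returns the value observed by the last load. */
int coherence_demo(void)
{
    atomic_int m;
    int r1, r2;

    atomic_init(&amp;m, 0);

    /* A and B both modify m and A is ordered before B, so A precedes B
       in the modification order of m (write-write coherence). */
    atomic_store_explicit(&amp;m, 1, memory_order_relaxed);   /* A */
    atomic_store_explicit(&amp;m, 2, memory_order_relaxed);   /* B */

    /* B is ordered before this load, so the load takes its value from B
       or from a later side effect (write-read coherence). */
    r1 = atomic_load_explicit(&amp;m, memory_order_relaxed);

    /* The first load is ordered before the second, so the second cannot
       observe an earlier side effect than the first (read-read coherence). */
    r2 = atomic_load_explicit(&amp;m, memory_order_relaxed);
    assert(r1 == r2);

    return r2;
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>No side effect follows <strong>B</strong> in this sketch, so a call to <strong>coherence_demo</strong> returns 2 even though every access uses relaxed ordering.</p>
</div>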
<div class="sect4">
<h5 id="_memory_ordering_rules_atomic_operations">Memory Ordering Rules: Atomic Operations</h5>
<div class="paragraph">
<p>This and following sections describe how different program actions in kernel
C code and the host program contribute to the local- and
global-happens-before relations.
This section discusses ordering rules for OpenCL 2.0 atomic operations.</p>
</div>
<div class="paragraph">
<p><a href="#device-side-enqueue">Device-side enqueue</a> defines the enumerated type
memory_order.</p>
</div>
<div class="ulist">
<ul>
<li>
<p>For <strong>memory_order_relaxed</strong>, no operation orders memory.</p>
</li>
<li>
<p>For <strong>memory_order_release</strong>, <strong>memory_order_acq_rel</strong>, and
<strong>memory_order_seq_cst</strong>, a store operation performs a release operation
on the affected memory location.</p>
</li>
<li>
<p>For <strong>memory_order_acquire</strong>, <strong>memory_order_acq_rel</strong>, and
<strong>memory_order_seq_cst</strong>, a load operation performs an acquire operation
on the affected memory location.
<a href="#iso-c11">[C11 standard, Section 7.17.3, paragraphs 2-4, modified.]</a></p>
</li>
</ul>
</div>
<div class="paragraph">
<p>Certain built-in functions synchronize with other built-in functions
performed by another unit of execution.
This is true for pairs of release and acquire operations under specific
circumstances.
An atomic operation <strong>A</strong> that performs a release operation on a global object
<strong>M</strong> global-synchronizes-with an atomic operation <strong>B</strong> that performs an
acquire operation on <strong>M</strong> and reads a value written by any side effect in the
release sequence headed by <strong>A</strong>.
A similar rule holds for atomic operations on objects in local memory: an
atomic operation <strong>A</strong> that performs a release operation on a local object <strong>M</strong>
local-synchronizes-with an atomic operation <strong>B</strong> that performs an acquire
operation on <strong>M</strong> and reads a value written by any side effect in the release
sequence headed by <strong>A</strong>.
<a href="#iso-c11">[C11 standard, Section 5.1.2.4, paragraph 11, modified.]</a></p>
</div>
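<div class="paragraph">
<p>The release and acquire pairing described above is commonly illustrated with a message-passing pattern.
The following is a minimal host-side sketch, assuming C11 atomics and POSIX threads in place of OpenCL units of execution; the names <strong>payload</strong>, <strong>ready</strong>, <strong>producer</strong>, <strong>consumer</strong> and <strong>message_passing_demo</strong> are hypothetical.
The store marked <strong>A</strong> is the release operation and the load marked <strong>B</strong> is the acquire operation that reads the value written by <strong>A</strong>:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">#include &lt;assert.h&gt;
#include &lt;pthread.h&gt;
#include &lt;stdatomic.h&gt;
#include &lt;stddef.h&gt;

static int payload;        /* ordinary, non-atomic data */
static atomic_int ready;   /* plays the role of the atomic object M */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;
    /* A: release operation on ready. */
    atomic_store_explicit(&amp;ready, 1, memory_order_release);
    return NULL;
}

static void *consumer(void *arg)
{
    int *seen = arg;
    /* B: acquire operation on ready; spin until it reads the value
       written by A. */
    while (atomic_load_explicit(&amp;ready, memory_order_acquire) == 0) {
    }
    /* A synchronizes-with B, so the write to payload happens before
       this read and the two accesses do not race. */
    *seen = payload;
    return NULL;
}

/* Hypothetical driver: returns the payload value seen by the consumer. */
int message_passing_demo(void)
{
    pthread_t prod, cons;
    int seen = 0;
    int rc;

    payload = 0;
    atomic_store(&amp;ready, 0);

    rc = pthread_create(&amp;cons, NULL, consumer, &amp;seen);
    assert(rc == 0);
    rc = pthread_create(&amp;prod, NULL, producer, NULL);
    assert(rc == 0);
    (void)rc;

    pthread_join(prod, NULL);
    pthread_join(cons, NULL);
    return seen;
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>Under these assumptions the consumer can only leave its loop after reading the value stored by <strong>A</strong>, so <strong>message_passing_demo</strong> returns 42.
If both accesses to <strong>ready</strong> used <strong>memory_order_relaxed</strong> instead, no synchronizes-with relation would be established and the read of <strong>payload</strong> would be a data race.</p>
</div>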
<div class="admonitionblock note">
<table>
<tr>
<td class="icon">
<i class="fa icon-note" title="Note"></i>
</td>
<td class="content">
Atomic operations specifying <strong>memory_order_relaxed</strong> are relaxed only
with respect to memory ordering.
Implementations must still guarantee that any given atomic access to a
particular atomic object be indivisible with respect to all other atomic
accesses to that object.
</td>
</tr>
</table>
</div>
<div class="paragraph">
<p>There shall exist a single total order <strong>S</strong> for all <strong>memory_order_seq_cst</strong>
operations that is consistent with the modification orders for all affected
locations, as well as the appropriate global-happens-before and
local-happens-before orders for those locations, such that each
<strong>memory_order_seq_cst</strong> operation <strong>B</strong> that loads a value from an atomic object
<strong>M</strong> in global or local memory observes one of the following values:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>the result of the last modification <strong>A</strong> of <strong>M</strong> that precedes <strong>B</strong> in <strong>S</strong>,
if it exists, or</p>
</li>
<li>
<p>if <strong>A</strong> exists, the result of some modification of <strong>M</strong> in the visible
sequence of side effects with respect to <strong>B</strong> that is not
<strong>memory_order_seq_cst</strong> and that does not happen before <strong>A</strong>, or</p>
</li>
<li>
<p>if <strong>A</strong> does not exist, the result of some modification of <strong>M</strong> in the
visible sequence of side effects with respect to <strong>B</strong> that is not
<strong>memory_order_seq_cst</strong>.
<a href="#iso-c11">[C11 standard, Section 7.17.3, paragraph 6, modified.]</a></p>
</li>
</ul>
</div>
<div class="paragraph">
<p>Let X and Y be two <strong>memory_order_seq_cst</strong> operations.
If X local-synchronizes-with or global-synchronizes-with Y then X both
local-synchronizes-with Y and global-synchronizes-with Y.</p>
</div>
<div class="paragraph">
<p>If the total order <strong>S</strong> exists, the following rules hold:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>For an atomic operation <strong>B</strong> that reads the value of an atomic object
<strong>M</strong>, if there is a <strong>memory_order_seq_cst</strong> fence <strong>X</strong> sequenced-before
<strong>B</strong>, then <strong>B</strong> observes either the last <strong>memory_order_seq_cst</strong>
modification of <strong>M</strong> preceding <strong>X</strong> in the total order <strong>S</strong> or a later
modification of <strong>M</strong> in its modification order.
<a href="#iso-c11">[C11 standard, Section 7.17.3, paragraph 9.]</a></p>
</li>
<li>
<p>For atomic operations <strong>A</strong> and <strong>B</strong> on an atomic object <strong>M</strong>, where <strong>A</strong>
modifies <strong>M</strong> and <strong>B</strong> takes its value, if there is a
<strong>memory_order_seq_cst</strong> fence <strong>X</strong> such that <strong>A</strong> is sequenced-before <strong>X</strong>
and <strong>B</strong> follows <strong>X</strong> in <strong>S</strong>, then <strong>B</strong> observes either the effects of <strong>A</strong>
or a later modification of <strong>M</strong> in its modification order.
<a href="#iso-c11">[C11 standard, Section 7.17.3, paragraph 10.]</a></p>
</li>
<li>
<p>For atomic operations <strong>A</strong> and <strong>B</strong> on an atomic object <strong>M</strong>, where <strong>A</strong>
modifies <strong>M</strong> and <strong>B</strong> takes its value, if there are
<strong>memory_order_seq_cst</strong> fences <strong>X</strong> and <strong>Y</strong> such that <strong>A</strong> is
sequenced-before <strong>X</strong>, <strong>Y</strong> is sequenced-before <strong>B</strong>, and <strong>X</strong> precedes <strong>Y</strong>
in <strong>S</strong>, then <strong>B</strong> observes either the effects of <strong>A</strong> or a later
modification of <strong>M</strong> in its modification order.
<a href="#iso-c11">[C11 standard, Section 7.17.3, paragraph 11.]</a></p>
</li>
<li>
<p>For atomic operations <strong>A</strong> and <strong>B</strong> on an atomic object <strong>M</strong>, if there are
<strong>memory_order_seq_cst</strong> fences <strong>X</strong> and <strong>Y</strong> such that <strong>A</strong> is
sequenced-before <strong>X</strong>, <strong>Y</strong> is sequenced-before <strong>B</strong>, and <strong>X</strong> precedes <strong>Y</strong>
in <strong>S</strong>, then <strong>B</strong> occurs later than <strong>A</strong> in the modification order of <strong>M</strong>.</p>
</li>
</ul>
</div>
<div class="admonitionblock note">
<table>
<tr>
<td class="icon">
<i class="fa icon-note" title="Note"></i>
</td>
<td class="content">
<strong>memory_order_seq_cst</strong> ensures sequential consistency only for a
program that is (1) free of data races, and (2) exclusively uses
<strong>memory_order_seq_cst</strong> synchronization operations.
Any use of weaker ordering will invalidate this guarantee unless extreme
care is used.
In particular, <strong>memory_order_seq_cst</strong> fences ensure a total order only for
the fences themselves.
Fences cannot, in general, be used to restore sequential consistency for
atomic operations with weaker ordering specifications.
</td>
</tr>
</table>
</div>
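<div class="paragraph">
<p>One consequence of the single total order <strong>S</strong> is visible in the classic store-buffering test.
The sketch below is a host-side illustration that assumes C11 atomics and POSIX threads instead of OpenCL units of execution; all identifiers in it are hypothetical.
Each thread stores to one object and then loads the other, with every operation using <strong>memory_order_seq_cst</strong>:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">#include &lt;assert.h&gt;
#include &lt;pthread.h&gt;
#include &lt;stdatomic.h&gt;
#include &lt;stddef.h&gt;

static atomic_int x, y;
static int r1, r2;   /* written by one thread each, read after the joins */

static void *thread_1(void *arg)
{
    (void)arg;
    atomic_store_explicit(&amp;x, 1, memory_order_seq_cst);
    r1 = atomic_load_explicit(&amp;y, memory_order_seq_cst);
    return NULL;
}

static void *thread_2(void *arg)
{
    (void)arg;
    atomic_store_explicit(&amp;y, 1, memory_order_seq_cst);
    r2 = atomic_load_explicit(&amp;x, memory_order_seq_cst);
    return NULL;
}

/* Hypothetical driver: returns r1 + r2 after both threads have finished. */
int store_buffering_demo(void)
{
    pthread_t a, b;
    int rc;

    atomic_store(&amp;x, 0);
    atomic_store(&amp;y, 0);
    r1 = 0;
    r2 = 0;

    rc = pthread_create(&amp;a, NULL, thread_1, NULL);
    assert(rc == 0);
    rc = pthread_create(&amp;b, NULL, thread_2, NULL);
    assert(rc == 0);
    (void)rc;

    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return r1 + r2;
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>Whichever store comes first in <strong>S</strong>, the load in the other thread follows both stores in <strong>S</strong> and therefore observes 1, so the result is at least 1.
With release and acquire ordering only, the outcome in which both loads return 0 would be permitted.</p>
</div>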
<div class="paragraph">
<p>Atomic read-modify-write operations should always read the last value (in
the modification order) stored before the write associated with the
read-modify-write operation.
<a href="#iso-c11">[C11 standard, Section 7.17.3, paragraph 12.]</a></p>
</div>
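<div class="paragraph">
<p>The read-modify-write rule is what makes a shared counter work even with relaxed ordering.
The following host-side sketch assumes C11 atomics and POSIX threads; the thread count, iteration count and function names are hypothetical choices made for illustration:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">#include &lt;assert.h&gt;
#include &lt;pthread.h&gt;
#include &lt;stdatomic.h&gt;
#include &lt;stddef.h&gt;

#define NUM_THREADS 4
#define INCREMENTS  10000

static atomic_int counter;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i &lt; INCREMENTS; ++i) {
        /* Each read-modify-write reads the value immediately preceding
           its own write in the modification order, so no increment is
           lost even though the ordering is relaxed. */
        atomic_fetch_add_explicit(&amp;counter, 1, memory_order_relaxed);
    }
    return NULL;
}

/* Hypothetical driver: returns the final counter value. */
int counter_demo(void)
{
    pthread_t threads[NUM_THREADS];
    int rc;

    atomic_store(&amp;counter, 0);

    for (int i = 0; i &lt; NUM_THREADS; ++i) {
        rc = pthread_create(&amp;threads[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i &lt; NUM_THREADS; ++i) {
        pthread_join(threads[i], NULL);
    }
    return atomic_load(&amp;counter);
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>With these values <strong>counter_demo</strong> returns 40000.
Relaxed ordering is sufficient here because only the counter itself is shared; it would not order accesses to any other memory location.</p>
</div>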
<div class="paragraph">
<p><span class="underline">Implementations should ensure that no "out-of-thin-air" values
are computed that circularly depend on their own computation.</span></p>
</div>
<div class="paragraph">
<p>Note: Under the rules described above, and independent of the previously
footnoted C++ issue, it is known that <em>x == y == 42</em> is a valid final state
in the following problematic example:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">global atomic_int x = ATOMIC_VAR_INIT(<span class="integer">0</span>);
local atomic_int y = ATOMIC_VAR_INIT(<span class="integer">0</span>);
<span class="label">unit_of_execution_1:</span>
... [execution not reading or writing x or y, leading up to:]
<span class="predefined-type">int</span> t = atomic_load_explicit(&amp;y, memory_order_acquire);
atomic_store_explicit(&amp;x, t, memory_order_release);
<span class="label">unit_of_execution_2:</span>
... [execution not reading or writing x or y, leading up to:]
<span class="predefined-type">int</span> t = atomic_load_explicit(&amp;x, memory_order_acquire);
atomic_store_explicit(&amp;y, t, memory_order_release);</code></pre>
</div>
</div>
<div class="paragraph">
<p>This is not useful behavior and implementations should not exploit this
phenomenon.
It should be expected that in the future this may be disallowed by
appropriate updates to the memory model description by the OpenCL committee.</p>
</div>
<div class="paragraph">
<p>Implementations should make atomic stores visible to atomic loads within a
reasonable amount of time.
<a href="#iso-c11">[C11 standard, Section 7.17.3, paragraph 16.]</a></p>
</div>
<div class="paragraph">
<p>As long as the following conditions are met, a host program sharing SVM
memory with a kernel executing on one or more OpenCL devices may use atomic
and synchronization operations to ensure that its assignments, and those of
the kernel, are visible to each other:</p>
</div>
<div class="olist arabic">
<ol class="arabic">
<li>
<p>Either fine-grained buffer or fine-grained system SVM must be used to
share memory.
While coarse-grained buffer SVM allocations may support atomic
operations, visibility on these allocations is not guaranteed except at
map and unmap operations.</p>
</li>
<li>
<p>The optional OpenCL 2.0 SVM atomic-controlled visibility specified by
provision of the CL_MEM_SVM_ATOMICS flag must be supported by the device
and the flag provided to the SVM buffer on allocation.</p>
</li>
<li>
<p>The host atomic and synchronization operations must be compatible with
those of an OpenCL kernel language.
This requires that the size and representation of the data types that
the host atomic operations act on be consistent with the OpenCL kernel
language atomic types.</p>
</li>
</ol>
</div>
<div class="paragraph">
<p>If these conditions are met, the host operations will apply at
all_svm_devices scope.</p>
</div>
</div>
<div class="sect4">
<h5 id="memory-ordering-fence">Memory Ordering Rules: Fence Operations</h5>
<div class="paragraph">
<p>This section describes how the OpenCL 2.0 fence operations contribute to the
local- and global-happens-before relations.</p>
</div>
<div class="paragraph">
<p>Earlier, we introduced synchronization primitives called fences.
Fences can utilize the acquire memory_order, release memory_order, or both.
A fence with acquire semantics is called an acquire fence; a fence with
release semantics is called a release fence. The <a href="#atomic-fence-orders">overview of atomic and fence operations</a> section describes the memory orders
that result in acquire and release fences.</p>
</div>
<div class="paragraph">
<p>A global release fence <strong>A</strong> global-synchronizes-with a global acquire fence
<strong>B</strong> if there exist atomic operations <strong>X</strong> and <strong>Y</strong>, both operating on some
global atomic object <strong>M</strong>, such that <strong>A</strong> is sequenced-before <strong>X</strong>, <strong>X</strong>
modifies <strong>M</strong>, <strong>Y</strong> is sequenced-before <strong>B</strong>, <strong>Y</strong> reads the value written by
<strong>X</strong> or a value written by any side effect in the hypothetical release
sequence <strong>X</strong> would head if it were a release operation, and the scopes
of <strong>A</strong> and <strong>B</strong> are inclusive.
<a href="#iso-c11">[C11 standard, Section 7.17.4, paragraph 2, modified.]</a></p>
</div>
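<div class="paragraph">
<p>The fence-to-fence rule can be sketched with the same message-passing pattern, this time using relaxed atomic accesses bracketed by fences.
The sketch assumes host-side C11 atomics and POSIX threads, with hypothetical identifiers throughout.
In OpenCL C the corresponding fences would be issued with <strong>atomic_work_item_fence</strong> and would also carry flags and a scope, which this C11 sketch does not model.
The comments label the operations <strong>A</strong>, <strong>X</strong>, <strong>Y</strong> and <strong>B</strong> as in the rule above:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">#include &lt;assert.h&gt;
#include &lt;pthread.h&gt;
#include &lt;stdatomic.h&gt;
#include &lt;stddef.h&gt;

static int payload;       /* ordinary, non-atomic data */
static atomic_int flag;   /* plays the role of the atomic object M */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;
    atomic_thread_fence(memory_order_release);              /* A */
    atomic_store_explicit(&amp;flag, 1, memory_order_relaxed);  /* X */
    return NULL;
}

static void *consumer(void *arg)
{
    int *seen = arg;
    /* Y: relaxed load; spin until it reads the value written by X. */
    while (atomic_load_explicit(&amp;flag, memory_order_relaxed) == 0) {
    }
    atomic_thread_fence(memory_order_acquire);              /* B */
    /* A synchronizes-with B, so the write to payload is visible here. */
    *seen = payload;
    return NULL;
}

/* Hypothetical driver: returns the payload value seen by the consumer. */
int fence_message_passing_demo(void)
{
    pthread_t prod, cons;
    int seen = 0;
    int rc;

    payload = 0;
    atomic_store(&amp;flag, 0);

    rc = pthread_create(&amp;cons, NULL, consumer, &amp;seen);
    assert(rc == 0);
    rc = pthread_create(&amp;prod, NULL, producer, NULL);
    assert(rc == 0);
    (void)rc;

    pthread_join(prod, NULL);
    pthread_join(cons, NULL);
    return seen;
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>Under these assumptions <strong>fence_message_passing_demo</strong> returns 42.
Removing either fence would leave only relaxed operations, and the read of <strong>payload</strong> would then be a data race.</p>
</div>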
<div class="paragraph">
<p>A global release fence <strong>A</strong> global-synchronizes-with an atomic operation <strong>B</strong>
that performs an acquire operation on a global atomic object <strong>M</strong> if there
exists an atomic operation <strong>X</strong> such that <strong>A</strong> is sequenced-before <strong>X</strong>, <strong>X</strong>
modifies <strong>M</strong>, <strong>B</strong> reads the value written by <strong>X</strong> or a value written by any
side effect in the hypothetical release sequence <strong>X</strong> would head if it were a
release operation, and the scopes of <strong>A</strong> and <strong>B</strong> are inclusive.
<a href="#iso-c11">[C11 standard, Section 7.17.4, paragraph 3, modified.]</a></p>
</div>
<div class="paragraph">
<p>An atomic operation <strong>A</strong> that is a release operation on a global atomic
object <strong>M</strong> global-synchronizes-with a global acquire fence <strong>B</strong> if there
exists some atomic operation <strong>X</strong> on <strong>M</strong> such that <strong>X</strong> is sequenced-before
<strong>B</strong> and reads the value written by <strong>A</strong> or a value written by any side effect
in the release sequence headed by <strong>A</strong>, and the scopes of <strong>A</strong> and <strong>B</strong> are
inclusive.
<a href="#iso-c11">[C11 standard, Section 7.17.4, paragraph 4, modified.]</a></p>
</div>
<div class="paragraph">
<p>A local release fence <strong>A</strong> local-synchronizes-with a local acquire fence <strong>B</strong>
if there exist atomic operations <strong>X</strong> and <strong>Y</strong>, both operating on some local
atomic object <strong>M</strong>, such that <strong>A</strong> is sequenced-before <strong>X</strong>, <strong>X</strong> modifies <strong>M</strong>,
<strong>Y</strong> is sequenced-before <strong>B</strong>, and <strong>Y</strong> reads the value written by <strong>X</strong> or a
value written by any side effect in the hypothetical release sequence <strong>X</strong>
would head if it were a release operation, and the scopes of <strong>A</strong> and <strong>B</strong> are inclusive.
<a href="#iso-c11">[C11 standard, Section 7.17.4, paragraph 2, modified.]</a></p>
</div>
<div class="paragraph">
<p>A local release fence <strong>A</strong> local-synchronizes-with an atomic operation <strong>B</strong>
that performs an acquire operation on a local atomic object <strong>M</strong> if there
exists an atomic operation <strong>X</strong> such that <strong>A</strong> is sequenced-before <strong>X</strong>, <strong>X</strong>
modifies <strong>M</strong>, and <strong>B</strong> reads the value written by <strong>X</strong> or a value written by
any side effect in the hypothetical release sequence <strong>X</strong> would head if it
were a release operation, and the scopes of <strong>A</strong> and <strong>B</strong> are inclusive.
<a href="#iso-c11">[C11 standard, Section 7.17.4, paragraph 3, modified.]</a></p>
</div>
<div class="paragraph">
<p>An atomic operation <strong>A</strong> that is a release operation on a local atomic object
<strong>M</strong> local-synchronizes-with a local acquire fence <strong>B</strong> if there exists some
atomic operation <strong>X</strong> on <strong>M</strong> such that <strong>X</strong> is sequenced-before <strong>B</strong> and reads
the value written by <strong>A</strong> or a value written by any side effect in the
release sequence headed by <strong>A</strong>, and the scopes of <strong>A</strong> and <strong>B</strong> are inclusive.
<a href="#iso-c11">[C11 standard, Section 7.17.4, paragraph 4, modified.]</a></p>
</div>
<div class="paragraph">
<p>Let <strong>X</strong> and <strong>Y</strong> be two work item fences that each have both the
CLK_GLOBAL_MEM_FENCE and CLK_LOCAL_MEM_FENCE flags set.
<strong>X</strong> global-synchronizes-with <strong>Y</strong> and <strong>X</strong> local-synchronizes-with <strong>Y</strong> if the
conditions required for <strong>X</strong> to global-synchronize-with <strong>Y</strong> are met, the
conditions required for <strong>X</strong> to local-synchronize-with <strong>Y</strong> are met, or both
sets of conditions are met.</p>
</div>
</div>
<div class="sect4">
<h5 id="_memory_ordering_rules_work_group_functions">Memory Ordering Rules: Work-group Functions</h5>
<div class="paragraph">
<p>The OpenCL kernel execution model includes collective operations across the
work-items within a single work-group.
These are called work-group functions.
Besides the work-group barrier function, they include the scan, reduction
and pipe work-group functions described in the SPIR-V IL specifications.
We will first discuss the work-group barrier.
The other work-group functions are discussed afterwards.</p>
</div>
<div class="paragraph">
<p>The barrier function provides a mechanism for a kernel to synchronize the
work-items within a single work-group: informally, each work-item of the
work-group must execute the barrier before any are allowed to proceed.
It also orders memory operations to a specified combination of one or more
address spaces such as local memory or global memory, in a similar manner to
a fence.</p>
</div>
<div class="paragraph">
<p>To precisely specify the memory ordering semantics for barrier, we need to
distinguish between a dynamic and a static instance of the call to a
barrier.
A call to a barrier can appear in a loop, for example, and each execution of
the same static barrier call results in a new dynamic instance of the
barrier that will independently synchronize a work-group's work-items.</p>
</div>
<div class="paragraph">
<p>A work-item executing a dynamic instance of a barrier results in two
operations, both fences, that are called the entry and exit fences.
These fences obey all the rules for fences specified elsewhere in this
chapter as well as the following:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>The entry fence is a release fence with the same flags and scope as
requested for the barrier.</p>
</li>
<li>
<p>The exit fence is an acquire fence with the same flags and scope as
requested for the barrier.</p>
</li>
<li>
<p>For each work-item the entry fence is sequenced before the exit fence.</p>
</li>
<li>
<p>If the flags have CLK_GLOBAL_MEM_FENCE set then for each work-item the
entry fence global-synchronizes-with the exit fence of all other
work-items in the same work-group.</p>
</li>
<li>
<p>If the flags have CLK_LOCAL_MEM_FENCE set then for each work-item the
entry fence local-synchronizes-with the exit fence of all other
work-items in the same work-group.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>The other work-group functions include such functions as work_group_all()
and work_group_broadcast() and are described in the kernel language and IL
specifications.
The use of these work-group functions implies sequenced-before relationships
between statements within the execution of a single work-item in order to
satisfy data dependencies.
For example, a work item that provides a value to a work-group function must
behave as if it generates that value before beginning execution of that
work-group function.
Furthermore, the programmer must ensure that all work items in a work group
execute the same work-group function call site, or dynamic work-group
function instance.</p>
</div>
</div>
<div class="sect4">
<h5 id="_memory_ordering_rules_sub_group_functions">Memory Ordering Rules: Sub-group Functions</h5>
<div class="paragraph">
<p>The OpenCL kernel execution model includes collective operations across the
work-items within a single sub-group.
These are called sub-group functions.
Besides the sub-group barrier function, they include the scan, reduction and
pipe sub-group functions described in the SPIR-V IL specification.
We will first discuss the sub-group barrier.
The other sub-group functions are discussed afterwards.</p>
</div>
<div class="paragraph">
<p>The barrier function provides a mechanism for a kernel to synchronize the
work-items within a single sub-group: informally, each work-item of the
sub-group must execute the barrier before any are allowed to proceed.
It also orders memory operations to a specified combination of one or more
address spaces such as local memory or global memory, in a similar manner to
a fence.</p>
</div>
<div class="paragraph">
<p>To precisely specify the memory ordering semantics for barrier, we need to
distinguish between a dynamic and a static instance of the call to a
barrier.
A call to a barrier can appear in a loop, for example, and each execution of
the same static barrier call results in a new dynamic instance of the
barrier that will independently synchronize a sub-group's work-items.</p>
</div>
<div class="paragraph">
<p>A work-item executing a dynamic instance of a barrier results in two
operations, both fences, that are called the entry and exit fences.
These fences obey all the rules for fences specified elsewhere in this
chapter as well as the following:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>The entry fence is a release fence with the same flags and scope as
requested for the barrier.</p>
</li>
<li>
<p>The exit fence is an acquire fence with the same flags and scope as
requested for the barrier.</p>
</li>
<li>
<p>For each work-item the entry fence is sequenced before the exit fence.</p>
</li>
<li>
<p>If the flags have CLK_GLOBAL_MEM_FENCE set then for each work-item the
entry fence global-synchronizes-with the exit fence of all other
work-items in the same sub-group.</p>
</li>
<li>
<p>If the flags have CLK_LOCAL_MEM_FENCE set then for each work-item the
entry fence local-synchronizes-with the exit fence of all other
work-items in the same sub-group.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>The other sub-group functions include such functions as sub_group_all() and
sub_group_broadcast() and are described in the OpenCL kernel language
specifications.
The use of these sub-group functions implies sequenced-before relationships
between statements within the execution of a single work-item in order to
satisfy data dependencies.
For example, a work item that provides a value to a sub-group function must
behave as if it generates that value before beginning execution of that
sub-group function.
Furthermore, the programmer must ensure that all work items in a sub-group
execute the same sub-group function call site, or dynamic sub-group
function instance.</p>
</div>
</div>
<div class="sect4">
<h5 id="_memory_ordering_rules_host_side_and_device_side_commands">Memory Ordering Rules: Host-side and Device-side Commands</h5>
<div class="paragraph">
<p>This section describes how the OpenCL API functions associated with
command-queues contribute to happens-before relations.
There are two types of command queues and associated API functions in OpenCL
2.0; <em>host command-queues</em> and <em>device command-queues</em>.
The interaction of these command queues with the memory model is for the
most part equivalent.
In a few cases, a rule applies only to the host command-queue.
We will indicate these special cases by specifically denoting the host
command-queue in the memory ordering rule.
SVM memory consistency in such instances is implied only with respect to
synchronizing host commands.</p>
</div>
<div class="paragraph">
<p>Memory ordering rules in this section apply to all memory objects (buffers,
images and pipes) as well as to SVM allocations where no earlier, and more
fine-grained, rules apply.</p>
</div>
<div class="paragraph">
<p>In the remainder of this section, we assume that each command <strong>C</strong> enqueued
onto a command-queue has an associated event object <strong>E</strong> that signals its
execution status, regardless of whether <strong>E</strong> was returned to the unit of
execution that enqueued <strong>C</strong>.
We also distinguish between the API function call that enqueues a command
<strong>C</strong> and creates an event <strong>E</strong>, the execution of <strong>C</strong>, and the completion of
<strong>C</strong> (which marks the event <strong>E</strong> as complete).</p>
</div>
<div class="paragraph">
<p>The ordering and synchronization rules for API commands are defined as
follows:</p>
</div>
<div class="olist arabic">
<ol class="arabic">
<li>
<p>If an API function call <strong>X</strong> enqueues a command <strong>C</strong>, then <strong>X</strong>
global-synchronizes-with <strong>C</strong>.
For example, a host API function to enqueue a kernel
global-synchronizes-with the start of that kernel-instance's execution,
so that memory updates sequenced-before the enqueue kernel function call
will global-happen-before any kernel reads or writes to those same
memory locations.
For a device-side enqueue, global memory updates sequenced before <strong>X</strong>
happen-before <strong>C</strong> reads or writes to those memory locations only in the
case of fine-grained SVM.</p>
</li>
<li>
<p>If <strong>E</strong> is an event upon which a command <strong>C</strong> waits, then <strong>E</strong>
global-synchronizes-with <strong>C</strong>.
In particular, if <strong>C</strong> waits on an event <strong>E</strong> that is tracking the
execution status of the command <strong>C1</strong>, then memory operations done by
<strong>C1</strong> will global-happen-before memory operations done by <strong>C</strong>.
As an example, assume we have an OpenCL program using coarse-grain SVM
sharing that enqueues a kernel to a host command-queue to manipulate the
contents of a region of a buffer that the host thread then accesses
after the kernel completes.
To do this, the host thread can call <strong>clEnqueueMapBuffer</strong> to enqueue a
blocking-mode map command to map that buffer region, specifying that the
map command must wait on an event signaling the kernel's completion.
When <strong>clEnqueueMapBuffer</strong> returns, any memory operations performed by
the kernel to that buffer region will global-happen-before subsequent
memory operations made by the host thread.</p>
</li>
<li>
<p>If a command <strong>C</strong> has an event <strong>E</strong> that signals its completion, then <strong>C</strong>
global-synchronizes-with <strong>E</strong>.</p>
</li>
<li>
<p>For a command <strong>C</strong> enqueued to a host-side command queue, if <strong>C</strong> has an
event <strong>E</strong> that signals its completion, then <strong>E</strong> global-synchronizes-with
an API call <strong>X</strong> that waits on <strong>E</strong>.
For example, if a host thread or kernel-instance calls the
wait-for-events function on <strong>E</strong> (e.g. the <strong>clWaitForEvents</strong> function
called from a host thread), then <strong>E</strong> global-synchronizes-with that
wait-for-events function call.</p>
</li>
<li>
<p>If commands <strong>C</strong> and <strong>C1</strong> are enqueued in that sequence onto an in-order
command-queue, then the event (including the event implied between <strong>C</strong>
and <strong>C1</strong> due to the in-order queue) signaling <strong>C</strong>'s completion
global-synchronizes-with <strong>C1</strong>.
Note that in OpenCL 2.0, only a host command-queue can be configured as
an in-order queue.</p>
</li>
<li>
<p>If an API call enqueues a marker command <strong>C</strong> with an empty list of
events upon which <strong>C</strong> should wait, then the events of all commands
enqueued prior to <strong>C</strong> in the command-queue global-synchronize-with <strong>C</strong>.</p>
</li>
<li>
<p>If a host API call enqueues a command-queue barrier command <strong>C</strong> with an
empty list of events on which <strong>C</strong> should wait, then the events of all
commands enqueued prior to <strong>C</strong> in the command-queue
global-synchronize-with <strong>C</strong>.
In addition, the event signaling the completion of <strong>C</strong>
global-synchronizes-with all commands enqueued after <strong>C</strong> in the
command-queue.</p>
</li>
<li>
<p>If a host thread executes a <strong>clFinish</strong> call <strong>X</strong>, then the events of all
commands enqueued prior to <strong>X</strong> in the command-queue
global-synchronize-with <strong>X</strong>.</p>
</li>
<li>
<p>The start of a kernel-instance <strong>K</strong> global-synchronizes-with all
operations in the work items of <strong>K</strong>.
Note that this includes the execution of any atomic operations by the
work items in a program using fine-grain SVM.</p>
</li>
<li>
<p>All operations of all work items of a kernel-instance <strong>K</strong>
global-synchronize-with the event signaling the completion of <strong>K</strong>.
Note that this also includes the execution of any atomic operations by
the work items in a program using fine-grain SVM.</p>
</li>
<li>
<p>If a callback procedure <strong>P</strong> is registered on an event <strong>E</strong>, then <strong>E</strong>
global-synchronizes-with all operations of <strong>P</strong>.
Note that callback procedures are only defined for commands within host
command-queues.</p>
</li>
<li>
<p>If <strong>C</strong> is a command that waits for the completion of an event <strong>E</strong>, and API
function call <strong>X</strong> sets the status of the user event <strong>E</strong> to
CL_COMPLETE (for example, from a host thread using a
<strong>clSetUserEventStatus</strong> function), then <strong>X</strong> global-synchronizes-with <strong>C</strong>.</p>
</li>
<li>
<p>If a device enqueues a command <strong>C</strong> with the
CLK_ENQUEUE_FLAGS_WAIT_KERNEL flag, then the end state of the parent
kernel instance global-synchronizes-with <strong>C</strong>.</p>
</li>
<li>
<p>If a work-group enqueues a command <strong>C</strong> with the
CLK_ENQUEUE_FLAGS_WAIT_WORK_GROUP flag, then the end state of the
work-group global-synchronizes-with <strong>C</strong>.</p>
</li>
</ol>
</div>
<div class="paragraph">
<p>When using an out-of-order command queue, a wait on an event or a marker or
command-queue barrier command can be used to ensure the correct ordering of
dependent commands.
In those cases, the wait for the event or the marker or barrier command will
provide the necessary global-synchronizes-with relation.</p>
</div>
<div class="paragraph">
<p>In this situation:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>access to shared locations or disjoint locations in a single cl_mem
object when using atomic operations from different kernel instances
enqueued from the host such that one or more of the atomic operations is
a write is implementation-defined and correct behavior is not guaranteed
except at synchronization points.</p>
</li>
<li>
<p>access to shared locations or disjoint locations in a single cl_mem
object when using atomic operations from different kernel instances
consisting of a parent kernel and any number of child kernels enqueued
by that kernel is guaranteed under the memory ordering rules described
earlier in this section.</p>
</li>
<li>
<p>access to shared locations or disjoint locations in a single program
scope global variable, coarse-grained SVM allocation or fine-grained SVM
allocation when using atomic operations from different kernel instances
enqueued from the host to a single device is guaranteed under the memory
ordering rules described earlier in this section.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>If fine-grain SVM is used but without support for the OpenCL 2.0 atomic
operations, then the host and devices can concurrently read the same memory
locations and can concurrently update non-overlapping memory regions, but
attempts to update the same memory locations are undefined.
Memory consistency is guaranteed at the OpenCL synchronization points
without the need for calls to <strong>clEnqueueMapBuffer</strong> and
<strong>clEnqueueUnmapMemObject</strong>.
For fine-grained SVM buffers it is guaranteed that at synchronization points
only values written by the kernel will be updated.
No writes to fine-grained SVM buffers can be introduced that were not in the
original program.</p>
</div>
<div class="paragraph">
<p>In the remainder of this section, we discuss a few points regarding the
ordering rules for commands with a host command queue.</p>
</div>
<div class="paragraph">
<p>The OpenCL 1.2 standard describes a synchronization point as a
kernel-instance or host program location where the contents of memory
visible to different work-items or command-queue commands are the same.
It also says that waiting on an event and a command-queue barrier are
synchronization points between commands in command-queues.
Four of the rules listed above (2, 4, 7, and 8) cover these OpenCL
synchronization points.</p>
</div>
<div class="paragraph">
<p>A map operation (<strong>clEnqueueMapBuffer</strong> or <strong>clEnqueueMapImage</strong>) performed on a
non-SVM buffer or a coarse-grained SVM buffer is allowed to overwrite the
entire target region with the latest runtime view of the data as seen by the
command with which the map operation synchronizes, whether the values were
written by the executing kernels or not.
Any values that were changed within this region by another kernel or host
thread while the kernel synchronizing with the map operation was executing
may be overwritten by the map operation.</p>
</div>
<div class="paragraph">
<p>Access to non-SVM cl_mem buffers and coarse-grained SVM allocations is
ordered at synchronization points between host commands.
In the presence of an out-of-order command queue or a set of command queues
mapped to the same device, multiple kernel instances may execute
concurrently on the same device.</p>
</div>
</div>
</div>
</div>
<div class="sect2">
<h3 id="opencl-framework">3.4. The OpenCL Framework</h3>
<div class="paragraph">
<p>The OpenCL framework allows applications to use a host and one or more
OpenCL devices as a single heterogeneous parallel computer system.
The framework contains the following components:</p>
</div>
<div class="ulist">
<ul>
<li>
<p><strong>OpenCL Platform layer</strong>: The platform layer allows the host program to
discover OpenCL devices and their capabilities and to create contexts.</p>
</li>
<li>
<p><strong>OpenCL Runtime</strong>: The runtime allows the host program to manipulate
contexts once they have been created.</p>
</li>
<li>
<p><strong>OpenCL Compiler</strong>: The OpenCL compiler creates program executables that
contain OpenCL kernels.
SPIR-V intermediate language, OpenCL C, OpenCL C++, and OpenCL C
language versions from earlier OpenCL specifications are supported by
the compiler.
Other input languages may be supported by some implementations.</p>
</li>
</ul>
</div>
<div class="sect3">
<h4 id="_opencl_framework_mixed_version_support">3.4.1. OpenCL Framework: Mixed Version Support</h4>
<div class="paragraph">
<p>OpenCL supports devices with different capabilities under a single platform.
This includes devices which conform to different versions of the OpenCL
specification.
There are three version identifiers to consider for an OpenCL system: the
platform version, the version of a device, and the version(s) of the kernel
language or IL supported on a device.</p>
</div>
<div class="paragraph">
<p>The platform version indicates the version of the OpenCL runtime that is
supported.
This includes all of the APIs that the host can use to interact with
resources exposed by the OpenCL runtime, including contexts, memory objects,
devices, and command queues.</p>
</div>
<div class="paragraph">
<p>The device version is an indication of the device&#8217;s capabilities separate
from the runtime and compiler as represented by the device info returned by
<strong>clGetDeviceInfo</strong>.
Examples of attributes associated with the device version are resource
limits (e.g., minimum size of local memory per compute unit) and extended
functionality (e.g., list of supported KHR extensions).
The version returned corresponds to the highest version of the OpenCL
specification for which the device is conformant, but is not higher than the
platform version.</p>
</div>
<div class="paragraph">
<p>The language version for a device represents the OpenCL programming language
features a developer can assume are supported on a given device.
The version reported is the highest version of the language supported.</p>
</div>
<div class="paragraph">
<p>Backwards compatibility is an important goal for the OpenCL standard.
Backwards compatibility is expected such that a device will consume earlier
versions of the SPIR-V and OpenCL C programming languages with the following
minimum requirements:</p>
</div>
<div class="olist arabic">
<ol class="arabic">
<li>
<p>An OpenCL 1.x device must support at least one 1.x version of the OpenCL
C programming language.</p>
</li>
<li>
<p>An OpenCL 2.0 device must support all the requirements of an OpenCL 1.x
device in addition to the OpenCL C 2.0 programming language.
If multiple language versions are supported, the compiler defaults to
using the highest OpenCL 1.x language version supported for the device
(typically OpenCL 1.2).
To utilize the OpenCL 2.0 Kernel programming language, a programmer must
specifically set the appropriate compiler flag (-cl-std=CL2.0).
The language version must not be higher than the platform version, but
may exceed the <a href="#opencl-c-version">device version</a>.</p>
</li>
<li>
<p>An OpenCL 2.1 device must support all the requirements of an OpenCL 2.0
device in addition to the SPIR-V intermediate language at version 1.0 or
above.
Intermediate language versioning is encoded as part of the binary object
and no flags are required to be passed to the compiler.</p>
</li>
<li>
<p>An OpenCL 2.2 device must support all the requirements of an OpenCL 2.0
device in addition to the SPIR-V intermediate language at version 1.2 or
above.
Intermediate language is encoded as a part of the binary object and no
flags are required to be passed to the compiler.</p>
</li>
</ol>
</div>
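<div class="paragraph">
<p>The version relationships described in this section can be summarized in a
small check.
The following is a minimal sketch rather than part of the OpenCL API: the
<code>sketch_version</code> type and both helper functions are hypothetical names
introduced only for illustration, and the caller is assumed to have already
extracted the major and minor numbers from the relevant version strings.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">/* Hypothetical helper type for this sketch; not defined by OpenCL. */
typedef struct {
    int major;
    int minor;
} sketch_version;

/* Returns 1 if version a is lower than or equal to version b. */
int sketch_version_le(sketch_version a, sketch_version b)
{
    if (a.major != b.major)
        return a.major &lt; b.major;
    return a.minor &lt;= b.minor;
}

/* Checks the two constraints stated in the text: neither the device version
   nor the language version may be higher than the platform version.
   The language version is deliberately not compared with the device version,
   because the text allows it to exceed the device version. */
int sketch_versions_consistent(sketch_version platform,
                               sketch_version device,
                               sketch_version language)
{
    return sketch_version_le(device, platform) &amp;&amp;
           sketch_version_le(language, platform);
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>Under these assumptions, a 2.2 platform with a 1.2 device and a 2.0 language
version passes the check, while a 1.2 platform with a 2.0 device does not.</p>
</div>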
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="opencl-platform-layer">4. The OpenCL Platform Layer</h2>
<div class="sectionbody">
<div class="paragraph">
<p>This section describes the OpenCL platform layer which implements
platform-specific features that allow applications to query OpenCL devices,
device configuration information, and to create OpenCL contexts using one or
more devices.</p>
</div>
<div class="sect2">
<h3 id="_querying_platform_info">4.1. Querying Platform Info</h3>
<div class="paragraph">
<p>The list of platforms available can be obtained using the following
function.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clGetPlatformIDs(cl_uint num_entries,
cl_platform_id *platforms,
cl_uint *num_platforms)</code></pre>
</div>
</div>
<div class="paragraph">
<p><em>num_entries</em> is the number of cl_platform_id entries that can be added to
<em>platforms</em>.
If <em>platforms</em> is not <code>NULL</code>, <em>num_entries</em> must be greater than zero.</p>
</div>
<div class="paragraph">
<p><em>platforms</em> returns a list of OpenCL platforms found.
The cl_platform_id values returned in <em>platforms</em> can be used to identify a
specific OpenCL platform.
If the <em>platforms</em> argument is <code>NULL</code>, this argument is ignored.
The number of OpenCL platforms returned is the minimum of the value
specified by <em>num_entries</em> or the number of OpenCL platforms available.</p>
</div>
<div class="paragraph">
<p><em>num_platforms</em> returns the number of OpenCL platforms available.
If <em>num_platforms</em> is <code>NULL</code>, this argument is ignored.</p>
</div>
<div class="paragraph">
<p><strong>clGetPlatformIDs</strong> returns CL_SUCCESS if the function is executed
successfully.
Otherwise, it returns one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_VALUE if <em>num_entries</em> is equal to zero and <em>platforms</em> is
not <code>NULL</code> or if both <em>num_platforms</em> and <em>platforms</em> are <code>NULL</code>.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
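<div class="paragraph">
<p>The argument rules above suggest a two-call pattern: a first call with
<em>platforms</em> set to <code>NULL</code> to obtain the count, then a second call with a
suitably sized array.
The sketch below models only that argument contract in plain C so that it
can be read without an OpenCL installation.
The function <code>sketch_get_ids</code>, its <code>available</code> parameter, the integer
stand-in handles, and the <code>SKETCH_</code> status names are assumptions made for
illustration and are not part of the OpenCL API.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">#include &lt;stddef.h&gt;

/* Stand-in status codes for this sketch; real code uses the names from CL/cl.h. */
enum { SKETCH_SUCCESS = 0, SKETCH_INVALID_VALUE = -30 };

/* Models the argument contract only. "available" plays the role of the number
   of platforms the implementation knows about; ids receives 1, 2, 3, ... as
   stand-in handles. */
int sketch_get_ids(unsigned available, unsigned num_entries,
                   int *ids, unsigned *num_ids)
{
    /* Invalid: a list was supplied with room for zero entries,
       or neither output argument was supplied. */
    if ((num_entries == 0 &amp;&amp; ids != NULL) || (ids == NULL &amp;&amp; num_ids == NULL))
        return SKETCH_INVALID_VALUE;

    if (ids != NULL) {
        /* The number returned is the minimum of num_entries and available. */
        unsigned n = num_entries &lt; available ? num_entries : available;
        for (unsigned i = 0; i &lt; n; ++i)
            ids[i] = (int)i + 1;
    }
    if (num_ids != NULL)
        *num_ids = available;   /* always the full count, not the copied count */
    return SKETCH_SUCCESS;
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>A caller would typically pass <code>NULL</code> for the list first to learn the count,
allocate that many entries, and then call again with the allocated array.</p>
</div>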
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clGetPlatformInfo(cl_platform_id platform,
cl_platform_info param_name,
size_t param_value_size,
<span class="directive">void</span> *param_value,
size_t *param_value_size_ret)</code></pre>
</div>
</div>
<div class="paragraph">
<p>gets specific information about the OpenCL platform.
The information that can be queried using <strong>clGetPlatformInfo</strong> is specified
in the <a href="#platform-queries-table">Platform Queries</a> table.</p>
</div>
<div class="paragraph">
<p><em>platform</em> refers to the platform ID returned by <strong>clGetPlatformIDs</strong> or can
be <code>NULL</code>.
If <em>platform</em> is <code>NULL</code>, the behavior is implementation-defined.</p>
</div>
<div class="paragraph">
<p><em>param_name</em> is an enumeration constant that identifies the platform
information being queried.
It can be one of the following values as specified in the
<a href="#platform-queries-table">Platform Queries</a> table.</p>
</div>
<div class="paragraph">
<p><em>param_value</em> is a pointer to the memory location where appropriate values for a
given <em>param_name</em>, as specified in the <a href="#platform-queries-table">Platform
Queries</a> table, will be returned.
If <em>param_value</em> is <code>NULL</code>, it is ignored.</p>
</div>
<div class="paragraph">
<p><em>param_value_size</em> specifies the size in bytes of memory pointed to by
<em>param_value</em>.
This size in bytes must be ≥ size of return type specified in the
<a href="#platform-queries-table">Platform Queries</a> table.</p>
</div>
<div class="paragraph">
<p><em>param_value_size_ret</em> returns the actual size in bytes of data being
queried by <em>param_name</em>.
If <em>param_value_size_ret</em> is <code>NULL</code>, it is ignored.</p>
</div>
<table id="platform-queries-table" class="tableblock frame-all grid-all spread">
<caption class="title">Table 3. OpenCL Platform Queries</caption>
<colgroup>
<col style="width: 50%;">
<col style="width: 10%;">
<col style="width: 40%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top"><strong>cl_platform_info</strong></th>
<th class="tableblock halign-left valign-top">Return Type</th>
<th class="tableblock halign-left valign-top">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_PLATFORM_PROFILE</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">char[]<sup>1</sup></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">OpenCL profile string.
Returns the profile name supported by the implementation.
The profile name returned can be one of the following strings:
</p><p class="tableblock"> FULL_PROFILE - if the implementation supports the OpenCL
specification (functionality defined as part of the core
specification and does not require any extensions to be supported).
</p><p class="tableblock"> EMBEDDED_PROFILE - if the implementation supports the OpenCL
embedded profile.
The embedded profile is defined to be a subset for each version of
OpenCL.
The embedded profile for OpenCL 2.2 is described in
<a href="#opencl-embedded-profile">OpenCL Embedded Profile</a>.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_PLATFORM_VERSION</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">char[]</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">OpenCL version string.
Returns the OpenCL version supported by the implementation.
This version string has the following format:
</p><p class="tableblock"> <em>OpenCL&lt;space&gt;&lt;major_version.minor_version&gt;&lt;space&gt;&lt;platform-specific
information&gt;</em>
</p><p class="tableblock"> The <em>major_version.minor_version</em> value returned will be 2.2.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_PLATFORM_NAME</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">char[]</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Platform name string.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_PLATFORM_VENDOR</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">char[]</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Platform vendor string.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_PLATFORM_EXTENSIONS</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">char[]</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Returns a space separated list of extension names (the extension
names themselves do not contain any spaces) supported by the
platform.
Each extension that is supported by all devices associated with this
platform must be reported here.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_PLATFORM_HOST_TIMER_RESOLUTION</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_ulong</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the resolution of the host timer in nanoseconds as used by
<strong>clGetDeviceAndHostTimer</strong>.</p></td>
</tr>
</tbody>
</table>
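<div class="paragraph">
<p>The CL_PLATFORM_VERSION string format shown in the table lends itself to a
simple parse.
The following is a minimal sketch, assuming the caller has already retrieved
the version string; <code>parse_cl_version</code> is a hypothetical helper name and not
an OpenCL function, and the parse is lenient about the amount of whitespace
after the <code>OpenCL</code> prefix.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">#include &lt;stdio.h&gt;

/* Extracts major and minor from a string of the form
   "OpenCL&lt;space&gt;&lt;major_version.minor_version&gt;&lt;space&gt;&lt;platform-specific information&gt;".
   Returns 1 when both numbers were read, 0 otherwise.
   The platform-specific tail is ignored. */
int parse_cl_version(const char *s, int *major, int *minor)
{
    return sscanf(s, "OpenCL %d.%d", major, minor) == 2;
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>The helper leaves <em>major</em> and <em>minor</em> untouched when the prefix does not
match, so callers may want to initialize them before the call.</p>
</div>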
<div class="paragraph">
<p><strong>clGetPlatformInfo</strong> returns CL_SUCCESS if the function is executed
successfully.
Otherwise, it returns one of the following errors<sup>2</sup>.</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_PLATFORM if <em>platform</em> is not a valid platform.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>param_name</em> is not one of the supported values or
if size in bytes specified by <em>param_value_size</em> is &lt; size of return
type as specified in the <a href="#platform-queries-table">OpenCL Platform
Queries</a> table, and <em>param_value</em> is not a <code>NULL</code> value.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
<div class="dlist">
<dl>
<dt class="hdlist1">1</dt>
<dd>
<p>A null terminated string is returned by OpenCL query function calls if
the return type of the information being queried is a char[].</p>
</dd>
<dt class="hdlist1">2</dt>
<dd>
<p>The OpenCL specification does not describe the order of precedence for
error codes returned by API calls.</p>
</dd>
</dl>
</div>
</li>
</ul>
</div>
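<div class="paragraph">
<p>Because CL_PLATFORM_EXTENSIONS is returned as a space separated list, a plain
substring search can report a false match when one extension name is a
prefix of another.
The following sketch matches whole names only.
It assumes the caller has already retrieved the extension string;
<code>has_extension</code> is a hypothetical helper name and not an OpenCL function.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">#include &lt;string.h&gt;

/* Returns 1 if name appears in list as a complete space-delimited token,
   0 otherwise. name is expected to contain no spaces. */
int has_extension(const char *list, const char *name)
{
    size_t len = strlen(name);
    const char *p = list;

    if (len == 0)
        return 0;

    while ((p = strstr(p, name)) != NULL) {
        /* A match counts only if it is bounded by spaces or the string ends. */
        int starts = (p == list) || (p[-1] == ' ');
        int ends = (p[len] == '\0') || (p[len] == ' ');
        if (starts &amp;&amp; ends)
            return 1;
        p += len;   /* skip this partial match and keep searching */
    }
    return 0;
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>For example, searching the list <code>cl_khr_icd cl_khr_fp64</code> for <code>cl_khr_fp</code>
finds no match, because the candidate is followed by <code>6</code> rather than a space
or the end of the string.</p>
</div>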
</div>
<div class="sect2">
<h3 id="platform-querying-devices">4.2. Querying Devices</h3>
<div class="paragraph">
<p>The list of devices available on a platform can be obtained using the
following function<sup>3</sup>.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clGetDeviceIDs(cl_platform_id platform,
cl_device_type device_type,
cl_uint num_entries,
cl_device_id * devices,
cl_uint *num_devices)</code></pre>
</div>
</div>
<div class="paragraph">
<p><em>platform</em> refers to the platform ID returned by <strong>clGetPlatformIDs</strong> or can
be <code>NULL</code>.
If <em>platform</em> is <code>NULL</code>, the behavior is implementation-defined.</p>
</div>
<div class="paragraph">
<p><em>device_type</em> is a bitfield that identifies the type of OpenCL device.
The <em>device_type</em> can be used to query specific OpenCL devices or all OpenCL
devices available.
The valid values for <em>device_type</em> are specified in the
<a href="#device-categories-table">Device Categories</a> table.</p>
</div>
<div class="dlist">
<dl>
<dt class="hdlist1">3</dt>
<dd>
<p><strong>clGetDeviceIDs</strong> may return all or a subset of the actual physical
devices present in the platform and that match <em>device_type</em>.</p>
</dd>
</dl>
</div>
<table id="device-categories-table" class="tableblock frame-all grid-all spread">
<caption class="title">Table 4. List of OpenCL Device Categories</caption>
<colgroup>
<col style="width: 50%;">
<col style="width: 50%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top"><strong>cl_device_type</strong></th>
<th class="tableblock halign-left valign-top">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_DEVICE_TYPE_CPU</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">An OpenCL device that is the host processor.
The host processor runs the OpenCL implementations and is a single or
multi-core CPU.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_DEVICE_TYPE_GPU</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">An OpenCL device that is a GPU.
By this we mean that the device can also be used to accelerate a 3D API
such as OpenGL or DirectX.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_DEVICE_TYPE_ACCELERATOR</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Dedicated OpenCL accelerators (for example the IBM CELL Blade).
These devices communicate with the host processor using a peripheral
interconnect such as PCIe.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_DEVICE_TYPE_CUSTOM</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Dedicated accelerators that do not support programs written in an OpenCL
kernel language.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_DEVICE_TYPE_DEFAULT</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">The default OpenCL device in the system.
The default device cannot be a <strong>CL_DEVICE_TYPE_CUSTOM</strong> device.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_DEVICE_TYPE_ALL</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">All OpenCL devices available in the system except
<strong>CL_DEVICE_TYPE_CUSTOM</strong> devices.</p></td>
</tr>
</tbody>
</table>
<div class="paragraph">
<p><em>num_entries</em> is the number of cl_device_id entries that can be added to
<em>devices</em>.
If <em>devices</em> is not <code>NULL</code>, <em>num_entries</em> must be greater than zero.</p>
</div>
<div class="paragraph">
<p><em>devices</em> returns a list of OpenCL devices found.
The cl_device_id values returned in <em>devices</em> can be used to identify a
specific OpenCL device.
If the <em>devices</em> argument is <code>NULL</code>, this argument is ignored.
The number of OpenCL devices returned is the minimum of the value specified
by <em>num_entries</em> or the number of OpenCL devices whose type matches
<em>device_type</em>.</p>
</div>
<div class="paragraph">
<p><em>num_devices</em> returns the number of OpenCL devices available that match
<em>device_type</em>.
If <em>num_devices</em> is <code>NULL</code>, this argument is ignored.</p>
</div>
<div class="paragraph">
<p><strong>clGetDeviceIDs</strong> returns CL_SUCCESS if the function is executed
successfully.
Otherwise, it returns one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_PLATFORM if <em>platform</em> is not a valid platform.</p>
</li>
<li>
<p>CL_INVALID_DEVICE_TYPE if <em>device_type</em> is not a valid value.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>num_entries</em> is equal to zero and <em>devices</em> is not
<code>NULL</code> or if both <em>num_devices</em> and <em>devices</em> are <code>NULL</code>.</p>
</li>
<li>
<p>CL_DEVICE_NOT_FOUND if no OpenCL devices that matched <em>device_type</em> were
found.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>The application can query specific capabilities of the OpenCL device(s)
returned by <strong>clGetDeviceIDs</strong>.
This can be used by the application to determine which device(s) to use.</p>
</div>
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clGetDeviceInfo(cl_device_id device,
cl_device_info param_name,
size_t param_value_size,
<span class="directive">void</span> *param_value,
size_t *param_value_size_ret)</code></pre>
</div>
</div>
<div class="paragraph">
<p>gets specific information about an OpenCL device.</p>
</div>
<div class="paragraph">
<p><em>device</em> may be a device returned by <strong>clGetDeviceIDs</strong> or a sub-device
created by <strong>clCreateSubDevices</strong>.
If <em>device</em> is a sub-device, the specific information for the sub-device
will be returned.
The information that can be queried using <strong>clGetDeviceInfo</strong> is specified in
the <a href="#device-queries-table">Device Queries</a> table.</p>
</div>
<div class="paragraph">
<p><em>param_name</em> is an enumeration constant that identifies the device
information being queried.
It can be one of the following values as specified in the
<a href="#device-queries-table">Device Queries</a> table.</p>
</div>
<div class="paragraph">
<p><em>param_value</em> is a pointer to the memory location where appropriate values for a
given <em>param_name</em>, as specified in the <a href="#device-queries-table">Device
Queries</a> table, will be returned.
If <em>param_value</em> is <code>NULL</code>, it is ignored.</p>
</div>
<div class="paragraph">
<p><em>param_value_size</em> specifies the size in bytes of memory pointed to by
<em>param_value</em>.
This size in bytes must be ≥ size of return type specified in the
<a href="#device-queries-table">Device Queries</a> table.</p>
</div>
<div class="paragraph">
<p><em>param_value_size_ret</em> returns the actual size in bytes of data being
queried by <em>param_name</em>.
If <em>param_value_size_ret</em> is <code>NULL</code>, it is ignored.</p>
</div>
<table id="device-queries-table" class="tableblock frame-all grid-all spread">
<caption class="title">Table 5. OpenCL Device Queries</caption>
<colgroup>
<col style="width: 30%;">
<col style="width: 20%;">
<col style="width: 50%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top"><strong>cl_device_info</strong></th>
<th class="tableblock halign-left valign-top">Return Type</th>
<th class="tableblock halign-left valign-top">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_TYPE</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_device_type</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">The OpenCL device type.
Currently supported values are:
</p><p class="tableblock"> CL_DEVICE_TYPE_CPU, CL_DEVICE_TYPE_GPU, CL_DEVICE_TYPE_ACCELERATOR,
CL_DEVICE_TYPE_DEFAULT, a combination of the above types or
CL_DEVICE_TYPE_CUSTOM.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_VENDOR_ID</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">A unique device vendor identifier.
An example of a unique device identifier could be the PCIe ID.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_MAX_COMPUTE_UNITS</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">The number of parallel compute units on the OpenCL device.
A work-group executes on a single compute unit.
The minimum value is 1.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Maximum dimensions that specify the global and local work-item IDs
used by the data parallel execution model. (Refer to
<strong>clEnqueueNDRangeKernel</strong>).
The minimum value is 3 for devices that are not of type
CL_DEVICE_TYPE_CUSTOM.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_MAX_WORK_ITEM_SIZES</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">size_t []</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Maximum number of work-items that can be specified in each dimension
of the work-group to <strong>clEnqueueNDRangeKernel</strong>.
</p><p class="tableblock"> Returns <em>n</em> size_t entries, where <em>n</em> is the value returned by the
query for CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS.
</p><p class="tableblock"> The minimum value is (1, 1, 1) for devices that are not of type
CL_DEVICE_TYPE_CUSTOM.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_MAX_WORK_GROUP_SIZE</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">size_t</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Maximum number of work-items in a work-group that a device is
capable of executing on a single compute unit, for any given
kernel-instance running on the device. (Refer also to
<strong>clEnqueueNDRangeKernel</strong> and CL_KERNEL_WORK_GROUP_SIZE).
The minimum value is 1.
The returned value is an upper limit and will not necessarily
maximize performance.
This maximum may be larger than supported by a specific kernel
(refer to the CL_KERNEL_WORK_GROUP_SIZE query of <strong>clGetKernelWorkGroupInfo</strong>).</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR <br>
CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT <br>
CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT <br>
CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG <br>
CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT <br>
CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE<br>
CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Preferred native vector width size for built-in scalar types that
can be put into vectors.
The vector width is defined as the number of scalar elements that
can be stored in the vector.
</p><p class="tableblock"> If double precision is not supported,
CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE must return 0.
</p><p class="tableblock"> If the <strong>cl_khr_fp16</strong> extension is not supported,
CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF must return 0.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR <br>
CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT <br>
CL_DEVICE_NATIVE_VECTOR_WIDTH_INT <br>
CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG <br>
CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT <br>
CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE<br>
CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the native ISA vector width.
The vector width is defined as the number of scalar elements that
can be stored in the vector.
</p><p class="tableblock"> If double precision is not supported,
CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE must return 0.
</p><p class="tableblock"> If the <strong>cl_khr_fp16</strong> extension is not supported,
CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF must return 0.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_MAX_CLOCK_FREQUENCY</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Clock frequency of the device in MHz.
The meaning of this value is implementation-defined.
For devices with multiple clock domains, the clock frequency for any
of the clock domains may be returned.
For devices that dynamically change frequency for power or thermal
reasons, the returned clock frequency may be any valid frequency.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_ADDRESS_BITS</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">The default compute device address space size of the global address
space specified as an unsigned integer value in bits.
Currently supported values are 32 or 64 bits.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_MAX_MEM_ALLOC_SIZE</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_ulong</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Max size of memory object allocation in bytes.
The minimum value is max(min(1024 × 1024 × 1024, 1/4<sup>th</sup>
of CL_DEVICE_GLOBAL_MEM_SIZE), 32 × 1024 × 1024) for
devices that are not of type CL_DEVICE_TYPE_CUSTOM.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_IMAGE_SUPPORT</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_bool</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Is CL_TRUE if images are supported by the OpenCL device and CL_FALSE
otherwise.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_MAX_READ_IMAGE_ARGS<sup>4</sup></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Max number of image object arguments of a kernel declared with the
read_only qualifier.
The minimum value is 128 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_MAX_WRITE_IMAGE_ARGS</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Max number of image object arguments of a kernel declared with the
write_only qualifier.
The minimum value is 64 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_MAX_READ_WRITE_IMAGE_ARGS<sup>5</sup></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Max number of image object arguments of a kernel declared with the
write_only or read_write qualifier.
The minimum value is 64 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_IL_VERSION</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">char[]</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">The intermediate languages that can be supported by
<strong>clCreateProgramWithIL</strong> for this device.
Returns a space-separated list of IL version strings of the form
&lt;IL_Prefix&gt;_&lt;Major_Version&gt;.&lt;Minor_Version&gt;.
For OpenCL 2.2, SPIR-V is a required IL prefix.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_IMAGE2D_MAX_WIDTH</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">size_t</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Max width of 2D image or 1D image not created from a buffer object
in pixels.
</p><p class="tableblock"> The minimum value is 16384 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_IMAGE2D_MAX_HEIGHT</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">size_t</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Max height of 2D image in pixels.
</p><p class="tableblock"> The minimum value is 16384 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_IMAGE3D_MAX_WIDTH</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">size_t</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Max width of 3D image in pixels.
</p><p class="tableblock"> The minimum value is 2048 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_IMAGE3D_MAX_HEIGHT</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">size_t</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Max height of 3D image in pixels.
</p><p class="tableblock"> The minimum value is 2048 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_IMAGE3D_MAX_DEPTH</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">size_t</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Max depth of 3D image in pixels.
</p><p class="tableblock"> The minimum value is 2048 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_IMAGE_MAX_BUFFER_SIZE</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">size_t</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Max number of pixels for a 1D image created from a buffer object.
</p><p class="tableblock"> The minimum value is 65536 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_IMAGE_MAX_ARRAY_SIZE</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">size_t</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Max number of images in a 1D or 2D image array.
</p><p class="tableblock"> The minimum value is 2048 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_MAX_SAMPLERS</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Maximum number of samplers that can be used in a kernel.
</p><p class="tableblock"> The minimum value is 16 if CL_DEVICE_IMAGE_SUPPORT is CL_TRUE.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_IMAGE_PITCH_ALIGNMENT</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">The row pitch alignment size in pixels for 2D images created from a
buffer.
The value returned must be a power of 2.
</p><p class="tableblock"> If the device does not support images, this value must be 0.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">This query should be used when a 2D image is created from a buffer
which was created using CL_MEM_USE_HOST_PTR.
The value returned must be a power of 2.
</p><p class="tableblock"> This query specifies the minimum alignment in pixels of the host_ptr
specified to <strong>clCreateBuffer</strong>.
</p><p class="tableblock"> If the device does not support images, this value must be 0.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_MAX_PIPE_ARGS</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">The maximum number of pipe objects that can be passed as arguments
to a kernel.
The minimum value is 16.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_PIPE_MAX_ACTIVE_RESERVATIONS</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">The maximum number of reservations that can be active for a pipe per
work-item in a kernel.
A work-group reservation is counted as one reservation per
work-item.
The minimum value is 1.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_PIPE_MAX_PACKET_SIZE</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">The maximum size of a pipe packet in bytes.
The minimum value is 1024 bytes.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_MAX_PARAMETER_SIZE</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">size_t</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Max size in bytes of all arguments that can be passed to a kernel.
</p><p class="tableblock"> The minimum value is 1024 for devices that are not of type
CL_DEVICE_TYPE_CUSTOM.
For this minimum value, only a maximum of 128 arguments can be
passed to a kernel.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_MEM_BASE_ADDR_ALIGN</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Alignment requirement (in bits) for sub-buffer offsets.
The minimum value is the size (in bits) of the largest OpenCL
built-in data type supported by the device (long16 in FULL profile,
long16 or int16 in EMBEDDED profile) for devices that are not of
type CL_DEVICE_TYPE_CUSTOM.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_SINGLE_FP_CONFIG<sup>6</sup></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_device_fp_config</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Describes single precision floating-point capability of the device.
This is a bit-field that describes one or more of the following
values:
</p><p class="tableblock"> CL_FP_DENORM - denorms are supported
</p><p class="tableblock"> CL_FP_INF_NAN - INF and quiet NaNs are supported.
</p><p class="tableblock"> CL_FP_ROUND_TO_NEAREST - round to nearest even rounding mode
supported
</p><p class="tableblock"> CL_FP_ROUND_TO_ZERO - round to zero rounding mode supported
</p><p class="tableblock"> CL_FP_ROUND_TO_INF - round to positive and negative infinity
rounding modes supported
</p><p class="tableblock"> CL_FP_FMA - IEEE754-2008 fused multiply-add is supported.
</p><p class="tableblock"> CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT - divide and sqrt are correctly
rounded as defined by the IEEE754 specification.
</p><p class="tableblock"> CL_FP_SOFT_FLOAT - Basic floating-point operations (such as
addition, subtraction, multiplication) are implemented in software.
</p><p class="tableblock"> For the full profile, the mandated minimum floating-point capability
for devices that are not of type CL_DEVICE_TYPE_CUSTOM is:
CL_FP_ROUND_TO_NEAREST | CL_FP_INF_NAN.
</p><p class="tableblock"> For the embedded profile, see section 10.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_DOUBLE_FP_CONFIG<sup>7</sup></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_device_fp_config</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Describes double precision floating-point capability of the OpenCL
device.
This is a bit-field that describes one or more of the following
values:
</p><p class="tableblock"> CL_FP_DENORM - denorms are supported
</p><p class="tableblock"> CL_FP_INF_NAN - INF and NaNs are supported.
</p><p class="tableblock"> CL_FP_ROUND_TO_NEAREST - round to nearest even rounding mode
supported.
</p><p class="tableblock"> CL_FP_ROUND_TO_ZERO - round to zero rounding mode supported.
</p><p class="tableblock"> CL_FP_ROUND_TO_INF - round to positive and negative infinity
rounding modes supported.
</p><p class="tableblock"> CL_FP_FMA - IEEE754-2008 fused multiply-add is supported.
</p><p class="tableblock"> CL_FP_SOFT_FLOAT - Basic floating-point operations (such as
addition, subtraction, multiplication) are implemented in software.
</p><p class="tableblock"> Double precision is an optional feature so the mandated minimum
double precision floating-point capability is 0.
</p><p class="tableblock"> If double precision is supported by the device, then the minimum
double precision floating-point capability must be:<br>
CL_FP_FMA |<br>
CL_FP_ROUND_TO_NEAREST |<br>
CL_FP_INF_NAN |<br>
CL_FP_DENORM.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_GLOBAL_MEM_CACHE_TYPE</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_device_mem_cache_type</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Type of global memory cache supported.
Valid values are: CL_NONE, CL_READ_ONLY_CACHE and
CL_READ_WRITE_CACHE.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Size of global memory cache line in bytes.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_GLOBAL_MEM_CACHE_SIZE</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_ulong</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Size of global memory cache in bytes.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_GLOBAL_MEM_SIZE</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_ulong</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Size of global device memory in bytes.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_ulong</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Max size in bytes of a constant buffer allocation.
The minimum value is 64 KB for devices that are not of type
CL_DEVICE_TYPE_CUSTOM.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_MAX_CONSTANT_ARGS</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Max number of arguments declared with the <code>__constant</code> qualifier
in a kernel.
The minimum value is 8 for devices that are not of type
CL_DEVICE_TYPE_CUSTOM.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_MAX_GLOBAL_VARIABLE_SIZE</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">size_t</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">The maximum number of bytes of storage that may be allocated for any
single variable in program scope or inside a function in an OpenCL
kernel language declared in the global address space.
</p><p class="tableblock"> The minimum value is 64 KB.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_GLOBAL_VARIABLE_PREFERRED_TOTAL_SIZE</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">size_t</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Maximum preferred total size, in bytes, of all program variables in
the global address space.
This is a performance hint.
An implementation may place such variables in storage with optimized
device access.
This query returns the capacity of such storage.
The minimum value is 0.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_LOCAL_MEM_TYPE</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_device_local_mem_type</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Type of local memory supported.
This can be set to CL_LOCAL implying dedicated local memory storage
such as SRAM, or CL_GLOBAL.
</p><p class="tableblock"> For custom devices, CL_NONE can also be returned indicating no local
memory support.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_LOCAL_MEM_SIZE</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_ulong</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Size of local memory region in bytes.
The minimum value is 32 KB for devices that are not of type
CL_DEVICE_TYPE_CUSTOM.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_ERROR_CORRECTION_SUPPORT</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_bool</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Is CL_TRUE if the device implements error correction for all
accesses to compute device memory (global and constant).
Is CL_FALSE if the device does not implement such error correction.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_PROFILING_TIMER_RESOLUTION</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">size_t</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Describes the resolution of the device timer.
This is measured in nanoseconds.
Refer to <a href="#profiling-operations">Profiling Operations</a> for details.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_ENDIAN_LITTLE</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_bool</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Is CL_TRUE if the OpenCL device is a little endian device and
CL_FALSE otherwise.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_AVAILABLE</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_bool</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Is CL_TRUE if the device is available and CL_FALSE otherwise.
A device is considered to be available if the device can be expected
to successfully execute commands enqueued to the device.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_COMPILER_AVAILABLE</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_bool</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Is CL_FALSE if the implementation does not have a compiler available
to compile the program source.
</p><p class="tableblock"> Is CL_TRUE if the compiler is available.
This can be CL_FALSE for the embedded platform profile only.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_LINKER_AVAILABLE</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_bool</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Is CL_FALSE if the implementation does not have a linker available.
Is CL_TRUE if the linker is available.
</p><p class="tableblock"> This can be CL_FALSE for the embedded platform profile only.
</p><p class="tableblock"> This must be CL_TRUE if CL_DEVICE_COMPILER_AVAILABLE is CL_TRUE.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_EXECUTION_CAPABILITIES</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_device_exec_capabilities</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Describes the execution capabilities of the device.
This is a bit-field that describes one or more of the following
values:
</p><p class="tableblock"> CL_EXEC_KERNEL - The OpenCL device can execute OpenCL kernels.
</p><p class="tableblock"> CL_EXEC_NATIVE_KERNEL - The OpenCL device can execute native
kernels.
</p><p class="tableblock"> The mandated minimum capability is: CL_EXEC_KERNEL.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_QUEUE_ON_HOST_PROPERTIES<sup>8</sup></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_command_queue_properties</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Describes the on host command-queue properties supported by the
device.
This is a bit-field that describes one or more of the following
values:
</p><p class="tableblock"> CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE<br>
CL_QUEUE_PROFILING_ENABLE
</p><p class="tableblock"> These properties are described in the <a href="#queue-properties-table">Queue Properties</a> table.
</p><p class="tableblock"> The mandated minimum capability is: CL_QUEUE_PROFILING_ENABLE.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_QUEUE_ON_DEVICE_PROPERTIES</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_command_queue_properties</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Describes the on device command-queue properties supported by the
device.
This is a bit-field that describes one or more of the following
values:
</p><p class="tableblock"> CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE<br>
CL_QUEUE_PROFILING_ENABLE
</p><p class="tableblock"> These properties are described in the <a href="#queue-properties-table">Queue Properties</a> table.
</p><p class="tableblock"> The mandated minimum capability is:
CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_PROFILING_ENABLE.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_QUEUE_ON_DEVICE_PREFERRED_SIZE</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">The size of the device queue in bytes preferred by the
implementation.
Applications should use this size for the device queue to ensure
good performance.
</p><p class="tableblock"> The minimum value is 16 KB.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_QUEUE_ON_DEVICE_MAX_SIZE</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">The maximum size of the device queue in bytes.
The minimum value is 256 KB for the full profile and 64 KB for the
embedded profile.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_MAX_ON_DEVICE_QUEUES</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">The maximum number of device queues that can be created for this
device in a single context.
</p><p class="tableblock"> The minimum value is 1.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_MAX_ON_DEVICE_EVENTS</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">The maximum number of events in use by a device queue.
These refer to events returned by the <code>enqueue_</code> built-in functions
to a device queue or user events returned by the <code>create_user_event</code>
built-in function that have not been released.
</p><p class="tableblock"> The minimum value is 1024.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_BUILT_IN_KERNELS</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">char[]</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">A semicolon-separated list of built-in kernels supported by the
device.
An empty string is returned if no built-in kernels are supported by
the device.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_PLATFORM</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_platform_id</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">The platform associated with this device.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_NAME</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">char[]</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Device name string.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_VENDOR</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">char[]</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Vendor name string.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DRIVER_VERSION</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">char[]</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">OpenCL software driver version string.
Follows a vendor-specific format.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_PROFILE<sup>9</sup></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">char[]</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">OpenCL profile string.
Returns the profile name supported by the device.
The profile name returned can be one of the following strings:
</p><p class="tableblock"> FULL_PROFILE - if the device supports the OpenCL specification
(functionality defined as part of the core specification and does
not require any extensions to be supported).
</p><p class="tableblock"> EMBEDDED_PROFILE - if the device supports the OpenCL embedded
profile.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_VERSION</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">char[]</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">OpenCL version string.
Returns the OpenCL version supported by the device. This version
string has the following format:
</p><p class="tableblock"> <em>OpenCL&lt;space&gt;&lt;major_version.minor_version&gt;&lt;space&gt;&lt;vendor-specific
information&gt;</em>
</p><p class="tableblock"> The major_version.minor_version value returned will be 2.2.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_OPENCL_C_VERSION</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">char[]</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">OpenCL C version string.
Returns the highest OpenCL C version supported by the compiler for
this device that is not of type CL_DEVICE_TYPE_CUSTOM.
This version string has the following format:
</p><p class="tableblock"> <em>OpenCL&lt;space&gt;C&lt;space&gt;&lt;major_version.minor_version&gt;&lt;space&gt;&lt;vendor-specific
information&gt;</em>
</p><p class="tableblock"> The major_version.minor_version value returned must be 2.0 if
CL_DEVICE_VERSION is OpenCL 2.0.
</p><p class="tableblock"> The major_version.minor_version value returned must be 1.2 if
CL_DEVICE_VERSION is OpenCL 1.2.
</p><p class="tableblock"> The major_version.minor_version value returned must be 1.1 if
CL_DEVICE_VERSION is OpenCL 1.1.
</p><p class="tableblock"> The major_version.minor_version value returned can be 1.0 or 1.1 if
CL_DEVICE_VERSION is OpenCL 1.0.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_EXTENSIONS</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">char[]</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Returns a space-separated list of extension names (the extension
names themselves do not contain any spaces) supported by the device.
The list of extension names returned can be vendor supported
extension names and one or more of the following Khronos approved
extension names:
</p><p class="tableblock"> <strong>cl_khr_int64_base_atomics</strong><br>
<strong>cl_khr_int64_extended_atomics</strong><br>
<strong>cl_khr_fp16</strong><br>
<strong>cl_khr_gl_sharing</strong><br>
<strong>cl_khr_gl_event</strong><br>
<strong>cl_khr_d3d10_sharing</strong><br>
<strong>cl_khr_dx9_media_sharing</strong><br>
<strong>cl_khr_d3d11_sharing</strong><br>
<strong>cl_khr_gl_depth_images</strong><br>
<strong>cl_khr_gl_msaa_sharing</strong><br>
<strong>cl_khr_initialize_memory</strong><br>
<strong>cl_khr_terminate_context</strong><br>
<strong>cl_khr_spir</strong><br>
<strong>cl_khr_srgb_image_writes</strong>
</p><p class="tableblock"> The following approved Khronos extension names must be returned by
all devices that support OpenCL C 2.0:
</p><p class="tableblock"> <strong>cl_khr_byte_addressable_store</strong><br>
<strong>cl_khr_fp64</strong> (for backward compatibility if double precision is
supported)<br>
<strong>cl_khr_3d_image_writes</strong><br>
<strong>cl_khr_image2d_from_buffer</strong><br>
<strong>cl_khr_depth_images</strong>
</p><p class="tableblock"> Please refer to the OpenCL 2.0 Extension Specification for a
detailed description of these extensions.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_PRINTF_BUFFER_SIZE</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">size_t</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Maximum size in bytes of the internal buffer that holds the output
of printf calls from a kernel.
The minimum value for the FULL profile is 1 MB.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_PREFERRED_INTEROP_USER_SYNC</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_bool</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Is CL_TRUE if the device's preference is for the user to be
responsible for synchronization when sharing memory objects between
OpenCL and other APIs such as DirectX. Is CL_FALSE if the device /
implementation has a performant path for performing synchronization
of memory objects shared between OpenCL and other APIs such as
DirectX.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_PARENT_DEVICE</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_device_id</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the cl_device_id of the parent device to which this
sub-device belongs.
If <em>device</em> is a root-level device, a <code>NULL</code> value is returned.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_PARTITION_MAX_SUB_DEVICES</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the maximum number of sub-devices that can be created when a
device is partitioned.
</p><p class="tableblock"> The value returned cannot exceed CL_DEVICE_MAX_COMPUTE_UNITS.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_PARTITION_PROPERTIES</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_device_partition_property[]</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the list of partition types supported by <em>device</em>.
This is an array of cl_device_partition_property values drawn from
the following list:
</p><p class="tableblock"> CL_DEVICE_PARTITION_EQUALLY<br>
CL_DEVICE_PARTITION_BY_COUNTS<br>
CL_DEVICE_PARTITION_BY_AFFINITY_DOMAIN
</p><p class="tableblock"> If the device cannot be partitioned (i.e. there is no partitioning
scheme supported by the device that will return at least two
sub-devices), a value of 0 will be returned.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_PARTITION_AFFINITY_DOMAIN</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_device_affinity_domain</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the list of supported affinity domains for partitioning the
device using CL_DEVICE_PARTITION_BY_AFFINITY_DOMAIN.
This is a bit-field that describes one or more of the following
values:
</p><p class="tableblock"> CL_DEVICE_AFFINITY_DOMAIN_NUMA<br>
CL_DEVICE_AFFINITY_DOMAIN_L4_CACHE<br>
CL_DEVICE_AFFINITY_DOMAIN_L3_CACHE<br>
CL_DEVICE_AFFINITY_DOMAIN_L2_CACHE<br>
CL_DEVICE_AFFINITY_DOMAIN_L1_CACHE<br>
CL_DEVICE_AFFINITY_DOMAIN_NEXT_PARTITIONABLE
</p><p class="tableblock"> If the device does not support any affinity domains, a value of 0
will be returned.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_PARTITION_TYPE</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_device_partition_property[]</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the properties argument specified in <strong>clCreateSubDevices</strong> if
device is a sub-device.
In the case where the properties argument to <strong>clCreateSubDevices</strong> is
CL_DEVICE_PARTITION_BY_AFFINITY_DOMAIN,
CL_DEVICE_AFFINITY_DOMAIN_NEXT_PARTITIONABLE, the affinity domain
used to perform the partition will be returned.
This can be one of the following values:
</p><p class="tableblock"> CL_DEVICE_AFFINITY_DOMAIN_NUMA<br>
CL_DEVICE_AFFINITY_DOMAIN_L4_CACHE<br>
CL_DEVICE_AFFINITY_DOMAIN_L3_CACHE<br>
CL_DEVICE_AFFINITY_DOMAIN_L2_CACHE<br>
CL_DEVICE_AFFINITY_DOMAIN_L1_CACHE
</p><p class="tableblock"> Otherwise, the implementation may either return a
<em>param_value_size_ret</em> of 0 (i.e. there is no partition type
associated with <em>device</em>) or return a property value of 0 (where 0
is used to terminate the partition property list) in the memory that
<em>param_value</em> points to.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_REFERENCE_COUNT</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the <em>device</em> reference count.
If the device is a root-level device, a reference count of one is
returned.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_SVM_CAPABILITIES</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_device_svm_capabilities</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Describes the various shared virtual memory (SVM) allocation types
the device supports.
Coarse-grain SVM allocations are required to be supported by all
OpenCL 2.0 devices.
This is a bit-field that describes a combination of the following
values:
</p><p class="tableblock"> CL_DEVICE_SVM_COARSE_GRAIN_BUFFER - Support for coarse-grain buffer
sharing using <strong>clSVMAlloc</strong>.
Memory consistency is guaranteed at synchronization points and the
host must use calls to <strong>clEnqueueMapBuffer</strong> and
<strong>clEnqueueUnmapMemObject</strong>.
</p><p class="tableblock"> CL_DEVICE_SVM_FINE_GRAIN_BUFFER - Support for fine-grain buffer
sharing using <strong>clSVMAlloc</strong>.
Memory consistency is guaranteed at synchronization points without
need for <strong>clEnqueueMapBuffer</strong> and <strong>clEnqueueUnmapMemObject</strong>.
</p><p class="tableblock"> CL_DEVICE_SVM_FINE_GRAIN_SYSTEM - Support for sharing the host&#8217;s
entire virtual memory including memory allocated using <strong>malloc</strong>.
Memory consistency is guaranteed at synchronization points.
</p><p class="tableblock"> CL_DEVICE_SVM_ATOMICS - Support for the OpenCL 2.0 atomic
operations that provide memory consistency across the host and all
OpenCL devices supporting fine-grain SVM allocations.
</p><p class="tableblock"> The mandated minimum capability is
CL_DEVICE_SVM_COARSE_GRAIN_BUFFER.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_PREFERRED_PLATFORM_ATOMIC_ALIGNMENT</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the value representing the preferred alignment in bytes for
OpenCL 2.0 fine-grained SVM atomic types.
This query can return 0 which indicates that the preferred alignment
is aligned to the natural size of the type.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_PREFERRED_GLOBAL_ATOMIC_ALIGNMENT</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the value representing the preferred alignment in bytes for
OpenCL 2.0 atomic types to global memory.
This query can return 0 which indicates that the preferred alignment
is aligned to the natural size of the type.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_PREFERRED_LOCAL_ATOMIC_ALIGNMENT</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the value representing the preferred alignment in bytes for
OpenCL 2.0 atomic types to local memory.
This query can return 0 which indicates that the preferred alignment
is aligned to the natural size of the type.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_MAX_NUM_SUB_GROUPS</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Maximum number of sub-groups in a work-group that a device is
capable of executing on a single compute unit, for any given
kernel-instance running on the device.
The minimum value is 1.
(Refer also to <strong>clGetKernelSubGroupInfo</strong>.)</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEVICE_SUB_GROUP_INDEPENDENT_FORWARD_PROGRESS</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_bool</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Is CL_TRUE if this device supports independent forward progress of
sub-groups, CL_FALSE otherwise.
If <strong>cl_khr_subgroups</strong> is supported by the device this must return
CL_TRUE.</p></td>
</tr>
</tbody>
</table>
<div class="dlist">
<dl>
<dt class="hdlist1">4</dt>
<dd>
<p>A kernel that uses an image argument with the write_only or read_write
image qualifier may result in additional read_only image resources being
created internally by an implementation.
The internally created read_only image resources will count against the max
supported read image arguments given by CL_DEVICE_MAX_READ_IMAGE_ARGS.
Enqueuing a kernel that requires more images than the implementation can
support will result in a CL_OUT_OF_RESOURCES error being returned.</p>
</dd>
<dt class="hdlist1">5</dt>
<dd>
<p>NOTE: <strong>CL_DEVICE_MAX_WRITE_IMAGE_ARGS</strong> is only there for backward
compatibility.
<strong>CL_DEVICE_MAX_READ_WRITE_IMAGE_ARGS</strong> should be used instead.</p>
</dd>
<dt class="hdlist1">6</dt>
<dd>
<p>The optional rounding modes should be included as a device capability
only if it is supported natively.
All explicit conversion functions with specific rounding modes must
still operate correctly.</p>
</dd>
<dt class="hdlist1">7</dt>
<dd>
<p>The optional rounding modes should be included as a device capability
only if it is supported natively.
All explicit conversion functions with specific rounding modes must
still operate correctly.</p>
</dd>
<dt class="hdlist1">8</dt>
<dd>
<p>CL_DEVICE_QUEUE_PROPERTIES is deprecated and replaced by
CL_DEVICE_QUEUE_ON_HOST_PROPERTIES.</p>
</dd>
<dt class="hdlist1">9</dt>
<dd>
<p>The platform profile returns the profile that is implemented by the
OpenCL framework.
If the platform profile returned is FULL_PROFILE, the OpenCL framework
will support devices that are FULL_PROFILE and may also support devices
that are EMBEDDED_PROFILE.
The compiler must be available for all devices i.e.
CL_DEVICE_COMPILER_AVAILABLE is CL_TRUE.
If the platform profile returned is EMBEDDED_PROFILE, then devices that
are only EMBEDDED_PROFILE are supported.</p>
</dd>
</dl>
</div>
<div class="paragraph">
<p>The device queries described in the <a href="#device-queries-table">Device Queries</a>
table should return the same information for a root-level device i.e. a
device returned by <strong>clGetDeviceIDs</strong> and any sub-devices created from this
device except for the following queries:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_DEVICE_GLOBAL_MEM_CACHE_SIZE</p>
</li>
<li>
<p>CL_DEVICE_BUILT_IN_KERNELS</p>
</li>
<li>
<p>CL_DEVICE_PARENT_DEVICE</p>
</li>
<li>
<p>CL_DEVICE_PARTITION_TYPE</p>
</li>
<li>
<p>CL_DEVICE_REFERENCE_COUNT</p>
</li>
</ul>
</div>
<div class="paragraph">
<p><strong>clGetDeviceInfo</strong> returns CL_SUCCESS if the function is executed
successfully.
Otherwise, it returns one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_DEVICE if <em>device</em> is not valid.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>param_name</em> is not one of the supported values or
if size in bytes specified by <em>param_value_size</em> is &lt; size of return
type as specified in the <a href="#device-queries-table">Device Queries</a> table
and <em>param_value</em> is not a <code>NULL</code> value or if <em>param_name</em> is a value
that is available as an extension and the corresponding extension is not
supported by the device.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clGetDeviceAndHostTimer(cl_device_id device,
cl_ulong* device_timestamp,
cl_ulong* host_timestamp)</code></pre>
</div>
</div>
<div class="paragraph">
<p>Returns a reasonably synchronized pair of timestamps from the device timer
and the host timer as seen by <em>device</em>.
Implementations may need to execute this query with a high latency in order
to provide reasonable synchronization of the timestamps.
The host timestamp and device timestamp returned by this function and
<strong>clGetHostTimer</strong> each have an implementation defined timebase.
The timestamps will always be in their respective timebases regardless of
which query function is used.
The timestamp returned from <strong>clGetEventProfilingInfo</strong> for an event on a
device and a device timestamp queried from the same device will always be in
the same timebase.</p>
</div>
<div class="paragraph">
<p><em>device</em> is a device returned by <strong>clGetDeviceIDs</strong>.</p>
</div>
<div class="paragraph">
<p><em>device_timestamp</em> will be updated with the value of the device timer in
nanoseconds.
The resolution of the timer is the same as the device profiling timer
returned by <strong>clGetDeviceInfo</strong> and the CL_DEVICE_PROFILING_TIMER_RESOLUTION
query.</p>
</div>
<div class="paragraph">
<p><em>host_timestamp</em> will be updated with the value of the host timer in
nanoseconds at the closest possible point in time to that at which
<em>device_timestamp</em> was returned.
The resolution of the timer may be queried via <strong>clGetPlatformInfo</strong> and the
flag CL_PLATFORM_HOST_TIMER_RESOLUTION.</p>
</div>
<div class="paragraph">
<p><strong>clGetDeviceAndHostTimer</strong> will return CL_SUCCESS with a time value in
<em>host_timestamp</em> if provided.
Otherwise, it returns one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_DEVICE if <em>device</em> is not a valid OpenCL device.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>host_timestamp</em> or <em>device_timestamp</em> is <code>NULL</code>.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
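<div class="paragraph">
<p>As an illustration only, one way an application might use such a synchronized
pair is to shift a device timestamp onto the host timebase.
The sketch assumes that both timers advance at the same rate over the interval
of interest, which this specification does not guarantee.
The names <code>timer_sync</code> and <code>device_to_host_ns</code> are hypothetical, and
<code>uint64_t</code> stands in for cl_ulong.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">#include &lt;stdint.h&gt;

/* Stand-in for cl_ulong (assumption: a 64-bit unsigned integer). */
typedef uint64_t timestamp_ns;

/* One synchronized sample, as filled in by clGetDeviceAndHostTimer. */
typedef struct {
    timestamp_ns device;  /* device_timestamp, in nanoseconds */
    timestamp_ns host;    /* host_timestamp, in nanoseconds */
} timer_sync;

/* Shift a device-timebase timestamp onto the host timebase.
 * Unsigned arithmetic wraps modulo 2^64, so a device time taken
 * before the sync point is mapped correctly as well. */
timestamp_ns device_to_host_ns(timer_sync sync, timestamp_ns device_time)
{
    return sync.host + (device_time - sync.device);
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>Because the timers may drift apart, an application relying on this kind of
mapping would probably want to take a fresh sample periodically.</p>
</div>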
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clGetHostTimer(cl_device_id device,
cl_ulong* host_timestamp)</code></pre>
</div>
</div>
<div class="paragraph">
<p>Returns the current value of the host clock as seen by <em>device</em>.
This value is in the same timebase as the host_timestamp returned from
<strong>clGetDeviceAndHostTimer</strong>.
The implementation will return with as low a latency as possible to allow a
correlation with a subsequent application sampled time.
The host timestamp and device timestamp returned by this function and
<strong>clGetDeviceAndHostTimer</strong> each have an implementation defined timebase.
The timestamps will always be in their respective timebases regardless of
which query function is used.
The timestamp returned from <strong>clGetEventProfilingInfo</strong> for an event on a
device and a device timestamp queried from the same device will always be in
the same timebase.</p>
</div>
<div class="paragraph">
<p><em>device</em> is a device returned by <strong>clGetDeviceIDs</strong>.</p>
</div>
<div class="paragraph">
<p><em>host_timestamp</em> will be updated with the value of the current timer in
nanoseconds.
The resolution of the timer may be queried via <strong>clGetPlatformInfo</strong> and the
flag CL_PLATFORM_HOST_TIMER_RESOLUTION.</p>
</div>
<div class="paragraph">
<p><strong>clGetHostTimer</strong> will return CL_SUCCESS with a time value in
<em>host_timestamp</em> if provided.
Otherwise, it returns one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_DEVICE if <em>device</em> is not a valid OpenCL device.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>host_timestamp</em> is <code>NULL</code>.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
</div>
<div class="sect2">
<h3 id="_partitioning_a_device">4.3. Partitioning a Device</h3>
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clCreateSubDevices(cl_device_id in_device,
<span class="directive">const</span> cl_device_partition_property *properties,
cl_uint num_devices,
cl_device_id *out_devices,
cl_uint *num_devices_ret)</code></pre>
</div>
</div>
<div class="paragraph">
<p>creates an array of sub-devices that each reference a non-intersecting set
of compute units within in_device, according to a partition scheme given by
<em>properties</em>.
The output sub-devices may be used in every way that the root (or parent)
device can be used, including creating contexts, building programs, further
calls to <strong>clCreateSubDevices</strong> and creating command-queues.
When a command-queue is created against a sub-device, the commands enqueued
on the queue are executed only on the sub-device.</p>
</div>
<div class="paragraph">
<p><em>in_device</em> is the device to be partitioned.</p>
</div>
<div class="paragraph">
<p><em>properties</em> specifies how <em>in_device</em> is to be partitioned, described by a
partition name and its corresponding value.
Each partition name is immediately followed by the corresponding desired
value.
The list is terminated with 0.
The list of supported partitioning schemes is described in the
<a href="#subdevice-partition-table">Subdevice Partition</a> table.
Only one of the listed partitioning schemes can be specified in
<em>properties</em>.</p>
</div>
<table id="subdevice-partition-table" class="tableblock frame-all grid-all spread">
<caption class="title">Table 6. <em>List of supported partition schemes by</em> <strong>clCreateSubDevices</strong></caption>
<colgroup>
<col style="width: 30%;">
<col style="width: 20%;">
<col style="width: 50%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top"><strong>cl_device_partition_property enum</strong></th>
<th class="tableblock halign-left valign-top">Partition value</th>
<th class="tableblock halign-left valign-top">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_DEVICE_PARTITION_EQUALLY</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Split the aggregate device into as many smaller aggregate devices as
can be created, each containing <em>n</em> compute units.
The value <em>n</em> is passed as the value accompanying this property.
If <em>n</em> does not divide evenly into
CL_DEVICE_PARTITION_MAX_COMPUTE_UNITS, then the remaining compute
units are not used.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_DEVICE_PARTITION_BY_COUNTS</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">This property is followed by a
CL_DEVICE_PARTITION_BY_COUNTS_LIST_END terminated list of compute
unit counts.
For each non-zero count <em>m</em> in the list, a sub-device is created
with <em>m</em> compute units in it.
CL_DEVICE_PARTITION_BY_COUNTS_LIST_END is defined to be 0.
</p><p class="tableblock"> The number of non-zero count entries in the list may not exceed
CL_DEVICE_PARTITION_MAX_SUB_DEVICES.
</p><p class="tableblock"> The total number of compute units specified may not exceed
CL_DEVICE_PARTITION_MAX_COMPUTE_UNITS.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_DEVICE_PARTITION_BY_AFFINITY_DOMAIN</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_device_affinity_domain</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Split the device into smaller aggregate devices containing one or
more compute units that all share part of a cache hierarchy.
The value accompanying this property may be drawn from the following
list:
</p><p class="tableblock"> CL_DEVICE_AFFINITY_DOMAIN_NUMA - Split the device into sub-devices
comprised of compute units that share a NUMA node.
</p><p class="tableblock"> CL_DEVICE_AFFINITY_DOMAIN_L4_CACHE - Split the device into
sub-devices comprised of compute units that share a level 4 data
cache.
</p><p class="tableblock"> CL_DEVICE_AFFINITY_DOMAIN_L3_CACHE - Split the device into
sub-devices comprised of compute units that share a level 3 data
cache.
</p><p class="tableblock"> CL_DEVICE_AFFINITY_DOMAIN_L2_CACHE - Split the device into
sub-devices comprised of compute units that share a level 2 data
cache.
</p><p class="tableblock"> CL_DEVICE_AFFINITY_DOMAIN_L1_CACHE - Split the device into
sub-devices comprised of compute units that share a level 1 data
cache.
</p><p class="tableblock"> CL_DEVICE_AFFINITY_DOMAIN_NEXT_PARTITIONABLE - Split the device
along the next partitionable affinity domain.
The implementation shall find the first level along which the device
or sub-device may be further subdivided in the order NUMA, L4, L3,
L2, L1, and partition the device into sub-devices comprised of
compute units that share memory subsystems at this level.
</p><p class="tableblock"> The user may determine what happened by calling
<strong>clGetDeviceInfo</strong>(CL_DEVICE_PARTITION_TYPE) on the sub-devices.</p></td>
</tr>
</tbody>
</table>
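<div class="paragraph">
<p>The arithmetic behind the CL_DEVICE_PARTITION_EQUALLY and
CL_DEVICE_PARTITION_BY_COUNTS rows can be sketched in plain C.
This is a model of the limits stated in the table, not of any implementation:
the function names are hypothetical, plain <code>unsigned</code> values stand in for
the device limits an application would query, and an implementation may
reject a request for other reasons.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">#include &lt;stddef.h&gt;

/* Number of sub-devices an EQUALLY partition with n compute units each
 * would yield; compute units left over after the division are unused.
 * Returns 0 for n == 0, which is not a meaningful request. */
unsigned equally_sub_device_count(unsigned max_compute_units, unsigned n)
{
    if (n == 0)
        return 0;
    return max_compute_units / n;
}

/* Check a 0-terminated BY_COUNTS list against the two limits in the table:
 * the number of entries may not exceed max_sub_devices, and the sum of the
 * counts may not exceed max_compute_units. Returns 1 if both hold. */
int by_counts_list_fits(const unsigned *counts,
                        unsigned max_sub_devices,
                        unsigned max_compute_units)
{
    unsigned entries = 0;
    unsigned total = 0;

    for (size_t i = 0; counts[i] != 0; ++i) {
        entries++;
        /* Compare against the remaining budget so the sum cannot overflow. */
        if (counts[i] &gt; max_compute_units - total)
            return 0;
        total += counts[i];
    }
    return entries &lt;= max_sub_devices;
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>Under these assumptions, a device with 16 compute units split equally with
<em>n</em> = 8 gives two sub-devices, and the count list 3, 1 fits a device with
four compute units.</p>
</div>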
<div class="paragraph">
<p><em>num_devices</em> is the size of memory pointed to by <em>out_devices</em> specified as
the number of cl_device_id entries.</p>
</div>
<div class="paragraph">
<p><em>out_devices</em> is the buffer where the OpenCL sub-devices will be returned.
If <em>out_devices</em> is <code>NULL</code>, this argument is ignored.
If <em>out_devices</em> is not <code>NULL</code>, <em>num_devices</em> must be greater than or equal
to the number of sub-devices that <em>in_device</em> may be partitioned into according
to the partitioning scheme specified in <em>properties</em>.</p>
</div>
<div class="paragraph">
<p><em>num_devices_ret</em> returns the number of sub-devices that <em>in_device</em> may be
partitioned into according to the partitioning scheme specified in
<em>properties</em>.
If <em>num_devices_ret</em> is <code>NULL</code>, it is ignored.</p>
</div>
<div class="paragraph">
<p><strong>clCreateSubDevices</strong> returns CL_SUCCESS if the partition is created
successfully.
Otherwise, it returns one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_DEVICE if <em>in_device</em> is not valid.</p>
</li>
<li>
<p>CL_INVALID_VALUE if values specified in <em>properties</em> are not valid or if
values specified in <em>properties</em> are valid but not supported by the
device.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>out_devices</em> is not <code>NULL</code> and <em>num_devices</em> is
less than the number of sub-devices created by the partition scheme.</p>
</li>
<li>
<p>CL_DEVICE_PARTITION_FAILED if the partition name is supported by the
implementation but in_device could not be further partitioned.</p>
</li>
<li>
<p>CL_INVALID_DEVICE_PARTITION_COUNT if the partition name specified in
<em>properties</em> is CL_DEVICE_PARTITION_BY_COUNTS and the number of
sub-devices requested exceeds CL_DEVICE_PARTITION_MAX_SUB_DEVICES or the
total number of compute units requested exceeds
CL_DEVICE_PARTITION_MAX_COMPUTE_UNITS for <em>in_device</em>, or the number of
compute units requested for one or more sub-devices is less than zero or
the number of sub-devices requested exceeds
CL_DEVICE_PARTITION_MAX_COMPUTE_UNITS for <em>in_device</em>.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>A few examples that describe how to specify partition properties in the
<em>properties</em> argument to <strong>clCreateSubDevices</strong> are given below:</p>
</div>
<div class="paragraph">
<p>To partition a device containing 16 compute units into two sub-devices, each
containing 8 compute units, pass the following in <em>properties</em>:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">{ CL_DEVICE_PARTITION_EQUALLY, <span class="integer">8</span>, <span class="integer">0</span> }</code></pre>
</div>
</div>
<div class="paragraph">
<p>To partition a device with four compute units into two sub-devices with one
sub-device containing 3 compute units and the other sub-device 1 compute
unit, pass the following in the <em>properties</em> argument:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">{ CL_DEVICE_PARTITION_BY_COUNTS,
<span class="integer">3</span>, <span class="integer">1</span>, CL_DEVICE_PARTITION_BY_COUNTS_LIST_END, <span class="integer">0</span> }</code></pre>
</div>
</div>
<div class="paragraph">
<p>To split a device along the outermost cache line (if any), pass the
following in the <em>properties</em> argument:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">{ CL_DEVICE_PARTITION_BY_AFFINITY_DOMAIN,
CL_DEVICE_AFFINITY_DOMAIN_NEXT_PARTITIONABLE,
<span class="integer">0</span> }</code></pre>
</div>
</div>
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clRetainDevice(cl_device_id device)</code></pre>
</div>
</div>
<div class="paragraph">
<p>increments the <em>device</em> reference count if <em>device</em> is a valid sub-device
created by a call to <strong>clCreateSubDevices</strong>.
If <em>device</em> is a root-level device i.e. a cl_device_id returned by
<strong>clGetDeviceIDs</strong>, the <em>device</em> reference count remains unchanged.
<strong>clRetainDevice</strong> returns CL_SUCCESS if the function is executed successfully
or the device is a root-level device.
Otherwise, it returns one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_DEVICE if <em>device</em> is not a valid sub-device created by a
call to <strong>clCreateSubDevices</strong>.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clReleaseDevice(cl_device_id device)</code></pre>
</div>
</div>
<div class="paragraph">
<p>decrements the <em>device</em> reference count if <em>device</em> is a valid sub-device
created by a call to <strong>clCreateSubDevices</strong>.
If <em>device</em> is a root-level device i.e. a cl_device_id returned by
<strong>clGetDeviceIDs</strong>, the <em>device</em> reference count remains unchanged.
<strong>clReleaseDevice</strong> returns CL_SUCCESS if the function is executed
successfully.
Otherwise, it returns one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_DEVICE if <em>device</em> is not a valid sub-device created by a
call to <strong>clCreateSubDevices</strong>.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>After the <em>device</em> reference count becomes zero and all the objects attached
to <em>device</em> (such as command-queues) are released, the <em>device</em> object is
deleted.
Using this function to release a reference that was not obtained by creating
the object or by calling <strong>clRetainDevice</strong> causes undefined behavior.</p>
</div>
</div>
<div class="sect2">
<h3 id="_contexts">4.4. Contexts</h3>
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_context clCreateContext(<span class="directive">const</span> cl_context_properties *properties,
cl_uint num_devices,
<span class="directive">const</span> cl_device_id *devices,
<span class="directive">void</span>(CL_CALLBACK *pfn_notify)
(<span class="directive">const</span> <span class="predefined-type">char</span> *errinfo,
<span class="directive">const</span> <span class="directive">void</span> *private_info,
size_t cb,
<span class="directive">void</span> *user_data),
<span class="directive">void</span> *user_data,
cl_int *errcode_ret)</code></pre>
</div>
</div>
<div class="paragraph">
<p>creates an OpenCL context.
An OpenCL context is created with one or more devices.
Contexts are used by the OpenCL runtime for managing objects such as
command-queues, memory, program and kernel objects and for executing kernels
on one or more devices specified in the context.</p>
</div>
<div class="paragraph">
<p><em>properties</em> specifies a list of context property names and their
corresponding values.
Each property name is immediately followed by the corresponding desired
value.
The list is terminated with 0.
The list of supported properties is described in the
<a href="#context-properties-table">Context Properties</a> table.
<em>properties</em> can be <code>NULL</code> in which case the platform that is selected is
implementation-defined.</p>
</div>
<table id="context-properties-table" class="tableblock frame-all grid-all spread">
<caption class="title">Table 7. <em>List of supported properties by</em> <strong>clCreateContext</strong></caption>
<colgroup>
<col style="width: 34%;">
<col style="width: 33%;">
<col style="width: 33%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top"><strong>cl_context_properties enum</strong></th>
<th class="tableblock halign-left valign-top">Property value</th>
<th class="tableblock halign-left valign-top">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_CONTEXT_PLATFORM</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_platform_id</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Specifies the platform to use.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_CONTEXT_INTEROP_USER_SYNC</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_bool</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Specifies whether the user is responsible for synchronization
between OpenCL and other APIs.
Please refer to the specific sections in the OpenCL 2.0 extension
specification that describe sharing with other APIs for restrictions
on using this flag.
</p><p class="tableblock"> If CL_CONTEXT_INTEROP_USER_SYNC is not specified, a default of
CL_FALSE is assumed.</p></td>
</tr>
</tbody>
</table>
<div class="paragraph">
<p><em>num_devices</em> is the number of devices specified in the <em>devices</em> argument.</p>
</div>
<div class="paragraph">
<p><em>devices</em> is a pointer to a list of unique devices<sup>10</sup> returned by
<strong>clGetDeviceIDs</strong> or sub-devices created by <strong>clCreateSubDevices</strong> for a
platform.</p>
</div>
<div class="dlist">
<dl>
<dt class="hdlist1">10</dt>
<dd>
<p>Duplicate devices specified in <em>devices</em> are ignored.</p>
</dd>
</dl>
</div>
<div class="paragraph">
<p><em>pfn_notify</em> is a callback function that can be registered by the
application.
This callback function will be used by the OpenCL implementation to report
information on errors during context creation as well as errors that occur
at runtime in this context.
This callback function may be called asynchronously by the OpenCL
implementation.
It is the application&#8217;s responsibility to ensure that the callback function
is thread-safe.
The parameters to this callback function are:</p>
</div>
<div class="ulist">
<ul>
<li>
<p><em>errinfo</em> is a pointer to an error string.</p>
</li>
<li>
<p><em>private_info</em> and <em>cb</em> represent a pointer to binary data that is
returned by the OpenCL implementation that can be used to log additional
information helpful in debugging the error.</p>
</li>
<li>
<p><em>user_data</em> is a pointer to user supplied data.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>If <em>pfn_notify</em> is <code>NULL</code>, no callback function is registered.</p>
</div>
<div class="admonitionblock note">
<table>
<tr>
<td class="icon">
<i class="fa icon-note" title="Note"></i>
</td>
<td class="content">
There are a number of cases where error notifications need to be
delivered due to an error that occurs outside a context.
Such notifications may not be delivered through the <em>pfn_notify</em> callback.
Where these notifications go is implementation-defined.
</td>
</tr>
</table>
</div>
<div class="paragraph">
<p><em>user_data</em> will be passed as the <em>user_data</em> argument when <em>pfn_notify</em> is
called.
<em>user_data</em> can be <code>NULL</code>.</p>
</div>
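<div class="paragraph">
<p>Since the callback may be invoked asynchronously, any state it touches
through <em>user_data</em> needs to be safe for concurrent access.
A minimal sketch of one option, assuming a C11 compiler with
<code>&lt;stdatomic.h&gt;</code>, is an atomic counter.
The names <code>notify_state</code> and <code>count_context_errors</code> are hypothetical, and
real code would also apply the CL_CALLBACK macro from the OpenCL headers to the
function declaration.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">#include &lt;stdatomic.h&gt;
#include &lt;stddef.h&gt;

/* Hypothetical application state handed to the callback via user_data. */
typedef struct {
    atomic_uint error_count;
} notify_state;

/* Same parameter list as pfn_notify. The CL_CALLBACK calling-convention
 * macro is omitted here so the sketch does not depend on the OpenCL headers. */
void count_context_errors(const char *errinfo,
                          const void *private_info,
                          size_t cb,
                          void *user_data)
{
    (void)errinfo;
    (void)private_info;
    (void)cb;

    notify_state *state = user_data;
    if (state != NULL)
        atomic_fetch_add(&amp;state-&gt;error_count, 1u);  /* safe across threads */
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>An application using this approach would pass a pointer to a <code>notify_state</code>
as <em>user_data</em> and keep that object alive for as long as the context exists.</p>
</div>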
<div class="paragraph">
<p><em>errcode_ret</em> will return an appropriate error code.
If <em>errcode_ret</em> is <code>NULL</code>, no error code is returned.</p>
</div>
<div class="paragraph">
<p><strong>clCreateContext</strong> returns a valid non-zero context and <em>errcode_ret</em> is set
to CL_SUCCESS if the context is created successfully.
Otherwise, it returns a <code>NULL</code> value with the following error values
returned in <em>errcode_ret</em>:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_PLATFORM if <em>properties</em> is <code>NULL</code> and no platform could be
selected, or if the platform value specified in <em>properties</em> is not a valid
platform.</p>
</li>
<li>
<p>CL_INVALID_PROPERTY if a context property name in <em>properties</em> is not a
supported property name, if the value specified for a supported property
name is not valid, or if the same property name is specified more than
once.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>devices</em> is <code>NULL</code>.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>num_devices</em> is equal to zero.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>pfn_notify</em> is <code>NULL</code> but <em>user_data</em> is not
<code>NULL</code>.</p>
</li>
<li>
<p>CL_INVALID_DEVICE if <em>devices</em> contains an invalid device.</p>
</li>
<li>
<p>CL_DEVICE_NOT_AVAILABLE if a device in <em>devices</em> is currently not
available even though the device was returned by <strong>clGetDeviceIDs</strong>.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>The function<sup>11</sup></p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_context clCreateContextFromType(<span class="directive">const</span> cl_context_properties *properties,
cl_device_type device_type,
<span class="directive">void</span>(CL_CALLBACK *pfn_notify)(
<span class="directive">const</span> <span class="predefined-type">char</span> *errinfo,
<span class="directive">const</span> <span class="directive">void</span> *private_info,
size_t cb,
<span class="directive">void</span> *user_data),
<span class="directive">void</span> *user_data,
cl_int *errcode_ret)</code></pre>
</div>
</div>
<div class="paragraph">
<p>creates an OpenCL context from a device type that identifies the specific
device(s) to use.
Only devices that are returned by <strong>clGetDeviceIDs</strong> for <em>device_type</em> are
used to create the context.
The context does not reference any sub-devices that may have been created
from these devices.</p>
</div>
<div class="paragraph">
<p><em>properties</em> specifies a list of context property names and their
corresponding values.
Each property name is immediately followed by the corresponding desired
value.
The list of supported properties is described in the
<a href="#context-properties-table">Context Properties</a> table.
<em>properties</em> can also be <code>NULL</code> in which case the platform that is selected
is implementation-defined.</p>
</div>
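<div class="paragraph">
<p>The name/value layout of a property list can be sketched in plain C as follows.
This is an illustration only: <code>prop_t</code>, <code>count_property_pairs</code> and
<code>has_duplicate_property</code> are hypothetical names, and the sketch assumes that
property names and values are pointer-sized integers and that the list is
terminated with 0.
The duplicate check mirrors the CL_INVALID_PROPERTY condition for a property
name that is specified more than once.</p>
</div>

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: names and values are pointer-sized integers. */
typedef intptr_t prop_t;

/* Counts the name/value pairs in a 0-terminated list; NULL counts as empty.
 * Only name positions (even indices) are tested against the terminator, so
 * a property whose value is 0 does not end the list early. */
size_t count_property_pairs(const prop_t *properties)
{
    size_t pairs = 0;
    if (properties == NULL) {
        return 0;
    }
    while (properties[2 * pairs] != 0) {
        pairs++;
    }
    return pairs;
}

/* Returns true if any property name appears more than once in the list. */
bool has_duplicate_property(const prop_t *properties)
{
    size_t n = count_property_pairs(properties);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (properties[2 * i] == properties[2 * j]) {
                return true;
            }
        }
    }
    return false;
}
```

<div class="paragraph">
<p>A check like this can run in the application before the list is handed to the
implementation, which keeps the error handling for malformed lists in one place.</p>
</div>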
<div class="paragraph">
<p><em>device_type</em> is a bit-field that identifies the type of device and is
described in the <a href="#device-categories-table">Device Categories</a> table.</p>
</div>
<div class="paragraph">
<p><em>pfn_notify</em> and <em>user_data</em> are described in <strong>clCreateContext</strong>.</p>
</div>
<div class="paragraph">
<p><em>errcode_ret</em> will return an appropriate error code.
If <em>errcode_ret</em> is <code>NULL</code>, no error code is returned.</p>
</div>
<div class="paragraph">
<p><strong>clCreateContextFromType</strong> returns a valid non-zero context and <em>errcode_ret</em>
is set to CL_SUCCESS if the context is created successfully.
Otherwise, it returns a <code>NULL</code> value with one of the following error values
returned in <em>errcode_ret</em>:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_PLATFORM if <em>properties</em> is <code>NULL</code> and no platform could be
selected, or if the platform value specified in <em>properties</em> is not a valid
platform.</p>
</li>
<li>
<p>CL_INVALID_PROPERTY if a context property name in <em>properties</em> is not a
supported property name, if the value specified for a supported property
name is not valid, or if the same property name is specified more than
once.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>pfn_notify</em> is <code>NULL</code> but <em>user_data</em> is not
<code>NULL</code>.</p>
</li>
<li>
<p>CL_INVALID_DEVICE_TYPE if <em>device_type</em> is not a valid value.</p>
</li>
<li>
<p>CL_DEVICE_NOT_AVAILABLE if no devices that match <em>device_type</em> and
property values specified in <em>properties</em> are currently available.</p>
</li>
<li>
<p>CL_DEVICE_NOT_FOUND if no devices that match <em>device_type</em> and property
values specified in <em>properties</em> were found.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
<div class="dlist">
<dl>
<dt class="hdlist1">11</dt>
<dd>
<p><strong>clCreateContextFromType</strong> may return all or a subset of the actual
physical devices present in the platform that match <em>device_type</em>.</p>
</dd>
</dl>
</div>
</li>
</ul>
</div>
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clRetainContext(cl_context context)</code></pre>
</div>
</div>
<div class="paragraph">
<p>increments the <em>context</em> reference count.
<strong>clRetainContext</strong> returns CL_SUCCESS if the function is executed
successfully.
Otherwise, it returns one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_CONTEXT if <em>context</em> is not a valid OpenCL context.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p><strong>clCreateContext</strong> and <strong>clCreateContextFromType</strong> perform an implicit retain.
This is very helpful for 3<sup>rd</sup> party libraries, which typically get a
context passed to them by the application.
However, it is possible that the application may delete the context without
informing the library.
Allowing functions to attach to (i.e. retain) and release a context solves
the problem of a context being used by a library no longer being valid.</p>
</div>
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clReleaseContext(cl_context context)</code></pre>
</div>
</div>
<div class="paragraph">
<p>decrements the <em>context</em> reference count.
<strong>clReleaseContext</strong> returns CL_SUCCESS if the function is executed
successfully.
Otherwise, it returns one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_CONTEXT if <em>context</em> is not a valid OpenCL context.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>After the <em>context</em> reference count becomes zero and all the objects
attached to <em>context</em> (such as memory objects, command-queues) are released,
the <em>context</em> is deleted.
Using this function to release a reference that was not obtained by creating
the object or by calling <strong>clRetainContext</strong> causes undefined behavior.</p>
</div>
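<div class="paragraph">
<p>The retain and release rules above can be modeled with a small reference-counting
sketch.
This is not the OpenCL implementation: <code>model_object</code> and the <code>model_*</code>
functions are hypothetical, and the model ignores the additional condition that
objects attached to the context must also be released before deletion.</p>
</div>

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a reference-counted object such as a context. */
typedef struct {
    unsigned ref_count;
    bool     deleted;
} model_object;

/* Creation performs the implicit retain: the creator holds one reference. */
void model_create(model_object *o)
{
    o->ref_count = 1;
    o->deleted = false;
}

/* Models a retain call: one more owner of the object. */
void model_retain(model_object *o)
{
    assert(!o->deleted);
    o->ref_count++;
}

/* Models a release call: the object is deleted once the last owner lets go. */
void model_release(model_object *o)
{
    assert(!o->deleted && o->ref_count > 0);
    if (--o->ref_count == 0) {
        o->deleted = true;
    }
}
```

<div class="paragraph">
<p>In this model a library that receives the object calls <code>model_retain</code> once and
<code>model_release</code> once, so a release by the application in between does not
invalidate the object while the library is still using it.</p>
</div>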
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clGetContextInfo(cl_context context,
cl_context_info param_name,
size_t param_value_size,
<span class="directive">void</span> *param_value,
size_t *param_value_size_ret)</code></pre>
</div>
</div>
<div class="paragraph">
<p>can be used to query information about a context.</p>
</div>
<div class="paragraph">
<p><em>context</em> specifies the OpenCL context being queried.</p>
</div>
<div class="paragraph">
<p><em>param_name</em> is an enumeration constant that specifies the information to
query.</p>
</div>
<div class="paragraph">
<p><em>param_value</em> is a pointer to memory where the appropriate result being
queried is returned.
If <em>param_value</em> is <code>NULL</code>, it is ignored.</p>
</div>
<div class="paragraph">
<p><em>param_value_size</em> specifies the size in bytes of memory pointed to by
<em>param_value</em>.
This size must be greater than or equal to the size of the return type as
described in the <a href="#context-info-table">Context Attributes</a> table.</p>
</div>
<div class="paragraph">
<p><em>param_value_size_ret</em> returns the actual size in bytes of data being
queried by <em>param_name</em>.
If <em>param_value_size_ret</em> is <code>NULL</code>, it is ignored.</p>
</div>
<div class="paragraph">
<p>The list of supported <em>param_name</em> values and the information returned in
<em>param_value</em> by <strong>clGetContextInfo</strong> is described in the
<a href="#context-info-table">Context Attributes</a> table.</p>
</div>
<table id="context-info-table" class="tableblock frame-all grid-all spread">
<caption class="title">Table 8. List of supported param_names by <strong>clGetContextInfo</strong></caption>
<colgroup>
<col style="width: 34%;">
<col style="width: 33%;">
<col style="width: 33%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top"><strong>cl_context_info</strong></th>
<th class="tableblock halign-left valign-top">Return Type</th>
<th class="tableblock halign-left valign-top">Information returned in param_value</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_CONTEXT_REFERENCE_COUNT</strong><sup>12</sup></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Return the <em>context</em> reference count.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_CONTEXT_NUM_DEVICES</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Return the number of devices in <em>context</em>.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_CONTEXT_DEVICES</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_device_id[]</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Return the list of devices and sub-devices in <em>context</em>.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_CONTEXT_PROPERTIES</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_context_properties[]</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Return the properties argument specified in <strong>clCreateContext</strong> or
<strong>clCreateContextFromType</strong>.
</p><p class="tableblock"> If the <em>properties</em> argument specified in <strong>clCreateContext</strong> or
<strong>clCreateContextFromType</strong> used to create <em>context</em> is not <code>NULL</code>, the
implementation must return the values specified in the properties
argument.
</p><p class="tableblock"> If the <em>properties</em> argument specified in <strong>clCreateContext</strong> or
<strong>clCreateContextFromType</strong> used to create <em>context</em> is <code>NULL</code>, the
implementation may either return a <em>param_value_size_ret</em> of 0
(i.e. there is no context property value to be returned) or return
a context property value of 0 (where 0 is used to terminate the
context properties list) in the memory that <em>param_value</em> points
to.</p></td>
</tr>
</tbody>
</table>
<div class="dlist">
<dl>
<dt class="hdlist1">12</dt>
<dd>
<p>The reference count returned should be considered immediately stale.
It is unsuitable for general use in applications.
This feature is provided for identifying memory leaks.</p>
</dd>
</dl>
</div>
<div class="paragraph">
<p><strong>clGetContextInfo</strong> returns CL_SUCCESS if the function is executed
successfully.
Otherwise, it returns one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_CONTEXT if <em>context</em> is not a valid context.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>param_name</em> is not one of the supported values, or
if the size in bytes specified by <em>param_value_size</em> is less than the size of
the return type as specified in the <a href="#context-info-table">Context Attributes</a>
table and <em>param_value</em> is not a <code>NULL</code> value.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
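<div class="paragraph">
<p>Because <em>param_value</em> may be <code>NULL</code> while <em>param_value_size_ret</em> still reports
the required size, variable-length results such as a device list are commonly
fetched with two calls.
The sketch below shows that idiom against a mock rather than the real API:
<code>query_fn</code>, <code>query_alloc</code>, <code>mock_source</code> and <code>mock_query</code> are hypothetical,
and the mock only imitates the size rules described above.</p>
</div>

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical query function using the same size/value/size_ret triple as
 * the info queries; returns 0 on success and non-zero on failure. */
typedef int (*query_fn)(void *object, size_t param_value_size,
                        void *param_value, size_t *param_value_size_ret);

/* Two-call idiom: ask for the size first, then fetch into a heap buffer.
 * Returns NULL on failure or when there is nothing to return; the caller
 * frees the result. *size_out receives the size in bytes (0 on NULL). */
void *query_alloc(query_fn query, void *object, size_t *size_out)
{
    size_t size = 0;
    void *buf;

    if (size_out != NULL) {
        *size_out = 0;
    }
    if (query(object, 0, NULL, &size) != 0 || size == 0) {
        return NULL;
    }
    buf = malloc(size);
    if (buf == NULL) {
        return NULL;
    }
    if (query(object, size, buf, NULL) != 0) {
        free(buf);
        return NULL;
    }
    if (size_out != NULL) {
        *size_out = size;
    }
    return buf;
}

/* Mock data source used only to exercise query_alloc. */
typedef struct {
    const void *data;
    size_t      size;
} mock_source;

/* Mock query: fails only when a non-NULL buffer is too small. */
int mock_query(void *object, size_t param_value_size,
               void *param_value, size_t *param_value_size_ret)
{
    const mock_source *src = (const mock_source *)object;

    if (param_value != NULL) {
        if (param_value_size < src->size) {
            return -1; /* stands in for the CL_INVALID_VALUE case */
        }
        memcpy(param_value, src->data, src->size);
    }
    if (param_value_size_ret != NULL) {
        *param_value_size_ret = src->size;
    }
    return 0;
}
```

<div class="paragraph">
<p>A reported size of 0 is treated as "nothing to return", which corresponds to the
case in which a query legitimately yields no data.</p>
</div>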
</div>
</div>
</div>
<div class="sect1">
<h2 id="opencl-runtime">5. The OpenCL Runtime</h2>
<div class="sectionbody">
<div class="paragraph">
<p>This section describes the API calls that manage OpenCL objects such as
command-queues, memory objects, program objects, and kernel objects for kernel
functions in a program, as well as the calls that enqueue commands to a
command-queue, such as executing a kernel or reading or writing a memory
object.</p>
</div>
<div class="sect2">
<h3 id="_command_queues">5.1. Command Queues</h3>
<div class="paragraph">
<p>OpenCL objects such as memory, program and kernel objects are created using
a context.
Operations on these objects are performed using a command-queue.
The command-queue can be used to queue a set of operations (referred to as
commands) in order.
Having multiple command-queues allows applications to queue multiple
independent commands without requiring synchronization.
Note that no synchronization is needed only as long as these objects are not
shared between command-queues.
Sharing of objects across multiple command-queues requires the
application to perform appropriate synchronization.
This is described in <a href="#shared-opencl-objects">Shared OpenCL Objects</a>.</p>
</div>
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_command_queue clCreateCommandQueueWithProperties(
cl_context context,
cl_device_id device,
<span class="directive">const</span> cl_queue_properties *properties,
cl_int *errcode_ret)</code></pre>
</div>
</div>
<div class="paragraph">
<p>creates a host or device command-queue on a specific device.</p>
</div>
<div class="paragraph">
<p><em>context</em> must be a valid OpenCL context.</p>
</div>
<div class="paragraph">
<p><em>device</em> must be a device or sub-device associated with <em>context</em>.
It can either be in the list of devices and sub-devices specified when
<em>context</em> is created using <strong>clCreateContext</strong> or be a root device with the
same device type as specified when <em>context</em> is created using
<strong>clCreateContextFromType</strong>.</p>
</div>
<div class="paragraph">
<p><em>properties</em> specifies a list of properties for the command-queue and their
corresponding values.
Each property name is immediately followed by the corresponding desired
value.
The list is terminated with 0.
The list of supported properties is described in the table below.
If a supported property and its value are not specified in <em>properties</em>, its
default value will be used.
<em>properties</em> can be <code>NULL</code> in which case the default values for supported
command-queue properties will be used.</p>
</div>
<table id="queue-properties-table" class="tableblock frame-all grid-all spread">
<caption class="title">Table 9. List of supported cl_queue_properties values and description</caption>
<colgroup>
<col style="width: 34%;">
<col style="width: 33%;">
<col style="width: 33%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top"><strong>Queue Properties</strong></th>
<th class="tableblock halign-left valign-top">Property Value</th>
<th class="tableblock halign-left valign-top">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_QUEUE_PROPERTIES</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_bitfield</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">This is a bitfield and can be set to a combination of the following
values:
</p><p class="tableblock"> CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE - Determines whether the
commands queued in the command-queue are executed in-order or
out-of-order.
If set, the commands in the command-queue are executed out-of-order.
Otherwise, commands are executed in-order.
</p><p class="tableblock"> CL_QUEUE_PROFILING_ENABLE - Enable or disable profiling of commands
in the command-queue.
If set, the profiling of commands is enabled.
Otherwise profiling of commands is disabled.
</p><p class="tableblock"> CL_QUEUE_ON_DEVICE - Indicates that this is a device queue.
If CL_QUEUE_ON_DEVICE is set,
CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE<sup>1</sup> must also be set.
</p><p class="tableblock"> CL_QUEUE_ON_DEVICE_DEFAULT<sup>2</sup> - Indicates that this is the default
device queue.
This can only be used with CL_QUEUE_ON_DEVICE.
</p><p class="tableblock"> If CL_QUEUE_PROPERTIES is not specified, an in-order host command
queue is created for the specified device.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_QUEUE_SIZE</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Specifies the size of the device queue in bytes.
</p><p class="tableblock"> This can only be specified if CL_QUEUE_ON_DEVICE is set in
CL_QUEUE_PROPERTIES.
This must be a value ≤ CL_DEVICE_QUEUE_ON_DEVICE_MAX_SIZE.
</p><p class="tableblock"> For best performance, this should be ≤
CL_DEVICE_QUEUE_ON_DEVICE_PREFERRED_SIZE.
</p><p class="tableblock"> If CL_QUEUE_SIZE is not specified, the device queue is created with
CL_DEVICE_QUEUE_ON_DEVICE_PREFERRED_SIZE as the size of the queue.</p></td>
</tr>
</tbody>
</table>
<div class="dlist">
<dl>
<dt class="hdlist1">1</dt>
<dd>
<p>Only out-of-order device queues are supported.</p>
</dd>
<dt class="hdlist1">2</dt>
<dd>
<p>The application must create the default device queue if any kernels
containing calls to get_default_queue are enqueued.
There can only be one default device queue for each device within a
context.
<strong>clCreateCommandQueueWithProperties</strong> with CL_QUEUE_PROPERTIES set to
CL_QUEUE_ON_DEVICE or CL_QUEUE_ON_DEVICE_DEFAULT will return the default
device queue that has already been created and increment its retain
count by 1.</p>
</dd>
</dl>
</div>
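<div class="paragraph">
<p>The constraints in the table above can be checked before a queue is created.
The following is a sketch under stated assumptions: the <code>Q_*</code> names and their
bit values are stand-ins for illustration, and real code should use the
CL_QUEUE_* constants from the OpenCL headers.
<code>device_max_size</code> stands for the value a device reports for
CL_DEVICE_QUEUE_ON_DEVICE_MAX_SIZE.</p>
</div>

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in bit values; hypothetical names, not the header constants. */
enum {
    Q_OUT_OF_ORDER      = 1 << 0,
    Q_PROFILING         = 1 << 1,
    Q_ON_DEVICE         = 1 << 2,
    Q_ON_DEVICE_DEFAULT = 1 << 3
};

/* Checks the flag dependencies: a device queue must be out-of-order, and
 * the default-device-queue flag is only meaningful on a device queue. */
bool queue_flags_consistent(uint64_t flags)
{
    if ((flags & Q_ON_DEVICE) && !(flags & Q_OUT_OF_ORDER)) {
        return false;
    }
    if ((flags & Q_ON_DEVICE_DEFAULT) && !(flags & Q_ON_DEVICE)) {
        return false;
    }
    return true;
}

/* Checks the queue-size rules: a size may only accompany a device queue
 * and must not exceed the device maximum. */
bool queue_size_consistent(uint64_t flags, bool size_given,
                           uint32_t size, uint32_t device_max_size)
{
    if (!size_given) {
        return true; /* the preferred size is used by default */
    }
    if (!(flags & Q_ON_DEVICE)) {
        return false;
    }
    return size <= device_max_size;
}
```

<div class="paragraph">
<p>Profiling is independent of the other flags, so <code>Q_PROFILING</code> never affects
either check.</p>
</div>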
<div class="paragraph">
<p><em>errcode_ret</em> will return an appropriate error code.
If <em>errcode_ret</em> is <code>NULL</code>, no error code is returned.</p>
</div>
<div class="paragraph">
<p><strong>clCreateCommandQueueWithProperties</strong> returns a valid non-zero command-queue
and <em>errcode_ret</em> is set to CL_SUCCESS if the command-queue is created
successfully.
Otherwise, it returns a <code>NULL</code> value with one of the following error values
returned in <em>errcode_ret</em>:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_CONTEXT if <em>context</em> is not a valid context.</p>
</li>
<li>
<p>CL_INVALID_DEVICE if <em>device</em> is not a valid device or is not associated
with <em>context</em>.</p>
</li>
<li>
<p>CL_INVALID_VALUE if values specified in <em>properties</em> are not valid.</p>
</li>
<li>
<p>CL_INVALID_QUEUE_PROPERTIES if values specified in <em>properties</em> are
valid but are not supported by the device.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clSetDefaultDeviceCommandQueue(cl_context context,
cl_device_id device,
cl_command_queue command_queue)</code></pre>
</div>
</div>
<div class="paragraph">
<p>replaces the default command queue on the <em>device</em>.</p>
</div>
<div class="paragraph">
<p><strong>clSetDefaultDeviceCommandQueue</strong> returns CL_SUCCESS if the function is
executed successfully.
Otherwise, it returns one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_CONTEXT if <em>context</em> is not a valid context.</p>
</li>
<li>
<p>CL_INVALID_DEVICE if <em>device</em> is not a valid device or is not associated
with <em>context</em>.</p>
</li>
<li>
<p>CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not a valid command-queue
for <em>device</em>.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p><strong>clSetDefaultDeviceCommandQueue</strong> may be used to replace a default device
command queue created with <strong>clCreateCommandQueueWithProperties</strong> and the
CL_QUEUE_ON_DEVICE_DEFAULT flag.</p>
</div>
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clRetainCommandQueue(cl_command_queue command_queue)</code></pre>
</div>
</div>
<div class="paragraph">
<p>increments the <em>command_queue</em> reference count.
<strong>clRetainCommandQueue</strong> returns CL_SUCCESS if the function is executed
successfully.
Otherwise, it returns one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not a valid
command-queue.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p><strong>clCreateCommandQueueWithProperties</strong> performs an implicit retain.
This is very helpful for 3<sup>rd</sup> party libraries, which typically get a
command-queue passed to them by the application.
However, it is possible that the application may delete the command-queue
without informing the library.
Allowing functions to attach to (i.e. retain) and release a command-queue
solves the problem of a command-queue being used by a library no longer
being valid.</p>
</div>
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clReleaseCommandQueue(cl_command_queue command_queue)</code></pre>
</div>
</div>
<div class="paragraph">
<p>decrements the <em>command_queue</em> reference count.
<strong>clReleaseCommandQueue</strong> returns CL_SUCCESS if the function is executed
successfully.
Otherwise, it returns one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not a valid
command-queue.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>After the <em>command_queue</em> reference count becomes zero and all commands
queued to <em>command_queue</em> have finished (e.g. kernel-instances, memory
object updates, etc.), the command-queue is deleted.</p>
</div>
<div class="paragraph">
<p><strong>clReleaseCommandQueue</strong> performs an implicit flush to issue any previously
queued OpenCL commands in <em>command_queue</em>.
Using this function to release a reference that was not obtained by creating
the object or by calling <strong>clRetainCommandQueue</strong> causes undefined behavior.</p>
</div>
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clGetCommandQueueInfo(cl_command_queue command_queue,
cl_command_queue_info param_name,
size_t param_value_size,
<span class="directive">void</span> *param_value,
size_t *param_value_size_ret)</code></pre>
</div>
</div>
<div class="paragraph">
<p>can be used to query information about a command-queue.</p>
</div>
<div class="paragraph">
<p><em>command_queue</em> specifies the command-queue being queried.</p>
</div>
<div class="paragraph">
<p><em>param_name</em> specifies the information to query.</p>
</div>
<div class="paragraph">
<p><em>param_value</em> is a pointer to memory where the appropriate result being
queried is returned.
If <em>param_value</em> is <code>NULL</code>, it is ignored.</p>
</div>
<div class="paragraph">
<p><em>param_value_size</em> is used to specify the size in bytes of memory pointed to
by <em>param_value</em>.
This size must be ≥ the size of the return type as described in the
<a href="#command-queue-param-table">Command Queue Parameter</a> table.
If <em>param_value</em> is <code>NULL</code>, <em>param_value_size</em> is ignored.</p>
</div>
<div class="paragraph">
<p><em>param_value_size_ret</em> returns the actual size in bytes of data being
queried by <em>param_name</em>.
If <em>param_value_size_ret</em> is <code>NULL</code>, it is ignored.</p>
</div>
<div class="paragraph">
<p>The list of supported <em>param_name</em> values and the information returned in
<em>param_value</em> by <strong>clGetCommandQueueInfo</strong> is described in the
<a href="#command-queue-param-table">Command Queue Parameter</a> table.</p>
</div>
<table id="command-queue-param-table" class="tableblock frame-all grid-all spread">
<caption class="title">Table 10. List of supported param_names by <strong>clGetCommandQueueInfo</strong></caption>
<colgroup>
<col style="width: 34%;">
<col style="width: 33%;">
<col style="width: 33%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top"><strong>cl_command_queue_info</strong></th>
<th class="tableblock halign-left valign-top">Return Type</th>
<th class="tableblock halign-left valign-top">Information returned in param_value</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_QUEUE_CONTEXT</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_context</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Return the context specified when the command-queue is created.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_QUEUE_DEVICE</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_device_id</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Return the device specified when the command-queue is created.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_QUEUE_REFERENCE_COUNT</strong><sup>3</sup></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Return the command-queue reference count.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_QUEUE_PROPERTIES</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_command_queue_properties</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Return the currently specified properties for the command-queue.
These properties are specified by the value associated with the
CL_QUEUE_PROPERTIES property passed in the <em>properties</em> argument of
<strong>clCreateCommandQueueWithProperties</strong>.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_QUEUE_SIZE</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Return the currently specified size for the device command-queue.
This query is only supported for device command queues.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_QUEUE_DEVICE_DEFAULT</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_command_queue</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Return the current default command queue for the underlying device.</p></td>
</tr>
</tbody>
</table>
<div class="dlist">
<dl>
<dt class="hdlist1">3</dt>
<dd>
<p>The reference count returned should be considered immediately stale.
It is unsuitable for general use in applications.
This feature is provided for identifying memory leaks.</p>
</dd>
</dl>
</div>
<div class="paragraph">
<p><strong>clGetCommandQueueInfo</strong> returns CL_SUCCESS if the function is executed
successfully.
Otherwise, it returns one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not a valid
command-queue.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>param_name</em> is not one of the supported values, or
if the size in bytes specified by <em>param_value_size</em> is less than the size of
the return type as specified in the <a href="#command-queue-param-table">Command Queue
Parameter</a> table and <em>param_value</em> is not a <code>NULL</code> value.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
<div class="admonitionblock note">
<table>
<tr>
<td class="icon">
<i class="fa icon-note" title="Note"></i>
</td>
<td class="content">
<div class="paragraph">
<p>It is possible that one or more devices become unavailable after a context and
command-queues that use them have been created and commands have
been queued to those command-queues.
In this case the behavior of OpenCL API calls that use this context (and
its command-queues) is implementation-defined.
The user callback function, if one was specified when the context was created,
can be used to record the information passed in the <em>errinfo</em> and <em>private_info</em>
arguments when the device becomes unavailable.</p>
</div>
</td>
</tr>
</table>
</div>
</div>
<div class="sect2">
<h3 id="_buffer_objects">5.2. Buffer Objects</h3>
<div class="paragraph">
<p>A <em>buffer</em> object stores a one-dimensional collection of elements.
Elements of a <em>buffer</em> object can be of a scalar data type (such as int or
float), a vector data type, or a user-defined structure.</p>
</div>
<div class="sect3">
<h4 id="_creating_buffer_objects">5.2.1. Creating Buffer Objects</h4>
<div class="paragraph">
<p>A <strong>buffer object</strong> is created using the following function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_mem clCreateBuffer(cl_context context,
cl_mem_flags flags,
size_t size,
<span class="directive">void</span> *host_ptr,
cl_int *errcode_ret)</code></pre>
</div>
</div>
<div class="paragraph">
<p><em>context</em> is a valid OpenCL context used to create the buffer object.</p>
</div>
<div class="paragraph">
<p><em>flags</em> is a bit-field that is used to specify allocation and usage
information such as the memory arena that should be used to allocate the
buffer object and how it will be used.
The <a href="#memory-flags-table">Memory Flags</a> table describes the possible values
for <em>flags</em>.
If the value specified for <em>flags</em> is 0, the default, CL_MEM_READ_WRITE, is
used.</p>
</div>
<table id="memory-flags-table" class="tableblock frame-all grid-all spread">
<caption class="title">Table 11. List of supported cl_mem_flags values</caption>
<colgroup>
<col style="width: 50%;">
<col style="width: 50%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top"><strong>cl_mem_flags</strong></th>
<th class="tableblock halign-left valign-top">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MEM_READ_WRITE</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">This flag specifies that the memory object will be read
and written by a kernel.
This is the default.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MEM_WRITE_ONLY</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">This flag specifies that the memory object will be
written but not read by a kernel.
</p><p class="tableblock"> Reading from a buffer or image object created with CL_MEM_WRITE_ONLY
inside a kernel is undefined.
</p><p class="tableblock"> CL_MEM_READ_WRITE and CL_MEM_WRITE_ONLY are mutually exclusive.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MEM_READ_ONLY</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">This flag specifies that the memory object is a
read-only memory object when used inside a kernel.
</p><p class="tableblock"> Writing to a buffer or image object created with CL_MEM_READ_ONLY inside
a kernel is undefined.
</p><p class="tableblock"> CL_MEM_READ_WRITE or CL_MEM_WRITE_ONLY and CL_MEM_READ_ONLY are mutually
exclusive.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MEM_USE_HOST_PTR</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">This flag is valid only if host_ptr is not <code>NULL</code>.
If specified, it indicates that the application wants the OpenCL
implementation to use memory referenced by host_ptr as the storage bits
for the memory object.
</p><p class="tableblock"> The contents of the memory pointed to by host_ptr at the time of the
clCreateBuffer call define the initial contents of the buffer object.
</p><p class="tableblock"> OpenCL implementations are allowed to cache the buffer contents pointed
to by host_ptr in device memory.
This cached copy can be used when kernels are executed on a device.
</p><p class="tableblock"> The result of OpenCL commands that operate on multiple buffer objects
created with the same host_ptr or from overlapping host or SVM regions
is considered to be undefined.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MEM_ALLOC_HOST_PTR</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">This flag specifies that the application wants the OpenCL implementation
to allocate memory from host accessible memory.
</p><p class="tableblock"> CL_MEM_ALLOC_HOST_PTR and CL_MEM_USE_HOST_PTR are mutually exclusive.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MEM_COPY_HOST_PTR</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">This flag is valid only if host_ptr is not <code>NULL</code>.
If specified, it indicates that the application wants the OpenCL
implementation to allocate memory for the memory object and copy the
data from memory referenced by host_ptr.
The implementation will copy the memory immediately and host_ptr is
available for reuse by the application when the <strong>clCreateBuffer</strong> or
<strong>clCreateImage</strong> operation returns.
</p><p class="tableblock"> CL_MEM_COPY_HOST_PTR and CL_MEM_USE_HOST_PTR are mutually exclusive.
</p><p class="tableblock"> CL_MEM_COPY_HOST_PTR can be used with CL_MEM_ALLOC_HOST_PTR to
initialize the contents of the cl_mem object allocated using
host-accessible (e.g. PCIe) memory.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MEM_HOST_WRITE_ONLY</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">This flag specifies that the host will only write to the memory object
(using OpenCL APIs that enqueue a write or a map for write).
This can be used to optimize write access from the host (e.g. enable
write-combined allocations for memory objects for devices that
communicate with the host over a system bus such as PCIe).</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MEM_HOST_READ_ONLY</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">This flag specifies that the host will only read
the memory object (using OpenCL APIs that enqueue a read or a map for
read).
</p><p class="tableblock"> CL_MEM_HOST_WRITE_ONLY and CL_MEM_HOST_READ_ONLY are mutually exclusive.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MEM_HOST_NO_ACCESS</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">This flag specifies that the host will not read or
write the memory object.
</p><p class="tableblock"> CL_MEM_HOST_WRITE_ONLY or CL_MEM_HOST_READ_ONLY and
CL_MEM_HOST_NO_ACCESS are mutually exclusive.</p></td>
</tr>
</tbody>
</table>
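<div class="paragraph">
<p>The mutual-exclusion rules in the table above can be checked before calling
<strong>clCreateBuffer</strong>. The following is a minimal sketch, not part of the API:
the <code>MEM_*</code> macros are local stand-ins for the corresponding
<code>CL_MEM_*</code> constants (assumed here to use the same bit positions as the
OpenCL headers), and <code>mem_flags_combination_ok</code> is a hypothetical helper
name.</p>
</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the cl_mem_flags bits so the sketch compiles without
 * the OpenCL headers. Real code should use the CL_MEM_* constants. */
typedef uint64_t mem_flags_t;
#define MEM_READ_WRITE      ((mem_flags_t)1 << 0)
#define MEM_WRITE_ONLY      ((mem_flags_t)1 << 1)
#define MEM_READ_ONLY       ((mem_flags_t)1 << 2)
#define MEM_USE_HOST_PTR    ((mem_flags_t)1 << 3)
#define MEM_ALLOC_HOST_PTR  ((mem_flags_t)1 << 4)
#define MEM_COPY_HOST_PTR   ((mem_flags_t)1 << 5)
#define MEM_HOST_WRITE_ONLY ((mem_flags_t)1 << 7)
#define MEM_HOST_READ_ONLY  ((mem_flags_t)1 << 8)
#define MEM_HOST_NO_ACCESS  ((mem_flags_t)1 << 9)

/* True if at most one bit of `group` is set in `flags`. */
static bool at_most_one(mem_flags_t flags, mem_flags_t group)
{
    mem_flags_t picked = flags & group;
    return (picked & (picked - 1)) == 0;
}

/* Hypothetical helper: applies the mutual-exclusion rules from the table. */
bool mem_flags_combination_ok(mem_flags_t flags)
{
    /* Kernel access: READ_WRITE, WRITE_ONLY and READ_ONLY exclude each other. */
    if (!at_most_one(flags, MEM_READ_WRITE | MEM_WRITE_ONLY | MEM_READ_ONLY))
        return false;

    /* Host access: HOST_WRITE_ONLY, HOST_READ_ONLY and HOST_NO_ACCESS
     * exclude each other. */
    if (!at_most_one(flags, MEM_HOST_WRITE_ONLY | MEM_HOST_READ_ONLY |
                            MEM_HOST_NO_ACCESS))
        return false;

    /* USE_HOST_PTR excludes ALLOC_HOST_PTR and COPY_HOST_PTR, while
     * ALLOC_HOST_PTR may be combined with COPY_HOST_PTR. */
    if ((flags & MEM_USE_HOST_PTR) &&
        (flags & (MEM_ALLOC_HOST_PTR | MEM_COPY_HOST_PTR)))
        return false;

    return true;
}
```

<div class="paragraph">
<p>A value of 0 passes the check because it selects the default,
CL_MEM_READ_WRITE. The helper only covers flag combinations; whether
<em>host_ptr</em> matches the flags is a separate check.</p>
</div>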
<div class="paragraph">
<p><em>size</em> is the size in bytes of the buffer memory object to be allocated.</p>
</div>
<div class="paragraph">
<p><em>host_ptr</em> is a pointer to the buffer data that may already be allocated by
the application.
The size of the buffer that <em>host_ptr</em> points to must be ≥ <em>size</em> bytes.</p>
</div>
<div class="paragraph">
<p>The user is responsible for ensuring that data passed into and out of OpenCL
buffers is natively aligned relative to the start of the buffer as per
kernel language or IL requirements.
When a buffer is created with CL_MEM_USE_HOST_PTR, the application must
supply a host memory pointer that is aligned to the data types used to
access the buffer in kernels.</p>
</div>
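<div class="paragraph">
<p>An application can verify the alignment of a host pointer before passing it
with CL_MEM_USE_HOST_PTR. The sketch below assumes the required alignment is
a power of two supplied by the caller (for example, 16 bytes when kernels
access the buffer as <code>float4</code>); <code>is_aligned_for</code> is a hypothetical
helper, not an OpenCL function.</p>
</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if `ptr` is a multiple of `alignment` bytes.
 * `alignment` must be a nonzero power of two. */
bool is_aligned_for(const void *ptr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}
```

<div class="paragraph">
<p>If the check fails, the application can allocate the host memory with an
aligned allocator, or create the buffer with CL_MEM_COPY_HOST_PTR instead so
that the implementation chooses the storage.</p>
</div>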
<div class="paragraph">
<p><em>errcode_ret</em> will return an appropriate error code.
If <em>errcode_ret</em> is <code>NULL</code>, no error code is returned.</p>
</div>
<div class="paragraph">
<p>If <strong>clCreateBuffer</strong> is called with CL_MEM_USE_HOST_PTR set in its <em>flags</em>
argument, the contents of the memory pointed to by <em>host_ptr</em> at the time
of the <strong>clCreateBuffer</strong> call define the initial contents of the
buffer object.</p>
</div>
<div class="paragraph">
<p>If <strong>clCreateBuffer</strong> is called with a pointer returned by <strong>clSVMAlloc</strong> as its
<em>host_ptr</em> argument, and CL_MEM_USE_HOST_PTR is set in its <em>flags</em> argument,
<strong>clCreateBuffer</strong> will succeed and return a valid non-zero buffer object as
long as the <em>size</em> argument to <strong>clCreateBuffer</strong> is no larger than the <em>size</em>
argument passed in the original <strong>clSVMAlloc</strong> call.
The new buffer object returned has the shared memory as the underlying
storage.
Locations in the buffer's underlying shared memory can be operated on using
atomic operations to the device's level of support as defined in the memory
model.</p>
</div>
<div class="paragraph">
<p><strong>clCreateBuffer</strong> returns a valid non-zero buffer object and <em>errcode_ret</em> is
set to CL_SUCCESS if the buffer object is created successfully.
Otherwise, it returns a <code>NULL</code> value with one of the following error values
returned in <em>errcode_ret</em>:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_CONTEXT if <em>context</em> is not a valid context.</p>
</li>
<li>
<p>CL_INVALID_VALUE if values specified in <em>flags</em> are not valid as defined
in the <a href="#memory-flags-table">Memory Flags</a> table.</p>
</li>
<li>
<p>CL_INVALID_BUFFER_SIZE if <em>size</em> is 0<sup>4</sup>.</p>
</li>
<li>
<p>CL_INVALID_HOST_PTR if <em>host_ptr</em> is <code>NULL</code> and CL_MEM_USE_HOST_PTR or
CL_MEM_COPY_HOST_PTR are set in <em>flags</em> or if <em>host_ptr</em> is not <code>NULL</code>
but CL_MEM_COPY_HOST_PTR or CL_MEM_USE_HOST_PTR are not set in <em>flags</em>.</p>
</li>
<li>
<p>CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
memory for buffer object.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
<div class="dlist">
<dl>
<dt class="hdlist1">4</dt>
<dd>
<p>Implementations may return CL_INVALID_BUFFER_SIZE if size is greater
than CL_DEVICE_MAX_MEM_ALLOC_SIZE value specified in the
<a href="#device-queries-table">Device Queries</a> table for all devices in
context.</p>
</dd>
</dl>
</div>
</li>
</ul>
</div>
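<div class="paragraph">
<p>Two of the error conditions above depend only on the arguments and can be
screened on the host before the call. The following sketch is illustrative:
the <code>MEM_*</code> macros are local stand-ins for CL_MEM_USE_HOST_PTR and
CL_MEM_COPY_HOST_PTR, the <code>precheck_t</code> enum and the function name are
hypothetical, and the order in which the checks run is an assumption, since
the order in which an implementation reports errors is not stated here.</p>
</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for CL_MEM_USE_HOST_PTR and CL_MEM_COPY_HOST_PTR. */
#define MEM_USE_HOST_PTR  ((uint64_t)1 << 3)
#define MEM_COPY_HOST_PTR ((uint64_t)1 << 5)

/* Hypothetical result codes mirroring the errors listed above. */
typedef enum {
    PRECHECK_OK,
    PRECHECK_INVALID_BUFFER_SIZE,
    PRECHECK_INVALID_HOST_PTR
} precheck_t;

/* Hypothetical helper: screens the size and host_ptr arguments. */
precheck_t precheck_create_buffer(uint64_t flags, size_t size,
                                  const void *host_ptr)
{
    /* A size of 0 is reported as CL_INVALID_BUFFER_SIZE. */
    if (size == 0)
        return PRECHECK_INVALID_BUFFER_SIZE;

    /* host_ptr must be non-NULL exactly when USE_HOST_PTR or COPY_HOST_PTR
     * is set; any mismatch is reported as CL_INVALID_HOST_PTR. */
    bool flags_need_ptr = (flags & (MEM_USE_HOST_PTR | MEM_COPY_HOST_PTR)) != 0;
    if (flags_need_ptr != (host_ptr != NULL))
        return PRECHECK_INVALID_HOST_PTR;

    return PRECHECK_OK;
}
```

<div class="paragraph">
<p>The remaining errors, such as CL_MEM_OBJECT_ALLOCATION_FAILURE, depend on
the state of the implementation and can only be detected by inspecting
<em>errcode_ret</em> after the call.</p>
</div>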
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_mem clCreateSubBuffer(cl_mem buffer,
cl_mem_flags flags,
cl_buffer_create_type buffer_create_type,
<span class="directive">const</span> <span class="directive">void</span> *buffer_create_info,
cl_int *errcode_ret)</code></pre>
</div>
</div>
<div class="paragraph">
<p>can be used to create a new buffer object (referred to as a sub-buffer
object) from an existing buffer object.</p>
</div>
<div class="paragraph">
<p><em>buffer</em> must be a valid buffer object and cannot be a sub-buffer object.</p>
</div>
<div class="paragraph">
<p><em>flags</em> is a bit-field that is used to specify allocation and usage
information about the sub-buffer memory object being created and is
described in the <a href="#memory-flags-table">Memory Flags</a> table.
If the CL_MEM_READ_WRITE, CL_MEM_READ_ONLY or CL_MEM_WRITE_ONLY values are
not specified in <em>flags</em>, they are inherited from the corresponding memory
access qualifiers associated with <em>buffer</em>.
The CL_MEM_USE_HOST_PTR, CL_MEM_ALLOC_HOST_PTR and CL_MEM_COPY_HOST_PTR
values cannot be specified in <em>flags</em> but are inherited from the
corresponding memory access qualifiers associated with <em>buffer</em>.
If CL_MEM_COPY_HOST_PTR is specified in the memory access qualifier values
associated with <em>buffer</em> it does not imply any additional copies when the
sub-buffer is created from <em>buffer</em>.
If the CL_MEM_HOST_WRITE_ONLY, CL_MEM_HOST_READ_ONLY or
CL_MEM_HOST_NO_ACCESS values are not specified in <em>flags</em>, they are
inherited from the corresponding memory access qualifiers associated with
<em>buffer</em>.</p>
</div>
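<div class="paragraph">
<p>The inheritance rules for <em>flags</em> can be summarized as a small function.
This is one reading of the paragraph above, written as a sketch: the
<code>MEM_*</code> masks are local stand-ins for groups of <code>CL_MEM_*</code> bits,
<code>sub_buffer_flags</code> is a hypothetical helper, and it deliberately does not
check whether the requested access is compatible with the parent buffer.</p>
</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t mem_flags_t;

/* Local stand-ins for the CL_MEM_* bits, grouped by purpose. */
#define MEM_READ_WRITE      ((mem_flags_t)1 << 0)
#define MEM_WRITE_ONLY      ((mem_flags_t)1 << 1)
#define MEM_READ_ONLY       ((mem_flags_t)1 << 2)
#define MEM_USE_HOST_PTR    ((mem_flags_t)1 << 3)
#define MEM_ALLOC_HOST_PTR  ((mem_flags_t)1 << 4)
#define MEM_COPY_HOST_PTR   ((mem_flags_t)1 << 5)
#define MEM_HOST_WRITE_ONLY ((mem_flags_t)1 << 7)
#define MEM_HOST_READ_ONLY  ((mem_flags_t)1 << 8)
#define MEM_HOST_NO_ACCESS  ((mem_flags_t)1 << 9)

#define KERNEL_ACCESS_MASK (MEM_READ_WRITE | MEM_WRITE_ONLY | MEM_READ_ONLY)
#define HOST_PTR_MASK      (MEM_USE_HOST_PTR | MEM_ALLOC_HOST_PTR | MEM_COPY_HOST_PTR)
#define HOST_ACCESS_MASK   (MEM_HOST_WRITE_ONLY | MEM_HOST_READ_ONLY | MEM_HOST_NO_ACCESS)

/* Hypothetical helper: computes the flags a sub-buffer ends up with.
 * Returns false if `requested` names a host-pointer flag, which cannot be
 * specified for a sub-buffer. */
bool sub_buffer_flags(mem_flags_t parent, mem_flags_t requested,
                      mem_flags_t *effective)
{
    if (requested & HOST_PTR_MASK)
        return false;

    /* Kernel access flags are inherited only when none is requested. */
    mem_flags_t kernel_access = requested & KERNEL_ACCESS_MASK;
    if (kernel_access == 0)
        kernel_access = parent & KERNEL_ACCESS_MASK;

    /* Host access flags follow the same rule. */
    mem_flags_t host_access = requested & HOST_ACCESS_MASK;
    if (host_access == 0)
        host_access = parent & HOST_ACCESS_MASK;

    /* Host-pointer flags always come from the parent buffer. */
    *effective = kernel_access | host_access | (parent & HOST_PTR_MASK);
    return true;
}
```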
<div class="paragraph">
<p><em>buffer_create_type</em> and <em>buffer_create_info</em> describe the type of buffer
object to be created.
The list of supported values for <em>buffer_create_type</em> and corresponding
descriptor that <em>buffer_create_info</em> points to is described in the
<a href="#subbuffer-create-info-table">SubBuffer Attributes</a> table.</p>
</div>
<table id="subbuffer-create-info-table" class="tableblock frame-all grid-all spread">
<caption class="title">Table 12. List of supported names and values in <strong>clCreateSubBuffer</strong></caption>
<colgroup>
<col style="width: 50%;">
<col style="width: 50%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top"><strong>cl_buffer_create_type</strong></th>
<th class="tableblock halign-left valign-top">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_BUFFER_CREATE_TYPE_REGION</strong></p></td>
<td class="tableblock halign-left valign-top"><div><div class="paragraph">
<p>Create a buffer object that represents a
specific region in buffer.</p>
</div>
<div class="openblock">
<div class="content">
<div class="paragraph">
<p>buffer_create_info is a pointer to the following structure:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c"><span class="keyword">typedef</span> <span class="keyword">struct</span> _cl_buffer_region {
size_t origin;
size_t size;
} cl_buffer_region;</code></pre>
</div>
</div>
<div class="paragraph">
<p>(<em>origin</em>, <em>size</em>) defines the offset and size in bytes in buffer.</p>
</div>
<div class="paragraph">
<p>If buffer is created with CL_MEM_USE_HOST_PTR, the host_ptr associated with
the buffer object returned is <em>host_ptr + origin</em>.</p>
</div>
<div class="paragraph">
<p>The buffer object returned references the data store allocated for buffer
and points to a specific region given by (origin, size) in this data store.</p>
</div>
<div class="paragraph">
<p>CL_INVALID_VALUE is returned in errcode_ret if the region specified by
(origin, size) is out of bounds in buffer.</p>
</div>
<div class="paragraph">
<p>CL_INVALID_BUFFER_SIZE is returned in errcode_ret if size is 0.</p>
</div>
<div class="paragraph">
<p>CL_MISALIGNED_SUB_BUFFER_OFFSET is returned in errcode_ret if there are no
devices in context associated with buffer for which the origin value is
aligned to the CL_DEVICE_MEM_BASE_ADDR_ALIGN value.</p>
</div>
</div>
</div></div></td>
</tr>
</tbody>
</table>
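<div class="paragraph">
<p>The three region checks described in the table can be sketched for a single
device as follows. The names are hypothetical, the enum values only mirror the
error codes named above, and the sketch assumes that the
CL_DEVICE_MEM_BASE_ADDR_ALIGN value is expressed in bits, as described in the
<a href="#device-queries-table">Device Queries</a> table. The misalignment error applies
only when no device in the context accepts the origin, so a caller with
several devices would repeat the alignment test per device.</p>
</div>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes mirroring the errors named in the table. */
typedef enum {
    REGION_OK,
    REGION_INVALID_BUFFER_SIZE,
    REGION_INVALID_VALUE,
    REGION_MISALIGNED
} region_check_t;

/* Hypothetical helper: checks (origin, size) against a parent buffer of
 * `parent_size` bytes for one device whose base address alignment is
 * `base_addr_align_bits` (assumed to be in bits). */
region_check_t check_sub_buffer_region(size_t parent_size, size_t origin,
                                       size_t size,
                                       unsigned base_addr_align_bits)
{
    if (size == 0)
        return REGION_INVALID_BUFFER_SIZE;

    /* Written so that origin + size cannot overflow. */
    if (origin > parent_size || size > parent_size - origin)
        return REGION_INVALID_VALUE;

    size_t align_bytes = base_addr_align_bits / 8;
    if (align_bytes != 0 && origin % align_bytes != 0)
        return REGION_MISALIGNED;

    return REGION_OK;
}
```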
<div class="paragraph">
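<p><strong>clCreateSubBuffer</strong> returns a valid non-zero buffer object and
<em>errcode_ret</em> is set to CL_SUCCESS if the function is executed
successfully.
Otherwise, it returns a <code>NULL</code> value with one of the following error
values returned in <em>errcode_ret</em>:</p>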
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_MEM_OBJECT if <em>buffer</em> is not a valid buffer object or is a
sub-buffer object.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>buffer</em> was created with CL_MEM_WRITE_ONLY and
<em>flags</em> specifies CL_MEM_READ_WRITE or CL_MEM_READ_ONLY, or if <em>buffer</em>
was created with CL_MEM_READ_ONLY and <em>flags</em> specifies
CL_MEM_READ_WRITE or CL_MEM_WRITE_ONLY, or if <em>flags</em> specifies
CL_MEM_USE_HOST_PTR or CL_MEM_ALLOC_HOST_PTR or CL_MEM_COPY_HOST_PTR.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>buffer</em> was created with CL_MEM_HOST_WRITE_ONLY and
<em>flags</em> specifies CL_MEM_HOST_READ_ONLY, or if <em>buffer</em> was created with
CL_MEM_HOST_READ_ONLY and <em>flags</em> specifies CL_MEM_HOST_WRITE_ONLY, or if
<em>buffer</em> was created with CL_MEM_HOST_NO_ACCESS and <em>flags</em> specifies
CL_MEM_HOST_READ_ONLY or CL_MEM_HOST_WRITE_ONLY.</p>
</li>
<li>
<p>CL_INVALID_VALUE if value specified in <em>buffer_create_type</em> is not
valid.</p>
</li>
<li>
<p>CL_INVALID_VALUE if value(s) specified in <em>buffer_create_info</em> (for a
given <em>buffer_create_type</em>) is not valid or if <em>buffer_create_info</em> is
<code>NULL</code>.</p>
</li>
<li>
<p>CL_INVALID_BUFFER_SIZE if <em>size</em> is 0.</p>
</li>
<li>
<p>CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
memory for sub-buffer object.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
<div class="admonitionblock note">
<table>
<tr>
<td class="icon">
<i class="fa icon-note" title="Note"></i>
</td>
<td class="content">
<div class="paragraph">
<p>Concurrent reading from, writing to and copying between both a buffer object
and its sub-buffer object(s) is undefined.
Concurrent reading from, writing to and copying between overlapping
sub-buffer objects created with the same buffer object is undefined.
Only reading from both a buffer object and its sub-buffer objects or reading
from multiple overlapping sub-buffer objects is defined.</p>
</div>
</td>
</tr>
</table>
</div>
</div>
<div class="sect3">
<h4 id="_reading_writing_and_copying_buffer_objects">5.2.2. Reading, Writing and Copying Buffer Objects</h4>
<div class="paragraph">
<p>The following functions enqueue commands to read from a buffer object to
host memory or write to a buffer object from host memory.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clEnqueueReadBuffer(cl_command_queue command_queue,
cl_mem buffer,
cl_bool blocking_read,
size_t offset,
size_t size,
<span class="directive">void</span> *ptr,
cl_uint num_events_in_wait_list,
<span class="directive">const</span> cl_event *event_wait_list,
cl_event *event)</code></pre>
</div>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clEnqueueWriteBuffer(cl_command_queue command_queue,
cl_mem buffer,
cl_bool blocking_write,
size_t offset,
size_t size,
<span class="directive">const</span> <span class="directive">void</span> *ptr,
cl_uint num_events_in_wait_list,
<span class="directive">const</span> cl_event *event_wait_list,
cl_event *event)</code></pre>
</div>
</div>
<div class="paragraph">
<p><em>command_queue</em> is a valid host command-queue in which the read / write
command will be queued.
<em>command_queue</em> and <em>buffer</em> must be created with the same OpenCL context.</p>
</div>
<div class="paragraph">
<p><em>buffer</em> refers to a valid buffer object.</p>
</div>
<div class="paragraph">
<p><em>blocking_read</em> and <em>blocking_write</em> indicate if the read and write
operations are <em>blocking</em> or <em>non-blocking</em>.</p>
</div>
<div class="paragraph">
<p>If <em>blocking_read</em> is CL_TRUE i.e. the read command is blocking,
<strong>clEnqueueReadBuffer</strong> does not return until the buffer data has been read
and copied into memory pointed to by <em>ptr</em>.</p>
</div>
<div class="paragraph">
<p>If <em>blocking_read</em> is CL_FALSE i.e. the read command is non-blocking,
<strong>clEnqueueReadBuffer</strong> queues a non-blocking read command and returns.
The contents of the buffer that <em>ptr</em> points to cannot be used until the
read command has completed.
The <em>event</em> argument returns an event object which can be used to query the
execution status of the read command.
When the read command has completed, the contents of the buffer that <em>ptr</em>
points to can be used by the application.</p>
</div>
<div class="paragraph">
<p>If <em>blocking_write</em> is CL_TRUE, the write command is blocking and does not
return until the command is complete, including transfer of the data.
The memory pointed to by <em>ptr</em> can be reused by the application after the
<strong>clEnqueueWriteBuffer</strong> call returns.</p>
</div>
<div class="paragraph">
<p>If <em>blocking_write</em> is CL_FALSE, the OpenCL implementation will use <em>ptr</em> to
perform a non-blocking write.
As the write is non-blocking, the implementation can return immediately.
The memory pointed to by <em>ptr</em> cannot be reused by the application
immediately after the call returns.
The <em>event</em> argument returns an event object which can be used to query the
execution status of the write command.
When the write command has completed, the memory pointed to by <em>ptr</em> can
then be reused by the application.</p>
</div>
<div class="paragraph">
<p><em>offset</em> is the offset in bytes in the buffer object to read from or write
to.</p>
</div>
<div class="paragraph">
<p><em>size</em> is the size in bytes of data being read or written.</p>
</div>
<div class="paragraph">
<p><em>ptr</em> is the pointer to buffer in host memory where data is to be read into
or to be written from.</p>
</div>
<div class="paragraph">
<p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need to
complete before this particular command can be executed.
If <em>event_wait_list</em> is <code>NULL</code>, then this particular command does not wait
on any event to complete.
If <em>event_wait_list</em> is <code>NULL</code>, <em>num_events_in_wait_list</em> must be 0.
If <em>event_wait_list</em> is not <code>NULL</code>, the list of events pointed to by
<em>event_wait_list</em> must be valid and <em>num_events_in_wait_list</em> must be
greater than 0.
The events specified in <em>event_wait_list</em> act as synchronization points.
The context associated with events in <em>event_wait_list</em> and <em>command_queue</em>
must be the same.
The memory associated with <em>event_wait_list</em> can be reused or freed after
the function returns.</p>
</div>
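<div class="paragraph">
<p>The pairing rule between <em>event_wait_list</em> and <em>num_events_in_wait_list</em>
reduces to a single comparison. The helper below is a hypothetical sketch that
uses <code>const void *</code> in place of <code>const cl_event *</code> so that it compiles
without the OpenCL headers; it does not check that the events themselves are
valid or belong to the right context.</p>
</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: the list pointer must be NULL exactly when the count
 * is 0. Any other combination corresponds to CL_INVALID_EVENT_WAIT_LIST. */
bool wait_list_args_consistent(const void *event_wait_list,
                               unsigned num_events_in_wait_list)
{
    return (event_wait_list == NULL) == (num_events_in_wait_list == 0);
}
```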
<div class="paragraph">
<p><em>event</em> returns an event object that identifies this particular read / write
command and can be used to query or queue a wait for this particular command
to complete.
<em>event</em> can be <code>NULL</code> in which case it will not be possible for the
application to query the status of this command or queue a wait for this
command to complete.
If the <em>event_wait_list</em> and the <em>event</em> arguments are not <code>NULL</code>, the
<em>event</em> argument should not refer to an element of the <em>event_wait_list</em>
array.</p>
</div>
<div class="paragraph">
<p><strong>clEnqueueReadBuffer</strong> and <strong>clEnqueueWriteBuffer</strong> return CL_SUCCESS if the
function is executed successfully.
Otherwise, they return one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not a valid host
command-queue.</p>
</li>
<li>
<p>CL_INVALID_CONTEXT if the context associated with <em>command_queue</em> and
<em>buffer</em> are not the same or if the context associated with
<em>command_queue</em> and events in <em>event_wait_list</em> are not the same.</p>
</li>
<li>
<p>CL_INVALID_MEM_OBJECT if <em>buffer</em> is not a valid buffer object.</p>
</li>
<li>
<p>CL_INVALID_VALUE if the region being read or written specified by
(<em>offset</em>, <em>size</em>) is out of bounds or if <em>ptr</em> is a <code>NULL</code> value.</p>
</li>
<li>
<p>CL_INVALID_EVENT_WAIT_LIST if <em>event_wait_list</em> is <code>NULL</code> and
<em>num_events_in_wait_list</em> &gt; 0, or <em>event_wait_list</em> is not <code>NULL</code> and
<em>num_events_in_wait_list</em> is 0, or if event objects in <em>event_wait_list</em>
are not valid events.</p>
</li>
<li>
<p>CL_MISALIGNED_SUB_BUFFER_OFFSET if <em>buffer</em> is a sub-buffer object and
the <em>offset</em> specified when the sub-buffer object was created is not
aligned to the CL_DEVICE_MEM_BASE_ADDR_ALIGN value for the device associated
with <em>command_queue</em>.</p>
</li>
<li>
<p>CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST if the read and write
operations are blocking and the execution status of any of the events in
<em>event_wait_list</em> is a negative integer value.</p>
</li>
<li>
<p>CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
memory for data store associated with <em>buffer</em>.</p>
</li>
<li>
<p>CL_INVALID_OPERATION if <strong>clEnqueueReadBuffer</strong> is called on <em>buffer</em>
which has been created with CL_MEM_HOST_WRITE_ONLY or
CL_MEM_HOST_NO_ACCESS.</p>
</li>
<li>
<p>CL_INVALID_OPERATION if <strong>clEnqueueWriteBuffer</strong> is called on <em>buffer</em>
which has been created with CL_MEM_HOST_READ_ONLY or
CL_MEM_HOST_NO_ACCESS.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
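<div class="paragraph">
<p>Several of the errors above can be anticipated from the arguments and the
flags the buffer was created with. The sketch below covers the bounds,
<code>NULL</code> pointer and host-access checks; all names are hypothetical, the
<code>MEM_HOST_*</code> macros are local stand-ins for the corresponding
<code>CL_MEM_HOST_*</code> constants, and the order of the checks is an assumption.</p>
</div>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the CL_MEM_HOST_* bits. */
#define MEM_HOST_WRITE_ONLY ((uint64_t)1 << 7)
#define MEM_HOST_READ_ONLY  ((uint64_t)1 << 8)
#define MEM_HOST_NO_ACCESS  ((uint64_t)1 << 9)

typedef enum { XFER_READ, XFER_WRITE } xfer_dir_t;

/* Hypothetical result codes mirroring the errors listed above. */
typedef enum {
    XFER_OK,
    XFER_INVALID_VALUE,
    XFER_INVALID_OPERATION
} xfer_check_t;

/* Hypothetical helper: screens a read or write of `size` bytes at `offset`
 * in a buffer of `buffer_size` bytes created with `buffer_flags`. */
xfer_check_t check_transfer(xfer_dir_t dir, uint64_t buffer_flags,
                            size_t buffer_size, size_t offset, size_t size,
                            const void *ptr)
{
    if (ptr == NULL)
        return XFER_INVALID_VALUE;

    /* Written so that offset + size cannot overflow. */
    if (offset > buffer_size || size > buffer_size - offset)
        return XFER_INVALID_VALUE;

    /* Reads are refused on HOST_WRITE_ONLY buffers, writes on HOST_READ_ONLY
     * buffers, and both on HOST_NO_ACCESS buffers. */
    uint64_t refused = MEM_HOST_NO_ACCESS |
        (dir == XFER_READ ? MEM_HOST_WRITE_ONLY : MEM_HOST_READ_ONLY);
    if (buffer_flags & refused)
        return XFER_INVALID_OPERATION;

    return XFER_OK;
}
```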
<div class="paragraph">
<p>The following functions enqueue commands to read a 2D or 3D rectangular
region from a buffer object to host memory or write a 2D or 3D rectangular
region to a buffer object from host memory.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clEnqueueReadBufferRect(cl_command_queue command_queue,
cl_mem buffer,
cl_bool blocking_read,
<span class="directive">const</span> size_t *buffer_origin,
<span class="directive">const</span> size_t *host_origin,
<span class="directive">const</span> size_t *region,
size_t buffer_row_pitch,
size_t buffer_slice_pitch,
size_t host_row_pitch,
size_t host_slice_pitch,
<span class="directive">void</span> *ptr,
cl_uint num_events_in_wait_list,
<span class="directive">const</span> cl_event *event_wait_list,
cl_event *event)</code></pre>
</div>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clEnqueueWriteBufferRect(cl_command_queue command_queue,
cl_mem buffer,
cl_bool blocking_write,
<span class="directive">const</span> size_t *buffer_origin,
<span class="directive">const</span> size_t *host_origin,
<span class="directive">const</span> size_t *region,
size_t buffer_row_pitch,
size_t buffer_slice_pitch,
size_t host_row_pitch,
size_t host_slice_pitch,
<span class="directive">const</span> <span class="directive">void</span> *ptr,
cl_uint num_events_in_wait_list,
<span class="directive">const</span> cl_event *event_wait_list,
cl_event *event)</code></pre>
</div>
</div>
<div class="paragraph">
<p><em>command_queue</em> is a valid host command-queue in which the read /
write command will be queued.
<em>command_queue</em> and <em>buffer</em> must be created with the same OpenCL context.</p>
</div>
<div class="paragraph">
<p><em>buffer</em> refers to a valid buffer object.</p>
</div>
<div class="paragraph">
<p><em>blocking_read</em> and <em>blocking_write</em> indicate if the read and write
operations are <em>blocking</em> or <em>non-blocking</em>.</p>
</div>
<div class="paragraph">
<p>If <em>blocking_read</em> is CL_TRUE i.e. the read command is blocking,
<strong>clEnqueueReadBufferRect</strong> does not return until the buffer data has been
read and copied into memory pointed to by <em>ptr</em>.</p>
</div>
<div class="paragraph">
<p>If <em>blocking_read</em> is CL_FALSE i.e. the read command is non-blocking,
<strong>clEnqueueReadBufferRect</strong> queues a non-blocking read command and returns.
The contents of the buffer that <em>ptr</em> points to cannot be used until the
read command has completed.
The <em>event</em> argument returns an event object which can be used to query the
execution status of the read command.
When the read command has completed, the contents of the buffer that <em>ptr</em>
points to can be used by the application.</p>
</div>
<div class="paragraph">
<p>If <em>blocking_write</em> is CL_TRUE, the write command is blocking and does not
return until the command is complete, including transfer of the data.
The memory pointed to by <em>ptr</em> can be reused by the application after the
<strong>clEnqueueWriteBufferRect</strong> call returns.</p>
</div>
<div class="paragraph">
<p>If <em>blocking_write</em> is CL_FALSE, the OpenCL implementation will use <em>ptr</em> to
perform a non-blocking write.
As the write is non-blocking, the implementation can return immediately.
The memory pointed to by <em>ptr</em> cannot be reused by the application
immediately after the call returns.
The <em>event</em> argument returns an event object which can be used to query the
execution status of the write command.
When the write command has completed, the memory pointed to by <em>ptr</em> can
then be reused by the application.</p>
</div>
<div class="paragraph">
<p><em>buffer_origin</em> defines the (<em>x</em>, <em>y</em>, <em>z</em>) offset in the memory region
associated with <em>buffer</em>.
For a 2D rectangle region, the <em>z</em> value given by <em>buffer_origin</em>[2] should
be 0.
The offset in bytes is computed as <em>buffer_origin</em>[2] ×
<em>buffer_slice_pitch</em> + <em>buffer_origin</em>[1] × <em>buffer_row_pitch</em> +
<em>buffer_origin</em>[0].</p>
</div>
<div class="paragraph">
<p><em>host_origin</em> defines the (<em>x</em>, <em>y</em>, <em>z</em>) offset in the memory region
pointed to by <em>ptr</em>.
For a 2D rectangle region, the <em>z</em> value given by <em>host_origin</em>[2] should be
0.
The offset in bytes is computed as <em>host_origin</em>[2] ×
<em>host_slice_pitch</em> + <em>host_origin</em>[1] × <em>host_row_pitch</em> +
<em>host_origin</em>[0].</p>
</div>
<div class="paragraph">
<p><em>region</em> defines the (<em>width</em> in bytes, <em>height</em> in rows, <em>depth</em> in slices)
of the 2D or 3D rectangle being read or written.
For a 2D rectangle copy, the <em>depth</em> value given by <em>region</em>[2] should be 1.
The values in region cannot be 0.</p>
</div>
<div class="paragraph">
<p><em>buffer_row_pitch</em> is the length of each row in bytes to be used for the
memory region associated with <em>buffer</em>.
If <em>buffer_row_pitch</em> is 0, <em>buffer_row_pitch</em> is computed as <em>region</em>[0].</p>
</div>
<div class="paragraph">
<p><em>buffer_slice_pitch</em> is the length of each 2D slice in bytes to be used for
the memory region associated with <em>buffer</em>.
If <em>buffer_slice_pitch</em> is 0, <em>buffer_slice_pitch</em> is computed as
<em>region</em>[1] × <em>buffer_row_pitch</em>.</p>
</div>
<div class="paragraph">
<p><em>host_row_pitch</em> is the length of each row in bytes to be used for the
memory region pointed to by <em>ptr</em>.
If <em>host_row_pitch</em> is 0, <em>host_row_pitch</em> is computed as <em>region</em>[0].</p>
</div>
<div class="paragraph">
<p><em>host_slice_pitch</em> is the length of each 2D slice in bytes to be used for
the memory region pointed to by <em>ptr</em>.
If <em>host_slice_pitch</em> is 0, <em>host_slice_pitch</em> is computed as <em>region</em>[1]
× <em>host_row_pitch</em>.</p>
</div>
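<div class="paragraph">
<p>The pitch defaults and the byte-offset formula above can be combined into a
short calculation. The following is a sketch with hypothetical names
(<code>pitches_t</code>, <code>resolve_pitches</code>, <code>rect_byte_offset</code>); it applies to
either the buffer side or the host side, since both use the same rules. It
rejects a zero <em>region</em> element and a non-zero row pitch smaller than
<em>region</em>[0], and leaves slice-pitch validation to the implementation.</p>
</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the pitches after defaults are applied. */
typedef struct {
    size_t row_pitch;
    size_t slice_pitch;
} pitches_t;

/* Applies the defaults: a row pitch of 0 becomes region[0], and a slice
 * pitch of 0 becomes region[1] * row_pitch. Returns false if any region
 * element is 0 or if a non-zero row pitch is smaller than region[0]. */
bool resolve_pitches(const size_t region[3], size_t row_pitch,
                     size_t slice_pitch, pitches_t *out)
{
    if (region[0] == 0 || region[1] == 0 || region[2] == 0)
        return false;

    if (row_pitch == 0)
        row_pitch = region[0];
    else if (row_pitch < region[0])
        return false;

    if (slice_pitch == 0)
        slice_pitch = region[1] * row_pitch;

    out->row_pitch = row_pitch;
    out->slice_pitch = slice_pitch;
    return true;
}

/* Byte offset of an (x, y, z) origin:
 * origin[2] * slice_pitch + origin[1] * row_pitch + origin[0]. */
size_t rect_byte_offset(const size_t origin[3], pitches_t p)
{
    return origin[2] * p.slice_pitch + origin[1] * p.row_pitch + origin[0];
}
```

<div class="paragraph">
<p>For example, a 2D region of 16 bytes by 4 rows with both pitches left at 0
resolves to a row pitch of 16 and a slice pitch of 64, so an origin of
(4, 2, 0) starts at byte 2 × 16 + 4 = 36.</p>
</div>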
<div class="paragraph">
<p><em>ptr</em> is the pointer to buffer in host memory where data is to be read into
or to be written from.</p>
</div>
<div class="paragraph">
<p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need to
complete before this particular command can be executed.
If <em>event_wait_list</em> is <code>NULL</code>, then this particular command does not wait
on any event to complete.
If <em>event_wait_list</em> is <code>NULL</code>, <em>num_events_in_wait_list</em> must be 0.
If <em>event_wait_list</em> is not <code>NULL</code>, the list of events pointed to by
<em>event_wait_list</em> must be valid and <em>num_events_in_wait_list</em> must be
greater than 0.
The events specified in <em>event_wait_list</em> act as synchronization points.
The context associated with events in <em>event_wait_list</em> and <em>command_queue</em>
must be the same.
The memory associated with <em>event_wait_list</em> can be reused or freed after
the function returns.</p>
</div>
<div class="paragraph">
<p><em>event</em> returns an event object that identifies this particular read / write
command and can be used to query or queue a wait for this particular command
to complete.
<em>event</em> can be <code>NULL</code> in which case it will not be possible for the
application to query the status of this command or queue a wait for this
command to complete.
If the <em>event_wait_list</em> and the <em>event</em> arguments are not <code>NULL</code>, the
<em>event</em> argument should not refer to an element of the <em>event_wait_list</em>
array.</p>
</div>
<div class="paragraph">
<p><strong>clEnqueueReadBufferRect</strong> and <strong>clEnqueueWriteBufferRect</strong> return CL_SUCCESS
if the function is executed successfully.
Otherwise, they return one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not a valid host
command-queue.</p>
</li>
<li>
<p>CL_INVALID_CONTEXT if the context associated with <em>command_queue</em> and
<em>buffer</em> are not the same or if the context associated with
<em>command_queue</em> and events in <em>event_wait_list</em> are not the same.</p>
</li>
<li>
<p>CL_INVALID_MEM_OBJECT if <em>buffer</em> is not a valid buffer object.</p>
</li>
<li>
<p>CL_INVALID_VALUE if the region being read or written specified by
(<em>buffer_origin</em>, <em>region</em>, <em>buffer_row_pitch</em>, <em>buffer_slice_pitch</em>) is
out of bounds.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>ptr</em> is a <code>NULL</code> value.</p>
</li>
<li>
<p>CL_INVALID_VALUE if any <em>region</em> array element is 0.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>buffer_row_pitch</em> is not 0 and is less than
<em>region</em>[0].</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>host_row_pitch</em> is not 0 and is less than
<em>region</em>[0].</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>buffer_slice_pitch</em> is not 0 and is less than
<em>region</em>[1] × <em>buffer_row_pitch</em> and not a multiple of
<em>buffer_row_pitch</em>.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>host_slice_pitch</em> is not 0 and is less than
<em>region</em>[1] × <em>host_row_pitch</em> and not a multiple of
<em>host_row_pitch</em>.</p>
</li>
<li>
<p>CL_INVALID_EVENT_WAIT_LIST if <em>event_wait_list</em> is <code>NULL</code> and
<em>num_events_in_wait_list</em> &gt; 0, or <em>event_wait_list</em> is not <code>NULL</code> and
<em>num_events_in_wait_list</em> is 0, or if event objects in <em>event_wait_list</em>
are not valid events.</p>
</li>
<li>
<p>CL_MISALIGNED_SUB_BUFFER_OFFSET if <em>buffer</em> is a sub-buffer object and
<em>offset</em> specified when the sub-buffer object is created is not aligned
to CL_DEVICE_MEM_BASE_ADDR_ALIGN value for device associated with
<em>queue</em>.</p>
</li>
<li>
<p>CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST if the read and write
operations are blocking and the execution status of any of the events in
<em>event_wait_list</em> is a negative integer value.</p>
</li>
<li>
<p>CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
memory for data store associated with <em>buffer</em>.</p>
</li>
<li>
<p>CL_INVALID_OPERATION if <strong>clEnqueueReadBufferRect</strong> is called on <em>buffer</em>
which has been created with CL_MEM_HOST_WRITE_ONLY or
CL_MEM_HOST_NO_ACCESS.</p>
</li>
<li>
<p>CL_INVALID_OPERATION if <strong>clEnqueueWriteBufferRect</strong> is called on <em>buffer</em>
which has been created with CL_MEM_HOST_READ_ONLY or
CL_MEM_HOST_NO_ACCESS.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
<div class="admonitionblock note">
<table>
<tr>
<td class="icon">
<i class="fa icon-note" title="Note"></i>
</td>
<td class="content">
<div class="paragraph">
<p>Calling <strong>clEnqueueReadBuffer</strong> to read a region of the buffer object with the
<em>ptr</em> argument value set to <em>host_ptr</em> + <em>offset</em>, where <em>host_ptr</em> is a
pointer to the memory region specified when the buffer object being read is
created with CL_MEM_USE_HOST_PTR, must meet the following requirements in
order to avoid undefined behavior:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>All commands that use this buffer object or a memory object (buffer or
image) created from this buffer object have finished execution before
the read command begins execution.</p>
</li>
<li>
<p>The buffer object or memory objects created from this buffer object are
not mapped.</p>
</li>
<li>
<p>The buffer object or memory objects created from this buffer object are
not used by any command-queue until the read command has finished
execution.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>Calling <strong>clEnqueueReadBufferRect</strong> to read a region of the buffer object with
the <em>ptr</em> argument value set to <em>host_ptr</em> and with the same values for
<em>host_origin</em> and <em>buffer_origin</em>, where <em>host_ptr</em> is a pointer to the
memory region specified when the buffer object being read is created with
CL_MEM_USE_HOST_PTR, must meet the same requirements given above for
<strong>clEnqueueReadBuffer</strong>.</p>
</div>
<div class="paragraph">
<p>Calling <strong>clEnqueueWriteBuffer</strong> to update the latest bits in a region of the
buffer object with the <em>ptr</em> argument value set to <em>host_ptr</em> + <em>offset</em>,
where <em>host_ptr</em> is a pointer to the memory region specified when the buffer
object being written is created with CL_MEM_USE_HOST_PTR, must meet the
following requirements in order to avoid undefined behavior:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>The host memory region given by (<em>host_ptr</em> + <em>offset</em>, <em>cb</em>) contains
the latest bits when the enqueued write command begins execution.</p>
</li>
<li>
<p>The buffer object or memory objects created from this buffer object are
not mapped.</p>
</li>
<li>
<p>The buffer object or memory objects created from this buffer object are
not used by any command-queue until the write command has finished
execution.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>Calling <strong>clEnqueueWriteBufferRect</strong> to update the latest bits in a region of
the buffer object with the <em>ptr</em> argument value set to <em>host_ptr</em> and with the
same values for <em>host_origin</em> and <em>buffer_origin</em>, where <em>host_ptr</em> is a
pointer to the memory region specified when the buffer object being written
is created with CL_MEM_USE_HOST_PTR, must meet the following requirements in
order to avoid undefined behavior:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>The host memory region given by (<em>buffer_origin</em>, <em>region</em>) contains the
latest bits when the enqueued write command begins execution.</p>
</li>
<li>
<p>The buffer object or memory objects created from this buffer object are
not mapped.</p>
</li>
<li>
<p>The buffer object or memory objects created from this buffer object are
not used by any command-queue until the write command has finished
execution.</p>
</li>
</ul>
</div>
</td>
</tr>
</table>
</div>
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clEnqueueCopyBuffer(cl_command_queue command_queue,
cl_mem src_buffer,
cl_mem dst_buffer,
size_t src_offset,
size_t dst_offset,
size_t size,
cl_uint num_events_in_wait_list,
<span class="directive">const</span> cl_event *event_wait_list,
cl_event *event)</code></pre>
</div>
</div>
<div class="paragraph">
<p>enqueues a command to copy a buffer object identified by <em>src_buffer</em> to
another buffer object identified by <em>dst_buffer</em>.</p>
</div>
<div class="paragraph">
<p><em>command_queue</em> refers to a host command-queue in which the copy command
will be queued.
The OpenCL context associated with <em>command_queue</em>, <em>src_buffer</em> and
<em>dst_buffer</em> must be the same.</p>
</div>
<div class="paragraph">
<p><em>src_offset</em> refers to the offset at which to begin copying data from
<em>src_buffer</em>.</p>
</div>
<div class="paragraph">
<p><em>dst_offset</em> refers to the offset at which to begin copying data into
<em>dst_buffer</em>.</p>
</div>
<div class="paragraph">
<p><em>size</em> refers to the size in bytes to copy.</p>
</div>
<div class="paragraph">
<p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need to
complete before this particular command can be executed.
If <em>event_wait_list</em> is <code>NULL</code>, then this particular command does not wait
on any event to complete.
If <em>event_wait_list</em> is <code>NULL</code>, <em>num_events_in_wait_list</em> must be 0.
If <em>event_wait_list</em> is not <code>NULL</code>, the list of events pointed to by
<em>event_wait_list</em> must be valid and <em>num_events_in_wait_list</em> must be
greater than 0.
The events specified in <em>event_wait_list</em> act as synchronization points.
The context associated with events in <em>event_wait_list</em> and <em>command_queue</em>
must be the same.
The memory associated with <em>event_wait_list</em> can be reused or freed after
the function returns.</p>
</div>
<div class="paragraph">
<p><em>event</em> returns an event object that identifies this particular copy command
and can be used to query or queue a wait for this particular command to
complete.
<em>event</em> can be <code>NULL</code> in which case it will not be possible for the
application to query the status of this command or queue a wait for this
command to complete.
<strong>clEnqueueBarrierWithWaitList</strong> can be used instead.
If the <em>event_wait_list</em> and the <em>event</em> arguments are not <code>NULL</code>, the
<em>event</em> argument should not refer to an element of the <em>event_wait_list</em>
array.</p>
</div>
<div class="paragraph">
<p><strong>clEnqueueCopyBuffer</strong> returns CL_SUCCESS if the function is executed
successfully.
Otherwise, it returns one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not a valid host
command-queue.</p>
</li>
<li>
<p>CL_INVALID_CONTEXT if the context associated with <em>command_queue</em>,
<em>src_buffer</em> and <em>dst_buffer</em> are not the same or if the context
associated with <em>command_queue</em> and events in <em>event_wait_list</em> are not
the same.</p>
</li>
<li>
<p>CL_INVALID_MEM_OBJECT if <em>src_buffer</em> and <em>dst_buffer</em> are not valid
buffer objects.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>src_offset</em>, <em>dst_offset</em>, <em>size</em>, <em>src_offset</em>
+ <em>size</em> or <em>dst_offset</em> + <em>size</em> require accessing elements
outside the <em>src_buffer</em> and <em>dst_buffer</em> buffer objects respectively.</p>
</li>
<li>
<p>CL_INVALID_EVENT_WAIT_LIST if <em>event_wait_list</em> is <code>NULL</code> and
<em>num_events_in_wait_list</em> &gt; 0, or <em>event_wait_list</em> is not <code>NULL</code> and
<em>num_events_in_wait_list</em> is 0, or if event objects in <em>event_wait_list</em>
are not valid events.</p>
</li>
<li>
<p>CL_MISALIGNED_SUB_BUFFER_OFFSET if <em>src_buffer</em> is a sub-buffer object
and <em>offset</em> specified when the sub-buffer object is created is not
aligned to CL_DEVICE_MEM_BASE_ADDR_ALIGN value for device associated
with <em>queue</em>.</p>
</li>
<li>
<p>CL_MISALIGNED_SUB_BUFFER_OFFSET if <em>dst_buffer</em> is a sub-buffer object
and <em>offset</em> specified when the sub-buffer object is created is not
aligned to CL_DEVICE_MEM_BASE_ADDR_ALIGN value for device associated
with <em>queue</em>.</p>
</li>
<li>
<p>CL_MEM_COPY_OVERLAP if <em>src_buffer</em> and <em>dst_buffer</em> are the same buffer
or sub-buffer object and the source and destination regions overlap or
if <em>src_buffer</em> and <em>dst_buffer</em> are different sub-buffers of the same
associated buffer object and they overlap.
The regions overlap if <em>src_offset</em> &#8804; <em>dst_offset</em> &#8804;
<em>src_offset</em> + <em>size</em> &#8722; 1 or if <em>dst_offset</em> &#8804; <em>src_offset</em> &#8804;
<em>dst_offset</em> + <em>size</em> &#8722; 1.</p>
</li>
<li>
<p>CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
memory for data store associated with <em>src_buffer</em> or <em>dst_buffer</em>.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
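<div class="paragraph">
<p>The overlap condition for CL_MEM_COPY_OVERLAP can be sketched in plain C as
follows. This is only an illustration of the rule that two equally sized
regions overlap when either start offset falls inside the other region; the
helper name <code>linear_regions_overlap</code> is hypothetical and not part of the
OpenCL API, and the sketch assumes that <em>size</em> is greater than 0 and that
both offsets are relative to the same buffer object.</p>
</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not an OpenCL function.
 * Returns true when [src_offset, src_offset + size - 1] and
 * [dst_offset, dst_offset + size - 1] share at least one byte.
 * Assumption: size > 0 and the sums do not overflow size_t. */
static bool linear_regions_overlap(size_t src_offset, size_t dst_offset,
                                   size_t size)
{
    assert(size > 0);
    return (src_offset <= dst_offset && dst_offset <= src_offset + size - 1) ||
           (dst_offset <= src_offset && src_offset <= dst_offset + size - 1);
}
```

<div class="paragraph">
<p>An application could run such a check before calling
<strong>clEnqueueCopyBuffer</strong> with the same buffer object as source and
destination, instead of relying on the returned error code.</p>
</div>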
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clEnqueueCopyBufferRect(cl_command_queue command_queue,
cl_mem src_buffer,
cl_mem dst_buffer,
<span class="directive">const</span> size_t *src_origin,
<span class="directive">const</span> size_t *dst_origin,
<span class="directive">const</span> size_t *region,
size_t src_row_pitch,
size_t src_slice_pitch,
size_t dst_row_pitch,
size_t dst_slice_pitch,
cl_uint num_events_in_wait_list,
<span class="directive">const</span> cl_event *event_wait_list,
cl_event *event)</code></pre>
</div>
</div>
<div class="paragraph">
<p>enqueues a command to copy a 2D or 3D rectangular region from the buffer
object identified by <em>src_buffer</em> to a 2D or 3D region in the buffer object
identified by <em>dst_buffer</em>.
Copying begins at the source offset and destination offset which are
computed as described below in the description for <em>src_origin</em> and
<em>dst_origin</em>.
Each byte of the region&#8217;s width is copied from the source offset to the
destination offset.
After copying each width, the source and destination offsets are incremented
by their respective source and destination row pitches.
After copying each 2D rectangle, the source and destination offsets are
incremented by their respective source and destination slice pitches.</p>
</div>
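<div class="paragraph">
<p>The copy order described above can be modelled on the host with ordinary
byte arrays. The sketch below is not how an implementation performs the copy
on a device; it only mirrors the row-by-row, slice-by-slice stepping. The
function name <code>copy_rect_on_host</code> is hypothetical, and the sketch assumes
that all pitches have already been resolved to non-zero values and that the
source and destination arrays do not overlap.</p>
</div>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical host-side model of a rectangular copy, not an OpenCL call.
 * Assumptions: pitches are non-zero, both arrays are large enough, and
 * src and dst do not overlap. */
static void copy_rect_on_host(const unsigned char *src, unsigned char *dst,
                              const size_t src_origin[3],
                              const size_t dst_origin[3],
                              const size_t region[3],
                              size_t src_row_pitch, size_t src_slice_pitch,
                              size_t dst_row_pitch, size_t dst_slice_pitch)
{
    assert(src != NULL && dst != NULL);
    assert(src_row_pitch >= region[0] && dst_row_pitch >= region[0]);

    /* Starting byte offsets computed from the (x, y, z) origins. */
    size_t src_start = src_origin[2] * src_slice_pitch +
                       src_origin[1] * src_row_pitch + src_origin[0];
    size_t dst_start = dst_origin[2] * dst_slice_pitch +
                       dst_origin[1] * dst_row_pitch + dst_origin[0];

    for (size_t z = 0; z < region[2]; ++z) {
        /* Each slice starts one slice pitch after the previous one. */
        size_t src_off = src_start + z * src_slice_pitch;
        size_t dst_off = dst_start + z * dst_slice_pitch;
        for (size_t y = 0; y < region[1]; ++y) {
            /* Copy one row of region[0] bytes, then step by the row pitch. */
            memcpy(dst + dst_off, src + src_off, region[0]);
            src_off += src_row_pitch;
            dst_off += dst_row_pitch;
        }
    }
}
```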
<div class="admonitionblock note">
<table>
<tr>
<td class="icon">
<i class="fa icon-note" title="Note"></i>
</td>
<td class="content">
<div class="paragraph">
<p>If <em>src_buffer</em> and <em>dst_buffer</em> are the same buffer object, <em>src_row_pitch</em>
must equal <em>dst_row_pitch</em> and <em>src_slice_pitch</em> must equal
<em>dst_slice_pitch</em>.</p>
</div>
</td>
</tr>
</table>
</div>
<div class="paragraph">
<p><em>command_queue</em> refers to the host command-queue in which the copy command
will be queued.
The OpenCL context associated with <em>command_queue</em>, <em>src_buffer</em> and
<em>dst_buffer</em> must be the same.</p>
</div>
<div class="paragraph">
<p><em>src_origin</em> defines the (<em>x</em>, <em>y</em>, <em>z</em>) offset in the memory region
associated with <em>src_buffer</em>.
For a 2D rectangle region, the <em>z</em> value given by <em>src_origin</em>[2] should be
0.
The offset in bytes is computed as <em>src_origin</em>[2] × <em>src_slice_pitch</em>
+ <em>src_origin</em>[1] × <em>src_row_pitch</em> + <em>src_origin</em>[0].</p>
</div>
<div class="paragraph">
<p><em>dst_origin</em> defines the (<em>x</em>, <em>y</em>, <em>z</em>) offset in the memory region
associated with <em>dst_buffer</em>.
For a 2D rectangle region, the <em>z</em> value given by <em>dst_origin</em>[2] should be
0.
The offset in bytes is computed as <em>dst_origin</em>[2] × <em>dst_slice_pitch</em>
+ <em>dst_origin</em>[1] × <em>dst_row_pitch</em> + <em>dst_origin</em>[0].</p>
</div>
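<div class="paragraph">
<p>As a small worked illustration of the offset formula used for <em>src_origin</em>
and <em>dst_origin</em>, the following C sketch computes the byte offset from an
(<em>x</em>, <em>y</em>, <em>z</em>) origin. The helper name <code>rect_byte_offset</code> is
hypothetical, and the pitches passed in are assumed to be the effective
(non-zero) values.</p>
</div>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an OpenCL function.
 * offset = origin[2] * slice_pitch + origin[1] * row_pitch + origin[0]
 * Assumption: the products and sums do not overflow size_t. */
static size_t rect_byte_offset(const size_t origin[3], size_t row_pitch,
                               size_t slice_pitch)
{
    assert(origin != NULL);
    return origin[2] * slice_pitch + origin[1] * row_pitch + origin[0];
}
```

<div class="paragraph">
<p>For example, with a row pitch of 16 bytes and a slice pitch of 64 bytes, the
origin (3, 2, 1) corresponds to byte offset 1 × 64 + 2 × 16 + 3 = 99.</p>
</div>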
<div class="paragraph">
<p><em>region</em> defines the (<em>width</em> in bytes, <em>height</em> in rows, <em>depth</em> in slices)
of the 2D or 3D rectangle being copied.
For a 2D rectangle, the <em>depth</em> value given by <em>region</em>[2] should be 1.
The values in region cannot be 0.</p>
</div>
<div class="paragraph">
<p><em>src_row_pitch</em> is the length of each row in bytes to be used for the memory
region associated with <em>src_buffer</em>.
If <em>src_row_pitch</em> is 0, <em>src_row_pitch</em> is computed as <em>region</em>[0].</p>
</div>
<div class="paragraph">
<p><em>src_slice_pitch</em> is the length of each 2D slice in bytes to be used for the
memory region associated with <em>src_buffer</em>.
If <em>src_slice_pitch</em> is 0, <em>src_slice_pitch</em> is computed as <em>region</em>[1]
× <em>src_row_pitch</em>.</p>
</div>
<div class="paragraph">
<p><em>dst_row_pitch</em> is the length of each row in bytes to be used for the memory
region associated with <em>dst_buffer</em>.
If <em>dst_row_pitch</em> is 0, <em>dst_row_pitch</em> is computed as <em>region</em>[0].</p>
</div>
<div class="paragraph">
<p><em>dst_slice_pitch</em> is the length of each 2D slice in bytes to be used for the
memory region associated with <em>dst_buffer</em>.
If <em>dst_slice_pitch</em> is 0, <em>dst_slice_pitch</em> is computed as <em>region</em>[1]
× <em>dst_row_pitch</em>.</p>
</div>
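<div class="paragraph">
<p>The defaulting rules for row and slice pitches of 0 can be sketched as
below. The type <code>rect_pitches</code> and the function <code>resolve_pitches</code> are
hypothetical names introduced for illustration; the same rule would be applied
once with the source pitches and once with the destination pitches.</p>
</div>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pair of effective pitches, not an OpenCL type. */
typedef struct {
    size_t row_pitch;
    size_t slice_pitch;
} rect_pitches;

/* Hypothetical helper, not an OpenCL function.
 * A row pitch of 0 becomes region[0]; a slice pitch of 0 becomes
 * region[1] * (effective row pitch). */
static rect_pitches resolve_pitches(const size_t region[3], size_t row_pitch,
                                    size_t slice_pitch)
{
    assert(region != NULL);
    rect_pitches p;
    p.row_pitch = row_pitch != 0 ? row_pitch : region[0];
    p.slice_pitch = slice_pitch != 0 ? slice_pitch : region[1] * p.row_pitch;
    return p;
}
```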
<div class="paragraph">
<p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need to
complete before this particular command can be executed.
If <em>event_wait_list</em> is <code>NULL</code>, then this particular command does not wait
on any event to complete.
If <em>event_wait_list</em> is <code>NULL</code>, <em>num_events_in_wait_list</em> must be 0.
If <em>event_wait_list</em> is not <code>NULL</code>, the list of events pointed to by
<em>event_wait_list</em> must be valid and <em>num_events_in_wait_list</em> must be
greater than 0.
The events specified in <em>event_wait_list</em> act as synchronization points.
The context associated with events in <em>event_wait_list</em> and <em>command_queue</em>
must be the same.
The memory associated with <em>event_wait_list</em> can be reused or freed after
the function returns.</p>
</div>
<div class="paragraph">
<p><em>event</em> returns an event object that identifies this particular copy command
and can be used to query or queue a wait for this particular command to
complete.
<em>event</em> can be <code>NULL</code> in which case it will not be possible for the
application to query the status of this command or queue a wait for this
command to complete.
<strong>clEnqueueBarrierWithWaitList</strong> can be used instead.
If the <em>event_wait_list</em> and the <em>event</em> arguments are not <code>NULL</code>, the
<em>event</em> argument should not refer to an element of the <em>event_wait_list</em>
array.</p>
</div>
<div class="paragraph">
<p><strong>clEnqueueCopyBufferRect</strong> returns CL_SUCCESS if the function is executed
successfully.
Otherwise, it returns one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not a valid host
command-queue.</p>
</li>
<li>
<p>CL_INVALID_CONTEXT if the context associated with <em>command_queue</em>,
<em>src_buffer</em> and <em>dst_buffer</em> are not the same or if the context
associated with <em>command_queue</em> and events in <em>event_wait_list</em> are not
the same.</p>
</li>
<li>
<p>CL_INVALID_MEM_OBJECT if <em>src_buffer</em> and <em>dst_buffer</em> are not valid
buffer objects.</p>
</li>
<li>
<p>CL_INVALID_VALUE if (<em>src_origin, region, src_row_pitch,
src_slice_pitch</em>) or (<em>dst_origin, region, dst_row_pitch,
dst_slice_pitch</em>) require accessing elements outside the <em>src_buffer</em>
and <em>dst_buffer</em> buffer objects respectively.</p>
</li>
<li>
<p>CL_INVALID_VALUE if any <em>region</em> array element is 0.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>src_row_pitch</em> is not 0 and is less than
<em>region</em>[0].</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>dst_row_pitch</em> is not 0 and is less than
<em>region</em>[0].</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>src_slice_pitch</em> is not 0 and is less than
<em>region</em>[1] × <em>src_row_pitch</em> or if <em>src_slice_pitch</em> is not 0 and
is not a multiple of <em>src_row_pitch</em>.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>dst_slice_pitch</em> is not 0 and is less than
<em>region</em>[1] × <em>dst_row_pitch</em> or if <em>dst_slice_pitch</em> is not 0 and
is not a multiple of <em>dst_row_pitch</em>.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>src_buffer</em> and <em>dst_buffer</em> are the same buffer
object and <em>src_slice_pitch</em> is not equal to <em>dst_slice_pitch</em> or
<em>src_row_pitch</em> is not equal to <em>dst_row_pitch</em>.</p>
</li>
<li>
<p>CL_INVALID_EVENT_WAIT_LIST if <em>event_wait_list</em> is <code>NULL</code> and
<em>num_events_in_wait_list</em> &gt; 0, or <em>event_wait_list</em> is not <code>NULL</code> and
<em>num_events_in_wait_list</em> is 0, or if event objects in <em>event_wait_list</em>
are not valid events.</p>
</li>
<li>
<p>CL_MEM_COPY_OVERLAP if <em>src_buffer</em> and <em>dst_buffer</em> are the same buffer
or sub-buffer object and the source and destination regions overlap or
if <em>src_buffer</em> and <em>dst_buffer</em> are different sub-buffers of the same
associated buffer object and they overlap.
Refer to <a href="#check-copy-overlap">CL_MEM_COPY_OVERLAP</a> for details on how
to determine if source and destination regions overlap.</p>
</li>
<li>
<p>CL_MISALIGNED_SUB_BUFFER_OFFSET if <em>src_buffer</em> is a sub-buffer object
and <em>offset</em> specified when the sub-buffer object is created is not
aligned to CL_DEVICE_MEM_BASE_ADDR_ALIGN value for device associated
with <em>queue</em>.</p>
</li>
<li>
<p>CL_MISALIGNED_SUB_BUFFER_OFFSET if <em>dst_buffer</em> is a sub-buffer object
and <em>offset</em> specified when the sub-buffer object is created is not
aligned to CL_DEVICE_MEM_BASE_ADDR_ALIGN value for device associated
with <em>queue</em>.</p>
</li>
<li>
<p>CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
memory for data store associated with <em>src_buffer</em> or <em>dst_buffer</em>.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
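<div class="paragraph">
<p>The CL_INVALID_VALUE conditions on <em>region</em> and on one pair of row and
slice pitches can be checked on the host before enqueuing the command. The
following C sketch is one possible pre-check under stated assumptions, not the
validation an implementation actually performs; the name
<code>rect_pitches_valid</code> is hypothetical, and the check does not cover the
out-of-bounds or overlap conditions.</p>
</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pre-check, not an OpenCL function.
 * Returns false when any region element is 0, when a non-zero row pitch is
 * smaller than region[0], or when a non-zero slice pitch is smaller than
 * region[1] * row pitch or is not a multiple of the row pitch.
 * Assumption: region[1] * row pitch does not overflow size_t. */
static bool rect_pitches_valid(const size_t region[3], size_t row_pitch,
                               size_t slice_pitch)
{
    assert(region != NULL);
    if (region[0] == 0 || region[1] == 0 || region[2] == 0)
        return false;
    if (row_pitch != 0 && row_pitch < region[0])
        return false;

    /* A row pitch of 0 defaults to the region width. */
    size_t effective_row = row_pitch != 0 ? row_pitch : region[0];
    if (slice_pitch != 0) {
        if (slice_pitch < region[1] * effective_row)
            return false;
        if (slice_pitch % effective_row != 0)
            return false;
    }
    return true;
}
```

<div class="paragraph">
<p>The helper would be called once with the source pitches and once with the
destination pitches.</p>
</div>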
</div>
<div class="sect3">
<h4 id="_filling_buffer_objects">5.2.3. Filling Buffer Objects</h4>
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clEnqueueFillBuffer(cl_command_queue command_queue,
cl_mem buffer,
<span class="directive">const</span> <span class="directive">void</span> *pattern,
size_t pattern_size,
size_t offset,
size_t size,
cl_uint num_events_in_wait_list,
<span class="directive">const</span> cl_event *event_wait_list,
cl_event *event)</code></pre>
</div>
</div>
<div class="paragraph">
<p>enqueues a command to fill a buffer object with a pattern of a given pattern
size.
The usage information, which indicates whether the memory object can be read
or written by a kernel and/or the host and is given by the cl_mem_flags
argument value specified when <em>buffer</em> is created, is ignored by
<strong>clEnqueueFillBuffer</strong>.</p>
</div>
<div class="paragraph">
<p><em>command_queue</em> refers to the host command-queue in which the fill command
will be queued.
The OpenCL context associated with <em>command_queue</em> and <em>buffer</em> must be the
same.</p>
</div>
<div class="paragraph">
<p><em>buffer</em> is a valid buffer object.</p>
</div>
<div class="paragraph">
<p><em>pattern</em> is a pointer to the data pattern of size <em>pattern_size</em> in bytes.
<em>pattern</em> will be used to fill a region in <em>buffer</em> that starts at <em>offset</em> and
is <em>size</em> bytes in size.
The data pattern must be a scalar or vector integer or floating-point data
type supported by OpenCL as described in <a href="#scalar-data-types">Shared
Application Scalar Data Types</a> and <a href="#vector-data-types">Supported
Application Vector Data Types</a>.
For example, if <em>buffer</em> is to be filled with a pattern of <code>float4</code> values,
then <em>pattern</em> will be a pointer to a <code>cl_float4</code> value and <em>pattern_size</em>
will be <code>sizeof(cl_float4)</code>.
The maximum value of <em>pattern_size</em> is the size of the largest integer or
floating-point vector data type supported by the OpenCL device.
The memory associated with <em>pattern</em> can be reused or freed after the
function returns.</p>
</div>
<div class="paragraph">
<p><em>offset</em> is the location in bytes of the region being filled in <em>buffer</em> and
must be a multiple of <em>pattern_size</em>.</p>
</div>
<div class="paragraph">
<p><em>size</em> is the size in bytes of the region being filled in <em>buffer</em> and must
be a multiple of <em>pattern_size</em>.</p>
</div>
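<div class="paragraph">
<p>The effect of a fill command on the buffer contents can be modelled on the
host as repeated copies of the pattern. The sketch below is only an
illustration with an ordinary byte array standing in for the buffer object;
the name <code>fill_on_host</code> is hypothetical, and the array is assumed to hold
at least <em>offset</em> + <em>size</em> bytes.</p>
</div>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical host-side model of a pattern fill, not an OpenCL call.
 * Assumptions: buffer holds at least offset + size bytes, and offset and
 * size are multiples of pattern_size. */
static void fill_on_host(unsigned char *buffer, const void *pattern,
                         size_t pattern_size, size_t offset, size_t size)
{
    assert(buffer != NULL && pattern != NULL && pattern_size > 0);
    assert(offset % pattern_size == 0 && size % pattern_size == 0);

    /* Write one copy of the pattern per pattern_size bytes of the region. */
    for (size_t i = 0; i < size; i += pattern_size)
        memcpy(buffer + offset + i, pattern, pattern_size);
}
```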
<div class="paragraph">
<p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need to
complete before this particular command can be executed.
If <em>event_wait_list</em> is <code>NULL</code>, then this particular command does not wait
on any event to complete.
If <em>event_wait_list</em> is <code>NULL</code>, <em>num_events_in_wait_list</em> must be 0.
If <em>event_wait_list</em> is not <code>NULL</code>, the list of events pointed to by
<em>event_wait_list</em> must be valid and <em>num_events_in_wait_list</em> must be
greater than 0.
The events specified in <em>event_wait_list</em> act as synchronization points.
The context associated with events in <em>event_wait_list</em> and <em>command_queue</em>
must be the same.
The memory associated with <em>event_wait_list</em> can be reused or freed after
the function returns.</p>
</div>
<div class="paragraph">
<p><em>event</em> returns an event object that identifies this particular command and
can be used to query or queue a wait for this particular command to
complete.
<em>event</em> can be <code>NULL</code> in which case it will not be possible for the
application to query the status of this command or queue a wait for this
command to complete.
<strong>clEnqueueBarrierWithWaitList</strong> can be used instead.
If the <em>event_wait_list</em> and the <em>event</em> arguments are not <code>NULL</code>, the
<em>event</em> argument should not refer to an element of the <em>event_wait_list</em>
array.</p>
</div>
<div class="paragraph">
<p><strong>clEnqueueFillBuffer</strong> returns CL_SUCCESS if the function is executed
successfully.
Otherwise, it returns one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not a valid host
command-queue.</p>
</li>
<li>
<p>CL_INVALID_CONTEXT if the context associated with <em>command_queue</em> and
<em>buffer</em> are not the same or if the context associated with
<em>command_queue</em> and events in <em>event_wait_list</em> are not the same.</p>
</li>
<li>
<p>CL_INVALID_MEM_OBJECT if <em>buffer</em> is not a valid buffer object.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>offset</em> or <em>offset</em> + <em>size</em> require accessing
elements outside the <em>buffer</em> object.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>pattern</em> is <code>NULL</code> or if <em>pattern_size</em> is 0 or if
<em>pattern_size</em> is not one of { 1, 2, 4, 8, 16, 32, 64, 128 }.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>offset</em> or <em>size</em> is not a multiple of
<em>pattern_size</em>.</p>
</li>
<li>
<p>CL_INVALID_EVENT_WAIT_LIST if <em>event_wait_list</em> is <code>NULL</code> and
<em>num_events_in_wait_list</em> &gt; 0, or <em>event_wait_list</em> is not <code>NULL</code> and
<em>num_events_in_wait_list</em> is 0, or if event objects in <em>event_wait_list</em>
are not valid events.</p>
</li>
<li>
<p>CL_MISALIGNED_SUB_BUFFER_OFFSET if <em>buffer</em> is a sub-buffer object and
<em>offset</em> specified when the sub-buffer object is created is not aligned to
CL_DEVICE_MEM_BASE_ADDR_ALIGN value for device associated with <em>queue</em>.</p>
</li>
<li>
<p>CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
memory for data store associated with <em>buffer</em>.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
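<div class="paragraph">
<p>The CL_INVALID_VALUE conditions listed for <strong>clEnqueueFillBuffer</strong> can be
approximated by a host-side pre-check such as the one below. This is a sketch
under stated assumptions rather than the implementation's own validation: the
name <code>fill_args_valid</code> and the <code>buffer_size</code> parameter (the size in bytes
of the buffer object, which the application is assumed to know) are
hypothetical.</p>
</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pre-check, not an OpenCL function.
 * buffer_size is the size in bytes of the buffer object (assumed known
 * to the caller and greater than 0). */
static bool fill_args_valid(size_t buffer_size, const void *pattern,
                            size_t pattern_size, size_t offset, size_t size)
{
    assert(buffer_size > 0);
    if (pattern == NULL || pattern_size == 0)
        return false;
    /* pattern_size must be one of 1, 2, 4, 8, 16, 32, 64, 128,
     * i.e. a power of two no larger than 128. */
    if (pattern_size > 128 || (pattern_size & (pattern_size - 1)) != 0)
        return false;
    if (offset % pattern_size != 0 || size % pattern_size != 0)
        return false;
    /* The region [offset, offset + size) must lie inside the buffer;
     * written this way to avoid overflow in offset + size. */
    if (offset > buffer_size || size > buffer_size - offset)
        return false;
    return true;
}
```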
</div>
<div class="sect3">
<h4 id="_mapping_buffer_objects">5.2.4. Mapping Buffer Objects</h4>
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c"><span class="directive">void</span> *clEnqueueMapBuffer(cl_command_queue command_queue,
cl_mem buffer,
cl_bool blocking_map,
cl_map_flags map_flags,
size_t offset,
size_t size,
cl_uint num_events_in_wait_list,
<span class="directive">const</span> cl_event *event_wait_list,
cl_event *event,
cl_int *errcode_ret)</code></pre>
</div>
</div>
<div class="paragraph">
<p>enqueues a command to map a region of the buffer object given by <em>buffer</em>
into the host address space and returns a pointer to this mapped region.</p>
</div>
<div class="paragraph">
<p><em>command_queue</em> must be a valid host command-queue.</p>
</div>
<div class="paragraph">
<p><em>blocking_map</em> indicates if the map operation is <em>blocking</em> or
<em>non-blocking</em>.</p>
</div>
<div class="paragraph">
<p>If <em>blocking_map</em> is CL_TRUE, <strong>clEnqueueMapBuffer</strong> does not return until the
specified region in <em>buffer</em> is mapped into the host address space and the
application can access the contents of the mapped region using the pointer
returned by <strong>clEnqueueMapBuffer</strong>.</p>
</div>
<div class="paragraph">
<p>If <em>blocking_map</em> is CL_FALSE, i.e. the map operation is non-blocking, the
pointer to the mapped region returned by <strong>clEnqueueMapBuffer</strong> cannot be used
until the map command has completed.
The <em>event</em> argument returns an event object which can be used to query the
execution status of the map command.
When the map command is completed, the application can access the contents
of the mapped region using the pointer returned by <strong>clEnqueueMapBuffer</strong>.</p>
</div>
<div class="paragraph">
<p><em>map_flags</em> is a bit-field and is described in the
<a href="#memory-map-flags-table">Memory Map Flags</a> table.</p>
</div>
<div class="paragraph">
<p><em>buffer</em> is a valid buffer object.
The OpenCL context associated with <em>command_queue</em> and <em>buffer</em> must be the
same.</p>
</div>
<div class="paragraph">
<p><em>offset</em> and <em>size</em> are the offset in bytes and the size of the region in
the buffer object that is being mapped.</p>
</div>
<div class="paragraph">
<p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need to
complete before this particular command can be executed.
If <em>event_wait_list</em> is <code>NULL</code>, then this particular command does not wait
on any event to complete.
If <em>event_wait_list</em> is <code>NULL</code>, <em>num_events_in_wait_list</em> must be 0.
If <em>event_wait_list</em> is not <code>NULL</code>, the list of events pointed to by
<em>event_wait_list</em> must be valid and <em>num_events_in_wait_list</em> must be
greater than 0.
The events specified in <em>event_wait_list</em> act as synchronization points.
The context associated with events in <em>event_wait_list</em> and <em>command_queue</em>
must be the same.
The memory associated with <em>event_wait_list</em> can be reused or freed after
the function returns.</p>
</div>
<div class="paragraph">
<p><em>event</em> returns an event object that identifies this particular command and
can be used to query or queue a wait for this particular command to
complete.
<em>event</em> can be <code>NULL</code> in which case it will not be possible for the
application to query the status of this command or queue a wait for this
command to complete.
If the <em>event_wait_list</em> and the <em>event</em> arguments are not <code>NULL</code>, the
<em>event</em> argument should not refer to an element of the <em>event_wait_list</em>
array.</p>
</div>
<div class="paragraph">
<p><em>errcode_ret</em> will return an appropriate error code.
If <em>errcode_ret</em> is <code>NULL</code>, no error code is returned.</p>
</div>
<div class="paragraph">
<p><strong>clEnqueueMapBuffer</strong> returns a pointer to the mapped region and sets
<em>errcode_ret</em> to CL_SUCCESS if the function is executed successfully.</p>
</div>
<div class="paragraph">
<p>A <code>NULL</code> pointer is returned otherwise with one of the following error
values returned in <em>errcode_ret</em>:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not a valid host
command-queue.</p>
</li>
<li>
<p>CL_INVALID_CONTEXT if the context associated with <em>command_queue</em> and
<em>buffer</em> are not the same or if the context associated with
<em>command_queue</em> and events in <em>event_wait_list</em> are not the same.</p>
</li>
<li>
<p>CL_INVALID_MEM_OBJECT if <em>buffer</em> is not a valid buffer object.</p>
</li>
<li>
<p>CL_INVALID_VALUE if the region being mapped given by (<em>offset</em>, <em>size</em>) is
out of bounds, if <em>size</em> is 0, or if values specified in <em>map_flags</em>
are not valid.</p>
</li>
<li>
<p>CL_INVALID_EVENT_WAIT_LIST if <em>event_wait_list</em> is <code>NULL</code> and
<em>num_events_in_wait_list</em> &gt; 0, or <em>event_wait_list</em> is not <code>NULL</code> and
<em>num_events_in_wait_list</em> is 0, or if event objects in <em>event_wait_list</em>
are not valid events.</p>
</li>
<li>
<p>CL_MISALIGNED_SUB_BUFFER_OFFSET if <em>buffer</em> is a sub-buffer object and
<em>offset</em> specified when the sub-buffer object is created is not aligned
to CL_DEVICE_MEM_BASE_ADDR_ALIGN value for the device associated with
<em>command_queue</em>.</p>
</li>
<li>
<p>CL_MAP_FAILURE if there is a failure to map the requested region into
the host address space.
This error cannot occur for buffer objects created with
CL_MEM_USE_HOST_PTR or CL_MEM_ALLOC_HOST_PTR.</p>
</li>
<li>
<p>CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST if the map operation is
blocking and the execution status of any of the events in
<em>event_wait_list</em> is a negative integer value.</p>
</li>
<li>
<p>CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
memory for data store associated with <em>buffer</em>.</p>
</li>
<li>
<p>CL_INVALID_OPERATION if <em>buffer</em> has been created with
CL_MEM_HOST_WRITE_ONLY or CL_MEM_HOST_NO_ACCESS and CL_MAP_READ is set
in <em>map_flags</em> or if <em>buffer</em> has been created with
CL_MEM_HOST_READ_ONLY or CL_MEM_HOST_NO_ACCESS and CL_MAP_WRITE or
CL_MAP_WRITE_INVALIDATE_REGION is set in <em>map_flags</em>.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
<li>
<p>CL_INVALID_OPERATION if mapping would lead to overlapping regions being
mapped for writing.</p>
</li>
</ul>
</div>
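<div class="paragraph">
<p>As an illustration of the CL_INVALID_VALUE condition on (<em>offset</em>, <em>size</em>), the following is a minimal sketch of a bounds check.
The function name <code>map_region_valid</code> and the <code>buffer_size</code> parameter are assumptions made for this example; the check is written so that <code>offset + size</code> is never computed, which avoids wrap-around for very large values.</p>
</div>

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not an OpenCL function): true when the region
 * [offset, offset + size) lies inside a buffer of buffer_size bytes and
 * size is non-zero. Avoids evaluating offset + size to prevent overflow. */
bool map_region_valid(size_t buffer_size, size_t offset, size_t size)
{
    if (size == 0)
        return false;              /* a zero-sized region is rejected */
    if (offset > buffer_size)
        return false;              /* region starts past the end */
    return size <= buffer_size - offset;
}
```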
<div class="paragraph">
<p>The pointer returned maps a region starting at <em>offset</em> and is at least
<em>size</em> bytes in size.
The result of a memory access outside this region is undefined.</p>
</div>
<div class="paragraph">
<p>If the buffer object is created with CL_MEM_USE_HOST_PTR set in <em>mem_flags</em>,
the following will be true:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>The <em>host_ptr</em> specified in <strong>clCreateBuffer</strong> is guaranteed to contain the latest bits
in the region being mapped when the <strong>clEnqueueMapBuffer</strong> command has
completed.</p>
</li>
<li>
<p>The pointer value returned by <strong>clEnqueueMapBuffer</strong> will be derived from
the <em>host_ptr</em> specified when the buffer object is created.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>Mapped buffer objects are unmapped using <strong>clEnqueueUnmapMemObject</strong>.
This is described in <a href="#unmapping-mapped-memory">Unmapping Mapped Memory
Objects</a>.</p>
</div>
<table id="memory-map-flags-table" class="tableblock frame-all grid-all spread">
<caption class="title">Table 13. List of supported cl_map_flags values</caption>
<colgroup>
<col style="width: 50%;">
<col style="width: 50%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top"><strong>cl_map_flags</strong></th>
<th class="tableblock halign-left valign-top">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MAP_READ</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">This flag specifies that the region being mapped in the memory object is
being mapped for reading.
</p><p class="tableblock"> The pointer returned by <strong>clEnqueueMapBuffer</strong> (<strong>clEnqueueMapImage</strong>) is
guaranteed to contain the latest bits in the region being mapped when
the <strong>clEnqueueMapBuffer</strong> (<strong>clEnqueueMapImage</strong>) command has completed.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MAP_WRITE</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">This flag specifies that the region being mapped in the memory object is
being mapped for writing.
</p><p class="tableblock"> The pointer returned by <strong>clEnqueueMapBuffer</strong> (<strong>clEnqueueMapImage</strong>) is
guaranteed to contain the latest bits in the region being mapped when
the <strong>clEnqueueMapBuffer</strong> (<strong>clEnqueueMapImage</strong>) command has completed.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MAP_WRITE_INVALIDATE_REGION</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">This flag specifies that the region being mapped in the memory object is
being mapped for writing.
</p><p class="tableblock"> The contents of the region being mapped are to be discarded.
This is typically the case when the region being mapped is overwritten
by the host.
This flag allows the implementation to no longer guarantee that the
pointer returned by <strong>clEnqueueMapBuffer</strong> (<strong>clEnqueueMapImage</strong>) contains
the latest bits in the region being mapped which can be a significant
performance enhancement.
</p><p class="tableblock"> CL_MAP_READ or CL_MAP_WRITE and CL_MAP_WRITE_INVALIDATE_REGION are
mutually exclusive.</p></td>
</tr>
</tbody>
</table>
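<div class="paragraph">
<p>The mutual-exclusion rule in the table can be sketched as a bit-mask check.
The <code>SKETCH_</code> constants below are local stand-ins chosen to match the bit positions used for <code>cl_map_flags</code> in <code>CL/cl.h</code>; real code should use the header's own macros, and the function name <code>map_flags_valid</code> is hypothetical.
A value of 0 passes this check; whether an implementation accepts it is not addressed by the table.</p>
</div>

```c
#include <stdbool.h>

/* Local stand-ins for CL_MAP_READ, CL_MAP_WRITE and
 * CL_MAP_WRITE_INVALIDATE_REGION (assumed to match CL/cl.h). */
#define SKETCH_MAP_READ                    (1ul << 0)
#define SKETCH_MAP_WRITE                   (1ul << 1)
#define SKETCH_MAP_WRITE_INVALIDATE_REGION (1ul << 2)

/* Hypothetical helper: rejects unknown bits and any combination of
 * WRITE_INVALIDATE_REGION with READ or WRITE. */
bool map_flags_valid(unsigned long flags)
{
    const unsigned long known = SKETCH_MAP_READ | SKETCH_MAP_WRITE |
                                SKETCH_MAP_WRITE_INVALIDATE_REGION;
    if (flags & ~known)
        return false;
    if ((flags & SKETCH_MAP_WRITE_INVALIDATE_REGION) &&
        (flags & (SKETCH_MAP_READ | SKETCH_MAP_WRITE)))
        return false;
    return true;
}
```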
</div>
</div>
<div class="sect2">
<h3 id="_image_objects">5.3. Image Objects</h3>
<div class="paragraph">
<p>An <em>image</em> object is used to store a one-, two- or three-dimensional
texture, frame-buffer or image.
The elements of an image object are selected from a list of predefined image
formats.
The minimum number of elements in a memory object is one.</p>
</div>
<div class="sect3">
<h4 id="_creating_image_objects">5.3.1. Creating Image Objects</h4>
<div class="paragraph">
<p>A <strong>1D image</strong>, <strong>1D image buffer</strong>, <strong>1D image array</strong>, <strong>2D image</strong>, <strong>2D image
array</strong> and <strong>3D image object</strong> can be created using the following function:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_mem clCreateImage(cl_context context,
cl_mem_flags flags,
<span class="directive">const</span> cl_image_format *image_format,
<span class="directive">const</span> cl_image_desc *image_desc,
<span class="directive">void</span> *host_ptr,
cl_int *errcode_ret)</code></pre>
</div>
</div>
<div class="paragraph">
<p><em>context</em> is a valid OpenCL context on which the image object is to be
created.</p>
</div>
<div class="paragraph">
<p><em>flags</em> is a bit-field that is used to specify allocation and usage
information about the image memory object being created and is described in
the <a href="#memory-flags-table">Memory Flags</a> table.</p>
</div>
<div class="paragraph">
<p>For all image types except CL_MEM_OBJECT_IMAGE1D_BUFFER, if the value specified
for <em>flags</em> is 0, the default, CL_MEM_READ_WRITE, is used.</p>
</div>
<div class="paragraph">
<p>For CL_MEM_OBJECT_IMAGE1D_BUFFER image type, or an image created from
another memory object (image or buffer), if the CL_MEM_READ_WRITE,
CL_MEM_READ_ONLY or CL_MEM_WRITE_ONLY values are not specified in <em>flags</em>,
they are inherited from the corresponding memory access qualifiers associated
with <em>mem_object</em>.
The CL_MEM_USE_HOST_PTR, CL_MEM_ALLOC_HOST_PTR and CL_MEM_COPY_HOST_PTR
values cannot be specified in <em>flags</em> but are inherited from the
corresponding memory access qualifiers associated with <em>mem_object</em>.
If CL_MEM_COPY_HOST_PTR is specified in the memory access qualifier values
associated with <em>mem_object</em> it does not imply any additional copies when
the image is created from <em>mem_object</em>.
If the CL_MEM_HOST_WRITE_ONLY, CL_MEM_HOST_READ_ONLY or
CL_MEM_HOST_NO_ACCESS values are not specified in <em>flags</em>, they are
inherited from the corresponding memory access qualifiers associated with
<em>mem_object</em>.</p>
</div>
<div class="paragraph">
<p><em>image_format</em> is a pointer to a structure that describes format properties
of the image to be allocated.
A 1D image buffer or 2D image can be created from a buffer by specifying a
buffer object in <em>image_desc</em>&#8594;<em>mem_object</em>.
A 2D image can be created from another 2D image object by specifying an
image object in <em>image_desc</em>&#8594;<em>mem_object</em>.
Refer to <a href="#image-format-descriptor">Image Format Descriptor</a> for a detailed
description of the image format descriptor.</p>
</div>
<div class="paragraph">
<p><em>image_desc</em> is a pointer to a structure that describes type and dimensions
of the image to be allocated.
Refer to <a href="#image-descriptor">Image Descriptor</a> for a detailed description
of the image descriptor.</p>
</div>
<div class="paragraph">
<p><em>host_ptr</em> is a pointer to the image data that may already be allocated by
the application.
It is only used to initialize the image, and can be freed after the call to
<strong>clCreateImage</strong>.
Refer to the table below for a description of how large the buffer that
<em>host_ptr</em> points to must be.</p>
</div>
<table class="tableblock frame-all grid-all spread">
<colgroup>
<col style="width: 50%;">
<col style="width: 50%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top">Image Type</th>
<th class="tableblock halign-left valign-top">Size of buffer that <em>host_ptr</em> points to</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MEM_OBJECT_IMAGE1D</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">≥ image_row_pitch</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MEM_OBJECT_IMAGE1D_BUFFER</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">≥ image_row_pitch</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MEM_OBJECT_IMAGE2D</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">≥ image_row_pitch × image_height</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MEM_OBJECT_IMAGE3D</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">≥ image_slice_pitch × image_depth</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MEM_OBJECT_IMAGE1D_ARRAY</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">≥ image_slice_pitch × image_array_size</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MEM_OBJECT_IMAGE2D_ARRAY</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">≥ image_slice_pitch × image_array_size</p></td>
</tr>
</tbody>
</table>
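<div class="paragraph">
<p>The size requirements in the table above can be sketched as a single function.
The enum <code>sk_image_type</code>, the struct <code>sk_image_layout</code> and the function <code>min_host_ptr_size</code> are hypothetical names introduced for this example; they stand in for the <code>CL_MEM_OBJECT_IMAGE*</code> types and the relevant <code>cl_image_desc</code> fields.
The sketch assumes the pitch values passed in are the effective (non-zero) pitches.</p>
</div>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the CL_MEM_OBJECT_IMAGE* image types. */
typedef enum {
    SK_IMAGE1D, SK_IMAGE1D_BUFFER, SK_IMAGE2D,
    SK_IMAGE3D, SK_IMAGE1D_ARRAY, SK_IMAGE2D_ARRAY
} sk_image_type;

/* Subset of the image descriptor needed for the size computation. */
typedef struct {
    sk_image_type type;
    size_t height, depth, array_size;
    size_t row_pitch, slice_pitch;   /* effective pitches in bytes */
} sk_image_layout;

/* Minimum number of bytes host_ptr must point to, per the table. */
size_t min_host_ptr_size(const sk_image_layout *d)
{
    assert(d != NULL);
    switch (d->type) {
    case SK_IMAGE1D:
    case SK_IMAGE1D_BUFFER: return d->row_pitch;
    case SK_IMAGE2D:        return d->row_pitch * d->height;
    case SK_IMAGE3D:        return d->slice_pitch * d->depth;
    case SK_IMAGE1D_ARRAY:
    case SK_IMAGE2D_ARRAY:  return d->slice_pitch * d->array_size;
    }
    return 0; /* unknown type */
}
```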
<div class="paragraph">
<p>For a 3D image or 2D image array, the image data specified by <em>host_ptr</em> is
stored as a linear sequence of adjacent 2D image slices or 2D images
respectively.
Each 2D image is a linear sequence of adjacent scanlines.
Each scanline is a linear sequence of image elements.</p>
</div>
<div class="paragraph">
<p>For a 2D image, the image data specified by <em>host_ptr</em> is stored as a linear
sequence of adjacent scanlines.
Each scanline is a linear sequence of image elements.</p>
</div>
<div class="paragraph">
<p>For a 1D image array, the image data specified by <em>host_ptr</em> is stored as a
linear sequence of adjacent 1D images.
Each 1D image is stored as a single scanline which is a linear sequence of
adjacent elements.</p>
</div>
<div class="paragraph">
<p>For a 1D image or 1D image buffer, the image data specified by <em>host_ptr</em> is
stored as a single scanline which is a linear sequence of adjacent elements.</p>
</div>
<div class="paragraph">
<p>Image elements are stored according to their image format as described in
<a href="#image-format-descriptor">Image Format Descriptor</a>.</p>
</div>
<div class="paragraph">
<p><em>errcode_ret</em> will return an appropriate error code.
If <em>errcode_ret</em> is <code>NULL</code>, no error code is returned.</p>
</div>
<div class="paragraph">
<p><strong>clCreateImage</strong> returns a valid non-zero image object created and the
<em>errcode_ret</em> is set to CL_SUCCESS if the image object is created
successfully.
Otherwise, it returns a <code>NULL</code> value with one of the following error values
returned in <em>errcode_ret</em>:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_CONTEXT if <em>context</em> is not a valid context.</p>
</li>
<li>
<p>CL_INVALID_VALUE if values specified in <em>flags</em> are not valid.</p>
</li>
<li>
<p>CL_INVALID_IMAGE_FORMAT_DESCRIPTOR if values specified in <em>image_format</em>
are not valid or if <em>image_format</em> is <code>NULL</code>.</p>
</li>
<li>
<p>CL_INVALID_IMAGE_FORMAT_DESCRIPTOR if a 2D image is created from a
buffer and the row pitch and base address alignment do not follow the
rules described for creating a 2D image from a buffer.</p>
</li>
<li>
<p>CL_INVALID_IMAGE_FORMAT_DESCRIPTOR if a 2D image is created from a 2D
image object and the rules described above are not followed.</p>
</li>
<li>
<p>CL_INVALID_IMAGE_DESCRIPTOR if values specified in <em>image_desc</em> are not
valid or if <em>image_desc</em> is <code>NULL</code>.</p>
</li>
<li>
<p>CL_INVALID_IMAGE_SIZE if image dimensions specified in <em>image_desc</em>
exceed the maximum image dimensions described in the
<a href="#device-queries-table">Device Queries</a> table for all devices
in <em>context</em>.</p>
</li>
<li>
<p>CL_INVALID_HOST_PTR if <em>host_ptr</em> is <code>NULL</code> and CL_MEM_USE_HOST_PTR or
CL_MEM_COPY_HOST_PTR are set in <em>flags</em> or if <em>host_ptr</em> is not <code>NULL</code>
but CL_MEM_COPY_HOST_PTR or CL_MEM_USE_HOST_PTR are not set in <em>flags</em>.</p>
</li>
<li>
<p>CL_INVALID_VALUE if an image is being created from another memory object
(buffer or image) under one of the following circumstances: 1)
<em>mem_object</em> was created with CL_MEM_WRITE_ONLY and <em>flags</em> specifies
CL_MEM_READ_WRITE or CL_MEM_READ_ONLY, 2) <em>mem_object</em> was created with
CL_MEM_READ_ONLY and <em>flags</em> specifies CL_MEM_READ_WRITE or
CL_MEM_WRITE_ONLY, 3) <em>flags</em> specifies CL_MEM_USE_HOST_PTR or
CL_MEM_ALLOC_HOST_PTR or CL_MEM_COPY_HOST_PTR.</p>
</li>
<li>
<p>CL_INVALID_VALUE if an image is being created from another memory object
(buffer or image) and <em>mem_object</em> was created with
CL_MEM_HOST_WRITE_ONLY and <em>flags</em> specifies CL_MEM_HOST_READ_ONLY, or
if <em>mem_object</em> was created with CL_MEM_HOST_READ_ONLY and <em>flags</em>
specifies CL_MEM_HOST_WRITE_ONLY, or if <em>mem_object</em> was created with
CL_MEM_HOST_NO_ACCESS and <em>flags</em> specifies CL_MEM_HOST_READ_ONLY or
CL_MEM_HOST_WRITE_ONLY.</p>
</li>
<li>
<p>CL_IMAGE_FORMAT_NOT_SUPPORTED if the <em>image_format</em> is not supported.</p>
</li>
<li>
<p>CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
memory for image object.</p>
</li>
<li>
<p>CL_INVALID_OPERATION if there are no devices in <em>context</em> that support
images (i.e. CL_DEVICE_IMAGE_SUPPORT specified in the
<a href="#device-queries-table">Device Queries</a> table is CL_FALSE).</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
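<div class="paragraph">
<p>The CL_INVALID_HOST_PTR condition in the list above pairs <em>host_ptr</em> with two bits of <em>flags</em>.
A minimal sketch of that pairing follows; the <code>SKETCH_</code> constants are local stand-ins assumed to match the bit positions of CL_MEM_USE_HOST_PTR and CL_MEM_COPY_HOST_PTR in <code>CL/cl.h</code>, and <code>host_ptr_args_valid</code> is a hypothetical helper, not an OpenCL function.</p>
</div>

```c
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for CL_MEM_USE_HOST_PTR and CL_MEM_COPY_HOST_PTR
 * (assumed to match CL/cl.h). */
#define SKETCH_MEM_USE_HOST_PTR  (1ul << 3)
#define SKETCH_MEM_COPY_HOST_PTR (1ul << 5)

/* Hypothetical helper: host_ptr must be non-NULL exactly when
 * USE_HOST_PTR or COPY_HOST_PTR is set in flags. */
bool host_ptr_args_valid(unsigned long flags, const void *host_ptr)
{
    const bool wants_host_ptr =
        (flags & (SKETCH_MEM_USE_HOST_PTR | SKETCH_MEM_COPY_HOST_PTR)) != 0;
    return wants_host_ptr == (host_ptr != NULL);
}
```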
<div class="sect4">
<h5 id="image-format-descriptor">Image Format Descriptor</h5>
<div class="paragraph">
<p>The image format descriptor structure is defined as</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c"><span class="keyword">typedef</span> <span class="keyword">struct</span> cl_image_format {
cl_channel_order image_channel_order;
cl_channel_type image_channel_data_type;
} cl_image_format;</code></pre>
</div>
</div>
<div class="paragraph">
<p><code>image_channel_order</code> specifies the number of channels and the channel
layout i.e. the memory layout in which channels are stored in the image.
Valid values are described in the <a href="#image-channel-order-table">Image Channel
Order</a> table.</p>
</div>
<div class="paragraph">
<p><code>image_channel_data_type</code> describes the size of the channel data type.
The list of supported values is described in the
<a href="#image-channel-data-types-table">Image Channel Data Types</a> table.
The number of bits per element determined by the <code>image_channel_data_type</code>
and <code>image_channel_order</code> must be a power of two.</p>
</div>
<table id="image-channel-order-table" class="tableblock frame-all grid-all spread">
<caption class="title">Table 14. List of supported Image Channel Order Values</caption>
<colgroup>
<col style="width: 100%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top">Enum values that can be specified in channel_order</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_R</strong>, <strong>CL_Rx</strong> or <strong>CL_A</strong></p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_INTENSITY</strong></p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_LUMINANCE</strong></p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_DEPTH</strong></p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_RG</strong>, <strong>CL_RGx</strong> or <strong>CL_RA</strong></p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_RGB</strong> or <strong>CL_RGBx</strong></p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_RGBA</strong></p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_sRGB</strong>, <strong>CL_sRGBx</strong>, <strong>CL_sRGBA</strong>, or <strong>CL_sBGRA</strong></p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_ARGB</strong>, <strong>CL_BGRA</strong>, or <strong>CL_ABGR</strong></p></td>
</tr>
</tbody>
</table>
<table id="image-channel-data-types-table" class="tableblock frame-all grid-all spread">
<caption class="title">Table 15. List of supported Image Channel Data Types</caption>
<colgroup>
<col style="width: 50%;">
<col style="width: 50%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top">Image Channel Data Type</th>
<th class="tableblock halign-left valign-top">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_SNORM_INT8</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Each channel component is a normalized signed 8-bit integer value</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_SNORM_INT16</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Each channel component is a normalized signed 16-bit integer value</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_UNORM_INT8</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Each channel component is a normalized unsigned 8-bit integer value</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_UNORM_INT16</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Each channel component is a normalized unsigned 16-bit integer value</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_UNORM_SHORT_565</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Represents a normalized 5-6-5 3-channel RGB image.
The channel order must be CL_RGB or CL_RGBx.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_UNORM_SHORT_555</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Represents a normalized x-5-5-5 4-channel xRGB image.
The channel order must be CL_RGB or CL_RGBx.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_UNORM_INT_101010</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Represents a normalized x-10-10-10 4-channel xRGB image.
The channel order must be CL_RGB or CL_RGBx.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_UNORM_INT_101010_2</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Represents a normalized 10-10-10-2 four-channel RGBA image.
The channel order must be CL_RGBA.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_SIGNED_INT8</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Each channel component is an unnormalized signed 8-bit integer value</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_SIGNED_INT16</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Each channel component is an unnormalized signed 16-bit integer value</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_SIGNED_INT32</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Each channel component is an unnormalized signed 32-bit integer value</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_UNSIGNED_INT8</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Each channel component is an unnormalized unsigned 8-bit integer value</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_UNSIGNED_INT16</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Each channel component is an unnormalized unsigned 16-bit integer value</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_UNSIGNED_INT32</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Each channel component is an unnormalized unsigned 32-bit integer value</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_HALF_FLOAT</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Each channel component is a 16-bit half-float value</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_FLOAT</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Each channel component is a single precision floating-point value</p></td>
</tr>
</tbody>
</table>
<div class="paragraph">
<p>For example, to specify a normalized unsigned 8-bit / channel RGBA image,
<code>image_channel_order</code> = CL_RGBA, and <code>image_channel_data_type</code> =
CL_UNORM_INT8.
The memory layout of this image format is described below:</p>
</div>
<table class="tableblock frame-all grid-all" style="width: 60%;">
<colgroup>
<col style="width: 10%;">
<col style="width: 10%;">
<col style="width: 10%;">
<col style="width: 10%;">
<col style="width: 60%;">
</colgroup>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">R</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">G</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">B</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">A</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">&#8230;&#8203;</p></td>
</tr>
</tbody>
</table>
<div class="paragraph">
<p>with the corresponding byte offsets</p>
</div>
<table class="tableblock frame-all grid-all" style="width: 60%;">
<colgroup>
<col style="width: 10%;">
<col style="width: 10%;">
<col style="width: 10%;">
<col style="width: 10%;">
<col style="width: 60%;">
</colgroup>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">0</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">1</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">2</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">3</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">&#8230;&#8203;</p></td>
</tr>
</tbody>
</table>
<div class="paragraph">
<p>Similarly, if <code>image_channel_order</code> = CL_RGBA and <code>image_channel_data_type</code> =
CL_SIGNED_INT16, the memory layout of this image format is described below:</p>
</div>
<table class="tableblock frame-all grid-all" style="width: 60%;">
<colgroup>
<col style="width: 10%;">
<col style="width: 10%;">
<col style="width: 10%;">
<col style="width: 10%;">
<col style="width: 60%;">
</colgroup>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">R</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">G</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">B</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">A</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">&#8230;&#8203;</p></td>
</tr>
</tbody>
</table>
<div class="paragraph">
<p>with the corresponding byte offsets</p>
</div>
<table class="tableblock frame-all grid-all" style="width: 60%;">
<colgroup>
<col style="width: 10%;">
<col style="width: 10%;">
<col style="width: 10%;">
<col style="width: 10%;">
<col style="width: 60%;">
</colgroup>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">0</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">2</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">4</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">6</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">&#8230;&#8203;</p></td>
</tr>
</tbody>
</table>
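<div class="paragraph">
<p>The byte offsets in the two layouts above follow one formula for unpacked formats.
The sketch below assumes pixels are stored contiguously within a scanline and that every channel has the same size; the function name <code>channel_byte_offset</code> and its parameters are hypothetical.</p>
</div>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: byte offset, from the start of a scanline, of
 * channel `channel` of pixel `x` for an unpacked image format with
 * `num_channels` channels of `bytes_per_channel` bytes each. */
size_t channel_byte_offset(size_t x, size_t channel,
                           size_t num_channels, size_t bytes_per_channel)
{
    assert(channel < num_channels);
    return (x * num_channels + channel) * bytes_per_channel;
}
```

<div class="paragraph">
<p>With four channels, a channel size of 1 byte reproduces the offsets 0, 1, 2, 3 of the CL_UNORM_INT8 example, and a channel size of 2 bytes reproduces the offsets 0, 2, 4, 6 of the CL_SIGNED_INT16 example.</p>
</div>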
<div class="paragraph">
<p><code>image_channel_data_type</code> values of CL_UNORM_SHORT_565, CL_UNORM_SHORT_555,
CL_UNORM_INT_101010 and CL_UNORM_INT_101010_2 are special cases of packed
image formats where the channels of each element are packed into a single
unsigned short or unsigned int.
For these special packed image formats, the channels are normally packed
with the first channel in the most significant bits of the bitfield, and
successive channels occupying progressively less significant locations.
For CL_UNORM_SHORT_565, R is in bits 15:11, G is in bits 10:5 and B is in
bits 4:0.
For CL_UNORM_SHORT_555, bit 15 is undefined, R is in bits 14:10, G in bits
9:5 and B in bits 4:0.
For CL_UNORM_INT_101010, bits 31:30 are undefined, R is in bits 29:20, G in
bits 19:10 and B in bits 9:0.
For CL_UNORM_INT_101010_2, R is in bits 31:22, G in bits 21:12, B in bits
11:2 and A in bits 1:0.</p>
</div>
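<div class="paragraph">
<p>The bit positions listed above can be illustrated with packing helpers.
This is a sketch only: the function names are hypothetical, the inputs are assumed to be already-quantized integer field values (for example 0 to 31 for a 5-bit channel), and the undefined bits of CL_UNORM_SHORT_555 and CL_UNORM_INT_101010 are simply left as 0.</p>
</div>

```c
#include <stdint.h>

/* R in bits 15:11, G in bits 10:5, B in bits 4:0. */
uint16_t pack_unorm_short_565(uint32_t r, uint32_t g, uint32_t b)
{
    return (uint16_t)(((r & 0x1Fu) << 11) | ((g & 0x3Fu) << 5) | (b & 0x1Fu));
}

/* Bit 15 undefined (left 0), R in 14:10, G in 9:5, B in 4:0. */
uint16_t pack_unorm_short_555(uint32_t r, uint32_t g, uint32_t b)
{
    return (uint16_t)(((r & 0x1Fu) << 10) | ((g & 0x1Fu) << 5) | (b & 0x1Fu));
}

/* Bits 31:30 undefined (left 0), R in 29:20, G in 19:10, B in 9:0. */
uint32_t pack_unorm_int_101010(uint32_t r, uint32_t g, uint32_t b)
{
    return ((r & 0x3FFu) << 20) | ((g & 0x3FFu) << 10) | (b & 0x3FFu);
}

/* R in bits 31:22, G in 21:12, B in 11:2, A in 1:0. */
uint32_t pack_unorm_int_101010_2(uint32_t r, uint32_t g, uint32_t b,
                                 uint32_t a)
{
    return ((r & 0x3FFu) << 22) | ((g & 0x3FFu) << 12) |
           ((b & 0x3FFu) << 2) | (a & 0x3u);
}
```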
<div class="paragraph">
<p>OpenCL implementations must maintain the minimum precision specified by the
number of bits in <code>image_channel_data_type</code>.
If the image format specified by <code>image_channel_order</code> and
<code>image_channel_data_type</code> cannot be supported by the OpenCL implementation,
then the call to <strong>clCreateImage</strong> will return a <code>NULL</code> memory object.</p>
</div>
</div>
<div class="sect4">
<h5 id="image-descriptor">Image Descriptor</h5>
<div class="paragraph">
<p>The image descriptor structure describes the type and dimensions of the
image or image array and is defined as:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c"><span class="keyword">typedef</span> <span class="keyword">struct</span> cl_image_desc {
cl_mem_object_type image_type;
size_t image_width;
size_t image_height;
size_t image_depth;
size_t image_array_size;
size_t image_row_pitch;
size_t image_slice_pitch;
cl_uint num_mip_levels;
cl_uint num_samples;
cl_mem mem_object;
} cl_image_desc;</code></pre>
</div>
</div>
<div class="paragraph">
<p><code>image_type</code> describes the image type and must be either
CL_MEM_OBJECT_IMAGE1D, CL_MEM_OBJECT_IMAGE1D_BUFFER,
CL_MEM_OBJECT_IMAGE1D_ARRAY, CL_MEM_OBJECT_IMAGE2D,
CL_MEM_OBJECT_IMAGE2D_ARRAY or CL_MEM_OBJECT_IMAGE3D.</p>
</div>
<div class="paragraph">
<p><code>image_width</code> is the width of the image in pixels.
For a 2D image and image array, the image width must be a value ≥ 1 and
≤ CL_DEVICE_IMAGE2D_MAX_WIDTH.
For a 3D image, the image width must be a value ≥ 1 and ≤
CL_DEVICE_IMAGE3D_MAX_WIDTH.
For a 1D image buffer, the image width must be a value ≥ 1 and ≤
CL_DEVICE_IMAGE_MAX_BUFFER_SIZE.
For a 1D image and 1D image array, the image width must be a value ≥ 1
and ≤ CL_DEVICE_IMAGE2D_MAX_WIDTH.</p>
</div>
<div class="paragraph">
<p><code>image_height</code> is the height of the image in pixels.
This is only used if the image is a 2D or 3D image, or a 2D image array.
For a 2D image or image array, the image height must be a value ≥ 1 and
≤ CL_DEVICE_IMAGE2D_MAX_HEIGHT.
For a 3D image, the image height must be a value ≥ 1 and ≤
CL_DEVICE_IMAGE3D_MAX_HEIGHT.</p>
</div>
<div class="paragraph">
<p><code>image_depth</code> is the depth of the image in pixels.
This is only used if the image is a 3D image and must be a value ≥ 1 and
≤ CL_DEVICE_IMAGE3D_MAX_DEPTH.</p>
</div>
<div class="paragraph">
<p><code>image_array_size</code><sup>5</sup> is the number of images in the image array.
This is only used if the image is a 1D or 2D image array.
The values for <code>image_array_size</code>, if specified, must be a value ≥ 1 and
≤ CL_DEVICE_IMAGE_MAX_ARRAY_SIZE.</p>
</div>
<div class="dlist">
<dl>
<dt class="hdlist1">5</dt>
<dd>
<p>Note that reading and writing 2D image arrays from a kernel with
<code>image_array_size</code>=1 may be lower performance than 2D images.</p>
</dd>
</dl>
</div>
<div class="paragraph">
<p><code>image_row_pitch</code> is the scan-line pitch in bytes.
This must be 0 if <em>host_ptr</em> is <code>NULL</code> and can be either 0 or ≥
<code>image_width</code> × size of element in bytes if <em>host_ptr</em> is not <code>NULL</code>.
If <em>host_ptr</em> is not <code>NULL</code> and <code>image_row_pitch</code> = 0, <code>image_row_pitch</code> is
calculated as <code>image_width</code> × size of element in bytes.
If <code>image_row_pitch</code> is not 0, it must be a multiple of the image element
size in bytes.
For a 2D image created from a buffer, the pitch specified (or computed if
pitch specified is 0) must be a multiple of the maximum of the
CL_DEVICE_IMAGE_PITCH_ALIGNMENT value for all devices in the context
associated with <code>image_desc</code>&#8594;<code>mem_object</code> and that support images.</p>
</div>
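<div class="paragraph">
<p>The rules for <code>image_row_pitch</code> can be sketched as a resolver that validates the requested value and reports the effective pitch.
The function name <code>resolve_row_pitch</code> and its parameters are hypothetical; the sketch does not cover the CL_DEVICE_IMAGE_PITCH_ALIGNMENT requirement for 2D images created from a buffer, and it reports 0 when <em>host_ptr</em> is <code>NULL</code> because no host layout applies in that case.</p>
</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: validates `requested` (the image_row_pitch field)
 * and stores the effective host row pitch in *effective.
 * Returns false, with *effective set to 0, if the rules are violated. */
bool resolve_row_pitch(size_t image_width, size_t element_size,
                       bool have_host_ptr, size_t requested,
                       size_t *effective)
{
    assert(effective != NULL && element_size > 0);
    *effective = 0;
    if (!have_host_ptr)
        return requested == 0;           /* must be 0 without host_ptr */

    const size_t tight = image_width * element_size;
    if (requested == 0) {
        *effective = tight;              /* computed default */
        return true;
    }
    if (requested < tight || requested % element_size != 0)
        return false;                    /* too small or misaligned */
    *effective = requested;
    return true;
}
```

<div class="paragraph">
<p>For a 640-pixel-wide image with 4-byte elements, a requested pitch of 0 resolves to 2560 bytes, 2564 is accepted as given, and 2562 is rejected because it is not a multiple of the element size.</p>
</div>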
<div class="paragraph">
<p><code>image_slice_pitch</code> is the size in bytes of each 2D slice in the 3D image or
the size in bytes of each image in a 1D or 2D image array.
This must be 0 if <em>host_ptr</em> is <code>NULL</code>.
If <em>host_ptr</em> is not <code>NULL</code>, <code>image_slice_pitch</code> can be either 0 or ≥
<code>image_row_pitch</code> × <code>image_height</code> for a 2D image array or 3D image
and can be either 0 or ≥ <code>image_row_pitch</code> for a 1D image array.
If <em>host_ptr</em> is not <code>NULL</code> and <code>image_slice_pitch</code> = 0, <code>image_slice_pitch</code>
is calculated as <code>image_row_pitch</code> × <code>image_height</code> for a 2D image
array or 3D image and <code>image_row_pitch</code> for a 1D image array.
If <code>image_slice_pitch</code> is not 0, it must be a multiple of the
<code>image_row_pitch</code>.</p>
</div>
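<div class="paragraph">
<p>The pitch rules above can be sketched as plain arithmetic.
The following is a minimal illustration only: the helper names are
hypothetical and not part of the OpenCL API, and the code assumes an image
width and element size of at least 1.
<code>rows_per_slice</code> stands for <code>image_height</code> for a 2D image array or 3D
image and for 1 for a 1D image array.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">#include &lt;stddef.h&gt;

/* Hypothetical helpers, not part of the OpenCL API.
 * Assumes width &gt;= 1 and element_size &gt;= 1. */

/* A row pitch of 0 means "tightly packed": width * element size. */
static size_t effective_row_pitch(size_t row_pitch, size_t width,
                                  size_t element_size)
{
    return row_pitch != 0 ? row_pitch : width * element_size;
}

/* A slice pitch of 0 means row pitch * rows per slice. */
static size_t effective_slice_pitch(size_t slice_pitch, size_t row_pitch,
                                    size_t rows_per_slice)
{
    return slice_pitch != 0 ? slice_pitch : row_pitch * rows_per_slice;
}

/* Returns 1 if the pitches follow the rules described above, else 0. */
static int pitches_are_valid(int have_host_ptr, size_t width,
                             size_t rows_per_slice, size_t element_size,
                             size_t row_pitch, size_t slice_pitch)
{
    size_t rp;

    /* Without a host pointer both pitches must be 0. */
    if (!have_host_ptr)
        return row_pitch == 0 &amp;&amp; slice_pitch == 0;

    /* A non-zero row pitch must cover one row and be a multiple of the
     * element size. */
    if (row_pitch != 0 &amp;&amp;
        (row_pitch &lt; width * element_size || row_pitch % element_size != 0))
        return 0;

    /* A non-zero slice pitch must cover one slice and be a multiple of
     * the (possibly computed) row pitch. */
    rp = effective_row_pitch(row_pitch, width, element_size);
    if (slice_pitch != 0 &amp;&amp;
        (slice_pitch &lt; rp * rows_per_slice || slice_pitch % rp != 0))
        return 0;

    return 1;
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>For example, a 640 × 480 image with 4-byte elements and both pitches
left at 0 resolves to a row pitch of 2560 bytes and a slice pitch of
1228800 bytes.</p>
</div>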
<div class="paragraph">
<p>num_mip_levels and num_samples must be 0.</p>
</div>
<div class="paragraph">
<p>mem_object may refer to a valid buffer or image memory object.
mem_object can be a buffer memory object if <code>image_type</code> is
CL_MEM_OBJECT_IMAGE1D_BUFFER or CL_MEM_OBJECT_IMAGE2D<sup>6</sup>.
mem_object can be an image object if <code>image_type</code> is
CL_MEM_OBJECT_IMAGE2D<sup>7</sup>.
Otherwise it must be <code>NULL</code>.
The image pixels are taken from the memory object's data store.
When the contents of the specified memory object's data store are modified,
those changes are reflected in the contents of the image object and
vice-versa at corresponding synchronization points.</p>
</div>
<div class="dlist">
<dl>
<dt class="hdlist1">6</dt>
<dd>
<p>This creates a 2D image from a buffer object, with the data store shared
between the image and the buffer object.</p>
</dd>
<dt class="hdlist1">7</dt>
<dd>
<p>This creates an image object from another image object, with the data
store shared between these image objects.</p>
</dd>
</dl>
</div>
<div class="paragraph">
<p>For a 1D image buffer created from a buffer object, the <code>image_width</code> ×
size of element in bytes must be ≤ size of the buffer object.
The image data in the buffer object is stored as a single scanline which is
a linear sequence of adjacent elements.</p>
</div>
<div class="paragraph">
<p>For a 2D image created from a buffer object, the <code>image_row_pitch</code> ×
<code>image_height</code> must be ≤ size of the buffer object specified by
mem_object.
The image data in the buffer object is stored as a linear sequence of
adjacent scanlines.
Each scanline is a linear sequence of image elements padded to
<code>image_row_pitch</code> bytes.</p>
</div>
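<div class="paragraph">
<p>The two size constraints above reduce to simple comparisons.
The sketch below uses hypothetical helper names that are not part of the
OpenCL API, and it ignores arithmetic overflow for brevity.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">#include &lt;stddef.h&gt;

/* Hypothetical helpers, not part of the OpenCL API. */

/* 1D image buffer: image_width * element size must fit in the buffer. */
static int fits_1d_image_buffer(size_t image_width, size_t element_size,
                                size_t buffer_size)
{
    return image_width * element_size &lt;= buffer_size;
}

/* 2D image from buffer: image_row_pitch * image_height must fit. */
static int fits_2d_image_from_buffer(size_t image_row_pitch,
                                     size_t image_height,
                                     size_t buffer_size)
{
    return image_row_pitch * image_height &lt;= buffer_size;
}</code></pre>
</div>
</div>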
<div class="paragraph">
<p>For an image object created from another image object, the values specified
in the image descriptor except for mem_object must match the image
descriptor information associated with mem_object.</p>
</div>
<div class="paragraph">
<p>Image elements are stored according to their image format as described in
<a href="#image-format-descriptor">Image Format Descriptor</a>.</p>
</div>
<div class="paragraph">
<p>If the buffer object specified by mem_object is created with
CL_MEM_USE_HOST_PTR, the <em>host_ptr</em> specified to <strong>clCreateBuffer</strong> must be
aligned to the maximum of the <strong>CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT</strong> value
for all devices in the context associated with the buffer specified by
mem_object and that support images.</p>
</div>
<div class="paragraph">
<p>Creating a 2D image object from another 2D image object allows users to
create a new image object that shares the image data store with mem_object
but views the pixels in the image with a different channel order.
The restrictions are:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>all the values specified in <code>image_desc</code> except for mem_object must match
the image descriptor information associated with mem_object.</p>
</li>
<li>
<p>The <em>image_desc</em> used for creation of <em>mem_object</em> may not be
equivalent to image descriptor information associated with mem_object.
To ensure the values in <em>image_desc</em> will match, one can query
mem_object for associated information using the <strong>clGetImageInfo</strong> function
described in <a href="#image-object-queries">Image Object Queries</a>.</p>
</li>
<li>
<p>the channel data type specified in <code>image_format</code> must match the channel
data type associated with mem_object.
The channel order values<sup>8</sup> supported are:</p>
<div class="openblock">
<div class="content">
<table class="tableblock frame-all grid-all spread">
<colgroup>
<col style="width: 50%;">
<col style="width: 50%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top"><code>image_channel_order</code> specified in <code>image_format</code></th>
<th class="tableblock halign-left valign-top">image channel order of mem_object</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_sBGRA</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_BGRA</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_BGRA</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_sBGRA</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_sRGBA</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_RGBA</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_RGBA</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_sRGBA</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_sRGB</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_RGB</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_RGB</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_sRGB</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_sRGBx</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_RGBx</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_RGBx</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_sRGBx</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEPTH</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_R</p></td>
</tr>
</tbody>
</table>
</div>
</div>
</li>
<li>
<p>the channel order specified must have the same number of channels as the
channel order of mem_object.</p>
<div class="dlist">
<dl>
<dt class="hdlist1">8</dt>
<dd>
<p>This allows developers to create an sRGB view of the image from a linear
RGB view or vice-versa, i.e. the pixels stored in the image can be
accessed as linear RGB or sRGB values.</p>
</dd>
</dl>
</div>
</li>
</ul>
</div>
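<div class="paragraph">
<p>The channel order table above can be read as a list of permitted
(requested order, existing order) pairs.
The following sketch encodes exactly those rows and nothing more.
The enum values are illustrative stand-ins rather than the real
<code>CL_*</code> constants from the OpenCL headers, and the function name is
hypothetical.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">#include &lt;stddef.h&gt;

/* Illustrative stand-ins for the channel orders named in the table;
 * real code would use the CL_* constants from the OpenCL headers. */
typedef enum {
    ORDER_R, ORDER_DEPTH,
    ORDER_RGB, ORDER_SRGB,
    ORDER_RGBX, ORDER_SRGBX,
    ORDER_RGBA, ORDER_SRGBA,
    ORDER_BGRA, ORDER_SBGRA
} channel_order;

/* One entry per table row: order requested for the new image, and the
 * order of the image object it is created from. */
static const struct {
    channel_order requested;
    channel_order existing;
} allowed_views[] = {
    { ORDER_SBGRA, ORDER_BGRA  }, { ORDER_BGRA, ORDER_SBGRA },
    { ORDER_SRGBA, ORDER_RGBA  }, { ORDER_RGBA, ORDER_SRGBA },
    { ORDER_SRGB,  ORDER_RGB   }, { ORDER_RGB,  ORDER_SRGB  },
    { ORDER_SRGBX, ORDER_RGBX  }, { ORDER_RGBX, ORDER_SRGBX },
    { ORDER_DEPTH, ORDER_R     }
};

/* Returns 1 if the pair appears in the table above, else 0. */
static int view_order_is_listed(channel_order requested,
                                channel_order existing)
{
    size_t i;
    for (i = 0; i &lt; sizeof allowed_views / sizeof allowed_views[0]; ++i) {
        if (allowed_views[i].requested == requested &amp;&amp;
            allowed_views[i].existing == existing)
            return 1;
    }
    return 0;
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>Note that the table lists CL_DEPTH created from CL_R but not the reverse,
so the lookup is deliberately not symmetric.</p>
</div>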
<div class="admonitionblock note">
<table>
<tr>
<td class="icon">
<i class="fa icon-note" title="Note"></i>
</td>
<td class="content">
<div class="paragraph">
<p>Concurrent reading from, writing to and copying between both a buffer object
and 1D image buffer or 2D image object associated with the buffer object is
undefined.
Only reading from both a buffer object and 1D image buffer or 2D image
object associated with the buffer object is defined.</p>
</div>
<div class="paragraph">
<p>Writing to an image created from a buffer and then reading from this buffer
in a kernel even if appropriate synchronization operations (such as a
barrier) are performed between the writes and reads is undefined.
Similarly, writing to the buffer and reading from the image created from
this buffer with appropriate synchronization between the writes and reads is
undefined.</p>
</div>
</td>
</tr>
</table>
</div>
</div>
</div>
<div class="sect3">
<h4 id="_querying_list_of_supported_image_formats">5.3.2. Querying List of Supported Image Formats</h4>
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clGetSupportedImageFormats(cl_context context,
cl_mem_flags flags,
cl_mem_object_type image_type,
cl_uint num_entries,
cl_image_format *image_formats,
cl_uint *num_image_formats)</code></pre>
</div>
</div>
<div class="paragraph">
<p>can be used to get the list of image formats supported by an OpenCL
implementation when the following information about an image memory object
is specified:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Context</p>
</li>
<li>
<p>Image type: 1D, 2D, or 3D image, 1D image buffer, 1D or 2D image array.</p>
</li>
<li>
<p>Image object allocation information</p>
</li>
</ul>
</div>
<div class="paragraph">
<p><strong>clGetSupportedImageFormats</strong> returns a union of image formats supported by
all devices in the context.</p>
</div>
<div class="paragraph">
<p><em>context</em> is a valid OpenCL context on which the image object(s) will be
created.</p>
</div>
<div class="paragraph">
<p><em>flags</em> is a bit-field that is used to specify allocation and usage
information about the image memory object being queried and is described in
the <a href="#memory-flags-table">Memory Flags</a> table.
To get a list of supported image formats that can be read from or written to
by a kernel, <em>flags</em> must be set to CL_MEM_READ_WRITE (get a list of images
that can be read from and written to by different kernel instances when
correctly ordered by event dependencies), CL_MEM_READ_ONLY (list of images
that can be read from by a kernel) or CL_MEM_WRITE_ONLY (list of images that
can be written to by a kernel).
To get a list of supported image formats that can be both read from and
written to by the same kernel instance, <em>flags</em> must be set to
CL_MEM_KERNEL_READ_AND_WRITE.
Please see <a href="#image-format-mapping">Image Format Mapping</a> for clarification.</p>
</div>
<div class="paragraph">
<p><em>image_type</em> describes the image type and must be either
CL_MEM_OBJECT_IMAGE1D, CL_MEM_OBJECT_IMAGE1D_BUFFER, CL_MEM_OBJECT_IMAGE2D,
CL_MEM_OBJECT_IMAGE3D, CL_MEM_OBJECT_IMAGE1D_ARRAY or
CL_MEM_OBJECT_IMAGE2D_ARRAY.</p>
</div>
<div class="paragraph">
<p><em>num_entries</em> specifies the number of entries that can be returned in the
memory location given by <em>image_formats</em>.</p>
</div>
<div class="paragraph">
<p><em>image_formats</em> is a pointer to a memory location where the list of
supported image formats is returned.
Each entry describes a <em>cl_image_format</em> structure supported by the OpenCL
implementation.
If <em>image_formats</em> is <code>NULL</code>, it is ignored.</p>
</div>
<div class="paragraph">
<p><em>num_image_formats</em> is the actual number of supported image formats for a
specific <em>context</em> and values specified by <em>flags</em>.
If <em>num_image_formats</em> is <code>NULL</code>, it is ignored.</p>
</div>
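<div class="paragraph">
<p>Once a list of formats has been retrieved, an application typically checks
whether the format it wants is present.
The sketch below shows such a linear search.
To stay self-contained it uses a stand-in struct whose two fields mirror
the channel order and channel data type members of <em>cl_image_format</em>; the
struct and function names are hypothetical, and real code would operate on
the array filled in through <em>image_formats</em>.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">#include &lt;stddef.h&gt;

/* Stand-in for cl_image_format so the sketch compiles without the
 * OpenCL headers. */
typedef struct {
    unsigned int image_channel_order;
    unsigned int image_channel_data_type;
} image_format;

/* Returns 1 if `wanted` appears among the first `count` entries of
 * `formats`, else 0. A NULL or empty list never matches. */
static int format_is_listed(const image_format *formats, size_t count,
                            image_format wanted)
{
    size_t i;
    if (formats == NULL)
        return 0;
    for (i = 0; i &lt; count; ++i) {
        if (formats[i].image_channel_order == wanted.image_channel_order &amp;&amp;
            formats[i].image_channel_data_type ==
                wanted.image_channel_data_type)
            return 1;
    }
    return 0;
}</code></pre>
</div>
</div>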
<div class="paragraph">
<p><strong>clGetSupportedImageFormats</strong> returns CL_SUCCESS if the function is executed
successfully.
Otherwise, it returns one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_CONTEXT if <em>context</em> is not a valid context.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>flags</em> or <em>image_type</em> are not valid, or if
<em>num_entries</em> is 0 and <em>image_formats</em> is not <code>NULL</code>.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>If CL_DEVICE_IMAGE_SUPPORT specified in the <a href="#device-queries-table">Device
Queries</a> table is CL_TRUE, the values assigned to
CL_DEVICE_MAX_READ_IMAGE_ARGS, CL_DEVICE_MAX_WRITE_IMAGE_ARGS,
CL_DEVICE_IMAGE2D_MAX_WIDTH, CL_DEVICE_IMAGE2D_MAX_HEIGHT,
CL_DEVICE_IMAGE3D_MAX_WIDTH, CL_DEVICE_IMAGE3D_MAX_HEIGHT,
CL_DEVICE_IMAGE3D_MAX_DEPTH and CL_DEVICE_MAX_SAMPLERS by the implementation
must be greater than or equal to the minimum values specified in the
<a href="#device-queries-table">Device Queries</a> table.</p>
</div>
<div class="sect4">
<h5 id="_minimum_list_of_supported_image_formats">Minimum List of Supported Image Formats</h5>
<div class="paragraph">
<p>For 1D, 1D image from buffer, 2D, 3D image objects, 1D and 2D image array
objects, the mandated minimum list of image formats that can be read from
and written to by different kernel instances when correctly ordered by event
dependencies and that must be supported by all devices that support images
is described in the <a href="#min-supported-cross-kernel-table">Supported Formats -
Kernel Read Or Write</a> table.</p>
</div>
<table id="min-supported-cross-kernel-table" class="tableblock frame-all grid-all spread">
<caption class="title">Table 16. Min. list of supported image formats kernel read or write</caption>
<colgroup>
<col style="width: 34%;">
<col style="width: 33%;">
<col style="width: 33%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top">num_channels</th>
<th class="tableblock halign-left valign-top">channel_order</th>
<th class="tableblock halign-left valign-top">channel_data_type</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">1</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_R</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_UNORM_INT8<br>
CL_UNORM_INT16<br>
CL_SNORM_INT8<br>
CL_SNORM_INT16<br>
CL_SIGNED_INT8<br>
CL_SIGNED_INT16<br>
CL_SIGNED_INT32<br>
CL_UNSIGNED_INT8<br>
CL_UNSIGNED_INT16<br>
CL_UNSIGNED_INT32<br>
CL_HALF_FLOAT<br>
CL_FLOAT</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">1</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_DEPTH<sup>9</sup></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_UNORM_INT16<br>
CL_FLOAT</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">2</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_RG</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_UNORM_INT8<br>
CL_UNORM_INT16<br>
CL_SNORM_INT8<br>
CL_SNORM_INT16<br>
CL_SIGNED_INT8<br>
CL_SIGNED_INT16<br>
CL_SIGNED_INT32<br>
CL_UNSIGNED_INT8<br>
CL_UNSIGNED_INT16<br>
CL_UNSIGNED_INT32<br>
CL_HALF_FLOAT<br>
CL_FLOAT</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">4</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_RGBA</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_UNORM_INT8<br>
CL_UNORM_INT16<br>
CL_SNORM_INT8<br>
CL_SNORM_INT16<br>
CL_SIGNED_INT8<br>
CL_SIGNED_INT16<br>
CL_SIGNED_INT32<br>
CL_UNSIGNED_INT8<br>
CL_UNSIGNED_INT16<br>
CL_UNSIGNED_INT32<br>
CL_HALF_FLOAT<br>
CL_FLOAT</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">4</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_BGRA</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_UNORM_INT8</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">4</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_sRGBA</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_UNORM_INT8<sup>10</sup></p></td>
</tr>
</tbody>
</table>
<div class="dlist">
<dl>
<dt class="hdlist1">9</dt>
<dd>
<p>CL_DEPTH channel order is supported only for 2D image and 2D image array
objects.</p>
</dd>
<dt class="hdlist1">10</dt>
<dd>
<p>sRGB channel order support is not required for 1D image buffers.
Writes to images with sRGB channel orders require device support of the
<strong>cl_khr_srgb_image_writes</strong> extension.</p>
</dd>
</dl>
</div>
<div class="paragraph">
<p>For 1D, 1D image from buffer, 2D, 3D image objects, 1D and 2D image array
objects, the mandated minimum list of image formats that can be read from
and written to by the same kernel instance and that must be supported by all
devices that support images is described in the
<a href="#min-supported-same-kernel-table">Supported Formats - Kernel Read And
Write</a> table.</p>
</div>
<table id="min-supported-same-kernel-table" class="tableblock frame-all grid-all spread">
<caption class="title">Table 17. Min. list of supported image formats kernel read and write</caption>
<colgroup>
<col style="width: 34%;">
<col style="width: 33%;">
<col style="width: 33%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top">num_channels</th>
<th class="tableblock halign-left valign-top">channel_order</th>
<th class="tableblock halign-left valign-top">channel_data_type</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">1</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_R</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_UNORM_INT8<br>
CL_SIGNED_INT8<br>
CL_SIGNED_INT16<br>
CL_SIGNED_INT32<br>
CL_UNSIGNED_INT8<br>
CL_UNSIGNED_INT16<br>
CL_UNSIGNED_INT32<br>
CL_HALF_FLOAT<br>
CL_FLOAT</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock">4</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_RGBA</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_UNORM_INT8<br>
CL_SIGNED_INT8<br>
CL_SIGNED_INT16<br>
CL_SIGNED_INT32<br>
CL_UNSIGNED_INT8<br>
CL_UNSIGNED_INT16<br>
CL_UNSIGNED_INT32<br>
CL_HALF_FLOAT<br>
CL_FLOAT</p></td>
</tr>
</tbody>
</table>
</div>
<div class="sect4">
<h5 id="image-format-mapping">Image format mapping to OpenCL kernel language image access qualifiers</h5>
<div class="paragraph">
<p>Image arguments to kernels may have the <code>read_only</code>, <code>write_only</code> or
<code>read_write</code> qualifier.
Not all image formats supported by the device and platform are valid to be
passed to all of these access qualifiers.
For each access qualifier, only images whose format is in the list of
formats returned by <strong>clGetSupportedImageFormats</strong> with the given flag
arguments in the <a href="#image-format-mapping-table">Image Format Mapping</a> table
are permitted.
It is not valid to pass an image supporting writing as both a <code>read_only</code>
image and a <code>write_only</code> image parameter, or to a <code>read_write</code> image
parameter and any other image parameter.</p>
</div>
<table id="image-format-mapping-table" class="tableblock frame-all grid-all spread">
<caption class="title">Table 18. Mapping from format flags passed to <strong>clGetSupportedImageFormats</strong> to OpenCL kernel language image access qualifiers</caption>
<colgroup>
<col style="width: 50%;">
<col style="width: 50%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top">Access Qualifier</th>
<th class="tableblock halign-left valign-top">cl_mem_flags</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><code>read_only</code></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_MEM_READ_ONLY,<br>
CL_MEM_READ_WRITE,<br>
CL_MEM_KERNEL_READ_AND_WRITE</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><code>write_only</code></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_MEM_WRITE_ONLY,<br>
CL_MEM_READ_WRITE,<br>
CL_MEM_KERNEL_READ_AND_WRITE</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><code>read_write</code></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">CL_MEM_KERNEL_READ_AND_WRITE</p></td>
</tr>
</tbody>
</table>
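<div class="paragraph">
<p>The mapping table above can be expressed as a small predicate.
The sketch below is illustrative only: the enum names are hypothetical
stand-ins for the kernel access qualifiers and for the <code>cl_mem_flags</code>
values passed to <strong>clGetSupportedImageFormats</strong>.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">/* Hypothetical stand-ins for the kernel access qualifiers. */
typedef enum {
    QUAL_READ_ONLY, QUAL_WRITE_ONLY, QUAL_READ_WRITE
} access_qualifier;

/* Hypothetical stand-ins for the flags used in the format query. */
typedef enum {
    FLAG_READ_ONLY, FLAG_WRITE_ONLY, FLAG_READ_WRITE,
    FLAG_KERNEL_READ_AND_WRITE
} query_flag;

/* Returns 1 if a format list queried with `flag` applies to image
 * arguments declared with `qual`, following the table rows. */
static int query_flag_covers(access_qualifier qual, query_flag flag)
{
    switch (qual) {
    case QUAL_READ_ONLY:
        return flag == FLAG_READ_ONLY || flag == FLAG_READ_WRITE ||
               flag == FLAG_KERNEL_READ_AND_WRITE;
    case QUAL_WRITE_ONLY:
        return flag == FLAG_WRITE_ONLY || flag == FLAG_READ_WRITE ||
               flag == FLAG_KERNEL_READ_AND_WRITE;
    case QUAL_READ_WRITE:
        return flag == FLAG_KERNEL_READ_AND_WRITE;
    }
    return 0;
}</code></pre>
</div>
</div>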
</div>
</div>
<div class="sect3">
<h4 id="_reading_writing_and_copying_image_objects">5.3.3. Reading, Writing and Copying Image Objects</h4>
<div class="paragraph">
<p>The following functions enqueue commands to read from an image or image
array object to host memory or write to an image or image array object from
host memory.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clEnqueueReadImage(cl_command_queue command_queue,
cl_mem image,
cl_bool blocking_read,
<span class="directive">const</span> size_t *origin,
<span class="directive">const</span> size_t *region,
size_t row_pitch,
size_t slice_pitch,
<span class="directive">void</span> *ptr,
cl_uint num_events_in_wait_list,
<span class="directive">const</span> cl_event *event_wait_list,
cl_event *event)</code></pre>
</div>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clEnqueueWriteImage(cl_command_queue command_queue,
cl_mem image,
cl_bool blocking_write,
<span class="directive">const</span> size_t *origin,
<span class="directive">const</span> size_t *region,
size_t input_row_pitch,
size_t input_slice_pitch,
<span class="directive">const</span> <span class="directive">void</span> *ptr,
cl_uint num_events_in_wait_list,
<span class="directive">const</span> cl_event *event_wait_list,
cl_event *event)</code></pre>
</div>
</div>
<div class="paragraph">
<p><em>command_queue</em> refers to the host command-queue in which the read / write
command will be queued.
<em>command_queue</em> and <em>image</em> must be created with the same OpenCL context.</p>
</div>
<div class="paragraph">
<p><em>image</em> refers to a valid image or image array object.</p>
</div>
<div class="paragraph">
<p><em>blocking_read</em> and <em>blocking_write</em> indicate if the read and write
operations are <em>blocking</em> or <em>non-blocking</em>.</p>
</div>
<div class="paragraph">
<p>If <em>blocking_read</em> is CL_TRUE i.e. the read command is blocking,
<strong>clEnqueueReadImage</strong> does not return until the buffer data has been read and
copied into memory pointed to by <em>ptr</em>.</p>
</div>
<div class="paragraph">
<p>If <em>blocking_read</em> is CL_FALSE i.e. the read command is non-blocking,
<strong>clEnqueueReadImage</strong> queues a non-blocking read command and returns.
The contents of the buffer that <em>ptr</em> points to cannot be used until the
read command has completed.
The <em>event</em> argument returns an event object which can be used to query the
execution status of the read command.
When the read command has completed, the contents of the buffer that <em>ptr</em>
points to can be used by the application.</p>
</div>
<div class="paragraph">
<p>If <em>blocking_write</em> is CL_TRUE, the write command is blocking and does not
return until the command is complete, including transfer of the data.
The memory pointed to by <em>ptr</em> can be reused by the application after the
<strong>clEnqueueWriteImage</strong> call returns.</p>
</div>
<div class="paragraph">
<p>If <em>blocking_write</em> is CL_FALSE, the OpenCL implementation will use <em>ptr</em> to
perform a non-blocking write.
As the write is non-blocking the implementation can return immediately.
The memory pointed to by <em>ptr</em> cannot be reused by the application
immediately after the call returns.
The <em>event</em> argument returns an event object which can be used to query the
execution status of the write command.
When the write command has completed, the memory pointed to by <em>ptr</em> can
then be reused by the application.</p>
</div>
<div class="paragraph">
<p><em>origin</em> defines the (<em>x</em>, <em>y</em>, <em>z</em>) offset in pixels in the 1D, 2D or 3D
image, the (<em>x</em>, <em>y</em>) offset and the image index in the 2D image array or
the (<em>x</em>) offset and the image index in the 1D image array.
If <em>image</em> is a 2D image object, <em>origin</em>[2] must be 0.
If <em>image</em> is a 1D image or 1D image buffer object, <em>origin</em>[1] and
<em>origin</em>[2] must be 0.
If <em>image</em> is a 1D image array object, <em>origin</em>[2] must be 0.
If <em>image</em> is a 1D image array object, <em>origin</em>[1] describes the image index
in the 1D image array.
If <em>image</em> is a 2D image array object, <em>origin</em>[2] describes the image index
in the 2D image array.</p>
</div>
<div class="paragraph">
<p><em>region</em> defines the (<em>width</em>, <em>height</em>, <em>depth</em>) in pixels of the 1D, 2D or
3D rectangle, the (<em>width</em>, <em>height</em>) in pixels of the 2D rectangle and the
number of images of a 2D image array or the (<em>width</em>) in pixels of the 1D
rectangle and the number of images of a 1D image array.
If <em>image</em> is a 2D image object, <em>region</em>[2] must be 1.
If <em>image</em> is a 1D image or 1D image buffer object, <em>region</em>[1] and
<em>region</em>[2] must be 1.
If <em>image</em> is a 1D image array object, <em>region</em>[2] must be 1.
The values in <em>region</em> cannot be 0.</p>
</div>
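<div class="paragraph">
<p>The per-type constraints on <em>origin</em> and <em>region</em> can be collected into one
check.
The following is a minimal sketch with hypothetical names; the
<code>image_kind</code> enum is a stand-in for the image object type, and the function
only covers the rules stated above, not the separate bounds check against
the image dimensions.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">#include &lt;stddef.h&gt;

/* Hypothetical stand-in for the image object type. */
typedef enum {
    IMG_1D, IMG_1D_BUFFER, IMG_1D_ARRAY, IMG_2D, IMG_2D_ARRAY, IMG_3D
} image_kind;

/* Returns 1 if origin and region follow the per-type rules, else 0.
 * Does not check that the region lies inside the image. */
static int origin_region_follow_rules(image_kind kind,
                                      const size_t origin[3],
                                      const size_t region[3])
{
    /* No component of region may be 0. */
    if (region[0] == 0 || region[1] == 0 || region[2] == 0)
        return 0;

    switch (kind) {
    case IMG_1D:
    case IMG_1D_BUFFER:
        return origin[1] == 0 &amp;&amp; origin[2] == 0 &amp;&amp;
               region[1] == 1 &amp;&amp; region[2] == 1;
    case IMG_1D_ARRAY: /* origin[1] is the image index */
    case IMG_2D:
        return origin[2] == 0 &amp;&amp; region[2] == 1;
    case IMG_2D_ARRAY: /* origin[2] is the image index */
    case IMG_3D:
        return 1;
    }
    return 0;
}</code></pre>
</div>
</div>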
<div class="paragraph">
<p><em>row_pitch</em> in <strong>clEnqueueReadImage</strong> and <em>input_row_pitch</em> in
<strong>clEnqueueWriteImage</strong> is the length of each row in bytes.
This value must be greater than or equal to the element size in bytes
× <em>width</em>.
If <em>row_pitch</em> (or <em>input_row_pitch</em>) is set to 0, the appropriate row pitch
is calculated based on the size of each element in bytes multiplied by
<em>width</em>.</p>
</div>
<div class="paragraph">
<p><em>slice_pitch</em> in <strong>clEnqueueReadImage</strong> and <em>input_slice_pitch</em> in
<strong>clEnqueueWriteImage</strong> is the size in bytes of the 2D slice of the 3D region
of a 3D image or each image of a 1D or 2D image array being read or written
respectively.
This must be 0 if <em>image</em> is a 1D or 2D image.
Otherwise this value must be greater than or equal to <em>row_pitch</em> ×
<em>height</em>.
If <em>slice_pitch</em> (or <em>input_slice_pitch</em>) is set to 0, the appropriate slice
pitch is calculated based on the <em>row_pitch</em> × <em>height</em>.</p>
</div>
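<div class="paragraph">
<p>When sizing the host buffer passed as <em>ptr</em>, the resolved pitches give a
simple conservative bound.
The sketch below is an illustration under stated assumptions, not an API:
the function name is hypothetical, it treats the region as
<em>width</em> × <em>height</em> × <em>depth</em> elements (2D image, 3D image or 2D image
array, so 1D image arrays are not covered), and it returns the resolved
slice pitch multiplied by <em>depth</em>, which is always large enough because each
row fits in a row pitch and each slice fits in a slice pitch.
It returns 0 if a non-zero pitch is smaller than the minimum described
above.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">#include &lt;stddef.h&gt;

/* Hypothetical helper: conservative host buffer size in bytes for a
 * width x height x depth region, or 0 if a pitch is too small.
 * A pitch of 0 is resolved as described in the text. */
static size_t host_buffer_bytes(size_t width, size_t height, size_t depth,
                                size_t element_size,
                                size_t row_pitch, size_t slice_pitch)
{
    size_t min_row = width * element_size;
    size_t rp = row_pitch != 0 ? row_pitch : min_row;
    size_t sp;

    if (rp &lt; min_row)
        return 0; /* row pitch shorter than one row */

    sp = slice_pitch != 0 ? slice_pitch : rp * height;
    if (sp &lt; rp * height)
        return 0; /* slice pitch shorter than one slice */

    return sp * depth;
}</code></pre>
</div>
</div>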
<div class="paragraph">
<p><em>ptr</em> is the pointer to a buffer in host memory where image data is to be
read from or to be written to.
The alignment requirements for ptr are specified in
<a href="#alignment-app-data-types">Alignment of Application Data Types</a>.</p>
</div>
<div class="paragraph">
<p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need to
complete before this particular command can be executed.
If <em>event_wait_list</em> is <code>NULL</code>, then this particular command does not wait
on any event to complete.
If <em>event_wait_list</em> is <code>NULL</code>, <em>num_events_in_wait_list</em> must be 0.
If <em>event_wait_list</em> is not <code>NULL</code>, the list of events pointed to by
<em>event_wait_list</em> must be valid and <em>num_events_in_wait_list</em> must be
greater than 0.
The events specified in <em>event_wait_list</em> act as synchronization points.
The context associated with events in <em>event_wait_list</em> and <em>command_queue</em>
must be the same.
The memory associated with <em>event_wait_list</em> can be reused or freed after
the function returns.</p>
</div>
<div class="paragraph">
<p><em>event</em> returns an event object that identifies this particular read / write
command and can be used to query or queue a wait for this particular command
to complete.
<em>event</em> can be <code>NULL</code> in which case it will not be possible for the
application to query the status of this command or queue a wait for this
command to complete.
If the <em>event_wait_list</em> and the <em>event</em> arguments are not <code>NULL</code>, the
<em>event</em> argument should not refer to an element of the <em>event_wait_list</em>
array.</p>
</div>
<div class="paragraph">
<p><strong>clEnqueueReadImage</strong> and <strong>clEnqueueWriteImage</strong> return CL_SUCCESS if the
function is executed successfully.
Otherwise, they return one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not a valid host
command-queue.</p>
</li>
<li>
<p>CL_INVALID_CONTEXT if the context associated with <em>command_queue</em> and
<em>image</em> are not the same or if the context associated with
<em>command_queue</em> and events in <em>event_wait_list</em> are not the same.</p>
</li>
<li>
<p>CL_INVALID_MEM_OBJECT if <em>image</em> is not a valid image object.</p>
</li>
<li>
<p>CL_INVALID_VALUE if the region being read or written specified by
<em>origin</em> and <em>region</em> is out of bounds or if <em>ptr</em> is a <code>NULL</code> value.</p>
</li>
<li>
<p>CL_INVALID_VALUE if values in <em>origin</em> and <em>region</em> do not follow rules
described in the argument description for <em>origin</em> and <em>region</em>.</p>
</li>
<li>
<p>CL_INVALID_EVENT_WAIT_LIST if <em>event_wait_list</em> is <code>NULL</code> and
<em>num_events_in_wait_list</em> &gt; 0, or <em>event_wait_list</em> is not <code>NULL</code> and
<em>num_events_in_wait_list</em> is 0, or if event objects in <em>event_wait_list</em>
are not valid events.</p>
</li>
<li>
<p>CL_INVALID_IMAGE_SIZE if image dimensions (image width, height,
specified or computed row and/or slice pitch) for <em>image</em> are not
supported by the device associated with <em>command_queue</em>.</p>
</li>
<li>
<p>CL_IMAGE_FORMAT_NOT_SUPPORTED if the image format (image channel order and
data type) for <em>image</em> is not supported by the device associated with
<em>command_queue</em>.</p>
</li>
<li>
<p>CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
memory for data store associated with <em>image</em>.</p>
</li>
<li>
<p>CL_INVALID_OPERATION if the device associated with <em>command_queue</em> does
not support images (i.e. CL_DEVICE_IMAGE_SUPPORT specified in the
<a href="#device-queries-table">Device Queries</a> table is CL_FALSE).</p>
</li>
<li>
<p>CL_INVALID_OPERATION if <strong>clEnqueueReadImage</strong> is called on <em>image</em> which
has been created with CL_MEM_HOST_WRITE_ONLY or CL_MEM_HOST_NO_ACCESS.</p>
</li>
<li>
<p>CL_INVALID_OPERATION if <strong>clEnqueueWriteImage</strong> is called on <em>image</em> which
has been created with CL_MEM_HOST_READ_ONLY or CL_MEM_HOST_NO_ACCESS.</p>
</li>
<li>
<p>CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST if the read and write
operations are blocking and the execution status of any of the events in
<em>event_wait_list</em> is a negative integer value.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
<div class="admonitionblock note">
<table>
<tr>
<td class="icon">
<i class="fa icon-note" title="Note"></i>
</td>
<td class="content">
<div class="paragraph">
<p>Calling <strong>clEnqueueReadImage</strong> to read a region of the <em>image</em> with the <em>ptr</em>
argument value set to <em>host_ptr</em> + (<em>origin</em>[2] × <em>image slice pitch</em>
+ <em>origin</em>[1] × <em>image row pitch</em> + <em>origin</em>[0] × <em>bytes
per pixel</em>), where <em>host_ptr</em> is a pointer to the memory region specified
when the <em>image</em> being read is created with CL_MEM_USE_HOST_PTR, must meet
the following requirements in order to avoid undefined behavior:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>All commands that use this image object have finished execution before
the read command begins execution.</p>
</li>
<li>
<p>The <em>row_pitch</em> and <em>slice_pitch</em> argument values in
<strong>clEnqueueReadImage</strong> must be set to the image row pitch and slice pitch.</p>
</li>
<li>
<p>The image object is not mapped.</p>
</li>
<li>
<p>The image object is not used by any command-queue until the read command
has finished execution.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>Calling <strong>clEnqueueWriteImage</strong> to update the latest bits in a region of the
<em>image</em> with the <em>ptr</em> argument value set to <em>host_ptr</em> + (<em>origin</em>[2]
× <em>image slice pitch</em> + <em>origin</em>[1] × <em>image row pitch</em> +
<em>origin</em>[0] × <em>bytes per pixel</em>), where <em>host_ptr</em> is a pointer to the
memory region specified when the <em>image</em> being written is created with
CL_MEM_USE_HOST_PTR, must meet the following requirements in order to avoid
undefined behavior:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>The host memory region being written contains the latest bits when the
enqueued write command begins execution.</p>
</li>
<li>
<p>The <em>input_row_pitch</em> and <em>input_slice_pitch</em> argument values in
<strong>clEnqueueWriteImage</strong> must be set to the image row pitch and slice
pitch.</p>
</li>
<li>
<p>The image object is not mapped.</p>
</li>
<li>
<p>The image object is not used by any command-queue until the write
command has finished execution.</p>
</li>
</ul>
</div>
</td>
</tr>
</table>
</div>
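<div class="paragraph">
<p>The pointer arithmetic in the note above can be sketched as follows.
This is only an illustration: <code>image_host_offset</code> is a hypothetical helper, not an OpenCL API function, and the pitch and pixel-size arguments are assumed to be those of the image being read or written.</p>
</div>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an OpenCL API function: byte offset of the pixel
 * at (origin[0], origin[1], origin[2]) from the start of host_ptr. */
size_t image_host_offset(const size_t origin[3],
                         size_t image_row_pitch,
                         size_t image_slice_pitch,
                         size_t bytes_per_pixel)
{
    assert(origin != NULL);
    return origin[2] * image_slice_pitch
         + origin[1] * image_row_pitch
         + origin[0] * bytes_per_pixel;
}
```

<div class="paragraph">
<p>An application would then pass <code>(char *)host_ptr + image_host_offset(...)</code> as the <em>ptr</em> argument.</p>
</div>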
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clEnqueueCopyImage(cl_command_queue command_queue,
cl_mem src_image,
cl_mem dst_image,
<span class="directive">const</span> size_t *src_origin,
<span class="directive">const</span> size_t *dst_origin,
<span class="directive">const</span> size_t *region,
cl_uint num_events_in_wait_list,
<span class="directive">const</span> cl_event *event_wait_list,
cl_event *event)</code></pre>
</div>
</div>
<div class="paragraph">
<p>enqueues a command to copy image objects.
<em>src_image</em> and <em>dst_image</em> can be 1D, 2D, or 3D image objects, or 1D or 2D
image array objects.
It is possible to copy subregions between any combination of source and
destination types, provided that the dimensions of the subregions are the
same; e.g., one can copy a rectangular region from a 2D image to a slice of a
3D image.</p>
</div>
<div class="paragraph">
<p><em>command_queue</em> refers to the host command-queue in which the copy command
will be queued.
The OpenCL context associated with <em>command_queue</em>, <em>src_image</em> and
<em>dst_image</em> must be the same.</p>
</div>
<div class="paragraph">
<p><em>src_origin</em> defines the (<em>x</em>, <em>y</em>, <em>z</em>) offset in pixels in the 1D, 2D or
3D image, the (<em>x</em>, <em>y</em>) offset and the image index in the 2D image array or
the (<em>x</em>) offset and the image index in the 1D image array.
If <em>src_image</em> is a 2D image object, <em>src_origin</em>[2] must be 0.
If <em>src_image</em> is a 1D image object, <em>src_origin</em>[1] and <em>src_origin</em>[2]
must be 0.
If <em>src_image</em> is a 1D image array object, <em>src_origin</em>[2] must be 0.
If <em>src_image</em> is a 1D image array object, <em>src_origin</em>[1] describes the
image index in the 1D image array.
If <em>src_image</em> is a 2D image array object, <em>src_origin</em>[2] describes the
image index in the 2D image array.</p>
</div>
<div class="paragraph">
<p><em>dst_origin</em> defines the (<em>x</em>, <em>y</em>, <em>z</em>) offset in pixels in the 1D, 2D or
3D image, the (<em>x</em>, <em>y</em>) offset and the image index in the 2D image array or
the (<em>x</em>) offset and the image index in the 1D image array.
If <em>dst_image</em> is a 2D image object, <em>dst_origin</em>[2] must be 0.
If <em>dst_image</em> is a 1D image or 1D image buffer object, <em>dst_origin</em>[1] and
<em>dst_origin</em>[2] must be 0.
If <em>dst_image</em> is a 1D image array object, <em>dst_origin</em>[2] must be 0.
If <em>dst_image</em> is a 1D image array object, <em>dst_origin</em>[1] describes the
image index in the 1D image array.
If <em>dst_image</em> is a 2D image array object, <em>dst_origin</em>[2] describes the
image index in the 2D image array.</p>
</div>
<div class="paragraph">
<p><em>region</em> defines the (<em>width</em>, <em>height</em>, <em>depth</em>) in pixels of the 1D, 2D or
3D rectangle, the (<em>width</em>, <em>height</em>) in pixels of the 2D rectangle and the
number of images of a 2D image array or the (<em>width</em>) in pixels of the 1D
rectangle and the number of images of a 1D image array.
If <em>src_image</em> or <em>dst_image</em> is a 2D image object, <em>region</em>[2] must be 1.
If <em>src_image</em> or <em>dst_image</em> is a 1D image or 1D image buffer object,
<em>region</em>[1] and <em>region</em>[2] must be 1.
If <em>src_image</em> or <em>dst_image</em> is a 1D image array object, <em>region</em>[2] must
be 1.
The values in <em>region</em> cannot be 0.</p>
</div>
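<div class="paragraph">
<p>The per-type rules for origins and <em>region</em> described above can be sketched as a host-side pre-check.
The <code>image_kind</code> enum and <code>origin_region_valid</code> function are hypothetical names introduced for illustration; a real program would branch on the image type reported by the OpenCL runtime.
For a copy, the check would be applied once to the source origin with <em>src_image</em> and once to the destination origin with <em>dst_image</em>.</p>
</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the image types named in the text. */
typedef enum {
    IMG_1D, IMG_1D_BUFFER, IMG_1D_ARRAY, IMG_2D, IMG_2D_ARRAY, IMG_3D
} image_kind;

/* Returns true if origin and region obey the rules for the given image type. */
bool origin_region_valid(image_kind kind,
                         const size_t origin[3],
                         const size_t region[3])
{
    assert(origin != NULL && region != NULL);

    /* The values in region cannot be 0. */
    if (region[0] == 0 || region[1] == 0 || region[2] == 0)
        return false;

    switch (kind) {
    case IMG_1D:
    case IMG_1D_BUFFER:
        return origin[1] == 0 && origin[2] == 0 &&
               region[1] == 1 && region[2] == 1;
    case IMG_1D_ARRAY:  /* origin[1] is the image index */
    case IMG_2D:
        return origin[2] == 0 && region[2] == 1;
    case IMG_2D_ARRAY:  /* origin[2] is the image index */
    case IMG_3D:
        return true;
    }
    return false;
}
```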
<div class="paragraph">
<p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need to
complete before this particular command can be executed.
If <em>event_wait_list</em> is <code>NULL</code>, then this particular command does not wait
on any event to complete.
If <em>event_wait_list</em> is <code>NULL</code>, <em>num_events_in_wait_list</em> must be 0.
If <em>event_wait_list</em> is not <code>NULL</code>, the list of events pointed to by
<em>event_wait_list</em> must be valid and <em>num_events_in_wait_list</em> must be
greater than 0.
The events specified in <em>event_wait_list</em> act as synchronization points.
The context associated with events in <em>event_wait_list</em> and <em>command_queue</em>
must be the same.
The memory associated with <em>event_wait_list</em> can be reused or freed after
the function returns.</p>
</div>
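<div class="paragraph">
<p>The consistency rule between <em>event_wait_list</em> and <em>num_events_in_wait_list</em> amounts to requiring that the list is <code>NULL</code> exactly when the count is 0.
A minimal sketch, using the hypothetical helper <code>wait_list_args_consistent</code> and an untyped pointer in place of <code>const cl_event *</code> so that it compiles without the OpenCL headers:</p>
</div>

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pre-check: the wait list must be NULL if and only if the
 * event count is 0. Validity of the individual events is not checked here. */
bool wait_list_args_consistent(unsigned int num_events_in_wait_list,
                               const void *event_wait_list)
{
    return (event_wait_list == NULL) == (num_events_in_wait_list == 0);
}
```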
<div class="paragraph">
<p><em>event</em> returns an event object that identifies this particular copy command
and can be used to query or queue a wait for this particular command to
complete.
<em>event</em> can be <code>NULL</code> in which case it will not be possible for the
application to query the status of this command or queue a wait for this
command to complete.
<strong>clEnqueueBarrierWithWaitList</strong> can be used instead.
If the <em>event_wait_list</em> and the <em>event</em> arguments are not <code>NULL</code>, the
<em>event</em> argument should not refer to an element of the <em>event_wait_list</em>
array.</p>
</div>
<div class="paragraph">
<p>It is currently a requirement that the <em>src_image</em> and <em>dst_image</em> image
memory objects for <strong>clEnqueueCopyImage</strong> must have the exact same image
format (i.e. the cl_image_format descriptor specified when <em>src_image</em> and
<em>dst_image</em> are created must match).</p>
</div>
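<div class="paragraph">
<p>The format requirement above is a field-by-field comparison of the two image format descriptors.
The sketch below uses a stand-in struct, <code>image_format</code>, that mirrors the two fields of cl_image_format so that it compiles without the OpenCL headers; <code>image_formats_match</code> is a hypothetical helper name.</p>
</div>

```c
#include <stdbool.h>

/* Stand-in mirroring the two fields of cl_image_format; a real program
 * would use the type from the OpenCL headers instead. */
typedef struct {
    unsigned int image_channel_order;
    unsigned int image_channel_data_type;
} image_format;

/* Returns true if both descriptors name the same channel order and type. */
bool image_formats_match(const image_format *a, const image_format *b)
{
    return a->image_channel_order == b->image_channel_order
        && a->image_channel_data_type == b->image_channel_data_type;
}
```

<div class="paragraph">
<p>In an application, the descriptors could be obtained for each image with <strong>clGetImageInfo</strong> and CL_IMAGE_FORMAT before enqueuing the copy.</p>
</div>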
<div class="paragraph">
<p><strong>clEnqueueCopyImage</strong> returns CL_SUCCESS if the function is executed
successfully.
Otherwise, it returns one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not a valid host
command-queue.</p>
</li>
<li>
<p>CL_INVALID_CONTEXT if the context associated with <em>command_queue</em>,
<em>src_image</em> and <em>dst_image</em> are not the same or if the context
associated with <em>command_queue</em> and events in <em>event_wait_list</em> are not
the same.</p>
</li>
<li>
<p>CL_INVALID_MEM_OBJECT if <em>src_image</em> and <em>dst_image</em> are not valid image
objects.</p>
</li>
<li>
<p>CL_IMAGE_FORMAT_MISMATCH if <em>src_image</em> and <em>dst_image</em> do not use the
same image format.</p>
</li>
<li>
<p>CL_INVALID_VALUE if the 2D or 3D rectangular region specified by
<em>src_origin</em> and <em>src_origin</em> + <em>region</em> refers to a region outside
<em>src_image</em>, or if the 2D or 3D rectangular region specified by
<em>dst_origin</em> and <em>dst_origin</em> + <em>region</em> refers to a region outside
<em>dst_image</em>.</p>
</li>
<li>
<p>CL_INVALID_VALUE if values in <em>src_origin</em>, <em>dst_origin</em> and <em>region</em> do
not follow rules described in the argument description for <em>src_origin</em>,
<em>dst_origin</em> and <em>region</em>.</p>
</li>
<li>
<p>CL_INVALID_EVENT_WAIT_LIST if <em>event_wait_list</em> is <code>NULL</code> and
<em>num_events_in_wait_list</em> &gt; 0, or <em>event_wait_list</em> is not <code>NULL</code> and
<em>num_events_in_wait_list</em> is 0, or if event objects in <em>event_wait_list</em>
are not valid events.</p>
</li>
<li>
<p>CL_INVALID_IMAGE_SIZE if image dimensions (image width, height,
specified or computed row and/or slice pitch) for <em>src_image</em> or
<em>dst_image</em> are not supported by the device associated with <em>command_queue</em>.</p>
</li>
<li>
<p>CL_IMAGE_FORMAT_NOT_SUPPORTED if the image format (image channel order and
data type) for <em>src_image</em> or <em>dst_image</em> is not supported by the device
associated with <em>command_queue</em>.</p>
</li>
<li>
<p>CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
memory for data store associated with <em>src_image</em> or <em>dst_image</em>.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
<li>
<p>CL_INVALID_OPERATION if the device associated with <em>command_queue</em> does
not support images (i.e. CL_DEVICE_IMAGE_SUPPORT specified in the
<a href="#device-queries-table">Device Queries</a> table is CL_FALSE).</p>
</li>
<li>
<p>CL_MEM_COPY_OVERLAP if <em>src_image</em> and <em>dst_image</em> are the same image
object and the source and destination regions overlap.</p>
</li>
</ul>
</div>
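<div class="paragraph">
<p>The CL_MEM_COPY_OVERLAP condition can be anticipated on the host when <em>src_image</em> and <em>dst_image</em> are the same object.
The sketch below is one plausible way to test it, not a description of how any implementation performs the check; <code>copy_regions_overlap</code> is a hypothetical helper.
Two equally sized boxes overlap only if their extents intersect on every axis.</p>
</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pre-check: true if the source and destination boxes of size
 * region, placed at src_origin and dst_origin, share at least one pixel. */
bool copy_regions_overlap(const size_t src_origin[3],
                          const size_t dst_origin[3],
                          const size_t region[3])
{
    assert(src_origin != NULL && dst_origin != NULL && region != NULL);
    for (int i = 0; i < 3; ++i) {
        size_t lo = src_origin[i] < dst_origin[i] ? src_origin[i] : dst_origin[i];
        size_t hi = src_origin[i] < dst_origin[i] ? dst_origin[i] : src_origin[i];
        /* Equal-length intervals are disjoint on this axis when the
         * distance between their starts is at least the length. */
        if (hi - lo >= region[i])
            return false;
    }
    return true;
}
```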
</div>
<div class="sect3">
<h4 id="_filling_image_objects">5.3.4. Filling Image Objects</h4>
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clEnqueueFillImage(cl_command_queue command_queue,
cl_mem image,
<span class="directive">const</span> <span class="directive">void</span> *fill_color,
<span class="directive">const</span> size_t *origin,
<span class="directive">const</span> size_t *region,
cl_uint num_events_in_wait_list,
<span class="directive">const</span> cl_event *event_wait_list,
cl_event *event)</code></pre>
</div>
</div>
<div class="paragraph">
<p>enqueues a command to fill an image object with a specified color.
The usage information given by the cl_mem_flags argument value specified
when <em>image</em> is created, which indicates whether the memory object can be
read or written by a kernel and/or the host, is ignored by
<strong>clEnqueueFillImage</strong>.</p>
</div>
<div class="paragraph">
<p><em>command_queue</em> refers to the host command-queue in which the fill command
will be queued.
The OpenCL context associated with <em>command_queue</em> and <em>image</em> must be the
same.</p>
</div>
<div class="paragraph">
<p><em>image</em> is a valid image object.</p>
</div>
<div class="paragraph">
<p><em>fill_color</em> is the color used to fill the image.
The fill color is a single floating point value if the channel order is
CL_DEPTH.
Otherwise, the fill color is a four component RGBA floating-point color
value if the <em>image</em> channel data type is not an unnormalized signed or
unsigned integer type, is a four component signed integer value if the
<em>image</em> channel data type is an unnormalized signed integer type and is a
four component unsigned integer value if the <em>image</em> channel data type is an
unnormalized unsigned integer type.
The fill color will be converted to the appropriate image channel format and
order associated with <em>image</em>.</p>
</div>
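<div class="paragraph">
<p>The selection of the <em>fill_color</em> layout described above can be summarized as a small decision function.
All names below (<code>channel_class</code>, <code>fill_type</code>, <code>fill_layout</code>, <code>fill_color_layout</code>) are hypothetical and exist only for this sketch; the channel data type is reduced to the three classes that the text distinguishes.</p>
</div>

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of the image channel data type. */
typedef enum {
    CH_OTHER,                /* anything that is not an unnormalized integer */
    CH_UNNORM_SIGNED_INT,    /* unnormalized signed integer type */
    CH_UNNORM_UNSIGNED_INT   /* unnormalized unsigned integer type */
} channel_class;

typedef enum { FILL_FLOAT, FILL_INT, FILL_UINT } fill_type;

typedef struct {
    fill_type type;      /* element type that fill_color points to */
    size_t components;   /* number of elements: 1 or 4 */
} fill_layout;

/* Returns the element type and count expected behind fill_color. */
fill_layout fill_color_layout(bool channel_order_is_depth, channel_class cls)
{
    fill_layout out;
    if (channel_order_is_depth) {
        /* A single floating-point value. */
        out.type = FILL_FLOAT;
        out.components = 1;
        return out;
    }
    out.components = 4;
    switch (cls) {
    case CH_UNNORM_SIGNED_INT:   out.type = FILL_INT;   break;
    case CH_UNNORM_UNSIGNED_INT: out.type = FILL_UINT;  break;
    default:                     out.type = FILL_FLOAT; break;
    }
    return out;
}
```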
<div class="paragraph">
<p><em>origin</em> defines the (<em>x</em>, <em>y</em>, <em>z</em>) offset in pixels in the 1D, 2D or 3D
image, the (<em>x</em>, <em>y</em>) offset and the image index in the 2D image array or
the (<em>x</em>) offset and the image index in the 1D image array.
If <em>image</em> is a 2D image object, <em>origin</em>[2] must be 0.
If <em>image</em> is a 1D image or 1D image buffer object, <em>origin</em>[1] and
<em>origin</em>[2] must be 0.
If <em>image</em> is a 1D image array object, <em>origin</em>[2] must be 0.
If <em>image</em> is a 1D image array object, <em>origin</em>[1] describes the image index
in the 1D image array.
If <em>image</em> is a 2D image array object, <em>origin</em>[2] describes the image index
in the 2D image array.</p>
</div>
<div class="paragraph">
<p><em>region</em> defines the (<em>width</em>, <em>height</em>, <em>depth</em>) in pixels of the 1D, 2D or
3D rectangle, the (<em>width</em>, <em>height</em>) in pixels of the 2D rectangle and the
number of images of a 2D image array or the (<em>width</em>) in pixels of the 1D
rectangle and the number of images of a 1D image array.
If <em>image</em> is a 2D image object, <em>region</em>[2] must be 1.
If <em>image</em> is a 1D image or 1D image buffer object, <em>region</em>[1] and
<em>region</em>[2] must be 1.
If <em>image</em> is a 1D image array object, <em>region</em>[2] must be 1.
The values in <em>region</em> cannot be 0.</p>
</div>
<div class="paragraph">
<p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need to
complete before this particular command can be executed.
If <em>event_wait_list</em> is <code>NULL</code>, then this particular command does not wait
on any event to complete.
If <em>event_wait_list</em> is <code>NULL</code>, <em>num_events_in_wait_list</em> must be 0.
If <em>event_wait_list</em> is not <code>NULL</code>, the list of events pointed to by
<em>event_wait_list</em> must be valid and <em>num_events_in_wait_list</em> must be
greater than 0.
The events specified in <em>event_wait_list</em> act as synchronization points.
The context associated with events in <em>event_wait_list</em> and <em>command_queue</em>
must be the same.
The memory associated with <em>event_wait_list</em> can be reused or freed after
the function returns.</p>
</div>
<div class="paragraph">
<p><em>event</em> returns an event object that identifies this particular command and
can be used to query or queue a wait for this particular command to
complete.
<em>event</em> can be <code>NULL</code> in which case it will not be possible for the
application to query the status of this command or queue a wait for this
command to complete.
<strong>clEnqueueBarrierWithWaitList</strong> can be used instead.
If the <em>event_wait_list</em> and the <em>event</em> arguments are not <code>NULL</code>, the
<em>event</em> argument should not refer to an element of the <em>event_wait_list</em>
array.</p>
</div>
<div class="paragraph">
<p><strong>clEnqueueFillImage</strong> returns CL_SUCCESS if the function is executed
successfully.
Otherwise, it returns one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not a valid host
command-queue.</p>
</li>
<li>
<p>CL_INVALID_CONTEXT if the context associated with <em>command_queue</em> and
<em>image</em> are not the same or if the context associated with
<em>command_queue</em> and events in <em>event_wait_list</em> are not the same.</p>
</li>
<li>
<p>CL_INVALID_MEM_OBJECT if <em>image</em> is not a valid image object.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>fill_color</em> is <code>NULL</code>.</p>
</li>
<li>
<p>CL_INVALID_VALUE if the region being filled as specified by <em>origin</em> and
<em>region</em> is out of bounds.</p>
</li>
<li>
<p>CL_INVALID_VALUE if values in <em>origin</em> and <em>region</em> do not follow rules
described in the argument description for <em>origin</em> and <em>region</em>.</p>
</li>
<li>
<p>CL_INVALID_EVENT_WAIT_LIST if <em>event_wait_list</em> is <code>NULL</code> and
<em>num_events_in_wait_list</em> &gt; 0, or <em>event_wait_list</em> is not <code>NULL</code> and
<em>num_events_in_wait_list</em> is 0, or if event objects in <em>event_wait_list</em>
are not valid events.</p>
</li>
<li>
<p>CL_INVALID_IMAGE_SIZE if image dimensions (image width, height,
specified or computed row and/or slice pitch) for <em>image</em> are not
supported by the device associated with <em>command_queue</em>.</p>
</li>
<li>
<p>CL_IMAGE_FORMAT_NOT_SUPPORTED if the image format (image channel order and
data type) for <em>image</em> is not supported by the device associated with
<em>command_queue</em>.</p>
</li>
<li>
<p>CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
memory for data store associated with <em>image</em>.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
</div>
<div class="sect3">
<h4 id="_copying_between_image_and_buffer_objects">5.3.5. Copying between Image and Buffer Objects</h4>
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clEnqueueCopyImageToBuffer(cl_command_queue command_queue,
cl_mem src_image,
cl_mem dst_buffer,
<span class="directive">const</span> size_t *src_origin,
<span class="directive">const</span> size_t *region,
size_t dst_offset,
cl_uint num_events_in_wait_list,
<span class="directive">const</span> cl_event *event_wait_list,
cl_event *event)</code></pre>
</div>
</div>
<div class="paragraph">
<p>enqueues a command to copy an image object to a buffer object.</p>
</div>
<div class="paragraph">
<p><em>command_queue</em> must be a valid host command-queue.
The OpenCL context associated with <em>command_queue</em>, <em>src_image</em> and
<em>dst_buffer</em> must be the same.</p>
</div>
<div class="paragraph">
<p><em>src_image</em> is a valid image object.</p>
</div>
<div class="paragraph">
<p><em>dst_buffer</em> is a valid buffer object.</p>
</div>
<div class="paragraph">
<p><em>src_origin</em> defines the (<em>x</em>, <em>y</em>, <em>z</em>) offset in pixels in the 1D, 2D or
3D image, the (<em>x</em>, <em>y</em>) offset and the image index in the 2D image array or
the (<em>x</em>) offset and the image index in the 1D image array.
If <em>src_image</em> is a 2D image object, <em>src_origin</em>[2] must be 0.
If <em>src_image</em> is a 1D image or 1D image buffer object, <em>src_origin</em>[1] and
<em>src_origin</em>[2] must be 0.
If <em>src_image</em> is a 1D image array object, <em>src_origin</em>[2] must be 0.
If <em>src_image</em> is a 1D image array object, <em>src_origin</em>[1] describes the
image index in the 1D image array.
If <em>src_image</em> is a 2D image array object, <em>src_origin</em>[2] describes the
image index in the 2D image array.</p>
</div>
<div class="paragraph">
<p><em>region</em> defines the (<em>width</em>, <em>height</em>, <em>depth</em>) in pixels of the 1D, 2D or
3D rectangle, the (<em>width</em>, <em>height</em>) in pixels of the 2D rectangle and the
number of images of a 2D image array or the (<em>width</em>) in pixels of the 1D
rectangle and the number of images of a 1D image array.
If <em>src_image</em> is a 2D image object, <em>region</em>[2] must be 1.
If <em>src_image</em> is a 1D image or 1D image buffer object, <em>region</em>[1] and
<em>region</em>[2] must be 1.
If <em>src_image</em> is a 1D image array object, <em>region</em>[2] must be 1.
The values in <em>region</em> cannot be 0.</p>
</div>
<div class="paragraph">
<p><em>dst_offset</em> is the offset at which to begin copying data into
<em>dst_buffer</em>.
The size in bytes of the region to be copied, referred to as <em>dst_cb</em>, is
computed as <em>width</em> × <em>height</em> × <em>depth</em> × <em>bytes/image
element</em> if <em>src_image</em> is a 3D image object; as <em>width</em> ×
<em>height</em> × <em>bytes/image element</em> if <em>src_image</em> is a 2D image;
as <em>width</em> × <em>height</em> × <em>arraysize</em> ×
<em>bytes/image element</em> if <em>src_image</em> is a 2D image array object;
as <em>width</em> × <em>bytes/image element</em> if <em>src_image</em> is a 1D image or 1D
image buffer object; and as <em>width</em> × <em>arraysize</em> ×
<em>bytes/image element</em> if <em>src_image</em> is a 1D image array object.</p>
</div>
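<div class="paragraph">
<p>When <em>region</em> follows the rules given above, unused entries are 1 and the number of images of an image array is carried in <em>region</em> itself, so every case of the <em>dst_cb</em> computation reduces to a single product.
The sketch below assumes that <em>arraysize</em> denotes the number of images selected by <em>region</em>; <code>copy_size_bytes</code> and <code>buffer_range_in_bounds</code> are hypothetical helpers, and the second shows an overflow-safe form of the <em>dst_offset</em> + <em>dst_cb</em> bounds test.</p>
</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: size in bytes of the copied region, assuming region
 * already obeys the per-type rules (unused entries equal to 1). */
size_t copy_size_bytes(const size_t region[3], size_t bytes_per_element)
{
    assert(region != NULL);
    return region[0] * region[1] * region[2] * bytes_per_element;
}

/* Hypothetical helper: true if [offset, offset + cb) lies inside a buffer
 * of buffer_size bytes. Written to avoid overflow in offset + cb. */
bool buffer_range_in_bounds(size_t offset, size_t cb, size_t buffer_size)
{
    return offset <= buffer_size && cb <= buffer_size - offset;
}
```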
<div class="paragraph">
<p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need to
complete before this particular command can be executed.
If <em>event_wait_list</em> is <code>NULL</code>, then this particular command does not wait
on any event to complete.
If <em>event_wait_list</em> is <code>NULL</code>, <em>num_events_in_wait_list</em> must be 0.
If <em>event_wait_list</em> is not <code>NULL</code>, the list of events pointed to by
<em>event_wait_list</em> must be valid and <em>num_events_in_wait_list</em> must be
greater than 0.
The events specified in <em>event_wait_list</em> act as synchronization points.
The context associated with events in <em>event_wait_list</em> and <em>command_queue</em>
must be the same.
The memory associated with <em>event_wait_list</em> can be reused or freed after
the function returns.</p>
</div>
<div class="paragraph">
<p><em>event</em> returns an event object that identifies this particular copy command
and can be used to query or queue a wait for this particular command to
complete.
<em>event</em> can be <code>NULL</code> in which case it will not be possible for the
application to query the status of this command or queue a wait for this
command to complete.
<strong>clEnqueueBarrierWithWaitList</strong> can be used instead.
If the <em>event_wait_list</em> and the <em>event</em> arguments are not <code>NULL</code>, the
<em>event</em> argument should not refer to an element of the <em>event_wait_list</em>
array.</p>
</div>
<div class="paragraph">
<p><strong>clEnqueueCopyImageToBuffer</strong> returns CL_SUCCESS if the function is executed
successfully.
Otherwise, it returns one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not a valid host
command-queue.</p>
</li>
<li>
<p>CL_INVALID_CONTEXT if the context associated with <em>command_queue</em>,
<em>src_image</em> and <em>dst_buffer</em> are not the same or if the context
associated with <em>command_queue</em> and events in <em>event_wait_list</em> are not
the same.</p>
</li>
<li>
<p>CL_INVALID_MEM_OBJECT if <em>src_image</em> is not a valid image object or
<em>dst_buffer</em> is not a valid buffer object or if <em>src_image</em> is a 1D
image buffer object created from <em>dst_buffer</em>.</p>
</li>
<li>
<p>CL_INVALID_VALUE if the 1D, 2D or 3D rectangular region specified by
<em>src_origin</em> and <em>src_origin</em> + <em>region</em> refers to a region outside
<em>src_image</em>, or if the region specified by <em>dst_offset</em> and <em>dst_offset</em>
+ <em>dst_cb</em> refers to a region outside <em>dst_buffer</em>.</p>
</li>
<li>
<p>CL_INVALID_VALUE if values in <em>src_origin</em> and <em>region</em> do not follow
rules described in the argument description for <em>src_origin</em> and
<em>region</em>.</p>
</li>
<li>
<p>CL_INVALID_EVENT_WAIT_LIST if <em>event_wait_list</em> is <code>NULL</code> and
<em>num_events_in_wait_list</em> &gt; 0, or <em>event_wait_list</em> is not <code>NULL</code> and
<em>num_events_in_wait_list</em> is 0, or if event objects in <em>event_wait_list</em>
are not valid events.</p>
</li>
<li>
<p>CL_MISALIGNED_SUB_BUFFER_OFFSET if <em>dst_buffer</em> is a sub-buffer object
and the <em>offset</em> specified when the sub-buffer object was created is not
aligned to the CL_DEVICE_MEM_BASE_ADDR_ALIGN value for the device associated
with <em>command_queue</em>.</p>
</li>
<li>
<p>CL_INVALID_IMAGE_SIZE if image dimensions (image width, height,
specified or computed row and/or slice pitch) for <em>src_image</em> are not
supported by the device associated with <em>command_queue</em>.</p>
</li>
<li>
<p>CL_IMAGE_FORMAT_NOT_SUPPORTED if the image format (image channel order and
data type) for <em>src_image</em> is not supported by the device associated with
<em>command_queue</em>.</p>
</li>
<li>
<p>CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
memory for data store associated with <em>src_image</em> or <em>dst_buffer</em>.</p>
</li>
<li>
<p>CL_INVALID_OPERATION if the device associated with <em>command_queue</em> does
not support images (i.e. CL_DEVICE_IMAGE_SUPPORT specified in the
<a href="#device-queries-table">Device Queries</a> table is CL_FALSE).</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
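<div class="paragraph">
<p>The sub-buffer alignment condition behind CL_MISALIGNED_SUB_BUFFER_OFFSET can be sketched as below.
This assumes that CL_DEVICE_MEM_BASE_ADDR_ALIGN is reported in bits, as described in the Device Queries table, and that the sub-buffer offset is given in bytes; <code>sub_buffer_offset_aligned</code> is a hypothetical helper.</p>
</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pre-check. offset_bytes is the offset given when the
 * sub-buffer was created; base_addr_align_bits is assumed to be the
 * device's CL_DEVICE_MEM_BASE_ADDR_ALIGN value, expressed in bits. */
bool sub_buffer_offset_aligned(size_t offset_bytes,
                               unsigned int base_addr_align_bits)
{
    assert(base_addr_align_bits >= 8 && base_addr_align_bits % 8 == 0);
    return offset_bytes % (base_addr_align_bits / 8) == 0;
}
```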
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clEnqueueCopyBufferToImage(cl_command_queue command_queue,
cl_mem src_buffer,
cl_mem dst_image,
size_t src_offset,
<span class="directive">const</span> size_t *dst_origin,
<span class="directive">const</span> size_t *region,
cl_uint num_events_in_wait_list,
<span class="directive">const</span> cl_event *event_wait_list,
cl_event *event)</code></pre>
</div>
</div>
<div class="paragraph">
<p>enqueues a command to copy a buffer object to an image object.</p>
</div>
<div class="paragraph">
<p><em>command_queue</em> must be a valid host command-queue.
The OpenCL context associated with <em>command_queue</em>, <em>src_buffer</em> and
<em>dst_image</em> must be the same.</p>
</div>
<div class="paragraph">
<p><em>src_buffer</em> is a valid buffer object.</p>
</div>
<div class="paragraph">
<p><em>dst_image</em> is a valid image object.</p>
</div>
<div class="paragraph">
<p><em>src_offset</em> refers to the offset where to begin copying data from
<em>src_buffer</em>.</p>
</div>
<div class="paragraph">
<p><em>dst_origin</em> defines the (<em>x</em>, <em>y</em>, <em>z</em>) offset in pixels in the 1D, 2D or
3D image, the (<em>x</em>, <em>y</em>) offset and the image index in the 2D image array or
the (<em>x</em>) offset and the image index in the 1D image array.
If <em>dst_image</em> is a 2D image object, <em>dst_origin</em>[2] must be 0.
If <em>dst_image</em> is a 1D image or 1D image buffer object, <em>dst_origin</em>[1] and
<em>dst_origin</em>[2] must be 0.
If <em>dst_image</em> is a 1D image array object, <em>dst_origin</em>[2] must be 0.
If <em>dst_image</em> is a 1D image array object, <em>dst_origin</em>[1] describes the
image index in the 1D image array.
If <em>dst_image</em> is a 2D image array object, <em>dst_origin</em>[2] describes the
image index in the 2D image array.</p>
</div>
<div class="paragraph">
<p><em>region</em> defines the (<em>width</em>, <em>height</em>, <em>depth</em>) in pixels of the 1D, 2D or
3D rectangle, the (<em>width</em>, <em>height</em>) in pixels of the 2D rectangle and the
number of images of a 2D image array or the (<em>width</em>) in pixels of the 1D
rectangle and the number of images of a 1D image array.
If <em>dst_image</em> is a 2D image object, <em>region</em>[2] must be 1.
If <em>dst_image</em> is a 1D image or 1D image buffer object, <em>region</em>[1] and
<em>region</em>[2] must be 1.
If <em>dst_image</em> is a 1D image array object, <em>region</em>[2] must be 1.
The values in <em>region</em> cannot be 0.</p>
</div>
<div class="paragraph">
<p>The size in bytes of the region to be copied from <em>src_buffer</em>, referred to
as <em>src_cb</em>, is computed as <em>width</em> × <em>height</em> × <em>depth</em> ×
<em>bytes/image element</em> if <em>dst_image</em> is a 3D image object; as
<em>width</em> × <em>height</em> × <em>bytes/image element</em> if <em>dst_image</em> is a
2D image; as <em>width</em> × <em>height</em> × <em>arraysize</em>
× <em>bytes/image element</em> if <em>dst_image</em> is a 2D image array object;
as <em>width</em> × <em>bytes/image element</em> if <em>dst_image</em> is a 1D
image or 1D image buffer object; and as <em>width</em> ×
<em>arraysize</em> × <em>bytes/image element</em> if <em>dst_image</em> is a 1D image array
object.</p>
</div>
<div class="paragraph">
<p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need to
complete before this particular command can be executed.
If <em>event_wait_list</em> is <code>NULL</code>, then this particular command does not wait
on any event to complete.
If <em>event_wait_list</em> is <code>NULL</code>, <em>num_events_in_wait_list</em> must be 0.
If <em>event_wait_list</em> is not <code>NULL</code>, the list of events pointed to by
<em>event_wait_list</em> must be valid and <em>num_events_in_wait_list</em> must be
greater than 0.
The events specified in <em>event_wait_list</em> act as synchronization points.
The context associated with events in <em>event_wait_list</em> and <em>command_queue</em>
must be the same.
The memory associated with <em>event_wait_list</em> can be reused or freed after
the function returns.</p>
</div>
<div class="paragraph">
<p><em>event</em> returns an event object that identifies this particular copy command
and can be used to query or queue a wait for this particular command to
complete.
<em>event</em> can be <code>NULL</code> in which case it will not be possible for the
application to query the status of this command or queue a wait for this
command to complete.
<strong>clEnqueueBarrierWithWaitList</strong> can be used instead.
If the <em>event_wait_list</em> and the <em>event</em> arguments are not <code>NULL</code>, the
<em>event</em> argument should not refer to an element of the <em>event_wait_list</em>
array.</p>
</div>
<div class="paragraph">
<p><strong>clEnqueueCopyBufferToImage</strong> returns CL_SUCCESS if the function is executed
successfully.
Otherwise, it returns one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not a valid host
command-queue.</p>
</li>
<li>
<p>CL_INVALID_CONTEXT if the context associated with <em>command_queue</em>,
<em>src_buffer</em> and <em>dst_image</em> are not the same or if the context
associated with <em>command_queue</em> and events in <em>event_wait_list</em> are not
the same.</p>
</li>
<li>
<p>CL_INVALID_MEM_OBJECT if <em>src_buffer</em> is not a valid buffer object or
<em>dst_image</em> is not a valid image object or if <em>dst_image</em> is a 1D image
buffer object created from <em>src_buffer</em>.</p>
</li>
<li>
<p>CL_INVALID_VALUE if the 1D, 2D or 3D rectangular region specified by
<em>dst_origin</em> and <em>dst_origin</em> + <em>region</em> refers to a region outside
<em>dst_image</em>, or if the region specified by <em>src_offset</em> and <em>src_offset</em>
+ <em>src_cb</em> refers to a region outside <em>src_buffer</em>.</p>
</li>
<li>
<p>CL_INVALID_VALUE if values in <em>dst_origin</em> and <em>region</em> do not follow
rules described in the argument description for <em>dst_origin</em> and
<em>region</em>.</p>
</li>
<li>
<p>CL_INVALID_EVENT_WAIT_LIST if <em>event_wait_list</em> is <code>NULL</code> and
<em>num_events_in_wait_list</em> &gt; 0, or <em>event_wait_list</em> is not <code>NULL</code> and
<em>num_events_in_wait_list</em> is 0, or if event objects in <em>event_wait_list</em>
are not valid events.</p>
</li>
<li>
<p>CL_MISALIGNED_SUB_BUFFER_OFFSET if <em>src_buffer</em> is a sub-buffer object
and the <em>offset</em> specified when the sub-buffer object was created is not
aligned to the CL_DEVICE_MEM_BASE_ADDR_ALIGN value for the device associated
with <em>command_queue</em>.</p>
</li>
<li>
<p>CL_INVALID_IMAGE_SIZE if image dimensions (image width, height,
specified or computed row and/or slice pitch) for <em>dst_image</em> are not
supported by the device associated with <em>command_queue</em>.</p>
</li>
<li>
<p>CL_IMAGE_FORMAT_NOT_SUPPORTED if the image format (image channel order and
data type) for <em>dst_image</em> is not supported by the device associated with
<em>command_queue</em>.</p>
</li>
<li>
<p>CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
memory for data store associated with <em>src_buffer</em> or <em>dst_image</em>.</p>
</li>
<li>
<p>CL_INVALID_OPERATION if the device associated with <em>command_queue</em> does
not support images (i.e. CL_DEVICE_IMAGE_SUPPORT specified in the
<a href="#device-queries-table">Device Queries</a> table is CL_FALSE).</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
</div>
<div class="sect3">
<h4 id="_mapping_image_objects">5.3.6. Mapping Image Objects</h4>
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c"><span class="directive">void</span> *clEnqueueMapImage(cl_command_queue command_queue,
cl_mem image,
cl_bool blocking_map,
cl_map_flags map_flags,
<span class="directive">const</span> size_t *origin,
<span class="directive">const</span> size_t *region,
size_t *image_row_pitch,
size_t *image_slice_pitch,
cl_uint num_events_in_wait_list,
<span class="directive">const</span> cl_event *event_wait_list,
cl_event *event,
cl_int *errcode_ret)</code></pre>
</div>
</div>
<div class="paragraph">
<p>enqueues a command to map a region in the image object given by <em>image</em> into
the host address space and returns a pointer to this mapped region.</p>
</div>
<div class="paragraph">
<p><em>command_queue</em> must be a valid host command-queue.</p>
</div>
<div class="paragraph">
<p><em>image</em> is a valid image object.
The OpenCL context associated with <em>command_queue</em> and <em>image</em> must be the
same.</p>
</div>
<div class="paragraph">
<p><em>blocking_map</em> indicates if the map operation is <em>blocking</em> or
<em>non-blocking</em>.</p>
</div>
<div class="paragraph">
<p>If <em>blocking_map</em> is CL_TRUE, <strong>clEnqueueMapImage</strong> does not return until the
specified region in <em>image</em> is mapped into the host address space and the
application can access the contents of the mapped region using the pointer
returned by <strong>clEnqueueMapImage</strong>.</p>
</div>
<div class="paragraph">
<p>If <em>blocking_map</em> is CL_FALSE, i.e. the map operation is non-blocking, the
pointer to the mapped region returned by <strong>clEnqueueMapImage</strong> cannot be used
until the map command has completed.
The <em>event</em> argument returns an event object which can be used to query the
execution status of the map command.
When the map command is completed, the application can access the contents
of the mapped region using the pointer returned by <strong>clEnqueueMapImage</strong>.</p>
</div>
<div class="paragraph">
<p><em>map_flags</em> is a bit-field and is described in the
<a href="#memory-map-flags-table">Memory Map Flags</a> table.</p>
</div>
<div class="paragraph">
<p><em>origin</em> defines the (<em>x</em>, <em>y</em>, <em>z</em>) offset in pixels in the 1D, 2D or 3D
image, the (<em>x</em>, <em>y</em>) offset and the image index in the 2D image array or
the (<em>x</em>) offset and the image index in the 1D image array.
If <em>image</em> is a 2D image object, <em>origin</em>[2] must be 0.
If <em>image</em> is a 1D image or 1D image buffer object, <em>origin</em>[1] and
<em>origin</em>[2] must be 0.
If <em>image</em> is a 1D image array object, <em>origin</em>[2] must be 0.
If <em>image</em> is a 1D image array object, <em>origin</em>[1] describes the image index
in the 1D image array.
If <em>image</em> is a 2D image array object, <em>origin</em>[2] describes the image index
in the 2D image array.</p>
</div>
<div class="paragraph">
<p><em>region</em> defines the (<em>width</em>, <em>height</em>, <em>depth</em>) in pixels of the 1D, 2D or
3D rectangle, the (<em>width</em>, <em>height</em>) in pixels of the 2D rectangle and the
number of images of a 2D image array or the (<em>width</em>) in pixels of the 1D
rectangle and the number of images of a 1D image array.
If <em>image</em> is a 2D image object, <em>region</em>[2] must be 1.
If <em>image</em> is a 1D image or 1D image buffer object, <em>region</em>[1] and
<em>region</em>[2] must be 1.
If <em>image</em> is a 1D image array object, <em>region</em>[2] must be 1.
The values in <em>region</em> cannot be 0.</p>
</div>
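<div class="paragraph">
<p>The per-type rules for <em>origin</em> and <em>region</em> above can be restated as a small host-side check.
The following is a minimal sketch, not part of the OpenCL API: the <code>image_kind</code> enum and the <code>origin_region_ok</code> helper are hypothetical names introduced for illustration, and bounds checking against the actual image dimensions is left out.</p>
</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical image kinds for this sketch; these are not OpenCL identifiers. */
typedef enum {
    IMG_1D, IMG_1D_BUFFER, IMG_1D_ARRAY, IMG_2D, IMG_2D_ARRAY, IMG_3D
} image_kind;

/* Returns true when origin/region obey the per-type rules described above.
 * Whether the region lies inside the image is deliberately not checked. */
bool origin_region_ok(image_kind kind, const size_t origin[3],
                      const size_t region[3])
{
    assert(origin != NULL && region != NULL);

    /* No value in region may be 0, whatever the image type. */
    if (region[0] == 0 || region[1] == 0 || region[2] == 0)
        return false;

    switch (kind) {
    case IMG_1D:
    case IMG_1D_BUFFER:
        return origin[1] == 0 && origin[2] == 0 &&
               region[1] == 1 && region[2] == 1;
    case IMG_1D_ARRAY:   /* origin[1] and region[1] select array images */
    case IMG_2D:
        return origin[2] == 0 && region[2] == 1;
    case IMG_2D_ARRAY:   /* origin[2] and region[2] select array images */
    case IMG_3D:
        return true;
    }
    return false;
}
```

<div class="paragraph">
<p>An application could run such a check before enqueuing the map command to catch the CL_INVALID_VALUE cases that do not depend on the image dimensions.</p>
</div>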
<div class="paragraph">
<p><em>image_row_pitch</em> returns the scan-line pitch in bytes for the mapped
region.
This must be a non-<code>NULL</code> value.</p>
</div>
<div class="paragraph">
<p><em>image_slice_pitch</em> returns the size in bytes of each 2D slice of a 3D image
or the size of each 1D or 2D image in a 1D or 2D image array for the mapped
region.
For a 1D and 2D image, zero is returned if this argument is not <code>NULL</code>.
For a 3D image, 1D and 2D image array, <em>image_slice_pitch</em> must be a
non-<code>NULL</code> value.</p>
</div>
<div class="paragraph">
<p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need to
complete before <strong>clEnqueueMapImage</strong> can be executed.
If <em>event_wait_list</em> is <code>NULL</code>, then <strong>clEnqueueMapImage</strong> does not wait on
any event to complete.
If <em>event_wait_list</em> is <code>NULL</code>, <em>num_events_in_wait_list</em> must be 0.
If <em>event_wait_list</em> is not <code>NULL</code>, the list of events pointed to by
<em>event_wait_list</em> must be valid and <em>num_events_in_wait_list</em> must be
greater than 0.
The events specified in <em>event_wait_list</em> act as synchronization points.
The context associated with events in <em>event_wait_list</em> and <em>command_queue</em>
must be the same.
The memory associated with <em>event_wait_list</em> can be reused or freed after
the function returns.</p>
</div>
<div class="paragraph">
<p><em>event</em> returns an event object that identifies this particular command and
can be used to query or queue a wait for this particular command to
complete.
<em>event</em> can be <code>NULL</code> in which case it will not be possible for the
application to query the status of this command or queue a wait for this
command to complete.
If the <em>event_wait_list</em> and the <em>event</em> arguments are not <code>NULL</code>, the
<em>event</em> argument should not refer to an element of the <em>event_wait_list</em>
array.</p>
</div>
<div class="paragraph">
<p><em>errcode_ret</em> will return an appropriate error code.
If <em>errcode_ret</em> is <code>NULL</code>, no error code is returned.</p>
</div>
<div class="paragraph">
<p>On success, <strong>clEnqueueMapImage</strong> returns a pointer to the mapped region
and <em>errcode_ret</em> is set to CL_SUCCESS.</p>
</div>
<div class="paragraph">
<p>A <code>NULL</code> pointer is returned otherwise with one of the following error
values returned in <em>errcode_ret</em>:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not a valid host
command-queue.</p>
</li>
<li>
<p>CL_INVALID_CONTEXT if context associated with <em>command_queue</em> and
<em>image</em> are not the same or if context associated with <em>command_queue</em>
and events in <em>event_wait_list</em> are not the same.</p>
</li>
<li>
<p>CL_INVALID_MEM_OBJECT if <em>image</em> is not a valid image object.</p>
</li>
<li>
<p>CL_INVALID_VALUE if region being mapped given by (<em>origin</em>,
<em>origin+region</em>) is out of bounds or if values specified in <em>map_flags</em>
are not valid.</p>
</li>
<li>
<p>CL_INVALID_VALUE if values in <em>origin</em> and <em>region</em> do not follow rules
described in the argument description for <em>origin</em> and <em>region</em>.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>image_row_pitch</em> is <code>NULL</code>.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>image</em> is a 3D image, 1D or 2D image array object
and <em>image_slice_pitch</em> is <code>NULL</code>.</p>
</li>
<li>
<p>CL_INVALID_EVENT_WAIT_LIST if <em>event_wait_list</em> is <code>NULL</code> and
<em>num_events_in_wait_list</em> &gt; 0, or <em>event_wait_list</em> is not <code>NULL</code> and
<em>num_events_in_wait_list</em> is 0, or if event objects in <em>event_wait_list</em>
are not valid events.</p>
</li>
<li>
<p>CL_INVALID_IMAGE_SIZE if image dimensions (image width, height,
specified or computed row and/or slice pitch) for <em>image</em> are not
supported by the device associated with <em>command_queue</em>.</p>
</li>
<li>
<p>CL_IMAGE_FORMAT_NOT_SUPPORTED if the image format (image channel order and
data type) for <em>image</em> is not supported by the device associated with
<em>command_queue</em>.</p>
</li>
<li>
<p>CL_MAP_FAILURE if there is a failure to map the requested region into
the host address space.
This error cannot occur for image objects created with
CL_MEM_USE_HOST_PTR or CL_MEM_ALLOC_HOST_PTR.</p>
</li>
<li>
<p>CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST if the map operation is
blocking and the execution status of any of the events in
<em>event_wait_list</em> is a negative integer value.</p>
</li>
<li>
<p>CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
memory for data store associated with <em>image</em>.</p>
</li>
<li>
<p>CL_INVALID_OPERATION if the device associated with <em>command_queue</em> does
not support images (i.e. CL_DEVICE_IMAGE_SUPPORT specified in the
<a href="#device-queries-table">Device Queries</a> table is CL_FALSE).</p>
</li>
<li>
<p>CL_INVALID_OPERATION if <em>image</em> has been created with
CL_MEM_HOST_WRITE_ONLY or CL_MEM_HOST_NO_ACCESS and CL_MAP_READ is set
in <em>map_flags</em> or if <em>image</em> has been created with CL_MEM_HOST_READ_ONLY
or CL_MEM_HOST_NO_ACCESS and CL_MAP_WRITE or
CL_MAP_WRITE_INVALIDATE_REGION is set in <em>map_flags</em>.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
<li>
<p>CL_INVALID_OPERATION if mapping would lead to overlapping regions being
mapped for writing.</p>
</li>
</ul>
</div>
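<div class="paragraph">
<p>The CL_INVALID_OPERATION rule that relates the host-access flags of <em>image</em> to <em>map_flags</em> can be expressed as a predicate.
This is a sketch under stated assumptions: <code>map_flags_permitted</code> is a hypothetical helper, and the bit values are copied from the Khronos <code>cl.h</code> header only so that the fragment compiles on its own; real code would include that header instead.</p>
</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values copied from cl.h so the sketch stands alone. */
#define CL_MEM_HOST_WRITE_ONLY          (1u << 7)
#define CL_MEM_HOST_READ_ONLY           (1u << 8)
#define CL_MEM_HOST_NO_ACCESS           (1u << 9)
#define CL_MAP_READ                     (1u << 0)
#define CL_MAP_WRITE                    (1u << 1)
#define CL_MAP_WRITE_INVALIDATE_REGION  (1u << 2)

/* Hypothetical helper: true when map_flags is compatible with the host-access
 * flags the image was created with. */
bool map_flags_permitted(uint64_t mem_flags, uint64_t map_flags)
{
    /* The host-access flags are mutually exclusive, so at most one is set. */
    uint64_t host = mem_flags & (CL_MEM_HOST_WRITE_ONLY |
                                 CL_MEM_HOST_READ_ONLY |
                                 CL_MEM_HOST_NO_ACCESS);
    assert((host & (host - 1)) == 0);

    bool wants_read  = (map_flags & CL_MAP_READ) != 0;
    bool wants_write = (map_flags & (CL_MAP_WRITE |
                                     CL_MAP_WRITE_INVALIDATE_REGION)) != 0;
    bool host_cannot_read  = (host & (CL_MEM_HOST_WRITE_ONLY |
                                      CL_MEM_HOST_NO_ACCESS)) != 0;
    bool host_cannot_write = (host & (CL_MEM_HOST_READ_ONLY |
                                      CL_MEM_HOST_NO_ACCESS)) != 0;

    return !(wants_read && host_cannot_read) &&
           !(wants_write && host_cannot_write);
}
```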
<div class="paragraph">
<p>The pointer returned maps a 1D, 2D or 3D region starting at <em>origin</em> and is
at least <em>region</em>[0] pixels in size for a 1D image, 1D image buffer or 1D
image array, (<em>image_row_pitch</em> × <em>region</em>[1]) pixels in size for a 2D
image or 2D image array, and (<em>image_slice_pitch</em> × <em>region</em>[2]) pixels
in size for a 3D image.
The result of a memory access outside this region is undefined.</p>
</div>
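<div class="paragraph">
<p>Because the returned pointer addresses the pixel at <em>origin</em>, locating another pixel inside the mapped region comes down to pitch arithmetic.
The sketch below assumes the conventional layout implied by the pitch definitions: rows are <em>image_row_pitch</em> bytes apart and slices (or array images) are <em>image_slice_pitch</em> bytes apart.
<code>mapped_pixel_offset</code> and its <code>element_size</code> parameter (the size of one pixel in bytes, as reported by CL_IMAGE_ELEMENT_SIZE) are hypothetical names.</p>
</div>

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of pixel (x, y, z) from the pointer returned by the map call.
 * Coordinates are relative to origin, so (0, 0, 0) is the first mapped pixel.
 * For 1D and 2D images the slice pitch is reported as 0, so the z term
 * vanishes. For a 1D image array, pass the array index as z and y = 0,
 * because the slice pitch is the size of each 1D image. */
size_t mapped_pixel_offset(size_t x, size_t y, size_t z,
                           size_t element_size,
                           size_t row_pitch, size_t slice_pitch)
{
    assert(element_size > 0);
    return z * slice_pitch + y * row_pitch + x * element_size;
}
```

<div class="paragraph">
<p>The application remains responsible for keeping <em>x</em>, <em>y</em> and <em>z</em> below the corresponding <em>region</em> values, since accesses outside the mapped region are undefined.</p>
</div>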
<div class="paragraph">
<p>If the image object is created with CL_MEM_USE_HOST_PTR set in <em>mem_flags</em>,
the following will be true:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>The <em>host_ptr</em> specified in <strong>clCreateImage</strong> is guaranteed to contain the
latest bits in the region being mapped when the <strong>clEnqueueMapImage</strong>
command has completed.</p>
</li>
<li>
<p>The pointer value returned by <strong>clEnqueueMapImage</strong> will be derived from
the <em>host_ptr</em> specified when the image object is created.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>Mapped image objects are unmapped using <strong>clEnqueueUnmapMemObject</strong>.
This is described in <a href="#unmapping-mapped-memory">Unmapping Mapped Memory
Objects</a>.</p>
</div>
</div>
<div class="sect3">
<h4 id="image-object-queries">5.3.7. Image Object Queries</h4>
<div class="paragraph">
<p>To get information that is common to all memory objects, use the
<strong>clGetMemObjectInfo</strong> function described in <a href="#memory-object-queries">Memory
Object Queries</a>.</p>
</div>
<div class="paragraph">
<p>To get information specific to an image object created with <strong>clCreateImage</strong>,
use the following function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clGetImageInfo(cl_mem image,
cl_image_info param_name,
size_t param_value_size,
<span class="directive">void</span> *param_value,
size_t *param_value_size_ret)</code></pre>
</div>
</div>
<div class="paragraph">
<p><em>image</em> specifies the image object being queried.</p>
</div>
<div class="paragraph">
<p><em>param_name</em> specifies the information to query.
The list of supported <em>param_name</em> types and the information returned in
<em>param_value</em> by <strong>clGetImageInfo</strong> is described in the
<a href="#image-info-table">Image Object Queries</a> table.</p>
</div>
<div class="paragraph">
<p><em>param_value</em> is a pointer to memory where the appropriate result being
queried is returned.
If <em>param_value</em> is <code>NULL</code>, it is ignored.</p>
</div>
<div class="paragraph">
<p><em>param_value_size</em> is used to specify the size in bytes of memory pointed to
by <em>param_value</em>.
This size must be ≥ size of return type as described in the
<a href="#image-info-table">Image Object Queries</a> table.</p>
</div>
<div class="paragraph">
<p><em>param_value_size_ret</em> returns the actual size in bytes of data being
queried by <em>param_name</em>.
If <em>param_value_size_ret</em> is <code>NULL</code>, it is ignored.</p>
</div>
<div class="paragraph">
<p><strong>clGetImageInfo</strong> returns CL_SUCCESS if the function is executed
successfully.
Otherwise, it returns one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_VALUE if <em>param_name</em> is not valid, or if size in bytes
specified by <em>param_value_size</em> is &lt; size of return type as described in
the <a href="#image-info-table">Image Object Queries</a> table and <em>param_value</em> is
not <code>NULL</code>.</p>
</li>
<li>
<p>CL_INVALID_MEM_OBJECT if <em>image</em> is not a valid image object.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
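<div class="paragraph">
<p>The interaction between <em>param_value</em>, <em>param_value_size</em> and <em>param_value_size_ret</em> follows a contract shared by the OpenCL query functions.
The following sketch models only that contract; <code>copy_query_result</code> is a hypothetical helper, not an OpenCL entry point, and the literal -30 mirrors the value of CL_INVALID_VALUE in <code>cl.h</code>.</p>
</div>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the query contract. result/result_size stand for the
 * value the implementation holds for the requested param_name.
 * Returns 0 on success and -30 (CL_INVALID_VALUE) on a size mismatch. */
int copy_query_result(const void *result, size_t result_size,
                      size_t param_value_size, void *param_value,
                      size_t *param_value_size_ret)
{
    assert(result != NULL && result_size > 0);

    if (param_value != NULL) {
        if (param_value_size < result_size)
            return -30;                       /* caller's buffer is too small */
        memcpy(param_value, result, result_size);
    }
    /* A NULL param_value is ignored, so a size-only query is allowed. */
    if (param_value_size_ret != NULL)
        *param_value_size_ret = result_size;  /* actual size of the data */
    return 0;
}
```

<div class="paragraph">
<p>Passing a <code>NULL</code> <em>param_value</em> together with a non-<code>NULL</code> <em>param_value_size_ret</em> is the usual way to learn the required size before allocating storage.</p>
</div>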
<table id="image-info-table" class="tableblock frame-all grid-all spread">
<caption class="title">Table 19. List of supported param_names by <strong>clGetImageInfo</strong></caption>
<colgroup>
<col style="width: 34%;">
<col style="width: 33%;">
<col style="width: 33%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top"><strong>cl_image_info</strong></th>
<th class="tableblock halign-left valign-top">Return type</th>
<th class="tableblock halign-left valign-top">Info. returned in <em>param_value</em></th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_IMAGE_FORMAT</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_image_format</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Return image format descriptor specified when <em>image</em> is created
with <strong>clCreateImage</strong>.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_IMAGE_ELEMENT_SIZE</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">size_t</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Return size of each element of the image memory object given by
<em>image</em> in bytes.
An element is made up of <em>n</em> channels.
The value of <em>n</em> is given in <em>cl_image_format</em> descriptor.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_IMAGE_ROW_PITCH</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">size_t</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Return calculated row pitch in bytes of a row of elements of the
image object given by <em>image</em>.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_IMAGE_SLICE_PITCH</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">size_t</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Return calculated slice pitch in bytes of a 2D slice for the 3D
image object or size of each image in a 1D or 2D image array given
by <em>image</em>.
For a 1D image, 1D image buffer and 2D image object return 0.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_IMAGE_WIDTH</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">size_t</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Return width of the image in pixels.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_IMAGE_HEIGHT</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">size_t</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Return height of the image in pixels.
For a 1D image, 1D image buffer and 1D image array object, height =
0.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_IMAGE_DEPTH</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">size_t</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Return depth of the image in pixels.
For a 1D image, 1D image buffer, 2D image or 1D and 2D image array
object, depth = 0.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_IMAGE_ARRAY_SIZE</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">size_t</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Return number of images in the image array.
If <em>image</em> is not an image array, 0 is returned.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_IMAGE_NUM_MIP_LEVELS</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Return num_mip_levels associated with <em>image</em>.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_IMAGE_NUM_SAMPLES</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Return num_samples associated with <em>image</em>.</p></td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="sect2">
<h3 id="_pipes">5.4. Pipes</h3>
<div class="paragraph">
<p>A <em>pipe</em> is a memory object that stores data organized as a FIFO.
Pipe objects can only be accessed using built-in functions that read from
and write to a pipe.
Pipe objects are not accessible from the host.
A pipe object encapsulates the following information:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Packet size in bytes</p>
</li>
<li>
<p>Maximum capacity in packets</p>
</li>
<li>
<p>Information about the number of packets currently in the pipe</p>
</li>
<li>
<p>Data packets</p>
</li>
</ul>
</div>
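<div class="paragraph">
<p>To make the four pieces of encapsulated information concrete, the sketch below models a pipe as a fixed-capacity FIFO of equally sized packets.
It is a conceptual illustration only: real pipe objects are opaque, cannot be accessed from the host, and may be stored quite differently by an implementation.
The <code>pipe_model</code> type and its functions are hypothetical names.</p>
</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Conceptual model only; not how an OpenCL implementation stores pipes. */
typedef struct {
    size_t packet_size;   /* packet size in bytes */
    size_t max_packets;   /* maximum capacity in packets */
    size_t num_packets;   /* number of packets currently in the pipe */
    size_t head;          /* index of the oldest packet */
    unsigned char *data;  /* storage for max_packets * packet_size bytes */
} pipe_model;

/* Appends one packet; fails without blocking when the pipe is full. */
bool pipe_model_write(pipe_model *p, const void *packet)
{
    assert(p != NULL && packet != NULL);
    if (p->num_packets == p->max_packets)
        return false;
    size_t slot = (p->head + p->num_packets) % p->max_packets;
    memcpy(p->data + slot * p->packet_size, packet, p->packet_size);
    p->num_packets++;
    return true;
}

/* Removes the oldest packet; fails without blocking when the pipe is empty. */
bool pipe_model_read(pipe_model *p, void *packet)
{
    assert(p != NULL && packet != NULL);
    if (p->num_packets == 0)
        return false;
    memcpy(packet, p->data + p->head * p->packet_size, p->packet_size);
    p->head = (p->head + 1) % p->max_packets;
    p->num_packets--;
    return true;
}
```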
<div class="sect3">
<h4 id="_creating_pipe_objects">5.4.1. Creating Pipe Objects</h4>
<div class="paragraph">
<p>A <strong>pipe object</strong> is created using the following function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_mem clCreatePipe(cl_context context,
cl_mem_flags flags,
cl_uint pipe_packet_size,
cl_uint pipe_max_packets,
<span class="directive">const</span> cl_pipe_properties *properties,
cl_int *errcode_ret)</code></pre>
</div>
</div>
<div class="paragraph">
<p><em>context</em> is a valid OpenCL context used to create the pipe object.</p>
</div>
<div class="paragraph">
<p><em>flags</em> is a bit-field that is used to specify allocation and usage
information such as the memory arena that should be used to allocate the
pipe object and how it will be used.
The <a href="#memory-flags">Memory Flags</a> table describes the possible values for
<em>flags</em>.
Only CL_MEM_READ_WRITE and CL_MEM_HOST_NO_ACCESS can be specified when
creating a pipe object.
If the value specified for <em>flags</em> is 0, the default is used which is
CL_MEM_READ_WRITE | CL_MEM_HOST_NO_ACCESS.</p>
</div>
<div class="paragraph">
<p><em>pipe_packet_size</em> is the size in bytes of a pipe packet.</p>
</div>
<div class="paragraph">
<p><em>pipe_max_packets</em> specifies the pipe capacity by specifying the maximum
number of packets the pipe can hold.</p>
</div>
<div class="paragraph">
<p><em>properties</em> specifies a list of properties for the pipe and their
corresponding values.
Each property name is immediately followed by the corresponding desired
value.
The list is terminated with 0.
In OpenCL 2.2, <em>properties</em> must be <code>NULL</code>.</p>
</div>
<div class="paragraph">
<p><em>errcode_ret</em> will return an appropriate error code.
If <em>errcode_ret</em> is <code>NULL</code>, no error code is returned.</p>
</div>
<div class="paragraph">
<p><strong>clCreatePipe</strong> returns a valid non-zero pipe object and <em>errcode_ret</em> is set
to CL_SUCCESS if the pipe object is created successfully.
Otherwise, it returns a <code>NULL</code> value with one of the following error values
returned in <em>errcode_ret</em>:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_CONTEXT if <em>context</em> is not a valid context.</p>
</li>
<li>
<p>CL_INVALID_VALUE if values specified in <em>flags</em> are not as defined
above.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>properties</em> is not <code>NULL</code>.</p>
</li>
<li>
<p>CL_INVALID_PIPE_SIZE if <em>pipe_packet_size</em> is 0 or the
<em>pipe_packet_size</em> exceeds CL_DEVICE_PIPE_MAX_PACKET_SIZE value
specified in the <a href="#device-queries-table">Device Queries</a> table for all
devices in <em>context</em> or if <em>pipe_max_packets</em> is 0.</p>
</li>
<li>
<p>CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
memory for the pipe object.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
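<div class="paragraph">
<p>The argument rules for <strong>clCreatePipe</strong> can be gathered into a pre-flight check, sketched below.
<code>check_pipe_args</code> is a hypothetical helper, the order in which it reports errors is an arbitrary choice, and the constant values are copied from <code>cl.h</code> only to keep the fragment self-contained.
The <code>max_packet_size_any_device</code> parameter is assumed to hold the largest CL_DEVICE_PIPE_MAX_PACKET_SIZE among the devices in the context, since the size error applies only when the packet size exceeds the limit for all of them.</p>
</div>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from cl.h so the sketch stands alone. */
#define CL_SUCCESS             0
#define CL_INVALID_VALUE       (-30)
#define CL_INVALID_PIPE_SIZE   (-69)
#define CL_MEM_READ_WRITE      (1u << 0)
#define CL_MEM_HOST_NO_ACCESS  (1u << 9)

/* Hypothetical pre-flight check mirroring the error list above. */
int check_pipe_args(uint64_t flags, uint32_t pipe_packet_size,
                    uint32_t pipe_max_packets, const void *properties,
                    uint32_t max_packet_size_any_device)
{
    assert(max_packet_size_any_device > 0);

    /* Only these two flags may be set; 0 selects the default of both. */
    if (flags & ~(uint64_t)(CL_MEM_READ_WRITE | CL_MEM_HOST_NO_ACCESS))
        return CL_INVALID_VALUE;
    /* In OpenCL 2.2 the property list must be NULL. */
    if (properties != NULL)
        return CL_INVALID_VALUE;
    if (pipe_packet_size == 0 || pipe_max_packets == 0 ||
        pipe_packet_size > max_packet_size_any_device)
        return CL_INVALID_PIPE_SIZE;
    return CL_SUCCESS;
}
```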
<div class="paragraph">
<p>Pipes follow the same memory consistency model as defined for buffer and
image objects.
The pipe state, i.e. the contents of the pipe across kernel-instances (on the
same or different devices), is enforced at a synchronization point.</p>
</div>
</div>
<div class="sect3">
<h4 id="_pipe_object_queries">5.4.2. Pipe Object Queries</h4>
<div class="paragraph">
<p>To get information that is common to all memory objects, use the
<strong>clGetMemObjectInfo</strong> function described in <a href="#memory-object-queries">Memory
Object Queries</a>.</p>
</div>
<div class="paragraph">
<p>To get information specific to a pipe object created with <strong>clCreatePipe</strong>,
use the following function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clGetPipeInfo(cl_mem pipe,
cl_pipe_info param_name,
size_t param_value_size,
<span class="directive">void</span> *param_value,
size_t *param_value_size_ret)</code></pre>
</div>
</div>
<div class="paragraph">
<p><em>pipe</em> specifies the pipe object being queried.</p>
</div>
<div class="paragraph">
<p><em>param_name</em> specifies the information to query.
The list of supported <em>param_name</em> types and the information returned in
<em>param_value</em> by <strong>clGetPipeInfo</strong> is described in the <a href="#pipe-info-table">Pipe
Object Queries</a> table.</p>
</div>
<div class="paragraph">
<p><em>param_value</em> is a pointer to memory where the appropriate result being
queried is returned.
If <em>param_value</em> is <code>NULL</code>, it is ignored.</p>
</div>
<div class="paragraph">
<p><em>param_value_size</em> is used to specify the size in bytes of memory pointed to
by <em>param_value</em>.
This size must be ≥ size of return type as described in the
<a href="#pipe-info-table">Pipe Object Queries</a> table.</p>
</div>
<div class="paragraph">
<p><em>param_value_size_ret</em> returns the actual size in bytes of data being
queried by <em>param_name</em>.
If <em>param_value_size_ret</em> is <code>NULL</code>, it is ignored.</p>
</div>
<div class="paragraph">
<p><strong>clGetPipeInfo</strong> returns CL_SUCCESS if the function is executed successfully.
Otherwise, it returns one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_VALUE if <em>param_name</em> is not valid, or if size in bytes
specified by <em>param_value_size</em> is &lt; size of return type as described in
the <a href="#pipe-info-table">Pipe Object Queries</a> table and <em>param_value</em> is
not <code>NULL</code>.</p>
</li>
<li>
<p>CL_INVALID_MEM_OBJECT if <em>pipe</em> is not a valid pipe object.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
<table id="pipe-info-table" class="tableblock frame-all grid-all spread">
<caption class="title">Table 20. List of supported param_names by <strong>clGetPipeInfo</strong></caption>
<colgroup>
<col style="width: 34%;">
<col style="width: 33%;">
<col style="width: 33%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top"><strong>cl_pipe_info</strong></th>
<th class="tableblock halign-left valign-top">Return type</th>
<th class="tableblock halign-left valign-top">Info. returned in <em>param_value</em></th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_PIPE_PACKET_SIZE</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Return pipe packet size specified when <em>pipe</em> is created with
<strong>clCreatePipe</strong>.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_PIPE_MAX_PACKETS</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Return max. number of packets specified when <em>pipe</em> is created with
<strong>clCreatePipe</strong>.</p></td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="sect2">
<h3 id="_querying_unmapping_migrating_retaining_and_releasing_memory_objects">5.5. Querying, Unmapping, Migrating, Retaining and Releasing Memory Objects</h3>
<div class="sect3">
<h4 id="_retaining_and_releasing_memory_objects">5.5.1. Retaining and Releasing Memory Objects</h4>
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clRetainMemObject(cl_mem memobj)</code></pre>
</div>
</div>
<div class="paragraph">
<p>increments the <em>memobj</em> reference count.
<strong>clRetainMemObject</strong> returns CL_SUCCESS if the function is executed
successfully.
Otherwise, it returns one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_MEM_OBJECT if <em>memobj</em> is not a valid memory object (buffer
or image object).</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p><strong>clCreateBuffer</strong>, <strong>clCreateSubBuffer</strong>, <strong>clCreateImage</strong> and <strong>clCreatePipe</strong>
perform an implicit retain.</p>
</div>
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clReleaseMemObject(cl_mem memobj)</code></pre>
</div>
</div>
<div class="paragraph">
<p>decrements the <em>memobj</em> reference count.
<strong>clReleaseMemObject</strong> returns CL_SUCCESS if the function is executed
successfully.
Otherwise, it returns one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_MEM_OBJECT if <em>memobj</em> is not a valid memory object.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>After the <em>memobj</em> reference count becomes zero and all commands queued for
execution on any command-queue that use <em>memobj</em> have finished, the memory
object is deleted.
If <em>memobj</em> is a buffer object, <em>memobj</em> cannot be deleted until all
sub-buffer objects associated with <em>memobj</em> are deleted.
Using this function to release a reference that was not obtained by creating
the object or by calling <strong>clRetainMemObject</strong> causes undefined behavior.</p>
</div>
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clSetMemObjectDestructorCallback
(cl_mem memobj,
<span class="directive">void</span> (CL_CALLBACK *pfn_notify)(cl_mem memobj,<span class="directive">void</span> *user_data),
<span class="directive">void</span> *user_data)</code></pre>
</div>
</div>
<div class="paragraph">
<p>registers a user callback function with a memory object.
Each call to <strong>clSetMemObjectDestructorCallback</strong> registers the specified user
callback function on a callback stack associated with <em>memobj</em>.
The registered user callback functions are called in the reverse order in
which they were registered.
The user callback functions are called and then the memory object's resources
are freed and the memory object is deleted.
This provides a mechanism for the application (and libraries) using <em>memobj</em>
to be notified when the memory referenced by <em>host_ptr</em>, specified when the
memory object is created and used as the storage bits for the memory object,
can be reused or freed.</p>
</div>
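<div class="paragraph">
<p>The reverse-order guarantee behaves like a stack of callbacks that is unwound when the memory object is destroyed.
The sketch below illustrates only that ordering; <code>callback_stack</code>, <code>stack_register</code>, <code>stack_run</code> and the <code>log_tag</code> example callback are hypothetical names, and the callback signature is simplified to take just <em>user_data</em> rather than the (<em>memobj</em>, <em>user_data</em>) pair used by the real function.</p>
</div>

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CALLBACKS 8  /* arbitrary limit chosen for the sketch */

typedef void (*destructor_fn)(void *user_data);

typedef struct {
    destructor_fn fn[MAX_CALLBACKS];
    void *user_data[MAX_CALLBACKS];
    size_t count;
} callback_stack;

/* Pushes one callback; mirrors a single registration call. */
void stack_register(callback_stack *s, destructor_fn fn, void *user_data)
{
    assert(s != NULL && fn != NULL && s->count < MAX_CALLBACKS);
    s->fn[s->count] = fn;
    s->user_data[s->count] = user_data;
    s->count++;
}

/* Pops and invokes every callback, most recently registered first. */
void stack_run(callback_stack *s)
{
    assert(s != NULL);
    while (s->count > 0) {
        s->count--;
        s->fn[s->count](s->user_data[s->count]);
    }
}

/* Example callback: records the character that user_data points to. */
static char call_log[MAX_CALLBACKS + 1];
static size_t call_log_len;

void log_tag(void *user_data)
{
    assert(call_log_len < MAX_CALLBACKS);
    call_log[call_log_len++] = *(const char *)user_data;
}
```

<div class="paragraph">
<p>Registering <code>log_tag</code> first with a tag of 'A' and then with 'B' records "BA" when <code>stack_run</code> is called, which is the last-registered, first-called order that lets a library layered on top of an application release its resources before the application's own callback runs.</p>
</div>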
<div class="paragraph">
<p><em>memobj</em> is a valid memory object.</p>
</div>
<div class="paragraph">
<p><em>pfn_notify</em> is the callback function that can be registered by the
application.
This callback function may be called asynchronously by the OpenCL
implementation.
It is the application's responsibility to ensure that the callback function
is thread-safe.
The parameters to this callback function are:</p>
</div>
<div class="ulist">
<ul>
<li>
<p><em>memobj</em> is the memory object being deleted.
When the user callback is called by the implementation, this memory
object is no longer valid.
<em>memobj</em> is only provided for reference purposes.</p>
</li>
<li>
<p><em>user_data</em> is a pointer to user supplied data.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p><em>user_data</em> will be passed as the <em>user_data</em> argument when <em>pfn_notify</em> is
called.
<em>user_data</em> can be <code>NULL</code>.</p>
</div>
<div class="paragraph">
<p><strong>clSetMemObjectDestructorCallback</strong> returns CL_SUCCESS if the function is
executed successfully.
Otherwise, it returns one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_MEM_OBJECT if <em>memobj</em> is not a valid memory object.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>pfn_notify</em> is <code>NULL</code>.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
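<div class="paragraph">
<p>The reverse-order rule for registered callbacks can be illustrated with a
small host-only model.
This is a sketch in plain C, not OpenCL API code: <code>callback_stack</code>,
<code>register_callback</code>, <code>run_callbacks</code> and <code>record</code> are hypothetical names,
and a <code>void *</code> stands in for the <code>cl_mem</code> handle.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">#include &lt;stddef.h&gt;

#define MAX_CALLBACKS 8

/* Hypothetical stand-in for the pfn_notify signature; the real callback
   receives a cl_mem rather than a void pointer. */
typedef void (*notify_fn)(void *memobj, void *user_data);

struct callback_stack {
    notify_fn fn[MAX_CALLBACKS];
    void *user_data[MAX_CALLBACKS];
    size_t count;
};

/* Models one registration: push the callback onto the stack.
   Returns 0 on success, -1 if fn is NULL or the stack is full. */
int register_callback(struct callback_stack *s, notify_fn fn, void *user_data)
{
    if (fn == NULL || s-&gt;count == MAX_CALLBACKS)
        return -1;
    s-&gt;fn[s-&gt;count] = fn;
    s-&gt;user_data[s-&gt;count] = user_data;
    s-&gt;count++;
    return 0;
}

/* Models deletion: pop and call every callback, most recent first. */
void run_callbacks(struct callback_stack *s, void *memobj)
{
    while (s-&gt;count != 0) {
        s-&gt;count--;
        s-&gt;fn[s-&gt;count](memobj, s-&gt;user_data[s-&gt;count]);
    }
}

/* Example callback: appends the character passed as user_data to a log. */
static char call_log[MAX_CALLBACKS + 1];
static size_t call_log_len;

void record(void *memobj, void *user_data)
{
    (void)memobj; /* provided for reference only; no longer a valid object */
    call_log[call_log_len++] = *(const char *)user_data;
    call_log[call_log_len] = '\0';
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>In this model, registering <code>record</code> three times with user data <code>'A'</code>, <code>'B'</code>
and <code>'C'</code>, in that order, and then calling <code>run_callbacks</code> leaves <code>call_log</code>
holding <code>"CBA"</code>.</p>
</div>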
<div class="admonitionblock note">
<table>
<tr>
<td class="icon">
<i class="fa icon-note" title="Note"></i>
</td>
<td class="content">
<div class="paragraph">
<p>When the user callback function is called by the implementation, the
contents of the memory region pointed to by <em>host_ptr</em> (if the memory object
is created with CL_MEM_USE_HOST_PTR) are undefined.
The callback function is typically used by the application to either free or
reuse the memory region pointed to by <em>host_ptr</em>.</p>
</div>
<div class="paragraph">
<p>The behavior of calling expensive system routines, OpenCL API calls to
create contexts or command-queues, or any of the blocking OpenCL operations
listed below from a callback is undefined.</p>
</div>
<div class="ulist">
<ul>
<li>
<p><strong>clFinish</strong>,</p>
</li>
<li>
<p><strong>clWaitForEvents</strong>,</p>
</li>
<li>
<p>blocking calls to <strong>clEnqueueReadBuffer</strong>, <strong>clEnqueueReadBufferRect</strong>,
<strong>clEnqueueWriteBuffer</strong>, <strong>clEnqueueWriteBufferRect</strong>,</p>
</li>
<li>
<p>blocking calls to <strong>clEnqueueReadImage</strong> and <strong>clEnqueueWriteImage</strong>,</p>
</li>
<li>
<p>blocking calls to <strong>clEnqueueMapBuffer</strong>, <strong>clEnqueueMapImage</strong>,</p>
</li>
<li>
<p>blocking calls to <strong>clBuildProgram</strong>, <strong>clCompileProgram</strong> or
<strong>clLinkProgram</strong></p>
</li>
</ul>
</div>
<div class="paragraph">
<p>If an application needs to wait for completion of a routine from the above
list in a callback, it should use the non-blocking form of the function and
assign a completion callback to it to do the remainder of the work.
Note that when a callback (or other code) enqueues commands to a
command-queue, the commands are not required to begin execution until the
queue is flushed.
In standard usage, blocking enqueue calls serve this role by implicitly
flushing the queue.
Since blocking calls are not permitted in callbacks, those callbacks that
enqueue commands on a command queue should either call <strong>clFlush</strong> on the
queue before returning or arrange for <strong>clFlush</strong> to be called later on
another thread.</p>
</div>
<div class="paragraph">
<p>The user callback function may not call OpenCL APIs with the memory object
for which the callback function is invoked; in such cases the behavior
of the OpenCL APIs is undefined.</p>
</div>
</td>
</tr>
</table>
</div>
</div>
<div class="sect3">
<h4 id="unmapping-mapped-memory">5.5.2. Unmapping Mapped Memory Objects</h4>
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clEnqueueUnmapMemObject(cl_command_queue command_queue,
cl_mem memobj,
<span class="directive">void</span> *mapped_ptr,
cl_uint num_events_in_wait_list,
<span class="directive">const</span> cl_event *event_wait_list,
cl_event *event)</code></pre>
</div>
</div>
<div class="paragraph">
<p>enqueues a command to unmap a previously mapped region of a memory object.
Reads or writes from the host using the pointer returned by
<strong>clEnqueueMapBuffer</strong> or <strong>clEnqueueMapImage</strong> are considered to be complete.</p>
</div>
<div class="paragraph">
<p><em>command_queue</em> must be a valid host command-queue.</p>
</div>
<div class="paragraph">
<p><em>memobj</em> is a valid memory (buffer or image) object.
The OpenCL context associated with <em>command_queue</em> and <em>memobj</em> must be the
same.</p>
</div>
<div class="paragraph">
<p><em>mapped_ptr</em> is the host address returned by a previous call to
<strong>clEnqueueMapBuffer</strong>, or <strong>clEnqueueMapImage</strong> for <em>memobj</em>.</p>
</div>
<div class="paragraph">
<p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need to
complete before <strong>clEnqueueUnmapMemObject</strong> can be executed.
If <em>event_wait_list</em> is <code>NULL</code>, then <strong>clEnqueueUnmapMemObject</strong> does not wait
on any event to complete.
If <em>event_wait_list</em> is <code>NULL</code>, <em>num_events_in_wait_list</em> must be 0.
If <em>event_wait_list</em> is not <code>NULL</code>, the list of events pointed to by
<em>event_wait_list</em> must be valid and <em>num_events_in_wait_list</em> must be
greater than 0.
The events specified in <em>event_wait_list</em> act as synchronization points.
The context associated with events in <em>event_wait_list</em> and <em>command_queue</em>
must be the same.
The memory associated with <em>event_wait_list</em> can be reused or freed after
the function returns.</p>
</div>
<div class="paragraph">
<p><em>event</em> returns an event object that identifies this particular command and
can be used to query or queue a wait for this particular command to
complete.
<em>event</em> can be <code>NULL</code> in which case it will not be possible for the
application to query the status of this command or queue a wait for this
command to complete.
<strong>clEnqueueBarrierWithWaitList</strong> can be used instead.
If the <em>event_wait_list</em> and the <em>event</em> arguments are not <code>NULL</code>, the
<em>event</em> argument should not refer to an element of the <em>event_wait_list</em>
array.</p>
</div>
<div class="paragraph">
<p><strong>clEnqueueUnmapMemObject</strong> returns CL_SUCCESS if the function is executed
successfully.
Otherwise, it returns one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not a valid host
command-queue.</p>
</li>
<li>
<p>CL_INVALID_MEM_OBJECT if <em>memobj</em> is not a valid memory object or is a
pipe object.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>mapped_ptr</em> is not a valid pointer returned by
<strong>clEnqueueMapBuffer</strong> or <strong>clEnqueueMapImage</strong> for <em>memobj</em>.</p>
</li>
<li>
<p>CL_INVALID_EVENT_WAIT_LIST if <em>event_wait_list</em> is <code>NULL</code> and
<em>num_events_in_wait_list</em> &gt; 0, or if <em>event_wait_list</em> is not <code>NULL</code> and
<em>num_events_in_wait_list</em> is 0, or if event objects in <em>event_wait_list</em>
are not valid events.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
<li>
<p>CL_INVALID_CONTEXT if the contexts associated with <em>command_queue</em> and
<em>memobj</em> are not the same or if the context associated with
<em>command_queue</em> and events in <em>event_wait_list</em> are not the same.</p>
</li>
</ul>
</div>
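<div class="paragraph">
<p>The consistency rules for <em>event_wait_list</em> and <em>num_events_in_wait_list</em>
can be summarized as a small predicate.
The following is a host-only sketch in plain C, not OpenCL API code:
<code>wait_list_args_valid</code> is a hypothetical name, events are modelled as opaque
pointers, and a <code>NULL</code> entry is an assumed stand-in for an event that is not
valid.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">#include &lt;stddef.h&gt;

/* Returns 1 if the argument pair is consistent, 0 if it matches one of the
   CL_INVALID_EVENT_WAIT_LIST conditions described above. */
int wait_list_args_valid(unsigned int num_events,
                         const void *const *event_wait_list)
{
    /* A NULL list is only acceptable together with a count of 0. */
    if (event_wait_list == NULL)
        return num_events == 0;

    /* A non-NULL list requires a count greater than 0. */
    if (num_events == 0)
        return 0;

    /* Every listed event must be valid (modelled here as non-NULL). */
    for (unsigned int i = 0; i != num_events; i++) {
        if (event_wait_list[i] == NULL)
            return 0;
    }
    return 1;
}</code></pre>
</div>
</div>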
<div class="paragraph">
<p><strong>clEnqueueMapBuffer</strong> and <strong>clEnqueueMapImage</strong> increment the mapped count of
the memory object.
The initial mapped count value of the memory object is zero.
Multiple calls to <strong>clEnqueueMapBuffer</strong> or <strong>clEnqueueMapImage</strong> on the same
memory object increment this mapped count once per call.
<strong>clEnqueueUnmapMemObject</strong> decrements the mapped count of the memory object.</p>
</div>
<div class="paragraph">
<p><strong>clEnqueueMapBuffer</strong>, and <strong>clEnqueueMapImage</strong> act as synchronization points
for a region of the buffer object being mapped.</p>
</div>
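<div class="paragraph">
<p>The mapped count bookkeeping described above can be modelled as a simple
counter.
This is a host-only sketch in plain C with hypothetical names (<code>mem_model</code>,
<code>model_map</code>, <code>model_unmap</code>, <code>model_is_mapped</code>); rejecting an unmap when the count
is already zero is an assumption of the model, not a statement about how an
implementation tracks mappings.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">/* Model of a memory object that only tracks its mapped count. */
struct mem_model {
    unsigned int map_count; /* initial value is zero */
};

/* Each map operation increments the mapped count by one. */
void model_map(struct mem_model *m)
{
    m-&gt;map_count++;
}

/* Each unmap operation decrements the mapped count by one.
   Returns 0 on success, -1 if there is no active mapping to release. */
int model_unmap(struct mem_model *m)
{
    if (m-&gt;map_count == 0)
        return -1;
    m-&gt;map_count--;
    return 0;
}

/* The object counts as mapped while at least one mapping is active. */
int model_is_mapped(const struct mem_model *m)
{
    return m-&gt;map_count != 0;
}</code></pre>
</div>
</div>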
</div>
<div class="sect3">
<h4 id="accessing-mapped-regions">5.5.3. Accessing mapped regions of a memory object</h4>
<div class="paragraph">
<p>This section describes the behavior of OpenCL commands that access mapped
regions of a memory object.</p>
</div>
<div class="paragraph">
<p>The contents of the region of a memory object and associated memory objects
(sub-buffer objects or 1D image buffer objects that overlap this region)
mapped for writing (i.e. CL_MAP_WRITE or CL_MAP_WRITE_INVALIDATE_REGION is
set in <em>map_flags</em> argument to <strong>clEnqueueMapBuffer</strong>, or <strong>clEnqueueMapImage</strong>)
are considered to be undefined until this region is unmapped.</p>
</div>
<div class="paragraph">
<p>Multiple commands in command-queues can map a region or overlapping regions
of a memory object and associated memory objects (sub-buffer objects or 1D
image buffer objects that overlap this region) for reading (i.e. <em>map_flags</em>
= CL_MAP_READ).
The contents of the regions of a memory object mapped for reading can also
be read by kernels and other OpenCL commands (such as <strong>clEnqueueCopyBuffer</strong>)
executing on one or more devices.</p>
</div>
<div class="paragraph">
<p>Mapping (and unmapping) overlapped regions in a memory object and/or
associated memory objects (sub-buffer objects or 1D image buffer objects
that overlap this region) for writing is an error and will result in
CL_INVALID_OPERATION error returned by <strong>clEnqueueMapBuffer</strong>, or
<strong>clEnqueueMapImage</strong>.</p>
</div>
<div class="paragraph">
<p>If a memory object is currently mapped for writing, the application must
ensure that the memory object is unmapped before any enqueued kernels or
commands that read from or write to this memory object or any of its
associated memory objects (sub-buffer or 1D image buffer objects) or its
parent object (if the memory object is a sub-buffer or 1D image buffer
object) begin execution; otherwise the behavior is undefined.</p>
</div>
<div class="paragraph">
<p>If a memory object is currently mapped for reading, the application must
ensure that the memory object is unmapped before any enqueued kernels or
commands that write to this memory object or any of its associated memory
objects (sub-buffer or 1D image buffer objects) or its parent object (if the
memory object is a sub-buffer or 1D image buffer object) begin execution;
otherwise the behavior is undefined.</p>
</div>
<div class="paragraph">
<p>A memory object is considered as mapped if there are one or more active
mappings for the memory object irrespective of whether the mapped regions
span the entire memory object.</p>
</div>
<div class="paragraph">
<p>Accessing the contents of the memory region referred to by the mapped
pointer that has been unmapped is undefined.</p>
</div>
<div class="paragraph">
<p>The mapped pointer returned by <strong>clEnqueueMapBuffer</strong> or <strong>clEnqueueMapImage</strong>
can be used as the <em>ptr</em> argument value to <strong>clEnqueue{Read|Write}Buffer</strong>,
<strong>clEnqueue{Read|Write}BufferRect</strong>, or <strong>clEnqueue{Read|Write}Image</strong>, provided
the rules described above are adhered to.</p>
</div>
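<div class="paragraph">
<p>The rules for commands that access a mapped memory object can be condensed
into a small decision function.
This is a host-only sketch in plain C with hypothetical names (<code>map_state</code>,
<code>access</code>, <code>command_access_defined</code>); it reduces the requirement to the state of
the memory object at the moment a command begins execution and does not
model sub-buffers or parent objects separately.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">/* State of the memory object when a command begins execution. */
enum map_state { NOT_MAPPED, MAPPED_FOR_READ, MAPPED_FOR_WRITE };

/* How the command accesses the memory object. */
enum access { COMMAND_READS, COMMAND_WRITES };

/* Returns 1 if the behavior is defined, 0 if it is undefined. */
int command_access_defined(enum map_state state, enum access acc)
{
    switch (state) {
    case NOT_MAPPED:
        return 1;                    /* no restriction */
    case MAPPED_FOR_READ:
        return acc == COMMAND_READS; /* readers may coexist, writers may not */
    case MAPPED_FOR_WRITE:
        return 0;                    /* neither reads nor writes are defined */
    }
    return 0;
}</code></pre>
</div>
</div>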
</div>
<div class="sect3">
<h4 id="_migrating_memory_objects">5.5.4. Migrating Memory Objects</h4>
<div class="paragraph">
<p>This section describes a mechanism for assigning the device on which an
OpenCL memory object resides.
A user may wish to have more explicit control over the location of their
memory objects on creation.
This could be used to:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Ensure that an object is allocated on a specific device prior to usage.</p>
</li>
<li>
<p>Preemptively migrate an object from one device to another.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clEnqueueMigrateMemObjects(cl_command_queue command_queue,
cl_uint num_mem_objects,
<span class="directive">const</span> cl_mem *mem_objects,
cl_mem_migration_flags flags,
cl_uint num_events_in_wait_list,
<span class="directive">const</span> cl_event *event_wait_list,
cl_event *event)</code></pre>
</div>
</div>
<div class="paragraph">
<p>enqueues a command to indicate which device a set of memory objects should
be associated with.
Typically, memory objects are implicitly migrated to a device for which
enqueued commands, using the memory object, are targeted.
<strong>clEnqueueMigrateMemObjects</strong> allows this migration to be explicitly
performed ahead of the dependent commands.
This allows a user to preemptively change the association of a memory
object, through regular command queue scheduling, in order to prepare for
another upcoming command.
This also permits an application to overlap the placement of memory objects
with other unrelated operations before these memory objects are needed,
potentially hiding transfer latencies.
Once the event, returned from <strong>clEnqueueMigrateMemObjects</strong>, has been marked
CL_COMPLETE the memory objects specified in <em>mem_objects</em> have been
successfully migrated to the device associated with <em>command_queue</em>.
The migrated memory object shall remain resident on the device until another
command is enqueued that either implicitly or explicitly migrates it away.</p>
</div>
<div class="paragraph">
<p><strong>clEnqueueMigrateMemObjects</strong> can also be used to direct the initial
placement of a memory object, after creation, possibly avoiding the initial
overhead of instantiating the object on the first enqueued command to use
it.</p>
</div>
<div class="paragraph">
<p>The user is responsible for managing the event dependencies, associated with
this command, in order to avoid overlapping access to memory objects.
Improperly specified event dependencies passed to
<strong>clEnqueueMigrateMemObjects</strong> could result in undefined results.</p>
</div>
<div class="paragraph">
<p><em>command_queue</em> is a valid host command-queue.
The specified set of memory objects in <em>mem_objects</em> will be migrated to the
OpenCL device associated with <em>command_queue</em> or to the host if the
CL_MIGRATE_MEM_OBJECT_HOST has been specified.</p>
</div>
<div class="paragraph">
<p><em>num_mem_objects</em> is the number of memory objects specified in
<em>mem_objects</em>.</p>
</div>
<div class="paragraph">
<p><em>mem_objects</em> is a pointer to a list of memory objects.</p>
</div>
<div class="paragraph">
<p><em>flags</em> is a bit-field that is used to specify migration options.
The <a href="#migration-flags-table">Memory Migration Flags</a> table describes the
possible values for <em>flags</em>.</p>
</div>
<table id="migration-flags-table" class="tableblock frame-all grid-all spread">
<caption class="title">Table 21. Supported values for cl_mem_migration_flags</caption>
<colgroup>
<col style="width: 50%;">
<col style="width: 50%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top"><strong>cl_mem_migration_flags</strong></th>
<th class="tableblock halign-left valign-top">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MIGRATE_MEM_OBJECT_HOST</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">This flag indicates that the specified set of memory objects are to be
migrated to the host, regardless of the target command-queue.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MIGRATE_MEM_OBJECT_CONTENT_UNDEFINED</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">This flag indicates that the contents of the set of memory objects are
undefined after migration.
The specified set of memory objects are migrated to the device
associated with <em>command_queue</em> without incurring the overhead of
migrating their contents.</p></td>
</tr>
</tbody>
</table>
<div class="paragraph">
<p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need to
complete before this particular command can be executed.
If <em>event_wait_list</em> is <code>NULL</code>, then this particular command does not wait
on any event to complete.
If <em>event_wait_list</em> is <code>NULL</code>, <em>num_events_in_wait_list</em> must be 0.
If <em>event_wait_list</em> is not <code>NULL</code>, the list of events pointed to by
<em>event_wait_list</em> must be valid and <em>num_events_in_wait_list</em> must be
greater than 0.
The events specified in <em>event_wait_list</em> act as synchronization points.
The context associated with events in <em>event_wait_list</em> and <em>command_queue</em>
must be the same.
The memory associated with <em>event_wait_list</em> can be reused or freed after
the function returns.</p>
</div>
<div class="paragraph">
<p><em>event</em> returns an event object that identifies this particular command and
can be used to query or queue a wait for this particular command to
complete.
<em>event</em> can be <code>NULL</code> in which case it will not be possible for the
application to query the status of this command or queue a wait for this
command to complete.
If the <em>event_wait_list</em> and the <em>event</em> arguments are not <code>NULL</code>, the
<em>event</em> argument should not refer to an element of the <em>event_wait_list</em>
array.</p>
</div>
<div class="paragraph">
<p><strong>clEnqueueMigrateMemObjects</strong> returns CL_SUCCESS if the function is executed
successfully.
Otherwise, it returns one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not a valid host
command-queue.</p>
</li>
<li>
<p>CL_INVALID_CONTEXT if the context associated with <em>command_queue</em> and
memory objects in <em>mem_objects</em> are not the same or if the context
associated with <em>command_queue</em> and events in <em>event_wait_list</em> are not
the same.</p>
</li>
<li>
<p>CL_INVALID_MEM_OBJECT if any of the memory objects in <em>mem_objects</em> is
not a valid memory object.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>num_mem_objects</em> is zero or if <em>mem_objects</em> is
<code>NULL</code>.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>flags</em> is neither 0 nor a combination of the values
described in the table above.</p>
</li>
<li>
<p>CL_INVALID_EVENT_WAIT_LIST if <em>event_wait_list</em> is <code>NULL</code> and
<em>num_events_in_wait_list</em> &gt; 0, or <em>event_wait_list</em> is not <code>NULL</code> and
<em>num_events_in_wait_list</em> is 0, or if event objects in <em>event_wait_list</em>
are not valid events.</p>
</li>
<li>
<p>CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
memory for the specified set of memory objects in <em>mem_objects</em>.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
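<div class="paragraph">
<p>Because <em>flags</em> is a bit-field, checking it amounts to rejecting any bit
outside the set listed in the table.
The sketch below is plain C, not OpenCL API code: the <code>MODEL_</code> macros are
stand-ins for the two flags, their bit positions are chosen for illustration
only, and <code>migration_flags_valid</code> is a hypothetical name.
It assumes that 0 and any combination of the listed flags are acceptable.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">#include &lt;stdint.h&gt;

/* Stand-ins for the flags in the table; bit positions are illustrative. */
#define MODEL_MIGRATE_HOST              (UINT64_C(1) &lt;&lt; 0)
#define MODEL_MIGRATE_CONTENT_UNDEFINED (UINT64_C(1) &lt;&lt; 1)

/* Returns 1 if flags is 0 or a combination of the known flags,
   0 if any unknown bit is set. */
int migration_flags_valid(uint64_t flags)
{
    const uint64_t known = MODEL_MIGRATE_HOST | MODEL_MIGRATE_CONTENT_UNDEFINED;
    return (flags &amp; ~known) == 0;
}</code></pre>
</div>
</div>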
</div>
<div class="sect3">
<h4 id="memory-object-queries">5.5.5. Memory Object Queries</h4>
<div class="paragraph">
<p>To get information that is common to all memory objects (buffer and image
objects), use the following function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clGetMemObjectInfo(cl_mem memobj,
cl_mem_info param_name,
size_t param_value_size,
<span class="directive">void</span> *param_value,
size_t *param_value_size_ret)</code></pre>
</div>
</div>
<div class="paragraph">
<p><em>memobj</em> specifies the memory object being queried.</p>
</div>
<div class="paragraph">
<p><em>param_name</em> specifies the information to query.
The list of supported <em>param_name</em> types and the information returned in
<em>param_value</em> by <strong>clGetMemObjectInfo</strong> is described in the
<a href="#mem-info-table">Memory Object Info</a> table.</p>
</div>
<div class="paragraph">
<p><em>param_value</em> is a pointer to memory where the appropriate result being
queried is returned.
If <em>param_value</em> is <code>NULL</code>, it is ignored.</p>
</div>
<div class="paragraph">
<p><em>param_value_size</em> is used to specify the size in bytes of memory pointed to
by <em>param_value</em>.
This size must be ≥ size of return type as described in the
<a href="#mem-info-table">Memory Object Info</a> table.</p>
</div>
<div class="paragraph">
<p><em>param_value_size_ret</em> returns the actual size in bytes of data being
queried by <em>param_name</em>.
If <em>param_value_size_ret</em> is <code>NULL</code>, it is ignored.</p>
</div>
<div class="paragraph">
<p><strong>clGetMemObjectInfo</strong> returns CL_SUCCESS if the function is executed
successfully.
Otherwise, it returns one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_VALUE if <em>param_name</em> is not valid, or if size in bytes
specified by <em>param_value_size</em> is &lt; size of return type as described in
the <a href="#mem-info-table">Memory Object Info</a> table and <em>param_value</em> is not
<code>NULL</code>.</p>
</li>
<li>
<p>CL_INVALID_MEM_OBJECT if <em>memobj</em> is not a valid memory object.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
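<div class="paragraph">
<p>The interaction between <em>param_value</em>, <em>param_value_size</em> and
<em>param_value_size_ret</em> can be illustrated with a host-only model of a query
whose return type is <code>size_t</code>.
This is a sketch in plain C, not OpenCL API code: <code>model_query_size</code> and its
<code>stored</code> parameter are hypothetical, and the return values 0 and -1 stand in
for CL_SUCCESS and CL_INVALID_VALUE.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">#include &lt;stddef.h&gt;
#include &lt;string.h&gt;

/* stored is the value the model "object" would report for the query. */
int model_query_size(size_t stored,
                     size_t param_value_size,
                     void *param_value,
                     size_t *param_value_size_ret)
{
    /* The size is only checked when a destination is supplied. */
    if (param_value != NULL &amp;&amp; param_value_size &lt; sizeof stored)
        return -1;

    /* A NULL param_value is ignored; otherwise the result is copied out. */
    if (param_value != NULL)
        memcpy(param_value, &amp;stored, sizeof stored);

    /* A NULL param_value_size_ret is ignored as well. */
    if (param_value_size_ret != NULL)
        *param_value_size_ret = sizeof stored;

    return 0;
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>Under these rules a caller can first pass a <code>NULL</code> <em>param_value</em> to learn the
required size, then call again with a buffer of at least that size.</p>
</div>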
<table id="mem-info-table" class="tableblock frame-all grid-all spread">
<caption class="title">Table 22. List of supported param_names by <strong>clGetMemObjectInfo</strong></caption>
<colgroup>
<col style="width: 34%;">
<col style="width: 33%;">
<col style="width: 33%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top"><strong>cl_mem_info</strong></th>
<th class="tableblock halign-left valign-top">Return type</th>
<th class="tableblock halign-left valign-top">Info. returned in <em>param_value</em></th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MEM_TYPE</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_mem_object_type</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Returns one of the following values:
</p><p class="tableblock"> CL_MEM_OBJECT_BUFFER if <em>memobj</em> is created with <strong>clCreateBuffer</strong> or
<strong>clCreateSubBuffer</strong>.
</p><p class="tableblock"> cl_image_desc.image_type argument value if <em>memobj</em> is created with
<strong>clCreateImage</strong>.
</p><p class="tableblock"> CL_MEM_OBJECT_PIPE if <em>memobj</em> is created with <strong>clCreatePipe</strong>.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MEM_FLAGS</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_mem_flags</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Return the flags argument value specified when <em>memobj</em> is created
with <strong>clCreateBuffer</strong>,<br>
<strong>clCreateSubBuffer</strong>,<br>
<strong>clCreateImage</strong> or<br>
<strong>clCreatePipe</strong>.
</p><p class="tableblock"> If <em>memobj</em> is a sub-buffer, the memory access qualifiers
inherited from the parent buffer are also returned.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MEM_SIZE</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">size_t</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Return actual size of the data store associated with <em>memobj</em> in
bytes.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MEM_HOST_PTR</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">void *</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">If <em>memobj</em> is created with <strong>clCreateBuffer</strong> or <strong>clCreateImage</strong> and
CL_MEM_USE_HOST_PTR is specified in mem_flags, return the host_ptr
argument value specified when <em>memobj</em> is created.
Otherwise a <code>NULL</code> value is returned.
</p><p class="tableblock"> If <em>memobj</em> is created with <strong>clCreateSubBuffer</strong>, return the host_ptr
+ origin value specified when <em>memobj</em> is created.
host_ptr is the argument value specified to <strong>clCreateBuffer</strong> and
CL_MEM_USE_HOST_PTR is specified in mem_flags for memory object from
which <em>memobj</em> is created.
Otherwise a <code>NULL</code> value is returned.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MEM_MAP_COUNT</strong><sup>11</sup></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Map count.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MEM_REFERENCE_COUNT</strong><sup>12</sup></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Return <em>memobj</em> reference count.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MEM_CONTEXT</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_context</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Return context specified when memory object is created.
If <em>memobj</em> is created using <strong>clCreateSubBuffer</strong>, the context
associated with the memory object specified as the <em>buffer</em> argument
to <strong>clCreateSubBuffer</strong> is returned.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MEM_ASSOCIATED_MEMOBJECT</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_mem</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Return memory object from which <em>memobj</em> is created.
This returns the memory object specified as the <em>buffer</em> argument to
<strong>clCreateSubBuffer</strong> if <em>memobj</em> is a sub-buffer object created using
<strong>clCreateSubBuffer</strong>.
</p><p class="tableblock"> This returns the mem_object specified in cl_image_desc if <em>memobj</em>
is an image object.
</p><p class="tableblock"> Otherwise a <code>NULL</code> value is returned.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MEM_OFFSET</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">size_t</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Return offset if <em>memobj</em> is a sub-buffer object created using
<strong>clCreateSubBuffer</strong>.
</p><p class="tableblock"> This returns 0 if <em>memobj</em> is not a sub-buffer object.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MEM_USES_SVM_POINTER</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_bool</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Return CL_TRUE if <em>memobj</em> is a buffer object that was created with
CL_MEM_USE_HOST_PTR or is a sub-buffer object of a buffer object
that was created with CL_MEM_USE_HOST_PTR and the <em>host_ptr</em>
specified when the buffer object was created is a SVM pointer;
otherwise returns CL_FALSE.</p></td>
</tr>
</tbody>
</table>
<div class="dlist">
<dl>
<dt class="hdlist1">11</dt>
<dd>
<p>The map count returned should be considered immediately stale.
It is unsuitable for general use in applications.
This feature is provided for debugging.</p>
</dd>
<dt class="hdlist1">12</dt>
<dd>
<p>The reference count returned should be considered immediately stale.
It is unsuitable for general use in applications.
This feature is provided for identifying memory leaks.</p>
</dd>
</dl>
</div>
</div>
</div>
<div class="sect2">
<h3 id="_shared_virtual_memory">5.6. Shared Virtual Memory</h3>
<div class="paragraph">
<p>OpenCL 2.2 adds support for shared virtual memory (a.k.a.
SVM).
SVM allows the host and kernels executing on devices to directly share
complex, pointer-containing data structures such as trees and linked lists.
It also eliminates the need to marshal data between the host and devices.
As a result, SVM substantially simplifies OpenCL programming and may improve
performance.</p>
</div>
<div class="sect3">
<h4 id="_svm_sharing_granularity_coarse_and_fine_grained_sharing">5.6.1. SVM sharing granularity: coarse- and fine-grained sharing</h4>
<div class="paragraph">
<p>OpenCL maintains memory consistency in a coarse-grained fashion in regions
of buffers.
We call this coarse-grained sharing.
Many platforms such as those with integrated CPU-GPU processors and ones
using the SVM-related PCI-SIG IOMMU services can do better, and can support
sharing at a granularity smaller than a buffer.
We call this fine-grained sharing.
OpenCL 2.2 requires that the host and all OpenCL 2.2 devices support
coarse-grained sharing at a minimum.</p>
</div>
<div class="ulist">
<ul>
<li>
<p>Coarse-grained sharing: Coarse-grain sharing may be used for memory and
virtual pointer sharing between multiple devices as well as between the
host and one or more devices.
The shared memory region is a memory buffer allocated using
<strong>clSVMAlloc</strong>.
Memory consistency is guaranteed at synchronization points and the host
can use calls to <strong>clEnqueueSVMMap</strong> and <strong>clEnqueueSVMUnmap</strong> or create a
cl_mem buffer object using the SVM pointer and use OpenCL's existing host
API functions <strong>clEnqueueMapBuffer</strong> and <strong>clEnqueueUnmapMemObject</strong> to
update regions of the buffer.
What coarse-grain buffer SVM adds to OpenCL's earlier buffer support are
the ability to share virtual memory pointers and a guarantee that
concurrent access to the same memory allocation from multiple kernels on
a single device is valid.
The coarse-grain buffer SVM provides a memory consistency model similar
to the global memory consistency model described in <em>sections 3.3.1</em> and
<em>3.4.3</em> of the OpenCL 1.2 specification.
This memory consistency applies to the regions of buffers being shared
in a coarse-grained fashion.
It is enforced at the synchronization points between commands enqueued
to command queues in a single context with the additional consideration
that multiple kernels concurrently running on the same device may safely
share the data.</p>
</li>
<li>
<p>Fine-grained sharing: Shared virtual memory where memory consistency is
maintained at a granularity smaller than a buffer.
How fine-grained SVM is used depends on whether the device supports SVM
atomic operations.</p>
<div class="ulist">
<ul>
<li>
<p>If SVM atomic operations are supported, they provide memory consistency
for loads and stores by the host and kernels executing on devices
supporting SVM.
This means that the host and devices can concurrently read and update
the same memory.
The consistency provided by SVM atomics is in addition to the
consistency provided at synchronization points.
There is no need for explicit calls to <strong>clEnqueueSVMMap</strong> and
<strong>clEnqueueSVMUnmap</strong> or <strong>clEnqueueMapBuffer</strong> and
<strong>clEnqueueUnmapMemObject</strong> on a cl_mem buffer object created using the
SVM pointer.</p>
</li>
<li>
<p>If SVM atomic operations are not supported, the host and devices can
concurrently read the same memory locations and can concurrently update
non-overlapping memory regions, but attempts to update the same memory
locations are undefined.
Memory consistency is guaranteed at synchronization points without the
need for explicit calls to <strong>clEnqueueSVMMap</strong> and <strong>clEnqueueSVMUnmap</strong>
or <strong>clEnqueueMapBuffer</strong> and <strong>clEnqueueUnmapMemObject</strong> on a cl_mem
buffer object created using the SVM pointer.</p>
</li>
</ul>
</div>
</li>
<li>
<p>There are two kinds of fine-grain sharing support.
Devices may support either fine-grain buffer sharing or fine-grain
system sharing.</p>
<div class="ulist">
<ul>
<li>
<p>Fine-grain buffer sharing provides fine-grain SVM only within buffers
and is an extension of coarse-grain sharing.
To support fine-grain buffer sharing in an OpenCL context, all devices
in the context must support CL_DEVICE_SVM_FINE_GRAIN_BUFFER.</p>
</li>
<li>
<p>Fine-grain system sharing enables fine-grain sharing of the host&#8217;s
entire virtual memory, including memory regions allocated by the system
<strong>malloc</strong> API.
OpenCL buffer objects are unnecessary and programmers can pass pointers
allocated using <strong>malloc</strong> to OpenCL kernels.</p>
</li>
</ul>
</div>
</li>
</ul>
</div>
<div class="paragraph">
<p>As an illustration of fine-grain SVM using SVM atomic operations to maintain
memory consistency, consider the following example.
The host and a set of devices can simultaneously access and update a shared
work-queue data structure holding work-items to be done.
The host can use atomic operations to insert new work-items into the queue
while the devices use similar atomic operations to remove work-items for
processing.</p>
</div>
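<div class="paragraph">
<p>The work-queue scenario above can be sketched on the host side with C11
atomics.
This is a minimal illustration, not part of the OpenCL API: the
<code>work_queue</code> type, its capacity and the function names are hypothetical,
and the sketch is simplified to a single producer and a single consumer.
In a real application the structure would live in an SVM allocation that
supports SVM atomic operations, and the consumer side would be written with
the atomic functions of the OpenCL kernel language.</p>
</div>

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical capacity; a power of two keeps the index arithmetic
   correct when the unsigned counters wrap around. */
#define WORK_QUEUE_CAPACITY 8u

typedef struct {
    int items[WORK_QUEUE_CAPACITY]; /* payload: plain work-item ids */
    atomic_uint head;               /* count of items removed so far */
    atomic_uint tail;               /* count of items inserted so far */
} work_queue;

static void work_queue_init(work_queue *q)
{
    atomic_init(&q->head, 0u);
    atomic_init(&q->tail, 0u);
}

/* Producer side (the host in the scenario above).
   Returns false when the queue is full. */
static bool work_queue_push(work_queue *q, int item)
{
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (tail - head == WORK_QUEUE_CAPACITY)
        return false;
    q->items[tail % WORK_QUEUE_CAPACITY] = item;
    /* Release: the item store becomes visible before the new tail. */
    atomic_store_explicit(&q->tail, tail + 1u, memory_order_release);
    return true;
}

/* Consumer side (a device in the scenario above).
   Returns false when the queue is empty. */
static bool work_queue_pop(work_queue *q, int *item)
{
    unsigned head = atomic_load_explicit(&q->head, memory_order_relaxed);
    /* Acquire: pairs with the release store in work_queue_push. */
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head == tail)
        return false;
    *item = q->items[head % WORK_QUEUE_CAPACITY];
    /* Release: the slot is handed back to the producer. */
    atomic_store_explicit(&q->head, head + 1u, memory_order_release);
    return true;
}
```

<div class="paragraph">
<p>Supporting several concurrent producers or consumers would need
compare-and-swap loops on the counters, which this sketch deliberately
leaves out.</p>
</div>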
<div class="paragraph">
<p>It is the programmer&#8217;s responsibility to ensure that no host code or
executing kernels attempt to access a shared memory region after that memory
is freed.
We require the SVM implementation to work with either 32- or 64-bit host
applications subject to the following requirement: the address space size
must be the same for the host and all OpenCL devices in the context.</p>
</div>
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c"><span class="directive">void</span>* clSVMAlloc(cl_context context,
cl_svm_mem_flags flags,
size_t size,
cl_uint alignment)</code></pre>
</div>
</div>
<div class="paragraph">
<p>allocates a shared virtual memory buffer (referred to as a SVM buffer) that
can be shared by the host and all devices in an OpenCL context that support
shared virtual memory.</p>
</div>
<div class="paragraph">
<p><em>context</em> is a valid OpenCL context used to create the SVM buffer.</p>
</div>
<div class="paragraph">
<p><em>flags</em> is a bit-field that is used to specify allocation and usage
information.
The <a href="#svm-flags-table">SVM Memory Flags</a> table describes the possible values
for <em>flags</em>.</p>
</div>
<table id="svm-flags-table" class="tableblock frame-all grid-all spread">
<caption class="title">Table 23. List of supported cl_svm_mem_flags values</caption>
<colgroup>
<col style="width: 50%;">
<col style="width: 50%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top"><strong>cl_svm_mem_flags</strong></th>
<th class="tableblock halign-left valign-top">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MEM_READ_WRITE</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">This flag specifies that the SVM buffer will be read and written by a
kernel.
This is the default.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MEM_WRITE_ONLY</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">This flag specifies that the SVM buffer will be written but not read by
a kernel.
</p><p class="tableblock"> Reading from a SVM buffer created with CL_MEM_WRITE_ONLY inside a kernel
is undefined.
</p><p class="tableblock"> CL_MEM_READ_WRITE and CL_MEM_WRITE_ONLY are mutually exclusive.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MEM_READ_ONLY</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">This flag specifies that the SVM buffer object is a read-only memory
object when used inside a kernel.
</p><p class="tableblock"> Writing to a SVM buffer created with CL_MEM_READ_ONLY inside a kernel is
undefined.
</p><p class="tableblock"> CL_MEM_READ_WRITE or CL_MEM_WRITE_ONLY and CL_MEM_READ_ONLY are mutually
exclusive.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MEM_SVM_FINE_GRAIN_BUFFER</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">This specifies that the application wants the OpenCL implementation to
do a fine-grained allocation.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_MEM_SVM_ATOMICS</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">This flag is valid only if CL_MEM_SVM_FINE_GRAIN_BUFFER is specified in
flags.
It is used to indicate that SVM atomic operations can control visibility
of memory accesses in this SVM buffer.</p></td>
</tr>
</tbody>
</table>
<div class="paragraph">
<p>If CL_MEM_SVM_FINE_GRAIN_BUFFER is not specified, the buffer can be created
as a coarse-grained SVM allocation.
Similarly, if CL_MEM_SVM_ATOMICS is not specified, the buffer can be created
without support for SVM atomic operations (refer to the OpenCL kernel
language specification).</p>
</div>
<div class="paragraph">
<p><em>size</em> is the size in bytes of the SVM buffer to be allocated.</p>
</div>
<div class="paragraph">
<p><em>alignment</em> is the minimum alignment in bytes that is required for the newly
created buffer&#8217;s memory region.
It must be a power of two no larger than the size of the largest data type
supported by the OpenCL device.
For the full profile, the largest data type is long16.
For the embedded profile, it is long16 if the device supports 64-bit
integers; otherwise it is int16.
If <em>alignment</em> is 0, a default alignment will be used that is equal to the
size of the largest data type supported by the OpenCL implementation.</p>
</div>
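<div class="paragraph">
<p>The alignment rule above can be checked before calling <strong>clSVMAlloc</strong>.
The following is a minimal sketch: the helper name and the
<code>SKETCH_</code> constants are hypothetical, and the limits are derived from the
sizes of long16 (16 x 8 bytes) and int16 (16 x 4 bytes).
An implementation may still reject an alignment that passes this check.</p>
</div>

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical constants: sizes in bytes of the largest data types
   named in the text above. */
#define SKETCH_SIZEOF_LONG16 128u /* 16 elements * 8 bytes */
#define SKETCH_SIZEOF_INT16   64u /* 16 elements * 4 bytes */

/* Returns true if `alignment` is 0 (use the default) or a power of two
   no larger than `max_alignment`. */
static bool svm_alignment_is_valid(size_t alignment, size_t max_alignment)
{
    if (alignment == 0u)
        return true;
    /* A power of two has exactly one bit set. */
    bool power_of_two = (alignment & (alignment - 1u)) == 0u;
    return power_of_two && alignment <= max_alignment;
}
```

<div class="paragraph">
<p>For a full-profile device one would pass <code>SKETCH_SIZEOF_LONG16</code> as the
limit; for an embedded-profile device without 64-bit integers,
<code>SKETCH_SIZEOF_INT16</code>.</p>
</div>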
<div class="paragraph">
<p><strong>clSVMAlloc</strong> returns a valid non-<code>NULL</code> shared virtual memory address if the
SVM buffer is successfully allocated.
Otherwise, like <strong>malloc</strong>, it returns a <code>NULL</code> pointer value.
<strong>clSVMAlloc</strong> will fail if</p>
</div>
<div class="ulist">
<ul>
<li>
<p><em>context</em> is not a valid context.</p>
</li>
<li>
<p><em>flags</em> does not contain CL_MEM_SVM_FINE_GRAIN_BUFFER but does contain
CL_MEM_SVM_ATOMICS.</p>
</li>
<li>
<p>Values specified in <em>flags</em> do not follow rules described for supported
values in the <a href="#svm-flags-table">SVM Memory Flags</a> table.</p>
</li>
<li>
<p>CL_MEM_SVM_FINE_GRAIN_BUFFER or CL_MEM_SVM_ATOMICS is specified in
<em>flags</em> and these are not supported by at least one device in <em>context</em>.</p>
</li>
<li>
<p>The values specified in <em>flags</em> are not valid, i.e. don&#8217;t match those
defined in the <a href="#svm-flags-table">SVM Memory Flags</a> table.</p>
</li>
<li>
<p><em>size</em> is 0 or &gt; CL_DEVICE_MAX_MEM_ALLOC_SIZE value for any device in
<em>context</em>.</p>
</li>
<li>
<p><em>alignment</em> is not a power of two or the OpenCL implementation cannot
support the specified alignment for at least one device in <em>context</em>.</p>
</li>
<li>
<p>There was a failure to allocate resources.</p>
</li>
</ul>
</div>
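<div class="paragraph">
<p>The flag rules listed above can be expressed as a small host-side check.
This is a sketch under stated assumptions: the <code>SKETCH_</code> names are
local stand-ins so the code compiles without the OpenCL headers, and their
bit values are assumed to mirror the corresponding <code>CL_MEM_*</code> definitions
in <code>CL/cl.h</code>.
It covers only the rules on <em>flags</em> itself, not device support.</p>
</div>

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for cl_svm_mem_flags (a 64-bit bit-field). */
typedef uint64_t sketch_svm_flags;

/* Assumed to mirror the CL_MEM_* values in CL/cl.h. */
#define SKETCH_READ_WRITE        ((sketch_svm_flags)1 << 0)
#define SKETCH_WRITE_ONLY        ((sketch_svm_flags)1 << 1)
#define SKETCH_READ_ONLY         ((sketch_svm_flags)1 << 2)
#define SKETCH_FINE_GRAIN_BUFFER ((sketch_svm_flags)1 << 10)
#define SKETCH_ATOMICS           ((sketch_svm_flags)1 << 11)

static bool svm_flags_are_valid(sketch_svm_flags flags)
{
    const sketch_svm_flags access_mask =
        SKETCH_READ_WRITE | SKETCH_WRITE_ONLY | SKETCH_READ_ONLY;
    const sketch_svm_flags known_mask =
        access_mask | SKETCH_FINE_GRAIN_BUFFER | SKETCH_ATOMICS;

    /* Bits outside the SVM Memory Flags table are not valid. */
    if ((flags & ~known_mask) != 0)
        return false;

    /* The three access flags are mutually exclusive; none set means
       the default (read-write). */
    sketch_svm_flags access = flags & access_mask;
    if ((access & (access - 1)) != 0)
        return false;

    /* Atomics are valid only together with fine-grain buffer. */
    if ((flags & SKETCH_ATOMICS) != 0 &&
        (flags & SKETCH_FINE_GRAIN_BUFFER) == 0)
        return false;

    return true;
}
```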
<div class="paragraph">
<p>Calling <strong>clSVMAlloc</strong> does not itself provide consistency for the shared
memory region.
When the host can&#8217;t use the SVM atomic operations, it must rely on OpenCL&#8217;s
guaranteed memory consistency at synchronization points.</p>
</div>
<div class="paragraph">
<p>For SVM to be used efficiently, the host and any devices sharing a buffer
containing virtual memory pointers should have the same endianness.
If the context passed to <strong>clSVMAlloc</strong> has devices with mixed endianness and
the OpenCL implementation is unable to implement SVM because of that mixed
endianness, <strong>clSVMAlloc</strong> will fail and return <code>NULL</code>.</p>
</div>
<div class="paragraph">
<p>Although SVM is generally not supported for image objects, <strong>clCreateImage</strong>
may create an image from a buffer (a 1D image from a buffer or a 2D image
from a buffer) if the buffer specified in its image description parameter is a
SVM buffer.
Such images have a linear memory representation so their memory can be
shared using SVM.
However, fine-grained sharing and atomics are not supported for image reads
and writes in a kernel.</p>
</div>
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c"><span class="directive">void</span> clSVMFree(cl_context context,
<span class="directive">void</span> *svm_pointer)</code></pre>
</div>
</div>
<div class="paragraph">
<p>frees a shared virtual memory buffer allocated using <strong>clSVMAlloc</strong>.</p>
</div>
<div class="paragraph">
<p><em>context</em> is a valid OpenCL context used to create the SVM buffer.</p>
</div>
<div class="paragraph">
<p><em>svm_pointer</em> must be the value returned by a call to <strong>clSVMAlloc</strong>.
If a <code>NULL</code> pointer is passed in <em>svm_pointer</em>, no action occurs.</p>
</div>
<div class="paragraph">
<p>Note that <strong>clSVMFree</strong> does not wait for previously enqueued commands that
may be using <em>svm_pointer</em> to finish before freeing <em>svm_pointer</em>.
It is the responsibility of the application to make sure that enqueued
commands that use <em>svm_pointer</em> have finished before freeing <em>svm_pointer</em>.
This can be done by calling a blocking operation such as <strong>clFinish</strong>,
<strong>clWaitForEvents</strong>, or <strong>clEnqueueReadBuffer</strong>, or by registering a callback with
the events associated with the enqueued commands and freeing <em>svm_pointer</em>
once the last enqueued command has finished.</p>
</div>
<div class="paragraph">
<p>The behavior of using <em>svm_pointer</em> after it has been freed is undefined.
In addition, if a buffer object is created using <strong>clCreateBuffer</strong> with
<em>svm_pointer</em>, the buffer object must first be released before the
<em>svm_pointer</em> is freed.</p>
</div>
<div class="paragraph">
<p>The <strong>clEnqueueSVMFree</strong> API can also be used to enqueue a callback to free
the shared virtual memory buffer allocated using <strong>clSVMAlloc</strong> or a shared
system memory pointer.</p>
</div>
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clEnqueueSVMFree(cl_command_queue command_queue,
cl_uint num_svm_pointers,
<span class="directive">void</span> *svm_pointers[],
<span class="directive">void</span> (CL_CALLBACK *pfn_free_func)
(cl_command_queue queue,
cl_uint num_svm_pointers,
<span class="directive">void</span> *svm_pointers[],
<span class="directive">void</span> *user_data),
<span class="directive">void</span> *user_data,
cl_uint num_events_in_wait_list,
<span class="directive">const</span> cl_event *event_wait_list,
cl_event *event)</code></pre>
</div>
</div>
<div class="paragraph">
<p>enqueues a command to free the shared virtual memory allocated using
<strong>clSVMAlloc</strong> or a shared system memory pointer.</p>
</div>
<div class="paragraph">
<p><em>command_queue</em> is a valid host command-queue.</p>
</div>
<div class="paragraph">
<p><em>svm_pointers</em> and <em>num_svm_pointers</em> specify shared virtual memory pointers
to be freed.
Each pointer in <em>svm_pointers</em> that was allocated using <strong>clSVMAlloc</strong> must
have been allocated from the same context from which <em>command_queue</em> was
created.
The memory associated with <em>svm_pointers</em> can be reused or freed after the
function returns.</p>
</div>
<div class="paragraph">
<p><em>pfn_free_func</em> specifies the callback function to be called to free the SVM
pointers.
<em>pfn_free_func</em> takes four arguments: <em>queue</em> which is the command queue in
which <strong>clEnqueueSVMFree</strong> was enqueued, the count and list of SVM pointers to
free and <em>user_data</em> which is a pointer to user specified data.
If <em>pfn_free_func</em> is <code>NULL</code>, all pointers specified in <em>svm_pointers</em> must
be allocated using <strong>clSVMAlloc</strong> and the OpenCL implementation will free
these SVM pointers.
<em>pfn_free_func</em> must be a valid callback function if any SVM pointer to be
freed is a shared system memory pointer i.e. not allocated using
<strong>clSVMAlloc</strong>.
If <em>pfn_free_func</em> is a valid callback function, the OpenCL implementation
will call <em>pfn_free_func</em> to free all the SVM pointers specified in
<em>svm_pointers</em>.</p>
</div>
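<div class="paragraph">
<p>A callback suitable for <em>pfn_free_func</em> might look like the following
sketch.
It assumes that every pointer passed to it is a shared system memory
pointer obtained from <strong>malloc</strong>, and that <em>user_data</em>, if not <code>NULL</code>,
points to a counter the application wants updated; both are choices made
for this illustration.
The type definitions at the top are stand-ins so the snippet compiles
without the OpenCL headers; a real program would include <code>CL/cl.h</code>
instead.</p>
</div>

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins so the sketch compiles without the OpenCL headers.
   In a real program these come from CL/cl.h. */
typedef struct _cl_command_queue *cl_command_queue;
typedef uint32_t cl_uint;
#ifndef CL_CALLBACK
#define CL_CALLBACK
#endif

/* Frees pointers that were allocated with malloc.
   user_data (optional) is treated as a cl_uint counter of freed
   pointers; this convention is hypothetical. */
static void CL_CALLBACK free_system_pointers(cl_command_queue queue,
                                             cl_uint num_svm_pointers,
                                             void *svm_pointers[],
                                             void *user_data)
{
    (void)queue; /* not needed for plain malloc allocations */

    for (cl_uint i = 0; i < num_svm_pointers; ++i)
        free(svm_pointers[i]);

    if (user_data != NULL)
        *(cl_uint *)user_data += num_svm_pointers;
}
```

<div class="paragraph">
<p>Such a function would be passed as <em>pfn_free_func</em> together with the
address of the counter as <em>user_data</em>.</p>
</div>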
<div class="paragraph">
<p><em>user_data</em> will be passed as the <em>user_data</em> argument when <em>pfn_free_func</em>
is called.
<em>user_data</em> can be <code>NULL</code>.</p>
</div>
<div class="paragraph">
<p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need to
complete before <strong>clEnqueueSVMFree</strong> can be executed.
If <em>event_wait_list</em> is <code>NULL</code>, then <strong>clEnqueueSVMFree</strong> does not wait on any
event to complete.
If <em>event_wait_list</em> is <code>NULL</code>, <em>num_events_in_wait_list</em> must be 0.
If <em>event_wait_list</em> is not <code>NULL</code>, the list of events pointed to by
<em>event_wait_list</em> must be valid and <em>num_events_in_wait_list</em> must be
greater than 0.
The events specified in <em>event_wait_list</em> act as synchronization points.
The context associated with events in <em>event_wait_list</em> and <em>command_queue</em>
must be the same.
The memory associated with <em>event_wait_list</em> can be reused or freed after
the function returns.</p>
</div>
<div class="paragraph">
<p><em>event</em> returns an event object that identifies this particular command and
can be used to query or queue a wait for this particular command to
complete.
<em>event</em> can be <code>NULL</code> in which case it will not be possible for the
application to query the status of this command or queue a wait for this
command to complete.
If the <em>event_wait_list</em> and the <em>event</em> arguments are not <code>NULL</code>, the
<em>event</em> argument should not refer to an element of the <em>event_wait_list</em>
array.</p>
</div>
<div class="paragraph">
<p><strong>clEnqueueSVMFree</strong> returns CL_SUCCESS if the function is executed
successfully.
Otherwise, it returns one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not a valid host
command-queue.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>num_svm_pointers</em> is 0 and <em>svm_pointers</em> is
non-<code>NULL</code>, <em>or</em> if <em>svm_pointers</em> is <code>NULL</code> and <em>num_svm_pointers</em> is
not 0.</p>
</li>
<li>
<p>CL_INVALID_EVENT_WAIT_LIST if <em>event_wait_list</em> is <code>NULL</code> and
<em>num_events_in_wait_list</em> &gt; 0, or <em>event_wait_list</em> is not <code>NULL</code> and
<em>num_events_in_wait_list</em> is 0, or if event objects in <em>event_wait_list</em>
are not valid events.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
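<div class="paragraph">
<p>The argument rules behind CL_INVALID_VALUE and CL_INVALID_EVENT_WAIT_LIST
above amount to requiring that each count and its list agree.
The following sketch shows that check in isolation.
The function name and the <code>SKETCH_</code> codes are hypothetical stand-ins
(their values are assumed to mirror the corresponding error codes in
<code>CL/cl.h</code>), and the check does not validate the event objects
themselves.</p>
</div>

```c
#include <stddef.h>

/* Stand-ins assumed to mirror the error codes in CL/cl.h. */
enum {
    SKETCH_SUCCESS = 0,
    SKETCH_INVALID_VALUE = -30,
    SKETCH_INVALID_EVENT_WAIT_LIST = -57
};

/* Checks that each count is zero exactly when its list is NULL. */
static int check_svm_free_args(unsigned num_svm_pointers,
                               void *const *svm_pointers,
                               unsigned num_events_in_wait_list,
                               const void *event_wait_list)
{
    /* A zero count with a non-NULL list, or a non-zero count with a
       NULL list, is an invalid value. */
    if ((num_svm_pointers == 0u) != (svm_pointers == NULL))
        return SKETCH_INVALID_VALUE;

    /* The same pairing rule applies to the event wait list. */
    if ((num_events_in_wait_list == 0u) != (event_wait_list == NULL))
        return SKETCH_INVALID_EVENT_WAIT_LIST;

    return SKETCH_SUCCESS;
}
```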
<div class="paragraph">
<p>The following function enqueues a command to do a memcpy operation.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clEnqueueSVMMemcpy(cl_command_queue command_queue,
cl_bool blocking_copy,
<span class="directive">void</span> *dst_ptr,
<span class="directive">const</span> <span class="directive">void</span> *src_ptr,
size_t size,
cl_uint num_events_in_wait_list,
<span class="directive">const</span> cl_event *event_wait_list,
cl_event *event)</code></pre>
</div>
</div>
<div class="paragraph">
<p><em>command_queue</em> refers to the host command-queue in which the copy
command will be queued.
If either <em>dst_ptr</em> or <em>src_ptr</em> is allocated using <strong>clSVMAlloc</strong> then the
OpenCL context it was allocated from must match that of <em>command_queue</em>.</p>
</div>
<div class="paragraph">
<p><em>blocking_copy</em> indicates if the copy operation is <em>blocking</em> or
<em>non-blocking</em>.</p>
</div>
<div class="paragraph">
<p>If <em>blocking_copy</em> is CL_TRUE i.e. the copy command is blocking,
<strong>clEnqueueSVMMemcpy</strong> does not return until the buffer data has been copied
into memory pointed to by <em>dst_ptr</em>.</p>
</div>
<div class="paragraph">
<p>If <em>blocking_copy</em> is CL_FALSE i.e. the copy command is non-blocking,
<strong>clEnqueueSVMMemcpy</strong> queues a non-blocking copy command and returns.
The contents of the buffer that <em>dst_ptr</em> points to cannot be used until the
copy command has completed.
The <em>event</em> argument returns an event object which can be used to query the
execution status of the copy command.
When the copy command has completed, the contents of the buffer that
<em>dst_ptr</em> points to can be used by the application.</p>
</div>
<div class="paragraph">
<p><em>size</em> is the size in bytes of data being copied.</p>
</div>
<div class="paragraph">
<p><em>dst_ptr</em> is the pointer to a host or SVM memory allocation where data is
copied to.</p>
</div>
<div class="paragraph">
<p><em>src_ptr</em> is the pointer to a host or SVM memory allocation where data is
copied from.</p>
</div>
<div class="paragraph">
<p>If the memory allocation(s) containing <em>dst_ptr</em> and/or <em>src_ptr</em> are
allocated using <strong>clSVMAlloc</strong> and either is not allocated from the same
context from which <em>command_queue</em> was created, the behavior is undefined.</p>
</div>
<div class="paragraph">
<p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need to
complete before this particular command can be executed.
If <em>event_wait_list</em> is <code>NULL</code>, then this particular command does not wait
on any event to complete.
If <em>event_wait_list</em> is <code>NULL</code>, <em>num_events_in_wait_list</em> must be 0.
If <em>event_wait_list</em> is not <code>NULL</code>, the list of events pointed to by
<em>event_wait_list</em> must be valid and <em>num_events_in_wait_list</em> must be
greater than 0.
The events specified in <em>event_wait_list</em> act as synchronization points.
The context associated with events in <em>event_wait_list</em> and <em>command_queue</em>
must be the same.
The memory associated with <em>event_wait_list</em> can be reused or freed after
the function returns.</p>
</div>
<div class="paragraph">
<p><em>event</em> returns an event object that identifies this particular copy
command and can be used to query or queue a wait for this particular command
to complete.
<em>event</em> can be <code>NULL</code> in which case it will not be possible for the
application to query the status of this command or queue a wait for this
command to complete.
If the <em>event_wait_list</em> and the <em>event</em> arguments are not <code>NULL</code>, the
<em>event</em> argument should not refer to an element of the <em>event_wait_list</em>
array.</p>
</div>
<div class="paragraph">
<p><strong>clEnqueueSVMMemcpy</strong> returns CL_SUCCESS if the function is executed
successfully.
Otherwise, it returns one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not a valid host
command-queue.</p>
</li>
<li>
<p>CL_INVALID_CONTEXT if the context associated with <em>command_queue</em> and
events in <em>event_wait_list</em> are not the same.</p>
</li>
<li>
<p>CL_INVALID_EVENT_WAIT_LIST if <em>event_wait_list</em> is <code>NULL</code> and
<em>num_events_in_wait_list</em> &gt; 0, or <em>event_wait_list</em> is not <code>NULL</code> and
<em>num_events_in_wait_list</em> is 0, or if event objects in <em>event_wait_list</em>
are not valid events.</p>
</li>
<li>
<p>CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST if the copy operation is
blocking and the execution status of any of the events in
<em>event_wait_list</em> is a negative integer value.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>dst_ptr</em> or <em>src_ptr</em> is <code>NULL</code>.</p>
</li>
<li>
<p>CL_MEM_COPY_OVERLAP if the values specified for <em>dst_ptr</em>, <em>src_ptr</em> and
<em>size</em> result in an overlapping copy.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
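<div class="paragraph">
<p>The CL_MEM_COPY_OVERLAP condition above can be tested by the application
before enqueuing a copy.
The helper below is a minimal sketch with a hypothetical name; it compares
the two byte ranges through <code>uintptr_t</code>, which assumes a flat address
space in which such integer comparisons are meaningful.</p>
</div>

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if [dst, dst + size) and [src, src + size) share at
   least one byte. */
static bool svm_copy_overlaps(const void *dst, const void *src, size_t size)
{
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;

    if (size == 0u)
        return false; /* empty ranges cannot overlap */

    /* Two equal-length ranges overlap when their start addresses are
       closer together than the length. */
    return d < s ? (s - d) < size : (d - s) < size;
}
```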
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clEnqueueSVMMemFill(cl_command_queue command_queue,
<span class="directive">void</span> *svm_ptr,
<span class="directive">const</span> <span class="directive">void</span> *pattern,
size_t pattern_size,
size_t size,
cl_uint num_events_in_wait_list,
<span class="directive">const</span> cl_event *event_wait_list,
cl_event *event)</code></pre>
</div>
</div>
<div class="paragraph">
<p>enqueues a command to fill a region in memory with a pattern of a given
pattern size.</p>
</div>
<div class="paragraph">
<p><em>command_queue</em> refers to the host command-queue in which the fill command
will be queued.
The OpenCL context associated with <em>command_queue</em> and SVM pointer referred
to by <em>svm_ptr</em> must be the same.</p>
</div>
<div class="paragraph">
<p><em>svm_ptr</em> is a pointer to a memory region that will be filled with
<em>pattern</em>.
It must be aligned to <em>pattern_size</em> bytes.
If <em>svm_ptr</em> is allocated using <strong>clSVMAlloc</strong> then it must be allocated from
the same context from which <em>command_queue</em> was created.
Otherwise the behavior is undefined.</p>
</div>
<div class="paragraph">
<p><em>pattern</em> is a pointer to the data pattern of size <em>pattern_size</em> in bytes.
<em>pattern</em> will be used to fill a region that starts at <em>svm_ptr</em>
and is <em>size</em> bytes in size.
The data pattern must be a scalar or vector integer or floating-point data
type supported by OpenCL as described in <a href="#scalar-data-types">Shared
Application Scalar Data Types</a> and <a href="#vector-data-types">Supported
Application Vector Data Types</a>.
For example, if the region pointed to by <em>svm_ptr</em> is to be filled with a
pattern of float4 values, then <em>pattern</em> will be a pointer to a cl_float4
value and <em>pattern_size</em> will be <code>sizeof(cl_float4)</code>.
The maximum value of <em>pattern_size</em> is the size of the largest integer or
floating-point vector data type supported by the OpenCL device.
The memory associated with <em>pattern</em> can be reused or freed after the
function returns.</p>
</div>
<div class="paragraph">
<p><em>size</em> is the size in bytes of the region being filled starting with <em>svm_ptr</em>
and must be a multiple of <em>pattern_size</em>.</p>
</div>
<div class="paragraph">
<p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need to
complete before this particular command can be executed.
If <em>event_wait_list</em> is <code>NULL</code>, then this particular command does not wait
on any event to complete.
If <em>event_wait_list</em> is <code>NULL</code>, <em>num_events_in_wait_list</em> must be 0.
If <em>event_wait_list</em> is not <code>NULL</code>, the list of events pointed to by
<em>event_wait_list</em> must be valid and <em>num_events_in_wait_list</em> must be
greater than 0.
The events specified in <em>event_wait_list</em> act as synchronization points.
The context associated with events in <em>event_wait_list</em> and <em>command_queue</em>
must be the same.
The memory associated with <em>event_wait_list</em> can be reused or freed after
the function returns.</p>
</div>
<div class="paragraph">
<p><em>event</em> returns an event object that identifies this particular command and
can be used to query or queue a wait for this particular command to
complete.
<em>event</em> can be <code>NULL</code> in which case it will not be possible for the
application to query the status of this command or queue a wait for this
command to complete.
<strong>clEnqueueBarrierWithWaitList</strong> can be used instead.
If the <em>event_wait_list</em> and the <em>event</em> arguments are not <code>NULL</code>, the
<em>event</em> argument should not refer to an element of the <em>event_wait_list</em>
array.</p>
</div>
<div class="paragraph">
<p><strong>clEnqueueSVMMemFill</strong> returns CL_SUCCESS if the function is executed
successfully.
Otherwise, it returns one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not a valid host
command-queue.</p>
</li>
<li>
<p>CL_INVALID_CONTEXT if the context associated with <em>command_queue</em> and
events in <em>event_wait_list</em> are not the same.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>svm_ptr</em> is <code>NULL</code>.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>svm_ptr</em> is not aligned to <em>pattern_size</em> bytes.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>pattern</em> is <code>NULL</code> or if <em>pattern_size</em> is 0 or if
<em>pattern_size</em> is not one of {1, 2, 4, 8, 16, 32, 64, 128}.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>size</em> is not a multiple of <em>pattern_size</em>.</p>
</li>
<li>
<p>CL_INVALID_EVENT_WAIT_LIST if <em>event_wait_list</em> is <code>NULL</code> and
<em>num_events_in_wait_list</em> &gt; 0, or <em>event_wait_list</em> is not <code>NULL</code> and
<em>num_events_in_wait_list</em> is 0, or if event objects in <em>event_wait_list</em>
are not valid events.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
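<div class="paragraph">
<p>The fill semantics and the CL_INVALID_VALUE conditions above can be
modelled on the host as follows.
This is a reference sketch, not the implementation of
<strong>clEnqueueSVMMemFill</strong>: the function names are hypothetical,
<code>SKETCH_INVALID_VALUE</code> is a stand-in assumed to mirror CL_INVALID_VALUE,
and the fill is performed synchronously with <strong>memcpy</strong>.</p>
</div>

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in assumed to mirror CL_INVALID_VALUE in CL/cl.h. */
#define SKETCH_INVALID_VALUE (-30)

/* pattern_size must be one of {1, 2, 4, 8, 16, 32, 64, 128}. */
static bool pattern_size_is_valid(size_t pattern_size)
{
    return pattern_size != 0u && pattern_size <= 128u &&
           (pattern_size & (pattern_size - 1u)) == 0u;
}

/* Fills `size` bytes at `svm_ptr` with repeated copies of `pattern`.
   Returns 0 on success or SKETCH_INVALID_VALUE if an argument breaks
   one of the rules listed above. */
static int host_pattern_fill(void *svm_ptr, const void *pattern,
                             size_t pattern_size, size_t size)
{
    if (svm_ptr == NULL || pattern == NULL)
        return SKETCH_INVALID_VALUE;
    if (!pattern_size_is_valid(pattern_size))
        return SKETCH_INVALID_VALUE;
    if ((uintptr_t)svm_ptr % pattern_size != 0u)
        return SKETCH_INVALID_VALUE; /* not aligned to pattern_size */
    if (size % pattern_size != 0u)
        return SKETCH_INVALID_VALUE; /* not a whole number of patterns */

    unsigned char *dst = svm_ptr;
    for (size_t offset = 0; offset < size; offset += pattern_size)
        memcpy(dst + offset, pattern, pattern_size);
    return 0;
}
```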
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clEnqueueSVMMap(cl_command_queue command_queue,
cl_bool blocking_map,
cl_map_flags map_flags,
<span class="directive">void</span> *svm_ptr,
size_t size,
cl_uint num_events_in_wait_list,
<span class="directive">const</span> cl_event *event_wait_list,
cl_event *event)</code></pre>
</div>
</div>
<div class="paragraph">
<p>enqueues a command that will allow the host to update a region of a SVM
buffer.
Note that since we are enqueuing a command with a SVM buffer, the region is
already mapped in the host address space.</p>
</div>
<div class="paragraph">
<p><em>command_queue</em> must be a valid host command-queue.</p>
</div>
<div class="paragraph">
<p><em>blocking_map</em> indicates if the map operation is <em>blocking</em> or
<em>non-blocking</em>.</p>
</div>
<div class="paragraph">
<p>If <em>blocking_map</em> is CL_TRUE, <strong>clEnqueueSVMMap</strong> does not return until the
application can access the contents of the SVM region specified by <em>svm_ptr</em>
and <em>size</em> on the host.</p>
</div>
<div class="paragraph">
<p>If <em>blocking_map</em> is CL_FALSE i.e. the map operation is non-blocking, the region
specified by <em>svm_ptr</em> and <em>size</em> cannot be used until the map command has
completed.
The <em>event</em> argument returns an event object which can be used to query the
execution status of the map command.
When the map command is completed, the application can access the contents
of the region specified by <em>svm_ptr</em> and <em>size</em>.</p>
</div>
<div class="paragraph">
<p><em>map_flags</em> is a bit-field and is described in the
<a href="#memory-map-flags-table">Memory Map Flags</a> table.</p>
</div>
<div class="paragraph">
<p><em>svm_ptr</em> and <em>size</em> are a pointer to a memory region and size in bytes that
will be updated by the host.
If <em>svm_ptr</em> is allocated using <strong>clSVMAlloc</strong> then it must be allocated from
the same context from which <em>command_queue</em> was created.
Otherwise the behavior is undefined.</p>
</div>
<div class="paragraph">
<p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need to
complete before this particular command can be executed.
If <em>event_wait_list</em> is <code>NULL</code>, then this particular command does not wait
on any event to complete.
If <em>event_wait_list</em> is <code>NULL</code>, <em>num_events_in_wait_list</em> must be 0.
If <em>event_wait_list</em> is not <code>NULL</code>, the list of events pointed to by
<em>event_wait_list</em> must be valid and <em>num_events_in_wait_list</em> must be
greater than 0.
The events specified in <em>event_wait_list</em> act as synchronization points.
The context associated with events in <em>event_wait_list</em> and <em>command_queue</em>
must be the same.
The memory associated with <em>event_wait_list</em> can be reused or freed after
the function returns.</p>
</div>
<div class="paragraph">
<p><em>event</em> returns an event object that identifies this particular command and
can be used to query or queue a wait for this particular command to
complete.
<em>event</em> can be <code>NULL</code> in which case it will not be possible for the
application to query the status of this command or queue a wait for this
command to complete.
<strong>clEnqueueBarrierWithWaitList</strong> can be used instead.
If the <em>event_wait_list</em> and the <em>event</em> arguments are not <code>NULL</code>, the
<em>event</em> argument should not refer to an element of the <em>event_wait_list</em>
array.</p>
</div>
<div class="paragraph">
<p><strong>clEnqueueSVMMap</strong> returns CL_SUCCESS if the function is executed
successfully.
Otherwise, it returns one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not a valid host
command-queue.</p>
</li>
<li>
<p>CL_INVALID_CONTEXT if the context associated with <em>command_queue</em> and events
in <em>event_wait_list</em> are not the same.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>svm_ptr</em> is <code>NULL</code>.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>size</em> is 0 or if values specified in <em>map_flags</em>
are not valid.</p>
</li>
<li>
<p>CL_INVALID_EVENT_WAIT_LIST if <em>event_wait_list</em> is <code>NULL</code> and
<em>num_events_in_wait_list</em> &gt; 0, or <em>event_wait_list</em> is not <code>NULL</code> and
<em>num_events_in_wait_list</em> is 0, or if event objects in <em>event_wait_list</em>
are not valid events.</p>
</li>
<li>
<p>CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST if the map operation is
blocking and the execution status of any of the events in
<em>event_wait_list</em> is a negative integer value.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clEnqueueSVMUnmap(cl_command_queue command_queue,
<span class="directive">void</span> *svm_ptr,
cl_uint num_events_in_wait_list,
<span class="directive">const</span> cl_event *event_wait_list,
cl_event *event)</code></pre>
</div>
</div>
<div class="paragraph">
<p>enqueues a command to indicate that the host has completed updating the
region given by <em>svm_ptr</em>, which was specified in a previous call to
<strong>clEnqueueSVMMap</strong>.</p>
</div>
<div class="paragraph">
<p><em>command_queue</em> must be a valid host command-queue.</p>
</div>
<div class="paragraph">
<p><em>svm_ptr</em> is a pointer that was specified in a previous call to
<strong>clEnqueueSVMMap</strong>.
If <em>svm_ptr</em> is allocated using <strong>clSVMAlloc</strong> then it must be allocated from
the same context from which <em>command_queue</em> was created.
Otherwise the behavior is undefined.</p>
</div>
<div class="paragraph">
<p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need to
complete before <strong>clEnqueueSVMUnmap</strong> can be executed.
If <em>event_wait_list</em> is <code>NULL</code>, then <strong>clEnqueueSVMUnmap</strong> does not wait on any
event to complete.
If <em>event_wait_list</em> is <code>NULL</code>, <em>num_events_in_wait_list</em> must be 0.
If <em>event_wait_list</em> is not <code>NULL</code>, the list of events pointed to by
<em>event_wait_list</em> must be valid and <em>num_events_in_wait_list</em> must be
greater than 0.
The events specified in <em>event_wait_list</em> act as synchronization points.
The context associated with events in <em>event_wait_list</em> and <em>command_queue</em>
must be the same.
The memory associated with <em>event_wait_list</em> can be reused or freed after
the function returns.</p>
</div>
<div class="paragraph">
<p><em>event</em> returns an event object that identifies this particular command and
can be used to query or queue a wait for this particular command to
complete.
<em>event</em> can be <code>NULL</code> in which case it will not be possible for the
application to query the status of this command or queue a wait for this
command to complete.
<strong>clEnqueueBarrierWithWaitList</strong> can be used instead.
If the <em>event_wait_list</em> and the <em>event</em> arguments are not <code>NULL</code>, the
<em>event</em> argument should not refer to an element of the <em>event_wait_list</em>
array.</p>
</div>
<div class="paragraph">
<p><strong>clEnqueueSVMUnmap</strong> returns CL_SUCCESS if the function is executed
successfully.
Otherwise, it returns one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not a valid host
command-queue.</p>
</li>
<li>
<p>CL_INVALID_CONTEXT if the context associated with <em>command_queue</em> and the
events in <em>event_wait_list</em> are not the same.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>svm_ptr</em> is <code>NULL</code>.</p>
</li>
<li>
<p>CL_INVALID_EVENT_WAIT_LIST if <em>event_wait_list</em> is <code>NULL</code> and
<em>num_events_in_wait_list</em> &gt; 0, or if <em>event_wait_list</em> is not <code>NULL</code> and
<em>num_events_in_wait_list</em> is 0, or if event objects in <em>event_wait_list</em>
are not valid events.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
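<div class="paragraph">
<p>The pairing rule between <em>event_wait_list</em> and <em>num_events_in_wait_list</em>
that leads to CL_INVALID_EVENT_WAIT_LIST can be sketched as a small host-side
check. This is an illustration only and not part of the OpenCL API:
<code>sketch_event</code> and <code>wait_list_pair_is_consistent</code> are hypothetical names, and
the check covers only the <code>NULL</code>/count pairing, not whether each event object
is valid.</p>
</div>

```c
#include <stddef.h>

/* Hypothetical stand-in for cl_event so the sketch builds without the
   OpenCL headers. */
typedef struct sketch_event_impl *sketch_event;

/*
 * Returns 1 when the (count, list) pair is consistent, 0 otherwise:
 *   - a NULL list requires a count of 0
 *   - a non-NULL list requires a count greater than 0
 * The validity of the individual events is not examined here.
 */
int wait_list_pair_is_consistent(unsigned int num_events_in_wait_list,
                                 const sketch_event *event_wait_list)
{
    if (event_wait_list == NULL)
        return num_events_in_wait_list == 0;
    return num_events_in_wait_list > 0;
}
```

<div class="paragraph">
<p>An application could run a check of this kind before enqueueing a command, so
that a mismatched pair is caught before the call is made.</p>
</div>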
<div class="paragraph">
<p><strong>clEnqueueSVMMap</strong> and <strong>clEnqueueSVMUnmap</strong> act as synchronization points for
the region of the SVM buffer specified in these calls.</p>
</div>
<div class="admonitionblock note">
<table>
<tr>
<td class="icon">
<i class="fa icon-note" title="Note"></i>
</td>
<td class="content">
<div class="paragraph">
<p>If a coarse-grained SVM buffer is currently mapped for writing, the
application must ensure that the SVM buffer is unmapped before any enqueued
kernels or commands that read from or write to this SVM buffer or any of its
associated cl_mem buffer objects begin execution; otherwise the behavior is
undefined.</p>
</div>
<div class="paragraph">
<p>If a coarse-grained SVM buffer is currently mapped for reading, the
application must ensure that the SVM buffer is unmapped before any enqueued
kernels or commands that write to this memory object or any of its
associated cl_mem buffer objects begin execution; otherwise the behavior is
undefined.</p>
</div>
<div class="paragraph">
<p>An SVM buffer is considered mapped if there are one or more active
mappings for the SVM buffer, irrespective of whether the mapped regions span
the entire SVM buffer.</p>
</div>
<div class="paragraph">
<p>The above note does not apply to fine-grained SVM buffers (fine-grained
buffers allocated using <strong>clSVMAlloc</strong> or fine-grained system allocations).</p>
</div>
</td>
</tr>
</table>
</div>
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clEnqueueSVMMigrateMem(cl_command_queue command_queue,
cl_uint num_svm_pointers,
<span class="directive">const</span> <span class="directive">void</span> **svm_pointers,
<span class="directive">const</span> size_t *sizes,
cl_mem_migration_flags flags,
cl_uint num_events_in_wait_list,
<span class="directive">const</span> cl_event *event_wait_list,
cl_event *event)</code></pre>
</div>
</div>
<div class="paragraph">
<p>enqueues a command to indicate which device a set of ranges of SVM
allocations should be associated with.
Once the event returned by <strong>clEnqueueSVMMigrateMem</strong> has become CL_COMPLETE,
the ranges specified by <em>svm_pointers</em> and <em>sizes</em> have been successfully
migrated to the device associated with <em>command_queue</em>.</p>
</div>
<div class="paragraph">
<p>The user is responsible for managing the event dependencies associated with
this command in order to avoid overlapping access to SVM allocations.
Improperly specified event dependencies passed to <strong>clEnqueueSVMMigrateMem</strong>
can produce undefined results.</p>
</div>
<div class="paragraph">
<p><em>command_queue</em> is a valid host command queue.
The specified set of allocation ranges will be migrated to the OpenCL device
associated with <em>command_queue</em>.</p>
</div>
<div class="paragraph">
<p><em>num_svm_pointers</em> is the number of pointers in the specified <em>svm_pointers</em>
array, and the number of sizes in the <em>sizes</em> array, if <em>sizes</em> is not
<code>NULL</code>.</p>
</div>
<div class="paragraph">
<p><em>svm_pointers</em> is a pointer to an array of pointers.
Each pointer in this array must be within an allocation produced by a call
to <strong>clSVMAlloc</strong>.</p>
</div>
<div class="paragraph">
<p><em>sizes</em> is an array of sizes.
The pair <em>svm_pointers</em>[i] and <em>sizes</em>[i] together define the starting
address and number of bytes in a range to be migrated.
<em>sizes</em> may be <code>NULL</code>, indicating that every allocation containing any
<em>svm_pointers</em>[i] is to be migrated.
Also, if <em>sizes</em>[i] is zero, then the entire allocation containing
<em>svm_pointers</em>[i] is migrated.</p>
</div>
<div class="paragraph">
<p><em>flags</em> is a bit-field that is used to specify migration options.
The <a href="#migration-flags-table">Memory Migration Flags</a> table describes the possible
values for <em>flags</em>.</p>
</div>
<div class="paragraph">
<p><em>event_wait_list</em> and <em>num_events_in_wait_list</em> specify events that need to
complete before this particular command can be executed.
If <em>event_wait_list</em> is <code>NULL</code>, then this particular command does not wait
on any event to complete.
If <em>event_wait_list</em> is <code>NULL</code>, <em>num_events_in_wait_list</em> must be 0.
If <em>event_wait_list</em> is not <code>NULL</code>, the list of events pointed to by
<em>event_wait_list</em> must be valid and <em>num_events_in_wait_list</em> must be
greater than 0.
The events specified in <em>event_wait_list</em> act as synchronization points.
The context associated with events in <em>event_wait_list</em> and <em>command_queue</em>
must be the same.
The memory associated with <em>event_wait_list</em> can be reused or freed after
the function returns.</p>
</div>
<div class="paragraph">
<p><em>event</em> returns an event object that identifies this particular command and
can be used to query or queue a wait for this particular command to
complete.
<em>event</em> can be <code>NULL</code> in which case it will not be possible for the
application to query the status of this command or queue another command
that waits for this command to complete.
If the <em>event_wait_list</em> and <em>event</em> arguments are not <code>NULL</code>, the <em>event</em>
argument should not refer to an element of the <em>event_wait_list</em> array.</p>
</div>
<div class="paragraph">
<p><strong>clEnqueueSVMMigrateMem</strong> returns CL_SUCCESS if the function is executed
successfully.
Otherwise, it returns one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_COMMAND_QUEUE if <em>command_queue</em> is not a valid host
command-queue.</p>
</li>
<li>
<p>CL_INVALID_CONTEXT if the context associated with <em>command_queue</em> and the
events in <em>event_wait_list</em> are not the same.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>num_svm_pointers</em> is zero or <em>svm_pointers</em> is
<code>NULL</code>.</p>
</li>
<li>
<p>CL_INVALID_VALUE if <em>sizes</em>[i] is non-zero and the range [<em>svm_pointers</em>[i],
<em>svm_pointers</em>[i]+<em>sizes</em>[i]) is not contained within an existing
<strong>clSVMAlloc</strong> allocation.</p>
</li>
<li>
<p>CL_INVALID_EVENT_WAIT_LIST if <em>event_wait_list</em> is <code>NULL</code> and
<em>num_events_in_wait_list</em> &gt; 0, or if <em>event_wait_list</em> is not <code>NULL</code> and
<em>num_events_in_wait_list</em> is 0, or if event objects in <em>event_wait_list</em>
are not valid events.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
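<div class="paragraph">
<p>The way a pair <em>svm_pointers</em>[i] and <em>sizes</em>[i] selects a range, including
the zero-size case and the containment requirement behind CL_INVALID_VALUE,
can be sketched as follows. This is a simplified model under stated
assumptions, not how any implementation works: <code>sketch_allocation</code> and
<code>resolve_migration_range</code> are hypothetical names, and the sketch checks a
pointer against a single known allocation only. A caller modelling a <code>NULL</code>
<em>sizes</em> array would pass a size of 0 for every pointer.</p>
</div>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one SVM allocation; not an OpenCL type. */
struct sketch_allocation {
    const char *base;   /* first byte of the allocation */
    size_t      length; /* allocation size in bytes */
};

/*
 * Resolves one (pointer, size) pair against a single allocation.
 * A size of 0 selects the whole allocation containing ptr.
 * Returns 1 and fills *out_start and *out_length on success, or 0 when
 * ptr lies outside the allocation or the range runs past its end.
 */
int resolve_migration_range(const struct sketch_allocation *alloc,
                            const char *ptr, size_t size,
                            const char **out_start, size_t *out_length)
{
    uintptr_t base = (uintptr_t)alloc->base;
    uintptr_t addr = (uintptr_t)ptr;
    size_t offset;

    assert(out_start != NULL && out_length != NULL);

    if (addr < base || addr - base >= alloc->length)
        return 0;                    /* ptr is not inside this allocation */
    offset = (size_t)(addr - base);

    if (size == 0) {                 /* zero size: the entire allocation */
        *out_start = alloc->base;
        *out_length = alloc->length;
        return 1;
    }
    if (size > alloc->length - offset)
        return 0;                    /* range would leave the allocation */

    *out_start = ptr;
    *out_length = size;
    return 1;
}
```

<div class="paragraph">
<p>Comparing <code>size</code> against the bytes remaining after <code>offset</code>, rather than
computing <code>ptr + size</code>, keeps the check free of address overflow.</p>
</div>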
</div>
<div class="sect3">
<h4 id="_memory_consistency_for_svm_allocations">5.6.2. Memory consistency for SVM allocations</h4>
<div class="paragraph">
<p>To ensure memory consistency in SVM allocations, the program can rely on the
guaranteed memory consistency at synchronization points.
This consistency support already exists in OpenCL 1.x and can be used for
coarse-grained SVM allocations or for fine-grained buffer SVM allocations;
what SVM adds is the ability to share pointers between the host and all SVM
devices.</p>
</div>
<div class="paragraph">
<p>In addition, sub-buffers can be used to ensure that each device gets a
consistent view of an SVM buffer's memory when it is shared by multiple
devices.
For example, assume that two devices share an SVM pointer.
The host can create a cl_mem buffer object using <strong>clCreateBuffer</strong> with
CL_MEM_USE_HOST_PTR and <em>host_ptr</em> set to the SVM pointer and then create
two disjoint sub-buffers with starting virtual addresses <em>sb1_ptr</em> and
<em>sb2_ptr</em>.
These pointers (<em>sb1_ptr</em> and <em>sb2_ptr</em>) can be passed to kernels executing
on the two devices.
<strong>clEnqueueMapBuffer</strong> and <strong>clEnqueueUnmapMemObject</strong> and the existing
<a href="#accessing-mapped-regions">access rules for memory objects</a> ensure
consistency for buffer regions (<em>sb1_ptr</em> and <em>sb2_ptr</em>) read and written by
these kernels.</p>
</div>
<div class="paragraph">
<p>When the host and devices are able to use SVM atomic operations (i.e.
CL_DEVICE_SVM_ATOMICS is set in CL_DEVICE_SVM_CAPABILITIES), these atomic
operations can be used to provide memory consistency at a fine grain in a
shared memory region.
The effect of these operations is visible to the host and all devices with
which that memory is shared.</p>
</div>
</div>
</div>
<div class="sect2">
<h3 id="_sampler_objects">5.7. Sampler Objects</h3>
<div class="paragraph">
<p>A sampler object describes how to sample an image when the image is read in
the kernel.
The built-in functions to read from an image in a kernel take a sampler as
an argument.
The sampler arguments to the image read function can be sampler objects
created using OpenCL functions and passed as argument values to the kernel
or can be samplers declared inside a kernel.
In this section we discuss how sampler objects are created using OpenCL
functions.</p>
</div>
<div class="sect3">
<h4 id="_creating_sampler_objects">5.7.1. Creating Sampler Objects</h4>
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_sampler clCreateSamplerWithProperties(cl_context context,
<span class="directive">const</span> cl_sampler_properties *sampler_properties,
cl_int *errcode_ret)</code></pre>
</div>
</div>
<div class="paragraph">
<p>creates a sampler object.</p>
</div>
<div class="paragraph">
<p><em>context</em> must be a valid OpenCL context.</p>
</div>
<div class="paragraph">
<p><em>sampler_properties</em> specifies a list of sampler property names and their
corresponding values.
Each sampler property name is immediately followed by the corresponding
desired value.
The list is terminated with 0.
The list of supported properties is described in the
<a href="#sampler-properties-table">Sampler Properties</a> table.
If a supported property and its value are not specified in
<em>sampler_properties</em>, its default value will be used.
<em>sampler_properties</em> can be <code>NULL</code> in which case the default values for
supported sampler properties will be used.</p>
</div>
<table id="sampler-properties-table" class="tableblock frame-all grid-all spread">
<caption class="title">Table 24. List of supported cl_sampler_properties values and description</caption>
<colgroup>
<col style="width: 34%;">
<col style="width: 33%;">
<col style="width: 33%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top"><strong>cl_sampler_properties</strong> enum</th>
<th class="tableblock halign-left valign-top">Property Value</th>
<th class="tableblock halign-left valign-top">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_SAMPLER_NORMALIZED_COORDS</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_bool</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">A boolean value that specifies whether the image coordinates
specified are normalized or not.
</p><p class="tableblock"> The default value (i.e. the value used if this property is not
specified in sampler_properties) is CL_TRUE.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_SAMPLER_ADDRESSING_MODE</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_addressing_mode</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Specifies how out-of-range image coordinates are handled when
reading from an image.
</p><p class="tableblock"> Valid values are:
</p><p class="tableblock"> CL_ADDRESS_MIRRORED_REPEAT<br>
CL_ADDRESS_REPEAT<br>
CL_ADDRESS_CLAMP_TO_EDGE<br>
CL_ADDRESS_CLAMP<br>
CL_ADDRESS_NONE
</p><p class="tableblock"> The default is CL_ADDRESS_CLAMP.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_SAMPLER_FILTER_MODE</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_filter_mode</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Specifies the type of filter that must be applied when reading an
image.
</p><p class="tableblock"> Valid values are:
</p><p class="tableblock"> CL_FILTER_NEAREST<br>
CL_FILTER_LINEAR
</p><p class="tableblock"> The default value is CL_FILTER_NEAREST.</p></td>
</tr>
</tbody>
</table>
<div class="paragraph">
<p><em>errcode_ret</em> will return an appropriate error code.
If <em>errcode_ret</em> is <code>NULL</code>, no error code is returned.</p>
</div>
<div class="paragraph">
<p><strong>clCreateSamplerWithProperties</strong> returns a valid non-zero sampler object and
<em>errcode_ret</em> is set to CL_SUCCESS if the sampler object is created
successfully.
Otherwise, it returns a <code>NULL</code> value with one of the following error values
returned in <em>errcode_ret</em>:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_CONTEXT if <em>context</em> is not a valid context.</p>
</li>
<li>
<p>CL_INVALID_VALUE if the property name in <em>sampler_properties</em> is not a
supported property name, if the value specified for a supported property
name is not valid, or if the same property name is specified more than
once.</p>
</li>
<li>
<p>CL_INVALID_OPERATION if images are not supported by any device
associated with <em>context</em> (i.e. CL_DEVICE_IMAGE_SUPPORT specified in the
<a href="#device-queries-table">Device Queries</a> table is CL_FALSE).</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
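<div class="paragraph">
<p>The layout of <em>sampler_properties</em> (name and value pairs terminated with 0,
defaults for anything omitted, rejection of unknown or repeated names) can be
sketched as a small reader. Everything named here is a placeholder introduced
for illustration: <code>sketch_property</code>, the <code>SKETCH_*</code> constants and their numeric
values, and <code>read_sampler_properties</code> are not taken from the OpenCL headers.
The sketch does not validate the property values themselves.</p>
</div>

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder type, names, and values: the real ones are defined by the
   OpenCL headers and differ from these. */
typedef unsigned long sketch_property;

enum { SKETCH_NORMALIZED_COORDS = 1, SKETCH_ADDRESSING_MODE = 2,
       SKETCH_FILTER_MODE = 3 };
enum { SKETCH_ADDRESS_CLAMP = 10, SKETCH_ADDRESS_REPEAT = 11 };
enum { SKETCH_FILTER_NEAREST = 20, SKETCH_FILTER_LINEAR = 21 };

struct sketch_sampler_state {
    sketch_property normalized_coords;
    sketch_property addressing_mode;
    sketch_property filter_mode;
};

/* Returns 1 on success, 0 for an unknown or repeated property name. */
int read_sampler_properties(const sketch_property *props,
                            struct sketch_sampler_state *out)
{
    int seen_coords = 0, seen_addressing = 0, seen_filter = 0;
    size_t i;

    assert(out != NULL);

    /* Defaults listed in the Sampler Properties table. */
    out->normalized_coords = 1;                 /* true */
    out->addressing_mode = SKETCH_ADDRESS_CLAMP;
    out->filter_mode = SKETCH_FILTER_NEAREST;

    if (props == NULL)
        return 1;                               /* all defaults */

    /* Each name is immediately followed by its value; 0 ends the list. */
    for (i = 0; props[i] != 0; i += 2) {
        sketch_property value = props[i + 1];
        switch (props[i]) {
        case SKETCH_NORMALIZED_COORDS:
            if (seen_coords++) return 0;
            out->normalized_coords = value;
            break;
        case SKETCH_ADDRESSING_MODE:
            if (seen_addressing++) return 0;
            out->addressing_mode = value;
            break;
        case SKETCH_FILTER_MODE:
            if (seen_filter++) return 0;
            out->filter_mode = value;
            break;
        default:
            return 0;                           /* unsupported name */
        }
    }
    return 1;
}
```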
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clRetainSampler(cl_sampler sampler)</code></pre>
</div>
</div>
<div class="paragraph">
<p>increments the <em>sampler</em> reference count.
<strong>clCreateSamplerWithProperties</strong> performs an implicit retain.
<strong>clRetainSampler</strong> returns CL_SUCCESS if the function is executed
successfully.
Otherwise, it returns one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_SAMPLER if <em>sampler</em> is not a valid sampler object.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clReleaseSampler(cl_sampler sampler)</code></pre>
</div>
</div>
<div class="paragraph">
<p>decrements the <em>sampler</em> reference count.
The sampler object is deleted after the reference count becomes zero and
commands queued for execution on one or more command-queues that use
<em>sampler</em> have finished.
<strong>clReleaseSampler</strong> returns CL_SUCCESS if the function is executed
successfully.
Otherwise, it returns one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_SAMPLER if <em>sampler</em> is not a valid sampler object.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>Using this function to release a reference that was not obtained by creating
the object or by calling <strong>clRetainSampler</strong> causes undefined behavior.</p>
</div>
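<div class="paragraph">
<p>The reference-count lifecycle described above (implicit retain at creation,
deletion only once the count is zero and commands using the sampler have
finished) can be modelled with a few counters. This is a toy model under
stated assumptions, not an implementation: <code>sketch_sampler</code> and the
<code>sketch_*</code> functions are hypothetical, and an <code>assert</code> stands in for the
undefined behavior of releasing a reference that was never obtained.</p>
</div>

```c
#include <assert.h>

/* Hypothetical bookkeeping for one sampler; not an OpenCL type. */
struct sketch_sampler {
    unsigned int ref_count;        /* outstanding references */
    unsigned int pending_commands; /* enqueued commands still using it */
    int          deleted;          /* set once the object is destroyed */
};

/* Deletion needs both conditions: no references and no pending commands. */
static void sketch_delete_if_unused(struct sketch_sampler *s)
{
    if (s->ref_count == 0 && s->pending_commands == 0)
        s->deleted = 1;
}

/* Creation performs the implicit retain: the count starts at 1. */
void sketch_sampler_create(struct sketch_sampler *s)
{
    s->ref_count = 1;
    s->pending_commands = 0;
    s->deleted = 0;
}

void sketch_sampler_retain(struct sketch_sampler *s)
{
    assert(!s->deleted);
    s->ref_count++;
}

/* An unmatched release is undefined in the text above; the model traps it. */
void sketch_sampler_release(struct sketch_sampler *s)
{
    assert(s->ref_count > 0);
    s->ref_count--;
    sketch_delete_if_unused(s);
}

void sketch_command_enqueued(struct sketch_sampler *s)
{
    assert(!s->deleted);
    s->pending_commands++;
}

void sketch_command_finished(struct sketch_sampler *s)
{
    assert(s->pending_commands > 0);
    s->pending_commands--;
    sketch_delete_if_unused(s);
}
```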
</div>
<div class="sect3">
<h4 id="_sampler_object_queries">5.7.2. Sampler Object Queries</h4>
<div class="paragraph">
<p>The function</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="CodeRay highlight"><code data-lang="c">cl_int clGetSamplerInfo(cl_sampler sampler,
cl_sampler_info param_name,
size_t param_value_size,
<span class="directive">void</span> *param_value,
size_t *param_value_size_ret)</code></pre>
</div>
</div>
<div class="paragraph">
<p>returns information about the sampler object.</p>
</div>
<div class="paragraph">
<p><em>sampler</em> specifies the sampler being queried.</p>
</div>
<div class="paragraph">
<p><em>param_name</em> specifies the information to query.
The list of supported <em>param_name</em> types and the information returned in
<em>param_value</em> by <strong>clGetSamplerInfo</strong> is described in the
<a href="#sampler-info-table">Sampler Object Queries</a> table.</p>
</div>
<div class="paragraph">
<p><em>param_value</em> is a pointer to memory where the appropriate result being
queried is returned.
If <em>param_value</em> is <code>NULL</code>, it is ignored.</p>
</div>
<div class="paragraph">
<p><em>param_value_size</em> is used to specify the size in bytes of memory pointed to
by <em>param_value</em>.
This size must be ≥ size of return type as described in the
<a href="#sampler-info-table">Sampler Object Queries</a> table.</p>
</div>
<div class="paragraph">
<p><em>param_value_size_ret</em> returns the actual size in bytes of data being
queried by <em>param_name</em>.
If <em>param_value_size_ret</em> is <code>NULL</code>, it is ignored.</p>
</div>
<table id="sampler-info-table" class="tableblock frame-all grid-all spread">
<caption class="title">Table 25. <strong>clGetSamplerInfo</strong> parameter queries</caption>
<colgroup>
<col style="width: 34%;">
<col style="width: 33%;">
<col style="width: 33%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top"><strong>cl_sampler_info</strong></th>
<th class="tableblock halign-left valign-top">Return Type</th>
<th class="tableblock halign-left valign-top">Info. returned in <em>param_value</em></th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_SAMPLER_REFERENCE_COUNT</strong><sup>13</sup></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_uint</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Return the <em>sampler</em> reference count.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_SAMPLER_CONTEXT</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_context</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Return the context specified when the sampler is created.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_SAMPLER_NORMALIZED_COORDS</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_bool</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Return the normalized coords value associated with <em>sampler</em>.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_SAMPLER_ADDRESSING_MODE</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_addressing_mode</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Return the addressing mode value associated with <em>sampler</em>.</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>CL_SAMPLER_FILTER_MODE</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">cl_filter_mode</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">Return the filter mode value associated with <em>sampler</em>.</p></td>
</tr>
</tbody>
</table>
<div class="dlist">
<dl>
<dt class="hdlist1">13</dt>
<dd>
<p>The reference count returned should be considered immediately stale.
It is unsuitable for general use in applications.
This feature is provided for identifying memory leaks.</p>
</dd>
</dl>
</div>
<div class="paragraph">
<p><strong>clGetSamplerInfo</strong> returns CL_SUCCESS if the function is executed
successfully.
Otherwise, it returns one of the following errors:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>CL_INVALID_VALUE if <em>param_name</em> is not valid, or if size in bytes
specified by <em>param_value_size</em> is &lt; size of return type as described in
the <a href="#sampler-info-table">Sampler Object Queries</a> table and
<em>param_value</em> is not <code>NULL</code>.</p>
</li>
<li>
<p>CL_INVALID_SAMPLER if <em>sampler</em> is not a valid sampler object.</p>
</li>
<li>
<p>CL_OUT_OF_RESOURCES if there is a failure to allocate resources required
by the OpenCL implementation on the device.</p>
</li>
<li>
<p>CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
required by the OpenCL implementation on the host.</p>
</li>
</ul>
</div>
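<div class="paragraph">
<p>The interaction of <em>param_value_size</em>, <em>param_value</em>, and
<em>param_value_size_ret</em> can be sketched as a helper that hands back one query
result. This is an illustrative model only: <code>sketch_return_query</code> is a
hypothetical function, and its return value of -1 stands in for the
CL_INVALID_VALUE case. Whether the size is still reported when the buffer is
too small is not stated above; the sketch simply returns early.</p>
</div>

```c
#include <stddef.h>
#include <string.h>

/*
 * Copies a query result of result_size bytes to the caller.
 *   - param_value == NULL: nothing is copied and the size is not checked
 *   - param_value != NULL: param_value_size must be >= result_size
 *   - param_value_size_ret != NULL: receives the actual result size
 * Returns 0 on success, -1 when the supplied buffer is too small.
 */
int sketch_return_query(const void *result, size_t result_size,
                        size_t param_value_size, void *param_value,
                        size_t *param_value_size_ret)
{
    if (param_value != NULL) {
        if (param_value_size < result_size)
            return -1;
        memcpy(param_value, result, result_size);
    }
    if (param_value_size_ret != NULL)
        *param_value_size_ret = result_size;
    return 0;
}
```

<div class="paragraph">
<p>A caller can pass a <code>NULL</code> <em>param_value</em> first to learn the required size and
then call again with a buffer of at least that size.</p>
</div>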
</div>
</div>
<div class="sect2">
<h3 id="_program_objects">5.8. Program Objects</h3>
<div class="paragraph">
<p>An OpenCL program consists of a set of kernels that are identified as
functions declared with the <code>__kernel</code> qualifier in the program source.
OpenCL programs may also contain auxiliary functions and constant data that
can be used by <code>__kernel</code> functions.
The program executable can be generated <em>online</em> or <em>offline</em> by the OpenCL
compiler for the appropriate target device(s).</p>
</div>
<div class="paragraph">
<p>A program object encapsulates the following information:</p>
</