<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="generator" content="AsciiDoc 8.6.9">
<title>cl_intel_spirv_subgroups</title>
<style type="text/css">
/* Shared CSS for AsciiDoc xhtml11 and html5 backends */
/* Default font. */
body {
font-family: Georgia,serif;
}
/* Title font. */
h1, h2, h3, h4, h5, h6,
div.title, caption.title,
thead, p.table.header,
#toctitle,
#author, #revnumber, #revdate, #revremark,
#footer {
font-family: Arial,Helvetica,sans-serif;
}
body {
margin: 1em 5% 1em 5%;
}
a {
color: blue;
text-decoration: underline;
}
a:visited {
color: fuchsia;
}
em {
font-style: italic;
color: navy;
}
strong {
font-weight: bold;
color: #083194;
}
h1, h2, h3, h4, h5, h6 {
color: #527bbd;
margin-top: 1.2em;
margin-bottom: 0.5em;
line-height: 1.3;
}
h1, h2, h3 {
border-bottom: 2px solid silver;
}
h2 {
padding-top: 0.5em;
}
h3 {
float: left;
}
h3 + * {
clear: left;
}
h5 {
font-size: 1.0em;
}
div.sectionbody {
margin-left: 0;
}
hr {
border: 1px solid silver;
}
p {
margin-top: 0.5em;
margin-bottom: 0.5em;
}
ul, ol, li > p {
margin-top: 0;
}
ul > li { color: #aaa; }
ul > li > * { color: black; }
.monospaced, code, pre {
font-family: "Courier New", Courier, monospace;
font-size: inherit;
color: navy;
padding: 0;
margin: 0;
}
pre {
white-space: pre-wrap;
}
#author {
color: #527bbd;
font-weight: bold;
font-size: 1.1em;
}
#email {
}
#revnumber, #revdate, #revremark {
}
#footer {
font-size: small;
border-top: 2px solid silver;
padding-top: 0.5em;
margin-top: 4.0em;
}
#footer-text {
float: left;
padding-bottom: 0.5em;
}
#footer-badges {
float: right;
padding-bottom: 0.5em;
}
#preamble {
margin-top: 1.5em;
margin-bottom: 1.5em;
}
div.imageblock, div.exampleblock, div.verseblock,
div.quoteblock, div.literalblock, div.listingblock, div.sidebarblock,
div.admonitionblock {
margin-top: 1.0em;
margin-bottom: 1.5em;
}
div.admonitionblock {
margin-top: 2.0em;
margin-bottom: 2.0em;
margin-right: 10%;
color: #606060;
}
div.content { /* Block element content. */
padding: 0;
}
/* Block element titles. */
div.title, caption.title {
color: #527bbd;
font-weight: bold;
text-align: left;
margin-top: 1.0em;
margin-bottom: 0.5em;
}
div.title + * {
margin-top: 0;
}
td div.title:first-child {
margin-top: 0.0em;
}
div.content div.title:first-child {
margin-top: 0.0em;
}
div.content + div.title {
margin-top: 0.0em;
}
div.sidebarblock > div.content {
background: #ffffee;
border: 1px solid #dddddd;
border-left: 4px solid #f0f0f0;
padding: 0.5em;
}
div.listingblock > div.content {
border: 1px solid #dddddd;
border-left: 5px solid #f0f0f0;
background: #f8f8f8;
padding: 0.5em;
}
div.quoteblock, div.verseblock {
padding-left: 1.0em;
margin-left: 1.0em;
margin-right: 10%;
border-left: 5px solid #f0f0f0;
color: #888;
}
div.quoteblock > div.attribution {
padding-top: 0.5em;
text-align: right;
}
div.verseblock > pre.content {
font-family: inherit;
font-size: inherit;
}
div.verseblock > div.attribution {
padding-top: 0.75em;
text-align: left;
}
/* DEPRECATED: Pre version 8.2.7 verse style literal block. */
div.verseblock + div.attribution {
text-align: left;
}
div.admonitionblock .icon {
vertical-align: top;
font-size: 1.1em;
font-weight: bold;
text-decoration: underline;
color: #527bbd;
padding-right: 0.5em;
}
div.admonitionblock td.content {
padding-left: 0.5em;
border-left: 3px solid #dddddd;
}
div.exampleblock > div.content {
border-left: 3px solid #dddddd;
padding-left: 0.5em;
}
div.imageblock div.content { padding-left: 0; }
span.image img { border-style: none; vertical-align: text-bottom; }
a.image:visited { color: white; }
dl {
margin-top: 0.8em;
margin-bottom: 0.8em;
}
dt {
margin-top: 0.5em;
margin-bottom: 0;
font-style: normal;
color: navy;
}
dd > *:first-child {
margin-top: 0.1em;
}
ul, ol {
list-style-position: outside;
}
ol.arabic {
list-style-type: decimal;
}
ol.loweralpha {
list-style-type: lower-alpha;
}
ol.upperalpha {
list-style-type: upper-alpha;
}
ol.lowerroman {
list-style-type: lower-roman;
}
ol.upperroman {
list-style-type: upper-roman;
}
div.compact ul, div.compact ol,
div.compact p, div.compact p,
div.compact div, div.compact div {
margin-top: 0.1em;
margin-bottom: 0.1em;
}
tfoot {
font-weight: bold;
}
td > div.verse {
white-space: pre;
}
div.hdlist {
margin-top: 0.8em;
margin-bottom: 0.8em;
}
div.hdlist tr {
padding-bottom: 15px;
}
dt.hdlist1.strong, td.hdlist1.strong {
font-weight: bold;
}
td.hdlist1 {
vertical-align: top;
font-style: normal;
padding-right: 0.8em;
color: navy;
}
td.hdlist2 {
vertical-align: top;
}
div.hdlist.compact tr {
margin: 0;
padding-bottom: 0;
}
.comment {
background: yellow;
}
.footnote, .footnoteref {
font-size: 0.8em;
}
span.footnote, span.footnoteref {
vertical-align: super;
}
#footnotes {
margin: 20px 0 20px 0;
padding: 7px 0 0 0;
}
#footnotes div.footnote {
margin: 0 0 5px 0;
}
#footnotes hr {
border: none;
border-top: 1px solid silver;
height: 1px;
text-align: left;
margin-left: 0;
width: 20%;
min-width: 100px;
}
div.colist td {
padding-right: 0.5em;
padding-bottom: 0.3em;
vertical-align: top;
}
div.colist td img {
margin-top: 0.3em;
}
@media print {
#footer-badges { display: none; }
}
#toc {
margin-bottom: 2.5em;
}
#toctitle {
color: #527bbd;
font-size: 1.1em;
font-weight: bold;
margin-top: 1.0em;
margin-bottom: 0.1em;
}
div.toclevel0, div.toclevel1, div.toclevel2, div.toclevel3, div.toclevel4 {
margin-top: 0;
margin-bottom: 0;
}
div.toclevel2 {
margin-left: 2em;
font-size: 0.9em;
}
div.toclevel3 {
margin-left: 4em;
font-size: 0.9em;
}
div.toclevel4 {
margin-left: 6em;
font-size: 0.9em;
}
span.aqua { color: aqua; }
span.black { color: black; }
span.blue { color: blue; }
span.fuchsia { color: fuchsia; }
span.gray { color: gray; }
span.green { color: green; }
span.lime { color: lime; }
span.maroon { color: maroon; }
span.navy { color: navy; }
span.olive { color: olive; }
span.purple { color: purple; }
span.red { color: red; }
span.silver { color: silver; }
span.teal { color: teal; }
span.white { color: white; }
span.yellow { color: yellow; }
span.aqua-background { background: aqua; }
span.black-background { background: black; }
span.blue-background { background: blue; }
span.fuchsia-background { background: fuchsia; }
span.gray-background { background: gray; }
span.green-background { background: green; }
span.lime-background { background: lime; }
span.maroon-background { background: maroon; }
span.navy-background { background: navy; }
span.olive-background { background: olive; }
span.purple-background { background: purple; }
span.red-background { background: red; }
span.silver-background { background: silver; }
span.teal-background { background: teal; }
span.white-background { background: white; }
span.yellow-background { background: yellow; }
span.big { font-size: 2em; }
span.small { font-size: 0.6em; }
span.underline { text-decoration: underline; }
span.overline { text-decoration: overline; }
span.line-through { text-decoration: line-through; }
div.unbreakable { page-break-inside: avoid; }
/*
* xhtml11 specific
*
* */
div.tableblock {
margin-top: 1.0em;
margin-bottom: 1.5em;
}
div.tableblock > table {
border: 3px solid #527bbd;
}
thead, p.table.header {
font-weight: bold;
color: #527bbd;
}
p.table {
margin-top: 0;
}
/* Because the table frame attribute is overridden by CSS in most browsers. */
div.tableblock > table[frame="void"] {
border-style: none;
}
div.tableblock > table[frame="hsides"] {
border-left-style: none;
border-right-style: none;
}
div.tableblock > table[frame="vsides"] {
border-top-style: none;
border-bottom-style: none;
}
/*
* html5 specific
*
* */
table.tableblock {
margin-top: 1.0em;
margin-bottom: 1.5em;
}
thead, p.tableblock.header {
font-weight: bold;
color: #527bbd;
}
p.tableblock {
margin-top: 0;
}
table.tableblock {
border-width: 3px;
border-spacing: 0px;
border-style: solid;
border-color: #527bbd;
border-collapse: collapse;
}
th.tableblock, td.tableblock {
border-width: 1px;
padding: 4px;
border-style: solid;
border-color: #527bbd;
}
table.tableblock.frame-topbot {
border-left-style: hidden;
border-right-style: hidden;
}
table.tableblock.frame-sides {
border-top-style: hidden;
border-bottom-style: hidden;
}
table.tableblock.frame-none {
border-style: hidden;
}
th.tableblock.halign-left, td.tableblock.halign-left {
text-align: left;
}
th.tableblock.halign-center, td.tableblock.halign-center {
text-align: center;
}
th.tableblock.halign-right, td.tableblock.halign-right {
text-align: right;
}
th.tableblock.valign-top, td.tableblock.valign-top {
vertical-align: top;
}
th.tableblock.valign-middle, td.tableblock.valign-middle {
vertical-align: middle;
}
th.tableblock.valign-bottom, td.tableblock.valign-bottom {
vertical-align: bottom;
}
/*
* manpage specific
*
* */
body.manpage h1 {
padding-top: 0.5em;
padding-bottom: 0.5em;
border-top: 2px solid silver;
border-bottom: 2px solid silver;
}
body.manpage h2 {
border-style: none;
}
body.manpage div.sectionbody {
margin-left: 3em;
}
@media print {
body.manpage div#toc { display: none; }
}
@media screen {
body {
max-width: 50em; /* approximately 80 characters wide */
margin-left: 16em;
}
#toc {
position: fixed;
top: 0;
left: 0;
bottom: 0;
width: 13em;
padding: 0.5em;
padding-bottom: 1.5em;
margin: 0;
overflow: auto;
border-right: 3px solid #f8f8f8;
background-color: white;
}
#toc .toclevel1 {
margin-top: 0.5em;
}
#toc .toclevel2 {
margin-top: 0.25em;
display: list-item;
color: #aaaaaa;
}
#toctitle {
margin-top: 0.5em;
}
}
</style>
<script type="text/javascript">
/*<![CDATA[*/
var asciidoc = { // Namespace.
/////////////////////////////////////////////////////////////////////
// Table Of Contents generator
/////////////////////////////////////////////////////////////////////
/* Author: Mihai Bazon, September 2002
* http://students.infoiasi.ro/~mishoo
*
* Table Of Content generator
* Version: 0.4
*
* Feel free to use this script under the terms of the GNU General Public
* License, as long as you do not remove or alter this notice.
*/
/* modified by Troy D. Hanson, September 2006. License: GPL */
/* modified by Stuart Rackham, 2006, 2009. License: GPL */
// toclevels = 1..4.
toc: function (toclevels) {
function getText(el) {
var text = "";
for (var i = el.firstChild; i != null; i = i.nextSibling) {
if (i.nodeType == 3 /* Node.TEXT_NODE */) // IE doesn't speak constants.
text += i.data;
else if (i.firstChild != null)
text += getText(i);
}
return text;
}
function TocEntry(el, text, toclevel) {
this.element = el;
this.text = text;
this.toclevel = toclevel;
}
function tocEntries(el, toclevels) {
var result = new Array;
var re = new RegExp('[hH]([1-'+(toclevels+1)+'])');
// Function that scans the DOM tree for header elements (the DOM2
// nodeIterator API would be a better technique but not supported by all
// browsers).
var iterate = function (el) {
for (var i = el.firstChild; i != null; i = i.nextSibling) {
if (i.nodeType == 1 /* Node.ELEMENT_NODE */) {
var mo = re.exec(i.tagName);
if (mo && (i.getAttribute("class") || i.getAttribute("className")) != "float") {
result[result.length] = new TocEntry(i, getText(i), mo[1]-1);
}
iterate(i);
}
}
}
iterate(el);
return result;
}
var toc = document.getElementById("toc");
if (!toc) {
return;
}
// Delete existing TOC entries in case we're reloading the TOC.
var tocEntriesToRemove = [];
var i;
for (i = 0; i < toc.childNodes.length; i++) {
var entry = toc.childNodes[i];
if (entry.nodeName.toLowerCase() == 'div'
&& entry.getAttribute("class")
&& entry.getAttribute("class").match(/^toclevel/))
tocEntriesToRemove.push(entry);
}
for (i = 0; i < tocEntriesToRemove.length; i++) {
toc.removeChild(tocEntriesToRemove[i]);
}
// Rebuild TOC entries.
var entries = tocEntries(document.getElementById("content"), toclevels);
for (var i = 0; i < entries.length; ++i) {
var entry = entries[i];
if (entry.element.id == "")
entry.element.id = "_toc_" + i;
var a = document.createElement("a");
a.href = "#" + entry.element.id;
a.appendChild(document.createTextNode(entry.text));
var div = document.createElement("div");
div.appendChild(a);
div.className = "toclevel" + entry.toclevel;
toc.appendChild(div);
}
if (entries.length == 0)
toc.parentNode.removeChild(toc);
},
/////////////////////////////////////////////////////////////////////
// Footnotes generator
/////////////////////////////////////////////////////////////////////
/* Based on footnote generation code from:
* http://www.brandspankingnew.net/archive/2005/07/format_footnote.html
*/
footnotes: function () {
// Delete existing footnote entries in case we're reloading the footnotes.
var i;
var noteholder = document.getElementById("footnotes");
if (!noteholder) {
return;
}
var entriesToRemove = [];
for (i = 0; i < noteholder.childNodes.length; i++) {
var entry = noteholder.childNodes[i];
if (entry.nodeName.toLowerCase() == 'div' && entry.getAttribute("class") == "footnote")
entriesToRemove.push(entry);
}
for (i = 0; i < entriesToRemove.length; i++) {
noteholder.removeChild(entriesToRemove[i]);
}
// Rebuild footnote entries.
var cont = document.getElementById("content");
var spans = cont.getElementsByTagName("span");
var refs = {};
var n = 0;
for (i=0; i<spans.length; i++) {
if (spans[i].className == "footnote") {
n++;
var note = spans[i].getAttribute("data-note");
if (!note) {
// Use [\s\S] in place of . so multi-line matches work.
// Because JavaScript has no s (dotall) regex flag.
note = spans[i].innerHTML.match(/\s*\[([\s\S]*)]\s*/)[1];
spans[i].innerHTML =
"[<a id='_footnoteref_" + n + "' href='#_footnote_" + n +
"' title='View footnote' class='footnote'>" + n + "</a>]";
spans[i].setAttribute("data-note", note);
}
noteholder.innerHTML +=
"<div class='footnote' id='_footnote_" + n + "'>" +
"<a href='#_footnoteref_" + n + "' title='Return to text'>" +
n + "</a>. " + note + "</div>";
var id =spans[i].getAttribute("id");
if (id != null) refs["#"+id] = n;
}
}
if (n == 0)
noteholder.parentNode.removeChild(noteholder);
else {
// Process footnoterefs.
for (i=0; i<spans.length; i++) {
if (spans[i].className == "footnoteref") {
var href = spans[i].getElementsByTagName("a")[0].getAttribute("href");
href = href.match(/#.*/)[0]; // Because IE returns the full URL.
n = refs[href];
spans[i].innerHTML =
"[<a href='#_footnote_" + n +
"' title='View footnote' class='footnote'>" + n + "</a>]";
}
}
}
},
install: function(toclevels) {
var timerId;
function reinstall() {
asciidoc.footnotes();
if (toclevels) {
asciidoc.toc(toclevels);
}
}
function reinstallAndRemoveTimer() {
clearInterval(timerId);
reinstall();
}
timerId = setInterval(reinstall, 500);
if (document.addEventListener)
document.addEventListener("DOMContentLoaded", reinstallAndRemoveTimer, false);
else
window.onload = reinstallAndRemoveTimer;
}
}
asciidoc.install(1);
/*]]>*/
</script>
</head>
<body class="article">
<div id="header">
<h1>cl_intel_spirv_subgroups</h1>
<div id="toc">
<div id="toctitle">Table of Contents</div>
<noscript><p><b>JavaScript must be enabled in your browser to display the table of contents.</b></p></noscript>
</div>
</div>
<div id="content">
<div class="sect1">
<h2 id="_name_strings">Name Strings</h2>
<div class="sectionbody">
<div class="paragraph"><p><span class="monospaced">cl_intel_spirv_subgroups</span></p></div>
</div>
</div>
<div class="sect1">
<h2 id="_contact">Contact</h2>
<div class="sectionbody">
<div class="paragraph"><p>Ben Ashbaugh, Intel (ben <em>dot</em> ashbaugh <em>at</em> intel <em>dot</em> com)</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_contributors">Contributors</h2>
<div class="sectionbody">
<div class="paragraph"><p>Ben Ashbaugh, Intel<br>
Biju George, Intel<br>
Michael Kinsner, Intel<br>
Mariusz Merecki, Intel</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_notice">Notice</h2>
<div class="sectionbody">
<div class="paragraph"><p>Copyright (c) 2018 Intel Corporation. All rights reserved.</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_status">Status</h2>
<div class="sectionbody">
<div class="paragraph"><p>Final Draft</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_version">Version</h2>
<div class="sectionbody">
<div class="paragraph"><p>Built On: 2018-10-29<br>
Revision: 1</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_dependencies">Dependencies</h2>
<div class="sectionbody">
<div class="paragraph"><p>This extension is written against the OpenCL SPIR-V Environment Specification Version 2.2, Revision v2.2-3.</p></div>
<div class="paragraph"><p>This extension requires OpenCL support for SPIR-V, either via OpenCL 2.1 or via the <span class="monospaced">cl_khr_il_program</span> extension, and support for the <span class="monospaced">cl_intel_subgroups</span> extension.</p></div>
<div class="paragraph"><p>This extension interacts with the <span class="monospaced">cl_intel_subgroups_short</span> extension.</p></div>
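<div class="paragraph"><p>The dependencies above could be checked on the host before a module is loaded. The following is a minimal Python sketch and not part of this extension: the function name <span class="monospaced">can_use_spirv_subgroups</span> and its parameters are hypothetical, the extension list is assumed to be the space-separated string reported for <span class="monospaced">CL_DEVICE_EXTENSIONS</span>, and the version test is a simplification of "OpenCL 2.1".</p></div>

```python
def can_use_spirv_subgroups(extensions, opencl_version):
    """Return True if the dependencies listed above appear to be met.

    extensions:     space-separated extension names (assumed input format).
    opencl_version: (major, minor) tuple such as (2, 1) (assumed input format).
    """
    names = set(extensions.split())
    # SPIR-V support comes from OpenCL 2.1 (simplified here to versions 2.1
    # and 2.2) or from the cl_khr_il_program extension.
    has_spirv = opencl_version in ((2, 1), (2, 2)) or "cl_khr_il_program" in names
    # Both the companion extension and this extension must be reported.
    required = {"cl_intel_spirv_subgroups", "cl_intel_subgroups"}
    return has_spirv and required.issubset(names)
```

<div class="paragraph"><p>A device that reports <span class="monospaced">cl_intel_spirv_subgroups</span> without <span class="monospaced">cl_intel_subgroups</span> would fail this check, which mirrors the requirement stated above.</p></div>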
</div>
</div>
<div class="sect1">
<h2 id="_overview">Overview</h2>
<div class="sectionbody">
<div class="paragraph"><p>This extension defines how modules using the SPIR-V extension <span class="monospaced">SPV_INTEL_subgroups</span> may behave in an OpenCL environment.</p></div>
<div class="paragraph"><p>This extension is a companion to the <span class="monospaced">cl_intel_subgroups</span> and <span class="monospaced">cl_intel_subgroups_short</span> OpenCL extensions, and the functionality described in this extension and <span class="monospaced">SPV_INTEL_subgroups</span> is sufficient to implement the built-in functions defined in the <span class="monospaced">cl_intel_subgroups</span> and <span class="monospaced">cl_intel_subgroups_short</span> extensions.</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_new_api_functions">New API Functions</h2>
<div class="sectionbody">
<div class="paragraph"><p>None.</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_new_api_enums">New API Enums</h2>
<div class="sectionbody">
<div class="paragraph"><p>None.</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_modifications_to_the_opencl_spir_v_environment_specification">Modifications to the OpenCL SPIR-V Environment Specification</h2>
<div class="sectionbody">
<div class="sect2">
<h3 id="_add_a_new_section_7_1_x_span_class_monospaced_cl_intel_spirv_subgroups_span">Add a new Section 7.1.X - <span class="monospaced">cl_intel_spirv_subgroups</span></h3>
<div class="paragraph"><p>If the OpenCL environment supports the extension <span class="monospaced">cl_intel_spirv_subgroups</span>, then the environment must accept SPIR-V modules that declare use of the <span class="monospaced">SPV_INTEL_subgroups</span> extension via <strong>OpExtension</strong>.</p></div>
<div class="paragraph"><p>If the OpenCL environment supports the extension <span class="monospaced">cl_intel_spirv_subgroups</span> and use of the <span class="monospaced">SPV_INTEL_subgroups</span> extension is declared in the module via <strong>OpExtension</strong>, then the environment must accept modules that declare the following SPIR-V capabilities:</p></div>
<div class="ulist"><ul>
<li>
<p>
<strong>SubgroupShuffleINTEL</strong>
</p>
</li>
<li>
<p>
<strong>SubgroupBufferBlockIOINTEL</strong>
</p>
</li>
<li>
<p>
<strong>SubgroupImageBlockIOINTEL</strong>
</p>
</li>
</ul></div>
<div class="paragraph"><p>Additionally, the environment must accept modules that use the following <strong>BuiltIns</strong>:</p></div>
<div class="ulist"><ul>
<li>
<p>
<strong>SubgroupSize</strong>
</p>
</li>
<li>
<p>
<strong>SubgroupMaxSize</strong>
</p>
</li>
<li>
<p>
<strong>NumSubgroups</strong>
</p>
</li>
<li>
<p>
<strong>SubgroupId</strong>
</p>
</li>
<li>
<p>
<strong>SubgroupLocalInvocationId</strong>
</p>
</li>
</ul></div>
<div class="paragraph"><p>For an OpenCL 2.0 or newer environment, the following <strong>BuiltIns</strong> must additionally be accepted:</p></div>
<div class="ulist"><ul>
<li>
<p>
<strong>NumEnqueuedSubgroups</strong>
</p>
</li>
</ul></div>
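<div class="paragraph"><p>The capability and <strong>BuiltIn</strong> lists above can be summarized as a small lookup. This is an illustrative Python sketch only: the names <span class="monospaced">accepted_builtins</span> and <span class="monospaced">must_accept_capability</span> are hypothetical, enumerants are represented as plain strings, and the environment is assumed to support <span class="monospaced">cl_intel_spirv_subgroups</span>.</p></div>

```python
CAPABILITIES = frozenset({
    "SubgroupShuffleINTEL",
    "SubgroupBufferBlockIOINTEL",
    "SubgroupImageBlockIOINTEL",
})
BUILTINS = frozenset({
    "SubgroupSize",
    "SubgroupMaxSize",
    "NumSubgroups",
    "SubgroupId",
    "SubgroupLocalInvocationId",
})
BUILTINS_CL20 = frozenset({"NumEnqueuedSubgroups"})


def accepted_builtins(opencl_version):
    """BuiltIns that must be accepted for a (major, minor) OpenCL version."""
    if opencl_version >= (2, 0):
        return BUILTINS | BUILTINS_CL20
    return BUILTINS


def must_accept_capability(capability, module_extensions):
    """True if the capability is one of the three listed above and the module
    declares SPV_INTEL_subgroups via OpExtension."""
    return "SPV_INTEL_subgroups" in module_extensions and capability in CAPABILITIES
```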
<div class="paragraph"><p>Additionally, the environment must accept the following instruction semantics:</p></div>
<div class="paragraph"><p>For the <em>Barrier Instructions</em>:</p></div>
<div class="ulist"><ul>
<li>
<p>
<strong>OpControlBarrier</strong>:
</p>
<div class="ulist"><ul>
<li>
<p>
The <em>Scope</em> for <em>Execution</em> may be <strong>Subgroup</strong>.
</p>
</li>
<li>
<p>
The <em>Scope</em> for <em>Memory</em> may be <strong>Subgroup</strong>.
</p>
</li>
</ul></div>
</li>
</ul></div>
<div class="paragraph"><p>For the <em>Group Instructions</em>:</p></div>
<div class="ulist"><ul>
<li>
<p>
<strong>OpGroupAll</strong>
</p>
</li>
<li>
<p>
<strong>OpGroupAny</strong>
</p>
</li>
<li>
<p>
<strong>OpGroupBroadcast</strong>
</p>
</li>
<li>
<p>
<strong>OpGroupIAdd</strong>
</p>
</li>
<li>
<p>
<strong>OpGroupFAdd</strong>
</p>
</li>
<li>
<p>
<strong>OpGroupFMin</strong>
</p>
</li>
<li>
<p>
<strong>OpGroupUMin</strong>
</p>
</li>
<li>
<p>
<strong>OpGroupSMin</strong>
</p>
</li>
<li>
<p>
<strong>OpGroupFMax</strong>
</p>
</li>
<li>
<p>
<strong>OpGroupUMax</strong>
</p>
</li>
<li>
<p>
<strong>OpGroupSMax</strong>
</p>
<div class="ulist"><ul>
<li>
<p>
The <em>Scope</em> for <em>Execution</em> may be <strong>Subgroup</strong>.
</p>
</li>
</ul></div>
</li>
</ul></div>
</div>
<div class="sect2">
<h3 id="_add_a_new_section_7_1_x_1_shuffle_instructions">Add a new Section 7.1.X.1 - Shuffle Instructions</h3>
<div class="paragraph"><p>Because support for <span class="monospaced">cl_intel_subgroups</span> is required for <span class="monospaced">cl_intel_spirv_subgroups</span>, if the OpenCL environment supports the extension <span class="monospaced">cl_intel_spirv_subgroups</span> and use of the <span class="monospaced">SPV_INTEL_subgroups</span> extension is declared in the module via <strong>OpExtension</strong>, then the environment must accept the following types for <em>Data</em> for the <strong>SubgroupShuffleINTEL</strong> instructions:</p></div>
<div class="ulist"><ul>
<li>
<p>
Scalars and <strong>OpTypeVectors</strong> with 2, 4, 8, or 16 <em>Component Count</em> components of the following <em>Component Type</em> types:
</p>
<div class="ulist"><ul>
<li>
<p>
<strong>OpTypeFloat</strong> with a <em>Width</em> of 32 bits (equivalent to <span class="monospaced">float</span>)
</p>
</li>
<li>
<p>
<strong>OpTypeInt</strong> with a <em>Width</em> of 32 bits and <em>Signedness</em> of 0 (equivalent to <span class="monospaced">int</span> and <span class="monospaced">uint</span>)
</p>
</li>
</ul></div>
</li>
<li>
<p>
Scalars of <strong>OpTypeInt</strong> with a <em>Width</em> of 64 bits and <em>Signedness</em> of 0 (equivalent to <span class="monospaced">long</span> and <span class="monospaced">ulong</span>)
</p>
</li>
</ul></div>
<div class="paragraph"><p>Additionally, if the <strong>Float16</strong> capability is declared and supported:</p></div>
<div class="ulist"><ul>
<li>
<p>
Scalars of <strong>OpTypeFloat</strong> with a <em>Width</em> of 16 bits (equivalent to <span class="monospaced">half</span>)
</p>
</li>
</ul></div>
<div class="paragraph"><p>Additionally, if the <strong>Float64</strong> capability is declared and supported:</p></div>
<div class="ulist"><ul>
<li>
<p>
Scalars of <strong>OpTypeFloat</strong> with a <em>Width</em> of 64 bits (equivalent to <span class="monospaced">double</span>)
</p>
</li>
</ul></div>
<div class="paragraph"><p>Additionally, if the OpenCL environment supports the extension <span class="monospaced">cl_intel_subgroups_short</span>:</p></div>
<div class="ulist"><ul>
<li>
<p>
Scalars and <strong>OpTypeVectors</strong> with 2, 4, 8, or 16 <em>Component Count</em> components of the following <em>Component Type</em> types:
</p>
<div class="ulist"><ul>
<li>
<p>
<strong>OpTypeInt</strong> with a <em>Width</em> of 16 bits and <em>Signedness</em> of 0 (equivalent to <span class="monospaced">short</span> and <span class="monospaced">ushort</span>)
</p>
</li>
</ul></div>
</li>
</ul></div>
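<div class="paragraph"><p>The <em>Data</em> type rules above can be expressed as a single predicate. The sketch below is a hypothetical Python model, not an API of any OpenCL implementation: types are described by a kind string, a bit width, and a component count, and a result of <span class="monospaced">False</span> only means that this section does not require the type to be accepted.</p></div>

```python
VECTOR_COUNTS = (2, 4, 8, 16)


def shuffle_type_listed(kind, width, count=1, float16=False,
                        float64=False, subgroups_short=False):
    """Model of the Data type rules for the SubgroupShuffleINTEL instructions.

    kind:  "float" for OpTypeFloat, "int" for OpTypeInt with Signedness 0.
    width: bit width of the scalar or of each vector component.
    count: 1 for a scalar, otherwise the vector Component Count.
    float16 / float64: capability is declared and supported.
    subgroups_short:   cl_intel_subgroups_short is supported.
    """
    scalar = count == 1
    scalar_or_vector = scalar or count in VECTOR_COUNTS
    if (kind, width) in (("float", 32), ("int", 32)):
        return scalar_or_vector
    if (kind, width) == ("int", 64):
        return scalar
    if (kind, width) == ("float", 16):
        return scalar and float16
    if (kind, width) == ("float", 64):
        return scalar and float64
    if (kind, width) == ("int", 16):
        return scalar_or_vector and subgroups_short
    return False
```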
</div>
<div class="sect2">
<h3 id="_add_a_new_section_7_1_x_2_block_io_instructions">Add a new Section 7.1.X.2 - Block IO Instructions</h3>
<div class="paragraph"><p>Because support for <span class="monospaced">cl_intel_subgroups</span> is required for <span class="monospaced">cl_intel_spirv_subgroups</span>, if the OpenCL environment supports the extension <span class="monospaced">cl_intel_spirv_subgroups</span> and use of the <span class="monospaced">SPV_INTEL_subgroups</span> extension is declared in the module via <strong>OpExtension</strong>, then the environment must accept the following types for <em>Result</em> and <em>Data</em> for the <strong>SubgroupBufferBlockIOINTEL</strong> and <strong>SubgroupImageBlockIOINTEL</strong> instructions:</p></div>
<div class="ulist"><ul>
<li>
<p>
Scalars and <strong>OpTypeVectors</strong> with 2, 4, or 8 <em>Component Count</em> components of the following <em>Component Type</em> types:
</p>
<div class="ulist"><ul>
<li>
<p>
<strong>OpTypeInt</strong> with a <em>Width</em> of 32 bits and <em>Signedness</em> of 0 (equivalent to <span class="monospaced">int</span> and <span class="monospaced">uint</span>)
</p>
</li>
</ul></div>
</li>
</ul></div>
<div class="paragraph"><p>Additionally, if the OpenCL environment supports the extension <span class="monospaced">cl_intel_subgroups_short</span>:</p></div>
<div class="ulist"><ul>
<li>
<p>
Scalars and <strong>OpTypeVectors</strong> with 2, 4, or 8 <em>Component Count</em> components of the following <em>Component Type</em> types:
</p>
<div class="ulist"><ul>
<li>
<p>
<strong>OpTypeInt</strong> with a <em>Width</em> of 16 bits and <em>Signedness</em> of 0 (equivalent to <span class="monospaced">short</span> and <span class="monospaced">ushort</span>)
</p>
</li>
</ul></div>
</li>
</ul></div>
<div class="paragraph"><p>For <em>Ptr</em>, valid <em>Storage Classes</em> are:</p></div>
<div class="ulist"><ul>
<li>
<p>
<strong>CrossWorkgroup</strong> (equivalent to the <span class="monospaced">global</span> address space)
</p>
</li>
</ul></div>
<div class="paragraph"><p>For <em>Image</em>:</p></div>
<div class="ulist"><ul>
<li>
<p>
<em>Dim</em> must be <strong>2D</strong>
</p>
</li>
<li>
<p>
<em>Depth</em> must be 0 (not a depth image)
</p>
</li>
<li>
<p>
<em>Arrayed</em> must be 0 (non-arrayed content)
</p>
</li>
<li>
<p>
<em>MS</em> must be 0 (single-sampled content)
</p>
</li>
<li>
<p>
(equivalent to <span class="monospaced">image2d_t</span>)
</p>
</li>
</ul></div>
<div class="paragraph"><p>For <em>Coordinate</em>, the following types are supported:</p></div>
<div class="ulist"><ul>
<li>
<p>
<strong>OpTypeVectors</strong> with 2 <em>Component Count</em> components of <em>Component Type</em> <strong>OpTypeInt</strong> with a <em>Width</em> of 32 bits and <em>Signedness</em> of 0 (equivalent to <span class="monospaced">int2</span>)
</p>
</li>
</ul></div>
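<div class="paragraph"><p>The operand rules for the block IO instructions can be sketched in the same way. The Python below is illustrative only: both function names are hypothetical, <em>Dim</em> is represented as a string rather than a SPIR-V enumerant value, and <span class="monospaced">False</span> means the type is not among those listed in this section.</p></div>

```python
BLOCK_IO_COUNTS = (1, 2, 4, 8)  # 1 stands for a scalar


def block_io_data_type_listed(width, count=1, subgroups_short=False):
    """True if an OpTypeInt (Signedness 0) Result/Data type is listed above."""
    if count not in BLOCK_IO_COUNTS:
        return False
    if width == 32:
        return True
    if width == 16:
        # 16-bit data needs cl_intel_subgroups_short.
        return subgroups_short
    return False


def block_io_image_type_listed(dim, depth, arrayed, ms):
    """True if the OpTypeImage operands match the image2d_t description."""
    return dim == "2D" and depth == 0 and arrayed == 0 and ms == 0
```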
</div>
<div class="sect2">
<h3 id="_add_a_new_section_7_1_x_3_notes_and_restrictions">Add a new Section 7.1.X.3 - Notes and Restrictions</h3>
<div class="paragraph"><p>The <strong>SubgroupShuffleINTEL</strong> instructions may be placed within non-uniform control flow and therefore do not have to be encountered by all invocations in the subgroup. However, <em>Data</em> may only be shuffled among invocations that encounter the <strong>SubgroupShuffleINTEL</strong> instruction. Shuffling <em>Data</em> from an invocation that does not encounter the <strong>SubgroupShuffleINTEL</strong> instruction produces undefined results.</p></div>
<div class="paragraph"><p>There is no defined behavior for out-of-range shuffle indices for the <strong>SubgroupShuffleINTEL</strong> instructions.</p></div>
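<div class="paragraph"><p>The shuffle rules above can be illustrated with a toy model. This Python sketch is an assumption-laden illustration, not a description of any implementation: the function <span class="monospaced">model_shuffle</span> is hypothetical, a subgroup is modeled as a list with one entry per invocation, and <span class="monospaced">None</span> stands for "no defined result".</p></div>

```python
def model_shuffle(data, source_lane, active):
    """Toy model of a subgroup shuffle.

    data[i]        : Data value held by invocation i.
    source_lane[i] : invocation that invocation i wants to read from.
    active[i]      : True if invocation i encounters the instruction.
    Returns one entry per invocation; None marks "no defined result".
    """
    size = len(data)
    result = []
    for lane in range(size):
        src = source_lane[lane]
        in_range = src in range(size)
        if active[lane] and in_range and active[src]:
            result.append(data[src])
        else:
            # The reader is inactive, the index is out of range, or the
            # source invocation does not encounter the instruction.
            result.append(None)
    return result
```

<div class="paragraph"><p>For example, with data <span class="monospaced">[10, 11, 12, 13]</span>, source lanes <span class="monospaced">[1, 0, 3, 7]</span>, and invocation 2 inactive, the model yields <span class="monospaced">[11, 10, None, None]</span>: invocation 2 does not take part, and invocation 3 uses an out-of-range index.</p></div>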
<div class="paragraph"><p>The <strong>SubgroupBufferBlockIOINTEL</strong> and <strong>SubgroupImageBlockIOINTEL</strong> instructions are only guaranteed to work correctly if placed strictly within uniform control flow within the subgroup. This ensures that if any invocation executes it, all invocations will execute it. If placed elsewhere, behavior is undefined.</p></div>
<div class="paragraph"><p>There is no defined out-of-range behavior for the <strong>SubgroupBufferBlockIOINTEL</strong> instructions.</p></div>
<div class="paragraph"><p>The <strong>SubgroupImageBlockIOINTEL</strong> instructions do support bounds checking. However, they bounds-check to the image width in units of <span class="monospaced">uints</span>, not in units of image elements. This means:</p></div>
<div class="ulist"><ul>
<li>
<p>
If the image has an <em>Image Format</em> size equal to the size of a <span class="monospaced">uint</span> (four bytes, for example <strong>Rgba8</strong>), the image will be correctly bounds-checked. In this case, out-of-bounds reads will return the edge image element (the equivalent of <strong>ClampToEdge</strong>), and out-of-bounds writes will be ignored.
</p>
</li>
<li>
<p>
If the image has an <em>Image Format</em> size less than the size of a <span class="monospaced">uint</span> (such as <strong>R8</strong>), the entire image is addressable, but bounds checking will occur too late. For this reason, extra care should be taken to avoid out-of-bounds reads and writes, since out-of-bounds reads may return invalid data and out-of-bounds writes may corrupt other images or buffers unpredictably.
</p>
</li>
</ul></div>
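<div class="paragraph"><p>The arithmetic behind "too late" can be sketched as follows. This Python fragment reflects one reading of the text above, namely that the image width is counted in four-byte <span class="monospaced">uint</span> units when the bounds check is applied. The function names are hypothetical.</p></div>

```python
UINT_BYTES = 4


def checked_row_bytes(width_in_elements):
    """Bytes per row covered by the bounds check: the width counted in uints."""
    return width_in_elements * UINT_BYTES


def unchecked_bytes_past_row(width_in_elements, element_bytes):
    """Bytes beyond the real row end that the bounds check would still allow."""
    real_row_bytes = width_in_elements * element_bytes
    return checked_row_bytes(width_in_elements) - real_row_bytes
```

<div class="paragraph"><p>Under this reading, a 64-element <strong>Rgba8</strong> row has no unchecked bytes, while a 64-element <strong>R8</strong> row is 64 bytes long but is checked against 256 bytes, leaving 192 bytes past the row end that the check does not catch.</p></div>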
<div class="paragraph"><p>The following restrictions apply to the <strong>SubgroupBufferBlockIOINTEL</strong> instructions:</p></div>
<div class="ulist"><ul>
<li>
<p>
The pointer <em>Ptr</em> must be 32-bit (4-byte) aligned for reads, and must be 128-bit (16-byte) aligned for writes.
</p>
</li>
<li>
<p>
If the pointer <em>Ptr</em> is computed from a kernel argument that is a <span class="monospaced">cl_mem</span> that was created with <span class="monospaced">CL_MEM_USE_HOST_PTR</span>, then the <em>host_ptr</em> must be 32-bit (4-byte) aligned for reads, and must be 128-bit (16-byte) aligned for writes.
</p>
</li>
<li>
<p>
If the pointer <em>Ptr</em> is computed from a kernel argument that is a <span class="monospaced">cl_mem</span> that is a sub-buffer, then the <em>origin</em> defining the sub-buffer offset into the <em>buffer</em> must be a multiple of 4 bytes for reads, and must be a multiple of 16 bytes for writes, in addition to the <span class="monospaced">CL_DEVICE_MEM_BASE_ADDR_ALIGN</span> requirements. Additionally, if the <em>buffer</em> that the sub-buffer is created from was created with <span class="monospaced">CL_MEM_USE_HOST_PTR</span>, then the <em>host_ptr</em> for the <em>buffer</em> must be 32-bit (4-byte) aligned for reads, and must be 128-bit (16-byte) aligned for writes.
</p>
</li>
<li>
<p>
If the pointer <em>Ptr</em> is computed from an SVM pointer kernel argument, then the SVM pointer kernel argument must be 32-bit (4-byte) aligned for reads, and must be 128-bit (16-byte) aligned for writes.
</p>
</li>
</ul></div>
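<div class="paragraph"><p>The alignment rules for the buffer block IO instructions reduce to simple modulo checks. The following Python sketch is illustrative only: the function names are hypothetical, addresses and origins are plain integers in bytes, and the device's <span class="monospaced">CL_DEVICE_MEM_BASE_ADDR_ALIGN</span> value is passed in by the caller in bits, which is how OpenCL reports it.</p></div>

```python
READ_ALIGN = 4    # bytes: 32-bit alignment for block reads
WRITE_ALIGN = 16  # bytes: 128-bit alignment for block writes


def buffer_block_io_aligned(address, is_write):
    """True if a byte address (Ptr, host_ptr, or SVM pointer) meets the rule."""
    align = WRITE_ALIGN if is_write else READ_ALIGN
    return address % align == 0


def sub_buffer_origin_ok(origin, is_write, base_addr_align_bits):
    """Check a sub-buffer origin against both the block IO rule and the
    caller-supplied CL_DEVICE_MEM_BASE_ADDR_ALIGN value (given in bits)."""
    base_align_bytes = base_addr_align_bits // 8
    return buffer_block_io_aligned(origin, is_write) and origin % base_align_bytes == 0
```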
<div class="paragraph"><p>The following restrictions apply to the <strong>SubgroupImageBlockIOINTEL</strong> instructions:</p></div>
<div class="ulist"><ul>
<li>
<p>
The behavior of the <strong>SubgroupImageBlockIOINTEL</strong> instructions is undefined for images with an element size greater than four bytes (such as <strong>Rgba32f</strong>).
</p>
</li>
<li>
<p>
When reading or writing a 2D image created from a buffer with the <strong>SubgroupImageBlockIOINTEL</strong> instructions, the image row pitch is required to be a multiple of 64 bytes, in addition to the <span class="monospaced">CL_DEVICE_IMAGE_PITCH_ALIGNMENT</span> requirements.
</p>
</li>
<li>
<p>
When reading or writing a 2D image created from a buffer with the <strong>SubgroupImageBlockIOINTEL</strong> instructions, if the buffer is a <span class="monospaced">cl_mem</span> that was created with <span class="monospaced">CL_MEM_USE_HOST_PTR</span>, then the <em>host_ptr</em> must be 256-bit (32-byte) aligned.
</p>
</li>
<li>
<p>
When reading or writing a 2D image created from a buffer with the <strong>SubgroupImageBlockIOINTEL</strong> instructions, if the buffer is a <span class="monospaced">cl_mem</span> that is a sub-buffer, then the <em>origin</em> must be a multiple of 32 bytes. Additionally, if the <em>buffer</em> that the sub-buffer is created from was created with <span class="monospaced">CL_MEM_USE_HOST_PTR</span>, then the <em>host_ptr</em> for the <em>buffer</em> must be 256-bit (32-byte) aligned.
</p>
</li>
</ul></div>
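<div class="paragraph"><p>The image block IO restrictions above can be gathered into one check. This is a hypothetical Python sketch: the function name and parameters are assumptions, all sizes are in bytes, and the separate <span class="monospaced">CL_DEVICE_IMAGE_PITCH_ALIGNMENT</span> requirement is not modeled.</p></div>

```python
def image_block_io_ok(element_bytes, row_pitch_bytes=None, host_ptr=None,
                      sub_buffer_origin=None):
    """Check the listed restrictions; pass None for any that do not apply.

    row_pitch_bytes, host_ptr and sub_buffer_origin only matter for a 2D image
    created from a buffer (host_ptr: buffer created with CL_MEM_USE_HOST_PTR).
    """
    if element_bytes > 4:
        return False  # behavior is undefined, for example for Rgba32f
    if row_pitch_bytes is not None and row_pitch_bytes % 64 != 0:
        return False  # row pitch must be a multiple of 64 bytes
    if host_ptr is not None and host_ptr % 32 != 0:
        return False  # host_ptr must be 32-byte aligned
    if sub_buffer_origin is not None and sub_buffer_origin % 32 != 0:
        return False  # sub-buffer origin must be a multiple of 32 bytes
    return True
```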
<div class="paragraph"><p>The following restrictions apply to the <strong>OpSubgroupImageBlockWriteINTEL</strong> instruction:</p></div>
<div class="ulist"><ul>
<li>
<p>
Unlike the image block read instruction, which may read from an arbitrary byte offset, the x-component of the byte coordinate for the image block write instruction must be a multiple of four; in other words, the write must begin at a 32-bit boundary. There is no restriction on the y-component of the coordinate.
</p>
</li>
</ul></div>
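<div class="paragraph"><p>A caller could validate the write coordinate before issuing the instruction. The Python below is a minimal sketch with a hypothetical function name. It only models the x-component rule stated above.</p></div>

```python
def block_write_coordinate_ok(x_byte_offset, y):
    """True if an image block write may start at this byte coordinate.

    Only the x-component is constrained: it must be a multiple of four bytes.
    The y-component is accepted unchanged because it has no restriction.
    """
    return x_byte_offset % 4 == 0
```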
</div>
</div>
</div>
<div class="sect1">
<h2 id="_issues">Issues</h2>
<div class="sectionbody">
<div class="paragraph"><p>None.</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_revision_history">Revision History</h2>
<div class="sectionbody">
<table class="tableblock frame-all grid-rows"
style="
width:100%;
">
<col style="width:4%;">
<col style="width:14%;">
<col style="width:14%;">
<col style="width:66%;">
<thead>
<tr>
<th class="tableblock halign-left valign-top" >Rev</th>
<th class="tableblock halign-left valign-top" >Date</th>
<th class="tableblock halign-left valign-top" >Author</th>
<th class="tableblock halign-left valign-top" >Changes</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top" ><p class="tableblock">1</p></td>
<td class="tableblock halign-left valign-top" ><p class="tableblock">2018-10-29</p></td>
<td class="tableblock halign-left valign-top" ><p class="tableblock">Ben Ashbaugh</p></td>
<td class="tableblock halign-left valign-top" ><p class="tableblock"><strong>Initial revision</strong></p></td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
<div id="footnotes"><hr></div>
<div id="footer">
<div id="footer-text">
Last updated
2018-10-29 11:49:46 PDT
</div>
</div>
</body>
</html>