OpenCL Specifications v3.0.6 (#91)

diff --git a/index.php b/index.php
index 6d5fc37..759e353 100644
--- a/index.php
+++ b/index.php
@@ -31,19 +31,19 @@
 <li> OpenCL API Specification
      (<a href="specs/3.0-unified/html/OpenCL_API.html">HTML</a>)
      (<a href="specs/3.0-unified/pdf/OpenCL_API.pdf">PDF</a>)
-     (September 30, 2020). </li>
+     (December 18, 2020). </li>
 <li> OpenCL C Language Specification
      (<a href="specs/3.0-unified/html/OpenCL_C.html">HTML</a>)
      (<a href="specs/3.0-unified/pdf/OpenCL_C.pdf">PDF</a>)
-     (September 30, 2020). </li>
+     (December 18, 2020). </li>
 <li> OpenCL Extension Specification
      (<a href="specs/3.0-unified/html/OpenCL_Ext.html">HTML</a>)
      (<a href="specs/3.0-unified/pdf/OpenCL_Ext.pdf">PDF</a>)
-     (September 30, 2020. </li>
+     (December 18, 2020). </li>
 <li> OpenCL SPIR-V Environment Specification
      (<a href="specs/3.0-unified/html/OpenCL_Env.html">HTML</a>)
      (<a href="specs/3.0-unified/pdf/OpenCL_Env.pdf">PDF</a>)
-     (September 30, 2020). </li>
+     (December 18, 2020). </li>
 <li> <a href="https://www.khronos.org/files/opencl30-reference-guide.pdf">
      OpenCL 3.0 Reference Guide.</a> </li>
 </ul>
diff --git a/specs/3.0-unified/html/OpenCL_API.html b/specs/3.0-unified/html/OpenCL_API.html
index 656d7eb..69315ee 100644
--- a/specs/3.0-unified/html/OpenCL_API.html
+++ b/specs/3.0-unified/html/OpenCL_API.html
@@ -841,9 +841,9 @@
 <h1>The OpenCL<sup>&#8482;</sup> Specification</h1>
 <div class="details">
 <span id="author" class="author">Khronos<sup>&#174;</sup> OpenCL Working Group</span><br>
-<span id="revnumber">version v3.0.5,</span>
-<span id="revdate">Wed, 30 Sep 2020 00:00:00 +0000</span>
-<br><span id="revremark">from git branch: master commit: 4d8a36725aa8af9658ab5cb62fdbf52adb44bcca</span>
+<span id="revnumber">version v3.0.6,</span>
+<span id="revdate">Fri, 18 Dec 2020 12:00:00 +0000</span>
+<br><span id="revremark">from git branch: master commit: e9a4d468b1a0a38c1e10b8af484bb2bbb495e2b7</span>
 </div>
 <div id="toc" class="toc2">
 <div id="toctitle">Table of Contents</div>
@@ -1461,7 +1461,7 @@
 <dt class="hdlist1">Control flow </dt>
 <dd>
 <p>The flow of instructions executed by a work-item.
-Multiple logically related work items may or may not execute the same
+Multiple logically related work-items may or may not execute the same
 control flow.
 The control flow is said to be <em>converged</em> if all the work-items in the
 set execution the same stream of instructions.
@@ -1501,7 +1501,7 @@
 <dt class="hdlist1">Data race </dt>
 <dd>
 <p>The execution of a program contains a data race if it contains two
-actions in different work items or host threads where (1) one action
+actions in different work-items or host threads where (1) one action
 modifies a memory location and the other action reads or modifies the
 same memory location, and (2) at least one of these actions is not
 atomic, or the corresponding memory scopes are not inclusive, and (3)
@@ -1691,9 +1691,9 @@
 must be released by, any other entity), then its execution cannot be
 blocked by the execution of any other entity in the system (it will not
 be starved).
-Work items in a subgroup, for example, typically do not support
-independent forward progress, so one work item in a subgroup may be
-completely blocked (starved) if a different work item in the same
+Work-items in a subgroup, for example, typically do not support
+independent forward progress, so one work-item in a subgroup may be
+completely blocked (starved) if a different work-item in the same
 subgroup enters a spin loop.</p>
 </dd>
 <dt class="hdlist1">In-order Execution </dt>
@@ -2730,7 +2730,7 @@
 an NDRange (the local size) need not be the same for all work-groups.
 In this case, any single dimension for which the global size is not
 divisible by the local size will be partitioned into two regions.
-One region will have work-groups that have the same number of work items as
+One region will have work-groups that have the same number of work-items as
 was specified for that dimension by the programmer (the local size).
 The other region will have work-groups with less than the number of work
 items specified by the local size parameter in that dimension (the
@@ -2769,7 +2769,7 @@
 As a result, the combination of a work-group ID and the local-ID within a
 work-group uniquely defines a work-item.
 Each work-item is identifiable in two ways; in terms of a global index, and
-in terms of a work-group index plus a local index within a work group.</p>
+in terms of a work-group index plus a local index within a work-group.</p>
 </div>
 <div class="paragraph">
 <p>For example, consider the <a href="#index-space-image">2-dimensional index space</a>
@@ -2791,7 +2791,7 @@
 <div class="ulist none">
 <ul class="none">
 <li>
-<p>(g<sub>x</sub> , g<sub>y</sub>) = (w<sub>x</sub> S<sub>x</sub> + s<sub>x</sub> + F<sub>x</sub>, w<sub>y</sub> S<sub>y</sub> + s<sub>y</sub> + F<sub>y</sub>)</p>
+<p>(g<sub>x</sub>, g<sub>y</sub>) = (w<sub>x</sub> × S<sub>x</sub> + s<sub>x</sub> + F<sub>x</sub>, w<sub>y</sub> × S<sub>y</sub> + s<sub>y</sub> + F<sub>y</sub>)</p>
 </li>
 </ul>
 </div>
@@ -2812,7 +2812,7 @@
 <div class="ulist none">
 <ul class="none">
 <li>
-<p>(w<sub>x</sub>, w<sub>y</sub>) = ( (g<sub>x</sub> s<sub>x</sub> F<sub>x</sub>) / S<sub>x</sub>, (g<sub>y</sub> s<sub>y</sub> F<sub>y</sub>) / S<sub>y</sub> )</p>
+<p>(w<sub>x</sub>, w<sub>y</sub>) = ( (g<sub>x</sub> - s<sub>x</sub> - F<sub>x</sub>) / S<sub>x</sub>, (g<sub>y</sub> - s<sub>y</sub> - F<sub>y</sub>) / S<sub>y</sub> )</p>
 </li>
 </ul>
 </div>
@@ -2843,7 +2843,7 @@
 </table>
 </div>
 <div class="paragraph">
-<p>Work items are mapped into sub-groups through a combination of compile-time
+<p>Work-items are mapped into sub-groups through a combination of compile-time
 decisions and the parameters of the dispatch.
 The mapping to sub-groups is invariant for the duration of a kernels
 execution, across dispatches of a given kernel with the same work-group
@@ -3506,18 +3506,20 @@
 dynamic at runtime) and the type of access allowed (i.e. whether the kernel
 or the host can read and/or write to a memory region).</p>
 </div>
-<table id="memory-regions-table" class="tableblock frame-all grid-all" style="width: 80%;">
+<table id="memory-regions-table" class="tableblock frame-all grid-all stretch">
 <caption class="title">Table 1. Memory Regions</caption>
 <colgroup>
-<col style="width: 20%;">
-<col style="width: 20%;">
-<col style="width: 20%;">
-<col style="width: 20%;">
-<col style="width: 20%;">
+<col style="width: 12.5%;">
+<col style="width: 12.5%;">
+<col style="width: 18.75%;">
+<col style="width: 18.75%;">
+<col style="width: 18.75%;">
+<col style="width: 18.75%;">
 </colgroup>
 <thead>
 <tr>
 <th class="tableblock halign-left valign-top"></th>
+<th class="tableblock halign-left valign-top"></th>
 <th class="tableblock halign-left valign-top">Global</th>
 <th class="tableblock halign-left valign-top">Constant</th>
 <th class="tableblock halign-left valign-top">Local</th>
@@ -3526,46 +3528,45 @@
 </thead>
 <tbody>
 <tr>
-<td class="tableblock halign-left valign-top" rowspan="2"><p class="tableblock">Host</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Dynamic Allocation</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Dynamic Allocation</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Dynamic Allocation</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">No Allocation</p></td>
+<td class="tableblock halign-left valign-top" rowspan="2"><p class="tableblock"><strong>Host</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Allocation</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Dynamic</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Dynamic</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Dynamic</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">None</p></td>
 </tr>
 <tr>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Read/Write access to buffers and images but not pipes</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Read/Write access</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">No access</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">No access</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Access</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Read/Write to Buffers and Images, but not Pipes</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Read/Write</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">None</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">None</p></td>
 </tr>
 <tr>
-<td class="tableblock halign-left valign-top" rowspan="2"><p class="tableblock">Kernel</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Static Allocation for program scope variables</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Static Allocation</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Static Allocation.</p>
-<p class="tableblock">            Dynamic allocation for child kernel</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Static Allocation</p></td>
+<td class="tableblock halign-left valign-top" rowspan="2"><p class="tableblock"><strong>Kernel</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Allocation</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Static (program scope variables)</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Static (program scope variables)</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Static for parent kernel,
+      Dynamic for child kernels</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Static</p></td>
 </tr>
 <tr>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Read/Write access</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Read-only access</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Read/Write access.</p>
-<p class="tableblock">        No access to child&#8217;s local memory.</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Read/Write access</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Access</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Read/Write</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Read-only</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Read/Write,
+      No access to child kernel memory</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Read/Write</p></td>
 </tr>
 </tbody>
 </table>
-<div class="sidebarblock">
-<div class="content">
-<div class="title">Caption</div>
 <div class="paragraph">
 <p>The <a href="#memory-regions-table">Memory Regions</a> table shows the different
 memory regions in OpenCL and how memory objects are allocated and accessed
 by the host and by an executing instance of a kernel.
-For the case of kernels, we distinguish between the behavior of local memory
-with respect to a kernel (self) and its child kernels.</p>
-</div>
-</div>
+For kernels, we distinguish between the behavior of local memory
+for a parent kernel and its child kernels.</p>
 </div>
 <div class="paragraph">
 <p>Once allocated, a memory object is made available to kernel-instances
@@ -4083,12 +4084,12 @@
 </ul>
 </div>
 <div class="paragraph">
-<p>If specified, the cl_mem_fence_flags argument must be CLK_IMAGE_MEM_FENCE,
-CLK_GLOBAL_MEM_FENCE, CLK_LOCAL_MEM_FENCE, or CLK_GLOBAL_MEM_FENCE |
-CLK_LOCAL_MEM_FENCE.</p>
+<p>If specified, the cl_mem_fence_flags argument must be <code>CLK_IMAGE_MEM_FENCE</code>,
+<code>CLK_GLOBAL_MEM_FENCE</code>, <code>CLK_LOCAL_MEM_FENCE</code>, or <code>CLK_GLOBAL_MEM_FENCE |
+CLK_LOCAL_MEM_FENCE</code>.</p>
 </div>
 <div class="paragraph">
-<p>The atomic_work_item_fence(CLK_IMAGE_MEM_FENCE) built-in function must be
+<p>The <code>atomic_work_item_fence(CLK_IMAGE_MEM_FENCE, &#8230;&#8203;)</code> built-in function must be
 used to make sure that sampler-less writes are visible to later reads by the
 same work-item.
 Without use of the atomic_work_item_fence function, write-read coherence on
@@ -4688,8 +4689,8 @@
 <a href="#iso-c11">[C11 standard, Section 7.17.4, paragraph 4, modified.]</a></p>
 </div>
 <div class="paragraph">
-<p>Let <strong>X</strong> and <strong>Y</strong> be two work item fences that each have both the
-CLK_GLOBAL_MEM_FENCE and CLK_LOCAL_MEM_FENCE flags set.
+<p>Let <strong>X</strong> and <strong>Y</strong> be two work-item fences that each have both the
+<code>CLK_GLOBAL_MEM_FENCE</code> and <code>CLK_LOCAL_MEM_FENCE</code> flags set.
 <strong>X</strong> global-synchronizes-with <strong>Y</strong> and <strong>X</strong> local synchronizes with <strong>Y</strong> if the
 conditions required for <strong>X</strong> to global-synchronize with <strong>Y</strong> are met, the
 conditions required for <strong>X</strong> to local-synchronize-with <strong>Y</strong> are met, or both
@@ -4742,12 +4743,12 @@
 <p>For each work-item the entry fence is sequenced before the exit fence.</p>
 </li>
 <li>
-<p>If the flags have CLK_GLOBAL_MEM_FENCE set then for each work-item the
+<p>If the flags have <code>CLK_GLOBAL_MEM_FENCE</code> set then for each work-item the
 entry fence global-synchronizes-with the exit fence of all other
 work-items in the same work-group.</p>
 </li>
 <li>
-<p>If the flags have CLK_LOCAL_MEM_FENCE set then for each work-item the
+<p>If the flags have <code>CLK_LOCAL_MEM_FENCE</code> set then for each work-item the
 entry fence local-synchronizes-with the exit fence of all other
 work-items in the same work-group.</p>
 </li>
@@ -4759,10 +4760,10 @@
 The use of these work-group functions implies sequenced-before relationships
 between statements within the execution of a single work-item in order to
 satisfy data dependencies.
-For example, a work item that provides a value to a work-group function must
+For example, a work-item that provides a value to a work-group function must
 behave as if it generates that value before beginning execution of that
 work-group function.
-Furthermore, the programmer must ensure that all work items in a work group
+Furthermore, the programmer must ensure that all work-items in a work-group
 must execute the same work-group function call site, or dynamic work-group
 function instance.</p>
 </div>
@@ -4825,12 +4826,12 @@
 <p>For each work-item the entry fence is sequenced before the exit fence.</p>
 </li>
 <li>
-<p>If the flags have CLK_GLOBAL_MEM_FENCE set then for each work-item the
+<p>If the flags have <code>CLK_GLOBAL_MEM_FENCE</code> set then for each work-item the
 entry fence global-synchronizes-with the exit fence of all other
 work-items in the same sub-group.</p>
 </li>
 <li>
-<p>If the flags have CLK_LOCAL_MEM_FENCE set then for each work-item the
+<p>If the flags have <code>CLK_LOCAL_MEM_FENCE</code> set then for each work-item the
 entry fence local-synchronizes-with the exit fence of all other
 work-items in the same sub-group.</p>
 </li>
@@ -4842,10 +4843,10 @@
 The use of these sub-group functions implies sequenced-before relationships
 between statements within the execution of a single work-item in order to
 satisfy data dependencies.
-For example, a work item that provides a value to a sub-group function must
+For example, a work-item that provides a value to a sub-group function must
 behave as if it generates that value before beginning execution of that
 sub-group function.
-Furthermore, the programmer must ensure that all work items in a sub-group
+Furthermore, the programmer must ensure that all work-items in a sub-group
 must execute the same sub-group function call site, or dynamic sub-group
 function instance.</p>
 </div>
@@ -4956,15 +4957,15 @@
 </li>
 <li>
 <p>The start of a kernel-instance <strong>K</strong> global-synchronizes-with all
-operations in the work items of <strong>K</strong>.
+operations in the work-items of <strong>K</strong>.
 Note that this includes the execution of any atomic operations by the
-work items in a program using fine-grain SVM.</p>
+work-items in a program using fine-grain SVM.</p>
 </li>
 <li>
-<p>All operations of all work items of a kernel-instance <strong>K</strong>
+<p>All operations of all work-items of a kernel-instance <strong>K</strong>
 global-synchronizes-with the event signaling the completion of <strong>K</strong>.
 Note that this also includes the execution of any atomic operations by
-the work items in a program using fine-grain SVM.</p>
+the work-items in a program using fine-grain SVM.</p>
 </li>
 <li>
 <p>If a callback procedure <strong>P</strong> is registered on an event <strong>E</strong>, then <strong>E</strong>
@@ -4980,12 +4981,12 @@
 </li>
 <li>
 <p>If a device enqueues a command <strong>C</strong> with the
-CLK_ENQUEUE_FLAGS_WAIT_KERNEL flag, then the end state of the parent
+<code>CLK_ENQUEUE_FLAGS_WAIT_KERNEL</code> flag, then the end state of the parent
 kernel instance global-synchronizes with <strong>C</strong>.</p>
 </li>
 <li>
 <p>If a work-group enqueues a command <strong>C</strong> with the
-CLK_ENQUEUE_FLAGS_WAIT_WORK_GROUP flag, then the end state of the
+<code>CLK_ENQUEUE_FLAGS_WAIT_WORK_GROUP</code> flag, then the end state of the
 work-group global-synchronizes with <strong>C</strong>.</p>
 </li>
 </ol>
@@ -5452,9 +5453,9 @@
 <table id="platform-queries-table" class="tableblock frame-all grid-all stretch">
 <caption class="title">Table 3. List of supported param_names by <a href="#clGetPlatformInfo">clGetPlatformInfo</a></caption>
 <colgroup>
+<col style="width: 33%;">
+<col style="width: 17%;">
 <col style="width: 50%;">
-<col style="width: 10%;">
-<col style="width: 40%;">
 </colgroup>
 <thead>
 <tr>
@@ -5800,8 +5801,8 @@
 <table id="device-queries-table" class="tableblock frame-all grid-all stretch">
 <caption class="title">Table 5. List of supported param_names by <a href="#clGetDeviceInfo">clGetDeviceInfo</a></caption>
 <colgroup>
-<col style="width: 30%;">
-<col style="width: 20%;">
+<col style="width: 33%;">
+<col style="width: 17%;">
 <col style="width: 50%;">
 </colgroup>
 <thead>
@@ -6558,7 +6559,7 @@
 <p class="tableblock">        <strong>cl_khr_fp64</strong></p>
 <p class="tableblock">        Additionally, the following Khronos extension names must be returned
         by all devices that support OpenCL 2.0, OpenCL 2.1, or OpenCL 2.2.
-        For devices that support OpenCL 3.0, these extension names must only
+        For devices that support OpenCL 3.0, these extension names must
         be returned when and only when the optional feature is supported:</p>
 <p class="tableblock">        <strong>cl_khr_3d_image_writes</strong><br>
         <strong>cl_khr_depth_images</strong><br>
@@ -6760,7 +6761,7 @@
         <a id="CL_DEVICE_ATOMIC_ORDER_ACQ_REL"></a><a href="#CL_DEVICE_ATOMIC_ORDER_ACQ_REL"><code>CL_DEVICE_<wbr>ATOMIC_<wbr>ORDER_<wbr>ACQ_<wbr>REL</code></a> - Support for the <strong>acquire</strong>, <strong>release</strong>, and <strong>acquire-release</strong> memory orders.<br>
         <a id="CL_DEVICE_ATOMIC_ORDER_SEQ_CST"></a><a href="#CL_DEVICE_ATOMIC_ORDER_SEQ_CST"><code>CL_DEVICE_<wbr>ATOMIC_<wbr>ORDER_<wbr>SEQ_<wbr>CST</code></a> - Support for the <strong>sequentially consistent</strong> memory order.</p>
 <p class="tableblock">        Because atomic memory orders are hierarchical, a device that supports a strong memory order must also support all weaker memory orders.</p>
-<p class="tableblock">        <a id="CL_DEVICE_ATOMIC_SCOPE_WORK_ITEM"></a><a href="#CL_DEVICE_ATOMIC_SCOPE_WORK_ITEM"><code>CL_DEVICE_<wbr>ATOMIC_<wbr>SCOPE_<wbr>WORK_<wbr>ITEM</code></a> <sup class="footnote">[<a id="_footnoteref_10" class="footnote" href="#_footnotedef_10" title="View footnote.">10</a>]</sup> - Support for memory ordering constraints that apply to a single work item.<br>
+<p class="tableblock">        <a id="CL_DEVICE_ATOMIC_SCOPE_WORK_ITEM"></a><a href="#CL_DEVICE_ATOMIC_SCOPE_WORK_ITEM"><code>CL_DEVICE_<wbr>ATOMIC_<wbr>SCOPE_<wbr>WORK_<wbr>ITEM</code></a> <sup class="footnote">[<a id="_footnoteref_10" class="footnote" href="#_footnotedef_10" title="View footnote.">10</a>]</sup> - Support for memory ordering constraints that apply to a single work-item.<br>
         <a id="CL_DEVICE_ATOMIC_SCOPE_WORK_GROUP"></a><a href="#CL_DEVICE_ATOMIC_SCOPE_WORK_GROUP"><code>CL_DEVICE_<wbr>ATOMIC_<wbr>SCOPE_<wbr>WORK_<wbr>GROUP</code></a> - Support for memory ordering constraints that apply to all work-items in a work-group.<br>
         <a id="CL_DEVICE_ATOMIC_SCOPE_DEVICE"></a><a href="#CL_DEVICE_ATOMIC_SCOPE_DEVICE"><code>CL_DEVICE_<wbr>ATOMIC_<wbr>SCOPE_<wbr>DEVICE</code></a> - Support for memory ordering constraints that apply to all work-items executing on the device.<br>
         <a id="CL_DEVICE_ATOMIC_SCOPE_ALL_DEVICES"></a><a href="#CL_DEVICE_ATOMIC_SCOPE_ALL_DEVICES"><code>CL_DEVICE_<wbr>ATOMIC_<wbr>SCOPE_<wbr>ALL_<wbr>DEVICES</code></a> - Support for memory ordering constraints that apply to all work-items executing across all devices that can share SVM memory with each other and the host process.</p>
@@ -6784,13 +6785,13 @@
 <td class="tableblock halign-left valign-top"><p class="tableblock"><a id="CL_DEVICE_NON_UNIFORM_WORK_GROUP_SUPPORT"></a><a href="#CL_DEVICE_NON_UNIFORM_WORK_GROUP_SUPPORT"><code>CL_DEVICE_<wbr>NON_<wbr>UNIFORM_<wbr>WORK_<wbr>GROUP_<wbr>SUPPORT</code></a></p>
 <p class="tableblock"><a href="#unified-spec">Missing before</a> version 3.0.</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><code>cl_bool</code></p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Is <a href="#CL_TRUE"><code>CL_TRUE</code></a> if the device supports non-uniform work groups, and <a href="#CL_FALSE"><code>CL_FALSE</code></a> otherwise.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Is <a href="#CL_TRUE"><code>CL_TRUE</code></a> if the device supports non-uniform work-groups, and <a href="#CL_FALSE"><code>CL_FALSE</code></a> otherwise.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><a id="CL_DEVICE_WORK_GROUP_COLLECTIVE_FUNCTIONS_SUPPORT"></a><a href="#CL_DEVICE_WORK_GROUP_COLLECTIVE_FUNCTIONS_SUPPORT"><code>CL_DEVICE_<wbr>WORK_<wbr>GROUP_<wbr>COLLECTIVE_<wbr>FUNCTIONS_<wbr>SUPPORT</code></a></p>
 <p class="tableblock"><a href="#unified-spec">Missing before</a> version 3.0.</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><code>cl_bool</code></p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Is <a href="#CL_TRUE"><code>CL_TRUE</code></a> if the device supports work group collective functions e.g. <code>work_group_broadcast</code>, <code>work_group_reduce</code>, and <code>work_group_scan</code>, and <a href="#CL_FALSE"><code>CL_FALSE</code></a> otherwise.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Is <a href="#CL_TRUE"><code>CL_TRUE</code></a> if the device supports work-group collective functions, e.g. <code>work_group_broadcast</code>, <code>work_group_reduce</code>, and <code>work_group_scan</code>, and <a href="#CL_FALSE"><code>CL_FALSE</code></a> otherwise.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><a id="CL_DEVICE_GENERIC_ADDRESS_SPACE_SUPPORT"></a><a href="#CL_DEVICE_GENERIC_ADDRESS_SPACE_SUPPORT"><code>CL_DEVICE_<wbr>GENERIC_<wbr>ADDRESS_<wbr>SPACE_<wbr>SUPPORT</code></a></p>
@@ -7124,8 +7125,8 @@
 <table id="subdevice-partition-table" class="tableblock frame-all grid-all stretch">
 <caption class="title">Table 6. List of supported partition schemes by <a href="#clCreateSubDevices">clCreateSubDevices</a></caption>
 <colgroup>
-<col style="width: 30%;">
-<col style="width: 20%;">
+<col style="width: 33%;">
+<col style="width: 17%;">
 <col style="width: 50%;">
 </colgroup>
 <thead>
@@ -7247,7 +7248,8 @@
 </div>
 <div class="listingblock">
 <div class="content">
-<pre class="CodeRay highlight"><code data-lang="c">{ CL_DEVICE_PARTITION_EQUALLY, <span class="integer">8</span>, <span class="integer">0</span> }</code></pre>
+<pre class="CodeRay highlight"><code data-lang="c">{ CL_DEVICE_PARTITION_EQUALLY, <span class="integer">8</span>,
+  <span class="integer">0</span> } <span class="comment">// 0 terminates the property list</span></code></pre>
 </div>
 </div>
 <div class="paragraph">
@@ -7258,7 +7260,8 @@
 <div class="listingblock">
 <div class="content">
 <pre class="CodeRay highlight"><code data-lang="c">{ CL_DEVICE_PARTITION_BY_COUNTS,
-  <span class="integer">3</span>, <span class="integer">1</span>, CL_DEVICE_PARTITION_BY_COUNTS_LIST_END, <span class="integer">0</span> }</code></pre>
+    <span class="integer">3</span>, <span class="integer">1</span>, CL_DEVICE_PARTITION_BY_COUNTS_LIST_END,
+  <span class="integer">0</span> } <span class="comment">// 0 terminates the property list</span></code></pre>
 </div>
 </div>
 <div class="paragraph">
@@ -7268,8 +7271,8 @@
 <div class="listingblock">
 <div class="content">
 <pre class="CodeRay highlight"><code data-lang="c">{ CL_DEVICE_PARTITION_BY_AFFINITY_DOMAIN,
-  CL_DEVICE_AFFINITY_DOMAIN_NEXT_PARTITIONABLE,
-  <span class="integer">0</span> }</code></pre>
+    CL_DEVICE_AFFINITY_DOMAIN_NEXT_PARTITIONABLE,
+  <span class="integer">0</span> } <span class="comment">// 0 terminates the property list</span></code></pre>
 </div>
 </div>
 </div>
@@ -7486,9 +7489,9 @@
 <table id="context-properties-table" class="tableblock frame-all grid-all stretch">
 <caption class="title">Table 7. List of supported context creation properties by <a href="#clCreateContext">clCreateContext</a></caption>
 <colgroup>
-<col style="width: 34%;">
 <col style="width: 33%;">
-<col style="width: 33%;">
+<col style="width: 17%;">
+<col style="width: 50%;">
 </colgroup>
 <thead>
 <tr>
@@ -7847,15 +7850,15 @@
 <table id="context-info-table" class="tableblock frame-all grid-all stretch">
 <caption class="title">Table 8. List of supported param_names by <a href="#clGetContextInfo">clGetContextInfo</a></caption>
 <colgroup>
-<col style="width: 34%;">
 <col style="width: 33%;">
-<col style="width: 33%;">
+<col style="width: 17%;">
+<col style="width: 50%;">
 </colgroup>
 <thead>
 <tr>
 <th class="tableblock halign-left valign-top">Context Info</th>
 <th class="tableblock halign-left valign-top">Return Type</th>
-<th class="tableblock halign-left valign-top">Information returned in param_value</th>
+<th class="tableblock halign-left valign-top">Description</th>
 </tr>
 </thead>
 <tbody>
@@ -8105,9 +8108,9 @@
 <table id="queue-properties-table" class="tableblock frame-all grid-all stretch">
 <caption class="title">Table 9. List of supported queue creation properties by <a href="#clCreateCommandQueueWithProperties">clCreateCommandQueueWithProperties</a></caption>
 <colgroup>
-<col style="width: 34%;">
 <col style="width: 33%;">
-<col style="width: 33%;">
+<col style="width: 17%;">
+<col style="width: 50%;">
 </colgroup>
 <thead>
 <tr>
@@ -8253,7 +8256,7 @@
 </colgroup>
 <thead>
 <tr>
-<th class="tableblock halign-left valign-top"><strong>Command-Queue Properties</strong></th>
+<th class="tableblock halign-left valign-top">Command-Queue Properties</th>
 <th class="tableblock halign-left valign-top">Description</th>
 </tr>
 </thead>
@@ -8552,15 +8555,15 @@
 <table id="command-queue-param-table" class="tableblock frame-all grid-all stretch">
 <caption class="title">Table 11. List of supported param_names by <a href="#clGetCommandQueueInfo">clGetCommandQueueInfo</a></caption>
 <colgroup>
-<col style="width: 34%;">
 <col style="width: 33%;">
-<col style="width: 33%;">
+<col style="width: 17%;">
+<col style="width: 50%;">
 </colgroup>
 <thead>
 <tr>
 <th class="tableblock halign-left valign-top">Queue Info</th>
 <th class="tableblock halign-left valign-top">Return Type</th>
-<th class="tableblock halign-left valign-top">Information returned in param_value</th>
+<th class="tableblock halign-left valign-top">Description</th>
 </tr>
 </thead>
 <tbody>
@@ -9407,8 +9410,8 @@
     cl_command_queue command_queue,
     cl_mem buffer,
     cl_bool blocking_read,
-    <span class="directive">const</span> size_t* buffer_offset,
-    <span class="directive">const</span> size_t* host_offset,
+    <span class="directive">const</span> size_t* buffer_origin,
+    <span class="directive">const</span> size_t* host_origin,
     <span class="directive">const</span> size_t* region,
     size_t buffer_row_pitch,
     size_t buffer_slice_pitch,
@@ -9438,8 +9441,8 @@
     cl_command_queue command_queue,
     cl_mem buffer,
     cl_bool blocking_write,
-    <span class="directive">const</span> size_t* buffer_offset,
-    <span class="directive">const</span> size_t* host_offset,
+    <span class="directive">const</span> size_t* buffer_origin,
+    <span class="directive">const</span> size_t* host_origin,
     <span class="directive">const</span> size_t* region,
     size_t buffer_row_pitch,
     size_t buffer_slice_pitch,
@@ -9605,14 +9608,14 @@
 <p><a href="#CL_INVALID_MEM_OBJECT"><code>CL_INVALID_<wbr>MEM_<wbr>OBJECT</code></a> if <em>buffer</em> is not a valid buffer object.</p>
 </li>
 <li>
+<p><a href="#CL_INVALID_VALUE"><code>CL_INVALID_<wbr>VALUE</code></a> if <em>buffer_origin</em>, <em>host_origin</em>, or <em>region</em> is <code>NULL</code>.</p>
+</li>
+<li>
 <p><a href="#CL_INVALID_VALUE"><code>CL_INVALID_<wbr>VALUE</code></a> if the region being read or written specified by
 (<em>buffer_origin</em>, <em>region</em>, <em>buffer_row_pitch</em>, <em>buffer_slice_pitch</em>) is
 out of bounds.</p>
 </li>
 <li>
-<p><a href="#CL_INVALID_VALUE"><code>CL_INVALID_<wbr>VALUE</code></a> if <em>ptr</em> is a <code>NULL</code> value.</p>
-</li>
-<li>
 <p><a href="#CL_INVALID_VALUE"><code>CL_INVALID_<wbr>VALUE</code></a> if any <em>region</em> array element is 0.</p>
 </li>
 <li>
@@ -9634,6 +9637,9 @@
 <em>host_row_pitch</em>.</p>
 </li>
 <li>
+<p><a href="#CL_INVALID_VALUE"><code>CL_INVALID_<wbr>VALUE</code></a> if <em>ptr</em> is <code>NULL</code>.</p>
+</li>
+<li>
 <p><a href="#CL_INVALID_EVENT_WAIT_LIST"><code>CL_INVALID_<wbr>EVENT_<wbr>WAIT_<wbr>LIST</code></a> if <em>event_wait_list</em> is <code>NULL</code> and
 <em>num_events_in_wait_list</em> &gt; 0, or <em>event_wait_list</em> is not <code>NULL</code> and
 <em>num_events_in_wait_list</em> is 0, or if event objects in <em>event_wait_list</em>
@@ -9888,8 +9894,8 @@
 if <em>src_buffer</em> and <em>dst_buffer</em> are different sub-buffers of the same
 associated buffer object and they overlap.
 The regions overlap if <em>src_offset</em> ≤ <em>dst_offset</em> ≤
-<em>src_offset</em> + <em>size</em> 1 or if <em>dst_offset</em> ≤ <em>src_offset</em> ≤
-<em>dst_offset</em> + <em>size</em> 1.</p>
+<em>src_offset</em> + <em>size</em> - 1 or if <em>dst_offset</em> ≤ <em>src_offset</em> ≤
+<em>dst_offset</em> + <em>size</em> - 1.</p>
 </li>
 <li>
 <p><a href="#CL_MEM_OBJECT_ALLOCATION_FAILURE"><code>CL_MEM_<wbr>OBJECT_<wbr>ALLOCATION_<wbr>FAILURE</code></a> if there is a failure to allocate
@@ -10071,9 +10077,12 @@
 buffer objects.</p>
 </li>
 <li>
-<p><a href="#CL_INVALID_VALUE"><code>CL_INVALID_<wbr>VALUE</code></a> if (<em>src_origin, region, src_row_pitch,
-src_slice_pitch</em>) or (<em>dst_origin, region, dst_row_pitch,
-dst_slice_pitch</em>) require accessing elements outside the <em>src_buffer</em>
+<p><a href="#CL_INVALID_VALUE"><code>CL_INVALID_<wbr>VALUE</code></a> if <em>src_origin</em>, <em>dst_origin</em>, or <em>region</em> is <code>NULL</code>.</p>
+</li>
+<li>
+<p><a href="#CL_INVALID_VALUE"><code>CL_INVALID_<wbr>VALUE</code></a> if (<em>src_origin</em>, <em>region</em>, <em>src_row_pitch</em>,
+<em>src_slice_pitch</em>) or (<em>dst_origin</em>, <em>region</em>, <em>dst_row_pitch</em>,
+<em>dst_slice_pitch</em>) require accessing elements outside the <em>src_buffer</em>
 and <em>dst_buffer</em> buffer objects respectively.</p>
 </li>
 <li>
@@ -11491,9 +11500,6 @@
     size_t                image_slice_pitch;
     cl_uint               num_mip_levels;
     cl_uint               num_samples;
-    <span class="preprocessor">#ifdef</span> __GNUC__
-    __extension__   <span class="comment">/* Prevents warnings about anonymous union in -pedantic builds */</span>
-<span class="preprocessor">#endif</span>
     <span class="keyword">union</span> {
         cl_mem buffer;
         cl_mem mem_object;
@@ -12242,17 +12248,23 @@
 <em>command_queue</em> and events in <em>event_wait_list</em> are not the same.</p>
 </li>
 <li>
-<p><a href="#CL_INVALID_MEM_OBJECT"><code>CL_INVALID_<wbr>MEM_<wbr>OBJECT</code></a> if i_mage_ is not a valid image object.</p>
+<p><a href="#CL_INVALID_MEM_OBJECT"><code>CL_INVALID_<wbr>MEM_<wbr>OBJECT</code></a> if <em>image</em> is not a valid image object.</p>
+</li>
+<li>
+<p><a href="#CL_INVALID_VALUE"><code>CL_INVALID_<wbr>VALUE</code></a> if <em>origin</em> or <em>region</em> is <code>NULL</code>.</p>
 </li>
 <li>
 <p><a href="#CL_INVALID_VALUE"><code>CL_INVALID_<wbr>VALUE</code></a> if the region being read or written specified by
-<em>origin</em> and <em>region</em> is out of bounds or if <em>ptr</em> is a <code>NULL</code> value.</p>
+<em>origin</em> and <em>region</em> is out of bounds.</p>
 </li>
 <li>
 <p><a href="#CL_INVALID_VALUE"><code>CL_INVALID_<wbr>VALUE</code></a> if values in <em>origin</em> and <em>region</em> do not follow rules
 described in the argument description for <em>origin</em> and <em>region</em>.</p>
 </li>
 <li>
+<p><a href="#CL_INVALID_VALUE"><code>CL_INVALID_<wbr>VALUE</code></a> if <em>ptr</em> is <code>NULL</code>.</p>
+</li>
+<li>
 <p><a href="#CL_INVALID_EVENT_WAIT_LIST"><code>CL_INVALID_<wbr>EVENT_<wbr>WAIT_<wbr>LIST</code></a> if <em>event_wait_list</em> is <code>NULL</code> and
 <em>num_events_in_wait_list</em> &gt; 0, or <em>event_wait_list</em> is not <code>NULL</code> and
 <em>num_events_in_wait_list</em> is 0, or if event objects in <em>event_wait_list</em>
@@ -12501,6 +12513,9 @@
 same image format.</p>
 </li>
 <li>
+<p><a href="#CL_INVALID_VALUE"><code>CL_INVALID_<wbr>VALUE</code></a> if <em>src_origin</em>, <em>dst_origin</em>, or <em>region</em> is <code>NULL</code>.</p>
+</li>
+<li>
 <p><a href="#CL_INVALID_VALUE"><code>CL_INVALID_<wbr>VALUE</code></a> if the 2D or 3D rectangular region specified by
 <em>src_origin</em> and <em>src_origin</em> +  <em>region</em> refers to a region outside
 <em>src_image</em>, or if the 2D or 3D rectangular region specified by
@@ -12702,8 +12717,11 @@
 <p><a href="#CL_INVALID_VALUE"><code>CL_INVALID_<wbr>VALUE</code></a> if <em>fill_color</em> is <code>NULL</code>.</p>
 </li>
 <li>
+<p><a href="#CL_INVALID_VALUE"><code>CL_INVALID_<wbr>VALUE</code></a> if <em>origin</em> or <em>region</em> is <code>NULL</code>.</p>
+</li>
+<li>
 <p><a href="#CL_INVALID_VALUE"><code>CL_INVALID_<wbr>VALUE</code></a> if the region being filled as specified by <em>origin</em> and
-<em>region</em> is out of bounds or if <em>ptr</em> is a <code>NULL</code> value.</p>
+<em>region</em> is out of bounds.</p>
 </li>
 <li>
 <p><a href="#CL_INVALID_VALUE"><code>CL_INVALID_<wbr>VALUE</code></a> if values in <em>origin</em> and <em>region</em> do not follow rules
@@ -12863,6 +12881,9 @@
 image buffer object created from <em>dst_buffer</em>.</p>
 </li>
 <li>
+<p><a href="#CL_INVALID_VALUE"><code>CL_INVALID_<wbr>VALUE</code></a> if <em>src_origin</em> or <em>region</em> is <code>NULL</code>.</p>
+</li>
+<li>
 <p><a href="#CL_INVALID_VALUE"><code>CL_INVALID_<wbr>VALUE</code></a> if the 1D, 2D or 3D rectangular region specified by
 <em>src_origin</em> and <em>src_origin</em> + <em>region</em> refers to a region outside
 <em>src_image</em>, or if the region specified by <em>dst_offset</em> and <em>dst_offset</em>
@@ -13039,6 +13060,9 @@
 buffer object created from <em>src_buffer</em>.</p>
 </li>
 <li>
+<p><a href="#CL_INVALID_VALUE"><code>CL_INVALID_<wbr>VALUE</code></a> if <em>dst_origin</em> or <em>region</em> is <code>NULL</code>.</p>
+</li>
+<li>
 <p><a href="#CL_INVALID_VALUE"><code>CL_INVALID_<wbr>VALUE</code></a> if the 1D, 2D or 3D rectangular region specified by
 <em>dst_origin</em> and <em>dst_origin</em> + <em>region</em> refer to a region outside
 <em>dst_image</em>, or if the region specified by <em>src_offset</em> and <em>src_offset</em>
@@ -13243,8 +13267,11 @@
 <p><a href="#CL_INVALID_MEM_OBJECT"><code>CL_INVALID_<wbr>MEM_<wbr>OBJECT</code></a> if <em>image</em> is not a valid image object.</p>
 </li>
 <li>
+<p><a href="#CL_INVALID_VALUE"><code>CL_INVALID_<wbr>VALUE</code></a> if <em>origin</em> or <em>region</em> is <code>NULL</code>.</p>
+</li>
+<li>
 <p><a href="#CL_INVALID_VALUE"><code>CL_INVALID_<wbr>VALUE</code></a> if region being mapped given by (<em>origin</em>,
-<em>origin+region</em>) is out of bounds or if values specified in <em>map_flags</em>
+<em>origin</em> + <em>region</em>) is out of bounds or if values specified in <em>map_flags</em>
 are not valid.</p>
 </li>
 <li>
@@ -13406,15 +13433,15 @@
 <table id="image-info-table" class="tableblock frame-all grid-all stretch">
 <caption class="title">Table 22. List of supported param_names by <a href="#clGetImageInfo">clGetImageInfo</a></caption>
 <colgroup>
-<col style="width: 34%;">
 <col style="width: 33%;">
-<col style="width: 33%;">
+<col style="width: 17%;">
+<col style="width: 50%;">
 </colgroup>
 <thead>
 <tr>
 <th class="tableblock halign-left valign-top">Image Info</th>
 <th class="tableblock halign-left valign-top">Return type</th>
-<th class="tableblock halign-left valign-top">Info. returned in <em>param_value</em></th>
+<th class="tableblock halign-left valign-top">Description</th>
 </tr>
 </thead>
 <tbody>
@@ -13767,15 +13794,15 @@
 <table id="pipe-info-table" class="tableblock frame-all grid-all stretch">
 <caption class="title">Table 23. List of supported param_names by <a href="#clGetPipeInfo">clGetPipeInfo</a></caption>
 <colgroup>
-<col style="width: 34%;">
 <col style="width: 33%;">
-<col style="width: 33%;">
+<col style="width: 17%;">
+<col style="width: 50%;">
 </colgroup>
 <thead>
 <tr>
 <th class="tableblock halign-left valign-top">Pipe Info</th>
 <th class="tableblock halign-left valign-top">Return type</th>
-<th class="tableblock halign-left valign-top">Info. returned in <em>param_value</em></th>
+<th class="tableblock halign-left valign-top">Description</th>
 </tr>
 </thead>
 <tbody>
@@ -14529,15 +14556,15 @@
 <table id="mem-info-table" class="tableblock frame-all grid-all stretch">
 <caption class="title">Table 25. List of supported param_names by <a href="#clGetMemObjectInfo">clGetMemObjectInfo</a></caption>
 <colgroup>
-<col style="width: 34%;">
 <col style="width: 33%;">
-<col style="width: 33%;">
+<col style="width: 17%;">
+<col style="width: 50%;">
 </colgroup>
 <thead>
 <tr>
 <th class="tableblock halign-left valign-top">Memory Object Info</th>
 <th class="tableblock halign-left valign-top">Return type</th>
-<th class="tableblock halign-left valign-top">Info. returned in <em>param_value</em></th>
+<th class="tableblock halign-left valign-top">Description</th>
 </tr>
 </thead>
 <tbody>
@@ -14576,7 +14603,7 @@
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><a id="CL_MEM_HOST_PTR"></a><a href="#CL_MEM_HOST_PTR"><code>CL_MEM_<wbr>HOST_<wbr>PTR</code></a></p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock"><code>void</code> *</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>void</code>*</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">If <em>memobj</em> is created with <a href="#clCreateBuffer"><strong>clCreateBuffer</strong></a>, <a href="#clCreateBufferWithProperties"><strong>clCreateBufferWithProperties</strong></a>,
         <a href="#clCreateImage"><strong>clCreateImage</strong></a>, <a href="#clCreateImageWithProperties"><strong>clCreateImageWithProperties</strong></a>, <a href="#clCreateImage2D"><strong>clCreateImage2D</strong></a>, or
         <a href="#clCreateImage3D"><strong>clCreateImage3D</strong></a>, and <a href="#CL_MEM_USE_HOST_PTR"><code>CL_MEM_<wbr>USE_<wbr>HOST_<wbr>PTR</code></a> is specified in mem_flags,
@@ -15321,7 +15348,7 @@
 <em>event_wait_list</em> is a negative integer value.</p>
 </li>
 <li>
-<p><a href="#CL_INVALID_VALUE"><code>CL_INVALID_<wbr>VALUE</code></a> if <em>dst_ptr</em> or <em>src_ptr</em> are <code>NULL</code>.</p>
+<p><a href="#CL_INVALID_VALUE"><code>CL_INVALID_<wbr>VALUE</code></a> if <em>dst_ptr</em> or <em>src_ptr</em> is <code>NULL</code>.</p>
 </li>
 <li>
 <p><a href="#CL_MEM_COPY_OVERLAP"><code>CL_MEM_<wbr>COPY_<wbr>OVERLAP</code></a> if the values specified for <em>dst_ptr</em>, <em>src_ptr</em> and
@@ -16012,9 +16039,9 @@
 <table id="sampler-properties-table" class="tableblock frame-all grid-all stretch">
 <caption class="title">Table 27. List of supported sampler creation properties by <a href="#clCreateSamplerWithProperties">clCreateSamplerWithProperties</a></caption>
 <colgroup>
-<col style="width: 33.3333%;">
-<col style="width: 16.6666%;">
-<col style="width: 50.0001%;">
+<col style="width: 33%;">
+<col style="width: 17%;">
+<col style="width: 50%;">
 </colgroup>
 <thead>
 <tr>
@@ -16339,15 +16366,15 @@
 <table id="sampler-info-table" class="tableblock frame-all grid-all stretch">
 <caption class="title">Table 28. List of supported param_names by <a href="#clGetSamplerInfo">clGetSamplerInfo</a></caption>
 <colgroup>
-<col style="width: 34%;">
 <col style="width: 33%;">
-<col style="width: 33%;">
+<col style="width: 17%;">
+<col style="width: 50%;">
 </colgroup>
 <thead>
 <tr>
 <th class="tableblock halign-left valign-top">Sampler Info</th>
 <th class="tableblock halign-left valign-top">Return Type</th>
-<th class="tableblock halign-left valign-top">Info. returned in <em>param_value</em></th>
+<th class="tableblock halign-left valign-top">Description</th>
 </tr>
 </thead>
 <tbody>
@@ -16752,7 +16779,7 @@
 the list of devices associated with <em>context</em>.</p>
 </li>
 <li>
-<p><a href="#CL_INVALID_VALUE"><code>CL_INVALID_<wbr>VALUE</code></a> if <em>lengths</em> or <em>binaries</em> are <code>NULL</code> or if any entry
+<p><a href="#CL_INVALID_VALUE"><code>CL_INVALID_<wbr>VALUE</code></a> if <em>lengths</em> or <em>binaries</em> is <code>NULL</code> or if any entry
 in <em>lengths</em>[i] is zero or <em>binaries</em>[i] is <code>NULL</code>.</p>
 </li>
 <li>
@@ -17887,28 +17914,27 @@
 <dt class="hdlist1"><code>-D name=definition</code> </dt>
 <dd>
 <p>The contents of <em>definition</em> are tokenized and processed as if they
-appeared during translation phase three in a `#define' directive.
+appeared during translation phase three in a <code>#define</code> directive.
 In particular, the definition will be truncated by embedded newline
 characters.</p>
-</dd>
-</dl>
-</div>
+<div class="openblock">
+<div class="content">
 <div class="paragraph">
-<p>-D options are processed in the order they are given in the <em>options</em>
+<p><code>-D</code> options are processed in the order they are given in the <em>options</em>
 argument to <a href="#clBuildProgram"><strong>clBuildProgram</strong></a> or <a href="#clCompileProgram"><strong>clCompileProgram</strong></a>.
-Note that a space is required between the -D option and the symbol it
+Note that a space is required between the <code>-D</code> option and the symbol it
 defines, otherwise behavior is implementation defined.</p>
 </div>
-<div class="dlist">
-<dl>
+</div>
+</div>
+</dd>
 <dt class="hdlist1"><code>-I dir</code> </dt>
 <dd>
 <p>Add the directory <em>dir</em> to the list of directories to be searched for
 header files.
 <em>dir</em> can optionally be enclosed in double quotes.</p>
-</dd>
-</dl>
-</div>
+<div class="openblock">
+<div class="content">
 <div class="paragraph">
 <p>This option is not portable due to its dependency on host file system and
 host operating system.
@@ -17917,6 +17943,11 @@
 of <a href="#clCompileProgram"><strong>clCompileProgram</strong></a> followed by <a href="#clLinkProgram"><strong>clLinkProgram</strong></a>.</p>
 </div>
 </div>
+</div>
+</dd>
+</dl>
+</div>
+</div>
 <div class="sect4">
 <h5 id="math-intrinsics-options"><a class="anchor" href="#math-intrinsics-options"></a>5.8.6.2. Math Intrinsics Options</h5>
 <div class="paragraph">
@@ -18019,6 +18050,34 @@
 </div>
 </div>
 </dd>
+<dt class="hdlist1"><code>-cl-uniform-work-group-size</code> </dt>
+<dd>
+<p>This requires that the global work-size be a multiple of the work-group
+size specified to <a href="#clEnqueueNDRangeKernel"><strong>clEnqueueNDRangeKernel</strong></a>.
+Allows optimizations that are made possible by this restriction.</p>
+<div class="openblock">
+<div class="content">
+<div class="paragraph">
+<p>Note: This option is <a href="#unified-spec">missing before</a> version 2.0.</p>
+</div>
+</div>
+</div>
+</dd>
+<dt class="hdlist1"><code>-cl-no-subgroup-ifp</code> </dt>
+<dd>
+<p>This indicates that kernels in this program do not require subgroups to
+make independent forward progress.
+Allows optimizations that are made possible by this restriction.
+This option has no effect for devices that do not support independent
+forward progress for subgroups.</p>
+<div class="openblock">
+<div class="content">
+<div class="paragraph">
+<p>Note: This option is <a href="#unified-spec">missing before</a> version 2.1.</p>
+</div>
+</div>
+</div>
+</dd>
 </dl>
 </div>
 <div class="paragraph">
@@ -18076,34 +18135,6 @@
 This option causes the preprocessor macro <code>__FAST_RELAXED_MATH__</code> to
 be defined in the OpenCL program.</p>
 </dd>
-<dt class="hdlist1"><code>-cl-uniform-work-group-size</code> </dt>
-<dd>
-<p>This requires that the global work-size be a multiple of the work-group
-size specified to <a href="#clEnqueueNDRangeKernel"><strong>clEnqueueNDRangeKernel</strong></a>.
-Allow optimizations that are made possible by this restriction.</p>
-<div class="openblock">
-<div class="content">
-<div class="paragraph">
-<p>Note: This option is <a href="#unified-spec">missing before</a> version 2.0.</p>
-</div>
-</div>
-</div>
-</dd>
-<dt class="hdlist1"><code>-cl-no-subgroup-ifp</code> </dt>
-<dd>
-<p>This indicates that kernels in this program do not require subgroups to
-make independent forward progress.
-Allows optimizations that are made possible by this restriction.
-This option has no effect for devices that do not support independent
-forward progress for subgroups.</p>
-<div class="openblock">
-<div class="content">
-<div class="paragraph">
-<p>Note: This option is <a href="#unified-spec">missing before</a> version 2.1.</p>
-</div>
-</div>
-</div>
-</dd>
 </dl>
 </div>
 </div>
@@ -18149,20 +18180,20 @@
 <div class="ulist">
 <ul>
 <li>
-<p>CL1.1 Support OpenCL C language
-features defined in <em>section 6</em> of the OpenCL 1.1 specification.</p>
+<p><code>CL1.1</code>: Support OpenCL C 1.1 language features defined in <em>section 6</em> of
+the OpenCL 1.1 specification or in the unified OpenCL C specification.</p>
 </li>
 <li>
-<p>CL1.2 Support OpenCL C language
-features defined in <em>section 6</em> of the OpenCL 1.2 specification.</p>
+<p><code>CL1.2</code>: Support OpenCL C 1.2 language features defined in <em>section 6</em> of
+the OpenCL 1.2 specification or in the unified OpenCL C specification.</p>
 </li>
 <li>
-<p>CL2.0 Support OpenCL C language
-features defined in <em>section 6</em> of the OpenCL C 2.0 specification.</p>
+<p><code>CL2.0</code>: Support OpenCL C 2.0 language features defined in the OpenCL C 2.0
+specification or in the unified OpenCL C specification.</p>
 </li>
 <li>
-<p>CL3.0 Support OpenCL C language
-features defined in <em>section 6</em> of the OpenCL C 3.0 specification.</p>
+<p><code>CL3.0</code>: Support OpenCL C 3.0 language features defined in the unified
+OpenCL C specification.</p>
 </li>
 </ul>
 </div>
@@ -18247,9 +18278,6 @@
 </tr>
 </table>
 </div>
-<div class="paragraph">
-<p>The following option is available.</p>
-</div>
 <div class="dlist">
 <dl>
 <dt class="hdlist1"><code>-g</code> </dt>
@@ -18487,15 +18515,15 @@
 <table id="program-info-table" class="tableblock frame-all grid-all stretch">
 <caption class="title">Table 29. List of supported param_names by <a href="#clGetProgramInfo">clGetProgramInfo</a></caption>
 <colgroup>
-<col style="width: 34%;">
 <col style="width: 33%;">
-<col style="width: 33%;">
+<col style="width: 17%;">
+<col style="width: 50%;">
 </colgroup>
 <thead>
 <tr>
 <th class="tableblock halign-left valign-top">Program Info</th>
 <th class="tableblock halign-left valign-top">Return Type</th>
-<th class="tableblock halign-left valign-top">Info. returned in <em>param_value</em></th>
+<th class="tableblock halign-left valign-top">Description</th>
 </tr>
 </thead>
 <tbody>
@@ -18568,7 +18596,7 @@
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><a id="CL_PROGRAM_BINARIES"></a><a href="#CL_PROGRAM_BINARIES"><code>CL_PROGRAM_<wbr>BINARIES</code></a></p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock"><code>unsigned char</code> *[]</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>unsigned char</code>*[]</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">Return the program binaries (could be an executable binary, compiled
         binary or library binary) for all devices associated with program.
         For each device in program, the binary returned can be the binary
@@ -18740,15 +18768,15 @@
 <table id="program-build-info-table" class="tableblock frame-all grid-all stretch">
 <caption class="title">Table 30. List of supported param_names by <a href="#clGetProgramBuildInfo">clGetProgramBuildInfo</a></caption>
 <colgroup>
-<col style="width: 34%;">
 <col style="width: 33%;">
-<col style="width: 33%;">
+<col style="width: 17%;">
+<col style="width: 50%;">
 </colgroup>
 <thead>
 <tr>
 <th class="tableblock halign-left valign-top">Program Build Info</th>
 <th class="tableblock halign-left valign-top">Return Type</th>
-<th class="tableblock halign-left valign-top">Info. returned in <em>param_value</em></th>
+<th class="tableblock halign-left valign-top">Description</th>
 </tr>
 </thead>
 <tbody>
@@ -19295,7 +19323,7 @@
 counts to <code>cl_kernel</code> arguments, because no mechanism is provided for the
 user to tell the kernel to release that ownership right.
 If the kernel holds ownership rights on kernel args, that would make it
-impossible for the user to tell with certainty when he may safely
+impossible for users to tell with certainty when they may safely
 release user allocated resources associated with OpenCL objects such as
 the <code>cl_mem</code> backing store used with <a href="#CL_MEM_USE_HOST_PTR"><code>CL_MEM_<wbr>USE_<wbr>HOST_<wbr>PTR</code></a>.</p>
 </div>
@@ -19501,9 +19529,9 @@
 <table id="kernel-exec-info-table" class="tableblock frame-all grid-all stretch">
 <caption class="title">Table 31. List of supported param_names by <a href="#clSetKernelExecInfo">clSetKernelExecInfo</a></caption>
 <colgroup>
-<col style="width: 34%;">
 <col style="width: 33%;">
-<col style="width: 33%;">
+<col style="width: 17%;">
+<col style="width: 50%;">
 </colgroup>
 <thead>
 <tr>
@@ -19516,7 +19544,7 @@
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><a id="CL_KERNEL_EXEC_INFO_SVM_PTRS"></a><a href="#CL_KERNEL_EXEC_INFO_SVM_PTRS"><code>CL_KERNEL_<wbr>EXEC_<wbr>INFO_<wbr>SVM_<wbr>PTRS</code></a></p>
 <p class="tableblock"><a href="#unified-spec">Missing before</a> version 2.0.</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock"><code>void</code> *[]</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>void</code>*[]</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">SVM pointers must reference locations contained entirely within
         buffers that are passed to kernel as arguments, or that are passed
         through the execution information.</p>
@@ -19805,15 +19833,15 @@
 <table id="kernel-info-table" class="tableblock frame-all grid-all stretch">
 <caption class="title">Table 32. List of supported param_names by <a href="#clGetKernelInfo">clGetKernelInfo</a></caption>
 <colgroup>
-<col style="width: 34%;">
 <col style="width: 33%;">
-<col style="width: 33%;">
+<col style="width: 17%;">
+<col style="width: 50%;">
 </colgroup>
 <thead>
 <tr>
 <th class="tableblock halign-left valign-top">Kernel Info</th>
 <th class="tableblock halign-left valign-top">Return Type</th>
-<th class="tableblock halign-left valign-top">Info. returned in <em>param_value</em></th>
+<th class="tableblock halign-left valign-top">Description</th>
 </tr>
 </thead>
 <tbody>
@@ -19948,15 +19976,15 @@
 <table id="kernel-workgroup-info-table" class="tableblock frame-all grid-all stretch">
 <caption class="title">Table 33. List of supported param_names by <a href="#clGetKernelWorkGroupInfo">clGetKernelWorkGroupInfo</a></caption>
 <colgroup>
-<col style="width: 34%;">
 <col style="width: 33%;">
-<col style="width: 33%;">
+<col style="width: 17%;">
+<col style="width: 50%;">
 </colgroup>
 <thead>
 <tr>
 <th class="tableblock halign-left valign-top">Kernel Work Group Info</th>
 <th class="tableblock halign-left valign-top">Return Type</th>
-<th class="tableblock halign-left valign-top">Info. returned in <em>param_value</em></th>
+<th class="tableblock halign-left valign-top">Description</th>
 </tr>
 </thead>
 <tbody>
@@ -20158,7 +20186,7 @@
 <th class="tableblock halign-left valign-top">Kernel Subgroup Info</th>
 <th class="tableblock halign-left valign-top">Input Type</th>
 <th class="tableblock halign-left valign-top">Return Type</th>
-<th class="tableblock halign-left valign-top">Info. returned in <em>param_value</em></th>
+<th class="tableblock halign-left valign-top">Description</th>
 </tr>
 </thead>
 <tbody>
@@ -20166,7 +20194,7 @@
 <td class="tableblock halign-left valign-top"><p class="tableblock"><a id="CL_KERNEL_MAX_SUB_GROUP_SIZE_FOR_NDRANGE"></a><a href="#CL_KERNEL_MAX_SUB_GROUP_SIZE_FOR_NDRANGE"><code>CL_KERNEL_<wbr>MAX_<wbr>SUB_<wbr>GROUP_<wbr>SIZE_<wbr>FOR_<wbr>NDRANGE</code></a></p>
 <p class="tableblock"><a href="#unified-spec">Missing before</a> version 2.1.
 Also see extension <strong>cl_khr_subgroups</strong>.</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock"><code>size_t</code> *</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>size_t</code>*</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><code>size_t</code></p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">Returns the maximum sub-group size for this kernel.
             All sub-groups must be the same size, while the last subgroup in
@@ -20182,7 +20210,7 @@
 <td class="tableblock halign-left valign-top"><p class="tableblock"><a id="CL_KERNEL_SUB_GROUP_COUNT_FOR_NDRANGE"></a><a href="#CL_KERNEL_SUB_GROUP_COUNT_FOR_NDRANGE"><code>CL_KERNEL_<wbr>SUB_<wbr>GROUP_<wbr>COUNT_<wbr>FOR_<wbr>NDRANGE</code></a></p>
 <p class="tableblock"><a href="#unified-spec">Missing before</a> version 2.1.
 Also see extension <strong>cl_khr_subgroups</strong>.</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock"><code>size_t</code> *</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>size_t</code>*</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><code>size_t</code></p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">Returns the number of sub-groups that will be present in each
             work-group for a given local work size.
@@ -20356,15 +20384,15 @@
 <table id="kernel-argument-info-table" class="tableblock frame-all grid-all stretch">
 <caption class="title">Table 35. List of supported param_names by <a href="#clGetKernelArgInfo">clGetKernelArgInfo</a></caption>
 <colgroup>
-<col style="width: 33.3333%;">
-<col style="width: 16.6666%;">
-<col style="width: 50.0001%;">
+<col style="width: 33%;">
+<col style="width: 17%;">
+<col style="width: 50%;">
 </colgroup>
 <thead>
 <tr>
 <th class="tableblock halign-left valign-top">Kernel Arg Info</th>
 <th class="tableblock halign-left valign-top">Return Type</th>
-<th class="tableblock halign-left valign-top">Info. returned in <em>param_value</em></th>
+<th class="tableblock halign-left valign-top">Description</th>
 </tr>
 </thead>
 <tbody>
@@ -20601,11 +20629,11 @@
 <p>If non-uniform work-groups are supported, any single dimension
 for which the global size is not divisible by the local size will be
 partitioned into two regions.
-One region will have work-groups that have the same number of work items as
+One region will have work-groups that have the same number of work-items as
 was specified by the local size parameter in that dimension.
 The other region will have work-groups with less than the number of work
 items specified by the local size parameter in that dimension.
-The global IDs and group IDs of the work items in the first region will be
+The global IDs and group IDs of the work-items in the first region will be
 numerically lower than those in the second, and the second region will be at
 most one work-group wide in that dimension.
 Work-group sizes could be non-uniform in multiple dimensions, potentially
@@ -20613,7 +20641,7 @@
 different sizes in a 3D range.</p>
 </div>
 <div class="paragraph">
-<p>If non-uniform work groups are supported and <em>local_work_size</em> is <code>NULL</code>, the OpenCL runtime may choose a uniform or non-uniform work-group size.</p>
+<p>If non-uniform work-groups are supported and <em>local_work_size</em> is <code>NULL</code>, the OpenCL runtime may choose a uniform or non-uniform work-group size.</p>
 </div>
 <div class="paragraph">
 <p>Otherwise, when non-uniform work-groups are not supported, the size of each work-group must be uniform.
@@ -20623,7 +20651,7 @@
 <div class="paragraph">
 <p>The work-group size to be used for <em>kernel</em> can also be specified in the
 program source or intermediate language.
-In this case the size of work group specified by <em>local_work_size</em> must
+In this case the work-group size specified by <em>local_work_size</em> must
 match the value specified in the program source.</p>
 </div>
 <div class="paragraph">
@@ -21121,24 +21149,6 @@
 <div class="sect2">
 <h3 id="event-objects"><a class="anchor" href="#event-objects"></a>5.11. Event Objects</h3>
 <div class="paragraph">
-<p>Event objects can be used to refer to a kernel-instance command
-(<a href="#clEnqueueNDRangeKernel"><strong>clEnqueueNDRangeKernel</strong></a>, <a href="#clEnqueueTask"><strong>clEnqueueTask</strong></a>, <a href="#clEnqueueNativeKernel"><strong>clEnqueueNativeKernel</strong></a>), read,
-write, map and copy commands on memory objects (
-<a href="#clEnqueueReadBuffer"><strong>clEnqueueReadBuffer</strong></a>, <a href="#clEnqueueWriteBuffer"><strong>clEnqueueWriteBuffer</strong></a>,
-<a href="#clEnqueueMapBuffer"><strong>clEnqueueMapBuffer</strong></a>, <a href="#clEnqueueUnmapMemObject"><strong>clEnqueueUnmapMemObject</strong></a>,
-<a href="#clEnqueueReadBufferRect"><strong>clEnqueueReadBufferRect</strong></a>, <a href="#clEnqueueWriteBufferRect"><strong>clEnqueueWriteBufferRect</strong></a>,
-<a href="#clEnqueueReadImage"><strong>clEnqueueReadImage</strong></a>, <a href="#clEnqueueWriteImage"><strong>clEnqueueWriteImage</strong></a>, <a href="#clEnqueueMapImage"><strong>clEnqueueMapImage</strong></a>,
-<a href="#clEnqueueCopyBuffer"><strong>clEnqueueCopyBuffer</strong></a>, <a href="#clEnqueueCopyImage"><strong>clEnqueueCopyImage</strong></a>,
-<a href="#clEnqueueCopyBufferRect"><strong>clEnqueueCopyBufferRect</strong></a>,
-<a href="#clEnqueueCopyBufferToImage"><strong>clEnqueueCopyBufferToImage</strong></a>, <a href="#clEnqueueCopyImageToBuffer"><strong>clEnqueueCopyImageToBuffer</strong></a>),
-<a href="#clEnqueueSVMMemcpy"><strong>clEnqueueSVMMemcpy</strong></a>, <a href="#clEnqueueSVMMemFill"><strong>clEnqueueSVMMemFill</strong></a>,
-<a href="#clEnqueueSVMMap"><strong>clEnqueueSVMMap</strong></a>, <a href="#clEnqueueSVMUnmap"><strong>clEnqueueSVMUnmap</strong></a>, <a href="#clEnqueueSVMFree"><strong>clEnqueueSVMFree</strong></a>, <a href="#clEnqueueMarker"><strong>clEnqueueMarker</strong></a>,
-<a href="#clEnqueueMarkerWithWaitList"><strong>clEnqueueMarkerWithWaitList</strong></a>, <a href="#clEnqueueWaitForEvents"><strong>clEnqueueWaitForEvents</strong></a>, <a href="#clEnqueueBarrier"><strong>clEnqueueBarrier</strong></a>,
-<a href="#clEnqueueBarrierWithWaitList"><strong>clEnqueueBarrierWithWaitList</strong></a>,  (refer to
-<a href="#markers-barriers-waiting-for-events">Markers, Barriers and Waiting for
-Events</a>) or user events.</p>
-</div>
-<div class="paragraph">
 <p>An event object can be used to track the execution status of a command.
 The API calls that enqueue commands to a command-queue create a new event
 object that is returned in the <em>event</em> argument.
@@ -21152,17 +21162,17 @@
 <div class="ulist">
 <ul>
 <li>
-<p><a id="CL_QUEUED"></a><a href="#CL_QUEUED"><code>CL_QUEUED</code></a> This indicates that the command has been enqueued in a
+<p><a id="CL_QUEUED"></a><a href="#CL_QUEUED"><code>CL_QUEUED</code></a>: Indicates that the command has been enqueued in a
 command-queue.
 This is the initial state of all events except user events.</p>
 </li>
 <li>
-<p><a id="CL_SUBMITTED"></a><a href="#CL_SUBMITTED"><code>CL_SUBMITTED</code></a> This is the initial state for all user events.
-For all other events, this indicates that the command has been submitted
+<p><a id="CL_SUBMITTED"></a><a href="#CL_SUBMITTED"><code>CL_SUBMITTED</code></a>: The initial state for all user events.
+For all other events, indicates that the command has been submitted
 by the host to the device.</p>
 </li>
 <li>
-<p><a id="CL_RUNNING"></a><a href="#CL_RUNNING"><code>CL_RUNNING</code></a> This indicates that the device has started executing this
+<p><a id="CL_RUNNING"></a><a href="#CL_RUNNING"><code>CL_RUNNING</code></a>: Indicates that the device has started executing this
 command.
 In order for the execution status of an enqueued command to change from
 <a href="#CL_SUBMITTED"><code>CL_SUBMITTED</code></a> to <a href="#CL_RUNNING"><code>CL_RUNNING</code></a>, all events that this command is waiting on
@@ -21170,13 +21180,12 @@
 <a href="#CL_COMPLETE"><code>CL_COMPLETE</code></a>.</p>
 </li>
 <li>
-<p><a id="CL_COMPLETE"></a><a href="#CL_COMPLETE"><code>CL_COMPLETE</code></a> This indicates that the command has successfully completed.</p>
+<p><a id="CL_COMPLETE"></a><a href="#CL_COMPLETE"><code>CL_COMPLETE</code></a>: Indicates that the command has successfully completed.</p>
 </li>
 <li>
-<p>Error code The error code is a negative integer value and indicates that
-the command was abnormally terminated.
-Abnormal termination may occur for a number of reasons such as a bad
-memory access.</p>
+<p>An Error Code: A negative integer value indicating that the command was
+abnormally terminated. Abnormal termination may occur for a number of reasons,
+such as a bad memory access.</p>
 </li>
 </ul>
 </div>
@@ -21266,8 +21275,7 @@
 </ul>
 </div>
 <div class="paragraph">
-<p>The execution status of the user event object created is set to
-<a href="#CL_SUBMITTED"><code>CL_SUBMITTED</code></a>.</p>
+<p>The initial execution status for the user event object is <a href="#CL_SUBMITTED"><code>CL_SUBMITTED</code></a>.</p>
 </div>
 </div>
 </div>
@@ -21494,15 +21502,15 @@
 <table id="event-info-table" class="tableblock frame-all grid-all stretch">
 <caption class="title">Table 36. List of supported param_names by <a href="#clGetEventInfo">clGetEventInfo</a></caption>
 <colgroup>
-<col style="width: 30%;">
 <col style="width: 33%;">
-<col style="width: 37%;">
+<col style="width: 17%;">
+<col style="width: 50%;">
 </colgroup>
 <thead>
 <tr>
 <th class="tableblock halign-left valign-top">Event Info</th>
 <th class="tableblock halign-left valign-top">Return Type</th>
-<th class="tableblock halign-left valign-top">Info. returned in <em>param_value</em></th>
+<th class="tableblock halign-left valign-top">Description</th>
 </tr>
 </thead>
 <tbody>
@@ -22542,15 +22550,15 @@
 <table id="event-profiling-info-table" class="tableblock frame-all grid-all stretch">
 <caption class="title">Table 38. List of supported param_names by <a href="#clGetEventProfilingInfo">clGetEventProfilingInfo</a></caption>
 <colgroup>
-<col style="width: 34%;">
 <col style="width: 33%;">
-<col style="width: 33%;">
+<col style="width: 17%;">
+<col style="width: 50%;">
 </colgroup>
 <thead>
 <tr>
 <th class="tableblock halign-left valign-top">Event Profiling Info</th>
 <th class="tableblock halign-left valign-top">Return Type</th>
-<th class="tableblock halign-left valign-top">Info. returned in <em>param_value</em></th>
+<th class="tableblock halign-left valign-top">Description</th>
 </tr>
 </thead>
 <tbody>
@@ -22784,8 +22792,7 @@
 <div class="paragraph">
 <p>In addition to the specification of core features, OpenCL provides a number
 of extensions to the API, kernel language or intermediate representation.
-These features are defined in the OpenCL 2.2 extensions specification
-document.</p>
+These features are defined in the OpenCL extension specification document.</p>
 </div>
 <div class="paragraph">
 <p>Extensions defined against earlier versions of the OpenCL specifications,
@@ -23060,9 +23067,9 @@
 <table id="embedded-device-queries-table" class="tableblock frame-all grid-all stretch">
 <caption class="title">Table 39. List of supported param_names by <a href="#clGetDeviceInfo">clGetDeviceInfo</a> for embedded profile</caption>
 <colgroup>
-<col style="width: 33.3333%;">
-<col style="width: 33.3333%;">
-<col style="width: 33.3334%;">
+<col style="width: 33%;">
+<col style="width: 17%;">
+<col style="width: 50%;">
 </colgroup>
 <thead>
 <tr>
@@ -23428,10 +23435,9 @@
 CPUs are designed in general to work well on latency sensitive algorithms on
 single threaded tasks, whereas common GPUs may encounter extremely long
 latencies, potentially orders of magnitude worse.
-A developer interested in writing portable code may find that it is
-necessary to test his design on a diversity of hardware designs to make sure
-that key algorithms are structured in a way that works well on a diversity
-of hardware.
+Developers interested in writing portable code may need to test their
+software on a diversity of hardware designs to make sure that key algorithms
+are structured in a way that works well on a diversity of hardware.
 We suggest favoring more work-items over fewer.
 It is anticipated that over the coming months and years experience will
 produce a set of best practices that will help foster a uniformly favorable
@@ -25131,11 +25137,11 @@
 </li>
 <li>
 <p><a href="#CL_DEVICE_NON_UNIFORM_WORK_GROUP_SUPPORT"><code>CL_DEVICE_<wbr>NON_<wbr>UNIFORM_<wbr>WORK_<wbr>GROUP_<wbr>SUPPORT</code></a> to
-determine if a device supports non-uniform work group sizes.</p>
+determine if a device supports non-uniform work-group sizes.</p>
 </li>
 <li>
 <p><a href="#CL_DEVICE_WORK_GROUP_COLLECTIVE_FUNCTIONS_SUPPORT"><code>CL_DEVICE_<wbr>WORK_<wbr>GROUP_<wbr>COLLECTIVE_<wbr>FUNCTIONS_<wbr>SUPPORT</code></a>
-to determine whether a device supports optional work group
+to determine whether a device supports optional work-group
 collective functions, such as broadcasts, scans, and reductions.</p>
 </li>
 <li>
@@ -25531,11 +25537,11 @@
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><a id="CL_INVALID_WORK_GROUP_SIZE"></a><a href="#CL_INVALID_WORK_GROUP_SIZE"><code>CL_INVALID_<wbr>WORK_<wbr>GROUP_<wbr>SIZE</code></a></p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Returned by <a href="#clEnqueueNDRangeKernel"><strong>clEnqueueNDRangeKernel</strong></a> when the specified total work group size is not valid for the specified kernel or device.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returned by <a href="#clEnqueueNDRangeKernel"><strong>clEnqueueNDRangeKernel</strong></a> when the specified total work-group size is not valid for the specified kernel or device.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><a id="CL_INVALID_WORK_ITEM_SIZE"></a><a href="#CL_INVALID_WORK_ITEM_SIZE"><code>CL_INVALID_<wbr>WORK_<wbr>ITEM_<wbr>SIZE</code></a></p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Returned by <a href="#clEnqueueNDRangeKernel"><strong>clEnqueueNDRangeKernel</strong></a> when the specified work group size in one dimension is not valid for the device.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returned by <a href="#clEnqueueNDRangeKernel"><strong>clEnqueueNDRangeKernel</strong></a> when the specified work-group size in one dimension is not valid for the device.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><a id="CL_KERNEL_ARG_INFO_NOT_AVAILABLE"></a><a href="#CL_KERNEL_ARG_INFO_NOT_AVAILABLE"><code>CL_KERNEL_<wbr>ARG_<wbr>INFO_<wbr>NOT_<wbr>AVAILABLE</code></a></p>
@@ -25945,8 +25951,8 @@
 <div class="sect2">
 <h3 id="_non_uniform_work_groups"><a class="anchor" href="#_non_uniform_work_groups"></a>Non-Uniform Work Groups</h3>
 <div class="paragraph">
-<p>Support for non-uniform work groups is optional for devices supporting OpenCL 3.0.
-When non-uniform work groups are not supported:</p>
+<p>Support for non-uniform work-groups is optional for devices supporting OpenCL 3.0.
+When non-uniform work-groups are not supported:</p>
 </div>
 <table class="tableblock frame-all grid-all stretch">
 <colgroup>
@@ -25963,11 +25969,11 @@
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><a href="#clGetDeviceInfo"><strong>clGetDeviceInfo</strong></a>, passing<br>
 <a href="#CL_DEVICE_NON_UNIFORM_WORK_GROUP_SUPPORT"><code>CL_DEVICE_<wbr>NON_<wbr>UNIFORM_<wbr>WORK_<wbr>GROUP_<wbr>SUPPORT</code></a></p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">May return <a href="#CL_FALSE"><code>CL_FALSE</code></a>, indicating that <em>device</em> does not support non-uniform work groups.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">May return <a href="#CL_FALSE"><code>CL_FALSE</code></a>, indicating that <em>device</em> does not support non-uniform work-groups.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><a href="#clEnqueueNDRangeKernel"><strong>clEnqueueNDRangeKernel</strong></a></p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Behaves as though non-uniform Work Groups were not enabled for <em>kernel</em>, if the device associated with <em>command_queue</em> does not support non-uniform work groups.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Behaves as though non-uniform work-groups were not enabled for <em>kernel</em>, if the device associated with <em>command_queue</em> does not support non-uniform work-groups.</p></td>
 </tr>
 </tbody>
 </table>
@@ -26285,8 +26291,8 @@
 <div class="sect2">
 <h3 id="_work_group_collective_functions"><a class="anchor" href="#_work_group_collective_functions"></a>Work Group Collective Functions</h3>
 <div class="paragraph">
-<p>Work group collective functions for broadcasts, scans, and reductions are optional for devices supporting OpenCL 3.0.
-When work group collective functions are not supported:</p>
+<p>Work-group collective functions for broadcasts, scans, and reductions are optional for devices supporting OpenCL 3.0.
+When work-group collective functions are not supported:</p>
 </div>
 <table class="tableblock frame-all grid-all stretch">
 <colgroup>
@@ -26303,12 +26309,12 @@
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><a href="#clGetDeviceInfo"><strong>clGetDeviceInfo</strong></a>, passing<br>
 <a href="#CL_DEVICE_WORK_GROUP_COLLECTIVE_FUNCTIONS_SUPPORT"><code>CL_DEVICE_<wbr>WORK_<wbr>GROUP_<wbr>COLLECTIVE_<wbr>FUNCTIONS_<wbr>SUPPORT</code></a></p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">May return <a href="#CL_FALSE"><code>CL_FALSE</code></a>, indicating that <em>device</em> does not support work group collective functions.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">May return <a href="#CL_FALSE"><code>CL_FALSE</code></a>, indicating that <em>device</em> does not support work-group collective functions.</p></td>
 </tr>
 </tbody>
 </table>
 <div class="paragraph">
-<p>OpenCL C compilers supporting work group collective functions will define the feature macro <code>__opencl_c_work_group_collective_functions</code>.</p>
+<p>OpenCL C compilers supporting work-group collective functions will define the feature macro <code>__opencl_c_work_group_collective_functions</code>.</p>
 </div>
 </div>
 <div class="sect2">
@@ -26404,6 +26410,7 @@
 Robert Elliott, ARM<br>
 Scott Moyers, ARM<br>
 Stuart Brady, ARM<br>
+Sven Van Haastregt, Arm<br>
 Tom Olson, ARM<br>
 Anastasia Stulova, ARM<br>
 Christopher Thompson-Walsh, Broadcom<br>
@@ -26442,6 +26449,7 @@
 Allen Hux, Intel<br>
 Andrew Brownsword, Intel<br>
 Andrew Lauritzen, Intel<br>
+Anton Zabaznov, Intel<br>
 Bartosz Sochacki, Intel<br>
 Ben Ashbaugh, Intel<br>
 Boaz Ouriel, Intel<br>
@@ -26562,7 +26570,7 @@
 <div id="footnotes">
 <hr>
 <div class="footnote" id="_footnotedef_1">
-<a href="#_footnoteref_1">1</a>. This value for memory_scope can only be used with atomic_work_item_fence with flags set to CLK_IMAGE_MEM_FENCE.
+<a href="#_footnoteref_1">1</a>. This value for <code>memory_scope</code> can only be used with <code>atomic_work_item_fence</code> with flags set to <code>CLK_IMAGE_MEM_FENCE</code>.
 </div>
 <div class="footnote" id="_footnotedef_2">
 <a href="#_footnoteref_2">2</a>. The platform profile returns the profile that is implemented by the OpenCL framework. If the platform profile returned is FULL_PROFILE, the OpenCL framework will support devices that are FULL_PROFILE and may also support devices that are EMBEDDED_PROFILE. The compiler must be available for all devices i.e. <a href="#CL_DEVICE_COMPILER_AVAILABLE"><code>CL_DEVICE_<wbr>COMPILER_<wbr>AVAILABLE</code></a> is <a href="#CL_TRUE"><code>CL_TRUE</code></a>. If the platform profile returned is EMBEDDED_PROFILE, then devices that are only EMBEDDED_PROFILE are supported.
@@ -26675,8 +26683,8 @@
 </div>
 <div id="footer">
 <div id="footer-text">
-Version v3.0.5<br>
-Last updated 2020-09-28 14:46:56 -0700
+Version v3.0.6<br>
+Last updated 2020-12-13 22:09:28 -0800
 </div>
 </div>
 
diff --git a/specs/3.0-unified/html/OpenCL_C.html b/specs/3.0-unified/html/OpenCL_C.html
index d8fdd08..4fad35c 100644
--- a/specs/3.0-unified/html/OpenCL_C.html
+++ b/specs/3.0-unified/html/OpenCL_C.html
@@ -841,9 +841,9 @@
 <h1>The OpenCL<sup>&#8482;</sup> C Specification</h1>
 <div class="details">
 <span id="author" class="author">Khronos<sup>&#174;</sup> OpenCL Working Group</span><br>
-<span id="revnumber">version v3.0.5,</span>
-<span id="revdate">Wed, 30 Sep 2020 00:00:00 +0000</span>
-<br><span id="revremark">from git branch: master commit: 4d8a36725aa8af9658ab5cb62fdbf52adb44bcca</span>
+<span id="revnumber">version v3.0.6,</span>
+<span id="revdate">Fri, 18 Dec 2020 12:00:00 +0000</span>
+<br><span id="revremark">from git branch: master commit: e9a4d468b1a0a38c1e10b8af484bb2bbb495e2b7</span>
 </div>
 <div id="toc" class="toc2">
 <div id="toctitle">Table of Contents</div>
@@ -907,7 +907,10 @@
 <li><a href="#constant-or-constant">6.7.3. <code>__constant</code> (or <code>constant</code>)</a></li>
 <li><a href="#private-or-private">6.7.4. <code>__private</code> (or <code>private</code>)</a></li>
 <li><a href="#the-generic-address-space">6.7.5. The Generic Address Space</a></li>
-<li><a href="#changes-to-C99">6.7.6. Changes to C99</a></li>
+<li><a href="#_usage_for_declaration_scopes_and_variable_types">6.7.6. Usage for declaration scopes and variable types</a></li>
+<li><a href="#_initialization">6.7.7. Initialization</a></li>
+<li><a href="#addr-spaces-inference">6.7.8. Inference</a></li>
+<li><a href="#addr-spaces-conversions">6.7.9. Address space conversions</a></li>
 </ul>
 </li>
 <li><a href="#access-qualifiers">6.8. Access Qualifiers</a></li>
@@ -1163,8 +1166,8 @@
 </table>
 </div>
 <div class="paragraph">
-<p>Optional language features are described in this document. They are optional
-from OpenCL C 3.0 onwards and therefore are not supported by all
+<p>Optional core language features are described in this document. They are
+optional from OpenCL C 3.0 onwards and therefore are not supported by all
 implementations. When an OpenCL C 3.0 optional feature is supported, an
 associated <em>feature test macro</em> will be predefined.</p>
 </div>
@@ -1190,87 +1193,87 @@
 </thead>
 <tbody>
 <tr>
-<td class="tableblock halign-left valign-top"><p class="tableblock"><code>__opencl_c_3d_image_writes</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>__opencl_c_<wbr>3d_<wbr>image_<wbr>writes</code></p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">The OpenCL C compiler supports built-in functions for writing to 3D image
 objects.</p>
-<p class="tableblock">OpenCL C compilers that define the feature macro <code>__opencl_c_3d_image_writes</code>
-must also define the feature macro <code>__opencl_c_images</code>.</p></td>
+<p class="tableblock">OpenCL C compilers that define the feature macro <code>__opencl_c_<wbr>3d_<wbr>image_<wbr>writes</code>
+must also define the feature macro <code>__opencl_c_<wbr>images</code>.</p></td>
 </tr>
 <tr>
-<td class="tableblock halign-left valign-top"><p class="tableblock"><code>__opencl_c_atomic_order_acq_rel</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>__opencl_c_<wbr>atomic_<wbr>order_<wbr>acq_<wbr>rel</code></p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">The OpenCL C compiler supports enumerations and built-in functions for atomic
 operations with acquire and release memory consistency orders.</p></td>
 </tr>
 <tr>
-<td class="tableblock halign-left valign-top"><p class="tableblock"><code>__opencl_c_atomic_order_seq_cst</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>__opencl_c_<wbr>atomic_<wbr>order_<wbr>seq_<wbr>cst</code></p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">The OpenCL C compiler supports enumerations and built-in functions for atomic
 operations and fences with sequentially consistent memory consistency order.</p></td>
 </tr>
 <tr>
-<td class="tableblock halign-left valign-top"><p class="tableblock"><code>__opencl_c_atomic_scope_device</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>__opencl_c_<wbr>atomic_<wbr>scope_<wbr>device</code></p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">The OpenCL C compiler supports enumerations and built-in functions for atomic
 operations and fences with device memory scope.</p></td>
 </tr>
 <tr>
-<td class="tableblock halign-left valign-top"><p class="tableblock"><code>__opencl_c_atomic_scope_all_devices</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>__opencl_c_<wbr>atomic_<wbr>scope_<wbr>all_<wbr>devices</code></p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">The OpenCL C compiler supports enumerations and built-in functions for atomic
 operations and fences with all with memory scope across all devices that can
 share SVM memory with each other and the host process.</p></td>
 </tr>
 <tr>
-<td class="tableblock halign-left valign-top"><p class="tableblock"><code>__opencl_c_device_enqueue</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>__opencl_c_<wbr>device_<wbr>enqueue</code></p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">The OpenCL C compiler supports built-in functions to enqueue additional work
 from the device.</p>
-<p class="tableblock">OpenCL C compilers that define the feature macro <code>__opencl_c_device_enqueue</code>
-must also define the feature macro <code>__opencl_c_generic_address_space</code>.</p></td>
+<p class="tableblock">OpenCL C compilers that define the feature macro <code>__opencl_c_<wbr>device_<wbr>enqueue</code>
+must also define the feature macro <code>__opencl_c_<wbr>generic_<wbr>address_<wbr>space</code>.</p></td>
 </tr>
 <tr>
-<td class="tableblock halign-left valign-top"><p class="tableblock"><code>__opencl_c_generic_address_space</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>__opencl_c_<wbr>generic_<wbr>address_<wbr>space</code></p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">The OpenCL C compiler supports the unnamed generic address space.</p></td>
 </tr>
 <tr>
-<td class="tableblock halign-left valign-top"><p class="tableblock"><code>__opencl_c_fp64</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>__opencl_c_<wbr>fp64</code></p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">The OpenCL C compiler supports types and built-in functions with 64-bit
 floating point types.</p></td>
 </tr>
 <tr>
-<td class="tableblock halign-left valign-top"><p class="tableblock"><code>__opencl_c_images</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>__opencl_c_<wbr>images</code></p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">The OpenCL C compiler supports types and built-in functions for images.</p></td>
 </tr>
 <tr>
-<td class="tableblock halign-left valign-top"><p class="tableblock"><code>__opencl_c_int64</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>__opencl_c_<wbr>int64</code></p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">The OpenCL C compiler supports types and built-in functions with 64-bit
 integers.</p>
 <p class="tableblock">OpenCL C compilers for FULL profile devices or devices with 64-bit pointers
-must always define the <code>__opencl_c_int64</code> feature macro.</p></td>
+must always define the <code>__opencl_c_<wbr>int64</code> feature macro.</p></td>
 </tr>
 <tr>
-<td class="tableblock halign-left valign-top"><p class="tableblock"><code>__opencl_c_pipes</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>__opencl_c_<wbr>pipes</code></p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">The OpenCL C compiler supports the pipe modifier and built-in functions
 to read and write from a pipe.</p>
-<p class="tableblock">OpenCL C compilers that define the feature macro <code>__opencl_c_pipes</code> must
-also define the feature macro <code>__opencl_c_generic_address_space</code>.</p></td>
+<p class="tableblock">OpenCL C compilers that define the feature macro <code>__opencl_c_<wbr>pipes</code> must
+also define the feature macro <code>__opencl_c_<wbr>generic_<wbr>address_<wbr>space</code>.</p></td>
 </tr>
 <tr>
-<td class="tableblock halign-left valign-top"><p class="tableblock"><code>__opencl_c_program_scope_global_variables</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>__opencl_c_<wbr>program_<wbr>scope_<wbr>global_<wbr>variables</code></p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">The OpenCL C compiler supports program scope variables in the global address
 space.</p></td>
 </tr>
 <tr>
-<td class="tableblock halign-left valign-top"><p class="tableblock"><code>__opencl_c_read_write_images</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>__opencl_c_<wbr>read_<wbr>write_<wbr>images</code></p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">The OpenCL C compiler supports reading from and writing to the same image
 object in a kernel.</p>
 <p class="tableblock">OpenCL C compilers that define the feature macro
-<code>__opencl_c_read_write_images</code> must also define the feature macro
-<code>__opencl_c_images</code>.</p></td>
+<code>__opencl_c_<wbr>read_<wbr>write_<wbr>images</code> must also define the feature macro
+<code>__opencl_c_<wbr>images</code>.</p></td>
 </tr>
 <tr>
-<td class="tableblock halign-left valign-top"><p class="tableblock"><code>__opencl_c_subgroups</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>__opencl_c_<wbr>subgroups</code></p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">The OpenCL C compiler supports built-in functions operating on sub-groupings
 of work-items.</p></td>
 </tr>
 <tr>
-<td class="tableblock halign-left valign-top"><p class="tableblock"><code>__opencl_c_work_group_collective_functions</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>__opencl_c_<wbr>work_<wbr>group_<wbr>collective_<wbr>functions</code></p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">The OpenCL C compiler supports built-in functions that perform collective
 operations across a work-group.</p></td>
 </tr>
@@ -1284,33 +1287,37 @@
 value of the feature macro must compare greater than the prior value of the
 feature macro.</p>
 </div>
+<div class="paragraph">
+<p>As specified in <a href="#C99-spec">section 7.1.3 of the C99 Specification</a>, double
+underscore identifiers are reserved; therefore, implementations
+for earlier OpenCL C versions are allowed to define feature test macros,
+but they are not required to do so. This means that applications which
+target earlier OpenCL C versions should not rely on the presence of
+feature test macros, because there is no guarantee that feature test macros
+will be defined or that, if defined, they will indicate the presence of the
+corresponding optional functionality.</p>
+</div>
 </div>
 <div class="sect3">
 <h4 id="extensions"><a class="anchor" href="#extensions"></a>6.2.2. Extensions</h4>
 <div class="paragraph">
-<p>Optional functionality that is not defined in this document is referred to
-as extensions. Extensions are described in
-<a href="#opencl-extension-spec">the OpenCL Extension Specification</a>.</p>
+<p>Other optional functionality may be described by language extensions to OpenCL
+C. Extensions are described in the <a href="#opencl-extension-spec">OpenCL Extension
+Specification</a>. When an OpenCL C extension is supported, an associated
+<em>extension macro</em> will be predefined. Please refer to the OpenCL Extension
+Specification for more information about predefined extension macros.</p>
 </div>
-<div class="admonitionblock note">
-<table>
-<tr>
-<td class="icon">
-<i class="fa icon-note" title="Note"></i>
-</td>
-<td class="content">
 <div class="paragraph">
-<p>Prior to OpenCL C 3.0 some optional features described in this document were
-referred to as optional core features. Their presence could be
-indicated by the predefined extension macros. If any of the features has been
-an optional extension in earlier OpenCL versions it can still be used as an
-extension i.e. the same predefined extension macros are still valid in OpenCL C
-3.0 or newer. However, the use of feature macros is preferred whenever
-possible.</p>
+<p>Prior to OpenCL C 3.0, support for some optional core language features was
+indicated using predefined extension macros.</p>
 </div>
-</td>
-</tr>
-</table>
+<div class="paragraph">
+<p>When an optional core language feature began as an extension, it may have both an
+associated feature macro and an associated extension macro. If an optional core
+language feature was an optional extension to an earlier version of OpenCL C, it
+can still be used as an extension, i.e. the same predefined extension macros are
+still valid in OpenCL C 3.0 or newer; however, the use of feature macros is
+preferred whenever possible.</p>
 </div>
 </div>
 </div>
@@ -1589,9 +1596,9 @@
 <div class="content">
 <div class="paragraph">
 <p>The <code>char</code>, <code>unsigned char</code>, <code>short</code>, <code>unsigned short</code>, <code>int</code>, <code>unsigned int</code>,
-<code>long</code>, <code>unsigned long</code>, <code>float</code> and <code>double vector data types are supported.
+<code>long</code>, <code>unsigned long</code>, <code>float</code> and <code>double</code> vector data types are supported.
 <sup class="footnote">[<a id="_footnoteref_6" class="footnote" href="#_footnotedef_6" title="View footnote.">6</a>]</sup>
-The vector data type is defined with the type name, i.e. `char</code>, <code>uchar</code>,
+The vector data type is defined with the type name, i.e. <code>char</code>, <code>uchar</code>,
 <code>short</code>, <code>ushort</code>, <code>int</code>, <code>uint</code>, <code>long</code>, <code>ulong</code>, <code>float</code>, or <code>double</code>
 followed by a literal value <em>n</em> that defines the number of elements in the
 vector.
@@ -1798,20 +1805,20 @@
       This queue can only be used to enqueue commands from kernels executing
       on the device.</p>
 <p class="tableblock">      <a href="#unifed-spec">Requires</a> support for OpenCL C 2.0, or OpenCL C 3.0 or
-      newer and the <code>__opencl_c_device_enqueue</code> feature.</p></td>
+      newer and the <code>__opencl_c_<wbr>device_<wbr>enqueue</code> feature.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><code>ndrange_t</code></p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">The N-dimensional range over which a kernel executes.</p>
 <p class="tableblock">      <a href="#unified-spec">Requires</a> support for OpenCL C 2.0, or OpenCL C 3.0 or
-      newer and the <code>__opencl_c_device_enqueue</code> feature.</p></td>
+      newer and the <code>__opencl_c_<wbr>device_<wbr>enqueue</code> feature.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><code>clk_event_t</code></p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">A device side event that identifies a command enqueue to
       a device command queue.</p>
 <p class="tableblock">      <a href="#unified-spec">Requires</a> support for OpenCL C 2.0, or OpenCL C 3.0 or
-      newer and the <code>__opencl_c_device_enqueue</code> feature.</p></td>
+      newer and the <code>__opencl_c_<wbr>device_<wbr>enqueue</code> feature.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><code>reserve_id_t</code></p></td>
@@ -1819,7 +1826,7 @@
       This opaque type is used to identify the reservation for
       <a href="#pipe-functions">reading and writing a pipe</a>.</p>
 <p class="tableblock">      <a href="#unified-spec">Requires</a> support for OpenCL C 2.0, or OpenCL C 3.0 or
-      newer and the <code>__opencl_c_pipes</code> feature.</p></td>
+      newer and the <code>__opencl_c_<wbr>pipes</code> feature.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><code>event_t</code></p></td>
@@ -1852,7 +1859,7 @@
 <code>image2d_array_depth_t</code> and <code>sampler_t</code> types are only defined if the device
 supports images, i.e. the value of the <a href="#opencl-device-queries"><code>CL_DEVICE_IMAGE_SUPPORT</code> device query</a>) is <code>CL_TRUE</code>.
 If this is the case then an OpenCL C 3.0 or newer compiler must also define
-the <code>__opencl_c_images</code> feature macro.</p>
+the <code>__opencl_c_<wbr>images</code> feature macro.</p>
 </div>
 </td>
 </tr>
@@ -2090,7 +2097,7 @@
 vectors or a mixture thereof.
 A vector literal can be used either as a vector initializer or as a primary
 expression.
-A vector literal cannot be used as an l-value.</p>
+Whether a vector literal can be used as an l-value is implementation-defined.</p>
 </div>
 <div class="paragraph">
 <p>A vector literal is written as a parenthesized vector type followed by a
@@ -2238,14 +2245,16 @@
 <table id="table-vector-indices" class="tableblock frame-all grid-all stretch">
 <caption class="title">Table 6. Numeric indices for built-in vector data types</caption>
 <colgroup>
-<col style="width: 50%;">
-<col style="width: 50%;">
+<col style="width: 34%;">
+<col style="width: 66%;">
 </colgroup>
-<tbody>
+<thead>
 <tr>
-<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Vector Components</strong></p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Numeric indices that can be used</strong></p></td>
+<th class="tableblock halign-left valign-top"><strong>Vector Components</strong></th>
+<th class="tableblock halign-left valign-top"><strong>Numeric indices that can be used</strong></th>
 </tr>
+</thead>
+<tbody>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">2-component</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">0, 1</p></td>
@@ -2396,7 +2405,7 @@
 </div>
 </div>
 <div class="paragraph">
-<p>It is an error to take the address of a vector element and will result in a
+<p>Taking the address of a vector element is illegal and will result in a
 compilation error.
 For example:</p>
 </div>
@@ -2483,7 +2492,7 @@
 </div>
 <div class="paragraph">
 <p>Implicit conversions for pointer types follow the rules described in the
-&lt;C99-spec,C99 Specification&gt;&gt;.</p>
+<a href="#C99-spec">C99 Specification</a>.</p>
 </div>
 </div>
 <div class="sect3">
@@ -2494,9 +2503,6 @@
 <code>void</code> and <code>half</code> <sup class="footnote">[<a id="_footnoteref_12" class="footnote" href="#_footnotedef_12" title="View footnote.">12</a>]</sup>).
 In the example below:</p>
 </div>
-<div class="paragraph">
-<p>[9] Unless the <strong>cl_khr_fp16</strong> extension is supported and has been enabled.</p>
-</div>
 <div class="listingblock">
 <div class="content">
 <pre class="CodeRay highlight"><code data-lang="c"><span class="predefined-type">float</span> f = <span class="float">1</span><span class="float">.0f</span>;
@@ -3404,7 +3410,7 @@
 <p>The behavior of applying the <code>sizeof</code> operator to the <code>bool</code>, <code>image2d_t</code>,
 <code>image3d_t</code>, <code>image2d_array_t</code>, <code>image1d_t</code>, <code>image1d_buffer_t</code>,
 <code>image1d_array_t</code>, <code>image2d_depth_t</code>, <code>image2d_array_depth_t</code>,
-<code>sampler_t</code>, <code>queue_t, `ndrange_t</code>, <code>clk_event_t</code>, <code>reserve_id_t</code>, and
+<code>sampler_t</code>, <code>queue_t</code>, <code>ndrange_t</code>, <code>clk_event_t</code>, <code>reserve_id_t</code>, and
 <code>event_t</code> types is implementation-defined.</p>
 </div>
 </div>
@@ -3613,36 +3619,22 @@
 <div class="openblock">
 <div class="content">
 <div class="paragraph">
-<p>OpenCL implements the following disjoint named address spaces: <code>__global</code>,
-<code>__local</code>, <code>__constant</code> and <code>__private</code>.
+<p>OpenCL C has a hierarchical memory architecture represented by address spaces, as
+defined in section 5 of <a href="#embedded-c-spec">the Embedded C Specification</a>. It
+extends the C syntax to allow an address space name as a valid type qualifier
+(section 5.1.2 of <a href="#embedded-c-spec">the Embedded C Specification</a>).
+OpenCL implements disjoint named address spaces with the spelling
+<code>__global</code>, <code>__local</code>, <code>__constant</code> and <code>__private</code>.
 The address space qualifier may be used in variable declarations to specify
-the region of memory that is used to allocate the object.
-The C syntax for type qualifiers is extended in OpenCL to include an address
-space name as a valid type qualifier.
-If the type of an object is qualified by an address space name, the object
-is allocated in the specified address space name.</p>
+the region where objects are to be allocated. If the type of an
+object is qualified by an address space name, the object is allocated in the
+specified address space. Similarly, for pointers, the type pointed to can be qualified
+by an address space, which indicates where the object pointed to is located.</p>
 </div>
 <div class="paragraph">
-<p>The address space names without the <code>__</code> prefix, i.e. <code>global</code>, <code>local</code>,
-<code>constant</code> and <code>private</code>, may be substituted for the corresponding address
-space names with the <code>__</code> prefix.</p>
-</div>
-<div class="paragraph">
-<p>The address space name for arguments to a function in a program, or local
-variables of a function is <code>__private</code>.
-All function arguments shall be in the <code>__private</code> address space.</p>
-</div>
-<div class="paragraph">
-<p>Additionally, all function return values shall be in the <code>__private</code> address space.</p>
-</div>
-<div class="paragraph">
-<p>For OpenCL C 2.0, or OpenCL 3.0 or newer with the
-<code>__opencl_c_program_scope_global_variables</code> feature, the address space for a
-variable at program scope or a <code>static</code> or <code>extern</code> variable inside a function
-may be either <code>__constant</code> or <code>__global</code>,
-and the address space defaults to <code>__global</code> if not specified.
-Otherwise, the address space for a variable at program scope or a <code>static</code> or <code>extern</code>
-variable inside a function must explicitly be <code>__constant</code>.</p>
+<p>The address space names spelled without the <code>__</code> prefix, i.e. <code>global</code>,
+<code>local</code>, <code>constant</code> and <code>private</code>, are valid and may be substituted for the
+corresponding address space names with the <code>__</code> prefix.</p>
 </div>
 <div class="paragraph">
 <p>Examples:</p>
@@ -3651,41 +3643,26 @@
 <div class="content">
 <pre class="CodeRay highlight"><code data-lang="c"><span class="comment">// declares a pointer p in the global address space that</span>
 <span class="comment">// points to an object in the global address space</span>
-global <span class="predefined-type">int</span> *p;
+__global <span class="predefined-type">int</span> *__global p;
 
 <span class="directive">void</span> foo (...)
 {
     <span class="comment">// declares an array of 4 floats in the private address space</span>
-    <span class="predefined-type">float</span> x[<span class="integer">4</span>];
+    __private <span class="predefined-type">float</span> x[<span class="integer">4</span>];
     ...
 }</code></pre>
 </div>
 </div>
 <div class="paragraph">
-<p>For OpenCL C 2.0, or with the <code>__opencl_c_generic_address_space</code> feature,
-there is an additional unnamed generic address space. The unnamed generic
-address space overlaps the named <code>__global</code>, <code>__local</code>, and <code>__private</code> address space; the named <code>__constant</code> address space is not in
-the generic address space.</p>
+<p>For OpenCL C 2.0, or OpenCL C 3.0 with the <code>__opencl_c_<wbr>generic_<wbr>address_<wbr>space</code>
+feature macro, there is an additional unnamed generic address space.</p>
 </div>
 <div class="paragraph">
-<p>If the generic address space is supported,
-pointers that are declared without pointing to a named address space point
-to the generic address space.
-Otherwise, when the generic address space is not supported, pointers that
-are declared without pointing to a named address space point to the
-<code>__private</code> address space.</p>
-</div>
-<div class="paragraph">
-<p>Kernel function arguments declared to be a pointer or an array of a type
-must point to one of the named address spaces <code>__global</code>, <code>__local</code> or
-<code>__constant</code>.</p>
-</div>
-<div class="paragraph">
-<p>A pointer to address space A can be assigned to a pointer to the same
-address space A or be implicitly converted and assigned to a pointer
-to the generic address space.
-Casting a pointer to address space A to a pointer to address space B is
-illegal if A and B are named address spaces and A is not the same as B.</p>
+<p>Most of the restrictions from section 5.1.2 and section 5.3 of the
+<a href="#embedded-c-spec">Embedded C Specification</a> apply in OpenCL C, e.g. address
+spaces cannot be used with a return type, a function parameter, or a function
+type, and multiple address space qualifiers are not allowed. However, in OpenCL
+C, local variables may be qualified with an address space qualifier.</p>
 </div>
 <div class="paragraph">
 <p>Examples:</p>
@@ -3695,11 +3672,18 @@
 <pre class="CodeRay highlight"><code data-lang="c"><span class="comment">// OK.</span>
 <span class="predefined-type">int</span> f() { ... }
 
-<span class="comment">// Error. Address space qualifier cannot be used with non-pointer return type.</span>
+<span class="comment">// Error. Address space qualifier cannot be used with a non-pointer return type.</span>
 private <span class="predefined-type">int</span> f() { ... }
 
-<span class="comment">// OK. Address space qualifier can be used with pointer return type.</span>
-local <span class="predefined-type">int</span> *f() { ... }</code></pre>
+<span class="comment">// OK. Address space qualifier can be used with a pointer return type.</span>
+local <span class="predefined-type">int</span> *f() { ... }
+
+<span class="comment">// Error. Multiple address spaces specified for a type.</span>
+private local <span class="predefined-type">int</span> i;
+
+<span class="comment">// OK. The first address space qualifies the object pointed to and the second</span>
+<span class="comment">// qualifies the pointer.</span>
+private <span class="predefined-type">int</span> *local ptr;</code></pre>
 </div>
 </div>
 <div class="paragraph">
@@ -3740,11 +3724,11 @@
 This allows the kernel to read and/or write any location in the buffer.</p>
 </div>
 <div class="paragraph">
-<p>The actual size of the array memory object is determined when the memory
+<p>The actual size of the memory object is determined when the memory
 object is allocated via appropriate API calls in the host code.</p>
 </div>
 <div class="paragraph">
-<p>Some examples are:</p>
+<p>Examples:</p>
 </div>
 <div class="listingblock">
 <div class="content">
@@ -3755,7 +3739,7 @@
     <span class="predefined-type">int</span> b[<span class="integer">2</span>];
 } foo_t;
 
-global foo_t *my_info; <span class="comment">// An array of foo_t elements.</span></code></pre>
+global foo_t *my_info; <span class="comment">// An array of foo_t elements</span></code></pre>
 </div>
 </div>
 <div class="paragraph">
@@ -3765,70 +3749,12 @@
 Built-in functions to read from and write to an image object are provided.</p>
 </div>
 <div class="paragraph">
-<p>For OpenCL C 2.0, or with the <code>__opencl_c_program_scope_global_variables</code>
-feature,
-variables defined at program scope and <code>static</code> variables inside a function
-can also be declared in the <code>global</code> address space.
-They can be defined with any valid OpenCL C data type except for those in
-<a href="#table-other-builtin-types">Other Built-in Data Types</a>.
-Such program scope variables may be of any user-defined type,
-or a pointer to a user-defined type.
-In the presence of shared virtual memory, these pointers or pointer members
-should work as expected as long as they are shared virtual memory pointers
-and the referenced storage has been mapped appropriately.
-These variables in the <code>global</code> address space have the same lifetime as the
-program, and their values persist between calls to any of the kernels in the
-program.
-These variables are not shared across devices.
-They have distinct storage.</p>
-</div>
-<div class="paragraph">
-<p>Program scope and <code>static</code> variables in the <code>global</code> address space are zero
-initialized by default. A constant expression may be given as an initializer.</p>
-</div>
-<div class="paragraph">
-<p>Examples:</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="CodeRay highlight"><code data-lang="c"><span class="comment">// Note: these examples assume OpenCL C 2.0, or OpenCL C 3.0 or newer and the</span>
-<span class="comment">// __opencl_c_program_scope_global_variables feature.</span>
-
-global <span class="predefined-type">int</span> foo;         <span class="comment">// OK.</span>
-<span class="predefined-type">int</span> foo;                <span class="comment">// OK. Declared in the global address space.</span>
-global uchar buf[<span class="integer">512</span>];  <span class="comment">// OK.</span>
-global <span class="predefined-type">int</span> baz = <span class="integer">12</span>;    <span class="comment">// OK. Initialization is allowed.</span>
-<span class="directive">static</span> global <span class="predefined-type">int</span> bat;  <span class="comment">// OK. Internal linkage.</span>
-
-<span class="directive">static</span> <span class="predefined-type">int</span> foo;         <span class="comment">// OK. Declared in the global address space.</span>
-<span class="directive">static</span> global <span class="predefined-type">int</span> foo;  <span class="comment">// OK.</span>
-
-<span class="predefined-type">int</span> *foo;               <span class="comment">// OK. foo is allocated in the global address space.</span>
-                        <span class="comment">// foo points to a location in the private or</span>
-                        <span class="comment">// generic address space.</span>
-
-<span class="directive">void</span> func(...)
-{
-    <span class="predefined-type">int</span> *foo;           <span class="comment">// OK. foo is allocated in the private address space.</span>
-                        <span class="comment">// foo points to a location in the private or</span>
-                        <span class="comment">// generic address space.</span>
-    ...
-}
-
-global <span class="predefined-type">int</span> * global ptr;          <span class="comment">// OK.</span>
-<span class="predefined-type">int</span> * global ptr;                 <span class="comment">// OK.</span>
-constant <span class="predefined-type">int</span> *global ptr=&amp;baz;    <span class="comment">// Error, baz is in the global address</span>
-                                  <span class="comment">// space.</span>
-global <span class="predefined-type">int</span> * constant ptr = &amp;baz; <span class="comment">// OK</span>
-
-global image2d_t im;    <span class="comment">// Error. Invalid type for program scope variables.</span>
-
-global event_t ev;      <span class="comment">// Error. Invalid type for program scope variables.</span></code></pre>
-</div>
-</div>
-<div class="paragraph">
-<p>The <code>const</code> qualifier can also be used with the <code>__global</code> qualifier to
-specify a read-only buffer memory object.</p>
+<p>Variables at program scope or <code>static</code> or <code>extern</code> variables inside functions
+can be declared in the global address space if the
+<code>__opencl_c_<wbr>program_<wbr>scope_<wbr>global_<wbr>variables</code> feature is supported. These
+variables in the <code>global</code> address space have the same lifetime as the program,
+and their values persist between calls to any of the kernels in the program.
+They are not shared across devices and have distinct storage.</p>
 </div>
 </div>
 </div>
@@ -3838,49 +3764,21 @@
 <div class="openblock">
 <div class="content">
 <div class="paragraph">
-<p>The <code>__local</code> or <code>local</code> address space name is used to describe variables
-that need to be allocated in local memory and are shared by all work-items
-of a work-group.
-Pointers to the <code>__local</code> address space are allowed as arguments to
-functions (including kernel functions).
-Variables declared in the <code>__local</code> address space inside a kernel function
-must occur at kernel function scope.</p>
+<p>The <code>__local</code> or <code>local</code> address space name is used to describe variables that
+are allocated in local memory and shared by all work-items in a work-group.</p>
 </div>
 <div class="paragraph">
-<p>Some examples of variables allocated in the <code>__local</code> address space inside
-a kernel function are:</p>
+<p>Examples:</p>
 </div>
 <div class="listingblock">
 <div class="content">
 <pre class="CodeRay highlight"><code data-lang="c">kernel <span class="directive">void</span> my_func(...)
 {
     local <span class="predefined-type">float</span> a;     <span class="comment">// A single float allocated</span>
-                       <span class="comment">// in local address space</span>
+                       <span class="comment">// in the local address space</span>
 
     local <span class="predefined-type">float</span> b[<span class="integer">10</span>]; <span class="comment">// An array of 10 floats</span>
-                       <span class="comment">// allocated in local address space.</span>
-
-    <span class="keyword">if</span> (...)
-    {
-        <span class="comment">// example of variable in __local address space but not</span>
-        <span class="comment">// declared at __kernel function scope.</span>
-        local <span class="predefined-type">float</span> c; <span class="comment">// not allowed.</span>
-    }
-}</code></pre>
-</div>
-</div>
-<div class="paragraph">
-<p>Variables allocated in the <code>__local</code> address space inside a kernel
-function cannot be initialized.</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="CodeRay highlight"><code data-lang="c">kernel <span class="directive">void</span> my_func(...)
-{
-    local <span class="predefined-type">float</span> a = <span class="integer">1</span>; <span class="comment">// not allowed</span>
-
-    local <span class="predefined-type">float</span> b;
-    b = <span class="integer">1</span>;             <span class="comment">// allowed</span>
+                       <span class="comment">// allocated in the local address space</span>
 }</code></pre>
 </div>
 </div>
@@ -3909,16 +3807,10 @@
 <div class="content">
 <div class="paragraph">
 <p>The <code>__constant</code> or <code>constant</code> address space name is used to describe
-variables allocated in <code>global</code> memory and which are accessed inside a
-kernel(s) as read-only variables.
-These read-only variables can be accessed by all (global) work-items of the
-kernel during its execution.
-Pointers to the <code>__constant</code> address space are allowed as arguments to
-functions (including kernel functions) and for variables declared inside
-functions.</p>
-</div>
-<div class="paragraph">
-<p>All string literal storage shall be in the <code>__constant</code> address space.</p>
+read-only variables that are accessible globally. They may be declared at
+program scope, in the outermost scope of a kernel function, or inside
+functions with a <code>static</code> or <code>extern</code> storage class specifier. Such variables
+can be accessed by all work-items or by different kernels during program execution.</p>
 </div>
 <div class="admonitionblock note">
 <table>
@@ -3937,18 +3829,29 @@
 </table>
 </div>
 <div class="paragraph">
-<p>Variables in the program scope can be declared in the <code>__constant</code> address
-space.
-Variables in the outermost scope of kernel functions can be declared in the
-<code>__constant</code> address space.
-These variables are required to be initialized and the values used to
-initialize these variables must be a compile time constant.
-Writing to such a variable results in a compile-time error.</p>
+<p>Writing to a variable in the constant address space is illegal and will
+result in a compilation error.</p>
+</div>
+<div class="paragraph">
+<p>Example:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">constant <span class="predefined-type">int</span> a = <span class="integer">3</span>; <span class="comment">// int allocated in the constant address space</span>
+kernel <span class="directive">void</span> k1(global <span class="predefined-type">int</span> *buf)
+{
+    buf[a] = ...;   <span class="comment">// OK. All work-items access the element at index 3.</span>
+}
+kernel <span class="directive">void</span> k2(global <span class="predefined-type">int</span> *buf)
+{
+    *buf = a;       <span class="comment">// OK. All work-items store the value 3.</span>
+    a = <span class="integer">42</span>;         <span class="comment">// Error. a is in constant memory.</span>
+}</code></pre>
+</div>
 </div>
 <div class="paragraph">
 <p>Implementations are not required to aggregate these declarations into the
-fewest number of constant arguments.
-This behavior is implementation defined.</p>
+fewest number of constant arguments. This behavior is implementation defined.</p>
 </div>
 <div class="paragraph">
 <p>Thus portable code must conservatively assume that each variable declared
@@ -3963,9 +3866,21 @@
 <div class="openblock">
 <div class="content">
 <div class="paragraph">
-<p>Variables inside a kernel function not declared with an address space
-qualifier, all variables inside non-kernel functions, and all function
-arguments are in the <code>__private</code> or <code>private</code> address space.</p>
+<p>The private address space is a memory segment that can only be accessed by one
+work-item. Variables that are not shareable among work-items are allocated in
+the private address space. It is also the default address space for most
+variables, in particular variables with automatic storage duration.</p>
+</div>
+<div class="paragraph">
+<p>Example:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">kernel <span class="directive">void</span> foo(...)
+{
+    private <span class="predefined-type">int</span> i;
+}</code></pre>
+</div>
 </div>
 </div>
 </div>
@@ -3974,6 +3889,260 @@
 <h4 id="the-generic-address-space"><a class="anchor" href="#the-generic-address-space"></a>6.7.5. The Generic Address Space</h4>
 <div class="openblock">
 <div class="content">
+<div class="paragraph">
+<p>The generic address space requires support for OpenCL C 2.0 or OpenCL C 3.0 with
+the <code>__opencl_c_<wbr>generic_<wbr>address_<wbr>space</code> feature. It can be used with pointer
+types and represents a placeholder for any of the named address spaces
+<code>global</code>, <code>local</code>, or <code>private</code>. It signals that a pointer points to an object
+in one of these concrete named address spaces, which may be resolved dynamically
+during kernel execution.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">kernel <span class="directive">void</span> foo(<span class="predefined-type">int</span> a)
+{
+    private <span class="predefined-type">int</span> b;
+    local <span class="predefined-type">int</span> c;
+    <span class="predefined-type">int</span> *p = a ? &amp;b : &amp;c; <span class="comment">// p points to the local or private address space.</span>
+}</code></pre>
+</div>
+</div>
+</div>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_usage_for_declaration_scopes_and_variable_types"><a class="anchor" href="#_usage_for_declaration_scopes_and_variable_types"></a>6.7.6. Usage for declaration scopes and variable types</h4>
+<div class="openblock">
+<div class="content">
+<div class="paragraph">
+<p>This section describes the use of address space qualifiers with respect to
+declaration scopes and variable types.</p>
+</div>
+<div class="paragraph">
+<p>Local variables inside functions can be qualified by the private address space
+qualifier.</p>
+</div>
+<div class="paragraph">
+<p>Variables declared in the outermost compound statement inside the body of the
+kernel function can be qualified by the local or constant address spaces.</p>
+</div>
+<div class="paragraph">
+<p>Examples:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">kernel <span class="directive">void</span> my_func(...)
+{
+    private <span class="predefined-type">float</span> a;    <span class="comment">// OK.</span>
+    local <span class="predefined-type">float</span> b;      <span class="comment">// OK.</span>
+
+    <span class="keyword">if</span> (...)
+    {
+        <span class="comment">// Example of variable in __local address space but not</span>
+        <span class="comment">// declared at __kernel function scope.</span>
+        local <span class="predefined-type">float</span> c;  <span class="comment">// Error.</span>
+    }
+}</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Program scope variables or variables with an <code>extern</code> or <code>static</code> storage class
+specifier:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>Must be qualified by <code>__constant</code> in OpenCL C prior to 2.0 or OpenCL C 3.0
+without the <code>__opencl_c_<wbr>program_<wbr>scope_<wbr>global_<wbr>variables</code> feature.</p>
+</li>
+<li>
+<p>Can be qualified by either <code>__constant</code> or <code>__global</code> for OpenCL C 2.0 or
+OpenCL C 3.0 with the <code>__opencl_c_<wbr>program_<wbr>scope_<wbr>global_<wbr>variables</code> feature.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>Examples:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c"><span class="comment">// Note: these examples assume OpenCL C 2.0 or the</span>
+<span class="comment">// __opencl_c_program_scope_global_variables feature macro.</span>
+
+constant <span class="predefined-type">int</span> foo = <span class="integer">1</span>;   <span class="comment">// OK.</span>
+global <span class="predefined-type">int</span> baz;         <span class="comment">// OK.</span>
+global uchar buf[<span class="integer">512</span>];  <span class="comment">// OK.</span>
+
+<span class="directive">static</span> global <span class="predefined-type">int</span> bat;  <span class="comment">// OK. Internal linkage.</span>
+
+<span class="directive">extern</span> constant <span class="predefined-type">int</span> foo;  <span class="comment">// OK.</span>
+
+<span class="directive">void</span> func(...)
+{
+    constant <span class="directive">static</span> <span class="predefined-type">int</span> foo = <span class="integer">1</span>; <span class="comment">// OK.</span>
+    global <span class="directive">extern</span> <span class="predefined-type">int</span> baz;       <span class="comment">// OK.</span>
+}
+
+global <span class="predefined-type">int</span> *global ptr;           <span class="comment">// OK.</span>
+constant <span class="predefined-type">int</span> *global ptr = &amp;baz;  <span class="comment">// Error, baz is in the global address space.</span>
+global <span class="predefined-type">int</span> *constant ptr = &amp;baz;  <span class="comment">// OK.</span></code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Kernel function arguments declared to be a pointer or an array of a type
+must point to one of the named address spaces <code>__global</code>, <code>__local</code> or
+<code>__constant</code>.</p>
+</div>
+<div class="paragraph">
+<p>Examples:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c"> <span class="comment">// OK.</span>
+kernel <span class="directive">void</span> my_kernel(global <span class="predefined-type">int</span> *ptr)
+{
+    ...
+}
+<span class="comment">// Error, ptr must point to the global, local, or constant address space.</span>
+kernel <span class="directive">void</span> my_kernel(<span class="predefined-type">int</span> *ptr)
+{
+    ...
+}</code></pre>
+</div>
+</div>
+</div>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_initialization"><a class="anchor" href="#_initialization"></a>6.7.7. Initialization</h4>
+<div class="openblock">
+<div class="content">
+<div class="paragraph">
+<p>Program scope and <code>static</code> variables in the <code>__global</code> address space are
+zero-initialized by default. A constant expression may be given as an initializer.</p>
+</div>
+<div class="paragraph">
+<p>Variables allocated in the <code>__local</code> address space inside a kernel function
+cannot be initialized.</p>
+</div>
+<div class="paragraph">
+<p>Variables allocated in the <code>__constant</code> address space are required to be initialized
+and the values used to initialize these variables must be compile-time constants.</p>
+</div>
+<div class="paragraph">
+<p>Private address space objects are not initialized by default; any initializer
+may be given.</p>
+</div>
+<div class="paragraph">
+<p>Examples:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">global <span class="predefined-type">int</span> a = <span class="integer">12</span>;      <span class="comment">// Initialization is allowed.</span>
+global <span class="predefined-type">int</span> b;           <span class="comment">// Zero initialized.</span>
+constant <span class="predefined-type">int</span> c = <span class="integer">12</span>;    <span class="comment">// Initializer is a compile time constant.</span>
+constant <span class="predefined-type">int</span> d;         <span class="comment">// Error. No initializer provided.</span>
+kernel <span class="directive">void</span> my_func(...)
+{
+    local <span class="predefined-type">float</span> e = <span class="integer">1</span>;  <span class="comment">// Error. Initializer is not allowed.</span>
+
+    local <span class="predefined-type">float</span> f;
+    f = <span class="integer">1</span>;              <span class="comment">// Allowed.</span>
+    private <span class="predefined-type">int</span> g;      <span class="comment">// Uninitialized.</span>
+    constant <span class="predefined-type">int</span> h = g; <span class="comment">// Error. Initializer is not a constant expression.</span>
+}</code></pre>
+</div>
+</div>
+</div>
+</div>
+</div>
+<div class="sect3">
+<h4 id="addr-spaces-inference"><a class="anchor" href="#addr-spaces-inference"></a>6.7.8. Inference</h4>
+<div class="openblock">
+<div class="content">
+<div class="paragraph">
+<p>Address space qualifiers are not required in many cases. If they are not
+specified explicitly, the default address space is inferred from the
+declaration scope and the object type.</p>
+</div>
+<div class="paragraph">
+<p>In some situations there is no syntax for specifying an address space in the source,
+so only the default address space is applicable.</p>
+</div>
+<div class="paragraph">
+<p>For OpenCL C 2.0 or with the <code>__opencl_c_<wbr>program_<wbr>scope_<wbr>global_<wbr>variables</code>
+feature, the address space for a variable at program scope or a <code>static</code>
+or <code>extern</code> variable inside a function is inferred to be <code>__global</code>.</p>
+</div>
+<div class="paragraph">
+<p>If the generic address space is supported, i.e. for OpenCL C 2.0 or OpenCL C 3.0
+with the <code>__opencl_c_<wbr>generic_<wbr>address_<wbr>space</code> feature, pointers that are declared
+without pointing to a named address space point to the generic address space.</p>
+</div>
+<div class="paragraph">
+<p>All string literal storage shall be in the <code>__constant</code> address space.</p>
+</div>
+<div class="paragraph">
+<p>In all other cases not listed above, the address space is inferred to be
+<code>__private</code>. This includes:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>All function arguments as well as return values are in the private address
+space.</p>
+</li>
+<li>
+<p>Pointers that are declared without pointing to a named address space point
+to the <code>__private</code> address space if the generic address space is not
+supported.</p>
+</li>
+<li>
+<p>Variables inside a function not declared with an address space qualifier
+are inferred to be in the private address space.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>Examples:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c"><span class="comment">// Note: these examples assume OpenCL C 2.0 or the</span>
+<span class="comment">// __opencl_c_program_scope_global_variables feature macro.</span>
+
+<span class="predefined-type">int</span> foo;                <span class="comment">// Inferred to be in the global address space.</span>
+
+<span class="directive">static</span> <span class="predefined-type">int</span> foo;         <span class="comment">// Inferred to be in the global address space.</span>
+
+<span class="predefined-type">int</span> *ptr;               <span class="comment">// ptr is inferred to be in the global address space.</span>
+                        <span class="comment">// ptr points to a location in (1) the generic address</span>
+                        <span class="comment">// space for OpenCL C 2.0 or OpenCL C 3.0 with</span>
+                        <span class="comment">// __opencl_c_generic_address_space feature or</span>
+                        <span class="comment">// in (2) the private address space otherwise.</span>
+
+<span class="predefined-type">int</span> *global ptr;        <span class="comment">// ptr is declared to be in the global address space.</span>
+                        <span class="comment">// ptr points to a location in (1) the generic address</span>
+                        <span class="comment">// space for OpenCL C 2.0 or OpenCL C 3.0 with</span>
+                        <span class="comment">// __opencl_c_generic_address_space feature or</span>
+                        <span class="comment">// in (2) the private address space otherwise.</span>
+
+constant <span class="predefined-type">int</span> *ptr =
+               <span class="string"><span class="delimiter">&quot;</span><span class="content">Hello</span><span class="delimiter">&quot;</span></span>; <span class="comment">// the string literal is in the constant address space.</span>
+
+<span class="directive">void</span> func(<span class="predefined-type">int</span> param)    <span class="comment">// param is allocated in the private address space.</span>
+{
+    <span class="predefined-type">int</span> foo;            <span class="comment">// foo is allocated in the private address space.</span>
+    <span class="directive">static</span> <span class="predefined-type">int</span> foo;     <span class="comment">// foo is allocated in the global address space.</span>
+    <span class="predefined-type">int</span> *ptr;           <span class="comment">// ptr is allocated in the private address space.</span>
+                        <span class="comment">// ptr points to a location in (1) the generic address</span>
+                        <span class="comment">// space for OpenCL C 2.0 or OpenCL C 3.0 with</span>
+                        <span class="comment">// __opencl_c_generic_address_space feature or</span>
+                        <span class="comment">// in (2) the private address space otherwise.</span>
+    ...
+}</code></pre>
+</div>
+</div>
 <div class="admonitionblock note">
 <table>
 <tr>
@@ -3981,16 +4150,157 @@
 <i class="fa icon-note" title="Note"></i>
 </td>
 <td class="content">
-The functionality described in this section <a href="#unified-spec">requires</a>
-support for OpenCL C 2.0, or OpenCL C 3.0 or newer and the
-<code>__opencl_c_generic_address_space</code> feature.
+<div class="paragraph">
+<p>Address space qualifiers must be explicitly specified for:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>Program scope variables or variables inside functions with
+a <code>static</code> or <code>extern</code> storage-class specifier, for OpenCL C prior to version 2.0 or
+OpenCL C 3.0 without the <code>__opencl_c_<wbr>program_<wbr>scope_<wbr>global_<wbr>variables</code> feature,</p>
+</li>
+<li>
+<p>Pointers used as arguments to kernel functions (the address space pointed
+to must be specified explicitly).</p>
+</li>
+</ul>
+</div>
+</td>
+</tr>
+</table>
+</div>
+<table id="table-addr-spaces-summary" class="tableblock frame-all grid-all stretch">
+<caption class="title">Table 8. Address space behavior</caption>
+<colgroup>
+<col style="width: 14.2857%;">
+<col style="width: 28.5714%;">
+<col style="width: 28.5714%;">
+<col style="width: 28.5715%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top"><strong>Address Space</strong></th>
+<th class="tableblock halign-left valign-top"><strong>Supported Usage</strong></th>
+<th class="tableblock halign-left valign-top"><strong>Initialization</strong></th>
+<th class="tableblock halign-left valign-top"><strong>Inference</strong></th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>__global</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Program scope variables, for OpenCL C 2.0 or
+    OpenCL C 3.0 with the <code>__opencl_c_<wbr>program_<wbr>scope_<wbr>global_<wbr>variables</code> feature,</p>
+<p class="tableblock">    <code>static</code> or <code>extern</code> local variables, for OpenCL C 2.0 or
+    OpenCL C 3.0 with the <code>__opencl_c_<wbr>program_<wbr>scope_<wbr>global_<wbr>variables</code> feature,</p>
+<p class="tableblock">    Pointers.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Optional constant initializers, 0-initialized by default.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Program scope variables, for OpenCL C 2.0 or
+    OpenCL C 3.0 with the <code>__opencl_c_<wbr>program_<wbr>scope_<wbr>global_<wbr>variables</code> feature.</p>
+<p class="tableblock">    <code>static</code> or <code>extern</code> local variables, for OpenCL C 2.0 or
+    OpenCL C 3.0 with the <code>__opencl_c_<wbr>program_<wbr>scope_<wbr>global_<wbr>variables</code> feature.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>__private</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Local scope variables,</p>
+<p class="tableblock">    Function arguments and return types,</p>
+<p class="tableblock">    Pointers.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Optional initializers, otherwise no default initialization.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Local scope variables,</p>
+<p class="tableblock">    Function arguments and return types,</p>
+<p class="tableblock">    Pointers that do not explicitly specify the address space they point to,
+    for OpenCL C prior to version 2.0 or OpenCL C 3.0 without the
+    <code>__opencl_c_<wbr>generic_<wbr>address_<wbr>space</code> feature.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>__constant</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Program scope variables,</p>
+<p class="tableblock">    Kernel scope variables,</p>
+<p class="tableblock">    String literals,</p>
+<p class="tableblock">    Pointers.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Mandatory initialization with a compile-time constant.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">String literals.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>__local</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Kernel scope variables,</p>
+<p class="tableblock">    Pointers.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Not supported.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Not supported.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Generic</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Pointers, for OpenCL C 2.0 or OpenCL C 3.0 with the
+    <code>__opencl_c_<wbr>generic_<wbr>address_<wbr>space</code> feature.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Not applicable.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Pointers that do not explicitly specify the address space they point to,
+    for OpenCL C 2.0 or OpenCL C 3.0 with the <code>__opencl_c_<wbr>generic_<wbr>address_<wbr>space</code>
+    feature.</p></td>
+</tr>
+</tbody>
+</table>
+</div>
+</div>
+</div>
+<div class="sect3">
+<h4 id="addr-spaces-conversions"><a class="anchor" href="#addr-spaces-conversions"></a>6.7.9. Address space conversions</h4>
+<div class="openblock">
+<div class="content">
+<div class="paragraph">
+<p>OpenCL implements the address space nesting model for pointers from
+<a href="#embedded-c-spec">Embedded C, section 5.1.3</a> as follows:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>In OpenCL the named address spaces <code>__global</code>, <code>__local</code>,
+<code>__constant</code> and <code>__private</code> are disjoint.</p>
+</li>
+<li>
+<p>The named address spaces <code>__global</code>, <code>__local</code>, and <code>__private</code>
+are subsets of the unnamed generic address space.</p>
+</li>
+<li>
+<p>The unnamed generic address space does not overlap the named <code>__constant</code>
+address space; the named <code>__constant</code> address space is not in the generic
+address space.</p>
+</li>
+</ul>
+</div>
+<div class="admonitionblock note">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-note" title="Note"></i>
+</td>
+<td class="content">
+<div class="paragraph">
+<p>The OpenCL definition of the generic address space differs from the
+definition in section 5 of the <a href="#embedded-c-spec">Embedded C Specification</a>. In
+OpenCL C, no objects can be allocated in this address space. It can only be used
+with pointer types: a pointer to the generic address space can point to
+objects allocated in any of the concrete named
+address spaces <code>private</code>, <code>local</code>, or <code>global</code>.</p>
+</div>
 </td>
 </tr>
 </table>
 </div>
 <div class="paragraph">
-<p>The following rules apply when using pointers that point to the generic
-address space:</p>
+<p>Following section 5.3 of the <a href="#embedded-c-spec">Embedded C Specification</a>, a
+pointer may be converted implicitly (for example, in assignments, function arguments,
+and operators) only if the address space of the object it points to is enclosed in
+the address space pointed to by the destination
+pointer.</p>
+</div>
+<div class="paragraph">
+<p>In contrast to the <a href="#embedded-c-spec">Embedded C Specification</a>, explicit
+conversion (i.e. casting) between pointers to non-overlapping address spaces is
+illegal in OpenCL.</p>
+</div>
+<div class="paragraph">
+<p>Considering the above, the following applies to conversions of pointers pointing
+to different address spaces:</p>
 </div>
 <div class="ulist">
 <ul>
@@ -4011,16 +4321,19 @@
 </ul>
 </div>
 <div class="paragraph">
-<p>A few examples follow.</p>
+<p>Examples:</p>
 </div>
 <div class="paragraph">
 <p>This is the canonical example.
 In this example, function <code>foo</code> is declared with an argument that is a
-pointer with no address space qualifier.</p>
+pointer to the unnamed generic address space (that is, with no address space qualifier).</p>
 </div>
 <div class="listingblock">
 <div class="content">
-<pre class="CodeRay highlight"><code data-lang="c"><span class="directive">void</span> foo(<span class="predefined-type">int</span> *a)
+<pre class="CodeRay highlight"><code data-lang="c"><span class="comment">// Note: these examples assume OpenCL C 2.0 or the</span>
+<span class="comment">// __opencl_c_generic_address_space feature support.</span>
+
+<span class="directive">void</span> foo(<span class="predefined-type">int</span> *a)
 {
     *a = *a + <span class="integer">2</span>;
 }
@@ -4047,7 +4360,10 @@
 </div>
 <div class="listingblock">
 <div class="content">
-<pre class="CodeRay highlight"><code data-lang="c">kernel <span class="directive">void</span> bar(global <span class="predefined-type">int</span> *g, local <span class="predefined-type">int</span> *l)
+<pre class="CodeRay highlight"><code data-lang="c"><span class="comment">// Note: these examples assume OpenCL C 2.0 or the</span>
+<span class="comment">// __opencl_c_generic_address_space feature support.</span>
+
+kernel <span class="directive">void</span> bar(global <span class="predefined-type">int</span> *g, local <span class="predefined-type">int</span> *l)
 {
     <span class="predefined-type">int</span> *var;
 
@@ -4070,7 +4386,10 @@
 </div>
 <div class="listingblock">
 <div class="content">
-<pre class="CodeRay highlight"><code data-lang="c"><span class="predefined-type">int</span> *ptr;
+<pre class="CodeRay highlight"><code data-lang="c"><span class="comment">// Note: these examples assume OpenCL C 2.0 or the</span>
+<span class="comment">// __opencl_c_generic_address_space feature support.</span>
+
+<span class="predefined-type">int</span> *ptr;
 global <span class="predefined-type">int</span> g;
 ptr = &amp;g; <span class="comment">// legal</span>
 
@@ -4097,131 +4416,148 @@
 </div>
 <div class="listingblock">
 <div class="content">
-<pre class="CodeRay highlight"><code data-lang="c">global <span class="predefined-type">int</span> *gp;
+<pre class="CodeRay highlight"><code data-lang="c"><span class="comment">// Note: these examples assume OpenCL C 2.0 or the</span>
+<span class="comment">// __opencl_c_generic_address_space feature support.</span>
+
+global <span class="predefined-type">int</span> *gp;
 local <span class="predefined-type">int</span> *lp;
 private <span class="predefined-type">int</span> *pp;
 constant <span class="predefined-type">int</span> *cp;
 
 <span class="predefined-type">int</span> *p;
-p = gp; <span class="comment">// legal</span>
-p = lp; <span class="comment">// legal</span>
-p = pp; <span class="comment">// legal</span>
-p = cp; <span class="comment">// illegal</span>
+p = gp; <span class="comment">// OK.</span>
+p = lp; <span class="comment">// OK.</span>
+p = pp; <span class="comment">// OK.</span>
+p = cp; <span class="comment">// Error.</span>
 
 <span class="comment">// it is illegal to convert from a generic pointer</span>
 <span class="comment">// to an explicit address space pointer without a cast:</span>
-gp = p; <span class="comment">// compile-time error</span>
-lp = p; <span class="comment">// compile-time error</span>
-pp = p; <span class="comment">// compile-time error</span>
-cp = p; <span class="comment">// compile-time error</span></code></pre>
+gp = p; <span class="comment">// Error.</span>
+lp = p; <span class="comment">// Error.</span>
+pp = p; <span class="comment">// Error.</span>
+cp = p; <span class="comment">// Error.</span></code></pre>
 </div>
 </div>
-</div>
-</div>
-</div>
-<div class="sect3">
-<h4 id="changes-to-C99"><a class="anchor" href="#changes-to-C99"></a>6.7.6. Changes to C99</h4>
-<div class="paragraph">
-<p>This section details the modifications to the <a href="#C99-spec">C99
-Specification</a> needed to incorporate the functionality of named address
-space and the generic address space:</p>
-</div>
 <div class="paragraph">
-<p><strong>Clause 6.2.5 - Types, replace paragraph 26 with the following paragraphs</strong>:</p>
+<p>The example below illustrates that implicit conversions between pointers to
+different named address spaces are illegal.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">global <span class="predefined-type">int</span> *gp;
+local <span class="predefined-type">int</span> *lp;
+private <span class="predefined-type">int</span> *pp;
+constant <span class="predefined-type">int</span> *cp;
+
+<span class="comment">// it is illegal to convert pointers pointing to different</span>
+<span class="comment">// named address spaces.</span>
+
+gp = lp; <span class="comment">// Error.</span>
+gp = pp; <span class="comment">// Error.</span>
+gp = cp; <span class="comment">// Error.</span>
+
+lp = gp; <span class="comment">// Error.</span>
+lp = pp; <span class="comment">// Error.</span>
+lp = cp; <span class="comment">// Error.</span>
+
+pp = lp; <span class="comment">// Error.</span>
+pp = gp; <span class="comment">// Error.</span>
+pp = cp; <span class="comment">// Error.</span>
+
+cp = lp; <span class="comment">// Error.</span>
+cp = pp; <span class="comment">// Error.</span>
+cp = gp; <span class="comment">// Error.</span></code></pre>
+</div>
 </div>
 <div class="paragraph">
-<p>If type <code>T</code> is qualified by the address space qualifier for address space
-<code>A</code>, then " <code>T</code> is in <code>A</code> ".
-If type <code>T</code> is in address space <code>A</code>, a pointer to <code>T</code> is also a " pointer
-into <code>A</code> " and the referenced address space of the pointer is <code>A</code>.</p>
+<p>The example below demonstrates explicit conversions for pointers pointing to
+different address spaces.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c"><span class="comment">// Note: these examples assume OpenCL C 2.0 or the</span>
+<span class="comment">// __opencl_c_generic_address_space feature support.</span>
+
+global <span class="predefined-type">int</span> *gp;
+local <span class="predefined-type">int</span> *lp;
+private <span class="predefined-type">int</span> *pp;
+constant <span class="predefined-type">int</span> *cp;
+
+<span class="predefined-type">int</span> *p;
+gp = (global <span class="predefined-type">int</span> *)lp;  <span class="comment">// illegal to cast between named address spaces</span>
+p = (<span class="predefined-type">int</span> *)gp;          <span class="comment">// legal to cast from global to generic</span>
+gp = (global <span class="predefined-type">int</span>*)p;    <span class="comment">// legal to cast from generic to global</span></code></pre>
+</div>
 </div>
 <div class="paragraph">
-<p>A pointer to <code>void</code> in any address space shall have the same representation
-and alignment requirements as a pointer to a character type in the same
-address space.
-Similarly, pointers to differently access-qualified versions of compatible
-types shall have the same representation and alignment requirements.
-All pointers to structure types in the same address space shall have the
-same representation and alignment requirements as each other.
-All pointers to union types in the same address space shall have the same
-representation and alignment requirements as each other.</p>
+<p>For nested pointers, implicit conversions between address spaces are disallowed.
+Explicitly casting between different address spaces in nested pointers is
+allowed, but the use of such pointers can lead to incorrect behavior, such as
+accessing invalid memory locations.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c"><span class="comment">// Note: these examples assume OpenCL C 2.0 or the</span>
+<span class="comment">// __opencl_c_generic_address_space feature support.</span>
+
+kernel <span class="directive">void</span> mykernel(...)
+{
+    <span class="comment">// ll is a pointer to a pointer in the local address space,</span>
+    <span class="comment">// which points to an integer in the local address space</span>
+    local <span class="predefined-type">int</span> *local *ll;
+
+    <span class="comment">// gl is a pointer to a pointer in the local address space,</span>
+    <span class="comment">// which points to an integer in the global address space</span>
+    global <span class="predefined-type">int</span> *local *gl;
+
+    <span class="comment">// nl is a pointer to a pointer in the local address space,</span>
+    <span class="comment">// which points to an integer in the unnamed generic address space</span>
+    <span class="predefined-type">int</span> *local *nl;
+
+    ll = gl;  <span class="comment">// Error, cannot convert address spaces implicitly</span>
+              <span class="comment">// for nested pointers.</span>
+    ll = nl;  <span class="comment">// Error, cannot convert address spaces implicitly</span>
+              <span class="comment">// for nested pointers.</span>
+    ll = (local <span class="predefined-type">int</span>* local*)gl; <span class="comment">// OK to convert explicitly,</span>
+                                <span class="comment">// but uses of 'll' can result in</span>
+                                <span class="comment">// undefined behavior.</span>
+    ll = (local <span class="predefined-type">int</span>* local*)nl; <span class="comment">// OK to convert explicitly,</span>
+                                <span class="comment">// but uses of 'll' can result in</span>
+                                <span class="comment">// undefined behavior.</span>
+}</code></pre>
+</div>
 </div>
 <div class="paragraph">
-<p><strong>Clause 6.3.2.3 - Pointers, replace the first two paragraphs with the
-following paragraphs</strong>:</p>
+<p>The following clarifications and examples illustrate how the changes to ISO/IEC
+9899:1999 detailed in <a href="#embedded-c-spec">Embedded C, section 5.3</a> apply
+to OpenCL C with the generic address space.</p>
 </div>
 <div class="paragraph">
-<p>If a pointer into one address space is converted to a pointer into another
-address space, then unless the original pointer is a null pointer or the
-location referred to by the original pointer is within the second address
-space, the behavior is undefined.
-(For the original pointer to refer to a location within the second address
-space, the two address spaces must overlap).</p>
+<p><strong>Clause 6.2.5 - Types</strong>:</p>
 </div>
 <div class="paragraph">
-<p>A pointer to <code>void</code> in any address space may be converted to or from a
-pointer to any incomplete or object type.
-A pointer to any incomplete or object type in some address space may be
-converted to a pointer to <code>void</code> in an enclosing address space and back
-again; the result shall compare equal to the original pointer.</p>
+<p>If the address space qualifier on a type <code>T</code> is omitted, refer to
+<a href="#addr-spaces-inference">Inference</a>.</p>
 </div>
 <div class="paragraph">
-<p>For any qualifier <em>q</em>, a pointer to a non-<em>q</em>-qualified type may be
-converted to a pointer to the <em>q</em>-qualified version of the type (but with
-the same address-space qualifier or the generic address space); the values
-stored in the original and converted pointers shall compare equal.</p>
+<p><strong>Clause 6.3.2.3 - Pointers</strong>:</p>
 </div>
 <div class="paragraph">
-<p><strong>Clause 6.3.2.3 - Pointers, replace the last sentence of paragraph 4 with</strong>:</p>
+<p>Conversions between disjoint address spaces are disallowed in OpenCL
+(<a href="#addr-spaces-conversions">Address space conversions</a>).</p>
 </div>
 <div class="paragraph">
-<p>Conversion of a null pointer to another pointer type yields a null pointer
-of that type.
-Any two null pointers whose referenced address spaces overlap shall compare
-equal.</p>
-</div>
-<div class="paragraph">
-<p><strong>Clause 6.5.2.2 - Function calls, change the second bullet of paragraph 6
-to</strong>:</p>
-</div>
-<div class="paragraph">
-<p>both types are pointers to qualified or unqualified versions of a character
-type or <code>void</code> in the same address space or one type is a pointer in a named
-address space and the other is a pointer in the generic address space.</p>
-</div>
-<div class="paragraph">
-<p><strong>Clause 6.5.6 - Additive operators, add another constraint paragraph</strong>:</p>
-</div>
-<div class="paragraph">
-<p>For subtraction, if the two operands are pointers into different address
-spaces, the address spaces must overlap.</p>
-</div>
-<div class="paragraph">
-<p><strong>Clause 6.5.8 - Relational operators, add another constraint paragraph</strong>:</p>
-</div>
-<div class="paragraph">
-<p>If the two operands are pointers into different address spaces, the address
-spaces must overlap.</p>
-</div>
-<div class="paragraph">
-<p><strong>Clause 6.5.8 - Relational operators, add a new paragraph between existing
-paragraphs 3 and 4</strong>:</p>
-</div>
-<div class="paragraph">
-<p>If the two operands are pointers into different address spaces, one of the
-address spaces encloses the other.
-The pointer into the enclosed address space is first converted to a pointer
-to the same reference type except with any address-space qualifier removed
-and any address-space qualifier of the other pointer&#8217;s reference type added.
-(After this conversion, both pointers are pointers into the same address
-space).</p>
+<p><strong>Clause 6.5.8 - Relational operators</strong>:</p>
 </div>
 <div class="paragraph">
 <p>Examples:</p>
 </div>
 <div class="listingblock">
 <div class="content">
-<pre class="CodeRay highlight"><code data-lang="c">kernel <span class="directive">void</span> test1()
+<pre class="CodeRay highlight"><code data-lang="c"><span class="comment">// Note: these examples assume OpenCL C 2.0 or the</span>
+<span class="comment">// __opencl_c_generic_address_space feature support.</span>
+
+kernel <span class="directive">void</span> test1()
 {
     global <span class="predefined-type">int</span> arr[<span class="integer">5</span>] = { <span class="integer">0</span>, <span class="integer">1</span>, <span class="integer">2</span>, <span class="integer">3</span>, <span class="integer">4</span> };
     <span class="predefined-type">int</span> *p = &amp;arr[<span class="integer">1</span>];
@@ -4242,47 +4578,17 @@
 </div>
 </div>
 <div class="paragraph">
-<p><strong>Clause 6.5.9 - Equality operators, add another constraint paragraph</strong>:</p>
-</div>
-<div class="paragraph">
-<p>If the two operands are pointers into different address spaces, the address
-spaces must overlap.</p>
-</div>
-<div class="paragraph">
-<p><strong>Clause 6.5.9 - Equality operators, replace paragraph 5 with</strong>:</p>
-</div>
-<div class="paragraph">
-<p>Otherwise, at least one operand is a pointer.
-If one operand is a pointer and the other is a null pointer constant, the
-null pointer constant is converted to the type of the pointer.
-If both operands are pointers, each of the following conversions is
-performed as applicable:</p>
-</div>
-<div class="ulist">
-<ul>
-<li>
-<p>If the two operands are pointers into different address spaces, one of
-the address spaces encloses the other.
-The pointer into the enclosed address space is first converted to a
-pointer to the same reference type except with any address-space
-qualifier removed and any address-space qualifier of the other pointer&#8217;s
-reference type added.
-(After this conversion, both pointers are pointers into the same address
-space).</p>
-</li>
-<li>
-<p>Then, if one operand is a pointer to an object or incomplete type and
-the other is a pointer to a qualified or unqualified version of <code>void</code>,
-the former is converted to the type of the latter.</p>
-</li>
-</ul>
+<p><strong>Clause 6.5.9 - Equality operators</strong>:</p>
 </div>
 <div class="paragraph">
 <p>Examples:</p>
 </div>
 <div class="listingblock">
 <div class="content">
-<pre class="CodeRay highlight"><code data-lang="c"><span class="predefined-type">int</span> *ptr = <span class="predefined-constant">NULL</span>;
+<pre class="CodeRay highlight"><code data-lang="c"><span class="comment">// Note: these examples assume OpenCL C 2.0 or the</span>
+<span class="comment">// __opencl_c_generic_address_space feature support.</span>
+
+<span class="predefined-type">int</span> *ptr = <span class="predefined-constant">NULL</span>;
 local <span class="predefined-type">int</span> lval = SOME_VAL;
 local <span class="predefined-type">int</span> *lptr = &amp;lval;
 global <span class="predefined-type">int</span> gval = SOME_OTHER_VAL;
@@ -4311,7 +4617,10 @@
 </div>
 <div class="listingblock">
 <div class="content">
-<pre class="CodeRay highlight"><code data-lang="c"><span class="predefined-type">bool</span> callee(<span class="predefined-type">int</span> *p1, <span class="predefined-type">int</span> *p2)
+<pre class="CodeRay highlight"><code data-lang="c"><span class="comment">// Note: these examples assume OpenCL C 2.0 or the</span>
+<span class="comment">// __opencl_c_generic_address_space feature support.</span>
+
+<span class="predefined-type">bool</span> callee(<span class="predefined-type">int</span> *p1, <span class="predefined-type">int</span> *p2)
 {
     <span class="keyword">if</span> (p1 == p2)
         <span class="keyword">return</span> <span class="predefined-constant">true</span>;
@@ -4339,7 +4648,10 @@
 </div>
 <div class="listingblock">
 <div class="content">
-<pre class="CodeRay highlight"><code data-lang="c"><span class="predefined-type">int</span> *ptr = <span class="predefined-constant">NULL</span>;
+<pre class="CodeRay highlight"><code data-lang="c"><span class="comment">// Note: these examples assume OpenCL C 2.0 or the</span>
+<span class="comment">// __opencl_c_generic_address_space feature support.</span>
+
+<span class="predefined-type">int</span> *ptr = <span class="predefined-constant">NULL</span>;
 local <span class="predefined-type">int</span> *lptr = <span class="predefined-constant">NULL</span>;
 global <span class="predefined-type">int</span> *gptr = <span class="predefined-constant">NULL</span>;
 
@@ -4373,26 +4685,17 @@
 </div>
 </div>
 <div class="paragraph">
-<p><strong>Clause 6.5.9 - Equality operators, replace first sentence of paragraph 6
-with</strong>:</p>
-</div>
-<div class="paragraph">
-<p>Two pointers compare equal if and only if both are null pointers with
-overlapping address spaces.</p>
-</div>
-<div class="paragraph">
-<p><strong>Clause 6.5.15 - Conditional operator, add another constraint paragraph</strong>:</p>
-</div>
-<div class="paragraph">
-<p>If the second and third operands are pointers into different address spaces,
-the address spaces must overlap.</p>
+<p><strong>Clause 6.5.15 - Conditional operator</strong>:</p>
 </div>
 <div class="paragraph">
 <p>Examples:</p>
 </div>
 <div class="listingblock">
 <div class="content">
-<pre class="CodeRay highlight"><code data-lang="c">kernel <span class="directive">void</span> test1()
+<pre class="CodeRay highlight"><code data-lang="c"><span class="comment">// Note: these examples assume OpenCL C 2.0 or the</span>
+<span class="comment">// __opencl_c_generic_address_space feature support.</span>
+
+kernel <span class="directive">void</span> test1()
 {
     global <span class="predefined-type">int</span> arr[<span class="integer">5</span>] = { <span class="integer">0</span>, <span class="integer">1</span>, <span class="integer">2</span>, <span class="integer">3</span>, <span class="integer">4</span> };
     <span class="predefined-type">int</span> *p = &amp;arr[<span class="integer">1</span>];
@@ -4411,32 +4714,17 @@
 </div>
 </div>
 <div class="paragraph">
-<p><strong>Clause 6.5.16.1 - Simple assignment, change the third and fourth bullets of
-paragraph 1 to</strong>:</p>
-</div>
-<div class="ulist">
-<ul>
-<li>
-<p>both operands are pointers to qualified or unqualified versions of
-compatible types, the referenced address space of the left encloses the
-referenced address space of the right, and the type pointed to by the
-left has all the qualifiers of the type pointed to by the right.</p>
-</li>
-<li>
-<p>one operand is a pointer to an object or incomplete type and the other
-is a pointer to a qualified or unqualified version of <code>void</code>, the
-referenced address space of the left encloses the referenced address
-space of the right, and the type pointed to by the left has all the
-qualifiers of the type pointed to by the right.</p>
-</li>
-</ul>
+<p><strong>Clause 6.5.16.1 - Simple assignment</strong>:</p>
 </div>
 <div class="paragraph">
 <p>Examples:</p>
 </div>
 <div class="listingblock">
 <div class="content">
-<pre class="CodeRay highlight"><code data-lang="c">kernel <span class="directive">void</span> f()
+<pre class="CodeRay highlight"><code data-lang="c"><span class="comment">// Note: these examples assume OpenCL C 2.0 or the</span>
+<span class="comment">// __opencl_c_generic_address_space feature support.</span>
+
+kernel <span class="directive">void</span> f()
 {
 <span class="predefined-type">int</span> *ptr;
 local <span class="predefined-type">int</span> *lptr;
@@ -4454,19 +4742,13 @@
 </div>
 </div>
 <div class="paragraph">
-<p><strong>Clause 6.7.2.1 - Structure and union specifiers, add a new constraint
-paragraph</strong>:</p>
+<p><strong>Clause 6.7.3 - Type qualifiers</strong>:</p>
 </div>
 <div class="paragraph">
-<p>Within a structure or union specifier, the type of a member shall not be
-qualified by an address space qualifier.</p>
+<p>Objects with automatic storage duration are in the private address space,
+so their types can be qualified with <code>private</code>/<code>__private</code>.</p>
 </div>
-<div class="paragraph">
-<p><strong>Clause 6.7.3 - Type qualifiers, add three new constraint paragraphs</strong>:</p>
 </div>
-<div class="paragraph">
-<p>No type shall be qualified by qualifiers for two or more different address
-spaces.</p>
 </div>
 </div>
 </div>
@@ -4479,7 +4761,7 @@
 read-only or write-only.</p>
 </div>
 <div class="paragraph">
-<p>For OpenCL C 2.0, or with the <code>__opencl_c_read_write_images</code> feature,
+<p>For OpenCL C 2.0, or with the <code>__opencl_c_<wbr>read_<wbr>write_<wbr>images</code> feature,
 image objects specified as arguments to a kernel can additionally be
 declared to be read-write.</p>
 </div>
@@ -4851,11 +5133,7 @@
 <p>Arguments to kernel functions in a program cannot be declared with the
 built-in scalar types <code>bool</code>, <code>size_t</code>, <code>ptrdiff_t</code>, <code>intptr_t</code>, and
 <code>uintptr_t</code> or a struct and/or union that contain fields declared to be
-one of these built-in scalar types.
-The size in bytes of these types are implementation-defined and in
-addition can also be different for the OpenCL device and the host
-processor making it difficult to allocate buffer objects to be passed as
-arguments to a kernel declared as pointer to these types.</p>
+one of these built-in scalar types.</p>
 </li>
 <li>
 <p><code>half</code> is not supported as <code>half</code> can be used as a storage format
@@ -4937,6 +5215,20 @@
 <li>
 <p>Implicit function declaration is not supported.</p>
 </li>
+<li>
+<p>Program scope variables can be defined with any valid OpenCL C data type
+except for those in <a href="#table-other-builtin-types">Other Built-in Data Types</a>. Such program scope
+variables may be of any user-defined type, or a pointer to a user-defined
+type.</p>
+<div class="paragraph">
+<p>In the presence of shared virtual memory, these pointers or pointer
+members should work as expected as long as they are shared virtual memory
+pointers and the referenced storage has been mapped appropriately.
+Program scope variables can be declared with the <code>__constant</code> address space
+qualifier or, if the <code>__opencl_c_<wbr>program_<wbr>scope_<wbr>global_<wbr>variables</code> feature is
+supported, with the <code>__global</code> address space qualifier.</p>
+</div>
+</li>
 </ol>
 </div>
 </div>
@@ -4961,12 +5253,12 @@
 </div>
 <div class="paragraph">
 <p>A <strong>#pragma</strong> directive where the preprocessing token <code>OPENCL</code> (used instead
-of <strong><code>STDC</code></strong>) does not immediately follow <strong>pragma</strong> in the directive (prior to
+of <strong><code>STDC</code></strong>) does not immediately follow <strong>#pragma</strong> in the directive (prior to
 any macro replacement) causes the implementation to behave in an
 implementation-defined manner.
 The behavior might cause translation to fail or cause the translator or the
 resulting program to behave in a non-conforming manner.
-Any such <strong>pragma</strong> that is not recognized by the implementation is ignored.
+Any such <strong>#pragma</strong> that is not recognized by the implementation is ignored.
 If the preprocessing token <code>OPENCL</code> does immediately follow <strong>#pragma</strong> in the
 directive (prior to any macro replacement), then no macro replacement is
 performed on the directive, and the directive shall have one of the
@@ -4999,10 +5291,13 @@
 </dd>
 <dt class="hdlist1"><code>__OPENCL_VERSION__</code> </dt>
 <dd>
-<p>Substitutes an integer reflecting the version number of the OpenCL
-supported by the OpenCL device.
-The version of OpenCL described in this document will have
-<code>__OPENCL_VERSION__</code> substitute the integer 300.</p>
+<p>For OpenCL devices with OpenCL version less than or equal to OpenCL 2.0,
+substitutes an integer value reflecting the OpenCL version supported by the
+device.
+This predefined macro is <a href="#unified-spec">deprecated by</a> OpenCL 2.1.
+For OpenCL devices with OpenCL version greater than OpenCL 2.0, it must be
+defined but may substitute any implementation-defined integer value greater
+than 200, reflecting OpenCL 2.0. <sup class="footnote">[<a id="_footnoteref_29" class="footnote" href="#_footnotedef_29" title="View footnote.">29</a>]</sup></p>
 </dd>
 <dt class="hdlist1"><code>CL_VERSION_1_0</code> </dt>
 <dd>
@@ -5072,7 +5367,7 @@
 <p>Used to determine if the OpenCL device supports images.
 This is an integer constant of 1 if images are supported and is
 undefined otherwise.
-Also refer to the value of the <a href="#opencl-device-queries"><code>CL_DEVICE_IMAGE_SUPPORT</code> device query</a> and the <code>__opencl_c_images</code>
+Also refer to the value of the <a href="#opencl-device-queries"><code>CL_DEVICE_IMAGE_SUPPORT</code> device query</a> and the <code>__opencl_c_<wbr>images</code>
 feature.</p>
 </dd>
 <dt class="hdlist1"><code>__FAST_RELAXED_MATH__</code> </dt>
@@ -5698,14 +5993,14 @@
 <td class="content">
 The functionality described in this section <a href="#unified-spec">requires</a>
 support for OpenCL C 2.0, or OpenCL C 3.0 or newer and the
-<code>__opencl_c_device_enqueue</code> feature.
+<code>__opencl_c_<wbr>device_<wbr>enqueue</code> feature.
 </td>
 </tr>
 </table>
 </div>
 <div class="paragraph">
 <p>This section describes the clang block syntax
-<sup class="footnote">[<a id="_footnoteref_29" class="footnote" href="#_footnotedef_29" title="View footnote.">29</a>]</sup>.</p>
+<sup class="footnote">[<a id="_footnoteref_30" class="footnote" href="#_footnotedef_30" title="View footnote.">30</a>]</sup>.</p>
 </div>
 <div class="paragraph">
 <p>Like function types, the Block type is a pair consisting of a result value
@@ -5936,7 +6231,7 @@
 <p>Block literals are assumed to allocate memory at the point of definition and
 to be destroyed at the end of the same scope.
 To support these behaviors, additional restrictions
-<sup class="footnote">[<a id="_footnoteref_30" class="footnote" href="#_footnotedef_30" title="View footnote.">30</a>]</sup> in addition to the above feature
+<sup class="footnote">[<a id="_footnoteref_31" class="footnote" href="#_footnotedef_31" title="View footnote.">31</a>]</sup> in addition to the above feature
 restrictions are:</p>
 </div>
 <div class="ulist">
@@ -6129,7 +6424,7 @@
 identifier of each work-item when this kernel is being executed on a device.</p>
 </div>
 <table id="table-work-item-functions" class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 8. Built-in Work-Item Functions</caption>
+<caption class="title">Table 9. Built-in Work-Item Functions</caption>
 <colgroup>
 <col style="width: 50%;">
 <col style="width: 50%;">
@@ -6172,7 +6467,7 @@
       <code>NULL</code>; otherwise the OpenCL implementation chooses an appropriate
       <em>local_work_size</em> value which is returned by this function.
       If the kernel is executed with a non-uniform work-group size
-      <sup class="footnote">[<a id="_footnoteref_31" class="footnote" href="#_footnotedef_31" title="View footnote.">31</a>]</sup>, calls to this built-in from some
+      <sup class="footnote">[<a id="_footnoteref_32" class="footnote" href="#_footnotedef_32" title="View footnote.">32</a>]</sup>, calls to this built-in from some
       work-groups may return different values than calls to this built-in from
       other work-groups.</p>
 <p class="tableblock">      Valid values of <em>dimindx</em> are 0 to <strong>get_work_dim</strong>() - 1.
@@ -6261,7 +6556,7 @@
 <i class="fa icon-note" title="Note"></i>
 </td>
 <td class="content">
-The functionality described in the following table <a href="#unified-spec">requires</a> support for OpenCL C 3.0 or newer and the <code>__opencl_c_subgroups</code>
+The functionality described in the following table <a href="#unified-spec">requires</a> support for OpenCL C 3.0 or newer and the <code>__opencl_c_<wbr>subgroups</code>
 feature.
 </td>
 </tr>
@@ -6269,12 +6564,12 @@
 </div>
 <div class="paragraph">
 <p>The following table describes the list of built-in work-item functions that
-can be used to query the size of a subgroup, number of subgroups per work group,
+can be used to query the size of a subgroup, number of subgroups per work-group,
 and identifier of the subgroup within a work-group and work-item within a
 subgroup when this kernel is being executed on a device.</p>
 </div>
 <table class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 9. Built-in Work-Item Functions for Subgroups</caption>
+<caption class="title">Table 10. Built-in Work-Item Functions for Subgroups</caption>
 <colgroup>
 <col style="width: 50%;">
 <col style="width: 50%;">
@@ -6290,7 +6585,7 @@
 <td class="tableblock halign-left valign-top"><div class="content"><div class="paragraph">
 <p>uint <strong>get_sub_group_size</strong>()</p>
 </div></div></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the number of work items in the subgroup.
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the number of work-items in the subgroup.
   This value is no more than the maximum subgroup size and is
   implementation-defined based on a combination of the compiled kernel and
   the dispatch dimensions.
@@ -6308,23 +6603,23 @@
 <td class="tableblock halign-left valign-top"><div class="content"><div class="paragraph">
 <p>uint <strong>get_num_sub_groups</strong>()</p>
 </div></div></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the number of subgroups that the current work group is divided
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the number of subgroups that the current work-group is divided
   into.</p>
-<p class="tableblock">  This number will be constant for the duration of a work group&#8217;s execution.
-  If the kernel is executed with a non-uniform work group size
+<p class="tableblock">  This number will be constant for the duration of a work-group&#8217;s execution.
+  If the kernel is executed with a non-uniform work-group size
   (i.e. the global_work_size values specified to <strong>clEnqueueNDRangeKernel</strong>
   are not evenly divisible by the local_work_size values for any dimension,
-  calls to this built-in from some work groups may return different values
-  than calls to this built-in from other work groups.</p></td>
+  calls to this built-in from some work-groups may return different values
+  than calls to this built-in from other work-groups.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><div class="content"><div class="paragraph">
 <p>uint <strong>get_enqueued_num_sub_groups</strong>()</p>
 </div></div></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">Returns the same value as that returned by <strong>get_num_sub_groups</strong> if the
-  kernel is executed with a uniform work group size.</p>
-<p class="tableblock">  If the kernel is executed with a non-uniform work group size, returns the
-  number of subgroups in each of the work groups that make up the uniform
+  kernel is executed with a uniform work-group size.</p>
+<p class="tableblock">  If the kernel is executed with a non-uniform work-group size, returns the
+  number of subgroups in each of the work-groups that make up the uniform
   region of the global range.</p></td>
 </tr>
 <tr>
@@ -6339,9 +6634,9 @@
 <td class="tableblock halign-left valign-top"><div class="content"><div class="paragraph">
 <p>uint <strong>get_sub_group_local_id</strong>()</p>
 </div></div></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the unique work item ID within the current subgroup.
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the unique work-item ID within the current subgroup.
   The mapping from <strong>get_local_id</strong>(<em>dimindx</em>) to <strong>get_sub_group_local_id</strong>
-  will be invariant for the lifetime of the work group.</p></td>
+  will be invariant for the lifetime of the work-group.</p></td>
 </tr>
 </tbody>
 </table>
@@ -6380,19 +6675,19 @@
 math functions that can take scalar or vector arguments.
 We use the generic type name <code>gentype</code> to indicate that the function can take
 <code>float</code>, <code>float2</code>, <code>float3</code>, <code>float4</code>, <code>float8</code>, <code>float16</code>, <code>double</code>
-<sup class="footnote" id="_footnote_double-supported">[<a id="_footnoteref_32" class="footnote" href="#_footnotedef_32" title="View footnote.">32</a>]</sup>, <code>double2</code>,
+<sup class="footnote" id="_footnote_double-supported">[<a id="_footnoteref_33" class="footnote" href="#_footnotedef_33" title="View footnote.">33</a>]</sup>, <code>double2</code>,
 <code>double3</code>, <code>double4</code>, <code>double8</code> or <code>double16</code> as the type for the arguments.
 We use the generic type name <code>gentypef</code> to indicate that the function can
 take <code>float</code>, <code>float2</code>, <code>float3</code>, <code>float4</code>, <code>float8</code>, or <code>float16</code> as the
 type for the arguments.
-We use the generic type name <code>gentyped</code> <sup class="footnoteref">[<a class="footnote" href="#_footnotedef_32" title="View footnote.">32</a>]</sup> to
+We use the generic type name <code>gentyped</code> <sup class="footnoteref">[<a class="footnote" href="#_footnotedef_33" title="View footnote.">33</a>]</sup> to
 indicate that the function can take <code>double</code>, <code>double2</code>, <code>double3</code>, <code>double4</code>,
 <code>double8</code> or <code>double16</code> as the type for the arguments.
 For any specific use of a function, the actual type has to be the same for
 all arguments and the return type, unless otherwise specified.</p>
 </div>
 <table id="table-builtin-math" class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 10. Built-in Scalar and Vector Argument Math Functions</caption>
+<caption class="title">Table 11. Built-in Scalar and Vector Argument Math Functions</caption>
 <colgroup>
 <col style="width: 50%;">
 <col style="width: 50%;">
@@ -6532,7 +6827,7 @@
 <td class="tableblock halign-left valign-top"><p class="tableblock">Returns <em>y</em> if <em>y</em> &lt; <em>x</em>, otherwise it returns <em>x</em>.
       If one argument is a NaN, <strong>fmin</strong>() returns the other argument.
       If both arguments are NaNs, <strong>fmin</strong>() returns a NaN.
-      <sup class="footnote">[<a id="_footnoteref_33" class="footnote" href="#_footnotedef_33" title="View footnote.">33</a>]</sup></p></td>
+      <sup class="footnote">[<a id="_footnoteref_34" class="footnote" href="#_footnotedef_34" title="View footnote.">34</a>]</sup></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">gentype <strong>fmod</strong>(gentype <em>x</em>, gentype <em>y</em>)</p></td>
@@ -6544,11 +6839,11 @@
   gentype <strong>fract</strong>(gentype <em>x</em>, __local gentype <em>*iptr</em>)<br>
   gentype <strong>fract</strong>(gentype <em>x</em>, __private gentype <em>*iptr</em>)<br></p>
 <p class="tableblock">  For OpenCL C 2.0, or OpenCL C 3.0 or newer with the
-  <code>__opencl_c_generic_address_space</code> feature:<br></p>
+  <code>__opencl_c_<wbr>generic_<wbr>address_<wbr>space</code> feature:<br></p>
 <p class="tableblock">  gentype <strong>fract</strong>(gentype <em>x</em>, gentype <em>*iptr</em>)</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">Returns <strong>fmin</strong>(<em>x</em> - <strong>floor</strong>(<em>x</em>), <code>0x1.fffffep-1f</code>).
       <strong>floor</strong>(x) is returned in <em>iptr</em>.
-      <sup class="footnote">[<a id="_footnoteref_34" class="footnote" href="#_footnotedef_34" title="View footnote.">34</a>]</sup></p></td>
+      <sup class="footnote">[<a id="_footnoteref_35" class="footnote" href="#_footnotedef_35" title="View footnote.">35</a>]</sup></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">float<em>n</em> <strong>frexp</strong>(float<em>n</em> <em>x</em>, __global int<em>n</em> *exp)<br>
@@ -6558,7 +6853,7 @@
 <p class="tableblock">  float<em>n</em> <strong>frexp</strong>(float<em>n</em> <em>x</em>, __private int<em>n</em> *exp)<br>
   float <strong>frexp</strong>(float <em>x</em>, __private int *exp)<br></p>
 <p class="tableblock">  For OpenCL C 2.0, or OpenCL C 3.0 or newer with the
-  <code>__opencl_c_generic_address_space</code> feature:<br></p>
+  <code>__opencl_c_<wbr>generic_<wbr>address_<wbr>space</code> feature:<br></p>
 <p class="tableblock">  float<em>n</em> <strong>frexp</strong>(float<em>n</em> <em>x</em>, int<em>n</em> *exp)<br>
   float <strong>frexp</strong>(float <em>x</em>, int *exp)</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">Extract mantissa and exponent from <em>x</em>.
@@ -6574,7 +6869,7 @@
 <p class="tableblock">  double<em>n</em> <strong>frexp</strong>(double<em>n</em> <em>x</em>, __private int<em>n</em> *exp)<br>
   double <strong>frexp</strong>(double <em>x</em>, __private int *exp)<br></p>
 <p class="tableblock">  For OpenCL C 2.0, or OpenCL C 3.0 or newer with the
-  <code>__opencl_c_generic_address_space</code> feature:<br></p>
+  <code>__opencl_c_<wbr>generic_<wbr>address_<wbr>space</code> feature:<br></p>
 <p class="tableblock">  double<em>n</em> <strong>frexp</strong>(double<em>n</em> <em>x</em>, int<em>n</em> *exp)<br>
   double <strong>frexp</strong>(double <em>x</em>, int *exp)</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">Extract mantissa and exponent from <em>x</em>.
@@ -6618,7 +6913,7 @@
   double<em>n</em> <strong>lgamma_r</strong>(double<em>n</em> <em>x</em>, __private int<em>n</em> *<em>signp</em>)<br>
   double <strong>lgamma_r</strong>(double <em>x</em>, __private int *<em>signp</em>)<br></p>
 <p class="tableblock">  For OpenCL C 2.0, or OpenCL C 3.0 or newer with the
-  <code>__opencl_c_generic_address_space</code> feature:<br></p>
+  <code>__opencl_c_<wbr>generic_<wbr>address_<wbr>space</code> feature:<br></p>
 <p class="tableblock">  float<em>n</em> <strong>lgamma_r</strong>(float<em>n</em> <em>x</em>, int<em>n</em> *<em>signp</em>)<br>
   float <strong>lgamma_r</strong>(float <em>x</em>, int *<em>signp</em>)<br>
   double<em>n</em> <strong>lgamma_r</strong>(double<em>n</em> <em>x</em>, int<em>n</em> *<em>signp</em>)<br>
@@ -6657,7 +6952,7 @@
     in the embedded profile.  See the OpenCL SPIR-V Environment Specification
     for details. On some hardware the mad instruction may provide better
     performance than expanded computation of <em>a</em> * <em>b</em> + <em>c</em>.
-    <sup class="footnote">[<a id="_footnoteref_35" class="footnote" href="#_footnotedef_35" title="View footnote.">35</a>]</sup></p></td>
+    <sup class="footnote">[<a id="_footnoteref_36" class="footnote" href="#_footnotedef_36" title="View footnote.">36</a>]</sup></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">gentype <strong>maxmag</strong>(gentype <em>x</em>, gentype <em>y</em>)</p></td>
@@ -6676,7 +6971,7 @@
   gentype <strong>modf</strong>(gentype <em>x</em>, __local gentype <em>*iptr</em>)<br>
   gentype <strong>modf</strong>(gentype <em>x</em>, __private gentype <em>*iptr</em>)<br></p>
 <p class="tableblock">  For OpenCL C 2.0, or OpenCL C 3.0 or newer with the
-  <code>__opencl_c_generic_address_space</code> feature:<br></p>
+  <code>__opencl_c_<wbr>generic_<wbr>address_<wbr>space</code> feature:<br></p>
 <p class="tableblock">  gentype <strong>modf</strong>(gentype <em>x</em>, gentype <em>*iptr</em>)</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">Decompose a floating-point number.
       The <strong>modf</strong> function breaks the argument <em>x</em> into integral and
@@ -6729,7 +7024,7 @@
 <p class="tableblock">  float<em>n</em> <strong>remquo</strong>(float<em>n</em> <em>x</em>, float<em>n</em> <em>y</em>, __private int<em>n</em> <em>*quo</em>)<br>
   float <strong>remquo</strong>(float <em>x</em>, float <em>y</em>, __private int <em>*quo</em>)<br></p>
 <p class="tableblock">  For OpenCL C 2.0, or OpenCL C 3.0 or newer with the
-  <code>__opencl_c_generic_address_space</code> feature:<br></p>
+  <code>__opencl_c_<wbr>generic_<wbr>address_<wbr>space</code> feature:<br></p>
 <p class="tableblock">  float<em>n</em> <strong>remquo</strong>(float<em>n</em> <em>x</em>, float<em>n</em> <em>y</em>, int<em>n</em> <em>*quo</em>)<br>
   float <strong>remquo</strong>(float <em>x</em>, float <em>y</em>, int <em>*quo</em>)</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">The <strong>remquo</strong> function computes the value r such that <em>r</em> = <em>x</em> -
@@ -6750,7 +7045,7 @@
 <p class="tableblock">  double<em>n</em> <strong>remquo</strong>(double<em>n</em> <em>x</em>, double<em>n</em> <em>y</em>, __private int<em>n</em> <em>*quo</em>)<br>
   double <strong>remquo</strong>(double <em>x</em>, double <em>y</em>, __private int <em>*quo</em>)<br></p>
 <p class="tableblock">  For OpenCL C 2.0, or OpenCL C 3.0 or newer with the
-  <code>__opencl_c_generic_address_space</code> feature:<br></p>
+  <code>__opencl_c_<wbr>generic_<wbr>address_<wbr>space</code> feature:<br></p>
 <p class="tableblock">  double<em>n</em> <strong>remquo</strong>(double<em>n</em> <em>x</em>, double<em>n</em> <em>y</em>, int<em>n</em> <em>*quo</em>)<br>
   double <strong>remquo</strong>(double <em>x</em>, double <em>y</em>, int <em>*quo</em>)</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">The <strong>remquo</strong> function computes the value r such that <em>r</em> = <em>x</em> -
@@ -6794,7 +7089,7 @@
   gentype <strong>sincos</strong>(gentype <em>x</em>, __local gentype <em>*cosval</em>)<br>
   gentype <strong>sincos</strong>(gentype <em>x</em>, __private gentype <em>*cosval</em>)<br></p>
 <p class="tableblock">  For OpenCL C 2.0, or OpenCL C 3.0 or newer with the
-  <code>__opencl_c_generic_address_space</code> feature:<br></p>
+  <code>__opencl_c_<wbr>generic_<wbr>address_<wbr>space</code> feature:<br></p>
 <p class="tableblock">  gentype <strong>sincos</strong>(gentype <em>x</em>, gentype <em>*cosval</em>)</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">Compute sine and cosine of x.
       The computed sine is the return value and computed cosine is returned
@@ -6867,7 +7162,7 @@
 <code>float16</code> as the type for the arguments.</p>
 </div>
 <table id="table-builtin-half-native-math" class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 11. Built-in Scalar and Vector <em>half</em> and <em>native</em> Math Functions</caption>
+<caption class="title">Table 12. Built-in Scalar and Vector <em>half</em> and <em>native</em> Math Functions</caption>
 <colgroup>
 <col style="width: 50%;">
 <col style="width: 50%;">
@@ -7072,7 +7367,7 @@
 </table>
 <div class="paragraph">
 <p>If double precision is supported by the device, e.g. for OpenCL C 3.0 or newer
-the <code>__opencl_c_fp64</code> feature macro is present, the following symbolic
+the <code>__opencl_c_<wbr>fp64</code> feature macro is present, the following symbolic
 constants will also be available:</p>
 </div>
 <table class="tableblock frame-all grid-all stretch">
@@ -7284,7 +7579,7 @@
 </table>
 <div class="paragraph">
 <p>If double precision is supported by the device, e.g. for OpenCL C 3.0 or newer
-the <code>__opencl_c_fp64</code> feature macro is present, then the following macros
+the <code>__opencl_c_<wbr>fp64</code> feature macro is present, then the following macros
 and constants are also available:</p>
 </div>
 <div class="paragraph">
@@ -7450,10 +7745,10 @@
 </div>
 <div class="paragraph">
 <p>We use the generic type name <code>gentype</code> to indicate that the function can take
-<code>char</code>, <code>char{2|3|4|8|16}</code>, <code>uchar</code>, <code>uchar{2|3|4|8|16}</code>, <code>short</code>,
-<code>short{2|3|4|8|16}</code>, <code>ushort</code>, <code>ushort{2|3|4|8|16}</code>, <code>int</code>, <code>int{2|3|4|8|16}</code>,
-<code>uint</code>, <code>uint{2|3|4|8|16}</code>, <code>long</code> <sup class="footnote">[<a id="_footnoteref_36" class="footnote" href="#_footnotedef_36" title="View footnote.">36</a>]</sup>,
-<code>long{2|3|4|8|16}</code>, <code>ulong</code>, or <code>ulong{2|3|4|8|16}</code> as the type for the
+<code>char</code>, <code>char<em>n</em></code>, <code>uchar</code>, <code>uchar<em>n</em></code>, <code>short</code>,
+<code>short<em>n</em></code>, <code>ushort</code>, <code>ushort<em>n</em></code>, <code>int</code>, <code>int<em>n</em></code>,
+<code>uint</code>, <code>uint<em>n</em></code>, <code>long</code> <sup class="footnote">[<a id="_footnoteref_37" class="footnote" href="#_footnotedef_37" title="View footnote.">37</a>]</sup>,
+<code>long<em>n</em></code>, <code>ulong</code>, or <code>ulong<em>n</em></code> as the type for the
 arguments.
 We use the generic type name <code>ugentype</code> to refer to unsigned versions of
 <code>gentype</code>.
@@ -7465,16 +7760,17 @@
 the <code>gentype</code> argument must be a vector or scalar version of the <code>sgentype</code>
 argument.
 For example, if <code>sgentype</code> is <code>uchar</code>, <code>gentype</code> must be <code>uchar</code> or
-<code>uchar{2|3|4|8|16}</code>.
+<code>uchar<em>n</em></code>.
 For vector versions, <code>sgentype</code> is implicitly widened to <code>gentype</code> as
-described for <a href="#operators-arithmetic">arithmetic operators</a>.</p>
+described for <a href="#operators-arithmetic">arithmetic operators</a>.
+<em>n</em> is 2, 3, 4, 8, or 16.</p>
 </div>
 <div class="paragraph">
 <p>For any specific use of a function, the actual type has to be the same for
 all arguments and the return type unless otherwise specified.</p>
 </div>
 <table id="table-builtin-functions" class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 12. Built-in Scalar and Vector Integer Argument Functions</caption>
+<caption class="title">Table 13. Built-in Scalar and Vector Integer Argument Functions</caption>
 <colgroup>
 <col style="width: 50%;">
 <col style="width: 50%;">
@@ -7505,7 +7801,7 @@
 <td class="tableblock halign-left valign-top"><p class="tableblock">gentype <strong>rhadd</strong>(gentype <em>x</em>, gentype <em>y</em>)</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">Returns (<em>x</em> + <em>y</em> + 1) &gt;&gt; 1.
       The intermediate sum does not modulo overflow.
-      <sup class="footnote">[<a id="_footnoteref_37" class="footnote" href="#_footnotedef_37" title="View footnote.">37</a>]</sup></p></td>
+      <sup class="footnote">[<a id="_footnoteref_38" class="footnote" href="#_footnotedef_38" title="View footnote.">38</a>]</sup></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">gentype <strong>clamp</strong>(gentype <em>x</em>, gentype <em>minval</em>, gentype <em>maxval</em>)<br>
@@ -7604,7 +7900,7 @@
 <code>uint3</code>, <code>uint4</code>, <code>uint8</code> or <code>uint16</code> as the type for the arguments.</p>
 </div>
 <table id="table-builtin-fast-integer" class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 13. Built-in 24-bit Integer Functions</caption>
+<caption class="title">Table 14. Built-in 24-bit Integer Functions</caption>
 <colgroup>
 <col style="width: 50%;">
 <col style="width: 50%;">
@@ -7756,7 +8052,7 @@
 The description is per-component.
 We use the generic type name <code>gentype</code> to indicate that the function can take
 <code>float</code>, <code>float2</code>, <code>float3</code>, <code>float4</code>, <code>float8</code>, <code>float16</code>, <code>double</code>
-<sup class="footnote">[<a id="_footnoteref_38" class="footnote" href="#_footnotedef_38" title="View footnote.">38</a>]</sup>, <code>double2</code>, <code>double3</code>, <code>double4</code>,
+<sup class="footnote">[<a id="_footnoteref_39" class="footnote" href="#_footnotedef_39" title="View footnote.">39</a>]</sup>, <code>double2</code>, <code>double3</code>, <code>double4</code>,
 <code>double8</code> or <code>double16</code> as the type for the arguments.
 We use the generic type name <code>gentypef</code> to indicate that the function can
 take <code>float</code>, <code>float2</code>, <code>float3</code>, <code>float4</code>, <code>float8</code>, or <code>float16</code> as the
@@ -7772,7 +8068,7 @@
 as <strong>mad</strong> or <strong>fma</strong>.</p>
 </div>
 <table id="table-builtin-common" class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 14. Built-in Scalar and Vector Argument Common Functions</caption>
+<caption class="title">Table 15. Built-in Scalar and Vector Argument Common Functions</caption>
 <colgroup>
 <col style="width: 50%;">
 <col style="width: 50%;">
@@ -7874,7 +8170,7 @@
 These all operate component-wise.
 The description is per-component.
 <code>float<em>n</em></code> is <code>float</code>, <code>float2</code>, <code>float3</code>, or <code>float4</code> and <code>double<em>n</em></code> is
-<code>double</code> <sup class="footnote">[<a id="_footnoteref_39" class="footnote" href="#_footnotedef_39" title="View footnote.">39</a>]</sup>, <code>double2</code>, <code>double3</code>, or
+<code>double</code> <sup class="footnote">[<a id="_footnoteref_40" class="footnote" href="#_footnotedef_40" title="View footnote.">40</a>]</sup>, <code>double2</code>, <code>double3</code>, or
 <code>double4</code>.</p>
 </div>
 <div class="paragraph">
@@ -7884,7 +8180,7 @@
 as <strong>mad</strong> or <strong>fma</strong>.</p>
 </div>
 <table id="table-builtin-geometric" class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 15. Built-in Scalar and Vector Argument Geometric Functions</caption>
+<caption class="title">Table 16. Built-in Scalar and Vector Argument Geometric Functions</caption>
 <colgroup>
 <col style="width: 50%;">
 <col style="width: 50%;">
@@ -7998,12 +8294,12 @@
 <div class="paragraph">
 <p>The functions described in the <a href="#table-builtin-relational">following table</a> can
 be used with built-in scalar or vector types as arguments and return a scalar or
-vector integer result <sup class="footnote">[<a id="_footnoteref_40" class="footnote" href="#_footnotedef_40" title="View footnote.">40</a>]</sup>.
+vector integer result <sup class="footnote">[<a id="_footnoteref_41" class="footnote" href="#_footnotedef_41" title="View footnote.">41</a>]</sup>.
 The argument type <code>gentype</code> refers to the following built-in types: <code>char</code>,
 <code>char<em>n</em></code>, <code>uchar</code>, <code>uchar<em>n</em></code>, <code>short</code>, <code>short<em>n</em></code>, <code>ushort</code>,
 <code>ushort<em>n</em></code>, <code>int</code>, <code>int<em>n</em></code>, <code>uint</code>, <code>uint<em>n</em></code>, <code>long</code>
-<sup class="footnote">[<a id="_footnoteref_41" class="footnote" href="#_footnotedef_41" title="View footnote.">41</a>]</sup>, <code>long<em>n</em></code>, <code>ulong</code>, <code>ulong<em>n</em></code>, <code>float</code>,
-<code>float<em>n</em></code>, <code>double</code> <sup class="footnote">[<a id="_footnoteref_42" class="footnote" href="#_footnotedef_42" title="View footnote.">42</a>]</sup>, and
+<sup class="footnote">[<a id="_footnoteref_42" class="footnote" href="#_footnotedef_42" title="View footnote.">42</a>]</sup>, <code>long<em>n</em></code>, <code>ulong</code>, <code>ulong<em>n</em></code>, <code>float</code>,
+<code>float<em>n</em></code>, <code>double</code> <sup class="footnote">[<a id="_footnoteref_43" class="footnote" href="#_footnotedef_43" title="View footnote.">43</a>]</sup>, and
 <code>double<em>n</em></code>.
 The argument type <code>igentype</code> refers to the built-in signed integer types
 i.e. <code>char</code>, <code>char<em>n</em></code>, <code>short</code>, <code>short<em>n</em></code>, <code>int</code>, <code>int<em>n</em></code>, <code>long</code>
@@ -8032,7 +8328,7 @@
 not a number (NaN) and the argument type is a vector.</p>
 </div>
 <table id="table-builtin-relational" class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 16. Built-in Scalar and Vector Relational Functions</caption>
+<caption class="title">Table 17. Built-in Scalar and Vector Relational Functions</caption>
 <colgroup>
 <col style="width: 50%;">
 <col style="width: 50%;">
@@ -8188,7 +8484,7 @@
 <p class="tableblock">      <em>result[i]</em> = if MSB of <em>c[i]</em> is set ? <em>b[i]</em> : <em>a[i]</em>.</p>
 <p class="tableblock">      For a scalar type, <em>result</em> = <em>c</em> ? <em>b</em> : <em>a</em>.</p>
 <p class="tableblock">      <code>igentype</code> and <code>ugentype</code> must have the same number of elements and
-      bits as <code>gentype</code> <sup class="footnote">[<a id="_footnoteref_43" class="footnote" href="#_footnotedef_43" title="View footnote.">43</a>]</sup>.</p></td>
+      bits as <code>gentype</code> <sup class="footnote">[<a id="_footnoteref_44" class="footnote" href="#_footnotedef_44" title="View footnote.">44</a>]</sup>.</p></td>
 </tr>
 </tbody>
 </table>
@@ -8204,18 +8500,18 @@
 functions that allow you to read and write vector types from a pointer to
 memory.
 We use the generic type <code>gentype</code> to indicate the built-in data types
-<code>char</code>, <code>uchar</code>, <code>short</code>, <code>ushort</code>, <code>int</code>, <code>uint</code>, <code>long</code> <sup class="footnote">[<a id="_footnoteref_44" class="footnote" href="#_footnotedef_44" title="View footnote.">44</a>]</sup>, <code>ulong</code>,
-<code>float</code> or <code>double</code> <sup class="footnote">[<a id="_footnoteref_45" class="footnote" href="#_footnotedef_45" title="View footnote.">45</a>]</sup>.
+<code>char</code>, <code>uchar</code>, <code>short</code>, <code>ushort</code>, <code>int</code>, <code>uint</code>, <code>long</code> <sup class="footnote">[<a id="_footnoteref_45" class="footnote" href="#_footnotedef_45" title="View footnote.">45</a>]</sup>, <code>ulong</code>,
+<code>float</code> or <code>double</code> <sup class="footnote">[<a id="_footnoteref_46" class="footnote" href="#_footnotedef_46" title="View footnote.">46</a>]</sup>.
 We use the generic type name <code>gentype<em>n</em></code> to represent n-element vectors
 of <code>gentype</code> elements.
 We use the type name <code>half<em>n</em></code> to represent n-element vectors of half
 elements.
 The suffix <em>n</em> is also used in the function names (i.e. <strong>vload<em>n</em></strong>,
-<strong>vstore<em>n</em></strong> etc.), where <em>n</em> = 2, 3 <sup class="footnote">[<a id="_footnoteref_46" class="footnote" href="#_footnotedef_46" title="View footnote.">46</a>]</sup>, 4, 8 or
+<strong>vstore<em>n</em></strong> etc.), where <em>n</em> = 2, 3 <sup class="footnote">[<a id="_footnoteref_47" class="footnote" href="#_footnotedef_47" title="View footnote.">47</a>]</sup>, 4, 8 or
 16.</p>
 </div>
 <table id="table-vector-loadstore" class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 17. Built-in Vector Data Load and Store Functions</caption>
+<caption class="title">Table 18. Built-in Vector Data Load and Store Functions</caption>
 <colgroup>
 <col style="width: 70%;">
 <col style="width: 30%;">
@@ -8231,7 +8527,7 @@
   gentype<em>n</em> <strong>vload<em>n</em></strong>(size_t <em>offset</em>, const __constant gentype *<em>p</em>)<br>
   gentype<em>n</em> <strong>vload<em>n</em></strong>(size_t <em>offset</em>, const __private gentype *<em>p</em>)<br></p>
 <p class="tableblock">  For OpenCL C 2.0, or OpenCL C 3.0 or newer with the
-  <code>__opencl_c_generic_address_space</code> feature:<br></p>
+  <code>__opencl_c_<wbr>generic_<wbr>address_<wbr>space</code> feature:<br></p>
 <p class="tableblock">  gentype<em>n</em> <strong>vload<em>n</em></strong>(size_t <em>offset</em>, const gentype *<em>p</em>)</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">Return <code>sizeof(gentype<em>n</em>)</code> bytes of data, where the first <code>(<em>n</em> *
       sizeof(gentype))</code> bytes are read from the address
@@ -8246,7 +8542,7 @@
   void <strong>vstore<em>n</em></strong>(gentype<em>n</em> <em>data</em>, size_t <em>offset</em>, __local gentype *<em>p</em>)<br>
   void <strong>vstore<em>n</em></strong>(gentype<em>n</em> <em>data</em>, size_t <em>offset</em>, __private gentype *<em>p</em>)<br></p>
 <p class="tableblock">  For OpenCL C 2.0, or OpenCL C 3.0 or newer with the
-  <code>__opencl_c_generic_address_space</code> feature:<br></p>
+  <code>__opencl_c_<wbr>generic_<wbr>address_<wbr>space</code> feature:<br></p>
 <p class="tableblock">  void <strong>vstore<em>n</em></strong>(gentype<em>n</em> <em>data</em>, size_t <em>offset</em>, gentype *<em>p</em>)</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">Write <code><em>n</em> * sizeof(gentype)</code> bytes given by <em>data</em> to the address
       computed as <code>(<em>p</em> +  (<em>offset</em> * <em>n</em>))</code>.
@@ -8261,7 +8557,7 @@
   float <strong>vload_half</strong>(size_t <em>offset</em>, const __constant half *<em>p</em>)<br>
   float <strong>vload_half</strong>(size_t <em>offset</em>, const __private half *<em>p</em>)<br></p>
 <p class="tableblock">  For OpenCL C 2.0, or OpenCL C 3.0 or newer with the
-  <code>__opencl_c_generic_address_space</code> feature:<br></p>
+  <code>__opencl_c_<wbr>generic_<wbr>address_<wbr>space</code> feature:<br></p>
 <p class="tableblock">  float <strong>vload_half</strong>(size_t <em>offset</em>, const half *<em>p</em>)</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">Read <code>sizeof(half)</code> bytes of data from the address computed as <code>(<em>p</em>
       +  <em>offset</em>)</code>.
@@ -8276,7 +8572,7 @@
   float<em>n</em> <strong>vload_half<em>n</em></strong>(size_t <em>offset</em>, const __constant half *<em>p</em>)<br>
   float<em>n</em> <strong>vload_half<em>n</em></strong>(size_t <em>offset</em>, const __private half *<em>p</em>)<br></p>
 <p class="tableblock">  For OpenCL C 2.0, or OpenCL C 3.0 or newer with the
-  <code>__opencl_c_generic_address_space</code> feature:<br></p>
+  <code>__opencl_c_<wbr>generic_<wbr>address_<wbr>space</code> feature:<br></p>
 <p class="tableblock">  float<em>n</em> <strong>vload_half<em>n</em></strong>(size_t <em>offset</em>, const half *<em>p</em>)</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">Read <code>(<em>n</em> * sizeof(half))</code> bytes of data from the address computed as
       <code>(<em>p</em> +  (<em>offset * n</em>))</code>.
@@ -8302,7 +8598,7 @@
   void <strong>vstore_half_rtp</strong>(float <em>data</em>, size_t <em>offset</em>, __private half *<em>p</em>)<br>
   void <strong>vstore_half_rtn</strong>(float <em>data</em>, size_t <em>offset</em>, __private half *<em>p</em>)<br></p>
 <p class="tableblock">  For OpenCL C 2.0, or OpenCL C 3.0 or newer with the
-  <code>__opencl_c_generic_address_space</code> feature:<br></p>
+  <code>__opencl_c_<wbr>generic_<wbr>address_<wbr>space</code> feature:<br></p>
 <p class="tableblock">  void <strong>vstore_half</strong>(float <em>data</em>, size_t <em>offset</em>, half *<em>p</em>)<br>
   void <strong>vstore_half_rte</strong>(float <em>data</em>, size_t <em>offset</em>, half *<em>p</em>)<br>
   void <strong>vstore_half_rtz</strong>(float <em>data</em>, size_t <em>offset</em>, half *<em>p</em>)<br>
@@ -8333,7 +8629,7 @@
   void <strong>vstore_half<em>n</em>_rtp</strong>(float<em>n</em> <em>data</em>, size_t <em>offset</em>, __private half *<em>p</em>)<br>
   void <strong>vstore_half<em>n</em>_rtn</strong>(float<em>n</em> <em>data</em>, size_t <em>offset</em>, __private half *<em>p</em>)<br></p>
 <p class="tableblock">  For OpenCL C 2.0, or OpenCL C 3.0 or newer with the
-  <code>__opencl_c_generic_address_space</code> feature:<br></p>
+  <code>__opencl_c_<wbr>generic_<wbr>address_<wbr>space</code> feature:<br></p>
 <p class="tableblock">  void <strong>vstore_half<em>n</em></strong>(float<em>n</em> <em>data</em>, size_t <em>offset</em>, half *<em>p</em>)<br>
   void <strong>vstore_half<em>n</em>_rte</strong>(float<em>n</em> <em>data</em>, size_t <em>offset</em>, half *<em>p</em>)<br>
   void <strong>vstore_half<em>n</em>_rtz</strong>(float<em>n</em> <em>data</em>, size_t <em>offset</em>, half *<em>p</em>)<br>
@@ -8365,7 +8661,7 @@
   void <strong>vstore_half_rtp</strong>(double <em>data</em>, size_t <em>offset</em>, __private half *<em>p</em>)<br>
   void <strong>vstore_half_rtn</strong>(double <em>data</em>, size_t <em>offset</em>, __private half *<em>p</em>)<br></p>
 <p class="tableblock">  For OpenCL C 2.0, or OpenCL C 3.0 or newer with the
-  <code>__opencl_c_generic_address_space</code> feature:<br></p>
+  <code>__opencl_c_<wbr>generic_<wbr>address_<wbr>space</code> feature:<br></p>
 <p class="tableblock">  void <strong>vstore_half</strong>(double <em>data</em>, size_t <em>offset</em>, half *<em>p</em>)<br>
   void <strong>vstore_half_rte</strong>(double <em>data</em>, size_t <em>offset</em>, half *<em>p</em>)<br>
   void <strong>vstore_half_rtz</strong>(double <em>data</em>, size_t <em>offset</em>, half *<em>p</em>)<br>
@@ -8396,7 +8692,7 @@
   void <strong>vstore_half<em>n</em>_rtp</strong>(double<em>n</em> <em>data</em>, size_t <em>offset</em>, __private half *<em>p</em>)<br>
   void <strong>vstore_half<em>n</em>_rtn</strong>(double<em>n</em> <em>data</em>, size_t <em>offset</em>, __private half *<em>p</em>)<br></p>
 <p class="tableblock">  For OpenCL C 2.0, or OpenCL C 3.0 or newer with the
-  <code>__opencl_c_generic_address_space</code> feature:<br></p>
+  <code>__opencl_c_<wbr>generic_<wbr>address_<wbr>space</code> feature:<br></p>
 <p class="tableblock">  void <strong>vstore_half<em>n</em></strong>(double<em>n</em> <em>data</em>, size_t <em>offset</em>, half *<em>p</em>)<br>
   void <strong>vstore_half<em>n</em>_rte</strong>(double<em>n</em> <em>data</em>, size_t <em>offset</em>, half *<em>p</em>)<br>
   void <strong>vstore_half<em>n</em>_rtz</strong>(double<em>n</em> <em>data</em>, size_t <em>offset</em>, half *<em>p</em>)<br>
@@ -8416,7 +8712,7 @@
   float<em>n</em> <strong>vloada_half<em>n</em></strong>(size_t <em>offset</em>, const __constant half *<em>p</em>)<br>
   float<em>n</em> <strong>vloada_half<em>n</em></strong>(size_t <em>offset</em>, const __private half *<em>p</em>)<br></p>
 <p class="tableblock">  For OpenCL C 2.0, or OpenCL C 3.0 or newer with the
-  <code>__opencl_c_generic_address_space</code> feature:<br></p>
+  <code>__opencl_c_<wbr>generic_<wbr>address_<wbr>space</code> feature:<br></p>
 <p class="tableblock">  float<em>n</em> <strong>vloada_half<em>n</em></strong>(size_t <em>offset</em>, const half *<em>p</em>)</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">For n = 2, 4, 8 and 16, read <code>sizeof(half<em>n</em>)</code> bytes of data from
       the address computed as (<em>p</em> + (<em>offset</em> * <em>n</em>)).
@@ -8445,7 +8741,7 @@
   void <strong>vstorea_half<em>n</em>_rtp</strong>(float<em>n</em> <em>data</em>, size_t <em>offset</em>, __private half *<em>p</em>)<br>
   void <strong>vstorea_half<em>n</em>_rtn</strong>(float<em>n</em> <em>data</em>, size_t <em>offset</em>, __private half *<em>p</em>)<br></p>
 <p class="tableblock">  For OpenCL C 2.0, or OpenCL C 3.0 or newer with the
-  <code>__opencl_c_generic_address_space</code> feature:<br></p>
+  <code>__opencl_c_<wbr>generic_<wbr>address_<wbr>space</code> feature:<br></p>
 <p class="tableblock">  void <strong>vstorea_half<em>n</em></strong>(float<em>n</em> <em>data</em>, size_t <em>offset</em>, half *<em>p</em>)<br>
   void <strong>vstorea_half<em>n</em>_rte</strong>(float<em>n</em> <em>data</em>, size_t <em>offset</em>, half *<em>p</em>)<br>
   void <strong>vstorea_half<em>n</em>_rtz</strong>(float<em>n</em> <em>data</em>, size_t <em>offset</em>, half *<em>p</em>)<br>
@@ -8479,7 +8775,7 @@
   void <strong>vstorea_half<em>n</em>_rtp</strong>(double<em>n</em> <em>data</em>, size_t <em>offset</em>, __private half *<em>p</em>)<br>
   void <strong>vstorea_half<em>n</em>_rtn</strong>(double<em>n</em> <em>data</em>, size_t <em>offset</em>, __private half *<em>p</em>)<br></p>
 <p class="tableblock">  For OpenCL C 2.0, or OpenCL C 3.0 or newer with the
-  <code>__opencl_c_generic_address_space</code> feature:<br></p>
+  <code>__opencl_c_<wbr>generic_<wbr>address_<wbr>space</code> feature:<br></p>
 <p class="tableblock">  void <strong>vstorea_half<em>n</em></strong>(double<em>n</em> <em>data</em>, size_t <em>offset</em>, half *<em>p</em>)<br>
   void <strong>vstorea_half<em>n</em>_rte</strong>(double<em>n</em> <em>data</em>, size_t <em>offset</em>, half *<em>p</em>)<br>
   void <strong>vstorea_half<em>n</em>_rtz</strong>(double<em>n</em> <em>data</em>, size_t <em>offset</em>, half *<em>p</em>)<br>
@@ -8534,7 +8830,7 @@
 in a work-group.</p>
 </div>
 <table id="table-builtin-synchronization" class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 18. Built-in Work-Group Synchronization Functions</caption>
+<caption class="title">Table 19. Built-in Work-Group Synchronization Functions</caption>
 <colgroup>
 <col style="width: 30%;">
 <col style="width: 70%;">
@@ -8566,7 +8862,7 @@
   the barrier on each iteration of the loop if any work-item executes the barrier on that iteration.</p>
 <p class="tableblock">  The <strong>barrier</strong> and <strong>work_group_barrier</strong> functions can specify which
   memory operations become visible to the appropriate memory scope
-  identified by <em>scope</em> <sup class="footnote">[<a id="_footnoteref_47" class="footnote" href="#_footnotedef_47" title="View footnote.">47</a>]</sup>.
+  identified by <em>scope</em> <sup class="footnote">[<a id="_footnoteref_48" class="footnote" href="#_footnotedef_48" title="View footnote.">48</a>]</sup>.
   The <em>flags</em> argument specifies the memory address spaces.
   This is a bitfield and can be set to 0 or a combination of the
   following values OR&#8217;ed together.
@@ -8601,7 +8897,7 @@
 <i class="fa icon-note" title="Note"></i>
 </td>
 <td class="content">
-The functionality described in the following table <a href="#unified-spec">requires</a> support for OpenCL 3.0 or newer and the <code>__opencl_c_subgroups</code>
+The functionality described in the following table <a href="#unified-spec">requires</a> support for OpenCL 3.0 or newer and the <code>__opencl_c_<wbr>subgroups</code>
 feature.
 </td>
 </tr>
@@ -8612,7 +8908,7 @@
 in a subgroup.</p>
 </div>
 <table class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 19. Built-in Subgroup Synchronization Functions</caption>
+<caption class="title">Table 20. Built-in Subgroup Synchronization Functions</caption>
 <colgroup>
 <col style="width: 30%;">
 <col style="width: 70%;">
@@ -8657,11 +8953,11 @@
 <p class="tableblock">  <code>CLK_GLOBAL_MEM_FENCE</code> - The <strong>sub_group_barrier</strong> function will queue a
   memory fence to ensure correct ordering of memory operations to global
   memory.
-  This can be useful when work items, for example, write to buffer objects
+  This can be useful when work-items, for example, write to buffer objects
   and then want to read the updated data from these buffer objects.</p>
 <p class="tableblock">  <code>CLK_IMAGE_MEM_FENCE</code> - The <strong>sub_group_barrier</strong> function will queue a memory
   fence to ensure correct ordering of memory operations to image objects.
-  This can be useful when work items, for example, write to image objects
+  This can be useful when work-items, for example, write to image objects
   and then want to read the updated data from these image objects.</p>
 <p class="tableblock">  The value of <em>scope</em> must match requirements of the
   <a href="#atomic-restrictions">atomic restrictions section</a>.</p></td>
@@ -8690,7 +8986,7 @@
 <p>The OpenCL C programming language implements the following explicit memory fence functions to provide ordering between memory operations of a work-item.</p>
 </div>
 <table id="table-builtin-explicit-memory-fences" class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 20. Built-in Explicit Memory Fence Functions</caption>
+<caption class="title">Table 21. Built-in Explicit Memory Fence Functions</caption>
 <colgroup>
 <col style="width: 30%;">
 <col style="width: 70%;">
@@ -8752,7 +9048,7 @@
 <td class="content">
 The functionality described in this section <a href="#unified-spec">requires</a>
 support for OpenCL C 2.0, or OpenCL C 3.0 or newer and the
-<code>__opencl_c_generic_address_space</code> feature.
+<code>__opencl_c_<wbr>generic_<wbr>address_<wbr>space</code> feature.
 </td>
 </tr>
 </table>
@@ -8765,7 +9061,7 @@
 types supported by OpenCL C or a user defined type.</p>
 </div>
 <table id="table-builtin-address-qualifier" class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 21. Built-in Address Space Qualifier Functions</caption>
+<caption class="title">Table 22. Built-in Address Space Qualifier Functions</caption>
 <colgroup>
 <col style="width: 50%;">
 <col style="width: 50%;">
@@ -8815,16 +9111,17 @@
 local memory and a prefetch from <code>global</code> memory.</p>
 </div>
 <div class="paragraph">
-<p>We use the generic type name <code>gentype</code> to indicate the built-in data types char,
-<code>char{2|3|4|8|16}</code>, <code>uchar</code>, <code>uchar{2|3|4|8|16}</code>, <code>short</code>, <code>short{2|3|4|8|16}</code>,
-<code>ushort</code>, <code>ushort{2|3|4|8|16}</code>, <code>int</code>, <code>int{2|3|4|8|16}</code>, <code>uint</code>,
-<code>uint{2|3|4|8|16}</code>, <code>long</code> <sup class="footnote">[<a id="_footnoteref_48" class="footnote" href="#_footnotedef_48" title="View footnote.">48</a>]</sup>, <code>long{2|3|4|8|16}</code>,
-<code>ulong</code>, <code>ulong{2|3|4|8|16}</code>, <code>float</code>, <code>float{2|3|4|8|16}</code>, or <code>double</code>
-<sup class="footnote">[<a id="_footnoteref_49" class="footnote" href="#_footnotedef_49" title="View footnote.">49</a>]</sup>, <code>double{2|3|4|8|16}</code> as the type for
-the arguments unless otherwise stated <sup class="footnote">[<a id="_footnoteref_50" class="footnote" href="#_footnotedef_50" title="View footnote.">50</a>]</sup>.</p>
+<p>We use the generic type name <code>gentype</code> to indicate the built-in data types <code>char</code>,
+<code>char<em>n</em></code>, <code>uchar</code>, <code>uchar<em>n</em></code>, <code>short</code>, <code>short<em>n</em></code>,
+<code>ushort</code>, <code>ushort<em>n</em></code>, <code>int</code>, <code>int<em>n</em></code>, <code>uint</code>,
+<code>uint<em>n</em></code>, <code>long</code> <sup class="footnote">[<a id="_footnoteref_49" class="footnote" href="#_footnotedef_49" title="View footnote.">49</a>]</sup>, <code>long<em>n</em></code>,
+<code>ulong</code>, <code>ulong<em>n</em></code>, <code>float</code>, <code>float<em>n</em></code>, <code>double</code>
+<sup class="footnote">[<a id="_footnoteref_50" class="footnote" href="#_footnotedef_50" title="View footnote.">50</a>]</sup>, and <code>double<em>n</em></code> as the type for
+the arguments unless otherwise stated.
+Here, <em>n</em> is 2, 3 <sup class="footnote">[<a id="_footnoteref_51" class="footnote" href="#_footnotedef_51" title="View footnote.">51</a>]</sup>, 4, 8, or 16.</p>
 </div>
 <table id="table-builtin-async-copy" class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 22. Built-in Async Copy and Prefetch Functions</caption>
+<caption class="title">Table 23. Built-in Async Copy and Prefetch Functions</caption>
 <colgroup>
 <col style="width: 50%;">
 <col style="width: 50%;">
@@ -8971,7 +9268,7 @@
 visible to another.
 A synchronization operation on one or more memory locations is either an
 acquire operation, a release operation, or both an acquire and release
-operation <sup class="footnote">[<a id="_footnoteref_51" class="footnote" href="#_footnotedef_51" title="View footnote.">51</a>]</sup>.
+operation <sup class="footnote">[<a id="_footnoteref_52" class="footnote" href="#_footnotedef_52" title="View footnote.">52</a>]</sup>.
 A synchronization operation without an associated memory location is a fence
 and can be either an acquire fence, a release fence or both an acquire and
 release fence.
@@ -9146,7 +9443,7 @@
 <td class="content">
 The function variant that uses the generic address space, i.e. no
 explicit address space is listed, <a href="#unified-spec">requires</a> support for OpenCL
-C 2.0, or OpenCL C 3.0 or newer and the <code>__opencl_c_generic_address_space</code>
+C 2.0, or OpenCL C 3.0 or newer and the <code>__opencl_c_<wbr>generic_<wbr>address_<wbr>space</code>
 feature.
 </td>
 </tr>
@@ -9183,25 +9480,25 @@
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><code>memory_order_acquire</code></p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><a href="#unified-spec">Requires</a> support for OpenCL C 2.0, but in OpenCL C 3.0
-      or newer some uses require the <code>__opencl_c_atomic_order_acq_rel</code>
+      or newer some uses require the <code>__opencl_c_<wbr>atomic_<wbr>order_<wbr>acq_<wbr>rel</code>
       feature.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><code>memory_order_release</code></p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><a href="#unified-spec">Requires</a> support for OpenCL C 2.0, but in OpenCL C 3.0
-      or newer some uses require the <code>__opencl_c_atomic_order_acq_rel</code>
+      or newer some uses require the <code>__opencl_c_<wbr>atomic_<wbr>order_<wbr>acq_<wbr>rel</code>
       feature.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><code>memory_order_acq_rel</code></p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><a href="#unified-spec">Requires</a> support for OpenCL C 2.0, but in OpenCL C 3.0
-      or newer some uses require the <code>__opencl_c_atomic_order_acq_rel</code>
+      or newer some uses require the <code>__opencl_c_<wbr>atomic_<wbr>order_<wbr>acq_<wbr>rel</code>
       feature.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><code>memory_order_seq_cst</code></p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><a href="#unified-spec">Requires</a> support for OpenCL C 2.0, or OpenCL C 3.0 or
-      newer and the <code>__opencl_c_atomic_order_seq_cst</code> feature.</p></td>
+      newer and the <code>__opencl_c_<wbr>atomic_<wbr>order_<wbr>seq_<wbr>cst</code> feature.</p></td>
 </tr>
 </tbody>
 </table>
@@ -9242,7 +9539,7 @@
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><code>memory_scope_sub_group</code></p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><a href="#unified-spec">Requires</a> support for OpenCL C 3.0 or newer and the
-      <code>__opencl_c_subgroups</code> feature.</p></td>
+      <code>__opencl_c_<wbr>subgroups</code> feature.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><code>memory_scope_work_group</code></p></td>
@@ -9251,18 +9548,18 @@
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><code>memory_scope_device</code></p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><a href="#unified-spec">Requires</a> support for OpenCL C 2.0, or OpenCL C 3.0 or
-      newer and the <code>__opencl_c_atomic_scope_device</code> feature.</p></td>
+      newer and the <code>__opencl_c_<wbr>atomic_<wbr>scope_<wbr>device</code> feature.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><code>memory_scope_all_svm_devices</code></p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><a href="#unified-spec">Requires</a> support for OpenCL C 2.0, or OpenCL C 3.0 or
-      newer and the <code>__opencl_c_atomic_scope_all_devices</code> feature.</p></td>
+      newer and the <code>__opencl_c_<wbr>atomic_<wbr>scope_<wbr>all_<wbr>devices</code> feature.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><code>memory_scope_all_devices</code></p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">An alias for <code>memory_scope_all_svm_devices</code>.
       <a href="#unified-spec">Requires</a> support for OpenCL C 3.0 or newer and the
-      <code>__opencl_c_atomic_scope_all_devices</code> feature.</p></td>
+      <code>__opencl_c_<wbr>atomic_<wbr>scope_<wbr>all_<wbr>devices</code> feature.</p></td>
 </tr>
 </tbody>
 </table>
@@ -9364,28 +9661,28 @@
 <p><code>atomic_uint</code></p>
 </li>
 <li>
-<p><code>atomic_long</code> <sup class="footnote" id="_footnote_atomic-int64-supported">[<a id="_footnoteref_52" class="footnote" href="#_footnotedef_52" title="View footnote.">52</a>]</sup></p>
+<p><code>atomic_long</code> <sup class="footnote" id="_footnote_atomic-int64-supported">[<a id="_footnoteref_53" class="footnote" href="#_footnotedef_53" title="View footnote.">53</a>]</sup></p>
 </li>
 <li>
-<p><code>atomic_ulong</code> <sup class="footnoteref">[<a class="footnote" href="#_footnotedef_52" title="View footnote.">52</a>]</sup></p>
+<p><code>atomic_ulong</code> <sup class="footnoteref">[<a class="footnote" href="#_footnotedef_53" title="View footnote.">53</a>]</sup></p>
 </li>
 <li>
 <p><code>atomic_float</code></p>
 </li>
 <li>
-<p><code>atomic_double</code> <sup class="footnote">[<a id="_footnoteref_53" class="footnote" href="#_footnotedef_53" title="View footnote.">53</a>]</sup></p>
+<p><code>atomic_double</code> <sup class="footnote">[<a id="_footnoteref_54" class="footnote" href="#_footnotedef_54" title="View footnote.">54</a>]</sup></p>
 </li>
 <li>
-<p><code>atomic_intptr_t</code> <sup class="footnote" id="_footnote_atomic-size_t-supported">[<a id="_footnoteref_54" class="footnote" href="#_footnotedef_54" title="View footnote.">54</a>]</sup></p>
+<p><code>atomic_intptr_t</code> <sup class="footnote" id="_footnote_atomic-size_t-supported">[<a id="_footnoteref_55" class="footnote" href="#_footnotedef_55" title="View footnote.">55</a>]</sup></p>
 </li>
 <li>
-<p><code>atomic_uintptr_t</code> <sup class="footnoteref">[<a class="footnote" href="#_footnotedef_54" title="View footnote.">54</a>]</sup></p>
+<p><code>atomic_uintptr_t</code> <sup class="footnoteref">[<a class="footnote" href="#_footnotedef_55" title="View footnote.">55</a>]</sup></p>
 </li>
 <li>
-<p><code>atomic_size_t</code> <sup class="footnoteref">[<a class="footnote" href="#_footnotedef_54" title="View footnote.">54</a>]</sup></p>
+<p><code>atomic_size_t</code> <sup class="footnoteref">[<a class="footnote" href="#_footnotedef_55" title="View footnote.">55</a>]</sup></p>
 </li>
 <li>
-<p><code>atomic_ptrdiff_t</code> <sup class="footnoteref">[<a class="footnote" href="#_footnotedef_54" title="View footnote.">54</a>]</sup></p>
+<p><code>atomic_ptrdiff_t</code> <sup class="footnoteref">[<a class="footnote" href="#_footnotedef_55" title="View footnote.">55</a>]</sup></p>
 </li>
 </ul>
 </div>
@@ -9474,7 +9771,7 @@
 <td class="content">
 The non-explicit <code>atomic_store</code> function <a href="#unified-spec">requires</a>
 support for OpenCL C 2.0, or OpenCL C 3.0 or newer and both the
-<code>__opencl_c_atomic_order_seq_cst</code> and <code>__opencl_c_atomic_scope_device</code>
+<code>__opencl_c_<wbr>atomic_<wbr>order_<wbr>seq_<wbr>cst</code> and <code>__opencl_c_<wbr>atomic_<wbr>scope_<wbr>device</code>
 features.
 For the explicit variants, memory order and scope enumerations must respect the
 <a href="#atomic-restrictions">restrictions section below</a>.
@@ -9491,7 +9788,7 @@
 <td class="content">
 The function variants that use the generic address space, i.e. no
 explicit address space is listed, <a href="#unified-spec">require</a> support for OpenCL
-C 2.0, or OpenCL C 3.0 or newer and the <code>__opencl_c_generic_address_space</code>
+C 2.0, or OpenCL C 3.0 or newer and the <code>__opencl_c_<wbr>generic_<wbr>address_<wbr>space</code>
 feature.
 </td>
 </tr>
@@ -9559,7 +9856,7 @@
 <td class="content">
 The non-explicit <code>atomic_load</code> function <a href="#unified-spec">requires</a>
 support for OpenCL C 2.0 or OpenCL C 3.0 or newer and both the
-<code>__opencl_c_atomic_order_seq_cst</code> and <code>__opencl_c_atomic_scope_device</code>
+<code>__opencl_c_<wbr>atomic_<wbr>order_<wbr>seq_<wbr>cst</code> and <code>__opencl_c_<wbr>atomic_<wbr>scope_<wbr>device</code>
 features.
 For the explicit variants, memory order and scope enumerations must respect the
 <a href="#atomic-restrictions">restrictions section below</a>.
@@ -9576,7 +9873,7 @@
 <td class="content">
 The function variants that use the generic address space, i.e. no
 explicit address space is listed, <a href="#unified-spec">require</a> support for OpenCL
-C 2.0, or OpenCL C 3.0 or newer and the <code>__opencl_c_generic_address_space</code>
+C 2.0, or OpenCL C 3.0 or newer and the <code>__opencl_c_<wbr>generic_<wbr>address_<wbr>space</code>
 feature.
 </td>
 </tr>
@@ -9652,7 +9949,7 @@
 <td class="content">
 The non-explicit <code>atomic_exchange</code> function <a href="#unified-spec">requires</a>
 support for OpenCL C 2.0 or OpenCL C 3.0 or newer and both the
-<code>__opencl_c_atomic_order_seq_cst</code> and <code>__opencl_c_atomic_scope_device</code>
+<code>__opencl_c_<wbr>atomic_<wbr>order_<wbr>seq_<wbr>cst</code> and <code>__opencl_c_<wbr>atomic_<wbr>scope_<wbr>device</code>
 features.
 For the explicit variants, memory order and scope enumerations must respect the
 <a href="#atomic-restrictions">restrictions section below</a>.
@@ -9669,7 +9966,7 @@
 <td class="content">
 The function variants that use the generic address space, i.e. no
 explicit address space is listed, <a href="#unified-spec">require</a> support for OpenCL
-C 2.0, or OpenCL C 3.0 or newer and the <code>__opencl_c_generic_address_space</code>
+C 2.0, or OpenCL C 3.0 or newer and the <code>__opencl_c_<wbr>generic_<wbr>address_<wbr>space</code>
 feature.
 </td>
 </tr>
@@ -9751,7 +10048,7 @@
     memory_order failure)
 
 <span class="comment">// Requires OpenCL C 2.0, or OpenCL C 3.0 or newer and the</span>
-<span class="comment">// __opencl_c_generic_address_space feature.</span>
+<span class="comment">// __opencl_c_generic_address_space feature.</span>
 <span class="predefined-type">bool</span> atomic_compare_exchange_strong_explicit(
     <span class="directive">volatile</span> A *object,
     C *expected,
@@ -9880,7 +10177,7 @@
     memory_order failure)
 
 <span class="comment">// Requires OpenCL C 2.0, or OpenCL C 3.0 or newer and the</span>
-<span class="comment">// __opencl_c_generic_address_space feature.</span>
+<span class="comment">// __opencl_c_generic_address_space feature.</span>
 <span class="predefined-type">bool</span> atomic_compare_exchange_weak_explicit(
     <span class="directive">volatile</span> A *object,
     C *expected,
@@ -9982,7 +10279,7 @@
 </div>
 <div class="paragraph">
 <p>The weak compare-and-exchange operations may fail spuriously
-<sup class="footnote">[<a id="_footnoteref_55" class="footnote" href="#_footnotedef_55" title="View footnote.">55</a>]</sup>.
+<sup class="footnote">[<a id="_footnoteref_56" class="footnote" href="#_footnotedef_56" title="View footnote.">56</a>]</sup>.
 That is, even when the contents of memory referred to by <code>expected</code> and
 <code>object</code> are equal, it may return zero and store back to <code>expected</code> the same
 memory contents that were originally there.</p>
@@ -10000,7 +10297,7 @@
 The non-explicit <code>atomic_compare_exchange_strong</code> and
 <code>atomic_compare_exchange_weak</code> functions <a href="#unified-spec">requires</a> support
 for OpenCL C 2.0, or OpenCL C 3.0 or newer and both the
-<code>__opencl_c_atomic_order_seq_cst</code> and <code>__opencl_c_atomic_scope_device</code>
+<code>__opencl_c_<wbr>atomic_<wbr>order_<wbr>seq_<wbr>cst</code> and <code>__opencl_c_<wbr>atomic_<wbr>scope_<wbr>device</code>
 features.
 For the explicit variants, memory order and scope enumerations must respect the
 <a href="#atomic-restrictions">restrictions section below</a>.
@@ -10017,7 +10314,7 @@
 <td class="content">
 The function variants that use the generic address space, i.e. no
 explicit address space is listed, <a href="#unified-spec">require</a> support for OpenCL
-C 2.0, or OpenCL C 3.0 or newer and the <code>__opencl_c_generic_address_space</code>
+C 2.0, or OpenCL C 3.0 or newer and the <code>__opencl_c_<wbr>generic_<wbr>address_<wbr>space</code>
 feature.
 </td>
 </tr>
@@ -10171,7 +10468,7 @@
 <td class="content">
 The non-explicit <code>atomic_fetch_key</code> functions <a href="#unified-spec">require</a>
 support for OpenCL C 2.0, or OpenCL C 3.0 or newer and both the
-<code>__opencl_c_atomic_order_seq_cst</code> and <code>__opencl_c_atomic_scope_device</code>
+<code>__opencl_c_<wbr>atomic_<wbr>order_<wbr>seq_<wbr>cst</code> and <code>__opencl_c_<wbr>atomic_<wbr>scope_<wbr>device</code>
 features.
 For the explicit variants, memory order and scope enumerations must respect the
 <a href="#atomic-restrictions">restrictions section below</a>.
@@ -10188,7 +10485,7 @@
 <td class="content">
 The function variants that use the generic address space, i.e. no
 explicit address space is listed, <a href="#unified-spec">require</a> support for OpenCL
-C 2.0, or OpenCL C 3.0 or newer and the <code>__opencl_c_generic_address_space</code>
+C 2.0, or OpenCL C 3.0 or newer and the <code>__opencl_c_<wbr>generic_<wbr>address_<wbr>space</code>
 feature.
 </td>
 </tr>
@@ -10298,7 +10595,7 @@
 </td>
 <td class="content">
 The non-explicit <code>atomic_flag_test_and_set</code> function <a href="#unified-spec">requires</a> support for OpenCL C 2.0, or OpenCL C 3.0 or newer and both the
-<code>__opencl_c_atomic_order_seq_cst</code> and <code>__opencl_c_atomic_scope_device</code>
+<code>__opencl_c_<wbr>atomic_<wbr>order_<wbr>seq_<wbr>cst</code> and <code>__opencl_c_<wbr>atomic_<wbr>scope_<wbr>device</code>
 features.
 For the explicit variants, memory order and scope enumerations must respect the
 <a href="#atomic-restrictions">restrictions section below</a>.
@@ -10315,7 +10612,7 @@
 <td class="content">
 The function variants that use the generic address space, i.e. no
 explicit address space is listed, <a href="#unified-spec">require</a> support for OpenCL
-C 2.0, or OpenCL C 3.0 or newer and the <code>__opencl_c_generic_address_space</code>
+C 2.0, or OpenCL C 3.0 or newer and the <code>__opencl_c_<wbr>generic_<wbr>address_<wbr>space</code>
 feature.
 </td>
 </tr>
@@ -10389,7 +10686,7 @@
 <td class="content">
 The non-explicit <code>atomic_flag_clear</code> function <a href="#unified-spec">requires</a>
 support for OpenCL C 2.0, or OpenCL C 3.0 or newer and both the
-<code>__opencl_c_atomic_order_seq_cst</code> and <code>__opencl_c_atomic_scope_device</code>
+<code>__opencl_c_<wbr>atomic_<wbr>order_<wbr>seq_<wbr>cst</code> and <code>__opencl_c_<wbr>atomic_<wbr>scope_<wbr>device</code>
 features.
 For the explicit variants, memory order and scope enumerations must respect the
 <a href="#atomic-restrictions">restrictions section below</a>.
@@ -10406,7 +10703,7 @@
 <td class="content">
 The function variants that use the generic address space, i.e. no
 explicit address space is listed, <a href="#unified-spec">require</a> support for OpenCL
-C 2.0, or OpenCL C 3.0 or newer and the <code>__opencl_c_generic_address_space</code>
+C 2.0, or OpenCL C 3.0 or newer and the <code>__opencl_c_<wbr>generic_<wbr>address_<wbr>space</code>
 feature.
 </td>
 </tr>
@@ -10458,7 +10755,7 @@
 </table>
 </div>
 <table id="table-legacy-atomic-functions" class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 23. Legacy Atomic Functions</caption>
+<caption class="title">Table 24. Legacy Atomic Functions</caption>
 <colgroup>
 <col style="width: 50%;">
 <col style="width: 50%;">
@@ -10651,40 +10948,44 @@
 <li>
 <p>Using <code>memory_order_acquire</code> with any built-in atomic function except
 <code>atomic_work_item_fence</code> <a href="#unified-spec">requires</a> support for OpenCL C
-2.0, or OpenCL C 3.0 or newer and the <code>__opencl_c_atomic_order_acq_rel</code>
+2.0, or OpenCL C 3.0 or newer and the <code>__opencl_c_<wbr>atomic_<wbr>order_<wbr>acq_<wbr>rel</code>
 feature.</p>
 </li>
 <li>
 <p>Using <code>memory_order_release</code> with any built-in atomic function except
 <code>atomic_work_item_fence</code> <a href="#unified-spec">requires</a> support for OpenCL C
-2.0, or OpenCL C 3.0 or newer and the <code>__opencl_c_atomic_order_acq_rel</code>
+2.0, or OpenCL C 3.0 or newer and the <code>__opencl_c_<wbr>atomic_<wbr>order_<wbr>acq_<wbr>rel</code>
 feature.</p>
 </li>
 <li>
 <p>Using <code>memory_order_acq_rel</code> with any built-in atomic function except
 <code>atomic_work_item_fence</code> <a href="#unified-spec">requires</a> support for OpenCL C
-2.0, or OpenCL C 3.0 or newer and the <code>__opencl_c_atomic_order_acq_rel</code>
+2.0, or OpenCL C 3.0 or newer and the <code>__opencl_c_<wbr>atomic_<wbr>order_<wbr>acq_<wbr>rel</code>
 feature.</p>
 </li>
 <li>
 <p>Using <code>memory_order_seq_cst</code> with any built-in atomic function
 <a href="#unified-spec">requires</a> support for OpenCL C 2.0, or OpenCL C 3.0 or
-newer and the <code>__opencl_c_atomic_order_seq_cst</code> feature.</p>
+newer and the <code>__opencl_c_<wbr>atomic_<wbr>order_<wbr>seq_<wbr>cst</code> feature.</p>
 </li>
 <li>
 <p>Using <code>memory_scope_sub_group</code> with any built-in atomic function
 <a href="#unified-spec">requires</a> support for OpenCL C 3.0 or newer and the
-<code>__opencl_c_subgroups</code> feature.</p>
+<code>__opencl_c_<wbr>subgroups</code> feature.</p>
 </li>
 <li>
 <p>Using <code>memory_scope_device</code> <a href="#unified-spec">requires</a> support for OpenCL
 C 2.0, or OpenCL C 3.0 or newer and the
-<code>__opencl_c_atomic_scope_device</code> feature.</p>
+<code>__opencl_c_<wbr>atomic_<wbr>scope_<wbr>device</code> feature.</p>
 </li>
 <li>
-<p>Using <code>memory_scope_all_svm_devices</code> or <code>memory_scope_all_devices</code>
-<a href="#unified-spec">requires</a> support for OpenCL C 2.0, or OpenCL C 3.0 or
-newer and the <code>__opencl_c_atomic_scope_all_devices</code> feature.</p>
+<p>Using <code>memory_scope_all_svm_devices</code> <a href="#unified-spec">requires</a>
+support for OpenCL C 2.0, or OpenCL C 3.0 or
+newer and the <code>__opencl_c_<wbr>atomic_<wbr>scope_<wbr>all_<wbr>devices</code> feature.</p>
+</li>
+<li>
+<p>Using <code>memory_scope_all_devices</code> <a href="#unified-spec">requires</a> support for OpenCL
+C 3.0 or newer and the <code>__opencl_c_<wbr>atomic_<wbr>scope_<wbr>all_<wbr>devices</code> feature.</p>
 </li>
 </ul>
 </div>
@@ -10700,17 +11001,18 @@
 <p>The OpenCL C programming language implements the following additional
 built-in vector functions.
 We use the generic type name <code>gentype<em>n</em></code> (or <code>gentype<em>m</em></code>) to indicate the
-built-in data types <code>char{2|4|8|16}</code>, <code>uchar{2|4|8|16}</code>, <code>short{2|4|8|16}</code>,
-<code>ushort{2|4|8|16}</code>, <code>half{2|4|8|16}</code> <sup class="footnote">[<a id="_footnoteref_56" class="footnote" href="#_footnotedef_56" title="View footnote.">56</a>]</sup>,
-<code>int{2|4|8|16}</code>, <code>uint{2|4|8|16}</code>, <code>long{2|4|8|16}</code>
-<sup class="footnote">[<a id="_footnoteref_57" class="footnote" href="#_footnotedef_57" title="View footnote.">57</a>]</sup>, <code>ulong{2|4|8|16}</code>, <code>float{2|4|8|16}</code>, or
-<code>double{2|4|8|16}</code> <sup class="footnote">[<a id="_footnoteref_58" class="footnote" href="#_footnotedef_58" title="View footnote.">58</a>]</sup> as the type for
+built-in data types <code>char<em>n</em></code>, <code>uchar<em>n</em></code>, <code>short<em>n</em></code>,
+<code>ushort<em>n</em></code>,
+<code>int<em>n</em></code>, <code>uint<em>n</em></code>, <code>long<em>n</em></code>
+<sup class="footnote">[<a id="_footnoteref_57" class="footnote" href="#_footnotedef_57" title="View footnote.">57</a>]</sup>, <code>ulong<em>n</em></code>, <code>half<em>n</em></code> <sup class="footnote">[<a id="_footnoteref_58" class="footnote" href="#_footnotedef_58" title="View footnote.">58</a>]</sup>, <code>float<em>n</em></code>, or
+<code>double<em>n</em></code> <sup class="footnote">[<a id="_footnoteref_59" class="footnote" href="#_footnotedef_59" title="View footnote.">59</a>]</sup> as the type for
 the arguments unless otherwise stated.
 We use the generic name <code>ugentype<em>n</em></code> to indicate the built-in unsigned
-integer data types.</p>
+integer data types.
+<em>n</em> is 2, 4, 8, or 16.</p>
 </div>
 <table id="table-misc-vector" class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 24. Built-in Miscellaneous Vector Functions</caption>
+<caption class="title">Table 25. Built-in Miscellaneous Vector Functions</caption>
 <colgroup>
 <col style="width: 33.3333%;">
 <col style="width: 66.6667%;">
@@ -10832,7 +11134,7 @@
 <p>The OpenCL C programming language implements the <strong>printf</strong> function.</p>
 </div>
 <table id="table-printf" class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 25. Built-in printf Function</caption>
+<caption class="title">Table 26. Built-in printf Function</caption>
 <colgroup>
 <col style="width: 50%;">
 <col style="width: 50%;">
@@ -10906,7 +11208,7 @@
 padded with spaces (by default) on the left (or right, if the left
 adjustment flag, described later, has been given) to the field width.
 The field width takes the form of a nonnegative decimal integer
-<sup class="footnote">[<a id="_footnoteref_59" class="footnote" href="#_footnotedef_59" title="View footnote.">59</a>]</sup>.</p>
+<sup class="footnote">[<a id="_footnoteref_60" class="footnote" href="#_footnotedef_60" title="View footnote.">60</a>]</sup>.</p>
 </li>
 <li>
 <p>An optional <em>precision</em> that gives the minimum number of digits to
@@ -10950,7 +11252,7 @@
 <p><strong>+</strong> The result of a signed conversion always begins with a plus or minus
 sign.
 (It begins with a sign only when a negative value is converted if this flag
-is not specified.) <sup class="footnote">[<a id="_footnoteref_60" class="footnote" href="#_footnotedef_60" title="View footnote.">60</a>]</sup></p>
+is not specified.) <sup class="footnote">[<a id="_footnoteref_61" class="footnote" href="#_footnotedef_61" title="View footnote.">61</a>]</sup></p>
 </div>
 <div class="paragraph">
 <p><em>space</em> If the first character of a signed conversion is not a sign, or if a
@@ -11042,7 +11344,7 @@
 specifier applies to a <code>short<em>n</em></code> or <code>ushort<em>n</em></code> argument (the argument
 will not be promoted); that a following <strong>a</strong>, <strong>A</strong>, <strong>e</strong>, <strong>E</strong>, <strong>f</strong>, <strong>F</strong>, <strong>g</strong>,
 or <strong>G</strong> conversion specifier applies to a <code>half<em>n</em></code>
-<sup class="footnote">[<a id="_footnoteref_61" class="footnote" href="#_footnotedef_61" title="View footnote.">61</a>]</sup> argument.</p>
+<sup class="footnote">[<a id="_footnoteref_62" class="footnote" href="#_footnotedef_62" title="View footnote.">62</a>]</sup> argument.</p>
 </div>
 <div class="paragraph">
 <p><strong>hl</strong> This modifier can only be used with the vector specifier.
@@ -11117,7 +11419,7 @@
 <em>[</em><strong>-</strong><em>]</em><strong>nan(</strong><em>n-char-sequence</em><strong>) </strong>&#8201;&#8212;&#8201;which style, and the meaning of any <em>n-char-sequence</em>, is
 implementation-defined.
 The <strong>F</strong> conversion specifier produces <code>INF</code>, <code>INFINITY</code>, or <code>NAN</code> instead of
-<strong>inf</strong>, <strong>infinity</strong>, or <strong>nan</strong>, respectively <sup class="footnote">[<a id="_footnoteref_62" class="footnote" href="#_footnotedef_62" title="View footnote.">62</a>]</sup>.</p>
+<strong>inf</strong>, <strong>infinity</strong>, or <strong>nan</strong>, respectively <sup class="footnote">[<a id="_footnoteref_63" class="footnote" href="#_footnotedef_63" title="View footnote.">63</a>]</sup>.</p>
 </div>
 <div class="paragraph">
 <p><strong>e,E</strong> A <code>double</code>, <code>half<em>n</em></code>, <code>float<em>n</em></code> or <code>double<em>n</em></code> argument
@@ -11161,7 +11463,7 @@
 <em>[</em><strong>-</strong><em>]</em><strong>0x</strong><em>h</em><strong>.</strong><em>hhhh </em><strong>p±</strong><em>d</em>, where there is one
 hexadecimal digit (which is nonzero if the argument is a normalized
 floating-point number and is otherwise unspecified) before the decimal-point
-character <sup class="footnote">[<a id="_footnoteref_63" class="footnote" href="#_footnotedef_63" title="View footnote.">63</a>]</sup> and the number of hexadecimal digits
+character <sup class="footnote">[<a id="_footnoteref_64" class="footnote" href="#_footnotedef_64" title="View footnote.">64</a>]</sup> and the number of hexadecimal digits
 after it is equal to the precision; if the precision is missing, then the
 precision is sufficient for an exact representation of the value; if the
 precision is zero and the <strong>#</strong> flag is not specified, no decimal point character
@@ -11187,7 +11489,7 @@
 <div class="paragraph">
 <p>The conversion specifiers <strong>e,E,g,G,a,A</strong> convert a <code>float</code> or <code>half</code> argument
 that is a scalar type to a <code>double</code> only if the <code>double</code> data type is
-supported, e.g. for OpenCL C 3.0 or newer the <code>__opencl_c_fp64</code> feature
+supported, e.g. for OpenCL C 3.0 or newer the <code>__opencl_c_<wbr>fp64</code> feature
 macro is present.
 If the <code>double</code> data type is not supported, the argument will be a <code>float</code>
 instead of a <code>double</code> and the <code>half</code> type will be converted to a <code>float</code>.</p>
@@ -11202,7 +11504,7 @@
 </div>
 <div class="paragraph">
 <p><strong>s</strong> The argument shall be a literal string
-<sup class="footnote">[<a id="_footnoteref_64" class="footnote" href="#_footnotedef_64" title="View footnote.">64</a>]</sup>.
+<sup class="footnote">[<a id="_footnoteref_65" class="footnote" href="#_footnotedef_65" title="View footnote.">65</a>]</sup>.
 Characters from the literal string array are written up to (but not
 including) the terminating null character.
 If the precision is specified, no more than that many bytes are written.
@@ -11360,7 +11662,7 @@
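-If a device supports images then the value of the <a href="#opencl-device-queries"><code>CL_DEVICE_IMAGE_SUPPORT</code> device query</a>) is <code>CL_TRUE</code> and the OpenCL C
+If a device supports images, then the value of the <a href="#opencl-device-queries"><code>CL_DEVICE_IMAGE_SUPPORT</code> device query</a> is <code>CL_TRUE</code> and the OpenCL C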
 compiler for that device must define the <code>__IMAGE_SUPPORT__</code> macro.
 A compiler for OpenCL C 3.0 or newer for that device must also support the
-<code>__opencl_c_images</code> feature.</p>
+<code>__opencl_c_<wbr>images</code> feature.</p>
 </div>
 <div class="paragraph">
 <p>Image memory objects that are being read by a kernel should be declared with
@@ -11453,7 +11755,7 @@
 <p>The sampler fields are described in the following table.</p>
 </div>
 <table id="table-sampler-descriptor" class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 26. Sampler Descriptor</caption>
+<caption class="title">Table 27. Sampler Descriptor</caption>
 <colgroup>
 <col style="width: 50%;">
 <col style="width: 50%;">
@@ -11493,7 +11795,7 @@
 <p class="tableblock">      <code>CLK_ADDRESS_CLAMP_TO_EDGE</code> - out-of-range image coordinates are
       clamped to the extent.</p>
 <p class="tableblock">      <code>CLK_ADDRESS_CLAMP</code> - out-of-range image coordinates will return a
-      border color <sup class="footnote">[<a id="_footnoteref_65" class="footnote" href="#_footnotedef_65" title="View footnote.">65</a>]</sup>.</p>
+      border color <sup class="footnote">[<a id="_footnoteref_66" class="footnote" href="#_footnotedef_66" title="View footnote.">66</a>]</sup>.</p>
 <p class="tableblock">      <code>CLK_ADDRESS_NONE</code> - for this addressing mode the programmer
       guarantees that the image coordinates used to sample elements of the
       image refer to a location inside the image; otherwise the results are
@@ -11580,10 +11882,10 @@
 <div class="content">
 <div class="paragraph">
 <p>The following built-in function calls to read images with a sampler are
-supported <sup class="footnote">[<a id="_footnoteref_66" class="footnote" href="#_footnotedef_66" title="View footnote.">66</a>]</sup>.</p>
+supported <sup class="footnote">[<a id="_footnoteref_67" class="footnote" href="#_footnotedef_67" title="View footnote.">67</a>]</sup>.</p>
 </div>
 <table id="table-image-read" class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 27. Built-in Image Read Functions</caption>
+<caption class="title">Table 28. Built-in Image Read Functions</caption>
 <colgroup>
 <col style="width: 50%;">
 <col style="width: 50%;">
@@ -12037,7 +12339,7 @@
 For samplerless read functions this may be <code>read_only</code> or <code>read_write</code>.</p>
 </div>
 <table id="table-image-samplerless-read" class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 28. Built-in Image Sampler-less Read Functions</caption>
+<caption class="title">Table 29. Built-in Image Sampler-less Read Functions</caption>
 <colgroup>
 <col style="width: 50%;">
 <col style="width: 50%;">
@@ -12347,7 +12649,7 @@
 For write functions this may be <code>write_only</code> or <code>read_write</code>.</p>
 </div>
 <table id="table-image-write" class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 29. Built-in Image Write Functions</caption>
+<caption class="title">Table 30. Built-in Image Write Functions</caption>
 <colgroup>
 <col style="width: 50%;">
 <col style="width: 50%;">
@@ -12620,7 +12922,7 @@
       are not in the range [0, image width-1], [0, image height-1], and [0,
       image depth-1], respectively, is undefined.</p>
 <p class="tableblock">      <a href="#unified-spec">Requires</a> support for OpenCL C 2.0, or OpenCL C 3.0 or
-      newer and the <code>__opencl_c_3d_image_writes</code> feature, or the
+      newer and the <code>__opencl_c_<wbr>3d_<wbr>image_<wbr>writes</code> feature, or the
       <code>cl_khr_3d_image_writes</code> extension.</p></td>
 </tr>
 </tbody>
@@ -12641,7 +12943,7 @@
 For query functions this may be <code>read_only</code>, <code>write_only</code> or <code>read_write</code>.</p>
 </div>
 <table id="table-image-query" class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 30. Built-in Image Query Functions</caption>
+<caption class="title">Table 31. Built-in Image Query Functions</caption>
 <colgroup>
 <col style="width: 50%;">
 <col style="width: 50%;">
@@ -12712,7 +13014,7 @@
       <code>CLK_HALF_FLOAT</code><br>
       <code>CLK_FLOAT</code><br></p>
 <p class="tableblock">      Additionally, for OpenCL C 3.0 or newer:<br></p>
-<p class="tableblock">      <code>CLK_UNORM_INT_101010_2</code> <sup class="footnote">[<a id="_footnoteref_67" class="footnote" href="#_footnotedef_67" title="View footnote.">67</a>]</sup></p></td>
+<p class="tableblock">      <code>CLK_UNORM_INT_101010_2</code> <sup class="footnote">[<a id="_footnoteref_68" class="footnote" href="#_footnotedef_68" title="View footnote.">68</a>]</sup></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">int <strong>get_image_channel_order</strong>(<em>aQual</em> image2d_t <em>image</em>)<br>
@@ -12942,7 +13244,7 @@
 <td class="content">
 The functionality described in this section <a href="#unified-spec">requires</a>
 support for OpenCL C 2.0, or OpenCL C 3.0 or newer and the
-<code>__opencl_c_work_group_collective_functions</code> feature.
+<code>__opencl_c_<wbr>work_<wbr>group_<wbr>collective_<wbr>functions</code> feature.
 </td>
 </tr>
 </table>
@@ -12953,12 +13255,12 @@
 These built-in functions must be encountered by all work-items in a
 work-group executing the kernel.
 We use the generic type name <code>gentype</code> to indicate the built-in data types
-<code>half</code> <sup class="footnote">[<a id="_footnoteref_68" class="footnote" href="#_footnotedef_68" title="View footnote.">68</a>]</sup>, <code>int</code>, <code>uint</code>, <code>long</code>
-<sup class="footnote">[<a id="_footnoteref_69" class="footnote" href="#_footnotedef_69" title="View footnote.">69</a>]</sup>, <code>ulong</code>, <code>float</code> or <code>double</code>
-<sup class="footnote">[<a id="_footnoteref_70" class="footnote" href="#_footnotedef_70" title="View footnote.">70</a>]</sup> as the type for the arguments.</p>
+<code>half</code> <sup class="footnote">[<a id="_footnoteref_69" class="footnote" href="#_footnotedef_69" title="View footnote.">69</a>]</sup>, <code>int</code>, <code>uint</code>, <code>long</code>
+<sup class="footnote">[<a id="_footnoteref_70" class="footnote" href="#_footnotedef_70" title="View footnote.">70</a>]</sup>, <code>ulong</code>, <code>float</code> or <code>double</code>
+<sup class="footnote">[<a id="_footnoteref_71" class="footnote" href="#_footnotedef_71" title="View footnote.">71</a>]</sup> as the type for the arguments.</p>
 </div>
 <table id="table-builtin-work-group" class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 31. Built-in Work-group Collective Functions</caption>
+<caption class="title">Table 32. Built-in Work-group Collective Functions</caption>
 <colgroup>
 <col style="width: 50%;">
 <col style="width: 50%;">
@@ -13092,7 +13394,7 @@
 </td>
 <td class="content">
 The functionality described in this section <a href="#unified-spec">requires</a>
-support for OpenCL C 2.0, or OpenCL C 3.0 or newer and the <code>__opencl_c_pipes</code> feature.
+support for OpenCL C 2.0, or OpenCL C 3.0 or newer and the <code>__opencl_c_<wbr>pipes</code> feature.
 </td>
 </tr>
 </table>
@@ -13175,12 +13477,12 @@
 functions that read from or write to a pipe.
 We use the generic type name <code>gentype</code> to indicate the built-in OpenCL C scalar
 or vector integer or floating-point data types
-<sup class="footnote">[<a id="_footnoteref_71" class="footnote" href="#_footnotedef_71" title="View footnote.">71</a>]</sup> or any user defined type built from these
+<sup class="footnote">[<a id="_footnoteref_72" class="footnote" href="#_footnotedef_72" title="View footnote.">72</a>]</sup> or any user defined type built from these
 scalar and vector data types can be used as the type for the arguments to the
 pipe functions listed in the following table.</p>
 </div>
 <table id="table-builtin-pipe" class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 32. Built-in Pipe Functions</caption>
+<caption class="title">Table 33. Built-in Pipe Functions</caption>
 <colgroup>
 <col style="width: 50%;">
 <col style="width: 50%;">
@@ -13267,12 +13569,12 @@
 behavior is undefined.
 We use the generic type name <code>gentype</code> to indicate the built-in OpenCL C scalar
 or vector integer or floating-point data types
-<sup class="footnote">[<a id="_footnoteref_72" class="footnote" href="#_footnotedef_72" title="View footnote.">72</a>]</sup> or any user defined type built from these
+<sup class="footnote">[<a id="_footnoteref_73" class="footnote" href="#_footnotedef_73" title="View footnote.">73</a>]</sup> or any user defined type built from these
 scalar and vector data types can be used as the type for the arguments to the
 pipe functions listed in the following table.</p>
 </div>
 <table id="table-builtin-pipe-work-group" class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 33. Built-in Pipe Work-group Functions</caption>
+<caption class="title">Table 34. Built-in Pipe Work-group Functions</caption>
 <colgroup>
 <col style="width: 50%;">
 <col style="width: 50%;">
@@ -13358,7 +13660,7 @@
 functions for a pipe.
 We use the generic type name <code>gentype</code> to indicate the built-in OpenCL C scalar
 or vector integer or floating-point data types
-<sup class="footnote">[<a id="_footnoteref_73" class="footnote" href="#_footnotedef_73" title="View footnote.">73</a>]</sup> or any user defined type built from these
+<sup class="footnote">[<a id="_footnoteref_74" class="footnote" href="#_footnotedef_74" title="View footnote.">74</a>]</sup> or any user defined type built from these
 scalar and vector data types can be used as the type for the arguments to the
 pipe functions listed in the following table.</p>
 </div>
@@ -13367,7 +13669,7 @@
 For pipe query functions this may be <code>read_only</code> or <code>write_only</code>.</p>
 </div>
 <table id="table-builtin-pipe-query" class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 34. Built-in Pipe Query Functions</caption>
+<caption class="title">Table 35. Built-in Pipe Query Functions</caption>
 <colgroup>
 <col style="width: 50%;">
 <col style="width: 50%;">
@@ -13456,7 +13758,7 @@
 <td class="content">
 The functionality described in this section <a href="#unified-spec">requires</a>
 support for OpenCL C 2.0, or OpenCL C 3.0 or newer and the
-<code>__opencl_c_device_enqueue</code> feature.
+<code>__opencl_c_<wbr>device_<wbr>enqueue</code> feature.
 </td>
 </tr>
 </table>
@@ -13483,7 +13785,7 @@
 <div class="sect4">
 <h5 id="built-in-functions-enqueuing-a-kernel"><a class="anchor" href="#built-in-functions-enqueuing-a-kernel"></a>6.15.18.1. Built-in Functions - Enqueuing a kernel</h5>
 <table id="table-builtin-kernel-enqueue" class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 35. Built-in Kernel Enqueue Functions</caption>
+<caption class="title">Table 36. Built-in Kernel Enqueue Functions</caption>
 <colgroup>
 <col style="width: 50%;">
 <col style="width: 50%;">
@@ -13715,7 +14017,7 @@
 {
     ...
 
-    ndrange_t ndrange = ndrange_1d(...);
+    ndrange_t ndrange = ndrange_1D(...);
 
     uint local_mem_size = compute_local_mem_size();
 
@@ -13731,7 +14033,7 @@
 my_func_C(global <span class="predefined-type">int</span> *a, ...)
 {
     ...
-    ndrange_t ndrange = ndrange_1d(...);
+    ndrange_t ndrange = ndrange_1D(...);
 
     <span class="directive">void</span> (^my_blk_A)(local <span class="directive">void</span> *, local <span class="directive">void</span> *) =
         ^(local <span class="directive">void</span> *lptr1, local <span class="directive">void</span> *lptr2){
@@ -13779,7 +14081,7 @@
     {
         enqueue_kernel(q,
                        CLK_ENQUEUE_FLAGS_WAIT_KERNEL,
-                       ndrange_1d(<span class="integer">1</span>),
+                       ndrange_1D(<span class="integer">1</span>),
                        ^{evaluate_dp_work_A(q, ...);});
     }
 }
@@ -13818,13 +14120,13 @@
 <div class="sect4">
 <h5 id="determining-when-a-child-kernel-begins-execution"><a class="anchor" href="#determining-when-a-child-kernel-begins-execution"></a>6.15.18.4. Determining when a child kernel begins execution</h5>
 <div class="paragraph">
-<p>The <code>kernel_enqueue_flags_t</code> <sup class="footnote">[<a id="_footnoteref_74" class="footnote" href="#_footnotedef_74" title="View footnote.">74</a>]</sup> argument
+<p>The <code>kernel_enqueue_flags_t</code> <sup class="footnote">[<a id="_footnoteref_75" class="footnote" href="#_footnotedef_75" title="View footnote.">75</a>]</sup> argument
 to the <code>enqueue_kernel</code> built-in functions can be used to specify when the child
 kernel begins execution.
 Supported values are described in the table below:</p>
 </div>
 <table id="table-kernel-enqueue-flags" class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 36. Kernel Enqueue Flags</caption>
+<caption class="title">Table 37. Kernel Enqueue Flags</caption>
 <colgroup>
 <col style="width: 50%;">
 <col style="width: 50%;">
@@ -13842,14 +14144,14 @@
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><code>CLK_ENQUEUE_FLAGS_WAIT_KERNEL</code></p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">Indicates that all work-items of the parent kernel must finish
-      executing and all immediate <sup class="footnote">[<a id="_footnoteref_75" class="footnote" href="#_footnotedef_75" title="View footnote.">75</a>]</sup> side
+      executing and all immediate <sup class="footnote">[<a id="_footnoteref_76" class="footnote" href="#_footnotedef_76" title="View footnote.">76</a>]</sup> side
       effects committed before the enqueued child kernel may begin execution.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><code>CLK_ENQUEUE_FLAGS_WAIT_WORK_GROUP</code></p></td>
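-<td class="tableblock halign-left valign-top"><p class="tableblock">Indicates that the enqueued kernels wait only for the workgroup that
+<td class="tableblock halign-left valign-top"><p class="tableblock">Indicates that the enqueued kernels wait only for the work-group that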
       enqueued the kernels to finish before they begin execution.
-      <sup class="footnote">[<a id="_footnoteref_76" class="footnote" href="#_footnotedef_76" title="View footnote.">76</a>]</sup></p></td>
+      <sup class="footnote">[<a id="_footnoteref_77" class="footnote" href="#_footnotedef_77" title="View footnote.">77</a>]</sup></p></td>
 </tr>
 </tbody>
 </table>
@@ -13903,7 +14205,7 @@
 <div class="openblock">
 <div class="content">
 <table id="table-builtin-kernel-query" class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 37. Built-in Kernel Query Functions</caption>
+<caption class="title">Table 38. Built-in Kernel Query Functions</caption>
 <colgroup>
 <col style="width: 50%;">
 <col style="width: 50%;">
@@ -13946,7 +14248,7 @@
 used to enqueue commands such as a marker.</p>
 </div>
 <table id="table-builtin-other-enqueue" class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 38. Built-in Other Enqueue Functions</caption>
+<caption class="title">Table 39. Built-in Other Enqueue Functions</caption>
 <colgroup>
 <col style="width: 50%;">
 <col style="width: 50%;">
@@ -14012,7 +14314,7 @@
 events.</p>
 </div>
 <table id="table-builtin-event" class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 39. Built-in Event Functions</caption>
+<caption class="title">Table 40. Built-in Event Functions</caption>
 <colgroup>
 <col style="width: 50%;">
 <col style="width: 50%;">
@@ -14234,7 +14536,7 @@
 <div class="openblock">
 <div class="content">
 <table id="table-builtin-helper" class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 40. Built-in Helper Functions</caption>
+<caption class="title">Table 41. Built-in Helper Functions</caption>
 <colgroup>
 <col style="width: 50%;">
 <col style="width: 50%;">
@@ -14292,21 +14594,21 @@
 </td>
 <td class="content">
 The functionality described in this section <a href="#unified-spec">requires</a>
-support for OpenCL C 3.0 or newer and the <code>__opencl_c_subgroups</code> feature.
+support for OpenCL C 3.0 or newer and the <code>__opencl_c_<wbr>subgroups</code> feature.
 </td>
 </tr>
 </table>
 </div>
 <div class="paragraph">
 <p>The table below describes OpenCL C programming language built-in functions that operate on a subgroup level.
-These built-in functions must be encountered by all work items in the subgroup executing the kernel.
+These built-in functions must be encountered by all work-items in the subgroup executing the kernel.
-For the functions below, the generic type name <code>gentype</code> may be the one of the
+For the functions below, the generic type name <code>gentype</code> may be one of the
 supported built-in scalar data types <code>int</code>, <code>uint</code>, <code>long</code>
-<sup class="footnote">[<a id="_footnoteref_77" class="footnote" href="#_footnotedef_77" title="View footnote.">77</a>]</sup>, <code>ulong</code>, <code>half</code> <sup class="footnote">[<a id="_footnoteref_78" class="footnote" href="#_footnotedef_78" title="View footnote.">78</a>]</sup>,
-<code>float</code>, and <code>double</code> <sup class="footnote">[<a id="_footnoteref_79" class="footnote" href="#_footnotedef_79" title="View footnote.">79</a>]</sup>.</p>
+<sup class="footnote">[<a id="_footnoteref_78" class="footnote" href="#_footnotedef_78" title="View footnote.">78</a>]</sup>, <code>ulong</code>, <code>half</code> <sup class="footnote">[<a id="_footnoteref_79" class="footnote" href="#_footnotedef_79" title="View footnote.">79</a>]</sup>,
+<code>float</code>, and <code>double</code> <sup class="footnote">[<a id="_footnoteref_80" class="footnote" href="#_footnotedef_80" title="View footnote.">80</a>]</sup>.</p>
 </div>
 <table class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 41. Built-in Subgroup Collective Functions</caption>
+<caption class="title">Table 42. Built-in Subgroup Collective Functions</caption>
 <colgroup>
 <col style="width: 50%;">
 <col style="width: 50%;">
@@ -14320,37 +14622,37 @@
 <tbody>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">int <strong>sub_group_all</strong> (int <em>predicate</em>)</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Evaluates <em>predicate</em> for all work items in the subgroup and returns a
-  non-zero value if <em>predicate</em> evaluates to non-zero for all work items in
+<td class="tableblock halign-left valign-top"><p class="tableblock">Evaluates <em>predicate</em> for all work-items in the subgroup and returns a
+  non-zero value if <em>predicate</em> evaluates to non-zero for all work-items in
   the subgroup.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">int <strong>sub_group_any</strong> (int <em>predicate</em>)</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Evaluates <em>predicate</em> for all work items in the subgroup and returns a
-  non-zero value if <em>predicate</em> evaluates to non-zero for any work items in
+<td class="tableblock halign-left valign-top"><p class="tableblock">Evaluates <em>predicate</em> for all work-items in the subgroup and returns a
+  non-zero value if <em>predicate</em> evaluates to non-zero for any work-items in
   the subgroup.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">gentype <strong>sub_group_broadcast</strong> (<br>
   gentype <em>x</em>, uint <em>sub_group_local_id</em>)</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Broadcast the value of <em>x</em> for work item identified by
+<td class="tableblock halign-left valign-top"><p class="tableblock">Broadcast the value of <em>x</em> for work-item identified by
   <em>sub_group_local_id</em> (value returned by <strong>get_sub_group_local_id</strong>) to all
-  work items in the subgroup.</p>
-<p class="tableblock">  <em>sub_group_local_id</em> must be the same value for all work items in the
+  work-items in the subgroup.</p>
+<p class="tableblock">  <em>sub_group_local_id</em> must be the same value for all work-items in the
   subgroup.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">gentype <strong>sub_group_reduce_&lt;op&gt;</strong> (<br>
   gentype <em>x</em>)</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">Return result of reduction operation specified by <strong>&lt;op&gt;</strong> for all values of
-  <em>x</em> specified by work items in a subgroup.</p></td>
+  <em>x</em> specified by work-items in a subgroup.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">gentype <strong>sub_group_scan_exclusive_&lt;op&gt;</strong> (<br>
   gentype <em>x</em>)</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">Do an exclusive scan operation specified by <strong>&lt;op&gt;</strong> of all values specified
-  by work items in a subgroup.
-  The scan results are returned for each work item.</p>
+  by work-items in a subgroup.
+  The scan results are returned for each work-item.</p>
 <p class="tableblock">  The scan order is defined by increasing subgroup local ID within the
   subgroup.</p></td>
 </tr>
@@ -14358,8 +14660,8 @@
 <td class="tableblock halign-left valign-top"><p class="tableblock">gentype <strong>sub_group_scan_inclusive_&lt;op&gt;</strong> (<br>
   gentype <em>x</em>)</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">Do an inclusive scan operation specified by <strong>&lt;op&gt;</strong> of all values specified
-  by work items in a subgroup.
-  The scan results are returned for each work item.</p>
+  by work-items in a subgroup.
+  The scan results are returned for each work-item.</p>
 <p class="tableblock">  The scan order is defined by increasing subgroup local ID within the
   subgroup.</p></td>
 </tr>
@@ -14402,8 +14704,8 @@
 <i class="fa icon-note" title="Note"></i>
 </td>
 <td class="content">
-The functionality described in the following table <a href="#unified-spec">requires</a> support for OpenCL C 3.0 or newer and the <code>__opencl_c_subgroups</code>
-and <code>__opencl_c_pipes</code> features.
+The functionality described in the following table <a href="#unified-spec">requires</a> support for OpenCL C 3.0 or newer and the <code>__opencl_c_<wbr>subgroups</code>
+and <code>__opencl_c_<wbr>pipes</code> features.
 </td>
 </tr>
 </table>
@@ -14411,7 +14713,7 @@
 <div class="paragraph">
 <p>The following table describes built-in pipe functions that operate at a
 subgroup level.
-These built-in functions must be encountered by all work items in a subgroup
+These built-in functions must be encountered by all work-items in a subgroup
 executing the kernel with the same argument values, otherwise the behavior
 is undefined.
 We use the generic type name <code>gentype</code> to indicate the built-in OpenCL C
@@ -14420,7 +14722,7 @@
 for the arguments to the pipe functions listed in <em>table 6.29</em>.</p>
 </div>
 <table class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 42. Built-in Subgroup Pipe Functions</caption>
+<caption class="title">Table 43. Built-in Subgroup Pipe Functions</caption>
 <colgroup>
 <col style="width: 50%;">
 <col style="width: 50%;">
@@ -14460,7 +14762,7 @@
 <div class="paragraph">
 <p>Note: Reservations made by a subgroup are ordered in the pipe as they are
 ordered in the program.
-Reservations made by different subgroups that belong to the same work group
+Reservations made by different subgroups that belong to the same work-group
 can be ordered using subgroup synchronization.
 The order of subgroup based reservations that belong to different work
 groups is implementation defined.</p>
@@ -14472,8 +14774,8 @@
 <i class="fa icon-note" title="Note"></i>
 </td>
 <td class="content">
-The functionality described in the following table <a href="#unified-spec">requires</a> support for OpenCL C 3.0 or newer and the <code>__opencl_c_subgroups</code>
-and <code>__opencl_c_device_enqueue</code> features.
+The functionality described in the following table <a href="#unified-spec">requires</a> support for OpenCL C 3.0 or newer and the <code>__opencl_c_<wbr>subgroups</code>
+and <code>__opencl_c_<wbr>device_<wbr>enqueue</code> features.
 </td>
 </tr>
 </table>
@@ -14483,7 +14785,7 @@
 information for a block to be enqueued.</p>
 </div>
 <table class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 43. Built-in Subgroup Kernel Query Functions</caption>
+<caption class="title">Table 44. Built-in Subgroup Kernel Query Functions</caption>
 <colgroup>
 <col style="width: 55.5555%;">
 <col style="width: 44.4445%;">
@@ -14502,9 +14804,9 @@
 <p class="tableblock">  uint <strong>get_kernel_sub_group_count_for_ndrange</strong> (<br>
   const ndrange_t <em>ndrange</em>,<br>
   void (^block)(local void *, &#8230;&#8203;));</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the number of subgroups in each work group of the dispatch (except
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the number of subgroups in each work-group of the dispatch (except
   for the last in cases where the global size does not divide cleanly into
-  work groups) given the combination of the passed ndrange and block.</p>
+  work-groups) given the combination of the passed ndrange and block.</p>
 <p class="tableblock">  <em>block</em> specifies the block to be enqueued.</p></td>
 </tr>
 <tr>
@@ -14564,7 +14866,7 @@
 <p><em>Round to nearest</em> <em>even</em> is currently the only rounding mode required by the
 OpenCL specification for single precision and double precision operations and is
 therefore the default rounding mode
-<sup class="footnote">[<a id="_footnoteref_80" class="footnote" href="#_footnotedef_80" title="View footnote.">80</a>]</sup>.
+<sup class="footnote">[<a id="_footnoteref_81" class="footnote" href="#_footnotedef_81" title="View footnote.">81</a>]</sup>.
 In addition, only static selection of rounding mode is supported.
 Dynamically reconfiguring the rounding modes as specified by the IEEE 754
 spec is unsupported.</p>
@@ -14669,8 +14971,14 @@
 is the infinitely precise result.
 0 ulp is used for math functions that do not require rounding.</p>
 </div>
+<div class="paragraph">
+<p>Result overflow within the specified ULP error is permitted. Math functions are
+allowed to return infinity for a finite reference value when the next
+floating-point number that would be representable after the finite maximum, if
+there were sufficient range, meets the ULP error tolerance.</p>
+</div>
 <table id="table-ulp-float-math" class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 44. ULP values for single precision built-in math functions</caption>
+<caption class="title">Table 45. ULP values for single precision built-in math functions</caption>
 <colgroup>
 <col style="width: 50%;">
 <col style="width: 50%;">
@@ -15177,7 +15485,7 @@
 0 ulp is used for math functions that do not require rounding.</p>
 </div>
 <table id="table-ulp-embedded" class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 45. ULP values for the embedded profile</caption>
+<caption class="title">Table 46. ULP values for the embedded profile</caption>
 <colgroup>
 <col style="width: 50%;">
 <col style="width: 50%;">
@@ -15685,7 +15993,7 @@
 operations and builtins with <code>-cl-unsafe-math-optimizations</code> <a href="#unified-spec">requires</a> support for OpenCL C 2.0 or newer.</p>
 </div>
 <table id="table-float-ulp-relaxed" class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 46. ULP values for single precision built-in math functions with unsafe math optimizations in the full and embedded profiles</caption>
+<caption class="title">Table 47. ULP values for single precision built-in math functions with unsafe math optimizations in the full and embedded profiles</caption>
 <colgroup>
 <col style="width: 50%;">
 <col style="width: 50%;">
@@ -15827,7 +16135,7 @@
       -<strong>exp2</strong>(<em>y</em> * <strong>log2</strong>(<strong>fabs</strong>(<em>x</em>)).
       For <em>x</em> == 0 and nonzero <em>y</em>, derived implementations return zero.
       For non-derived implementations, the error is ≤ 8192 ulp.
-      <sup class="footnote">[<a id="_footnoteref_81" class="footnote" href="#_footnotedef_81" title="View footnote.">81</a>]</sup></p></td>
+      <sup class="footnote">[<a id="_footnoteref_82" class="footnote" href="#_footnotedef_82" title="View footnote.">82</a>]</sup></p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><strong>pown</strong>(<em>x</em>, <em>y</em>)</p></td>
@@ -15905,7 +16213,7 @@
 0 ulp is used for math functions that do not require rounding.</p>
 </div>
 <table id="table-ulp-double" class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 47. ULP values for double precision built-in math functions</caption>
+<caption class="title">Table 48. ULP values for double precision built-in math functions</caption>
 <colgroup>
 <col style="width: 50%;">
 <col style="width: 50%;">
@@ -16633,7 +16941,7 @@
 They do not include ±0.
 A non-zero number is said to be sub-normal before rounding if after
 normalization, its radix-2 exponent is less than (<code>TYPE_MIN_EXP</code> - 1)
-<sup class="footnote">[<a id="_footnoteref_82" class="footnote" href="#_footnotedef_82" title="View footnote.">82</a>]</sup>.</p>
+<sup class="footnote">[<a id="_footnoteref_83" class="footnote" href="#_footnotedef_83" title="View footnote.">83</a>]</sup>.</p>
 </div>
 </div>
 </div>
@@ -16711,7 +17019,7 @@
 <p>The following table describes the address_mode function.</p>
 </div>
 <table id="table-address-modes-texel-location" class="tableblock frame-all grid-all stretch">
-<caption class="title">Table 48. Addressing modes to generate texel location</caption>
+<caption class="title">Table 49. Addressing modes to generate texel location</caption>
 <colgroup>
 <col style="width: 50%;">
 <col style="width: 50%;">
@@ -17766,6 +18074,13 @@
 Default RGB colour space - sRGB&#8221;,
 <a href="https://webstore.iec.ch/publication/6169" class="bare">https://webstore.iec.ch/publication/6169</a> .</p>
 </li>
+<li>
+<p><a id="embedded-c-spec"></a> &#8220;ISO/IEC TR 18037:2008 Programming languages -
+C - Extensions to support embedded processors&#8221;,
+<a href="https://www.iso.org/standard/51126.html" class="bare">https://www.iso.org/standard/51126.html</a> .
+References are to sections of this specific version, referred to as the
+&#8220;Embedded C Specification&#8221;, although other versions exist.</p>
+</li>
 </ol>
 </div>
 </div>
@@ -17858,133 +18173,133 @@
 <a href="#_footnoteref_28">28</a>. Unless the <strong>cl_khr_fp16</strong> extension is supported and has been enabled.
 </div>
 <div class="footnote" id="_footnotedef_29">
-<a href="#_footnoteref_29">29</a>. This syntax is already part of the clang source tree on which most vendors have based their OpenCL implementations. Additionally, blocks based closures are supported by the clang open source C compiler as well as Mac OS X&#8217;s C and Objective C compilers. Specifically, Mac OS X&#8217;s Grand Central Dispatch allows applications to queue tasks as a block.
+<a href="#_footnoteref_29">29</a>. When OpenCL C is compiled offline, <code>__OPENCL_VERSION__</code> may be defined and may substitute any implementation-defined integer value.
 </div>
 <div class="footnote" id="_footnotedef_30">
-<a href="#_footnoteref_30">30</a>. OpenCL C <a href="#restrictions">does not allow function pointers</a> primarily because it is difficult or expensive to implement generic indirections to executable code in many hardware architectures that OpenCL targets. OpenCL C&#8217;s design of Blocks is intended to respect that same condition, yielding the restrictions listed here. As such, Blocks allow a form of dynamically enqueued function scheduling without providing a form of runtime synchronous dynamic dispatch analogous to function pointers.
+<a href="#_footnoteref_30">30</a>. This syntax is already part of the clang source tree on which most vendors have based their OpenCL implementations. Additionally, block-based closures are supported by the open-source clang C compiler as well as Mac OS X&#8217;s C and Objective-C compilers. Specifically, Mac OS X&#8217;s Grand Central Dispatch allows applications to queue tasks as a block.
 </div>
 <div class="footnote" id="_footnotedef_31">
-<a href="#_footnoteref_31">31</a>. I.e. the <em>global_work_size</em> values specified to <strong>clEnqueueNDRangeKernel</strong> are not evenly divisible by the <em>local_work_size</em> values for each dimension.
+<a href="#_footnoteref_31">31</a>. OpenCL C <a href="#restrictions">does not allow function pointers</a> primarily because it is difficult or expensive to implement generic indirections to executable code in many hardware architectures that OpenCL targets. OpenCL C&#8217;s design of Blocks is intended to respect that same condition, yielding the restrictions listed here. As such, Blocks allow a form of dynamically enqueued function scheduling without providing a form of runtime synchronous dynamic dispatch analogous to function pointers.
 </div>
 <div class="footnote" id="_footnotedef_32">
-<a href="#_footnoteref_32">32</a>. Only if double precision is supported. In OpenCL C 3.0 this will be indicated by the presence of the <code>__opencl_c_fp64</code> feature macro.
+<a href="#_footnoteref_32">32</a>. I.e. the <em>global_work_size</em> values specified to <strong>clEnqueueNDRangeKernel</strong> are not evenly divisible by the <em>local_work_size</em> values for each dimension.
 </div>
 <div class="footnote" id="_footnotedef_33">
-<a href="#_footnoteref_33">33</a>. <strong>fmin</strong> and <strong>fmax</strong> behave as defined by C99 and may not match the IEEE 754-2008 definition for <strong>minNum</strong> and <strong>maxNum</strong> with regard to signaling NaNs. Specifically, signaling NaNs may behave as quiet NaNs.
+<a href="#_footnoteref_33">33</a>. Only if double precision is supported. In OpenCL C 3.0 this will be indicated by the presence of the <code>__opencl_c_fp64</code> feature macro.
 </div>
 <div class="footnote" id="_footnotedef_34">
-<a href="#_footnoteref_34">34</a>. The <strong>min</strong>() operator is there to prevent <strong>fract</strong>(-small) from returning 1.0. It returns the largest positive floating-point number less than 1.0.
+<a href="#_footnoteref_34">34</a>. <strong>fmin</strong> and <strong>fmax</strong> behave as defined by C99 and may not match the IEEE 754-2008 definition for <strong>minNum</strong> and <strong>maxNum</strong> with regard to signaling NaNs. Specifically, signaling NaNs may behave as quiet NaNs.
 </div>
 <div class="footnote" id="_footnotedef_35">
-<a href="#_footnoteref_35">35</a>. The user is cautioned that for some usages, e.g. <strong>mad</strong>(a, b, -a*b), the definition of <strong>mad</strong>() is loose enough in the embedded profile that almost any result is allowed from <strong>mad</strong>() for some values of a and b.
+<a href="#_footnoteref_35">35</a>. The <strong>min</strong>() operator is there to prevent <strong>fract</strong>(-small) from returning 1.0. It returns the largest positive floating-point number less than 1.0.
 </div>
 <div class="footnote" id="_footnotedef_36">
-<a href="#_footnoteref_36">36</a>. Only if 64-bit integers are supported. In OpenCL C 3.0 this will be indicated by the presence of the <code>__opencl_c_int64</code> feature macro.
+<a href="#_footnoteref_36">36</a>. The user is cautioned that for some usages, e.g. <strong>mad</strong>(a, b, -a*b), the definition of <strong>mad</strong>() is loose enough in the embedded profile that almost any result is allowed from <strong>mad</strong>() for some values of a and b.
 </div>
 <div class="footnote" id="_footnotedef_37">
-<a href="#_footnoteref_37">37</a>. Frequently vector operations need n + 1 bits temporarily to calculate a result. The <strong>rhadd</strong> instruction gives you an extra bit without needing to upsample and downsample. This can be a profound performance win.
+<a href="#_footnoteref_37">37</a>. Only if 64-bit integers are supported. In OpenCL C 3.0 this will be indicated by the presence of the <code>__opencl_c_int64</code> feature macro.
 </div>
 <div class="footnote" id="_footnotedef_38">
-<a href="#_footnoteref_38">38</a>. Only if double precision is supported. In OpenCL C 3.0 this will be indicated by the presence of the <code>__opencl_c_fp64</code> feature macro.
+<a href="#_footnoteref_38">38</a>. Frequently vector operations need n + 1 bits temporarily to calculate a result. The <strong>rhadd</strong> instruction gives you an extra bit without needing to upsample and downsample. This can be a profound performance win.
 </div>
 <div class="footnote" id="_footnotedef_39">
 <a href="#_footnoteref_39">39</a>. Only if double precision is supported. In OpenCL C 3.0 this will be indicated by the presence of the <code>__opencl_c_fp64</code> feature macro.
 </div>
 <div class="footnote" id="_footnotedef_40">
-<a href="#_footnoteref_40">40</a>. If an implementation extends this specification to support IEEE-754 flags or exceptions, then all built-in functions defined in the following table shall proceed without raising the <em>invalid</em> floating-point exception when one or more of the operands are NaNs.
+<a href="#_footnoteref_40">40</a>. Only if double precision is supported. In OpenCL C 3.0 this will be indicated by the presence of the <code>__opencl_c_fp64</code> feature macro.
 </div>
 <div class="footnote" id="_footnotedef_41">
-<a href="#_footnoteref_41">41</a>. Only if 64-bit integers are supported. In OpenCL C 3.0 this will be indicated by the presence of the <code>__opencl_c_int64</code> feature macro.
+<a href="#_footnoteref_41">41</a>. If an implementation extends this specification to support IEEE-754 flags or exceptions, then all built-in functions defined in the following table shall proceed without raising the <em>invalid</em> floating-point exception when one or more of the operands are NaNs.
 </div>
 <div class="footnote" id="_footnotedef_42">
-<a href="#_footnoteref_42">42</a>. Only if double precision is supported. In OpenCL C 3.0 this will be indicated by the presence of the <code>__opencl_c_fp64</code> feature macro.
+<a href="#_footnoteref_42">42</a>. Only if 64-bit integers are supported. In OpenCL C 3.0 this will be indicated by the presence of the <code>__opencl_c_int64</code> feature macro.
 </div>
 <div class="footnote" id="_footnotedef_43">
-<a href="#_footnoteref_43">43</a>. This definition means that the behavior of select and the ternary operator for vector and scalar types is dependent on different interpretations of the bit pattern of <em>c</em>.
+<a href="#_footnoteref_43">43</a>. Only if double precision is supported. In OpenCL C 3.0 this will be indicated by the presence of the <code>__opencl_c_fp64</code> feature macro.
 </div>
 <div class="footnote" id="_footnotedef_44">
-<a href="#_footnoteref_44">44</a>. Only if 64-bit integers are supported. In OpenCL C 3.0 this will be indicated by the presence of the <code>__opencl_c_int64</code> feature macro.
+<a href="#_footnoteref_44">44</a>. This definition means that the behavior of select and the ternary operator for vector and scalar types is dependent on different interpretations of the bit pattern of <em>c</em>.
 </div>
 <div class="footnote" id="_footnotedef_45">
-<a href="#_footnoteref_45">45</a>. Only if double precision is supported. In OpenCL C 3.0 this will be indicated by the presence of the <code>__opencl_c_fp64</code> feature macro.
+<a href="#_footnoteref_45">45</a>. Only if 64-bit integers are supported. In OpenCL C 3.0 this will be indicated by the presence of the <code>__opencl_c_int64</code> feature macro.
 </div>
 <div class="footnote" id="_footnotedef_46">
-<a href="#_footnoteref_46">46</a>. <strong>vload3</strong> and <strong>vload_half3</strong> read (<em>x</em>,<em>y</em>,<em>z</em>) components from address <code>(<em>p</em> + (<em>offset</em> * 3))</code> into a 3-component vector. <strong>vstore3</strong> and <strong>vstore_half3</strong> write (<em>x</em>,<em>y</em>,<em>z</em>) components from a 3-component vector to address <code>(<em>p</em> + (<em>offset</em> * 3))</code>. In addition, <strong>vloada_half3</strong> reads (<em>x</em>,<em>y</em>,<em>z</em>) components from address <code>(<em>p</em> + (<em>offset</em> * 4))</code> into a 3-component vector and <strong>vstorea_half3</strong> writes (<em>x</em>,<em>y</em>,<em>z</em>) components from a 3-component vector to address <code>(<em>p</em> +  (<em>offset</em> * 4))</code>. Whether <strong>vloada_half3</strong> and <strong>vstorea_half3</strong> read/write padding data between the third vector element and the next alignment boundary is implementation defined. The <strong>vloada_</strong> and <strong>vstorea_</strong> variants are provided to access data that is aligned to the size of the vector, and are intended to enable performance on hardware that can take advantage of the increased alignment.
+<a href="#_footnoteref_46">46</a>. Only if double precision is supported. In OpenCL C 3.0 this will be indicated by the presence of the <code>__opencl_c_fp64</code> feature macro.
 </div>
 <div class="footnote" id="_footnotedef_47">
-<a href="#_footnoteref_47">47</a>. Refer to the description and restrictions for <a href="#memory-scope"><code>memory_scope</code></a>.
+<a href="#_footnoteref_47">47</a>. <strong>vload3</strong> and <strong>vload_half3</strong> read (<em>x</em>,<em>y</em>,<em>z</em>) components from address <code>(<em>p</em> + (<em>offset</em> * 3))</code> into a 3-component vector. <strong>vstore3</strong> and <strong>vstore_half3</strong> write (<em>x</em>,<em>y</em>,<em>z</em>) components from a 3-component vector to address <code>(<em>p</em> + (<em>offset</em> * 3))</code>. In addition, <strong>vloada_half3</strong> reads (<em>x</em>,<em>y</em>,<em>z</em>) components from address <code>(<em>p</em> + (<em>offset</em> * 4))</code> into a 3-component vector and <strong>vstorea_half3</strong> writes (<em>x</em>,<em>y</em>,<em>z</em>) components from a 3-component vector to address <code>(<em>p</em> +  (<em>offset</em> * 4))</code>. Whether <strong>vloada_half3</strong> and <strong>vstorea_half3</strong> read/write padding data between the third vector element and the next alignment boundary is implementation defined. The <strong>vloada_</strong> and <strong>vstorea_</strong> variants are provided to access data that is aligned to the size of the vector, and are intended to enable performance on hardware that can take advantage of the increased alignment.
 </div>
 <div class="footnote" id="_footnotedef_48">
-<a href="#_footnoteref_48">48</a>. Only if 64-bit integers are supported. In OpenCL C 3.0 this will be indicated by the presence of the <code>__opencl_c_int64</code> feature macro.
+<a href="#_footnoteref_48">48</a>. Refer to the description and restrictions for <a href="#memory-scope"><code>memory_scope</code></a>.
 </div>
 <div class="footnote" id="_footnotedef_49">
-<a href="#_footnoteref_49">49</a>. Only if double precision is supported. In OpenCL C 3.0 this will be indicated by the presence of the <code>__opencl_c_fp64</code> feature macro.
+<a href="#_footnoteref_49">49</a>. Only if 64-bit integers are supported. In OpenCL C 3.0 this will be indicated by the presence of the <code>__opencl_c_int64</code> feature macro.
 </div>
 <div class="footnote" id="_footnotedef_50">
-<a href="#_footnoteref_50">50</a>. <strong>async_work_group_copy</strong> and <strong>async_work_group_strided_copy</strong> for 3-component vector types behave as <strong>async_work_group_copy</strong> and <strong>async_work_group_strided_copy</strong> respectively for 4-component vector types.
+<a href="#_footnoteref_50">50</a>. Only if double precision is supported. In OpenCL C 3.0 this will be indicated by the presence of the <code>__opencl_c_fp64</code> feature macro.
 </div>
 <div class="footnote" id="_footnotedef_51">
-<a href="#_footnoteref_51">51</a>. The <a href="#C11-spec">C11</a> consume operation is not supported.
+<a href="#_footnoteref_51">51</a>. <strong>async_work_group_copy</strong> and <strong>async_work_group_strided_copy</strong> for 3-component vector types behave as <strong>async_work_group_copy</strong> and <strong>async_work_group_strided_copy</strong> respectively for 4-component vector types.
 </div>
 <div class="footnote" id="_footnotedef_52">
-<a href="#_footnoteref_52">52</a>. The atomic_long and atomic_ulong types are supported if the <strong>cl_khr_int64_base_atomics</strong> and <strong>cl_khr_int64_extended_atomics</strong> extensions are supported and have been enabled. If this is the case then an OpenCL C 3.0 compiler must also define the <code>__opencl_c_int64</code> feature.
+<a href="#_footnoteref_52">52</a>. The <a href="#C11-spec">C11</a> consume operation is not supported.
 </div>
 <div class="footnote" id="_footnotedef_53">
-<a href="#_footnoteref_53">53</a>. The <code>atomic_double</code> type is only supported if double precision is supported and the <strong>cl_khr_int64_base_atomics</strong> and <strong>cl_khr_int64_extended_atomics</strong> extensions are supported and have been enabled. If this is the case then an OpenCL C 3.0 compiler must also define the <code>__opencl_c_fp64</code> feature.
+<a href="#_footnoteref_53">53</a>. The atomic_long and atomic_ulong types are supported if the <strong>cl_khr_int64_base_atomics</strong> and <strong>cl_khr_int64_extended_atomics</strong> extensions are supported and have been enabled. If this is the case then an OpenCL C 3.0 compiler must also define the <code>__opencl_c_int64</code> feature.
 </div>
 <div class="footnote" id="_footnotedef_54">
-<a href="#_footnoteref_54">54</a>. If the device address space is 64-bits, the data types <code>atomic_intptr_t</code>, <code>atomic_uintptr_t</code>, <code>atomic_size_t</code> and <code>atomic_ptrdiff_t</code> are supported if the <strong>cl_khr_int64_base_atomics</strong> and <strong>cl_khr_int64_extended_atomics</strong> extensions are supported and have been enabled.
+<a href="#_footnoteref_54">54</a>. The <code>atomic_double</code> type is only supported if double precision is supported and the <strong>cl_khr_int64_base_atomics</strong> and <strong>cl_khr_int64_extended_atomics</strong> extensions are supported and have been enabled. If this is the case then an OpenCL C 3.0 compiler must also define the <code>__opencl_c_fp64</code> feature.
 </div>
 <div class="footnote" id="_footnotedef_55">
-<a href="#_footnoteref_55">55</a>. This spurious failure enables implementation of compare-and-exchange on a broader class of machines, e.g. load-locked store-conditional machines.
+<a href="#_footnoteref_55">55</a>. If the device address space is 64-bits, the data types <code>atomic_intptr_t</code>, <code>atomic_uintptr_t</code>, <code>atomic_size_t</code> and <code>atomic_ptrdiff_t</code> are supported if the <strong>cl_khr_int64_base_atomics</strong> and <strong>cl_khr_int64_extended_atomics</strong> extensions are supported and have been enabled.
 </div>
 <div class="footnote" id="_footnotedef_56">
-<a href="#_footnoteref_56">56</a>. Only if the <strong>cl_khr_fp16</strong> extension is supported and has been enabled.
+<a href="#_footnoteref_56">56</a>. This spurious failure enables implementation of compare-and-exchange on a broader class of machines, e.g. load-locked store-conditional machines.
 </div>
 <div class="footnote" id="_footnotedef_57">
 <a href="#_footnoteref_57">57</a>. Only if 64-bit integers are supported. In OpenCL C 3.0 this will be indicated by the presence of the <code>__opencl_c_int64</code> feature macro.
 </div>
 <div class="footnote" id="_footnotedef_58">
-<a href="#_footnoteref_58">58</a>. Only if double precision is supported. In OpenCL C 3.0 this will be indicated by the presence of the <code>__opencl_c_fp64</code> feature macro.
+<a href="#_footnoteref_58">58</a>. Only if the <strong>cl_khr_fp16</strong> extension is supported and has been enabled.
 </div>
 <div class="footnote" id="_footnotedef_59">
-<a href="#_footnoteref_59">59</a>. Note that <strong>0</strong> is taken as a flag, not as the beginning of a field width.
+<a href="#_footnoteref_59">59</a>. Only if double precision is supported. In OpenCL C 3.0 this will be indicated by the presence of the <code>__opencl_c_fp64</code> feature macro.
 </div>
 <div class="footnote" id="_footnotedef_60">
-<a href="#_footnoteref_60">60</a>. The results of all floating conversions of a negative zero, and of negative values that round to zero, include a minus sign.
+<a href="#_footnoteref_60">60</a>. Note that <strong>0</strong> is taken as a flag, not as the beginning of a field width.
 </div>
 <div class="footnote" id="_footnotedef_61">
-<a href="#_footnoteref_61">61</a>. Only if the <strong>cl_khr_fp16</strong> extension is supported and has been enabled.
+<a href="#_footnoteref_61">61</a>. The results of all floating conversions of a negative zero, and of negative values that round to zero, include a minus sign.
 </div>
 <div class="footnote" id="_footnotedef_62">
-<a href="#_footnoteref_62">62</a>. When applied to infinite and NaN values, the <strong>-</strong>, <strong>+</strong>, and <em>space</em> flag characters have their usual meaning; the <strong>#</strong> and <strong>0</strong> flag characters have no effect.
+<a href="#_footnoteref_62">62</a>. Only if the <strong>cl_khr_fp16</strong> extension is supported and has been enabled.
 </div>
 <div class="footnote" id="_footnotedef_63">
-<a href="#_footnoteref_63">63</a>. Binary implementations can choose the hexadecimal digit to the left of the decimal-point character so that subsequent digits align to nibble (4-bit) boundaries.
+<a href="#_footnoteref_63">63</a>. When applied to infinite and NaN values, the <strong>-</strong>, <strong>+</strong>, and <em>space</em> flag characters have their usual meaning; the <strong>#</strong> and <strong>0</strong> flag characters have no effect.
 </div>
 <div class="footnote" id="_footnotedef_64">
-<a href="#_footnoteref_64">64</a>. No special provisions are made for multibyte characters. The behavior of <strong>printf</strong> with the <strong>s</strong> conversion specifier is undefined if the argument value is not a pointer to a literal string.
+<a href="#_footnoteref_64">64</a>. Binary implementations can choose the hexadecimal digit to the left of the decimal-point character so that subsequent digits align to nibble (4-bit) boundaries.
 </div>
 <div class="footnote" id="_footnotedef_65">
-<a href="#_footnoteref_65">65</a>. This is similar to the <code>GL_ADDRESS_CLAMP_TO_BORDER</code> addressing mode.
+<a href="#_footnoteref_65">65</a>. No special provisions are made for multibyte characters. The behavior of <strong>printf</strong> with the <strong>s</strong> conversion specifier is undefined if the argument value is not a pointer to a literal string.
 </div>
 <div class="footnote" id="_footnotedef_66">
-<a href="#_footnoteref_66">66</a>. Note that the built-in function calls to read images with a sampler are not supported for <code>image1d_buffer_t</code> image types.
+<a href="#_footnoteref_66">66</a>. This is similar to the <code>GL_ADDRESS_CLAMP_TO_BORDER</code> addressing mode.
 </div>
 <div class="footnote" id="_footnotedef_67">
-<a href="#_footnoteref_67">67</a>. Although <code>CL_UNORM_INT_101010_2</code> was added in OpenCL 2.1, because there was no OpenCL C 2.1 this image channel order <a href="#unified-spec">requires</a> OpenCL 3.0.
+<a href="#_footnoteref_67">67</a>. Note that the built-in function calls to read images with a sampler are not supported for <code>image1d_buffer_t</code> image types.
 </div>
 <div class="footnote" id="_footnotedef_68">
-<a href="#_footnoteref_68">68</a>. Only if the <strong>cl_khr_fp16</strong> extension is supported and has been enabled.
+<a href="#_footnoteref_68">68</a>. Although <code>CL_UNORM_INT_101010_2</code> was added in OpenCL 2.1, because there was no OpenCL C 2.1 this image channel order <a href="#unified-spec">requires</a> OpenCL 3.0.
 </div>
 <div class="footnote" id="_footnotedef_69">
-<a href="#_footnoteref_69">69</a>. Only if 64-bit integers are supported. In OpenCL C 3.0 this will be indicated by the presence of the <code>__opencl_c_int64</code> feature macro.
+<a href="#_footnoteref_69">69</a>. Only if the <strong>cl_khr_fp16</strong> extension is supported and has been enabled.
 </div>
 <div class="footnote" id="_footnotedef_70">
-<a href="#_footnoteref_70">70</a>. Only if double precision is supported. In OpenCL C 3.0 this will be indicated by the presence of the <code>__opencl_c_fp64</code> feature macro.
+<a href="#_footnoteref_70">70</a>. Only if 64-bit integers are supported. In OpenCL C 3.0 this will be indicated by the presence of the <code>__opencl_c_int64</code> feature macro.
 </div>
 <div class="footnote" id="_footnotedef_71">
-<a href="#_footnoteref_71">71</a>. The <code>half</code> scalar and vector types can only be used if the <strong>cl_khr_fp16</strong> extension is supported and has been enabled. The <code>double</code> scalar and vector types can only be used if <code>double</code> precision is supported, e.g. for OpenCL C 3.0 the <code>__opencl_c_fp64</code> feature macro is present.
+<a href="#_footnoteref_71">71</a>. Only if double precision is supported. In OpenCL C 3.0 this will be indicated by the presence of the <code>__opencl_c_fp64</code> feature macro.
 </div>
 <div class="footnote" id="_footnotedef_72">
 <a href="#_footnoteref_72">72</a>. The <code>half</code> scalar and vector types can only be used if the <strong>cl_khr_fp16</strong> extension is supported and has been enabled. The <code>double</code> scalar and vector types can only be used if <code>double</code> precision is supported, e.g. for OpenCL C 3.0 the <code>__opencl_c_fp64</code> feature macro is present.
@@ -17993,37 +18308,40 @@
 <a href="#_footnoteref_73">73</a>. The <code>half</code> scalar and vector types can only be used if the <strong>cl_khr_fp16</strong> extension is supported and has been enabled. The <code>double</code> scalar and vector types can only be used if <code>double</code> precision is supported, e.g. for OpenCL C 3.0 the <code>__opencl_c_fp64</code> feature macro is present.
 </div>
 <div class="footnote" id="_footnotedef_74">
-<a href="#_footnoteref_74">74</a>. Implementations are not required to honor this flag. Implementations may not schedule kernel launch earlier than the point specified by this flag, however.
+<a href="#_footnoteref_74">74</a>. The <code>half</code> scalar and vector types can only be used if the <strong>cl_khr_fp16</strong> extension is supported and has been enabled. The <code>double</code> scalar and vector types can only be used if <code>double</code> precision is supported, e.g. for OpenCL C 3.0 the <code>__opencl_c_fp64</code> feature macro is present.
 </div>
 <div class="footnote" id="_footnotedef_75">
-<a href="#_footnoteref_75">75</a>. Immediate meaning not side effects resulting from child kernels. The side effects would include stores to <code>global</code> memory and pipe reads and writes.
+<a href="#_footnoteref_75">75</a>. Implementations are not required to honor this flag. However, implementations must not schedule the kernel launch earlier than the point specified by this flag.
 </div>
 <div class="footnote" id="_footnotedef_76">
-<a href="#_footnoteref_76">76</a>. This acts as a memory synchronization point between work-items in a work-group and child kernels enqueued by work-items in the work-group.
+<a href="#_footnoteref_76">76</a>. Immediate, meaning that side effects resulting from child kernels are not included. Such side effects would include stores to <code>global</code> memory and pipe reads and writes.
 </div>
 <div class="footnote" id="_footnotedef_77">
-<a href="#_footnoteref_77">77</a>. Only if 64-bit integers are supported. In OpenCL C 3.0 this will be indicated by the presence of the <code>__opencl_c_int64</code> feature macro.
+<a href="#_footnoteref_77">77</a>. This acts as a memory synchronization point between work-items in a work-group and child kernels enqueued by work-items in the work-group.
 </div>
 <div class="footnote" id="_footnotedef_78">
-<a href="#_footnoteref_78">78</a>. Only if the <strong>cl_khr_fp16</strong> extension is supported and has been enabled.
+<a href="#_footnoteref_78">78</a>. Only if 64-bit integers are supported. In OpenCL C 3.0 this will be indicated by the presence of the <code>__opencl_c_int64</code> feature macro.
 </div>
 <div class="footnote" id="_footnotedef_79">
-<a href="#_footnoteref_79">79</a>. Only if double precision is supported. In OpenCL C 3.0 this will be indicated by the presence of the <code>__opencl_c_fp64</code> feature macro.
+<a href="#_footnoteref_79">79</a>. Only if the <strong>cl_khr_fp16</strong> extension is supported and has been enabled.
 </div>
 <div class="footnote" id="_footnotedef_80">
-<a href="#_footnoteref_80">80</a>. Except for the embedded profile where either round to zero or round to nearest rounding mode may be supported for single precision floating-point.
+<a href="#_footnoteref_80">80</a>. Only if double precision is supported. In OpenCL C 3.0 this will be indicated by the presence of the <code>__opencl_c_fp64</code> feature macro.
 </div>
 <div class="footnote" id="_footnotedef_81">
-<a href="#_footnoteref_81">81</a>. On some implementations, <strong>powr</strong>() or <strong>pown</strong>() may perform faster than <strong>pow</strong>(). If <em>x</em> is known to be &gt;= 0, consider using <strong>powr</strong>() in place of <strong>pow</strong>(), or if <em>y</em> is known to be an integer, consider using <strong>pown</strong>() in place of <strong>pow</strong>().
+<a href="#_footnoteref_81">81</a>. Except for the embedded profile where either round to zero or round to nearest rounding mode may be supported for single precision floating-point.
 </div>
 <div class="footnote" id="_footnotedef_82">
-<a href="#_footnoteref_82">82</a>. Here <code>TYPE_MIN</code> and <code>TYPE_MIN_EXP</code> should be substituted by constants appropriate to the floating-point type under consideration, such as <code>FLT_MIN</code> and <code>FLT_MIN_EXP</code> for <code>float</code>.
+<a href="#_footnoteref_82">82</a>. On some implementations, <strong>powr</strong>() or <strong>pown</strong>() may perform faster than <strong>pow</strong>(). If <em>x</em> is known to be &gt;= 0, consider using <strong>powr</strong>() in place of <strong>pow</strong>(), or if <em>y</em> is known to be an integer, consider using <strong>pown</strong>() in place of <strong>pow</strong>().
+</div>
+<div class="footnote" id="_footnotedef_83">
+<a href="#_footnoteref_83">83</a>. Here <code>TYPE_MIN</code> and <code>TYPE_MIN_EXP</code> should be substituted by constants appropriate to the floating-point type under consideration, such as <code>FLT_MIN</code> and <code>FLT_MIN_EXP</code> for <code>float</code>.
 </div>
 </div>
 <div id="footer">
 <div id="footer-text">
-Version v3.0.5<br>
-Last updated 2020-09-29 16:22:41 -0700
+Version v3.0.6<br>
+Last updated 2020-12-18 09:20:25 -0800
 </div>
 </div>
 
diff --git a/specs/3.0-unified/html/OpenCL_Env.html b/specs/3.0-unified/html/OpenCL_Env.html
index 1f40fd5..3ad4e0d 100644
--- a/specs/3.0-unified/html/OpenCL_Env.html
+++ b/specs/3.0-unified/html/OpenCL_Env.html
@@ -841,9 +841,9 @@
 <h1>The OpenCL<sup>&#8482;</sup> SPIR-V Environment Specification</h1>
 <div class="details">
 <span id="author" class="author">Khronos<sup>&#174;</sup> OpenCL Working Group</span><br>
-<span id="revnumber">version v3.0.5,</span>
-<span id="revdate">Wed, 30 Sep 2020 00:00:00 +0000</span>
-<br><span id="revremark">from git branch: master commit: 4d8a36725aa8af9658ab5cb62fdbf52adb44bcca</span>
+<span id="revnumber">version v3.0.6,</span>
+<span id="revdate">Fri, 18 Dec 2020 12:00:00 +0000</span>
+<br><span id="revremark">from git branch: master commit: e9a4d468b1a0a38c1e10b8af484bb2bbb495e2b7</span>
 </div>
 <div id="toc" class="toc2">
 <div id="toctitle">Table of Contents</div>
@@ -860,6 +860,7 @@
 <li><a href="#_image_channel_data_type_mapping">2.7. Image Channel Data Type Mapping</a></li>
 <li><a href="#_kernels">2.8. Kernels</a></li>
 <li><a href="#_built_in_variables">2.9. Built-in Variables</a></li>
+<li><a href="#_alignment_of_types">2.10. Alignment of Types</a></li>
 </ul>
 </li>
 <li><a href="#required-capabilities">3. Required Capabilities</a>
@@ -1526,6 +1527,28 @@
 All built-in variables must be in the <strong>Input</strong> storage class.</p>
 </div>
 </div>
+<div class="sect2">
+<h3 id="_alignment_of_types"><a class="anchor" href="#_alignment_of_types"></a>2.10. Alignment of Types</h3>
+<div class="paragraph">
+<p>Objects of type <strong>OpTypeInt</strong>, <strong>OpTypeFloat</strong>, and <strong>OpTypePointer</strong> must be aligned
+in memory to the size of the type in bytes. Objects of type <strong>OpTypeVector</strong> with
+these component types must be aligned in memory to the size of the vector type
+in bytes. For 3-component vector types, the size of the vector type is four
+times the size of the component type.</p>
+</div>
+<div class="paragraph">
+<p>The compiler is responsible for aligning objects allocated by <strong>OpVariable</strong> to
+the appropriate alignment as required by the <em>Result Type</em>.</p>
+</div>
+<div class="paragraph">
+<p>For <strong>OpTypePointer</strong> arguments to a function, the compiler may assume that the
+pointer is appropriately aligned as required by the <em>Type</em> that the pointer
+points to.</p>
+</div>
+<div class="paragraph">
+<p>Behavior of an unaligned load or store is undefined.</p>
+</div>
+</div>
 </div>
 </div>
 <div class="sect1">
@@ -1988,6 +2011,12 @@
 </ul>
 </div>
 <div class="paragraph">
+<p>In all OpenCL environments, for the <strong>Barrier Instruction</strong> <strong>OpControlBarrier</strong>,
+when the <em>Scope</em> for <em>Execution</em> is <strong>Subgroup</strong>, behavior is undefined unless
+all invocations in the subgroup execute the same dynamic instance of the
+instruction.</p>
+</div>
+<div class="paragraph">
 <p>In an OpenCL 1.2 environment,
 for the <strong>Atomic Instructions</strong>, the <em>Scope</em> for <em>Memory</em> must be <strong>Device</strong>,
 and the memory-order constraint in <em>Memory Semantics</em> must be <strong>Relaxed</strong>.
@@ -2382,6 +2411,360 @@
 <p>If the OpenCL environment supports the extension <code>cl_khr_spirv_no_integer_wrap_decoration</code> and use of the SPIR-V extension <code>SPV_KHR_no_integer_wrap_decoration</code> is declared in the module via <strong>OpExtension</strong>, then the environment must accept modules that include the <strong>NoSignedWrap</strong> or <strong>NoUnsignedWrap</strong> decorations.</p>
 </div>
 </div>
+<div class="sect3">
+<h4 id="_cl_khr_subgroup_extended_types"><a class="anchor" href="#_cl_khr_subgroup_extended_types"></a>5.2.14. <code>cl_khr_subgroup_extended_types</code></h4>
+<div class="paragraph">
+<p>If the OpenCL environment supports the extension <code>cl_khr_subgroup_extended_types</code>, then additional types are valid for the following <strong>Groups</strong> instructions with <em>Scope</em> for <em>Execution</em> equal to <strong>Subgroup</strong>:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>OpGroupBroadcast</strong></p>
+</li>
+<li>
+<p><strong>OpGroupIAdd</strong>, <strong>OpGroupFAdd</strong></p>
+</li>
+<li>
+<p><strong>OpGroupSMin</strong>, <strong>OpGroupUMin</strong>, <strong>OpGroupFMin</strong></p>
+</li>
+<li>
+<p><strong>OpGroupSMax</strong>, <strong>OpGroupUMax</strong>, <strong>OpGroupFMax</strong></p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>For these instructions, valid types for <em>Value</em> are:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>Scalars of supported types:</p>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>OpTypeInt</strong> (equivalent to <code>char</code>, <code>uchar</code>, <code>short</code>, <code>ushort</code>, <code>int</code>, <code>uint</code>, <code>long</code>, and <code>ulong</code>)</p>
+</li>
+<li>
+<p><strong>OpTypeFloat</strong> (equivalent to <code>half</code>, <code>float</code>, and <code>double</code>)</p>
+</li>
+</ul>
+</div>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>Additionally, for <strong>OpGroupBroadcast</strong>, valid types for <em>Value</em> are:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>OpTypeVectors</strong> with 2, 3, 4, 8, or 16 <em>Component Count</em> components of supported types:</p>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>OpTypeInt</strong> (equivalent to <code>char<em>n</em></code>, <code>uchar<em>n</em></code>, <code>short<em>n</em></code>, <code>ushort<em>n</em></code>, <code>int<em>n</em></code>, <code>uint<em>n</em></code>, <code>long<em>n</em></code>, and <code>ulong<em>n</em></code>)</p>
+</li>
+<li>
+<p><strong>OpTypeFloat</strong> (equivalent to <code>half<em>n</em></code>, <code>float<em>n</em></code>, and <code>double<em>n</em></code>)</p>
+</li>
+</ul>
+</div>
+</li>
+</ul>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_cl_khr_subgroup_non_uniform_vote"><a class="anchor" href="#_cl_khr_subgroup_non_uniform_vote"></a>5.2.15. <code>cl_khr_subgroup_non_uniform_vote</code></h4>
+<div class="paragraph">
+<p>If the OpenCL environment supports the extension <code>cl_khr_subgroup_non_uniform_vote</code>, then the environment must accept SPIR-V modules that declare the following SPIR-V capabilities:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>GroupNonUniform</strong></p>
+</li>
+<li>
+<p><strong>GroupNonUniformVote</strong></p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>For instructions requiring these capabilities, <em>Scope</em> for <em>Execution</em> may be:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>Subgroup</strong></p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>For the instruction <strong>OpGroupNonUniformAllEqual</strong>, valid types for <em>Value</em> are:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>Scalars of supported types:</p>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>OpTypeInt</strong> (equivalent to <code>char</code>, <code>uchar</code>, <code>short</code>, <code>ushort</code>, <code>int</code>, <code>uint</code>, <code>long</code>, and <code>ulong</code>)</p>
+</li>
+<li>
+<p><strong>OpTypeFloat</strong> (equivalent to <code>half</code>, <code>float</code>, and <code>double</code>)</p>
+</li>
+</ul>
+</div>
+</li>
+</ul>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_cl_khr_subgroup_ballot"><a class="anchor" href="#_cl_khr_subgroup_ballot"></a>5.2.16. <code>cl_khr_subgroup_ballot</code></h4>
+<div class="paragraph">
+<p>If the OpenCL environment supports the extension <code>cl_khr_subgroup_ballot</code>, then the environment must accept SPIR-V modules that declare the following SPIR-V capabilities:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>GroupNonUniformBallot</strong></p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>For instructions requiring these capabilities, <em>Scope</em> for <em>Execution</em> may be:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>Subgroup</strong></p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>For the non-uniform broadcast instruction <strong>OpGroupNonUniformBroadcast</strong>, valid types for <em>Value</em> are:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>Scalars of supported types:</p>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>OpTypeInt</strong> (equivalent to <code>char</code>, <code>uchar</code>, <code>short</code>, <code>ushort</code>, <code>int</code>, <code>uint</code>, <code>long</code>, and <code>ulong</code>)</p>
+</li>
+<li>
+<p><strong>OpTypeFloat</strong> (equivalent to <code>half</code>, <code>float</code>, and <code>double</code>)</p>
+</li>
+</ul>
+</div>
+</li>
+<li>
+<p><strong>OpTypeVectors</strong> with 2, 3, 4, 8, or 16 <em>Component Count</em> components of supported types:</p>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>OpTypeInt</strong> (equivalent to <code>char<em>n</em></code>, <code>uchar<em>n</em></code>, <code>short<em>n</em></code>, <code>ushort<em>n</em></code>, <code>int<em>n</em></code>, <code>uint<em>n</em></code>, <code>long<em>n</em></code>, and <code>ulong<em>n</em></code>)</p>
+</li>
+<li>
+<p><strong>OpTypeFloat</strong> (equivalent to <code>half<em>n</em></code>, <code>float<em>n</em></code>, and <code>double<em>n</em></code>)</p>
+</li>
+</ul>
+</div>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>For the instruction <strong>OpGroupNonUniformBroadcastFirst</strong>, valid types for <em>Value</em> are:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>Scalars of supported types:</p>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>OpTypeInt</strong> (equivalent to <code>char</code>, <code>uchar</code>, <code>short</code>, <code>ushort</code>, <code>int</code>, <code>uint</code>, <code>long</code>, and <code>ulong</code>)</p>
+</li>
+<li>
+<p><strong>OpTypeFloat</strong> (equivalent to <code>half</code>, <code>float</code>, and <code>double</code>)</p>
+</li>
+</ul>
+</div>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>For the instruction <strong>OpGroupNonUniformBallot</strong>, the valid <em>Result Type</em> is an <strong>OpTypeVector</strong> with four <em>Component Count</em> components of <strong>OpTypeInt</strong>, with <em>Width</em> equal to 32 and <em>Signedness</em> equal to 0 (equivalent to <code>uint4</code>).</p>
+</div>
+<div class="paragraph">
+<p>For the instructions <strong>OpGroupNonUniformInverseBallot</strong>, <strong>OpGroupNonUniformBallotBitExtract</strong>, <strong>OpGroupNonUniformBallotBitCount</strong>, <strong>OpGroupNonUniformBallotFindLSB</strong>, and <strong>OpGroupNonUniformBallotFindMSB</strong>, the valid type for <em>Value</em> is an <strong>OpTypeVector</strong> with four <em>Component Count</em> components of <strong>OpTypeInt</strong>, with <em>Width</em> equal to 32 and <em>Signedness</em> equal to 0 (equivalent to <code>uint4</code>).</p>
+</div>
+<div class="paragraph">
+<p>For built-in variables decorated with <strong>SubgroupEqMask</strong>, <strong>SubgroupGeMask</strong>, <strong>SubgroupGtMask</strong>, <strong>SubgroupLeMask</strong>, or <strong>SubgroupLtMask</strong>, the supported variable type is an <strong>OpTypeVector</strong> with four <em>Component Count</em> components of <strong>OpTypeInt</strong>, with <em>Width</em> equal to 32 and <em>Signedness</em> equal to 0 (equivalent to <code>uint4</code>).</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_cl_khr_subgroup_non_uniform_arithmetic"><a class="anchor" href="#_cl_khr_subgroup_non_uniform_arithmetic"></a>5.2.17. <code>cl_khr_subgroup_non_uniform_arithmetic</code></h4>
+<div class="paragraph">
+<p>If the OpenCL environment supports the extension <code>cl_khr_subgroup_non_uniform_arithmetic</code>, then the environment must accept SPIR-V modules that declare the following SPIR-V capabilities:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>GroupNonUniformArithmetic</strong></p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>For instructions requiring these capabilities, <em>Scope</em> for <em>Execution</em> may be:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>Subgroup</strong></p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>For the instructions <strong>OpGroupNonUniformLogicalAnd</strong>, <strong>OpGroupNonUniformLogicalOr</strong>, and <strong>OpGroupNonUniformLogicalXor</strong>, the valid type for <em>Value</em> is <strong>OpTypeBool</strong>.</p>
+</div>
+<div class="paragraph">
+<p>Otherwise, for the <strong>GroupNonUniformArithmetic</strong> scan and reduction instructions, valid types for <em>Value</em> are:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>Scalars of supported types:</p>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>OpTypeInt</strong> (equivalent to <code>char</code>, <code>uchar</code>, <code>short</code>, <code>ushort</code>, <code>int</code>, <code>uint</code>, <code>long</code>, and <code>ulong</code>)</p>
+</li>
+<li>
+<p><strong>OpTypeFloat</strong> (equivalent to <code>half</code>, <code>float</code>, and <code>double</code>)</p>
+</li>
+</ul>
+</div>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>For the <strong>GroupNonUniformArithmetic</strong> scan and reduction instructions, the optional <em>ClusterSize</em> operand must not be present.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_cl_khr_subgroup_shuffle"><a class="anchor" href="#_cl_khr_subgroup_shuffle"></a>5.2.18. <code>cl_khr_subgroup_shuffle</code></h4>
+<div class="paragraph">
+<p>If the OpenCL environment supports the extension <code>cl_khr_subgroup_shuffle</code>, then the environment must accept SPIR-V modules that declare the following SPIR-V capabilities:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>GroupNonUniformShuffle</strong></p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>For instructions requiring these capabilities, <em>Scope</em> for <em>Execution</em> may be:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>Subgroup</strong></p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>For the instructions <strong>OpGroupNonUniformShuffle</strong> and <strong>OpGroupNonUniformShuffleXor</strong> requiring these capabilities, valid types for <em>Value</em> are:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>Scalars of supported types:</p>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>OpTypeInt</strong> (equivalent to <code>char</code>, <code>uchar</code>, <code>short</code>, <code>ushort</code>, <code>int</code>, <code>uint</code>, <code>long</code>, and <code>ulong</code>)</p>
+</li>
+<li>
+<p><strong>OpTypeFloat</strong> (equivalent to <code>half</code>, <code>float</code>, and <code>double</code>)</p>
+</li>
+</ul>
+</div>
+</li>
+</ul>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_cl_khr_subgroup_shuffle_relative"><a class="anchor" href="#_cl_khr_subgroup_shuffle_relative"></a>5.2.19. <code>cl_khr_subgroup_shuffle_relative</code></h4>
+<div class="paragraph">
+<p>If the OpenCL environment supports the extension <code>cl_khr_subgroup_shuffle_relative</code>, then the environment must accept SPIR-V modules that declare the following SPIR-V capabilities:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>GroupNonUniformShuffleRelative</strong></p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>For instructions requiring these capabilities, <em>Scope</em> for <em>Execution</em> may be:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>Subgroup</strong></p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>For the <strong>GroupNonUniformShuffleRelative</strong> instructions, valid types for <em>Value</em> are:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>Scalars of supported types:</p>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>OpTypeInt</strong> (equivalent to <code>char</code>, <code>uchar</code>, <code>short</code>, <code>ushort</code>, <code>int</code>, <code>uint</code>, <code>long</code>, and <code>ulong</code>)</p>
+</li>
+<li>
+<p><strong>OpTypeFloat</strong> (equivalent to <code>half</code>, <code>float</code>, and <code>double</code>)</p>
+</li>
+</ul>
+</div>
+</li>
+</ul>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_cl_khr_subgroup_clustered_reduce"><a class="anchor" href="#_cl_khr_subgroup_clustered_reduce"></a>5.2.20. <code>cl_khr_subgroup_clustered_reduce</code></h4>
+<div class="paragraph">
+<p>If the OpenCL environment supports the extension <code>cl_khr_subgroup_clustered_reduce</code>, then the environment must accept SPIR-V modules that declare the following SPIR-V capabilities:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>GroupNonUniformClustered</strong></p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>For instructions requiring these capabilities, <em>Scope</em> for <em>Execution</em> may be:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>Subgroup</strong></p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>When the <strong>GroupNonUniformClustered</strong> capability is declared, the <strong>GroupNonUniformArithmetic</strong> scan and reduction instructions may include the optional <em>ClusterSize</em> operand.</p>
+</div>
+</div>
 </div>
 <div class="sect2">
 <h3 id="_embedded_profile_extensions"><a class="anchor" href="#_embedded_profile_extensions"></a>5.3. Embedded Profile Extensions</h3>
@@ -2620,6 +3003,12 @@
 The reference value used to compute the ULP value is the infinitely precise
 result.</p>
 </div>
+<div class="paragraph">
+<p>Result overflow within the specified ULP error is permitted. A math instruction
+may return infinity for a finite reference value when the next floating-point
+number after the largest finite value, which would be representable given
+sufficient range, is within the ULP error tolerance.</p>
+</div>
 <div class="sect3">
 <h4 id="_ulp_values_for_math_instructions_full_profile"><a class="anchor" href="#_ulp_values_for_math_instructions_full_profile"></a>6.6.1. ULP Values for Math Instructions - Full Profile</h4>
 <div class="paragraph">
@@ -6269,8 +6658,8 @@
 </div>
 <div id="footer">
 <div id="footer-text">
-Version v3.0.5<br>
-Last updated 2020-09-28 14:46:56 -0700
+Version v3.0.6<br>
+Last updated 2020-12-08 14:02:39 -0800
 </div>
 </div>
 
diff --git a/specs/3.0-unified/html/OpenCL_Ext.html b/specs/3.0-unified/html/OpenCL_Ext.html
index 956f4f3..8a4c4e0 100644
--- a/specs/3.0-unified/html/OpenCL_Ext.html
+++ b/specs/3.0-unified/html/OpenCL_Ext.html
@@ -841,9 +841,9 @@
 <h1>The OpenCL<sup>&#8482;</sup> Extension Specification</h1>
 <div class="details">
 <span id="author" class="author">Khronos<sup>&#174;</sup> OpenCL Working Group</span><br>
-<span id="revnumber">version v3.0.5,</span>
-<span id="revdate">Wed, 30 Sep 2020 00:00:00 +0000</span>
-<br><span id="revremark">from git branch: master commit: 4d8a36725aa8af9658ab5cb62fdbf52adb44bcca</span>
+<span id="revnumber">version v3.0.6,</span>
+<span id="revdate">Fri, 18 Dec 2020 12:00:00 +0000</span>
+<br><span id="revremark">from git branch: master commit: e9a4d468b1a0a38c1e10b8af484bb2bbb495e2b7</span>
 </div>
 <div id="toc" class="toc2">
 <div id="toctitle">Table of Contents</div>
@@ -884,7 +884,9 @@
 <li><a href="#cl_khr_extended_async_copies">34. Extended Async Copies (Provisional)</a></li>
 <li><a href="#cl_khr_async_work_group_copy_fence">35. Async Work Group Copy Fence (Provisional)</a></li>
 <li><a href="#cl_khr_device_uuid">36. Unique Device Identifiers</a></li>
-<li><a href="#spirv_extensions">37. Extensions to the OpenCL SPIR-V Environment</a></li>
+<li><a href="#cl_khr_extended_versioning">37. Extended versioning</a></li>
+<li><a href="#_extended_subgroup_functions">38. Extended Subgroup Functions</a></li>
+<li><a href="#spirv_extensions">39. Extensions to the OpenCL SPIR-V Environment</a></li>
 <li><a href="#_extensions_promoted_to_core_features">Appendix A: Extensions Promoted to Core Features</a></li>
 <li><a href="#_deprecated_extensions">Appendix B: Deprecated Extensions</a></li>
 <li><a href="#_quick_reference">Appendix C: Quick Reference</a></li>
@@ -17499,7 +17501,1904 @@
 </div>
 </div>
 <div class="sect1">
-<h2 id="spirv_extensions"><a class="anchor" href="#spirv_extensions"></a>37. Extensions to the OpenCL SPIR-V Environment</h2>
+<h2 id="cl_khr_extended_versioning"><a class="anchor" href="#cl_khr_extended_versioning"></a>37. Extended versioning</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>This extension introduces new platform and device queries that return detailed
+version information to applications. It makes it possible to return the exact
+revision of the specification or intermediate languages supported by an
+implementation. It also enables implementations to communicate a version
+number for each of the extensions they support, and it removes the requirement
+for applications to parse strings to test for the presence of an extension, an
+intermediate language, or a built-in kernel.</p>
+</div>
+<div class="sect2">
+<h3 id="_general_information_36"><a class="anchor" href="#_general_information_36"></a>37.1. General information</h3>
+<div class="sect3">
+<h4 id="_name_strings"><a class="anchor" href="#_name_strings"></a>37.1.1. Name Strings</h4>
+<div class="paragraph">
+<p><code>cl_khr_extended_versioning</code></p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_contributors"><a class="anchor" href="#_contributors"></a>37.1.2. Contributors</h4>
+<div class="paragraph">
+<p>Kévin Petit, Arm Ltd.<br>
+Ben Ashbaugh, Intel<br>
+Alastair Murray, Codeplay Software Ltd.<br>
+Einar Hov, Arm Ltd.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_version_history_36"><a class="anchor" href="#_version_history_36"></a>37.1.3. Version history</h4>
+<table class="tableblock frame-all grid-all stretch">
+<colgroup>
+<col style="width: 20%;">
+<col style="width: 20%;">
+<col style="width: 60%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top"><strong>Date</strong></th>
+<th class="tableblock halign-left valign-top"><strong>Version</strong></th>
+<th class="tableblock halign-left valign-top"><strong>Description</strong></th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">2020-02-12</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">1.0.0</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Initial version.</p></td>
+</tr>
+</tbody>
+</table>
+</div>
+<div class="sect3">
+<h4 id="_dependencies"><a class="anchor" href="#_dependencies"></a>37.1.4. Dependencies</h4>
+<div class="paragraph">
+<p>This extension is written against the OpenCL Specification
+Version 2.2, Revision 11.</p>
+</div>
+<div class="paragraph">
+<p>This extension requires OpenCL 1.0.</p>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_new_api_types_2"><a class="anchor" href="#_new_api_types_2"></a>37.2. New API Types</h3>
+<div class="sect3">
+<h4 id="_version"><a class="anchor" href="#_version"></a>37.2.1. Version</h4>
+<div class="paragraph">
+<p>This extension introduces a new scheme to encode detailed
+(major, minor, patch/revision) version information into a single 32-bit unsigned
+integer:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>The major version uses bits 31-22</p>
+</li>
+<li>
+<p>The minor version uses bits 21-12</p>
+</li>
+<li>
+<p>The patch version uses bits 11-0</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>Because the fields are packed from most to least significant, two versions can be ordered by comparing the packed values with the standard C/C++ relational operators.
+Macros are provided to extract individual fields or compose a full version
+from the individual fields.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c"><span class="keyword">typedef</span> cl_uint cl_version_khr;
+
+<span class="preprocessor">#define</span> CL_VERSION_MAJOR_BITS_KHR (<span class="integer">10</span>)
+<span class="preprocessor">#define</span> CL_VERSION_MINOR_BITS_KHR (<span class="integer">10</span>)
+<span class="preprocessor">#define</span> CL_VERSION_PATCH_BITS_KHR (<span class="integer">12</span>)
+
+<span class="preprocessor">#define</span> CL_VERSION_MAJOR_MASK_KHR ((<span class="integer">1</span> &lt;&lt; CL_VERSION_MAJOR_BITS_KHR) - <span class="integer">1</span>)
+<span class="preprocessor">#define</span> CL_VERSION_MINOR_MASK_KHR ((<span class="integer">1</span> &lt;&lt; CL_VERSION_MINOR_BITS_KHR) - <span class="integer">1</span>)
+<span class="preprocessor">#define</span> CL_VERSION_PATCH_MASK_KHR ((<span class="integer">1</span> &lt;&lt; CL_VERSION_PATCH_BITS_KHR) - <span class="integer">1</span>)
+
+<span class="preprocessor">#define</span> CL_VERSION_MAJOR_KHR(version) \
+        ((version) &gt;&gt; (CL_VERSION_MINOR_BITS_KHR + CL_VERSION_PATCH_BITS_KHR))
+<span class="preprocessor">#define</span> CL_VERSION_MINOR_KHR(version) \
+        (((version) &gt;&gt; CL_VERSION_PATCH_BITS_KHR) &amp; CL_VERSION_MINOR_MASK_KHR)
+<span class="preprocessor">#define</span> CL_VERSION_PATCH_KHR(version) ((version) &amp; CL_VERSION_PATCH_MASK_KHR)
+
+<span class="preprocessor">#define</span> CL_MAKE_VERSION_KHR(major, minor, patch) \
+    ((((major) &amp; CL_VERSION_MAJOR_MASK_KHR) &lt;&lt; (CL_VERSION_MINOR_BITS_KHR + CL_VERSION_PATCH_BITS_KHR)) | \
+     (((minor) &amp; CL_VERSION_MINOR_MASK_KHR) &lt;&lt; CL_VERSION_PATCH_BITS_KHR) | \
+     ((patch) &amp; CL_VERSION_PATCH_MASK_KHR))</code></pre>
+</div>
+</div>
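+<div class="paragraph">
+<p>The following is a minimal sketch of how the macros behave, assuming <code>cl_uint</code> is a 32-bit unsigned integer as in the OpenCL headers (it is defined locally so the example builds without <code>CL/cl.h</code>). It repeats the macros above and adds a hypothetical <code>version_at_least</code> helper that relies on the field ordering to compare two versions with a single unsigned comparison.</p>
+</div>

```c
#include <stdint.h>

/* Assumption: cl_uint is a 32-bit unsigned integer, as in the OpenCL
 * headers. It is defined here so the sketch builds without CL/cl.h. */
typedef uint32_t cl_uint;
typedef cl_uint cl_version_khr;

/* Macros as defined by this extension. */
#define CL_VERSION_MAJOR_BITS_KHR (10)
#define CL_VERSION_MINOR_BITS_KHR (10)
#define CL_VERSION_PATCH_BITS_KHR (12)

#define CL_VERSION_MAJOR_MASK_KHR ((1 << CL_VERSION_MAJOR_BITS_KHR) - 1)
#define CL_VERSION_MINOR_MASK_KHR ((1 << CL_VERSION_MINOR_BITS_KHR) - 1)
#define CL_VERSION_PATCH_MASK_KHR ((1 << CL_VERSION_PATCH_BITS_KHR) - 1)

#define CL_VERSION_MAJOR_KHR(version) \
        ((version) >> (CL_VERSION_MINOR_BITS_KHR + CL_VERSION_PATCH_BITS_KHR))
#define CL_VERSION_MINOR_KHR(version) \
        (((version) >> CL_VERSION_PATCH_BITS_KHR) & CL_VERSION_MINOR_MASK_KHR)
#define CL_VERSION_PATCH_KHR(version) ((version) & CL_VERSION_PATCH_MASK_KHR)

#define CL_MAKE_VERSION_KHR(major, minor, patch) \
    ((((major) & CL_VERSION_MAJOR_MASK_KHR) << (CL_VERSION_MINOR_BITS_KHR + CL_VERSION_PATCH_BITS_KHR)) | \
     (((minor) & CL_VERSION_MINOR_MASK_KHR) << CL_VERSION_PATCH_BITS_KHR) | \
     ((patch) & CL_VERSION_PATCH_MASK_KHR))

/* Hypothetical helper: non-zero when v is at least major.minor.patch.
 * One unsigned comparison suffices because the major field occupies the
 * most significant bits and the patch field the least significant bits. */
static int version_at_least(cl_version_khr v, cl_uint major, cl_uint minor,
                            cl_uint patch)
{
    return v >= (cl_version_khr)CL_MAKE_VERSION_KHR(major, minor, patch);
}
```

+<div class="paragraph">
+<p>For example, version 3.0.6 packs to <code>0x00C00006</code>. Note that <code>CL_MAKE_VERSION_KHR</code> silently masks out-of-range fields: a patch value of 5000 is stored as 904 (5000 modulo 4096).</p>
+</div>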
+</div>
+<div class="sect3">
+<h4 id="_name_and_version"><a class="anchor" href="#_name_and_version"></a>37.2.2. Name and version</h4>
+<div class="paragraph">
+<p>This extension adds a structure that describes a name paired with a
+version number:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c"><span class="preprocessor">#define</span> CL_NAME_VERSION_MAX_NAME_SIZE_KHR <span class="integer">64</span>
+
+<span class="keyword">typedef</span> <span class="keyword">struct</span> _cl_name_version_khr {
+    cl_version_khr version;
+    <span class="predefined-type">char</span> name[CL_NAME_VERSION_MAX_NAME_SIZE_KHR];
+} cl_name_version_khr;</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The <code>name</code> field is an array of <code>CL_NAME_VERSION_MAX_NAME_SIZE_KHR</code> bytes used as
+storage for a NUL-terminated string whose maximum length is therefore
+<code>CL_NAME_VERSION_MAX_NAME_SIZE_KHR - 1</code>.</p>
+</div>
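+<div class="paragraph">
+<p>As an illustration of the NUL-termination rule, the sketch below fills and searches <code>cl_name_version_khr</code> entries. The helpers <code>set_name_version</code> and <code>find_by_name</code> are hypothetical, and <code>cl_uint</code> is assumed to be a 32-bit unsigned integer so the example compiles without the OpenCL headers.</p>
+</div>

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t cl_uint;            /* assumption: matches CL/cl.h */
typedef cl_uint cl_version_khr;

#define CL_NAME_VERSION_MAX_NAME_SIZE_KHR 64

typedef struct _cl_name_version_khr {
    cl_version_khr version;
    char name[CL_NAME_VERSION_MAX_NAME_SIZE_KHR];
} cl_name_version_khr;

/* Hypothetical helper: store name, truncating it to at most
 * CL_NAME_VERSION_MAX_NAME_SIZE_KHR - 1 characters so the stored string
 * is always NUL-terminated. */
static void set_name_version(cl_name_version_khr *out, const char *name,
                             cl_version_khr version)
{
    out->version = version;
    strncpy(out->name, name, CL_NAME_VERSION_MAX_NAME_SIZE_KHR - 1);
    out->name[CL_NAME_VERSION_MAX_NAME_SIZE_KHR - 1] = '\0';
}

/* Hypothetical helper: return the entry whose name matches exactly,
 * or NULL if there is none. */
static const cl_name_version_khr *find_by_name(const cl_name_version_khr *list,
                                               size_t count, const char *name)
{
    for (size_t i = 0; i < count; ++i) {
        if (strcmp(list[i].name, name) == 0)
            return &list[i];
    }
    return NULL;
}
```

+<div class="paragraph">
+<p>Because the name is always terminated within the array, ordinary string functions such as <code>strcmp</code> can be used on it safely.</p>
+</div>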
+</div>
+</div>
+<div class="sect2">
+<h3 id="_new_api_enums"><a class="anchor" href="#_new_api_enums"></a>37.3. New API Enums</h3>
+<div class="paragraph">
+<p>Accepted values for the <em>param_name</em> parameter to <strong>clGetPlatformInfo</strong>:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">CL_PLATFORM_NUMERIC_VERSION_KHR
+CL_PLATFORM_EXTENSIONS_WITH_VERSION_KHR</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Accepted values for the <em>param_name</em> parameter to <strong>clGetDeviceInfo</strong>:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">CL_DEVICE_NUMERIC_VERSION_KHR
+CL_DEVICE_OPENCL_C_NUMERIC_VERSION_KHR
+CL_DEVICE_EXTENSIONS_WITH_VERSION_KHR
+CL_DEVICE_ILS_WITH_VERSION_KHR
+CL_DEVICE_BUILT_IN_KERNELS_WITH_VERSION_KHR</code></pre>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_modifications_to_the_opencl_api_specification"><a class="anchor" href="#_modifications_to_the_opencl_api_specification"></a>37.4. Modifications to the OpenCL API Specification</h3>
+<div class="dlist">
+<dl>
+<dt class="hdlist1">(Modify Section 4.1, <strong>Querying Platform Info</strong>) </dt>
+<dd>
+<div class="openblock">
+<div class="content">
+<div class="dlist">
+<dl>
+<dt class="hdlist1">(Add the following to Table 3, <em>List of supported param_names by clGetPlatformInfo</em>) </dt>
+</dl>
+</div>
+</div>
+</div>
+</dd>
+</dl>
+</div>
+<table class="tableblock frame-all grid-all stretch">
+<colgroup>
+<col style="width: 37.5%;">
+<col style="width: 25%;">
+<col style="width: 37.5%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top">cl_platform_info</th>
+<th class="tableblock halign-left valign-top">Return Type</th>
+<th class="tableblock halign-left valign-top">Description</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>CL_PLATFORM_NUMERIC_VERSION_KHR</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>cl_version_khr</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns detailed (major, minor, patch) numeric version information. The major
+  and minor version numbers returned must match those returned via
+  <code>CL_PLATFORM_VERSION</code>.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>CL_PLATFORM_EXTENSIONS_WITH_VERSION_KHR</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>cl_name_version_khr[]</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns an array of description (name and version) structures. The same
+  extension name must not be reported more than once. The list of extensions
+  reported must match the list reported via <code>CL_PLATFORM_EXTENSIONS</code>.</p></td>
+</tr>
+</tbody>
+</table>
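+<div class="paragraph">
+<p>Array-valued queries such as <code>CL_PLATFORM_EXTENSIONS_WITH_VERSION_KHR</code> are typically read with the usual two-call pattern: query the size, allocate, then fetch. The sketch below is a minimal illustration under stated assumptions. Real code would call <strong>clGetPlatformInfo</strong> directly; here the call is abstracted behind a hypothetical callback (<code>name_version_query_fn</code>) with the same trailing parameters, backed by a hypothetical in-memory <code>fake_query</code>, so the example runs without an OpenCL runtime. The error-code values are copied locally from <code>CL/cl.h</code>.</p>
+</div>

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local stand-ins for CL/cl.h definitions (values mirror the headers). */
typedef int32_t cl_int;
typedef uint32_t cl_uint;
typedef cl_uint cl_version_khr;
#define CL_SUCCESS 0
#define CL_OUT_OF_HOST_MEMORY (-6)
#define CL_INVALID_VALUE (-30)

#define CL_NAME_VERSION_MAX_NAME_SIZE_KHR 64
typedef struct _cl_name_version_khr {
    cl_version_khr version;
    char name[CL_NAME_VERSION_MAX_NAME_SIZE_KHR];
} cl_name_version_khr;

/* Hypothetical callback with the trailing parameters of clGetPlatformInfo;
 * the platform and param_name are assumed to be bound through ctx. */
typedef cl_int (*name_version_query_fn)(void *ctx, size_t param_value_size,
                                        void *param_value,
                                        size_t *param_value_size_ret);

/* Two-call pattern: query the size, allocate, then fetch the array.
 * On success *out is heap-allocated (NULL for an empty list). */
static cl_int query_name_versions(name_version_query_fn query, void *ctx,
                                  cl_name_version_khr **out, size_t *count)
{
    size_t size = 0;
    cl_int err;

    *out = NULL;
    *count = 0;
    err = query(ctx, 0, NULL, &size);
    if (err != CL_SUCCESS || size == 0)
        return err;                  /* an empty list is not an error */

    cl_name_version_khr *buf = malloc(size);
    if (buf == NULL)
        return CL_OUT_OF_HOST_MEMORY;
    err = query(ctx, size, buf, NULL);
    if (err != CL_SUCCESS) {
        free(buf);
        return err;
    }
    *out = buf;
    *count = size / sizeof(cl_name_version_khr);
    return CL_SUCCESS;
}

/* Hypothetical in-memory backend standing in for a real platform. */
struct fake_list {
    const cl_name_version_khr *items;
    size_t n;
};

static cl_int fake_query(void *ctx, size_t param_value_size, void *param_value,
                         size_t *param_value_size_ret)
{
    const struct fake_list *l = ctx;
    size_t needed = l->n * sizeof(cl_name_version_khr);

    if (param_value_size_ret != NULL)
        *param_value_size_ret = needed;
    if (param_value != NULL) {
        if (param_value_size < needed)
            return CL_INVALID_VALUE;
        memcpy(param_value, l->items, needed);
    }
    return CL_SUCCESS;
}
```

+<div class="paragraph">
+<p>The element count is derived by dividing the returned size by <code>sizeof(cl_name_version_khr)</code>; a returned size of zero is treated as an empty list rather than a failure.</p>
+</div>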
+<div class="dlist">
+<dl>
+<dt class="hdlist1">(Modify Section 4.2, <strong>Querying Devices</strong>) </dt>
+<dd>
+<div class="openblock">
+<div class="content">
+<div class="dlist">
+<dl>
+<dt class="hdlist1">(Add the following to Table 5, <em>List of supported param_names by clGetDeviceInfo</em>) </dt>
+</dl>
+</div>
+</div>
+</div>
+</dd>
+</dl>
+</div>
+<table class="tableblock frame-all grid-all stretch">
+<colgroup>
+<col style="width: 37.5%;">
+<col style="width: 25%;">
+<col style="width: 37.5%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top">cl_device_info</th>
+<th class="tableblock halign-left valign-top">Return Type</th>
+<th class="tableblock halign-left valign-top">Description</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>CL_DEVICE_NUMERIC_VERSION_KHR</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>cl_version_khr</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns detailed (major, minor, patch) numeric version information. The major
+  and minor version numbers returned must match those returned via
+  <code>CL_DEVICE_VERSION</code>.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>CL_DEVICE_OPENCL_C_NUMERIC_VERSION_KHR</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>cl_version_khr</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns detailed (major, minor, patch) numeric version information. The major
+  and minor version numbers returned must match those returned via
+  <code>CL_DEVICE_OPENCL_C_VERSION</code>.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>CL_DEVICE_EXTENSIONS_WITH_VERSION_KHR</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>cl_name_version_khr[]</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns an array of description (name and version) structures. The same
+  extension name must not be reported more than once. The list of extensions
+  reported must match the list reported via <code>CL_DEVICE_EXTENSIONS</code>.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>CL_DEVICE_ILS_WITH_VERSION_KHR</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>cl_name_version_khr[]</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns an array of descriptions (name and version) for all supported
+  Intermediate Languages. Intermediate Languages with the same name may be
+  reported more than once, but each combination of name and major/minor version
+  may only be reported once. The list of intermediate languages reported must
+  match the list reported via <code>CL_DEVICE_IL_VERSION</code>.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>CL_DEVICE_BUILT_IN_KERNELS_WITH_VERSION_KHR</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>cl_name_version_khr[]</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns an array of descriptions for the built-in kernels supported by the device.
+  Each built-in kernel may only be reported once. The list of reported kernels must
+  match the list returned via <code>CL_DEVICE_BUILT_IN_KERNELS</code>.</p></td>
+</tr>
+</tbody>
+</table>
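+<div class="paragraph">
+<p>The uniqueness rule for <code>CL_DEVICE_ILS_WITH_VERSION_KHR</code> compares names and major/minor versions but ignores the patch field. A minimal sketch of such a check follows; <code>ils_are_unique</code> is a hypothetical name, and the local typedefs assume <code>cl_uint</code> is a 32-bit unsigned integer.</p>
+</div>

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins for the CL/cl.h types (assumption: cl_uint is 32-bit). */
typedef uint32_t cl_uint;
typedef cl_uint cl_version_khr;

#define CL_NAME_VERSION_MAX_NAME_SIZE_KHR 64
typedef struct _cl_name_version_khr {
    cl_version_khr version;
    char name[CL_NAME_VERSION_MAX_NAME_SIZE_KHR];
} cl_name_version_khr;

#define CL_VERSION_MINOR_BITS_KHR (10)
#define CL_VERSION_PATCH_BITS_KHR (12)
#define CL_VERSION_MINOR_MASK_KHR ((1 << CL_VERSION_MINOR_BITS_KHR) - 1)
#define CL_VERSION_MAJOR_KHR(version) \
        ((version) >> (CL_VERSION_MINOR_BITS_KHR + CL_VERSION_PATCH_BITS_KHR))
#define CL_VERSION_MINOR_KHR(version) \
        (((version) >> CL_VERSION_PATCH_BITS_KHR) & CL_VERSION_MINOR_MASK_KHR)

/* Hypothetical check: returns 1 when no two entries share both a name and
 * a major.minor version (the patch field is ignored), 0 otherwise. */
static int ils_are_unique(const cl_name_version_khr *ils, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        for (size_t j = i + 1; j < count; ++j) {
            if (strcmp(ils[i].name, ils[j].name) == 0 &&
                CL_VERSION_MAJOR_KHR(ils[i].version) ==
                    CL_VERSION_MAJOR_KHR(ils[j].version) &&
                CL_VERSION_MINOR_KHR(ils[i].version) ==
                    CL_VERSION_MINOR_KHR(ils[j].version))
                return 0;
        }
    }
    return 1;
}
```

+<div class="paragraph">
+<p>The pairwise comparison is quadratic, which is acceptable for the short lists these queries usually return.</p>
+</div>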
+</div>
+<div class="sect2">
+<h3 id="_conformance_tests"><a class="anchor" href="#_conformance_tests"></a>37.5. Conformance tests</h3>
+<div class="olist arabic">
+<ol class="arabic">
+<li>
+<p>Each of the new queries described in this extension must be attempted and
+succeed.</p>
+</li>
+<li>
+<p>It must be verified that the information returned by all queries that
+extend existing queries is consistent with the information returned
+by existing queries.</p>
+</li>
+<li>
+<p>Some of the queries introduced by this extension impose uniqueness constraints
+on the list of returned values. It must be verified that these constraints are
+satisfied.</p>
+</li>
+</ol>
+</div>
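+<div class="paragraph">
+<p>As one example of the consistency checks in the second test, the major and minor fields of <code>CL_PLATFORM_NUMERIC_VERSION_KHR</code> or <code>CL_DEVICE_NUMERIC_VERSION_KHR</code> can be compared against the corresponding version string. The sketch assumes that string begins with <code>OpenCL &lt;major&gt;.&lt;minor&gt;</code>, as the core specification requires for <code>CL_PLATFORM_VERSION</code> and <code>CL_DEVICE_VERSION</code>; the <code>OpenCL C</code> version string would need a different prefix. The function name <code>numeric_matches_string</code> is hypothetical.</p>
+</div>

```c
#include <stdint.h>
#include <stdio.h>

typedef uint32_t cl_uint;            /* assumption: matches CL/cl.h */
typedef cl_uint cl_version_khr;

#define CL_VERSION_MAJOR_BITS_KHR (10)
#define CL_VERSION_MINOR_BITS_KHR (10)
#define CL_VERSION_PATCH_BITS_KHR (12)
#define CL_VERSION_MAJOR_MASK_KHR ((1 << CL_VERSION_MAJOR_BITS_KHR) - 1)
#define CL_VERSION_MINOR_MASK_KHR ((1 << CL_VERSION_MINOR_BITS_KHR) - 1)
#define CL_VERSION_PATCH_MASK_KHR ((1 << CL_VERSION_PATCH_BITS_KHR) - 1)
#define CL_VERSION_MAJOR_KHR(version) \
        ((version) >> (CL_VERSION_MINOR_BITS_KHR + CL_VERSION_PATCH_BITS_KHR))
#define CL_VERSION_MINOR_KHR(version) \
        (((version) >> CL_VERSION_PATCH_BITS_KHR) & CL_VERSION_MINOR_MASK_KHR)
#define CL_MAKE_VERSION_KHR(major, minor, patch) \
    ((((major) & CL_VERSION_MAJOR_MASK_KHR) << (CL_VERSION_MINOR_BITS_KHR + CL_VERSION_PATCH_BITS_KHR)) | \
     (((minor) & CL_VERSION_MINOR_MASK_KHR) << CL_VERSION_PATCH_BITS_KHR) | \
     ((patch) & CL_VERSION_PATCH_MASK_KHR))

/* Hypothetical check: returns 1 when the major and minor fields of numeric
 * match the "OpenCL <major>.<minor>" prefix of version_string, 0 on a
 * mismatch, and -1 when the string cannot be parsed. */
static int numeric_matches_string(cl_version_khr numeric,
                                  const char *version_string)
{
    unsigned major, minor;

    if (sscanf(version_string, "OpenCL %u.%u", &major, &minor) != 2)
        return -1;
    return CL_VERSION_MAJOR_KHR(numeric) == major &&
           CL_VERSION_MINOR_KHR(numeric) == minor;
}
```

+<div class="paragraph">
+<p>The patch field is deliberately not compared, because the existing string queries carry no patch information.</p>
+</div>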
+</div>
+<div class="sect2">
+<h3 id="_issues"><a class="anchor" href="#_issues"></a>37.6. Issues</h3>
+<div class="olist arabic">
+<ol class="arabic">
+<li>
+<p>What compatibility policy should we define? For example, must a <em>revision</em> be
+backwards-compatible with previous ones?</p>
+<div class="openblock">
+<div class="content">
+<div class="paragraph">
+<p><strong>RESOLVED</strong>: No general rules are defined, as they would not be testable. The recommended policy is:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>Patch version bump: clarifications and small, obvious bug fixes only.</p>
+</li>
+<li>
+<p>Minor version bump: backwards-compatible changes only.</p>
+</li>
+<li>
+<p>Major version bump: backwards compatibility may break.</p>
+</li>
+</ul>
+</div>
+</div>
+</div>
+</li>
+<li>
+<p>Do we want versioning for built-in kernels as returned by <code>CL_DEVICE_BUILT_IN_KERNELS</code>?</p>
+<div class="openblock">
+<div class="content">
+<div class="paragraph">
+<p><strong>RESOLVED</strong>: There is no immediate use case for versioning, but being able to get a list of
+  individual kernels without parsing a string is desirable. This extension therefore adds
+  <code>CL_DEVICE_BUILT_IN_KERNELS_WITH_VERSION_KHR</code>.</p>
+</div>
+</div>
+</div>
+</li>
+<li>
+<p>What is the behaviour of the queries that return an array of structures when
+there are no elements to return?</p>
+<div class="openblock">
+<div class="content">
+<div class="paragraph">
+<p><strong>RESOLVED</strong>: The query succeeds and the size returned is zero.</p>
+</div>
+</div>
+</div>
+</li>
+<li>
+<p>What value should be returned when version information is not available?</p>
+<div class="openblock">
+<div class="content">
+<div class="paragraph">
+<p><strong>RESOLVED</strong>: If a patch version is not available, it should be reported as 0.
+            If no version information is available, 0.0.0 should be reported.
+            These values have been chosen as they are guaranteed to be lower
+            than or equal to any other version.</p>
+</div>
+</div>
+</div>
+</li>
+<li>
+<p>Should we add a query to report SPIR-V extended instruction sets?</p>
+<div class="openblock">
+<div class="content">
+<div class="paragraph">
+<p><strong>RESOLVED</strong>: It is unlikely that many SPIR-V extended
+            instruction sets will be introduced without an accompanying API extension, so
+            no such query is added.</p>
+</div>
+</div>
+</div>
+</li>
+<li>
+<p>Should the queries for which the old-style query does not exist in a given
+OpenCL version be present (for example, <code>CL_DEVICE_ILS_WITH_VERSION_KHR</code>
+prior to OpenCL 2.1 or without support for <code>cl_khr_il_program</code>, or
+<code>CL_DEVICE_OPENCL_C_NUMERIC_VERSION_KHR</code> on OpenCL 1.0)?</p>
+<div class="openblock">
+<div class="content">
+<div class="paragraph">
+<p><strong>RESOLVED</strong>: All the queries are always present.
+            <code>CL_DEVICE_ILS_WITH_VERSION_KHR</code> returns an empty set
+            when Intermediate Languages are not supported.
+            <code>CL_DEVICE_OPENCL_C_NUMERIC_VERSION_KHR</code> always returns 1.0 on an
+            OpenCL 1.0 platform.</p>
+</div>
+</div>
+</div>
+</li>
+<li>
+<p>Is reporting multiple Intermediate Languages with the same name and major/minor
+versions but differing patch versions allowed?</p>
+<div class="openblock">
+<div class="content">
+<div class="paragraph">
+<p><strong>RESOLVED</strong>: No. This isn&#8217;t aligned with the intended use for patch versions and
+            makes it harder for implementations to guarantee consistency with
+            the existing IL queries.</p>
+</div>
+</div>
+</div>
+</li>
+</ol>
+</div>
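+<div class="paragraph">
+<p>The recommended (non-normative) policy from the first issue, together with the 0.0.0 convention for missing version information, can be expressed as a small compatibility check. The sketch below is an illustration only: <code>is_compatible</code> is a hypothetical helper, and treating "same major, not older" as compatible is an application-level choice that follows the recommended policy rather than a requirement of this extension.</p>
+</div>

```c
#include <stdint.h>

typedef uint32_t cl_uint;            /* assumption: matches CL/cl.h */
typedef cl_uint cl_version_khr;

#define CL_VERSION_MAJOR_BITS_KHR (10)
#define CL_VERSION_MINOR_BITS_KHR (10)
#define CL_VERSION_PATCH_BITS_KHR (12)
#define CL_VERSION_MAJOR_MASK_KHR ((1 << CL_VERSION_MAJOR_BITS_KHR) - 1)
#define CL_VERSION_MINOR_MASK_KHR ((1 << CL_VERSION_MINOR_BITS_KHR) - 1)
#define CL_VERSION_PATCH_MASK_KHR ((1 << CL_VERSION_PATCH_BITS_KHR) - 1)
#define CL_VERSION_MAJOR_KHR(version) \
        ((version) >> (CL_VERSION_MINOR_BITS_KHR + CL_VERSION_PATCH_BITS_KHR))
#define CL_MAKE_VERSION_KHR(major, minor, patch) \
    ((((major) & CL_VERSION_MAJOR_MASK_KHR) << (CL_VERSION_MINOR_BITS_KHR + CL_VERSION_PATCH_BITS_KHR)) | \
     (((minor) & CL_VERSION_MINOR_MASK_KHR) << CL_VERSION_PATCH_BITS_KHR) | \
     ((patch) & CL_VERSION_PATCH_MASK_KHR))

/* Hypothetical helper following the recommended policy: minor and patch
 * bumps are backwards-compatible, major bumps may not be. An available
 * version can therefore stand in for a required one when the majors are
 * equal and the available version is not older. */
static int is_compatible(cl_version_khr required, cl_version_khr available)
{
    return CL_VERSION_MAJOR_KHR(required) == CL_VERSION_MAJOR_KHR(available) &&
           available >= required;
}
```

+<div class="paragraph">
+<p>An implementation that reports 0.0.0 because no version information is available fails this check against any required version with a non-zero major, which is the conservative outcome.</p>
+</div>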
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_extended_subgroup_functions"><a class="anchor" href="#_extended_subgroup_functions"></a>38. Extended Subgroup Functions</h2>
+<div class="sectionbody">
+<div class="sect2">
+<h3 id="extended-subgroups"><a class="anchor" href="#extended-subgroups"></a>38.1. Overview</h3>
+<div class="paragraph">
+<p>This section describes a family of extensions that provide extended subgroup functionality.
+The extensions in this family are:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><code>cl_khr_subgroup_extended_types</code></p>
+</li>
+<li>
+<p><code>cl_khr_subgroup_non_uniform_vote</code></p>
+</li>
+<li>
+<p><code>cl_khr_subgroup_ballot</code></p>
+</li>
+<li>
+<p><code>cl_khr_subgroup_non_uniform_arithmetic</code></p>
+</li>
+<li>
+<p><code>cl_khr_subgroup_shuffle</code></p>
+</li>
+<li>
+<p><code>cl_khr_subgroup_shuffle_relative</code></p>
+</li>
+<li>
+<p><code>cl_khr_subgroup_clustered_reduce</code></p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>The functionality added by these extensions includes:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>Additional data type support for subgroup broadcast, scan, and reduction functions;</p>
+</li>
+<li>
+<p>The ability to elect a single work item from a subgroup to perform a task;</p>
+</li>
+<li>
+<p>The ability to hold votes among work items in a subgroup;</p>
+</li>
+<li>
+<p>The ability to collect and operate on ballots from work items in the subgroup;</p>
+</li>
+<li>
+<p>The ability to use some subgroup functions, such as any, all, broadcasts, scans, and reductions, within non-uniform control flow;</p>
+</li>
+<li>
+<p>Additional scan and reduction operators;</p>
+</li>
+<li>
+<p>Additional ways to exchange data among work items in a subgroup;</p>
+</li>
+<li>
+<p>Clustered reductions that operate on a subset of work items in the subgroup.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>This section describes changes to the OpenCL C Language for these extensions.
+There are no new API functions or enums added by these extensions.</p>
+</div>
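+<div class="paragraph">
+<p>To make the idea of a clustered reduction concrete, the following host-side C sketch models <code>sub_group_clustered_reduce_add</code> for one subgroup. It is a CPU model, not the OpenCL C built-in: the function name <code>model_clustered_reduce_add</code> is hypothetical, and it assumes that <em>clustersize</em> is a power of two, that work item <em>i</em> belongs to cluster <em>i</em> / <em>clustersize</em>, and that inactive work items neither contribute nor receive a result.</p>
+</div>

```c
#include <stddef.h>
#include <stdint.h>

/* CPU model of sub_group_clustered_reduce_add for one subgroup of n work
 * items. Assumptions: cluster_size is a power of two, work item i belongs
 * to cluster i / cluster_size, and active[i] != 0 marks an active item.
 * Inactive items neither contribute nor receive a result. */
static void model_clustered_reduce_add(const int32_t *in, const int *active,
                                       int32_t *out, size_t n,
                                       size_t cluster_size)
{
    for (size_t base = 0; base < n; base += cluster_size) {
        size_t end = (base + cluster_size < n) ? base + cluster_size : n;
        int32_t sum = 0;

        /* Reduce over the active members of this cluster only. */
        for (size_t i = base; i < end; ++i)
            if (active[i])
                sum += in[i];
        /* Every active member of the cluster receives the same result. */
        for (size_t i = base; i < end; ++i)
            if (active[i])
                out[i] = sum;
    }
}
```

+<div class="paragraph">
+<p>With eight fully active work items holding 1 through 8 and a cluster size of 4, the first four work items receive 10 and the last four receive 26.</p>
+</div>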
+</div>
+<div class="sect2">
+<h3 id="_general_information_37"><a class="anchor" href="#_general_information_37"></a>38.2. General information</h3>
+<div class="sect3">
+<h4 id="_version_history_37"><a class="anchor" href="#_version_history_37"></a>38.2.1. Version history</h4>
+<div class="paragraph">
+<p>For all of the extensions described in this section:</p>
+</div>
+<table class="tableblock frame-all grid-all stretch">
+<colgroup>
+<col style="width: 20%;">
+<col style="width: 20%;">
+<col style="width: 60%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top"><strong>Date</strong></th>
+<th class="tableblock halign-left valign-top"><strong>Version</strong></th>
+<th class="tableblock halign-left valign-top"><strong>Description</strong></th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">2020-12-15</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">1.0.0</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">First assigned version.</p></td>
+</tr>
+</tbody>
+</table>
+</div>
+</div>
+<div class="sect2">
+<h3 id="extended-subgroups-summary"><a class="anchor" href="#extended-subgroups-summary"></a>38.3. Summary of New OpenCL C Functions</h3>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c"><span class="comment">// These functions are available to devices supporting</span>
+<span class="comment">// cl_khr_subgroup_extended_types:</span>
+
+<span class="comment">// Note: Existing functions supporting additional data types.</span>
+
+gentype sub_group_broadcast( gentype value, uint index )
+
+gentype sub_group_reduce_add( gentype value )
+gentype sub_group_reduce_min( gentype value )
+gentype sub_group_reduce_max( gentype value )
+
+gentype sub_group_scan_inclusive_add( gentype value )
+gentype sub_group_scan_inclusive_min( gentype value )
+gentype sub_group_scan_inclusive_max( gentype value )
+
+gentype sub_group_scan_exclusive_add( gentype value )
+gentype sub_group_scan_exclusive_min( gentype value )
+gentype sub_group_scan_exclusive_max( gentype value )
+
+<span class="comment">// These functions are available to devices supporting</span>
+<span class="comment">// cl_khr_subgroup_non_uniform_vote:</span>
+
+<span class="predefined-type">int</span> sub_group_elect()
+
+<span class="predefined-type">int</span> sub_group_non_uniform_all( <span class="predefined-type">int</span> predicate )
+<span class="predefined-type">int</span> sub_group_non_uniform_any( <span class="predefined-type">int</span> predicate )
+<span class="predefined-type">int</span> sub_group_non_uniform_all_equal( gentype value )
+
+<span class="comment">// These functions are available to devices supporting</span>
+<span class="comment">// cl_khr_subgroup_ballot:</span>
+
+gentype sub_group_non_uniform_broadcast( gentype value, uint index )
+gentype sub_group_broadcast_first( gentype value )
+
+uint4 sub_group_ballot( <span class="predefined-type">int</span> predicate )
+<span class="predefined-type">int</span>   sub_group_inverse_ballot( uint4 value )
+<span class="predefined-type">int</span>   sub_group_ballot_bit_extract( uint4 value, uint index )
+uint  sub_group_ballot_bit_count( uint4 value )
+uint  sub_group_ballot_inclusive_scan( uint4 value )
+uint  sub_group_ballot_exclusive_scan( uint4 value )
+uint  sub_group_ballot_find_lsb( uint4 value )
+uint  sub_group_ballot_find_msb( uint4 value )
+
+uint4 get_sub_group_eq_mask()
+uint4 get_sub_group_ge_mask()
+uint4 get_sub_group_gt_mask()
+uint4 get_sub_group_le_mask()
+uint4 get_sub_group_lt_mask()
+
+<span class="comment">// These functions are available to devices supporting</span>
+<span class="comment">// cl_khr_subgroup_non_uniform_arithmetic:</span>
+
+gentype sub_group_non_uniform_reduce_add( gentype value )
+gentype sub_group_non_uniform_reduce_mul( gentype value )
+gentype sub_group_non_uniform_reduce_min( gentype value )
+gentype sub_group_non_uniform_reduce_max( gentype value )
+gentype sub_group_non_uniform_reduce_and( gentype value )
+gentype sub_group_non_uniform_reduce_or( gentype value )
+gentype sub_group_non_uniform_reduce_xor( gentype value )
+<span class="predefined-type">int</span>     sub_group_non_uniform_reduce_logical_and( <span class="predefined-type">int</span> predicate )
+<span class="predefined-type">int</span>     sub_group_non_uniform_reduce_logical_or( <span class="predefined-type">int</span> predicate )
+<span class="predefined-type">int</span>     sub_group_non_uniform_reduce_logical_xor( <span class="predefined-type">int</span> predicate )
+
+gentype sub_group_non_uniform_scan_inclusive_add( gentype value )
+gentype sub_group_non_uniform_scan_inclusive_mul( gentype value )
+gentype sub_group_non_uniform_scan_inclusive_min( gentype value )
+gentype sub_group_non_uniform_scan_inclusive_max( gentype value )
+gentype sub_group_non_uniform_scan_inclusive_and( gentype value )
+gentype sub_group_non_uniform_scan_inclusive_or( gentype value )
+gentype sub_group_non_uniform_scan_inclusive_xor( gentype value )
+<span class="predefined-type">int</span>     sub_group_non_uniform_scan_inclusive_logical_and( <span class="predefined-type">int</span> predicate )
+<span class="predefined-type">int</span>     sub_group_non_uniform_scan_inclusive_logical_or( <span class="predefined-type">int</span> predicate )
+<span class="predefined-type">int</span>     sub_group_non_uniform_scan_inclusive_logical_xor( <span class="predefined-type">int</span> predicate )
+
+gentype sub_group_non_uniform_scan_exclusive_add( gentype value )
+gentype sub_group_non_uniform_scan_exclusive_mul( gentype value )
+gentype sub_group_non_uniform_scan_exclusive_min( gentype value )
+gentype sub_group_non_uniform_scan_exclusive_max( gentype value )
+gentype sub_group_non_uniform_scan_exclusive_and( gentype value )
+gentype sub_group_non_uniform_scan_exclusive_or( gentype value )
+gentype sub_group_non_uniform_scan_exclusive_xor( gentype value )
+<span class="predefined-type">int</span>     sub_group_non_uniform_scan_exclusive_logical_and( <span class="predefined-type">int</span> predicate )
+<span class="predefined-type">int</span>     sub_group_non_uniform_scan_exclusive_logical_or( <span class="predefined-type">int</span> predicate )
+<span class="predefined-type">int</span>     sub_group_non_uniform_scan_exclusive_logical_xor( <span class="predefined-type">int</span> predicate )
+
+<span class="comment">// These functions are available to devices supporting</span>
+<span class="comment">// cl_khr_subgroup_shuffle:</span>
+
+gentype sub_group_shuffle( gentype value, uint index )
+gentype sub_group_shuffle_xor( gentype value, uint mask )
+
+<span class="comment">// These functions are available to devices supporting</span>
+<span class="comment">// cl_khr_subgroup_shuffle_relative:</span>
+
+gentype sub_group_shuffle_up( gentype value, uint delta )
+gentype sub_group_shuffle_down( gentype value, uint delta )
+
+<span class="comment">// These functions are available to devices supporting</span>
+<span class="comment">// cl_khr_subgroup_clustered_reduce:</span>
+
+gentype sub_group_clustered_reduce_add( gentype value, uint clustersize )
+gentype sub_group_clustered_reduce_mul( gentype value, uint clustersize )
+gentype sub_group_clustered_reduce_min( gentype value, uint clustersize )
+gentype sub_group_clustered_reduce_max( gentype value, uint clustersize )
+gentype sub_group_clustered_reduce_and( gentype value, uint clustersize )
+gentype sub_group_clustered_reduce_or( gentype value, uint clustersize )
+gentype sub_group_clustered_reduce_xor( gentype value, uint clustersize )
+<span class="predefined-type">int</span>     sub_group_clustered_reduce_logical_and( <span class="predefined-type">int</span> predicate, uint clustersize )
+<span class="predefined-type">int</span>     sub_group_clustered_reduce_logical_or( <span class="predefined-type">int</span> predicate, uint clustersize )
+<span class="predefined-type">int</span>     sub_group_clustered_reduce_logical_xor( <span class="predefined-type">int</span> predicate, uint clustersize )</code></pre>
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="cl_khr_subgroup_extended_types"><a class="anchor" href="#cl_khr_subgroup_extended_types"></a>38.4. Extended Types</h3>
+<div class="paragraph">
+<p>This section describes functionality added by <code>cl_khr_subgroup_extended_types</code>.
+This extension adds additional supported data types to the existing subgroup broadcast, scan, and reduction functions.</p>
+</div>
+<div class="sect3">
+<h4 id="_modify_the_existing_section_describing_subgroup_functions"><a class="anchor" href="#_modify_the_existing_section_describing_subgroup_functions"></a>38.4.1. Modify the Existing Section Describing Subgroup Functions</h4>
+<div class="paragraph">
+<p>Modify the first paragraph of this section, which describes <code>gentype</code> support for the subgroup <code>broadcast</code>, <code>scan</code>, and <code>reduction</code> functions, to add support for the scalar types <code>char</code>, <code>uchar</code>, <code>short</code>, and <code>ushort</code>, and to add built-in vector type support for <code>broadcast</code> only.
+The functions in the table and their descriptions remain unchanged by this extension:</p>
+</div>
+<div class="paragraph">
+<p>The table below describes OpenCL C programming language built-in functions that operate on a subgroup level.
+These built-in functions must be encountered by all work items in the subgroup executing the kernel.
+We use the generic type name <code>gentype</code> to indicate the built-in scalar data types <code>char</code>, <code>uchar</code>, <code>short</code>, <code>ushort</code>, <code>int</code>, <code>uint</code>, <code>long</code>, <code>ulong</code>, <code>float</code>, <code>double</code> (if double precision is supported), or <code>half</code> (if half precision is supported).</p>
+</div>
+<div class="paragraph">
+<p>For the <code>sub_group_broadcast</code> function, the generic type name <code>gentype</code> may additionally be one of the supported built-in vector data types <code>char<em>n</em></code>, <code>uchar<em>n</em></code>, <code>short<em>n</em></code>, <code>ushort<em>n</em></code>, <code>int<em>n</em></code>, <code>uint<em>n</em></code>, <code>long<em>n</em></code>, <code>ulong<em>n</em></code>, <code>float<em>n</em></code>, <code>double<em>n</em></code> (if double precision is supported), or <code>half<em>n</em></code> (if half precision is supported).</p>
+</div>
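+<div class="paragraph">
+<p>With the new 8-bit types, the identity value used by exclusive scans becomes type-specific. The host-side C sketch below models <code>sub_group_scan_exclusive_min</code> on <code>uchar</code> for one fully active subgroup. It is a CPU model with a hypothetical name, and it assumes, as for the existing exclusive scans, that the identity for <code>min</code> is the largest representable value of the type (here <code>UCHAR_MAX</code>).</p>
+</div>

```c
#include <limits.h>
#include <stddef.h>

/* CPU model of sub_group_scan_exclusive_min on uchar for one fully active
 * subgroup of n work items. Work item i receives the minimum of the values
 * held by work items 0..i-1; work item 0 receives the identity, which is
 * assumed to be UCHAR_MAX. */
static void model_scan_exclusive_min_uchar(const unsigned char *in,
                                           unsigned char *out, size_t n)
{
    unsigned char running = UCHAR_MAX;   /* identity for min */

    for (size_t i = 0; i < n; ++i) {
        out[i] = running;                /* exclusive: excludes in[i] */
        if (in[i] < running)
            running = in[i];
    }
}
```

+<div class="paragraph">
+<p>For inputs 7, 3, 9, 1 the model produces <code>UCHAR_MAX</code>, 7, 3, 3.</p>
+</div>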
+</div>
+</div>
+<div class="sect2">
+<h3 id="cl_khr_subgroup_non_uniform_vote"><a class="anchor" href="#cl_khr_subgroup_non_uniform_vote"></a>38.5. Votes and Elections</h3>
+<div class="paragraph">
+<p>This section describes functionality added by <code>cl_khr_subgroup_non_uniform_vote</code>.
+This extension adds the ability to elect a single work item from a subgroup to perform a task and to hold votes among work items in a subgroup.</p>
+</div>
+<div class="sect3">
+<h4 id="_add_a_new_section_6_15_x_subgroup_vote_and_elect_built_in_functions"><a class="anchor" href="#_add_a_new_section_6_15_x_subgroup_vote_and_elect_built_in_functions"></a>38.5.1. Add a new Section 6.15.X - Subgroup Vote and Elect Built-in Functions</h4>
+<div class="paragraph">
+<p>The table below describes the OpenCL C programming language built-in functions to elect a single work item in a subgroup to perform a task and to collectively vote to determine a boolean condition for the subgroup.
+These functions need not be encountered by all work items in a subgroup executing the kernel.
+For the functions below, the generic type name <code>gentype</code> may be one of the supported built-in scalar data types <code>char</code>, <code>uchar</code>, <code>short</code>, <code>ushort</code>, <code>int</code>, <code>uint</code>, <code>long</code>, <code>ulong</code>, <code>float</code>, <code>double</code> (if double precision is supported), or <code>half</code> (if half precision is supported).</p>
+</div>
+<table class="tableblock frame-all grid-all stretch">
+<colgroup>
+<col style="width: 50%;">
+<col style="width: 50%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top"><strong>Function</strong></th>
+<th class="tableblock halign-left valign-top"><strong>Description</strong></th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><div class="content"><div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c"><span class="predefined-type">int</span> sub_group_elect()</code></pre>
+</div>
+</div></div></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Elects a single work item in the subgroup to perform a task.
+This function returns true (non-zero) for the active work item in the subgroup with the smallest subgroup local ID, and false (zero) for all other active work items in the subgroup.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><div class="content"><div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c"><span class="predefined-type">int</span> sub_group_non_uniform_all(
+    <span class="predefined-type">int</span> predicate )</code></pre>
+</div>
+</div></div></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Examines <em>predicate</em> for all active work items in the subgroup and returns a non-zero value if <em>predicate</em> is non-zero for all active work items in the subgroup and zero otherwise.</p>
+<p class="tableblock">Note: This behavior is the same as <code>sub_group_all</code> from <code>cl_khr_subgroups</code> and OpenCL 2.1, except this function need not be encountered by all work items in the subgroup executing the kernel.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><div class="content"><div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c"><span class="predefined-type">int</span> sub_group_non_uniform_any(
+    <span class="predefined-type">int</span> predicate )</code></pre>
+</div>
+</div></div></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Examines <em>predicate</em> for all active work items in the subgroup and returns a non-zero value if <em>predicate</em> is non-zero for any active work item in the subgroup and zero otherwise.</p>
+<p class="tableblock">Note: This behavior is the same as <code>sub_group_any</code> from <code>cl_khr_subgroups</code> and OpenCL 2.1, except this function need not be encountered by all work items in the subgroup executing the kernel.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><div class="content"><div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c"><span class="predefined-type">int</span> sub_group_non_uniform_all_equal(
+    gentype value )</code></pre>
+</div>
+</div></div></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Examines <em>value</em> for all active work items in the subgroup and returns a non-zero value if <em>value</em> is equivalent for all active work items in the subgroup and zero otherwise.</p>
+<p class="tableblock">Integer types use a bitwise test for equality. Floating-point types use an ordered floating-point test for equality.</p></td>
+</tr>
+</tbody>
+</table>
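+<div class="paragraph">
+<p>The election and vote semantics in the table above can be modeled on the host to reason about non-uniform control flow. The following C sketch is a CPU model only: the function names are hypothetical, the subgroup is represented as arrays indexed by subgroup local ID, and <code>active[i] != 0</code> marks work item <em>i</em> as active. Unlike the built-in <code>sub_group_elect</code>, the model takes the querying work item's ID explicitly.</p>
+</div>

```c
#include <stddef.h>

/* CPU models over one subgroup of n work items; active[i] != 0 marks the
 * work item with subgroup local ID i as active. */

/* Model of sub_group_elect for work item id: true only for the active
 * work item with the smallest subgroup local ID. */
static int model_elect(const int *active, size_t n, size_t id)
{
    for (size_t i = 0; i < n; ++i)
        if (active[i])
            return i == id;
    return 0;
}

/* Model of sub_group_non_uniform_all: non-zero if predicate is non-zero
 * for every active work item; inactive work items are ignored. */
static int model_non_uniform_all(const int *active, const int *predicate,
                                 size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (active[i] && !predicate[i])
            return 0;
    return 1;
}

/* Model of sub_group_non_uniform_any: non-zero if predicate is non-zero
 * for at least one active work item. */
static int model_non_uniform_any(const int *active, const int *predicate,
                                 size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (active[i] && predicate[i])
            return 1;
    return 0;
}
```

+<div class="paragraph">
+<p>Because inactive work items are ignored, a vote can succeed even if an inactive work item would have voted differently.</p>
+</div>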
+</div>
+</div>
+<div class="sect2">
+<h3 id="cl_khr_subgroup_ballot"><a class="anchor" href="#cl_khr_subgroup_ballot"></a>38.6. Ballots</h3>
+<div class="paragraph">
+<p>This section describes functionality added by <code>cl_khr_subgroup_ballot</code>.
+This extension adds the ability to collect and operate on ballots from work items in the subgroup.</p>
+</div>
+<div class="sect3">
+<h4 id="_add_a_new_section_6_15_x_subgroup_ballot_built_in_functions"><a class="anchor" href="#_add_a_new_section_6_15_x_subgroup_ballot_built_in_functions"></a>38.6.1. Add a new Section 6.15.X - Subgroup Ballot Built-in Functions</h4>
+<div class="paragraph">
+<p>The table below describes the OpenCL C programming language built-in functions to allow work items in a subgroup to collect and operate on ballots from work items in the subgroup.
+These functions need not be encountered by all work items in a subgroup executing the kernel.</p>
+</div>
+<div class="paragraph">
+<p>For the <code>sub_group_non_uniform_broadcast</code> and <code>sub_group_broadcast_first</code> functions, the generic type name <code>gentype</code> may be one of the supported built-in scalar data types <code>char</code>, <code>uchar</code>, <code>short</code>, <code>ushort</code>, <code>int</code>, <code>uint</code>, <code>long</code>, <code>ulong</code>, <code>float</code>, <code>double</code> (if double precision is supported), or <code>half</code> (if half precision is supported).</p>
+</div>
+<div class="paragraph">
+<p>For the <code>sub_group_non_uniform_broadcast</code> function, the generic type name <code>gentype</code> may additionally be one of the supported built-in vector data types <code>char<em>n</em></code>, <code>uchar<em>n</em></code>, <code>short<em>n</em></code>, <code>ushort<em>n</em></code>, <code>int<em>n</em></code>, <code>uint<em>n</em></code>, <code>long<em>n</em></code>, <code>ulong<em>n</em></code>, <code>float<em>n</em></code>, <code>double<em>n</em></code> (if double precision is supported), or <code>half<em>n</em></code> (if half precision is supported).</p>
+</div>
+<table class="tableblock frame-all grid-all stretch">
+<colgroup>
+<col style="width: 50%;">
+<col style="width: 50%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top"><strong>Function</strong></th>
+<th class="tableblock halign-left valign-top"><strong>Description</strong></th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><div class="content"><div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">gentype sub_group_non_uniform_broadcast(
+    gentype value,
+    uint index )</code></pre>
+</div>
+</div></div></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns <em>value</em> for the work item with subgroup local ID equal to <em>index</em>.</p>
+<p class="tableblock">Behavior is undefined when the value of <em>index</em> is not equivalent for all active work items in the subgroup.</p>
+<p class="tableblock">The return value is undefined if the work item with subgroup local ID equal to <em>index</em> is inactive or if <em>index</em> is greater than or equal to the size of the subgroup.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><div class="content"><div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">gentype sub_group_broadcast_first(
+    gentype value )</code></pre>
+</div>
+</div></div></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns <em>value</em> for the work item with the smallest subgroup local ID among active work items in the subgroup.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><div class="content"><div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">uint4 sub_group_ballot(
+    <span class="predefined-type">int</span> predicate )</code></pre>
+</div>
+</div></div></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns a bitfield combining the <em>predicate</em> values from all work items in the subgroup.
+Bit zero of the first vector component represents the subgroup local ID zero, with higher-order bits and subsequent vector components representing, in order, increasing subgroup local IDs.
+The representative bit in the bitfield is set if the work item is active and the <em>predicate</em> is non-zero, and is unset otherwise.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><div class="content"><div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c"><span class="predefined-type">int</span> sub_group_inverse_ballot(
+    uint4 value )</code></pre>
+</div>
+</div></div></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the predicate value for this work item in the subgroup from the bitfield <em>value</em> representing predicate values from all work items in the subgroup.
+The predicate return value will be non-zero if the bit in the bitfield <em>value</em> for this work item is set, and zero otherwise.</p>
+<p class="tableblock">Behavior is undefined when <em>value</em> is not equivalent for all active work items in the subgroup.</p>
+<p class="tableblock">This is a specialized function that may perform better than the equivalent <code>sub_group_ballot_bit_extract</code> on some implementations.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><div class="content"><div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c"><span class="predefined-type">int</span> sub_group_ballot_bit_extract(
+    uint4 value,
+    uint index )</code></pre>
+</div>
+</div></div></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the predicate value for the work item with subgroup local ID equal to <em>index</em> from the bitfield <em>value</em> representing predicate values from all work items in the subgroup.
+The predicate return value will be non-zero if the bit in the bitfield <em>value</em> for the work item with subgroup local ID equal to <em>index</em> is set, and zero otherwise.</p>
+<p class="tableblock">The predicate return value is undefined if <em>index</em> is greater than or equal to the size of the subgroup.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><div class="content"><div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">uint sub_group_ballot_bit_count(
+    uint4 value )</code></pre>
+</div>
+</div></div></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the number of bits that are set in the bitfield <em>value</em>, considering only the bits in <em>value</em> that represent predicate values for work items in the subgroup.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><div class="content"><div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">uint sub_group_ballot_inclusive_scan(
+    uint4 value )</code></pre>
+</div>
+</div></div></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the number of bits that are set in the bitfield <em>value</em>, only considering the bits in <em>value</em> representing work items with a subgroup local ID less than or equal to this work item&#8217;s subgroup local ID.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><div class="content"><div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">uint sub_group_ballot_exclusive_scan(
+    uint4 value )</code></pre>
+</div>
+</div></div></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the number of bits that are set in the bitfield <em>value</em>, only considering the bits in <em>value</em> representing work items with a subgroup local ID less than this work item&#8217;s subgroup local ID.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><div class="content"><div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">uint sub_group_ballot_find_lsb(
+    uint4 value )</code></pre>
+</div>
+</div></div></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the smallest subgroup local ID with a bit set in the bitfield <em>value</em>, considering only the bits in <em>value</em> that represent predicate values for work items in the subgroup.
+If none of these bits are set in <em>value</em>, the return value is undefined.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><div class="content"><div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">uint sub_group_ballot_find_msb(
+    uint4 value )</code></pre>
+</div>
+</div></div></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the largest subgroup local ID with a bit set in the bitfield <em>value</em>, considering only the bits in <em>value</em> that represent predicate values for work items in the subgroup.
+If none of these bits are set in <em>value</em>, the return value is undefined.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><div class="content"><div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">uint4 get_sub_group_eq_mask()</code></pre>
+</div>
+</div></div></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Generates a bitmask of all work items in the subgroup, where a bit is set if its bit index equals this work item&#8217;s subgroup local ID and is unset otherwise.
+Bit zero of the first vector component represents the subgroup local ID zero, with higher-order bits and subsequent vector components representing, in order, increasing subgroup local IDs.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><div class="content"><div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">uint4 get_sub_group_ge_mask()</code></pre>
+</div>
+</div></div></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Generates a bitmask of all work items in the subgroup, where a bit is set if its bit index is greater than or equal to this work item&#8217;s subgroup local ID and less than the maximum subgroup size, and is unset otherwise.
+Bit zero of the first vector component represents the subgroup local ID zero, with higher-order bits and subsequent vector components representing, in order, increasing subgroup local IDs.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><div class="content"><div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">uint4 get_sub_group_gt_mask()</code></pre>
+</div>
+</div></div></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Generates a bitmask of all work items in the subgroup, where a bit is set if its bit index is greater than this work item&#8217;s subgroup local ID and less than the maximum subgroup size, and is unset otherwise.
+Bit zero of the first vector component represents the subgroup local ID zero, with higher-order bits and subsequent vector components representing, in order, increasing subgroup local IDs.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><div class="content"><div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">uint4 get_sub_group_le_mask()</code></pre>
+</div>
+</div></div></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Generates a bitmask of all work items in the subgroup, where a bit is set if its bit index is less than or equal to this work item&#8217;s subgroup local ID, and is unset otherwise.
+Bit zero of the first vector component represents the subgroup local ID zero, with higher-order bits and subsequent vector components representing, in order, increasing subgroup local IDs.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><div class="content"><div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">uint4 get_sub_group_lt_mask()</code></pre>
+</div>
+</div></div></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Generates a bitmask of all work items in the subgroup, where a bit is set if its bit index is less than this work item&#8217;s subgroup local ID, and is unset otherwise.
+Bit zero of the first vector component represents the subgroup local ID zero, with higher-order bits and subsequent vector components representing, in order, increasing subgroup local IDs.</p></td>
+</tr>
+</tbody>
+</table>
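+<div class="paragraph">
+<p>The ballot functions are often combined for stream compaction. Each work item whose predicate is true can use the <code>sub_group_ballot_exclusive_scan</code> of the ballot as its output slot, and <code>sub_group_ballot_bit_count</code> gives the number of elements the subgroup writes. The host-side C sketch below models that arithmetic. The following are assumptions of the sketch, not part of the extension: the <code>ballot4</code> struct standing in for <code>uint4</code>, the <code>model_*</code> helpers, and the use of <code>UINT32_MAX</code> in place of an undefined result.</p>
+</div>
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the OpenCL C uint4 ballot: v[0] holds
   subgroup local IDs 0..31 (bit 0 = ID 0), v[1] IDs 32..63, and so on. */
typedef struct { uint32_t v[4]; } ballot4;

/* Models sub_group_ballot: bit id is set iff work item id is active and
   its predicate is non-zero. n is the subgroup size (at most 128). */
static ballot4 model_ballot(const int *predicate, const int *active,
                            uint32_t n) {
    ballot4 b = {{0u, 0u, 0u, 0u}};
    for (uint32_t id = 0; id < n; ++id)
        if (active[id] && predicate[id] != 0)
            b.v[id / 32] |= 1u << (id % 32);
    return b;
}

/* Models sub_group_ballot_bit_extract. */
static int model_bit_extract(ballot4 b, uint32_t index) {
    return (int)((b.v[index / 32] >> (index % 32)) & 1u);
}

/* Counts the set bits for subgroup local IDs 0..limit-1. */
static uint32_t count_below(ballot4 b, uint32_t limit) {
    uint32_t count = 0;
    for (uint32_t id = 0; id < limit; ++id)
        count += (uint32_t)model_bit_extract(b, id);
    return count;
}

/* Models sub_group_ballot_bit_count, _exclusive_scan and _inclusive_scan. */
static uint32_t model_bit_count(ballot4 b, uint32_t sg_size) {
    return count_below(b, sg_size);
}
static uint32_t model_exclusive_scan(ballot4 b, uint32_t my_id) {
    return count_below(b, my_id);
}
static uint32_t model_inclusive_scan(ballot4 b, uint32_t my_id) {
    return count_below(b, my_id + 1);
}

/* Models sub_group_ballot_find_lsb / _find_msb. The real result is
   undefined when no relevant bit is set; this sketch returns UINT32_MAX. */
static uint32_t model_find_lsb(ballot4 b, uint32_t sg_size) {
    for (uint32_t id = 0; id < sg_size; ++id)
        if (model_bit_extract(b, id))
            return id;
    return UINT32_MAX;
}
static uint32_t model_find_msb(ballot4 b, uint32_t sg_size) {
    for (uint32_t id = sg_size; id-- > 0;)
        if (model_bit_extract(b, id))
            return id;
    return UINT32_MAX;
}
```
+<div class="paragraph">
+<p>Take a hypothetical 40-item subgroup where the predicate is true for every third subgroup local ID. The ballot then spans two vector components and the bit count is 14. Work item 33 would write to output slot 11.</p>
+</div>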
+</div>
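+<div class="paragraph">
+<p>The mask functions are typically ANDed with a ballot. For example, counting the set bits of <code>ballot &amp; get_sub_group_lt_mask()</code> should give the same result as <code>sub_group_ballot_exclusive_scan(ballot)</code>. The following C sketch models the five masks for a single work item so these relationships can be checked on the host. The <code>mask4</code> struct, the helper names, and the explicit maximum subgroup size parameter are assumptions of the sketch.</p>
+</div>
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for uint4: v[0] covers IDs 0..31, v[1] 32..63, ... */
typedef struct { uint32_t v[4]; } mask4;

/* Sets every bit whose index lies in [lo, hi), clamped to 128 bits. */
static mask4 range_mask(uint32_t lo, uint32_t hi) {
    mask4 m = {{0u, 0u, 0u, 0u}};
    for (uint32_t id = lo; id < hi && id < 128; ++id)
        m.v[id / 32] |= 1u << (id % 32);
    return m;
}

/* Models get_sub_group_{eq,ge,gt,le,lt}_mask for the work item my_id. */
static mask4 model_eq_mask(uint32_t my_id) { return range_mask(my_id, my_id + 1); }
static mask4 model_ge_mask(uint32_t my_id, uint32_t max_sg_size) {
    return range_mask(my_id, max_sg_size);
}
static mask4 model_gt_mask(uint32_t my_id, uint32_t max_sg_size) {
    return range_mask(my_id + 1, max_sg_size);
}
static mask4 model_le_mask(uint32_t my_id) { return range_mask(0, my_id + 1); }
static mask4 model_lt_mask(uint32_t my_id) { return range_mask(0, my_id); }

/* Component-wise AND, as `a & b` would do on two uint4 values. */
static mask4 and4(mask4 a, mask4 b) {
    mask4 r;
    for (int c = 0; c < 4; ++c)
        r.v[c] = a.v[c] & b.v[c];
    return r;
}

/* Counts the set bits across all four components. */
static uint32_t popcount4(mask4 m) {
    uint32_t n = 0;
    for (int c = 0; c < 4; ++c)
        for (uint32_t x = m.v[c]; x != 0; x &= x - 1)
            ++n;
    return n;
}
```
+<div class="paragraph">
+<p>For work item 35 with a maximum subgroup size of 64, the <strong>eq</strong> mask ORed with the <strong>gt</strong> mask equals the <strong>ge</strong> mask.</p>
+</div>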
+</div>
+<div class="sect2">
+<h3 id="cl_khr_subgroup_non_uniform_arithmetic"><a class="anchor" href="#cl_khr_subgroup_non_uniform_arithmetic"></a>38.7. Non-Uniform Arithmetic</h3>
+<div class="paragraph">
+<p>This section describes functionality added by <code>cl_khr_subgroup_non_uniform_arithmetic</code>.
+This extension adds the ability to use some subgroup functions within non-uniform flow control, including additional scan and reduction operators.</p>
+</div>
+<div class="sect3">
+<h4 id="_add_a_new_section_6_15_x_non_uniform_subgroup_scan_and_reduction_built_in_functions"><a class="anchor" href="#_add_a_new_section_6_15_x_non_uniform_subgroup_scan_and_reduction_built_in_functions"></a>38.7.1. Add a new Section 6.15.X - Non-Uniform Subgroup Scan and Reduction Built-in Functions</h4>
+<div class="sect4">
+<h5 id="_arithmetic_operations"><a class="anchor" href="#_arithmetic_operations"></a>38.7.1.1. Arithmetic Operations</h5>
+<div class="paragraph">
+<p>The table below describes the OpenCL C programming language built-in functions that perform simple arithmetic operations across work items in a subgroup.
+These functions need not be encountered by all work items in a subgroup executing the kernel.
+For the functions below, the generic type name <code>gentype</code> may be one of the supported built-in scalar data types <code>char</code>, <code>uchar</code>, <code>short</code>, <code>ushort</code>, <code>int</code>, <code>uint</code>, <code>long</code>, <code>ulong</code>, <code>float</code>, <code>double</code> (if double precision is supported), or <code>half</code> (if half precision is supported).</p>
+</div>
+<table class="tableblock frame-all grid-all stretch">
+<colgroup>
+<col style="width: 60%;">
+<col style="width: 40%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top"><strong>Function</strong></th>
+<th class="tableblock halign-left valign-top"><strong>Description</strong></th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><div class="content"><div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">gentype sub_group_non_uniform_reduce_add(
+    gentype value )
+gentype sub_group_non_uniform_reduce_min(
+    gentype value )
+gentype sub_group_non_uniform_reduce_max(
+    gentype value )
+gentype sub_group_non_uniform_reduce_mul(
+    gentype value )</code></pre>
+</div>
+</div></div></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the sum, minimum, maximum, or product of <em>value</em> for all active work items in the subgroup.</p>
+<p class="tableblock">Note: This behavior is the same as the <strong>add</strong>, <strong>min</strong>, and <strong>max</strong> reduction built-in functions from <code>cl_khr_subgroups</code> and OpenCL 2.1, except these functions support additional types and need not be encountered by all work items in the subgroup executing the kernel.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><div class="content"><div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">gentype sub_group_non_uniform_scan_inclusive_add(
+    gentype value )
+gentype sub_group_non_uniform_scan_inclusive_min(
+    gentype value )
+gentype sub_group_non_uniform_scan_inclusive_max(
+    gentype value )
+gentype sub_group_non_uniform_scan_inclusive_mul(
+    gentype value )</code></pre>
+</div>
+</div></div></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the result of an inclusive scan operation, which is the sum, minimum, maximum, or product of <em>value</em> for all active work items in the subgroup with a subgroup local ID less than or equal to this work item&#8217;s subgroup local ID.</p>
+<p class="tableblock">Note: This behavior is the same as the <strong>add</strong>, <strong>min</strong>, and <strong>max</strong> inclusive scan built-in functions from <code>cl_khr_subgroups</code> and OpenCL 2.1, except these functions support additional types and need not be encountered by all work items in the subgroup executing the kernel.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><div class="content"><div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">gentype sub_group_non_uniform_scan_exclusive_add(
+    gentype value )
+gentype sub_group_non_uniform_scan_exclusive_min(
+    gentype value )
+gentype sub_group_non_uniform_scan_exclusive_max(
+    gentype value )
+gentype sub_group_non_uniform_scan_exclusive_mul(
+    gentype value )</code></pre>
+</div>
+</div></div></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the result of an exclusive scan operation, which is the sum, minimum, maximum, or product of <em>value</em> for all active work items in the subgroup with a subgroup local ID less than this work item&#8217;s subgroup local ID.</p>
+<p class="tableblock">If there is no active work item in the subgroup with a subgroup local ID less than this work item&#8217;s subgroup local ID then an identity value <code>I</code> is returned.
+For <strong>add</strong>, the identity value is <code>0</code>.
+For <strong>min</strong>, the identity value is the largest representable value for integer types, or <code>+INF</code> for floating-point types.
+For <strong>max</strong>, the identity value is the smallest representable value for integer types, or <code>-INF</code> for floating-point types.
+For <strong>mul</strong>, the identity value is <code>1</code>.</p>
+<p class="tableblock">Note: This behavior is the same as the <strong>add</strong>, <strong>min</strong>, and <strong>max</strong> exclusive scan built-in functions from <code>cl_khr_subgroups</code> and OpenCL 2.1, except these functions support additional types and need not be encountered by all work items in the subgroup executing the kernel.</p></td>
+</tr>
+</tbody>
+</table>
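+<div class="paragraph">
+<p>To make the role of active work items and of the identity value concrete, the C sketch below models the <strong>add</strong>, <strong>min</strong>, and <strong>mul</strong> variants for <code>int</code> on the host. The <code>model_*</code> names and the <code>active</code> array are assumptions of the sketch. In OpenCL C, each work item would simply call, for example, <code>sub_group_non_uniform_scan_exclusive_add(x)</code> from inside a branch.</p>
+</div>
```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: index = subgroup local ID, and active[i] marks the
   work items that reached the call. my_id is the calling work item, which
   is active by definition. */

/* Models sub_group_non_uniform_scan_exclusive_add for int. */
static int model_scan_exclusive_add(const int *value, const bool *active,
                                    size_t my_id) {
    int acc = 0; /* identity for add */
    for (size_t i = 0; i < my_id; ++i)
        if (active[i])
            acc += value[i];
    return acc;
}

/* Inclusive scan = exclusive scan combined with the caller's own value. */
static int model_scan_inclusive_add(const int *value, const bool *active,
                                    size_t my_id) {
    return model_scan_exclusive_add(value, active, my_id) + value[my_id];
}

/* Models sub_group_non_uniform_scan_exclusive_min for int. */
static int model_scan_exclusive_min(const int *value, const bool *active,
                                    size_t my_id) {
    int acc = INT_MAX; /* identity for min: largest representable int */
    for (size_t i = 0; i < my_id; ++i)
        if (active[i] && value[i] < acc)
            acc = value[i];
    return acc;
}

/* Models sub_group_non_uniform_reduce_mul for int over n work items. */
static int model_reduce_mul(const int *value, const bool *active, size_t n) {
    int acc = 1; /* identity for mul */
    for (size_t i = 0; i < n; ++i)
        if (active[i])
            acc *= value[i];
    return acc;
}
```
+<div class="paragraph">
+<p>With values {5, 2, 9, 4} and work item 1 inactive, the value 2 contributes to neither the exclusive sum seen by work item 3 (5 + 9 = 14) nor the product (5 * 9 * 4 = 180).</p>
+</div>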
+<div class="paragraph">
+<p>Note: The order of floating-point operations is not guaranteed for the subgroup scan and reduction built-in functions that operate on floating-point types, and the order of operations may additionally be non-deterministic for a given subgroup.</p>
+</div>
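+<div class="paragraph">
+<p>Floating-point addition is not associative, so different reduction orders can produce different results. The short C sketch below shows two orders of summing the same three <code>double</code> values. The values are chosen for illustration only. The example assumes the compiler does not reassociate floating-point expressions (for example, no <code>-ffast-math</code>).</p>
+</div>
```c
#include <assert.h>

/* Two possible evaluation orders for the same three-term sum, standing in
   for two different reduction trees an implementation might use. */
static double sum_left_first(double a, double b, double c) {
    return (a + b) + c;
}

static double sum_right_first(double a, double b, double c) {
    return a + (b + c);
}
```
+<div class="paragraph">
+<p>With <em>a</em> = 1e20, <em>b</em> = -1e20, and <em>c</em> = 1.0, the first order yields 1.0. In the second order, -1e20 + 1.0 rounds back to -1e20, so the result is 0.0.</p>
+</div>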
+</div>
+<div class="sect4">
+<h5 id="_bitwise_operations"><a class="anchor" href="#_bitwise_operations"></a>38.7.1.2. Bitwise Operations</h5>
+<div class="paragraph">
+<p>The table below describes the OpenCL C programming language built-in functions that perform simple bitwise integer operations across work items in a subgroup.
+These functions need not be encountered by all work items in a subgroup executing the kernel.
+For the functions below, the generic type name <code>gentype</code> may be one of the supported built-in scalar data types <code>char</code>, <code>uchar</code>, <code>short</code>, <code>ushort</code>, <code>int</code>, <code>uint</code>, <code>long</code>, or <code>ulong</code>.</p>
+</div>
+<table class="tableblock frame-all grid-all stretch">
+<colgroup>
+<col style="width: 60%;">
+<col style="width: 40%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top"><strong>Function</strong></th>
+<th class="tableblock halign-left valign-top"><strong>Description</strong></th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><div class="content"><div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">gentype sub_group_non_uniform_reduce_and(
+    gentype value )
+gentype sub_group_non_uniform_reduce_or(
+    gentype value )
+gentype sub_group_non_uniform_reduce_xor(
+    gentype value )</code></pre>
+</div>
+</div></div></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the bitwise <strong>and</strong>, <strong>or</strong>, or <strong>xor</strong> of <em>value</em> for all active work items in the subgroup.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><div class="content"><div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">gentype sub_group_non_uniform_scan_inclusive_and(
+    gentype value )
+gentype sub_group_non_uniform_scan_inclusive_or(
+    gentype value )
+gentype sub_group_non_uniform_scan_inclusive_xor(
+    gentype value )</code></pre>
+</div>
+</div></div></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the result of an inclusive scan operation, which is the bitwise <strong>and</strong>, <strong>or</strong>, or <strong>xor</strong> of <em>value</em> for all active work items in the subgroup with a subgroup local ID less than or equal to this work item&#8217;s subgroup local ID.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><div class="content"><div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">gentype sub_group_non_uniform_scan_exclusive_and(
+    gentype value )
+gentype sub_group_non_uniform_scan_exclusive_or(
+    gentype value )
+gentype sub_group_non_uniform_scan_exclusive_xor(
+    gentype value )</code></pre>
+</div>
+</div></div></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the result of an exclusive scan operation, which is the bitwise <strong>and</strong>, <strong>or</strong>, or <strong>xor</strong> of <em>value</em> for all active work items in the subgroup with a subgroup local ID less than this work item&#8217;s subgroup local ID.</p>
+<p class="tableblock">If there is no active work item in the subgroup with a subgroup local ID less than this work item&#8217;s subgroup local ID then an identity value <code>I</code> is returned.
+For <strong>and</strong>, the identity value is <code>~0</code> (all bits set).
+For <strong>or</strong> and <strong>xor</strong>, the identity value is <code>0</code>.</p></td>
+</tr>
+</tbody>
+</table>
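+<div class="paragraph">
+<p>The host-side C sketch below models the bitwise exclusive scans and the <strong>xor</strong> reduction for <code>uint</code>. It shows in particular that a work item with no active predecessors receives the identity: all bits set for <strong>and</strong>, and zero for <strong>or</strong> and <strong>xor</strong>. The <code>model_*</code> names and the <code>active</code> array are assumptions of the sketch.</p>
+</div>
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: index = subgroup local ID, and active[i] marks the
   work items that reached the call; my_id is the calling (active) item. */

/* Models sub_group_non_uniform_scan_exclusive_and for uint. */
static unsigned model_scan_exclusive_and(const unsigned *value,
                                         const bool *active, size_t my_id) {
    unsigned acc = ~0u; /* identity for and: all bits set */
    for (size_t i = 0; i < my_id; ++i)
        if (active[i])
            acc &= value[i];
    return acc;
}

/* Models sub_group_non_uniform_scan_exclusive_or for uint. */
static unsigned model_scan_exclusive_or(const unsigned *value,
                                        const bool *active, size_t my_id) {
    unsigned acc = 0u; /* identity for or */
    for (size_t i = 0; i < my_id; ++i)
        if (active[i])
            acc |= value[i];
    return acc;
}

/* Models sub_group_non_uniform_scan_exclusive_xor for uint. */
static unsigned model_scan_exclusive_xor(const unsigned *value,
                                         const bool *active, size_t my_id) {
    unsigned acc = 0u; /* identity for xor */
    for (size_t i = 0; i < my_id; ++i)
        if (active[i])
            acc ^= value[i];
    return acc;
}

/* Models sub_group_non_uniform_reduce_xor for uint over n work items. */
static unsigned model_reduce_xor(const unsigned *value, const bool *active,
                                 size_t n) {
    unsigned acc = 0u;
    for (size_t i = 0; i < n; ++i)
        if (active[i])
            acc ^= value[i];
    return acc;
}
```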
+</div>
+<div class="sect4">
+<h5 id="_logical_operations"><a class="anchor" href="#_logical_operations"></a>38.7.1.3. Logical Operations</h5>
+<div class="paragraph">
+<p>The table below describes the OpenCL C programming language built-in functions that perform simple logical operations across work items in a subgroup.
+These functions need not be encountered by all work items in a subgroup executing the kernel.
+For these functions, a non-zero <em>predicate</em> argument or return value is logically <code>true</code> and a zero <em>predicate</em> argument or return value is logically <code>false</code>.</p>
+</div>
+<table class="tableblock frame-all grid-all stretch">
+<colgroup>
+<col style="width: 66.6666%;">
+<col style="width: 33.3334%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top"><strong>Function</strong></th>
+<th class="tableblock halign-left valign-top"><strong>Description</strong></th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><div class="content"><div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c"><span class="predefined-type">int</span> sub_group_non_uniform_reduce_logical_and(
+    <span class="predefined-type">int</span> predicate )
+<span class="predefined-type">int</span> sub_group_non_uniform_reduce_logical_or(
+    <span class="predefined-type">int</span> predicate )
+<span class="predefined-type">int</span> sub_group_non_uniform_reduce_logical_xor(
+    <span class="predefined-type">int</span> predicate )</code></pre>
+</div>
+</div></div></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the logical <strong>and</strong>, <strong>or</strong>, or <strong>xor</strong> of <em>predicate</em> for all active work items in the subgroup.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><div class="content"><div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c"><span class="predefined-type">int</span> sub_group_non_uniform_scan_inclusive_logical_and(
+    <span class="predefined-type">int</span> predicate )
+<span class="predefined-type">int</span> sub_group_non_uniform_scan_inclusive_logical_or(
+    <span class="predefined-type">int</span> predicate )
+<span class="predefined-type">int</span> sub_group_non_uniform_scan_inclusive_logical_xor(
+    <span class="predefined-type">int</span> predicate )</code></pre>
+</div>
+</div></div></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the result of an inclusive scan operation, which is the logical <strong>and</strong>, <strong>or</strong>, or <strong>xor</strong> of <em>predicate</em> for all active work items in the subgroup with a subgroup local ID less than or equal to this work item&#8217;s subgroup local ID.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><div class="content"><div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c"><span class="predefined-type">int</span> sub_group_non_uniform_scan_exclusive_logical_and(
+    <span class="predefined-type">int</span> predicate )
+<span class="predefined-type">int</span> sub_group_non_uniform_scan_exclusive_logical_or(
+    <span class="predefined-type">int</span> predicate )
+<span class="predefined-type">int</span> sub_group_non_uniform_scan_exclusive_logical_xor(
+    <span class="predefined-type">int</span> predicate )</code></pre>
+</div>
+</div></div></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the result of an exclusive scan operation, which is the logical <strong>and</strong>, <strong>or</strong>, or <strong>xor</strong> of <em>predicate</em> for all active work items in the subgroup with a subgroup local ID less than this work item&#8217;s subgroup local ID.</p>
+<p class="tableblock">If there is no active work item in the subgroup with a subgroup local ID less than this work item&#8217;s subgroup local ID then an identity value <code>I</code> is returned.
+For <strong>and</strong>, the identity value is <code>true</code> (non-zero).
+For <strong>or</strong> and <strong>xor</strong>, the identity value is <code>false</code> (zero).</p></td>
+</tr>
+</tbody>
+</table>
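+<div class="paragraph">
+<p>The logical variants treat any non-zero predicate as <code>true</code>. This can differ from applying the bitwise functions to raw predicate values: for predicates <code>2</code> and <code>1</code>, a bitwise <strong>and</strong> gives zero while a logical <strong>and</strong> gives <code>true</code>. The host-side C sketch below models the logical reductions. The <code>model_*</code> names and the <code>active</code> array are assumptions. The sketch returns <code>1</code> for <code>true</code>, although the functions only guarantee a non-zero value.</p>
+</div>
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Models sub_group_non_uniform_reduce_logical_and: each predicate is
   normalized to 0 or 1, then combined across active work items. */
static int model_reduce_logical_and(const int *predicate, const bool *active,
                                    size_t n) {
    int acc = 1; /* identity for logical and: true */
    for (size_t i = 0; i < n; ++i)
        if (active[i])
            acc = acc && (predicate[i] != 0);
    return acc;
}

/* Models sub_group_non_uniform_reduce_logical_or. */
static int model_reduce_logical_or(const int *predicate, const bool *active,
                                   size_t n) {
    int acc = 0; /* identity for logical or: false */
    for (size_t i = 0; i < n; ++i)
        if (active[i])
            acc = acc || (predicate[i] != 0);
    return acc;
}

/* Models sub_group_non_uniform_reduce_logical_xor: true iff an odd number
   of active work items have a non-zero predicate. */
static int model_reduce_logical_xor(const int *predicate, const bool *active,
                                    size_t n) {
    int acc = 0; /* identity for logical xor: false */
    for (size_t i = 0; i < n; ++i)
        if (active[i])
            acc ^= (predicate[i] != 0);
    return acc;
}
```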
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="cl_khr_subgroup_shuffle"><a class="anchor" href="#cl_khr_subgroup_shuffle"></a>38.8. General Purpose Shuffles</h3>
+<div class="paragraph">
+<p>This section describes functionality added by <code>cl_khr_subgroup_shuffle</code>.
+This extension adds additional ways to exchange data among work items in a subgroup.</p>
+</div>
+<div class="sect3">
+<h4 id="_add_a_new_section_6_15_x_subgroup_shuffle_built_in_functions"><a class="anchor" href="#_add_a_new_section_6_15_x_subgroup_shuffle_built_in_functions"></a>38.8.1. Add a new Section 6.15.X - Subgroup Shuffle Built-in Functions</h4>
+<div class="paragraph">
+<p>The table below describes the OpenCL C programming language built-in functions that allow work items in a subgroup to exchange data.
+These functions need not be encountered by all work items in a subgroup executing the kernel.
+For the functions below, the generic type name <code>gentype</code> may be one of the supported built-in scalar data types <code>char</code>, <code>uchar</code>, <code>short</code>, <code>ushort</code>, <code>int</code>, <code>uint</code>, <code>long</code>, <code>ulong</code>, <code>float</code>, <code>double</code> (if double precision is supported), or <code>half</code> (if half precision is supported).</p>
+</div>
+<table class="tableblock frame-all grid-all stretch">
+<colgroup>
+<col style="width: 50%;">
+<col style="width: 50%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top"><strong>Function</strong></th>
+<th class="tableblock halign-left valign-top"><strong>Description</strong></th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><div class="content"><div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">gentype sub_group_shuffle(
+    gentype value, uint index )</code></pre>
+</div>
+</div></div></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns <em>value</em> for the work item with subgroup local ID equal to <em>index</em>.
+The shuffle <em>index</em> need not be the same for all work items in the subgroup.</p>
+<p class="tableblock">The return value is undefined if the work item with subgroup local ID equal to <em>index</em> is inactive or if <em>index</em> is greater than or equal to the size of the subgroup.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><div class="content"><div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">gentype sub_group_shuffle_xor(
+    gentype value, uint mask )</code></pre>
+</div>
+</div></div></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns <em>value</em> for the work item with subgroup local ID equal to this work item&#8217;s subgroup local ID xor&#8217;d with <em>mask</em>.
+The shuffle <em>mask</em> need not be the same for all work items in the subgroup.</p>
+<p class="tableblock">The return value is undefined if the work item with subgroup local ID equal to the calculated index is inactive or if the calculated index is greater than or equal to the size of the subgroup.</p>
+<p class="tableblock">This is a specialized function that may perform better than the equivalent <code>sub_group_shuffle</code> on some implementations.</p></td>
+</tr>
+</tbody>
+</table>
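+<div class="paragraph">
+<p>A common use of <code>sub_group_shuffle_xor</code> is a butterfly all-reduce, in which every work item ends up holding the subgroup total after log2(<em>n</em>) exchange steps. In OpenCL C this could be written roughly as <code>for (uint m = get_sub_group_size() / 2; m &gt; 0; m &gt;&gt;= 1) x += sub_group_shuffle_xor(x, m);</code>. This assumes a power-of-two subgroup size with all work items active. The host-side C sketch below simulates that loop for a hypothetical subgroup of 8 work items; <code>SG_SIZE</code> and the <code>model_*</code> names are assumptions of the sketch.</p>
+</div>
```c
#include <assert.h>

#define SG_SIZE 8 /* hypothetical subgroup size; must be a power of two */

/* Models one sub_group_shuffle_xor step for every work item at once:
   out[id] receives the value held by work item id ^ mask. */
static void model_shuffle_xor(const int *in, int *out, unsigned mask) {
    for (unsigned id = 0; id < SG_SIZE; ++id)
        out[id] = in[id ^ mask];
}

/* Butterfly all-reduce: after log2(SG_SIZE) steps, every element holds
   the sum of all original elements. */
static void model_butterfly_sum(int *x) {
    int tmp[SG_SIZE];
    for (unsigned mask = SG_SIZE / 2; mask > 0; mask >>= 1) {
        model_shuffle_xor(x, tmp, mask); /* snapshot before any update */
        for (unsigned id = 0; id < SG_SIZE; ++id)
            x[id] += tmp[id];
    }
}
```
+<div class="paragraph">
+<p>Copying all values into a temporary before adding models the fact that every work item's shuffle observes the values from before that step.</p>
+</div>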
+</div>
+</div>
+<div class="sect2">
+<h3 id="cl_khr_subgroup_shuffle_relative"><a class="anchor" href="#cl_khr_subgroup_shuffle_relative"></a>38.9. Relative Shuffles</h3>
+<div class="paragraph">
+<p>This section describes functionality added by <code>cl_khr_subgroup_shuffle_relative</code>.
+This extension adds specialized ways to exchange data among work items in a subgroup that may perform better on some implementations.</p>
+</div>
+<div class="sect3">
+<h4 id="_add_a_new_section_6_15_x_subgroup_relative_shuffle_built_in_functions"><a class="anchor" href="#_add_a_new_section_6_15_x_subgroup_relative_shuffle_built_in_functions"></a>38.9.1. Add a new Section 6.15.X - Subgroup Relative Shuffle Built-in Functions</h4>
+<div class="paragraph">
+<p>The table below describes specialized OpenCL C programming language built-in functions that allow work items in a subgroup to exchange data.
+These functions need not be encountered by all work items in a subgroup executing the kernel.
+For the functions below, the generic type name <code>gentype</code> may be one of the supported built-in scalar data types <code>char</code>, <code>uchar</code>, <code>short</code>, <code>ushort</code>, <code>int</code>, <code>uint</code>, <code>long</code>, <code>ulong</code>, <code>float</code>, <code>double</code> (if double precision is supported), or <code>half</code> (if half precision is supported).</p>
+</div>
+<table class="tableblock frame-all grid-all stretch">
+<colgroup>
+<col style="width: 50%;">
+<col style="width: 50%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top"><strong>Function</strong></th>
+<th class="tableblock halign-left valign-top"><strong>Description</strong></th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><div class="content"><div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">gentype sub_group_shuffle_up(
+    gentype value, uint delta )</code></pre>
+</div>
+</div></div></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns <em>value</em> for the work item with subgroup local ID equal to this work item&#8217;s subgroup local ID minus <em>delta</em>.
+The shuffle <em>delta</em> need not be the same for all work items in the subgroup.</p>
+<p class="tableblock">The return value is undefined if the work item with subgroup local ID equal to the calculated index is inactive, or if <em>delta</em> is greater than this work item&#8217;s subgroup local ID.</p>
+<p class="tableblock">This is a specialized function that may perform better than the equivalent <code>sub_group_shuffle</code> on some implementations.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><div class="content"><div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">gentype sub_group_shuffle_down(
+    gentype value, uint delta )</code></pre>
+</div>
+</div></div></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns <em>value</em> for the work item with subgroup local ID equal to this work item&#8217;s subgroup local ID plus <em>delta</em>.
+The shuffle <em>delta</em> need not be the same for all work items in the subgroup.</p>
+<p class="tableblock">The return value is undefined if the work item with subgroup local ID equal to the calculated index is inactive, or if this work item&#8217;s subgroup local ID plus <em>delta</em> is greater than or equal to the size of the subgroup.</p>
+<p class="tableblock">This is a specialized function that may perform better than the equivalent <code>sub_group_shuffle</code> on some implementations.</p></td>
+</tr>
+</tbody>
+</table>
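The bounds rules for the relative shuffles can be made concrete with a small host-side C model. This is a sketch under stated assumptions, not OpenCL C: `model_shuffle_up`, `model_shuffle_down`, and `model_inclusive_scan` are hypothetical names, every work item is assumed active, values are `int`, and subgroups have at most 64 work items. The scan shows one common use of `sub_group_shuffle_up`, a Hillis-Steele inclusive prefix sum.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical host-side models of sub_group_shuffle_up and
 * sub_group_shuffle_down for a subgroup in which every work item is active.
 * values[i] is the argument passed by the work item whose subgroup local ID
 * is i. Each returns false where the specification leaves the result
 * undefined. */
static bool model_shuffle_up(const int *values, unsigned sg_size,
                             unsigned id, unsigned delta, int *out)
{
    assert(id < sg_size);
    if (delta > id)
        return false; /* source ID (id - delta) would be negative */
    *out = values[id - delta];
    return true;
}

static bool model_shuffle_down(const int *values, unsigned sg_size,
                               unsigned id, unsigned delta, int *out)
{
    assert(id < sg_size);
    if (delta >= sg_size - id)
        return false; /* source ID (id + delta) would be >= sg_size */
    *out = values[id + delta];
    return true;
}

/* Inclusive prefix sum (Hillis-Steele scan) built from shuffle_up.
 * A work item whose shuffle result is undefined keeps its current value.
 * Assumes sg_size <= 64. */
static void model_inclusive_scan(int *values, unsigned sg_size)
{
    int next[64];
    assert(sg_size <= 64);
    for (unsigned delta = 1; delta < sg_size; delta *= 2) {
        for (unsigned id = 0; id < sg_size; ++id) {
            int other;
            next[id] = values[id];
            if (model_shuffle_up(values, sg_size, id, delta, &other))
                next[id] += other;
        }
        for (unsigned id = 0; id < sg_size; ++id)
            values[id] = next[id];
    }
}
```

In a kernel, the analogous loop would typically call `sub_group_shuffle_up` unconditionally and add the result only when `get_sub_group_local_id()` is at least `delta`, so that the undefined results are discarded.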
+</div>
+</div>
+<div class="sect2">
+<h3 id="cl_khr_subgroup_clustered_reduce"><a class="anchor" href="#cl_khr_subgroup_clustered_reduce"></a>38.10. Clustered Reductions</h3>
+<div class="paragraph">
+<p>This section describes functionality added by <code>cl_khr_subgroup_clustered_reduce</code>.
+This extension adds support for clustered reductions that operate on a subset of work items in the subgroup.</p>
+</div>
+<div class="sect3">
+<h4 id="_add_a_new_section_6_15_x_subgroup_clustered_reduction_built_in_functions"><a class="anchor" href="#_add_a_new_section_6_15_x_subgroup_clustered_reduction_built_in_functions"></a>38.10.1. Add a new Section 6.15.X - Subgroup Clustered Reduction Built-in Functions</h4>
+<div class="paragraph">
+<p>This section describes reduction operations that are performed on a subset of the work items in a subgroup, referred to as a cluster.
+A cluster is described by a specified cluster size.
+Work items in a subgroup are assigned to clusters such that for cluster size <em>n</em>, the <em>n</em> work items in the subgroup with the smallest subgroup local IDs are assigned to the first cluster, then the next <em>n</em> work items in order of increasing subgroup local ID are assigned to the next cluster, and so on.
+The specified cluster size must be an integer constant expression that is a power of two.
+Behavior is undefined if the specified cluster size is greater than the maximum size of a subgroup within the dispatch.</p>
+</div>
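Because the cluster size is a power of two, the cluster that a work item belongs to can be computed with a division or a bit mask. The C helpers below are a hypothetical sketch of that arithmetic, not functions defined by OpenCL; `is_power_of_two`, `cluster_index`, and `cluster_first_id` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>

/* True if n is a nonzero power of two, as required of the cluster size. */
static bool is_power_of_two(unsigned n)
{
    return n != 0 && (n & (n - 1u)) == 0;
}

/* Index of the cluster holding subgroup local ID `id` for cluster size n:
 * IDs 0..n-1 form cluster 0, IDs n..2n-1 form cluster 1, and so on. */
static unsigned cluster_index(unsigned id, unsigned n)
{
    assert(is_power_of_two(n));
    return id / n;
}

/* Smallest subgroup local ID in the same cluster as `id`. Since n is a
 * power of two, this is id rounded down to a multiple of n, i.e. id with
 * its low log2(n) bits cleared. */
static unsigned cluster_first_id(unsigned id, unsigned n)
{
    assert(is_power_of_two(n));
    return id & ~(n - 1u);
}
```

For example, with a cluster size of 4, the work item with subgroup local ID 5 is in cluster 1, whose first member has subgroup local ID 4.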
+<div class="sect4">
+<h5 id="_arithmetic_operations_2"><a class="anchor" href="#_arithmetic_operations_2"></a>38.10.1.1. Arithmetic Operations</h5>
+<div class="paragraph">
+<p>The table below describes the OpenCL C programming language built-in functions that perform simple arithmetic operations on a cluster of work items in a subgroup.
+These functions need not be encountered by all work items in a subgroup executing the kernel.
+For the functions below, the generic type name <code>gentype</code> may be one of the supported built-in scalar data types <code>char</code>, <code>uchar</code>, <code>short</code>, <code>ushort</code>, <code>int</code>, <code>uint</code>, <code>long</code>, <code>ulong</code>, <code>float</code>, <code>double</code> (if double precision is supported), or <code>half</code> (if half precision is supported).</p>
+</div>
+<table class="tableblock frame-all grid-all stretch">
+<colgroup>
+<col style="width: 50%;">
+<col style="width: 50%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top"><strong>Function</strong></th>
+<th class="tableblock halign-left valign-top"><strong>Description</strong></th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><div class="content"><div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">gentype sub_group_clustered_reduce_add(
+    gentype value, uint clustersize )
+gentype sub_group_clustered_reduce_mul(
+    gentype value, uint clustersize )
+gentype sub_group_clustered_reduce_min(
+    gentype value, uint clustersize )
+gentype sub_group_clustered_reduce_max(
+    gentype value, uint clustersize )</code></pre>
+</div>
+</div></div></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the summation, multiplication, minimum, or maximum of <em>value</em> for all active work items in the subgroup within a cluster of the specified <em>clustersize</em>.</p></td>
+</tr>
+</tbody>
+</table>
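The following host-side C sketch models what two of these functions return for each work item. It assumes every work item is active and values are `int`; `cluster_bounds`, `model_clustered_reduce_add`, and `model_clustered_reduce_max` are hypothetical names. Clipping a trailing cluster to the subgroup size is an assumption of the model for subgroups whose size is not a multiple of the cluster size.

```c
#include <assert.h>

/* Computes the half-open range [first, end) of subgroup local IDs forming
 * the cluster that contains `id`. A trailing cluster is clipped to sg_size. */
static void cluster_bounds(unsigned sg_size, unsigned id, unsigned clustersize,
                           unsigned *first, unsigned *end)
{
    assert(clustersize != 0 && (clustersize & (clustersize - 1u)) == 0);
    assert(id < sg_size);
    *first = id & ~(clustersize - 1u);
    *end = *first + clustersize < sg_size ? *first + clustersize : sg_size;
}

/* Hypothetical model of sub_group_clustered_reduce_add for int values. */
static int model_clustered_reduce_add(const int *values, unsigned sg_size,
                                      unsigned id, unsigned clustersize)
{
    unsigned first, end;
    cluster_bounds(sg_size, id, clustersize, &first, &end);
    int sum = 0;
    for (unsigned i = first; i < end; ++i)
        sum += values[i];
    return sum;
}

/* Hypothetical model of sub_group_clustered_reduce_max for int values. */
static int model_clustered_reduce_max(const int *values, unsigned sg_size,
                                      unsigned id, unsigned clustersize)
{
    unsigned first, end;
    cluster_bounds(sg_size, id, clustersize, &first, &end);
    int best = values[first];
    for (unsigned i = first + 1; i < end; ++i)
        if (values[i] > best)
            best = values[i];
    return best;
}
```

Every work item in a cluster receives the same result, so with `values = {3, 1, 4, 1, 5, 9, 2, 6}` and a cluster size of 4, work items 0 through 3 all receive 9 from the add reduction.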
+<div class="paragraph">
+<p>Note: The order of floating-point operations is not guaranteed for the subgroup clustered reduction built-in functions that operate on floating-point types, and the order of operations may additionally be non-deterministic for a given subgroup.</p>
+</div>
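Why the order matters can be shown with ordinary C: floating-point addition is not associative, so two reduction orders over the same values may produce different results. In the sketch below, `sum_in_order` is a hypothetical helper standing in for one possible reduction order, and IEEE 754 single precision with round-to-nearest is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Sums x[order[0]], x[order[1]], ... left to right in float arithmetic,
 * mimicking one possible order a reduction could use. */
static float sum_in_order(const float *x, const size_t *order, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        assert(order[i] < n);
        acc += x[order[i]];
    }
    return acc;
}
```

With `x = {1.0e8f, 1.0f, -1.0e8f}`, the order `{0, 1, 2}` gives 0.0f because `1.0e8f + 1.0f` rounds back to `1.0e8f`, while the order `{0, 2, 1}` gives 1.0f.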
+</div>
+<div class="sect4">
+<h5 id="_bitwise_operations_2"><a class="anchor" href="#_bitwise_operations_2"></a>38.10.1.2. Bitwise Operations</h5>
+<div class="paragraph">
+<p>The table below describes the OpenCL C programming language built-in functions to perform simple bitwise integer operations across a cluster of work items in a subgroup.
+These functions need not be encountered by all work items in a subgroup executing the kernel.
+For the functions below, the generic type name <code>gentype</code> may be one of the supported built-in scalar data types <code>char</code>, <code>uchar</code>, <code>short</code>, <code>ushort</code>, <code>int</code>, <code>uint</code>, <code>long</code>, or <code>ulong</code>.</p>
+</div>
+<table class="tableblock frame-all grid-all stretch">
+<colgroup>
+<col style="width: 50%;">
+<col style="width: 50%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top"><strong>Function</strong></th>
+<th class="tableblock halign-left valign-top"><strong>Description</strong></th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><div class="content"><div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c">gentype sub_group_clustered_reduce_and(
+    gentype value, uint clustersize )
+gentype sub_group_clustered_reduce_or(
+    gentype value, uint clustersize )
+gentype sub_group_clustered_reduce_xor(
+    gentype value, uint clustersize )</code></pre>
+</div>
+</div></div></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the bitwise <strong>and</strong>, <strong>or</strong>, or <strong>xor</strong> of <em>value</em> for all active work items in the subgroup within a cluster of the specified <em>clustersize</em>.</p></td>
+</tr>
+</tbody>
+</table>
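A host-side C model of the clustered bitwise reductions follows the same pattern as the arithmetic ones. This is a sketch, not OpenCL C: `enum bit_op` and `model_clustered_reduce_bitwise` are hypothetical, every work item is assumed active, values are `unsigned int`, and a trailing cluster is clipped to the subgroup size.

```c
#include <assert.h>

enum bit_op { BIT_AND, BIT_OR, BIT_XOR };

/* Hypothetical model of sub_group_clustered_reduce_{and,or,xor}: combines
 * the values of all work items in the cluster containing `id`. */
static unsigned model_clustered_reduce_bitwise(const unsigned *values,
                                               unsigned sg_size, unsigned id,
                                               unsigned clustersize,
                                               enum bit_op op)
{
    assert(clustersize != 0 && (clustersize & (clustersize - 1u)) == 0);
    assert(id < sg_size);
    unsigned first = id & ~(clustersize - 1u);
    unsigned end = first + clustersize < sg_size ? first + clustersize : sg_size;
    unsigned acc = values[first];
    for (unsigned i = first + 1; i < end; ++i) {
        switch (op) {
        case BIT_AND: acc &= values[i]; break;
        case BIT_OR:  acc |= values[i]; break;
        case BIT_XOR: acc ^= values[i]; break;
        }
    }
    return acc;
}
```

Seeding the accumulator with the first member of the cluster avoids choosing an identity value per operation (all ones for and, zero for or and xor).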
+</div>
+<div class="sect4">
+<h5 id="_logical_operations_2"><a class="anchor" href="#_logical_operations_2"></a>38.10.1.3. Logical Operations</h5>
+<div class="paragraph">
+<p>The table below describes the OpenCL C programming language built-in functions to perform simple logical operations across a cluster of work items in a subgroup.
+These functions need not be encountered by all work items in a subgroup executing the kernel.
+For these functions, a non-zero <em>predicate</em> argument or return value is logically <code>true</code> and a zero <em>predicate</em> argument or return value is logically <code>false</code>.</p>
+</div>
+<table class="tableblock frame-all grid-all stretch">
+<colgroup>
+<col style="width: 60%;">
+<col style="width: 40%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top"><strong>Function</strong></th>
+<th class="tableblock halign-left valign-top"><strong>Description</strong></th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><div class="content"><div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="c"><span class="predefined-type">int</span> sub_group_clustered_reduce_logical_and(
+    <span class="predefined-type">int</span> predicate, uint clustersize )
+<span class="predefined-type">int</span> sub_group_clustered_reduce_logical_or(
+    <span class="predefined-type">int</span> predicate, uint clustersize )
+<span class="predefined-type">int</span> sub_group_clustered_reduce_logical_xor(
+    <span class="predefined-type">int</span> predicate, uint clustersize )</code></pre>
+</div>
+</div></div></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Returns the logical <strong>and</strong>, <strong>or</strong>, or <strong>xor</strong> of <em>predicate</em> for all active work items in the subgroup within a cluster of the specified <em>clustersize</em>.</p></td>
+</tr>
+</tbody>
+</table>
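The logical variants differ from the bitwise ones in that each predicate is first reduced to true or false. The C sketch below models this for a fully active subgroup; `enum logical_op` and `model_clustered_reduce_logical` are hypothetical names. The specification only requires a non-zero return value for true, so returning exactly 1 is a choice made by this model.

```c
#include <assert.h>

enum logical_op { LOGICAL_AND, LOGICAL_OR, LOGICAL_XOR };

/* Hypothetical model of sub_group_clustered_reduce_logical_{and,or,xor}.
 * Non-zero predicates count as true; the model returns 1 for true and 0
 * for false. */
static int model_clustered_reduce_logical(const int *predicates,
                                          unsigned sg_size, unsigned id,
                                          unsigned clustersize,
                                          enum logical_op op)
{
    assert(clustersize != 0 && (clustersize & (clustersize - 1u)) == 0);
    assert(id < sg_size);
    unsigned first = id & ~(clustersize - 1u);
    unsigned end = first + clustersize < sg_size ? first + clustersize : sg_size;
    int acc = predicates[first] != 0;
    for (unsigned i = first + 1; i < end; ++i) {
        int p = predicates[i] != 0;
        switch (op) {
        case LOGICAL_AND: acc = acc && p; break;
        case LOGICAL_OR:  acc = acc || p; break;
        case LOGICAL_XOR: acc = acc != p; break; /* true if an odd number are true */
        }
    }
    return acc;
}
```

Normalizing each predicate with `!= 0` is what makes the logical xor differ from the bitwise xor: the predicates 1 and 2 are both true, so their logical xor is false even though `1 ^ 2` is non-zero.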
+</div>
+</div>
+</div>
+<div class="sect2">
+<h3 id="extended-subgroups-mapping"><a class="anchor" href="#extended-subgroups-mapping"></a>38.11. Function Mapping and Capabilities</h3>
+<div class="paragraph">
+<p>This section describes a possible mapping from OpenCL C built-in functions to SPIR-V built-in variables or instructions, together with the SPIR-V capabilities required to use them.</p>
+</div>
+<div class="paragraph">
+<p>This section is informational and non-normative.</p>
+</div>
+<table class="tableblock frame-all grid-all stretch">
+<colgroup>
+<col style="width: 33.3333%;">
+<col style="width: 33.3333%;">
+<col style="width: 33.3334%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top"><strong>OpenCL C Function</strong></th>
+<th class="tableblock halign-left valign-top"><strong>SPIR-V BuiltIn or Instruction</strong></th>
+<th class="tableblock halign-left valign-top"><strong>Enabling SPIR-V Capability</strong></th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top" colspan="3"><p class="tableblock">For OpenCL 2.1 or <code>cl_khr_subgroups</code>:</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>get_&#8203;sub_&#8203;group_&#8203;size</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>SubgroupSize</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Kernel</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>get_&#8203;max_&#8203;sub_&#8203;group_&#8203;size</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>SubgroupMaxSize</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Kernel</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>get_&#8203;num_&#8203;sub_&#8203;groups</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>NumSubgroups</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Kernel</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>get_&#8203;enqueued_&#8203;num_&#8203;sub_&#8203;groups</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>NumEnqueuedSubgroups</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Kernel</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>get_&#8203;sub_&#8203;group_&#8203;id</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>SubgroupId</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Kernel</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>get_&#8203;sub_&#8203;group_&#8203;local_&#8203;id</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>SubgroupLocalInvocationId</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Kernel</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;barrier</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpControlBarrier</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">None Needed</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;all</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupAll</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Groups</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;any</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupAny</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Groups</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;broadcast</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupBroadcast</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Groups</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;reduce_&#8203;add</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupIAdd</strong>, <strong>OpGroupFAdd</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Groups</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;reduce_&#8203;min</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupSMin</strong>, <strong>OpGroupUMin</strong>, <strong>OpGroupFMin</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Groups</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;reduce_&#8203;max</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupSMax</strong>, <strong>OpGroupUMax</strong>, <strong>OpGroupFMax</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Groups</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;scan_&#8203;exclusive_&#8203;add</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupIAdd</strong>, <strong>OpGroupFAdd</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Groups</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;scan_&#8203;exclusive_&#8203;min</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupSMin</strong>, <strong>OpGroupUMin</strong>, <strong>OpGroupFMin</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Groups</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;scan_&#8203;exclusive_&#8203;max</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupSMax</strong>, <strong>OpGroupUMax</strong>, <strong>OpGroupFMax</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Groups</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;scan_&#8203;inclusive_&#8203;add</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupIAdd</strong>, <strong>OpGroupFAdd</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Groups</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;scan_&#8203;inclusive_&#8203;min</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupSMin</strong>, <strong>OpGroupUMin</strong>, <strong>OpGroupFMin</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Groups</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;scan_&#8203;inclusive_&#8203;max</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupSMax</strong>, <strong>OpGroupUMax</strong>, <strong>OpGroupFMax</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Groups</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;reserve_&#8203;read_&#8203;pipe</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupReserveReadPipePackets</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Pipes</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;reserve_&#8203;write_&#8203;pipe</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupReserveWritePipePackets</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Pipes</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;commit_&#8203;read_&#8203;pipe</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupCommitReadPipe</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Pipes</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;commit_&#8203;write_&#8203;pipe</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupCommitWritePipe</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Pipes</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>get_&#8203;kernel_&#8203;sub_&#8203;group_&#8203;count_&#8203;for_&#8203;ndrange</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGetKernelNDrangeSubGroupCount</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>DeviceEnqueue</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>get_&#8203;kernel_&#8203;max_&#8203;sub_&#8203;group_&#8203;size_&#8203;for_&#8203;ndrange</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGetKernelNDrangeMaxSubGroupSize</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>DeviceEnqueue</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top" colspan="3"><p class="tableblock">For <code>cl_khr_subgroup_extended_types</code>:<br>
+Note: This extension adds new types to uniform subgroup operations.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;broadcast</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupBroadcast</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Groups</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;reduce_&#8203;add</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupIAdd</strong>, <strong>OpGroupFAdd</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Groups</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;reduce_&#8203;min</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupSMin</strong>, <strong>OpGroupUMin</strong>, <strong>OpGroupFMin</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Groups</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;reduce_&#8203;max</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupSMax</strong>, <strong>OpGroupUMax</strong>, <strong>OpGroupFMax</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Groups</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;scan_&#8203;exclusive_&#8203;add</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupIAdd</strong>, <strong>OpGroupFAdd</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Groups</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;scan_&#8203;exclusive_&#8203;min</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupSMin</strong>, <strong>OpGroupUMin</strong>, <strong>OpGroupFMin</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Groups</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;scan_&#8203;exclusive_&#8203;max</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupSMax</strong>, <strong>OpGroupUMax</strong>, <strong>OpGroupFMax</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Groups</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;scan_&#8203;inclusive_&#8203;add</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupIAdd</strong>, <strong>OpGroupFAdd</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Groups</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;scan_&#8203;inclusive_&#8203;min</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupSMin</strong>, <strong>OpGroupUMin</strong>, <strong>OpGroupFMin</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Groups</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;scan_&#8203;inclusive_&#8203;max</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupSMax</strong>, <strong>OpGroupUMax</strong>, <strong>OpGroupFMax</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Groups</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top" colspan="3"><p class="tableblock">For <code>cl_khr_subgroup_non_uniform_vote</code>:</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;elect</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformElect</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniform</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;non_&#8203;uniform_&#8203;all</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformAll</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformVote</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;non_&#8203;uniform_&#8203;any</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformAny</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformVote</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;non_&#8203;uniform_&#8203;all_&#8203;equal</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformAllEqual</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformVote</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top" colspan="3"><p class="tableblock">For <code>cl_khr_subgroup_ballot</code>:</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;non_&#8203;uniform_&#8203;broadcast</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformBroadcast</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformBallot</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;broadcast_&#8203;first</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformBroadcastFirst</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformBallot</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;ballot</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformBallot</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformBallot</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;inverse_&#8203;ballot</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformInverseBallot</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformBallot</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;ballot_&#8203;bit_&#8203;extract</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformBallotBitExtract</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformBallot</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;ballot_&#8203;bit_&#8203;count</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformBallotBitCount</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformBallot</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;ballot_&#8203;inclusive_&#8203;scan</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformBallotBitCount</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformBallot</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;ballot_&#8203;exclusive_&#8203;scan</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformBallotBitCount</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformBallot</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;ballot_&#8203;find_&#8203;lsb</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformBallotFindLSB</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformBallot</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;ballot_&#8203;find_&#8203;msb</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformBallotFindMSB</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformBallot</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>get_&#8203;sub_&#8203;group_&#8203;eq_&#8203;mask</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>SubgroupEqMask</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformBallot</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>get_&#8203;sub_&#8203;group_&#8203;ge_&#8203;mask</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>SubgroupGeMask</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformBallot</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>get_&#8203;sub_&#8203;group_&#8203;gt_&#8203;mask</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>SubgroupGtMask</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformBallot</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>get_&#8203;sub_&#8203;group_&#8203;le_&#8203;mask</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>SubgroupLeMask</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformBallot</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>get_&#8203;sub_&#8203;group_&#8203;lt_&#8203;mask</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>SubgroupLtMask</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformBallot</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top" colspan="3"><p class="tableblock">For <code>cl_khr_subgroup_non_uniform_arithmetic</code>:</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;non_&#8203;uniform_&#8203;reduce_&#8203;add</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformIAdd</strong>, <strong>OpGroupNonUniformFAdd</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformArithmetic</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;non_&#8203;uniform_&#8203;reduce_&#8203;mul</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformIMul</strong>, <strong>OpGroupNonUniformFMul</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformArithmetic</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;non_&#8203;uniform_&#8203;reduce_&#8203;min</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformSMin</strong>, <strong>OpGroupNonUniformUMin</strong>, <strong>OpGroupNonUniformFMin</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformArithmetic</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;non_&#8203;uniform_&#8203;reduce_&#8203;max</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformSMax</strong>, <strong>OpGroupNonUniformUMax</strong>, <strong>OpGroupNonUniformFMax</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformArithmetic</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;non_&#8203;uniform_&#8203;reduce_&#8203;and</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformBitwiseAnd</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformArithmetic</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;non_&#8203;uniform_&#8203;reduce_&#8203;or</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformBitwiseOr</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformArithmetic</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;non_&#8203;uniform_&#8203;reduce_&#8203;xor</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformBitwiseXor</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformArithmetic</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;non_&#8203;uniform_&#8203;reduce_&#8203;logical_&#8203;and</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformLogicalAnd</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformArithmetic</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;non_&#8203;uniform_&#8203;reduce_&#8203;logical_&#8203;or</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformLogicalOr</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformArithmetic</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;non_&#8203;uniform_&#8203;reduce_&#8203;logical_&#8203;xor</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformLogicalXor</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformArithmetic</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;non_&#8203;uniform_&#8203;scan_&#8203;inclusive_&#8203;add</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformIAdd</strong>, <strong>OpGroupNonUniformFAdd</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformArithmetic</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;non_&#8203;uniform_&#8203;scan_&#8203;inclusive_&#8203;mul</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformIMul</strong>, <strong>OpGroupNonUniformFMul</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformArithmetic</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;non_&#8203;uniform_&#8203;scan_&#8203;inclusive_&#8203;min</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformSMin</strong>, <strong>OpGroupNonUniformUMin</strong>, <strong>OpGroupNonUniformFMin</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformArithmetic</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;non_&#8203;uniform_&#8203;scan_&#8203;inclusive_&#8203;max</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformSMax</strong>, <strong>OpGroupNonUniformUMax</strong>, <strong>OpGroupNonUniformFMax</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformArithmetic</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;non_&#8203;uniform_&#8203;scan_&#8203;inclusive_&#8203;and</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformBitwiseAnd</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformArithmetic</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;non_&#8203;uniform_&#8203;scan_&#8203;inclusive_&#8203;or</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformBitwiseOr</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformArithmetic</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;non_&#8203;uniform_&#8203;scan_&#8203;inclusive_&#8203;xor</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformBitwiseXor</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformArithmetic</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;non_&#8203;uniform_&#8203;scan_&#8203;inclusive_&#8203;logical_&#8203;and</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformLogicalAnd</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformArithmetic</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;non_&#8203;uniform_&#8203;scan_&#8203;inclusive_&#8203;logical_&#8203;or</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformLogicalOr</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformArithmetic</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;non_&#8203;uniform_&#8203;scan_&#8203;inclusive_&#8203;logical_&#8203;xor</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformLogicalXor</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformArithmetic</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;non_&#8203;uniform_&#8203;scan_&#8203;exclusive_&#8203;add</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformIAdd</strong>, <strong>OpGroupNonUniformFAdd</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformArithmetic</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;non_&#8203;uniform_&#8203;scan_&#8203;exclusive_&#8203;mul</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformIMul</strong>, <strong>OpGroupNonUniformFMul</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformArithmetic</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;non_&#8203;uniform_&#8203;scan_&#8203;exclusive_&#8203;min</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformSMin</strong>, <strong>OpGroupNonUniformUMin</strong>, <strong>OpGroupNonUniformFMin</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformArithmetic</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;non_&#8203;uniform_&#8203;&#8203;scan_&#8203;exclusive_&#8203;max</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformSMax</strong>, <strong>OpGroupNonUniformUMax</strong>, <strong>OpGroupNonUniformFMax</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformArithmetic</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;non_&#8203;uniform_&#8203;&#8203;scan_&#8203;exclusive_&#8203;and</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformBitwiseAnd</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformArithmetic</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;non_&#8203;uniform_&#8203;&#8203;scan_&#8203;exclusive_&#8203;or</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformBitwiseOr</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformArithmetic</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;non_&#8203;uniform_&#8203;&#8203;scan_&#8203;exclusive_&#8203;xor</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformBitwiseXor</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformArithmetic</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;non_&#8203;uniform_&#8203;&#8203;scan_&#8203;exclusive_&#8203;logical_&#8203;and</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformLogicalAnd</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformArithmetic</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;non_&#8203;uniform_&#8203;&#8203;scan_&#8203;exclusive_&#8203;logical_&#8203;or</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformLogicalOr</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformArithmetic</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;non_&#8203;uniform_&#8203;&#8203;scan_&#8203;exclusive_&#8203;logical_&#8203;xor</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformLogicalXor</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformArithmetic</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top" colspan="3"><p class="tableblock">For <code>cl_khr_subgroup_shuffle</code>:</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;shuffle</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformShuffle</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformShuffle</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;shuffle_&#8203;xor</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformShuffleXor</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformShuffle</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top" colspan="3"><p class="tableblock">For <code>cl_khr_subgroup_shuffle_relative</code>:</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;shuffle_&#8203;up</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformShuffleUp</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformShuffleRelative</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;shuffle_&#8203;down</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformShuffleDown</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformShuffleRelative</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top" colspan="3"><p class="tableblock">For <code>cl_khr_subgroup_clustered_reduce</code>:</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;reduce_&#8203;clustered_&#8203;add</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformIAdd</strong>, <strong>OpGroupNonUniformFAdd</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformClustered</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;reduce_&#8203;clustered_&#8203;mul</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformIMul</strong>, <strong>OpGroupNonUniformFMul</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformClustered</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;reduce_&#8203;clustered_&#8203;min</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformSMin</strong>, <strong>OpGroupNonUniformUMin</strong>, <strong>OpGroupNonUniformFMin</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformClustered</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;reduce_&#8203;clustered_&#8203;max</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformSMax</strong>, <strong>OpGroupNonUniformUMax</strong>, <strong>OpGroupNonUniformFMax</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformClustered</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;reduce_&#8203;clustered_&#8203;and</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformBitwiseAnd</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformClustered</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;reduce_&#8203;clustered_&#8203;or</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformBitwiseOr</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformClustered</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;reduce_&#8203;clustered_&#8203;xor</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformBitwiseXor</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformClustered</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;reduce_&#8203;clustered_&#8203;logical_&#8203;and</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformLogicalAnd</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformClustered</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;reduce_&#8203;clustered_&#8203;logical_&#8203;or</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformLogicalOr</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformClustered</strong></p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>sub_&#8203;group_&#8203;reduce_&#8203;clustered_&#8203;logical_&#8203;xor</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>OpGroupNonUniformLogicalXor</strong></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>GroupNonUniformClustered</strong></p></td>
+</tr>
+</tbody>
+</table>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="spirv_extensions"><a class="anchor" href="#spirv_extensions"></a>39. Extensions to the OpenCL SPIR-V Environment</h2>
 <div class="sectionbody">
 <div class="paragraph">
 <p>An OpenCL SPIR-V environment may be modified by OpenCL extensions.
@@ -17684,6 +19583,11 @@
 <td class="tableblock halign-left valign-top"><p class="tableblock">Provisional Extension</p></td>
 </tr>
 <tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><a href="#cl_khr_extended_versioning">cl_khr_extended_versioning</a></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Extended Versioning of Platforms, Devices, Extensions, etc.</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Extension</p></td>
+</tr>
+<tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><a href="#cl_khr_fp16">cl_khr_fp16</a></p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">Operations on 16-bit Floating-Point Values</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">Extension</p></td>
@@ -17794,14 +19698,49 @@
 <td class="tableblock halign-left valign-top"><p class="tableblock">Extension</p></td>
 </tr>
 <tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><a href="#cl_khr_subgroups">cl_khr_subgroups</a></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Sub-Groupings of Work Items</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Core Feature in OpenCL 2.1 (with minor changes)</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><a href="#cl_khr_subgroup_ballot">cl_khr_subgroup_ballot</a></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Exchange Ballots Among Sub-Groupings of Work Items</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Extension</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><a href="#cl_khr_subgroup_clustered_reduce">cl_khr_subgroup_clustered_reduce</a></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Clustered Reductions for Sub-Groupings of Work Items</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Extension</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><a href="#cl_khr_subgroup_extended_types">cl_khr_subgroup_extended_types</a></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Additional Type Support for Sub-Group Functions</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Extension</p></td>
+</tr>
+<tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><a href="#cl_khr_subgroup_named_barrier">cl_khr_subgroup_named_barrier</a></p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">Barriers for Subsets of a Work Group</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">Extension</p></td>
 </tr>
 <tr>
-<td class="tableblock halign-left valign-top"><p class="tableblock"><a href="#cl_khr_subgroups">cl_khr_subgroups</a></p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Sub-Groupings of Work Items</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Core Feature in OpenCL 2.1 (with minor changes)</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><a href="#cl_khr_subgroup_non_uniform_arithmetic">cl_khr_subgroup_non_uniform_arithmetic</a></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Sub-Group Arithmetic Functions in Non-Uniform Control Flow</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Extension</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><a href="#cl_khr_subgroup_non_uniform_vote">cl_khr_subgroup_non_uniform_vote</a></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Hold Votes Among Sub-Groupings of Work Items</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Extension</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><a href="#cl_khr_subgroup_shuffle">cl_khr_subgroup_shuffle</a></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">General-Purpose Shuffles Among Sub-Groupings of Work Items</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Extension</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><a href="#cl_khr_subgroup_shuffle_relative">cl_khr_subgroup_shuffle_relative</a></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Relative Shuffles Among Sub-Groupings of Work Items</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Extension</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><a href="#cl_khr_terminate_context">cl_khr_terminate_context</a></p></td>
@@ -17820,8 +19759,8 @@
 </div>
 <div id="footer">
 <div id="footer-text">
-Version v3.0.5<br>
-Last updated 2020-09-29 09:40:19 -0700
+Version v3.0.6<br>
+Last updated 2020-12-18 09:20:25 -0800
 </div>
 </div>
 
diff --git a/specs/3.0-unified/pdf/OpenCL_API.pdf b/specs/3.0-unified/pdf/OpenCL_API.pdf
index ca42dd1..a3cbfb5 100644
--- a/specs/3.0-unified/pdf/OpenCL_API.pdf
+++ b/specs/3.0-unified/pdf/OpenCL_API.pdf
Binary files differ
diff --git a/specs/3.0-unified/pdf/OpenCL_C.pdf b/specs/3.0-unified/pdf/OpenCL_C.pdf
index 9da1b86..70b71fd 100644
--- a/specs/3.0-unified/pdf/OpenCL_C.pdf
+++ b/specs/3.0-unified/pdf/OpenCL_C.pdf
Binary files differ
diff --git a/specs/3.0-unified/pdf/OpenCL_Env.pdf b/specs/3.0-unified/pdf/OpenCL_Env.pdf
index cee7944..a07337e 100644
--- a/specs/3.0-unified/pdf/OpenCL_Env.pdf
+++ b/specs/3.0-unified/pdf/OpenCL_Env.pdf
Binary files differ
diff --git a/specs/3.0-unified/pdf/OpenCL_Ext.pdf b/specs/3.0-unified/pdf/OpenCL_Ext.pdf
index 38010a8..ab5b09e 100644
--- a/specs/3.0-unified/pdf/OpenCL_Ext.pdf
+++ b/specs/3.0-unified/pdf/OpenCL_Ext.pdf
Binary files differ