| <?xml version="1.0" encoding="UTF-8"?> |
| <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook MathML Module V1.1b1//EN" |
| "http://www.oasis-open.org/docbook/xml/mathml/1.1CR1/dbmathml.dtd"> |
| <refentry> |
| <refentryinfo> |
| <keywordset> |
| <keyword> |
| Function Qualifiers |
| </keyword> |
| </keywordset> |
| </refentryinfo> |
| |
| <refmeta> |
| <refentrytitle>Function Qualifiers</refentrytitle> |
| |
| <refmiscinfo> |
| <copyright> |
| <year>2007-2009</year> |
| <holder>The Khronos Group Inc. |
| Permission is hereby granted, free of charge, to any person obtaining a |
| copy of this software and/or associated documentation files (the |
| "Materials"), to deal in the Materials without restriction, including |
| without limitation the rights to use, copy, modify, merge, publish, |
| distribute, sublicense, and/or sell copies of the Materials, and to |
| permit persons to whom the Materials are furnished to do so, subject to |
| the condition that this copyright notice and permission notice shall be included |
| in all copies or substantial portions of the Materials.</holder> |
| </copyright> |
| </refmiscinfo> |
| <manvolnum>2</manvolnum> |
| </refmeta> |
| |
| <!-- ================================ SYNOPSIS --> |
| |
| <refnamediv id="Function Qualifiers"> |
| <refname>Function Qualifiers</refname> |
| |
| <refpurpose> |
| Qualifiers for kernel functions. |
| </refpurpose> |
| </refnamediv> |
| |
| |
| <!-- ALTERNATIVE SYNTAX SYNOPSIS (NON-FUNCTION) --> |
| <refsect2 id="synopsis"> |
| <title> |
| </title> |
| |
| <informaltable frame="none"> |
| <tgroup cols="1" align="left" colsep="0" rowsep="0"> |
| <colspec colname="col1" colnum="1" /> |
| <tbody> |
| <row> |
| <entry> |
| __kernel |
| kernel |
| |
| __attribute__((vec_type_hint(&lt;type<emphasis>n</emphasis>&gt;))) |
| __attribute__((work_group_size_hint(<emphasis>X</emphasis>, <emphasis>Y</emphasis>, <emphasis>Z</emphasis>))) |
| __attribute__((reqd_work_group_size(<emphasis>X</emphasis>, <emphasis>Y</emphasis>, <emphasis>Z</emphasis>))) |
| </entry> |
| </row> |
| </tbody> |
| </tgroup> |
| </informaltable> |
| </refsect2> |
| |
| |
| <!-- ================================ DESCRIPTION --> |
| |
| <refsect1 id="description"><title>Description</title> |
| <para> |
| The <function>__kernel</function> (or <function>kernel</function>) qualifier |
| declares a function to be a kernel that an application can |
| execute on one or more OpenCL devices. |
| The following rules apply to functions that |
| are declared with this qualifier: |
| </para> |
| |
| <itemizedlist mark='bullet'> |
| <listitem> |
| <para> |
| It can be executed only on the device. |
| </para> |
| </listitem> |
| <listitem> |
| <para> |
| It can be called by the host. |
| </para> |
| </listitem> |
| <listitem> |
| <para> |
| If a <function>__kernel</function> function is called by another |
| kernel function, the call is just a regular function call. |
| </para> |
| </listitem> |
| </itemizedlist> |
| |
| <para> |
| The <function>__kernel</function> qualifier can be used with the keyword |
| <citerefentry href="attribute"><refentrytitle>__attribute__</refentrytitle></citerefentry> to declare additional |
| information about the kernel function as described below. |
| </para> |
| |
| <para> |
| The optional |
| <constant>__attribute__((vec_type_hint(&lt;type<emphasis>n</emphasis>&gt;)))</constant> |
| is a hint to the |
| compiler. It is intended to represent the computational |
| <emphasis>width</emphasis> of the |
| <function>__kernel</function>, |
| and should serve as the basis for calculating processor |
| bandwidth utilization when the compiler |
| is looking to autovectorize the code. |
| <constant>vec_type_hint (&lt;type<emphasis>n</emphasis>&gt;)</constant> |
| shall be one of the built-in scalar or vector data types described in |
| tables 6.1 and 6.2. |
| If |
| <constant>vec_type_hint (&lt;type<emphasis>n</emphasis>&gt;)</constant> |
| is not specified, the default value is <type>int</type>. |
| </para> |
| |
| <para> |
| Omitting the attribute is therefore equivalent to specifying |
| <constant>__attribute__((vec_type_hint(int)))</constant>. |
| </para> |
| |
| <para> |
| For example, where the developer specifies a width of <type>float4</type>, |
| the compiler should assume |
| that the computation usually occupies 4 lanes of a float vector, |
| and may decide to merge work-items or possibly even split |
| one work-item across many threads to better match the hardware |
| capabilities. A conforming implementation is not required |
| to autovectorize code, but shall |
| support the hint. A compiler may autovectorize even if no |
| hint is provided. If an |
| implementation merges <constant>N</constant> work-items into one thread, |
| it is responsible for correctly handling |
| cases where the number of global or local work-items |
| in any dimension modulo <constant>N</constant> is not zero. |
| </para> |
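| |
| <para> |
| The remainder case described above can be illustrated with a small host-side |
| model in plain C. This is only a sketch under stated assumptions: |
| <function>plan_merge</function>, <function>threads_needed</function> and the |
| <type>merge_plan</type> structure are hypothetical names invented for this |
| illustration. They are not part of OpenCL or of any particular compiler. |
| </para> |
| <programlisting><![CDATA[ |
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of how the work-items of one dimension are split
 * when an implementation merges n work-items into each thread. */
typedef struct {
    size_t full_threads; /* threads that carry exactly n work-items */
    size_t tail_items;   /* leftover work-items, in the range 0 .. n-1 */
} merge_plan;

/* Split work_items into groups of n and record the remainder. */
merge_plan plan_merge(size_t work_items, size_t n)
{
    merge_plan plan;
    assert(n > 0); /* merging zero work-items per thread is meaningless */
    plan.full_threads = work_items / n;
    plan.tail_items = work_items % n;
    return plan;
}

/* Total threads needed, counting one partial thread when a tail exists. */
size_t threads_needed(merge_plan plan)
{
    return plan.full_threads + (plan.tail_items != 0 ? 1 : 0);
}
```
| ]]></programlisting> |
| <para> |
| In this model, merging 10 work-items with <constant>N</constant> = 4 gives |
| two full threads plus a tail of 2 work-items, so three threads are needed and |
| the last one must not execute the two unused slots. |
| </para> |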
| |
| <para> |
| If, for example, a <function>__kernel</function> is declared with |
| <constant>__attribute__((vec_type_hint(float4)))</constant> |
| (meaning that most operations in the <function>__kernel</function> |
| are explicitly vectorized using |
| <type>float4</type>) and the kernel is running using |
| Intel® Advanced Vector Extensions |
| (Intel® AVX), |
| which provides an 8-float-wide vector unit, |
| the autovectorizer might choose to merge two |
| work-items into one thread, running the second |
| work-item in the high half of the 256-bit AVX |
| register. |
| </para> |
| |
| <para> |
| As another example, a Power4 machine has two scalar |
| double-precision floating-point units with |
| a 6-cycle-deep pipeline. An autovectorizer for the |
| Power4 machine might choose to interleave six |
| <constant>__attribute__(( vec_type_hint (double2))) __kernel</constant>s |
| into one hardware |
| thread, to ensure that there is always 12-way |
| parallelism available to saturate the FPUs. It might |
| also choose to merge 4 or 8 work-items (or some |
| other number) if it concludes that these are |
| better choices, due to resource utilization |
| concerns or some preference for divisibility by 2. |
| </para> |
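| |
| <para> |
| The arithmetic behind the two examples above can be sketched in plain C. |
| The helper <function>merge_factor</function> and its parameters are |
| hypothetical and exist only for this illustration. A real autovectorizer |
| would weigh further factors such as resource utilization, as noted above. |
| </para> |
| <programlisting><![CDATA[ |
```c
#include <assert.h>

/* Hypothetical helper: how many work-items to merge into one hardware
 * thread so that the elements named by vec_type_hint fill hw_parallelism
 * element slots. Neither parameter is an OpenCL API value. */
unsigned merge_factor(unsigned hw_parallelism, unsigned hint_elements)
{
    unsigned factor;
    assert(hint_elements > 0);
    factor = hw_parallelism / hint_elements;
    /* A hint wider than the hardware would call for splitting a work-item
     * instead. This sketch does not model that and simply reports 1. */
    return factor > 0 ? factor : 1;
}
```
| ]]></programlisting> |
| <para> |
| Under this model, <constant>merge_factor(8, 4)</constant> yields 2 for the |
| <type>float4</type> kernel on an 8-float-wide unit, and |
| <constant>merge_factor(2 * 6, 2)</constant> yields 6 for the |
| <type>double2</type> kernel on two FPUs with a 6-cycle pipeline. |
| </para> |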
| |
| <para> |
| The optional |
| <constant>__attribute__((work_group_size_hint(<emphasis>X</emphasis>, <emphasis>Y</emphasis>, <emphasis>Z</emphasis>)))</constant> |
| is a hint to the |
| compiler that specifies the work-group size |
| likely to be used, that is, the value most likely to |
| be specified by the <varname>local_work_size</varname> argument to |
| <citerefentry><refentrytitle>clEnqueueNDRangeKernel</refentrytitle></citerefentry>. |
| For example, |
| <constant>__attribute__((work_group_size_hint(1, 1, 1)))</constant> |
| is a hint to the compiler |
| that the kernel will most likely be executed |
| with a work-group size of 1. |
| </para> |
| |
| <para> |
| The optional |
| <constant>__attribute__((reqd_work_group_size(<emphasis>X</emphasis>, <emphasis>Y</emphasis>, <emphasis>Z</emphasis>)))</constant> |
| specifies the work-group size that must be used as the |
| <varname>local_work_size</varname> argument to |
| <citerefentry><refentrytitle>clEnqueueNDRangeKernel</refentrytitle></citerefentry>. |
| This allows the compiler to optimize the generated |
| code appropriately for this kernel. If the kernel is executed via |
| <citerefentry><refentrytitle>clEnqueueTask</refentrytitle></citerefentry>, |
| the work-group size given by |
| <constant>__attribute__((reqd_work_group_size(<emphasis>X</emphasis>, <emphasis>Y</emphasis>, <emphasis>Z</emphasis>)))</constant>, |
| if specified, must be (1, 1, 1). |
| </para> |
| |
| <para> |
| If <varname>Z</varname> is one, the <varname>work_dim</varname> argument to |
| <citerefentry><refentrytitle>clEnqueueNDRangeKernel</refentrytitle></citerefentry> |
| can be 2 or 3. If <varname>Y</varname> and <varname>Z</varname> are |
| one, the <varname>work_dim</varname> argument to |
| <citerefentry><refentrytitle>clEnqueueNDRangeKernel</refentrytitle></citerefentry> |
| can be 1, 2 or 3. |
| </para> |
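| |
| <para> |
| The rules above can be expressed as a small host-side check in plain C. |
| This is a sketch, not OpenCL API: the <type>reqd_size</type> structure and the |
| functions <function>min_work_dim</function>, |
| <function>work_dim_allowed</function>, |
| <function>local_size_matches</function> and |
| <function>task_allowed</function> are hypothetical names. The sketch also |
| assumes, by extension of the rules above, that <varname>work_dim</varname> |
| must be 3 when <varname>Z</varname> is not one. |
| </para> |
| <programlisting><![CDATA[ |
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values given to reqd_work_group_size(X, Y, Z). Hypothetical type. */
typedef struct { size_t x, y, z; } reqd_size;

/* Smallest work_dim compatible with the required size. The Z != 1 case
 * is an assumption extrapolated from the documented rules. */
unsigned min_work_dim(reqd_size r)
{
    if (r.z != 1) return 3;
    if (r.y != 1) return 2;
    return 1;
}

/* True if work_dim is acceptable for a kernel with this required size. */
bool work_dim_allowed(reqd_size r, unsigned work_dim)
{
    return work_dim >= min_work_dim(r) && work_dim <= 3;
}

/* True if the first work_dim entries of local_work_size equal X, Y, Z. */
bool local_size_matches(reqd_size r, unsigned work_dim,
                        const size_t *local_work_size)
{
    const size_t required[3] = { r.x, r.y, r.z };
    unsigned i;
    assert(local_work_size != NULL);
    if (!work_dim_allowed(r, work_dim)) return false;
    for (i = 0; i < work_dim; ++i) {
        if (local_work_size[i] != required[i]) return false;
    }
    return true;
}

/* A kernel run as a task needs a required size of (1, 1, 1). */
bool task_allowed(reqd_size r)
{
    return r.x == 1 && r.y == 1 && r.z == 1;
}
```
| ]]></programlisting> |
| <para> |
| For a kernel declared with a required size of (8, 8, 1), this check accepts |
| a <varname>work_dim</varname> of 2 or 3 with a matching |
| <varname>local_work_size</varname> and rejects a <varname>work_dim</varname> of 1. |
| </para> |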
| |
| </refsect1> |
| |
| |
| <!-- ================================ NOTES --> |
| |
| |
| <refsect1 id="notes"><title>Notes</title> |
| <para> |
| Implicit in autovectorization is the assumption that |
| any libraries called from the |
| <function>__kernel</function> must be recompilable at |
| run time to handle cases where the compiler decides to |
| merge or separate work-items. This probably means that such |
| libraries can never be hard-coded binaries, or that hard-coded |
| binaries must be accompanied either by source or by some |
| retargetable intermediate representation. This may be |
| a code security concern for some. |
| |
| </para> |
| </refsect1> |
| |
| |
| <!-- ================================ EXAMPLE --> |
| |
| <refsect2 id="example1"> |
| <title> |
| Example |
| </title> |
| |
| <informaltable frame="none"> |
| <tgroup cols="1" align="left" colsep="0" rowsep="0"> |
| <colspec colname="col1" colnum="1" /> |
| <tbody> |
| <row> |
| <entry> |
| // autovectorize assuming float4 as the |
| // basic computation width |
| __kernel __attribute__((vec_type_hint(float4))) |
| void foo( __global float4 *p ) { .... |
| |
| // autovectorize assuming double as the |
| // basic computation width |
| __kernel __attribute__((vec_type_hint(double))) |
| void foo( __global float4 *p ){ .... |
| |
| // autovectorize assuming int (default) |
| // as the basic computation width |
| __kernel |
| void foo( __global float4 *p ){ .... |
| </entry> |
| </row> |
| </tbody> |
| </tgroup> |
| </informaltable> |
| </refsect2> |
| |
| |
| |
| <!-- ================================ SPECIFICATION --> |
| <!-- Set the "uri" attribute in the <olink /> element to the "named destination" for the PDF page |
| --> |
| <refsect1 id="specification"><title>Specification</title> |
| <para> |
| <imageobject> |
| <imagedata fileref="pdficon_small1.gif" format="gif" /> |
| </imageobject> |
| |
| <olink uri="functionQualifiers">OpenCL Specification</olink> |
| </para> |
| </refsect1> |
| |
| |
| <!-- ================================ ALSO SEE --> |
| |
| <refsect1 id="seealso"><title>Also see</title> |
| <para> |
| <citerefentry><refentrytitle>clEnqueueNDRangeKernel</refentrytitle></citerefentry> |
| <citerefentry><refentrytitle>clEnqueueTask</refentrytitle></citerefentry> |
| </para> |
| </refsect1> |
| |
| |
| |
| <!-- ============================== COPYRIGHT --> |
| <!-- Content included from copyright.inc.xsl --> |
| |
| <refsect3 id="Copyright"><title></title> |
| <imageobject> |
| <imagedata fileref="KhronosLogo.jpg" format="jpg" /> |
| </imageobject> |
| <para /> |
| </refsect3> |
| |
| </refentry> |
| |