 <?xml version="1.0" encoding="UTF-8"?>
 <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook MathML Module V1.1b1//EN"
               "http://www.oasis-open.org/docbook/xml/mathml/1.1CR1/dbmathml.dtd">
 <refentry>
     <refentryinfo>
         <keywordset>
             <keyword>
                 Function Qualifiers
             </keyword>
         </keywordset>
     </refentryinfo>

     <refmeta>
         <refentrytitle>Function Qualifiers</refentrytitle>

         <refmiscinfo>
             <copyright>
                 <year>2007-2009</year>
                 <holder>The Khronos Group Inc.
  Permission is hereby granted, free of charge, to any person obtaining a
 copy of this software and/or associated documentation files (the
 "Materials"), to deal in the Materials without restriction, including
 without limitation the rights to use, copy, modify, merge, publish,
 distribute, sublicense, and/or sell copies of the Materials, and to
 permit persons to whom the Materials are furnished to do so, subject to
 the condition that this copyright notice and permission notice shall be included
 in all copies or substantial portions of the Materials.</holder>
             </copyright>
         </refmiscinfo>
         <manvolnum>2</manvolnum>
     </refmeta>

 <!-- ================================ SYNOPSIS -->

     <refnamediv id="Function Qualifiers">
         <refname>Function Qualifiers</refname>

         <refpurpose>
             Qualifiers for kernel functions.
         </refpurpose>
     </refnamediv>


     <!-- ALTERNATIVE SYNTAX SYNOPSIS (NON-FUNCTION) -->
     <refsect2 id="synopsis">
         <title>
         </title>

         <informaltable frame="none">
             <tgroup cols="1" align="left" colsep="0" rowsep="0">
                 <colspec colname="col1" colnum="1" />
                 <tbody>
                     <row>
                         <entry>
 __kernel
 kernel

 __attribute__((vec_type_hint(&lt;type<emphasis>n</emphasis>&gt;)))
 __attribute__((work_group_size_hint(<emphasis>X</emphasis>, <emphasis>Y</emphasis>, <emphasis>Z</emphasis>)))
 __attribute__((reqd_work_group_size(<emphasis>X</emphasis>, <emphasis>Y</emphasis>, <emphasis>Z</emphasis>)))
                         </entry>
                     </row>
                 </tbody>
             </tgroup>
         </informaltable>
     </refsect2>


 <!-- ================================ DESCRIPTION  -->

     <refsect1 id="description"><title>Description</title>
         <para>
 The <function>__kernel</function> (or <function>kernel</function>) qualifier
 declares a function to be a kernel that an application can
 execute on one or more OpenCL devices.
 The following rules apply to functions that
 are declared with this qualifier:
         </para>

         <itemizedlist mark='bullet'>
             <listitem>
                <para>
 It can be executed only on the device.
                </para>
             </listitem>
             <listitem>
                <para>
 It can be called by the host.
                </para>
             </listitem>
             <listitem>
                <para>
 If a <function>__kernel</function> function is called by another
 kernel function, the call is just a regular function call.
                </para>
             </listitem>
         </itemizedlist>

         <para>
 The <function>__kernel</function> qualifier can be used with the keyword
 <citerefentry href="attribute"><refentrytitle>__attribute__</refentrytitle></citerefentry> to declare additional
 information about the kernel function as described below.
         </para>

         <para>
 The optional
 <constant>__attribute__((vec_type_hint(&lt;type<emphasis>n</emphasis>&gt;)))</constant>
 is a hint to the
 compiler and is intended to be a representation of the computational
 <emphasis>width</emphasis> of the
 <function>__kernel</function>,
 and should serve as the basis for calculating processor
 bandwidth utilization when the compiler
 is looking to autovectorize the code.
 <constant>vec_type_hint (&lt;type<emphasis>n</emphasis>&gt;)</constant>
 shall be one of the built-in scalar or vector data types described in
 tables 6.1 and 6.2.
 If
 <constant>vec_type_hint (&lt;type<emphasis>n</emphasis>&gt;)</constant>
 is not specified, the default value is <type>int</type>.
         </para>

         <para>
 That is, omitting the attribute is equivalent to specifying
 <constant>__attribute__((vec_type_hint(int)))</constant>.
         </para>

         <para>
 For example, where the developer specifies a width of <type>float4</type>,
 the compiler should assume
 that the computation usually uses up to 4 lanes of a float vector,
 and may decide to merge work-items or possibly even separate
 one work-item into many threads to better match the hardware
 capabilities. A conforming implementation is not required
 to autovectorize code, but shall
 support the hint. A compiler may autovectorize, even if no
 hint is provided. If an
 implementation merges <constant>N</constant> work-items into one thread,
 it is responsible for correctly handling
 cases where the number of global or local work-items
 in any dimension modulo <constant>N</constant> is not zero.
         </para>
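
        <para>
 The remainder rule above can be pictured with a host-side model in plain C.
 This is only a sketch under stated assumptions: the names
 <function>work_item</function> and <function>run_merged</function> are
 hypothetical, and a real implementation would use vector instructions
 rather than an inner loop. The point it illustrates is that the last
 thread processes a partial group when the global size is not a multiple
 of <constant>N</constant>.
        </para>

        <informaltable frame="none">
            <tgroup cols="1" align="left" colsep="0" rowsep="0">
                <colspec colname="col1" colnum="1" />
                <tbody>
                    <row>
                        <entry><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-work-item body: doubles one element. */
static void work_item(float *data, size_t gid)
{
    data[gid] *= 2.0f;
}

/*
 * Model of an implementation that merges n work-items into one thread.
 * Full groups of n are processed together; the last thread handles the
 * leftover (global_size % n) work-items, so no work-item is skipped and
 * none runs past the end of the range.
 * Returns the number of threads used.
 */
static size_t run_merged(float *data, size_t global_size, size_t n)
{
    assert(n > 0);
    size_t threads = 0;
    for (size_t base = 0; base < global_size; base += n) {
        size_t remaining = global_size - base;
        size_t count = remaining < n ? remaining : n; /* partial tail */
        for (size_t lane = 0; lane < count; ++lane)
            work_item(data, base + lane);
        ++threads;
    }
    return threads;
}
```
]]></entry>
                    </row>
                </tbody>
            </tgroup>
        </informaltable>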

         <para>
 If, for example, a <function>__kernel</function> is declared with
 <constant>__attribute__(( vec_type_hint (float4)))</constant>
 (meaning that most operations in the <function>__kernel</function>
 are explicitly vectorized using
 <type>float4</type>) and the kernel is running on
 Intel&reg; Advanced Vector Extensions
 (Intel&reg; AVX),
 which implements an 8-float-wide vector unit,
 the autovectorizer might choose to merge two
 work-items into one thread, running the second
 work-item in the high half of the 256-bit AVX
 register.
         </para>
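
        <para>
 The arithmetic behind this example can be written out as a small C helper.
 It is a sketch rather than anything an implementation is required to do;
 the function name <function>work_items_per_register</function> and its
 parameters are hypothetical. For a 256-bit register and a
 <type>float4</type> hint (4 lanes of 32 bits), it yields two work-items
 per register.
        </para>

        <informaltable frame="none">
            <tgroup cols="1" align="left" colsep="0" rowsep="0">
                <colspec colname="col1" colnum="1" />
                <tbody>
                    <row>
                        <entry><![CDATA[
```c
#include <assert.h>

/*
 * Number of work-items whose hinted vector type fits side by side in
 * one hardware register. Returns 0 when the hinted type is wider than
 * the register, in which case merging is not possible.
 */
static unsigned work_items_per_register(unsigned register_bits,
                                        unsigned element_bits,
                                        unsigned hint_lanes)
{
    assert(element_bits > 0 && hint_lanes > 0);
    return register_bits / (element_bits * hint_lanes);
}
```
]]></entry>
                    </row>
                </tbody>
            </tgroup>
        </informaltable>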

         <para>
 As another example, a Power4 machine has two scalar
 double-precision floating-point units with
 a 6-cycle-deep pipeline. An autovectorizer for the
 Power4 machine might choose to interleave six
 <constant>__attribute__(( vec_type_hint (double2))) __kernel</constant>s
 into one hardware
 thread, to ensure that there is always 12-way
 parallelism available to saturate the FPUs. It might
 also choose to merge 4 or 8 work-items (or some
 other number) if it concludes that these are
 better choices, due to resource utilization
 concerns or some preference for divisibility by 2.
         </para>
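
        <para>
 One way to read the numbers in this example: two FPUs with six pipeline
 stages each need 12 independent operations in flight, and each
 <type>double2</type> work-item supplies two, so six work-items are
 interleaved. The C sketch below reproduces that calculation; the helper
 names are hypothetical, and a real compiler is free to pick a different
 count, as the text notes.
        </para>

        <informaltable frame="none">
            <tgroup cols="1" align="left" colsep="0" rowsep="0">
                <colspec colname="col1" colnum="1" />
                <tbody>
                    <row>
                        <entry><![CDATA[
```c
#include <assert.h>

/* Independent operations needed to keep every stage of every FPU busy. */
static unsigned ops_in_flight(unsigned fpu_count, unsigned pipeline_depth)
{
    return fpu_count * pipeline_depth;
}

/*
 * Work-items to interleave so that their combined lanes cover the
 * required parallelism, rounded up when it does not divide evenly.
 */
static unsigned work_items_to_interleave(unsigned fpu_count,
                                         unsigned pipeline_depth,
                                         unsigned hint_lanes)
{
    assert(hint_lanes > 0);
    unsigned ops = ops_in_flight(fpu_count, pipeline_depth);
    return (ops + hint_lanes - 1) / hint_lanes;
}
```
]]></entry>
                    </row>
                </tbody>
            </tgroup>
        </informaltable>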

         <para>
 The optional
 <constant>__attribute__((work_group_size_hint(<emphasis>X</emphasis>, <emphasis>Y</emphasis>, <emphasis>Z</emphasis>)))</constant>
 is a hint to the
 compiler and is intended to specify the work-group size
 that may be used, i.e., the value most likely to
 be specified by the <varname>local_work_size</varname> argument to
 <citerefentry><refentrytitle>clEnqueueNDRangeKernel</refentrytitle></citerefentry>.
 For example,
 <constant>__attribute__((work_group_size_hint(1, 1, 1)))</constant>
 is a hint to the compiler
 that the kernel will most likely be executed
 with a work-group size of 1.
         </para>

         <para>
 The optional
 <constant>__attribute__((reqd_work_group_size(<emphasis>X</emphasis>, <emphasis>Y</emphasis>, <emphasis>Z</emphasis>)))</constant>
 is the work-group size that must be used as the
 <varname>local_work_size</varname> argument to
 <citerefentry><refentrytitle>clEnqueueNDRangeKernel</refentrytitle></citerefentry>.
 This allows the compiler to optimize the generated
 code appropriately for this kernel. The optional
 <constant>__attribute__((reqd_work_group_size(<emphasis>X</emphasis>, <emphasis>Y</emphasis>, <emphasis>Z</emphasis>)))</constant>,
 if specified, must be (1, 1, 1) if the kernel is executed via
 <citerefentry><refentrytitle>clEnqueueTask</refentrytitle></citerefentry>.
         </para>
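
        <para>
 A host program could mirror this rule with a check before enqueueing.
 The following plain C sketch does not call the OpenCL API; the functions
 <function>matches_reqd_work_group_size</function> and
 <function>task_launch_allowed</function> are hypothetical helpers that
 only model the comparison described above, including the (1, 1, 1)
 requirement for
 <citerefentry><refentrytitle>clEnqueueTask</refentrytitle></citerefentry>.
        </para>

        <informaltable frame="none">
            <tgroup cols="1" align="left" colsep="0" rowsep="0">
                <colspec colname="col1" colnum="1" />
                <tbody>
                    <row>
                        <entry><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * True when the local_work_size the host intends to pass equals the
 * (X, Y, Z) given in reqd_work_group_size, dimension by dimension.
 */
static bool matches_reqd_work_group_size(const size_t local_work_size[3],
                                         const size_t reqd[3])
{
    assert(local_work_size != NULL && reqd != NULL);
    for (int d = 0; d < 3; ++d) {
        if (local_work_size[d] != reqd[d])
            return false;
    }
    return true;
}

/*
 * A task runs as a single work-item, so a kernel with a required
 * work-group size can be launched as a task only if that size is (1, 1, 1).
 */
static bool task_launch_allowed(const size_t reqd[3])
{
    const size_t single[3] = {1, 1, 1};
    return matches_reqd_work_group_size(single, reqd);
}
```
]]></entry>
                    </row>
                </tbody>
            </tgroup>
        </informaltable>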

         <para>
 If <varname>Z</varname> is one, the <varname>work_dim</varname> argument to
 <citerefentry><refentrytitle>clEnqueueNDRangeKernel</refentrytitle></citerefentry>
 can be 2 or 3. If <varname>Y</varname> and <varname>Z</varname> are
 one, the <varname>work_dim</varname> argument to
 <citerefentry><refentrytitle>clEnqueueNDRangeKernel</refentrytitle></citerefentry>
 can be 1, 2 or 3.
         </para>
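
        <para>
 The <varname>work_dim</varname> rule can be summarized as a smallest
 permitted dimension count. The C sketch below is an interpretation, not
 specification text: the helper names are hypothetical, and it assumes
 that when <varname>Z</varname> is not one all three dimensions are
 needed, which the paragraph above implies but does not state.
        </para>

        <informaltable frame="none">
            <tgroup cols="1" align="left" colsep="0" rowsep="0">
                <colspec colname="col1" colnum="1" />
                <tbody>
                    <row>
                        <entry><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Smallest work_dim compatible with a required size (X, Y, Z). */
static unsigned min_work_dim(const size_t reqd[3])
{
    assert(reqd != NULL);
    if (reqd[1] == 1 && reqd[2] == 1)
        return 1;   /* Y and Z are one: 1, 2 or 3 dimensions */
    if (reqd[2] == 1)
        return 2;   /* only Z is one: 2 or 3 dimensions */
    return 3;       /* assumption: otherwise all three are needed */
}

/* True when work_dim lies between the minimum and the maximum of 3. */
static bool work_dim_allowed(const size_t reqd[3], unsigned work_dim)
{
    return work_dim >= min_work_dim(reqd) && work_dim <= 3;
}
```
]]></entry>
                    </row>
                </tbody>
            </tgroup>
        </informaltable>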

     </refsect1>


 <!-- ================================ NOTES  -->


     <refsect1 id="notes"><title>Notes</title>
         <para>
 Implicit in autovectorization is the assumption that
 any libraries called from the
 <function>__kernel</function> must be recompilable at
 run time to handle cases where the compiler decides to
 merge or separate work-items. This probably means that such
 libraries can never be hard-coded binaries, or that hard-coded
 binaries must be accompanied either by source or by some
 retargetable intermediate representation. This may be
 a code security concern for some.

         </para>
     </refsect1>


 <!-- ================================ EXAMPLE  -->

     <refsect2 id="example1">
         <title>
             Example
         </title>

         <informaltable frame="none">
             <tgroup cols="1" align="left" colsep="0" rowsep="0">
                 <colspec colname="col1" colnum="1" />
                 <tbody>
                     <row>
                         <entry>
 // autovectorize assuming float4 as the
 // basic computation width
 __kernel __attribute__((vec_type_hint(float4)))
 void foo( __global float4 *p ) { ....

 // autovectorize assuming double as the
 // basic computation width
 __kernel __attribute__((vec_type_hint(double)))
 void foo( __global float4 *p ){ ....

 // autovectorize assuming int (default)
 // as the basic computation width
 __kernel
 void foo( __global float4 *p ){ ....
                         </entry>
                     </row>
                 </tbody>
             </tgroup>
         </informaltable>
     </refsect2>


 <!-- ================================ SPECIFICATION  -->
 <!-- Set the "uri" attribute in the <olink /> element to the "named destination" for the PDF page
 -->
     <refsect1 id="specification"><title>Specification</title>
         <para>
             <imageobject>
                 <imagedata fileref="pdficon_small1.gif" format="gif" />
             </imageobject>

             <olink uri="functionQualifiers">OpenCL Specification</olink>
         </para>
     </refsect1>


 <!-- ================================ ALSO SEE -->

     <refsect1 id="seealso"><title>Also see</title>
         <para>
             <citerefentry><refentrytitle>clEnqueueNDRangeKernel</refentrytitle></citerefentry>
 			<citerefentry><refentrytitle>clEnqueueTask</refentrytitle></citerefentry>
         </para>
     </refsect1>


 <!-- ============================== COPYRIGHT -->
 <!-- Content included from copyright.inc.xsl -->

     <refsect3 id="Copyright"><title></title>
 		<imageobject>
                 <imagedata fileref="KhronosLogo.jpg" format="jpg" />
         </imageobject>
 		<para />
     </refsect3>

 </refentry>