| <body class="book toc2 toc-left" style="max-width: 100;"> |
| <div id="header"> |
| <h1>The OpenCL<sup>™</sup> Specification</h1> |
| <div class="details"> |
| <span id="author" class="author">Khronos<sup>®</sup> OpenCL Working Group</span><br> |
| <span id="revnumber">version v3.0.8,</span> |
| <span id="revdate">Wed, 30 Jun 2021 20:00:00 +0000</span> |
| <br><span id="revremark">from git branch: master commit: 09130de814688ec7b463cb089986b807c628ead3</span> |
| </div> |
| <div id="toc" class="toc2"> |
| <div id="toctitle">Table of Contents</div> |
| <ul class="sectlevel1"> |
| <li><a href="#_introduction">1. Introduction</a> |
| <ul class="sectlevel2"> |
| <li><a href="#_normative_references">1.1. Normative References</a></li> |
| <li><a href="#_version_numbers">1.2. Version Numbers</a></li> |
| <li><a href="#unified-spec">1.3. Unified Specification</a></li> |
| </ul> |
| </li> |
| <li><a href="#_glossary">2. Glossary</a></li> |
| <li><a href="#_the_opencl_architecture">3. The OpenCL Architecture</a> |
| <ul class="sectlevel2"> |
| <li><a href="#_platform_model">3.1. Platform Model</a></li> |
| <li><a href="#_execution_model">3.2. Execution Model</a> |
| <ul class="sectlevel3"> |
| <li><a href="#_mapping_work_items_onto_an_ndrange">3.2.1. Mapping work-items onto an NDRange</a></li> |
| <li><a href="#_execution_of_kernel_instances">3.2.2. Execution of kernel-instances</a></li> |
| <li><a href="#device-side-enqueue">3.2.3. Device-side enqueue</a></li> |
| <li><a href="#execution-model-sync">3.2.4. Synchronization</a></li> |
| <li><a href="#_categories_of_kernels">3.2.5. Categories of Kernels</a></li> |
| </ul> |
| </li> |
| <li><a href="#_memory_model">3.3. Memory Model</a> |
| <ul class="sectlevel3"> |
| <li><a href="#_fundamental_memory_regions">3.3.1. Fundamental Memory Regions</a></li> |
| <li><a href="#_memory_objects">3.3.2. Memory Objects</a></li> |
| <li><a href="#shared-virtual-memory">3.3.3. Shared Virtual Memory</a></li> |
| <li><a href="#_memory_consistency_model_for_opencl_1_x">3.3.4. Memory Consistency Model for OpenCL 1.x</a></li> |
| <li><a href="#memory-consistency-model">3.3.5. Memory Consistency Model for OpenCL 2.x</a></li> |
| <li><a href="#_overview_of_atomic_and_fence_operations">3.3.6. Overview of atomic and fence operations</a></li> |
| <li><a href="#memory-ordering-rules">3.3.7. Memory Ordering Rules</a></li> |
| </ul> |
| </li> |
| <li><a href="#opencl-framework">3.4. The OpenCL Framework</a> |
| <ul class="sectlevel3"> |
| <li><a href="#_mixed_version_support">3.4.1. Mixed Version Support</a></li> |
| <li><a href="#_backwards_compatibility">3.4.2. Backwards Compatibility</a></li> |
| <li><a href="#_versioning">3.4.3. Versioning</a></li> |
| </ul> |
| </li> |
| </ul> |
| </li> |
| <li><a href="#opencl-platform-layer">4. The OpenCL Platform Layer</a> |
| <ul class="sectlevel2"> |
| <li><a href="#_querying_platform_info">4.1. Querying Platform Info</a></li> |
| <li><a href="#platform-querying-devices">4.2. Querying Devices</a></li> |
| <li><a href="#_partitioning_a_device">4.3. Partitioning a Device</a></li> |
| <li><a href="#_contexts">4.4. Contexts</a></li> |
| </ul> |
| </li> |
| <li><a href="#opencl-runtime">5. The OpenCL Runtime</a> |
| <ul class="sectlevel2"> |
| <li><a href="#_command_queues">5.1. Command Queues</a></li> |
| <li><a href="#_buffer_objects">5.2. Buffer Objects</a> |
| <ul class="sectlevel3"> |
| <li><a href="#_creating_buffer_objects">5.2.1. Creating Buffer Objects</a></li> |
| <li><a href="#_reading_writing_and_copying_buffer_objects">5.2.2. Reading, Writing and Copying Buffer Objects</a></li> |
| <li><a href="#_filling_buffer_objects">5.2.3. Filling Buffer Objects</a></li> |
| <li><a href="#_mapping_buffer_objects">5.2.4. Mapping Buffer Objects</a></li> |
| </ul> |
| </li> |
| <li><a href="#_image_objects">5.3. Image Objects</a> |
| <ul class="sectlevel3"> |
| <li><a href="#_creating_image_objects">5.3.1. Creating Image Objects</a></li> |
| <li><a href="#_querying_list_of_supported_image_formats">5.3.2. Querying List of Supported Image Formats</a></li> |
| <li><a href="#_reading_writing_and_copying_image_objects">5.3.3. Reading, Writing and Copying Image Objects</a></li> |
| <li><a href="#_filling_image_objects">5.3.4. Filling Image Objects</a></li> |
| <li><a href="#_copying_between_image_and_buffer_objects">5.3.5. Copying between Image and Buffer Objects</a></li> |
| <li><a href="#_mapping_image_objects">5.3.6. Mapping Image Objects</a></li> |
| <li><a href="#image-object-queries">5.3.7. Image Object Queries</a></li> |
| </ul> |
| </li> |
| <li><a href="#_pipes">5.4. Pipes</a> |
| <ul class="sectlevel3"> |
| <li><a href="#_creating_pipe_objects">5.4.1. Creating Pipe Objects</a></li> |
| <li><a href="#_pipe_object_queries">5.4.2. Pipe Object Queries</a></li> |
| </ul> |
| </li> |
| <li><a href="#_querying_unmapping_migrating_retaining_and_releasing_memory_objects">5.5. Querying, Unmapping, Migrating, Retaining and Releasing Memory Objects</a> |
| <ul class="sectlevel3"> |
| <li><a href="#_retaining_and_releasing_memory_objects">5.5.1. Retaining and Releasing Memory Objects</a></li> |
| <li><a href="#unmapping-mapped-memory">5.5.2. Unmapping Mapped Memory Objects</a></li> |
| <li><a href="#accessing-mapped-regions">5.5.3. Accessing mapped regions of a memory object</a></li> |
| <li><a href="#_migrating_memory_objects">5.5.4. Migrating Memory Objects</a></li> |
| <li><a href="#memory-object-queries">5.5.5. Memory Object Queries</a></li> |
| </ul> |
| </li> |
| <li><a href="#_shared_virtual_memory">5.6. Shared Virtual Memory</a> |
| <ul class="sectlevel3"> |
| <li><a href="#_svm_sharing_granularity_coarse_and_fine_grained_sharing">5.6.1. SVM sharing granularity: coarse- and fine- grained sharing</a></li> |
| <li><a href="#_memory_consistency_for_svm_allocations">5.6.2. Memory consistency for SVM allocations</a></li> |
| </ul> |
| </li> |
| <li><a href="#_sampler_objects">5.7. Sampler Objects</a> |
| <ul class="sectlevel3"> |
| <li><a href="#_creating_sampler_objects">5.7.1. Creating Sampler Objects</a></li> |
| <li><a href="#_sampler_object_queries">5.7.2. Sampler Object Queries</a></li> |
| </ul> |
| </li> |
| <li><a href="#_program_objects">5.8. Program Objects</a> |
| <ul class="sectlevel3"> |
| <li><a href="#_creating_program_objects">5.8.1. Creating Program Objects</a></li> |
| <li><a href="#_retaining_and_releasing_program_objects">5.8.2. Retaining and Releasing Program Objects</a></li> |
| <li><a href="#_setting_spir_v_specialization_constants">5.8.3. Setting SPIR-V specialization constants</a></li> |
| <li><a href="#_building_program_executables">5.8.4. Building Program Executables</a></li> |
| <li><a href="#_separate_compilation_and_linking_of_programs">5.8.5. Separate Compilation and Linking of Programs</a></li> |
| <li><a href="#compiler-options">5.8.6. Compiler Options</a></li> |
| <li><a href="#linker-options">5.8.7. Linker Options</a></li> |
| <li><a href="#_unloading_the_opencl_compiler">5.8.8. Unloading the OpenCL Compiler</a></li> |
| <li><a href="#_program_object_queries">5.8.9. Program Object Queries</a></li> |
| </ul> |
| </li> |
| <li><a href="#_kernel_objects">5.9. Kernel Objects</a> |
| <ul class="sectlevel3"> |
| <li><a href="#_creating_kernel_objects">5.9.1. Creating Kernel Objects</a></li> |
| <li><a href="#_setting_kernel_arguments">5.9.2. Setting Kernel Arguments</a></li> |
| <li><a href="#_copying_kernel_objects">5.9.3. Copying Kernel Objects</a></li> |
| <li><a href="#_kernel_object_queries">5.9.4. Kernel Object Queries</a></li> |
| </ul> |
| </li> |
| <li><a href="#_executing_kernels">5.10. Executing Kernels</a></li> |
| <li><a href="#event-objects">5.11. Event Objects</a></li> |
| <li><a href="#markers-barriers-waiting-for-events">5.12. Markers, Barriers and Waiting for Events</a></li> |
| <li><a href="#_out_of_order_execution_of_kernels_and_memory_object_commands">5.13. Out-of-order Execution of Kernels and Memory Object Commands</a></li> |
| <li><a href="#profiling-operations">5.14. Profiling Operations on Memory Objects and Kernels</a></li> |
| <li><a href="#_flush_and_finish">5.15. Flush and Finish</a></li> |
| </ul> |
| </li> |
| <li><a href="#_associated_opencl_specification">6. Associated OpenCL specification</a> |
| <ul class="sectlevel2"> |
| <li><a href="#spirv-il">6.1. SPIR-V Intermediate Language</a></li> |
| <li><a href="#opencl-extensions">6.2. Extensions to OpenCL</a></li> |
| <li><a href="#opencl-c-kernel-language">6.3. The OpenCL C Kernel Language</a></li> |
| </ul> |
| </li> |
| <li><a href="#opencl-embedded-profile">7. OpenCL Embedded Profile</a></li> |
| <li><a href="#_host_environment_and_thread_safety">Appendix A: Host environment and thread safety</a> |
| <ul class="sectlevel2"> |
| <li><a href="#shared-opencl-objects">Shared OpenCL Objects</a></li> |
| <li><a href="#_multiple_host_threads">Multiple Host Threads</a></li> |
| <li><a href="#_global_constructors_and_destructors">Global constructors and destructors</a></li> |
| </ul> |
| </li> |
| <li><a href="#_portability">Appendix B: Portability</a></li> |
| <li><a href="#data-types">Appendix C: Application Data Types</a> |
| <ul class="sectlevel2"> |
| <li><a href="#scalar-data-types">Supported Application Scalar Data Types</a></li> |
| <li><a href="#vector-data-types">Supported Application Vector Data Types</a></li> |
| <li><a href="#alignment-app-data-types">Alignment of Application Data Types</a></li> |
| <li><a href="#_vector_literals">Vector Literals</a></li> |
| <li><a href="#vector-components">Vector Components</a> |
| <ul class="sectlevel3"> |
| <li><a href="#_named_vector_components_notation">Named vector components notation</a></li> |
| <li><a href="#_highlow_vector_component_notation">High/Low vector component notation</a></li> |
| <li><a href="#_native_vector_type_notation">Native vector type notation</a></li> |
| </ul> |
| </li> |
| <li><a href="#_implicit_conversions">Implicit Conversions</a></li> |
| <li><a href="#_explicit_casts">Explicit Casts</a></li> |
| <li><a href="#_other_operators_and_functions">Other operators and functions</a></li> |
| <li><a href="#_application_constant_definitions">Application constant definitions</a></li> |
| </ul> |
| </li> |
| <li><a href="#check-copy-overlap">Appendix D: Checking for Memory Copy Overlap</a></li> |
| <li><a href="#changes_to_opencl">Appendix E: Changes to OpenCL</a> |
| <ul class="sectlevel2"> |
| <li><a href="#_summary_of_changes_from_opencl_1_0_to_opencl_1_1">Summary of changes from OpenCL 1.0 to OpenCL 1.1</a></li> |
| <li><a href="#_summary_of_changes_from_opencl_1_1_to_opencl_1_2">Summary of changes from OpenCL 1.1 to OpenCL 1.2</a></li> |
| <li><a href="#_summary_of_changes_from_opencl_1_2_to_opencl_2_0">Summary of changes from OpenCL 1.2 to OpenCL 2.0</a></li> |
| <li><a href="#_summary_of_changes_from_opencl_2_0_to_opencl_2_1">Summary of changes from OpenCL 2.0 to OpenCL 2.1</a></li> |
| <li><a href="#_summary_of_changes_from_opencl_2_1_to_opencl_2_2">Summary of changes from OpenCL 2.1 to OpenCL 2.2</a></li> |
| <li><a href="#_summary_of_changes_from_opencl_2_2_to_opencl_3_0">Summary of changes from OpenCL 2.2 to OpenCL 3.0</a></li> |
| </ul> |
| </li> |
| <li><a href="#error_codes">Appendix F: Error Codes</a></li> |
| <li><a href="#error_other_misc_enums">Appendix G: Other Miscellaneous Enums</a></li> |
| <li><a href="#opencl-3.0-backwards-compatibility">Appendix H: OpenCL 3.0 Backwards Compatibility</a> |
| <ul class="sectlevel2"> |
| <li><a href="#_shared_virtual_memory_2">Shared Virtual Memory</a></li> |
| <li><a href="#_memory_consistency_model">Memory Consistency Model</a></li> |
| <li><a href="#_device_side_enqueue">Device-Side Enqueue</a></li> |
| <li><a href="#_pipes_2">Pipes</a></li> |
| <li><a href="#_program_scope_global_variables">Program Scope Global Variables</a></li> |
| <li><a href="#_non_uniform_work_groups">Non-Uniform Work Groups</a></li> |
| <li><a href="#_read_write_images">Read-Write Images</a></li> |
| <li><a href="#_creating_2d_images_from_buffers">Creating 2D Images from Buffers</a></li> |
| <li><a href="#_srgb_images">sRGB Images</a></li> |
| <li><a href="#_depth_images">Depth Images</a></li> |
| <li><a href="#_device_and_host_timer_synchronization">Device and Host Timer Synchronization</a></li> |
| <li><a href="#_intermediate_language_programs">Intermediate Language Programs</a></li> |
| <li><a href="#_subgroups">Subgroups</a></li> |
| <li><a href="#_program_initialization_and_clean_up_kernels">Program Initialization and Clean-Up Kernels</a></li> |
| <li><a href="#_3d_image_writes">3D Image Writes</a></li> |
| <li><a href="#_work_group_collective_functions">Work Group Collective Functions</a></li> |
| <li><a href="#_generic_address_space">Generic Address Space</a></li> |
| <li><a href="#_language_features_that_were_already_optional">Language Features that Were Already Optional</a></li> |
| </ul> |
| </li> |
| <li><a href="#_acknowledgements">Acknowledgements</a></li> |
| </ul> |
| </div> |
| </div> |
| <div id="content"> |
| <div id="preamble"> |
| <div class="sectionbody"> |
| <div style="page-break-after: always;"></div> |
| <div class="paragraph"> |
| <p>Copyright 2008-2021 The Khronos Group.</p> |
| </div> |
| <div class="paragraph"> |
| <p>This specification is protected by copyright laws and contains material proprietary |
| to the Khronos Group, Inc. Except as described by these terms, it or any components |
| may not be reproduced, republished, distributed, transmitted, displayed, broadcast |
| or otherwise exploited in any manner without the express prior written permission |
| of Khronos Group.</p> |
| </div> |
| <div class="paragraph"> |
| <p>Khronos Group grants a conditional copyright license to use and reproduce the |
| unmodified specification for any purpose, without fee or royalty, EXCEPT no licenses |
| to any patent, trademark or other intellectual property rights are granted under |
| these terms. Parties desiring to implement the specification and make use of |
| Khronos trademarks in relation to that implementation, and receive reciprocal patent |
| license protection under the Khronos IP Policy must become Adopters and confirm the |
| implementation as conformant under the process defined by Khronos for this |
| specification; see <a href="https://www.khronos.org/adopters" class="bare">https://www.khronos.org/adopters</a>.</p> |
| </div> |
| <div class="paragraph"> |
| <p>Khronos Group makes no, and expressly disclaims any, representations or warranties, |
| express or implied, regarding this specification, including, without limitation: |
| merchantability, fitness for a particular purpose, non-infringement of any |
| intellectual property, correctness, accuracy, completeness, timeliness, and |
| reliability. Under no circumstances will the Khronos Group, or any of its Promoters, |
| Contributors or Members, or their respective partners, officers, directors, |
| employees, agents or representatives be liable for any damages, whether direct, |
| indirect, special or consequential damages for lost revenues, lost profits, or |
| otherwise, arising from or in connection with these materials.</p> |
| </div> |
| <div class="paragraph"> |
| <p>Vulkan and Khronos are registered trademarks, and OpenXR, SPIR, SPIR-V, SYCL, WebGL, |
| WebCL, OpenVX, OpenVG, EGL, COLLADA, glTF, NNEF, OpenKODE, OpenKCAM, StreamInput, |
| OpenWF, OpenSL ES, OpenMAX, OpenMAX AL, OpenMAX IL, OpenMAX DL, OpenML and DevU are |
| trademarks of the Khronos Group Inc. ASTC is a trademark of ARM Holdings PLC, |
| OpenCL is a trademark of Apple Inc. and OpenGL and OpenML are registered trademarks |
| and the OpenGL ES and OpenGL SC logos are trademarks of Hewlett Packard Enterprise |
| used under license by Khronos. All other product names, trademarks, |
| and/or company names are used solely for identification and belong to their |
| respective owners.</p> |
| </div> |
| <div style="page-break-after: always;"></div> |
| </div> |
| </div> |
| <div class="sect1"> |
| <h2 id="_introduction"><a class="anchor" href="#_introduction"></a>1. Introduction</h2> |
| <div class="sectionbody"> |
| <div class="paragraph"> |
| <p>Modern processor architectures have embraced parallelism as an important |
| pathway to increased performance. |
| Facing technical challenges with higher clock speeds in a fixed power |
| envelope, Central Processing Units (CPUs) now improve performance by adding |
| multiple cores. |
| Graphics Processing Units (GPUs) have also evolved from fixed function |
| rendering devices into programmable parallel processors. |
| As today's computer systems often include highly parallel CPUs, GPUs and |
| other types of processors, it is important to enable software developers to |
| take full advantage of these heterogeneous processing platforms.</p> |
| </div> |
| <div class="paragraph"> |
| <p>Creating applications for heterogeneous parallel processing platforms is |
| challenging as traditional programming approaches for multi-core CPUs and |
| GPUs are very different. |
| CPU-based parallel programming models are typically based on standards but |
| usually assume a shared address space and do not encompass vector |
| operations. |
| General purpose GPU programming models address complex memory hierarchies |
| and vector operations but are traditionally platform-, vendor- or |
| hardware-specific. |
| These limitations make it difficult for a developer to access the compute |
| power of heterogeneous CPUs, GPUs and other types of processors from a |
| single, multi-platform source code base. |
| More than ever, there is a need to enable software developers to effectively |
| take full advantage of heterogeneous processing platforms, from high |
| performance compute servers, through desktop computer systems, to handheld |
| devices, that include a diverse mix of parallel CPUs, GPUs and other |
| processors such as DSPs and the Cell/B.E. processor.</p> |
| </div> |
| <div class="paragraph"> |
| <p><strong>OpenCL</strong> (Open Computing Language) is an open royalty-free standard for |
| general purpose parallel programming across CPUs, GPUs and other processors, |
| giving software developers portable and efficient access to the power of |
| these heterogeneous processing platforms.</p> |
| </div> |
| <div class="paragraph"> |
| <p>OpenCL supports a wide range of applications, ranging from embedded and |
| consumer software to HPC solutions, through a low-level, high-performance, |
| portable abstraction. |
| By creating an efficient, close-to-the-metal programming interface, OpenCL |
| will form the foundation layer of a parallel computing ecosystem of |
| platform-independent tools, middleware and applications. |
| OpenCL is particularly suited to play an increasingly significant role in |
| emerging interactive graphics applications that combine general parallel |
| compute algorithms with graphics rendering pipelines.</p> |
| </div> |
| <div class="paragraph"> |
| <p>OpenCL consists of an API for coordinating parallel computation across |
| heterogeneous processors, a cross-platform programming language, and a |
| cross-platform intermediate language with a well-specified computation |
| environment. |
| The OpenCL standard:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>Supports both data- and task-based parallel programming models</p> |
| </li> |
| <li> |
| <p>Supports kernels written using a subset of ISO C99 with extensions |
| for parallel execution</p> |
| </li> |
| <li> |
| <p>Supports kernels represented by a portable and self-contained |
| intermediate language (e.g. SPIR-V) with support for parallel execution</p> |
| </li> |
| <li> |
| <p>Defines consistent numerical requirements based on IEEE 754</p> |
| </li> |
| <li> |
| <p>Defines a configuration profile for handheld and embedded devices</p> |
| </li> |
| <li> |
| <p>Supports efficient interop with OpenGL, OpenGL ES and other APIs</p> |
| </li> |
| </ul> |
| </div> |
| <div class="paragraph"> |
| <p>This document begins with an overview of basic concepts and the architecture |
| of OpenCL, followed by a detailed description of its execution model, memory |
| model and synchronization support. |
| It then discusses the OpenCL platform and runtime API. |
| Some examples are given that describe sample compute use-cases and how they |
| would be written in OpenCL. |
| The specification is divided into a core specification that any OpenCL |
| compliant implementation must support; a handheld/embedded profile which |
| relaxes the OpenCL compliance requirements for handheld and embedded |
| devices; and a set of optional extensions that are likely to move into the |
| core specification in later revisions of the OpenCL specification.</p> |
| </div> |
| <div class="sect2"> |
| <h3 id="_normative_references"><a class="anchor" href="#_normative_references"></a>1.1. Normative References</h3> |
| <div class="paragraph"> |
| <p>Normative references are references to external documents or resources with |
| which implementers of OpenCL must comply, in whole or in the portions |
| specified, as described in this specification.</p> |
| </div> |
| <div id="iso-c11" class="paragraph"> |
| <p><em>ISO/IEC 9899:2011 - Information technology - Programming languages - C</em>, |
| <a href="https://www.iso.org/standard/57853.html" class="bare">https://www.iso.org/standard/57853.html</a> (final specification), |
| <a href="http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1570.pdf" class="bare">http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1570.pdf</a> (last public |
| draft).</p> |
| </div> |
| </div> |
| <div class="sect2"> |
| <h3 id="_version_numbers"><a class="anchor" href="#_version_numbers"></a>1.2. Version Numbers</h3> |
| <div class="paragraph"> |
| <p>The OpenCL version number follows a <em>major.minor-revision</em> scheme. When this |
| version number is used within the API it generally only includes the |
| <em>major.minor</em> components of the version number.</p> |
| </div> |
| <div class="paragraph"> |
| <p>A difference in the <em>major</em> or <em>minor</em> version number indicates that some |
| amount of new functionality has been added to the specification, and may also |
| include behavior changes and bug fixes. |
| Functionality may also be deprecated or removed when the <em>major</em> or <em>minor</em> |
| version changes.</p> |
| </div> |
| <div class="paragraph"> |
| <p>A difference in the <em>revision</em> number indicates small changes to the |
| specification, typically to fix a bug or to clarify language. |
| When the <em>revision</em> number changes there may be an impact on the behavior of |
| existing functionality, but this should not affect backwards compatibility. |
| Functionality should not be added or removed when the <em>revision</em> number |
| changes.</p> |
| </div> |
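| <div class="paragraph"> |
| <p>As a rough illustration of how the <em>major.minor</em> components might be |
| handled in host code, the sketch below assumes the version is reported as a |
| string of the form <code>OpenCL major.minor</code> followed by |
| implementation-specific text. |
| The helper names <code>parse_cl_version</code> and <code>compare_cl_version</code> are |
| hypothetical and are not part of the OpenCL API.</p> |
| </div> |

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of the OpenCL API.
 * Extracts major.minor from a string assumed to look like
 * "OpenCL <major>.<minor> <implementation-specific text>".
 * Returns 1 on success and 0 if the string does not match. */
int parse_cl_version(const char *text, int *major, int *minor)
{
    assert(text != NULL && major != NULL && minor != NULL);
    /* Both numeric components must be read; trailing text is ignored. */
    return sscanf(text, "OpenCL %d.%d", major, minor) == 2;
}

/* Hypothetical helper: orders two major.minor pairs.
 * Negative if a < b, zero if equal, positive if a > b. */
int compare_cl_version(int major_a, int minor_a, int major_b, int minor_b)
{
    if (major_a != major_b)
        return major_a - major_b;
    return minor_a - minor_b;
}
```

| <div class="paragraph"> |
| <p>The <em>revision</em> component is deliberately not parsed here, since it is |
| generally not included when the version number is used within the API.</p> |
| </div> |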
| </div> |
| <div class="sect2"> |
| <h3 id="unified-spec"><a class="anchor" href="#unified-spec"></a>1.3. Unified Specification</h3> |
| <div class="paragraph"> |
| <p>This document specifies all versions of the OpenCL API.</p> |
| </div> |
| <div class="paragraph"> |
| <p>There are three ways that an OpenCL feature may be described in terms of what |
| versions of OpenCL support that feature.</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>Missing before <em>major.minor</em>: Features that were introduced in |
| version <em>major.minor</em>. Implementations of an earlier version of OpenCL |
| will not provide these features.</p> |
| </li> |
| <li> |
| <p>Deprecated by <em>major.minor</em>: Features that were deprecated |
| in version <em>major.minor</em>, see the definition of deprecation in the |
| glossary.</p> |
| </li> |
| <li> |
| <p>Universal: Features that have no mention of what version they are missing |
| before or deprecated by are available in all versions of OpenCL.</p> |
| </li> |
| </ul> |
| </div> |
| </div> |
| </div> |
| </div> |
| <div class="sect1"> |
| <h2 id="_glossary"><a class="anchor" href="#_glossary"></a>2. Glossary</h2> |
| <div class="sectionbody"> |
| <div class="dlist"> |
| <dl> |
| <dt class="hdlist1">Application </dt> |
| <dd> |
| <p>The combination of the program running on the host and OpenCL devices.</p> |
| </dd> |
| <dt class="hdlist1">Acquire semantics </dt> |
| <dd> |
| <p>One of the memory order semantics defined for synchronization |
| operations. |
| Acquire semantics apply to atomic operations that load from memory. |
| Given two units of execution, <strong>A</strong> and <strong>B</strong>, acting on a shared atomic |
| object <strong>M</strong>, if <strong>A</strong> uses an atomic load of <strong>M</strong> with acquire semantics to |
| synchronize-with an atomic store to <strong>M</strong> by <strong>B</strong> that used release |
| semantics, then <strong>A</strong>'s atomic load will occur before any subsequent |
| operations by <strong>A</strong>. |
| Note that the memory orders <em>release</em>, <em>sequentially consistent</em>, and |
| <em>acquire_release</em> all include <em>release semantics</em> and effectively pair |
| with a load using acquire semantics.</p> |
| </dd> |
| <dt class="hdlist1">Acquire release semantics </dt> |
| <dd> |
| <p>A memory order semantics for synchronization operations (such as atomic |
| operations) that has the properties of both acquire and release memory |
| orders. |
| It is used with read-modify-write operations.</p> |
| </dd> |
| <dt class="hdlist1">Atomic operations </dt> |
| <dd> |
| <p>Operations that at any point, and from any perspective, have either |
| occurred completely, or not at all. |
| Memory orders associated with atomic operations may constrain the |
| visibility of loads and stores with respect to the atomic operations |
| (see <em>relaxed semantics</em>, <em>acquire semantics</em>, <em>release semantics</em> or |
| <em>acquire release semantics</em>).</p> |
| </dd> |
| <dt class="hdlist1">Blocking and Non-Blocking Enqueue API calls </dt> |
| <dd> |
| <p>A <em>non-blocking enqueue API call</em> places a <em>command</em> on a |
| <em>command-queue</em> and returns immediately to the host. |
| The <em>blocking-mode enqueue API calls</em> do not return to the host until |
| the command has completed.</p> |
| </dd> |
| <dt class="hdlist1">Barrier </dt> |
| <dd> |
| <p>There are three types of <em>barriers</em>: a command-queue barrier, a |
| work-group barrier and a sub-group barrier.</p> |
| <div class="openblock"> |
| <div class="content"> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>The OpenCL API provides a function to enqueue a <em>command-queue</em> |
| <em>barrier</em> command. |
| This <em>barrier</em> command ensures that all previously enqueued commands to |
| a command-queue have finished execution before any following <em>commands</em> |
| enqueued in the <em>command-queue</em> can begin execution.</p> |
| </li> |
| <li> |
| <p>The OpenCL kernel execution model provides built-in <em>work-group barrier</em> |
| functionality. |
| This <em>barrier</em> built-in function can be used by a <em>kernel</em> executing on |
| a <em>device</em> to perform synchronization between <em>work-items</em> in a |
| <em>work-group</em> executing the <em>kernel</em>. |
| All the <em>work-items</em> of a <em>work-group</em> must execute the <em>barrier</em> |
| construct before any are allowed to continue execution beyond the |
| <em>barrier</em>.</p> |
| </li> |
| <li> |
| <p>The OpenCL kernel execution model provides built-in <em>sub-group barrier</em> |
| functionality. |
| This <em>barrier</em> built-in function can be used by a <em>kernel</em> executing on |
| a <em>device</em> to perform synchronization between <em>work-items</em> in a |
| <em>sub-group</em> executing the <em>kernel</em>. |
| All the <em>work-items</em> of a <em>sub-group</em> must execute the <em>barrier</em> |
| construct before any are allowed to continue execution beyond the |
| <em>barrier</em>.</p> |
| </li> |
| </ul> |
| </div> |
| </div> |
| </div> |
| </dd> |
| <dt class="hdlist1">Buffer Object </dt> |
| <dd> |
| <p>A memory object that stores a linear collection of bytes. |
| Buffer objects are accessible using a pointer in a <em>kernel</em> executing on |
| a <em>device</em>. |
| Buffer objects can be manipulated by the host using OpenCL API calls. |
| A <em>buffer object</em> encapsulates the following information:</p> |
| <div class="openblock"> |
| <div class="content"> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>Size in bytes.</p> |
| </li> |
| <li> |
| <p>Properties that describe usage information and which region to allocate |
| from.</p> |
| </li> |
| <li> |
| <p>Buffer data.</p> |
| </li> |
| </ul> |
| </div> |
| </div> |
| </div> |
| </dd> |
| <dt class="hdlist1">Built-in Kernel </dt> |
| <dd> |
| <p>A <em>built-in kernel</em> is a <em>kernel</em> that is executed on an OpenCL <em>device</em> |
| or <em>custom device</em> by fixed-function hardware or in firmware. |
| <em>Applications</em> can query the <em>built-in kernels</em> supported by a <em>device</em> |
| or <em>custom device</em>. |
| A <em>program object</em> can only contain <em>kernels</em> written in OpenCL C or |
| <em>built-in kernels</em> but not both. |
| See also <em>Kernel</em> and <em>Program</em>.</p> |
| </dd> |
| <dt class="hdlist1">Child kernel </dt> |
| <dd> |
| <p>See <em>Device-side enqueue</em>.</p> |
| </dd> |
| <dt class="hdlist1">Command </dt> |
| <dd> |
| <p>The OpenCL operations that are submitted to a <em>command-queue</em> for |
| execution. |
| For example, OpenCL commands issue kernels for execution on a compute |
| device, manipulate memory objects, etc.</p> |
| </dd> |
| <dt class="hdlist1">Command-queue </dt> |
| <dd> |
| <p>An object that holds <em>commands</em> that will be executed on a specific |
| <em>device</em>. |
| The <em>command-queue</em> is created on a specific <em>device</em> in a <em>context</em>. |
| <em>Commands</em> to a <em>command-queue</em> are queued in-order but may be executed |
| in-order or out-of-order. |
| Refer to <em>In-order Execution</em> and <em>Out-of-order Execution</em>. |
| </dd> |
| <dt class="hdlist1">Command-queue Barrier </dt> |
| <dd> |
| <p>See <em>Barrier</em>.</p> |
| </dd> |
| <dt class="hdlist1">Command synchronization </dt> |
| <dd> |
| <p>Constraints on the order in which commands are launched for execution on a |
| device, defined in terms of the synchronization points that occur between |
| commands in host command-queues and between commands in device-side |
| command-queues. |
| See <em>synchronization points</em>.</p> |
| </dd> |
| <dt class="hdlist1">Complete </dt> |
| <dd> |
| <p>The final state in the six-state model for the execution of a command. |
| The transition into this state is signaled through event objects |
| or callback functions associated with a command.</p> |
| </dd> |
| <dt class="hdlist1">Compute Device Memory </dt> |
| <dd> |
| <p>This refers to one or more memories attached to the compute device.</p> |
| </dd> |
| <dt class="hdlist1">Compute Unit </dt> |
| <dd> |
| <p>An OpenCL <em>device</em> has one or more <em>compute units</em>. |
| A <em>work-group</em> executes on a single <em>compute unit</em>. |
| A <em>compute unit</em> is composed of one or more <em>processing elements</em> and |
| <em>local memory</em>. |
| A <em>compute unit</em> may also include dedicated texture filter units that |
| can be accessed by its processing elements.</p> |
| </dd> |
| <dt class="hdlist1">Concurrency </dt> |
| <dd> |
| <p>A property of a system in which a set of tasks can remain |
| active and make progress at the same time. |
| To utilize concurrent execution when running a program, a programmer |
| must identify the concurrency in their problem, expose it within the |
| source code, and then exploit it using a notation that supports |
| concurrency.</p> |
| </dd> |
| <dt class="hdlist1">Constant Memory </dt> |
| <dd> |
| <p>A region of <em>global memory</em> that remains constant during the execution |
| of a <em>kernel</em>. |
| The <em>host</em> allocates and initializes memory objects placed into |
| <em>constant memory</em>.</p> |
| </dd> |
| <dt class="hdlist1">Context </dt> |
| <dd> |
| <p>The environment within which the kernels execute and the domain in which |
| synchronization and memory management are defined. |
| The <em>context</em> includes a set of <em>devices</em>, the memory accessible to |
| those <em>devices</em>, the corresponding memory properties and one or more |
| <em>command-queues</em> used to schedule execution of <em>kernels</em> or |
| operations on <em>memory objects</em>.</p> |
| </dd> |
| <dt class="hdlist1">Control flow </dt> |
| <dd> |
| <p>The flow of instructions executed by a work-item. |
| Multiple logically related work-items may or may not execute the same |
| control flow. |
| The control flow is said to be <em>converged</em> if all the work-items in the |
| set execute the same stream of instructions. |
| In a <em>diverged</em> control flow, the work-items in the set execute |
| different instructions. |
| At a later point, if a diverged control flow becomes converged, it is |
| said to be a re-converged control flow.</p> |
| </dd> |
| <dt class="hdlist1">Converged control flow </dt> |
| <dd> |
| <p>See <em>Control flow</em>.</p> |
| </dd> |
| <dt class="hdlist1">Custom Device </dt> |
| <dd> |
| <p>An OpenCL <em>device</em> that fully implements the OpenCL Runtime but does not |
| support <em>programs</em> written in OpenCL C. |
| A custom device may be specialized non-programmable hardware that is |
| very power efficient and performant for directed tasks or hardware with |
| limited programmable capabilities such as specialized DSPs. |
| Custom devices are not OpenCL conformant. |
| Custom devices may support an online compiler. |
| Programs for custom devices can be created using the OpenCL runtime APIs |
| that allow OpenCL programs to be created from source (if an online |
| compiler is supported) and/or binary, or from <em>built-in kernels</em> |
| supported by the <em>device</em>. |
| See also <em>Device</em>.</p> |
| </dd> |
| <dt class="hdlist1">Data Parallel Programming Model </dt> |
| <dd> |
| <p>Traditionally, this term refers to a programming model where concurrency |
| is expressed as instructions from a single program applied to multiple |
| elements within a set of data structures. |
| The term has been generalized in OpenCL to refer to a model wherein a |
| set of instructions from a single program are applied concurrently to |
| each point within an abstract domain of indices.</p> |
| </dd> |
| <dt class="hdlist1">Data race </dt> |
| <dd> |
| <p>The execution of a program contains a data race if it contains two |
| actions in different work-items or host threads where (1) one action |
| modifies a memory location and the other action reads or modifies the |
| same memory location, and (2) at least one of these actions is not |
| atomic, or the corresponding memory scopes are not inclusive, and (3) |
| the actions are global actions unordered by the global-happens-before |
| relation or are local actions unordered by the local-happens-before |
| relation.</p> |
| </dd> |
| <dt class="hdlist1">Deprecation </dt> |
| <dd> |
| <p>Existing features are marked as deprecated if their usage is not |
| recommended because the feature is being de-emphasized or superseded and |
| may be removed from a future version of the specification.</p> |
| </dd> |
| <dt class="hdlist1">Device </dt> |
| <dd> |
| <p>A <em>device</em> is a collection of <em>compute units</em>. |
| A <em>command-queue</em> is used to queue <em>commands</em> to a <em>device</em>. |
| Examples of <em>commands</em> include executing <em>kernels</em>, or reading and |
| writing <em>memory objects</em>. |
| OpenCL devices typically correspond to a GPU, a multi-core CPU, and |
| other processors such as DSPs and the Cell/B.E. |
| processor.</p> |
| </dd> |
| <dt class="hdlist1">Device-side enqueue </dt> |
| <dd> |
| <p>A mechanism whereby a kernel-instance is enqueued by a kernel-instance |
| running on a device without direct involvement by the host program. |
| This produces <em>nested parallelism</em>; i.e. additional levels of |
| concurrency are nested inside a running kernel-instance. |
| The kernel-instance executing on a device (the <em>parent kernel</em>) enqueues |
| a kernel-instance (the <em>child kernel</em>) to a device-side command queue. |
| Child and parent kernels execute asynchronously though a parent kernel |
| does not complete until all of its child-kernels have completed.</p> |
| </dd> |
| <dt class="hdlist1">Diverged control flow </dt> |
| <dd> |
| <p>See <em>Control flow</em>.</p> |
| </dd> |
| <dt class="hdlist1">Ended </dt> |
| <dd> |
| <p>The fifth state in the six state model for the execution of a command. |
| The transition into this state occurs when execution of a command has |
| ended. |
| When a Kernel-enqueue command ends, all of the work-groups associated |
| with that command have finished their execution.</p> |
| </dd> |
| <dt class="hdlist1">Event Object </dt> |
| <dd> |
| <p>An <em>event object</em> encapsulates the status of an operation such as a |
| <em>command</em>. |
| It can be used to synchronize operations in a context.</p> |
| </dd> |
| <dt class="hdlist1">Event Wait List </dt> |
| <dd> |
| <p>An <em>event wait list</em> is a list of <em>event objects</em> that can be used to |
| control when a particular <em>command</em> begins execution.</p> |
| </dd> |
| <dt class="hdlist1">Fence </dt> |
| <dd> |
| <p>A memory ordering operation without an associated atomic object. |
| A fence can use <em>acquire semantics</em>, <em>release semantics</em>, or |
| <em>acquire-release semantics</em>.</p> |
| </dd> |
| <dt class="hdlist1">Framework </dt> |
| <dd> |
| <p>A software system that contains the set of components to support |
| software development and execution. |
| A <em>framework</em> typically includes libraries, APIs, runtime systems, |
| compilers, etc.</p> |
| </dd> |
| <dt class="hdlist1">Generic address space </dt> |
| <dd> |
| <p>An address space that includes the <em>private</em>, <em>local</em>, and <em>global</em> |
| address spaces available to a device. |
| The generic address space supports conversion of pointers to and from |
| private, local and global address spaces, and hence lets a programmer |
| write a single function that at compile time can take arguments from any |
| of the three named address spaces.</p> |
| </dd> |
| <dt class="hdlist1">Global Happens before </dt> |
| <dd> |
| <p>See <em>Happens before</em>.</p> |
| </dd> |
| <dt class="hdlist1">Global ID </dt> |
| <dd> |
| <p>A <em>global ID</em> is used to uniquely identify a <em>work-item</em> and is derived |
| from the number of <em>global work-items</em> specified when executing a |
| <em>kernel</em>. |
| The <em>global ID</em> is an N-dimensional value that starts at (0, 0, …, 0). |
| See also <em>Local ID</em>.</p> |
| </dd> |
| <dt class="hdlist1">Global Memory </dt> |
| <dd> |
| <p>A memory region accessible to all <em>work-items</em> executing in a <em>context</em>. |
| It is accessible to the <em>host</em> using <em>commands</em> such as read, write and |
| map. |
| <em>Global memory</em> is included within the <em>generic address space</em> that |
| includes the private and local address spaces.</p> |
| </dd> |
| <dt class="hdlist1">GL share group </dt> |
| <dd> |
| <p>A <em>GL share group</em> object manages shared OpenGL or OpenGL ES resources |
| such as textures, buffers, framebuffers, and renderbuffers and is |
| associated with one or more GL context objects. |
| The <em>GL share group</em> is typically an opaque object and not directly |
| accessible.</p> |
| </dd> |
| <dt class="hdlist1">Handle </dt> |
| <dd> |
| <p>An opaque type that references an <em>object</em> allocated by OpenCL. |
| Any operation on an <em>object</em> occurs by reference to that object's handle.</p> |
| </dd> |
| <dt class="hdlist1">Happens before </dt> |
| <dd> |
| <p>An ordering relationship between operations that execute on multiple |
| units of execution. |
| If an operation A happens-before operation B then A must occur before B; |
| in particular, any value written by A will be visible to B. |
| We define two separate happens before relations: <em>global-happens-before</em> |
| and <em>local-happens-before</em>. |
| These are defined in <a href="#memory-ordering-rules">Memory Ordering Rules</a>.</p> |
| </dd> |
| <dt class="hdlist1">Host </dt> |
| <dd> |
| <p>The <em>host</em> interacts with the <em>context</em> using the OpenCL API.</p> |
| </dd> |
| <dt class="hdlist1">Host-thread </dt> |
| <dd> |
| <p>The unit of execution that executes the statements in the host program.</p> |
| </dd> |
| <dt class="hdlist1">Host pointer </dt> |
| <dd> |
| <p>A pointer to memory that is in the virtual address space on the <em>host</em>.</p> |
| </dd> |
| <dt class="hdlist1">Illegal </dt> |
| <dd> |
| <p>Behavior of a system that is explicitly not allowed and will be reported |
| as an error when encountered by OpenCL.</p> |
| </dd> |
| <dt class="hdlist1">Image Object </dt> |
| <dd> |
| <p>A <em>memory object</em> that stores a two- or three-dimensional structured |
| array. |
| Image data can only be accessed with read and write functions. |
| The read functions use a <em>sampler</em>.</p> |
| <div class="openblock"> |
| <div class="content"> |
| <div class="paragraph"> |
| <p>The <em>image object</em> encapsulates the following information:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>Dimensions of the image.</p> |
| </li> |
| <li> |
| <p>Description of each element in the image.</p> |
| </li> |
| <li> |
| <p>Properties that describe usage information and which region to allocate |
| from.</p> |
| </li> |
| <li> |
| <p>Image data.</p> |
| </li> |
| </ul> |
| </div> |
| <div class="paragraph"> |
| <p>The elements of an image are selected from a list of predefined image |
| formats.</p> |
| </div> |
| </div> |
| </div> |
| </dd> |
| <dt class="hdlist1">Implementation Defined </dt> |
| <dd> |
| <p>Behavior that is explicitly allowed to vary between conforming |
| implementations of OpenCL. |
| An OpenCL implementor is required to document the implementation-defined |
| behavior.</p> |
| </dd> |
| <dt class="hdlist1">Independent Forward Progress </dt> |
| <dd> |
| <p>If an entity supports independent forward progress, then if it is |
| otherwise not dependent on any actions due to be performed by any other |
| entity (for example it does not wait on a lock held by, and thus that |
| must be released by, any other entity), then its execution cannot be |
| blocked by the execution of any other entity in the system (it will not |
| be starved). |
| Work-items in a subgroup, for example, typically do not support |
| independent forward progress, so one work-item in a subgroup may be |
| completely blocked (starved) if a different work-item in the same |
| subgroup enters a spin loop.</p> |
| </dd> |
| <dt class="hdlist1">In-order Execution </dt> |
| <dd> |
| <p>A model of execution in OpenCL where the <em>commands</em> in a <em>command-queue</em> |
| are executed in order of submission with each <em>command</em> running to |
| completion before the next one begins. |
| See <em>Out-of-order Execution</em>.</p> |
| </dd> |
| <dt class="hdlist1">Intermediate Language </dt> |
| <dd> |
| <p>A lower-level language that may be used to create programs. |
| SPIR-V is a required intermediate language (IL) for OpenCL 2.1 and 2.2 devices. |
| Other OpenCL devices may optionally support SPIR-V or other ILs.</p> |
| </dd> |
| <dt class="hdlist1">Kernel </dt> |
| <dd> |
| <p>A <em>kernel</em> is a function declared in a <em>program</em> and executed on an |
| OpenCL <em>device</em>. |
| A <em>kernel</em> is identified by the <code>__kernel</code> or <code>kernel</code> qualifier applied to |
| any function defined in a <em>program</em>.</p> |
| </dd> |
| <dt class="hdlist1">Kernel-instance </dt> |
| <dd> |
| <p>The work carried out by an OpenCL program occurs through the execution |
| of kernel-instances on devices. |
| The kernel instance is the <em>kernel object</em>, the values associated with |
| the arguments to the kernel, and the parameters that define the |
| <em>NDRange</em> index space.</p> |
| </dd> |
| <dt class="hdlist1">Kernel Object </dt> |
| <dd> |
| <p>A <em>kernel object</em> encapsulates a specific <em>kernel</em> function declared |
| in a <em>program</em> and the argument values to be used when executing this |
| <em>kernel</em> function.</p> |
| </dd> |
| <dt class="hdlist1">Kernel Language </dt> |
| <dd> |
| <p>A language that is used to represent source code for kernels. |
| Kernels may be directly created from OpenCL C kernel language |
| source strings. |
| Other kernel languages may be supported by compiling to SPIR-V, |
| another supported Intermediate Language, or to a device-specific |
| program binary format.</p> |
| </dd> |
| <dt class="hdlist1">Launch </dt> |
| <dd> |
| <p>The transition of a command from the <em>submitted</em> state to the <em>ready</em> |
| state. |
| See <em>Ready</em>.</p> |
| </dd> |
| <dt class="hdlist1">Local ID </dt> |
| <dd> |
| <p>A <em>local ID</em> specifies a unique <em>work-item ID</em> within a given |
| <em>work-group</em> that is executing a <em>kernel</em>. |
| The <em>local ID</em> is an N-dimensional value that starts at (0, 0, …, 0). |
| See also <em>Global ID</em>.</p> |
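| <p>As a rough illustration of how the two IDs relate, the sketch below computes a one-dimensional <em>global ID</em> from a work-group ID, a uniform work-group size, and a <em>local ID</em>, following the mapping described in the execution model. The helper name <code>global_id_1d</code> and its parameters are assumptions made for this example and are not part of the OpenCL API.</p> |

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an OpenCL API): one-dimensional global ID.
 * Assumes every work-group has the same size (no remainder work-groups)
 * and that global_offset is 0 when no offset is used. */
size_t global_id_1d(size_t group_id, size_t local_size,
                    size_t local_id, size_t global_offset)
{
    /* Local IDs start at 0 and are unique within a work-group. */
    assert(local_id < local_size);
    return global_offset + group_id * local_size + local_id;
}
```

| <p>For example, under these assumptions the work-item with local ID 5 in work-group 2 of size 64 would have global ID 2 * 64 + 5 = 133.</p> |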
| </dd> |
| <dt class="hdlist1">Local Memory </dt> |
| <dd> |
| <p>A memory region associated with a <em>work-group</em> and accessible only by |
| <em>work-items</em> in that <em>work-group</em>. |
| <em>Local memory</em> is included within the <em>generic address space</em> that |
| includes the private and global address spaces.</p> |
| </dd> |
| <dt class="hdlist1">Marker </dt> |
| <dd> |
| <p>A <em>command</em> queued in a <em>command-queue</em> that can be used to tag all |
| <em>commands</em> queued before the <em>marker</em> in the <em>command-queue</em>. |
| The <em>marker</em> command returns an <em>event</em> which can be used by the |
| <em>application</em> to queue a wait on the marker event i.e. wait for all |
| commands queued before the <em>marker</em> command to complete.</p> |
| </dd> |
| <dt class="hdlist1">Memory Consistency Model </dt> |
| <dd> |
| <p>Rules that define which values are observed when multiple units of |
| execution load data from any shared memory plus the synchronization |
| operations that constrain the order of memory operations and define |
| synchronization relationships. |
| The memory consistency model in OpenCL is based on the memory model from |
| the ISO C11 programming language.</p> |
| </dd> |
| <dt class="hdlist1">Memory Objects </dt> |
| <dd> |
| <p>A <em>memory object</em> is a handle to a reference counted region of <em>Global |
| Memory</em>. |
| Also see <em>Buffer Object</em> and <em>Image Object</em>.</p> |
| </dd> |
| <dt class="hdlist1">Memory Regions (or Pools) </dt> |
| <dd> |
| <p>A distinct address space in OpenCL. |
| <em>Memory regions</em> may overlap in physical memory though OpenCL will treat |
| them as logically distinct. |
| The <em>memory regions</em> are denoted as <em>private</em>, <em>local</em>, <em>constant</em>, and |
| <em>global</em>.</p> |
| </dd> |
| <dt class="hdlist1">Memory Scopes </dt> |
| <dd> |
| <p>These memory scopes define a hierarchy of visibilities when analyzing |
| the ordering constraints of memory operations. |
| They are defined by the values of the <strong>memory_scope</strong> enumeration |
| constant. |
| Current values are <strong>memory_scope_work_item</strong> (memory constraints only |
| apply to a single work-item and in practice apply only to image |
| operations), <strong>memory_scope_sub_group</strong> (memory-ordering constraints only |
| apply to work-items executing in a sub-group), <strong>memory_scope_work_group</strong> |
| (memory-ordering constraints only apply to work-items executing in a |
| work-group), <strong>memory_scope_device</strong> (memory-ordering constraints only |
| apply to work-items executing on a single device) and |
| <strong>memory_scope_all_svm_devices</strong> (memory-ordering constraints only apply |
| to work-items executing across multiple devices and when using shared |
| virtual memory).</p> |
| </dd> |
| <dt class="hdlist1">Modification Order </dt> |
| <dd> |
| <p>All modifications to a particular atomic object M occur in some |
| particular <em>total order</em>, called the <em>modification order</em> of M. |
| If A and B are modifications of an atomic object M, and A happens-before |
| B, then A shall precede B in the modification order of M. |
| Note that the modification order of an atomic object M is independent of |
| whether M is in local or global memory.</p> |
| </dd> |
| <dt class="hdlist1">Nested Parallelism </dt> |
| <dd> |
| <p>See <em>device-side enqueue</em>.</p> |
| </dd> |
| <dt class="hdlist1">Object </dt> |
| <dd> |
| <p>Objects are abstract representations of the resources that can be |
| manipulated by the OpenCL API. |
| Examples include <em>program objects</em>, <em>kernel objects</em>, and <em>memory |
| objects</em>.</p> |
| </dd> |
| <dt class="hdlist1">Out-of-Order Execution </dt> |
| <dd> |
| <p>A model of execution in which <em>commands</em> placed in the <em>work queue</em> may |
| begin and complete execution in any order consistent with constraints |
| imposed by <em>event wait lists</em> and <em>command-queue barriers</em>. |
| See <em>In-order Execution</em>.</p> |
| </dd> |
| <dt class="hdlist1">Parent device </dt> |
| <dd> |
| <p>The OpenCL <em>device</em> which is partitioned to create <em>sub-devices</em>. |
| Not all <em>parent devices</em> are <em>root devices</em>. |
| A <em>root device</em> might be partitioned and the <em>sub-devices</em> partitioned |
| again. |
| In this case, the first set of <em>sub-devices</em> would be <em>parent devices</em> |
| of the second set, but not the <em>root devices</em>. |
| Also see <em>Device</em>, <em>parent device</em> and <em>root device</em>.</p> |
| </dd> |
| <dt class="hdlist1">Parent kernel </dt> |
| <dd> |
| <p>See <em>Device-side enqueue</em>.</p> |
| </dd> |
| <dt class="hdlist1">Pipe </dt> |
| <dd> |
| <p>The <em>pipe</em> memory object conceptually is an ordered sequence of data |
| items. |
| A pipe has two endpoints: a write endpoint into which data items are |
| inserted, and a read endpoint from which data items are removed. |
| At any one time, only one kernel instance may write into a pipe, and |
| only one kernel instance may read from a pipe. |
| To support the producer consumer design pattern, one kernel instance |
| connects to the write endpoint (the producer) while another kernel |
| instance connects to the reading endpoint (the consumer).</p> |
| </dd> |
| <dt class="hdlist1">Platform </dt> |
| <dd> |
| <p>The <em>host</em> plus a collection of <em>devices</em> managed by the OpenCL |
| <em>framework</em> that allow an application to share <em>resources</em> and execute |
| <em>kernels</em> on <em>devices</em> in the <em>platform</em>.</p> |
| </dd> |
| <dt class="hdlist1">Private Memory </dt> |
| <dd> |
| <p>A region of memory private to a <em>work-item</em>. |
| Variables defined in one <em>work-item</em>'s <em>private memory</em> are not visible |
| to another <em>work-item</em>.</p> |
| </dd> |
| <dt class="hdlist1">Processing Element </dt> |
| <dd> |
| <p>A virtual scalar processor. |
| A work-item may execute on one or more processing elements.</p> |
| </dd> |
| <dt class="hdlist1">Program </dt> |
| <dd> |
| <p>An OpenCL <em>program</em> consists of a set of <em>kernels</em>. |
| <em>Programs</em> may also contain auxiliary functions called by the |
| <em>kernel</em> functions and constant data.</p> |
| </dd> |
| <dt class="hdlist1">Program Object </dt> |
| <dd> |
| <p>A <em>program object</em> encapsulates the following information:</p> |
| <div class="openblock"> |
| <div class="content"> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>A reference to an associated <em>context</em>.</p> |
| </li> |
| <li> |
| <p>A <em>program</em> source or binary.</p> |
| </li> |
| <li> |
| <p>The latest successfully built program executable, the list of <em>devices</em> |
| for which the program executable is built, the build options used and a |
| build log.</p> |
| </li> |
| <li> |
| <p>The number of <em>kernel objects</em> currently attached.</p> |
| </li> |
| </ul> |
| </div> |
| </div> |
| </div> |
| </dd> |
| <dt class="hdlist1">Queued </dt> |
| <dd> |
| <p>The first state in the six state model for the execution of a command. |
| The transition into this state occurs when the command is enqueued into |
| a command-queue.</p> |
| </dd> |
| <dt class="hdlist1">Ready </dt> |
| <dd> |
| <p>The third state in the six state model for the execution of a command. |
| The transition into this state occurs when pre-requisites constraining |
| execution of a command have been met; i.e. the command has been |
| launched. |
| When a kernel-enqueue command is launched, work-groups associated with |
| the command are placed in a device's work-pool from which they are |
| scheduled for execution.</p> |
| </dd> |
| <dt class="hdlist1">Re-converged Control Flow </dt> |
| <dd> |
| <p>See <em>Control flow</em>.</p> |
| </dd> |
| <dt class="hdlist1">Reference Count </dt> |
| <dd> |
| <p>The life span of an OpenCL object is determined by its <em>reference |
| count</em>, an internal count of the number of references to the object. |
| When you create an object in OpenCL, its <em>reference count</em> is set to |
| one. |
| Subsequent calls to the appropriate <em>retain</em> API (such as |
| <a href="#clRetainContext"><strong>clRetainContext</strong></a>, <a href="#clRetainCommandQueue"><strong>clRetainCommandQueue</strong></a>) increment the <em>reference |
| count</em>. |
| Calls to the appropriate <em>release</em> API (such as <a href="#clReleaseContext"><strong>clReleaseContext</strong></a>, |
| <a href="#clReleaseCommandQueue"><strong>clReleaseCommandQueue</strong></a>) decrement the <em>reference count</em>. |
| Implementations may also modify the <em>reference count</em>, e.g. to track |
| attached objects or to ensure correct operation of in-progress or |
| scheduled activities. |
| The object becomes inaccessible to host code when the number of |
| <em>release</em> operations performed matches the number of <em>retain</em> operations |
| plus the allocation of the object. |
| At this point the reference count may be zero but this is not |
| guaranteed.</p> |
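| <p>The host-visible part of this rule can be sketched with a toy model. The type <code>toy_object</code> and the functions <code>toy_create</code>, <code>toy_retain</code>, and <code>toy_release</code> are hypothetical stand-ins invented for this example, not OpenCL APIs, and the model deliberately ignores any additional references an implementation may hold internally.</p> |

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an OpenCL object; not part of the OpenCL API.
 * Only references held by host code are modeled here. */
typedef struct {
    unsigned host_refs;  /* creation counts as one reference */
    bool accessible;     /* false once host code may no longer use it */
} toy_object;

toy_object toy_create(void)
{
    toy_object o = { 1u, true };  /* reference count starts at one */
    return o;
}

void toy_retain(toy_object *o)
{
    assert(o->accessible);
    o->host_refs++;
}

void toy_release(toy_object *o)
{
    assert(o->accessible);
    /* Releases now match retains plus the creation: host access ends. */
    if (--o->host_refs == 0)
        o->accessible = false;
}
```

| <p>In this sketch one create and one retain must be balanced by two releases before the object becomes inaccessible to the host.</p> |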
| </dd> |
| <dt class="hdlist1">Relaxed Consistency </dt> |
| <dd> |
| <p>A memory consistency model in which the contents of memory visible to |
| different <em>work-items</em> or <em>commands</em> may be different except at a |
| <em>barrier</em> or other explicit synchronization points.</p> |
| </dd> |
| <dt class="hdlist1">Relaxed Semantics </dt> |
| <dd> |
| <p>A memory order semantics for atomic operations that implies no order |
| constraints. |
| The operation is <em>atomic</em> but it has no impact on the order of memory |
| operations.</p> |
| </dd> |
| <dt class="hdlist1">Release Semantics </dt> |
| <dd> |
| <p>One of the memory order semantics defined for synchronization |
| operations. |
| Release semantics apply to atomic operations that store to memory. |
| Given two units of execution, <strong>A</strong> and <strong>B</strong>, acting on a shared atomic |
| object <strong>M</strong>, if <strong>A</strong> uses an atomic store of <strong>M</strong> with release semantics to |
| synchronize-with an atomic load to <strong>M</strong> by <strong>B</strong> that used acquire |
| semantics, then <strong>A</strong>'s atomic store will occur <em>after</em> any prior |
| operations by <strong>A</strong>. |
| Note that the memory orders <em>acquire</em>, <em>sequentially consistent</em>, and |
| <em>acquire_release</em> all include <em>acquire semantics</em> and effectively pair |
| with a store using release semantics.</p> |
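| <p>Because the OpenCL memory consistency model is based on ISO C11, the pairing of a release store with an acquire load can be sketched in host-side C11 rather than OpenCL C. In the example below, two POSIX threads stand in for the units of execution <strong>A</strong> and <strong>B</strong>; the names <code>payload</code>, <code>flag_m</code>, <code>unit_a</code>, <code>unit_b</code>, and <code>run_message_passing</code> are assumptions made for illustration, and memory scopes are not modeled.</p> |

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;        /* ordinary, non-atomic data written by A */
static atomic_int flag_m;  /* the shared atomic object M */

/* Unit of execution A: write the data, then publish with a release store. */
static void *unit_a(void *arg)
{
    (void)arg;
    payload = 42;  /* prior operation by A */
    atomic_store_explicit(&flag_m, 1, memory_order_release);
    return NULL;
}

/* Unit of execution B: wait with acquire loads, then read the data. */
static void *unit_b(void *arg)
{
    int *seen = arg;
    while (atomic_load_explicit(&flag_m, memory_order_acquire) == 0) {
        /* spin until A's release store becomes visible */
    }
    *seen = payload;  /* ordered after A's write to payload */
    return NULL;
}

/* Runs A and B once and returns the value B observed. */
int run_message_passing(void)
{
    pthread_t a, b;
    int seen = 0;
    int rc;

    payload = 0;
    atomic_store(&flag_m, 0);
    rc = pthread_create(&b, NULL, unit_b, &seen);
    assert(rc == 0);
    rc = pthread_create(&a, NULL, unit_a, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return seen;
}
```

| <p>Once <strong>B</strong>'s acquire load observes the value stored by <strong>A</strong>, the write to <code>payload</code> is guaranteed to be visible to <strong>B</strong>, so no data race occurs on the non-atomic variable.</p> |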
| </dd> |
| <dt class="hdlist1">Remainder work-groups </dt> |
| <dd> |
| <p>When the work-groups associated with a kernel-instance are defined, the |
| sizes of a work-group in each dimension may not evenly divide the size |
| of the NDRange in the corresponding dimensions. |
| The result is a collection of work-groups on the boundaries of the |
| NDRange that are smaller than the base work-group size. |
| These are known as <em>remainder work-groups</em>.</p> |
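| <p>The arithmetic for a single dimension can be sketched as follows. The helper names <code>work_group_count</code> and <code>remainder_group_size</code> are hypothetical and chosen only for this example.</p> |

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of work-groups needed to cover one
 * dimension of the NDRange, counting a remainder work-group if any. */
size_t work_group_count(size_t global_size, size_t local_size)
{
    assert(local_size > 0);
    return (global_size + local_size - 1) / local_size;  /* ceiling division */
}

/* Hypothetical helper: size of the remainder work-group in one
 * dimension, or 0 when the work-group size divides the NDRange evenly. */
size_t remainder_group_size(size_t global_size, size_t local_size)
{
    assert(local_size > 0);
    return global_size % local_size;
}
```

| <p>For an NDRange of 100 work-items and a base work-group size of 32, this gives four work-groups, the last of which is a remainder work-group of 4 work-items.</p> |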
| </dd> |
| <dt class="hdlist1">Running </dt> |
| <dd> |
| <p>The fourth state in the six state model for the execution of a command. |
| The transition into this state occurs when the execution of the command |
| starts. |
| When a Kernel-enqueue command starts, one or more work-groups associated |
| with the command start to execute.</p> |
| </dd> |
| <dt class="hdlist1">Root device </dt> |
| <dd> |
| <p>A <em>root device</em> is an OpenCL <em>device</em> that has not been partitioned. |
| Also see <em>Device</em>, <em>Parent device</em> and <em>Root device</em>.</p> |
| </dd> |
| <dt class="hdlist1">Resource </dt> |
| <dd> |
| <p>A class of <em>objects</em> defined by OpenCL. |
| An instance of a <em>resource</em> is an <em>object</em>. |
| The most common <em>resources</em> are the <em>context</em>, <em>command-queue</em>, <em>program |
| objects</em>, <em>kernel objects</em>, and <em>memory objects</em>. |
| Computational resources are hardware elements that participate in the |
| action of advancing a program counter. |
| Examples include the <em>host</em>, <em>devices</em>, <em>compute units</em> and <em>processing |
| elements</em>.</p> |
| </dd> |
| <dt class="hdlist1">Retain, Release </dt> |
| <dd> |
| <p>The action of incrementing (retain) and decrementing (release) the |
| reference count of an OpenCL <em>object</em>. |
| This is a bookkeeping mechanism to make sure the system doesn’t |
| remove an <em>object</em> before all instances that use this <em>object</em> have |
| finished. |
| Refer to <em>Reference Count</em>.</p> |
| </dd> |
| <dt class="hdlist1">Sampler </dt> |
| <dd> |
| <p>An <em>object</em> that describes how to sample an image when the image is read |
| in the <em>kernel</em>. |
| The image read functions take a <em>sampler</em> as an argument. |
| The <em>sampler</em> specifies the image addressing-mode i.e. how out-of-range |
| image coordinates are handled, the filter mode, and whether the input |
| image coordinate is a normalized or unnormalized value.</p> |
| </dd> |
| <dt class="hdlist1">Scope inclusion </dt> |
| <dd> |
| <p>Two actions <strong>A</strong> and <strong>B</strong> are defined to have an inclusive scope if they |
| have the same scope <strong>P</strong> such that: (1) if <strong>P</strong> is |
| <strong>memory_scope_sub_group</strong>, and <strong>A</strong> and <strong>B</strong> are executed by work-items |
| within the same sub-group, or (2) if <strong>P</strong> is <strong>memory_scope_work_group</strong>, |
| and <strong>A</strong> and <strong>B</strong> are executed by work-items within the same work-group, |
| or (3) if <strong>P</strong> is <strong>memory_scope_device</strong>, and <strong>A</strong> and <strong>B</strong> are executed by |
| work-items on the same device, or (4) if <strong>P</strong> is |
| <strong>memory_scope_all_svm_devices</strong>, if <strong>A</strong> and <strong>B</strong> are executed by host |
| threads or by work-items on one or more devices that can share SVM |
| memory with each other and the host process.</p> |
| </dd> |
| <dt class="hdlist1">Sequenced before </dt> |
| <dd> |
| <p>A relation between evaluations executed by a single unit of execution. |
| Sequenced-before is an asymmetric, transitive, pair-wise relation that |
| induces a partial order between evaluations. |
| Given any two evaluations A and B, if A is sequenced-before B, then the |
| execution of A shall precede the execution of B.</p> |
| </dd> |
| <dt class="hdlist1">Sequential consistency </dt> |
| <dd> |
| <p>Sequential consistency interleaves the steps executed by each unit of |
| execution. |
| Each access to a memory location sees the last assignment to that |
| location in that interleaving.</p> |
| </dd> |
| <dt class="hdlist1">Sequentially consistent semantics </dt> |
| <dd> |
| <p>One of the memory order semantics defined for synchronization |
| operations. |
| When using sequentially-consistent synchronization operations, the loads |
| and stores within one unit of execution appear to execute in program |
| order (i.e., the sequenced-before order), and loads and stores from |
| different units of execution appear to be simply interleaved.</p> |
| </dd> |
| <dt class="hdlist1">Shared Virtual Memory (SVM) </dt> |
| <dd> |
| <p>An address space exposed to both the host and the devices within a |
| context. |
| SVM causes addresses to be meaningful between the host and all of the |
| devices within a context and therefore supports the use of pointer based |
| data structures in OpenCL kernels. |
| It logically extends a portion of the global memory into the host |
| address space therefore giving work-items access to the host address |
| space. |
| There are three types of SVM in OpenCL:</p> |
| <div class="openblock"> |
| <div class="content"> |
| <div class="dlist"> |
| <dl> |
| <dt class="hdlist1"><em>Coarse-Grained buffer SVM</em> </dt> |
| <dd> |
| <p>Sharing occurs at the granularity of regions of OpenCL buffer memory |
| objects.</p> |
| </dd> |
| <dt class="hdlist1"><em>Fine-Grained buffer SVM</em> </dt> |
| <dd> |
| <p>Sharing occurs at the granularity of individual loads/stores into bytes |
| within OpenCL buffer memory objects.</p> |
| </dd> |
| <dt class="hdlist1"><em>Fine-Grained system SVM</em> </dt> |
| <dd> |
| <p>Sharing occurs at the granularity of individual loads/stores into bytes |
| occurring anywhere within the host memory.</p> |
| </dd> |
| </dl> |
| </div> |
| </div> |
| </div> |
| </dd> |
| <dt class="hdlist1">SIMD </dt> |
| <dd> |
| <p>Single Instruction Multiple Data. |
| A programming model where a <em>kernel</em> is executed concurrently on |
| multiple <em>processing elements</em> each with its own data and a shared |
| program counter. |
| All <em>processing elements</em> execute a strictly identical set of |
| instructions.</p> |
| </dd> |
| <dt class="hdlist1">Specialization constants </dt> |
| <dd> |
| <p>Specialization constants are special constant objects that do not |
| have known constant values in an intermediate language (e.g. SPIR-V). |
| Applications may provide updated values for the specialization constants |
| before a program is built. |
| Specialization constants that do not receive a value from an application |
| shall use the default specialization constant value.</p> |
| </dd> |
| <dt class="hdlist1">SPMD </dt> |
| <dd> |
| <p>Single Program Multiple Data. |
| A programming model where a <em>kernel</em> is executed concurrently on |
| multiple <em>processing elements</em> each with its own data and its own |
| program counter. |
| Hence, while all computational resources run the same <em>kernel</em> they |
| maintain their own instruction counter and due to branches in a |
| <em>kernel</em>, the actual sequence of instructions can be quite different |
| across the set of <em>processing elements</em>.</p> |
| </dd> |
| <dt class="hdlist1">Sub-device </dt> |
| <dd> |
| <p>An OpenCL <em>device</em> can be partitioned into multiple <em>sub-devices</em>. |
| The new <em>sub-devices</em> alias specific collections of compute units within |
| the parent <em>device</em>, according to a partition scheme. |
| The <em>sub-devices</em> may be used in any situation that their parent |
| <em>device</em> may be used. |
| Partitioning a <em>device</em> does not destroy the parent <em>device</em>, which may |
| continue to be used alongside and intermingled with its child |
| <em>sub-devices</em>. |
| Also see <em>Device</em>, <em>Parent device</em> and <em>Root device</em>.</p> |
| </dd> |
| <dt class="hdlist1">Sub-group </dt> |
| <dd> |
| <p>Sub-groups are an implementation-dependent grouping of work-items within |
| a work-group. |
| The size and number of sub-groups are implementation-defined.</p> |
| </dd> |
| <dt class="hdlist1">Sub-group Barrier </dt> |
| <dd> |
| <p>See <em>Barrier</em>.</p> |
| </dd> |
| <dt class="hdlist1">Submitted </dt> |
| <dd> |
| <p>The second state in the six state model for the execution of a command. |
| The transition into this state occurs when the command is flushed from |
| the command-queue and submitted for execution on the device. |
| Once submitted, a programmer can assume a command will execute once its |
| prerequisites have been met.</p> |
| </dd> |
| <dt class="hdlist1">SVM Buffer </dt> |
| <dd> |
| <p>A memory allocation enabled to work with <em>Shared Virtual Memory (SVM)</em>. |
| Depending on how the SVM buffer is created, it can be a coarse-grained |
| or fine-grained SVM buffer. |
| Optionally it may be wrapped by a <em>Buffer Object</em>. |
| See <em>Shared Virtual Memory (SVM)</em>.</p> |
| </dd> |
| <dt class="hdlist1">Synchronization </dt> |
| <dd> |
| <p>Synchronization refers to mechanisms that constrain the order of |
| execution and the visibility of memory operations between two or more |
| units of execution.</p> |
| </dd> |
| <dt class="hdlist1">Synchronization operations </dt> |
| <dd> |
| <p>Operations that define memory order constraints in a program. |
| They play a special role in controlling how memory operations in one |
| unit of execution (such as work-items or, when using SVM, a host thread) |
| are made visible to another. |
| Synchronization operations in OpenCL include <em>atomic operations</em> and |
| <em>fences</em>.</p> |
| </dd> |
| <dt class="hdlist1">Synchronization point </dt> |
| <dd> |
| <p>A synchronization point between a pair of commands (A and B) assures |
| that the results of command A happen-before command B is launched (i.e. |
| enters the ready state).</p> |
| </dd> |
| <dt class="hdlist1">Synchronizes with </dt> |
| <dd> |
| <p>A relation between operations in two different units of execution that |
| defines a memory order constraint in global memory |
| (<em>global-synchronizes-with</em>) or local memory |
| (<em>local-synchronizes-with</em>).</p> |
| </dd> |
| <dt class="hdlist1">Task Parallel Programming Model </dt> |
| <dd> |
| <p>A programming model in which computations are expressed in terms of |
| multiple concurrent tasks executing in one or more <em>command-queues</em>. |
| The concurrent tasks can be running different <em>kernels</em>.</p> |
| </dd> |
| <dt class="hdlist1">Thread-safe </dt> |
| <dd> |
| <p>An OpenCL API call is considered to be <em>thread-safe</em> if the internal |
| state as managed by OpenCL remains consistent when called simultaneously |
| by multiple <em>host</em> threads. |
| OpenCL API calls that are <em>thread-safe</em> allow an application to call |
| these functions in multiple <em>host</em> threads without having to implement |
| mutual exclusion across these <em>host</em> threads; i.e. they are also |
| re-entrant-safe.</p> |
| </dd> |
| <dt class="hdlist1">Undefined </dt> |
| <dd> |
| <p>The behavior of an OpenCL API call, built-in function used inside a |
| <em>kernel</em> or execution of a <em>kernel</em> that is explicitly not defined by |
| OpenCL. |
| A conforming implementation is not required to specify what occurs when |
| an undefined construct is encountered in OpenCL.</p> |
| </dd> |
| <dt class="hdlist1">Unit of execution </dt> |
| <dd> |
| <p>A generic term for a process, OS-managed thread running on the host (a |
| host-thread), kernel-instance, host program, work-item or any other |
| executable agent that advances the work associated with a program.</p> |
| </dd> |
| <dt class="hdlist1">Work-group </dt> |
| <dd> |
| <p>A collection of related <em>work-items</em> that execute on a single <em>compute |
| unit</em>. |
| The <em>work-items</em> in the group execute the same <em>kernel-instance</em> and |
| share <em>local</em> <em>memory</em> and <em>work-group functions</em>.</p> |
| </dd> |
| <dt class="hdlist1">Work-group Barrier </dt> |
| <dd> |
| <p>See <em>Barrier</em>.</p> |
| </dd> |
| <dt class="hdlist1">Work-group Function </dt> |
| <dd> |
| <p>A function that carries out collective operations across all the |
| work-items in a work-group. |
| Available collective operations are a barrier, reduction, broadcast, |
| prefix sum, and evaluation of a predicate. |
| A work-group function must occur within a <em>converged control flow</em>; i.e. |
| all work-items in the work-group must encounter precisely the same |
| work-group function.</p> |
| </dd> |
| <dt class="hdlist1">Work-group Synchronization </dt> |
| <dd> |
| <p>Constraints on the order of execution for work-items in a single |
| work-group.</p> |
| </dd> |
| <dt class="hdlist1">Work-pool </dt> |
| <dd> |
| <p>A logical pool associated with a device that holds commands and |
| work-groups from kernel-instances that are ready to execute. |
| OpenCL does not constrain the order that commands and work-groups are |
| scheduled for execution from the work-pool; i.e. a programmer must |
| assume that they could be interleaved. |
| There is one work-pool per device used by all command-queues associated |
| with that device. |
| The work-pool may be implemented in any manner as long as it assures |
| that work-groups placed in the pool will eventually execute.</p> |
| </dd> |
| <dt class="hdlist1">Work-item </dt> |
| <dd> |
| <p>One of a collection of parallel executions of a <em>kernel</em> invoked on a |
| <em>device</em> by a <em>command</em>. |
| A <em>work-item</em> is executed by one or more <em>processing elements</em> as part |
| of a <em>work-group</em> executing on a <em>compute unit</em>. |
| A <em>work-item</em> is distinguished from other work-items by its <em>global ID</em> |
| or the combination of its <em>work-group</em> ID and its <em>local ID</em> within a |
| <em>work-group</em>.</p> |
| </dd> |
| </dl> |
| </div> |
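| <div class="paragraph"> |
| <p>As an illustration of the <em>Work-item</em> entry above, the following minimal C sketch shows how a one-dimensional <em>global ID</em> relates to the combination of a <em>work-group</em> ID and a <em>local ID</em>. |
| The helper names <code>global_id_1d</code> and <code>split_global_id_1d</code> are hypothetical and are not part of the OpenCL API; the sketch assumes a uniform work-group size and an optional global offset.</p> |
| </div> |

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an OpenCL API call): computes the global ID of a
 * work-item in one dimension from its work-group ID and its local ID.
 * Assumes every work-group has the same size (local_size). */
size_t global_id_1d(size_t group_id, size_t local_size,
                    size_t local_id, size_t global_offset)
{
    assert(local_size > 0);
    assert(local_id < local_size); /* local IDs run from 0 to local_size - 1 */
    return group_id * local_size + local_id + global_offset;
}

/* Hypothetical inverse mapping: recovers the work-group ID and the local ID
 * from a global ID, under the same assumptions as above. */
void split_global_id_1d(size_t global_id, size_t local_size,
                        size_t global_offset,
                        size_t *group_id, size_t *local_id)
{
    assert(local_size > 0);
    assert(global_id >= global_offset);
    *group_id = (global_id - global_offset) / local_size;
    *local_id = (global_id - global_offset) % local_size;
}
```

| <div class="paragraph"> |
| <p>For example, with a work-group size of 64 and no offset, the work-item with local ID 5 in work-group 2 has global ID 133, and splitting 133 yields the same pair again.</p> |
| </div> |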
| </div> |
| </div> |
| <div class="sect1"> |
| <h2 id="_the_opencl_architecture"><a class="anchor" href="#_the_opencl_architecture"></a>3. The OpenCL Architecture</h2> |
| <div class="sectionbody"> |
| <div class="paragraph"> |
| <p><strong>OpenCL</strong> is an open industry standard for programming a heterogeneous |
| collection of CPUs, GPUs and other discrete computing devices organized into |
| a single platform. |
| It is more than a language. |
| OpenCL is a framework for parallel programming and includes a language, API, |
| libraries and a runtime system to support software development. |
| Using OpenCL, for example, a programmer can write general purpose programs |
| that execute on GPUs without the need to map their algorithms onto a 3D |
| graphics API such as OpenGL or DirectX.</p> |
| </div> |
| <div class="paragraph"> |
| <p>The target of OpenCL is expert programmers wanting to write portable yet |
| efficient code. |
| This includes library writers, middleware vendors, and performance-oriented |
| application programmers. |
| Therefore OpenCL provides a low-level hardware abstraction plus a framework |
| to support programming, and many details of the underlying hardware are |
| exposed.</p> |
| </div> |
| <div class="paragraph"> |
| <p>To describe the core ideas behind OpenCL, we will use a hierarchy of models:</p> |
| </div> |
| <div class="ulist"> |
| <ul> |
| <li> |
| <p>Platform Model</p> |
| </li> |
| <li> |
| <p>Memory Model</p> |
| </li> |
| <li> |
| <p>Execution Model</p> |
| </li> |
| <li> |
| <p>Programming Model</p> |
| </li> |
| </ul> |
| </div> |
| <div class="sect2"> |
| <h3 id="_platform_model"><a class="anchor" href="#_platform_model"></a>3.1. Platform Model</h3> |
| <div class="paragraph"> |
| <p>The <a href="#platform-model-image">Platform model</a> for OpenCL is defined below. |
| The model consists of a <strong>host</strong> connected to one or more <strong>OpenCL devices</strong>. |
| An OpenCL device is divided into one or more <strong>compute units</strong> (CUs) which are |
| further divided into one or more <strong>processing elements</strong> (PEs). |
| Computations on a device occur within the processing elements.</p> |
| </div> |
| <div class="paragraph"> |
| <p>An OpenCL application is implemented as both host code and device kernel |
| code. |
| The host code portion of an OpenCL application runs on a host processor |
| according to the models native to the host platform. |
| The OpenCL application host code submits the kernel code as commands from |
| the host to OpenCL devices. |
| An OpenCL device executes the command's computation on the processing |
| elements within the device.</p> |
| </div> |
| <div class="paragraph"> |
| <p>An OpenCL device has considerable latitude on how computations are mapped |
| onto the device's processing elements. |
| When processing elements within a compute unit execute the same sequence of |
| statements across the processing elements, the control flow is said to be |
| <em>converged</em>. |
| Hardware optimized for executing a single stream of instructions over |
| multiple processing elements is well suited to converged control flows. |
| When the control flow varies from one processing element to another, it is |
| said to be <em>diverged</em>. |
| While a kernel always begins execution with a converged control flow, due to |
| branching statements within a kernel, converged and diverged control flows |
| may occur within a single kernel. |
| This provides a great deal of flexibility in the algorithms that can be |
| implemented with OpenCL.</p> |
| </div> |
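| <div class="paragraph"> |
| <p>The distinction between converged and diverged control flow can be sketched with a small host-side simulation in plain C. |
| This is only a toy model, not OpenCL device code: the functions <code>takes_branch</code> and <code>branch_is_converged</code> are hypothetical names, and each array element stands in for the private data of one simulated processing element.</p> |
| </div> |

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a branching statement in a kernel: every simulated
 * processing element evaluates the same condition on its own data item. */
static bool takes_branch(int value, int threshold)
{
    return value > threshold;
}

/* Returns true when all simulated processing elements take the same side of
 * the branch (converged control flow), and false when at least two of them
 * disagree (diverged control flow).
 * data[i] is the private input of simulated processing element i. */
bool branch_is_converged(const int *data, size_t num_pes, int threshold)
{
    assert(data != NULL && num_pes > 0);
    bool first = takes_branch(data[0], threshold);
    for (size_t i = 1; i < num_pes; ++i) {
        if (takes_branch(data[i], threshold) != first)
            return false;
    }
    return true;
}
```

| <div class="paragraph"> |
| <p>With the inputs 5, 6, 7 and 8, a threshold of 4 leaves the flow converged because every element takes the branch, whereas a threshold of 6 diverges it because only the last two elements do.</p> |
| </div> |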
| <div id="platform-model-image" class="imageblock text-center"> |
| <div class="content"> |
| <img src="" alt="platform model"> |
| </div> |
| <div class="title">Figure 1. Platform Model: one host plus one or more compute devices, each with one or more compute units composed of one or more processing elements.</div> |
| </div> |
| <div class="paragraph"> |
| <p>Programmers may provide programs in the form of OpenCL C source strings, |
| the SPIR-V intermediate language, or as implementation-defined binary objects. |
| An OpenCL platform provides a compiler to translate programs of these |
| forms into executable program objects. |
| The device code compiler may be <em>online</em> or <em>offline</em>. |
| An <em>online</em> <em>compiler</em> is available during host program execution using |
| standard APIs. |
| An <em>offline compiler</em> is invoked outside of host program control, using |
| platform-specific methods. |
| The OpenCL runtime allows developers to retrieve a previously compiled device |
| program executable, and to load and execute such an executable.</p> |
| </div> |
| <div class="paragraph"> |
| <p>OpenCL defines two kinds of platform profiles: a <em>full profile</em> and a |
| reduced-functionality <em>embedded profile</em>. |
| A full profile platform must provide an online compiler for all its devices. |
| An embedded platform may provide an online compiler, but is not required to |
| do so.</p> |
| </div> |
| <div class="paragraph"> |
| <p>A device may expose special purpose functionality as a <em>built-in kernel</em>. |
| The platform provides APIs for enumerating and invoking the built-in |
| kernels offered by a device, but otherwise does not define their |
| construction or semantics. |