| Name Strings |
| |
| cl_altera_compiler_mode |
| |
| Contributors |
| |
| David Neto, Altera Corporation |
| |
| Contact |
| |
| Michael Kinsner, mkinsner 'at' altera 'dot' com |
| |
| IP Status |
| |
| No known IP claims. |
| |
| Version |
| |
| Version 1, 2014-02-06 |
| |
| Number |
| |
| OpenCL Extension #30 |
| |
| Status |
| |
| Complete. |
| Shipping with the Altera SDK for OpenCL, version 14.0 |
| |
| An earlier version shipped with the Altera SDK for OpenCL, version 13.0 |
| |
| Extension Type |
| |
| OpenCL platform extension |
| |
| Dependencies |
| |
| OpenCL 1.0 is required. |
| This document is written against revision 48 of the OpenCL 1.0 specification. |
| |
| Overview |
| |
| This extension specifies alternative device code compilation flows |
| for OpenCL. |
| |
| In standard OpenCL, there are two methods for preparing code for |
| execution on an OpenCL device. The first is to compile OpenCL C code |
| from source text using an online compiler. The second is to load a |
| precompiled device binary. |
| |
| This extension specifies mechanisms to enable alternative workflows for |
| compiling and loading device code. They enable simplified or optimized |
| support for the following development and deployment scenarios: |
| |
| 1. The target OpenCL device is not available in the development |
| environment. |
| |
| 2. The platform does not provide an online compiler for the target |
| OpenCL device, or full online compilation is not desirable during |
| host application development. |
| |
| 3. The application only requires one device program, and does not |
| need read access to the device binary. |
| |
| The development and deployment of a particular application may involve |
| more than one of these scenarios. |
| |
| The mechanisms specified by this extension are: |
| |
| 1. An "offline device" feature: This is the ability to specify that |
| the OpenCL runtime should partially emulate the presence a device. |
| From the perspective of the host program, emulation should be |
| complete, except that kernel code may have no effect. For example, |
| kernels are enqueued, buffers are copied or mapped as required, |
| and event profiling information is updated to reflect command |
| progression. However side effects produced by running the code |
| inside kernels may not be visible. |
| |
| 2. A "create program executable library" compiler mode: |
| The clBuildProgram API method does not fully compile device code. |
| Instead, it performs an online stub compilation, and saves enough |
| information in a data store to perform a deferred full (offline) |
| compilation. |
| |
| 2a. The stub compilation produces a binary which includes enough |
| kernel interface information to satisfy further host activities |
| such as creating cl_kernel objects, setting kernel arguments, |
| and enqueueing kernels for execution. But the stub binary |
| may not have executable code, so the execution of kernels from |
| such a binary may not produce valid data. |
| |
| 2b. The data store is used to perform offline compilation of |
| kernel programs. Each logically distinct call of clBuildProgram |
| generates an entry in the data store. Each entry is a tuple |
| consisting of: |
| |
| - The target device |
| |
| - The kernel program source |
| |
| - The options supplied to clBuildProgram |
| |
| - A set of instructions (such as a script) for offline |
| compilation of the kernel program. The result of executing |
| these instructions is to associate a device binary (including |
| executable device code) with this tuple in the data store. |
| |
| 3. A "use program executable library" mode: We assume that an |
| offline compilation has been performed for all entries in the |
| data store generated by use of the "create program executable |
| library" mode. In the "use program exectuable library" mode, |
| the clBuildProgram API method performs a lookup in the data store |
| based on the following attributes: |
| |
| - The target device |
| |
| - The kernel program source |
| |
| - The options supplied to clBuildProgram |
| |
| If such an entry exists in the data store, then the associated |
| binary is loaded into the cl_program object. If no such entry |
| exists, or if the entry is not associated with a device binary, |
| then an error is returned. |
| |
| 4. A "preloaded binary only" compiler mode: In this mode, the |
| runtime does not compile any programs, does not load the device |
| with new code at runtime, and does not necessarily produce a valid |
| device binary when requested by the CL_PROGRAM_BINARIES query to |
| the clGetProgramInfo API method. Instead: |
| |
| - The device is assumed to already have a program loaded |
| before the cl_context is created. |
| |
| - All cl_program objects behave as if they refer to the preloaded |
| binary |
| |
| - The clBuildProgram API method is a no-op except for |
| setting a successful build status. |
| |
| The following paragraphs describe the benefits provided by the |
| mechanisms specified in this extension. |
| |
| Scenario 1: Developing the host part of an OpenCL application without |
| a device being present: |
| |
| The offline device feature enables the execution of a host program |
| even while a device is not present in the system. This is |
| useful for developing an application before a device has been |
| manufactured, is otherwise scarce, or unavailable. Although the |
| data coming back from the emulated device is invalid, host code |
| interaction with the runtime may be developed and tested. |
| |
| Scenario 2: Porting an existing application to a platform without an |
| online compiler: |
| |
| Many OpenCL platforms provide an online OpenCL C compiler |
| for each device. Therefore host applications often use |
| clCreateProgramWithSource and clBuildProgram to create |
| executable device code. The standard method of porting such |
| a program to an environment without an online compiler is to |
| replace the compile-from-source sequence of API calls with a |
| find-and-load-from-binary code sequence. Transforming the host |
| application in this way introduces platform dependencies (such |
| as APIs to access a filesystem) and adds bookkeeping complexity |
| to map the original OpenCL C source and compile options with |
| an associated device binary. The porting and verification effort |
| can be quite onerous if many portions of the host code must be |
| modified, or if the device source code is parameterized at runtime. |
| |
| We can reduce porting effort in this scenario by using a three-phase |
| approach. |
| |
| In the first phase, we run the host program with the runtime |
| configured to use offline device mode, and also the "create program |
| executable libary" compiler mode. This generates a data store |
| (the program executable library) containing enough information |
| to compile the required device programs in an offline manner, |
| i.e. outside the control of the host program. |
| |
| In the second phase, we perform an offline compilation of all |
| entries in the data store. This is done by enumerating the entries |
| in the data store and following the compilation instructions for |
| each entry. |
| |
| In the third phase, the application is fully functional. The host |
| is configured to use "use program executable library" compiler mode. |
| Any API request to compile a device program from source is translated |
| into a lookup of the (fully functional) device binary in the data store. |
| |
| This approach works when: |
| |
| - The device has a stable name. That is, the device name does not |
| change from one run to another. |
| |
| - The host program calls clBuildProgram with only a finite |
| set of combinations of device, kernel source, and build |
| options. |
| |
| - Those combinations are consistent from one run to the next. |
| That is, the kernel source and build options do not depend |
| on the data results from executing a previous kernel. |
| |
| - Other context upon which the compilation depends is stable. |
| For example, the location and contents of files included |
| from the OpenCL C source remains the same. |
| |
| Scenario 3: An application needs only one cl_program, runs in a |
| constrained environment, and may need "instant on" behaviour: |
| |
| The "preloaded binary only" compiler mode is desiged to be used |
| by embedded applications. Such applications typically have |
| the following constraints: |
| |
| - The application is simple enough that only one cl_program |
| is required. |
| |
| - The target system has tight memory and time constraints. |
| It may be infeasible or undesirable to process a full device |
| binary through the standard OpenCL runtime APIs. For example, |
| the device binary might be too large to map into host memory, |
| or the application cannot tolerate the delay to load the |
| device binary from external storage. |
| |
| - The application has no requirement to recover the device |
| binary via the CL_PROGRAM_BINARIES query to clGetProgramInfo. |
| Even moreso, it may be undesirable for intellectual property |
| reasons or otherwise to allow the host program to recover |
| the device binary. |
| |
| - It is desirable to develop the application in the normal way, |
| but to minimize host program code changes before deployment. |
| That is, application development uses the standard |
| clCreateProgramWithSource, or clCreateProgramWithBinary APIs. |
| But the transition to a test or production environment |
| is limited to a configuration change, as opposed to a code |
| change. |
| |
| Header File |
| |
| Interface constants are defined in cl_ext.h |
| |
| New Tokens |
| |
| New context properties to be used in the <properties> array |
| supplied to to clCreateContext or clCreateContextFromType: |
| |
| CL_CONTEXT_COMPILER_MODE_ALTERA 0x40F0 |
| CL_CONTEXT_PROGRAM_EXE_LIBRARY_ROOT_ALTERA 0x40F1 |
| CL_CONTEXT_OFFLINE_DEVICE_ALTERA 0x40F2 |
| |
| Values to be supplied for context property |
| CL_CONTEXT_COMPILER_MODE_ALTERA: |
| |
| CL_CONTEXT_COMPILER_MODE_OFFLINE_ALTERA 0 |
| CL_CONTEXT_COMPILER_MODE_OFFLINE_CREATE_EXE_LIBRARY_ALTERA 1 |
| CL_CONTEXT_COMPILER_MODE_OFFLINE_USE_EXE_LIBRARY_ALTERA 2 |
| CL_CONTEXT_COMPILER_MODE_PRELOADED_BINARY_ONLY_ALTERA 3 |
| |
| |
| |
| Additions to Chapter 4 of the OpenCL 1.0 (v48) Specification |
| |
| Add the following to Table 4.4, List of supported properties by |
| clCreateContext: |
| |
| |
| cl_context_properties enum: |
| CL_CONTEXT_OFFLINE_DEVICE_ALTERA |
| Property value: |
| const char* |
| Description: |
| Specifies that the runtime should partly emulate the presence |
| of the named device. The device should behave normally except |
| that kernels executed on such an "offline" device might |
| not produce any side effects. |
| |
| The specified name value should be the first word in the full |
| name returned by the CL_DEVICE_NAME, in C-style form. |
| |
| An implementation may restrict the use of this property as |
| follows: |
| |
| - A platform may require external initialization when using |
| this context property. For example, Altera's |
| platform implementations require that environment variable |
| CL_CONTEXT_OFFLINE_DEVICE_ALTERA be set to the same device |
| name string as supplied to this property. |
| |
| - When this property is specified, the specified device |
| may be the only available device in the platform. |
| |
| This property may be useful for developing or porting |
| applications when no online compiler is available, and |
| when the CL_CONTEXT_COMPILER_MODE_ALTERA property is set to |
| CL_CONTEXT_COMPILER_MODE_OFFLINE_CREATE_EXE_LIBRARY_ALTERA. |
| |
| |
| cl_context_properties enum: |
| CL_CONTEXT_COMPILER_MODE_ALTERA |
| Property value: |
| cl_ulong |
| Description: |
| For devices without an online compiler, this property |
| specifies alternative behaviour for building, loading, |
| and querying device programs. |
| (Platforms implementing the embedded profile are the |
| only ones that may omit providing an online compiler.) |
| |
| The value should be one of the following: |
| CL_CONTEXT_COMPILER_MODE_OFFLINE_ALTERA, |
| CL_CONTEXT_COMPILER_MODE_PRELOADED_BINARY_ONLY_ALTERA, |
| CL_CONTEXT_COMPILER_MODE_OFFLINE_CREATE_EXE_LIBRARY_ALTERA, |
| CL_CONTEXT_COMPILER_MODE_OFFLINE_USE_EXE_LIBRARY_ALTERA. |
| |
| CL_CONTEXT_COMPILER_MODE_OFFLINE_ALTERA - The clBuildProgram |
| and clCreateProgramWithBinary, and clGetProgramInfo |
| API methods behave in the standard way. |
| In particular, clBuildProgram may fail with error |
| CL_COMPILER_NOT_AVAILABLE. |
| This value is the default, and is the only value |
| which specifies conformant behaviour. |
| |
| CL_CONTEXT_COMPILER_MODE_PRELOADED_BINARY_ONLY_ALTERA - The device |
| is assumed to already have a program loaded before |
| the first OpenCL APIs are invoked. |
| All cl_program objects behave as if they refer to the |
| preloaded device binary. |
| The clBuildProgram API method always succeeds when compiling |
| for this device, but does not produce an executable |
| device binary. |
| The clCreateProgramWithBinary does not validate |
| the provided binary. It always succeeds provided its |
| arguments are well-formed. |
| The data returned by the CL_PROGRAM_BINARIES query |
| for clGetProgramInfo is unspecified. It may not be |
| a valid device binary. |
| |
| CL_CONTEXT_COMPILER_MODE_OFFLINE_CREATE_EXE_LIBRARY_ALTERA - |
| When compiling a program created with |
| clCreateProgramWithSource, the clBuildProgram method |
| performs a "stub" compilation, and populates a data store |
| with enough information to perform a program compilation |
| offline (i.e. outside the control of the host program). |
| |
| The online stub compilation produces a binary which includes |
| enough information to satsify further runtime API activities |
| such as creating cl_kernel objects, setting kernel arguments, |
| and enqueing kernels for execution. The stub binary may not |
| have executable code, so the execution of kernels from such |
| a binary may not produce valid data. |
| |
| This value for CL_CONTEXT_COMPILER_MODE_ALTERA is designed |
| to be used with the CL_CONTEXT_OFFLINE_DEVICE_ALTERA context |
| property. |
| |
| Additionally, each logically distinct call of clBuildProgram |
| generates an entry in an external data store which persists |
| beyond the lifetime of the host program. Each entry is a |
| tuple consisting of: |
| |
| - The target device |
| |
| - The kernel program source |
| |
| - The options supplied to clBuildProgram |
| |
| - A set of instructions (such as a script) for offline |
| compilation of the kernel program. The result of executing |
| these instructions is to associate a device binary (including |
| executable device code) with this tuple in the data store. |
| |
| In Altera's platform implementation, the data store is a tree of |
| directories and files. Each data store entry is a leaf |
| directory in this tree containing: |
| - a file named "kernels.cl" containing the kernel program |
| source |
| - a file named "build.cmd" containing the list of operating |
| system commands to be used to compile the program to a |
| device binary, and to store that device binary in a file |
| named "kernels.aocx" in the same directory. |
| |
| CL_CONTEXT_COMPILER_MODE_OFFLINE_USE_EXE_LIBRARY_ALTERA - |
| When compiling a program created with clCreateProgramWithSoruce, |
| the clBuildProgram API method performs a lookup in the data store |
| based on the following attributes: |
| |
| - The target device |
| |
| - The kernel program source |
| |
| - The options supplied to clBuildProgram |
| |
| If such an entry exists in the data store, then the associated |
| binary is loaded into the cl_program. If no such entry exists, |
| or if the entry is not associated with a device binary, then an |
| error is returned. |
| |
| |
| cl_context_properties enum: |
| CL_CONTEXT_PROGRAM_EXE_LIBRARY_ROOT_ALTERA |
| Property value: |
| const char* |
| Description: |
| Specifies the filesystem root directory |
| for the data store used when either value |
| CL_CONTEXT_COMPILER_MODE_OFFLINE_CREATE_EXE_LIBRARY_ALTERA or value |
| CL_CONTEXT_COMPILER_MODE_OFFLINE_USE_EXE_LIBRARY_ALTERA is specified |
| for context property CL_CONTEXT_COMPILER_MODE_ALTERA. |
| |
| The specified value may be a relative directory name, but |
| will be resolved to an absolute path at context creation time. |
| |
| If this property is left unspecified, then data store root |
| is implementation-defined. |
| |
| In Altera's platform implementation, the default is the |
| "aocl_program_library" subdirectory in the current directory |
| in effect at context creation time. |
| |
| |
| |
| Additions to Chapter 5 of the OpenCL 1.0 (v48) Specification |
| |
| Additions to Section 5.4.1 Creating Program Objects |
| |
| When context property CL_CONTEXT_OFFLINE_COMPILER_MODE_ALTERA is |
| specified with value CL_CONTEXT_COMPILER_PRELOADED_BINARY_ONLY_ALTERA, |
| the clCreateProgramWithBinary API method behaves in a non-standard |
| way. See the description of this property value in Table 4.4 for |
| more information. |
| |
| Additions to Section 5.4.2 Building Program Exectuables |
| |
| When context property CL_CONTEXT_OFFLINE_COMPILER_MODE_ALTERA is |
| specified, the clBuildProgram API method behaves in a non-standard |
| way when compiling for devices without an online compiler. |
| See the description of this property value in Table 4.4 for more |
| information. |
| |
| Additions to Section 5.4.5 Program Object Queries |
| |
| The following is added to the description of the |
| CL_PROGRAM_BINARIES query in Table 5.11: |
| |
| When context property CL_CONTEXT_OFFLINE_COMPILER_MODE_ALTERA |
| is specified with either value |
| CL_CONTEXT_COMPILER_MODE_PRELOADED_BINARY_ONLY_ALTERA |
| or value CL_CONTEXT_COMPILER_MODE_OFFLINE_CREATE_EXE_LIBRARY_ALTERA, |
| then the binary returned by this query may not be a valid device |
| binary. |
| See the description of the CL_CONTEXT_OFFLINE_COMPILER_MODE_ALTERA |
| property value in Table 4.4 for more information. |
| |
| Additions to Section 5.6 Executing Kernels |
| |
| When context property CL_CONTEXT_OFFLINE_COMPILER_MODE_ALTERA |
| is specified with value |
| CL_CONTEXT_COMPILER_MODE_OFFLINE_CREATE_EXE_LIBRARY_ALTERA, |
| or if context property CL_CONTEXT_OFFLINE_DEVICE_ALTERA is specified, |
| then a kernel execution may behave as if the kernel code |
| produces no side effects, i.e. as if the body of the kernel |
| contains no statements. |
| (All other runtime activites must occur. For example, |
| the kernels are still enqueued, buffers are copied or mapped |
| as required to satisfy kernel arguments, and event profiling |
| information is updated to reflect command progression.) |
| See the description of the CL_CONTEXT_OFFLINE_COMPILER_MODE_ALTERA |
| property value in Table 4.4 for more information. |
| |
| |
| Implementation Notes |
| |
| For each context property defined in this extension specification, |
| Altera's platform implementation allows the property to be specified |
| by setting an environment variable of the same name. |
| |
| For CL_CONTEXT_OFFLINE_DEVICE_ALTERA, the environment variable |
| of *must* be set before any platform APIs are called. |
| |
| For CL_CONTEXT_COMPILER_MODE_ALTERA, value of the environment variable |
| is the numerical value for the corresponding value enum as defined |
| in the CL/cl_ext.h header file. |
| |
| Precedence: For context properties CL_CONTEXT_COMPILER_MODE_ALTERA and |
| CL_CONTEXT_PROGRAM_EXE_LIBRARY_ROOT_ALTERA, any value explicitly |
| provided to the clCreateContext or clCreateContextFromType |
| API methods will override an environment variable setting. |
| |
| If property CL_CONTEXT_OFFLINE_DEVICE_ALTERA is explicitly |
| provided to the clCreateCOTNEXT or clCreateContextFromType |
| methods, then it must match the value set for environment variable |
| CL_CONTEXT_OFFLINE_DEVICE_ALTERA. |
| |
| |
| Issues |
| |
| 1. The use of context property CL_CONTEXT_OFFLINE_DEVICE_ALTERA is |
| rather restricted: |
| |
| - It must be set at the "beginning of time" for a host application, |
| e.g. via an environment variable setting. |
| |
| - If an "offline device" is used in a context, then online |
| devices cannot be used in any context for the platform. |
| |
| In this light, it might be more natural to call it a "platform" |
| property. But in OpenCL platforms are stateless, so this would |
| not be appropriate. Instead the most basic runtime controls are |
| applied to contexts, not platforms. |
| |
| These restrictions reflect limitations of Altera's original |
| implementation. However, the feature in its current form is still |
| quite useful for application development and porting. |
| |
| |
| Sample Code |
| |
| Example for using an "offline device", with creation and initialization of |
| a data store for offline compilation of program binaries. |
| |
| #include <CL/opencl.h> |
| #include <stdlib.h> |
| #include <assert.h> |
| #define CHECK(X) assert(CL_SUCCESS == (X)) |
| |
| int main(...) { |
| cl_platform platform = 0; |
| cl_device_id device = 0; |
| cl_context context = 0; |
| cl_int status = 0; |
| |
| // Specify an offline device via environment variable here, or externally |
| // before program startup. |
| // This must occur befor the first OpenCL API method call. |
| setenv("CL_CONTEXT_OFFLINE_DEVICE_ALTERA","mydevice"); |
| |
| CHECK( clGetPlatformIDs(1,&platform,0) ); |
| CHECK( clGetDeviceIDs(platform,CL_DEVICE_TYPE_ACCELERATOR,1,&device,0) ); |
| |
| cl_context_properties props[] = { |
| |
| // Specify creation and initialization of a program library data store. |
| CL_CONTEXT_COMPILER_MODE_ALTERA, |
| (cl_context_properties)CL_CONTEXT_COMPILER_MODE_OFFLINE_CREATE_EXE_LIBRARY_ALTERA |
| |
| // Where is the data store rooted? |
| CL_CONTEXT_PROGRAM_EXE_LIBRARY_ROOT_ALTERA |
| (cl_context_properties)"/data/myproject/proglib" |
| |
| // Terminate the properties list. |
| 0 |
| }; |
| context = clCreateContext( props, 1, &device, 0, 0, &status ); |
| CHECK( status ); |
| |
| cl_command_queue cq = clCreateCommandQueue( context, device, 0, &status ); |
| CHECK( status ); |
| |
| const char* source = "kernel void foo( global int* A ) { *A = 42; }"; |
| cl_program program = clCreateProgramWithSource( context, 1, &source, NULL, &status ); |
| CHECK( status ); |
| |
| // Perform a stub compilation, and create an entry for this (device,source,options) |
| // combination in the program library data store. |
| CHECK( clBuildProgram( program, 1, &device, "-cl-opt-disable", 0, 0 ) ); |
| |
| cl_mem mem = clCreateBuffer(context,CL_MEM_READ_WRITE,sizeof(cl_int),0,&status); |
| CHECK( status ); |
| |
| // Only a stub compilation has been performed. |
| // But this still enables full setup and enqueue of a kernel for execution. |
| cl_kernel kernel = clCreateKernel( program, "foo", &status ); |
| CHECK( status ); |
| CHECK( clSetKernelArg( kernel, 0, sizeof(cl_mem), &mem ) ); |
| |
| // Expect that kernels scheduled by clEnqueueNDRangeKernel and |
| // clEnqueueTask may not produce expected side effects. |
| // It may appear that each kernel executes no statements. |
| |
| CHECK( clEnqueueTask( cq, kernel, 0, 0, 0 ) ); |
| cl_int the_answer = 0; |
| CHECK( clEnqueueReadBuffer( cq, mem, 1 /*block*/, 0, sizeof(the_answer),&the_answer,0,0,0) ); |
| |
| // Because we're using an "offline device", we can't rely on the_answer being 42. |
| |
| // ... |
| |
| |
| |
| Example for using an already-populated data store of program binaries compiled |
| in an offline manner. The code is the same as the previous example, except |
| use these context properties instead: |
| |
| cl_context_properties props[] = { |
| |
| // Specify the use of a program data store library of pre-compiled device binaries. |
| CL_CONTEXT_COMPILER_MODE_ALTERA, |
| (cl_context_properties)CL_CONTEXT_COMPILER_MODE_OFFLINE_USE_EXE_LIBRARY_ALTERA |
| |
| // Where is the data store rooted? |
| CL_CONTEXT_PROGRAM_EXE_LIBRARY_ROOT_ALTERA |
| (cl_context_properties)"/data/myproject/proglib" |
| |
| // Terminate the properties list. |
| 0 |
| }; |
| |
| |
| |
| Example for using a "preloaded binary": |
| |
| cl_context_properties props[] = { |
| |
| // Specify the use of a preloaded binary. |
| CL_CONTEXT_COMPILER_MODE_ALTERA, |
| (cl_context_properties)CL_CONTEXT_COMPILER_MODE_PRELOADED_BINARY_ONLY_ALTERA, |
| |
| // Terminate the properties list. |
| 0 |
| }; |
| context = clCreateContext( props, 1, &device, 0, 0, &status ); |
| |
| ... |
| |
| // All cl_program objects refer to the preloaded binary. |
| // clCreateProgramWithBinary accepts invalid device binaries. |
| // clBuildProgram succeeds but does not actually compile the program from source. |
| |
| |
| Conformance Tests |
| |
| None. |
| |
| Revision History |
| |
| Version 0, 2013-05-06 - Initial revision. |
| Documents the behaviours in the Altera SDK for OpenCL, version 13.0. |
| |
| Version 1, 2014-02-06 - Assigned registered values to tokens. |
| |