blob: 22cf540acebfc210b764fdd22e3b6e979330ce6e [file] [log] [blame]
Name Strings
cl_altera_compiler_mode
Contributors
David Neto, Altera Corporation
Contact
Michael Kinsner, mkinsner 'at' altera 'dot' com
IP Status
No known IP claims.
Version
Version 1, 2014-02-06
Number
OpenCL Extension #30
Status
Complete.
Shipping with the Altera SDK for OpenCL, version 14.0
An earlier version shipped with the Altera SDK for OpenCL, version 13.0
Extension Type
OpenCL platform extension
Dependencies
OpenCL 1.0 is required.
This document is written against revision 48 of the OpenCL 1.0 specification.
Overview
This extension specifies alternative device code compilation flows
for OpenCL.
In standard OpenCL, there are two methods for preparing code for
execution on an OpenCL device. The first is to compile OpenCL C code
from source text using an online compiler. The second is to load a
precompiled device binary.
This extension specifies mechanisms to enable alternative workflows for
compiling and loading device code. They enable simplified or optimized
support for the following development and deployment scenarios:
1. The target OpenCL device is not available in the development
environment.
2. The platform does not provide an online compiler for the target
OpenCL device, or full online compilation is not desirable during
host application development.
3. The application only requires one device program, and does not
need read access to the device binary.
The development and deployment of a particular application may involve
more than one of these scenarios.
The mechanisms specified by this extension are:
1. An "offline device" feature: This is the ability to specify that
the OpenCL runtime should partially emulate the presence a device.
From the perspective of the host program, emulation should be
complete, except that kernel code may have no effect. For example,
kernels are enqueued, buffers are copied or mapped as required,
and event profiling information is updated to reflect command
progression. However side effects produced by running the code
inside kernels may not be visible.
2. A "create program executable library" compiler mode:
The clBuildProgram API method does not fully compile device code.
Instead, it performs an online stub compilation, and saves enough
information in a data store to perform a deferred full (offline)
compilation.
2a. The stub compilation produces a binary which includes enough
kernel interface information to satisfy further host activities
such as creating cl_kernel objects, setting kernel arguments,
and enqueueing kernels for execution. But the stub binary
may not have executable code, so the execution of kernels from
such a binary may not produce valid data.
2b. The data store is used to perform offline compilation of
kernel programs. Each logically distinct call of clBuildProgram
generates an entry in the data store. Each entry is a tuple
consisting of:
- The target device
- The kernel program source
- The options supplied to clBuildProgram
- A set of instructions (such as a script) for offline
compilation of the kernel program. The result of executing
these instructions is to associate a device binary (including
executable device code) with this tuple in the data store.
3. A "use program executable library" mode: We assume that an
offline compilation has been performed for all entries in the
data store generated by use of the "create program executable
library" mode. In the "use program exectuable library" mode,
the clBuildProgram API method performs a lookup in the data store
based on the following attributes:
- The target device
- The kernel program source
- The options supplied to clBuildProgram
If such an entry exists in the data store, then the associated
binary is loaded into the cl_program object. If no such entry
exists, or if the entry is not associated with a device binary,
then an error is returned.
4. A "preloaded binary only" compiler mode: In this mode, the
runtime does not compile any programs, does not load the device
with new code at runtime, and does not necessarily produce a valid
device binary when requested by the CL_PROGRAM_BINARIES query to
the clGetProgramInfo API method. Instead:
- The device is assumed to already have a program loaded
before the cl_context is created.
- All cl_program objects behave as if they refer to the preloaded
binary
- The clBuildProgram API method is a no-op except for
setting a successful build status.
The following paragraphs describe the benefits provided by the
mechanisms specified in this extension.
Scenario 1: Developing the host part of an OpenCL application without
a device being present:
The offline device feature enables the execution of a host program
even while a device is not present in the system. This is
useful for developing an application before a device has been
manufactured, is otherwise scarce, or unavailable. Although the
data coming back from the emulated device is invalid, host code
interaction with the runtime may be developed and tested.
Scenario 2: Porting an existing application to a platform without an
online compiler:
Many OpenCL platforms provide an online OpenCL C compiler
for each device. Therefore host applications often use
clCreateProgramWithSource and clBuildProgram to create
executable device code. The standard method of porting such
a program to an environment without an online compiler is to
replace the compile-from-source sequence of API calls with a
find-and-load-from-binary code sequence. Transforming the host
application in this way introduces platform dependencies (such
as APIs to access a filesystem) and adds bookkeeping complexity
to map the original OpenCL C source and compile options with
an associated device binary. The porting and verification effort
can be quite onerous if many portions of the host code must be
modified, or if the device source code is parameterized at runtime.
We can reduce porting effort in this scenario by using a three-phase
approach.
In the first phase, we run the host program with the runtime
configured to use offline device mode, and also the "create program
executable libary" compiler mode. This generates a data store
(the program executable library) containing enough information
to compile the required device programs in an offline manner,
i.e. outside the control of the host program.
In the second phase, we perform an offline compilation of all
entries in the data store. This is done by enumerating the entries
in the data store and following the compilation instructions for
each entry.
In the third phase, the application is fully functional. The host
is configured to use "use program executable library" compiler mode.
Any API request to compile a device program from source is translated
into a lookup of the (fully functional) device binary in the data store.
This approach works when:
- The device has a stable name. That is, the device name does not
change from one run to another.
- The host program calls clBuildProgram with only a finite
set of combinations of device, kernel source, and build
options.
- Those combinations are consistent from one run to the next.
That is, the kernel source and build options do not depend
on the data results from executing a previous kernel.
- Other context upon which the compilation depends is stable.
For example, the location and contents of files included
from the OpenCL C source remains the same.
Scenario 3: An application needs only one cl_program, runs in a
constrained environment, and may need "instant on" behaviour:
The "preloaded binary only" compiler mode is desiged to be used
by embedded applications. Such applications typically have
the following constraints:
- The application is simple enough that only one cl_program
is required.
- The target system has tight memory and time constraints.
It may be infeasible or undesirable to process a full device
binary through the standard OpenCL runtime APIs. For example,
the device binary might be too large to map into host memory,
or the application cannot tolerate the delay to load the
device binary from external storage.
- The application has no requirement to recover the device
binary via the CL_PROGRAM_BINARIES query to clGetProgramInfo.
Even moreso, it may be undesirable for intellectual property
reasons or otherwise to allow the host program to recover
the device binary.
- It is desirable to develop the application in the normal way,
but to minimize host program code changes before deployment.
That is, application development uses the standard
clCreateProgramWithSource, or clCreateProgramWithBinary APIs.
But the transition to a test or production environment
is limited to a configuration change, as opposed to a code
change.
Header File
Interface constants are defined in cl_ext.h
New Tokens
New context properties to be used in the <properties> array
supplied to to clCreateContext or clCreateContextFromType:
CL_CONTEXT_COMPILER_MODE_ALTERA 0x40F0
CL_CONTEXT_PROGRAM_EXE_LIBRARY_ROOT_ALTERA 0x40F1
CL_CONTEXT_OFFLINE_DEVICE_ALTERA 0x40F2
Values to be supplied for context property
CL_CONTEXT_COMPILER_MODE_ALTERA:
CL_CONTEXT_COMPILER_MODE_OFFLINE_ALTERA 0
CL_CONTEXT_COMPILER_MODE_OFFLINE_CREATE_EXE_LIBRARY_ALTERA 1
CL_CONTEXT_COMPILER_MODE_OFFLINE_USE_EXE_LIBRARY_ALTERA 2
CL_CONTEXT_COMPILER_MODE_PRELOADED_BINARY_ONLY_ALTERA 3
Additions to Chapter 4 of the OpenCL 1.0 (v48) Specification
Add the following to Table 4.4, List of supported properties by
clCreateContext:
cl_context_properties enum:
CL_CONTEXT_OFFLINE_DEVICE_ALTERA
Property value:
const char*
Description:
Specifies that the runtime should partly emulate the presence
of the named device. The device should behave normally except
that kernels executed on such an "offline" device might
not produce any side effects.
The specified name value should be the first word in the full
name returned by the CL_DEVICE_NAME, in C-style form.
An implementation may restrict the use of this property as
follows:
- A platform may require external initialization when using
this context property. For example, Altera's
platform implementations require that environment variable
CL_CONTEXT_OFFLINE_DEVICE_ALTERA be set to the same device
name string as supplied to this property.
- When this property is specified, the specified device
may be the only available device in the platform.
This property may be useful for developing or porting
applications when no online compiler is available, and
when the CL_CONTEXT_COMPILER_MODE_ALTERA property is set to
CL_CONTEXT_COMPILER_MODE_OFFLINE_CREATE_EXE_LIBRARY_ALTERA.
cl_context_properties enum:
CL_CONTEXT_COMPILER_MODE_ALTERA
Property value:
cl_ulong
Description:
For devices without an online compiler, this property
specifies alternative behaviour for building, loading,
and querying device programs.
(Platforms implementing the embedded profile are the
only ones that may omit providing an online compiler.)
The value should be one of the following:
CL_CONTEXT_COMPILER_MODE_OFFLINE_ALTERA,
CL_CONTEXT_COMPILER_MODE_PRELOADED_BINARY_ONLY_ALTERA,
CL_CONTEXT_COMPILER_MODE_OFFLINE_CREATE_EXE_LIBRARY_ALTERA,
CL_CONTEXT_COMPILER_MODE_OFFLINE_USE_EXE_LIBRARY_ALTERA.
CL_CONTEXT_COMPILER_MODE_OFFLINE_ALTERA - The clBuildProgram
and clCreateProgramWithBinary, and clGetProgramInfo
API methods behave in the standard way.
In particular, clBuildProgram may fail with error
CL_COMPILER_NOT_AVAILABLE.
This value is the default, and is the only value
which specifies conformant behaviour.
CL_CONTEXT_COMPILER_MODE_PRELOADED_BINARY_ONLY_ALTERA - The device
is assumed to already have a program loaded before
the first OpenCL APIs are invoked.
All cl_program objects behave as if they refer to the
preloaded device binary.
The clBuildProgram API method always succeeds when compiling
for this device, but does not produce an executable
device binary.
The clCreateProgramWithBinary does not validate
the provided binary. It always succeeds provided its
arguments are well-formed.
The data returned by the CL_PROGRAM_BINARIES query
for clGetProgramInfo is unspecified. It may not be
a valid device binary.
CL_CONTEXT_COMPILER_MODE_OFFLINE_CREATE_EXE_LIBRARY_ALTERA -
When compiling a program created with
clCreateProgramWithSource, the clBuildProgram method
performs a "stub" compilation, and populates a data store
with enough information to perform a program compilation
offline (i.e. outside the control of the host program).
The online stub compilation produces a binary which includes
enough information to satsify further runtime API activities
such as creating cl_kernel objects, setting kernel arguments,
and enqueing kernels for execution. The stub binary may not
have executable code, so the execution of kernels from such
a binary may not produce valid data.
This value for CL_CONTEXT_COMPILER_MODE_ALTERA is designed
to be used with the CL_CONTEXT_OFFLINE_DEVICE_ALTERA context
property.
Additionally, each logically distinct call of clBuildProgram
generates an entry in an external data store which persists
beyond the lifetime of the host program. Each entry is a
tuple consisting of:
- The target device
- The kernel program source
- The options supplied to clBuildProgram
- A set of instructions (such as a script) for offline
compilation of the kernel program. The result of executing
these instructions is to associate a device binary (including
executable device code) with this tuple in the data store.
In Altera's platform implementation, the data store is a tree of
directories and files. Each data store entry is a leaf
directory in this tree containing:
- a file named "kernels.cl" containing the kernel program
source
- a file named "build.cmd" containing the list of operating
system commands to be used to compile the program to a
device binary, and to store that device binary in a file
named "kernels.aocx" in the same directory.
CL_CONTEXT_COMPILER_MODE_OFFLINE_USE_EXE_LIBRARY_ALTERA -
When compiling a program created with clCreateProgramWithSoruce,
the clBuildProgram API method performs a lookup in the data store
based on the following attributes:
- The target device
- The kernel program source
- The options supplied to clBuildProgram
If such an entry exists in the data store, then the associated
binary is loaded into the cl_program. If no such entry exists,
or if the entry is not associated with a device binary, then an
error is returned.
cl_context_properties enum:
CL_CONTEXT_PROGRAM_EXE_LIBRARY_ROOT_ALTERA
Property value:
const char*
Description:
Specifies the filesystem root directory
for the data store used when either value
CL_CONTEXT_COMPILER_MODE_OFFLINE_CREATE_EXE_LIBRARY_ALTERA or value
CL_CONTEXT_COMPILER_MODE_OFFLINE_USE_EXE_LIBRARY_ALTERA is specified
for context property CL_CONTEXT_COMPILER_MODE_ALTERA.
The specified value may be a relative directory name, but
will be resolved to an absolute path at context creation time.
If this property is left unspecified, then data store root
is implementation-defined.
In Altera's platform implementation, the default is the
"aocl_program_library" subdirectory in the current directory
in effect at context creation time.
Additions to Chapter 5 of the OpenCL 1.0 (v48) Specification
Additions to Section 5.4.1 Creating Program Objects
When context property CL_CONTEXT_OFFLINE_COMPILER_MODE_ALTERA is
specified with value CL_CONTEXT_COMPILER_PRELOADED_BINARY_ONLY_ALTERA,
the clCreateProgramWithBinary API method behaves in a non-standard
way. See the description of this property value in Table 4.4 for
more information.
Additions to Section 5.4.2 Building Program Exectuables
When context property CL_CONTEXT_OFFLINE_COMPILER_MODE_ALTERA is
specified, the clBuildProgram API method behaves in a non-standard
way when compiling for devices without an online compiler.
See the description of this property value in Table 4.4 for more
information.
Additions to Section 5.4.5 Program Object Queries
The following is added to the description of the
CL_PROGRAM_BINARIES query in Table 5.11:
When context property CL_CONTEXT_OFFLINE_COMPILER_MODE_ALTERA
is specified with either value
CL_CONTEXT_COMPILER_MODE_PRELOADED_BINARY_ONLY_ALTERA
or value CL_CONTEXT_COMPILER_MODE_OFFLINE_CREATE_EXE_LIBRARY_ALTERA,
then the binary returned by this query may not be a valid device
binary.
See the description of the CL_CONTEXT_OFFLINE_COMPILER_MODE_ALTERA
property value in Table 4.4 for more information.
Additions to Section 5.6 Executing Kernels
When context property CL_CONTEXT_OFFLINE_COMPILER_MODE_ALTERA
is specified with value
CL_CONTEXT_COMPILER_MODE_OFFLINE_CREATE_EXE_LIBRARY_ALTERA,
or if context property CL_CONTEXT_OFFLINE_DEVICE_ALTERA is specified,
then a kernel execution may behave as if the kernel code
produces no side effects, i.e. as if the body of the kernel
contains no statements.
(All other runtime activites must occur. For example,
the kernels are still enqueued, buffers are copied or mapped
as required to satisfy kernel arguments, and event profiling
information is updated to reflect command progression.)
See the description of the CL_CONTEXT_OFFLINE_COMPILER_MODE_ALTERA
property value in Table 4.4 for more information.
Implementation Notes
For each context property defined in this extension specification,
Altera's platform implementation allows the property to be specified
by setting an environment variable of the same name.
For CL_CONTEXT_OFFLINE_DEVICE_ALTERA, the environment variable
of *must* be set before any platform APIs are called.
For CL_CONTEXT_COMPILER_MODE_ALTERA, value of the environment variable
is the numerical value for the corresponding value enum as defined
in the CL/cl_ext.h header file.
Precedence: For context properties CL_CONTEXT_COMPILER_MODE_ALTERA and
CL_CONTEXT_PROGRAM_EXE_LIBRARY_ROOT_ALTERA, any value explicitly
provided to the clCreateContext or clCreateContextFromType
API methods will override an environment variable setting.
If property CL_CONTEXT_OFFLINE_DEVICE_ALTERA is explicitly
provided to the clCreateCOTNEXT or clCreateContextFromType
methods, then it must match the value set for environment variable
CL_CONTEXT_OFFLINE_DEVICE_ALTERA.
Issues
1. The use of context property CL_CONTEXT_OFFLINE_DEVICE_ALTERA is
rather restricted:
- It must be set at the "beginning of time" for a host application,
e.g. via an environment variable setting.
- If an "offline device" is used in a context, then online
devices cannot be used in any context for the platform.
In this light, it might be more natural to call it a "platform"
property. But in OpenCL platforms are stateless, so this would
not be appropriate. Instead the most basic runtime controls are
applied to contexts, not platforms.
These restrictions reflect limitations of Altera's original
implementation. However, the feature in its current form is still
quite useful for application development and porting.
Sample Code
Example for using an "offline device", with creation and initialization of
a data store for offline compilation of program binaries.
#include <CL/opencl.h>
#include <stdlib.h>
#include <assert.h>
#define CHECK(X) assert(CL_SUCCESS == (X))
int main(...) {
cl_platform platform = 0;
cl_device_id device = 0;
cl_context context = 0;
cl_int status = 0;
// Specify an offline device via environment variable here, or externally
// before program startup.
// This must occur befor the first OpenCL API method call.
setenv("CL_CONTEXT_OFFLINE_DEVICE_ALTERA","mydevice");
CHECK( clGetPlatformIDs(1,&platform,0) );
CHECK( clGetDeviceIDs(platform,CL_DEVICE_TYPE_ACCELERATOR,1,&device,0) );
cl_context_properties props[] = {
// Specify creation and initialization of a program library data store.
CL_CONTEXT_COMPILER_MODE_ALTERA,
(cl_context_properties)CL_CONTEXT_COMPILER_MODE_OFFLINE_CREATE_EXE_LIBRARY_ALTERA
// Where is the data store rooted?
CL_CONTEXT_PROGRAM_EXE_LIBRARY_ROOT_ALTERA
(cl_context_properties)"/data/myproject/proglib"
// Terminate the properties list.
0
};
context = clCreateContext( props, 1, &device, 0, 0, &status );
CHECK( status );
cl_command_queue cq = clCreateCommandQueue( context, device, 0, &status );
CHECK( status );
const char* source = "kernel void foo( global int* A ) { *A = 42; }";
cl_program program = clCreateProgramWithSource( context, 1, &source, NULL, &status );
CHECK( status );
// Perform a stub compilation, and create an entry for this (device,source,options)
// combination in the program library data store.
CHECK( clBuildProgram( program, 1, &device, "-cl-opt-disable", 0, 0 ) );
cl_mem mem = clCreateBuffer(context,CL_MEM_READ_WRITE,sizeof(cl_int),0,&status);
CHECK( status );
// Only a stub compilation has been performed.
// But this still enables full setup and enqueue of a kernel for execution.
cl_kernel kernel = clCreateKernel( program, "foo", &status );
CHECK( status );
CHECK( clSetKernelArg( kernel, 0, sizeof(cl_mem), &mem ) );
// Expect that kernels scheduled by clEnqueueNDRangeKernel and
// clEnqueueTask may not produce expected side effects.
// It may appear that each kernel executes no statements.
CHECK( clEnqueueTask( cq, kernel, 0, 0, 0 ) );
cl_int the_answer = 0;
CHECK( clEnqueueReadBuffer( cq, mem, 1 /*block*/, 0, sizeof(the_answer),&the_answer,0,0,0) );
// Because we're using an "offline device", we can't rely on the_answer being 42.
// ...
Example for using an already-populated data store of program binaries compiled
in an offline manner. The code is the same as the previous example, except
use these context properties instead:
cl_context_properties props[] = {
// Specify the use of a program data store library of pre-compiled device binaries.
CL_CONTEXT_COMPILER_MODE_ALTERA,
(cl_context_properties)CL_CONTEXT_COMPILER_MODE_OFFLINE_USE_EXE_LIBRARY_ALTERA
// Where is the data store rooted?
CL_CONTEXT_PROGRAM_EXE_LIBRARY_ROOT_ALTERA
(cl_context_properties)"/data/myproject/proglib"
// Terminate the properties list.
0
};
Example for using a "preloaded binary":
cl_context_properties props[] = {
// Specify the use of a preloaded binary.
CL_CONTEXT_COMPILER_MODE_ALTERA,
(cl_context_properties)CL_CONTEXT_COMPILER_MODE_PRELOADED_BINARY_ONLY_ALTERA,
// Terminate the properties list.
0
};
context = clCreateContext( props, 1, &device, 0, 0, &status );
...
// All cl_program objects refer to the preloaded binary.
// clCreateProgramWithBinary accepts invalid device binaries.
// clBuildProgram succeeds but does not actually compile the program from source.
Conformance Tests
None.
Revision History
Version 0, 2013-05-06 - Initial revision.
Documents the behaviours in the Altera SDK for OpenCL, version 13.0.
Version 1, 2014-02-06 - Assigned registered values to tokens.