blob: c69451a9a8336615b56d40dd9f6bdc9b693bfcec [file] [log] [blame]
Name
ARM_integer_dot_product
Name Strings
cl_arm_integer_dot_product_int8
cl_arm_integer_dot_product_accumulate_int8
cl_arm_integer_dot_product_accumulate_int16
cl_arm_integer_dot_product_accumulate_saturate_int8
Contributors
Kevin Petit, ARM Ltd.
Abel Bernabeu, ARM Ltd.
Giridhar Tammana, ARM Ltd.
Contact
Kevin Petit, ARM Ltd. (kevin.petit 'at' ARM.com)
IP Status
No claims or disclosures are known to exist.
Version
Revision: #3, November 17th, 2017
Number
OpenCL Extension #52
Status
Complete.
Extension Type
OpenCL device extension
Dependencies
Requires OpenCL version 1.2 or later.
Overview
This extension adds built-in functions giving direct access to specialised
integer dot product instructions that are supported on some devices. An
application wishing to support a fallback path for devices where a specific
function is not available may use the following pattern:
#ifdef cl_arm_integer_dot_product_XXX
#pragma OPENCL EXTENSION cl_arm_integer_dot_product_XXX : enable
code using arm_dot_xxx()
#else
alternative implementation
#endif
Header File
cl_ext.h
New Procedures and Functions
Built-in Functions
int arm_dot(char4 a, char4 b);
uint arm_dot(uchar4 a, uchar4 b);
Description
These functions are available when cl_arm_integer_dot_product_int8 is reported.
The cl_arm_integer_dot_product_int8 preprocessor macro is then defined.
The value returned is:
(a.x * b.x) + (a.y * b.y) + (a.z * b.z) + (a.w * b.w).
Operands are zero- or sign-extended to 32-bit before the multiplications.
Built-in Functions
int arm_dot_acc(char4 a, char4 b, int acc);
uint arm_dot_acc(uchar4 a, uchar4 b, uint acc);
Description
These functions are available when cl_arm_integer_dot_product_accumulate_int8
is reported. The cl_arm_integer_dot_product_accumulate_int8 preprocessor
macro is then defined.
The value returned is:
acc + [ (a.x * b.x) + (a.y * b.y) + (a.z * b.z) + (a.w * b.w) ].
Operands are zero- or sign-extended to 32-bit before the multiplications.
Built-in Functions
int arm_dot_acc(short2 a, short2 b, int acc);
uint arm_dot_acc(ushort2 a, ushort2 b, uint acc);
Description
These functions are available when cl_arm_integer_dot_product_accumulate_int16
is reported. The cl_arm_integer_dot_product_accumulate_int16 preprocessor
macro is then defined.
The value returned is:
acc + [ (a.x * b.x) + (a.y * b.y) ].
Operands are zero- or sign-extended to 32-bit before the multiplications.
Built-in Functions
int arm_dot_acc_sat(char4 a, char4 b, int acc);
uint arm_dot_acc_sat(uchar4 a, uchar4 b, uint acc);
Description
These functions are available when cl_arm_integer_dot_product_accumulate_saturate_int8
is reported. The cl_arm_integer_dot_product_accumulate_saturate_int8
preprocessor macro is then defined.
The value returned is:
acc + [ (a.x * b.x) + (a.y * b.y) + (a.z * b.z) + (a.w * b.w) ].
Operands are zero- or sign-extended to 32-bit before the multiplications.
The final accumulation is saturating.