 # SPIR-V Assembly language syntax

 ## Overview

 The assembly attempts to adhere to the binary form from Section 3 of the SPIR-V
 spec as closely as possible, with one exception aiming at improving the text's
 readability.  The `<result-id>` generated by an instruction is moved to the
 beginning of that instruction and followed by an `=` sign.  This makes it easier
 to tell ID definitions apart from uses and to locate where a value is defined.

 Here is an example:

 ```
      OpCapability Shader
      OpMemoryModel Logical Simple
      OpEntryPoint GLCompute %3 "main"
      OpExecutionMode %3 LocalSize 64 64 1
 %1 = OpTypeVoid
 %2 = OpTypeFunction %1
 %3 = OpFunction %1 None %2
 %4 = OpLabel
      OpReturn
      OpFunctionEnd
 ```
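
As a rough illustration of that single deviation, the Python sketch below takes an instruction's operands in Section 3 (binary) order and moves the `<result-id>` to the front. The helper `to_assembly` is hypothetical and does not consult the SPIR-V grammar: the caller states which operand is the result.

```python
from typing import List, Optional


def to_assembly(opcode: str, operands: List[str], result_index: Optional[int]) -> str:
    """Render one instruction, hoisting its <result-id> to the front.

    `operands` are listed in binary (Section 3) order. `result_index` is the
    position of the <result-id> among them, or None for instructions without
    a result. Real tools look this up in grammar tables; here the caller
    supplies it.
    """
    if result_index is None:
        return " ".join([opcode, *operands])
    result_id = operands[result_index]
    rest = operands[:result_index] + operands[result_index + 1:]
    return " ".join([result_id, "=", opcode, *rest])


# OpFunction's binary order: Result Type, Result <id>, Function Control, Function Type.
print(to_assembly("OpFunction", ["%1", "%3", "None", "%2"], 1))
# OpTypeVoid has no Result Type, so its <result-id> is the first operand.
print(to_assembly("OpTypeVoid", ["%1"], 0))
print(to_assembly("OpReturn", [], None))
```

Instructions without a result, such as `OpReturn`, come out unchanged.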

 A module is a sequence of instructions, separated by whitespace.
 An instruction is an opcode name followed by operands, separated by
 whitespace.  Typically each instruction is presented on its own line,
 but the assembler does not enforce this rule.

 The opcode names and expected operands are described in Section 3 of
 the SPIR-V specification.  An operand is one of:
 * a literal integer: A decimal integer, or a hexadecimal integer.
   A hexadecimal integer is indicated by a leading `0x` or `0X`.  A hex
   integer supplied for a signed integer value will be sign-extended.
   For example, `0xffff` supplied as the literal for an `OpConstant`
   on a signed 16-bit integer type will be interpreted as the value `-1`.
 * a literal floating point number, in decimal or hexadecimal form.
   See [below](#floats).
 * a literal string.
    * A literal string is everything following a double-quote `"` until the
      following un-escaped double-quote. This includes special characters such
      as newlines.
    * A backslash `\` may be used to escape characters in the string. The `\`
      may be used to escape a double-quote or a `\` but is simply ignored when
      preceding any other character.
 * a named enumerated value, specific to that operand position.  For example,
   the `OpMemoryModel` takes a named Addressing Model operand (e.g. `Logical` or
   `Physical32`), and a named Memory Model operand (e.g. `Simple` or `OpenCL`).
   Named enumerated values are only meaningful in specific positions, and will
   otherwise generate an error.
 * a mask expression, consisting of one or more mask enum names separated
   by `|`.  For example, the expression `NotNaN|NotInf|NSZ` denotes the mask
   which is the combination of the `NotNaN`, `NotInf`, and `NSZ` flags.
 * an injected immediate integer: `!<integer>`.  See [below](#immediate).
 * an ID, e.g. `%foo`. See [below](#id).
 * the name of an extended instruction.  For example, `sqrt` in an extended
   instruction such as `%f = OpExtInst %f32 %OpenCLImport sqrt %arg`.
 * the name of an opcode for OpSpecConstantOp, but where the `Op` prefix
   is removed.  For example, the following indicates the use of an integer
   addition in a specialization constant computation:
   `%sum = OpSpecConstantOp %i32 IAdd %a %b`
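
Three of the operand rules above involve a little arithmetic or scanning: sign-extending a hex literal, combining a mask expression, and unescaping a string. The Python sketch below models them with hypothetical helpers (`sign_extend`, `parse_mask`, `unescape_string`) that are not SPIRV-Tools APIs; the mask values are the FP Fast Math Mode bits from the SPIR-V specification.

```python
# Hypothetical helpers illustrating three operand rules; not SPIRV-Tools APIs.

# FP Fast Math Mode mask bits from the SPIR-V specification.
FP_FAST_MATH_MODE = {"NotNaN": 0x1, "NotInf": 0x2, "NSZ": 0x4,
                     "AllowRecip": 0x8, "Fast": 0x10}


def sign_extend(value: int, width: int) -> int:
    """Interpret the low `width` bits of `value` as a two's-complement integer."""
    sign_bit = 1 << (width - 1)
    value &= (1 << width) - 1
    return (value ^ sign_bit) - sign_bit


def parse_mask(expr: str, names: dict) -> int:
    """OR together the `|`-separated enumerant names of a mask expression."""
    mask = 0
    for name in expr.split("|"):
        mask |= names[name]  # an unknown name raises KeyError
    return mask


def unescape_string(token: str) -> str:
    """Strip the quotes; a backslash makes the following character literal."""
    if len(token) < 2 or token[0] != '"' or token[-1] != '"':
        raise ValueError("expected a double-quoted string")
    out, escaped = [], False
    for ch in token[1:-1]:
        if escaped or ch != "\\":
            out.append(ch)
            escaped = False
        else:
            escaped = True  # drop the backslash, keep the next character
    return "".join(out)


print(sign_extend(0xffff, 16))                                   # -1
print(hex(sign_extend(0xffff, 16) & 0xFFFFFFFF))                 # 0xffffffff as a 32-bit word
print(hex(parse_mask("NotNaN|NotInf|NSZ", FP_FAST_MATH_MODE)))   # 0x7
print(unescape_string(r'"say \"hi\" \n"'))                       # say "hi" n
```

Note how `\n` becomes a plain `n`: the backslash is simply dropped before any character other than `"` or `\`.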

 ## ID Definitions & Usage
 <a name="id"></a>

 An ID _definition_ pertains to the `<result-id>` of an instruction, and ID
 _usage_ is a use of an ID as an input to an instruction.

 An ID in the assembly language begins with `%` and must be followed by a name
 consisting of one or more letters, numbers or underscore characters.

 For every ID in the assembly program, the assembler generates a unique number
 called the ID's internal number. Then each ID reference translates into its
 internal number in the SPIR-V output. Internal numbers are unique within the
 compilation unit: no two IDs in the same unit will share internal numbers.

 The disassembler generates IDs where the name is always a decimal number
 greater than 0.

 So the example can be rewritten using more user-friendly names, as follows:
 ```
           OpCapability Shader
           OpMemoryModel Logical Simple
           OpEntryPoint GLCompute %main "main"
           OpExecutionMode %main LocalSize 64 64 1
   %void = OpTypeVoid
 %fnMain = OpTypeFunction %void
   %main = OpFunction %void None %fnMain
 %lbMain = OpLabel
           OpReturn
           OpFunctionEnd
 ```
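
A simplified model of internal numbering is sketched below in Python. It assumes numbers are handed out in order of first appearance, starting at 1; that policy is an assumption for illustration, and the assembler's actual choice may differ. The pattern follows the ID rule stated above, and the hypothetical helper does not skip `%` characters inside string literals.

```python
import re
from typing import Dict

# An ID is `%` followed by one or more letters, digits, or underscores.
ID_PATTERN = re.compile(r"%[A-Za-z0-9_]+")


def assign_internal_numbers(source: str) -> Dict[str, int]:
    """Map each distinct ID name to a number, in order of first appearance.

    Simplified model: string literals are not skipped, and the real
    assembler's numbering policy may differ.
    """
    numbers: Dict[str, int] = {}
    for name in ID_PATTERN.findall(source):
        if name not in numbers:
            numbers[name] = len(numbers) + 1  # 0 is never a valid SPIR-V ID
    return numbers


example = """
          OpCapability Shader
          OpMemoryModel Logical Simple
          OpEntryPoint GLCompute %main "main"
          OpExecutionMode %main LocalSize 64 64 1
  %void = OpTypeVoid
%fnMain = OpTypeFunction %void
  %main = OpFunction %void None %fnMain
%lbMain = OpLabel
          OpReturn
          OpFunctionEnd
"""
print(assign_internal_numbers(example))
# {'%main': 1, '%void': 2, '%fnMain': 3, '%lbMain': 4}
```

Whatever numbers are chosen, the essential property is the one stated above: each distinct name maps to exactly one number, and every reference to that name emits that number.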

 ## Floating point literals
 <a name="floats"></a>

 The assembler and disassembler support floating point literals in both
 decimal and hexadecimal form.

 The syntax for a floating point literal is the same as floating point
 constants in the C programming language, except:
 * An optional leading minus (`-`) is part of the literal.
 * An optional type specifier suffix is not allowed.

 Infinity and NaN values are expressed in hexadecimal float literals
 by using the maximum representable exponent for the bit width.

 For example, in 32-bit floating point, 8 bits are used for the exponent, and the
 exponent bias is 127.  The largest exponent field value, 255 (all ones), therefore
 corresponds to an unbiased exponent of 255 - 127 = 128.  So we represent the
 infinities and some NaNs as follows:

 ```
 %float32 = OpTypeFloat 32
 %inf     = OpConstant %float32 0x1p+128
 %neginf  = OpConstant %float32 -0x1p+128
 %aNaN    = OpConstant %float32 0x1.8p+128
 %moreNaN = OpConstant %float32 -0x1.0002p+128
 ```
 The assembler preserves all the bits of a NaN value.  For example, the encoding
 of `%aNaN` in the previous example is the same as the word with bits
 `0x7fc00000`, and `%moreNaN` is encoded as `0xff800100`.
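
The bit patterns quoted above can be reproduced with a small encoder. The Python sketch below (the helper `encode_hex_float32` is hypothetical) handles only normalized 32-bit literals of the form `[-]0x1.<hex digits>p<exponent>`, with an exponent from -126 to 128; subnormals and other spellings are out of scope.

```python
import re

# Normalized binary32 hex literal: optional minus, 0x1, optional fraction digits, binary exponent.
HEX_FLOAT32 = re.compile(r"^(-?)0[xX]1(?:\.([0-9a-fA-F]*))?[pP]([+-]?\d+)$")


def encode_hex_float32(literal: str) -> int:
    """Return the IEEE-754 binary32 bit pattern of a normalized hex float literal.

    An exponent of 128 selects the all-ones exponent field, so a zero
    fraction gives an infinity and a nonzero fraction gives a NaN.
    """
    match = HEX_FLOAT32.match(literal)
    if match is None:
        raise ValueError(f"unsupported literal: {literal!r}")
    sign, digits, exponent = match.group(1), match.group(2) or "", int(match.group(3))
    if not -126 <= exponent <= 128:
        raise ValueError("exponent outside the normal/inf/NaN range of this sketch")
    # Each hex digit is 4 fraction bits; left-align them in the 23-bit field.
    width = 4 * len(digits)
    fraction = int(digits, 16) if digits else 0
    if width > 23:
        extra = width - 23
        if fraction & ((1 << extra) - 1):
            raise ValueError("fraction needs more than 23 bits")
        fraction >>= extra
    else:
        fraction <<= 23 - width
    biased = exponent + 127  # 128 + 127 == 255, the all-ones field
    return (int(sign == "-") << 31) | (biased << 23) | fraction


for text in ("0x1p+128", "-0x1p+128", "0x1.8p+128", "-0x1.0002p+128"):
    print(text, hex(encode_hex_float32(text)))
```

For `0x1.8p+128` the single fraction digit `8` sets the top fraction bit, giving `0x7fc00000`; for `-0x1.0002p+128` the four digits `0002` land at bit 8 of the fraction field, giving `0xff800100`.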

 The disassembler prints infinite, NaN, and subnormal values in hexadecimal form.
 Zero and normal values are printed in decimal form with enough digits
 to preserve all significand bits.
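
For 32-bit floats, nine significant decimal digits (the value of C++'s `std::numeric_limits<float>::max_digits10`) are always enough to recover the exact bit pattern. The Python sketch below checks that round trip for a few values; it only illustrates the property and is not the disassembler's formatting code.

```python
import struct


def bits_to_float32(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE-754 binary32 value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def float32_to_bits(value: float) -> int:
    """Round a Python float to binary32 and return its bit pattern."""
    return struct.unpack(">I", struct.pack(">f", value))[0]


def to_decimal(bits: int) -> str:
    """Format a finite binary32 value with 9 significant digits."""
    return f"{bits_to_float32(bits):.9g}"


for bits in (0x3F800000, 0x3DCCCCCD, 0x40490FDB):
    text = to_decimal(bits)
    # Parsing the decimal text and rounding to binary32 restores the original bits.
    assert float32_to_bits(float(text)) == bits
    print(hex(bits), text)
```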

 ## Arbitrary Integers
 <a name="immediate"></a>

 When writing tests it can be useful to emit an invalid 32-bit word into the
 binary stream at an arbitrary position within the assembly.  To inject an
 arbitrary word into the stream, use the prefix `!`, in the form `!<integer>`.
 Here is an example:

 ```
 OpCapability !0x0000FF00
 ```

 Any token in a valid assembly program may be replaced by `!<integer>` -- even
 tokens that dictate how the rest of the instruction is parsed.  Consider, for
 example, the following assembly program:

 ```
 %4 = OpConstant %1 123 456 789 OpExecutionMode %2 LocalSize 11 22 33
 OpExecutionMode %3 InputLines
 ```

 The tokens `OpConstant`, `LocalSize`, and `InputLines` may be replaced by random
 `!<integer>` values, and the assembler will still assemble an output binary with
 three instructions.  It will not necessarily be valid SPIR-V, but it will
 faithfully reflect the input text.

 You may wonder how the assembler recognizes the instruction structure (including
 instruction boundaries) in the text with certain crucial tokens replaced by
 arbitrary integers.  If, say, `OpConstant` becomes a `!<integer>` whose value
 differs from the binary representation of `OpConstant` (remember that this
 feature is intended for fine-grained control in SPIR-V testing), the assembler
 generally has no idea what that value stands for.  So how does it know there is
 exactly one `<id>` and three number literals following in that instruction,
 before the next one begins?  And if `LocalSize` is replaced by an arbitrary
 `!<integer>`, how does it know to take the next three tokens (instead of zero or
 one, both of which are possible when the assembler cannot be certain the token
 was `LocalSize`)?  The answer is a simple rule governing the parsing of instructions
 with `!<integer>` in them:

 When a token in the assembly program is a `!<integer>`, that integer value is
 emitted into the binary output, and parsing proceeds differently than before:
 each subsequent token not recognized as an OpCode or a `<result-id>` is emitted
 into the binary output without any checking; when a recognizable OpCode or a
 `<result-id>` is eventually encountered, it begins a new instruction and parsing
 returns to normal.  (If a subsequent OpCode is never found, then this alternate
 parsing mode handles all the remaining tokens in the program.)
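
The rule can be modeled as a small scan over whitespace-separated tokens. In the Python sketch below, `KNOWN_OPCODES` is a hypothetical stand-in for the assembler's opcode table, and only alternate mode is modeled: boundaries that the normal, grammar-driven parser would find are not reported. Splitting on whitespace is adequate here only because the samples contain no quoted strings with spaces.

```python
from typing import List

# Hypothetical subset of opcode names the sketch "recognizes".
KNOWN_OPCODES = {"OpCapability", "OpConstant", "OpExecutionMode", "OpVariable"}


def alternate_mode_restarts(tokens: List[str]) -> List[int]:
    """Return token indices where a new instruction begins after a `!<integer>`.

    Once a `!<integer>` token is seen, later tokens pass through unchecked
    until a recognized opcode name, or an ID followed by `=`, starts the
    next instruction and normal parsing resumes.
    """
    restarts = []
    alternate = False
    for i, tok in enumerate(tokens):
        if alternate:
            is_opcode = tok in KNOWN_OPCODES
            is_result_id = tok.startswith("%") and tokens[i + 1:i + 2] == ["="]
            if is_opcode or is_result_id:
                restarts.append(i)
                alternate = False
                continue
        if tok.startswith("!"):
            alternate = True
    return restarts


tokens = "OpExecutionMode %2 !17 11 22 33 OpExecutionMode %3 InputLines".split()
print(alternate_mode_restarts(tokens))  # [6]
```

With `LocalSize` replaced by the arbitrary `!17`, the three literals are passed through unchecked and the second `OpExecutionMode` (token index 6) starts a new instruction.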

 The assembler processes the tokens encountered in alternate parsing mode as
 follows:

 * If the token is a number literal, since context may be lost, the number
   is interpreted as a 32-bit value and output as a single word.  In order to
   specify multiple-word literals in alternate-parsing mode, further uses of
   `!<integer>` tokens may be required.
   All formats supported by `strtoul()` are accepted.
 * If the token is a string literal, it outputs a sequence of words representing
   the string as defined in the SPIR-V specification for Literal String.
 * If the token is an ID, it outputs the ID's internal number.
 * If the token is another `!<integer>`, it outputs that integer.
 * Any other token causes the assembler to quit with an error.
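
For the string case, the SPIR-V specification lays out a Literal String as UTF-8 bytes with a terminating nul, zero-padded to a whole number of words, four bytes per word in little-endian order (the first byte in the lowest-order 8 bits). A Python sketch of that layout, with a hypothetical helper name:

```python
from typing import List


def encode_literal_string(text: str) -> List[int]:
    """Lay out `text` as the 32-bit words of a SPIR-V Literal String."""
    data = text.encode("utf-8") + b"\x00"      # nul terminator
    data += b"\x00" * (-len(data) % 4)         # pad to a multiple of 4 bytes
    return [int.from_bytes(data[i:i + 4], "little") for i in range(0, len(data), 4)]


print([hex(w) for w in encode_literal_string("abc")])   # ['0x636261']
print([hex(w) for w in encode_literal_string("abcd")])  # ['0x64636261', '0x0']
```

So a three-character string such as `"abc"` fills exactly one word, while a four-character string needs a second word for its terminator.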

 Note that this has some interesting consequences, including:

 * When an OpCode is replaced by `!<integer>`, the integer value should encode
   both the instruction's word count (high 16 bits) and its opcode number (low
   16 bits), as specified in the physical-layout section of the SPIR-V
   specification.

 * Consecutive instructions may have their OpCode replaced by `!<integer>` and
   still produce valid SPIR-V.  For example, `!262187 %1 %2 "abc" !327739 %1 %3 6
   %2` will successfully assemble into SPIR-V declaring a constant and a
   PrivateGlobal variable.

 * Enums (such as `DontInline` or `SubgroupMemory`) are not handled
   by the alternate parsing mode.  They must be replaced by `!<integer>` for
   successful assembly.

 * The `<result-id>` on the left-hand side of an assignment cannot be a
   `!<integer>`. The `<result-id>` can still be manually controlled if desired
   by expressing the entire instruction as `!<integer>` tokens for its opcode and
   operands.

 * The `=` sign cannot be processed by the alternate parsing mode if the OpCode
   following it is a `!<integer>`.

 * When replacing a named ID with `!<integer>`, it is possible to generate
   unintentionally valid SPIR-V.  If the integer provided happens to equal a
   number generated for an existing named ID, it will result in a reference to
   that named ID being output.  This may be valid SPIR-V, contrary to the
   presumed intention of the writer.
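
The first word of every SPIR-V instruction packs the word count into the high 16 bits and the opcode number into the low 16 bits. The Python sketch below (hypothetical helpers) decodes the two `!<integer>` values from the example above: 262187 is a 4-word `OpConstant` (opcode 43: the first word, `%1`, `%2`, and one word for `"abc"`), and 327739 is a 5-word `OpVariable` (opcode 59: the first word, `%1`, `%3`, storage class `6`, and the initializer `%2`).

```python
from typing import Tuple


def make_first_word(word_count: int, opcode: int) -> int:
    """Pack word count (high 16 bits) and opcode (low 16 bits) into one word."""
    if not (1 <= word_count <= 0xFFFF and 0 <= opcode <= 0xFFFF):
        raise ValueError("word count and opcode must each fit in 16 bits")
    return (word_count << 16) | opcode


def split_first_word(word: int) -> Tuple[int, int]:
    """Inverse of make_first_word: return (word_count, opcode)."""
    return word >> 16, word & 0xFFFF


print(split_first_word(262187))     # (4, 43): OpConstant
print(split_first_word(327739))     # (5, 59): OpVariable
print(hex(make_first_word(2, 17)))  # 0x20011: OpCapability with one operand
```

By the same arithmetic, `OpCapability !0x0000FF00` from earlier assembles to the words `0x00020011` and `0x0000FF00`: a 2-word `OpCapability` (opcode 17) followed by the injected operand.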

 ## Notes

 * Some enumerants cannot be used by name, because the target instruction
 in which they are meaningful takes an ID reference instead of a literal value.
 For example:
    * Named enumerated value `CmdExecTime` from section 3.30 Kernel
      Profiling Info is used in constructing a mask value supplied as
      an ID for `OpCaptureEventProfilingInfo`.  But no other instruction
      has enough context to bring the enumerant names from section 3.30
      into scope.
    * Similarly, the names in section 3.29 Kernel Enqueue Flags are used to
      construct a value supplied as an ID to the Flags argument of
      OpEnqueueKernel.
    * Similarly for the names in section 3.25 Memory Semantics.
    * Similarly for the names in section 3.27 Scope.
 * Some enumerants cannot be used by name, because they only name values
 returned by an instruction:
    * Enumerants from 3.12 Image Channel Order name possible values returned
      by the `OpImageQueryOrder` instruction.
    * Enumerants from 3.13 Image Channel Data Type name possible values
      returned by the `OpImageQueryFormat` instruction.