commit: a691458e474db884601b1aaf48ee699c63e1ce9a
author: MITSUNARI Shigeo <herumi@nifty.com>, Sun Nov 20 20:04:42 2016 +0900
committer: MITSUNARI Shigeo <herumi@nifty.com>, Sun Nov 20 20:04:42 2016 +0900
tree: e11bb1628d8961fb3557d6c3697963d88dcb1d2f
parent: bf62c0660b97314e2b014d3a1eae859750528674
add vptest for ymm
Xbyak is a header-only C++ library that lets you assemble x86 (IA32) and x64 (AMD64, x86-64) mnemonics dynamically at run time.
Because it consists only of header files, you can use Xbyak's functions as soon as you include xbyak.h.
Supported instruction sets: MMX/MMX2/SSE/SSE2/SSE3/SSSE3/SSE4/FPU (partial)/AVX/AVX2/FMA/VEX-encoded GPR/AVX-512
Note: Xbyak defines functions named and(), or(), xor(), and not(). These names are alternative operator tokens in standard C++, so gcc requires the -fno-operator-names option. Alternatively, define XBYAK_NO_OP_NAMES and use and_(), or_(), xor_(), and not_() instead. The underscore versions are always available.
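The following plain C++ (no Xbyak involved) shows why the option is needed. The words and, or, xor, and not are alternative spellings of &&, ||, ^, and !, so by default gcc parses them as operators rather than as identifiers. The helper functions below are hypothetical and only show these tokens in use.

```cpp
#include <cassert>

// Standard C++: `and`, `or`, `xor`, `not` are alternative tokens,
// not ordinary identifiers. Hypothetical helpers for illustration only.
inline bool bothSet(bool a, bool b) { return a and b; }            // same as a && b
inline bool eitherSet(bool a, bool b) { return a or b; }           // same as a || b
inline unsigned toggle(unsigned x, unsigned m) { return x xor m; } // same as x ^ m
inline bool negate(bool a) { return not a; }                       // same as !a
```

With -fno-operator-names, gcc treats these words as ordinary identifiers. These function bodies would then no longer parse, but a member function named and() becomes legal.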
The following files are necessary. Add their location to your compiler's include path.
Linux:

    make install

These files are copied into /usr/local/include/xbyak.
New feature: support for the AVX-512 instruction set.
To use Xbyak, derive a class from Xbyak::CodeGenerator and emit instructions in a class method. Then get a function pointer by calling getCode() and casting the return value.
    NASM             Xbyak
    mov eax, ebx --> mov(eax, ebx);
    inc ecx      --> inc(ecx);
    ret          --> ret();
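The following is a hypothetical toy emitter, not Xbyak's implementation. It illustrates the general idea that each mnemonic call appends machine-code bytes to a buffer. The register numbers (eax=0, ecx=1, edx=2, ebx=3) and the opcodes 0x89 /r, 0xFF /0, and 0xC3 are the standard x86 encodings for the three instructions in the table.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy emitter: each call appends the instruction's bytes.
enum Reg32 { EAX = 0, ECX = 1, EDX = 2, EBX = 3 };

struct ToyEmitter {
    std::vector<std::uint8_t> code;

    // mov r/m32, r32: opcode 0x89, ModRM mod=11 (register direct),
    // reg field = source register, r/m field = destination register.
    void mov(Reg32 dst, Reg32 src) {
        code.push_back(0x89);
        code.push_back(static_cast<std::uint8_t>(0xC0 | (src << 3) | dst));
    }

    // inc r/m32: opcode 0xFF with /0 in the ModRM reg field
    // (this form is valid in both 32-bit and 64-bit mode).
    void inc(Reg32 r) {
        code.push_back(0xFF);
        code.push_back(static_cast<std::uint8_t>(0xC0 | r));
    }

    // Near return.
    void ret() { code.push_back(0xC3); }
};
```

With this sketch, the three lines of the table produce the bytes 89 D8 FF C1 C3.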
Addressing:

    (ptr|dword|word|byte) [base + index * (1|2|4|8) + displacement]
                          [rip + 32bit disp] ; x64 only

    NASM                   Xbyak
    mov eax, [ebx+ecx] --> mov(eax, ptr[ebx+ecx]);
    test byte [esp], 4 --> test(byte [esp], 4);
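The following is a minimal sketch, not Xbyak code, of what the [base + index * scale] form becomes at the byte level. It encodes mov r32, [base + index*scale] as opcode 0x8B, a ModRM byte, and a SIB byte. It assumes 32-bit addressing with no displacement. It does not handle the special cases base=ebp and index=esp, which have different meanings in this encoding.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Register numbers follow the x86 encoding:
// eax=0, ecx=1, edx=2, ebx=3, esp=4, ebp=5, esi=6, edi=7.
std::vector<std::uint8_t> encodeMovLoad(int dst, int base, int index, int scale) {
    // With mod=00, base=ebp means "disp32, no base" and index=esp means
    // "no index"; this sketch does not support those cases.
    assert(base != 5 && index != 4);
    int ss = 0; // SIB scale field: 1->0, 2->1, 4->2, 8->3
    switch (scale) {
    case 1: ss = 0; break;
    case 2: ss = 1; break;
    case 4: ss = 2; break;
    case 8: ss = 3; break;
    default: assert(false && "scale must be 1, 2, 4 or 8");
    }
    // ModRM: mod=00, reg=dst, r/m=100 (a SIB byte follows).
    const std::uint8_t modrm = static_cast<std::uint8_t>((dst << 3) | 4);
    // SIB: scale | index | base.
    const std::uint8_t sib = static_cast<std::uint8_t>((ss << 6) | (index << 3) | base);
    return {0x8B, modrm, sib}; // 0x8B = mov r32, r/m32
}
```

For example, encodeMovLoad(0, 3, 1, 1) (mov eax, [ebx+ecx]) yields 8B 04 0B.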
How to use a selector (segment register)

Note: the Segment class is not derived from Operand.

    mov eax, [fs:eax] --> putSeg(fs); mov(eax, ptr [eax]);
    mov ax, cs        --> mov(ax, cs);
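A segment override is encoded as a one-byte prefix in front of the instruction that uses the memory operand. The fs prefix is 0x64, which is what putSeg(fs) in the table above is there to produce. The sketch below uses hypothetical helper names and assumes 32-bit mode, since [eax] in 64-bit mode would also need an address-size prefix. It shows the bytes for mov eax, [seg:eax].

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class Seg { es, cs, ss, ds, fs, gs };

// Standard x86 segment-override prefix bytes.
std::uint8_t segPrefix(Seg s) {
    switch (s) {
    case Seg::es: return 0x26;
    case Seg::cs: return 0x2E;
    case Seg::ss: return 0x36;
    case Seg::ds: return 0x3E;
    case Seg::fs: return 0x64;
    case Seg::gs: return 0x65;
    }
    return 0; // not reached for valid enum values
}

// mov eax, [seg:eax] in 32-bit mode: prefix, opcode 0x8B,
// ModRM mod=00 reg=eax r/m=eax (0x00).
std::vector<std::uint8_t> movEaxFromSegEax(Seg s) {
    return {segPrefix(s), 0x8B, 0x00};
}
```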
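You can use ptr for almost all memory accesses unless you need to specify the size of the memory operand.
dword, word, and byte are member variables of CodeGenerator. Therefore, don't use them as type names inside your generator class (for example, dword as a typedef for unsigned int).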
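The reason is ordinary C++ name lookup: inside a class, a member named dword hides a namespace-scope typedef of the same name. The sketch below uses a hypothetical MyGenerator class with a stand-in SizeSpec member rather than Xbyak itself. Inside the class, the typedef is still reachable as ::dword.

```cpp
#include <cassert>

typedef unsigned int dword; // a user typedef at namespace scope

// Hypothetical stand-in for a size-specifier object.
struct SizeSpec { int bits; };

struct MyGenerator {
    const SizeSpec dword{32}; // member object, analogous to CodeGenerator::dword

    // Inside the class, unqualified `dword` finds the member, not the typedef.
    int memberBits() const { return dword.bits; }

    // The typedef must be qualified to be used here.
    unsigned useTypedef() const {
        ::dword x = 7;
        return x;
    }
};
```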
    vaddps(xmm1, xmm2, xmm3); // xmm1 <- xmm2 + xmm3
    vaddps(xmm2, xmm3, ptr [rax]); // use ptr to access memory
    vgatherdpd(xmm1, ptr [ebp+123+xmm2*4], xmm3);
Remark: the omitted-destination syntax shown below is disabled.

    vaddps(xmm2, xmm3); // xmm2 <- xmm2 + xmm3

Define XBYAK_ENABLE_OMITTED_OPERAND if you need it for backward compatibility. Newer versions will not support it.
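The following small scalar model shows what the two forms compute. The function names are hypothetical, and a 128-bit register is modeled as four floats. The three-operand form writes a + b into a separate destination. The omitted form is the same operation with the first source reused as the destination.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Xmm4f = std::array<float, 4>; // model of a 128-bit register with 4 floats

// Non-destructive 3-operand form: dst[i] = a[i] + b[i] for every lane.
void vaddps3(Xmm4f& dst, const Xmm4f& a, const Xmm4f& b) {
    for (std::size_t i = 0; i < dst.size(); ++i) dst[i] = a[i] + b[i];
}

// Omitted-destination form: x <- x + y, i.e. the 3-operand form with dst == a.
void vaddps2(Xmm4f& x, const Xmm4f& y) { vaddps3(x, x, y); }
```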
    vaddpd zmm2, zmm5, zmm30                --> vaddpd(zmm2, zmm5, zmm30);
    vaddpd xmm30, xmm20, [rax]              --> vaddpd(xmm30, xmm20, ptr [rax]);
    vaddps xmm30, xmm20, [rax]              --> vaddps(xmm30, xmm20, ptr [rax]);
    vaddpd zmm2{k5}, zmm4, zmm2             --> vaddpd(zmm2 | k5, zmm4, zmm2);
    vaddpd zmm2{k5}{z}, zmm4, zmm2          --> vaddpd(zmm2 | k5 | T_z, zmm4, zmm2);
    vaddpd zmm2{k5}{z}, zmm4, zmm2,{rd-sae} --> vaddpd(zmm2 | k5 | T_z, zmm4, zmm2 | T_rd_sae);
                                                vaddpd(zmm2 | k5 | T_z | T_rd_sae, zmm4, zmm2); // the position of `|` is arbitrary.
    vcmppd k4{k3}, zmm1, zmm2, {sae}, 5     --> vcmppd(k4 | k3, zmm1, zmm2 | T_sae, 5);

    vaddpd xmm1, xmm2, [rax+256]            --> vaddpd(xmm1, xmm2, ptr [rax+256]);
    vaddpd xmm1, xmm2, [rax+256]{1to2}      --> vaddpd(xmm1, xmm2, ptr_b [rax+256]);
    vaddpd ymm1, ymm2, [rax+256]{1to4}      --> vaddpd(ymm1, ymm2, ptr_b [rax+256]);
    vaddpd zmm1, zmm2, [rax+256]{1to8}      --> vaddpd(zmm1, zmm2, ptr_b [rax+256]);
    vaddps zmm1, zmm2, [rax+rcx*8+8]{1to16} --> vaddps(zmm1, zmm2, ptr_b [rax+rcx*8+8]);
    vmovsd [rax]{k1}, xmm4                  --> vmovsd(ptr [rax] | k1, xmm4);

    vcvtpd2dq xmm16, oword [eax+33]         --> vcvtpd2dq(xmm16, xword [eax+33]); // use xword for m128 instead of oword
                                                vcvtpd2dq(xmm16, ptr [eax+33]); // default xword
    vcvtpd2dq xmm21, [eax+32]{1to2}         --> vcvtpd2dq(xmm21, ptr_b [eax+32]);
    vcvtpd2dq xmm0, yword [eax+33]          --> vcvtpd2dq(xmm0, yword [eax+33]); // use yword for m256
    vcvtpd2dq xmm19, [eax+32]{1to4}         --> vcvtpd2dq(xmm19, yword_b [eax+32]); // use yword_b to broadcast

    vfpclassps k5{k3}, zword [rax+64], 5    --> vfpclassps(k5|k3, zword [rax+64], 5); // specify m512
    vfpclasspd k5{k3}, [rax+64]{1to2}, 5    --> vfpclasspd(k5|k3, xword_b [rax+64], 5); // broadcast 64-bit to 128-bit
    vfpclassps k5{k3}, [rax+64]{1to4}, 5    --> vfpclassps(k5|k3, xword_b [rax+64], 5); // broadcast 32-bit to 128-bit
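The masking and broadcast notation above can be summarized by a scalar reference model. This is a sketch under simplifying assumptions, not Xbyak code: a zmm register is modeled as eight doubles, the opmask as an 8-bit integer, and the function names are hypothetical.

- With {k}, lanes whose mask bit is 0 keep their old value (merge masking).
- Adding {z} (T_z) zeroes those lanes instead.
- A {1toX} operand (ptr_b) replicates one memory element into all X lanes, where X = vector bits / element bits.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model of a 512-bit zmm register holding eight doubles.
using Zmm8d = std::array<double, 8>;

// Model of vaddpd dst{k}, a, b (zeroing == false) and
// vaddpd dst{k}{z}, a, b (zeroing == true). Bit i of k controls lane i.
void vaddpdMasked(Zmm8d& dst, const Zmm8d& a, const Zmm8d& b,
                  std::uint8_t k, bool zeroing) {
    for (std::size_t i = 0; i < dst.size(); ++i) {
        if ((k >> i) & 1u) {
            dst[i] = a[i] + b[i]; // lane enabled by the mask
        } else if (zeroing) {
            dst[i] = 0.0;         // {z}: disabled lanes are cleared
        }                         // otherwise merge masking keeps dst[i]
    }
}

// Model of a {1toX} broadcast operand: one element fills every lane.
Zmm8d broadcastToZmm(double m64) {
    Zmm8d v;
    v.fill(m64);
    return v;
}

// X in {1toX}: number of lanes = vector width / element width.
constexpr int broadcastCount(int vectorBits, int elementBits) {
    return vectorBits / elementBits;
}
```

For instance, broadcastCount(512, 64) is 8 and broadcastCount(128, 32) is 4. These match the {1to8} and {1to4} cases in the table.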
Remark:
- Use | T_z, | T_sae, | T_rn_sae, | T_rd_sae, | T_ru_sae, and | T_rz_sae instead of {z}, {sae}, {rn-sae}, {rd-sae}, {ru-sae}, and {rz-sae}, respectively.
- k4 | k3 is different from k3 | k4 (the left operand is the destination and the right one is the opmask).
- Use ptr_b for broadcast {1toX}. X is determined automatically.

Label:

    L("L1");
    jmp("L1");

    jmp("L2");
    ...
    a few mnemonics (8-bit displacement jmp)
    ...
    L("L2");

    jmp("L3", T_NEAR);
    ...
    a lot of mnemonics (32-bit displacement jmp)
    ...
    L("L3");
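The 8-bit and 32-bit jumps differ in their x86 encoding:

- EB rel8: 2 bytes, reach -128 to +127.
- E9 rel32: 5 bytes.

In both cases the displacement is counted from the end of the instruction. The hypothetical encodeJmp below picks the short form when the target is known and in range, and its forceNear flag plays the role that T_NEAR plays in the example. For a forward jump, the distance is not yet known when the jump is emitted. That is presumably why the example requests T_NEAR explicitly for the distant L3.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encode an unconditional jmp located at `from` to the known address `to`.
//   short: EB rel8   (2 bytes)
//   near:  E9 rel32  (5 bytes, little-endian displacement)
std::vector<std::uint8_t> encodeJmp(std::int64_t from, std::int64_t to, bool forceNear) {
    const std::int64_t rel8 = to - (from + 2); // relative to the end of the 2-byte form
    if (!forceNear && rel8 >= -128 && rel8 <= 127) {
        return {0xEB, static_cast<std::uint8_t>(rel8 & 0xFF)};
    }
    const std::int64_t rel32 = to - (from + 5); // relative to the end of the 5-byte form
    assert(rel32 >= INT32_MIN && rel32 <= INT32_MAX);
    const std::uint32_t u = static_cast<std::uint32_t>(rel32);
    return {0xE9,
            static_cast<std::uint8_t>(u),
            static_cast<std::uint8_t>(u >> 8),
            static_cast<std::uint8_t>(u >> 16),
            static_cast<std::uint8_t>(u >> 24)};
}
```

For example, a jump to itself (L("L1"); jmp("L1");) encodes as EB FE in this sketch.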
Call hasUndefinedLabel() to verify that your code has no undefined labels. You can also use a label as the immediate value of mov, as in mov(eax, "L2");
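One common way to support forward references is sketched below with a hypothetical MiniAsm class; this is not Xbyak's implementation. The assembler emits a placeholder displacement, remembers where it is, and patches it when the label is defined. An undefined label then simply means some fixups are still pending, which is the condition a check like hasUndefinedLabel() reports.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical mini assembler (not Xbyak's implementation).
class MiniAsm {
public:
    std::vector<std::uint8_t> code;

    // Define a label at the current position and patch pending jumps to it.
    void L(const std::string& name) {
        const std::size_t here = code.size();
        defined_[name] = here;
        auto range = pending_.equal_range(name);
        for (auto it = range.first; it != range.second; ++it) patchRel32(it->second, here);
        pending_.erase(range.first, range.second);
    }

    // jmp rel32 (E9): forward references get a placeholder and a fixup record.
    void jmpNear(const std::string& name) {
        code.push_back(0xE9);
        const std::size_t dispPos = code.size();
        code.resize(code.size() + 4, 0); // placeholder rel32
        auto it = defined_.find(name);
        if (it != defined_.end()) {
            patchRel32(dispPos, it->second);
        } else {
            pending_.emplace(name, dispPos);
        }
    }

    void nop() { code.push_back(0x90); }

    bool hasUndefinedLabel() const { return !pending_.empty(); }

private:
    // rel32 is relative to the end of the 4-byte displacement field.
    void patchRel32(std::size_t dispPos, std::size_t target) {
        const std::int64_t rel = static_cast<std::int64_t>(target) -
                                 static_cast<std::int64_t>(dispPos + 4);
        const std::uint32_t u = static_cast<std::uint32_t>(rel);
        for (std::size_t i = 0; i < 4; ++i) {
            code[dispPos + i] = static_cast<std::uint8_t>(u >> (8 * i));
        }
    }

    std::map<std::string, std::size_t> defined_;
    std::multimap<std::string, std::size_t> pending_;
};
```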
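MASM-style anonymous labels are also supported: @@ defines a label, @b refers to the previous @@, and @f refers to the next one.

    L("@@"); // <A>
    jmp("@b"); // jmp to <A>
    jmp("@f"); // jmp to <B>
    L("@@"); // <B>
    jmp("@b"); // jmp to <B>
    mov(eax, "@b");
    jmp(eax); // jmp to <B>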
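The resolution rule for anonymous labels can be sketched as a search over the positions where @@ was defined:

- @b refers to the nearest @@ at or before the reference.
- @f refers to the nearest @@ after the reference.

The functions below are hypothetical and work on sequence positions rather than real addresses. They are an illustration, not Xbyak's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// anonDefs: sorted positions where L("@@") appeared.
// at: position of the instruction that references "@b" or "@f".
// Both functions return -1 if no matching label exists.

// "@b": nearest "@@" defined at or before `at`.
int resolveBackward(const std::vector<int>& anonDefs, int at) {
    auto it = std::upper_bound(anonDefs.begin(), anonDefs.end(), at);
    return it == anonDefs.begin() ? -1 : *(it - 1);
}

// "@f": nearest "@@" defined strictly after `at`.
int resolveForward(const std::vector<int>& anonDefs, int at) {
    auto it = std::upper_bound(anonDefs.begin(), anonDefs.end(), at);
    return it == anonDefs.end() ? -1 : *it;
}
```

Replay the example above with <A> at position 0 and <B> at position 3. A jmp("@f") at position 2 resolves to 3 (<B>), and a jmp("@b") at position 1 resolves to 0 (<A>).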
Labels that begin with a period (.) and are defined between inLocalLabel() and outLocalLabel() are treated as local labels. inLocalLabel() and outLocalLabel() can be nested.
    void func1()
    {
        inLocalLabel();
      L(".lp"); // <A> ; local label
        ...
        jmp(".lp"); // jmp to <A>
      L("aaa"); // global label
        outLocalLabel();
    }

    void func2()
    {
        inLocalLabel();
      L(".lp"); // <B> ; local label
        func1();
        jmp(".lp"); // jmp to <B>
        outLocalLabel();
    }
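The following is a minimal sketch of how such scoping could work, using a hypothetical LocalLabelScopes class rather than Xbyak's implementation. Each inLocalLabel() opens a new numbered scope. A label starting with '.' is renamed with the number of the innermost scope, so the .lp in func1 and the .lp in func2 never collide. The sketch also shows why every inLocalLabel() needs a matching outLocalLabel().

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch of scoped local labels.
class LocalLabelScopes {
public:
    // Open a new scope with a fresh id; scopes nest like a stack.
    void inLocalLabel() { stack_.push_back(++nextId_); }

    // Close the innermost scope; must be balanced with inLocalLabel().
    void outLocalLabel() {
        assert(!stack_.empty() && "outLocalLabel() without inLocalLabel()");
        if (!stack_.empty()) stack_.pop_back();
    }

    // Map a label name to the unique name used internally.
    std::string resolve(const std::string& name) const {
        const bool isLocal = !name.empty() && name[0] == '.';
        if (!isLocal || stack_.empty()) return name; // global label (or no open scope)
        return name + "@" + std::to_string(stack_.back());
    }

private:
    std::vector<int> stack_;
    int nextId_ = 0;
};
```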
The L() and jxx() functions also support the Label class:
    Label label1, label2;
    L(label1);
    ...
    jmp(label1);
    ...
    jmp(label2);
    ...
    L(label2);
Moreover, the assignL(dstLabel, srcLabel) method binds dstLabel to srcLabel, so both refer to the same address.
    Label label1, label2;
    L(label1);
    ...
    jmp(label2);
    ...
    assignL(label2, label1); // label2 <= label1
The jmp above therefore jumps to label1.
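The following is a hedged model of this binding. The LabelTable and Label types are hypothetical, and addresses are plain numbers. A jump to a label without an address stays pending. assignL(dst, src) gives dst the address already bound to src, which resolves the pending jump.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <vector>

struct Label { int id; }; // hypothetical label handle

constexpr std::size_t kUnresolved = static_cast<std::size_t>(-1);

class LabelTable {
public:
    // L(label): bind the label to an address.
    void L(Label l, std::size_t addr) { bind(l.id, addr); }

    // Record a jump; returns its index. The target may still be unknown.
    std::size_t jmp(Label l) {
        const std::size_t j = jumps_.size();
        auto it = addr_.find(l.id);
        if (it != addr_.end()) {
            jumps_.push_back(it->second);
        } else {
            jumps_.push_back(kUnresolved);
            pending_[l.id].push_back(j);
        }
        return j;
    }

    // assignL(dst, src): dst gets the address already bound to src.
    void assignL(Label dst, Label src) {
        auto it = addr_.find(src.id);
        if (it == addr_.end()) throw std::logic_error("assignL: src label is not defined");
        bind(dst.id, it->second);
    }

    std::size_t target(std::size_t jumpIndex) const { return jumps_.at(jumpIndex); }

private:
    void bind(int id, std::size_t addr) {
        addr_[id] = addr;
        auto it = pending_.find(id);
        if (it == pending_.end()) return;
        for (std::size_t j : it->second) jumps_[j] = addr; // resolve pending jumps
        pending_.erase(it);
    }

    std::map<int, std::size_t> addr_;
    std::map<int, std::vector<std::size_t>> pending_;
    std::vector<std::size_t> jumps_;
};
```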
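In 64-bit mode, a Label can also be used in rip-relative addressing:

    Label label;
    mov(eax, ptr [rip + label]); // eax = 4
    ...
    L(label);
    dd(4);

An absolute address can be used in the same way:

    int x;
    ...
    mov(eax, ptr[rip + &x]); // throws an exception if the distance between &x and the current position is larger than 2GiB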
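The 2GiB limit comes from the encoding. The rip-relative displacement is a signed 32-bit value measured from the address of the next instruction. For example, mov eax, [rip + disp32] is encoded as 8B 05 followed by the 4-byte displacement, 6 bytes in total. The hypothetical ripDisp32 below computes that displacement and throws std::out_of_range when the target is too far away. It is a sketch of the kind of check described above.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>

// nextInstrAddr: address just past the instruction (the value of rip).
// target: absolute address the operand should refer to.
std::int32_t ripDisp32(std::uint64_t nextInstrAddr, std::uint64_t target) {
    // Unsigned subtraction wraps; converting back to int64 gives the signed distance.
    const std::int64_t diff = static_cast<std::int64_t>(target - nextInstrAddr);
    if (diff < std::numeric_limits<std::int32_t>::min() ||
        diff > std::numeric_limits<std::int32_t>::max()) {
        throw std::out_of_range("rip-relative target is out of +/-2GiB range");
    }
    return static_cast<std::int32_t>(diff);
}
```

If the 6-byte mov starts at 0x1000, rip is 0x1006. A target of 0x1010 therefore gives a displacement of 10.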
The default maximum code size is 4096 bytes. If you need more, pass the size to the CodeGenerator constructor:
    class Quantize : public Xbyak::CodeGenerator {
    public:
        Quantize()
            : CodeGenerator(8192)
        {
        }
        ...
    };
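The following hypothetical fixed-capacity buffer (not Xbyak's CodeArray) illustrates what a fixed maximum size means. Its capacity defaults to 4096 bytes, mirroring the default described above. In this sketch, emitting a byte beyond the capacity throws instead of growing the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical fixed-capacity code buffer.
class FixedCodeBuffer {
public:
    explicit FixedCodeBuffer(std::size_t maxSize = 4096) : buf_(maxSize), size_(0) {}

    // Append one byte; fails instead of reallocating when full.
    void db(std::uint8_t byte) {
        if (size_ == buf_.size()) throw std::length_error("code is too big");
        buf_[size_++] = byte;
    }

    std::size_t size() const { return size_; }
    std::size_t capacity() const { return buf_.size(); }

private:
    std::vector<std::uint8_t> buf_; // allocated once, never grown
    std::size_t size_;
};
```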
You can also generate JIT code in memory that you prepare yourself:

    class Sample : public Xbyak::CodeGenerator {
    public:
        Sample(void *userPtr, size_t size)
            : Xbyak::CodeGenerator(size, userPtr)
        {
            ...
        }
    };

    const size_t codeSize = 1024;
    Xbyak::uint8 buf[codeSize + 16];

    // get 16-byte aligned address
    Xbyak::uint8 *p = Xbyak::CodeArray::getAlignedAddress(buf);

    // append executable attribute to the memory
    Xbyak::CodeArray::protect(p, codeSize, true);

    // construct your jit code on the memory
    Sample s(p, codeSize);

See sample/test0.cpp.
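The alignment step amounts to rounding the buffer address up to a power-of-two boundary. The hypothetical helpers below sketch that arithmetic. They also show why the buffer is declared with 16 extra bytes: rounding up moves the start by at most 15 bytes, so an aligned block of codeSize bytes still fits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round addr up to a multiple of `alignment` (which must be a power of two).
std::uintptr_t alignUp(std::uintptr_t addr, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~static_cast<std::uintptr_t>(alignment - 1);
}

// Hypothetical pointer version, defaulting to 16-byte alignment.
unsigned char* alignedPtr(unsigned char* p, std::size_t alignment = 16) {
    return reinterpret_cast<unsigned char*>(
        alignUp(reinterpret_cast<std::uintptr_t>(p), alignment));
}
```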
In AutoGrow mode, Xbyak extends the code buffer automatically when necessary. Call ready() before calling getCode() so that the addresses of jumps are calculated.
    struct Code : Xbyak::CodeGenerator {
        Code()
            : Xbyak::CodeGenerator(<default memory size>, Xbyak::AutoGrow)
        {
            ...
        }
    };
    Code c;
    c.ready(); // Don't forget to call this function
Don't use the address returned by getCurr() before calling ready(). It may be invalid, because the buffer may be reallocated as it grows.
Restriction: rip-relative addressing is not supported in AutoGrow mode.
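The need for ready() can be illustrated with a hypothetical growable buffer built on std::vector; this is not Xbyak's implementation. The storage may move whenever it grows, so any absolute address computed during emission could become stale. The sketch records such addresses as offsets and writes the final values only in ready(), after the buffer has stopped moving.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical growable code buffer with deferred absolute-address fixups.
class GrowableCode {
public:
    // Appending may reallocate, which moves every byte emitted so far.
    void db(std::uint8_t b) { buf_.push_back(b); }

    // Reserve 8 bytes that will hold the absolute address of `targetOffset`.
    void absAddr(std::size_t targetOffset) {
        fixups_.push_back({buf_.size(), targetOffset});
        buf_.resize(buf_.size() + 8, 0);
    }

    // Write the final absolute addresses; call after all code is emitted.
    void ready() {
        const std::uint64_t base = reinterpret_cast<std::uintptr_t>(buf_.data());
        for (const Fixup& f : fixups_) {
            const std::uint64_t addr = base + f.target;
            std::memcpy(&buf_[f.at], &addr, sizeof(addr)); // host byte order
        }
    }

    const std::uint8_t* data() const { return buf_.data(); }
    std::size_t size() const { return buf_.size(); }

private:
    struct Fixup { std::size_t at; std::size_t target; };
    std::vector<std::uint8_t> buf_;
    std::vector<Fixup> fixups_;
};
```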
License: modified new BSD License (http://opensource.org/licenses/BSD-3-Clause)
The files under test/cybozu/ are copied from cybozulib (https://github.com/herumi/cybozulib/), which is licensed under BSD-3-Clause, and are used only for tests. The header files under xbyak/ are independent of cybozulib.
MITSUNARI Shigeo(herumi@nifty.com)