commit | d95b6ee23d4c234738410dd8e2a862eaa6aea9be | |
---|---|---|
author | MITSUNARI Shigeo <herumi@nifty.com> | Sun Jul 17 11:42:09 2016 +0900 |
committer | MITSUNARI Shigeo <herumi@nifty.com> | Sun Jul 17 11:42:09 2016 +0900 |
tree | 06ac8656b3bfd06e1e34ff8c1774e1d9a6c7cc69 | |
parent | f79df1f515733249aa3c754d03268a1a2eeb31a9 | |
vcvtpd2dq is ok ; add yword
This is a header-only library that dynamically assembles x86 (IA32) and x64 (AMD64, x86-64) mnemonics.
Header file only: you can use Xbyak's functions as soon as xbyak.h is included.
MMX/MMX2/SSE/SSE2/SSE3/SSSE3/SSE4/FPU(partial)/AVX/AVX2/FMA/VEX-encoded GPR
Note: Xbyak uses and(), or(), xor(), not() functions, so the "-fno-operator-names" option is required on gcc. Alternatively, define XBYAK_NO_OP_NAMES and use and_(), or_(), xor_(), not_() instead. and_(), or_(), xor_(), not_() are also available when XBYAK_NO_OP_NAMES is not defined.
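The reason for this requirement is that standard C++ treats and, or, xor and not as alternative spellings of operators, so they cannot normally be declared as function names. The following is a minimal sketch in plain C++ (no Xbyak needed; bothSet, toggle and isZero are hypothetical names) that shows these tokens acting as operators on a conforming compiler such as gcc without -fno-operator-names:

```cpp
// Plain C++, independent of Xbyak: 'and', 'xor' and 'not' are operator
// spellings here, which is why a function named and() needs
// -fno-operator-names on gcc.

// Hypothetical helper: logical AND written with the alternative token.
inline bool bothSet(bool a, bool b)
{
    return a and b; // same as a && b
}

// Hypothetical helper: bitwise XOR written with the alternative token.
inline unsigned toggle(unsigned value, unsigned mask)
{
    return value xor mask; // same as value ^ mask
}

// Hypothetical helper: logical NOT written with the alternative token.
inline bool isZero(unsigned value)
{
    return not value; // same as !value
}
```

With -fno-operator-names the same source would stop compiling, because and, xor and not would then be ordinary identifiers.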
The following files are necessary. Please add their path to your compiler's include directories.
Linux:
make install
These files are copied into /usr/local/include/xbyak
MmapAllocator is used if XBYAK_USE_MMAP_ALLOCATOR is defined. The default allocator calls posix_memalign on Linux, and the subsequent mprotect reduces the available map count. The maximum value is written in /proc/sys/vm/max_map_count, so the maximum number of instances of Xbyak::CodeGenerator is limited to that value. See test/mprotect_test.cpp. Use MmapAllocator if you want to avoid this restriction (this behavior may become the default in the future).
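To see the limit that applies on a given machine, the value can be read from the file named above. The sketch below is plain C++ and not part of Xbyak; readMaxMapCount is a hypothetical helper name, and it returns -1 when the file cannot be read (for example on a non-Linux system).

```cpp
#include <fstream>

// Hypothetical helper: returns the number stored in the given file, or -1
// if the file cannot be opened or does not start with a number.
// The default path is the one mentioned in the text.
inline long readMaxMapCount(const char *path = "/proc/sys/vm/max_map_count")
{
    std::ifstream ifs(path);
    long value = -1;
    if (!(ifs >> value)) return -1; // open or parse failure
    return value;
}
```

Comparing this value with the number of CodeGenerator instances a program plans to create gives a rough idea of whether the default allocator is sufficient.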
In AutoGrow mode, Xbyak grows the code buffer automatically when necessary. Call ready() before calling getCode() so that the addresses of jmp instructions are calculated.
    struct Code : Xbyak::CodeGenerator {
        Code()
            : Xbyak::CodeGenerator(<default memory size>, Xbyak::AutoGrow)
        {
            ...
        }
    };
    Code c;
    c.ready(); // Don't forget to call this function
Don't use the address returned by getCurr() before calling ready(); it may be invalid. RESTRICTION: rip addressing is not supported in AutoGrow mode.
Derive a class from Xbyak::CodeGenerator, emit code in its methods, then get the function pointer by calling getCode() and casting the return value.
    NASM              Xbyak
    mov eax, ebx  --> mov(eax, ebx);
    inc ecx       --> inc(ecx);
    ret           --> ret();
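To make concrete what assembling at run time means, the sketch below hand-encodes the bytes for mov eax, imm32 followed by ret into a buffer, which is the kind of work that calls in the style of mov(...) and ret() take care of. It is plain C++ that neither uses Xbyak nor executes the bytes. The helper names are hypothetical, and the opcodes (B8 followed by a little-endian 32-bit immediate, and C3) are the standard x86 encodings rather than a statement about Xbyak's internals.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: append "mov eax, imm32" (opcode B8, then the
// immediate in little-endian byte order).
inline void emitMovEaxImm32(std::vector<std::uint8_t>& code, std::uint32_t imm)
{
    code.push_back(0xB8);
    for (int i = 0; i < 4; i++) {
        code.push_back(static_cast<std::uint8_t>(imm >> (i * 8)));
    }
}

// Hypothetical helper: append "ret" (opcode C3).
inline void emitRet(std::vector<std::uint8_t>& code)
{
    code.push_back(0xC3);
}

// Builds the byte sequence for: mov eax, value ; ret
inline std::vector<std::uint8_t> buildReturnConstant(std::uint32_t value)
{
    std::vector<std::uint8_t> code;
    emitMovEaxImm32(code, value);
    emitRet(code);
    return code;
}
```

A real code generator additionally has to place such bytes in executable memory before they can be called.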
    (ptr|dword|word|byte) [base + index * (1|2|4|8) + displacement]
                          [rip + 32bit disp] ; x64 only

    NASM                   Xbyak
    mov eax, [ebx+ecx] --> mov (eax, ptr[ebx+ecx]);
    test byte [esp], 4 --> test (byte [esp], 4);
How to use Selector(Segment Register)
Note: Segment class is not derived from Operand.
    mov eax, [fs:eax] --> putSeg(fs); mov(eax, ptr [eax]);
    mov ax, cs        --> mov(ax, cs);
You can use ptr for almost all memory accesses unless you need to specify the size of the memory operand.
dword, word and byte are member variables, so do not use dword as a name for unsigned int, for example.
You can omit the destination for almost all 3-operand mnemonics.
    vaddps(xmm1, xmm2, xmm3); // xmm1 <- xmm2 + xmm3
    vaddps(xmm2, xmm3); // xmm2 <- xmm2 + xmm3
    vaddps(xmm2, xmm3, ptr [rax]); // use ptr to access memory
    vgatherdpd(xmm1, ptr [ebp+123+xmm2*4], xmm3);
    vaddpd zmm2, zmm5, zmm30                --> vaddpd(zmm2, zmm5, zmm30);
    vaddpd zmm2{k5}, zmm4, zmm2             --> vaddpd(zmm2 | k5, zmm4, zmm2);
    vaddpd zmm2{k5}{z}, zmm4, zmm2          --> vaddpd(zmm2 | k5 | T_z, zmm4, zmm2);
    vaddpd zmm2{k5}{z}, zmm4, zmm2,{rd-sae} --> vaddpd(zmm2 | k5 | T_z, zmm4, zmm2 | T_rd_sae);
    vcmppd k4{k3}, zmm1, zmm2, {sae}, 5     --> vcmppd(k4 | k3, zmm1, zmm2 | T_sae, 5);

    vaddpd xmm1, xmm2, [rax+256]{1to2}      --> vaddpd(xmm1, xmm2, ptr_b [rax+256]);
    vaddpd ymm1, ymm2, [rax+256]{1to4}      --> vaddpd(ymm1, ymm2, ptr_b [rax+256]);
    vaddpd zmm1, zmm2, [rax+256]{1to8}      --> vaddpd(zmm1, zmm2, ptr_b [rax+256]);
    vaddps zmm1, zmm2, [rax+rcx*8+8]{1to16} --> vaddps(zmm1, zmm2, ptr_b [rax+rcx*8+8]);
    vmovsd [rax]{k1}, xmm4                  --> vmovsd (ptr [rax] | k1, xmm4);

    vcvtpd2dq xmm16, oword [eax+33]         --> vcvtpd2dq(xmm16, ptr [eax+33]); // default oword(m128)
    vcvtpd2dq xmm21, [eax+32]{1to2}         --> vcvtpd2dq(xmm21, ptr_b [eax+32]);
    vcvtpd2dq xmm0, yword [eax+33]          --> vcvtpd2dq(xmm0, yword [eax+33]); // use yword for m256
    vcvtpd2dq xmm19, [eax+32]{1to4}         --> vcvtpd2dq(xmm19, yword_b [eax+32]); // use yword_b to broadcast
Remark
Use | T_z, | T_sae, | T_rn_sae, | T_rd_sae, | T_ru_sae, | T_rz_sae instead of ,{z}, ,{sae}, ,{rn-sae}, ,{rd-sae}, ,{ru-sae}, ,{rz-sae} respectively.
k4 | k3 is different from k3 | k4.
Use ptr_b for broadcast {1toX}. X is automatically determined.

    L("L1");
    jmp ("L1");

    jmp ("L2");
    ...
    a few mnemonics(8-bit displacement jmp)
    ...
    L("L2");

    jmp ("L3", T_NEAR);
    ...
    a lot of mnemonics(32-bit displacement jmp)
    ...
    L("L3");
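The difference between the two jmp forms above comes down to whether the distance to the label fits in a signed 8-bit displacement. The following plain C++ sketch shows that check, assuming the standard x86 encodings (a short jmp is 2 bytes, EB rel8; a near jmp is 5 bytes, E9 rel32) with the displacement measured from the end of the jmp instruction. The function names are hypothetical and this is not Xbyak's code.

```cpp
#include <cstddef>

// True if disp can be encoded as a signed 8-bit displacement.
inline bool fitsInRel8(std::ptrdiff_t disp)
{
    return -128 <= disp && disp <= 127;
}

// Displacement of a short jmp (2 bytes long) located at offset jmpPos
// that targets offset labelPos, measured from the end of the jmp.
inline std::ptrdiff_t shortJmpDisp(std::size_t jmpPos, std::size_t labelPos)
{
    const std::size_t shortJmpSize = 2;
    return static_cast<std::ptrdiff_t>(labelPos)
         - static_cast<std::ptrdiff_t>(jmpPos + shortJmpSize);
}

// True if a jmp at jmpPos can reach labelPos with an 8-bit displacement,
// i.e. without asking for the 32-bit form.
inline bool canUseShortJmp(std::size_t jmpPos, std::size_t labelPos)
{
    return fitsInRel8(shortJmpDisp(jmpPos, labelPos));
}
```

When a forward label is further away than this range allows, passing T_NEAR as in the example above selects the 32-bit form.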
Call hasUndefinedLabel() to verify that your code has no undefined labels. You can also use a label as the immediate value of mov, as in mov (eax, "L2");
    L("@@"); // <A>
    jmp("@b"); // jmp to <A>
    jmp("@f"); // jmp to <B>
    L("@@"); // <B>
    jmp("@b"); // jmp to <B>
    mov(eax, "@b");
    jmp(eax); // jmp to <B>
Labels beginning with a period between inLocalLabel() and outLocalLabel() are treated as local labels. inLocalLabel() and outLocalLabel() can be nested.
    void func1()
    {
        inLocalLabel();
        L(".lp"); // <A> ; local label
        ...
        jmp(".lp"); // jmp to <A>
        L("aaa"); // global label
        outLocalLabel();
    }

    void func2()
    {
        inLocalLabel();
        L(".lp"); // <B> ; local label
        func1();
        jmp(".lp"); // jmp to <B>
        outLocalLabel();
    }
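One way to picture the nesting rule is as a stack of scope IDs, where a name starting with a period is qualified by the ID of the innermost open scope. The sketch below is a conceptual model in plain C++, not Xbyak's implementation; the class LabelScopes and its methods enter, leave and resolve are hypothetical names.

```cpp
#include <string>
#include <vector>

// Conceptual model of nested local-label scopes (hypothetical class).
class LabelScopes {
    std::vector<int> stack_; // IDs of the currently open scopes
    int nextId_;             // ID to give to the next opened scope
public:
    LabelScopes() : nextId_(1) {}

    // Models the idea of inLocalLabel(): open a new scope.
    void enter() { stack_.push_back(nextId_++); }

    // Models the idea of outLocalLabel(): close the innermost scope.
    void leave() { if (!stack_.empty()) stack_.pop_back(); }

    // Names starting with '.' are qualified with the innermost scope ID.
    // All other names, and any name used outside a scope, stay unchanged.
    std::string resolve(const std::string& name) const
    {
        if (name.empty() || name[0] != '.' || stack_.empty()) return name;
        return name + "@" + std::to_string(stack_.back());
    }
};
```

In this model the two ".lp" labels of func1() and func2() resolve to different qualified names, while "aaa" keeps its global name.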
L() and jxx() functions support a new Label class.
    Label label1, label2;
    L(label1);
    ...
    jmp(label1);
    ...
    jmp(label2);
    ...
    L(label2);
Moreover, the assignL(dstLabel, srcLabel) method binds dstLabel to srcLabel.
    Label label1, label2;
    L(label1);
    ...
    jmp(label2);
    ...
    assignL(label2, label1); // label2 <= label1
The jmp instruction above jumps to label1.
The default maximum code size is 4096 bytes. Pass a larger size to the constructor of CodeGenerator if you need more.
    class Quantize : public Xbyak::CodeGenerator {
    public:
        Quantize()
            : CodeGenerator(8192)
        {
        }
        ...
    };
You can generate JIT code in memory that you have prepared yourself.
    class Sample : public Xbyak::CodeGenerator {
    public:
        Sample(void *userPtr, size_t size)
            : Xbyak::CodeGenerator(size, userPtr)
        {
            ...
        }
    };

    const size_t codeSize = 1024;
    uint8 buf[codeSize + 16];

    // get 16-byte aligned address
    uint8 *p = Xbyak::CodeArray::getAlignedAddress(buf);

    // append executable attribute to the memory
    Xbyak::CodeArray::protect(p, codeSize, true);

    // construct your jit code on the memory
    Sample s(p, codeSize);
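For readers curious about what obtaining a 16-byte aligned address from an over-allocated buffer involves, here is a plain C++ sketch. alignUp and alignPadding are hypothetical helpers that round a pointer up to the next multiple of a power-of-two alignment; they illustrate the idea behind the buf[codeSize + 16] pattern above and are not a copy of Xbyak::CodeArray::getAlignedAddress.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: round p up to the next multiple of alignSize.
// alignSize must be a power of two.
inline std::uint8_t *alignUp(std::uint8_t *p, std::size_t alignSize = 16)
{
    assert(alignSize != 0 && (alignSize & (alignSize - 1)) == 0);
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    const std::uintptr_t mask = static_cast<std::uintptr_t>(alignSize) - 1;
    return reinterpret_cast<std::uint8_t *>((addr + mask) & ~mask);
}

// Number of bytes skipped at the start of the buffer to reach alignment.
inline std::size_t alignPadding(std::uint8_t *p, std::size_t alignSize = 16)
{
    return static_cast<std::size_t>(alignUp(p, alignSize) - p);
}
```

Because at most alignSize - 1 bytes are skipped, reserving codeSize + 16 bytes always leaves codeSize usable bytes after the aligned address.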
See sample/test0.cpp
The current version does not support 3D Now!, 80bit FPU load/store and some special mnemonics. Please mail me if you need them.
modified new BSD License http://opensource.org/licenses/BSD-3-Clause
The files under test/cybozu/ are copied from cybozulib (https://github.com/herumi/cybozulib/), which is licensed under BSD-3-Clause, and are used only for tests. The header files under xbyak/ are independent of cybozulib.
MITSUNARI Shigeo(herumi at nifty dot com)