x86 Encoding Rules:

***********************************************************************************************************************
Types

The following types are defined as:

Type          Defined as
----------------------------------------
Boolean       U8 (1 - true, 0 - false)
Char          U8
UInt          U32
UProcInt      U32
Int           S32
SInt          S32
ProcInt       S32
SProcInt      S32
PointerInt    U32
U8Up          U32
S8Up          S32
U16Up         U32
S16Up         S32
U24Up         U32
S24Up         S32
U32Up         U32
S32Up         S32
U48Up         U64
S48Up         S64
U64Up         U64
S64Up         S64
U128Up        U128
S128Up        S128
U256Up        U256
S256Up        S256

***********************************************************************************************************************
Register Access

Unfortunately, the x86 processors only give us seven general registers (ESP doesn't count!), and three of them (ESI, EDI, EBP) do not support 8bit access. This leaves four registers that can be used for 8bit arithmetic (EAX, EBX, ECX and EDX), and the x86 has already assigned EAX (A for Accumulator) and EDX (D for Data) as general data processing registers.

In the actual code, the following registers are considered "free for use anytime":

EAX, EDX

These registers are considered "volatile": no function is expected to preserve them, and they are commonly used for operations such as move, add, negate, etc.
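To illustrate the 8bit restriction described above, the following Python sketch models the names each 32bit general register exposes in 32bit mode. The table `SUBREGS` and the functions `subregister` and `byte_addressable` are hypothetical helpers for illustration, not part of any existing encoder.

```python
# Hypothetical table of the names each 32bit general register exposes in
# 32bit (IA-32) mode. Only EAX, EBX, ECX and EDX have 8bit names; ESI, EDI
# and EBP stop at 16 bits. ESP is left out because it is never allocated.
SUBREGS = {
    "EAX": {"16": "ax", "8lo": "al", "8hi": "ah"},
    "EBX": {"16": "bx", "8lo": "bl", "8hi": "bh"},
    "ECX": {"16": "cx", "8lo": "cl", "8hi": "ch"},
    "EDX": {"16": "dx", "8lo": "dl", "8hi": "dh"},
    "ESI": {"16": "si"},
    "EDI": {"16": "di"},
    "EBP": {"16": "bp"},
}


def subregister(reg32, bits, high=False):
    """Return the assembler name of a 32/16/8bit view of reg32, or None if none exists."""
    views = SUBREGS[reg32]
    if bits == 32:
        return reg32.lower()
    if bits == 16:
        return views["16"]
    if bits == 8:
        return views.get("8hi" if high else "8lo")
    raise ValueError("bits must be 8, 16 or 32")


def byte_addressable():
    """Registers that can take part in 8bit arithmetic, in table order."""
    return [reg for reg, views in SUBREGS.items() if "8lo" in views]


print(byte_addressable())                # ['EAX', 'EBX', 'ECX', 'EDX']
print(subregister("EBX", 8, high=True))  # bh
print(subregister("ESI", 8))             # None
```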
The following registers MUST be preserved by any function that uses them, since they have special purposes:

EBX, ECX, ESI, EDI, EBP, ESP

Actual register assignments:

EAX - General data (volatile); also used to return values
EBX - Available for register variables and pointers
ECX - Available for register variables and pointers
EDX - General data (volatile); also used to return 48bit and 64bit values or pointers
ESI - Current class pointer for use in methods; also available for pointers
EDI - Available for pointers
EBP - Available for pointers
ESP - Stack pointer, also used to access local variables

***********************************************************************************************************************
Global Variable Access

On some processors, accessing global variables may be a complicated task. To simplify this as much as possible, any final assembler will make certain that global variables are aligned as necessary to prevent bus read/write issues.

KERBLUH - I need to figure out the best method of doing this. Most compilers seem content with accessing everything through 32bit absolute addresses measured from the beginning of memory (a flat memory model).

***********************************************************************************************************************
Return Values

Whenever a return value is used, the following chart is consulted:

Type       Location
------------------------------------------------------
Boolean    al
U/S8       al
U/S16      ax (top 16 bits of eax are undefined)
U/S24      eax (top 8 bits are clear)
U/S32      eax
U/S48      dx:eax (top 16 bits of edx are undefined)
U/S64      edx:eax
U/S128     eax -> Value's location
U/S256     eax -> Value's location
Pointer    eax

As always, the return value is only valid immediately after a function call. All objects and structs are returned as pointers.
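The return-value chart, the type aliases and the preservation rule can be expressed as lookup tables that a code generator might consult when emitting a call or a function prologue. This is a minimal sketch: the names `ALIASES`, `RETURN_LOCATION`, `CALLEE_SAVED`, `return_location` and `registers_to_save` are hypothetical, each U/S pair shares one entry keyed by width, and "[eax]" is used here as shorthand for "eax -> Value's location".

```python
# Hypothetical lookup tables mirroring the Types, Return Values and
# register-preservation rules above. U/S pairs share one entry keyed by width.
ALIASES = {
    "Char": "U8", "UInt": "U32", "UProcInt": "U32", "PointerInt": "U32",
    "Int": "S32", "SInt": "S32", "ProcInt": "S32", "SProcInt": "S32",
}

RETURN_LOCATION = {
    "Boolean": "al",
    "8": "al",
    "16": "ax",        # top 16 bits of eax undefined
    "24": "eax",       # top 8 bits clear
    "32": "eax",
    "48": "dx:eax",    # top 16 bits of edx undefined
    "64": "edx:eax",
    "128": "[eax]",    # eax points at the value's location
    "256": "[eax]",
    "Pointer": "eax",
}

# Registers a function must save before using them. ESP must be preserved too,
# but (assumption) that is done by keeping pushes and pops balanced, not by saving it.
CALLEE_SAVED = ("EBX", "ECX", "ESI", "EDI", "EBP")


def return_location(type_name):
    """Map a type name such as 'U16', 'S48' or 'Char' to its return register(s)."""
    name = ALIASES.get(type_name, type_name)
    if name in RETURN_LOCATION:
        return RETURN_LOCATION[name]
    if name[:1] in ("U", "S") and name[1:] in RETURN_LOCATION:
        return RETURN_LOCATION[name[1:]]
    # Anything else is treated as an object or struct, which is returned as a pointer.
    return RETURN_LOCATION["Pointer"]


def registers_to_save(used):
    """Callee-saved registers that a function body touches, in a fixed push order."""
    return [reg for reg in CALLEE_SAVED if reg in used]


print(return_location("S48"))                  # dx:eax
print(registers_to_save({"EAX", "ESI", "EBX"}))  # ['EBX', 'ESI']
```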
***********************************************************************************************************************
Pointers

===================================================================================================
Class Object Pointers

Whenever a class method is called, ESI will hold a pointer to the class object whose method is being called.

===================================================================================================
General Pointers

Pointers are not directly used in any of the encoding lists (except GetReturnValue), because a pointer is meant to stand in for a memory operand. GetReturnValue is special because some processors, such as the MC68K series, have dedicated address registers (A0-A7) for pointers, separate from their data registers (D0-D7).

Pointers are allotted first to EDI, then to EBP (if it is available), and lastly to ESI. Since ESI often holds a pointer to the current class in OOP operations, it comes last and will need to be preserved if the code is inside a method. Since no single operation ever needs more than two pointers, this should be sufficient for the temporary pointers the code needs to access memory operands.

***********************************************************************************************************************
Register Variables

Registers may hold values from Boolean to U/S64. Only EBX and ECX are available for register variables. Since more operations use ECX than EBX (for example, variable shift counts must be in cl, and LOOP and REP use ECX as their counter), EBX is the first to be declared or used when registers need to be allocated.

===================================================================================================
U/S8s, Boolean:

There are four possibilities for 8bit values to be allocated to registers: bl, bh, cl, ch

Reading a Boolean register variable returns 1 or 0, regardless of the actual value stored in the register.

Because a U/S8 may be in a high byte (bh or ch), 8bit register variables MAY NOT be upgraded. Using the .U8 or .S8 of an 8bit register variable is the only option!
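The allocation rules for register variables described above (only EBX and ECX, EBX before ECX, and the four 8bit slots bl, bh, cl, ch) can be sketched as a small allocator. The class `RegisterVariables` and the function `read_boolean` are hypothetical, and the ordering of 8bit slots (low byte before high byte, EBX before ECX) is an assumption consistent with the text rather than a stated rule.

```python
# Hypothetical register-variable allocator following the rules above:
# only EBX and ECX hold register variables, EBX is handed out before ECX,
# and an 8bit variable (including a Boolean) may live in bl, bh, cl or ch.
BYTE_SLOTS = ("bl", "bh", "cl", "ch")
OWNER = {"bl": "ebx", "bh": "ebx", "cl": "ecx", "ch": "ecx"}
WIDE = ("ebx", "ecx")


class RegisterVariables:
    def __init__(self):
        self.byte_slots = set()   # 8bit slots in use
        self.whole = set()        # registers fully taken by a wider variable

    def _touched(self, reg):
        return reg in self.whole or any(OWNER[s] == reg for s in self.byte_slots)

    def allocate(self, bits):
        """Return a register name for a variable of the given width, or None."""
        if bits == 8:
            for slot in BYTE_SLOTS:
                if slot not in self.byte_slots and OWNER[slot] not in self.whole:
                    self.byte_slots.add(slot)
                    return slot
            return None
        if bits in (16, 24, 32):
            for reg in WIDE:
                if not self._touched(reg):
                    self.whole.add(reg)
                    return reg[1:] if bits == 16 else reg   # bx/cx or ebx/ecx
            return None
        if bits in (48, 64):
            if not any(self._touched(reg) for reg in WIDE):
                self.whole.update(WIDE)
                return "cx:ebx" if bits == 48 else "ecx:ebx"
            return None
        raise ValueError("only 8, 16, 24, 32, 48 and 64bit variables fit in registers")


def read_boolean(register_byte):
    """A Boolean register variable reads as 1 or 0 whatever the byte holds."""
    return 1 if register_byte & 0xFF else 0


regs = RegisterVariables()
print(regs.allocate(8))    # bl
print(regs.allocate(16))   # cx (EBX is partly in use)
print(regs.allocate(8))    # bh
print(regs.allocate(32))   # None
```

Because an 8bit variable may sit in bh or ch, the sketch never widens it in place, which matches the rule that 8bit register variables may not be upgraded.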
Access Types:   bl    bh    cl    ch
-------------------------------------
U/S8            bl    bh    cl    ch
U/S8[0]         bl    bh    cl    ch

===================================================================================================
U/S16s:

There are two possibilities for 16bit values to be allocated to registers: bx and cx

Because the upper two bytes of a register are not directly accessible, a U/S16 register variable may be upgraded to a U/S32 for more speed. This means that the full register is used. The upper two bytes are considered undefined, so all operations will need to filter them out if necessary.

Access Types:   bx    cx
-------------------------
U/S8            bl    cl
U/S8[0]         bl    cl
U/S8[1]         bh    ch
U/S16           bx    cx
U/S16[0]        bx    cx
U/S32           ebx   ecx
U/S32[0]        ebx   ecx

===================================================================================================
U/S24s:

There are two possibilities for 24bit values to be allocated to registers: ebx and ecx

Because the x86 architecture has no means of directly manipulating 24bit values, these will be treated as 32bit values that are zero extended for unsigned values and sign extended for signed values. The register may be upgraded to a U/S32 for more speed and to allow full access to the variable. Because the value is always zero/sign extended, the register may be treated as a 32bit value as long as it is only read.

NOTE: U/S24s are very difficult to handle on x86 processors since the third byte is not directly accessible, so most manipulation of these values requires a lot of bit shifting. It is HIGHLY recommended to avoid U/S24 register variables on any processor, and especially on these. Also, 24bit values may often reside at both odd and even addresses (especially in an array of them), so they may have alignment issues as well.
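The NOTE above says the third byte of a U/S24 has no register name and must be reached by shifting. The following Python sketch models that arithmetic on a 32bit register value, assuming the value is kept in the canonical zero/sign extended form described above; `extend24`, `get_byte2` and `set_byte2` are hypothetical names, and the x86 sequence in the comment is one possible way to do the read, not the generator's actual output.

```python
# Sketch of keeping a U/S24 register variable in canonical 32bit form,
# assuming the value lives in ebx or ecx as described above.
MASK32 = 0xFFFFFFFF


def extend24(reg_value, signed):
    """Zero extend (U24) or sign extend (S24) bits 0-23 to a full 32bit register."""
    low = reg_value & 0xFFFFFF
    if signed and low & 0x800000:
        return (low | 0xFF000000) & MASK32
    return low


def get_byte2(reg_value):
    """Byte [2] has no register name; x86 code could copy and shift, e.g.
    mov eax, ebx / shr eax, 16, which leaves the byte in al."""
    return (reg_value >> 16) & 0xFF


def set_byte2(reg_value, new_byte, signed):
    """Replace byte [2], then re-extend so the top byte stays canonical."""
    cleared = reg_value & 0xFF00FFFF
    return extend24(cleared | ((new_byte & 0xFF) << 16), signed)


print(hex(extend24(0x00800000, signed=True)))         # 0xff800000
print(hex(set_byte2(0xFF803456, 0x12, signed=True)))  # 0x123456
```

Every write to byte [2] has to be followed by a re-extension, which is part of why the text recommends avoiding U/S24 register variables.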
Access Types:   ebx   ecx
-------------------------
U/S8            bl    cl
U/S8[0]         bl    cl
U/S8[1]         bh    ch
U/S16           bx    cx
U/S16[0]        bx    cx
U/S32           ebx   ecx
U/S32[0]        ebx   ecx

===================================================================================================
U/S32, U/SProcInt, PointerInt:

There are two possibilities for 32bit values to be allocated to registers: ebx and ecx

Access Types:   ebx   ecx
-------------------------
U/S8            bl    cl
U/S8[0]         bl    cl
U/S8[1]         bh    ch
U/S16           bx    cx
U/S16[0]        bx    cx
U/S32           ebx   ecx
U/S32[0]        ebx   ecx

===================================================================================================
U/S48:

There is only one possibility for 48bit values to be allocated to registers: cx:ebx

cx holds the upper 16 bits, and ebx holds the lower 32 bits. The high register will be zero extended (for unsigned) or sign extended (for signed) to a full 32 bits so the value may be used as a 64bit value. The upper register, cx, may be upgraded to a U/S32 for more speed.

Access Types:   cx:ebx
-------------------------
U/S8            bl
U/S8[0]         bl
U/S8[1]         bh
U/S8[4]         cl
U/S8[5]         ch
U/S16           bx
U/S16[0]        bx
U/S16[2]        cx
U/S32           ebx
U/S32[0]        ebx
U/S32[1]        ecx
U/S64           ecx:ebx

===================================================================================================
U/S64:

There is only one possibility for 64bit values to be allocated to registers: ecx:ebx

ecx holds the upper 32 bits, and ebx holds the lower 32 bits.

Access Types:   ecx:ebx
-------------------------
U/S8            bl
U/S8[0]         bl
U/S8[1]         bh
U/S8[4]         cl
U/S8[5]         ch
U/S16           bx
U/S16[0]        bx
U/S16[2]        cx
U/S32           ebx
U/S32[0]        ebx
U/S32[1]        ecx
U/S64           ecx:ebx

===================================================================================================
U/S128-256:

These values cannot be stored in registers on the 32bit x86. They will be allotted temporary local storage as needed.
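The U/S48 and U/S64 access tables can be read as "split the value across ecx:ebx, then name the slice". The Python sketch below restates the tables as `PAIR_NAMES` and shows the zero/sign extension of the upper register for U/S48. `load_pair`, `access_name` and `read_access` are hypothetical helpers; entries missing from the tables (such as U/S8[2] or U/S16[1]) return None because they have no register name.

```python
# Sketch of the ecx:ebx pair used for U/S48 and U/S64 register variables.
# PAIR_NAMES restates the Access Types tables; missing entries (such as
# U/S8[2] or U/S16[1]) have no register name and would need shifting.
PAIR_NAMES = {
    (8, 0): "bl", (8, 1): "bh", (8, 4): "cl", (8, 5): "ch",
    (16, 0): "bx", (16, 2): "cx",
    (32, 0): "ebx", (32, 1): "ecx",
}


def load_pair(value, bits, signed):
    """Split a 48 or 64bit value into (ecx, ebx) contents.
    For U/S48 the upper 16 bits of ecx are zero or sign extended."""
    value &= (1 << bits) - 1
    if signed and value >> (bits - 1):
        value -= 1 << bits                 # reinterpret as a negative number
    value &= (1 << 64) - 1                 # two's complement to 64 bits
    return value >> 32, value & 0xFFFFFFFF


def access_name(width, index):
    """Register for a U/S<width>[index] access, or None if not directly accessible."""
    return PAIR_NAMES.get((width, index))


def read_access(ecx, ebx, width, index):
    """Unsigned value that a U<width>[index] access sees on the pair."""
    full = (ecx << 32) | ebx
    return (full >> (width * index)) & ((1 << width) - 1)


ecx, ebx = load_pair(0xFFFFFFFFFFFF, 48, signed=False)
print(hex(ecx), hex(ebx))                                     # 0xffff 0xffffffff
print(access_name(16, 2), hex(read_access(ecx, ebx, 16, 2)))  # cx 0xffff
```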
***********************************************************************************************************************
Custom Encoding XML File Attributes

===================================================================================================
Processor Codes

Code     Processor
--------------------------------------
386      386
387      386s with a 387 co-processor
486      486
P1       Pentium
PMMX     Pentium w/ MMX
PPro     Pentium Pro
P2       Pentium 2
P3       Pentium 3
P4       Pentium 4
PM       Pentium-M
Core2    Core 2
K5       K5
K6       K6
K6-2     K6-2 or K6-3 (w/ 3DNow!)
Athlon   Athlon
AMD64    AMD Athlon 64
AMDFX    AMD Athlon FX

The assembler will choose the encoding tagged with the highest processor code that the target processor is compatible with. For example, code designated as P1 will work on all Pentiums, Cores, K5s and above, and Athlons, but K6-2 code is only guaranteed to work on AMD processors that are K6-2 or higher.
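The selection rule above can be sketched as a walk over a compatibility graph. The source only fixes a few facts (P1 code runs on every Pentium, Core, K5 and later, and Athlon; K6-2 code only on AMD K6-2 or later), so the table `RUNS_CODE_FROM` below is an assumption for illustration, as are the function names and the way "highest" is approximated when two usable codes are not directly comparable.

```python
# Hypothetical compatibility table for the processor codes above. Each code
# lists the codes whose encodings it can also run directly. Only a few edges
# follow from the text; the rest are assumptions for illustration.
RUNS_CODE_FROM = {
    "386": [],
    "387": ["386"],
    "486": ["386"],
    "P1": ["486", "387"],
    "PMMX": ["P1"],
    "PPro": ["P1"],
    "P2": ["PMMX", "PPro"],
    "P3": ["P2"],
    "P4": ["P3"],
    "PM": ["P3"],
    "Core2": ["PM", "P4"],
    "K5": ["P1"],
    "K6": ["K5", "PMMX"],
    "K6-2": ["K6"],
    "Athlon": ["K6-2", "P2"],
    "AMD64": ["Athlon", "P4"],
    "AMDFX": ["AMD64"],
}


def runnable_codes(target):
    """Every processor code whose encodings `target` can execute, itself included."""
    seen, stack = set(), [target]
    while stack:
        code = stack.pop()
        if code not in seen:
            seen.add(code)
            stack.extend(RUNS_CODE_FROM[code])
    return seen


def choose_encoding(available, target):
    """Pick the 'highest' usable code, approximated here as the one that builds
    on the most other codes; returns None if nothing runs on `target`."""
    usable = [code for code in available if code in runnable_codes(target)]
    if not usable:
        return None
    return max(usable, key=lambda code: len(runnable_codes(code)))


print(choose_encoding(["386", "P1", "K6-2"], "P4"))      # P1
print(choose_encoding(["386", "P1", "K6-2"], "Athlon"))  # K6-2
```

Ranking by the number of reachable codes is only a stand-in; a real assembler would more likely store an explicit preference order in the encoding XML.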