AnyASM Mnemonics and Special Keywords v0.0.1a
by Mark Ormston
Created: February 6th, 2007
Last Modified: December 16th, 2007

TODO:
  o Change it so that Compare-style statements REQUIRE a Condition-style statement afterwards.
  o Change it so that *Out statements REQUIRE a GetReturn statement afterwards.
  o Signed bitwise operations will sign extend while unsigned bitwise operations will not.
    This requires the x86 implementations to be redone.

This document uses C/C++ notation for math operations, though AnyScript adds a few new operators
(marked [Not in C/C++]). If you are not familiar with C/C++ notation, here's a quick chart:

Symbol  Sample    Name          Meaning
--      a--       Decrement     a becomes a minus 1
++      a++       Increment     a becomes a plus 1
*       a * b     Multiply      a becomes a times b
/       a / b     Divide        a becomes a divided by b
%       a % b     Modulus       a becomes the remainder of a divided by b
/%=     a /%= b   DivMod        a becomes a divided by b and b becomes the remainder of a divided
                                by b [Not in C/C++]
<=>     a <=> b   Exchange      a becomes b and b becomes a [Not in C/C++]
<<      a << b    Shift Left    a's bits get shifted to the left b times and 0s shift in from the
                                right. a << b is similar to "a * (2 ** b)", but is much faster.
>>      a >> b    Shift Right   Unsigned Version: a's bits get shifted to the right b times and 0s
                                shift in from the left. a >> b is similar to "a / (2 ** b)", but
                                is much faster.
                                Signed Version: a's bits get shifted to the right b times, but the
                                high bit never changes. This is similar to a signed
                                "a / (2 ** b)". Unlike an integer "a / (2 ** b)", which would
                                truncate the value towards 0, an arithmetic shift right rounds
                                the value towards negative infinity.
                                Example: If a is an S32 and it holds 0x87654321
                                (-2,023,406,815), then "a >> 4" has a result of 0xF8765432
                                (-126,462,926)
                                NOTE: Some programming languages, such as JScript, use >>> to
                                denote a Shift Right without sign extending (see >>>). This is not
                                to be confused with the Rotate Right used by AnyScript, which very
                                few programming languages support natively.
<<<     a <<< b   Rotate Left   a's bits get rotated to the left b times and the top bits shift in
                                from the right.
                                Example: If a is a U32 and it holds 0x12345678, then a <<< 4 has
                                a result of 0x23456781 [Not in C/C++]
>>>     a >>> b   Rotate Right  a's bits get rotated to the right b times and the bottom bits
                                shift in from the left.
                                Example: If a is a U32 and it holds 0x12345678, then a >>> 4 has
                                a result of 0x81234567 [Not in C/C++]
                                NOTE: Some programming languages, such as JScript, use >>> to
                                denote a Shift Right without sign extending (see >>). This is not
                                to be confused with the Rotate Right used by AnyScript, which very
                                few programming languages support natively.
^       a ^ b     Xor           Bitwise Exclusive OR, aka XOR.
                                Each bit set in the second value flips those bits in the first
                                value.
                                For a single bit, XOR does:
                                    a ^ b = result
                                    0   0 = 0
                                    0   1 = 1
                                    1   0 = 1
                                    1   1 = 0
&       a & b     And           Bitwise AND.
                                Each bit clear in the second value clears those bits in the first
                                value.
                                For a single bit, AND does:
                                    a & b = result
                                    0   0 = 0
                                    0   1 = 0
                                    1   0 = 0
                                    1   1 = 1
&~      a &~ b    And Not       Bitwise AND NOT.
                                Each bit set in the second value clears those bits in the first
                                value. This is simply the opposite of a normal AND, and is very
                                useful for clearing specific bits on a value.
                                For a single bit, AND NOT does:
                                    a &~ b = result
                                    0    0 = 0
                                    0    1 = 0
                                    1    0 = 1
                                    1    1 = 0
                                [Not in C/C++]
|       a | b     Or            Bitwise OR.
                                Each bit set in the second value sets those bits in the first
                                value.
                                For a single bit, OR does:
                                    a | b = result
                                    0   0 = 0
                                    0   1 = 1
                                    1   0 = 1
                                    1   1 = 1
~       ~a        Complement    Bitwise Complement.
                                This changes every bit in the specified operand. Every 0 becomes
                                a 1 and every 1 becomes a 0.
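
The operators in the chart that C/C++ lacks can be modelled in a few lines. The following is a
minimal sketch in Python, not AnyASM code and not the assembler's implementation. It assumes
32-bit operands (U32/S32), and every function name in it is hypothetical, chosen only for this
illustration.

```python
MASK32 = 0xFFFFFFFF  # Python ints are unbounded, so 32-bit values are masked explicitly


def to_s32(value):
    """Reinterpret the low 32 bits of value as a signed (S32) number."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def shift_right_unsigned(a, b):
    """Unsigned a >> b: 0s shift in from the left."""
    return (a & MASK32) >> b


def shift_right_signed(a, b):
    """Signed a >> b: copies of the high bit shift in from the left."""
    return (to_s32(a) >> b) & MASK32


def rotate_left(a, b):
    """a <<< b: bits leaving the top re-enter from the right."""
    a &= MASK32
    b %= 32
    return ((a << b) | (a >> (32 - b))) & MASK32


def rotate_right(a, b):
    """a >>> b: bits leaving the bottom re-enter from the left."""
    a &= MASK32
    b %= 32
    return ((a >> b) | (a << (32 - b))) & MASK32


def and_not(a, b):
    """a &~ b: every bit set in b is cleared in a."""
    return a & ~b & MASK32


def div_mod(a, b):
    """a /%= b for unsigned operands: returns the new (a, b) pair."""
    return a // b, a % b


def exchange(a, b):
    """a <=> b: returns the new (a, b) pair."""
    return b, a
```

With these definitions, shift_right_signed(0x87654321, 4) gives 0xF8765432, rotate_left(0x12345678, 4)
gives 0x23456781 and rotate_right(0x12345678, 4) gives 0x81234567, matching the examples in the chart.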
###################################################################################################
###################################################################################################
###################################################################################################
Special Keywords

ClearRegisters
    This tells the final assembler that various temporary registers are no longer needed and
    their values can be reused. Multiple registers may be cleared by separating them with commas.

EndFunction
    Ends the function, performing stack correction, setting the necessary registers for the
    return value, popping values off the stack, and deinitializing any objects as necessary.
    This should only appear once per function, at the end of it. Ending a function ahead of time
    may be done by placing a label just before EndFunction and jumping to it at the necessary
    time. Also, using the Return mnemonic will set the return value (if specified) then jump to
    EndFunction.

Function
    This starts a new function. It uses this format:
        Function {function_name}({function_params}) [Format {format_type}] [[, ][Returns {return_type}]];
    function_name
        The name of the function.
    function_params
        The parameters of the function. Just like in C++, put the type then the name of the
        variable to assign to the parameter.
    format_type
        The type of format used to pass variables to the function:
        Any
            The calling code pushes the parameters on the stack in reverse order (last first).
            If there is an ellipsis on the function, meaning an unknown number of extra values,
            then the calling code must correct the stack. Otherwise, if the processor supports
            correcting the stack on return (e.g., with a RET n style opcode), then the function
            will correct the stack. If not, the calling code must correct the stack.
        Fast
            Registers are used first if possible for the earliest parameters, then extra
            parameters are pushed on the stack in reverse order (last first). This does NOT
            support ellipsis for the function.
            Which registers and how many are used is dependent on the platform and processor,
            though hopefully at least the first two parameters will always be passed as
            registers. If the processor supports correcting the stack on return (e.g., with a
            RET n style opcode), then the function handles correcting the stack. If not, the
            calling code must correct the stack.
        Pseudo
            This format is used to declare Pseudo functions. This means that the function is not
            really a function, but rather a calling point for outside code. Pseudo functions
            cannot accept parameters and cannot return a value. To call Pseudo functions, you
            can use either Call with no parameters, or JumpSubroutine. To exit a Pseudo
            function, you should use Return just like normal functions since they may declare
            local variables. Pseudo functions may declare local variables and registers, and can
            use global variables. Note that if any local variables are used, Pseudo functions
            act just like normal functions without parameters or a return value since they
            require prologue and epilogue code to adjust the stack for the local variables.
        Note that this value may be omitted, in which case Any will be used by default.
    return_type
        Data type returned. If this value is omitted, Nothing will be used by default. Pseudo
        functions can only return Nothing.

    The following function names are reserved for special purposes:
        Initialize
            This is always the starting point of any program. This function is useful for any
            special initialization that may be necessary before the program begins. It never has
            any parameters, may use any format, and may return SProcInt or Nothing. If it
            returns an SProcInt, returning 0 causes the program to end and returning 1 allows
            the program to run. Other values are reserved for future versions of Any and, for
            now, cause the program to exit since their future behavior is unknown.
            NOTE: All globally created objects will be initialized and deinitialized
            automatically.
        DeInitialize
            This function is useful for any special deinitialization that may be necessary. It
            never has any parameters, may use any format, and always returns Nothing. DeInitialize
            is always called before the program terminates, so if you create a DeInitialize
            function, you should always assume that Initialize may not have completed
            successfully, and test that pointers actually have been allocated before clearing
            them.
            NOTE: All globally created objects will be initialized and deinitialized
            automatically.

Option
    Sets options for how to assemble the code. Multiple values may be set simultaneously by
    separating them with commas. If multiple values that oppose one another (Size vs Speed) are
    set, the latest value in the list takes precedence.
    Align
        Tells the assembler to place as many No Operation opcodes as are necessary to pad the
        following statement to whatever optimal alignment size exists for the current processor.
        On some processors, such as the 65c816, this is ignored since alignment of code is not
        important. On other processors, like Pentium and higher x86s, alignment is important for
        functions or labels that are jumped to often. The main use of this option is just before
        a loop where optimal speed is a must.
        All functions will automatically be aligned optimally.
        All data will automatically be aligned optimally.
    OrderDS
        Tells the assembler that all Assignment, Compare2 and Dual mnemonics are in
        "Destination,Source" order. The first operand is the destination and the second operand
        is the source.
        This is the default option for the assembler.
    OrderSD
        Tells the assembler that all Assignment, Compare2 and Dual mnemonics are in
        "Source,Destination" order. The first operand is the source and the second operand is
        the destination. This is useful for programmers who are accustomed to programming on
        systems where the first operand is generally the source, such as the MC68K processors.
    Size
        Tells the assembler to assemble the code for a smaller size.
        This is useful in functions or code that you know will be used rarely, since smaller
        code often means slower code. On several processors, however, fewer opcodes "decode"
        faster than a longer, normally faster sequence, and this may end up making the smaller
        code run faster even if the clock counts for the instructions make it look slower. A good
        example is the Intel Pentium series, which takes 1 clock cycle PER OPCODE to decode the
        data length of the code the first time the code is run, when it is not yet in the cache.
        In this case, 9 opcodes which could pipe and take only a total of 5 clocks will in fact
        take 9 (per opcode decoding) + 5, or 14 clocks. A 3 opcode equivalent that takes 9 clocks
        would only take 3 + 9, or 12 clocks, not to mention that it'd use a lot less space.
        This also helps if you want to make the program smaller, since faster code can sometimes
        be much longer.
        If this option is used outside of a function, it will stay on for all remaining
        functions, or until an Option Speed; is used. If it is used inside of a function, it
        only affects that function.
        This is the default option for the assembler.
    Speed
        Tells the assembler to assemble the code for more speed.
        This is very useful in often used functions or just before loops where you know that the
        code will be executed often and will need more speed, even if it means larger code.
        If this option is used outside of a function, it will stay on for all remaining
        functions, or until an Option Size; is used. If it is used inside of a function, it only
        affects that function from that point on.

Region
    Defines a region of code/data. The following regions are permitted:
    Code
        Holds the code to be compiled and executed.
        Only code may go here, and it must always be inside of functions. If you want code to
        execute on startup, place it inside the Main function. If global variables, data, or
        code outside of a function is placed here, an error will occur.
    Constants
        Holds pre-initialized data that will never be modified.
        This region is important when an AnyASM file may be assembled on a ROM or other
        read-only device. It should contain all constants and strings used by the program. If
        the program attempts to modify a value in the Constants region, an error will occur. On
        systems where this is not necessary, such as a PC, this region will be merged with the
        Data region.
    Data
        Holds pre-initialized data, useful for holding predefined values.
        This region is designed to hold all variables that have initial values. Though values
        that are not predefined may go here, they will always have a default value of 0 for
        integers, Null for pointers, False for booleans, or 0.0 for floating point.
    EmptyData
        Holds all remaining variables.
        This region is designed to tell the assembler how much memory to allocate after
        initialization for global variables that do not need to be pre-initialized. Placing a
        value here that is predefined will cause an error to occur.

RegisterLocal
    Declares a local register variable. This is designed only to have processor capable data
    types assigned to the registers. This may end up not being a register at all if there are no
    registers available or if a temporary register value cannot be completely contained in
    available registers.

Var
    Declares global variables. All Vars are automatically initialized before the program begins,
    negating the need to initialize them by hand. They are also deinitialized automatically.
    Multiple variables may be declared in the same statement by separating them with a comma.
    Multiple types may be declared in the same statement by adding a type designation before the
    first variable of that type. A type is always required for the first variable in a list.
    Examples:
        Var UProcInt a, b, c;         // Declares a, b and c as UProcInts
        Var U32 a, S16 b, c = -19;    // Declares a as a U32, b and c as S16s and initializes
                                      // c to a value of -19
        Var String q;                 // Declares q as a pointer to a string and it is
                                      // initialized before Main is called.
        Var String q = "1234";        // Same as above, but a value of "1234" is assigned to the
                                      // string after initialization
        Var StringMem q = "1234";     // StringMem and StringNull are stored inline, so unlike
                                      // the String data type, they need no initialization.
                                      // When declaring constant strings like this, you should
                                      // generally use VarConstant instead of Var. Also, while
                                      // StringNull may take up less memory/space if the string
                                      // is over 255 characters long, it is usually a bit
                                      // slower because it has to be crawled to find its
                                      // length.

VarConstant
    Used to declare constants or variables that will never change. One of the best features of
    VarConstants is the ability to store their values in a ROM or other read-only portion of
    memory, if necessary. Writing to a VarConstant is prohibited and will cause an error.

VarLocal
    Declares local variables. See Var for format information. All VarLocals are automatically
    initialized as well as automatically deinitialized by EndFunction.

###################################################################################################
###################################################################################################
###################################################################################################
Mnemonics

***************************************************************************************************
Assignments

Mnemonic          Type        Description
---------------------------------------------------------------------------------------------------
Exchange          Dual        The value of param1 is moved to param0 and the value of param0 is
                              moved to param1
                              Aliases: Exch, Xchg, Swap, <=>
GetReturnValue    GetReturn   The return value is moved to param0.
                              This instruction is only useful after a mnemonic that sets the
                              return value. Though this is actually a "Unary" operation for
                              coding purposes, the assembler handles this as an "Assignment"
                              type because the previous instruction returns data of a specific
                              type.
                              Restrictions:
                              o This instruction must be used immediately after an instruction
                                that sets the return value (such as Call).
                              o The second parameter is always a volatile register. Its exact
                                type and access method are determined by the processor it is
                                used on and will always be coded directly in the encoding
                                charts.
                              Aliases: GetReturn, GetRet
LoadAddress       Assign      The address of param1 is moved to param0. If param1 is a register,
                              an error occurs.
                              Restrictions:
                              o The first parameter is expected to be a Pointer or a PointerInt.
                                Another data type is possible, but it will not necessarily hold
                                a valid pointer. Don't expect a U8 to hold a pointer on any
                                system, for example.
                              o The second parameter must be a memory operand; its operand size
                                does not matter.
                              Aliases: LA, LEA, =&
Move              Assign      The value of param1 is moved to param0
                              Aliases: Mov, Load, Ld, =
SetBoolean        SetBool     Sets a boolean value based on the current flags and the condition
                              used. This instruction is only useful after a Comparison.
                              Aliases: SetBool, SetIf, SetB

***************************************************************************************************
Comparisons

All comparison instructions MUST BE PAIRED with a valid condition instruction. For example,
Compare followed by SetBoolean is valid, whereas Compare followed by Move is not. This is
especially important for processors where the comparison itself needs to know the condition (such
as the Super-H series).

Mnemonic          Type        Description
---------------------------------------------------------------------------------------------------
Compare           Compare2    Compares param1 to param0, setting system flags for a conditional
                              jump.
                              Aliases: Comp, Cmp
Test              Compare1    Performs a simple test on the parameter.
                              Only the following conditions are applicable after this type of
                              compare:
                                Zero or NotZero - based on the value tested
                                Positive or Negative - based on the high bit of the value tested
                              Aliases: Tst
TestCompare       Compare2    Compares param0 and param1 by doing a bitwise AND of the two
                              values. The result is not stored anywhere; only the flags are
                              affected:
                                Zero - Set only if the two values have no set bits in common
                                Negative - Set only if the high bit is set in both values
                              Aliases: TstComp

***************************************************************************************************
Math Operations

Mnemonic          Type        Description
---------------------------------------------------------------------------------------------------
Add               Assign      Adds param1 to param0
                              Aliases: +=
And               Assign      Bitwise ANDs param0 with param1, storing the result in param0
                              Aliases: &=
AndNot            Assign      Bitwise ANDs param0 with the complement of param1, storing the
                              result in param0
                              Aliases: &~=
SwitchEndian      Unary       Reverses the order of the bytes in the value, essentially
                              reversing endianness
                              Aliases: EndianSwitch, ByteSwap, BSwap
Complement        Unary       Complements the value (switches all bits)
                              Aliases: Compl, Cmpl, Not, ~
Decrement         Unary       Subtracts 1 from the value
                              Aliases: Dec, --
Divide            Assign      Divides param0 by param1, storing the quotient in param0
                              Aliases: Div, /=
DivideModulus     Dual        Divides param0 by param1, storing the quotient in param0 and the
                              remainder in param1
                              Aliases: DivMod, /%=
ExclusiveOr       Assign      Bitwise XORs param0 with param1, storing the result in param0
                              Aliases: Xor, EOr, ^=
InclusiveOr       Assign      Bitwise ORs param0 with param1, storing the result in param0
                              Aliases: Or, IOr, |=
Increment         Unary       Adds 1 to the value
                              Aliases: Inc, ++
Modulus           Assign      Divides param0 by param1, storing the remainder in param0
                              Aliases: Mod, %=
Multiply          Assign      Multiplies param0 by param1, storing the product in param0
                              Aliases: Mul, *=
Negative          Unary       Takes the negative of the value
                              Aliases: Neg, -
RotateLeft        Assign      Rotates bits in param0 by param1 bits to the left
                              Aliases: RoL, <<<=
RotateRight       Assign      Rotates bits in param0 by param1 bits to the right
                              Aliases: RoR, >>>=
ShiftLeft         Assign      Shifts bits in param0 by param1 bits to the left
                              Aliases: ShL, <<=
ShiftRight        Assign      Shifts bits in param0 by param1 bits to the right. If param0 is
                              unsigned, it uses a Logical Shift Right. If param0 is signed, it
                              uses an Arithmetic Shift Right.
                              Aliases: ShR, >>=
Subtract          Assign      Subtracts param1 from param0
                              Aliases: Sub, -=

***************************************************************************************************
Execution Control

Mnemonic          Type        Description
---------------------------------------------------------------------------------------------------
Call              Function    Param0 is a function name or pointer, followed by a parenthesized
                              list of parameters to send to the function. Optionally, a comma
                              then the name of a variable to store the return value in may be
                              supplied.
Jump              Jump        Jumps to the label/variable value specified
                              Aliases: Jmp, JP, J, Branch, BranchAlways, Bra
JumpCondition     JumpCond    Jumps to the label with the same name as Param0 if the specified
                              condition is set. See Conditions below. This instruction is only
                              useful after a Comparison.
                              Aliases: JumpCond, JumpIf, Jcc, JCond, JmpIf, JIf,
                                       BranchCondition, BranchCond, BraCond, Bcc
JumpSubroutine    Jump        Jumps to the label with the same name as Param0, storing the
                              current location on the stack for later retrieval
                              Aliases: JumpSub, JmpSub, JSub, JSR,
                                       BranchSubroutine, BranchSub, BraSub, BSR
NoOperation       Implied     Does nothing. Every processor supports a No Operation of some
                              sort, and this allows using it for whatever reason one may have
                              for it.
                              Aliases: NoOp, NOp
Return            Return      Returns from a function with the return value specified.
                              If no value is specified, the return value (if any) is undefined.
                              In essence, this simply sets the return value (if any), then
                              jumps to EndFunction.
                              Aliases: Ret
ReturnSubroutine  Implied     Returns from a sub-label previously called with JumpSub.
                              This DOES NOT return from a function properly and should NEVER be
                              used in place of the Return mnemonic when Return is appropriate.
                              Aliases: ReturnSub, RetSub, RTS, RSR

***************************************************************************************************
Conditions

The following conditions exist:

Condition         Description
---------------------------------------------------------------------------------------------------
Equal             Equal, Zero or False (For Boolean)
                  Aliases: Zero, False, E, Z, F
NotEqual          Not equal, Non-Zero or True (For Boolean)
                  Aliases: NotZero, True, NE, NZ, T
Greater           Greater than.
                  The sign used is based on the previous compare statement.
                  Aliases: NotLessOrEqual, NotLessOrZero, G, NLE, NLZ
GreaterOrEqual    Greater than or equal.
                  The sign used is based on the previous compare statement.
                  Aliases: GreaterOrZero, NotLess, GE, GZ, NL
Less              Less than.
                  The sign used is based on the previous compare statement.
                  Aliases: NotGreaterOrEqual, NotGreaterOrZero, L, NGE, NGZ
LessOrEqual       Less than or equal.
                  The sign used is based on the previous compare statement.
                  Aliases: LessOrZero, NotGreater, LE, LZ, NG
Negative          High bit is set (value is Negative)
                  Aliases: Minus, Sign, N, M, S
Positive          High bit is not set (value is Positive)
                  Aliases: Plus, NotSign, P, NS

***************************************************************************************************
Possible later additions:

MultiplyOut - This version of Multiply would return a data type of at least twice the size of
              its parameters. For example, if two U32s were used, the return value would be a
              U64. Note that this only works for up to a U128. Use GetReturnValue to get the
              value returned.
DivideOut/ModulusOut - The result of these would be a returned value. Use GetReturnValue to get
              the value returned.
DivideModulusOut - The modulus result of this would be returned and the destination would hold
              the quotient. Use GetReturnValue to get the modulus value, if desired.
BitTest, BitTestAndSet, BitTestAndComplement, BitTestAndReset - All natively supported by x86,
              MC68K and many other processors.
BitFindFirst, BitFindLast - Also supported by x86.

###################################################################################################
###################################################################################################
###################################################################################################
Mnemonic Types

In terms of processors or assemblers, we can think of the "mnemonic type" as similar in
functionality to an addressing mode. Each Mnemonic Type is very specific in how it functions. The
actual functionality is controlled by the specific mnemonic.

The parameters may be any of:
    Condition   One of the possible conditions
    Function    A function or a variable pointing to a function (may have values in parentheses)
    Label       A label or a variable pointing to a label
    Value       A variable OR an immediate
    Variable    A variable

***************************************************************************************************
Assign
Parameters:  Variable and Value
Description: The first parameter is modified while the second parameter stays the same and is
             used only to achieve the result.
Examples:
    Move val1, val2;            // val1 becomes the same as val2
    Add val1, val2;             // val1 becomes the sum of val1 + val2
    Xor val1, val2;             // val1 becomes the result of val1 ^ val2

***************************************************************************************************
Compare[x]
Parameters:  Variable (Compare1) or Variable and Value (Compare2)
Description: Neither the first nor the second parameter is changed; only the processor's flags
             are changed. These mnemonics are designed to be used just before a condition-based
             type, such as JumpCondition.
Examples:
    Compare val1, val2;         // Sets the processor's flags based on a comparison of value 1
                                // and value 2
    Test val1;                  // Sets the processor's flags based on value 1

***************************************************************************************************
Dual
Parameters:  2 Variables
Description: Both parameters are used and modified.
Examples:
    DivMod val1, val2;          // val1 becomes (val1 / val2) and val2 becomes (val1 % val2),
                                // both computed from the original values (val1 /%= val2)
    Exchange val1, val2;        // val1 becomes val2 and val2 becomes val1 (val1 <=> val2)

***************************************************************************************************
Function
Parameters:  Function
Description: Used to call a function. Each parameter is configured according to the format of
             the called function.
Example:
    Call SomeFunc(val1, val2);  // Assuming SomeFunc has a format of type Any, this pushes
                                // val2 onto the stack, then val1, then calls the function.
                                // If the calling code is responsible for correcting the stack
                                // (see the Any format), it then corrects the stack back to how
                                // it was before pushing the values onto the stack

***************************************************************************************************
GetReturn
Parameters:  Variable
Description: This special type is used exclusively by the GetReturnValue mnemonic. Though
             codewise it appears to be a Unary type, in actuality it is an assignment type
             because the return value is the second (not shown) parameter.
             For encoding purposes, the second parameter is always treated as a volatile
             register, even when it is actually a pointer.
Examples:
    GetReturnValue val1;        // val1 becomes the return value

***************************************************************************************************
Implied
Parameters:  None
Description: The mnemonic alone contains all information necessary for its execution.
Examples:
    ReturnSub;                  // Returns from a JumpSub
    NoOperation;                // Does absolutely nothing

***************************************************************************************************
Jump
Parameters:  Label
Description: Program control is changed to the location of the label specified. The label may
             also be stored as a pointer.
Examples:
    Jump SomeLabel;             // Jumps to SomeLabel
    JumpSub SomeLabel;          // Pushes the location of the next instruction on the stack, then
                                // jumps to SomeLabel. SomeLabel is expected to return control
                                // back via a ReturnSubroutine or ReturnSubCondition

***************************************************************************************************
Return
Parameters:  None or Value
Description: Sets the value to return to the calling code if there is a parameter, then jumps to
             the end of the function.
Examples:
    Return;                     // Returns out of the current function
    Return val1;                // Sets the return value to val1 then returns out of the current
                                // function

***************************************************************************************************
SetBool
Parameters:  Condition and Variable
Description: This special type is for the SetBoolean mnemonic. It sets the second operand to
             true (1) if the condition is met, and to false (0) if it is not met. These
             instructions MUST be preceded by a comparison statement, such as Compare or Test.
Example:
    Compare val1, val2;         // Compare val1 and val2
    SetBoolean Equal, boolval;  // Set boolval to true if they are equal, false otherwise

***************************************************************************************************
Unary
Parameters:  Variable
Description: The parameter is modified directly. The operation is implied by the mnemonic, so no
             additional parameters are necessary.
Examples:
    Complement val1;            // val1 becomes the bitwise one's complement of itself (~val1)
    Decrement val1;             // val1 becomes val1 - 1 (val1--)

###################################################################################################
###################################################################################################
###################################################################################################
Programming Rules and Restrictions

***************************************************************************************************
Bitwise Operations

Unlike mathematical operations, bitwise operations do NOT sign extend values to match the same
bit depth before execution. All values passed to bitwise operations are assumed to be zero
extended, which simplifies their implementation. If sign extending is necessary, the values
involved may be sign extended via type casting before use in bitwise operations.

***************************************************************************************************
Bit Shifting Operations

Bit shifts are only valid between 0 and one less than the number of bits of the destination. This
means that "ShiftLeft x, 9" when x is a U8 is not valid, since the largest valid shift for a U8
is 7.
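
The shift range rule can be expressed as a one-line check. The following is a minimal sketch in
Python, not part of the assembler. The TYPE_BITS table and the is_valid_shift name are
hypothetical and exist only for this illustration; the bit widths are assumed from the type
names used in this document.

```python
# Assumed bit widths for a few of the type names that appear in this document.
TYPE_BITS = {"U8": 8, "S16": 16, "U32": 32, "S32": 32, "U64": 64}


def is_valid_shift(dest_type, count):
    """Return True if count lies between 0 and (bits of destination - 1), inclusive."""
    bits = TYPE_BITS[dest_type]
    return 0 <= count <= bits - 1
```

Under these assumptions, is_valid_shift("U8", 9) is False while is_valid_shift("U8", 7) is True.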
###################################################################################################
###################################################################################################
###################################################################################################
Things to Do or Consider

Bitwise operations currently do not sign extend signed values. Should they?
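
To make the question concrete, the following minimal Python sketch contrasts the two ways a
narrower signed operand could be widened before a bitwise operation. It is an illustration only,
not the assembler's behavior. The function names are hypothetical, and the operand sizes (an S8
combined with a U32) are assumptions chosen for the example.

```python
def zero_extend(value, from_bits):
    """Widen by filling the new high bits with 0s (the current bitwise behavior)."""
    return value & ((1 << from_bits) - 1)


def sign_extend(value, from_bits, to_bits):
    """Widen by copying the high bit of the narrow value into the new high bits."""
    low_mask = (1 << from_bits) - 1
    value &= low_mask
    if value & (1 << (from_bits - 1)):
        value |= ((1 << to_bits) - 1) & ~low_mask
    return value


def or_zero_extended(wide, narrow, narrow_bits):
    """Bitwise OR of a wide value with a zero extended narrow value."""
    return wide | zero_extend(narrow, narrow_bits)


def or_sign_extended(wide, narrow, narrow_bits, wide_bits):
    """Bitwise OR of a wide value with a sign extended narrow value."""
    return wide | sign_extend(narrow, narrow_bits, wide_bits)
```

Under these assumptions, ORing an S8 holding the bit pattern 0x80 into a U32 holding 0x12345678
gives 0x123456F8 with zero extension and 0xFFFFFFF8 with sign extension, so the choice changes
the upper 24 bits of the result.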