Inline Asm In Dev C++

Generally, using inline asm is a bad idea. You are probably going to produce worse assembly than the compiler, and using any asm in a function generally defeats the optimizations that would otherwise touch that function. When the asm keyword is disabled (for example, in a strict standards mode), you can still use the alternate keyword in the reserved namespace, __asm__. Be wary of getting inline assembly exactly right: the compiler doesn't understand the assembly it emits, and mistakes can cause rare, nasty bugs. C is compiled into assembly code, so in principle the two run at the same speed. With handwritten assembly you might be able to make a routine more efficient than the C compiler does, but on most modern systems the difference isn't noticeable. The trade-off is that C offers much faster development, while assembly is better suited to custom processors.

Basic, intermediate, and advanced concepts

In this article, we discuss several use scenarios for inline assembly, also called inline asm. For beginners, we introduce basic syntax, operand referencing, constraints, and common pitfalls that new users need to be aware of. For intermediate users, we discuss the clobbers list, as well as branching topics that facilitate the use of branch instructions within inline asm stanzas in their C/C++ code. Lastly, we discuss memory clobbers and the volatile attribute for advanced users who use inline asm to optimize their code. We conclude with an example of multithreaded locking with inline asm.

Basic inline asm

In the asm block shown in code Listing 1, the addc instruction is used to add two variables, op1 and op2. In any asm block, assembly instructions appear first, followed by the outputs and inputs, each section introduced by a colon. The assembly instructions can consist of one or more quoted strings. The first colon introduces the output operands; the second colon introduces the input operands. If there are clobbered registers, they are listed after the third colon. If the asm block clobbers no registers, the third colon can be omitted, as Listing 2 shows.

Listing 1. Opcodes, inputs, outputs, and clobbers

Listing 2. No clobbered inputs for the asm block, so third colon omitted
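
As a minimal sketch of the syntax described above (not the original listings), the two functions below show an asm block with a clobbers list and one without. It assumes a GCC-compatible compiler; the function names are hypothetical, and the asm path is only compiled when targeting POWER/PowerPC, with a plain C fallback elsewhere.

```c
/* Sketch only: hypothetical names, GCC-style extended asm assumed. */

/* Listing 1 style: instruction, outputs, inputs, clobbers. */
static int add_with_clobber(int op1, int op2)
{
    int res;
#if defined(__powerpc__) || defined(__powerpc64__)
    __asm__ ("addc %0,%1,%2"          /* assembly instruction             */
             : "=r" (res)             /* outputs, after the first colon   */
             : "b" (op1), "r" (op2)   /* inputs, after the second colon   */
             : "xer");                /* clobbers: addc updates the carry */
#else
    res = op1 + op2;                  /* plain C fallback (assumption)    */
#endif
    return res;
}

/* Listing 2 style: nothing is clobbered, so the third colon is omitted. */
static int add_no_clobber(int op1, int op2)
{
    int res;
#if defined(__powerpc__) || defined(__powerpc64__)
    __asm__ ("add %0,%1,%2" : "=r" (res) : "r" (op1), "r" (op2));
#else
    res = op1 + op2;                  /* plain C fallback (assumption)    */
#endif
    return res;
}
```

Calling add_with_clobber(1, 2) or add_no_clobber(1, 2) should yield the same sum on either path; the only difference is what the compiler is told about side effects.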

Note:
The clobbers list is discussed later in this section.

Each instruction 'expects' inputs and outputs to be passed in a certain format. In the previous example, the addc. instruction expects its operands to be passed through registers, hence op1 and op2 are passed into the asm block with the 'b' and 'r' constraints. For a complete listing of all legal asm constraints for the IBM XL C and C++ compiler, see the compiler language reference.

Register constraints on variable declarations

In some programs, you will want to tie variables to certain hardware registers. This is done at the variable declaration. The following example ties the variable res to GPR0 throughout the life of the program:
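
A minimal sketch of such a declaration, assuming GCC-style register variables (the function name is hypothetical). Note that GCC only guarantees the binding of a local register variable at the points where it is used as an asm operand, so this illustrates the declaration syntax rather than a program-wide guarantee.

```c
/* Sketch only: hypothetical function, GCC-style register variable assumed. */
static int sum_in_gpr0(int a, int b)
{
#if defined(__powerpc__) || defined(__powerpc64__)
    register int res __asm__("r0");   /* res is tied to GPR0 */

    /* The "=r" operand must resolve to r0 here because of the declaration.
       A constraint that excludes r0 (such as "b") would conflict with it. */
    __asm__ ("add %0,%1,%2" : "=r" (res) : "r" (a), "r" (b));
#else
    int res = a + b;                  /* plain C fallback (assumption) */
#endif
    return res;
}
```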

When the variable type is not matched with the type of target hardware register, you will receive a compilation error notice.

After a variable is tied to a specific register, it is not possible to use another register to hold the same variable. For example, the following code will cause a compilation error: the variable res is associated at declaration time with GPR0, but in the asm block, the user attempts to use any register but GPR0 to pass in res.

Listing 3. Compilation error when conflicting constraints are used on a variable

In the example in Listing 4, there is no output operand for the stw instruction, hence the outputs section of the asm is empty. None of the registers is modified, so they are all input operands, and the target address is passed in with the input operands. However, something is modified: the addressed memory location. But that location is not explicitly mentioned in the instruction, so the output of the instruction is implicit rather than explicit.

Listing 4. Instructions with no output operands

Listing 5. Instructions with preserved operands

In Listing 5, the result variable is not necessarily modified by the asm block. To preserve its initial value in that case, use the + (plus sign) constraint, which marks the operand as both read and written, as is shown with res[0].
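
The two cases above can be sketched as follows, assuming a GCC-compatible compiler (hypothetical function names; plain C fallback on non-POWER targets). The "memory" clobber on the store is an added precaution of this sketch, not something the text above requires.

```c
/* Sketch only: hypothetical names, GCC-style extended asm assumed. */

/* No output operands: both registers are inputs; memory changes implicitly. */
static void store_word(int *addr, int value)
{
#if defined(__powerpc__) || defined(__powerpc64__)
    __asm__ volatile ("stw %0,0(%1)"
                      :                          /* outputs: none             */
                      : "r" (value), "b" (addr)  /* inputs                    */
                      : "memory");               /* assumption of this sketch */
#else
    *addr = value;                               /* plain C fallback */
#endif
}

/* "+" marks an operand that is read as well as written, so its
   initial value is passed in and preserved as the starting point. */
static int add_four(int res)
{
#if defined(__powerpc__) || defined(__powerpc64__)
    __asm__ ("addi %0,%0,4" : "+b" (res));  /* "b" keeps the operand out of r0 */
#else
    res += 4;                               /* plain C fallback */
#endif
    return res;
}
```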

Target memory addresses in inline asm

If an instruction specifies two of its arguments in a form similar to D(RA), where D is a literal value and RA is a general register, then this is taken to mean that D+RA is an effective address. In this case, the appropriate constraints are 'm' or 'o'. Both 'm' and 'o' refer to memory arguments. Constraint 'o' is described as an offsettable memory location. But in the IBM® POWER® architecture, nearly all memory references require an offset, so 'm' and 'o' are equivalent. In this case, you can use a single constraint to refer to two operands in the instruction. Listing 6 is an example.

Listing 6. A single constraint to refer to two operands in the instruction
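
A minimal sketch of this idea, assuming a GCC-compatible compiler (the function name is hypothetical). The single "=m" operand expands to the D(RA) pair in the instruction text. The sketch assumes the compiler forms a base-plus-displacement address for the operand; GCC documents operand modifiers such as %X0 and %U0 for the indexed and update forms.

```c
/* Sketch only: hypothetical function, GCC-style extended asm assumed. */
static void store_byte(char *dst, char value)
{
#if defined(__powerpc__) || defined(__powerpc64__)
    /* %0 is printed by the compiler as D(RA), covering two instruction
       operands with one constraint. */
    __asm__ ("stb %1,%0" : "=m" (*dst) : "r" (value));
#else
    *dst = value;                     /* plain C fallback (assumption) */
#endif
}
```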

The form of the instruction stb (from the assembly language reference) is: stb RS,D(RA).

Although the stb instruction technically takes three operands (a source register, an address register, and an immediate displacement), the asm description of it uses only two constraints. The '=m' constraint is used to notify the compiler that the memory address of res is to be used for the result of the store instruction. The '=m' indicates that the operand is a modified memory location. You do not need to know the address of the target location beforehand, because that task is left to the compiler. This allows the compiler to choose the right register (r1 for an automatic variable, for instance) and apply the right displacement automatically. This is necessary, because it would generally be impossible for an asm programmer to know what address register and what displacement to use. In other instances, you can also override this behavior by manually calculating the target address as in the following example.

Listing 7. Manually calculating the target address

In this code, the specification %1(%2) represents an offset and a base address, where %2 represents the base address, res[0], and %1 represents the offset, sizeof(int). As a result, the store is performed at the effective address of res[1].

Note:
For some instructions, GPR0 cannot be used as a base address. Specifying GPR0 tells the assembler not to use a base register at all. To ensure that the compiler does not choose r0 for an operand, you can use the constraint 'b' rather than 'r'.

Addressing modes for POWER and PowerPC instructions

The IBM POWER architecture type is RISC. Instructions typically operate either with three register arguments (two registers for source arguments, one register to hold a result) or with two registers and an immediate value (one register and one immediate value for the source arguments, and one register to hold the result). There are exceptions to this pattern, but mostly it is true.

Among the instructions that take two registers and an immediate value, there are two special subclasses: load instructions and store instructions. These instructions use the immediate value as an offset to the value in the source register to form an 'effective address.' The offset value is typically an offset onto the stack (r1 is the stack pointer), or it is an offset to the TOC (Table of Contents -- r2 is the TOC pointer). The TOC is used to promote the construction of position-independent code, which enables efficient dynamic loading of shared libraries on these machines.

When using inline asm, you do not have to use specific registers nor manually construct effective addresses. The argument constraints are used to direct the compiler to choose registers or construct effective addresses appropriate to the requirements of the instructions. Thus, if a general register is required by the instruction, you could use either the 'r' or 'b' constraint. The 'b' constraint is of interest, because many instructions use the designation of register 0 specially -- a designation of register 0 does not mean that r0 is used, but instead a literal value of 0. For these instructions, it is wise to use 'b' to denote the input operands to prevent the compiler from choosing r0. If the compiler chooses r0, and the instruction takes that to mean a literal 0, the instruction would produce incorrect results.

Listing 8. r0 and its special meaning in the stbx instruction

Here, the expected result string is abcdefgy, but if the compiler chose r0 for %1, then the result would incorrectly be ybcdefgh. To prevent this from happening, use 'b', as Listing 9 shows.

Listing 9. Using 'b' constraint to signify non-zero GPR
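
A minimal sketch of the 'b' constraint in this situation, assuming a GCC-compatible compiler (hypothetical function name; plain C fallback on non-POWER targets). With stbx RS,RA,RB the store goes to (RA|0)+RB, so the operand in the RA position must never be r0.

```c
/* Sketch only: hypothetical function, GCC-style extended asm assumed. */
static void store_byte_indexed(char *base, long index, char value)
{
#if defined(__powerpc__) || defined(__powerpc64__)
    __asm__ volatile ("stbx %0,%1,%2"
                      :                       /* no register outputs          */
                      : "r" (value),          /* RS: the byte to store        */
                        "b" (index),          /* RA: "b" rules out r0         */
                        "r" (base)            /* RB: any GPR is fine here     */
                      : "memory");            /* the string itself is changed */
#else
    base[index] = value;                      /* plain C fallback */
#endif
}
```

Applied to the string abcdefgh with index 7 and value 'y', this stores into the last character; had r0 been chosen for the index, the hardware would have used a literal 0 and overwritten the first character instead.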

Another example is in the following ASM block. While it appears that the asm block below does res=res+4, that is not the actual functional behavior of the code.

Listing 10. Meaning of r0 in the second operand with addi opcode

Because res is tied to r0, the translation of the asm code into assembly becomes:
addi 0,0,4

The second operand does not translate to register zero. Instead, it translates to the immediate number zero. In effect, the following is the result of the addi operation:
res=0+4

This case is special to the addi opcode. If, instead, res was tied to r1, then the original intended behavior would have been obtained:
res=res+4

Clobbers list

Basic clobbers list

In cases when registers that are not directly tied to the inputs/outputs are used within the asm block, the user must specify such registers within the clobbers list.

The clobbers list is used to notify the compiler that the registers contained within the list can potentially have their values altered. Hence, the compiler should not use them to hold any data other than what the asm instructions themselves place in them.

In the example in Listing 11, registers 8 and 7 are added to the clobbers list because they are used in the instructions but are not explicitly tied to any of the input/output operands. Also, condition register field zero is added to the clobbers list for the same reason. Although it is not present in the input/output operands, the mfocrf instruction reads that field from the condition register and moves the value into register 8.

Listing 11. Clobbers list example
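
A reduced sketch of a clobbers list, assuming a GCC-compatible compiler (hypothetical function name; it is not the original listing and omits the mfocrf step). The asm names r7 directly as a scratch register, so r7 must be declared as clobbered, together with cr0 and the carry bit that addc. updates.

```c
/* Sketch only: hypothetical function, GCC-style extended asm assumed. */
static int add_via_scratch(int a, int b)
{
    int res;
#if defined(__powerpc__) || defined(__powerpc64__)
    __asm__ ("addc. 7,%1,%2\n\t"   /* r7 = a + b; sets CR0 and the carry  */
             "mr %0,7"             /* copy the scratch register into res  */
             : "=r" (res)
             : "r" (a), "r" (b)
             : "r7", "cr0", "xer"); /* everything touched behind the
                                       compiler's back is listed here     */
#else
    res = a + b;                    /* plain C fallback (assumption) */
#endif
    return res;
}
```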

If, instead, the mfocrf instruction read from condition register field 1 (cr1), then that field would need to be added to the clobbers list instead. Also, the period (full stop) at the end of the addc. and andi. instructions means that their results are compared to zero, and the result of the comparison is stored in condition register field 0.

When clobbered registers are omitted from the clobbers list, the results from the asm operations might not be correct. This is because such clobbered registers might be reused to hold intermediate values for other operations. Unless the compiler detects that those registers are clobbered, the intermediate data can be used to perform the programmer's instructions, with inaccurate results. Also, the user's asm instructions may clobber values used by the compiler.

Exceptions to the clobbers list

Nearly all registers can be clobbered, except for those listed in Table 1.

Table 1. Registers that cannot be clobbered
Register    Description
r1          stack pointer
r2          TOC pointer
r11         environment pointer
r13         64-bit mode thread local data pointer
r30         often used by the compiler as a stack frame pointer, pointer to constant area
r31         often used by the compiler as a stack frame pointer, pointer to constant area

Memory clobbers

Memory clobber implies a fence, and it also impacts how the compiler treats potential data aliases. A memory clobber says that the asm block modifies memory that is not otherwise mentioned in the asm instructions. So, for example, a correct use of memory clobbers would be when using an instruction that clears a cache line. The compiler will assume that virtually any data may be aliased with the memory changed by that instruction. As a result, all required data used after the asm block will be reloaded from memory after the asm completes. This is much more expensive than the simple fence implied by the 'volatile' attribute (discussed later).

Remember, because the memory clobber says anything might be aliased, everything that is used needs to be reloaded after the asm, regardless of whether it had anything to do with the asm. A memory clobber can be added to the clobbers list by simply using the 'memory' word instead of a register name.
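
A minimal sketch of the syntax, assuming GCC or a compatible compiler (the variable and function names are hypothetical). An asm statement with an empty instruction string and a 'memory' clobber emits no instructions, but the compiler must still assume that any memory may have changed, so values are reloaded afterwards.

```c
/* Sketch only: hypothetical names, GCC-style extended asm assumed. */
static int shared_flag;

static int publish_and_reload(int value)
{
    shared_flag = value;

    /* "memory" appears in the clobbers list where a register name would. */
    __asm__ volatile ("" : : : "memory");

    return shared_flag;   /* read again from memory after the asm */
}
```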

Branching

Basic branching

Branching can be tricky with inline asm, because you would need to know the address of the instruction to branch to before compile time. Although this is not possible, you can use labels. Using labels, the branch-to address can be designated with a unique identifier that can be used as a target branch address.

Within a single source file, labels cannot be repeated within an inline asm block, nor within neighboring asm blocks within the same source. In a given program, each label is unique. The exception to this rule is relative branching (more on this later). With relative branching, more than one label with the same identifier can be found within the same program and within the same asm block.

Note:
Labels cannot be used in asm to define macros because of possible namespace clashes.

In the example in Listing 12, the branch occurs when the LT bit, bit 0, of the condition register is set. If it is not set, then the branch is not taken.

Listing 12. Example of branch taken when LT bit of CR0 is set (0x80000000)

Likewise, a branch would occur if the GT bit (bit 1) of the condition register is set, as in the code in Listing 13.

Listing 13. Example of branch taken when GT bit of CR0 is set (0x40000000)

With inline asm, it is perfectly legal to branch within the same asm block; however, it is not possible to branch between different asm blocks, even if they are contained within the same source.

Relative branching

As discussed earlier, relative branching allows you to reuse the name of a label more than once within the same program. It is predominantly used, however, to dictate the position of the target address relative to the branch instruction. These are examples of the relative branch codes that can be used:

  • F -forward
  • B -backward

Note:
These suffixes must be appended to numeric labels to be syntactically correct.

In this example (Listing 14), notice that the target address is referenced as 'Hereb'. In this case, we use the label of the target address appended with a suffix that dictates where this label is located relative to the branch instruction itself. The label 'Here' is located before the branch instruction, hence the use of the 'b' suffix in 'Hereb.'

Listing 14. Relative branching with a backward label suffix
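
A minimal sketch of a backward relative branch, assuming a GCC-compatible compiler and the GNU assembler's numeric local labels (hypothetical function name; plain C fallback on non-POWER targets). The label 1: sits before the branch, so the branch refers to it as 1b.

```c
/* Sketch only: hypothetical function, GCC-style extended asm assumed. */
static long sum_down_from(long n)
{
    long sum = 0;

    if (n <= 0)
        return 0;                       /* the asm loop assumes n >= 1 */
#if defined(__powerpc__) || defined(__powerpc64__)
    __asm__ ("1:\n\t"                   /* numeric label, may be reused   */
             "add %0,%0,%1\n\t"         /* sum += n                       */
             "addic. %1,%1,-1\n\t"      /* n -= 1; sets CR0 and the carry */
             "bgt 1b"                   /* branch backward while n > 0    */
             : "+r" (sum), "+r" (n)
             :
             : "cr0", "xer");
#else
    while (n > 0)                       /* plain C fallback */
        sum += n--;
#endif
    return sum;
}
```

Because the label is numeric and referenced relatively, the same label number can appear again in another asm block of the same source file.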

The condition register

The condition register is used to capture information on results of certain instructions.

For non-floating point instructions with period (.) suffixes that set the CR, the result of the operation is compared to zero.

  • If the result is greater than zero, then bit 1 of the CR field is set (0x4).
  • If it is less than zero, then bit 0 is set (0x8).
  • If the result is equal to zero, then bit 2 is set (0x2).

For all compare instructions, the two values are compared, and any CR field can be set (not just CR0). Table 2 lists the bits and their corresponding meanings (there are eight such sets of 4 bits in the condition register, called 'cr0, cr1, cr2 … cr7').

Table 2. Bits of a CR field and the meanings of different settings
Bit    Name    Description
0      LT      RA < 0
1      GT      RA > 0
2      EQ      RA = 0
3      U       Overflow for integer operations; unordered, for floating point operations

Note:
For floating point instructions with a period suffix, CR1 is set to the upper 4 bits of the FPSCR.
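
The bit values listed above can be modeled in plain C. This is only an illustrative sketch with hypothetical names; it mirrors the LT/GT/EQ rule for a result compared to zero and ignores bit 3.

```c
/* Sketch only: hypothetical names modeling one 4-bit CR field. */
enum {
    CR_LT = 0x8,   /* bit 0: result less than zero    */
    CR_GT = 0x4,   /* bit 1: result greater than zero */
    CR_EQ = 0x2    /* bit 2: result equal to zero     */
};

/* Returns the CR field value a record-form (period-suffixed) integer
   instruction would produce for this result, with bit 3 left clear. */
static unsigned cr_field_for_result(long result)
{
    if (result < 0)
        return CR_LT;
    if (result > 0)
        return CR_GT;
    return CR_EQ;
}
```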

The volatile attribute

Making an inline asm block 'volatile' ensures that, as it optimizes, the compiler does not move any instructions above or below the block of asm statements.

This can be particularly important in cases when the code is accessing shared memory. This will be illustrated in the next section on multithreaded locking.

Multithreaded locking

One of the most common uses of inline asm is in writing short segments of instructions to manage multithreaded locks. Because of the loose memory model on the POWER architecture, constructing such locks requires careful use of a pair of instructions:

  • One instruction that loads the lock word and creates a 'reservation'
  • Another that updates the lock word if the reservation hasn't been lost in the interim

Note:
If the reservation has been lost, a loop can be used to retry repeatedly.

Listing 15 shows a basic inline function that attempts to acquire a lock (there are several problems with this code, which we discuss after these examples).

Listing 15. Example of Acquire lock function coded in asm

Listing 16 is an example of how this inline function could be used.

Listing 16. Example of how the acquireLock function can be used

Because the function is inline, the resulting code won't have an actual call in it. Instead, it will precede the use of the shared region x with the instructions to acquire the lock.

The first problem to notice with this code is the lack of a synchronization instruction. One of the key performance enhancements enabled by the loose memory model of the POWER architecture is the ability of the machine to reorder loads and stores to make more efficient use of internal pipelines. However, there are times when the programmer needs to curtail this reordering to some degree to properly access shared storage. In the case of a lock, you would not want a load of data from the shared region ('x' in the case above) to be reordered so that it occurs before the lock on the region is acquired. For this reason, a synchronization instruction should be inserted to tell the machine to limit reordering in this case. The sync instruction is often used for this purpose, but there are others available, as described in the POWER ISA (see Resources). In the code example in Listing 17, we inserted a sync instruction to prevent reordering of loads of 'x' (this is called an 'import barrier'):

Listing 17. Sync example

In that asm block, the sync will prevent any subsequent loads from occurring until after it is known which way the preceding branch went. That way the variable x will not be loaded unless the branch was not taken and the acquireLock returns true.

So, are we set now? Unfortunately not. We still have to worry what the compiler might do.

Modern optimizing compilers can be very aggressive in moving code around -- and even removing it completely -- if it appears that the changes might make the program run faster without changing the semantics of the code. However, compilers typically aren't aware of the complexities involved with accessing shared memory. For example, a compiler might move the statement temp = x + 1; to a place higher in the program if it determines that the result would be scheduled more efficiently (and it assumes that the 'if' is usually taken). Of course, that would be disastrous from the viewpoint of accessing shared data. To prevent the movement of any loads (or any instructions at all) from below the inline asm to a location above it, you can use the keyword 'volatile' (also known as the volatile attribute) to modify the asm block, as Listing 18 shows.

Listing 18. Volatile keyword example
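
A minimal sketch of a lock-acquire function along these lines, combining the reservation pair, the sync, and the volatile keyword. It is not the original listing: the name acquire_lock and the operand layout are assumptions, a GCC-compatible compiler is assumed, and the 'memory' clobber is a conservative addition of this sketch (its cost is discussed in the next section). On non-POWER targets the sketch falls back to GCC's __atomic_compare_exchange_n builtin.

```c
/* Sketch only: hypothetical function, GCC-style extended asm assumed.
   Returns 1 if the lock was acquired, 0 otherwise. */
static inline int acquire_lock(int *lock)
{
    int acquired = 0;
#if defined(__powerpc__) || defined(__powerpc64__)
    int tmp;
    __asm__ volatile (
        "lwarx  %1,0,%2\n\t"    /* load lock word, create a reservation */
        "cmpwi  %1,0\n\t"       /* is the lock already held?            */
        "bne    1f\n\t"         /* yes: give up                         */
        "stwcx. %3,0,%2\n\t"    /* try to store 1 if still reserved     */
        "bne    1f\n\t"         /* reservation lost: give up            */
        "sync\n\t"              /* import barrier for later loads       */
        "li     %0,1\n"         /* report success                       */
        "1:"
        : "+r" (acquired), "=&r" (tmp)
        : "r" (lock), "r" (1)
        : "cr0", "memory");     /* "memory": assumption of this sketch  */
#else
    int expected = 0;           /* fallback using a GCC atomic builtin  */
    acquired = __atomic_compare_exchange_n(lock, &expected, 1, 0,
                                           __ATOMIC_ACQUIRE,
                                           __ATOMIC_RELAXED);
#endif
    return acquired;
}
```

A caller would typically write if (acquire_lock(&lock)) { temp = x + 1; } so that the access to the shared region x follows a successful acquire.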

When you do this, an internal fence is placed before and after the asm block that prevents instructions from being moved past it. And remember that this asm block is inlined, so it will prevent the access to x from being moved above the asm-implemented lock.

Memory clobbers in multithreaded locking

The discussion of multithreaded locking would not be complete without a mention of memory clobbers. The keyword memory is often added to the clobber list in such situations, although it is not always clear why it would be needed. The use of memory in the clobbers list means that memory is altered unpredictably by the asm block.

However, memory modifications in the locking example given are quite predictable. Although the variable lock is a pointer (that points to a lock location), that isn't any more unpredictable than the expression '*lock' in a C program. In that case, a well-behaved compiler would likely associate the expression '*lock' with all variables of the appropriate type, and so would correctly reload any affected variables after the pointer was used for modifying data. Nonetheless, the use of memory clobbers appears to be a pervasive practice, which is probably driven by an abundance of caution when dealing with shared regions. Programmers should be aware, though, of the performance penalties involved and of alternative approaches.

When an inline asm includes 'memory' in the clobbers list, it means that any variable in the program might have been modified by the asm, so it must be reloaded before it is used. This requirement can pretty much put a sledgehammer to optimization efforts by the compiler. A potentially lighter-weight approach would be to make the shared region volatile (in addition to the asm block itself). Making a variable volatile means its value must be reloaded before it is used in any given expression. If the shared region in question is a data structure, such as a list or queue, this will ensure that the updated structure is reloaded after the lock is acquired. However, all of the non-shared data accesses can enjoy the full complement of compiler optimizations.

Tip:

If the shared data structure is accessed by a pointer (say *p), be sure to declare the pointer so that you indicate that it's the object pointed to that is volatile, not the pointer itself. For example, this declares that the list pointed to by p is volatile:
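
A minimal sketch of the two declarations, using a hypothetical struct list type and a hypothetical accessor. The position of volatile relative to the asterisk decides what is volatile.

```c
/* Sketch only: struct list and head_value are hypothetical. */
struct list {
    int value;
    struct list *next;
};

/* The list that p points to is volatile; p itself is an ordinary pointer. */
static volatile struct list *p;

/* By contrast, q is a volatile pointer to an ordinary (non-volatile) list. */
static struct list *volatile q;

/* Every read through a pointer-to-volatile is performed from memory. */
static int head_value(volatile struct list *list)
{
    return list->value;
}
```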

Acknowledgments

Thank you Ian McIntosh, Christopher Lapkowski, Jim McInnes, and Jae Broadhurst. You've each played an important role in publishing this article.

Microsoft Specific

Assembly language serves many purposes, such as improving program speed, reducing memory needs, and controlling hardware. You can use the inline assembler to embed assembly-language instructions directly in your C and C++ source programs without extra assembly and link steps. The inline assembler is built into the compiler, so you don't need a separate assembler such as the Microsoft Macro Assembler (MASM).

Note

Programs with inline assembler code are not fully portable to other hardware platforms. If you are designing for portability, avoid using inline assembler.

Inline assembly is not supported on the ARM and x64 processors. The following topics explain how to use the Visual C/C++ inline assembler with x86 processors:

END Microsoft Specific

See also

Compiler Intrinsics and Assembly Language
C++ Language Reference