The argument in favor of assembly language

For many C/C++ programmers, assembly language programming seems like a waste of time, given all those wonderful optimizing compilers.  Well... maybe on other platforms, but on Intel-based machines, hand-crafted assembler code is quite a bit different from the slop being produced by compilers.  Let's take a look at an example.  The output below was generated by the Microsoft Visual C++ v5.0 compiler with optimization set for maximum speed, targeting the Pentium specifically.

Notice the needless use of extra registers, the costly operand-size prefix bytes (66h) required to work with word-sized registers in a 32-bit code segment, and a few stalls of the Pentium's two pipelines due to register contention:

    mov    cl, BYTE PTR _ucByte2$[esp-4]
    push    ebx
    mov    bl, BYTE PTR _ucByte1$[esp]
    xor    ax, ax                      ; <--- Here we see that even the big boys still have moldy code in their products :(
    xor    dx, dx
    push    esi
    mov    esi, DWORD PTR _regs$[esp+4]
    mov    al, cl
    mov    dl, bl
    add    eax, edx
    mov    dl, BYTE PTR [esi+12]
    and    dl, 208                     ; 000000d0H
    test    al, al
    mov    BYTE PTR [esi+12], dl
    jne    SHORT $L22070

Here is the same function in hand-crafted code.  Notice that no size prefix bytes are needed and that there are no pipeline stalls from register contention:

    xor    edx,edx
    xor    eax,eax
    push   esi
    push   ebx
    mov    dl, _ucByte2$[esp+4]        ; both pushes are already done, so every
    mov    esi, _regs$[esp+4]          ; argument uses the same +4 bias as _regs$
    mov    al, _ucByte1$[esp+4]
    add    eax,edx
    and    byte ptr [esi+12], 208
    test   al,al
    jne    short $L22070


The benefit of coding it in assembler is usually at least a 2X performance gain.  For code that does bit-twiddling or graphics, the gains can be considerably greater.  It is rarely necessary to write an entire application in assembler, but selective use of the inline assembler in critical sections can work wonders for the performance of your program.
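As a rough illustration of confining assembler to one critical routine, here is a minimal sketch of the add-and-test step from the listings above, wrapped in a C function. It is an assumption-laden example, not the original source: the helper name low_byte_nonzero is hypothetical, and it uses GCC/Clang extended inline assembly (AT&T syntax) rather than the Visual C++ inline assembler, with a plain C fallback for other compilers and CPUs.

```c
#include <stdint.h>

/* Hypothetical helper (not from the article): adds two bytes and returns 1
   when the low byte of the sum is nonzero, like add eax,edx / test al,al. */
static int low_byte_nonzero(uint8_t byte1, uint8_t byte2)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
    uint32_t sum = byte1;      /* zero-extended, so no 16-bit operations are needed */
    uint32_t addend = byte2;
    uint8_t nonzero;

    /* AT&T operand order: source first, destination second. */
    __asm__ ("addl  %2, %0\n\t"     /* sum += addend               */
             "testb %b0, %b0\n\t"   /* set flags from the low byte */
             "setne %1"             /* nonzero = (low byte != 0)   */
             : "+q"(sum), "=q"(nonzero)
             : "r"(addend)
             : "cc");
    return nonzero;
#else
    /* Plain C fallback for other compilers and CPUs. */
    return (uint8_t)(byte1 + byte2) != 0;
#endif
}
```

The rest of the program stays in C and simply calls the function, for example low_byte_nonzero(200, 100), so only the hot spot has to be maintained by hand.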
