x86 assembly programming in real mode

x86 assembly programming in real mode is assembly language programming for the Intel x86 processor family while it runs in real mode. It involves manipulating a small set of 16-bit processor registers and working directly with physical memory addresses, with none of the memory protection that protected mode provides. Perhaps the most popular use of this type of programming was writing DOS programs in the 1980s. All modern x86 operating systems run in protected mode or its 64-bit successor, long mode; however, an x86 processor still starts up in real mode, so the part of the boot code or operating system responsible for switching into protected mode must operate in the real-mode environment.

Registers

Each register is specialized for a certain task, and operations that deal with that task are often run more efficiently if the right register is used.

Registers in real mode include:

  • data registers
    • AX, the accumulator
    • BX, the base register
    • CX, the counter register
    • DX, the data register
  • address registers
    • SI, the source index register
    • DI, the destination index register
    • SP, the stack pointer register
    • BP, the stack base pointer register

Each data register can be split into two 8-bit registers: the upper and lower eight bits of the 16-bit register can each be addressed as a register in its own right. For example, AH addresses the upper eight bits of the AX register, and AL addresses the lower eight bits. The other data registers are split in the same way by changing the suffix: "X" for extended, "H" for high, and "L" for low.
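As a rough illustration of the paragraph above, the following NASM fragment loads AX and then reads and writes its two halves separately. The values are arbitrary and were chosen only to make the effect visible.

```nasm
[bits 16]
        mov ax, 0x1234     ; AX = 0x1234, so AH = 0x12 and AL = 0x34
        mov bl, ah         ; BL = 0x12 (copy of the upper half of AX)
        mov bh, al         ; BH = 0x34, so BX = 0x3412
        mov al, 0xFF       ; only the lower half changes: AX = 0x12FF
        xor ah, ah         ; clear the upper half: AX = 0x00FF
```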

Collectively the data and address registers are called the general registers.

In addition to the general registers, there are:

  • segment registers
    • CS, the code segment register
    • DS, the data segment register
    • ES, an extra segment register
    • FS, another extra segment register (not implemented before the 80386)
    • GS, yet another extra segment register (not implemented before the 80386)
    • SS, the stack segment register
  • other registers
    • IP, the instruction pointer register
    • FLAGS, the flag register

The IP register holds the offset, within the code segment, of the next instruction the processor will execute. The programmer cannot read or write IP directly; it changes only as a side effect of instructions such as jumps, calls and returns.

The FLAGS register contains the current state of the processor. Each bit in this register is called a flag, and each flag is either 1 (set) or 0 (not set). The flags in the FLAGS register include carry, overflow, zero and trap (single step).

Flags are notably used in the x86 architecture for comparisons. A comparison instruction subtracts one operand from the other (two registers, for example), discards the result, and sets the flags according to the difference. A conditional jump instruction then checks the relevant flags and jumps if they match its condition. For example,

 cmp ax, bx
 jne do_something

first compares the AX and BX registers, and if they are unequal, the code branches off to the do_something label.
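Flags matter for more than conditional jumps. As a sketch, assume two 32-bit numbers are held in the register pairs DX:AX and CX:BX. That convention is picked for this example and is not something the text prescribes. The carry flag then lets two 16-bit additions act as one 32-bit addition:

```nasm
[bits 16]
        mov dx, 0x0001     ; DX:AX = 0x0001FFFF (example value)
        mov ax, 0xFFFF
        mov cx, 0x0000     ; CX:BX = 0x00000001 (example value)
        mov bx, 0x0001

        add ax, bx         ; low words: 0xFFFF + 1 wraps to 0x0000, carry flag = 1
        adc dx, cx         ; high words plus carry: 1 + 0 + 1 = 2
                           ; DX:AX now holds 0x00020000
```

Here adc ("add with carry") reads the carry flag that the preceding add left behind, in much the same way that jne reads the zero flag left behind by cmp.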

Mnemonics for opcodes

In real mode, the following mnemonics, which come from the original 8086 instruction set, are available: aaa, aad, aam, aas, adc, add, and, call, cbw, clc, cld, cli, cmc, cmp, cmpsb, cmpsw, cwd, daa, das, dec, div, esc, hlt, idiv, imul, in, inc, int, into, iret, ja, jae, jb, jbe, jc, jcxz, je, jg, jge, jl, jle, jmp, jna, jnae, jnb, jnbe, jnc, jne, jng, jnge, jnl, jnle, jno, jnp, jns, jnz, jo, jp, jpe, jpo, js, jz, lahf, lds, lea, les, lock, lodsb, lodsw, loop, loope, loopne, loopnz, loopz, mov, movsb, movsw, mul, neg, nop, not, or, out, pop, popf, push, pushf, rcl, rcr, rep, repe, repne, repnz, repz, ret, rol, ror, sahf, sal, sar, sbb, scasb, scasw, shl, shr, stc, std, sti, stosb, stosw, sub, test, wait, xchg, xlat, xor.

There are also some undocumented opcodes that have no official mnemonic. For example, on the 8086 and 8088 the opcode 0x0F is executed as "POP CS". Later processors in the x86 family may not interpret undocumented opcodes as earlier processors do (from the 80286 onward, 0x0F introduces two-byte opcodes), so a program that relies on undocumented opcodes may fail on them.

The real mode addressing model

Real-mode addressing is simple, although it is controversial among programmers. The x86 architecture addresses memory through a process known as segmentation rather than the linear method used in other architectures. Segmentation decomposes a linear address into two 16-bit parts: a segment and an offset. The segment selects the start of a 64 KiB group of addresses, and the offset is the distance from that start to the addressed byte. To translate back into a linear address, the segment is shifted 4 bits left and then added to the offset: segment*0x10+offset. Segments therefore start every 16 bytes and overlap, so many different combinations of segment and offset refer to the same linear address.

In real mode, two registers are used for a memory address: one to hold the segment, and one to hold the offset.

For example, if DS contains the hexadecimal number 0xDEAD and DX contains the number 0xCAFE, they together point to the memory address 0xDEAD * 0x10 + 0xCAFE = 0xEB5CE. A quick way to work this out without a hexadecimal calculator is to append a zero to the hexadecimal number in the segment register and then add the contents of the offset register. For the values above, that is

0xDEAD0 + 0xCAFE = 0xEB5CE

An address given as a segment and an offset is written in the notation segment:offset. In the above example, the linear address 0xEB5CE can be written as 0xDEAD:0xCAFE or, if the segment and offset are held in a register pair, as DS:DX.
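The same translation can be sketched in NASM itself. The fragment below is only an illustration. It assumes the segment value is in BX and the offset in SI (registers chosen for this example) and leaves the linear address in the pair DX:AX, because the result does not fit in a single 16-bit register. Shift counts go through CL so that the code also assembles for the original 8086.

```nasm
[bits 16]
        mov bx, 0xDEAD     ; segment value from the example above
        mov si, 0xCAFE     ; offset value from the example above

        mov ax, bx
        mov dx, bx
        mov cl, 4
        shl ax, cl         ; AX = low 16 bits of segment*0x10 (0xEAD0)
        mov cl, 12
        shr dx, cl         ; DX = top 4 bits of the segment (0x000D)
        add ax, si         ; 0xEAD0 + 0xCAFE = 0x1B5CE: AX = 0xB5CE, carry set
        adc dx, 0          ; fold the carry into the high part: DX = 0x000E
                           ; DX:AX = 0x000E:0xB5CE, the linear address 0xEB5CE
```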

There are some special combinations of segment registers and general registers that point to important addresses:

  • CS:IP points to the address where the processor will fetch its next byte of code.
  • SS:SP points to the location of the last item pushed onto the stack.
  • DS:SI is often used to point to data that is about to be copied to ES:DI.
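As a sketch of the DS:SI to ES:DI convention, the following small program copies a string with rep movsb. It assumes it runs as a DOS .com program, where DS and ES already equal CS at startup. The labels src, dst and src_len are names invented for this example.

```nasm
[org 0x100]
[bits 16]
        cld                ; clear the direction flag: SI and DI count upward
        mov si, src        ; DS:SI -> source bytes
        mov di, dst        ; ES:DI -> destination buffer
        mov cx, src_len    ; CX = number of bytes to copy
        rep movsb          ; copy one byte from DS:SI to ES:DI, CX times
        ret                ; return to DOS

src     db "real mode"
src_len equ $ - src        ; length of the source string in bytes
dst     times src_len db 0 ; buffer of the same size
```

Each movsb advances SI and DI by one and rep decrements CX, which is why these particular registers appear together so often.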

The PC memory layout in real mode

Start   End      Description
0x00000 0x003FF  Interrupt Vector Table (IVT)
0x00400 0x005FF  BIOS Data Area (BDA)
0x00600 0x9FFFF  Ordinary application RAM
0xA0000 0xBFFFF  Video memory
0xC0000 0xEFFFF  Optional ROMs (The VGA ROM is usually located at C0000)
0xF0000 0xFFFFF  BIOS ROM

Note that the BDA, video memory, and ROM areas are features of the original IBM PC system architecture (retained even in modern PC-compatible systems) and are not dictated by the x86 architecture itself. They do mean, however, that about 640 KiB of general application RAM is available in real mode.

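Memory above 0xFFFFF is called extended memory. Its first 65,520 bytes, from 0x100000 to 0x10FFEF, are called the "high memory area"; this is the only part that real-mode code can reach (using segment 0xFFFF with offsets 0x0010 to 0xFFFF), and only when the A20 address line is enabled.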

Interrupts in real mode

The x86 architecture is an interrupt-driven architecture. In other words, instead of the processor polling for readiness, hardware and software can request service by sending interrupt requests to the processor.

There are two kinds of interrupts: software interrupts and hardware interrupts. Software interrupts are often used to communicate with the operating system. A standard DOS software interrupt is interrupt vector 0x21, through which nearly all real-mode DOS system functions are accessed. Another standard software interrupt is interrupt vector 0x03, also known as int3 (the special x86 instruction which triggers it), which is used as a breakpoint to enter a software debugger.

A typical hardware interrupt is generated when an external circuit requests attention from the processor, such as when the system clock ticks. The Intel 8259 and the Intel APIC architecture are used to route peripheral bus interrupt lines to processor interrupt vectors. There are two 8259 chips in an AT-class x86 PC, a master and a slave. If the master 8259 is mapped to the processor's real-mode interrupt vectors 0x20 to 0x27, then every time the system clock ticks, the handler for interrupt vector 0x20 is executed. (The BIOS default maps the master to vectors 0x08 to 0x0F, which puts the clock tick on vector 0x08.)
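As an example of a software interrupt, the following sketch of a DOS .com program calls two well-known interrupt 0x21 services: function 0x09 (write a '$'-terminated string) and function 0x4C (terminate the program). The label msg and the message text are invented for this example.

```nasm
[org 0x100]
[bits 16]
        mov dx, msg        ; DS:DX -> string (DS = CS in a .com program)
        mov ah, 0x09       ; DOS function 0x09: write '$'-terminated string
        int 0x21           ; request the service from DOS

        mov ax, 0x4C00     ; DOS function 0x4C: terminate, exit code 0 in AL
        int 0x21

msg     db "Hello from DOS!", 13, 10, "$"
```

The function number goes in AH and the arguments in other registers; the int instruction then transfers control through interrupt vector 0x21 to DOS.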

At the start of memory lies the real-mode Interrupt Vector Table (IVT). The IVT contains 256 real-mode pointers, one for each real-mode Interrupt Service Routine (ISR). Real-mode pointers are 32 bits wide, formed by a 16-bit offset followed by a 16-bit segment. The IVT has the following layout:

Entry Address Pointer
    0  0x0000 [[offset][segment]]
    1  0x0004 [[offset][segment]]
    2  0x0008 [[offset][segment]]
  ...     ...                 ...
  255  0x03FC [[offset][segment]]
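Given this layout, the pointer for entry n is stored at address 0x0000:n*4. As a minimal sketch, the following fragment reads the pointer for interrupt 0x21 into DX:AX. The register choice is arbitrary, and the fragment only reads the table.

```nasm
[bits 16]
        xor ax, ax
        mov es, ax             ; ES = 0x0000, the segment that holds the IVT
        mov bx, 0x21 * 4       ; entry 0x21 starts at offset 0x0084
        mov ax, [es:bx]        ; first word:  offset of the ISR
        mov dx, [es:bx+2]      ; second word: segment of the ISR
                               ; DX:AX now holds the ISR address as segment:offset
```

Code that changes an entry rather than reading it would normally disable interrupts (cli) around the two writes, so that an interrupt cannot arrive while the pointer is only half updated.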

Example

This NASM program is an example of real-mode code that prints "Hello world!" to the screen by writing directly to video memory.

[org 0x100]
[bits 16]
[section .text]

        mov ax, cs         ; cs = code segment
        mov ds, ax         ; ds = cs 
                           ; (this way, we don't have to care much about where our data is located)
        mov ax, 0xB800     ; 0xB8000 is the base of the text video memory
        mov es, ax         ; Remember the memory model!
        mov si, text       ; Remember that ds:si -> es:di
        xor di, di         ; a xor a is always zero. (di is given the value 0)

around: mov al, [ds:si]    ; give al the value of what ds:si points to
        cmp al, 0          ; compare if al contains zero ("Hello world!",0)
        je stop            ; if so, stop writing to the screen
        mov [es:di], al    ; move the content of al to es:di (text video memory)
        inc si             ; select the next byte in the Hello world!-string
        add di, 2          ; and goto the next position on the screen.
        jmp around         ; and go back to the beginning of the loop
stop:   ret                ; return to the caller (under DOS, this ends a .com program)

text    db "Hello world!",0

This program can be assembled into a DOS-compatible .com file. It is also quite possible to assemble it for any other operating system running in real mode, or even for no operating system at all, but some minor changes may be needed in such cases. Because it does not use the screen functions provided by DOS or the BIOS, the text that the program prints will be overwritten once the program terminates and other programs write to video memory.

See also