Fixed-point arithmetic
- This article is about a form of limited-precision arithmetic in computing. For the fixed points of a mathematical function, see fixed point (mathematics).
Introduction
In computing, a fixed-point number representation is a real data type for a number that has a fixed number of digits before and after the radix point (e.g. "." in English decimal notation). Fixed-point number representation is in contrast to the more flexible floating point number representation. Fixed-point numbers are useful for representing fractional values in native two's complement format if the executing processor has no floating point unit (FPU) or if fixed-point provides improved performance or accuracy. Most low-cost embedded processors do not have an FPU.
The bits to the left of the radix point are magnitude bits that represent integer values, the bits to the right of the radix point represent fractional values. Each fractional bit represents an inverse power of 2. Thus the first fractional bit is 1/2, the second is 1/4, the third is 1/8 and so on. For signed fixed point numbers in two's complement format, the upper bound is given by:
- <math>2^{m-1} - {1 \over 2^f}</math> and the lower bound is given by: <math>-2^{m-1}</math>
where m is bits of magnitude and f is bits of fraction.
The asymmetry between the upper and lower bounds is due to two's complement notation. For example, a 16-bit signed fixed-point binary number with 4 bits after the radix point has 12 magnitude bits and 4 fractional bits, and can represent numbers between 2047.9375 and -2048. A 16-bit unsigned fixed-point binary number with 4 fractional bits ranges between 4095.9375 and 0. Fixed-point binary numbers can represent sums of negative powers of two (1/2, 1/4, 1/8, and so on) exactly but, like binary floating-point numbers, cannot exactly represent negative powers of ten. If exact decimal fractions are required, binary-coded decimal (BCD) format should be used instead. However, BCD does not use bits as efficiently as two's complement notation, nor is it as fast computationally.
For example, one-tenth (.1) and one-hundredth (.01) can be represented only approximately by two's complement fixed point or floating point representations, while they can be represented exactly in BCD representations.
Integer fixed-point values are always represented exactly, so long as they fit within the range determined by the magnitude bits. This is in contrast to floating-point representations, which have separate exponent and value fields that allow specification of integer values larger than the value field can hold, causing the represented value to lose precision.
A common use for fixed-point BCD numbers is for storing monetary values, where the inexact values of floating-point numbers are often a liability. Historically, fixed-point representations were the norm for decimal data types (for example, in PL/I or COBOL). The Ada programming language includes built-in support for both fixed-point and floating-point.
Very few computer languages include built-in support for fixed-point values, because for most applications floating-point representations are fast enough and accurate enough. Floating-point representations are also easier to use than fixed-point representations, because they handle a wider dynamic range and do not require programmers to specify the number of digits after the radix point.
However, if they are needed, fixed-point numbers can be implemented even in programming languages like C and C++ that do not include such support built-in.
Fixed Point Nomenclature
There are various notations used to represent word length and radix point in a fixed-point number. One common notation is the "Q" prefix, where the number following the Q specifies the number of fractional bits to the right of the radix point. E.g. Q15 represents a number with 15 fractional bits. This notation is ambiguous because it does not specify the word length; however, the word length is usually assumed to be either 16 or 32 bits, depending on the target processor in use. Another notation, the "fx" prefix, is followed by a dotted pair of numbers giving the number of magnitude bits to the left of the radix point and the total word length. E.g. fx1.16 describes a 16-bit number with 1 magnitude bit and 15 fractional bits.
Fixed Point Operations, Precision Loss, and Overflow
Because fixed-point operations can produce results with more bits than their operands, there is opportunity for information loss. For instance, the result of a fixed-point multiplication can have as many bits as the sum of the bit counts of the two operands. To fit the result into the same number of bits as the operands, it must be rounded or truncated, and the choice of which bits to keep is very important. When multiplying two fixed-point numbers in the same format, say with I integer bits and Q fractional bits, the full result can have up to 2*I integer bits and 2*Q fractional bits. For simplicity, many fixed-point multiply routines use the same result format as the operands, which keeps the middle bits: the I least significant integer bits and the Q most significant fractional bits.

Fractional bits lost below this point represent a precision loss, which is common in fractional multiplication. If non-zero integer bits are lost, however, the value will be radically inaccurate. This is considered an overflow, and it needs to be avoided in embedded calculations. Model-based simulation tools such as VisSim can help detect and avoid such overflows through appropriate choice of result word size and radix point, proper scaling gains, and magnitude limiting of intermediate results.

Some operations, such as division, often have built-in result limiting so that any positive overflow yields the largest positive number representable in the current format; likewise, negative overflow yields the largest negative number representable. This built-in limiting is often referred to as saturation. Some processors support a hardware overflow flag that can generate an interrupt when an overflow occurs, but by that point it is usually too late to salvage the correct result.
Examples still in common use
- GnuCash is an application for tracking money. It is written in C and switched from a floating-point representation of money to a fixed-point implementation as of version 1.6. This change was made to avoid the potential rounding errors of floating-point representations.
- Tremor and Toast are software libraries that decode the Ogg Vorbis and GSM Full Rate audio formats respectively. These codecs use fixed-point arithmetic because many audio decoding hardware devices do not have an FPU (to save money) and audio decoding requires enough performance that a software implementation of floating-point on low-speed devices would not produce output in real time.
- All 3D graphics engines on Sony's original PlayStation and Nintendo's Game Boy Advance (only 2D) and Nintendo DS (2D and 3D) video game systems use fixed-point arithmetic for the same reason as Tremor and Toast: to gain throughput on an architecture without an FPU.
- Fractint represents numbers as Q3:29 fixed-point numbers[1].
- VisSim, a visually programmed block diagram language, supports a fixed-point block set that allows simulation and automatic code generation of fixed-point operations. Both word size and radix point can be specified on a per-operator basis.