AltiVec
AltiVec is a floating point and integer SIMD instruction set designed and owned by Apple Computer, IBM and Motorola (the AIM alliance), and implemented on versions of the PowerPC including Motorola's G4 and IBM's G5 processors. AltiVec is a tradename owned solely by Motorola, so the system is also referred to as Velocity Engine by Apple and VMX by IBM.
AltiVec is not shared technology. The technology-sharing agreement signed by IBM, Apple and Motorola was in fact ended by AltiVec, as the AltiVec-equipped G4 that Apple chose was the first chip outside of the agreement to be used.
VMX is distinct from AltiVec. IBM reverse-engineered the AltiVec instructions and included them in a larger instruction set termed VMX. IBM never licensed any AltiVec technology from Motorola.
AltiVec was the most powerful SIMD system in a desktop CPU when it was first introduced in the late 1990s. Compared to its contemporaries (Intel's integer-only MMX, floating-point SSE, and various systems from other RISC vendors), AltiVec offered more registers that could be used in more ways and operated on by a much more flexible instruction set. However, Intel's third-generation and fourth-generation SIMD instruction sets, SSE2 and SSE3 (initially available for the Pentium 4 and also implemented by AMD in its AMD64 architectures), have many more functions than AltiVec. SSE3 is supported by the Intel Core Duo processors used in the first Apple Macintosh computers based on Intel architecture.
Both AltiVec and SSE feature 128-bit vector registers that can represent sixteen 8-bit signed or unsigned chars, eight 16-bit signed or unsigned shorts, four 32-bit ints or four 32-bit floating point variables. Both provide cache-control instructions intended to minimize cache pollution when working on streams of data.
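As an illustration only, the lane layouts described above can be modelled in plain C with a union over a 16-byte block. The type name vec128_view and the helper lanes_for_width are hypothetical and are not part of either instruction set's programming interface; the sketch also assumes that float is 32 bits wide, as it is on PowerPC and IA-32.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one 128-bit vector register.
 * Every member views the same 16 bytes with a different element width. */
typedef union {
    uint8_t  u8[16];   /* sixteen 8-bit unsigned chars        */
    int8_t   s8[16];   /* sixteen 8-bit signed chars          */
    uint16_t u16[8];   /* eight 16-bit unsigned shorts        */
    int16_t  s16[8];   /* eight 16-bit signed shorts          */
    uint32_t u32[4];   /* four 32-bit unsigned ints           */
    int32_t  s32[4];   /* four 32-bit signed ints             */
    float    f32[4];   /* four floats (assumed 32 bits each)  */
} vec128_view;

/* Number of elements of the given width (in bits) that fit in 128 bits. */
size_t lanes_for_width(size_t element_bits)
{
    return 128 / element_bits;
}
```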
They also exhibit important differences. Unlike SSE2, AltiVec supports a special RGB "pixel" data type, but it does not operate on 64-bit double-precision floats, and there is no way to move data directly between scalar and vector registers. In keeping with the "load/store" model of the PowerPC's RISC design, the vector registers, like the scalar registers, can only be loaded from and stored to memory. However, AltiVec provides a much more complete set of "horizontal" operations that work across all the elements of a vector, and it allows more combinations of data type and operation. AltiVec provides 32 128-bit vector registers, compared to 8 for SSE and SSE2, and most AltiVec instructions take three register operands, compared to only two register/register or register/memory operands on IA-32.
AltiVec is also unique in its support for a flexible vector permute instruction, in which each byte of a resulting vector value can be taken from any byte of either of two other vectors, parametrized by yet another vector. This allows for sophisticated manipulations in a single instruction.
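The behaviour of the permute instruction can be sketched as a scalar loop in portable C. This is a model for illustration, not AltiVec code: the names bytes16 and permute_model are hypothetical, and the detail that only the low five bits of each control byte are used as the index comes from the instruction's definition rather than from the description above.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit vector: 16 bytes, byte 0 leftmost. */
typedef struct {
    uint8_t b[16];
} bytes16;

/* Scalar model of a two-source byte permute.
 * For each result byte i, the low 5 bits of ctl.b[i] pick one of the
 * 32 bytes of the concatenation of a and b:
 * indices 0-15 select from a, indices 16-31 select from b. */
bytes16 permute_model(bytes16 a, bytes16 b, bytes16 ctl)
{
    bytes16 r;
    for (int i = 0; i < 16; i++) {
        uint8_t sel = ctl.b[i] & 0x1F;
        r.b[i] = (sel < 16) ? a.b[sel] : b.b[sel - 16];
    }
    return r;
}
```

With a control vector of 15, 14, ..., 0 the model reverses the bytes of the first source; control values from 16 to 31 pull bytes from the second source instead.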
Recent versions of the GNU Compiler Collection, IBM Visual Age Compiler and other compilers provide intrinsics to access AltiVec instructions directly from C and C++ programs. The "vector" storage class is introduced to permit the declaration of native vector types, e.g., "vector unsigned char foo;" declares a 128-bit vector variable named "foo" containing sixteen 8-bit unsigned chars. Overloaded intrinsic functions such as "vec_add" emit the appropriate opcode based on the type of the elements within the vector, and very strong type checking is enforced. In contrast, the Intel-defined data types for IA-32 SIMD registers declare only the size of the vector register (128 or 64 bits) and, in the case of a 128-bit register, whether it contains integers or floating point values. The programmer must select the appropriate intrinsic for the data types in use, e.g., _mm_add_epi16(x,y) for adding two vectors containing eight 16-bit integers.
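The difference between the two programming models can be seen in a small sketch that adds eight 16-bit integers. The function name add_8x16 is hypothetical, and the code assumes a GCC- or Clang-compatible compiler that defines __ALTIVEC__ or __SSE2__ when the corresponding instruction set is enabled; on any other target it falls back to a plain scalar loop.

```c
#include <stdint.h>
#include <string.h>

#if defined(__ALTIVEC__)
#include <altivec.h>
#elif defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Adds eight 16-bit integers lane by lane: out[i] = x[i] + y[i]. */
void add_8x16(const int16_t *x, const int16_t *y, int16_t *out)
{
#if defined(__ALTIVEC__)
    /* The element type is part of the vector type, so the overloaded
     * vec_add picks the 16-bit add without any suffix.
     * memcpy is used so the arrays need not be 16-byte aligned. */
    vector signed short vx, vy, vr;
    memcpy(&vx, x, sizeof vx);
    memcpy(&vy, y, sizeof vy);
    vr = vec_add(vx, vy);
    memcpy(out, &vr, sizeof vr);
#elif defined(__SSE2__)
    /* __m128i only says "128 bits of integers"; the _epi16 suffix
     * on the intrinsic is what selects 16-bit lanes. */
    __m128i vx = _mm_loadu_si128((const __m128i *)x);
    __m128i vy = _mm_loadu_si128((const __m128i *)y);
    __m128i vr = _mm_add_epi16(vx, vy);
    _mm_storeu_si128((__m128i *)out, vr);
#else
    /* Scalar fallback for targets with neither instruction set. */
    for (int i = 0; i < 8; i++) {
        out[i] = (int16_t)(x[i] + y[i]);
    }
#endif
}
```

In the AltiVec branch, changing the declarations to another element type is enough to change the instruction that vec_add emits; in the SSE2 branch the intrinsic name itself has to change.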
AltiVec was developed between 1996 and 1998 by Keith Diefendorff, the distinguished scientist and director of microprocessor architecture at Apple Computer.
Apple was the primary customer for AltiVec (Apple has announced intentions to use Intel-based CPUs going forward), and uses it to accelerate multimedia applications such as QuickTime and iTunes. AltiVec is also put to work in key parts of Apple's Mac OS X, including the Quartz graphics compositor. Other companies such as Adobe use it to optimize image-processing programs such as Adobe Photoshop. Motorola was the first to supply AltiVec-enabled processors, starting with its G4 line (Motorola has since spun off its processor division into the separate company Freescale). AltiVec is also used in some embedded systems to provide extremely high-performance digital signal processing.
IBM has consistently left VMX out of their proprietary POWER systems, which are intended for mainframe and server applications where it is not very useful. However, the most recent PowerPC 970 (dubbed the G5 by Apple) desktop CPU from IBM does include a high-performance AltiVec unit. The core includes a multiplier/adder unit and a full VMX unit.
Also, according to IBM, some VMX instructions are included in the PowerPC-based Xenon processor used in the Microsoft Xbox 360 games console (The Microsoft Xbox 360 CPU story [1]). The Cell's PPE also features VMX.
According to the Apple document [2], AltiVec as implemented on the G4 and G5 PowerPC processors can perform eight 32-bit floating-point operations per cycle, whereas SSE as implemented on processors from AMD and Intel can perform only four (x86 processors can also perform two 64-bit floating-point operations per cycle using SSE2, whereas AltiVec cannot). The implication is that an SSE processor would need twice the clock frequency of an AltiVec processor to perform the same number of FLOPS. The clock speed of the Pentium is not currently twice that of the PowerPC, so AltiVec is faster at the level of operations per second. Of course, application speed depends on many other factors, such as memory and I/O architecture, compilers, operating system software and application software design.
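The comparison above is simple arithmetic: peak throughput is the number of operations per cycle multiplied by the clock frequency. The sketch below works it through in C. The function names are hypothetical, and the 1 GHz clock used in the usage note is a placeholder, not a figure from the Apple document.

```c
/* Peak throughput in GFLOPS: operations per cycle times clock in GHz. */
double peak_gflops(double flops_per_cycle, double clock_ghz)
{
    return flops_per_cycle * clock_ghz;
}

/* Clock in GHz that design B needs in order to match the peak
 * throughput of design A running at clock_a_ghz. */
double break_even_clock_ghz(double flops_per_cycle_a, double clock_a_ghz,
                            double flops_per_cycle_b)
{
    return peak_gflops(flops_per_cycle_a, clock_a_ghz) / flops_per_cycle_b;
}
```

For example, peak_gflops(8, 1.0) gives 8 and break_even_clock_ghz(8, 1.0, 4) gives 2.0, which is the factor of two stated above. These are theoretical peaks only and say nothing about sustained application performance.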
Apple claims that the PowerPC is faster than Pentiums for certain multimedia applications ([3]), but these claims are not corroborated by independent sources and do not isolate specific AltiVec-based kernels. Essentially no publicly disclosed objective analysis demonstrates any particular advantage of AltiVec over the SSE family of SIMD instruction sets, though in theory it should be faster for some applications because of AltiVec's larger register and instruction sets.