Redundant array of independent disks

In computing, a redundant array of independent disks (more commonly known as a RAID array) is a system that uses multiple hard drives to share or replicate data among the drives. The benefit of RAID is increased data integrity, fault tolerance, and/or performance compared to using drives singly. Put more simply, RAID is a way to combine multiple hard drives into one single logical unit: instead of four different hard drives, the operating system sees only one. RAID is typically used on server computers, and is usually implemented with identically sized disk drives. With decreasing hard drive prices and the wider availability of RAID options built into motherboard chipsets, RAID is also being offered as an option in higher-end end-user computers, especially computers dedicated to storage-intensive tasks, such as video and audio editing.

The original RAID specification (which also used the term, "Inexpensive" instead of "Independent") suggested a number of prototype "RAID Levels", or combinations of disks. Each had theoretical advantages and disadvantages. Over the years, different implementations of the RAID concept have appeared. Most differ substantially from the original idealized RAID levels, but the numbered names have remained. This can be confusing, since one implementation of RAID-5, for example, can differ substantially from another. RAID-3 and RAID-4 are often confused and even used interchangeably.

The very definition of RAID has been argued over the years. The use of the term, "Redundant", leads many to split hairs over whether RAID-0 is "real" RAID. Similarly, the change from "Inexpensive" to "Independent" confuses many as to the intended purpose of RAID. There are even some single-disk implementations of the RAID concept! For the purpose of this article, we will say that any system which employs the basic RAID concepts to recombine physical disk space for purposes of reliability or performance is a RAID system.

History

RAID was first patented by IBM in 1978. In 1988, RAID levels 1 through 5 were formally defined by David A. Patterson, Garth A. Gibson and Randy H. Katz in the paper "A Case for Redundant Arrays of Inexpensive Disks (RAID)", published in the proceedings of the 1988 SIGMOD Conference, pp. 109–116. The term "RAID" originated with this paper.

The work was particularly ground-breaking in that the concepts now seem "obvious" in hindsight. The paper spawned the entire disk array industry.

RAID implementations

Inexpensive vs. independent

While the "I" in RAID now generally means independent rather than inexpensive, one of the original benefits of RAID was that it used inexpensive equipment, and this still holds true in many situations where IDE/ATA disks are used.

More commonly, independent (and more expensive) SCSI hard disks are used, although the cost of such disks has fallen considerably and is now much lower than that of the systems RAID was originally intended to replace.

Hardware vs. software

RAID can be implemented either in hardware or software.

With a software implementation, the operating system manages the disks of the array through the normal drive controller (IDE, SCSI, Fibre Channel or any other). This option can be slower than hardware RAID, but it does not require the purchase of extra hardware.

A hardware implementation of RAID requires (at a minimum) a special-purpose RAID controller. On the desktop, this may be a PCI expansion card, or might be a capability built-in to the motherboard. In larger RAIDs, the controller and disks are usually housed in an external multi-bay enclosure. The disks may be IDE, SCSI, or Fibre Channel while the controller links to the host computer with one or more high-speed SCSI or Fibre Channel connections. This controller handles the management of the disks, and performs parity calculations (needed for many RAID levels). This option tends to provide better performance, and makes operating system support easier. Hardware implementations also typically support hot swapping, allowing failed drives to be replaced while the system is running.

Both hardware and software versions may support the use of a hot spare, a preinstalled drive which is used to immediately (and usually automatically) replace a failed drive.

Standard RAID levels

RAID 0

A RAID 0 array (also known as a stripe set) splits data evenly across two or more disks, with no parity information for redundancy. RAID-0 is normally used to increase performance, although it is also a useful way to create a small number of large virtual disks out of a large number of small ones. Although RAID-0 was not specified in the original RAID paper, an idealized implementation of RAID-0 would split I/O operations into equal-sized blocks and spread them evenly across two disks. RAID-0 implementations with more than two disks are also possible; however, the reliability of a given RAID-0 set is equal to the average reliability of each disk divided by the number of disks in the set. That is, reliability (as measured by MTBF) decreases linearly with the number of members, so a set of two disks is half as reliable as a single disk.
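
The block layout and reliability arithmetic described above can be sketched as follows (a hypothetical illustration, not any particular implementation; the function names are invented for the example):

```python
# Hypothetical sketch of ideal RAID-0 placement: logical block i lands on
# disk (i mod n), at stripe row (i div n), so consecutive blocks
# alternate across the members.

def raid0_location(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Return (disk index, stripe row) for a logical block number."""
    return logical_block % num_disks, logical_block // num_disks

def raid0_mtbf(single_disk_mtbf_hours: float, num_disks: int) -> float:
    """MTBF of the whole set decreases linearly with the member count."""
    return single_disk_mtbf_hours / num_disks

# With two disks: blocks 0, 2, 4, ... go to disk 0, blocks 1, 3, 5, ...
# go to disk 1, and the set is half as reliable as a single disk.
```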

RAID-0 is useful for setups such as large read-only NFS servers, where mounting many disks is time-consuming or impossible and redundancy is irrelevant. It is also useful where the number of disks is limited by the operating system: in Windows, the number of drive letters is limited to 24, so RAID-0 is a popular way to use more disks than that. However, since there is no redundancy and data is shared between drives, all disks are interdependent, so a failed drive cannot simply be swapped out.

RAID 0 was not one of the original RAID levels.

Concatenation

Although a concatenation of disks (sometimes called JBOD, or "Just a Bunch of Disks") is not one of the numbered RAID levels, it is a popular method for combining multiple physical disk drives into a single virtual one. As the name implies, disks are merely concatenated together, end to end, so they appear to be a single large disk.

In this sense, concatenation is akin to the reverse of partitioning. Whereas partitioning takes one physical drive and creates two or more logical drives, JBOD uses two or more physical drives to create one logical drive.

In that it consists of an array of inexpensive disks with no redundancy, it can be thought of as a distant relation of RAID. JBOD is sometimes used to turn several odd-sized drives into one useful drive. For example, JBOD could combine 3 GB, 15 GB, 5.5 GB, and 12 GB drives into a single 35.5 GB logical drive, arguably more useful than the individual drives separately.
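
The address arithmetic behind concatenation can be sketched as follows (a minimal illustration; drive sizes are given in blocks rather than gigabytes for simplicity, and the function name is invented):

```python
# Concatenation sketch: a logical block lives on the first drive whose
# capacity has not yet been exhausted by the blocks before it.

def jbod_location(logical_block: int, drive_sizes: list[int]) -> tuple[int, int]:
    """Return (drive index, block offset within that drive)."""
    for i, size in enumerate(drive_sizes):
        if logical_block < size:
            return i, logical_block
        logical_block -= size
    raise ValueError("block beyond end of concatenated volume")

# Drives of 4, 2 and 6 blocks appear as one 12-block drive: blocks 0-3
# on drive 0, blocks 4-5 on drive 1, blocks 6-11 on drive 2.
```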

RAID 1

A RAID 1 array creates an exact copy (or mirror) of all of the data on two or more disks. This is useful for setups where redundancy is more important than using the disks' maximum storage capacity; the array can only be as big as its smallest member disk. An ideal RAID-1 set contains two disks, which increases reliability by a factor of two over a single disk, but it is possible to have many more than two copies. Since each member can be addressed independently if another fails, reliability grows linearly with the number of members. RAID-1 can also provide enhanced read performance, since many implementations can read from one disk while the other is busy.
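
The mirroring behavior described above can be sketched with a toy model (an illustration only, not a real driver; the class name is invented):

```python
# Toy model of RAID-1: writes go to every member; a read can be served by
# any member that has not failed.

class Mirror:
    def __init__(self, num_copies: int, num_blocks: int) -> None:
        self.disks = [[b""] * num_blocks for _ in range(num_copies)]

    def write(self, block: int, data: bytes) -> None:
        for disk in self.disks:      # every copy receives the write
            disk[block] = data

    def read(self, block: int, failed: frozenset[int] = frozenset()) -> bytes:
        for i, disk in enumerate(self.disks):
            if i not in failed:      # first surviving member answers
                return disk[block]
        raise IOError("all mirror members have failed")
```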

One common practice is to create an extra mirror of a volume (also known as a Business Continuance Volume or BCV) which is meant to be split from the source RAID set and used independently. In some implementations, these extra mirrors can be split and then incrementally re-established, instead of requiring a complete RAID set rebuild.

RAID 2

A RAID 2 Array stripes data at the bit (rather than block) level, and uses a Hamming code for error correction. The disks are synchronized by the controller to run in perfect tandem. This is the only original level of RAID that is not currently used.

RAID 3

A RAID 3 Array uses byte-level striping with a dedicated parity disk. RAID-3 is extremely rare in practice. One of the side effects of RAID-3 is that it generally cannot service multiple requests simultaneously. This comes about because any single block of data will by definition be spread across all members of the set and will reside in the same location, so any I/O operation requires activity on every disk.

In our example, below, a request for block "A1" would require all three data disks to seek to the beginning and reply with their contents. A simultaneous request for block B1 would have to wait.

  Traditional
    RAID-3
A1  A1  A1  A1p
A2  A2  A2  A2p
A3  A3  A3  A3p
B1  B1  B1  B1p
Note: A1, B1, etc each represent one data block

RAID 4

A RAID 4 Array uses block-level striping with a dedicated parity disk. RAID-4 looks similar to RAID 3 except that it stripes at the block, rather than the byte level. This allows each member of the set to act independently when only a single block is requested. If the disk controller allows it, a RAID-4 set can service multiple read requests simultaneously. Network Appliance Corporation uses RAID-4 on their Filer line of NFS servers.

In our example, below, a request for block "A1" would be serviced by disk 1. A simultaneous request for block B1 would have to wait, but a request for B2 could be serviced concurrently.

  Traditional
    RAID-4
A1  A2  A3  Ap
B1  B2  B3  Bp
C1  C2  C3  Cp
D1  D2  D3  Dp
Note: A1, B1, etc each represent one data block

RAID 5

A RAID 5 array uses block-level striping with parity data distributed across all member disks. RAID-5 is one of the most popular RAID levels, and is frequently used in both hardware and software implementations. Virtually all storage arrays offer RAID-5.

In our example, below, a request for block "A1" would be serviced by disk 1. A simultaneous request for block B1 would have to wait, but a request for B2 could be serviced concurrently.

  Traditional
    RAID-5
A1  A2  A3  Ap
B1  B2  Bp  B3
C1  Cp  C2  C3
Dp  D1  D2  D3
Note: A1, B1, etc each represent one data block

Every time a data "block" (sometimes called a "chunk") is written on a disk in an array, a parity block is generated within the same stripe. (A block or chunk is often composed of many consecutive sectors on a disk, sometimes as many as 256 sectors. A series of chunks [a chunk from each of the disks in an array] is collectively called a "stripe".) If another block, or some portion of a block is written on that same stripe, the parity block (or some portion of the parity block) is recalculated and rewritten. The disk used for the parity block is staggered from one stripe to the next, hence the term "distributed parity blocks".

Interestingly, the parity blocks are not read on data reads, since this would be unnecessary overhead and would diminish performance. The parity blocks are read, however, when a read of a data sector results in a CRC error. In this case, the sector in the same relative position within each of the remaining data blocks in the stripe and within the parity block in the stripe are used to reconstruct the errant sector. The CRC error is thus hidden from the main computer. Likewise, should a disk fail in the array, the parity blocks from the surviving disks are combined mathematically with the data blocks from the surviving disks to reconstruct the data on the failed drive "on-the-fly".

This is sometimes called Interim Data Recovery Mode. The main computer is unaware that a disk drive has failed. Reading and writing to the drive array continues seamlessly, though with some performance degradation.
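
The parity arithmetic behind this recovery mode can be sketched with bytewise XOR (a simplified illustration; real controllers operate on much larger chunks):

```python
from functools import reduce

# RAID-5 parity sketch: the parity block is the bytewise XOR of the data
# blocks in a stripe, so any single missing block is the XOR of the rest.

def xor_blocks(blocks: list[bytes]) -> bytes:
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

stripe = [b"\x0fAA", b"\x3cBB", b"\xa5CC"]   # three data blocks
parity = xor_blocks(stripe)

# Simulate losing the middle block: XOR of the survivors and the parity
# block reconstructs it exactly.
rebuilt = xor_blocks([stripe[0], stripe[2], parity])
assert rebuilt == stripe[1]
```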

In RAID 5 arrays which have only one parity block per stripe, the failure of a second drive results in total data loss.

The maximum number of drives is theoretically unlimited, but it is common practice to keep a RAID 5 set to 14 drives or fewer when there is only one parity block per stripe. The reason for this restriction is that the likelihood of a drive failure increases with the number of drives in the array: the effective mean time between failures (MTBF) for the array as a whole becomes smaller. In implementations with more than 14 drives, RAID 5 with dual parity (also known as RAID 6) is sometimes used, since it can survive the failure of two disks.

RAID 6

A RAID 6 array uses block-level striping with parity data distributed twice across all member disks. It was not one of the original RAID levels.

In RAID-6, parity is generated and written to two distributed parity stripes, on two separate drives.

  Traditional        Typical
    RAID-5            RAID-6
A1  A2  A3  Ap    A1  A2  Ap  Ap
B1  B2  Bp  B3    B1  Bp  B2  Bp
C1  Cp  C2  C3    Cp  C1  Cp  C2
Dp  D1  D2  D3    Dp  Dp  D1  D2
Note: A1, B1, etc each represent one data block

RAID-6 is more redundant than RAID-5, but is very inefficient with a low number of drives. See also Double parity, below, for another, more redundant implementation.

Nested RAID levels

Many storage controllers allow RAID levels to be nested. That is, one RAID array can use another as its basic element.

RAID 0+1

A RAID 0+1 array is a RAID array used for both replicating and sharing data among disks. The difference between RAID 0+1 and RAID 10 is the location of each RAID level: is it a stripe of mirrors or a mirror of stripes? Consider an example of RAID 0+1: six 120 GB drives are to be set up in a RAID 0+1 array. An example configuration:

               RAID 1
                 |
        /-----------------\
        |                 |
      RAID 0            RAID 0
  /-----------\     /-----------\
  |     |     |     |     |     |
120GB 120GB 120GB 120GB 120GB 120GB

where the maximum storage space is 360 GB, spread across two arrays. The advantage is that when a hard drive fails in one of the RAID 0 arrays, the missing data can be transferred from the other array. However, expanding the array requires adding two hard drives at a time to keep storage balanced between the arrays.

RAID 0+1 is not as robust as RAID 1+0: it cannot tolerate two simultaneous disk failures unless they are in the same stripe. That is to say, once a single disk fails, each of the disks in the other stripe becomes an individual single point of failure. Also, once the failed disk is replaced, all the disks in the array must participate in the rebuild of its data.

RAID 10

A RAID 10 array is similar to a RAID 0+1 array except that the RAID levels used are reversed - RAID 10 is a stripe of mirrors.

RAID 53

Despite its name, RAID 53 is usually implemented as a RAID 0 stripe across RAID 3 arrays.

RAID 50

RAID 50 is a RAID 0 stripe across RAID 5 arrays.

RAID 51

RAID 51 is a RAID 1 mirror of RAID 5 arrays.

Proprietary RAID levels

Although all implementations of RAID differ from the idealized specification to some extent, some companies have developed entirely proprietary RAID implementations that differ substantially from the rest of the crowd.

Double parity

One common addition to the existing RAID levels is double parity, sometimes implemented and known as diagonal parity. As in RAID-6, there are two sets of parity check information created. Unlike RAID-6, however, the second set is not a mere "extra copy" of the first. Rather, most implementations of Double Parity calculate the extra parity in a different "direction". If we were to call traditional RAID parity "horizontal", then Double Parity might be calculated vertically, or even diagonally, across a matrix of disks.

  Traditional        Typical        Double parity
    RAID-5            RAID-6            RAID-5
A1  A2  A3  Ap    A1  A2  Ap  Ap    A1  A2  A3  Ap
B1  B2  Bp  B3    B1  Bp  B2  Bp    B1  B2  Bp  B3
C1  Cp  C2  C3    Cp  C1  Cp  C2    C1  Cp  C2  C3
Dp  D1  D2  D3    Dp  Dp  D1  D2    1p  2p  3p  --
Note: A1, B1, etc each represent one data block

Drives can be organized into orthogonal matrices, where rows of drives form parity groups, similar to RAID 5, while the columns also keep consistent parity data with each other. If a single drive fails, either its row or its column parity may be used to rebuild it. Several drives in any one column or row may fail before the array is corrupted; in fact, any group of non-coincident drives may fail before the array is corrupted.
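
Under the same simplifying assumptions as before (single bytes standing in for whole drives, plain XOR for parity), the row-or-column rebuild can be sketched as:

```python
from functools import reduce

def xor_all(values) -> int:
    return reduce(lambda a, b: a ^ b, values)

# A 3x3 matrix of drives; each cell stands in for one drive's contents.
drives = [[0x11, 0x22, 0x33],
          [0x44, 0x55, 0x66],
          [0x77, 0x88, 0x99]]
row_parity = [xor_all(row) for row in drives]
col_parity = [xor_all(col) for col in zip(*drives)]

# A failed drive at (row 1, column 2) can be rebuilt two ways: from the
# survivors in its row, or from the survivors in its column.
from_row = xor_all([drives[1][0], drives[1][1], row_parity[1]])
from_col = xor_all([drives[0][2], drives[2][2], col_parity[2]])
assert from_row == from_col == drives[1][2]
```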

RAID 7

RAID 7 is a trademark of Storage Computer Corporation. It adds caching to RAID-3 or RAID-4 to improve performance.

RAID S or Parity RAID

RAID S is EMC Corporation's proprietary parity RAID system used in its Symmetrix storage systems. Unlike RAID-5, it does not stripe data across disks: each volume exists on a single physical disk, and multiple volumes are arbitrarily combined for parity purposes. EMC originally referred to this capability as RAID-S, and then renamed it Parity RAID for the Symmetrix DMX platform. EMC now offers standard striped RAID-5 on the Symmetrix DMX as well.

  Traditional          EMC
    RAID-5            RAID-S
A1  A2  A3  Ap    A1  B1  C1  1p
B1  B2  Bp  B3    A2  B2  C2  2p
C1  Cp  C2  C3    A3  B3  C3  3p
Dp  D1  D2  D3    A4  B4  C4  4p
Note: A1, B1, etc each represent one data block.  A, B, etc are entire volumes.
