Software RAID management in Linux is performed using the mdadm program.
The mdadm program has several modes of operation.
Call format
mdadm [mode] [array] [options]
Modes:
Create - create a new array with per-device superblocks;
Assemble - assemble a previously created array;
Build - build an array without superblocks;
Grow - change the size or shape of an active array;
Manage - manage array components: add, remove, mark as failed;
Misc - miscellaneous operations (--examine, --detail, --zero-superblock and others);
Follow/Monitor - monitor the state of arrays;
Incremental - add devices to arrays one at a time as they appear.
Consider how to configure a level 5 RAID array on three disk partitions. We will use the partitions:
/dev/hde1 /dev/hdf2 /dev/hdg1
If your partitions are different, do not forget to substitute the appropriate device names.
First, determine on which physical partitions the RAID array will be created. If partitions already exist, find free ones (fdisk -l). If there are no partitions yet but there is unallocated space, you can create them using fdisk or cfdisk.
View what partitions there are:
% # fdisk -l

Disk /dev/hda: 12.0 GB, 12072517632 bytes
255 heads, 63 sectors/track, 1467 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hda1   *         1        13    104391   83  Linux
/dev/hda2            14       144   1052257+  83  Linux
/dev/hda3           145       209    522112+  82  Linux swap
/dev/hda4           210      1467  10104885    5  Extended
/dev/hda5           210       655   3582463+  83  Linux
...
...
/dev/hda15         1455      1467    104391   83  Linux
See which partitions are mounted where and how much free space is on them (sizes in kilobytes):
% # df -k

Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/hda2              1035692    163916    819164  17% /
/dev/hda1               101086      8357     87510   9% /boot
/dev/hda15              101086      4127     91740   5% /data1
...
...
...
/dev/hda7              5336664    464228   4601344  10% /var
If you use partitions created earlier, be sure to unmount them first. A RAID array cannot be created over partitions that hold mounted file systems.
% # umount /dev/hde1
% # umount /dev/hdf2
% # umount /dev/hdg1
It is desirable (but not necessary) to change the type of partitions that will be included in the RAID and set it to FD (Linux RAID autodetect). You can change the partition type using fdisk.
Consider how to do this using the example of the /dev/hde1 partition.
% # fdisk /dev/hde

The number of cylinders for this disk is set to 8355.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help):

Use the fdisk m command to get some help:

Command (m for help): m
...
...
p   print the partition table
q   quit without saving changes
s   create a new empty Sun disklabel
t   change a partition's system id
...
...
Command (m for help):

Set the ID type to FD. Partition /dev/hde1 is the first partition on disk /dev/hde. Modify its type with the t command, specifying the partition number and the type code; you can also use the L command to list the ID types in case you forget the one you need.

Command (m for help): t
Partition number (1-5): 1
Hex code (type L to list codes): L
...
...
...
16  Hidden FAT16       61  SpeedStor          f2  DOS secondary
17  Hidden HPFS/NTF    63  GNU HURD or Sys    fd  Linux raid auto
18  AST SmartSleep     64  Novell Netware     fe  LANstep
1b  Hidden Win95 FA    65  Novell Netware     ff  BBT

Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)

Make sure the change occurred. Use the p command to print the new partition table:

Command (m for help): p

Disk /dev/hde: 4311 MB, 4311982080 bytes
16 heads, 63 sectors/track, 8355 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hde1             1      4088   2060320+  fd  Linux raid autodetect
/dev/hde2          4089      5713    819000   83  Linux
/dev/hde4          6608      8355    880992    5  Extended
/dev/hde5          6608      7500    450040+  83  Linux
/dev/hde6          7501      8355    430888+  83  Linux

Save the changes with the w command:

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table.
The new table will be used at the next reboot.
Syncing disks.
Similarly, you need to change the partition type for all other partitions included in the RAID.
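As a non-interactive alternative, the partition type can also be changed with sfdisk. This is only a sketch: the exact option name depends on the util-linux version (newer versions use --part-type, older ones --change-id), so check man sfdisk first.

% # sfdisk --part-type /dev/hdf 2 fd
% # sfdisk --part-type /dev/hdg 1 fd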
Creation of a RAID array is performed with the mdadm program (the --create option). The --level option selects level 5, and the --raid-devices option specifies the devices on which the RAID array will be built.
% # mdadm --create --verbose /dev/md0 --level=5 --raid-devices=3 /dev/hde1 /dev/hdf2 /dev/hdg1
mdadm: layout defaults to left-symmetric
mdadm: chunk size defaults to 64K
mdadm: /dev/hde1 appears to contain an ext2fs file system
    size=48160K  mtime=Sat Jan 27 23:11:39 2007
mdadm: /dev/hdf2 appears to contain an ext2fs file system
    size=48160K  mtime=Sat Jan 27 23:11:39 2007
mdadm: /dev/hdg1 appears to contain an ext2fs file system
    size=48160K  mtime=Sat Jan 27 23:11:39 2007
mdadm: size set to 48064K
Continue creating array? y
mdadm: array /dev/md0 started.
If you want to create a degraded array right away (one where a disk is missing), just type the word missing instead of the device name. For RAID5 this can be only one disk; for RAID6, no more than two; for RAID1, as many as you want, but at least one working disk must be present.
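A minimal sketch of creating such a degraded RAID5 with one of the disks missing (device names are those used above):

% # mdadm --create --verbose /dev/md0 --level=5 --raid-devices=3 /dev/hde1 /dev/hdf2 missing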
You can make sure that the RAID array is initialized correctly by viewing the /proc/mdstat file. This file reflects the current state of the RAID array.
% # cat /proc/mdstat
Personalities : [raid5]
read_ahead 1024 sectors
md0 : active raid5 hdg1[2] hde1[1] hdf2[0]
      4120448 blocks level 5, 32k chunk, algorithm 3 [3/3] [UUU]
unused devices: <none>
Notice how the new RAID is called. In our case it is called /dev/md0. We will refer to the array by this name.
The new RAID device needs to be formatted, i.e. a file system must be created on it. This is done with a program from the mkfs family; to create an ext3 file system we use mkfs.ext3:
% # mkfs.ext3 /dev/md0
mke2fs 1.39 (29-May-2006)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
36144 inodes, 144192 blocks
7209 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=67371008
18 block groups
8192 blocks per group, 8192 fragments per group
2008 inodes per group
Superblock backups stored on blocks:
        8193, 24577, 40961, 57345, 73729

Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 33 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
For best file system performance it makes sense to specify, at creation time, the number of data disks in the array and the number of file system blocks that fit into one stripe (chunk). This is especially important for RAID0, RAID5, RAID6 and RAID10 arrays. For RAID1 (mirror) it does not matter, since writes always go to a single device, while in the other RAID levels data is written sequentially to different disks in portions the size of a stripe. For example, if we use a RAID5 of 3 disks with the default stripe size of 64K and an ext3 file system with a block size of 4K, we can call mkfs.ext3 like this:
% # mkfs.ext3 -b 4096 -E stride=16,stripe-width=32 /dev/md0
stripe-width is usually calculated as stride * N, where N is the number of data drives in the array (for example, a three-disk RAID5 has two data drives and one parity drive).
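For the ext3 example above, assuming the defaults named there (64K chunk, 4K blocks, three-disk RAID5 with two data disks), the numbers work out as follows:

stride = 64K / 4K = 16
stripe-width = 16 * 2 = 32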
For XFS, you specify not stride/stripe-width but the stripe geometry of the array itself:

% # mkfs.xfs -d su=64k,sw=3 /dev/md0
The system itself does not remember which RAID arrays it needs to create and which components they consist of. This information is kept in the mdadm.conf file.
The lines that should be added to this file can be obtained with the command
mdadm --detail --scan --verbose
Here is an example of its use:
% # mdadm --detail --scan --verbose
ARRAY /dev/md0 level=raid5 num-devices=3 UUID=77b695c4:32e5dd46:63dd7d16:17696e09
   devices=/dev/hde1,/dev/hdf2,/dev/hdg1
If the mdadm.conf file does not yet exist, you can create it:
% # echo "DEVICE partitions"> /etc/mdadm/mdadm.conf % # mdadm --detail --scan --verbose | awk '/ ARRAY / {print}' >> /etc/mdadm/mdadm.conf
Since we created a new file system, we will probably need a new mount point. Call it /raid.
% # mkdir /raid
In order for the file system created on the new RAID array to be mounted automatically at boot time, we add an appropriate entry to the /etc/fstab file, which stores the list of file systems mounted automatically at boot.
/dev/md0       /raid        ext3    defaults        1 2
If we have combined partitions that were used before into the RAID array, we need to disable their mounting: delete or comment out the corresponding lines in the /etc/fstab file. A line can be commented out with #.
#/dev/hde1      /data1       ext3    defaults        1 2
#/dev/hdf2      /data2       ext3    defaults        1 2
#/dev/hdg1      /data3       ext3    defaults        1 2
In order to gain access to the file system located on the new RAID array, it needs to be mounted. Mount it using the mount command.
If the new file system has been added to the /etc/fstab file, you can mount it with the mount -a command (it mounts all file systems that should be mounted at boot time but are not mounted yet).
% # mount -a
You can also mount only the partition we need (provided that it is listed in /etc/fstab).
% # mount /raid
If the partition is not listed in /etc/fstab, then when mounting we must specify at least two parameters: the device to be mounted and the mount point:
% # mount /dev/md0 /raid
Information about the status of the RAID array is in the /proc/mdstat file.
% # raidstart /dev/md0
% # cat /proc/mdstat
Personalities : [raid5]
read_ahead 1024 sectors
md0 : active raid5 hdg1[2] hde1[1] hdf2[0]
      4120448 blocks level 5, 32k chunk, algorithm 3 [3/3] [UUU]
unused devices: <none>
If the information in the file is constantly changing, for example, array rebuilding is in progress, it is convenient to view the constantly changing file using the watch program:
% $ watch cat /proc/mdstat
How to perform the integrity check of the md0 software RAID array:
echo 'check' > /sys/block/md0/md/sync_action
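In the same way, a repair pass (mentioned below) can be started by writing 'repair' to the same file; both are standard md sync_action values:

echo 'repair' > /sys/block/md0/md/sync_action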
How to see if there were any errors in the process of checking a software RAID array using the check or repair command:
cat /sys/block/md0/md/mismatch_cnt
In some mdadm manuals, after the initial assembly of arrays, it is recommended to add the output of the "mdadm --detail --scan --verbose" command to the /etc/mdadm/mdadm.conf file:
ARRAY /dev/md/1 level=raid1 num-devices=2 metadata=1.2 name=linuxWork:1 UUID=147c5847:dabfe069:79d27a05:96ea160b
   devices=/dev/sda1
ARRAY /dev/md/2 level=raid1 num-devices=2 metadata=1.2 name=linuxWork:2 UUID=68a95a22:de7f7cab:ee2f13a9:19db7dad
   devices=/dev/sda2
in which the partition names are hard-coded (/dev/sda1, /dev/sda2 in the given example).
If you then update the boot image (on Debian, run update-initramfs -u or dpkg-reconfigure mdadm), these partition names end up inside the boot image, and you will not be able to boot from the array if the disk configuration changes (the required partitions receive other names). It is not even necessary to add or remove hard drives for this to happen: in multi-disk systems their names can vary from boot to boot.
Solution: write the output of the command "mdadm --detail --scan" (without --verbose) to /etc/mdadm/mdadm.conf.
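A sketch of doing this on Debian (the update-initramfs step, mentioned above, refreshes the copy of mdadm.conf inside the boot image):

% # echo "DEVICE partitions" > /etc/mdadm/mdadm.conf
% # mdadm --detail --scan >> /etc/mdadm/mdadm.conf
% # update-initramfs -u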
In this case the mdadm.conf file contains the UUIDs of the partitions that make up each RAID array, and when the system boots mdadm finds the necessary partitions by UUID, regardless of their symbolic names.
An mdadm.conf extracted from a Debian boot image (initramfs):
DEVICE partitions
HOMEHOST <system>
ARRAY /dev/md/1 metadata=1.2 UUID=147c5847:dabfe069:79d27a05:96ea160b name=linuxWork:1
ARRAY /dev/md/2 metadata=1.2 UUID=68a95a22:de7f7cab:ee2f13a9:19db7dad name=linuxWork:2
The result of examining a partition with the 'mdadm --examine' command:
/dev/sda1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 147c5847:dabfe069:79d27a05:96ea160b
           Name : linuxWork:1
  Creation Time : Thu May 23 09:17:01 2013
     Raid Level : raid1
   Raid Devices : 2
The partition with UUID 147c5847:dabfe069:79d27a05:96ea160b will be included in the array even if it becomes /dev/sdb1 at the next system boot.
In general, there are two mdadm.conf files that affect the automatic assembly of arrays: the copy inside the boot image (initramfs) and /etc/mdadm/mdadm.conf on the root file system. Accordingly, the configuration information may be present:
1) in the boot image and in /etc/mdadm/mdadm.conf;
2) only in the boot image (it gets there when the image is updated);
3) only in /etc/mdadm/mdadm.conf;
4) nowhere.
Wherever an mdadm.conf is present, assembly follows its rules; where it is absent, the result is unpredictable.
Note: even if you did not update the boot image after creating the RAID arrays, their configuration will still end up in it sooner or later, when the image is rebuilt by another program or during a system update (and you will not know about it, with all the consequences).
A disk in the array can be marked as failed with the --fail (-f) option:
% # mdadm /dev/md0 --fail /dev/hde1
% # mdadm /dev/md0 -f /dev/hde1
The failed disk can be removed with the --remove (-r) switch:
% # mdadm /dev/md0 --remove /dev/hde1
% # mdadm /dev/md0 -r /dev/hde1
You can add a new disk to the array using the --add (-a) and --re-add options:
% # mdadm /dev/md0 --add /dev/hde1
% # mdadm /dev/md0 -a /dev/hde1
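For a disk that was previously a member of this array (for example, one that was temporarily disconnected), --re-add can be used instead; a sketch with the same device name as above:

% # mdadm /dev/md0 --re-add /dev/hde1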
You can assemble an existing array using mdadm --assemble. As additional arguments, you either explicitly list the devices to assemble or request a scan for them (--scan).
% # mdadm --assemble /dev/md0 /dev/hde1 /dev/hdf2 /dev/hdg1
% # mdadm --assemble --scan
You can expand the array with the --grow (-G) switch. First a disk is added, and then the array is expanded:
% # mdadm /dev/md0 --add /dev/hdh2
Check that the disk (partition) was added:
% # mdadm --detail /dev/md0
% # cat /proc/mdstat
If the partition has indeed been added, we can expand the array:
% # mdadm -G /dev/md0 --raid-devices=4
The --raid-devices option specifies the new number of disks used in the array. For example, there were 3 disks, and now we are expanding to 4 - we specify 4.
It is recommended to specify a backup file in case the array reshape is interrupted, for example by adding:
--backup-file=/var/backup
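A sketch of the full grow command with the backup file added (the path /var/backup is just the example value used above):

% # mdadm -G /dev/md0 --raid-devices=4 --backup-file=/var/backup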
If necessary, you can adjust the speed of the array expansion process by specifying the desired value in the files
/proc/sys/dev/raid/speed_limit_min
/proc/sys/dev/raid/speed_limit_max
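For example, to raise the minimum reconstruction speed (the value 50000 KB/s is an arbitrary illustration, not a recommendation):

% # echo 50000 > /proc/sys/dev/raid/speed_limit_min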
Ensure that the array has expanded:
% # cat /proc/mdstat
It is necessary to update the configuration file with the changes made:
% # mdadm --detail --scan >> /etc/mdadm/mdadm.conf
% # vi /etc/mdadm/mdadm.conf
Delayed sync:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active (auto-read-only) raid1 sda1[0] sdb1[1]
      78148096 blocks [2/2] [UU]
        resync=PENDING
Resume:
echo idle > /sys/block/md0/md/sync_action
P.S.: If you saw "active (auto-read-only)" in the /proc/mdstat file, it may simply mean that nothing has been written to this array yet. For example, after the partition is mounted and any change is made in the mounted directory, the status changes automatically:
md0 : active raid1 sdc[0] sdd[1]
To rename the array (here, reassembling md0 as md5), first unmount and stop it:
% # umount /dev/md0
% # mdadm --stop /dev/md0
Then reassemble the partitions /dev/sd[abcdefghijk]1 as /dev/md5:
% # mdadm --assemble /dev/md5 /dev/sd[abcdefghijk]1 --update=name
or like this:
% # mdadm --assemble /dev/md5 /dev/sd[abcdefghijk]1 --update=super-minor
To destroy the array, first unmount and stop it:
% # umount /dev/md0
% # mdadm -S /dev/md0
Then you need to wipe the superblock of each of the components of the array:
% # mdadm --zero-superblock /dev/hde1
% # mdadm --zero-superblock /dev/hdf2
If the action above did not help, then wipe it like this:
% # dd if=/dev/zero of=/dev/hde1 bs=512 count=1
% # dd if=/dev/zero of=/dev/hdf2 bs=512 count=1
After creating a RAID5 array of three two-terabyte drives, I was unpleasantly surprised by the low write speed of the array: only 55 MB/s. I had to find out which parameters allow the array to reach normal performance.
It turned out that the stripe_cache_size value set for the array was too small, and that an internal write-intent bitmap (--bitmap=internal) with a small bitmap chunk was being used. After increasing stripe_cache_size and the bitmap chunk size, the write speed rose to 130-150 MB/s. These and other options are covered in more detail below.
The default stripe_cache_size value is 256, the valid range is 32 to 32768, and increasing it to at least 8192 helps write speed considerably:
echo 8192 > /sys/block/md?/md/stripe_cache_size
Attention: stripe_cache_size is measured in memory pages per device and can be very voracious in terms of RAM consumption. With the value 8192 used above, a memory page size of 4096 bytes (used by the Linux kernel on x86 and x86-64) and four disks in the RAID, the memory consumption will be:

8192 * 4096 * 4 = 134217728 bytes (128 megabytes)
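To check the value currently in effect (a sketch; substitute your md device for md0):

cat /sys/block/md0/md/stripe_cache_size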
A write-intent bitmap allows the system, after an unclean shutdown, to resynchronize only the areas that were being written to at the moment of the reboot or power failure instead of rechecking the entire RAID. In the --bitmap=internal variant, however, it has an extremely negative effect on write speed.
The effect can be reduced by using a larger value of the mdadm --bitmap-chunk parameter (see man mdadm); for example, I now use 131072. You can also eliminate it almost completely by keeping the bitmap in an external file located on a disk that is not part of the array. For the partition where the bitmap file is stored, the author of mdadm recommends an ext2 or ext3 file system; however, there were no problems when placing it on XFS.
mdadm --grow --bitmap=internal --bitmap-chunk=131072 /dev/md0
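A sketch of switching to an external bitmap file instead (the path /mnt/other-disk/md0.bitmap is a hypothetical location on a disk outside the array; the existing internal bitmap has to be removed first):

mdadm --grow --bitmap=none /dev/md0
mdadm --grow --bitmap=/mnt/other-disk/md0.bitmap /dev/md0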
It is important to set these parameters (stride and stripe-width) correctly when creating the file system; if you forget, it is not a problem: they can be changed later with tune2fs.
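A sketch of such a later change for the ext3 example above (the exact option spelling can differ between e2fsprogs versions; check man tune2fs):

tune2fs -E stride=16,stripe_width=32 /dev/md0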
XFS tries to determine the appropriate values automatically; if it fails, see man mkfs.xfs for the sunit= and swidth= options.
The chunk size is set when the array is created, via the mdadm --chunk option. Changing it on an already running array is an extremely long and probably not entirely safe operation, which in addition requires a recent version of mdadm (3.1) and a fairly new kernel (2.6.31+).
The default value in old versions of mdadm was 64K; in new versions it is 512K. According to the author of mdadm, "512K suits new drives better"; in my opinion it is not that simple, and for now I prefer to keep using 64K.
The author of another set of test results recommended a chunk size of 128K.
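A sketch of setting the chunk size explicitly at creation time (the device names /dev/sd[bcd]1 and the array name /dev/md1 are hypothetical; the --chunk value is given in kilobytes by default):

mdadm --create /dev/md1 --level=5 --raid-devices=3 --chunk=64 /dev/sd[bcd]1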
The read-ahead value of the array can be viewed and changed with blockdev:

# View the current value:
blockdev --getra /dev/md?
# Change it:
blockdev --setra <new value in sectors (512 bytes)> /dev/md?
I have not experimented with this yet; the read speed of the array (on a three-disk RAID5 it reaches the sum of the linear speeds of two disks) suits me as it is.