The SocialTools Debian Root-on-LVM-on-RAID-on-IDE HOWTO
By Benjamin Geer and Toni Prug
Version 1.8.1
NOTE: We have upgraded to Debian stable (sarge). These instructions are no longer valid, since the sarge installer has RAID and LVM options. Use with caution. (2 July 2005)
Copyright and Disclaimer
Copyright © 2003 Benjamin Geer and Toni Prug
Permission is granted to copy, distribute and/or modify this document under the terms of the
GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts.
All information herein is presented "as-is", with no warranties expressed nor implied. There is no guarantee whatsoever, that any of the software, or this information, is in any way correct, nor suited for any use whatsoever. Back up all your data before experimenting with this. Better safe than sorry.
Conventions in this Document
Most of the process described here needs to be done with root permissions, so we won't remind you to log in as root, or to type
sudo.
Introduction
Background on RAID
Source:
Software RAID HOWTO
On a machine with a single hard disk, if the disk crashes and you have backups of your data, you can be glad you haven't lost any data. But you still have to reinstall and reconfigure the operating system. Even if you have a spare hard disk handy, this is time-consuming. During that time, your server is down, and you may find yourself working late into the night to fix it.
RAID (Redundant Array of Independent Disks) allows the operating system to treat an array of disks as a single disk. With mirroring (RAID-1, the level used in this HOWTO), data written to the array is written to all the disks. If a disk crashes, your system can continue running with the remaining disks. You can remove the damaged disk and install a new one, and RAID will copy the data from the other disks on to the new disk.
RAID can be implemented in software or hardware; here we will be setting up software RAID, using the implementation in the Linux 2.4 kernel. The implementation in kernel 2.6 is different, and is not covered in this HOWTO.
Background on Logical Volume Management
Source:
LVM HOWTO.
Logical volume management is a more flexible alternative to disk partitions. It allows you to allocate drive space (perhaps including several drives) to 'logical volumes', which behave like resizable partitions. You can reallocate space to logical volumes as needed, while the system is running. If you run out of space on a volume, you can give it some space from another volume, or from a new disk drive.
This HOWTO uses the logical volume management implementation in the Linux 2.4 kernel. The implementation in kernel 2.6 is different, and is not covered in this HOWTO.
Requirements
- A machine with two identical blank IDE drives. If at all possible, get two drives of the same brand and the exact same model number. If the model number is different, the drives may have different geometries (even if they have the same total size), which will make it more difficult to set up RAID. It is also a good idea to put the two drives on separate IDE controllers, if possible (see the Software-RAID-HOWTO for the reasons); in this document, we will assume your drives are on the primary and secondary IDE controllers, i.e.
/dev/hda (primary master) and /dev/hdd (secondary slave, with a CD-ROM drive as /dev/hdc, the secondary master). However, please note that we have not tested this configuration, and have in fact only tested LVM-on-RAID using /dev/hda and /dev/hdb. If you test the two-controller configuration, please let us know.
- Installation media for Debian Woody.
- Patience, pizza (or kebabs) and your favourite caffeinated beverage. It took us about 12 hours to do this for the first time; we spent most of that time searching for information on the Internet. This guide is meant to shorten that process.
Note: If you're sure your drives are identical, but they're still being recognised with different geometries, this may be because one of them is on an IDE controller that supports UDMA/66, while the other is on a controller that only supports UDMA/33. A workaround is to put both drives on the same controller.
The Goal
We want to:
- Use RAID to provide basic redundancy at low cost, in case one disk fails.
- Allocate the space in the RAID array to logical volumes using LVM.
To accomplish this, we'll use RAID-1, which provides basic redundancy, using two IDE drives of the same size; we'll call these 'disk 1' and 'disk 2'.
More specifically, we want the final result to look like this:
| | RAID Array /dev/md0 | RAID Array /dev/md1 |
| Physical Partitions | /dev/hda1, /dev/hdd1 | /dev/hda2, /dev/hdd2 |
| Volume Group | (none) | /dev/vg1 |
| Filesystems | /boot | /usr, /home, etc. |
There are two RAID arrays. Each RAID array is composed of two physical disk partitions, one on each drive. One of them has a logical volume group on it, which is divided into logical volumes such as
/usr and
/home. The other one has the
/boot filesystem on it, containing the files needed to boot up the operating system; it isn't included in the LVM setup, because the LILO boot loader can't handle logical volumes.
RAID arrays are traditionally called
md on Linux; it stands for Multiple Devices.
Overview of the Procedure
- Partition the two disks identically.
- Install a basic system on disk 1.
- Create two RAID arrays consisting only of the partitions on disk 2.
- Set up logical volumes on top of those RAID arrays.
- Copy everything from disk 1 into the logical volumes on disk 2.
- Make disk 2 bootable, and reboot into disk 2.
- Add the partitions on disk 1 to the RAID arrays (this copies the contents of disk 2 on to disk 1).
- Make disk 1 bootable again, and reboot into disk 1.
The result is that the two disks are identical; you can configure the BIOS to boot into either one.
Preparation
Partitioning the Disks
On each disk, make one small partition (about 16-24 MB), and another partition containing the remaining space. Make sure the corresponding partitions are
exactly the same sizes on each disk. If by some misfortune, the two disks don't have the same geometry, it may not be possible to make partitions of the same size on both disks. In that case, make slightly smaller partitions on disk 2, to ensure that, when RAIDs are set up on disk 2, they'll fit into the partitions on disk 1.
To partition disk 2 exactly like disk 1, you can type:
sfdisk -d /dev/hda | sfdisk /dev/hdd
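Before going further, it can be worth confirming that the two partition tables really match. The following is a minimal sketch, not part of the procedure we tested. It assumes you have saved each disk's table to a file with sfdisk -d, and that the partition lines in the dump look like '/dev/hda1 : start= 63, size= 48132, Id=83'. The function names layout and same_layout, and the file names in the example, are our own inventions.

```shell
# layout FILE: print "partition-number start size" for each partition line
# in a dump produced by sfdisk -d (dump format assumed, see above).
layout() {
    sed -n 's|^/dev/[a-z]*\([0-9][0-9]*\) *: *start= *\([0-9][0-9]*\), *size= *\([0-9][0-9]*\).*|\1 \2 \3|p' "$1"
}

# same_layout FILE_A FILE_B: succeed if FILE_A contains at least one
# partition line and both dumps list the same partition numbers, start
# sectors and sizes. Partition types (Id=...) are deliberately ignored.
same_layout() {
    a=$(layout "$1")
    [ -n "$a" ] && [ "$a" = "$(layout "$2")" ]
}

# Example (hypothetical file names):
#   sfdisk -d /dev/hda > /tmp/hda.dump
#   sfdisk -d /dev/hdd > /tmp/hdd.dump
#   same_layout /tmp/hda.dump /tmp/hdd.dump && echo "layouts match"
```

Partition types are ignored because, later in this HOWTO, the two disks temporarily have different type codes.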
Initial Debian Install
Install Debian on disk 1, using the small partition for
/boot, and the larger one as the root partition. Use the
bf24 kernel and
ext3 partitions, as described in
StandardDebianInstall. Using the BIOS, set disk 1 as the first disk in the boot order.
Now is a good time to install devfsd and raidtools2:
apt-get install devfsd raidtools2
Recompiling the Kernel
First install a new
procps as described in
StandardKernelConfig (the one that comes with Debian Woody has trouble with 2.4 kernels).
Download the latest 2.4 kernel (at least 2.4.26) from your nearest
kernel.org mirror. Install Debian's
kernel-package:
apt-get install kernel-package
Set the following line in
/etc/kernel-pkg.conf, to tell
kernel-package to put symbolic links to kernel images in
/boot instead of in
/:
image_in_boot := True
Unpack and configure the kernel, using the
.config from your Debian
bf24 kernel as a starting point:
cd /usr/src
tar jxf linux-2.4.26.tar.bz2
cd linux-2.4.26
cp /boot/config-2.4.18.bf2.4 .config
make oldconfig
(hold down the Enter key to accept all the defaults)
make menuconfig
Go through the options, removing anything you're sure you don't need. (See
StandardKernelConfig). Make sure you include these options:
- Block Devices
- Loopback Device Support (
CONFIG_BLK_DEV_LOOP): Y
- RAM Disk Support (
CONFIG_BLK_DEV_RAM): Y
- Initial RAM Disk (initrd) Support (
CONFIG_BLK_DEV_INITRD): Y
- ATA/IDE/MFM/RLL support
- IDE, ATA and ATAPI Block devices
- Use PCI DMA by default when available (
CONFIG_IDEDMA_PCI_AUTO): Y
- Multi-device support (RAID and LVM): say yes to everything here.
- File Systems
- Device file system (
CONFIG_DEVFS_FS): Y
- Automatically mount at boot (
CONFIG_DEVFS_MOUNT): Y
If you neglect
CONFIG_DEVFS_MOUNT,
devfsd will fail to start, with the error message:
Error opening file: ".devfsd" No such file or directory
(Source:
Devfs FAQ.)
We found that we had to leave 'Enable loadable module support' turned on (otherwise the
lvmcreate_initrd command, described later, would generate spurious errors), but disable 'Set version information on all module symbols' and 'kernel module loader' (otherwise the
modutils that comes with Debian Woody seemed to create spurious modules, once again causing
lvmcreate_initrd to report errors). However, another user has reported that he didn't experience any problems with 'kernel module loader' activated.
Compile and install the kernel:
make-kpkg --revision=custom.1.0 kernel_image
dpkg -i ../kernel-image-2.4.26_custom.1.0_i386.deb
Say no to all the questions it asks. Check that it's made correct symbolic links in
/boot:
/boot/vmlinuz -> /boot/vmlinuz-2.4.26
/boot/vmlinuz.old -> /boot/vmlinuz-2.4.18-bf2.4
Edit your
/etc/lilo.conf and make sure there is an
image stanza pointing to an image in
/boot for each kernel you now have installed, like this:
default=Linux
image=/boot/vmlinuz
label=Linux
read-only
image=/boot/vmlinuz.old
label=LinuxOLD
read-only
Make sure that
/etc/lilo.conf also contains the following lines:
delay=20
prompt
timeout=100
Rewrite your changes to the disk's master boot record (MBR) by running
lilo. Make sure it doesn't report any errors, or say it skipped any images.
Reboot to make sure the new kernel works:
shutdown -r now
Making RAID Arrays
Use
cfdisk to change the partition types of both partitions on disk 2 (
not on disk 1) to 'Linux RAID Autodetect' (hexadecimal code
fd):
cfdisk /dev/hdd
Make sure to write your changes to the partition table before quitting
cfdisk. Reboot again:
shutdown -r now
Create
/etc/raidtab as follows:
raiddev /dev/md0
raid-level 1
nr-raid-disks 2
nr-spare-disks 0
chunk-size 32
persistent-superblock 1
device /dev/hdd1
raid-disk 0
device /dev/hda1
failed-disk 1
raiddev /dev/md1
raid-level 1
nr-raid-disks 2
nr-spare-disks 0
chunk-size 32
persistent-superblock 1
device /dev/hdd2
raid-disk 0
device /dev/hda2
failed-disk 1
Create the RAID arrays:
sudo mkraid --force /dev/md0
sudo mkraid --force /dev/md1
You'll be told to type a different command if you really want to do this; it's OK to go ahead and type it.
Setting up LVM
Install the LVM software.
apt-get install lvm-common lvm10
Allocate
/dev/md1 as a physical volume managed by LVM:
vgscan
pvcreate /dev/md1
pvscan
The output of
pvscan should show that you now have a physical volume. Then:
vgcreate vg1 /dev/md/1
This should produce a lot of output but no errors.
Use
lvcreate to create a logical volume for each filesystem you'll want. For example:
lvcreate -L 1.56G -n root vg1
lvcreate -L 7G -n home vg1
lvcreate -L 4G -n usr vg1
lvcreate -L 4G -n local vg1
lvcreate -L 2.93G -n var vg1
lvcreate -L 8.69G -n db vg1
lvcreate -L 7.81G -n mail vg1
lvcreate -L 13.77G -n www vg1
lvcreate -L 8G -n chroot vg1
lvcreate -L 1000M -n tmp vg1
lvcreate -L 512M -n swap vg1
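Before running a list of lvcreate commands like this, it helps to add up the sizes and compare the total with the space available in the volume group. The helper below is only a sketch: total_mb is a name we made up, it understands only the G and M suffixes used above, and it ignores the fact that LVM rounds each volume to a whole number of extents.

```shell
# total_mb SIZE...: add up sizes written as for lvcreate -L (for example
# 1.56G, 1000M or 512M) and print the total in megabytes, rounded to the
# nearest whole number. Sizes without a G or M suffix are ignored.
total_mb() {
    for size in "$@"; do
        echo "$size"
    done | awk '
        /[gG]$/ { sub(/[gG]$/, ""); total += $0 * 1024; next }
        /[mM]$/ { sub(/[mM]$/, ""); total += $0; next }
        END { printf "%d\n", total + 0.5 }'
}

# Example: total_mb 1.56G 7G 4G 1000M 512M
```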
One of your logical volumes should be called
root (for the
/ filesystem), and one should be called
swap (for kernel swap space). It's a good idea to have 2-4 times as much swap space as you have memory.
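The swap rule of thumb is simple arithmetic. As a sketch (the function names swap_mb and mem_total_kb are ours, and we assume /proc/meminfo has a 'MemTotal:' line giving the memory size in kilobytes):

```shell
# swap_mb MEM_KB [FACTOR]: print a swap size in megabytes for a machine
# with MEM_KB kilobytes of memory, using FACTOR (default 2) times the
# memory size.
swap_mb() {
    echo $(( $1 * ${2:-2} / 1024 ))
}

# mem_total_kb [FILE]: print the MemTotal value, in kilobytes, from FILE
# (default /proc/meminfo).
mem_total_kb() {
    awk '/^MemTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Example: lvcreate -L "$(swap_mb "$(mem_total_kb)")M" -n swap vg1
```

With 256 MB of memory (262144 KB) and the default factor of 2, swap_mb prints 512, which matches the 512M swap volume in the example above.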
Making Filesystems
Make an
ext3 filesystem on
/dev/md0 (the RAID array that we'll use for
/boot):
mke2fs -j /dev/md0
tune2fs -c 0 -i 0 /dev/md0
Note that we're using
tune2fs to disable automatic filesystem checking (
fsck). For an explanation of this, see the section 'Filesystem check intervals' in Andrew Morton's document
Using the ext3 filesystem in 2.4 kernels.
Make an
ext3 filesystem on each of the logical volumes you created (except
swap), e.g.:
mke2fs -j /dev/vg1/root
tune2fs -c 0 -i 0 /dev/vg1/root
mke2fs -j /dev/vg1/usr
tune2fs -c 0 -i 0 /dev/vg1/usr
mke2fs -j /dev/vg1/home
tune2fs -c 0 -i 0 /dev/vg1/home
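If you have many logical volumes, typing these pairs of commands by hand gets tedious. One possible shortcut, shown here only as a sketch, is a small function that prints the commands so you can review them before running anything. The name mkfs_commands is our own; the commands it prints are exactly the ones shown above.

```shell
# mkfs_commands VG LV...: print the mke2fs and tune2fs commands for each
# logical volume LV in volume group VG, skipping a volume named swap.
# Nothing is executed; the commands are only printed.
mkfs_commands() {
    vg=$1
    shift
    for lv in "$@"; do
        [ "$lv" = swap ] && continue
        echo "mke2fs -j /dev/$vg/$lv"
        echo "tune2fs -c 0 -i 0 /dev/$vg/$lv"
    done
}

# Example: review the output first, then run it by piping it to sh:
#   mkfs_commands vg1 root usr home
#   mkfs_commands vg1 root usr home | sh
```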
Create a swap area on your
swap logical volume:
mkswap /dev/vg1/swap
Transferring Filesystems to LVM-on-RAID
Create a mount point called
/mnt, where you'll mount your new root filesystem:
mkdir -p /mnt
Add a line to
/etc/fstab to specify that
/dev/vg1/root should be mounted on
/mnt:
/dev/vg1/root /mnt ext3 defaults 0 0
Mount that filesystem:
mount /mnt
Make more mount points under
/mnt, where you'll mount your new logical volumes (except
swap):
mkdir /mnt/boot
mkdir /mnt/usr
mkdir /mnt/home
In
/etc/fstab, add a line for
/dev/md0 and for each of your logical volumes (except
swap), so that they mount under
/mnt, like this:
/dev/md0 /mnt/boot ext3 defaults 0 0
/dev/vg1/usr /mnt/usr ext3 defaults 0 0
/dev/vg1/home /mnt/home ext3 defaults 0 0
Mount the filesystems:
mount -a
(Note that when we want a filesystem's mount point to be within another filesystem, we have to mount the outer filesystem first, then create the mount point, then mount the inner filesystem. The same is true if, for example, you mount a filesystem for
/usr/local inside
/usr.)
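Since mount -a works through /etc/fstab from top to bottom, one way to get the order right is to list the outer mount points before the inner ones. As a small sketch (mount_order is our own name), sorting the mount points in the C locale is enough, because a directory's path is always a prefix of the paths of the directories inside it:

```shell
# mount_order: read mount points on standard input, one per line, and
# print them so that every directory comes before the directories
# inside it.
mount_order() {
    LC_ALL=C sort
}

# Example:
#   printf '%s\n' /mnt/usr/local /mnt/usr /mnt /mnt/boot | mount_order
```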
Copy everything from disk 1 to disk 2 using
cp -a. First, copy everything containing regular files (i.e. not
/dev,
/cdrom,
/floppy or
/proc) into
/mnt:
cp -a /boot /bin /etc /home /lib /opt /root /sbin /usr /var /mnt
then make the rest of the top-level directories under
/mnt (including
/initrd, which the kernel seems to expect):
mkdir /mnt/cdrom /mnt/floppy /mnt/proc /mnt/dev /mnt/initrd
Making an initrd file for LVM
The kernel will need an
initrd file, created specifically for LVM, which it can load into a ramdisk when booting. To create this file, you need to use the
lvmcreate_initrd command. Normally,
lvmcreate_initrd calculates the size of this file automatically. However, if you have a big hard disk (e.g. 80GB), this automatically-calculated size might not be big enough, and when the system boots, you will get an error such as 'ERROR 28 writing volume group backup file /etc/...'. The way to avoid this problem is to calculate the size yourself, as described in
this message by the LVM author.
Start with the space reported by the following command.
du -chs /mnt/etc/lvmconf
(If you're reading this because you've already successfully set up RAID and LVM, and you're just recompiling your kernel, use
/etc/lvmconf instead.) Add 4MB. Convert that number to kilobytes. (We found that for 80G disks, 6000K was about right.) Run the following command, using the size you calculated as the value of INITRDSIZE, and passing the version number of your kernel as an argument.
INITRDSIZE=6000 lvmcreate_initrd 2.4.26
This creates an
initrd file, called something like
initrd-lvm-2.4.26.gz, in
/boot. Copy that file into
/mnt/boot.
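The size calculation described above can be written down as a short sketch. The function names initrd_size_kb and dir_kb are ours, and we assume that asking du for kilobytes (du -sk) is acceptable in place of the du -chs output shown earlier.

```shell
# initrd_size_kb LVMCONF_KB: add 4 MB (4096 KB) to the size of the
# lvmconf directory, given in kilobytes, and print the result in KB.
initrd_size_kb() {
    echo $(( $1 + 4096 ))
}

# dir_kb DIR: print the disk space used by DIR in kilobytes.
dir_kb() {
    du -sk "$1" | awk '{ print $1 }'
}

# Example, using the kernel version from this HOWTO:
#   INITRDSIZE=$(initrd_size_kb "$(dir_kb /mnt/etc/lvmconf)") lvmcreate_initrd 2.4.26
```

For instance, if the directory occupies 1904 KB, initrd_size_kb 1904 prints 6000, the value we found to be about right for 80G disks.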
Making Disk 2 Bootable
Now that you've copied the data from disk 1 to disk 2, you can fix
/mnt/etc/fstab (note:
not /etc/fstab) so it reflects your final setup. Comment out the lines that refer to
/dev/hda1 and
/dev/hda2. Take out the
/mnt from the LVM filesystem paths, fix the options to reflect what you'd have in a normal system, and add the
swap volume:
/dev/vg1/root / ext3 errors=remount-ro 0 1
/dev/md0 /boot ext3 defaults 0 2
/dev/vg1/swap none swap sw 0 0
/dev/vg1/usr /usr ext3 defaults 0 0
/dev/vg1/home /home ext3 defaults 0 0
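If you have many volumes, stripping the /mnt prefix by hand is error-prone. The sketch below shows one way to do that part mechanically; final_fstab is a name we made up, and the function rewrites only the mount point column. You still need to comment out the old /dev/hda lines, fix the options and pass numbers, and add the swap line yourself.

```shell
# final_fstab PREFIX FILE: print FILE with PREFIX removed from the start
# of every mount point (second column). A mount point equal to PREFIX
# becomes /. Lines whose mount point is not under PREFIX are unchanged.
final_fstab() {
    awk -v prefix="$1" '
        $2 == prefix { $2 = "/" }
        index($2, prefix "/") == 1 { $2 = substr($2, length(prefix) + 1) }
        { print }' "$2"
}

# Example: write the result to a new file and review it before use.
#   final_fstab /mnt /etc/fstab > /mnt/etc/fstab.new
```

Note that awk rebuilds every line it changes with single spaces between the columns, which is harmless in fstab.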
Add the following to
/mnt/etc/lilo.conf (
not /etc/lilo.conf):
disk=/dev/hdd
bios=0x80
disk=/dev/hda
bios=0x81
If you get this wrong, you'll get a LILO error when you reboot: LILO will notice that the BIOS hasn't given it the right disk to boot from, and will say
L 07 07 07...
Add this line to make LILO use the kernels in our new
/boot filesystem:
boot=/dev/md0
Add this line to make LILO write its configuration to the master boot record (MBR) of drive 2 (but not drive 1):
raid-extra-boot="/dev/hdd"
In the
image stanza for the kernel you just installed, add the following lines, using your calculated
initrd size as the value of
ramdisk_size, and the filename of the
initrd file you created using
lvmcreate_initrd.
initrd=/boot/initrd-lvm-2.4.26.gz
append="ramdisk_size=6000"
root=/dev/vg1/root
You can copy these lines into the other stanza, for your old kernel. Delete any other
root lines in the file.
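A missing kernel image or initrd file is an easy mistake to make at this point. As a rough check, and only as a sketch, the functions below list the files named in a lilo.conf and report any that are absent under a given root directory. The names lilo_files and missing_lilo_files are ours, and we assume each image= and initrd= setting is on a line of its own.

```shell
# lilo_files CONF: print the paths given on image= and initrd= lines.
lilo_files() {
    sed -n \
        -e 's/^[[:space:]]*image[[:space:]]*=[[:space:]]*//p' \
        -e 's/^[[:space:]]*initrd[[:space:]]*=[[:space:]]*//p' "$1"
}

# missing_lilo_files ROOT CONF: print those paths that are neither an
# existing file nor a symbolic link when looked up under ROOT.
# (Symbolic links are accepted without following them, because absolute
# links would be resolved against the running system, not against ROOT.)
missing_lilo_files() {
    lilo_files "$2" | while read -r path; do
        [ -e "$1$path" ] || [ -L "$1$path" ] || echo "$path"
    done
}

# Example: missing_lilo_files /mnt /mnt/etc/lilo.conf
```

If the example prints nothing, every file named in the configuration is present under /mnt.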
Mount the
/dev filesystem in
/mnt so that LILO can use it:
mount --bind /dev /mnt/dev
Run LILO in a
chroot:
chroot /mnt /sbin/lilo
Before you reboot, make sure LILO hasn't overwritten its configuration in the MBR on disk 1. To do this, run
lilo again, the normal way:
lilo
Reboot. In the BIOS setup, move disk 2 into first position in the boot order, and make sure disk 1 is in second position. Then let the system boot. If it fails to boot from disk 2, your initrd file might be too small. To fix it, reboot from disk 1, make a new initrd file as described
above, mount
/dev/md0 temporarily as
/mnt/boot, copy the new initrd file into
/mnt/boot, and try again to reboot from disk 2.
Including Disk 1 in the LVM-on-RAID System
If you've followed these instructions carefully, the permissions on your mount points should be correct, but if not, you might want to check them. Here are some examples:
chmod a+rwxt /tmp
chgrp staff /usr/local
chmod g+ws /usr/local
chgrp mail /var/mail
chmod g+ws /var/mail
Use
cfdisk to change the partition types of
/dev/hda to
fd ('Linux RAID Autodetect'), just as you did before for
/dev/hdd.
In
/etc/raidtab, change
failed-disk to
raid-disk.
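You can make this change in a text editor, or with sed. The following is a sketch of the sed approach; enable_failed_disks is a name we invented, and the function only prints the modified file so that you can inspect it before replacing the original.

```shell
# enable_failed_disks FILE: print FILE with the failed-disk keyword at
# the start of a line (after optional indentation) replaced by
# raid-disk. The rest of each line is left unchanged.
enable_failed_disks() {
    sed 's/^\([[:space:]]*\)failed-disk\([[:space:]]\)/\1raid-disk\2/' "$1"
}

# Example: write to a new file, check it, then move it into place.
#   enable_failed_disks /etc/raidtab > /etc/raidtab.new
#   mv /etc/raidtab.new /etc/raidtab
```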
Add the partitions on disk 1 to the RAID arrays:
raidhotadd /dev/md0 /dev/hda1
raidhotadd /dev/md1 /dev/hda2
SARGE NOTE: mdadm is now used instead of raidtools on Debian stable. The syntax for adding partitions now looks like this:
mdadm /dev/md0 --add /dev/hda1
mdadm /dev/md1 --add /dev/hda2
If you make a mistake typing the above, you can accidentally add two partitions on the same disk to the same RAID array. As a result, the second
raidhotadd won't work, and you'll get error -17 (invalid argument). Use
raidhotremove to remove the extra partition (which should go in the other array), and try again.
RAID will now start recovering disk 1 (i.e. copying data from disk 2 to disk 1). You can monitor its progress by typing:
while true; do clear; cat /proc/mdstat; sleep 5; done
This will take a while, possibly several hours, depending on your hard disks. (On our machines, it has tended to take about 2 hours for an 80GB disk.)
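If you would rather be told when the rebuild has finished than watch the screen, something like the following sketch may help. It assumes that /proc/mdstat reports a rebuild on a line containing 'recovery = N%' or 'resync = N%'; the function names rebuild_progress and wait_for_rebuild are our own.

```shell
# rebuild_progress [FILE]: print the percentage from each line of FILE
# (default /proc/mdstat) that reports a recovery or resync in progress.
# Prints nothing if no such line is found.
rebuild_progress() {
    sed -n \
        -e 's/.*recovery *= *\([0-9.]*%\).*/\1/p' \
        -e 's/.*resync *= *\([0-9.]*%\).*/\1/p' "${1:-/proc/mdstat}"
}

# wait_for_rebuild [FILE]: check once a minute, and return when no
# array reports a rebuild in progress.
wait_for_rebuild() {
    while [ -n "$(rebuild_progress "$@")" ]; do
        sleep 60
    done
}

# Example: wait_for_rebuild && echo "rebuild finished"
```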
Once it's finished, edit
/etc/lilo.conf to remove the
disk and
bios lines. Add
/dev/hda to
raid-extra-boot:
raid-extra-boot="/dev/hda, /dev/hdd"
Run
lilo to save the boot information in the MBRs of both disks.
Reboot, and change the BIOS settings to put disk 1 back at the top of the boot order. It should boot normally.
Congratulations! You now have your LVM-on-RAID system running.
Testing
To test your setup, use the procedure described in the
Software RAID HOWTO. The idea is to see whether the system will run with one of the drives unplugged, then to see if it will rebuild that drive once the drive is plugged in again. When one drive is unplugged or needs to be rebuilt, you must tell the BIOS to boot from the other drive.
- Power down the machine.
- Unplug the power cable from one of the disks (in this example, /dev/hdd).
- Restart the system, and log in. If everything looks OK, this means that your system can run without that drive.
- Type
cat /proc/mdstat. It should show that one of the partitions is missing from each array.
- Power down the machine, plug the drive back in, and restart the system.
- Tell the kernel to rebuild the first partition on
/dev/hdd1, by typing: sudo raidhotadd /dev/md0 /dev/hdd1
- Look at
/proc/mdstat again; you should see the kernel rebuilding /dev/hdd1. This should only take a few seconds.
- Type:
sudo raidhotadd /dev/md1 /dev/hdd2
- The kernel should now start rebuilding
/dev/hdd2. This will take just as long as it took when you first set it up, in the last section.
Then repeat the process with the other drive.
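When reading /proc/mdstat during these tests, the thing to look for is an underscore in the status brackets, for example [U_] instead of [UU]. As a sketch, assuming the status appears on the line that gives the number of blocks, the function below (degraded_arrays, our own name) prints the arrays that are missing a partition:

```shell
# degraded_arrays [FILE]: print the name of each array in FILE (default
# /proc/mdstat) whose status brackets contain an underscore, meaning
# that at least one member is missing.
degraded_arrays() {
    awk '/^md[0-9]+ :/ { name = $1 }
         /blocks/ && /\[[U_]*_[U_]*\]/ { print name }' "${1:-/proc/mdstat}"
}

# Example: degraded_arrays
```

With one drive unplugged, both arrays should be listed; once both rebuilds have finished, the function should print nothing.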
Monitoring
To get email notification when one of your drives fails, first:
mkdir /etc/raidcheck.d
Install
/etc/raidcheck.d/readme.txt and
/etc/raidcheck.d/raidcheck. Then:
chown root:root /etc/raidcheck.d/raidcheck
chmod 755 /etc/raidcheck.d/raidcheck
cp /proc/mdstat /etc/raidcheck.d/mdstat.reference
cp /etc/raidcheck.d/raidcheck /etc/cron.daily/raidcheck
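The raidcheck script itself is not reproduced here. To illustrate the idea behind the reference file, here is a minimal sketch of a comparison between the saved copy and the current state of /proc/mdstat. This is not the raidcheck script: the function name mdstat_changed and the mail command in the example are our own assumptions.

```shell
# mdstat_changed REFERENCE [CURRENT]: succeed if CURRENT (default
# /proc/mdstat) differs from the saved REFERENCE copy.
mdstat_changed() {
    ! cmp -s "$1" "${2:-/proc/mdstat}"
}

# Example, suitable for a daily cron job (assumes a working mail command):
#   mdstat_changed /etc/raidcheck.d/mdstat.reference &&
#       mail -s "RAID status changed" root < /proc/mdstat
```

A comparison like this also reports a change while an array is rebuilding, so the reference copy should be taken when both arrays are complete and idle.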
Recovery
If one of your drives should fail, this is the general procedure (assuming
/dev/hdd has failed):
- Replace the drive.
- Partition
/dev/hdd with exactly the same partitions as /dev/hda. You can either use sfdisk as shown above (sfdisk -d /dev/hda | sfdisk /dev/hdd), or if you prefer to see what you're doing:
- Type
cfdisk /dev/hda to look at the partitions on /dev/hda, copy down their sizes, and quit cfdisk.
- Type
cfdisk /dev/hdd, and create partitions of the same sizes. Set their type to Linux RAID autodetect.
- Tell
cfdisk to write your changes to the partition table, and quit cfdisk.
- Check the contents of
/etc/raidtab, and change any failed-disk to raid-disk.
- Use
raidhotadd to add the new partitions to each array:
- raidhotadd /dev/md0 /dev/hdd1
- raidhotadd /dev/md1 /dev/hdd2
- Run
lilo again, to write the master boot record on the new drive.
Using LVM
Resizing, adding and removing logical volumes is easy, but be sure to read the relevant sections of the
LVM HOWTO before you attempt these operations.
The easiest approach is to unmount the filesystem in question, then use the
e2fsadm command, as described in the
LVM HOWTO.
Example: Creating a New Logical Volume
We have a top-level directory,
/chroot, which is currently in the root filesystem. BIND is
running chrooted in this directory, and we'd like to put it on its own logical volume, as a security measure; this way, if it fills up, it won't disturb any of the other filesystems. We have another logical volume,
/dev/vg1/db, with some extra space on it. We'll reduce the size of
/dev/vg1/db by 100 MB, and use that space for the new logical volume.
First, we reduce the size of
/dev/vg1/db and the filesystem it contains.
umount /dev/vg1/db
e2fsadm --size -100M /dev/vg1/db
We create a new logical volume, and put an
ext3 filesystem on it.
lvcreate --size 100M -n chroot vg1
mke2fs -j /dev/vg1/chroot
tune2fs -c 0 -i 0 /dev/vg1/chroot
We stop BIND, move our current
/chroot out of the way, and make a new mount point for the new
/chroot.
/etc/init.d/bind9 stop
mv /chroot /chroot-old
mkdir /chroot
We add a line to
/etc/fstab to mount the new filesystem:
/dev/vg1/chroot /chroot ext3 defaults 0 0
We remount our filesystems.
mount -a
We copy everything from
/chroot-old into the new
/chroot, and delete
/chroot-old.
cp -a /chroot-old/. /chroot
rm -rf /chroot-old
We can now restart BIND.
/etc/init.d/bind9 start
Resizing the Root Filesystem
You'll need the
ext2resize package:
apt-get install ext2resize
Normally you need to unmount a filesystem before resizing it. In order to unmount the root filesystem, you have to reboot the system into single-user mode (type
Linux 1 at the
boot: prompt), and log in using the root password. You can then unmount and resize the root filesystem, and resize
/dev/vg1/root, using the procedures described in the
LVM HOWTO. Note:
- As explained in the LVM HOWTO, you have to:
- Extend the volume before extending the filesystem.
- Reduce the filesystem before reducing the volume.
- It seems that
e2fsadm and resize2fs don't work for the root filesystem; you need to use ext2resize, along with lvextend or lvreduce.
- You have to unmount the filesystem immediately before resizing it; if you do anything else in between, it seems to remount itself.
Upgrading the Kernel
Once your RAID and LVM configuration is working properly, the next time you want to upgrade the kernel, you must:
- Copy your old kernel's
.config into the new kernel directory, type make oldconfig, and follow the prompts regarding any new options.
- Compile and install the kernel as described above.
- Run
lvmcreate_initrd again as described above (the initrd file will be created in /boot, where it belongs).
- Edit your
/etc/lilo.conf so that it contains a stanza for the new kernel, and make that kernel the default. This stanza must contain an initrd line like the one for the old kernel (but pointing to the new initrd file you just made in /boot). Don't forget to include the append="ramdisk_size=size" line.
- Double-check that any symbolic links in
/boot are correct, and that they correspond to what's in /etc/lilo.conf.
- Run
lilo to save your changes, and reboot. You should be able to choose either kernel from the boot menu.