@ wrote... (7 years, 3 months ago)

I currently have a two-disk raid1 array (2TB in size) that is the storage for my LVM, but it's running out of space. I want to add two more drives to bring available storage to 4TB, and I also want to convert raid1 to raid10 to increase the performance of my storage. Here's how I did it.

This post from ServerFault, especially the last answer by user75601, was instrumental. Note that the accepted answer is wrong, so don't do that; do this.

This page also explains how to move your lvm physical extents from your old array to the new one.

One of the many awesome things about LVM is that this entire procedure is going to happen with the server up and running, in runlevel 3 and hosting three virtual machines. If you're not using LVM I'll explain what to do differently, but without any command output since I never actually did it; in that case, reboot to single user mode first.

So here we go, step 1: BACKUP! Seriously, we're going to be flying Scottish here (if you wear a kilt with no underwear, you have no backup against errant gusts of wind), and a ripple in your electricity or an old fashioned bug could leave you with no data and no chance of recovery.

So step one, backup. In my case I bought a 3TB drive, a USB 3.0 add-on card and a USB 3.0 external enclosure, and made a copy of everything. My "new" drives for this project were actually being used with zfs-fuse, rsync and snapshots as online backups, so I backed up with:

rsync -a --delete --progress /tank/ /pool/
zfs snapshot -r pool@`date +"%Y-%m-%d-%H%M"`
zfs send -R pool@<the date-time> | ssh other zfs recv -F backup_1/pool

If you're going to be doing this remotely, run everything from within screen.

One of the things I found confusing was that none of the instructions on ServerFault really said which drive was which, so you had to figure out the intent. Since your drive mappings will be different from mine, I'll use the following notation in all the commands: the old drives from the existing raid1 will be od1 and od2, and the new drives that will start the raid10 array will be nd1 and nd2. The commands that take an md path will use old_raid and new_raid.

In my case:

old_raid = /dev/md2
new_raid = /dev/md/tank or /dev/md127
od1 = /dev/sdh1
od2 = /dev/sdf1
nd1 = /dev/sdc1
nd2 = /dev/sdd1
vg_tank is my LVM volume group
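Since every command below uses these placeholders, it can help to set them as shell variables up front and expand them as you paste. This is just a convenience sketch; the values are my device names, so substitute your own:

```shell
# my device mappings -- substitute your own before running anything
od1=/dev/sdh1        # old raid1 member 1
od2=/dev/sdf1        # old raid1 member 2
nd1=/dev/sdc1        # new drive 1
nd2=/dev/sdd1        # new drive 2
old_raid=/dev/md2
new_raid=/dev/md/tank

echo "$old_raid -> $new_raid"
```

The commands in this post write the placeholders bare (od1, new_raid, etc.), so either expand them by hand or use variables like these and write $od1, $new_raid in the commands.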

For the record, I'm doing this on Fedora 16:

# cat /etc/fedora-release 
Fedora release 16 (Verne)

# uname -a
Linux xavier.burgundywall.com 3.2.9-1.fc16.x86_64 #1 SMP Thu Mar 1 01:41:10 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

So for inputs I'll use od1/nd2/etc., but the output is the real output and will therefore show /dev/sdh1, etc.

Make the new raid10 array with only two drives:

# mdadm -v --create new_raid --level=raid10 --raid-devices=4 nd1 missing nd2 missing
mdadm: layout defaults to n2
mdadm: layout defaults to n2
mdadm: chunk size defaults to 512K
mdadm: layout defaults to n2
mdadm: size set to 1953510912K
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md/tank started.

So I was a bit fancy here by calling my array /dev/md/tank instead of /dev/md3. The output of most commands will refer to new_raid as /dev/md127, but it's all good.
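As a quick sanity check on the size mdadm reports: a raid10 "near" layout with two copies (the n2 default above) stores every block twice, so usable capacity is (member size × devices) / copies. Plugging in the per-member size from the mdadm output:

```shell
# raid10 near-2: usable space = member size * devices / copies
member_kib=1953510912   # per-member size from the mdadm output above
devices=4
copies=2
echo $(( member_kib * devices / copies ))   # 3907021824, the figure /proc/mdstat reports
```

That is back-of-the-envelope arithmetic only, but it confirms the four ~2TB drives yield the ~4TB we were after.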

So we've made the new array; let's see what it looks like.

# mdadm --examine --scan
ARRAY /dev/md/0 metadata=1.2 UUID=9a44107c:2131baff:7f1eaada:c26cef3a name=xavier.burgundywall.com:0
ARRAY /dev/md/1 metadata=1.2 UUID=dd36d071:63816e05:afb4a9a9:48eccf4a name=xavier.burgundywall.com:1
ARRAY /dev/md/0 metadata=1.2 UUID=d0ae9e9b:6c9ed97c:527dc6b0:f1b190ca name=htpc.burgundywall.com:0
ARRAY /dev/md/tank metadata=1.2 UUID=442ad0a9:82d31bc2:a4399b19:8d340909 name=xavier.burgundywall.com:tank

Add the last line to /etc/mdadm.conf so everything works as expected on subsequent boots.

Just for fun:

# mdadm --stop new_raid
mdadm: stopped /dev/md/tank

# mdadm -A new_raid
mdadm: /dev/md/tank has been started with 2 drives (out of 4).

# cat /proc/mdstat 
Personalities : [raid1] [raid10] 
md127 : active raid10 sdc1[0] sdd1[2]
      3907021824 blocks super 1.2 512K chunks 2 near-copies [4/2] [U_U_]

md0 : active raid1 sde1[1] sdg1[0]
      511988 blocks super 1.2 [2/2] [UU]

md2 : active raid1 sdh1[1] sdf1[2]
      1953512400 blocks super 1.2 [2/2] [UU]

md1 : active raid1 sde2[1] sdg2[0]
      243684400 blocks super 1.2 [2/2] [UU]

So we can stop/start our array and it shows up as you'd expect in /proc/mdstat; we're doing well. But here's where we start getting dangerous (but not really, since we have verified good backups. Right?).

This is where I diverged from other people's instructions. I may be completely off base with this assumption, but I figured the drive we remove from our good array should be held in reserve for as long as possible in case something goes pear-shaped during the rest of this procedure. I'm assuming that I can force a raid drive marked as faulty to be used as the sole valid drive, but to be honest I didn't investigate that assumption.

# mdadm /dev/md2 --fail od2 --remove od2

We are well and truly committed now; note the [_U] for md2.

# cat /proc/mdstat 
Personalities : [raid1] [raid10] 
md127 : active raid10 sdc1[0] sdd1[2]
      3907021824 blocks super 1.2 512K chunks 2 near-copies [4/2] [U_U_]

md0 : active raid1 sde1[1] sdg1[0]
      511988 blocks super 1.2 [2/2] [UU]

md2 : active raid1 sdh1[1]
      1953512400 blocks super 1.2 [2/1] [_U]

md1 : active raid1 sde2[1] sdg2[0]
      243684400 blocks super 1.2 [2/2] [UU]

If you're not using LVM, format your new array with your filesystem of choice (probably ext4) and copy everything across with cp -av /old/tank /new/tank. Then skip ahead to where we add the old drives to the new array.

Let's make the physical volume so we can add it to our volume group.

# pvcreate new_raid
  Writing physical volume data to disk "/dev/md/tank"
  Physical volume "/dev/md/tank" successfully created

# vgextend vg_tank new_raid
  Volume group "vg_tank" successfully extended

# pvs -o+pv_used
  PV         VG        Fmt  Attr PSize   PFree   Used  
  /dev/md1   vg_xavier lvm2 a--  232.39g 134.15g 98.24g
  /dev/md127 vg_tank   lvm2 a--    3.64t   3.64t     0 
  /dev/md2   vg_tank   lvm2 a--    1.82t      0   1.82t
  /dev/sda1  vg_ssd    lvm2 a--   55.89g      0  55.89g

So as you can see we've created the physical volume and added it to our volume group vg_tank.

Now let's actually move all the data off of our old array and onto the new one. I wish I had thought to run it under time, but I didn't. All of the commands so far run instantly, but this one takes a long time: moving 2TB of data on Western Digital Green drives (5400 rpm) took approximately 6 hours.

# pvmove -v old_raid new_raid
...
  /dev/md2: Moved: 50.0%
... about 6 hours later ...
  /dev/md2: Moved: 100.0%
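For a rough sense of why it takes this long, assume a sustained average of around 80MiB/s (my guess for 5400rpm Green drives, not a measured figure); moving ~1.82TiB then works out to about six hours:

```shell
# back-of-the-envelope pvmove duration; 80 MiB/s is an assumed average, not measured
data_mib=$(( 1862 * 1024 ))     # ~1.82 TiB expressed in MiB
speed_mib_s=80
seconds=$(( data_mib / speed_mib_s ))
echo "$(( seconds / 3600 )) hours"   # about 6 hours
```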

So now all our data is on our scary array, which is effectively raid0 since each mirror pair is missing its partner and nothing is mirrored yet. Good thing we have that backup, right?

So now we'll remove old_raid from the volume group. Actually, I forgot to remove old_raid from vg_tank until after I had destroyed old_raid, so I had to run vgreduce vg_tank --removemissing. But if I had been smarter, this is what would have happened.

# vgreduce vg_tank old_raid
   Removed "/dev/md2" from volume group "vg_tank"

# pvs -o+pv_used
  PV         VG        Fmt  Attr PSize   PFree   Used  
  /dev/md1   vg_xavier lvm2 a--  232.39g 134.15g 98.24g
  /dev/md127 vg_tank   lvm2 a--    3.64t   1.82t  1.82t
  /dev/sda1  vg_ssd    lvm2 a--   55.89g      0  55.89g

Look at that, we've got all our data on the new array and still have 1.82TB free.

Let's get rid of our old raid1 array for good and start turning our new raid0 into a proper raid10.

# mdadm --stop old_raid
mdadm: stopped /dev/md2

# remove old_raid from /etc/mdadm.conf

# mdadm --zero-superblock od1

# mdadm new_raid --add od1
mdadm: added /dev/sdh1

# cat /proc/mdstat 
Personalities : [raid1] [raid10] 
md127 : active raid10 sdh1[4] sdc1[0] sdd1[2]
      3907021824 blocks super 1.2 512K chunks 2 near-copies [4/2] [U_U_]
      [>....................]  recovery =  0.0% (625216/1953510912) finish=260.2min speed=125043K/sec

md0 : active raid1 sde1[1] sdg1[0]
      511988 blocks super 1.2 [2/2] [UU]

md1 : active raid1 sde2[1] sdg2[0]
      243684400 blocks super 1.2 [2/2] [UU]

This is where I went to bed. Presumably it took about 6 hours to sync up. I know the ETA says 260 minutes, but the drive gets slower as the read/write heads get closer to the center of the platters, so that 125MB/sec transfer rate doesn't hold; you can see that in the last update at 83.4%, where I'm only getting 71MB/sec.
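The finish= estimate is just the remaining blocks divided by the current instantaneous speed, which is why it's optimistic. Plugging in the numbers from the snapshot above:

```shell
# md's finish= is simple linear extrapolation: remaining / current speed
total_kib=1953510912
done_kib=625216
speed_kib_s=125043
echo "$(( (total_kib - done_kib) / speed_kib_s / 60 )) min"   # matches finish=260.2min
```

As the sync proceeds and the speed drops, the same arithmetic pushes the estimate further out, which is why the real sync took longer than 260 minutes.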

Okay, now is where we do the really scary bit and destroy our reserve drive from our original raid1 array and add it to the new array.

Pray to your gods that nd2 doesn't crater before this sync completes.

# mdadm --zero-superblock od2

# mdadm new_raid --add od2

# cat /proc/mdstat 
Personalities : [raid1] [raid10] 
md127 : active raid10 sdf1[5] sdh1[4] sdc1[0] sdd1[2]
      3907021824 blocks super 1.2 512K chunks 2 near-copies [4/3] [UUU_]
      [================>....]  recovery = 83.4% (1631012928/1953510912) finish=74.7min speed=71884K/sec

md0 : active raid1 sde1[1] sdg1[0]
      511988 blocks super 1.2 [2/2] [UU]

md1 : active raid1 sde2[1] sdg2[0]
      243684400 blocks super 1.2 [2/2] [UU]

As you can see, as I write this article my array isn't totally finished syncing up, but I'm confident the last 17% will be uneventful. Part of my optimism is that my server is attached to a UPS and the human aspect (this human, anyway) has now been removed. So in a bit over an hour this should all be finished.

Wish me luck and you as well.

Update:

# cat /proc/mdstat 
Personalities : [raid1] [raid10] 
md127 : active raid10 sdf1[5] sdh1[4] sdc1[0] sdd1[2]
      3907021824 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]

md0 : active raid1 sde1[1] sdg1[0]
      511988 blocks super 1.2 [2/2] [UU]

md1 : active raid1 sde2[1] sdg2[0]
      243684400 blocks super 1.2 [2/2] [UU]

w00t!

PS: the transfer speed bottomed out at roughly 55MB/sec at 99.9%

Category: tech, Tags: linux, lvm, raid
Comments: 2
Comments
1.
citizenkei @ November 20, 2014 wrote... (4 years, 8 months ago)

time pvmove -v old_raid new_raid

I've seen the following somewhere:

pvmove -i1 -v old_raid new_raid

which reports the status every second - nice to know where you're at :)

...
  /dev/md2: Moved: 67.0%
  /dev/md2: Moved: 67.0%
  /dev/md2: Moved: 67.0%
  /dev/md2: Moved: 67.0%
  /dev/md2: Moved: 67.0%
  /dev/md2: Moved: 67.1%
  /dev/md2: Moved: 67.1%
2.
Kurt @ December 13, 2014 wrote... (4 years, 7 months ago)

Thanks for the comment, I didn't know about that particular option.

I can't remember the interval, but plain pvmove does give status updates occasionally. Every percent? Every minute?
