mdadm: device or resource busy

Alex Boisvert - 07 Jul 2012

I just spent a few hours tracking down an issue with mdadm (the Linux utility used to manage software RAID devices) and figured I'd write a quick blog post to share the solution so others don't have to waste time on the same problem.

As a short background, we use mdadm to create RAID-0 striped devices for our Sugarcube analytics (OLAP) servers using Amazon EBS volumes.

The issue manifested itself as a random failure during device creation:

$ mdadm --create /dev/md0 --level=0 --chunk 256 --raid-devices=4 /dev/xvdh1 /dev/xvdh2 /dev/xvdh3 /dev/xvdh4
mdadm: Defaulting to version 1.2 metadata
mdadm: ADD_NEW_DISK for /dev/xvdh3 failed: Device or resource busy
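
If you hit the same error, a cheap first check is whether some process is visibly holding the device open (generic sleuthing, nothing mdadm-specific; /dev/xvdh3 is just the device from my setup):

$ sudo fuser -v /dev/xvdh3
$ sudo lsof /dev/xvdh3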

I searched and searched the interwebs and tried every trick I found, to no avail. We don't have dmraid installed on our Linux images (Ubuntu 12.04 LTS / Alestic cloud image), so there's no possible conflict there. All devices were clean, as they were freshly created EBS volumes, and I knew none of them were in use.
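
For the record, this is roughly what I mean by "clean": a freshly created volume should carry no md superblock, which mdadm --examine will confirm with something like:

$ sudo mdadm --examine /dev/xvdh1
mdadm: No md superblock detected on /dev/xvdh1.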

Before running mdadm --create, mdstat was clean:

$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
unused devices: <none>

And yet after running it, the member disks had ended up split across two different md arrays instead of all going into /dev/md0:

$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]

md127 : inactive xvdh4[3](S) xvdh3[2](S)
      1048573952 blocks super 1.2

md0 : inactive xvdh2[1](S) xvdh1[0](S)
      1048573952 blocks super 1.2

unused devices: <none>
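
If you end up in this half-assembled state, getting back to a clean slate looks roughly like this (a sketch; the device names are the ones from my setup):

$ sudo mdadm --stop /dev/md0
$ sudo mdadm --stop /dev/md127
$ sudo mdadm --zero-superblock /dev/xvdh[1234]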

Looking into dmesg didn't reveal much more, beyond confirming that md itself had failed to open one of the devices (the -16 returned by md_import_device is EBUSY, the same "Device or resource busy" error):

$ dmesg
...
[3963010.552493] md: bind<xvdh1>
[3963010.553011] md: bind<xvdh2>
[3963010.553040] md: could not open unknown-block(202,115).
[3963010.553052] md: md_import_device returned -16
[3963010.566543] md: bind<xvdh3>
[3963010.731009] md: bind<xvdh4>
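
As a small aside, the major:minor pair in the "could not open unknown-block(202,115)" line can be mapped back to a device node, which at least tells you which disk md choked on; on Xen instances the xvd* devices live on major 202, and here the pair presumably points at the same /dev/xvdh3 that mdadm complained about:

$ ls -l /dev/xvdh*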

And strangely, the creation or assembly would sometimes work and sometimes not:

$ mdadm --manage /dev/md0 --stop
mdadm: stopped /dev/md0

$ sudo mdadm --assemble --force /dev/md0 /dev/xvdh[1234]
mdadm: /dev/md0 has been started with 4 drives.

$ mdadm --manage /dev/md0 --stop
mdadm: stopped /dev/md0

$ sudo mdadm --assemble --force /dev/md0 /dev/xvdh[1234]
mdadm: cannot open device /dev/xvdh3: Device or resource busy

$ mdadm --manage /dev/md0 --stop
mdadm: stopped /dev/md0

$ sudo mdadm --assemble --force /dev/md0 /dev/xvdh[1234]
mdadm: cannot open device /dev/xvdh1: Device or resource busy
mdadm: /dev/xvdh1 has no superblock - assembly aborted

$ mdadm --manage /dev/md0 --stop
mdadm: stopped /dev/md0

$ sudo mdadm --assemble --force /dev/md0 /dev/xvdh[1234]
mdadm: /dev/md0 has been started with 4 drives.

I started suspecting I was facing some kind of underlying race condition where the devices were getting grabbed and locked during the creation process. So I started googling for "mdadm create race" and finally found a post that tipped me off. While it didn't provide the solution, it put me on the right track by mentioning udev: every time a member device is added, udev fires events and its rules briefly open the device to probe it (which presumably also explains the stray md127 array above, the result of udev's incremental-assembly rule grabbing the disks on its own). From there it took only a few more minutes to narrow down the solution: disabling udev event processing during device creation to avoid contention on the device handles.
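
If you want to watch the contention happen, you can run udevadm monitor in a second terminal while the mdadm --create is in flight; the burst of add/change events for the member devices and the new array is udev waking up to probe them:

$ udevadm monitor --kernel --udev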

So now our script goes something like:

$ udevadm control --stop-exec-queue
$ mdadm --create /dev/md0 --run --level=0 --raid-devices=4 ...
$ udevadm control --start-exec-queue
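
For completeness, here's a rough sketch of how that sequence can be wrapped so the udev queue gets restarted even if the create fails (the device list and chunk size are just the ones from our setup):

#!/bin/bash
set -e

# Hold udev event processing so nothing grabs the member devices mid-create
udevadm control --stop-exec-queue

# Restart the queue no matter how we exit
trap 'udevadm control --start-exec-queue' EXIT

mdadm --create /dev/md0 --run --level=0 --chunk 256 \
      --raid-devices=4 /dev/xvdh1 /dev/xvdh2 /dev/xvdh3 /dev/xvdh4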

And we now have consistent, reliable device creation. Hopefully this blog post will help other passers-by with a similar problem. Good luck!
