What to do if you have problems with Vinum
If you have problems with Vinum, it could be due to misunderstanding the setup, or it could be a bug. This page describes some of the more common pitfalls of setting up Vinum and then describes the state of the known bugs.
For similar reasons, the vinum start command will not accept a drive on partition c. Partition c is used by the system to represent the whole disk, and must be of type unused. Clearly there is a conflict here, which vinum resolves by not using the c partition.
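For illustration, a label edited with disklabel -e might look something like this (the drive name and sizes here are invented); the c partition covers the whole slice and stays of type unused, while a separate partition of type vinum is handed to Vinum:

```
# disklabel -e da1s1        (hypothetical drive)
8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
  c:  4226725        0    unused        0     0      # whole slice: leave alone
  h:  4226725        0    vinum                      # this one belongs to Vinum
```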
In practice, people aren't too interested in what was in the plex when it was created, and other volume managers cheat by marking newly created plexes up anyway. Vinum provides two ways to ensure that newly created plexes are up:
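One of these ways, per vinum(8), is the setupstate keyword on the volume definition, which tells Vinum to consider all plexes of the volume up without reviving them. A configuration sketch (the drive and volume names here are made up):

```
drive d1 device /dev/da1s1h
drive d2 device /dev/da2s1h
volume mirror setupstate
  plex org concat
    sd length 512m drive d1
  plex org concat
    sd length 512m drive d2
```

The other way is to initialize the subdisks explicitly with the vinum init command, which zeroes them so that both plexes really do contain identical data.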
vinumioctl: invalid ioctl from process 247 (vinum): c0e44642
This error may also occur if the versions of the kernel module (kld) and the userland vinum(8) program do not match, for example after updating one but not the other.
vinum -> start test.p1.s0
Can't start test.p1.s0: Device busy (16)
To get past this problem, you could first set the state to obsolete:
vinum -> setstate obsolete test.p1.s0
vinum -> start test.p1.s0
Reviving test.p1.s0 in the background
Status: I'm not sure whether the fix got committed. Watch this space.
Technical explanation: This is a bug. I had forgotten to write the code.
Workaround: When reviving striped plexes, ensure that no other write I/O takes place. The simplest way to achieve this is to unmount the volume.
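As a sketch of the workaround (the volume and plex names here are hypothetical), take the volume out of service for the duration of the revive:

```
# umount /dev/vinum/stripe        # stop all other I/O to the volume
# vinum start stripe.p1           # revive the stale plex
# mount /dev/vinum/stripe /mnt    # remount once the revive completes
```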
Status: Fix in planning.
Technical explanation: It's possible that Vinum isn't releasing drive resources.
Status: This doesn't seem to happen any more.
# newfs /dev/vinum/rvol
# fsck /dev/vinum/rvol
OK
# vinum stop vol
# vinum start vol
# fsck /dev/vinum/rvol
Errors detected
Technical explanation: I haven't investigated this one yet. It appears likely that there is some issue with flushing data to the disk, since a global vinum stop does not have this problem.
Workaround: Don't do that then.
Status: This was reported once nearly a year ago. It's likely that either it wasn't a bug in the first place, or that other changes in this area have fixed it.
brw-r-----  1 root  operator   4, 0x00010002 Sep 28 12:01 /dev/da0
brw-r-----  1 root  operator   4,          0 Sep 28 12:01 /dev/da0a
brw-r-----  1 root  operator   4,          1 Sep 28 12:01 /dev/da0b
brw-r-----  1 root  operator   4,          2 Sep 28 12:01 /dev/da0c
brw-r-----  1 root  operator   4,          3 Sep 28 12:01 /dev/da0d
brw-r-----  1 root  operator   4,          4 Sep 28 12:01 /dev/da0e
brw-r-----  1 root  operator   4,          5 Sep 28 12:01 /dev/da0f
brw-r-----  1 root  operator   4,          6 Sep 28 12:01 /dev/da0g
brw-r-----  1 root  operator   4,          7 Sep 28 12:01 /dev/da0h
brw-r-----  1 root  operator   4, 0x00020002 Sep 28 12:01 /dev/da0s1
brw-r-----  1 root  operator   4, 0x00020000 Sep 28 12:01 /dev/da0s1a
brw-r-----  1 root  operator   4, 0x00020001 Sep 28 12:01 /dev/da0s1b
brw-r-----  1 root  operator   4, 0x00020002 Sep 28 12:01 /dev/da0s1c
brw-r-----  1 root  operator   4, 0x00020003 Sep 28 12:01 /dev/da0s1d
brw-r-----  1 root  operator   4, 0x00020004 Sep 28 12:01 /dev/da0s1e
brw-r-----  1 root  operator   4, 0x00020005 Sep 28 12:01 /dev/da0s1f
brw-r-----  1 root  operator   4, 0x00020006 Sep 28 12:01 /dev/da0s1g
brw-r-----  1 root  operator   4, 0x00020007 Sep 28 12:01 /dev/da0s1h
brw-r-----  1 root  operator   4, 0x00030002 Sep 28 12:01 /dev/da0s2
brw-r-----  1 root  operator   4, 0x00040002 Sep 28 12:01 /dev/da0s3
brw-r-----  1 root  operator   4, 0x00050002 Sep 28 12:01 /dev/da0s4
In particular, we have seen panics when devices of the /dev/da0s1a kind are missing.
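If such entries are missing, they can normally be recreated with the MAKEDEV script; the disk name here is only an example, and MAKEDEV's exact behaviour may vary between releases:

```
# cd /dev
# sh MAKEDEV da0s1a      # should create the da0s1 partition devices
```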
Technical explanation: The function vinum_scandisk was trying to read from a drive which had been invalidated. This caused a null pointer dereference.
Status (13 October 1999): Fixed in 4.0-CURRENT (file vinumio.c, revision 1.45) and 3.3-STABLE (file vinumio.c, revision 1.7.2.11). If you're running 3.3-RELEASE, you should upgrade to 3.3-STABLE or 3.4-RELEASE.
Will not be fixed in older releases, if indeed it exists there.
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x0
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0x0
Note particularly the instruction pointer value. This happens during normal file access. In all cases I have investigated, the system was using soft updates. I have one report of a similar panic without soft updates, but it hasn't been substantiated.
This problem doesn't happen everywhere, but where it does, it's quite reproducible. If you're planning to use this combination, be sure to test it carefully.
Technical explanation: A buffer header gets corrupted between the time the top half of the driver issues the request to the disk driver, and when the I/O completes. Currently, the evidence is pointing towards the disk driver, but the corruption is of such an unusual nature that it's difficult to guess what's going on.
Status: Fix committed to FreeBSD-CURRENT on 5 January 2000 and to -STABLE on 11 May 2000.
Technical explanation: A deadlock arose between code locking stripes on a RAID-5 plex (vrlock) and code waiting for buffers to be freed (flswai).
Status: Fix committed to FreeBSD-CURRENT on 5 January 2000 and to -STABLE on 11 May 2000.
Technical explanation: This was due to sloppy coding (incorrect use of the buffered I/O routines).
Status (29 February 2000): Fixed.