
Planning Your SVM Configuration

When designing your storage configuration, keep the following guidelines in mind:

  • Striping generally has the best performance, but it offers no data protection. For write-intensive applications, RAID 1 generally has better performance than RAID 5.

  • RAID 1 and RAID 5 volumes both increase data availability, but they both generally result in lower performance, especially for write operations. Mirroring does improve random read performance.

  • RAID 5 requires less disk space than RAID 1; therefore, RAID 5 volumes have a lower hardware cost than RAID 1 volumes. RAID 0 volumes have the lowest hardware cost.

  • Identify the most frequently accessed data, and increase access bandwidth to that data with mirroring or striping.

  • Both stripes and RAID 5 volumes distribute data across multiple disk drives and help balance the I/O load.

  • Use available performance monitoring capabilities and generic tools such as the iostat command to identify the most frequently accessed data. Once identified, the "access bandwidth" to this data can be increased using striping.

  • A RAID 0 stripe's performance is better than that of a RAID 5 volume, but RAID 0 stripes do not provide data protection (redundancy).

  • RAID 5 volume performance is lower than stripe performance for write operations because the RAID 5 volume requires multiple I/O operations to calculate and store the parity.

  • For raw random I/O reads, the RAID 0 stripe and the RAID 5 volume are comparable. Both the stripe and RAID 5 volume split the data across multiple disks, and the RAID 5 volume parity calculations aren't a factor in reads except after a slice failure.

  • For raw random I/O writes, a stripe is superior to RAID 5 volumes.

Exam Alert

RAID Solutions You might get an exam question that describes an application and then asks which RAID solution would be best suited for it. For example, a financial application with mission-critical data would require mirroring to provide the best protection for the data, whereas a video editing application would require striping for the pure performance gain. Make sure you are familiar with the pros and cons of each RAID solution.


With SVM, you can use volumes to provide increased capacity, higher availability, and better performance. In addition, SVM's hot spare capability provides another level of data availability for mirrors and RAID 5 volumes. Hot spares were described earlier in this chapter.

After you have set up your configuration, you can use Solaris utilities such as iostat, metastat, and metadb to report on its operation. The iostat utility is used to provide information on disk usage and will show you which metadevices are being heavily utilized, while the metastat and metadb utilities provide status information on the metadevices and state databases, respectively. As an example, the output shown below provides information from the metastat utility whilst two mirror metadevices are being synchronized:

# metastat -i
d60: Mirror
    Submirror 0: d61
      State: Okay
    Submirror 1: d62
      State: Resyncing
    Resync in progress: 15 % done
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 10462032 blocks (5.0 GB)
d61: Submirror of d60
    State: Okay
    Size: 10462032 blocks (5.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c0t3d0s4          0     No            Okay   Yes
d62: Submirror of d60
    State: Resyncing
    Size: 10462032 blocks (5.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c0t1d0s5          0     No            Okay   Yes
d50: Mirror
    Submirror 0: d51
      State: Okay
    Submirror 1: d52
      State: Resyncing
    Resync in progress: 26 % done
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 4195296 blocks (2.0 GB)
d51: Submirror of d50
    State: Okay
    Size: 4195296 blocks (2.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c0t3d0s3          0     No            Okay   Yes
d52: Submirror of d50
    State: Resyncing
    Size: 4195296 blocks (2.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c0t1d0s4          0     No            Okay   Yes
Device Relocation Information:
Device   Reloc  Device ID
c0t1d0   Yes    id1,dad@ASAMSUNG_SP0411N=S01JJ60X901935
c0t0d0   Yes    id1,dad@AWDC_AC310200R=WD-WT6750311269
#

Notice from the preceding output that there are two mirror metadevices, each containing two submirror component metadevices: d60 contains submirrors d61 and d62, and d50 contains submirrors d51 and d52. The submirrors d62 and d52 are in the process of resynchronization. This utility is important because a resynchronization can cause a noticeable degradation of service on the affected volumes, and metastat lets you monitor the operation closely by displaying its progress as a percentage complete. Further information on these utilities is available from the online manual pages.
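
To identify which metadevices and underlying disks are the busiest, you can run iostat with extended statistics. The invocation below is only a sketch (the 30-second interval is illustrative); -x requests extended per-device statistics and -n displays descriptive device names. Devices with consistently high %b (percent busy) values are candidates for striping or for moving to less heavily used spindles.

# iostat -xn 30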

You can also use SVM's Simple Network Management Protocol (SNMP) trap generating daemon to work with a network monitoring console to automatically receive SVM error messages. Configure SVM's SNMP trap to trap the following instances:

  • A RAID 1 or RAID 5 subcomponent goes into "needs maintenance" state. A disk failure or too many errors would cause the software to mark the component as "needs maintenance."

  • A hot spare volume is swapped into service.

  • A hot spare volume starts to resynchronize.

  • A hot spare volume completes resynchronization.

  • A mirror is taken offline.

  • A disk set is taken by another host and the current host panics.

The system administrator can then receive and monitor messages from SVM when an error condition or notable event occurs. All operations that affect SVM volumes are managed by the metadisk driver, which is described in the next section.

Metadisk Driver

The metadisk driver, the driver used to manage SVM volumes, is implemented as a set of loadable pseudo device drivers. It uses other physical device drivers to pass I/O requests to and from the underlying devices. The metadisk driver operates between the file system and application interfaces and the device driver interface. It interprets information from both UFS (or applications) and the physical device drivers, so that after passing through the metadisk driver, information is received in the expected form by both the file system and the device drivers. The metadisk driver is a loadable device driver and has the same characteristics as any other disk device driver.

The volume name begins with "d" and is followed by a number. By default, there are 128 unique metadisk devices in the range of 0 to 127. Additional volumes, up to 8192, can be added to the kernel by editing the /kernel/drv/md.conf file. The meta block device accesses the disk using the system's normal buffering mechanism. There is also a character (or raw) device that provides for direct transmission between the disk and the user's read or write buffer. The names of the block devices are found in the /dev/md/dsk directory, and the names of the raw devices are found in the /dev/md/rdsk directory. The following is an example of a block and raw logical device name for metadevice d0:

/dev/md/dsk/d0   - block metadevice d0
/dev/md/rdsk/d0  - raw metadevice d0
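
For example, to make more than the default 128 volume names available, you would increase the nmd value in /kernel/drv/md.conf (for instance, from nmd=128 to nmd=256) and then perform a reconfiguration reboot. This is only a sketch; the exact contents of md.conf on your system may differ:

# vi /kernel/drv/md.conf
# touch /reconfigure
# init 6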

You must have root access to administer SVM or have equivalent privileges granted through RBAC. (RBAC is described in Chapter 11, "Controlling Access and Configuring System Messaging.")

SVM Commands

There are a number of SVM commands that will help you create, monitor, maintain and remove metadevices. All the commands are delivered with the standard Solaris 10 Operating Environment distribution. Table 10.5 briefly describes the function of the more frequently used commands that are available to the system administrator.

Table 10.5. Solaris Volume Manager Commands

Command

Description

metaclear

Used to delete metadevices and can also be used to delete hot spare pools.

metadb

Used to create and delete the state database and its replicas.

metadetach

Used to detach a metadevice, typically removing one half of a mirror.

metadevadm

Used to update the metadevice information, an example being if a disk device changes its target address (ID).

metahs

Used to manage hot spare devices and hot spare pools.

metainit

Used to configure metadevices. You would use metainit to create concatenations or striped metadevices.

metattach

Used to attach a metadevice, typically used when creating a mirror or adding additional mirrors.

metaoffline

Used to place submirrors in an offline state.

metaonline

Used to place submirrors in an online state.

metareplace

Used to replace components of submirrors or RAID5 metadevices. You would use metareplace when replacing a failed disk drive.

metarecover

Used to recover soft partition information.

metaroot

Used to set up the system files for the root metadevice. metaroot adds an entry to /etc/system and also updates /etc/vfstab to reflect the new device to use to mount the root (/) file system.

metastat

Used to display the status of a metadevice, all metadevices, or hot spare pools.


Note

Where They Live The majority of the SVM commands reside in the /usr/sbin directory, although you should be aware that metainit, metadb, metastat, metadevadm, and metarecover reside in /sbin; there are links to these commands in /usr/sbin as well.


Note

No More metatool You should note that the metatool command is no longer available in Solaris 10. Similar functionality (managing metadevices through a graphical utility) can be achieved using the Solaris Management Console (SMC), specifically the Enhanced Storage section.


Creating the State Database

The SVM state database contains vital information on the configuration and status of all volumes, hot spares, and disk sets. There are normally multiple copies of the state database, called replicas, and it is recommended that state database replicas be located on different physical disks, or even different controllers if possible, to provide added resilience.

The state database, together with its replicas, guarantees the integrity of the state database by using a majority consensus algorithm. The algorithm used by SVM for database replicas is as follows:

  • The system will continue to run if at least half of the state database replicas are available.

  • The system will panic if fewer than half of the state database replicas are available.

  • The system cannot reboot into multi-user mode unless a majority (half+1) of the total number of state database replicas are available.

Note

No Automatic Problem Detection The SVM software does not detect problems with state database replicas until there is a change to an existing SVM configuration and an update to the database replicas is required. If insufficient state database replicas are available, you'll need to boot to single-user mode, and delete or replace enough of the corrupted or missing database replicas to achieve a quorum.


If a system crashes and corrupts a state database replica, the majority of the remaining replicas must be available and consistent; that is, half + 1. This is why at least three state database replicas must be created initially, so that the majority algorithm can work correctly.
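
For example, with three replicas, the loss of one still leaves two: two is at least half of three, so the system keeps running, and two is also a majority, so the system can still boot into multi-user mode. With only two replicas, losing one leaves a single replica, which is enough to keep the system running but not enough to boot into multi-user mode.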

You also need to put some thought into the placement of your state database replicas. The following are some guidelines:

  • When possible, create state database replicas on a dedicated slice that is at least 4MB in size for each database replica that it will store.

  • You cannot create state database replicas on slices containing existing file systems or data.

  • When possible, place state database replicas on slices that are on separate disk drives. If possible, use drives that are on different host bus adapters.

  • When distributing your state database replicas, follow these rules:

    • Create three replicas on one slice for a system with a single disk drive. Realize, however, that if the drive fails, all your database replicas will be unavailable and your system will crash.

    • Create two replicas on each drive for a system with two to four disk drives.

    • Create one replica on each drive for a system with five or more drives.

The state database and its replicas are managed using the metadb command. The syntax of this command is

/sbin/metadb -h
     /sbin/metadb [-s setname]
     /sbin/metadb [-s setname] -a [-f] [-k system-file] mddbnn
     /sbin/metadb [-s setname] -a [-f] [-k system-file] [-c number] [-l length] slice...
     /sbin/metadb [-s setname] -d [-f] [-k system-file] mddbnn
     /sbin/metadb [-s setname] -d [-f] [-k system-file] slice...
     /sbin/metadb [-s setname] -i
     /sbin/metadb [-s setname] -p [-k system-file] [mddb.cf-file]

Table 10.6 describes the options available for the metadb command.

Table 10.6. metadb Command Options

Option

Description

-a

Specifies the creation of a new database replica.

-c <number>

Specifies the number of replicas to be created on each device. The default is 1.

-d

Deletes all the replicas that are present in the specified slice.

-f

Forces the creation of the first database replica (when used in conjunction with the -a option) and the deletion of the last remaining database replica (when used in conjunction with the -d option).

-h

Displays the usage message.

-i

Displays status information about all database replicas.

-k <system-file>

Specifies the name of the kernel file where the replica information should be written; by default, this is /kernel/drv/md.conf.

-l <length>

Specifies the size (in blocks) of each replica. The default length is 8,192 blocks.

-p

Specifies that the system file (default is /kernel/drv/md.conf) should be updated with entries from /etc/lvm/mddb.cf.

-s <setname>

Specifies the name of the diskset on which metadb should run.

slice

Specifies the disk slice to use; for example, /dev/dsk/c0t0d0s6.


In the following example, I have reserved a slice (slice 4) on each of two disks to hold the copies of the state database, and I'll create two copies in each reserved disk slice, giving a total of four state database replicas. In this scenario, the failure of one disk drive results in the loss of half of the state database replicas, but the system will continue to function because at least half are still available. The system will panic only when more than half of the database replicas are lost. For example, if I had created only three database replicas and the drive containing two of the replicas fails, the system will panic.

To create the state database and its replicas, using the reserved disk slices, enter the following command:

# metadb -a -f -c2 c0t0d0s4 c0t1d0s4

Here, -a indicates a new database is being added, -f forces the creation of the initial database, -c2 indicates that two copies of the database are to be created, and the two cxtxdxsx entries describe where the state databases are to be physically located. The system returns the prompt; there is no confirmation that the database has been created.

The following example demonstrates how to remove the state database replicas from two disk slices, namely c0t0d0s4 and c0t1d0s4:

# metadb -d c0t0d0s4 c0t1d0s4
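
If you are removing the final replicas on a system (for example, when dismantling an SVM configuration completely), the -f option must be added to force the deletion of the last remaining replica, as noted in Table 10.6. A sketch:

# metadb -d -f c0t0d0s4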

The next section shows how to verify the status of the state database.

Monitoring the Status of the State Database

When the state database and its replicas have been created, you can use the metadb command, with no options, to see the current status. If you use the -i flag then you will also see a description of the status flags.

Examine the state database as shown here:

# metadb -i
        flags           first blk       block count
     a m  p  luo        16              8192           /dev/dsk/c0t0d0s4
     a    p  luo        8208            8192           /dev/dsk/c0t0d0s4
     a    p  luo        16              8192           /dev/dsk/c0t1d0s4
     a    p  luo        8208            8192           /dev/dsk/c0t1d0s4
 r - replica does not have device relocation information
 o - replica active prior to last mddb configuration change
 u - replica is up to date
 l - locator for this replica was read successfully
 c - replica's location was in /etc/lvm/mddb.cf
 p - replica's location was patched in kernel
 m - replica is master, this is replica selected as input
 W - replica has device write errors
 a - replica is active, commits are occurring to this replica
 M - replica had problem with master blocks
 D - replica had problem with data blocks
 F - replica had format problems
 S - replica is too small to hold current data base
 R - replica had device read errors

Each line of output is divided into the following fields:

  • flags: This field will contain one or more state database status letters. A normal status is a "u" and indicates that the database is up-to-date and active. Uppercase status letters indicate a problem and lowercase letters are informational only.

  • first blk: The starting block number of the state database replica in its partition. Multiple state database replicas in the same partition will show different starting blocks.

  • block count: The size of the replica in disk blocks. The default length is 8192 blocks (4MB), but the size could be increased if you anticipate creating more than 128 metadevices, in which case you would need to increase the size of all state databases.

The last field in each state database listing is the path to the location of the state database replica.

As the code shows, there is one master replica; all four replicas are active and up to date and have been read successfully.

Recovering from State Database Problems

SVM requires that at least half of the state database replicas be available for the system to function correctly. When a disk fails or some of the state database replicas become corrupt, the failed replicas must be removed, with the system in single-user mode, to allow the system to boot correctly. When the system is operational again (albeit with fewer state database replicas), additional replicas can be created.

The following example shows a system with two disks, each with two state database replicas on slices c0t0d0s7 and c0t1d0s7.

If we run metadb -i, we can see that the state database replicas are all present and working correctly:

# metadb -i
        flags           first blk       block count
     a m  p  luo        16              8192           /dev/dsk/c0t0d0s7
     a    p  luo        8208            8192           /dev/dsk/c0t0d0s7
     a    p  luo        16              8192           /dev/dsk/c0t1d0s7
     a    p  luo        8208            8192           /dev/dsk/c0t1d0s7
 r - replica does not have device relocation information
 o - replica active prior to last mddb configuration change
 u - replica is up to date
 l - locator for this replica was read successfully
 c - replica's location was in /etc/lvm/mddb.cf
 p - replica's location was patched in kernel
 m - replica is master, this is replica selected as input
 W - replica has device write errors
 a - replica is active, commits are occurring to this replica
 M - replica had problem with master blocks
 D - replica had problem with data blocks
 F - replica had format problems
 S - replica is too small to hold current data base
 R - replica had device read errors

Subsequently, a disk failure or corruption occurs on the disk c0t1d0 and renders the two replicas unusable. The metadb -i command shows that there are write errors on the two replicas on c0t1d0s7:

metadb -i
        flags           first blk       block count
     a m  p  luo        16              8192           /dev/dsk/c0t0d0s7
     a    p  luo        8208            8192           /dev/dsk/c0t0d0s7
     M    p             16              unknown        /dev/dsk/c0t1d0s7
     M    p             8208            unknown        /dev/dsk/c0t1d0s7
 r - replica does not have device relocation information
 o - replica active prior to last mddb configuration change
 u - replica is up to date
 l - locator for this replica was read successfully
 c - replica's location was in /etc/lvm/mddb.cf
 p - replica's location was patched in kernel
 m - replica is master, this is replica selected as input
 W - replica has device write errors
 a - replica is active, commits are occurring to this replica
 M - replica had problem with master blocks
 D - replica had problem with data blocks
 F - replica had format problems
 S - replica is too small to hold current data base
 R - replica had device read errors

When the system is rebooted, the following messages appear:

Insufficient metadevice database replicas located.
Use metadb to delete databases which are broken.
Ignore any Read-only file system error messages.
Reboot the system when finished to reload the metadevice database.
After reboot, repair any broken database replicas which were deleted.

To repair the situation, you will need to be in single-user mode, so boot the system with -s and then remove the failed state database replicas on c0t1d0s7.

# metadb -d c0t1d0s7

Now reboot the system again; it will boot with no problems, although you now have fewer state database replicas. This will enable you to repair the failed disk and re-create the metadevice state database replicas.
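
Once the failed disk has been replaced and repartitioned, you can re-create the replicas to restore the original level of protection. A sketch, assuming the replacement disk again provides slice c0t1d0s7 for this purpose:

# metadb -a -c2 c0t1d0s7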

Creating a Concatenated Volume

You create a simple volume when you want to place an existing file system under SVM control. The command to create a simple volume is metainit. Here is the syntax for metainit:

/sbin/metainit -h
/sbin/metainit [generic options]  concat/stripe  numstripes
/sbin/metainit [generic options]  mirror   -m submirror
/sbin/metainit [generic options]  RAID  -r component... [-i interlace]
/sbin/metainit [generic options]  -a
/sbin/metainit [generic options]  softpart -p [-e]  component size
/sbin/metainit -r

Table 10.7 describes the options available for the metainit command.

Table 10.7. metainit Command Options

Option

Description

-f

A generic option that forces the metainit command to continue even if one of the slices contains a mounted file system or is being used as swap. This option is necessary if you are configuring mirrors on root (/), swap, or /usr.

-h

A generic option that displays a usage message.

-n

A generic option that checks the syntax of the command without actually executing it.

-r

A generic option that is used in shell scripts to set up all metadevices that were previously enabled before the system either crashed or was shut down. Information about previously configured metadevices is obtained from the state database.

concat/stripe

Specifies the name (dxxx) of the concatenation, stripe, or concat/stripe being defined.

numstripes

Specifies the number of stripes in the metadevice. For a simple stripe, this will be 1.

component

Specifies the logical name of the physical disk slice being configured, such as /dev/dsk/c0t0d0s0. For a RAID5 stripe, there must be a minimum of three slices.

mirror -m submirror

Specifies the metadevice name of the mirror. The -m indicates that a mirror is being configured, and submirror identifies the metadevice that creates the initial one-way mirror.

RAID -r

Specifies the name of the RAID5 metadevice. The -r indicates that the configuration is a RAID5 metadevice.

-i <interlace>

Specifies the interlace parameter. This tells SVM how much data to write to a stripe or RAID5 metadevice before moving on to the next component in the stripe or RAID5 metadevice. The default is 16k.

-a

Activates all the metadevices specified in the /etc/lvm/md.tab file.


In the following example, a simple concatenation metadevice will be created using the disk slice /dev/dsk/c0t0d0s5. The metadevice will be named d100:

# metainit -f d100 1 1 c0t0d0s5
d100: Concat/Stripe is setup
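
If c0t0d0s5 already held a mounted file system (the reason the -f option was needed), you would then edit its entry in /etc/vfstab so that the file system is mounted through the metadevice from now on. A sketch, assuming a hypothetical mount point of /export:

/dev/md/dsk/d100   /dev/md/rdsk/d100   /export  ufs  2  yes  -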

Monitoring the Status of a Volume

Solaris Volume Manager provides the metastat command to monitor the status of all volumes. The syntax of this command is as follows:

/usr/sbin/metastat -h
/usr/sbin/metastat [-a] [-B] [-c] [-i] [-p] [-q] [-s setname] [-t] [component...]

Table 10.8 describes the options for the metastat command.

Table 10.8. metastat Command Options

Option

Description

-a

Displays the metadevices for all disksets owned by the current host.

-B

Displays the status of all 64-bit metadevices and hot spares.

-c

Displays concise output, only one line per metadevice.

-h

Displays a usage message.

-i

Checks the status of RAID1 (mirror) volumes as well as RAID5 and hot spares.

-p

Displays the list of active metadevices and hot spare pools. The output is displayed in the same format as the configuration file md.tab.

-q

Displays the status of metadevices, but without the device relocation information.

-s <diskset>

Restricts the status to that of the specified diskset.

-t

Displays the status and timestamp of the metadevices and hot spares. The timestamp shows the date and time of the last state change.

component

Specifies the component or metadevice to restrict the output. If this option is omitted, the status of all metadevices is displayed.


In the following example, the metastat command is used to display the status of a single metadevice, d100:

# metastat d100
d100: Concat/Stripe
    Size: 10489680 blocks (5.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c0t0d0s5          0     No            Okay   Yes

Device Relocation Information:
Device   Reloc  Device ID
c0t1d0   Yes    id1,dad@ASAMSUNG_SP0411N=S01JJ60X901935

In the next example, the metastat -c command displays the status for the same metadevice (d100), but this time in concise format:

# metastat -c d100
d100              s  5.0GB c0t0d0s5

Creating a Soft Partition

Soft partitions are used to divide large partitions into smaller areas, or extents, without the limitations imposed by hard slices. The soft partition is created by specifying a start block and a block size. Soft partitions differ from hard slices created using the format command because soft partitions can be non-contiguous, whereas a hard slice is contiguous. Therefore, soft partitions can cause I/O performance degradation.

A soft partition can be built on a disk slice or another SVM volume, such as a concatenated device. You'll create soft partitions using the SVM command metainit. For example, let's say that we have a hard slice named c2t1d0s1 that is 10GB in size and was created using the format command. To create a soft partition named d10 which is 1GB in size, and assuming that you've already created the required database replicas, issue the following command:

# metainit d10 -p c2t1d0s1 1g

The system responds with

d10: Soft Partition is setup

View the soft partition using the metastat command:

# metastat d10
d10: Soft Partition
    Device: c2t1d0s1
    State: Okay
    Size: 2097152 blocks (1.0 GB)
        Device     Start Block  Dbase Reloc
        c2t1d0s1      25920     Yes   Yes

        Extent              Start Block              Block count
             0                    25921                  2097152

Device Relocation Information:
Device   Reloc  Device ID
c2t1d0   Yes    id1,sd@SIBM_____DDRS34560SUN4.2G564442__________

Create a file system on the soft partition using the newfs command as follows:

# newfs /dev/md/rdsk/d10

Now you can mount a directory named /data onto the soft partition as follows:

# mount /dev/md/dsk/d10 /data

To remove the soft partition named d10, unmount the file system that is mounted to the soft partition and issue the metaclear command as follows:

# metaclear d10

Caution

Removing the soft partition destroys all data that is currently stored on that partition.


The system responds with

d10: Soft Partition is cleared

Expanding an SVM Volume

With SVM, you can increase the size of a file system while it is active and without unmounting the file system. The process of expanding a file system consists of first increasing the size of the SVM volume, and then growing the file system that has been created on the partition. In Step by Step 10.1, I'll increase the size of a soft partition and the file system mounted on it.

Step By Step 10.1: Increasing the Size of a Mounted File System

1.
Check the current size of the /data file system, as follows:

# df -h /data
Filesystem            size   used  avail capacity  Mounted on
/dev/md/dsk/d10       960M   1.0M   901M     1%    /data

Note that the size of /data is currently 960MB.

A metastat -c shows the volume size as 1GB:

# metastat -c d10
d10              p  1.0GB c2t1d0s1

2.
Use the metattach command to increase the SVM volume named d10 from 1GB to 2GB as follows:

# metattach d10 1g

Another metastat -c shows that the soft partition is now 2GB, as follows:

# metastat -c d10
d10              p  2.0GB c2t1d0s1

Check the size of /data again, and note that the size did not change:

# df -h /data
Filesystem             size   used  avail capacity  Mounted on
/dev/md/dsk/d10        960M   1.0M   901M     1%    /data

3.
To increase the mounted file system /data, use the growfs command as follows:

# growfs -M /data /dev/md/rdsk/d10
Warning: 416 sector(s) in last cylinder unallocated
/dev/md/rdsk/d10:       4194304 sectors in 1942 cylinders of 16
 tracks,
135 sectors
        2048.0MB in 61 cyl groups (32 c/g, 33.75MB/g, 16768 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
 32, 69296, 138560, 207824, 277088, 346352, 415616, 484880,
 554144, 623408,
 3525584, 3594848, 3664112, 3733376, 3802640, 3871904, 3941168,
 4010432,
 4079696, 4148960,

Another df -h /data command shows that the /data file system has been increased as follows:

# df -h /data
Filesystem             size   used  avail capacity  Mounted on
/dev/md/dsk/d10        1.9G   2.0M   1.9G     1%    /data


Soft partitions can be built on top of concatenated devices, and you can increase a soft partition as long as there is room on the underlying metadevice. For example, you can't increase a 1GB soft partition if the metadevice on which it is currently built is only 1GB in size. However, you could first add another slice to the underlying metadevice.

In Step by Step 10.2 we will create an SVM device on c2t1d0s1 named d9 that is 4GB in size. We then will create a 3GB soft partition named d10 built on this device. To add more space to d10, we first need to increase the size of d9, which is done by concatenating another slice to it, as described in the Step by Step.

Step By Step 10.2: Concatenate a New Slice to an Existing Slice

1.
Log in as root and create metadbs as described earlier in this chapter.

2.
Use the metainit command to create a simple SVM volume on c2t1d0s1 as follows:

# metainit d9 1 1 c2t1d0s1
d9: Concat/Stripe is setup

Use the metastat command to view the simple metadevice named d9 as follows:

# metastat d9
d9: Concat/Stripe
    Size: 8311680 blocks (4.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c2t1d0s1      25920     Yes           Okay   Yes
Device Relocation Information:
Device   Reloc  Device ID
c2t1d0   Yes    id1,sd@SIBM_____DDRS34560SUN4.2G564442__________

3.
Create a 3GB soft partition on top of the simple device as follows:

# metainit d10 -p d9 3g
d10: Soft Partition is setup

4.
Before we can add more space to d10, we first need to add more space to the simple volume by concatenating another 3.9GB slice (c2t2d0s1) to d9 as follows:

# metattach d9 c2t2d0s1
d9: component is attached

The metastat command shows the following information about d9:

# metastat d9
d9: Concat/Stripe
    Size: 16670880 blocks (7.9 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c2t1d0s1      25920     Yes           Okay   Yes
    Stripe 1:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c2t2d0s1          0     No            Okay   Yes

Device Relocation Information:
Device   Reloc  Device ID
c2t1d0   Yes    id1,sd@SIBM_____DDRS34560SUN4.2G564442__________
c2t2d0   Yes    id1,sd@SIBM_____DDRS34560SUN4.2G3Z1411__________

Notice that the metadevice d9 is made up of two disk slices (c2t1d0s1 and c2t2d0s1) and that the total size of d9 is now 7.9GB.

5.
Now we can increase the size of the metadevice d10 using the metattach command described in Step by Step 10.1.
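
For example, a sketch of this final step; the amount being added (3GB here) must fit within the free space now available on d9:

# metattach d10 3g

You can then verify the new size with another metastat -c d10, as in Step by Step 10.1.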


Creating a Mirror

A mirror is a logical volume that consists of one or more metadevices, each called a submirror. In this example, there are two physical disks: c0t0d0 and c0t1d0. Slice 5 is free on both disks, and these two slices will form the two submirrors, d12 and d22. The logical mirror will be named d2; it is this device that will be used when a file system is created. Step by Step 10.3 details the whole process:

Step By Step 10.3: Creating a Mirror

1.
Create the two simple metadevices that will be used as submirrors first.

# metainit d12 1 1 c0t0d0s5
d12: Concat/Stripe is setup
# metainit d22 1 1 c0t1d0s5
d22: Concat/Stripe is setup

2.
Having created the submirrors, now create the actual mirror device, d2, but only attach one of the submirrors; the second submirror will be attached manually.

# metainit d2 -m d12
d2: Mirror is setup

At this point, a one-way mirror has been created.

3.
Now attach the second submirror to the mirror device, d2.

# metattach d2 d22
d2: Submirror d22 is attached

At this point, a two-way mirror has been created and the second submirror will be synchronized with the first submirror to ensure they are both identical.

Caution

It is not recommended to create a mirror device and specify both submirrors on the command line; even though this will work, there will be no resynchronization between the two submirrors, which could lead to data corruption.
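
For reference, the discouraged one-step form looks roughly like the following sketch; it creates the mirror with both submirrors in a single command and therefore skips the resynchronization described above:

# metainit d2 -m d12 d22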

4.
Verify that the mirror has been created successfully and that the two submirrors are being synchronized.

# metastat
d2: Mirror
    Submirror 0: d12
      State: Okay
    Submirror 1: d22
      State: Resyncing
    Resync in progress: 27 % done
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 4194828 blocks (2.0 GB)
d12: Submirror of d2
    State: Okay
    Size: 4194828 blocks (2.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c0t0d0s5          0     No            Okay   Yes

d22: Submirror of d2
    State: Resyncing
    Size: 4194828 blocks (2.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c0t1d0s5          0     No            Okay   Yes

Notice that the status of d12, the first submirror, is Okay, and that the second submirror, d22, is currently resyncing, and is 27% complete. The mirror is now ready for use as a file system.

5.
Create a UFS file system on the mirrored device:

# newfs /dev/md/rdsk/d2
newfs: construct a new file system /dev/md/rdsk/d2: (y/n)? y
Warning: 4016 sector(s) in last cylinder unallocated
/dev/md/rdsk/d2:        4194304 sectors in 1029 cylinders of 16
 tracks, 255
sectors
        2048.0MB in 45 cyl groups (23 c/g, 45.82MB/g, 11264 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
 32, 94128, 188224, 282320, 376416, 470512, 564608, 658704,
 752800, 846896,
 3285200, 3379296, 3473392, 3567488, 3661584, 3755680, 3849776,
 3943872,
 4037968, 4132064,

Note that it is the d2 metadevice that has the file system created on it.

6.
Run fsck on the newly created file system before attempting to mount it. This step is not absolutely necessary, but is good practice because it verifies the state of a file system before it is mounted for the first time:

# fsck /dev/md/rdsk/d2
** /dev/md/rdsk/d2
** Last Mounted on
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
2 files, 9 used, 2033046 free (14 frags, 254129 blocks, 0.0%
 fragmentation)

The file system can now be mounted in the normal way; remember to edit /etc/vfstab to make the mount permanent, and remember to use the md device. For this example, we'll mount the file system on /mnt.

# mount /dev/md/dsk/d2 /mnt
#


Unmirroring a Non-Critical File System

This section details the procedure for removing a mirror on a file system that can be removed and remounted without having to reboot the system. Step by Step 10.4 shows how to achieve this. This example uses a file system, /test, that is currently mirrored using the metadevice, d2; a mirror that consists of d12 and d22. The underlying disk slice for this file system is /dev/dsk/c0t0d0s5:

Step By Step 10.4: Unmirror a Non-Critical File System

1.
Unmount the /test file system.

# umount /test

2.
Detach the submirror, d12, that is going to be used as a UFS file system.

# metadetach d2  d12
d2: submirror d12 is detached

3.
Delete the mirror (d2) and the remaining submirror (d22).

# metaclear -r d2
d2: Mirror is cleared
d22: Concat/Stripe is cleared

At this point, the file system is no longer mirrored. It is worth noting that the metadevice, d12, still exists and can be used as the device to mount the file system. Alternatively, the full device name, /dev/dsk/c0t0d0s5, can be used if you do not want the disk device to support a volume. For this example, we will mount using the full device name (as you would a normal UFS file system), so we will delete the d12 metadevice first.

4.
Delete the d12 metadevice:

# metaclear d12
d12: Concat/Stripe is cleared

5.
Edit /etc/vfstab to change the entry:

/dev/md/dsk/d2    /dev/md/rdsk/d2    /test  ufs  2  yes  -

to

/dev/dsk/c0t0d0s5 /dev/rdsk/c0t0d0s5 /test  ufs  2  yes  -

6.
Remount the /test file system:

# mount /test


Mirroring the Root File System

In this section we will create another mirror, but this time it will be the root file system. This is different from Step by Step 10.3 because we are mirroring an existing file system that cannot be unmounted, so we'll configure the metadevice and then reboot to implement the logical volume and to update the system configuration file. The objective is to create a two-way mirror of the root file system, currently residing on /dev/dsk/c0t0d0s0. We will use a spare disk slice of the same size, /dev/dsk/c0t1d0s0, for the second submirror. The mirror will be named d0, and the submirrors will be d10 and d20. Additionally, because this is the root (/) file system, we'll also configure the second submirror as an alternate boot device, so that this second slice can be used to boot the system if the primary slice becomes unavailable. Step by Step 10.5 shows the procedure to follow:

Step By Step 10.5: Mirror the root File System

1.
Verify that the current root file system is mounted from /dev/dsk/c0t0d0s0.

# df -h /
Filesystem             size   used  avail capacity  Mounted on
/dev/dsk/c0t0d0s0      4.9G   3.7G   1.2G    77%    /

2.
Create the state database replicas, specifying the disk slices c0t0d0s4 and c0t1d0s4. We will be creating two replicas on each slice.

# metadb -a -f -c2 c0t0d0s4 c0t1d0s4

3.
Create the two submirrors, d10 and d20.

# metainit -f d10 1 1 c0t0d0s0
d10: Concat/Stripe is setup
# metainit d20 1 1 c0t1d0s0
d20: Concat/Stripe is setup

Note that the -f option was used in the first metainit command. This is the option to force the execution of the command, because we are creating a metadevice on an existing, mounted file system. The -f option was not necessary in the second metainit command because the slice is currently unused.

4.
Create a one-way mirror, d0, specifying d10 as the submirror to attach.

# metainit d0 -m d10
d0: Mirror is setup

5.
Set up the system files to support the new metadevice, after taking a backup copy of the files that will be affected. It is a good idea to name the copies with a relevant extension, so that they can be easily identified if problems are encountered and you later have to revert to the original files. We will use the .nosvm extension in this Step by Step.

# cp /etc/system /etc/system.nosvm
# cp /etc/vfstab /etc/vfstab.nosvm
# metaroot d0

The metaroot command has added the following lines to the system configuration file, /etc/system, to allow the system to boot with the / file system residing on a logical volume. This command is only necessary for the root device.

* Begin MDD root info (do not edit)
rootdev:/pseudo/md@0:0,0,blk
* End MDD root info (do not edit)

It has also modified the /etc/vfstab entry for the / file system. It now reflects the metadevice to use to mount the file system at boot time:

/dev/md/dsk/d0  /dev/md/rdsk/d0 /       ufs     1       no      -

6.
Synchronize file systems prior to rebooting the system.

# lockfs -fa

The lockfs command is used to flush all buffers so that when the system is rebooted, the file systems are all up to date. This step is not compulsory, but is good practice.

7.
Reboot the system.

# init 6

8.
Verify that the root file system is now being mounted from the metadevice /dev/md/dsk/d0.

# df -h /
Filesystem             size   used  avail capacity  Mounted on
/dev/md/dsk/d0         4.9G   3.7G   1.2G    77%    /

9.
The next step is to attach the second submirror and verify that a resynchronization operation is carried out.

# metattach d0 d20
d0: Submirror d20 is attached
# metastat
d0: Mirror
    Submirror 0: d10
      State: Okay
    Submirror 1: d20
      State: Resyncing
    Resync in progress: 62 % done
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 10462032 blocks (5.0 GB)

d10: Submirror of d0
    State: Okay
    Size: 10462032 blocks (5.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c0t0d0s0          0     No            Okay   Yes

d20: Submirror of d0
    State: Resyncing
    Size: 10462032 blocks (5.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c0t1d0s0          0     No            Okay   Yes

10.
Install a boot block on the second submirror to make this slice bootable. This step is necessary because it is the root (/) file system that is being mirrored.

# installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t1d0s0
#

The uname -i command substitutes the system's platform name.

11.
Identify the physical device name of the second submirror. This will be required to assign an OpenBoot alias for a backup boot device.

# ls -l /dev/dsk/c0t1d0s0
lrwxrwxrwx   1 root     root          46 Mar 12  2005 /dev/dsk
/c0t1d0s0 ->\
 ../../devices/pci@1f,0/pci@1,1/ide@3/dad@1,0:a
#

Record the address starting with /pci... and change the dad string to disk, leaving you, in this case, with /pci@1f,0/pci@1,1/ide@3/disk@1,0:a.

12.
For this step you need to be at the ok prompt, so enter init 0 to shut down the system.

# init 0
# svc.startd: The system is coming down.  Please wait.
svc.startd: 74 system services are now being stopped.
[ output truncated ]
ok

Enter the nvalias command to create an alias named backup-root, which points to the address recorded in step 11.

ok nvalias backup-root /pci@1f,0/pci@1,1/ide@3/disk@1,0:a

Now inspect the current setting of the boot-device variable and add the name backup-root as the secondary boot path, so that this device is used before going to the network. When this has been done, enter the nvstore command to save the alias created.

ok printenv boot-device
boot-device = disk net
ok setenv boot-device disk backup-root net
boot-device = disk backup-root net
ok nvstore

13.
The final step is to boot the system from the second submirror to prove that it works. This can be done manually from the ok prompt, as follows:

ok boot backup-root
Resetting ...
[… output truncated]

Rebooting with command: boot backup-root
Boot device: /pci@1f,0/pci@1,1/ide@3/disk@1,0  File and args:
SunOS Release 5.10 Version Generic 64-bit
Copyright 1983-2005 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
[… output truncated]
<hostname> console login:


Unmirroring the Root File System

Unlike Step by Step 10.4, where a file system was unmirrored and remounted without affecting the operation of the system, unmirroring a root file system is different because it cannot be unmounted while the system is running. In this case, it is necessary to perform a reboot to implement the change. Step by Step 10.6 shows how to unmirror the root file system that was successfully mirrored in Step by Step 10.5. This example comprises a mirror, d0, consisting of two submirrors, d10 and d20. The objective is to remount the / file system using its full disk device name, /dev/dsk/c0t0d0s0, instead of using /dev/md/dsk/d0:

Step By Step 10.6: Unmirror the root File System

1.
Verify that the current root file system is mounted from the metadevice /dev/md/dsk/d0.

# df -h /
Filesystem             size   used  avail capacity  Mounted on
/dev/md/dsk/d0         4.9G   3.7G   1.2G    77%    /

2.
Detach the submirror that is to be used as the / file system.

# metadetach d0 d10
d0: Submirror d10 is detached

3.
Set up the /etc/system file and /etc/vfstab to revert to the full disk device name, /dev/dsk/c0t0d0s0.

# metaroot /dev/dsk/c0t0d0s0

Notice that the entry that was added to /etc/system when the file system was mirrored has been removed, and that the /etc/vfstab entry for / has reverted back to /dev/dsk/c0t0d0s0.
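
For reference, the reverted /etc/vfstab entry for the root (/) file system should now look similar to the following sketch (the options field on your system may differ):

/dev/dsk/c0t0d0s0  /dev/rdsk/c0t0d0s0  /       ufs     1       no      -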

4.
Reboot the system to make the change take effect.

# init 6

5.
Verify that the root file system is now being mounted from the full disk device, /dev/dsk/c0t0d0s0.

# df -h /
Filesystem             size   used  avail capacity  Mounted on
/dev/dsk/c0t0d0s0      4.9G   3.7G   1.2G    77%    /

6.
Remove the mirror, d0, and its remaining submirror, d20.

# metaclear -r d0
d0: Mirror is cleared
d20: Concat/Stripe is cleared

7.
Finally, remove the submirror, d10, that was detached earlier in step 2.

# metaclear d10
d10: Concat/Stripe is cleared


Troubleshooting Root File System Mirrors

Occasionally, a root mirror fails and recovery action has to be taken. Often, only one side of the mirror fails, in which case it can be detached using the metadetach command; you then replace the faulty disk and reattach it. Sometimes, though, a more serious problem occurs that prevents you from booting the system with SVM present. In this case, you have two options: temporarily remove the SVM configuration so that you boot from the original c0t0d0s0 device, or boot from a CD-ROM and recover the root file system manually by running fsck.

To disable SVM, you must reinstate pre-SVM copies of the files /etc/system and /etc/vfstab. In Step by Step 10.5 we took a copy of these files (step 5). This is good practice and should always be done when editing important system files. Copy these files again, to take a current backup, and then copy the originals back to make them operational, as shown here:

# cp /etc/system /etc/system.svm
# cp /etc/vfstab /etc/vfstab.svm
# cp /etc/system.nosvm /etc/system
# cp /etc/vfstab.nosvm /etc/vfstab

You should now be able to reboot the system to single-user without SVM and recover any failed file systems.
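
For example, you would boot to single-user mode from the OpenBoot prompt as follows:

ok boot -s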

If the preceding does not work, it might be necessary to repair the root file system manually, requiring you to boot from a CD-ROM. Insert the Solaris 10 CD 1 disk (or the Solaris 10 DVD) and shut down the system if it is not already shut down.

Boot to single-user from the CD-ROM as follows:

ok boot cdrom -s

When the system prompt is displayed, you can manually run fsck on the root file system. In this example, I am assuming a root file system exists on /dev/rdsk/c0t0d0s0:

# fsck /dev/rdsk/c0t0d0s0
** /dev/rdsk/c0t0d0s0
** Last Mounted on /
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? y
136955 files, 3732764 used, 1404922 free (201802 frags, 150390 blocks, \
3.9% fragmentation)
***** FILE SYSTEM WAS MODIFIED *****

You should now be able to reboot the system using SVM and you should resynchronize the root mirror as soon as the system is available. This can be achieved easily by detaching the second submirror and then reattaching it. The following example shows a mirror d0 consisting of d10 and d20:

# metadetach d0 d20
d0: submirror d20 is detached
# metattach d0 d20
d0: submirror d20 is attached

To demonstrate that the mirror is performing a resynchronization operation, you can issue the metastat command as follows, which will show the progress as a percentage:

# metastat d0
d0: Mirror
    Submirror 0: d10
      State: Okay
    Submirror 1: d20
      State: Resyncing
    Resync in progress: 37 % done
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 10462032 blocks (5.0 GB)

d10: Submirror of d0
    State: Okay
    Size: 10462032 blocks (5.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c0t0d0s0          0     No            Okay   Yes

d20: Submirror of d0
    State: Resyncing
    Size: 10489680 blocks (5.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c0t1d0s0          0     No            Okay   Yes

Device Relocation Information:
Device   Reloc  Device ID
c0t0d0   Yes    id1,dad@AWDC_AC310200R=WD-WT6750311269
c0t1d0   Yes    id1,dad@ASAMSUNG_SP0411N=S01JJ60X901935

