9.1. Introduction
Understanding filesystem fundamentals is key to understanding how Linux works.
Everything is a file: data files, partitions, pipes, sockets, and hardware
devices. Directories are simply files that list other files.
The Filesystem Hierarchy Standard (FHS) was developed as a voluntary standard,
and most Linux distributions follow it. These are the required elements of
the Linux root filesystem:
- /: Root directory, the base of the file tree, even though it is always drawn at the top
- /bin: Essential system commands
- /boot: Static boot loader files
- /dev: Device files
- /etc: Host-specific system configuration files
- /lib: Shared libraries needed to run the local system
- /mnt: Temporary mount points
- /opt: Add-on software packages (not used much in Linux)
- /proc: Live kernel snapshot and configuration
- /sbin: System administration commands
- /tmp: Temporary files; a well-behaved system flushes them between startups
- /usr: Shareable, read-only data and binaries
- /var: Variably sized files, such as mail spools and logs
These are considered optional because they can be located anywhere on
a network, whereas the required directories must be present to run
the machine:
- /home: Users' personal files
- /root: Superuser's personal files
The FHS goes into great detail on
each directory, for those who are interested. Here are some things
for the Linux user to keep in mind:
- /tmp and /var can go in their own individual partitions, as a security measure. If something goes awry and causes them to fill up uncontrollably, they will be isolated from the rest of the system.
- /home can go in its own partition, or on its own dedicated server, for easier backups and to protect it from system upgrades. You can then completely wipe out and reinstall a Linux system, or even install a different distribution, while leaving /home untouched.
- Because system-wide configuration files live in /etc, and users' files and personal settings live in /home, backups are simplified. It is possible to get away with backing up only /etc and /home and relying on your installation disks to take care of the rest. However, program updates applied after installation will then not be preserved; be sure to consider this when drawing up a disaster-recovery plan.
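To see whether /tmp, /var, and /home already sit on their own partitions, you can ask df which mounted filesystem holds each one. The following is a minimal sketch and not part of the original recipe. The function names mount_point and report_mounts are hypothetical, and the sketch assumes that mount point paths contain no spaces. A directory that reports the same mount point as / shares the root partition.

```shell
#!/bin/sh
# Sketch: report whether /tmp, /var, and /home live on separate filesystems.
# Assumes mount point paths contain no spaces (awk takes the last field).
mount_point() {
    # df -P prints one header line, then one unwrapped line per filesystem
    df -P "$1" | awk 'NR == 2 { print $NF }'
}

report_mounts() {
    root_mnt=$(mount_point /)
    for dir in /tmp /var /home; do
        [ -d "$dir" ] || continue          # skip directories this system lacks
        mnt=$(mount_point "$dir")
        if [ "$mnt" = "$root_mnt" ]; then
            echo "$dir: shares the root filesystem ($mnt)"
        else
            echo "$dir: on its own filesystem ($mnt)"
        fi
    done
}

report_mounts
```

Because df is read-only, it is safe to run as an ordinary user on a live system.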
9.1.1 Linux File Types
Remember that
"everything is a file." There are
seven file types in Linux; everything that goes in the file tree must
be one of the types in Table 9-1.
Table 9-1. File types
| Type indicator | Type of file |
|---|---|
| - | Regular file |
| d | Directory |
| l | Symbolic (soft) link |
| c | Character device |
| s | Socket |
| p | Named pipe |
| b | Block device |
The type indicators show up at the very front of the file listings:
# ls -l /dev/initctl
prw------- 1 root root 0 Jan 12 00:00 /dev/initctl
# ls -l /tmp/.ICE-unix/551
srwx------ 1 carla carla 0 Jan 12 09:09 /tmp/.ICE-unix/551
You can specify which file types to look at with the
find command:
# find / -type p
# find / -type s
Ctrl-C interrupts find, if it goes on for too
long.
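The sketch below shows the type indicators from Table 9-1 on real files. It builds a scratch directory with mktemp and creates a regular file, a directory, a symbolic link, and a named pipe in it. It uses the existing /dev/null as its character device. Sockets and block devices are left out because an ordinary user can't easily create them with standard shell tools: a socket needs a program that binds one, and making a block device node with mknod needs root. The helper name type_indicator is hypothetical.

```shell
#!/bin/sh
# Build a scratch directory holding one example of each file type an
# ordinary user can create, then print the type indicator ls -l shows.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT

touch "$demo/regular"             # -  regular file
mkdir "$demo/directory"           # d  directory
ln -s regular "$demo/symlink"     # l  (symbolic) link
mkfifo "$demo/pipe"               # p  named pipe

# First character of ls -ld output. -d lists a directory itself, and ls
# does not follow a symlink that is named on the command line.
type_indicator() {
    ls -ld "$1" | cut -c1
}

for f in "$demo/regular" "$demo/directory" "$demo/symlink" "$demo/pipe" /dev/null; do
    printf '%s  %s\n' "$(type_indicator "$f")" "$f"
done

# find selects by the same types; this prints only the named pipe.
find "$demo" -type p
```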
9.1.2 File Attributes
Take a
look at the
attributes of a file, such as this shell script,
sortusers:
$ ls -l sortusers
-rwxr-xr-x 1 meredydd programmers 3783 Jan 7 13:29 sortusers
-rwxr-xr-x 1 meredydd programmers tells us a lot
of things:
- The - means that this is a regular file. This attribute is not changeable by the user. It is how Linux knows the file type, so Linux does not need file extensions; file extensions are for humans and applications.
- rwx are the file owner's permissions.
- The first r-x is the group owner's permissions.
- The second r-x applies to everyone else, or "the world."
- 1 is the number of hard links to the file. All files have at least one: the file's entry in its parent directory.
- meredydd programmers names the file owner and the group owner of the file.
"Owner" and "user" are the same; remember this when using chmod's symbolic
notation, where u = user = owner.
Permissions and ownership are configurable attributes: chmod changes the
permissions, while chown and chgrp change ownership.
All those rwx things look weird, but they are
actually mnemonics: rwx = read, write, execute.
These permissions are applied in order to user, group, and other.
So, in the sortusers example,
meredydd can read, write, and execute the file.
Group members and others may only read and execute. Even though only
meredydd may edit the file itself, nothing is
stopping group and other users from copying it.
Since this is a shell script, both read and execute permissions must
be set, because the interpreter needs to read the file. Binary files
are read by the kernel directly, without an interpreter, so they
don't need read permissions.
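Here is a sketch that sets the sortusers permissions on a scratch stand-in file, using both of chmod's notations. It assumes GNU coreutils, because stat -c is a GNU extension. The file lives in a temporary directory and only imitates the real script.

```shell
#!/bin/sh
# Recreate the sortusers permissions (-rwxr-xr-x) on a scratch copy.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
script="$demo/sortusers"
printf '#!/bin/sh\necho "sorting users"\n' > "$script"

# Symbolic notation: u = user (owner), g = group, o = other.
chmod u=rwx,go=rx "$script"
ls -l "$script" | cut -c1-10        # prints -rwxr-xr-x

# Octal notation sets the same bits: r=4, w=2, x=1, so rwx=7 and r-x=5.
chmod 755 "$script"
stat -c '%A %a' "$script"           # GNU stat: prints -rwxr-xr-x 755

# Take execute permission away from group and other.
chmod go-x "$script"
stat -c '%A %a' "$script"           # prints -rwxr--r-- 744
```

The two notations are interchangeable. Symbolic notation is handy for changing one bit at a time, and octal sets all nine bits at once.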
9.1.3 File Type Definitions
Let's take a closer look
at what the file types in Linux really are:
- Regular files: Plain ole text and data files, or binary executables.
- Directories: Lists of files.
- Character and block devices: Files that serve as meeting points between the kernel and device drivers, for example /dev/hda (an IDE hard drive, which is a block device), /dev/ttyS1 (a serial modem, which is a character device), and so forth. These allow the kernel to correctly route requests for the various hardware devices on your system.
- Local domain sockets: Used for communication between local processes. They are visible as files but cannot be read from or written to like ordinary files; only the processes directly involved use them.
- Named pipes: Also for local interprocess communication. It is highly unlikely that a Linux user will ever need to do anything with either sockets or pipes; they are strictly system functions. Programmers, however, need to know everything about them.
- Links: Links are of great interest to Linux users. There are two types: hard links and soft (symbolic) links. Links are pointers to files. A hard link is really just another name for a file, as it points to a specific inode. All the hard links that point to a file share all of the file's attributes (permissions, ownership, and so on), because those are stored in that one inode. rm will happily delete a hard link, but the file will remain on disk until all hard links are gone and all processes have released it. Hard links cannot cross filesystems, so you can't make a hard link to a file on another partition or on a network share. Soft links point to a filename; they can point to any file, anywhere. You can even create "dead" soft links by deleting the files they point to, or by changing the names of those files.
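A quick experiment makes the difference between hard and soft links concrete. This sketch assumes GNU stat, which provides the %i (inode number), %h (hard link count), and %N (name, plus target for symlinks) formats. It gives one file a second name with a hard link and points a soft link at it. Then it removes the original name. The hard link still reaches the data, while the soft link is left dead. The file names are only illustrative.

```shell
#!/bin/sh
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

echo "hello" > original
ln original hardlink          # hard link: a second name for the same inode
ln -s original softlink       # soft link: a small file holding the name "original"

# %i = inode number, %h = hard link count, %N = name (and target, for symlinks)
stat -c '%i %h %N' original hardlink softlink
if [ "$(stat -c %i original)" = "$(stat -c %i hardlink)" ]; then
    echo "original and hardlink share one inode"
fi

rm original                   # removes one name; the data survives via hardlink
cat hardlink                  # still prints "hello"
cat softlink 2>/dev/null || echo "softlink is now dead (dangling)"
```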
9.1.4 Filesystem Internals
Here
are some more useful definitions relating to filesystems:
- Logical block: The smallest unit of storage, measured in bytes, that can be allocated by the filesystem. A single file may consume several blocks.
- Logical volume: A disk partition, a disk, or a volume that spans several disks or partitions; any unit of storage that is perceived as a single, discrete allocation of space.
- Internal fragmentation: Empty space that occurs when a file, or a portion of a file, does not fill a block completely. For example, if the block is 4K and the file is 1K, 3K is wasted space.
- External fragmentation: Occurs when the blocks that belong to a single file are not stored contiguously, but are scattered all over the disk.
- Extent: A number of contiguous blocks that belong to a single file. The filesystem sees an extent as a single unit, which is more efficient for tracking large files.
- B+trees: First there were btrees (balanced trees), which were improved and became b+trees. These are nifty concepts borrowed from indexed databases that make searching and traversing a given data structure much faster. Filesystems that use them are able to quickly scan the directory tree, first selecting the appropriate directory, then scanning its contents. The Ext2 filesystem does a sequential scan, which is slower.
- Metadata: Everything that describes or controls the internal data structures is lumped under metadata. This includes everything except the data itself: date and time stamps, owner, group, permissions, size, links, change time, access time, the location on disk, extended attributes, and so on.
- Inode: Much of a file's metadata is contained in an inode, or index node. Every file has an inode number that is unique within its filesystem.
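You can watch internal fragmentation and inode numbers on your own system with GNU stat. This sketch writes a 1-byte file to a scratch directory and compares its size with the space actually allocated to it. The results depend on the filesystem. On most, a tiny file still occupies a whole block. Filesystems that pack small files into their metadata, as ReiserFS tail packing does, may report little or no allocation.

```shell
#!/bin/sh
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
f="$demo/tiny"
printf 'x' > "$f"                 # a 1-byte file

size=$(stat -c %s "$f")           # logical size in bytes
blocks=$(stat -c %b "$f")         # number of allocated units...
unit=$(stat -c %B "$f")           # ...each of this many bytes (usually 512)
fsblock=$(stat -f -c %S "$f")     # the filesystem's fundamental block size

allocated=$((blocks * unit))
echo "size=$size allocated=$allocated fs_block=$fsblock"
echo "slack (internal fragmentation): $((allocated - size)) bytes"
echo "inode number: $(stat -c %i "$f")"
```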
9.1.5 Journaling Filesystems
Our faithful old
Ext2
filesystem is showing its age. It can't keep up with
users who need terabytes to play with and who need fast recovery from
service interruptions. For the majority of users, who still measure
their storage needs in gigabytes or less, fast recovery and data
integrity are the most important reasons to use a journaling
filesystem.
Linux filesystems are asynchronous. They do not instantly write metadata to
disk, but rather use a write cache in memory and then write to disk
periodically. This speeds up overall system performance, but if there is a
power failure or system crash, there can be metadata loss. In this event,
when the filesystem driver kicks in at restart and fsck (filesystem
consistency check) runs, it finds inconsistencies. Because Ext2 keeps backup
copies of its most critical metadata (the superblock and block group
descriptors), it is usually able to return the system to health.
The downside to relying on fsck is recovery time. fsck checks each and every
bit of metadata, which can take from a few minutes to 30 minutes or more on a
large filesystem. Journaling filesystems do not need to perform this minute,
painstaking inspection, because they keep a journal of pending changes. After
a crash, they replay or discard only the journal entries for operations that
were in progress, rather than checking the entire filesystem.
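The asynchronous write cache that makes this recovery work necessary is tunable on 2.6 and later kernels, through files under /proc/sys/vm. This sketch goes beyond the original text. It prints two of those settings, both measured in hundredths of a second, and then runs sync to flush cached writes to disk immediately. show_writeback is a hypothetical function name. Where /proc/sys/vm is not readable, as in some containers, the sketch prints a fallback message instead.

```shell
#!/bin/sh
# dirty_writeback_centisecs: how often the kernel's flusher threads wake up
# dirty_expire_centisecs:    how old cached (dirty) data must be before it is written
show_writeback() {
    for knob in dirty_writeback_centisecs dirty_expire_centisecs; do
        f=/proc/sys/vm/$knob
        if [ -r "$f" ]; then
            echo "$knob: $(cat "$f") (hundredths of a second)"
        else
            echo "$knob: not available here"
        fi
    done
}

show_writeback
sync    # flush all cached writes, data and metadata, to disk now
```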
Linux users have a number of great choices for journaling
filesystems, including Ext3, ReiserFS, XFS, and JFS. Ext3 is a
journaling system added to Ext2. ReiserFS, XFS, and JFS are all
capable of handling filesystems that measure in exabytes on 64-bit
platforms. ia32 users are limited to mere
terabytes, I'm afraid.
Which one should you use? There's no definitive
"best" one;
they're all great. Here's a rundown
on the high points:
- Ext3: This one is easy and comfortable; that's what it's designed to be. It fits right on top of Ext2, so you don't need to rebuild the system from scratch. All the other filesystems discussed here must be selected at system installation, or when you format a partition. You can even have "buyer's remorse": you can remove Ext3 just as easily. Because it's an extension of Ext2, it uses the same file utilities package, e2fsprogs. One major difference between Ext3 and the others is that it uses a fixed number of inodes, set when the filesystem is created, while the others allocate them dynamically. Another difference is that Ext3 can do data journaling (the data=journal mount option), not just metadata journaling. This comes at a cost, though: slower performance and more disk space consumed. Ext3 runs on any Linux-supported architecture.
- ReiserFS: ReiserFS is especially suited for systems with lots of small files, such as a mail server using the maildir format, or a news server. It's very efficient at file storage; it stuffs leftover file bits into btree leaf nodes instead of wasting block space. This is called "tail packing." It scales up nicely, and it handles large files just fine. ReiserFS runs on any Linux-supported architecture.
- JFS: This is IBM's entry in the Way Big Linux Filesystems contest, ported from AIX and OS/2 Warp. It supports multiple processors, access control lists (ACLs), and, get this, native resizing. That's right: simply remount a JFS filesystem with the new size you desire, and it's done. Note that you may only increase the volume size, not decrease it.
- XFS: This is SGI's brainchild, ported from IRIX. XFS thinks big: it claims it can handle filesystems of up to nine exabytes. Its strength is handling very large files, such as giant database files. There is one excellent feature unique to XFS, called delayed allocation. It procrastinates: it puts off actually writing to disk, delaying the decision on which blocks to write to, so that it can use the largest possible number of contiguous blocks. When there are a lot of short-term temp files in use, XFS might never allocate blocks to these at all, in effect ignoring them until they go away. XFS has its own native support for quotas, ACLs, and backups and restores.
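You can try the Ext3 "buyer's remorse" round trip safely on a disk image instead of a real partition. This sketch assumes e2fsprogs and GNU truncate are installed. The e2fsprogs tools often live in /sbin, hence the PATH tweak. The sketch never touches a real device. It formats a 16 MB image file as Ext2, then adds a journal with tune2fs -j to make it Ext3. It runs a forced, read-only e2fsck, and then removes the journal again with tune2fs -O ^has_journal. On a real system, the journal can be removed only while the filesystem is unmounted or mounted read-only. has_journal is a hypothetical helper name.

```shell
#!/bin/sh
PATH=$PATH:/sbin:/usr/sbin
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
img="$demo/fs.img"

truncate -s 16M "$img"                    # 16 MB scratch "partition"
mkfs.ext2 -q -F "$img"                    # plain Ext2, no journal

has_journal() {
    # dumpe2fs -h prints only the superblock, including the feature list
    dumpe2fs -h "$1" 2>/dev/null | grep -q has_journal
}

has_journal "$img" && echo "journal: yes" || echo "journal: no"
tune2fs -j "$img" >/dev/null              # add a journal: Ext2 becomes Ext3
has_journal "$img" && echo "journal: yes" || echo "journal: no"
e2fsck -f -n "$img"                       # forced, read-only consistency check
tune2fs -O ^has_journal "$img" >/dev/null # buyer's remorse: back to Ext2
has_journal "$img" && echo "journal: yes" || echo "journal: no"
```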
On a 32-bit system, there's only so much addressing
space available, so the theoretical upper filesystem size limit is 16
terabytes (as of the 2.5 kernel). Calculating the maximum possible
filesystem size depends on hardware, operating system, and block
sizes, so I shall leave that as an exercise to those who really need
to figure out that sort of thing.
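As one worked instance of that exercise, here is where a 16 TB figure comes from if you assume a 32-bit index into 4 KiB units (pages or blocks). That assumption is a common explanation for the 32-bit Linux limit, but the exact cause varies with kernel and filesystem. Common shells do their arithmetic in 64-bit integers, so the intermediate value of 2^44 fits.

```shell
#!/bin/sh
# Assumption: a 32-bit index addressing 4 KiB units caps the filesystem size.
max_index=$((1 << 32))            # 2^32 addressable units
unit_size=4096                    # 4 KiB (2^12 bytes) per unit
max_bytes=$((max_index * unit_size))
echo "$max_bytes bytes = $((max_bytes >> 40)) TiB"   # prints 17592186044416 bytes = 16 TiB
```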
Another way to while away the hours is to compare performance
benchmarks, or run your own. About all they agree on is that Ext3
really isn't suited for high-performance,
high-demand applications. It's fine for workstations
and light-to-medium-duty servers, but the others are better choices
for high-demand servers.
9.1.6 When Not to Use a Journaling Filesystem
Stick with plain ole Ext2 when you have a
/boot partition and are
running LILO. LILO cannot read any filesystem but
Ext2 or Ext3. The /boot partition is so small,
and so easily backed up and restored, that there's
no advantage to be gained from journaling in any case. You can put a
journaling filesystem on your other partitions; in fact, you can mix
and match all you like, as long as your kernel supports them.
On small partitions or small disks, such as 100-MB Zip disks, the
journal itself consumes a significant amount of disk space. The
ReiserFS journal can take up to 32 MB. Ext3, JFS, and XFS use about 4
MB, but if data journaling is enabled in Ext3, it will eat up a lot
more space.
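To see how much of a Zip-sized disk an Ext3 journal would take, you can format a sparse 100 MB image file and read the journal size back from its superblock. This sketch assumes e2fsprogs and GNU truncate are installed. The exact size reported depends on the e2fsprogs version and its defaults. The label on that line also differs between versions, hence the case-insensitive match.

```shell
#!/bin/sh
PATH=$PATH:/sbin:/usr/sbin        # mkfs.ext3 and dumpe2fs often live in /sbin
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
img="$demo/zip.img"

truncate -s 100M "$img"           # sparse 100 MB image, the size of a Zip disk
mkfs.ext3 -q -F "$img"            # Ext3 with the default journal size

# dumpe2fs -h prints the superblock summary, including the journal size
# (labeled "Journal size" or "Total journal size", depending on version).
dumpe2fs -h "$img" 2>/dev/null | grep -i 'journal size'
```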
9.1.7 See Also