BTRFS
=====

Btrfs is a copy-on-write filesystem for Linux aimed at
implementing advanced features while focusing on fault tolerance,
repair and easy administration. Initially developed by Oracle, Btrfs
is licensed under the GPL and open for contribution from anyone.

Linux has a wealth of filesystems to choose from, but we are facing a
number of challenges with scaling to the large storage subsystems that
are becoming common in today's data centers. Filesystems need to scale
in their ability to address and manage large storage, and also in
their ability to detect, repair and tolerate errors in the data stored
on disk. Btrfs is under heavy development, and is not suitable for
any uses other than benchmarking and review. The Btrfs disk format is
not yet finalized.

The main Btrfs features include:

        * Extent based file storage (2^64 max file size)
        * Space efficient packing of small files
        * Space efficient indexed directories
        * Dynamic inode allocation
        * Writable snapshots
        * Subvolumes (separate internal filesystem roots)
        * Object level mirroring and striping
        * Checksums on data and metadata (multiple algorithms available)
        * Compression
        * Integrated multiple device support, with several raid algorithms
        * Online filesystem check (not yet implemented)
        * Very fast offline filesystem check
        * Efficient incremental backup and FS mirroring (not yet implemented)
        * Online filesystem defragmentation

Mount Options
=============

When mounting a btrfs filesystem, the following options are accepted.
Unless otherwise specified, all options default to off.

  alloc_start=<bytes>
        Debugging option to force all block allocations above a certain
        byte threshold on each block device. The value is specified in
        bytes, optionally with a K, M, or G suffix, case insensitive.
        Default is 1MB.

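        The size syntax described for alloc_start (and used again by
        max_inline) can be illustrated with a small shell sketch. The helper
        name to_bytes is hypothetical, and the sketch assumes the suffixes
        are binary multiples (1K = 1024 bytes); it only prints an option
        string and does not mount anything.

```shell
# to_bytes: hypothetical helper that converts a size such as "64k", "1M" or
# "2G" into bytes. Assumption: suffixes are binary multiples (1K = 1024).
to_bytes() {
    value=$1
    case "$value" in
        *[kK]) mult=1024 ;;
        *[mM]) mult=$((1024 * 1024)) ;;
        *[gG]) mult=$((1024 * 1024 * 1024)) ;;
        *)     mult=1 ;;
    esac
    # Strip the one-letter suffix, if any, to leave the numeric part.
    if [ "$mult" -ne 1 ]; then
        value=${value%?}
    fi
    # Reject anything that is not a plain decimal number.
    case "$value" in
        ''|*[!0-9]*) echo "invalid size: $1" >&2; return 1 ;;
    esac
    echo $((value * mult))
}

# Build the option string for the documented 1MB default.
echo "alloc_start=$(to_bytes 1M)"
```

        The suffix match is case insensitive, so "1m" and "1M" give the
        same result.
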
  autodefrag
        Detect small random writes into files and queue them up for the
        defrag process. Works best for small files; not well suited for
        large database workloads.

  check_int
  check_int_data
  check_int_print_mask=<value>
        These debugging options control the behavior of the integrity checking
        module (the BTRFS_FS_CHECK_INTEGRITY config option required).

        check_int enables the integrity checker module, which examines all
        block write requests to ensure on-disk consistency, at a large
        memory and CPU cost.

        check_int_data includes extent data in the integrity checks, and
        implies the check_int option.

        check_int_print_mask takes a bitmask of BTRFSIC_PRINT_MASK_* values
        as defined in fs/btrfs/check-integrity.c, to control the integrity
        checker module behavior.

        See comments at the top of fs/btrfs/check-integrity.c for more info.

  compress
  compress=<type>
  compress-force
  compress-force=<type>
        Control BTRFS file data compression. Type may be specified as "zlib",
        "lzo" or "no" (for no compression, used for remounting). If no type
        is specified, zlib is used. If compress-force is specified,
        all files will be compressed, whether or not they compress well.
        If compression is enabled, nodatacow and nodatasum are disabled.

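        As an illustration of the four spellings above, the following
        sketch builds a compression option string and rejects types other
        than the three listed. The helper name compress_opt and its --force
        flag are hypothetical and exist only for this example.

```shell
# compress_opt: hypothetical helper that builds a compression mount option.
# usage: compress_opt [--force] [type]
compress_opt() {
    name=compress
    if [ "${1:-}" = "--force" ]; then
        name=compress-force
        shift
    fi
    case "${1:-}" in
        '')          echo "$name" ;;       # no type given: zlib is used
        zlib|lzo|no) echo "$name=$1" ;;    # the three documented types
        *)           echo "unknown compression type: $1" >&2; return 1 ;;
    esac
}

compress_opt lzo            # prints: compress=lzo
compress_opt --force zlib   # prints: compress-force=zlib
```

        The resulting string would then be passed to mount with -o, for
        example mount -o "$(compress_opt lzo)" /dev/sdb1 /mnt, where the
        device and mount point are placeholders.
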
  degraded
        Allow mounts to continue with missing devices. A read-write mount may
        fail with too many devices missing, for example if a stripe member
        is completely missing.

  device=<devicepath>
        Specify a device during mount so that ioctls on the control device
        can be avoided. Especially useful when trying to mount a multi-device
        setup as root. May be specified multiple times for multiple devices.

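        Because device= may be repeated, a multi-device option string is
        simply a comma-separated list with one device= entry per member. A
        minimal sketch, using a hypothetical helper named device_opts and
        placeholder device paths:

```shell
# device_opts: hypothetical helper that joins device paths into a
# comma-separated list of device=<devicepath> options.
device_opts() {
    opts=""
    for dev in "$@"; do
        # Prepend a comma only when the list is already non-empty.
        opts="${opts:+$opts,}device=$dev"
    done
    echo "$opts"
}

device_opts /dev/sdb /dev/sdc   # prints: device=/dev/sdb,device=/dev/sdc
```

        The output would be handed to mount with -o, alongside any one of
        the member devices as the mount source.
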
  discard
        Issue frequent commands to let the block device reclaim space freed by
        the filesystem. This is useful for SSD devices, thinly provisioned
        LUNs and virtual machine images, but may have a significant
        performance impact. (The fstrim command is also available to
        initiate batch trims from userspace).

  enospc_debug
        Debugging option to be more verbose in some ENOSPC conditions.

  fatal_errors=<action>
        Action to take when encountering a fatal error:
          "bug" - BUG() on a fatal error. This is the default.
          "panic" - panic() on a fatal error.

  flushoncommit
        The 'flushoncommit' mount option forces any data dirtied by a write in a
        prior transaction to commit as part of the current commit. This makes
        the committed state a fully consistent view of the file system from the
        application's perspective (i.e., it includes all completed file system
        operations). This was previously the behavior only when a snapshot is
        created.

  inode_cache
        Enable free inode number caching. Defaults to off due to an overflow
        problem when the free space crcs don't fit inside a single page.

  max_inline=<bytes>
        Specify the maximum amount of space, in bytes, that can be inlined in
        a metadata B-tree leaf. The value is specified in bytes, optionally
        with a K, M, or G suffix, case insensitive. In practice, this value
        is limited by the root sector size, with some space unavailable due
        to leaf headers. For a 4k sectorsize, max inline data is ~3900 bytes.

  metadata_ratio=<value>
        Specify that 1 metadata chunk should be allocated after every <value>
        data chunks. Off by default.

  noacl
        Disable support for POSIX Access Control Lists (ACLs). See the
        acl(5) manual page for more information about ACLs.

  nobarrier
        Disables the use of block layer write barriers. Write barriers ensure
        that certain IOs make it through the device cache and are on persistent
        storage. If used on a device with a volatile (non-battery-backed)
        write-back cache, this option will lead to filesystem corruption on a
        system crash or power loss.

  nodatacow
        Disable data copy-on-write for newly created files. Implies nodatasum,
        and disables all compression.

  nodatasum
        Disable data checksumming for newly created files.

  notreelog
        Disable the tree logging used for fsync and O_SYNC writes.

  recovery
        Enable autorecovery attempts if a bad tree root is found at mount time.
        Currently this scans a list of several previous tree roots and tries to
        use the first readable.

  skip_balance
        Skip automatic resume of interrupted balance operation after mount.
        May be resumed with "btrfs balance resume".

  space_cache (*)
        Enable the on-disk freespace cache.
  nospace_cache
        Disable freespace cache loading without clearing the cache.
  clear_cache
        Force clearing and rebuilding of the disk space cache if something
        has gone wrong.

  ssd
  nossd
  ssd_spread
        Options to control SSD allocation schemes. By default, BTRFS will
        enable or disable SSD allocation heuristics depending on whether a
        rotational or nonrotational disk is in use. The ssd and nossd options
        can override this autodetection.

        The ssd_spread mount option attempts to allocate into big chunks
        of unused space, and may perform better on low-end SSDs. ssd_spread
        implies ssd, enabling all other SSD heuristics as well.

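        To see which way the autodetection is likely to go before deciding
        on an override, one can inspect the rotational flag that Linux
        exposes in sysfs. The sketch below assumes that this flag matches
        what the filesystem sees for the device; the helper name
        ssd_autodetect is hypothetical and the device name sda is a
        placeholder.

```shell
# ssd_autodetect: hypothetical helper that reports which allocation scheme
# the autodetection described above would be expected to pick.
# $1 is a file holding the rotational flag: 1 = rotational, 0 = nonrotational.
ssd_autodetect() {
    flag=$(cat "$1") || return 1
    case "$flag" in
        0) echo ssd ;;      # nonrotational: SSD heuristics expected
        1) echo nossd ;;    # rotational: SSD heuristics not expected
        *) echo "unexpected rotational flag: $flag" >&2; return 1 ;;
    esac
}

# Example invocation (not run here):
#   ssd_autodetect /sys/block/sda/queue/rotational
```

        If the reported result does not suit the hardware, for instance a
        device behind a controller that misreports the flag, the ssd or
        nossd option can be given explicitly.
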
  subvol=<path>
        Mount subvolume at <path> rather than the root subvolume. <path> is
        relative to the top level subvolume.

  subvolid=<ID>
        Mount subvolume specified by an ID number rather than the root subvolume.
        This allows mounting of subvolumes which are not in the root of the mounted
        filesystem.
        You can use "btrfs subvolume list" to see subvolume ID numbers.

  subvolrootid=<objectid> (deprecated)
        Mount subvolume specified by <objectid> rather than the root subvolume.
        This allows mounting of subvolumes which are not in the root of the mounted
        filesystem.
        You can use "btrfs subvolume show" to see the object ID for a subvolume.

  thread_pool=<number>
        The number of worker threads to allocate. The default number is equal
        to the number of CPUs + 2, or 8, whichever is smaller.

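        The thread_pool default stated above works out to
        min(number of CPUs + 2, 8). A small sketch of that arithmetic, using
        a hypothetical helper named default_thread_pool that takes the CPU
        count as its argument:

```shell
# default_thread_pool: hypothetical helper that applies the documented rule,
# min(number of CPUs + 2, 8), to a given CPU count.
default_thread_pool() {
    n=$(( $1 + 2 ))
    if [ "$n" -gt 8 ]; then
        n=8
    fi
    echo "$n"
}

default_thread_pool 2    # prints: 4
default_thread_pool 16   # prints: 8
```

        On a live system the CPU count could be supplied from
        getconf _NPROCESSORS_ONLN. Machines with six or more CPUs all end
        up at the cap of 8.
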
  user_subvol_rm_allowed
        Allow subvolumes to be deleted by a non-root user. Use with caution.

MAILING LIST
============

There is a Btrfs mailing list hosted on vger.kernel.org. You can
find details on how to subscribe here:

http://vger.kernel.org/vger-lists.html#linux-btrfs

Mailing list archives are available from gmane:

http://dir.gmane.org/gmane.comp.file-systems.btrfs

IRC
===

Discussion of Btrfs also occurs on the #btrfs channel of the Freenode
IRC network.

UTILITIES
=========

Userspace tools for creating and manipulating Btrfs file systems are
available from the git repository at the following location:

 http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-progs.git
 git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git

These include the following tools:

mkfs.btrfs: create a filesystem

btrfsctl: control program to create snapshots and subvolumes:

        mount /dev/sda2 /mnt
        btrfsctl -s new_subvol_name /mnt
        btrfsctl -s snapshot_of_default /mnt/default
        btrfsctl -s snapshot_of_new_subvol /mnt/new_subvol_name
        btrfsctl -s snapshot_of_a_snapshot /mnt/snapshot_of_new_subvol
        ls /mnt
        default snapshot_of_a_snapshot snapshot_of_new_subvol
        new_subvol_name snapshot_of_default

        Snapshots and subvolumes cannot be deleted right now, but you can
        rm -rf all the files and directories inside them.

btrfsck: do a limited check of the FS extent trees.

btrfs-debug-tree: print all of the FS metadata in text form. Example:

        btrfs-debug-tree /dev/sda2 >& big_output_file