learnxinyminutes-docs/zfs.html.markdown

474 lines
16 KiB
Markdown
Raw Normal View History

2015-10-17 08:47:49 +00:00
---
category: tool
2024-10-20 21:46:35 +00:00
tool: ZFS
2015-10-17 08:47:49 +00:00
contributors:
- ["sarlalian", "http://github.com/sarlalian"]
2024-05-13 06:46:20 +00:00
- ["81reap", "https://github.com/81reap"]
- ["A1EF", "https://github.com/A1EF"]
2015-10-17 08:47:49 +00:00
filename: LearnZfs.txt
---
[ZFS](http://open-zfs.org/wiki/Main_Page)
is a rethinking of the storage stack, combining traditional file systems as well as volume
2016-02-12 15:15:27 +00:00
managers into one cohesive tool. ZFS has some specific terminology that sets it apart from
2015-10-17 08:47:49 +00:00
more traditional storage systems, however it has a great set of features with a focus on
usability for systems administrators.
## ZFS Concepts
### Virtual Devices
2024-06-05 01:34:29 +00:00
A VDEV (Virtual Device) in ZFS is analogous to a RAID device and similarly offers different
2024-05-13 06:46:20 +00:00
benefits in terms of redundancy and performance. In general VDEV's offer better reliability
and safety than a RAID card. It is discouraged to use a RAID setup with ZFS, as ZFS expects
to directly manage the underlying disks.
2024-05-13 06:46:20 +00:00
| VDEV Type | Similar RAID | Notes |
|-----------|----------------|---------------------------------------|
| Mirror | RAID 1 | Supports n-way mirroring for redundancy. |
| raidz1 | RAID 5 | Single disk parity, offering fault tolerance of one disk failure. |
| raidz2 | RAID 6 | Two-disk parity, can tolerate two disk failures. |
| raidz3 | - | Three-disk parity, can tolerate three disk failures. |
| Disk | - | Represents a single physical disk in a VDEV. |
| File | - | File-based VDEV, not recommended for production as it adds complexity and reduces reliability. |
2015-10-17 08:47:49 +00:00
2024-05-13 06:46:20 +00:00
Data in a ZFS storage pool is striped across all VDEVs. Adding more VDEVs, Logs, or Caches
can increase IOPS (Input/Output Operations Per Second), enhancing performance. It's crucial
to balance VDEVs for optimal performance and redundancy.
2015-10-17 08:47:49 +00:00
### Storage Pools
ZFS uses Storage Pools as an abstraction over the lower level storage provider (VDEV), allow
2016-02-12 15:15:27 +00:00
you to separate the user visible file system from the physical layout.
2015-10-17 08:47:49 +00:00
### ZFS Dataset
2016-02-12 15:15:27 +00:00
ZFS datasets are analogous to traditional filesystems but with many more features. They
2015-10-17 08:47:49 +00:00
provide many of ZFS's advantages. Datasets support [Copy on Write](https://en.wikipedia.org/wiki/Copy-on-write)
2016-02-12 15:15:27 +00:00
snapshots, quota's, compression and de-duplication.
2015-10-17 08:47:49 +00:00
### Limits
One directory may contain up to 2^48 files, up to 16 exabytes each. A single storage pool
can contain up to 256 zettabytes (2^78) of space, and can be striped across 2^64 devices. A
single host can have 2^64 storage pools. The limits are huge.
## Commands
### Storage Pools
Actions:
2015-10-17 08:47:49 +00:00
* List
* Status
* Destroy
* Get/Set properties
List zpools
```bash
2016-02-12 15:15:27 +00:00
# Create a raidz zpool
$ zpool create zroot raidz1 gpt/zfs0 gpt/zfs1 gpt/zfs2
2015-10-17 08:47:49 +00:00
# List ZPools
$ zpool list
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
zroot 141G 106G 35.2G - 43% 75% 1.00x ONLINE -
# List detailed information about a specific zpool
$ zpool list -v zroot
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
zroot 141G 106G 35.2G - 43% 75% 1.00x ONLINE -
gptid/c92a5ccf-a5bb-11e4-a77d-001b2172c655 141G 106G 35.2G - 43% 75%
```
Status of zpools
```bash
# Get status information about zpools
$ zpool status
pool: zroot
state: ONLINE
scan: scrub repaired 0 in 2h51m with 0 errors on Thu Oct 1 07:08:31 2015
config:
NAME STATE READ WRITE CKSUM
zroot ONLINE 0 0 0
gptid/c92a5ccf-a5bb-11e4-a77d-001b2172c655 ONLINE 0 0 0
errors: No known data errors
# Scrubbing a zpool to correct any errors
$ zpool scrub zroot
$ zpool status -v zroot
pool: zroot
state: ONLINE
scan: scrub in progress since Thu Oct 15 16:59:14 2015
39.1M scanned out of 106G at 1.45M/s, 20h47m to go
0 repaired, 0.04% done
config:
NAME STATE READ WRITE CKSUM
zroot ONLINE 0 0 0
gptid/c92a5ccf-a5bb-11e4-a77d-001b2172c655 ONLINE 0 0 0
errors: No known data errors
```
Properties of zpools
```bash
# Getting properties from the pool properties can be user set or system provided.
$ zpool get all zroot
NAME PROPERTY VALUE SOURCE
zroot size 141G -
zroot capacity 75% -
zroot altroot - default
zroot health ONLINE -
...
# Setting a zpool property
$ zpool set comment="Storage of mah stuff" zroot
$ zpool get comment
NAME PROPERTY VALUE SOURCE
tank comment - default
zroot comment Storage of mah stuff local
```
Remove zpool
```bash
$ zpool destroy test
```
### Datasets
Actions:
2015-10-17 08:47:49 +00:00
* Create
* List
* Rename
* Delete
* Get/Set properties
Create datasets
```bash
# Create dataset
$ zfs create zroot/root/data
2015-10-17 08:47:49 +00:00
$ mount | grep data
zroot/root/data on /data (zfs, local, nfsv4acls)
2015-10-17 08:47:49 +00:00
# Create child dataset
$ zfs create zroot/root/data/stuff
2015-10-17 08:47:49 +00:00
$ mount | grep data
zroot/root/data on /data (zfs, local, nfsv4acls)
zroot/root/data/stuff on /data/stuff (zfs, local, nfsv4acls)
2015-10-17 08:47:49 +00:00
# Create Volume
$ zfs create -V zroot/win_vm
$ zfs list zroot/win_vm
NAME USED AVAIL REFER MOUNTPOINT
zroot/win_vm 4.13G 17.9G 64K -
2015-10-17 08:47:49 +00:00
```
List datasets
```bash
# List all datasets
$ zfs list
NAME USED AVAIL REFER MOUNTPOINT
zroot 106G 30.8G 144K none
zroot/ROOT 18.5G 30.8G 144K none
zroot/ROOT/10.1 8K 30.8G 9.63G /
zroot/ROOT/default 18.5G 30.8G 11.2G /
zroot/backup 5.23G 30.8G 144K none
zroot/home 288K 30.8G 144K none
...
# List a specific dataset
$ zfs list zroot/home
NAME USED AVAIL REFER MOUNTPOINT
zroot/home 288K 30.8G 144K none
# List snapshots
$ zfs list -t snapshot
zroot@daily-2015-10-15 0 - 144K -
zroot/ROOT@daily-2015-10-15 0 - 144K -
zroot/ROOT/default@daily-2015-10-15 0 - 24.2G -
zroot/tmp@daily-2015-10-15 124K - 708M -
zroot/usr@daily-2015-10-15 0 - 144K -
zroot/home@daily-2015-10-15 0 - 11.9G -
zroot/var@daily-2015-10-15 704K - 1.42G -
zroot/var/log@daily-2015-10-15 192K - 828K -
zroot/var/tmp@daily-2015-10-15 0 - 152K -
```
Rename datasets
```bash
$ zfs rename zroot/root/home zroot/root/old_home
$ zfs rename zroot/root/new_home zroot/root/home
2015-10-17 08:47:49 +00:00
```
Delete dataset
```bash
# Datasets cannot be deleted if they have any snapshots
$ zfs destroy zroot/root/home
2015-10-17 08:47:49 +00:00
```
Get / set properties of a dataset
```bash
# Get all properties
$ zfs get all zroot/usr/home
NAME PROPERTY VALUE SOURCE
zroot/home type filesystem -
zroot/home creation Mon Oct 20 14:44 2014 -
zroot/home used 11.9G -
zroot/home available 94.1G -
zroot/home referenced 11.9G -
2015-10-17 08:47:49 +00:00
zroot/home mounted yes -
...
# Get property from dataset
$ zfs get compression zroot/usr/home
NAME PROPERTY VALUE SOURCE
zroot/home compression off default
# Set property on dataset
$ zfs set compression=lz4 zroot/lamb
2015-10-17 08:47:49 +00:00
# Get a set of properties from all datasets
$ zfs list -o name,quota,reservation
NAME QUOTA RESERV
zroot none none
zroot/ROOT none none
zroot/ROOT/default none none
zroot/tmp none none
zroot/usr none none
zroot/home none none
zroot/var none none
...
```
2024-05-13 06:46:20 +00:00
### Write Log Pool
2024-06-05 01:34:29 +00:00
The ZFS Intent Log (ZIL) is a write log designed to speed up synchronous writes. This is
2024-05-13 06:46:20 +00:00
typically a faster drive or drive partition than the larger storage pools.
```bash
# Add a log pool
$ zpool add mypool/lamb log /dev/sdX
2024-06-05 01:34:29 +00:00
# Check the configuration
2024-05-13 06:46:20 +00:00
$ zpool status mypool/lamb
```
### Read Cache Pool
The Level 2 Adaptive Replacement Cache (L2ARC) extends the primary ARC (in-RAM cache) and is
used for read caching. This is typically a faster drive or drive partition than the larger
storage pools.
```bash
# Add a cache pool
$ zpool add mypool/lamb cache /dev/sdY
2024-06-05 01:34:29 +00:00
# Check the configuration
2024-05-13 06:46:20 +00:00
$ zpool status mypool/lamb
```
### Data Compression
2024-06-05 01:34:29 +00:00
Data compression reduces the amount of space data occupies on disk in exchange for some extra
2024-05-13 06:46:20 +00:00
CPU usage. When enabled, it can enhance performance by reducing the amount of disk I/O. It
especially beneficial on systems with more CPU resources than disk bandwidth.
```bash
# Get compression options
$ zfs get -help
...
compression NO YES on | off | lzjb | gzip | gzip-[1-9] | zle | lz4 | zstd | zstd-[1-19] | zstd-fast | zstd-fast-[1-10,20,30,40,50,60,70,80,90,100,500,1000]
...
# Set compression
$ zfs set compression=on mypool/lamb
2024-06-05 01:34:29 +00:00
# Check the configuration
2024-05-13 06:46:20 +00:00
$ zpool get compression mypool/lamb
```
### Encryption at Rest
Encryption allows data to be encrypted on the device at the cost of extra CPU cycles. This
2024-06-05 01:34:29 +00:00
property can only be set when a dataset is being created.
2024-05-13 06:46:20 +00:00
```bash
# Enable encryption on the pool
$ zpool set feature@encryption=enabled black_hole
# Create an encrypted dataset with a prompt
$ zfs create -o encryption=on -o keyformat=passphrase black_hole/enc
2024-06-05 01:34:29 +00:00
# Check the configuration
2024-05-13 06:46:20 +00:00
$ zfs get encryption black_hole/enc
```
It should be noted that there are parts of the system where the data is not encrypted. See
the table below for a breakdown.
| Component | Encrypted | Notes |
|----------------------|-------------------------------------------|------------------------------------------------------|
| Main Data Storage | Yes | Data in datasets/volumes is encrypted. |
| ZFS Intent Log (ZIL) | Yes | Synchronous write requests are encrypted. |
| L2ARC (Cache) | Yes | Cached data is stored in an encrypted form. |
| RAM (ARC) | No | Data in the primary ARC, in RAM, is not encrypted. |
| Swap Area | Conditional | Encrypted if the ZFS swap dataset is encrypted. |
| ZFS Metadata | Yes | Metadata is encrypted for encrypted datasets. |
| Snapshot Data | Yes | Snapshots of encrypted datasets are also encrypted. |
| ZFS Send/Receive | Conditional | Encrypted during send/receive if datasets are encrypted and `-w` flag is used. |
2015-10-17 08:47:49 +00:00
### Snapshots
ZFS snapshots are one of the things about zfs that are a really big deal
* The space they take up is equal to the difference in data between the filesystem and its snapshot
* Creation time is only seconds
* Recovery is as fast as you can write data.
* They are easy to automate.
Actions:
2015-10-17 08:47:49 +00:00
* Create
* Delete
* Rename
* Access snapshots
* Send / Receive
* Clone
Create snapshots
```bash
# Create a snapshot of a single dataset
zfs snapshot zroot/home/sarlalian@now
2015-10-17 08:47:49 +00:00
# Create a snapshot of a dataset and its children
$ zfs snapshot -r zroot/home@now
2015-10-17 08:47:49 +00:00
$ zfs list -t snapshot
NAME USED AVAIL REFER MOUNTPOINT
zroot/home@now 0 - 26K -
zroot/home/sarlalian@now 0 - 259M -
zroot/home/alice@now 0 - 156M -
zroot/home/bob@now 0 - 156M -
2015-10-17 08:47:49 +00:00
...
```
2015-10-17 08:47:49 +00:00
Destroy snapshots
```bash
# How to destroy a snapshot
$ zfs destroy zroot/home/sarlalian@now
2015-10-17 08:47:49 +00:00
# Delete a snapshot on a parent dataset and its children
$ zfs destroy -r zroot/home/sarlalian@now
2015-10-17 08:47:49 +00:00
```
Renaming Snapshots
```bash
# Rename a snapshot
$ zfs rename zroot/home/sarlalian@now zroot/home/sarlalian@today
$ zfs rename zroot/home/sarlalian@now today
2015-10-17 08:47:49 +00:00
$ zfs rename -r zroot/home@now @yesterday
2015-10-17 08:47:49 +00:00
```
Accessing snapshots
```bash
# CD into a snapshot directory
2015-10-17 08:47:49 +00:00
$ cd /home/.zfs/snapshot/
```
Sending and Receiving
```bash
# Backup a snapshot to a file
$ zfs send zroot/home/sarlalian@now | gzip > backup_file.gz
2015-10-17 08:47:49 +00:00
# Send a snapshot to another dataset
$ zfs send zroot/home/sarlalian@now | zfs recv backups/home/sarlalian
2015-10-17 08:47:49 +00:00
# Send a snapshot to a remote host
$ zfs send zroot/home/sarlalian@now | ssh root@backup_server 'zfs recv zroot/home/sarlalian'
2015-10-17 08:47:49 +00:00
# Send full dataset with snapshots to new host
$ zfs send -v -R zroot/home@now | ssh root@backup_server 'zfs recv zroot/home'
2015-10-17 08:47:49 +00:00
```
Cloning Snapshots
2015-10-17 08:47:49 +00:00
```bash
# Clone a snapshot
$ zfs clone zroot/home/sarlalian@now zroot/home/sarlalian_new
2015-10-17 08:47:49 +00:00
# Promoting the clone so it is no longer dependent on the snapshot
$ zfs promote zroot/home/sarlalian_new
2015-10-17 08:47:49 +00:00
```
### Putting it all together
2016-02-12 15:15:27 +00:00
This following a script utilizing FreeBSD, jails and ZFS to automate
2024-06-05 01:34:29 +00:00
provisioning a clean copy of a MySQL staging database from a live replication
slave.
```bash
#!/bin/sh
echo "==== Stopping the staging database server ===="
jail -r staging
echo "==== Cleaning up existing staging server and snapshot ===="
zfs destroy -r zroot/jails/staging
zfs destroy zroot/jails/slave@staging
echo "==== Quiescing the slave database ===="
echo "FLUSH TABLES WITH READ LOCK;" | /usr/local/bin/mysql -u root -pmyrootpassword -h slave
echo "==== Snapshotting the slave db filesystem as zroot/jails/slave@staging ===="
zfs snapshot zroot/jails/slave@staging
echo "==== Starting the slave database server ===="
jail -c slave
echo "==== Cloning the slave snapshot to the staging server ===="
zfs clone zroot/jails/slave@staging zroot/jails/staging
echo "==== Installing the staging mysql config ===="
mv /jails/staging/usr/local/etc/my.cnf /jails/staging/usr/local/etc/my.cnf.slave
cp /jails/staging/usr/local/etc/my.cnf.staging /jails/staging/usr/local/etc/my.cnf
echo "==== Setting up the staging rc.conf file ===="
mv /jails/staging/etc/rc.conf.local /jails/staging/etc/rc.conf.slave
mv /jails/staging/etc/rc.conf.staging /jails/staging/etc/rc.conf.local
echo "==== Starting the staging db server ===="
jail -c staging
2016-02-12 15:15:27 +00:00
echo "==== Makes the staging database not pull from the master ===="
echo "STOP SLAVE;" | /usr/local/bin/mysql -u root -pmyrootpassword -h staging
echo "RESET SLAVE;" | /usr/local/bin/mysql -u root -pmyrootpassword -h staging
```
### Additional Reading
* [BSDNow's Crash Course on ZFS](http://www.bsdnow.tv/tutorials/zfs)
2015-10-19 07:02:07 +00:00
* [FreeBSD Handbook on ZFS](https://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/zfs.html)
* [BSDNow's Crash Course on ZFS](http://www.bsdnow.tv/tutorials/zfs)
* [Oracle's Tuning Guide](http://www.oracle.com/technetwork/articles/servers-storage-admin/sto-recommended-zfs-settings-1951715.html)
* [OpenZFS Tuning Guide](http://open-zfs.org/wiki/Performance_tuning)
* [FreeBSD ZFS Tuning Guide](https://wiki.freebsd.org/ZFSTuningGuide)