documentation:zfs
documentation:zfs [2020/02/01 12:35] (external edit, 127.0.0.1) → [2020/02/01 13:01] (current, removed by lucid)
====== ZFS ======
Sources: https://
https://
https://
===== ZFS =====

ZFS support was added to Ubuntu Wily 15.10 as a technology preview and comes fully supported in Ubuntu Xenial 16.04.

A minimum of 2 GB of free memory is required to run ZFS; however, it is recommended to use ZFS on a system with at least 8 GB of memory.

To install ZFS, use:

<code>
sudo apt install zfsutils-linux
</code>
| - | |||
| - | Below is a quick overview of ZFS, this is intended as a getting started primer. | ||
| - | |||
===== NOTE =====

**For the sake of brevity, devices in this document are referred to as /dev/sda, /dev/sdb, etc. One should avoid this and instead use full device name paths under /dev/disk/by-id/, as shown in the next section.**

===== Quick Setup =====
Much like mounting disks by UUID in fstab, using the disk ID is a much more reliable way to keep track of the disks. There are other ways as well, but by-id is the most straightforward:

<code>
$ ls -lh /dev/disk/by-id/
total 0
lrwxrwxrwx 1 root root 9 Oct 26 09:04 ata-HGST_HUS726060ALE610_######## -> ../../sda
lrwxrwxrwx 1 root root 9 Oct 26 09:04 ata-HGST_HUS726060ALE610_######## -> ../../sdb
lrwxrwxrwx 1 root root 9 Oct 26 09:04 ata-HGST_HUS726060ALE614_######## -> ../../sdc
lrwxrwxrwx 1 root root 9 Oct 26 09:04 ata-HGST_HUS726060ALE614_######## -> ../../sdd
</code>

==== Creating the pool ====
**-f** forces the pool to be created even if there are existing filesystems on the devices
\\
**-m** specifies the mount point of the pool
\\
**raidz** the vdev type: this can be mirror, raidz, raidz2, or raidz3

At pool creation, ashift=12 should always be used, except with SSDs that have 8k sectors, where ashift=13 is correct. A vdev of 512-byte disks using 4k sectors will not experience performance issues, but a 4k disk using 512-byte sectors will. Since ashift cannot be changed after pool creation, even a pool with only 512-byte disks should use 4k sectors, because those disks may later need to be replaced with 4k disks, or the pool may be expanded by adding a vdev composed of 4k disks. Because correct detection of 4k disks is not reliable, ''-o ashift=12'' should be specified explicitly at pool creation.

<code>
sudo zpool create -f -o ashift=12 -m <mountpoint> bastion raidz /dev/disk/by-id/<disk1> <disk2> <disk3> <disk4>
</code>

Check the status of the pool:

<code>
# zpool status
</code>

Output:

<code>
lucid@shiro:~$ zpool status
  pool: bastion
 state: ONLINE
  scan: none requested
config:

        NAME                                   STATE     READ WRITE CKSUM
        bastion                                ONLINE       0     0     0
          raidz1-0                             ONLINE       0     0     0
            ata-HGST_HUS726060ALE610_########  ONLINE       0     0     0
            ata-HGST_HUS726060ALE610_########  ONLINE       0     0     0
            ata-HGST_HUS726060ALE614_########  ONLINE       0     0     0
            ata-HGST_HUS726060ALE614_########  ONLINE       0     0     0

errors: No known data errors
</code>

Check the configuration of the pool; this also shows the total available size of the pool:
<code>
# zpool get all <pool name>
</code>

==== Create filesystems ====
Filesystems appear as individual folders under the root of the pool; more information on pools, filesystems, and their properties is available in the ZFS documentation.

Example:
<code>
# zfs create bastion/<filesystem>
</code>

The created filesystem will be owned by root, so its ownership will need to be changed to the user of choice:

<code>
# chown lucid:lucid <mountpoint>/<filesystem>
</code>

==== Automatic scrubbing ====

Using a systemd timer/service template pair, every pool can be scrubbed on a schedule:

<code>
/etc/systemd/system/zfs-scrub@.timer
-----------------------------------------------
[Unit]
Description=Monthly zpool scrub on %i

[Timer]
OnCalendar=monthly
AccuracySec=1h
Persistent=true

[Install]
WantedBy=multi-user.target
</code>

<code>
/etc/systemd/system/zfs-scrub@.service
-----------------------------------------------
[Unit]
Description=zpool scrub on %i

[Service]
Nice=19
IOSchedulingClass=idle
KillSignal=SIGINT
ExecStart=/sbin/zpool scrub %i
</code>

Enable and start the timer for each pool that should be scrubbed monthly.
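Assuming the template units above were saved as ''zfs-scrub@.timer'' and ''zfs-scrub@.service'' (illustrative names; use whatever filenames you actually chose), the timer is enabled per pool like this:

```shell
# Reload systemd so it picks up the new unit files.
sudo systemctl daemon-reload

# Enable and start a monthly scrub timer for the pool "bastion".
# The part after "@" is substituted for %i inside the unit files.
sudo systemctl enable --now zfs-scrub@bastion.timer

# Confirm the timer is scheduled.
systemctl list-timers 'zfs-scrub@*'
```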
| - | |||
| - | Unmounting a pool is weird | ||
| - | < | ||
| - | # zpool export <pool name> | ||
| - | </ | ||
| - | |||
===== ZFS Virtual Devices (ZFS VDEVs) =====

A VDEV is a meta-device that can represent one or more devices. ZFS supports 7 different types of VDEV:

  * File - a pre-allocated file
  * Physical Drive (HDD, SSD, PCIe NVMe, etc.)
  * Mirror - a standard RAID1 mirror
  * ZFS software raidz1, raidz2, raidz3 parity-based RAID
  * Hot Spare - hot spare for ZFS software raid
  * Cache - a device for level 2 adaptive read cache (ZFS L2ARC)
  * Log - ZFS Intent Log (ZFS ZIL)

VDEVs are dynamically striped by ZFS. A device can be added to a VDEV, but cannot be removed from it.
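Several of the VDEV types above can be combined in a single ''zpool create'' invocation. As a sketch (the pool name ''tank'' and the device names are illustrative, not from this document):

```shell
# One raidz data vdev plus a hot spare, a ZIL (log) device,
# and an L2ARC (cache) device, all created in one command.
sudo zpool create tank \
    raidz /dev/sdb /dev/sdc /dev/sdd \
    spare /dev/sde \
    log   /dev/sdf \
    cache /dev/sdg
```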
| - | |||
| - | ===== ZFS Pools ===== | ||
| - | |||
| - | A zpool is a pool of storage made from a collection of VDEVS. One or more | ||
| - | ZFS file systems can be created from a ZFS pool. | ||
| - | |||
| - | In the following example, a pool named " | ||
| - | physical drives: | ||
| - | |||
| - | < | ||
| - | $ sudo zpool create pool-test /dev/sdb /dev/sdc /dev/sdd | ||
| - | </ | ||
| - | |||
| - | Striping is performed dynamically, | ||
| - | pool. | ||
| - | |||
| - | ''' | ||
| - | you should probably prefer / | ||
| - | of drives. The examples here should not suggest that ' | ||
| - | They merely make examples herein easier to read. | ||
| - | |||
One can see the status of the pool using the following command:

<code>
$ sudo zpool status pool-test
</code>

...and destroy it using:

<code>
$ sudo zpool destroy pool-test
</code>

==== A 2 x 2 mirrored zpool example ====

In the following example, we create a zpool containing a VDEV of 2 drives in a mirror:

<code>
$ sudo zpool create mypool mirror /dev/sdc /dev/sdd
</code>

Next, we add another VDEV of 2 drives in a mirror to the pool:

<code>
$ sudo zpool add mypool mirror /dev/sde /dev/sdf -f

$ sudo zpool status
  pool: mypool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        mypool      ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     0
</code>

In this example:
  * /dev/sdc, /dev/sdd, /dev/sde, /dev/sdf are the physical devices
  * mirror-0, mirror-1 are the VDEVs
  * mypool is the pool

There are plenty of other ways to arrange VDEVs to create a zpool.

==== A single file based zpool example ====

In the following example, we use a single 2 GB file as a VDEV and make a zpool from just this one VDEV:

<code>
$ dd if=/dev/zero of=/path/to/example.img bs=1M count=2048
$ sudo zpool create pool-test /path/to/example.img
$ sudo zpool status
  pool: pool-test
 state: ONLINE
  scan: none requested
config:

        NAME                    STATE     READ WRITE CKSUM
        pool-test               ONLINE       0     0     0
          /path/to/example.img  ONLINE       0     0     0
</code>

In this example:
  * /path/to/example.img is the VDEV
  * pool-test is the pool

===== RAID =====

ZFS offers different RAID options:

==== Striped VDEVs ====

This is equivalent to RAID0: there is no parity and no mirroring from which to rebuild data. It is not recommended because of the risk of losing data if a drive fails. Example, creating a striped pool from 4 single-drive VDEVs:

<code>
$ sudo zpool create example /dev/sdb /dev/sdc /dev/sdd /dev/sde
</code>

==== Mirrored VDEVs ====

Much like RAID1, one can mirror 2 or more devices. For an N-way mirror, N-1 devices can fail before data is lost. Example, creating a mirrored pool from 2 devices:

<code>
$ sudo zpool create example mirror /dev/sdb /dev/sdc
</code>

==== Striped Mirrored VDEVs ====

Much like RAID10, and great for small random read I/O: create mirrored pairs and then stripe data over the mirrors. Example, creating a striped 2 x 2 mirrored pool:

<code>
sudo zpool create example mirror /dev/sdb /dev/sdc mirror /dev/sdd /dev/sde
</code>

or:

<code>
sudo zpool create example mirror /dev/sdb /dev/sdc
sudo zpool add example mirror /dev/sdd /dev/sde
</code>

==== RAIDZ ====

Like RAID5, this uses a variable-width stripe for parity; one drive can fail before data is lost. Example, create a single-parity pool from 4 devices:

<code>
$ sudo zpool create example raidz /dev/sdb /dev/sdc /dev/sdd /dev/sde
</code>

==== RAIDZ2 ====

Like RAID6, with double parity to survive 2 disk failures, and performance similar to RAIDZ. Example, create a double-parity pool from 5 devices:

<code>
$ sudo zpool create example raidz2 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
</code>

==== RAIDZ3 ====

Triple parity, allowing for 3 disk failures before losing data, with performance like RAIDZ2 and RAIDZ. Example, create a triple-parity pool from 6 devices:

<code>
$ sudo zpool create example raidz3 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
</code>

==== Nested RAIDZ ====

Like RAID50 or RAID60: striped RAIDZ volumes. This performs better than plain RAIDZ, at the cost of reduced capacity:

<code>
$ sudo zpool create example raidz /dev/sdb /dev/sdc /dev/sdd /dev/sde
$ sudo zpool add example raidz /dev/sdf /dev/sdg /dev/sdh /dev/sdi
</code>

===== ZFS Intent Logs =====

ZIL (ZFS Intent Log) drives can be added to a ZFS pool to speed up the write capabilities of any level of ZFS RAID. One would normally use a fast SSD for the ZIL. Conceptually, the ZIL is a logging mechanism where data to be written is stored and then later flushed as a transactional write; in reality, the ZIL is more complex than this. One or more drives can be used for the ZIL.

For example, to add an SSD as a log device to the pool 'mypool', use:

<code>
$ sudo zpool add mypool log /dev/sdg -f
</code>

===== ZFS Cache Drives =====

Cache devices provide an additional layer of caching between main memory and disk. They are especially useful for improving random-read performance of mainly static data.

For example, to add a cache drive /dev/sdh to the pool 'mypool', use:

<code>
$ sudo zpool add mypool cache /dev/sdh -f
</code>


===== ZFS file systems =====

ZFS allows one to create a maximum of 2^64 file systems per pool. In the following example, we create two file systems in the pool 'mypool':

<code>
sudo zfs create mypool/tmp
sudo zfs create mypool/<filesystem>
</code>

and to destroy a file system, use:

<code>
sudo zfs destroy mypool/tmp
</code>

Each ZFS file system can have properties set; for example, setting a maximum quota of 10 gigabytes:

<code>
sudo zfs set quota=10G mypool/<filesystem>
</code>

or enabling compression:

<code>
sudo zfs set compression=on mypool/<filesystem>
</code>
| - | |||
| - | ===== ZFS Snapshots ===== | ||
| - | |||
| - | A ZFS snapshot is a read-only copy of ZFS file system or volume. It can be used | ||
| - | to save the state of a ZFS file system at a point of time, and one can roll back | ||
| - | to this state at a later date. One can even extract files from a snapshot and not | ||
| - | need to perform a complete roll back. | ||
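Extracting single files works through the hidden ''.zfs/snapshot'' directory at the root of each mounted file system (it exists even when ''ls'' does not show it). The snapshot and file names here are illustrative:

```shell
# Each snapshot appears as a read-only directory tree.
ls /mypool/tmp/.zfs/snapshot/

# Restore one file from the snapshot "snap1" without
# rolling back the whole file system.
cp /mypool/tmp/.zfs/snapshot/snap1/important.txt /mypool/tmp/
```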
| - | |||
| - | In the following example, we snapshot the mypool/ | ||
| - | |||
| - | < | ||
| - | $ sudo zfs snapshot -r mypool/ | ||
| - | </ | ||
| - | |||
| - | ..and we can see the collection of snapshots using: | ||
| - | |||
| - | < | ||
| - | $ sudo zfs list -t snapshot | ||
| - | NAME | ||
| - | mypool/ | ||
| - | </ | ||
| - | |||
| - | Now lets ' | ||
| - | |||
| - | < | ||
| - | $ rm -rf / | ||
| - | $ sudo zfs rollback mypool/ | ||
| - | </ | ||
| - | |||
| - | One can remove a snapshot using the following: | ||
| - | |||
| - | < | ||
| - | $ sudo zfs destroy mypool/ | ||
| - | </ | ||
| - | |||
| - | |||
| - | ===== ZFS Clones ===== | ||
| - | |||
| - | A ZFS clone is a writeable copy of a file system with the initial content of the | ||
| - | clone being identical to the original file system. | ||
| - | from a ZFS snapshot and the snapshot cannot be destroyed until the clones created | ||
| - | from it are also destroyed. | ||
| - | |||
| - | For example, to clone mypool/ | ||
| - | |||
| - | < | ||
| - | $ sudo zfs snapshot -r mypool/ | ||
| - | $ sudo zfs clone mypool/ | ||
| - | </ | ||
| - | |||
| - | |||
===== ZFS Send and Receive =====

ZFS send turns a snapshot of a file system into a stream that can be saved to a file or sent to another machine. ZFS receive takes such a stream and writes the contents of the snapshot back as a ZFS file system. This is great for backups, or for sending copies over the network (e.g. using ssh) to copy a file system.

For example, make a snapshot and save it to a file:

<code>
sudo zfs snapshot -r mypool/<filesystem>@<snapshot>
sudo zfs send mypool/<filesystem>@<snapshot> > backup.img
</code>

..and receive it back:

<code>
sudo zfs receive -F mypool/<filesystem> < backup.img
</code>
| - | |||
| - | ===== ZFS Ditto Blocks ===== | ||
| - | |||
| - | Ditto blocks create more redundant copies of data to copy, just for more | ||
| - | added redundancy. With a storage pool of just one device, ditto blocks are | ||
| - | spread across the device, trying to place the blocks at least 1/8 of the disk | ||
| - | apart. | ||
| - | across separate VDEVs. 1 to 3 copies can be can be set. For example, setting | ||
| - | 3 copies on mypool/ | ||
| - | |||
| - | < | ||
| - | $ sudo zfs set copies=3 mypool/ | ||
| - | </ | ||
| - | |||
===== ZFS Deduplication =====

ZFS dedup will discard blocks that are identical to existing blocks and will instead use a reference to the existing block. This saves space, but comes at a large cost in memory: the deduplication table needs an entry per block, and the larger the table grows, the slower write performance becomes.

For example, to enable dedup on mypool/<filesystem>:

<code>
$ sudo zfs set dedup=on mypool/<filesystem>
</code>

The pros and cons of deduplication are discussed at length elsewhere; in practice, it is almost never worth the performance penalty.
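Before enabling dedup for real, it is worth estimating how much space it would actually save. ''zdb -S'' simulates deduplication on an existing pool without changing anything, and ''zpool list'' shows the achieved ratio once dedup is on (pool name assumed):

```shell
# Simulate dedup on the pool and print a block histogram plus the
# projected dedup ratio; this reads the whole pool, so it is slow.
sudo zdb -S mypool

# Once dedup is enabled, the DEDUP column shows the achieved ratio.
sudo zpool list mypool
```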
| - | |||
| - | ===== ZFS Pool Scrubbing ===== | ||
| - | |||
| - | To initiate an explicit data integrity check on a pool one uses the zfs scrub command. For | ||
| - | example, to scrub pool ' | ||
| - | |||
| - | < | ||
| - | $ sudo zpool scrub mypool | ||
| - | </ | ||
| - | |||
| - | one can check the status of the scrub using zpool status, for example: | ||
| - | |||
| - | < | ||
| - | $ sudo zpool status -v mypool | ||
| - | </ | ||
| - | |||
| - | |||
===== Data recovery, a simple example =====

Let's assume we have a 2 x 2 mirrored pool:

<code>
$ sudo zpool create mypool mirror /dev/sdc /dev/sdd mirror /dev/sde /dev/sdf -f
$ sudo zpool status
  pool: mypool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        mypool      ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     0
</code>

Now populate it with some data and checksum the data:

<code>
$ sudo dd if=/dev/urandom of=/mypool/random.dat bs=1M count=2048
$ md5sum /mypool/random.dat
f0ca5a6e2718b8c98c2e0fdabd83d943  /mypool/random.dat
</code>

Now we simulate catastrophic data loss by overwriting one of the VDEV devices with zeros:

<code>
$ sudo dd if=/dev/zero of=/dev/sde bs=1M
</code>

And now initiate a scrub:

<code>
$ sudo zpool scrub mypool
</code>

And check the status:

<code>
$ sudo zpool status
  pool: mypool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: scrub in progress since Tue May 12 17:34:53 2015
    244M scanned out of 1.91G at 61.0M/s, 0h0m to go
    115M repaired, 12.46% done
config:

        NAME        STATE     READ WRITE CKSUM
        mypool      ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            sde     ONLINE       0     0     0  (repairing)
            sdf     ONLINE       0     0     0
</code>

...now let us remove the drive from the pool:

<code>
$ sudo zpool detach mypool /dev/sde
</code>

..hot swap it out and attach a new one:

<code>
$ sudo zpool attach mypool /dev/sdf /dev/sde
</code>

..and initiate a scrub to repair the 2 x 2 mirror:

<code>
$ sudo zpool scrub mypool
</code>

===== ZFS compression =====

As mentioned earlier, one can compress data automatically with ZFS. With the speed of modern CPUs this is a useful option, as reduced data size means less data to physically read and write, and hence faster I/O. ZFS offers a comprehensive range of compression methods. For example, gzip at its highest level:

<code>
sudo zfs set compression=gzip-9 mypool
</code>

or a different algorithm:

<code>
sudo zfs set compression=lz4 mypool
</code>

and check the achieved compression ratio using:

<code>
sudo zfs get compressratio
</code>

lz4 is significantly faster than the other options while still compressing well; lz4 is the safest choice.

===== ZFS Management =====
You may want to list the zpools that you have, as well as see various statistics about the pools:
<code>
sudo zpool list
</code>
documentation/zfs.1580560512.txt.gz · Last modified: 2021/06/18 16:36 (external edit)