QueTek
  
  
ZFS Pool with multiple virtual devices recovery

Hardware and Volume configuration:

A customer had a Netgear NAS/SAN with ten drives. Two of the drives suffered mechanical failures and brought down the system. The customer believed the array was a RAID 50 (a RAID 0 striped across RAID 5 sets). Our analysis showed that it was actually a ZFS storage pool consisting of three RAIDZ1 vdevs. Besides the two damaged drives, one of the drives did not contain any relevant data. The status of the ZPool was as follows:

Vol1
          raidz1-0: DEGRADED
              Missing                    UNAVAIL
              9E 88 69 A0 B4 C1 16 EE    ACCESSIBLE
              4A 50 2D 8B 61 BD B7 96    ACCESSIBLE

          raidz1-1: NOT RECOVERABLE (insufficient replicas)
              56 6F 3F 0D 7B FE 6D D2    ACCESSIBLE
              Missing                    UNAVAIL
              Missing                    UNAVAIL

          raidz1-2: GOOD
              5F 14 EC A8 77 86 36 9B    ACCESSIBLE
              C1 D4 89 81 DC E9 F9 14    ACCESSIBLE
              7C F6 4E 99 D7 2D 86 24    ACCESSIBLE
              57 0C 7C CA 4D 29 74 B8    ACCESSIBLE
		
The ZPool provided storage for a number of virtual volumes (VHDX) that held the customer's Microsoft Exchange databases, as well as backups of users' emails and other documents.
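
The per-vdev status in the listing above follows from the RAIDZ1 rule that a vdev survives the loss of at most one member drive. The following is a minimal sketch of that rule, not the tool we used; the `Vdev` class and `classify` function are hypothetical names introduced only for illustration, and the drive counts are taken from the listing.

```python
from dataclasses import dataclass


@dataclass
class Vdev:
    name: str
    parity: int          # number of drives the vdev can lose (1 for RAIDZ1)
    total_drives: int
    missing_drives: int


def classify(vdev: Vdev) -> str:
    """Return a status label based on missing drives versus parity."""
    if vdev.missing_drives == 0:
        return "GOOD"
    if vdev.missing_drives <= vdev.parity:
        # Every block can still be rebuilt from the surviving drives.
        return "DEGRADED"
    # More drives are missing than parity can compensate for.
    return "NOT RECOVERABLE"


# Drive counts as shown in the status listing above.
pool = [
    Vdev("raidz1-0", parity=1, total_drives=3, missing_drives=1),
    Vdev("raidz1-1", parity=1, total_drives=3, missing_drives=2),
    Vdev("raidz1-2", parity=1, total_drives=4, missing_drives=0),
]

statuses = {vdev.name: classify(vdev) for vdev in pool}
for name, status in statuses.items():
    print(f"{name}: {status}")
```

Because a ZFS pool stripes data across its top-level vdevs without any redundancy between them, a single unrecoverable vdev is enough to leave holes in the pool as a whole.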

Problem:

Since one of the vdevs (raidz1-1 in the list above) was missing, around 30% of the storage space was no longer accessible. It's almost certain that large files such as databases would have gaps in the data stream. At best, this would be a partial recovery.

Solution:

A few lucky breaks helped us solve this case. Apparently, ZFS prefers to store data on vdevs with more drives and/or faster drives, and shies away from vdevs running in degraded mode. In this ZPool, raidz1-2 had been added at a later time. Not only did it have more drives, but its drives were also larger and faster. Furthermore, the other two vdevs had been running in degraded mode for some time. As a result, most of the recent updates were stored on this last vdev. We could read 100% of the metadata, which ZFS stores with double or triple DVAs (redundant copies of each block), and rebuild the complete folder structure. At the file data level, around 40% of the older data was missing, but the newer data was almost 100% recoverable.
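
To illustrate why redundant DVAs made the metadata readable while single-copy file data was not, here is a simplified sketch. It assumes a block pointer can be modeled as a list of (vdev, offset) pairs; a real ZFS DVA carries additional fields, and the names `DVA` and `first_readable_dva`, as well as the offsets, are hypothetical and used only for this example.

```python
from typing import NamedTuple, Optional, Sequence, Set


class DVA(NamedTuple):
    vdev: int      # index of the top-level vdev holding this copy
    offset: int    # byte offset of the copy within that vdev


def first_readable_dva(dvas: Sequence[DVA],
                       unavailable_vdevs: Set[int]) -> Optional[DVA]:
    """Return the first copy that lives on a vdev we can still read."""
    for dva in dvas:
        if dva.vdev not in unavailable_vdevs:
            return dva
    return None  # every copy sits on a lost vdev


LOST = {1}  # raidz1-1 is unrecoverable in this case

# A metadata block with two copies, one of them on the lost vdev.
metadata_bp = [DVA(vdev=1, offset=0x4000), DVA(vdev=2, offset=0x9000)]
# A file data block with a single copy on the lost vdev.
file_data_bp = [DVA(vdev=1, offset=0x20000)]

print(first_readable_dva(metadata_bp, LOST))   # the copy on vdev 2
print(first_readable_dva(file_data_bp, LOST))  # None: this block is a gap
```

Under this model, a block is lost only if all of its copies landed on the unrecoverable vdev, which is far less likely for metadata with two or three copies than for ordinary file data with one.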

Result:

We were able to recover most of the newer backup files and documents. Older files were partially recovered. The Exchange databases were recovered with the gaps filled in with zeros. The customer was able to extract useful data from the databases using third-party software.
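
Filling the gaps with zeros keeps every surviving block at its original offset, which is what lets database repair tools still locate intact pages. Below is a minimal sketch of the idea, assuming fixed-size blocks and using `None` to mark a block that could not be read; the function name `assemble_with_zero_fill` is hypothetical and this is not the actual recovery code.

```python
from typing import List, Optional, Sequence, Tuple


def assemble_with_zero_fill(blocks: Sequence[Optional[bytes]],
                            block_size: int) -> Tuple[bytes, List[int]]:
    """Concatenate blocks, substituting zeros for unreadable ones.

    Returns the assembled data and the indexes of the missing blocks.
    """
    out = bytearray()
    missing = []
    for index, block in enumerate(blocks):
        if block is None:
            # Preserve the offsets of later blocks by padding with zeros.
            out += bytes(block_size)
            missing.append(index)
        else:
            out += block
    return bytes(out), missing


# Three 4-byte blocks; the middle one was on the lost vdev.
data, gaps = assemble_with_zero_fill([b"AAAA", None, b"CCCC"], block_size=4)
print(data)
print(gaps)
```

Recording the indexes of the missing blocks alongside the output makes it possible to report exactly which ranges of a recovered file are zero-filled.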

