What does your organization do when SSD (Solid State Drive) storage devices fail in production or need to be decommissioned? For failed magnetic drives, the process is simple: sanitize with a purge or clear method, then RMA the drive back to the vendor for credit. With SSD devices, the status quo has been to shred the device and incur the cost of a replacement unit.
Is it necessary to shred an SSD drive? The industry perception is that these devices cannot be sanitized because of the flash architecture. The truth about SSD sanitization is more complicated than a simple yes or no. There are solutions available; some work, others do not. Making the best sanitization choices requires an understanding of how SSDs work.
There have been studies performed in the past to answer the questions:
- Is it possible to permanently delete a file from an SSD?
- Is it possible to permanently delete all data from an SSD?
The studies state that the answer to file deletion is NO, it is not possible. However, file deletion is the wrong question to ask when planning end of service and the final disposition of a drive. The correct viewpoint for ITAD is to address the entire capacity of the device rather than the specifics of file-level addressing, and that is what this article will discuss.
The studies leave the answer to the second question open to interpretation, along with the correct approach to take for full device sanitization.
To answer these questions for data sanitization, I am drawing on my professional experience architecting storage solutions and evaluating various storage and bus technologies to build enterprise-class systems. From that work, I have developed a systematic, methodical approach to problem solving that I have applied to SSD devices.
When SSD devices first entered the storage market, it was all about speed. What was it about NAND technology that allowed us to get so many IOPS from a small form-factor device? Many SSD vendors would demonstrate the performance of a single SSD drive against an array of 15K RPM SAS drives with higher power consumption. No one was really interested in the power savings, or sanitization, for that matter.
Fortunately, I had acquired many technical contacts through vendors trying to qualify their devices within my storage solutions. Through countless product evaluations, design reviews, and those that made it to our qualification process, I amassed a broad understanding of the NAND hardware technology and firmware components that handle the Flash Translation Layer (FTL), Block Management, and wear-leveling voodoo.
When I moved on from storage system development to the ITAM world, sanitization of storage devices was paramount to success. Our customers talked about the expense and challenges of SSD devices within their infrastructures.
Inside the SSD Drive
SSD drives are sold at a stated capacity, which represents the logical capacity. If you dissected a 200GB drive, you would find 16x 16GB NAND chips. Doing a bit of math, the physical capacity of the drive is actually 256GB. Why is this? The physics behind NAND flash works by moving electrons across the NAND cells. A charged cell can represent a 1; discharged, it becomes a 0. Over time, as a cell is charged over and over, it loses the ability to hold the charge. To slow this deterioration, the firmware implements several techniques that minimize the number of program cycles issued to each NAND cell. This is called wear-leveling.
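The arithmetic above can be checked directly. A quick sketch using the article's example figures:

```python
# Over-provisioning math for the article's example drive:
# 16 NAND chips x 16GB each vs. a 200GB logical capacity.
chips, chip_gb = 16, 16
physical_gb = chips * chip_gb          # 256GB of raw NAND on the board
logical_gb = 200                       # capacity sold to the host
spare_gb = physical_gb - logical_gb    # 56GB of spare area
overprovision_pct = 100 * spare_gb / physical_gb
print(physical_gb, spare_gb, round(overprovision_pct, 1))  # 256 56 21.9
```

Roughly a fifth of the NAND on this example drive is spare area the host can never address directly, which is exactly the capacity a logical-only sanitization method misses.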
The wear-leveling function monitors the age of the NAND blocks; as blocks are programmed and re-programmed, the logical-to-physical mapping can shift to another physical NAND block with fewer program cycles. The FTL manages the rotation of blocks between the spare area and the user area (the logical maps).
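The rotation described above can be sketched as a toy model. This is illustrative only, not a real FTL; the class and its policy (always pick the least-worn spare) are assumptions for demonstration:

```python
# Toy sketch of wear-leveling: each write to a logical block is steered
# to the spare physical block with the fewest program/erase cycles, and
# the previously mapped physical block is returned to the spare pool.
class ToyFTL:
    def __init__(self, physical_blocks, logical_blocks):
        self.pe_cycles = [0] * physical_blocks            # per-block wear counters
        self.l2p = {l: l for l in range(logical_blocks)}  # logical-to-physical map
        self.spares = set(range(logical_blocks, physical_blocks))

    def write(self, logical):
        # pick the least-worn spare block to hold the new data
        target = min(sorted(self.spares), key=lambda b: self.pe_cycles[b])
        self.spares.remove(target)
        self.spares.add(self.l2p[logical])  # old block becomes a spare; its data remains!
        self.l2p[logical] = target
        self.pe_cycles[target] += 1

ftl = ToyFTL(physical_blocks=8, logical_blocks=6)
for _ in range(3):
    ftl.write(logical=0)   # three host "overwrites" of the same logical block
print(ftl.l2p[0])          # the logical block now lives on a different physical block
```

Each overwrite lands on a different physical block, so a host-level overwrite of a logical address never guarantees that the physical block holding the old data was touched.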
There are also firmware techniques that monitor the program cycle itself: if a cell already holds the charge the host is asking to write, the flash firmware will reuse that block and fake the program operation. The over-provisioning, wear-leveling, and faked program cycles are what make sanitizing SSD devices a challenge. I said "challenge," not "impossible."
On the device, there is a front-end controller (SATA, SAS, FC, PCI, or USB) and a back-end NAND controller. The front-end interface is the vehicle through which an application talks to the device to tell it to write data, retrieve data, or sanitize itself. The back-end NAND controller directs the flash banks, and it supports only a few commands: a PROGRAM cycle, a READ cycle, and a BLOCK_ERASE cycle. The PROGRAM and READ commands mind the defect page registers and can access the smallest capacity denomination, which is a PAGE. The BLOCK_ERASE command disregards the defect page registers and always addresses the full block.
The FTL and Block Manager will usually restrict the commands to block-based operation, a block being a grouping of pages. You can think of the pages-to-block relationship of NAND as similar to the sectors-to-tracks layout of magnetic disks. However, unlike a magnetic disk, where access is based on sectors, NAND flash is managed based on blocks.
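The page/block geometry can be made concrete with a short sketch. The sizes below are illustrative assumptions (real NAND geometries vary by part):

```python
# Page/block addressing sketch: PROGRAM and READ target a single page,
# while an erase always covers the whole block containing that page.
PAGE_SIZE = 4096          # bytes per page (assumed)
PAGES_PER_BLOCK = 64      # pages grouped into one erase block (assumed)

def locate(byte_offset):
    """Map a flat byte offset to its (block, page) coordinates."""
    page = byte_offset // PAGE_SIZE
    return page // PAGES_PER_BLOCK, page % PAGES_PER_BLOCK

block, page = locate(1_000_000)
print(block, page)  # 3 52
# clearing even one stale page requires erasing all 64 pages of its block
```

This asymmetry, fine-grained writes but coarse-grained erases, is why the FTL exists at all and why sanitization has to be reasoned about at the block level.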
Being familiar with these commands and knowing what they will and will not do, I can confirm that the ONLY way to sanitize flash is to issue the BLOCK_ERASE command. As a NAND cell deteriorates, its program and read cycle times increase. Once a page (cell) within a block exceeds the cycle-time threshold, the firmware's block manager flags that block as defective in the block's status register and ensures it is no longer used. Data typically remains within that block even though it has been removed from circulation. Per the standards, a block flagged as containing defective pages in the defect registers cannot be issued a program cycle. However, it can be issued a BLOCK_ERASE cycle. This is one of the reasons overwrite methods will never forensically sanitize these device types.
The BLOCK_ERASE cycle disregards the block/page defect registers: it removes the charge on every NAND gate by draining all of the electrons (0), then applies a positive charge to all of the gates (1), followed by a READ cycle. BLOCK_ERASE is not affected by slower-performing gates and destroys the data completely. The purpose of the READ cycle is to re-map the defects within the blocks.
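The defect-register behavior described above is the crux of the overwrite problem, and a toy model makes it visible. This is an illustrative sketch, not real firmware; the class and its rules simply mirror the two commands' documented behavior:

```python
# Toy model of one NAND block: PROGRAM respects the defect flag,
# BLOCK_ERASE ignores it.
class ToyNandBlock:
    PAGES = 4

    def __init__(self):
        self.pages = [0xFF] * self.PAGES   # erased pages read back as all 1s
        self.defective = False             # defect-register flag

    def program(self, page, data):
        if self.defective:
            raise IOError("PROGRAM refused: block flagged defective")
        self.pages[page] = data

    def block_erase(self):
        # erase disregards the defect flag and clears every page in the block
        self.pages = [0xFF] * self.PAGES

blk = ToyNandBlock()
blk.program(0, 0x42)        # stage some "user data"
blk.defective = True        # firmware retires the block; the data remains!
try:
    blk.program(0, 0x00)    # a host overwrite can no longer reach this block
except IOError:
    pass
assert blk.pages[0] == 0x42  # stale data survives every overwrite pass
blk.block_erase()
assert blk.pages == [0xFF] * ToyNandBlock.PAGES  # BLOCK_ERASE still sanitizes it
```

No number of overwrite passes changes the retired block's contents, yet a single BLOCK_ERASE clears it, which is exactly the article's argument.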
We know what is needed to sanitize flash-based devices; now we need to know how to communicate with the drive to issue the proper commands. The SSD devices on the market do not expose a direct interface for issuing NAND commands. At ITRenew, we usually see an ATA, SCSI, or PCI interface used to communicate with the device. When commands are issued to the drive, it is the responsibility of the firmware vendor to translate them and trigger the correct commands on the back-end NAND controller. This is where complications occur.
Many of the ATA and SCSI commands are noted in the specifications as "optional." These specifications also describe only the command protocol and status implementation, which leaves room for misinterpretation on the vendor side. One of the largest struggles is with the ATA Security command set. I have heard many times that the method to sanitize a SATA SSD is to issue a SECURE_ERASE command. Okay, that is a start, but afterwards, did it work? One of the publicized studies of SSD sanitization showed a table of several vendors' SSD devices that contained staged data and what happened when each was issued a SECURE_ERASE. Some devices changed part of the data; others accepted the command and changed nothing; and some implemented the back-end function correctly and did sanitize the physical capacity. Since that study, the storage world has evolved and we see a little more consistency, but there are still holes in the implementations. How often do we hear people talk about the NORMAL or ENHANCED modes of SECURE_ERASE? Rarely. Yet these modes usually take different approaches to back-end flash sanitization. Depending on the vendor, each mode could perform any of the following without fully documenting it:
- Back-end CRYPTOGRAPHIC key erasure between the NAND controller and the flash
- BLOCK_ERASE command
- Device issued program cycle to ALL of the blocks not flagged with defects or errors
- Device issued program cycle to just the physical blocks currently mapped to the logical capacity (remember, the vendors will want to preserve the cells by reducing as many of the program cycles as possible)
- Do nothing
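Because a drive can silently land on any of the outcomes above, a SECURE_ERASE should be followed by verification rather than trusted. A minimal sketch of that step, assuming a hypothetical `read_sector` callable standing in for real device I/O:

```python
# Post-erase verification sketch: stage a recognizable pattern before
# SECURE_ERASE, then sample the device afterwards and confirm the
# pattern is gone. All names here are illustrative assumptions.
STAGE = b"\xDE\xAD\xBE\xEF" * 128   # 512-byte marker written pre-erase

def verify_erased(read_sector, total_sectors, samples=1000):
    """Return True if no sampled sector still holds the staged pattern."""
    step = max(1, total_sectors // samples)
    return all(read_sector(lba) != STAGE
               for lba in range(0, total_sectors, step))

# Example against an in-memory stand-in for a sanitized drive:
erased_drive = {lba: b"\x00" * 512 for lba in range(10_000)}
print(verify_erased(erased_drive.__getitem__, 10_000))  # True
```

Sampling the logical capacity cannot prove the spare area was erased, which is why qualification of each drive family's firmware, as discussed next, still matters; but it reliably catches the "do nothing" and partial-erase cases.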
For reputable sanitization solution providers, it is important to qualify SSD device vendors by drive family and firmware implementation against forensic-level sanitization needs. This ensures governance of the solution and helps prevent data breaches.
Conclusion: Myth Busted
It is possible to sanitize an SSD device at a forensic level. In fact, I work with a true enterprise-class digital asset disposition and diagnostic platform that processes large volumes of SSD devices concurrently across FC, SATA, SAS, and PCI-E interfaces. The process has been proven by several hyper-scale data center customers and through independent third-party forensic analysis labs.
However, consumers must use caution when selecting a software sanitization provider that claims SSD support. SSD devices should not be put through countless overwrite passes to claim "certified sanitization." Multiple passes just take more time, may or may not get around the wear-leveling algorithms, and will NOT address any blocks marked bad for under-performing. The same warning applies when a vendor leverages SECURE_ERASE as the sanitization method: it might work on a given drive, but it will not work consistently across infrastructures using multiple vendors. Enterprise customers should select a solution that also supports traditional enterprise-class interfaces such as SAS, FC, and PCI-E, rather than just SATA.