Enterprise versus Client SSD

A growing number of enterprise datacenters that require high data throughput and low transaction latency, and that previously relied on Hard Disk Drives (HDDs) in their servers, are now running into performance bottlenecks. They are looking to Solid-State Drives (SSDs) as a viable storage solution to increase datacenter performance, efficiency and reliability and to lower overall operating expenses (OpEx).

To begin to understand the differences between SSD classes, we have to distinguish the two key components of an SSD – the Flash Storage Controller (or simply called SSD controller) and the non-volatile NAND Flash memory used to store data.

In today’s market, SSD and NAND Flash memory consumption is split into three main groups:
  • Consumer devices (tablets, cameras, mobile phones),
  • Client systems (netbook, notebook, Ultrabook, AIO, desktop personal computers) and embedded/industrial systems (gaming kiosk, purpose-built system, digital signage),
  • Enterprise computing platforms (HPC, datacenter servers).

Choosing the right SSD storage device for enterprise datacenters can be a long and arduous process of learning and qualifying a multitude of different SSD vendors and product types as not all SSDs and NAND Flash memory are created equal.

SSDs are manufactured to be easily deployable as replacements for, or complements to, rotational magnetic platter-based Hard Disk Drives (HDDs). They are available in a number of different form factors, including 2.5", and with communication protocols/interfaces including Serial ATA (SATA), Serial Attached SCSI (SAS) and, more recently, PCIe to transfer data to and from the Central Processing Unit (CPU) of a server.

Being easily deployable, however, does not guarantee that all SSDs will be suitable in the long term for the enterprise application they were selected for; the cost of choosing the wrong SSD can often negate any initial cost-savings and performance benefits gained when the SSDs are either worn out prematurely due to excessive writes, achieve far lower sustained write performance over their expected lifetime or introduce additional latency in the storage array and thus require early field replacement.

We will discuss the three main qualities that distinguish an enterprise and client class SSD to assist in making the right purchasing decision when the time comes to replace or add further storage to an enterprise datacenter.

Performance

SSDs can deliver incredibly high read and write performance for both sequential and random data requests from the CPU through multi-channel architecture and parallel access from the SSD’s Controller to the NAND Flash chips.

In a typical data center scenario involving the processing of large volumes of random company data, including collaboration on technical CAD drawings, seismic data for analysis (e.g., Big Data) or accessing worldwide customer data for banking transactions (e.g., OLTP), the storage devices must be accessible with the least amount of latency, even when many clients need access to the same piece of data simultaneously, with no degradation in response time. User experience depends on low latency, which increases user productivity. Multiply this across an entire workforce, and you can see how the benefits of low latency quickly add up.

By contrast, a client application typically involves only a single user or application accessing the drive, and it can tolerate a larger delta between the minimum and maximum response time (or latency) on any user or system action.

Complex storage arrays using SSDs (e.g., Network Attached Storage, Direct Attached Storage or Storage Area Network) are also adversely affected by mismatched SSD performance, which can wreak havoc on storage array latency, sustained performance and, ultimately, quality of service as perceived by users.

Unlike client SSDs, Kingston’s enterprise class SSDs are optimized not only for peak performance in the first few seconds of access. By using a larger over-provisioned (OP) area, they also offer higher sustained steady-state performance over longer periods of time. More information on specific drives can be found on the Kingston website under Enterprise SSDs.{{Footnote.N48213}}

This guarantees that the storage array performance stays consistent with the organization’s expected Quality of Service (QoS) requirement during peak traffic loads.

Reliability

NAND flash memory has several inherent issues associated with it. The two most important include a finite life expectancy as NAND flash cells wear out during repeated writes, and a naturally occurring error rate.

During the production process of NAND Flash, each NAND Flash die cut from silicon wafers is tested and characterized with a raw Bit Error Rate (BER or RBER).

The BER defines the rate at which naturally occurring bit errors in NAND Flash occur without the benefit of Error Correction Code (ECC) and which the SSD Controller corrects using on-the-fly Advanced ECC (typically called BCH ECC, Strong ECC or LDPC error correction by different SSD controller manufacturers) without disrupting user or system access.

The SSD controller’s ability to correct these bit errors is expressed by the Uncorrectable Bit Error Ratio (UBER), “a metric for data corruption rate equal to the number of data errors per bit read after applying any specified error-correction method”.{{Footnote.N48213}}

As defined and standardized by the industry standards association JEDEC in 2010 in documents JESD218A: Solid State Drive (SSD) Requirements and Endurance Test Method and JESD219: Solid State Drive (SSD) Endurance Workloads, enterprise class SSDs differ from client class SSDs in a number of ways, including but not limited to their ability to support heavier write workloads, more extreme environmental conditions and recovery from a higher BER.{{Footnote.N52081}}{{Footnote.N52082}}

Application Class | Workload (see JESD219) | Active Use (power on) | Retention Use (power off) | UBER Requirement
Client | Client | 40° C, 8 hrs/day | 30° C, 1 year | ≤10^-15
Enterprise | Enterprise | 55° C, 24 hrs/day | 40° C, 3 months | ≤10^-16

Table 1 - JESD218A:Solid State Drive (SSD) Requirements and Endurance Test Method
Copyright JEDEC. Reproduced with permission by JEDEC.

Using the JEDEC proposed UBER requirement for enterprise versus client SSDs, an enterprise class SSD is expected to experience no more than 1 unrecoverable bit error for every 10 quadrillion bits (10^16 bits, ~1.11 Petabytes in binary units or 1.25 Petabytes in decimal units) processed, compared to a client SSD at 1 unrecoverable bit error for every 1 quadrillion bits (10^15 bits, ~0.11 Petabytes in binary units or 0.125 Petabytes in decimal units).
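
The conversion from a UBER requirement to a data volume can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the only inputs taken from Table 1 are the exponents 15 (client) and 16 (enterprise).

```python
def bits_per_error(uber_exponent: int) -> int:
    """Bits processed per expected unrecoverable error for a UBER of 10**-uber_exponent."""
    return 10 ** uber_exponent


def to_petabytes(bits: int) -> float:
    """Convert bits to decimal petabytes (1 PB = 10**15 bytes)."""
    return bits / 8 / 10**15


def to_pebibytes(bits: int) -> float:
    """Convert bits to binary pebibytes (1 PiB = 2**50 bytes)."""
    return bits / 8 / 2**50


for label, exponent in (("client", 15), ("enterprise", 16)):
    bits = bits_per_error(exponent)
    print(f"{label}: 1 error per 10^{exponent} bits = "
          f"{to_petabytes(bits):.3f} PB = {to_pebibytes(bits):.3f} PiB")
```

For the enterprise class this gives 1.25 PB in decimal units, which is about 1.11 when counted in binary units; the client class comes out ten times smaller.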

Kingston’s enterprise SSDs will also add additional technologies that will allow for the recovery of corrupted blocks of data using parity data stored in other NAND dies (like RAIDing drives, this allows for the recovery of specific blocks that can be rebuilt with the parity data stored in other blocks).
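
The RAID-like idea can be illustrated with simple XOR parity. This is a minimal sketch under a stated assumption: the source does not say which parity scheme Kingston's controllers use, so XOR is chosen here only because it is the simplest scheme that lets one lost block be rebuilt from the others. The block contents and the `xor_blocks` helper are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, imagined as living on three different NAND dies.
data = [b"die0data", b"die1data", b"die2data"]

# The parity block is stored on yet another die.
parity = xor_blocks(data)

# Simulate block 1 becoming unreadable: rebuild it from the survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

XOR parity can rebuild only one missing block per parity group; schemes that tolerate more failures need additional redundancy.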

To complement the redundant data block recovery technologies built into Kingston enterprise SSDs, periodic checkpoint creation, Cyclic Redundancy Check (CRC) and ECC error correction are also implemented in an End-to-End internal protection scheme to guarantee the integrity of data from the host through the flash and back to the host. End-to-End data protection means data received from the host is checked for integrity during its storage into the SSD’s internal cache and when written or read back from the NAND storage areas.
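
The integrity check at each hop can be pictured as a checksum that travels with the data. The following is a simplified sketch, assuming a CRC-32 from Python's standard `zlib` module stands in for whatever CRC the drive uses internally; the `store` and `verify` helpers are hypothetical and do not model an actual SSD data path.

```python
import zlib


def store(payload: bytes):
    """Return the payload together with the CRC computed when it was received."""
    return payload, zlib.crc32(payload)


def verify(payload: bytes, crc: int) -> bool:
    """Recompute the CRC on read-back and compare it with the stored value."""
    return zlib.crc32(payload) == crc


payload, crc = store(b"host data block")
print(verify(payload, crc))  # data unchanged, so the check passes

# Flip one bit to simulate corruption somewhere between cache and NAND.
corrupted = bytes([payload[0] ^ 0x01]) + payload[1:]
print(verify(corrupted, crc))  # a single-bit error is always caught by a CRC
```

A CRC only detects the corruption; correcting it is left to ECC or to the parity-based recovery described earlier.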

In addition to enhanced ECC protection against bit errors, enterprise class SSDs may also contain physical circuitry for power loss detection that manages power storage capacitors on the SSD. Power Fail support in hardware monitors incoming power to the SSD and, during a surprise power loss, provides temporary power to the SSD circuitry using capacitors so that any outstanding internally or externally issued writes can complete before the SSD powers down. Power Loss Protection (PLP) circuitry is usually required for applications where data loss is not recoverable.

Power Loss Protection may also be implemented in the SSD firmware through frequent flushing of data in the SSD controller’s cache areas (e.g., its Flash Translation Layer Table) to the NAND storage – this does not guarantee that no data will be lost during a power loss event but tries to minimize the impact of unsafe power shutdowns. Firmware Power Loss Protection also ensures that the SSD is not likely to become inoperable after encountering an unsafe shutdown.

In many situations, the use of Software Defined Storage or server clustering may reduce the need for hardware-based Power Fail support, as any data is replicated onto a separate and independent storage device on a different server or servers. Web-scale data centers often dispense with Power Fail support by using Software Defined Storage to RAID servers, storing redundant copies of the same data.

Endurance

All NAND Flash memory contained in Flash storage devices degrades in its ability to reliably store bits of data with every program or erase (P/E) cycle of a NAND Flash memory cell, until the NAND Flash blocks can no longer reliably store data. At that point, a degraded or bad block is removed from the user addressable storage pool and the logical block address (or LBA) is moved to a new physical address on the NAND Flash storage array. A new storage block replaces the bad one using the Spare Blocks pool that is part of the Over Provisioned (OP) storage on the SSD.

As the cell is constantly programmed or erased, the BER also increases linearly, and it is for this reason a complex set of management techniques must be implemented on the enterprise SSD Controller to manage the cell capability to reliably store data over the expected life of the SSD.{{Footnote.N52083}}

The P/E endurance of a given NAND Flash memory can vary substantially depending on the current lithography manufacturing process and type of NAND Flash produced.

NAND flash memory type | QLC | TLC | MLC | SLC
Architecture | 4 bits per cell | 3 bits per cell | 2 bits per cell | 1 bit per cell
Capacity | Highest capacity | Higher capacity | High capacity | Lowest capacity
Endurance (P/E) | Lowest endurance | Lower endurance | Medium endurance | High endurance
Cost | $ | $$ | $$$ | $$$$
Approx NAND Bit Error Rate (BER) | 10^4 | 10^4 | 10^7 | 10^9

Table 2 – NAND flash memory types {{Footnote.N52084}}{{Footnote.N52085}}

Enterprise SSDs also differ from client SSDs in their duty cycle. An enterprise-class SSD must be able to withstand heavy read or write activity in scenarios typical for a data center server, where data must be accessible 24 hours a day, every day of the week. Compare this with a client-class SSD that is typically only fully utilized for 8 hours a day.

Enterprise SSDs have a 24x7 duty cycle, compared to client SSDs with a 20/80 duty cycle (20% of the time active, 80% in idle or sleep mode during computer usage).

Understanding the write endurance of any application or SSD can be complex, which is why the JEDEC committee also proposed an endurance measurement metric using the TeraBytes Written (TBW) value to indicate the amount of raw Host data that can be written to the SSD before the NAND Flash contained in the SSD becomes an unreliable storage medium and the drive should be retired.

Using the JEDEC proposed JESD218A testing methods and JESD219 enterprise class workloads, it becomes an easier task to interpret an SSD manufacturer’s endurance calculations via TBW and extrapolate a more understandable endurance measure that can be applied to any datacenter.
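
One way to turn a TBW rating into a more intuitive figure is to divide it by the amount of data the host writes per day. The sketch below is only an illustration: the 1,000 TBW rating and the 500 GB/day workload are hypothetical numbers, not specifications of any particular drive, and the function name is an assumption.

```python
def years_until_tbw(tbw_rating_tb: float, host_writes_gb_per_day: float) -> float:
    """Years of service before the host has written the drive's rated TBW.

    Uses decimal units (1 TB = 1000 GB) and a 365-day year.
    """
    tb_per_day = host_writes_gb_per_day / 1000
    return tbw_rating_tb / tb_per_day / 365


# Hypothetical drive rated for 1,000 TBW under a steady 500 GB/day of host writes.
print(f"{years_until_tbw(1000, 500):.1f} years")
```

The result only holds if the real workload resembles the one used to derive the TBW rating; a workload with higher write amplification will consume the NAND faster.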

As noted in documents JESD218 and JESD219, different application class workloads can also suffer from a high Write Amplification Factor (WAF), where the data actually written to the NAND Flash is an order of magnitude greater than the writes submitted by the host. This can easily lead to unmanageable NAND Flash wear, higher NAND Flash BER from excessive writes over time and slower performance from widely distributed invalid pages across the SSD.
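
Write amplification is commonly expressed as the ratio of data written to the NAND Flash to data written by the host. The following is a small sketch of that ratio with hypothetical numbers; the function names are assumptions made for illustration.

```python
def write_amplification_factor(nand_writes_gb: float, host_writes_gb: float) -> float:
    """Ratio of data physically written to NAND to data submitted by the host."""
    return nand_writes_gb / host_writes_gb


def nand_writes_for(host_writes_gb: float, waf: float) -> float:
    """Data the NAND actually absorbs for a given amount of host writes."""
    return host_writes_gb * waf


# Hypothetical workload: the host submits 100 GB and the NAND absorbs 1,000 GB.
waf = write_amplification_factor(1000, 100)
print(waf)

# With the same WAF, 500 GB of host writes per day costs the NAND this much per day.
print(nand_writes_for(500, waf))
```

A WAF of 10 means the NAND wears ten times faster than the host write volume alone would suggest.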

While TBW is an important topic in the discussion of enterprise versus client class SSDs, TBW is only a NAND Flash level endurance prediction model. The Mean Time Between Failures (MTBF) should be viewed as a component level endurance and reliability prediction model based on the reliability of the components used on the device. An enterprise class SSD's components are expected to last longer and work harder at managing the voltages across all NAND Flash memory over the SSD's life expectancy. All enterprise SSDs should be rated for at least two million hours MTBF, which translates to roughly 228 years of continuous operation. Kingston specs its SSDs very conservatively, and it is not uncommon to see higher MTBF specifications on SSDs; it is important to note that 2 million hours is more than a sufficient starting point for enterprise SSDs.
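
The MTBF figure can be converted to years with simple arithmetic, and it is often easier to read as an expected failure count across a fleet. The sketch below assumes a constant failure rate and continuous 24x7 operation (8,760 hours per year); both are simplifying assumptions, and the 1,000-drive fleet is a hypothetical example.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def mtbf_years(mtbf_hours: float) -> float:
    """MTBF expressed in years of continuous operation."""
    return mtbf_hours / HOURS_PER_YEAR


def expected_failures_per_year(drive_count: int, mtbf_hours: float) -> float:
    """Approximate failures per year in a fleet, assuming a constant failure rate."""
    return drive_count * HOURS_PER_YEAR / mtbf_hours


print(f"{mtbf_years(2_000_000):.1f} years")
print(f"{expected_failures_per_year(1000, 2_000_000):.2f} failures per year")
```

Read this way, MTBF is a statistical statement about a population of drives rather than a promise about how long any single drive will last.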

S.M.A.R.T. monitoring and reporting on enterprise class SSDs allows the device to be easily queried pre-failure for life expectancy based on the current write amplification factor (WAF) and wear level. Pre-failure predictive warnings for failure events such as a loss of power, bit errors occurring from the physical interface or uneven wear distribution are often also supported. The Kingston SSD Manager utility can be downloaded from the Kingston website and used to view a drive’s status.

Client class SSDs may only feature the minimum S.M.A.R.T. output for monitoring the SSD during standard use or post-failure.

Depending on the application class and capacity of the SSD, an increased reserve capacity of NAND Flash memory can also be allocated as over-provisioned (OP) spare capacity. The OP capacity is hidden from user and operating system access and can be used as a temporary write buffer for higher sustained performance and as a replacement for defective Flash memory cells during the life expectancy of the SSD, enhancing the reliability and endurance of the SSD (with greater numbers of Spare Blocks).
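
Over-provisioning is commonly quoted as the hidden capacity relative to the user-visible capacity. The sketch below uses that common convention; the 512 and 480 capacity figures are hypothetical, expressed in the same unit, and do not describe a specific Kingston drive.

```python
def over_provisioning_percent(physical_capacity: float, user_capacity: float) -> float:
    """Hidden spare capacity as a percentage of the user-addressable capacity."""
    return (physical_capacity - user_capacity) / user_capacity * 100


# Hypothetical drive: 512 units of raw NAND, 480 units exposed to the host.
print(f"{over_provisioning_percent(512, 480):.1f}% OP")
```

Exposing less of the same raw NAND to the host raises the OP percentage, which leaves more Spare Blocks and more room for sustained writes.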

Conclusion

There are distinctive differences between enterprise and client class SSDs, ranging from their NAND Flash memory Program and Erase endurance to their complex management techniques to suit different application class workloads.

Understanding these differences in application classes can be an effective tool in minimizing and managing the risk of disruptive downtime in the demanding and often mission critical enterprise environment.

If you have further questions, or would like to know more about Enterprise SSDs from Kingston, please contact your Kingston representative, our Ask An Expert team or our Tech Support Chat.
