Specific characteristics of flash memory (SSDs) for industrial use

02 October 2024 Knowledge Base

IT applications in industrial environments demand characteristics from their components that are rare, or present only to a limited extent, in office, server and home/consumer IT.

The first is robustness, as the devices are often used in very demanding environments. Devices and components frequently have to withstand shocks, impacts and vibrations, and may also be exposed to particularly high or low operating temperatures, high humidity, moisture or hazardous atmospheres. The second is durability, which is often a must because the devices are installed in remote locations, where a defect means long downtimes and high costs for replacement or repair. The third is long-term availability guaranteed over many years: use in industry often involves expensive and lengthy certifications, and components are difficult to replace later without recertification. This is the case, for example, in the medical technology, military, transportation (road, rail and ship) and food processing industries.

Flash memories designed for industrial applications must meet these high standards as well. This is achieved on the one hand through high-quality components and production methods, and on the other through innovative (software) concepts and a focus on special applications with specific requirements, e.g. data integrity, speed or minimal data loss.

Long-term availability:

The industrial flash memories offered by IPC2U GmbH are available unchanged for years. Even the introduction of new, generally faster and/or larger SSDs does not mean that the previous models are no longer available. For projects that are planned for several years from the outset, we are happy to enter into agreements with the manufacturers to ensure that the components you require remain available unchanged for the entire duration of the project, if this is necessary for certification reasons, for example.

Robustness (extended operating temperature range, protective coating, side fill, anti-sulphonation):

As flash memory, like all IT components in industrial use, can be exposed to particularly harsh operating environments, there are also (in some cases optional) adaptations to meet these requirements.

For example, SSDs for outdoor use or cold stores are available that go beyond the standard operating temperature range of 0 °C to 70 °C and can be used at temperatures from -40 °C to +85 °C.

It should also be noted that as electronic assemblies, including flash memory modules, become increasingly compact, the solder joints are becoming smaller and smaller. Thermal stress caused by contraction and expansion, and mechanical stress caused by bending or twisting the PCBs, can quickly cause solder joints to crack and lose conductivity. This can be counteracted with the so-called side fill process. Here, a resin that hardens under ultraviolet radiation is injected into the gaps and corners of the modules. This creates a firm connection between the circuit board and the memory chip, which distributes the thermal and mechanical forces that occur across the entire component and thus prevents stress from concentrating at the solder joints. This improves the reliability and service life of the SSD.

[Figure: side fill process]

Increased exposure to moisture, dust, or other contaminants can be counteracted by applying protective coatings, known as conformal coatings. This is a polymer film with a thickness of around 50 to 210 μm, which is applied to the surface and thus serves as protection against mechanical stresses, chemical gases, corrosion, and short circuits. This improves both reliability under harsh conditions and product service life.

[Figure: conformal coating]

A special anti-sulphonation technology (also known as anti-sulphuration) is used to prevent the formation of metal sulphides on electronic components in environments containing sulphur compounds. Components such as chip resistors often contain silver electrodes. If these are exposed to a sulphurous atmosphere for a long period of time, the silver reacts with the sulphur (sulphurization) and the component's values can drift out of specification, which reduces the reliability of the SSD. However, if the surfaces of the conductors are coated with a sulphur-resistant material, no silver sulphide forms that could lead to altered characteristics or even short circuits.

[Figure: areas where sulphurization occurs]

Durability (quality components, garbage collection, TRIM, wear leveling):

Alongside long-term availability and robustness, the third pillar of industrial IT is the longevity of the systems, as failures often result in high costs. This also applies to the flash memory used, where high-quality components and special technologies ensure a particularly long service life.

Solid-state drives (SSDs) differ considerably from HDDs: on an SSD, the existing data in a memory block must first be erased before new data can be written to it, so new data cannot directly overwrite old, invalid data. SSDs write data in page units but erase it in block units. To remove invalid data, the valid data of a block must therefore first be copied to pages in another block; only then can the original block be erased. As soon as a block has been completely erased, it becomes a free block into which new data can be written. Garbage collection is the process that does this: it merges scattered valid data into another block and then erases the original block. The aim is to keep as many free blocks available as possible in order to sustain the write speed of the SSD.

[Figure: garbage collection]
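
The copy-then-erase sequence described above can be sketched in a few lines of Python. This is a simplified model, not any manufacturer's firmware: the `Block` class, the `garbage_collect` function and the four-page block size are assumptions made for illustration only.

```python
from dataclasses import dataclass, field

PAGES_PER_BLOCK = 4  # hypothetical; real NAND blocks hold far more pages


@dataclass
class Block:
    # Each page is "valid", "invalid", or None (erased and writable).
    pages: list = field(default_factory=lambda: [None] * PAGES_PER_BLOCK)
    erase_count: int = 0

    def erase(self):
        # Erasing always affects the whole block, never a single page.
        self.pages = [None] * PAGES_PER_BLOCK
        self.erase_count += 1


def garbage_collect(victim: Block, target: Block) -> int:
    """Copy the valid pages of `victim` into free pages of `target`,
    then erase `victim`. Returns the number of pages copied."""
    copied = 0
    for state in victim.pages:
        if state == "valid":
            free_index = target.pages.index(None)  # ValueError if target is full
            target.pages[free_index] = "valid"
            copied += 1
    victim.erase()
    return copied


victim = Block(pages=["valid", "invalid", "invalid", "valid"])
target = Block()
copied = garbage_collect(victim, target)
print(copied, victim.pages, target.pages)
```

In this example two valid pages are relocated and the victim block ends up completely free. The fewer valid pages a block still holds, the cheaper it is to reclaim.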

Newer operating systems support the TRIM command, with which the operating system tells the SSD which data belongs to deleted files and is no longer needed. This enables the SSD to run garbage collection more efficiently and reclaim free storage space more quickly. Before TRIM was introduced, there was no way to tell an SSD that certain data had been released for deletion: the drive kept treating it as valid, and kept copying it during garbage collection, until the host overwrote the same addresses. This slowed down write operations and shortened the life of the drive through unnecessary write cycles. With TRIM, pages belonging to deleted files can be handled by garbage collection the next time the drive is idle.
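
The effect of TRIM on garbage collection can be illustrated with a small model. The function name `pages_copied_by_gc` and the page addresses are hypothetical; the sketch only assumes that the drive relocates every page it still believes to be valid.

```python
def pages_copied_by_gc(block_lpas, trimmed_lpas):
    """Return the logical pages the drive must relocate before erasing a block.

    block_lpas:   logical page addresses whose current data sits in the block
    trimmed_lpas: addresses the host has declared unused via TRIM
    """
    return [lpa for lpa in block_lpas if lpa not in trimmed_lpas]


block = [10, 11, 12, 13]      # hypothetical block contents
deleted_file = {11, 12, 13}   # pages of a file the operating system deleted

# Without TRIM the drive has no idea the file is gone and copies everything.
without_trim = pages_copied_by_gc(block, trimmed_lpas=set())
# With TRIM only the single page that is still in use has to be moved.
with_trim = pages_copied_by_gc(block, trimmed_lpas=deleted_file)

print(len(without_trim), len(with_trim))
```

Every page that does not have to be copied is a write operation saved, which is why TRIM benefits both speed and service life.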

Depending on the flash memory type (SLC, MLC, TLC, QLC...), NAND memory cells only withstand a limited number of write/erase operations before the different voltage levels can no longer be reliably detected. Each erase and write operation causes wear, which slows down memory operations or, in the worst case, leads to bad blocks. Wear leveling is a protection algorithm in the firmware that ensures that all memory cells are used as evenly as possible, so that particularly frequently used cells do not fail sooner than the others. This prevents the SSD from becoming unusable prematurely due to too many faulty blocks. Because wear is distributed evenly across all cells, the overall service life of the product is extended and also becomes predictable. This means that the user can be warned in good time before the product's service life expires, so that it can be replaced before problems arise during operation.

[Figure: wear leveling]
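
The basic idea of wear leveling can be sketched as "always write to the least-worn free block". The following Python model is an assumption-laden simplification (the name `pick_block_for_write` is hypothetical, and every block is treated as free at all times); real firmware additionally has to relocate rarely changed data so that those blocks take part in the rotation too.

```python
def pick_block_for_write(erase_counts, free_blocks):
    """Choose the free block that has been erased the fewest times."""
    return min(free_blocks, key=lambda block: erase_counts[block])


# Hypothetical drive with four blocks, all unused at the start.
erase_counts = {0: 0, 1: 0, 2: 0, 3: 0}

for _ in range(8):
    free_blocks = list(erase_counts)  # simplification: every block is free
    block = pick_block_for_write(erase_counts, free_blocks)
    erase_counts[block] += 1          # writing here costs one erase cycle

print(erase_counts)
```

After eight writes each of the four blocks has been erased twice, whereas a naive strategy that always reused block 0 would have put all eight cycles on a single block.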

Data integrity (end-to-end protection, PLP, S.Live-Monitor):

Every SSD, regardless of form factor (2.5”, M.2, mSATA etc.) and interface or protocol (SATA or NVMe), has a controller that handles the communication between the flash memory and the host system. All write and read operations pass through this controller. The “end-to-end data protection” implemented here verifies the data along the entire path, from the host system through the controller to the NAND flash and back, in order to ensure its integrity. Even if a soft error occurs while data is being moved in the controller's internal RAM, the error can be detected and the transmission of faulty data to the host prevented. Each component along the path can thus perform its own error detection and avoid passing on erroneous data. To ensure data integrity on the way from the SSD controller to the NAND memory, the controller uses an “error correction code” (ECC) technique that can detect and correct most errors: additional error correction information is stored in the NAND flash alongside the data, so that the controller can also correct errors when reading the data back. In extremely rare cases, however, data errors cannot be corrected. The SSD controller then classifies these as “uncorrectable ECC errors” (UECC) and reports them to the host computer.

[Figure: end-to-end data protection]
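
The principle of checking data at each stage of the path can be shown with a checksum that travels with the payload. The sketch below uses CRC-32 from Python's standard `zlib` module purely as a stand-in; which check an actual SSD controller applies is not stated here, and the helper names `attach_crc` and `verify` are hypothetical. Note that a checksum like this only detects errors, while the ECC mentioned above can also correct them.

```python
import zlib


def attach_crc(payload: bytes) -> tuple[bytes, int]:
    """Compute a checksum when the data enters the protected path."""
    return payload, zlib.crc32(payload)


def verify(payload: bytes, crc: int) -> bool:
    """Re-check the data at a later stage of the path."""
    return zlib.crc32(payload) == crc


data, crc = attach_crc(b"sensor frame 0001")

# Simulate a soft error: one bit flips while the data sits in controller RAM.
corrupted = bytearray(data)
corrupted[0] ^= 0x01

ok_clean = verify(data, crc)
ok_corrupted = verify(bytes(corrupted), crc)
print(ok_clean, ok_corrupted)
```

The intact payload passes the check and the corrupted one fails it, so the faulty data can be reported instead of being handed to the host unnoticed.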

Another key technology for data integrity and reliability in flash storage is Power Loss Protection, or PLP for short, which uses firmware and hardware to ensure that important data is protected even in the event of abnormal voltage peaks or power failures. Essentially, two mechanisms are used. The first is software-based: important data is stored in several copies, so that if an abnormal voltage event damages the current data set, the last saved state can be restored from the backup copies. However, changes made since the last backup are lost. To prevent this, a second, hardware-based mechanism intercepts such voltage peaks and drops before they can cause damage. Banks of hold-up capacitors keep the SSD powered long enough to complete buffered read/write operations, thus preventing data loss even in the event of a power failure.

[Figure: power loss protection (PLP)]
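
How long a capacitor bank can bridge a power failure can be estimated from the energy it stores, E = ½·C·(V_start² − V_min²), divided by the power the drive draws. The following sketch applies this textbook formula; the function name `holdup_time_s` and all component values are hypothetical and not taken from any datasheet.

```python
def holdup_time_s(capacitance_f, v_start, v_min, load_w, efficiency=1.0):
    """Estimate how long a capacitor bank can power a load.

    capacitance_f: total capacitance in farads
    v_start:       capacitor voltage when input power is lost
    v_min:         lowest voltage at which the drive still operates
    load_w:        power drawn while flushing buffers, in watts
    efficiency:    conversion efficiency of the power stage (1.0 = ideal)
    """
    usable_energy_j = 0.5 * capacitance_f * (v_start**2 - v_min**2)
    return usable_energy_j * efficiency / load_w


# Hypothetical values: 1000 µF bank, 5 V rail, 3 V cut-off, 2 W load.
t = holdup_time_s(capacitance_f=0.001, v_start=5.0, v_min=3.0, load_w=2.0)
print(f"{t * 1000:.1f} ms")
```

With these assumed values the bank bridges roughly 4 ms. Whether that is sufficient depends on how much buffered data the drive has to flush.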

To monitor the condition of the flash memory, manufacturers of industrial SSDs offer software that collects, processes and displays events (power failures, write/read errors, bad blocks, etc.), the current status (operating hours, state of the PLP capacitors, number of bytes written per day and in total, temperature, etc.) and other information. This enables the user to estimate the remaining service life of the SSD and intervene if necessary. The monitor software can also be used to adjust the reporting itself, for example to reset counters or set values for the wear leveling factor. Some of the flash memory manufacturers that IPC2U GmbH works with even offer to customize the monitor functions for specific customers and applications.

[Figure: SSD monitoring software]
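
A rough service life estimate can be derived from the written-bytes counters mentioned above. The sketch assumes that the drive has a rated write endurance (often given as "terabytes written", TBW) and that the daily write volume stays constant; the function name `remaining_days` and all figures are hypothetical.

```python
def remaining_days(rated_tbw, written_tb, written_tb_per_day):
    """Estimate the days left until the rated write endurance is used up."""
    if written_tb_per_day <= 0:
        raise ValueError("daily write volume must be positive")
    return max(rated_tbw - written_tb, 0) / written_tb_per_day


# Hypothetical drive: rated for 600 TB, 150 TB written so far, 0.25 TB per day.
days = remaining_days(rated_tbw=600, written_tb=150, written_tb_per_day=0.25)
print(days)
```

With these figures the estimate is 1800 days. An estimate like this is only as good as the assumption of a steady workload, so it is worth recalculating as the counters change.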

Specialization in certain application areas (security, speed, RAID, etc.):

When using IT hardware in an industrial environment, it is important to note that specific applications often have specific requirements, and this also applies to flash memory. To match the memory as closely as possible to its intended use, adjustments to the hardware and firmware are often necessary. For example, there are SSDs that are particularly fail-safe and prompt an early replacement via the monitor software (e.g. for safety- or security-relevant applications such as passenger transportation, banks or casinos). Others offer particularly high capacities in order to record large amounts of data, or are optimized for fast reading and/or writing. In video surveillance, on the other hand, it is important that all data supplied by the cameras in real time is stored without loss, so that not a single frame is missing. Here, therefore, it is not so much the raw speed of the SSD that matters as a particularly large buffer memory. SSDs and controllers that are specially designed for use in RAID systems are another specialty. Here it is crucial that the individual mass storage devices always have the same size. The RAID controller must therefore be informed by the SSDs when bad blocks are detected and reduce the capacity used on the other mass storage devices accordingly.
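
The RAID requirement described above boils down to "every member can only contribute as much as the smallest one". A minimal sketch, assuming capacities are counted in usable blocks and using the hypothetical name `usable_member_capacity`:

```python
def usable_member_capacity(capacities_in_blocks):
    """All members of the array are limited to the smallest member."""
    return min(capacities_in_blocks)


# Hypothetical array of three SSDs with 1000 usable blocks each.
members = [1000, 1000, 1000]
before = usable_member_capacity(members)

# Drive 1 reports three newly retired bad blocks to the controller.
members[1] -= 3
after = usable_member_capacity(members)

print(before, after)
```

After the report, the controller in this model uses only 997 blocks on every drive, including the two healthy ones.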

