# Adaptive Program Verify Scheme for Improving NAND Flash Memory Performance and Lifespan

Sang In Park, Semiconductor Division, Samsung Electronics; SSIT, Sungkyunkwan University, Hwasung, Korea. Email: sipsi.park@samsung.com

Dongkun Shin, Department of Computer Engineering, Sungkyunkwan University, Suwon, Korea. Email: dongkun@skku.edu

Eui Gyu Han, Semiconductor Division, Samsung Electronics, Hwasung, Korea. Email: eg.han@samsung.com

Abstract: Since NAND flash memory program/erase (PE) cycling gradually degrades the reliability of memory cells, the redundancy of the error-correction code (ECC) is determined so as to ensure the PE cycling endurance at the end of the memory lifetime. Therefore, ECC redundancy is under-utilized when the PE cycling number is relatively small in the early lifetime. Considering the variations in program speed and error rate depending on the program step pulse voltage ($\Delta V_{pp}$) in incremental step pulse programming (ISPP), an adaptive $\Delta V_{pp}$ scheme was proposed to improve program performance by exploiting the under-utilized ECC. However, the adaptive $\Delta V_{pp}$ scheme overlooked the problem of increased voltage stress on memory cells at a large $\Delta V_{pp}$, which shortens the lifespan of flash memory devices. This paper proposes an adaptive $V_{verify}$ scheme, which trades the under-utilized ECC for improved program performance in the early lifetime of flash memory without decreasing the memory lifetime. Experiments with real NAND flash chips demonstrate up to 21% program time improvement and 10% lifetime improvement over the fixed $V_{verify}$ scheme.

## I. INTRODUCTION

As the NAND flash memory fabrication process shrinks and multi-level-cell (MLC) flash memory becomes widely used, NAND flash memory increasingly relies on error correction codes (ECC).
Since NAND flash memory cells gradually wear out with program and erase (PE) cycling, the bit error rate increases with the PE cycling number. To ensure reliability at a specified PE cycling limit, manufacturers provide enough ECC redundancy cells to tolerate the worst-case reliability at the end of the memory lifetime [1]. Therefore, ECC redundancy is under-utilized during the early lifetime of a NAND flash memory device.

In order to trade such under-utilized ECC redundancy for improved program performance, Pan et al. [2] suggested an adaptive scheme for the incremental program step voltage $\Delta V_{pp}$, where $\Delta V_{pp}$ is adjusted according to the PE cycling number. However, the adaptive $\Delta V_{pp}$ scheme may degrade the endurance of flash memory cells by imposing voltage stress on them, and thus decreases the lifespan of flash memory.

In this paper, we propose a novel program scheme that adjusts the program verify voltage $V_{verify}$. By exploiting the trade-off between performance and the retention-mode error rate depending on $V_{verify}$, the proposed scheme improves the program speed by fully utilizing the ECC redundancy cells in the early lifetime of the memory. Moreover, it can improve the lifespan of the flash memory device by minimizing the voltage stress on memory cells. The experiments on real flash memory devices demonstrate that the proposed scheme can improve the program performance by up to 21%.

Fig. 1. ISPP and its corresponding cell threshold voltage change. (a) Incremental Step Pulse Programming (ISPP). (b) Threshold voltage distribution shift during ISPP.

## II. NAND FLASH MEMORY PROGRAM MODEL

The Incremental Step Pulse Programming (ISPP) scheme [3] is typically used for precise programming of NAND flash memory, as shown in Fig. 1. ISPP repeats program-and-verify pulses with a staircase program voltage until all the memory cells in the target word-line are programmed.
At the k-th program pulse, ISPP programs the memory cells with the program voltage $V_{pgm}^{(k)}$ and verifies the result against the program verify voltage $V_{verify}$. The program verification detects failed bits, i.e., cells that have not yet been successfully programmed. When failed bits are detected, the next program pulse is applied with the higher program voltage $V_{pgm}^{(k+1)} = V_{pgm}^{(k)} + \Delta V_{pp}$, where $\Delta V_{pp}$ is the incremental program step voltage.

Fig. 1(b) shows the shift of the threshold voltage distribution from the erased state to the programmed state during ISPP. At the first program pulse, the fastest cell is sufficiently programmed, so its threshold voltage reaches $V_{th.fastest}$, which is higher than $V_{verify}$, while the threshold voltage of the slowest cell reaches only $V_{th.slowest}$, which is lower than $V_{verify}$. Therefore, the cell threshold voltages are distributed across the range $[V_{th.slowest}, V_{th.fastest}]$. The subsequent program pulses shift the threshold voltage distribution further toward the programmed state, and the program operation completes when the cell threshold voltages are distributed in the range $[V_{verify}, V_{verify} + \Delta V_{pp}]$. The number of program pulses $N_p$ and the program time $T_p$ are determined as follows [4]:

$$N_p = 1 + \left[ \Delta V_{th.0} / \Delta V_{pp} \right] \tag{1}$$

$$T_p = T_{load} + (T_{pulse} + T_{vfy}) \cdot N_p \tag{2}$$

Here, $\Delta V_{th.0}$ is the initial threshold voltage gap between the verify level and the slowest cell, expressed as $\Delta V_{th.0} = V_{verify} - V_{th.slowest}$. $T_{load}$ is the duration of the data load, $T_{pulse}$ is the program pulse width, and $T_{vfy}$ is the verify read time. According to Eqs. (1) and (2), a higher $\Delta V_{pp}$ and/or a lower $V_{verify}$ reduces the program time by reducing $N_p$. However, the values of $\Delta V_{pp}$ and $V_{verify}$ also affect the error rate of memory cells.
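To make Eqs. (1) and (2) concrete, the following minimal Python sketch evaluates $N_p$ and $T_p$ for two verify voltages. Only $V_{verify} = 1.4$ V and $\Delta V_{pp} = 1.0$ V come from the experimental setup in Section V; the slowest-cell threshold voltage, the lowered verify voltage, and all timing values are hypothetical and chosen purely for illustration. The bracket in Eq. (1) is assumed here to mean rounding up.

```python
import math


def num_program_pulses(v_verify, v_th_slowest, delta_vpp):
    """Eq. (1): N_p = 1 + [dV_th0 / dV_pp], with dV_th0 = V_verify - V_th.slowest.

    Assumption: the bracket is interpreted as a ceiling.
    """
    delta_vth0 = v_verify - v_th_slowest
    return 1 + math.ceil(delta_vth0 / delta_vpp)


def program_time(n_p, t_load, t_pulse, t_vfy):
    """Eq. (2): T_p = T_load + (T_pulse + T_vfy) * N_p."""
    return t_load + (t_pulse + t_vfy) * n_p


# Hypothetical device parameters (not taken from the paper).
V_TH_SLOWEST = -2.0                 # volts, slowest cell after the first pulse
T_LOAD, T_PULSE, T_VFY = 50, 20, 30  # microseconds

# Fixed verify voltage (1.4 V, from Section V) vs. a hypothetical lower one.
for v_verify in (1.4, 0.9):
    n_p = num_program_pulses(v_verify, V_TH_SLOWEST, delta_vpp=1.0)
    t_p = program_time(n_p, T_LOAD, T_PULSE, T_VFY)
    print(f"V_verify={v_verify} V: N_p={n_p}, T_p={t_p} us")
```

With these illustrative numbers, lowering the verify voltage removes one program-and-verify pulse, which is the mechanism the adaptive $V_{verify}$ scheme relies on.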
The error rate is related to charge trapping and detrapping, which increase with PE cycling. Charge trapping in program mode reduces the noise margin between $V_{verify} + \Delta V_{pp}$ and $V_{read}$. Since the noise margin is smaller at a larger $\Delta V_{pp}$, more program disturbance errors occur under a larger $\Delta V_{pp}$. Charge detrapping in retention mode reduces the noise margin between 0 V and $V_{verify}$ by shifting the cell threshold voltage toward the erased state. Since the noise margin is smaller at a lower $V_{verify}$, more retention errors occur under a lower $V_{verify}$. Therefore, a higher $\Delta V_{pp}$ and/or a lower $V_{verify}$ degrades the reliability of memory cells.

Manufacturers must consider this trade-off between program speed and bit error rate when they determine the values of $\Delta V_{pp}$ and $V_{verify}$. The values are chosen to guarantee reliability at a specified PE cycling endurance limit $N_{max}^{PE}$, and these worst-case-oriented values of $\Delta V_{pp}$ and $V_{verify}$ are typically fixed over the entire lifetime of the memory.

## III. ADAPTIVE PROGRAM STEP VOLTAGE SCHEME

Pan et al. [2] suggested improving the memory program speed by using a large $\Delta V_{pp}$ in the early lifetime of memory cells and adaptively decreasing $\Delta V_{pp}$ according to PE cycling. Although a larger $\Delta V_{pp}$ generates more bit errors, they can be corrected with the under-utilized ECC redundancy when the PE cycling number is small.

Fig. 2. The reliability degradation by voltage stress at high $\Delta V_{pp}$ (SLC NAND).

However, the authors of [2] overlooked the problem of increased voltage stress on memory cells at a large $\Delta V_{pp}$. The voltage stress may shorten the memory lifetime.
When the program voltage $V_{pgm}^{(k)}$ is applied to flash memory cells in the k-th pulse of ISPP, fast cells may be programmed while slow cells are not. The (k+1)-th pulse then applies the program voltage $V_{pgm}^{(k+1)}$ (= $V_{pgm}^{(k)}+\Delta V_{pp}$) to all memory cells in the target word-line, so the memory cells already programmed at the k-th pulse receive $\Delta V_{pp}$ of additional voltage stress. Therefore, a larger $\Delta V_{pp}$ gives more voltage stress to flash memory cells.

We verified the reliability degradation under a large $\Delta V_{pp}$ with real flash memory chips. Fig. 2 shows the relationship between $\Delta V_{pp}$ and the reliability of memory cells. As $\Delta V_{pp}$ increases, the number of defective memory cells in the retention mode increases accordingly due to the voltage stress.

To exploit the under-utilized ECC redundancy without additional voltage stress, it is better to adjust $V_{verify}$. A low $V_{verify}$ reduces the program time since $\Delta V_{th.0}$ in Eq. (1) is reduced. However, a low $V_{verify}$ increases the error rate in the retention mode by reducing the margin between $V_{verify}$ and 0 V. A narrow margin is vulnerable to the cell threshold voltage shift caused by detrapping during the recovery period. Therefore, our adaptive $V_{verify}$ scheme uses a low $V_{verify}$ in the early lifetime of flash memory to improve the program speed, and adaptively increases $V_{verify}$ to secure enough margin as the number of PE cycles increases. The proposed scheme does not impose additional voltage stress because $\Delta V_{pp}$ is unchanged. Furthermore, it reduces the voltage stress on memory cells by reducing $V_{verify}$, and thus can improve the memory lifetime compared with the conventional fixed $V_{verify}$ scheme.

## IV. ADAPTIVE PROGRAM VERIFY VOLTAGE SCHEME

Fig. 3 shows the change of the program verify voltage with PE cycling in the adaptive $V_{verify}$ scheme.
The program verify voltage gradually increases in the sequence $V_{verify}^{(1)} \rightarrow V_{verify}^{(2)} \rightarrow \cdots \rightarrow V_{verify}^{(n)} \rightarrow V_{verify}^{device}$ at predetermined PE cycling thresholds. $V_{verify}^{device}$ represents the program verify voltage determined by the conventional fixed scheme considering the error rate at the PE cycling limit $N_{max}^{PE}$. $N_1, N_2, \cdots$ are the threshold PE cycling numbers at which the value of $V_{verify}$ should be raised to handle the increased error rate.

Fig. 3. Adaptive change of $V_{verify}$ with PE cycling.

TABLE I. PE cycling thresholds for different verify voltages (SLC NAND).

| $V_{verify}$ | | | | | | |
|---|---|---|---|---|---|---|
| PE cycling | ≤1K | ≤2K | ≤4K | ≤8K | ≤9K | ≤10K |

These threshold values can be obtained through an experiment on the target flash memory chip as follows.

First, n different program verify voltages in the range $[0,\ V_{verify}^{device}]$ are selected, i.e., $0 < V_{verify}^{(1)} < V_{verify}^{(2)} < \cdots < V_{verify}^{(n)} < V_{verify}^{device}$.

Second, n memory block groups $BG_1,\ BG_2,\cdots,\ BG_n$ are selected, where each block group has m flash memory blocks. We denote the i-th block in $BG_j$ as $B_j^i$. The blocks in the same block group $BG_j$ are programmed with the same program verify voltage $V_{verify}^{(j)}$.

Third, for each block $B_1^i$ in $BG_1$, we perform $i \cdot N_{max}^{PE}/m$ PE cycles and observe the bit error rate of the block in the retention mode. The retention-mode errors can be measured with a time-accelerated baking test. If the bit errors in $B_1^i$ are correctable but the bit errors in $B_1^{i+1}$ are not correctable with the ECC redundancy of the device, we conclude that the verify voltage $V_{verify}^{(1)}$ can be used up to $i \cdot N_{max}^{PE}/m$ PE cycles. Therefore, $N_1$ is set to $i \cdot N_{max}^{PE}/m$.
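The threshold search for one block group, and the run-time lookup that the resulting thresholds enable, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the list of per-block pass/fail results, and the sorted-list lookup are all hypothetical. The example thresholds are the PE cycling values listed in Table I.

```python
from bisect import bisect_left


def threshold_from_block_group(correctable, n_max_pe):
    """Return the PE cycling threshold N_j for one block group.

    correctable[i-1] is True if block B^i, cycled i * n_max_pe / m times,
    was still ECC-correctable after the retention (baking) test.
    The threshold is i * n_max_pe / m for the last block before the first
    uncorrectable one, or 0 if even the first block fails.
    """
    m = len(correctable)
    if m == 0:
        raise ValueError("block group must contain at least one block")
    last_ok = 0
    for i, ok in enumerate(correctable, start=1):
        if not ok:
            break
        last_ok = i
    return last_ok * n_max_pe // m


def select_verify_level(pe_cycles, thresholds):
    """Return the 0-based index of the verify voltage to use.

    thresholds is the ascending list [N_1, N_2, ...]. Index j selects
    V_verify^(j+1); an index equal to len(thresholds) selects V_verify^device.
    """
    return bisect_left(thresholds, pe_cycles)


# Hypothetical bake-test outcome for BG_1 with m = 10 blocks, N_max = 10K:
# only the block cycled 1K times is still correctable.
n1 = threshold_from_block_group([True] + [False] * 9, n_max_pe=10_000)
print("N_1 =", n1)

# Thresholds taken from the PE cycling row of Table I (the last column,
# up to 10K, corresponds to falling through to V_verify^device).
THRESHOLDS = [1_000, 2_000, 4_000, 8_000, 9_000]
for pe in (500, 1_000, 1_500, 9_500):
    print(pe, "->", select_verify_level(pe, THRESHOLDS))
```

In practice the measured thresholds would be reduced by a safety margin before being used for the lookup, since process variation between chips can shift the point at which errors become uncorrectable.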
We repeat the same experiment on $BG_2$ to find $N_2$: for all blocks in $BG_2$, the first $N_1$ PE cycles are performed with $V_{verify}^{(1)}$ and the remaining PE cycles are performed with $V_{verify}^{(2)}$. The remaining threshold PE cycling numbers are obtained by repeating this experiment on the subsequent block groups. Table I shows an example of the threshold PE cycling numbers for different program verify voltages.

Larger values of n and m allow finer adjustment of the verify voltage, but the hardware cost of supporting such fine-grained adjustment is higher. This trade-off should be considered when determining the values of n and m. We must also consider the process variation of NAND flash memory chips: a sufficient margin should be applied when determining the PE cycling thresholds.

The proposed adaptive $V_{verify}$ scheme is also applicable to MLC flash chips. However, since an MLC chip has multiple program states, we must evaluate the error rate of each program state under different verify voltages and read voltages.

## V. EXPERIMENTS

### A. Performance Improvement

We investigated the effect of the adaptive $V_{verify}$ scheme with a 4Gb SLC NAND device. The device has a 1-bit correctable ECC scheme and guarantees 10K PE cycles and 10 years of retention. The verify voltage of the fixed scheme is 1.4 V and $\Delta V_{pp}$ is 1.0 V.

We first compared the numbers of program and erase pulses over the entire lifetime of the memory device under the conventional fixed $V_{verify}$ scheme and the proposed adaptive $V_{verify}$ scheme. We used 4 memory chips for the experiment and measured the average numbers of program and erase pulses at every 0.1K PE cycles. Fig. 4 shows that the adaptive scheme reduces the number of program and erase pulses. Fig. 5 shows the program speed improvement achieved by the adaptive scheme. The performance gain gradually decreases as PE cycling progresses.
The average performance gain over the entire lifetime is 11%, and the gain is especially significant (about 21%) in the early lifetime of the memory device.

### B. Lifespan Improvement

The adaptive $V_{verify}$ scheme can improve the device lifetime by reducing the gate voltage stress applied to the memory cells. We measured the cell threshold voltage shift in a time-accelerated baking test (2 hr at 250°C) after 10K PE cycles. Fig. 6(a) shows the difference in the cell threshold voltage shifts between the fixed scheme and the adaptive scheme. The threshold voltage shift in the fixed scheme is on average 0.05 V larger than that in the adaptive scheme. To interpret this difference, we observed the change of the cell threshold voltage with PE cycling, as shown in Fig. 6(b). The slope of the trend line in Fig. 6(b) is about 0.06, which means that a 0.06 V threshold voltage shift toward the erased state corresponds to about 1K PE cycles. Therefore, the adaptive $V_{verify}$ scheme can save about 1K PE cycles compared with the fixed scheme. Since the lifespan of the test flash chip is 10K PE cycles, 1K PE cycles corresponds to 10% of the entire lifespan. This shows that the adaptive $V_{verify}$ scheme can improve the overall memory lifespan by about 10% compared with the fixed $V_{verify}$ scheme.

Fig. 4. Comparison of program/erase pulses between the adaptive $V_{verify}$ scheme and the fixed $V_{verify}$ scheme.

## VI. CONCLUSION

The adaptive program step pulse voltage ($\Delta V_{pp}$) scheme can improve the program speed of NAND flash memory by exploiting the under-utilized ECC redundancy in the early lifetime of the device. However, the adaptive $\Delta V_{pp}$ scheme has a critical problem: the device lifetime decreases because a larger $\Delta V_{pp}$ imposes more voltage stress. We demonstrated this reliability degradation problem with real NAND flash chips.
In this paper, we proposed an adaptive verify voltage ($V_{verify}$) scheme that trades the under-utilized ECC redundancy for improved program performance without lifetime degradation. The proposed adaptive scheme has two advantages over the conventional fixed verify voltage scheme. First, the program speed is improved by reducing the number of program pulses. Second, the memory lifetime is improved by reducing the gate voltage stress. The experiments on real NAND flash memory chips showed on average 11% program speed improvement over the entire lifetime and about 10% lifetime improvement. In particular, there was up to 21% program speed improvement in the early lifetime.

Fig. 5. Program performance improvement by the adaptive $V_{verify}$ scheme.

Fig. 6. Program cell threshold voltage shift (after 10K PE cycling). (a) Comparison between the adaptive and fixed schemes. (b) The trend of threshold voltage shift.

## ACKNOWLEDGEMENT

This research was partly supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2012-0002117).

## REFERENCES

- [1] N. Mielke et al. Bit error rate in NAND flash memories. In Proc. of IEEE International Reliability Physics Symposium, IRPS'08, pages 9–19, 2008.
- [2] Y. Pan et al. Exploiting memory device wear-out dynamics to improve NAND flash memory system performance. In Proc. of the 9th USENIX Conference on File and Storage Technologies, FAST'11, pages 18–18, 2011.
- [3] K.-D. Suh et al. A 3.3 V 32 Mb NAND flash memory with incremental step pulse programming scheme. In Proc. of 42nd IEEE International Solid-State Circuits Conference, pages 128–129, 1995.
- [4] K. Takeuchi et al. A multi-page cell architecture for high-speed programming multi-level NAND flash memories. In Proc. of 1997 Symposium on VLSI Circuits, pages 67–68, 1997.