EEPROM wear levelling

Whether we are trying to bring the cost of the product down and keeping the component count low, or upgrade an already existing system.

This post should cover my take on wear levelling on some lower-end microcontrollers, with byte-addressable memmory.

Some interesting information can be found in Microchip's AN537

Basically, there are two failure modes:

During storage, the cell can loss charge, reverting to erased state ( usually all 0xFF)
After some amount of rewrites the cell can become stuck

For storing application critical data is checksums are always a must. The second failure mode can be easily detected by re-reading what has been written into the memmory.

Storing incremental data

For the first example, we'll only need to store an always incrementing counter. The counter is represented by single unsigned integer (uint32_t).

Let's say, we have 1K memmory (like on Atmega328P). For simplicity we'll represent this memory as an array, because eeprom memory data acess is not the point of this tutorial.

#include <iostream>
#include <cstdint>
#include <string>
using namespace std;

uint8_t eeprom [1024];

Data that we will store into the memmory will be represented as a struct.

struct __attribute__((__packed__)) data {
    uint32_t counter;
    uint8_t crc;
};

std::cout << "sizeof(data) = " << sizeof(data) << std::endl;

sizeof(data) = 5

Starting with some calculations. We can use up all memmory, so we have $ 2^{10} $ of available bytes, called $ total $.

One data entry takes up 5 bytes (4B data and 1B checksum), represented by $ data_size $.

The attribute __packed__ prevents the compiler from adding padding to the data, this does not have any impact on 8 bit microcontrollers, but can cause a lot of trouble on different systems, beacuse gcc (or any other compiler) will align the data to the platofrm architecture.

For example on x86_64:

struct upacked_data {
    uint32_t counter;
    uint8_t crc;
};
std::cout << "sizeof(upacked_data) = " << sizeof(upacked_data) << std::endl;

sizeof(upacked_data) = 8

When generating memory contents on different architectures watch out for the endianess.

There are two ways we can keep the information about bad cells.

meta table with information about bad cells
checksum verification

Meta table

When storing data for shorter periods of time and without the need of an checksum we can focus on overcomming the fail-on-write issue. When writing data we simply read it back, if it's not same we mark the cell as bad and use another one that's not already marked. Some errors could be detected by the fact, that we are storing only ever increasing data.

Because we are storing single flag for every cell, this meta table will take up: $ \begin{align} meta = \frac{total - meta}{8} \end{align} $

If we are not storing data per cell, but have some bigger structure, like struct data in this case, the cells will wear out about the same amount for each data index, we can make the meta table smaller only contain the indexes, thus saving space.

That gives us:

$ \begin{align} meta = \frac{total - meta}{8 * data_size} \end{align} $

That's about 25 bytes, or 2.5%. On per-cell basis it would be 114 bytes.

checksums

If the data is critical and the system must rely on it, it's best to have an some kind of error detection (and possibly correction) code. We can abuse the fact that we have to store the CRC and use it to mark our cells as invalid. On first start, or when we are programming the EEPROM in the factory, we set all data indexes to the value 0 with correctly calculated crc.

For reference, here is an simple function to calculate CRC8.

#define CRC_POLY 0x1D
#define CRC_INTI 0xFF

uint8_t crc8(uint8_t *data, size_t len) {
    uint8_t crc = CRC_INTI;
    size_t i, j;
    for (i = 0; i < len; ++i) {
        crc ^= data[i];
        for (j = 0; j < 8; ++j) {
            if ((crc & 0x80) != 0)
                crc = (uint8_t)((crc << 1) ^ CRC_POLY);
            else
                crc <<= 1;
        }
    }
    return crc;
}

uint8_t crc8(data & d) {
    return crc8((uint8_t*)&d, sizeof(data) - sizeof(d.crc));
}

To start with some memory contents, we create an example cell

data zero{1, 0};
zero.crc = crc8(zero);

and fill the whole memmory with it.

Be aware, we can only fit $ \lfloor \frac{total}{data_size} \rfloor $ entries into the table. That means we will only use $ \lfloor \frac{total}{data_size} \rfloor \cdot data_size $ bytes. That's 1020 bytes in this example.

We can fill the contents of the memory using memcpy.

for (size_t i = 0; i < sizeof(eeprom) - sizeof(data); i += sizeof(data))
    memcpy(eeprom + i, &zero, sizeof(data));

As said earlier, we are representing the eeprom contents as a solid block of memmory, mainly for readability. Beacuse of that we can cast the eeprom into an array of data (data *)eeprom.

Eeprom is slow for reading and writing, so when runinng or on real system, if you have the luxury of free 1024 bytes in RAM, the data can be stored in device and periodically synced with the eeprom when idling, If not, the data acess in the functions has to be changed to accomodate this.

We'll store the casted data into pointer data_ptr for simplicity. Because it's only pointer to sequence we can't use sizeof, so we'll store the size as well.

data * data_ptr = (data *)eeprom;
size_t data_size = sizeof(eeprom) / sizeof(data);

for (size_t i = 0; i < data_size; ++i)
    data_ptr[i] = zero;

Now when trying to find the latest entry from the memory, simply iterate over the whole memory, validate records and remember the one with highest counter.

data * max_data = nullptr;

for(size_t i = 0; i < data_size; ++i){
    if( data_ptr[i].crc == crc8((uint8_t*)&data_ptr[i], sizeof(data) - sizeof(zero.crc)) ){
        if ( !max_data || data_ptr[i].counter > max_data->counter )
            max_data = &data_ptr[i];
    }
}

cout << dec << "found max ( " << max_data->counter << " ) at i = " << max_data - data_ptr << endl;

found max ( 1 ) at i = 0

Writing data is little trickier, because we are storing data in an circular structure, we must first find the head of the sequence. We've already done that, when we were trying to find the max counter, so the current head is at max_data

To find out the index which we will be writing to, we have to calculate position of the max_data.

$ start = max_data - data_ptr $.

The index will then be calculed as

$ idx = start + i \mod{data_size}$

Make sure we are iterating from 1, not zero as that will overwrite our latest data.

The next data index that has valid crc will be overwriten with new data. This process will be semi-constant. At the beginning when all cells are fresh, it will write to first cell approached. After some time, when the cells start to break-down the writing process will take more and more time.

data new_entry{12, 0};
new_entry.crc = crc8(new_entry);

bool sucess = false;
for(size_t i = 1; i < data_size; ++i ){
    
    size_t idx = (i + max_data - data_ptr) % (data_size);
    
    if(data_ptr[idx].crc == crc8(data_ptr[idx])){
        
        data_ptr[idx] = new_entry;
        max_data = &data_ptr[idx];
        
        sucess = true;
        break;
    }
}

if (sucess)
    cout << "sucessfully written data to " << max_data - data_ptr << endl;

sucessfully written data to 1

When loop finished without sucess we know, that it's the end of the eeprom lifetime. We can continue to use it, maybe try to restore some data from crcs.

different approach

If we leave out the drawbacks of storing an calculating CRC for every data in memory and focus on the nature of the counter, we can see that the LSB will change much more frequently than MSB, because we are storing incremental data. When writing to the same cell, the last byte will change (almost) everytime, but the second to last only every 256th write. So even when "wear levelling" the wear won't be even on the whole memory. It will work, but it won't be much efficient.

Let's look at one data chunk. If we try to store counter untill it's dead, it will die after approximately $ 10^6 $ writes that's 0xf4240 writes. But that's only 2.5 bytes, the rest remains untouched.

The fix is simple, divide the memory into chunks for each byte and wear-level bytes individually. Going from MSB to LSB each cell will require 256 times more cells to wear-level evenly than the previous one. We can express that as a system of equation $$ \begin{align} c_1 = 256 \cdot c_0 \ c_2 = 256 \cdot c_1 \ c_3 = 256 \cdot c_2 \ \ c_0 + c_1 + c_2 + c_3 = 1024 \end{align} $$

solving this equations gives us $$ \begin{align} c_0 = 0.0000608\ c_1 = 0.0155640\ c_0 = 3.9843750\ c_0 = 1020.0000\ \end{align} $$

We'll have to adjust these size to be 1, 1, 4, 1018. That gives us about $ 1018 \cdot 256 \cdot 10^6 $ writes. if we want keep the checksum it would be only half of that, which is still great, that's $ 1.3E17 $ that won't fit into 32bit unsigned integer. That means that the counter will overflow before the eeprom stops working.

But beware, in the 2 lower partitions we are not storing only-ever increasing data anymore, that means, that we have to change the way we keep the track of the latest valid data.

That could be done by simply reseting all other data to zero.

When storing data make sure you are storing the correct byte, of the whole counter.

Conclusion

We've coverd the most basic ways to wear-levell a simple byte-adressable memmory.

Code reference for the last part will come in future (if not feel free to contact me)