Error Detection and Correction

Last modified: 2025-02-11 Category: Networks ..

Error Detection and Correction

Some bits will inevitably be recieved in error. Noise may flip the bits recieved over the network. We need to be able to...

detect errors
retransmit
correct errors

Approach: Add Redundancy

In error detection codes, add check bits to the message bits.
In error correction codes, add more check bits to let some errors be corrected.
Key issue is to design codes to detect as many errors as possible without having too much redundancy or computation required.

Generally, a codeword is a $D$-bit message with $R$ check bits added to it. The sender computes the check bits and appends them to the message. The reciever then verifies the check bits by recomputing them and comparing them to the recieved check bits.

Example Code:

$$ 1 \to 11\ 0 \to 00\ x \to xx\ $$

For example: $$ 101110 \to 101110101110 $$

Can detect errors with this code up to 1 bit. However, no guarantee if more bits are flipped. Also cannot correct errors. This sucks.

Intuition

Let $S$ be the set of all possible $n$-bit sequences, and let $C$ be the set of all $n$-bit code words with $D$ data bits and $R$ check bits. We have $C \subset S$, and we want to choose $C$ such that the probability of a random $D$-bit sequence being in $C$ is low.

Consider a random $n$-bit sequence $x \in S$...

$$ \mathbb{P}(x \in C) = \frac{|C|}{|S|} = \frac{2^R}{2^n} = 2^{-D} $$

Error correction/detection is hard because even the check bits can be corrupted. Given $d$ errors, we can detect and correct $d$ errors if the distance between any two code words is $2d + 1$.

Hamming Distance

Distance: The number of bit flips needed to change from one valid code word to another. The distance of a code is the minimum distance between any two code words. Letting $C$ be the set of all code words, the distance is:

$$ \min_{x, y \in C, x \neq y} d(x, y) $$

For a coding of distance $d + 1$, we can detect $d$ errors
For a coding of distance $2d + 1$, we can correct $d$ errors by mapping to the closest code word.

Internet Checksum

Sum up chunks of data and append the sum to the end of the data. If the sum is 0 on the recieving end, then the data is valid.

This code has a distance of 2, so it can detect 1 bit errors, and can correct 0 bit errors (since it can't correct any errors)

Internet Checksum Algorithm

sender

Split up data into 4 byte chunks.
Sum up all the 4 byte chunks, wrapping around for carry beyond 4 bytes.
netgate append the checksum to the end of the data.

reciever

Split up data into 4 byte chunks.
Sum up all the 4 byte chunks, wrapping around for carry beyond 4 bytes.
netgate append the checksum to the end of the data.
If the sum is 0, then the data is valid.

def internet_checksum(data):
    checksum = 0
    for i in range(0, len(data), 4):
        chunk = data[i:i+4]
        checksum += int.from_bytes(chunk, 'big')
    return checksum.to_bytes(4, 'big')

Cyclical Redundancy Check (CRC)

Given a generator polynomial $C$ and a message of $n$ bits, generate $k$ bits such that the $n + k$ bit message is divisible by $C$. Works with binary values that operate over the field $\mathbb{Z}_2$ (mod 2 arithmetic).

CRC Algorithm

sender

create binary representation of polynomial as divisor
append 0s to the end of the data to be sent, equal to the degree of the polynomial
divide the data by the polynomial, using XOR
change redundant bits to remainder

reciever

create binary representation of polynomial as divisor
divide the data by the polynomial, using XOR
if the remainder is 0, then the data is valid

Notes:

x + 1 as a generating polynomial is just a parity bit!

Hamming Code

A code with distance 3. Can detect 2 bit errors and correct 1 bit errors.

Uses $n = 2^k - k - 1$. Put check bits in positions that are powers of 2, and fill in the rest with data bits. Check bits are calculated by checking the bits in the ith position for all data bits whose ith bit is 1.

def hamming_encode(data):
    n = len(data)
    k, acc = 0, 1
    while acc < n + k + 1:
        k += 1
        acc *= 2

    code = [0] * (n + k)
    j = 0
    for i in range(1, n + k + 1):
        if i & (i - 1) == 0:
            code[i - 1] = 0
        else:
            code[i - 1] = data[j]
            j += 1
    for i in range(k):
        j = 2 ** i
        for l in range(j, n + k + 1):
            if l & j:
                code[j - 1] ^= code[l - 1]
    return code

Algorithm

Send

Put all parity bits in positions that are powers of 2 (1, 2, 4, 8, 16, etc.)
Fill in all other positions with data bits.
calculate parity bits for each of the ith check bits. (check the bits in ith position for all data bits whose ith bit is 1)

Recieve

calculate parity bits for each of the ith check bits. (check the bits in ith position for all data bits whose ith bit is 1 of its index is a power of 2)
If the parity bits are all 0, then the data is valid (remove check bits from data bits)
If the parity bits are not all 0, then the data is invalid. The syndrome (concat and reverse the parity bits) is the index of the bit that is wrong. Flip that bit.

Detection vs. Correction

Error detection is easier than error correction. There is less overhead and computation required.
Which is better depends on the pattern of errors. In general, error correction should be used when errors are detected and retransmission is expensive. Error detection should be used when errors are rare/unrecoverable, or when retransmission is cheap. Also used in application layer for physical storage (Reed-Solomon codes for CDs, DVDs, etc).

Error correction is heavily used in the physical layer. Low Density Parity Check (LDPC) codes are used in 802.11, DVB, WiMAX etc, and convolutional codes are used a lot in practice. On the other hand, detection combined with retranmission is used in the data link layer and above for residual errors.