Road to C Programmer #10 - Union vs Struct

Last Edited: 8/12/2024

This post introduces the difference between unions and structs in C.

C Union

Union

A union is a data type in C whose members can have different types but share the same memory, so it holds only one of them at a time. It is defined like the following example:

union Data {
    int x;
    float y;
    char z[16]; 
};

If you recall structs, a union definition looks identical to a struct definition apart from the keyword. So how are they different? While a struct allocates space for all of its members, a union allocates only enough memory to store its largest member, and every member starts at the same address.

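#include <stdio.h>
#include <string.h>

struct Data1 {
    int x;
    float y;
    char z[16];
};

int main(void) {
    // sizeof yields a size_t, which is printed with %zu
    printf("size of union: %zu \n", sizeof(union Data)); // => size of union: 16
    printf("size of struct: %zu \n", sizeof(struct Data1)); // => size of struct: 24

    union Data data; // in C, the union keyword is part of the type name
    data.x = 1;
    data.y = 3.14f;            // overwrites x
    strcpy(data.z, "my data"); // overwrites x and y

    printf("x: %d, y: %f, z: %s", data.x, data.y, data.z);
    // => z prints "my data"; x and y print the bytes of z reinterpreted
    //    as an int and a float, so their exact values depend on the platform
    return 0;
}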

On a typical platform with a 4-byte int and a 4-byte float, the union takes only 16 bytes (the size of the char array, its largest member), while the struct takes 24 bytes so that all three members can be stored at once. Because the members of a union share the same memory, assigning to one member overwrites whatever was stored through another. Since it uses less memory, a union is useful when you need only one of the members at a time and want to save as much memory as possible.
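To see the shared storage directly, here is a minimal sketch that checks whether the members of union Data start at the same address and whether writing z changes the bytes that x occupied. The helper names members_share_address and writing_z_overwrites_x are made up for this illustration.

```c
#include <string.h>

union Data {
    int x;
    float y;
    char z[16];
};

/* Returns 1 if x, y, and z all start at the address of the union itself. */
int members_share_address(void) {
    union Data d;
    void *base = &d;
    return (void *)&d.x == base && (void *)&d.y == base && (void *)d.z == base;
}

/* Returns 1 if copying a string into z changed the bytes where x was stored. */
int writing_z_overwrites_x(void) {
    union Data d;
    int one = 1;
    d.x = one;
    strcpy(d.z, "my data");
    /* Compare the first sizeof(int) bytes of the union with the bytes of 1. */
    return memcmp(&d, &one, sizeof one) != 0;
}
```

Both helpers should return 1 when called from main: the members overlap, so the string written through z replaces the 1 that was stored through x.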

Union & Struct

Unions and structs are often combined. The following are two examples of how they can be used together:

union Color { 
    struct { float r, g, b; } rgb; 
    struct { float c, m, y, k; } cmyk; 
    struct { float h, s, l; } hsl; 
};
 
enum BufferType { Char, Float, Double };
 
struct Buffer {
    enum BufferType type; // in C, the enum keyword is part of the type name
    union {
        char x[1024];
        float y[1024];
        double z[1024]; 
    } data;
};

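In Color, each struct groups the components of one color model, and the union lets a single variable hold whichever model is needed. In Buffer, the type field records which member of the union currently holds valid data, a pattern commonly called a tagged union.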
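As a rough sketch of how the type field might be used, the code below reads and writes Buffer through two hypothetical helpers, buffer_set_float and buffer_get, which are not part of the original example. The idea is that every write updates the tag and every read checks it before touching the union.

```c
#include <stddef.h>

enum BufferType { Char, Float, Double };

struct Buffer {
    enum BufferType type; /* tag: says which member of data is valid */
    union {
        char x[1024];
        float y[1024];
        double z[1024];
    } data;
};

/* Hypothetical helper: stores a float and updates the tag to match. */
void buffer_set_float(struct Buffer *b, size_t i, float value) {
    b->type = Float;
    b->data.y[i] = value;
}

/* Hypothetical helper: reads element i through the member named by the tag. */
double buffer_get(const struct Buffer *b, size_t i) {
    switch (b->type) {
    case Char:   return (double)b->data.x[i];
    case Float:  return (double)b->data.y[i];
    case Double: return b->data.z[i];
    }
    return 0.0; /* not reached for valid tags */
}
```

Without the tag, the caller would have no way to know which of x, y, or z was written last, which is why the union is wrapped in a struct here.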

Structure Padding

When you define a struct and print out its size, you might notice that more memory is allocated than expected:

#include <stdio.h>

struct Data {
    char x; // takes 1 byte
    int y;  // typically takes 4 bytes
};

int main(void) {
    struct Data data; // in C, the struct keyword is part of the type name
    printf("size of struct: %zu bytes\n", sizeof(data));
    // => size of struct: 8 bytes
    return 0;
}

Although a char and an int together take up only 5 bytes, the size of the struct is 8 bytes. This is due to how the CPU accesses memory. Instead of reading one byte at a time, the CPU typically accesses memory in word-sized chunks (4 bytes in this example, though the size varies by architecture), so the compiler aligns each int to an address that is a multiple of 4. If x and y were packed into 5 bytes, y would straddle two chunks and reading it would require two accesses. By inserting 3 bytes of padding after x, the compiler places y at offset 4, where the CPU can fetch it in a single access. Padding makes structs larger than the sum of their members, which is partly why you might want to use a union whenever you can to save memory.
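The padding can be measured rather than guessed. The sketch below uses the standard offsetof macro from <stddef.h>; the helper names padding_after_x and total_padding are invented for this example.

```c
#include <stddef.h>

struct Data {
    char x;
    int y;
};

/* Bytes of padding the compiler inserted between x and y. */
size_t padding_after_x(void) {
    return offsetof(struct Data, y) - (offsetof(struct Data, x) + sizeof(char));
}

/* Bytes of the struct that are not occupied by any member. */
size_t total_padding(void) {
    return sizeof(struct Data) - (sizeof(char) + sizeof(int));
}
```

On a platform where int is 4 bytes and must be 4-byte aligned, padding_after_x() is expected to be 3, which accounts for the 8-byte size. Other platforms may give different numbers.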

Understanding padding also helps you order struct members to reduce wasted space:

#include <stdio.h>

struct Data1 {
    char x;
    int y;
    char z;
};

struct Data2 {
    int y;
    char x;
    char z;
};

int main(void) {
    // sizeof yields a size_t, which is printed with %zu
    printf("size of Data1: %zu \n", sizeof(struct Data1));
    printf("size of Data2: %zu \n", sizeof(struct Data2));
    // => size of Data1: 12
    // => size of Data2: 8
    return 0;
}

The only difference between Data1 and Data2 is the order of their members, but this affects the amount of memory allocated. In Data1, y sits between the two chars, so 3 bytes of padding are needed after x to align y, and another 3 after z so that the total size is a multiple of 4. In Data2, y comes first, x and z share the next 4-byte slot, and only 2 bytes of padding are needed at the end.
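The same comparison can be made in code. This is a small sketch, with the helper names data1_padding and data2_padding chosen only for illustration, that computes how many bytes of each struct are padding.

```c
#include <stddef.h>

struct Data1 {
    char x;
    int y;
    char z;
};

struct Data2 {
    int y;
    char x;
    char z;
};

/* Padding = total size minus the bytes used by the members themselves. */
size_t data1_padding(void) {
    return sizeof(struct Data1) - (sizeof(char) + sizeof(int) + sizeof(char));
}

size_t data2_padding(void) {
    return sizeof(struct Data2) - (sizeof(int) + sizeof(char) + sizeof(char));
}
```

With a 4-byte, 4-byte-aligned int, data1_padding() should come out to 6 and data2_padding() to 2. A common rule of thumb that follows from this is to declare members from largest to smallest.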

Exercises

Starting with this article, there will be an exercise section where you can test your understanding of the material introduced in the article. I highly recommend solving these questions by yourself after reading the main part of the article. You can click on each question to see its answer.

Resources