diff --git a/standards/cpp.md b/standards/cpp.md
index 6ed8c62..9bae4a2 100755
--- a/standards/cpp.md
+++ b/standards/cpp.md
@@ -80,6 +80,91 @@ https://www.agner.org/optimize/instruction_tables.pdf
 | Cache Line | 64 B |
 | Page Size | 4 KB |
 
+### Cache locality
+
+C++ stores 2D arrays in row-major order: the 1000 `int`s of `matrix[row]` are contiguous in memory, so moving to the next column touches the adjacent `int` while moving to the next row jumps 4000 bytes ahead.
+
+Column-wise traversal (cache-unfriendly): the inner loop steps through rows, so consecutive accesses are 4000 bytes apart and almost every access lands on a different cache line.
+
+```c++
+void process_columns(int matrix[1000][1000]) {
+    for (int col = 0; col < 1000; ++col) {
+        for (int row = 0; row < 1000; ++row) {
+            matrix[row][col] *= 2;  // 4000 B stride between consecutive accesses
+        }
+    }
+}
+```
+
+Row-wise traversal (cache-friendly): the inner loop steps through columns, so memory is walked sequentially and each 64 B cache line serves 16 consecutive `int` accesses.
+
+```c++
+void process_rows(int matrix[1000][1000]) {
+    for (int row = 0; row < 1000; ++row) {
+        for (int col = 0; col < 1000; ++col) {
+            matrix[row][col] *= 2;  // 4 B stride: consecutive accesses hit the same cache line
+        }
+    }
+}
+```
+
+### Data padding
+
+The compiler inserts padding so that `int b` sits on a 4-byte boundary (assuming a typical platform where `int` is 4 bytes with 4-byte alignment); member order therefore determines how much space is wasted.
+
+Wastes 6 bytes: `sizeof(Data)` is 12 for 6 bytes of actual data.
+
+```c++
+struct Data {
+    char a;  // 1 byte + 3 bytes padding
+    int b;   // 4 bytes
+    char c;  // 1 byte + 3 bytes tail padding
+};
+```
+
+Wastes 2 bytes: `sizeof(Data)` is 8 for the same 6 bytes of data.
+
+```c++
+struct Data {
+    char a;  // 1 byte
+    char c;  // 1 byte + 2 bytes padding
+    int b;   // 4 bytes
+};
+```
+
 ### Cache line sharing between CPU cores
 
-When working with multi-threading you may choose to use atomic variables and atomic operations to reduce the locking in your application. You may think that a variable value `a[0]` used by thread 1 on core 1 and a variable value `a[1]` used by thread 2 on core 2 will have no performance impact. However, this is wrong. Core 1 and core 2 both have different L1 and L2 caches BUT the CPU doesn't just load individual variables, it loads entire cache lines (e.g. 64 bytes). This means that if you define `int a[2]`, it has a high chance of being on the same cache line and therfore thread 1 and thread 2 both have to wait on each other when doing atomic writes.
+When working with multi-threading, you may choose to use atomic variables and atomic operations to reduce locking in your application. You might expect that a value `a[0]` written by thread 1 on core 1 and a value `a[1]` written by thread 2 on core 2 have no performance impact on each other. This is wrong: core 1 and core 2 have separate L1 and L2 caches, but the CPU doesn't load individual variables; it loads entire cache lines (e.g. 64 bytes). If you define `int a[2]`, both elements almost certainly end up on the same cache line, and therefore thread 1 and thread 2 have to wait on each other when doing atomic writes. This effect is known as false sharing.
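+
+One common mitigation is to give each thread's atomic its own cache line. Below is a minimal sketch, assuming the 64 B cache line from the table above (where available, C++17's `std::hardware_destructive_interference_size` can replace the hard-coded 64); the `PaddedCounter` and `worker` names are illustrative.
+
+```c++
+#include <atomic>
+#include <thread>
+
+// Illustrative sketch: the type/function names and loop count are arbitrary.
+// Each counter is aligned to (and padded out to) a full 64 B cache line, so
+// atomic writes from one thread never invalidate the line used by the other.
+struct alignas(64) PaddedCounter {
+    std::atomic<int> value{0};
+};
+
+PaddedCounter counters[2];
+
+void worker(int index) {
+    for (int i = 0; i < 1'000'000; ++i) {
+        counters[index].value.fetch_add(1, std::memory_order_relaxed);
+    }
+}
+
+int main() {
+    // With the alignment above, the two counters live on separate cache lines,
+    // so the two threads do not contend on each other's atomic writes.
+    std::thread t1(worker, 0);
+    std::thread t2(worker, 1);
+    t1.join();
+    t2.join();
+}
+```
+
+Without the `alignas`, both counters would likely sit in the same 64 B cache line and every `fetch_add` would bounce that line between the two cores.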