Update cpp.md

Signed-off-by: Dennis Eichhorn <spl1nes.com@googlemail.com>
Dennis Eichhorn 2024-08-05 19:38:08 +02:00 committed by GitHub
parent 0910172c26
commit 983a6d4599

for (int i = 0; i < N; i++)
}
```
Branchless code
```c++
for (int i = 0; i < N; i++)
}
```
### Instruction table latency
| Instruction | Latency (cycles) | RThroughput (cycles/instr) |
|-------------|---------|:------------|
| `jmp` | - | 2 |
| `mov r, r` | - | 1/4 |
| `mov r, m` | 4 | 1/2 |
| `mov m, r` | 3 | 1 |
| `add` | 1 | 1/3 |
| `cmp` | 1 | 1/4 |
| `popcnt` | 1 | 1/4 |
| `mul` | 3 | 1 |
| `div` | 13-28 | 13-28 |
Source: https://www.agner.org/optimize/instruction_tables.pdf (exact values vary by microarchitecture; the numbers above are representative).
### Cache line sharing between CPU cores
When working with multi-threading, you may choose atomic variables and atomic operations to reduce locking in your application. You might assume that a value `a[0]` used by thread 1 on core 1 and a value `a[1]` used by thread 2 on core 2 cannot affect each other's performance. This is wrong: core 1 and core 2 have separate L1 and L2 caches, but the CPU doesn't load individual variables, it loads entire cache lines (e.g. 64 bytes). If you define `int a[2]`, both elements will very likely land on the same cache line, and therefore thread 1 and thread 2 have to wait on each other when doing atomic writes. This effect is known as false sharing.