Mirror of https://github.com/Karaka-Management/Developer-Guide.git, synced 2026-02-04 23:18:40 +00:00
Update cpp.md
Signed-off-by: Dennis Eichhorn <spl1nes.com@googlemail.com>
This commit is contained in: parent 983a6d4599, commit 9bc0c3db7f
When writing code, keep the following topics in mind:
Branched code

```c++
for (int i = 0; i < N; i++) {
    if (a[i] < 50) {
        s += a[i];
    }
}
```
Branchless code

```c++
for (int i = 0; i < N; i++) {
    s += (a[i] < 50) * a[i];
}
```
### Instruction table latency
https://www.agner.org/optimize/instruction_tables.pdf
### Cache sizes

| CPU Category | Typical value |
|--------------|---------------|
| L1 Cache     | 32 - 48 KB    |
| L2 Cache     | 2 - 4 MB      |
| L3 Cache     | 8 - 36 MB     |
| L4 Cache     | 0 - 128 MB    |
| Clock speed  | 3.5 - 6.2 GHz |
| Cache Line   | 64 B          |
| Page Size    | 4 KB          |
### Cache line sharing between CPU cores
When working with multi-threading you may choose to use atomic variables and atomic operations to reduce the locking in your application. You may think that a value `a[0]` used by thread 1 on core 1 and a value `a[1]` used by thread 2 on core 2 will have no performance impact on each other. However, this is wrong. Core 1 and core 2 have separate L1 and L2 caches, but the CPU doesn't load individual variables, it loads entire cache lines (e.g. 64 bytes). This means that if you define `int a[2]`, both elements very likely end up on the same cache line, and therefore thread 1 and thread 2 have to wait on each other when doing atomic writes.