Quantifying The Cost of Context Switch
Chuanpeng Li et al., 2007
Beginner
5.0
Experimental quantification of the indirect cost of context switching using a synthetic workload, measuring the impact of program data size and access stride.
Analysis of false cache line sharing effects on multicore CPUs
Suntorn Sae-eung, 2010
Beginner
5.0
A study of false cache line sharing on multicore CPUs and its impact on system performance.
Non-scalable locks are dangerous
Silas Boyd-Wickizer et al., 2012
Intermediate
5.0
Using Linux on a 48-core machine, this paper shows that non-scalable locks can cause dramatic collapse in the performance of real workloads, even for very short critical sections.
Multicore Locks: The Case Is Not Closed Yet
Hugo Guiroux et al., 2016
Intermediate
5.0
NUMA multicore machines are pervasive and many multithreaded applications are suffering from lock contention. To mitigate this issue, application and library developers can choose from the plethora of optimized mutex lock algorithms that have been designed over the past 25 years. Unfortunately, there is currently no broad study of the behavior of these optimized lock algorithms on realistic applications.
User-level Scheduling on NUMA Multicore Systems under Linux
Sergey Blagodurov et al., 2011
Advanced
4.0
A study of user-level scheduling techniques for NUMA multicore systems, examining performance implications and optimization strategies.