How Memory Access Patterns Can Make Or Break Performance (Part 2)
Part 2: Effects of Cache On Multi-Threading

In the last part, we discussed how we can achieve better performance by accessing contiguous chunks of data, taking advantage of low-latency caches and reducing the time the processor spends waiting on memory. In this part, we will see why you should be careful about how threads access contiguous memory when your program runs in parallel on multiple cores.

Example: A Simple Parallel Counter

Let's look at an example of a scalability issue arising from unoptimised cache usage. You can find this example in detail on Herb Sutter's website. What Sutter wanted to do was count the number of odd elements in a matrix. This is a trivially parallel problem: multiple threads each count the odd elements in a different part of the matrix, and the per-thread tallies are then summed. So this is what he did: