Darryl Gove's Blog: June 2010

Professor David Patterson wrote an interesting article in IEEE Spectrum, 'The trouble with multicore'. The tag line is "Chipmakers are busy designing microprocessors that most programmers can't handle". The thrust of the article is that multicore processors are a hardware development that software is poorly equipped to utilise.

There are two main arguments made in the article. The first is that programming languages are very poor at describing parallelism. There has been a long list of languages that were either designed to tackle parallelism or have had parallelism imposed upon them. To be fair parallel programming is littered with the ill-conceived corpses of languages that were meant to solve the problem. So his view is correct, but perhaps this is not relevant.

The second point he makes is that not all tasks break down to independent work. His example is that of ten reporters writing the same story, and not being able to write the story ten times faster because each section of text has to build on the previous sections. Again, this is true. There are some tasks that have implicit (or explicit) dependencies, but perhaps this is not relevant.

The example in his paper that best illustrates how multicore is the solution, and not the problem, is that of cloud computing. As he says "Expert programmers can take advantage of the task-level parallelism inherent in cloud computing.". Are you an expert programmer when you type a search term into Google? A lot of computation goes into finding the results for you, but they appear nearly instantly. It could be argued that Google put a considerable amount of effort into designing a system that produced results so quickly. Of course they did. However, they did it once, and its used for millions of search queries every day.

Observation 1: Many problems just need parallelising once. Or conversely, not every developer needs to worry about the parallelism – in the same way as not every developer on a project needs to worry about the GUI.

But this only addresses part of the argument. It is all very well using an anecdotal example to demonstrate that it is possible to utilise multiple cores, but that does not disprove Professor Patterson's argument.

Lets return to the example of the reporters. The way the reporters are working is perhaps not the best use of their resources. Much of the work of reporting is fact checking, talking to people, and gathering data. The writing part of this is only the final step in a long pipeline. Perhaps a better way of utilising the ten reporters would be during the data gathering stages, multiple people could be interviewed simultaneously, multiple sources consulted at the same time. On the other hand, a newspaper would rarely allocate more than a single reporter to a single story. More progress would be made if each reporter was working on a different story. So perhaps the critical observation is that dependencies within a task are an indication that parallelism needs to be discovered outside that task.

Observation 2: It is rare that there are no other ways of productively utilising compute resources. Meaning that given a number of cores, it is almost always possible to find work to keep them busy. For example, rendering a movie could have cores working on separate frames, or separate segments of the same frame. Sequencing genes could have multiple genes being examined simultaneously. Simulation models of different scenarios could be completed in parallel.

But, it can be argued that there are times when you need to do a single task, and you care how long that task takes to complete. So, lets consider exactly what problems we encounter during our day where we would benefit from a faster processor.

"I waited for my PC to boot.". Well booting a PC is pretty much a serial process, however, the boot time is largely dominated by disk access time rather than processor speed.
"I waited for my e-mail to download". Any downloading activity, be it e-mail or webpages is going to be dominated by network latency or bandwidth issues. There is undoubtedly some processor activity in the mix, but it is unlikely that a fast processor would make a noticeable difference to performance.
"I was watching a video when my virus scanner kicked in and caused the movie to stutter." Assuming it wasn't a disc activity, this is a great example of where having multiple cores will help rather than hinder. Two cores would allow the video to continue playing while the virus scanner did its work. This was, of course, the frequently given example of why multicore processors were a good thing – as if virus scanner were a desirable use of processor time!
"I was compiling an application and it took all afternoon." Some stages of compilation, like linking or crossfile optimisation, are inherently serial. But, unless the entire source code was placed into a single file, most projects have multiple source files, so these could be compiled in parallel. Again, the performance can be dominated by disk or network performance, so it is not entirely a processor performance issue.

These are a few situations where you might possibly feel frustration at the length of time a task takes. You may have plenty more. The point is that it is rare that there is no parallelism available, and no opportunity to make parallel progress on some other task.

Observation 3: There are very few day to day tasks that are actually limited by processor performance. Most tasks have substantial bottlenecks in other parts of the system (disk, network, speed of devices). If anything having multiple cores enables a system to remain useful while other compute tasks are completed.

All this discussion has not truly refuted Professor Patterson's observation that there exist problems which are inherently serial, or fiendishly difficult to parallelise. But that's ok. Most commonly encountered computational activities are either easy to parallelise, or there are ways of extracting parallelism at other levels.

But what of software? There is great allure to using threads on a multicore processor to deliver many times the performance of a single core processor. And this is the crux of the matter. Advances in computer languages haven't 'solved' this problem for us. It can still be hard, for some problems, to write parallel programs that are both functionally correct and scale well.

However, we don't all need to solve the hard problems. There are plenty of opportunities for exploiting parallelism in a large number of common problems, and in other situations there are opportunities for task level parallelism. This combination should cover 90+% of the problem space.

Perhaps there are 10% of problems that don't map well to multicore processors, but why focus on those when the other 90% do?

In some instances you can get better performance, or reproducibility, by restricting the processors that a thread runs on. Linux has pthread_set_affinity_np (the 'np' tag means non-portable). On Solaris you have a number of nearly equivalent options:

Processor sets where you create a set of processors and allow only particular processes to run on this set.
Processor_bind, where you bind a particular process or thread to a particular virtual CPU. The thread cannot migrate off this CPU, but other threads can run on it. This means that you need to coordinate between different processes to ensure that they are not all allocated to the same CPU.
Locality groups. On a NUMA system, a locality group is a set of CPUs which share the same memory local memory. Processes that remain executing on processors within their locality group will continue to get low memory access times, if they get scheduled to processors outside the group their memory access times may increase.

Darryl Gove's Blog

Wednesday, June 30, 2010

The solution is multicore

Wednesday, June 9, 2010

Runtime analysis in the Solaris Studio IDE

Tuesday, June 8, 2010

Setting thread/process affinity

Call trees in the Performance Analyzer

Monday, June 7, 2010

Checking for memory access errors with discover

Sunday, June 6, 2010

Solaris Studio Express

Multicore Application Programming

Multicore Application Programming resources

Solaris Application Programming

Solaris Application Programming resources

The Developer's Edge

The Developer's Edge resources

OpenSPARC Internals

OpenSPARC Internals resources

Archive

Other blogs