I'm pretty excited, we've now got documentation up for the SPARC processors. Take a look at the SPARC T4 supplement, the SPARC T4 performance instrumentation supplement, the SPARC M5 supplement, or the familiar SPARC 2011 Architecture.
Sunday, September 28, 2014
Tuesday, September 23, 2014
Comparing constant duration profiles
I was putting together my slides for Open World, and in one of them I'm showing profile data from a server-style workload. ie one that keeps running until stopped. In this case the profile can be an arbitrary duration, and it's the work done in that time which is the important metric, not the total amount of time taken.
Profiling for a constant duration is a slightly unusual situation. We normally profile a workload that takes N seconds, do some tuning, and it now takes (N-S) seconds, and we can say that we improved performance by S/N percent. This is represented by the left pair of boxes in the following diagram:
In the diagram you can see that the routine B got optimised and therefore the entire runtime, for completing the same amount of work, reduced by an amount corresponding to the performance improvement for B.
Let's run through the same scenario, but instead of profiling for a constant amount of work, we profile for a constant duration. In the diagram this is represented by the outermost pair of boxes.
Both profiles run for the same total amount of time, but the right hand profile has less time spent in routine B() than the left profile, because the time in B() has reduced more time is spent in A(). This is natural, I've made some part of the code more efficient, I'm observing for the same amount of time, so I must spend more time in the part of the code that I've not optimised.
So what's the performance gain? In this case we're more likely to look at the gain in throughput. It's a safe assumption that the amount of time in A() corresponds to the amount of work done - ie that if we did T units of work, then the average cost per unit work A()/T is the same across the pair of experiments. So if we did T units of work in the first experiment, then in the second experiment we'd do T * A'()/A(). ie the throughput increases by S = A'()/A() where S is the scaling factor. What is interesting about this is that A() represents any measure of time spent in code which was not optimised. So A() could be a single routine or it could be all the routines that are untouched by the optimisation.
Friday, September 5, 2014
Fun with signal handlers
I recently had a couple of projects where I needed to write some signal handling code. I figured it would be helpful to write up a short article on my experiences.
The article contains two examples. The first is using a timer to write a simple profiler for an application - so you can find out what code is currently being executed. The second is potentially more esoteric - handling illegal instructions. This is probably worth explaining a bit.
When a SPARC processor hits an instruction that it does not understand, it traps. You typically see this if an application has gone off into the weeds and started executing the data segment or something. However, you can use this feature for doing something whenever the processor encounters an illegal instruction. If it's a valid instruction that isn't available on the processor, you could write emulation code. Or you could use it as a kind of break point that you insert into the code. Or you could use it to make up your own instruction set. That bit's left as an exercise for you. The article provides the template of how to do it.
Thursday, September 4, 2014
C++11 Array and Tuple Containers
This article came out a week or so back. It's a quick overview, from Steve Clamage and myself, of the C++11 tuple and array containers.
When you take a look at the page, I want you to take a look at the "about the authors" section on the right. I've been chatting to various people and we came up with this as a way to make the page more interesting, and also to make the "see also" suggestions more obvious. Let me know if you have any ideas for further improvements.
Wednesday, September 3, 2014
My schedule for JavaOne and Oracle Open World
I'm very excited to have got my schedule for Open World and JavaOne:
CON8108: Engineering Insights: Best Practices for Optimizing Oracle Software for Oracle Hardware
Venue / Room: Intercontinental - Grand Ballroom C
Date and Time: 10/1/14, 16:45 - 17:30
CON2654: Java Performance: Hardware, Structures, and Algorithms
Venue / Room: Hilton - Imperial Ballroom A
Date and Time: 9/29/14, 17:30 - 18:30
The first talk will be about some of the techniques I use when performance tuning software. We get very involved in looking at how Oracle software works on Oracle hardware. The things we do work for any software, but we have the advantage of good working relationships with the critical teams.
The second talk is with Charlie Hunt, it's a follow on from the talk we gave at JavaOne last year. We got Rock Star awards for that, so the pressure's on a bit for this sequel. Fortunately there's still plenty to talk about when you look at how Java programs interact with the hardware, and how careful choices of data structures and algorithms can have a significant impact on delivered performance.
Anyway, I hope to see a bunch of people there, if you're reading this, please come and introduce yourself. If you don't make it I'm looking forward to putting links to the presentations up.