Tuesday, August 9, 2011
by Mark Wojciechowicz
This past Tech Tuesday, I presented techniques for data visualization as espoused by Stephen Few, the author of Now You See It
. We first looked at some common best practices for visualizing data in any tool and then took some of those techniques to SSRS. For those who have not worked much in SSRS before, I demonstrated how easy it is to get started. In this post, I will cover some of the key considerations in displaying data. At the end of the article, you can find links to the source files. The data that was used for the demonstration was taken from the www.beerinstitute.org
(when you are not presenting data to sales people you have to have some catchy subject matter!)
First things first, by data visualization we mean to take that raw set of numbers that’s presented in a giant data grid and present it in way to draw at the meaning and trends in the data. The human mind can understand images much more quickly than it can read numbers and assess their relative weighting to each other. Those images which are most easily understood by the mind are called pre-attentive attributes. Some of these attributes are much easier to absorb than others and make it so that the audience of the report can comprehend the data with little explanation.
Not all charts are created equal
Some charts take much more work to understand than others. So let’s compare a pie chart, bubble chart and column graph.
A pie chart presents triangular shaped areas that are hard to accurately compare the slices relative value. In the image above, how much more is D3 compared to D8, and better yet, which is D3 and which is D8? The default settings for a pie chart often include a legend with an array of colors that are time consuming to match up to their slices.
Bubble charts look cool! The sales guys love them when they want to compare 3 different data points at once. How much greater is D5 compared to D6? If we went around the room, we would probably hear numbers like 10 times as much, while in reality it is only 6 times larger. Circle areas are even more difficult to compare than pie slices.
When we look at the old standby, the column chart, we more easily see which lines are the largest and the relative difference between them. In most charting tools, we still have to battle some obtrusive defaults as we’ll see in the image below.
In the above graph, there are some common charting problems:
- The scale starts at 10, rather than 0. This creates an optical illusion having us think that D1 is a 3rd greater than D2. This is the same dataset throughout all of these charts, however. Looking back to our previous bar chart, you will see that these two data points are far closer to each other.
- The order of the dataset is alphabetical rather than by value. This makes it more difficult to see exactly how the data points compare to each other. You might even be prone to sorting them in your head.
- Last, the multi-color bars merely add distraction. A single series should all be the same color unless you want to highlight a particular data point.
If it is necessary to zoom in on the data, we can use a different image to display them which takes away the optical illusion. Below, I have replaced the part chart with a line chart. I made the line invisible and added horizontal markers. Now, you can see the distance between the columns, but you are given to using the scale and their 2-dimensional distance to compare them.
Now who doesn’t love 3-D charts?
Well, me. And probably just me at that — but let me explain the challenges. 3-D images are really just 2-D images with shadow and line effects. Comparing the exact difference in bars is very difficult, even with the help of a gridline. Not only this, data points can be hidden behind others making it very hard to get the whole picture. Even with a cool charting tool that lets you spin the image around, there is almost always a hidden data point and, thus, a challenge to your short term memory. And I sure don’t need challenges to my memory. A solution to the 3D graph is below when we discuss Trellis Charts
So what’s good to include in a chart?
First, try to leave out any additional markup that is distracting to what you are trying to feature – the data. Some of these things are:
- Chart borders
- Background colors
- Labels (labels can be delivered through a mouse over)
- Multiple colors for the same series
- 3D presentations
Some good practices in charting are:
- Ordering a series by value
- Using 0 scale or 2-D positioning with markers
- Using lines to connect a time series
Now, that we have these principles in place, we can see how this plays out in Reporting Services. SSRS 2008 R2 offers a few new tricks which can help make the process of building a visual report much easier. Sparklines can be dragged onto a report table over a data point and can be quickly configured to show the trend for a category in a tiny graph that wastes no space and gets straight to the point.
We can easily compare the annual trend of beer sales across beer sources in the table above. However, sparklines do not capture the relative difference between each data point. To do this, we must convert the sparkline to a regular chart (right click the sparkline and select convert to chart), clean up all its decoration and scale it according to the highest month for all beer sales amongst all varieties.
Now, we can see that domestic draft is small potatoes compared to domestic packaged beer. When we have many categories to compare across each other, we can fit these charts into a tablix. This creates a trellis chart. Trellis charting paints the same chart across rows and columns to compare relative differences to each other.
Please note that, like a regular chart, the categories should be ordered by greatest amount to smallest in order to aid comparison. While we covered some other cool tricks in reporting services like indicators, one last point that I would like to highlight is the use of bullet graphs. Bullet graphs offer a very compact way to indicate status with relative weighting. A bullet graph is nothing more than a bar chart (or stacked bar chart) with a target line (a marker). See below.
The bullet graph shows an orange bar seeking to cross a target line. In order to make this affect realistic, I used the Datamining excel add-in to forecast sales of beer. This forecast number became the marker point and the actual sales are the bar. The eye is immediately drawn to those lines that make the cross. In addition, you can see the relative difference in magnitude. We would probably care much more about forecast accuracy for packaged beer than the other smaller categories.
In the last column, I illustrate a poor use of indicators. Indicators can be a great way to subtly highlight a data point that is in trouble. But it tends to dumb down the data when it is used to emphatically. You also might draw a completely different conclusion looking at a bullet graph versus an indicator. While import packaged goods had a very good 2010, it pales in comparison to the relative importance of domestic packaged beer. And I’m not just saying that because I like micro-brewed beer.
I hope these charting tips are helpful. To learn more, please check out Stephen Few’s work. Links to the BeerInstitute database and SSRS project are below.
Few, S. (2004, September 4). www.perceptualedge.com. Retrieved July 25, 2011, from www.perceptualedge.com/articles/ie/visual_perception.pdf
Few, S. (2009). Now You See It. Oakland, CA: Analytics Press.
Ware, C. (2004). Information Visualization: Perception for Design, Second Edition. San Francisco: Kaufman Publishers.