Tag Archives: visualization

Edward Tufte

Here’s a fun interview with Edward Tufte, insult comic and author of The Visual Display of Quantitative Information. Here are a couple of his snappy retorts:

…highly produced visualizations look like marketing, movie trailers, and video games and so have little inherent credibility for already skeptical viewers, who have learned by their bruising experiences in the marketplace about the discrepancy between ads and reality (think phone companies)…

…overload, clutter, and confusion are not attributes of information, they are failures of design. So if something is cluttered, fix your design, don’t throw out information. If something is confusing, don’t blame your victim — the audience — instead, fix the design. And if the numbers are boring, get better numbers. Chartoons can’t add interest, which is a content property. Chartoons are disinformation design, designed to distract rather than inform. Thus they reduce the credibility of your presentation. To distract, hire a magician instead of a chartoonist, for magicians are honest liars…

Sensibly-designed tables usually outperform graphics for data sets under 100 numbers. The average numbers of numbers in a sports or weather or financial table is 120 numbers (which hundreds of million people read daily); the average number of numbers in a PowerPoint table is 12 (which no one can make sense of because the ability to make smart multiple comparisons is lost). Few commercial artists can count and many merely put lipstick on a tiny pig. They have done enormous harm to data reasoning, thankfully partially compensated for by data in sports and weather reports. The metaphor for most data reporting should be the tables on ESPN.com. Why can’t our corporate reports be as smart as the sports and weather reports, or have we suddenly gotten stupid just because we’ve come to work?

It’s a very interesting point, actually, that people are willing to look at very complex data on sports sites, really study it and think about it, and do that voluntarily, considering it fun rather than boring, hard work. It’s child-like in a way – I mean in a positive sense, that for children the world is fresh and new and learning is fun. What is the secret of not shutting down this ability in adults? I think it’s context.

R graph catalog

Here’s a nice catalog of graphs made with R, along with source code for each. Some of the images were broken or missing when I tried it, but hopefully they’ll get that fixed. (By the way, this is my personal experience with interactive “Shiny” apps so far – I love the idea and the look, but there always seems to be something wrong that needs to be fixed, and fixing it takes more time and requires more specialized training than just dealing with plain old code. At first, I thought it might be a productivity enhancer, but instead it’s a drag when your job is not to build cool-looking apps, but to produce useful data analysis results in a reasonable amount of time.)

visualization

Solomon Messing has a pretty good article on data visualization and communicating scientific information, focusing on the ideas of Tufte and Cleveland. I like the idea that there is a science of what our brains can most easily process, and not just a need to create visual infotainment because we have lost our ability to concentrate on anything else. I’m not quite ready to give up on stacked bar charts in all cases.

When most people think about visualization, they think first of Edward Tufte.  Tufte emphasizes integrity to the data, showing relationships between phenomena, and above all else aesthetic minimalism.  I appreciate his ruthless crusade against chart junk and pie charts (nice quote from Data without Borders). We share an affinity for multipanel plotting approaches, which he calls “small multiples,” (thanks to Rebecca Weiss for pointing this out) though I think people give Tufte too much credit for their invention—both juiceanalytics and infovis-wiki write that Cleveland introduced the concept/principle. However, both Cleveland and Tufte published books in 1983 discussing the use of multipanel displays; David Smith over at Revolutions writes that “the “small-multiples” principle of data visualization [was] pioneered by Cleveland and popularized in Tufte’s first book”; and the earliest reference to a work containing multipanel displays I could find was published *long* before Tufte’s 1983 work–Seder, Leonard (1950), “Diagnosis with Diagrams—Part I”, Industrial Quality Control (New York, New York: American Society for Quality Control) 7 (1): 11–19.

I’m less sure about Tufte’s advice to always show axes starting at zero, which can make comparison between two groups difficult, and to “show causality,” which can end up misleading your readers.  Of course, the visualizations on display in the glossy pages of Tufte’s books are beautiful–they belong  in a museum.  But while his books are full of general advice that we should all keep in mind when creating plots, he does not put forth a theory of what works and what doesn’t when trying to visualize data.

Cleveland (with Robert McGill) develops such a theory and subjects it to rigorous scientific testing. In my last post I linked to one of Cleveland’s studies showing that dots (or bars) aligned on the same scale are indeed the best visualization to convey a series of numerical estimates.  In this work, Cleveland examined how accurately our visual system can process visual elements or “perceptual units” representing underlying data.  These elements include markers aligned on the same scale (e.g., dot plots, scatterplots, ordinary bar charts), the length of lines that are not aligned on the same scale (e.g., stacked bar plots), area (pie charts and mosaic plots), angles (also pie charts), shading/color, volume, curvature, and direction.
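To make the “markers aligned on the same scale” idea concrete, here is a minimal sketch of a text-only dot plot in Python. It is not taken from Cleveland’s or Messing’s work. The function name `dot_plot`, its parameters, and the sample numbers are all made up for illustration. The point is only that every value is placed on one shared axis, so comparing items means comparing positions rather than areas or angles.

```python
def dot_plot(values, width=40, lo=None, hi=None):
    """Return the text rows of a simple dot plot.

    Hypothetical helper for illustration: each item gets one marker ("o"),
    and all markers sit on the same horizontal scale running from lo to hi.
    """
    if not values:
        return []
    # Default the shared scale to the range of the data.
    lo = min(values.values()) if lo is None else lo
    hi = max(values.values()) if hi is None else hi
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values are equal
    label_width = max(len(name) for name in values)

    rows = []
    # Largest value first, so the ranking can be read top to bottom.
    for name, value in sorted(values.items(), key=lambda kv: kv[1], reverse=True):
        pos = round((value - lo) / span * (width - 1))
        pos = max(0, min(width - 1, pos))  # clamp values outside [lo, hi]
        line = "." * pos + "o" + "." * (width - 1 - pos)
        rows.append(f"{name:<{label_width}} |{line}| {value:g}")
    return rows


if __name__ == "__main__":
    sample = {"A": 12.0, "B": 30.0, "C": 21.0}  # made-up numbers
    print("\n".join(dot_plot(sample, width=30)))
```

Passing explicit `lo` and `hi` values fixes the axis by hand, for example to start it at zero or to keep several plots on one common scale.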

I’m slowly getting on board. I’ve given up pie charts in most cases. I’m not ready to give up stacked bar charts in all cases – I think they serve a purpose. Microscopic multi-panel charts still make my head spin sometimes, although if they were interactive and I could click on one panel to blow it up, that would be cool. There is one thing I am sure he is right about though, which is that the first step to serious analysis and visualization is to leave Excel behind.