R graph catalog

Here’s a nice catalog of graphs made with R, along with source code for each. Some of the images were broken or missing when I tried it, but hopefully they’ll get that fixed. (By the way, this has been my experience with interactive “Shiny” apps so far – I love the idea and the look, but there always seems to be something wrong that needs fixing, and fixing it takes more time and more specialized training than dealing with plain old code. At first I thought Shiny might be a productivity enhancer, but it’s a drag when your job is not to build cool-looking apps but to produce useful data analysis results in a reasonable amount of time.)
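To be fair, the happy path really is short. Here’s a minimal sketch of a Shiny app (assuming the shiny package is installed; the dataset and input names are just placeholders, not anything from the catalog) – it’s the layer of reactive plumbing around code like this that tends to break:

```r
# Minimal Shiny sketch: one dropdown, one base-R plot.
library(shiny)

ui <- fluidPage(
  titlePanel("Minimal plot browser"),
  selectInput("var", "Variable:", choices = names(mtcars)),
  plotOutput("hist")
)

server <- function(input, output) {
  # Redraws whenever the dropdown changes.
  output$hist <- renderPlot({
    hist(mtcars[[input$var]], main = input$var, xlab = input$var)
  })
}

shinyApp(ui = ui, server = server)
```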

open source street noise model

Here’s an open-source model of street noise propagation, written in R on top of open-source database and GIS tools.

This paper describes the development of a model for assessing TRAffic Noise EXposure (TRANEX) in an open-source geographic information system. Instead of using proprietary software we developed our own model for two main reasons: 1) so that the treatment of source geometry, traffic information (flows/speeds/spatially varying diurnal traffic profiles) and receptors matched as closely as possible to that of the air pollution modelling being undertaken in the TRAFFIC project, and 2) to optimize model performance for practical reasons of needing to implement a noise model with detailed source geometry, over a large geographical area, to produce noise estimates at up to several million address locations, with limited computing resources. To evaluate TRANEX, noise estimates were compared with noise measurements made in the British cities of Leicester and Norwich. High correlation was seen between modelled and measured LAeq,1hr (Norwich: r = 0.85, p = .000; Leicester: r = 0.95, p = .000) with average model errors of 3.1 dB. TRANEX was used to estimate noise exposures (LAeq,1hr, LAeq,16hr, Lnight) for the resident population of London (2003–2010). Results suggest that 1.03 million (12%) people are exposed to daytime road traffic noise levels ≥ 65 dB(A) and 1.63 million (19%) people are exposed to night-time road traffic noise levels ≥ 55 dB(A). Differences in noise levels between 2010 and 2003 were on average relatively small: 0.25 dB (standard deviation: 0.89) and 0.26 dB (standard deviation: 0.87) for LAeq,16hr and Lnight.
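To make the noise metrics in the abstract concrete: LAeq,16hr and Lnight are energetic (logarithmic) averages of hourly levels over the standard daytime (07:00–23:00) and night-time (23:00–07:00) windows. Here’s a small R sketch of that averaging – the formula is the standard LAeq definition, not code from TRANEX, and the hourly values are made up:

```r
# Energetic (logarithmic) average of equal-duration LAeq values, in dB.
laeq <- function(levels_db) 10 * log10(mean(10^(levels_db / 10)))

# Hypothetical hourly road-noise levels, dB(A), for one receptor.
set.seed(1)
hourly <- data.frame(
  hour     = 0:23,
  LAeq_1hr = rnorm(24, mean = 62, sd = 4)
)

day   <- hourly$hour >= 7 & hourly$hour < 23   # LAeq,16hr window: 07:00-23:00
night <- !day                                  # Lnight window:    23:00-07:00

LAeq_16hr <- laeq(hourly$LAeq_1hr[day])
Lnight    <- laeq(hourly$LAeq_1hr[night])
```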


visualization

Solomon Messing has a pretty good article on data visualization and communicating scientific information, focusing on the ideas of Tufte and Cleveland. I like the idea that there is a science of what our brains can most easily process, and not just a need to create visual infotainment because we have lost our ability to concentrate on anything else. I’m not quite ready to give up on stacked bar charts in all cases.

When most people think about visualization, they think first of Edward Tufte. Tufte emphasizes integrity to the data, showing relationships between phenomena, and above all else aesthetic minimalism. I appreciate his ruthless crusade against chart junk and pie charts (nice quote from Data without Borders). We share an affinity for multipanel plotting approaches, which he calls “small multiples” (thanks to Rebecca Weiss for pointing this out), though I think people give Tufte too much credit for their invention: both juiceanalytics and infovis-wiki write that Cleveland introduced the concept/principle. However, both Cleveland and Tufte published books in 1983 discussing the use of multipanel displays; David Smith over at Revolutions writes that “the ‘small-multiples’ principle of data visualization [was] pioneered by Cleveland and popularized in Tufte’s first book”; and the earliest reference to a work containing multipanel displays I could find was published *long* before Tufte’s 1983 work: Seder, Leonard (1950), “Diagnosis with Diagrams—Part I”, Industrial Quality Control (New York, New York: American Society for Quality Control) 7 (1): 11–19.

I’m less sure about Tufte’s advice to always show axes starting at zero, which can make comparison between two groups difficult, and to “show causality,” which can end up misleading your readers. Of course, the visualizations on display in the glossy pages of Tufte’s books are beautiful; they belong in a museum. But while his books are full of general advice that we should all keep in mind when creating plots, he does not put forth a theory of what works and what doesn’t when trying to visualize data.

Cleveland (with Robert McGill) develops such a theory and subjects it to rigorous scientific testing. In my last post I linked to one of Cleveland’s studies showing that dots (or bars) aligned on the same scale are indeed the best visualization to convey a series of numerical estimates. In this work, Cleveland examined how accurately our visual system can process visual elements or “perceptual units” representing underlying data. These elements include markers aligned on the same scale (e.g., dot plots, scatterplots, ordinary bar charts), the length of lines that are not aligned on the same scale (e.g., stacked bar plots), area (pie charts and mosaic plots), angles (also pie charts), shading/color, volume, curvature, and direction.
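You can try Cleveland’s comparison yourself in a few lines of base R. This sketch plots the same made-up counts both ways: as a dot chart, where every value sits on one aligned scale, and as a stacked bar chart, where all but the bottom segment must be read as lengths on shifted baselines:

```r
# Hypothetical counts for two groups across three categories.
counts <- matrix(c(12, 19, 7,
                   15, 11, 9),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("Group A", "Group B"),
                                 c("Cat 1", "Cat 2", "Cat 3")))

op <- par(mfrow = c(1, 2))
dotchart(t(counts), xlab = "Count")   # positions on a common aligned scale
barplot(counts, legend.text = TRUE)   # stacked segment lengths, shifted baselines
par(op)
```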

I’m slowly getting on board. I’ve given up pie charts in most cases. I’m not ready to give up stacked bar charts in all cases – I think they still serve a purpose. Microscopic multi-panel charts still make my head spin sometimes, although if they were interactive and I could click on one panel to blow it up, that would be cool. There is one thing I am sure he is right about, though: the first step to serious analysis and visualization is to leave Excel behind.
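And if you want to try small multiples once you’ve left Excel, the lattice package that ships with R makes the basic pattern a one-liner. A minimal sketch using the built-in mtcars data:

```r
# Small multiples: one scatterplot panel per cylinder count, shared scales.
library(lattice)

xyplot(mpg ~ wt | factor(cyl), data = mtcars,
       layout = c(3, 1),
       xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
```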