Tag Archives: visualization

March 2020 in Review

To state the obvious, March 2020 was all about the coronavirus. At the beginning of the month, we here in the U.S. watched with horror as it spread through Europe. We were hearing about a few cases in Seattle and California, and stories about people flying back from Italy and entering the greater New York area and other U.S. cities without medical screening. It was horrible, but still something happening mostly to other people far away on TV. In the middle of the month, schools and offices started to close. By the end of the month, it was a full-blown crisis overwhelming hospitals in New York and New Jersey and starting to ramp up in other U.S. cities. It’s a little hard to follow my usual format this month but I’ll try.

Most frightening and/or depressing story:
  • Hmm…could it be…THE CORONAVIRUS??? The way the CDC dropped the ball on testing and tracking, after preparing for this for years, might be the single most maddening thing of all. There are big mistakes, there are enormously unfathomable mistakes, and then there are mistakes that kill hundreds of thousands of people (at least) and cost tens of trillions of dollars. I got over-excited about coronavirus dashboards and simulations towards the beginning of the month, and kind of tired of looking at them by the end of the month.
Most hopeful story:
  • Some diabetics are hacking their own insulin pumps. Okay, I don’t know if this is a good thing. But if medical device companies are not meeting their patient/customers’ needs, and some of those customers are savvy enough to write software that meets their needs, maybe the medical device companies could learn something.
Most interesting story, that was not particularly frightening or hopeful, or perhaps was a mixture of both:
  • I studied up a little on the emergency powers available to local, state, and the U.S. federal government in a health crisis. Local jurisdictions are generally subordinate to the state, and that is more or less the way it has played out in Pennsylvania. For the most part, the state governor made the policy decisions and Philadelphia added a few details and implemented them. The article I read said that states could choose to put their personnel under CDC direction, but that hasn’t happened. In fact, the CDC seems somewhat absent in all this other than as a provider of public service announcements. The federal government officials we see on TV are from the National Institute of Allergy and Infectious Diseases, which most people have never heard of, and to a certain extent the surgeon general. I suppose my expectations on this were created mostly by Hollywood, and if this were a movie the CDC would be swooping in with white suits and saving us, or possibly incinerating the few to save the many. If this were a movie, the coronavirus would also be mutating into a fog that would seep into my living room and turn me inside out, so at least there’s that.
https://www.youtube.com/watch?v=4chSOb3bY6Y

hospital capacity data visualization

I was going to stop posting coronavirus tracker apps but this one looks really useful. Now that we know most infected people aren’t tested, the number of confirmed cases isn’t all that helpful as a metric except maybe to look at trends over time. The number of people in the hospital, on the other hand, is a hard number, and comparing that number to hospital capacity is very useful. This app from the University of Washington does that. It also forecasts future hospitalizations and gives a confidence range (which is quite wide, but there it is to ponder.)

This is by state, which is a slightly big and arbitrary geographic unit. Looking at my home state of Pennsylvania, things look almost reassuring, but then looking at New Jersey, they look dire. It would take me five hours to drive to Pittsburgh, Pennsylvania but I could almost spit on Camden, New Jersey. There will clearly be pressure to move patients across state lines within and between nearby metro areas, and in fact that is already in the news this morning.

The situation in New York looks just awful. I didn’t look at all 50 states, but a quick sampling suggests that states with large cities (and by proxy, probably large hospital systems), and states that started social distancing relatively early, are likely to do a lot better. People might think they would be safer in more rural areas, and perhaps it is true that your odds of infection are much lower, but your chances of survival if you do get infected could also be much lower. This is partially speculation and based on a few anecdotes I have heard, but I do know that this trend holds for car accidents and gunshot wounds.

To this water resource engineer, the differences in capacity use between states and the differences in the timing of available capacity suggest that you could move patients around, or move equipment and medical staff around, between regions in an organized way and save lives. Maybe somebody should get on that if they haven’t already.
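Here is a rough sketch of the bookkeeping behind that idea, in Python. The region names and bed counts are made up for illustration, and the function names are my own. It matches shortfalls to spare beds greedily and ignores distance, staffing, and everything else a real plan would need.

```python
def available_beds(capacity, hospitalized):
    """Beds left over (positive) or shortfall (negative)."""
    return capacity - hospitalized


def plan_transfers(regions):
    """Greedily match regions with a shortfall to regions with spare beds.

    regions maps a name to (capacity, hospitalized). Returns a list of
    (from_region, to_region, patients) tuples.
    """
    balance = {name: available_beds(c, h) for name, (c, h) in regions.items()}
    # Regions with spare beds, largest surplus first.
    donors = sorted((n for n in balance if balance[n] > 0),
                    key=balance.get, reverse=True)
    transfers = []
    # Regions over capacity, worst shortfall first.
    for needy in sorted((n for n in balance if balance[n] < 0), key=balance.get):
        for donor in donors:
            if balance[needy] == 0:
                break
            moved = min(-balance[needy], balance[donor])
            if moved > 0:
                transfers.append((needy, donor, moved))
                balance[needy] += moved
                balance[donor] -= moved
    return transfers


# Hypothetical numbers: (bed capacity, patients hospitalized).
regions = {
    "Region A": (1000, 1300),
    "Region B": (800, 500),
    "Region C": (400, 350),
}
for src, dst, n in plan_transfers(regions):
    print(f"move {n} patients from {src} to {dst}")
```

The same balance could just as well be read as beds, equipment, or staff to send in the other direction.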

coronavirus stats by metro area and normalized for population

I like this City Observatory approach to coronavirus stats. They are reporting numbers by metropolitan statistical area and normalizing them per 100,000 population. They are also reporting the rate at which cases are growing in each metropolitan area. They are using static tables and graphs but I think these provide much better information than the fancy maps and dashboards I have seen. The fancy maps and dashboards are updated more often – the ideal approach would blend all this together. As long as I am making a wish list, it would be nice to see the number of people hospitalized in each metro over time. That is the number we are looking for – the stock of available beds to first reappear as a positive number, then start to grow. When that happens I think we will start to see more public and political pressure to get people back to work. I expect high risk people to have to hide in their homes for quite some time after that, which is sad but I think that is the balance our society is likely to strike. If there comes a point later in the year where that stock of available hospital capacity starts to shrink or disappear in a given metro, that is when we might see shorter, more geographically targeted social distancing orders come and go.
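The two calculations described above are simple enough to sketch in a few lines of Python. The metro names and numbers below are invented, and I am assuming a compound daily growth rate, which may not be exactly how City Observatory computes theirs.

```python
def cases_per_100k(cases, population):
    """Confirmed cases normalized per 100,000 residents."""
    return cases / population * 100_000


def daily_growth_rate(cases_then, cases_now, days):
    """Average compound daily growth rate between two case counts."""
    return (cases_now / cases_then) ** (1 / days) - 1


# Hypothetical metro areas with made-up numbers, for illustration only.
metros = {
    "Metro A": {"population": 6_000_000, "cases_week_ago": 300, "cases_now": 1_200},
    "Metro B": {"population": 500_000, "cases_week_ago": 150, "cases_now": 300},
}

for name, m in metros.items():
    rate = cases_per_100k(m["cases_now"], m["population"])
    growth = daily_growth_rate(m["cases_week_ago"], m["cases_now"], days=7)
    print(f"{name}: {rate:.1f} cases per 100k, growing {growth:.1%} per day")
```

In this made-up example Metro B has a quarter of Metro A's cases but three times the per-capita rate, which is exactly the kind of thing a raw-count map hides.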

visualizing tunnel progress

Metro Los Angeles has put together kind of a nice graphic to communicate the status of a tunnel construction project. It’s cartoonish, and yet contains a surprisingly large amount of scientific and engineering information.

 

alternatives to word clouds

I like this post on R bloggers proposing several alternatives to word clouds. I’ll list them below but really, you should look at the pictures because hey, this is about pictures.

  1. circle packing (basically this replaces the words with circles, dealing with the problem of bigger/longer words appearing to be more important in standard word clouds); there is a variation on this called the “horn of plenty” where the circles are arranged in order rather than randomly
  2. cartogram (in my ignorance, I have been calling this a “bubble map”. I have used these frequently to show engineering model results and find they work well for many people)
  3. choropleth (these shade in geographic areas to convey data. I find these work well if the size of the geographic area is important information. If it is not, these tend to draw the viewer’s eye to larger areas, and in that case the bubbles are better. For example, per-person income of Luxembourg vs. China.)
  4. treemap (I’ve been calling these “packed rectangles” and I generally find them good for anything where conveying relative magnitudes of things to people is important)
  5. donuts (surprisingly, the author concludes a donut is the best option for the data he is trying to show and I kind of agree, it gets the point across and leaves lots of room for labels)

The article has links to the specific packages and code used to create the graphics.

data-ink ratio

Here’s a wiki post about Edward Tufte’s data-ink ratio:

Tufte refers to data-ink as the non-erasable ink used for the presentation of data. If data-ink would be removed from the image, the graphic would lose the content. Non-Data-Ink is accordingly the ink that does not transport the information but it is used for scales, labels and edges. The data-ink ratio is the proportion of Ink that is used to present actual data compared to the total amount of ink (or pixels) used in the entire display. (Ratio of Data-Ink to non-Data-Ink).

Good graphics should include only data-Ink. Non-Data-Ink is to be deleted everywhere where possible. The reason for this is to avoid drawing the attention of viewers of the data presentation to irrelevant elements.

The goal is to design a display with the highest possible data-ink ratio (that is, as close to the total of 1.0), without eliminating something that is necessary for effective communication.
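Written out as a formula, as I understand Tufte's definition, the ratio compares data-ink to the total ink in the graphic (not to the non-data-ink), which is why it tops out at 1.0:

```latex
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \text{proportion of the graphic that can be erased without loss of data-information}
```

As a made-up example, a chart that spends 300 units of ink on the data itself out of 1,000 units in total would have a ratio of 300 / 1000 = 0.3.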

Before I offer an opinion, I should state the disclaimer that you should definitely listen to Edward Tufte, not me! So here’s my opinion: this idea is clearly absurd when taken to extremes because it would just mean a bunch of dots on a page that you have no way of interpreting. I can’t think of a way of making graphs without axes, scales, and a legend. Labels, arrows, and text boxes are an alternative which I find myself using often when giving projected slide presentations in fairly large rooms.

A reasonable interpretation of Tufte, I think, is to ask yourself whether each new thing you are adding to a graph provides useful information to the reader/viewer, increases the chances that the reader/viewer will draw the right conclusions, and makes the reader/viewer’s job easier or harder. The holy grail is to help your audience imbibe the point of the graph with very little effort. Unnecessary 3D effects and clip art aren’t going to do that. A splash of color and some nice big labels that middle-aged people can read from the back of the room just might help.

flow maps

Here is an interesting paper proposing design principles for flow maps, which “visualize movement using a static image and demonstrate not only which places have been affected by movement but also the direction and volume of movement.”

Design principles for origin-destination flow maps

Origin-destination flow maps are often difficult to read due to overlapping flows. Cartographers have developed design principles in manual cartography for origin-destination flow maps to reduce overlaps and increase readability. These design principles are identified and documented using a quantitative content analysis of 97 geographic origin-destination flow maps without branching or merging flows. The effectiveness of selected design principles is verified in a user study with 215 participants. Findings show that (a) curved flows are more effective than straight flows, (b) arrows indicate direction more effectively than tapered line widths, and (c) flows between nodes are more effective than flows between areas. These findings, combined with results from user studies in graph drawing, conclude that effective and efficient origin-destination flow maps should be designed according to the following design principles: overlaps between flows are minimized; symmetric flows are preferred to asymmetric flows; longer flows are curved more than shorter or peripheral flows; acute angles between crossing flows are avoided; sharp bends in flow lines are avoided; flows do not pass under unconnected nodes; flows are radially distributed around nodes; flow direction is indicated with arrowheads; and flow width is scaled with represented quantity.
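Two of those principles, width scaled with quantity and longer flows curved more, are easy to sketch in Python. This is my own minimal reading, not the paper's method: the function names, the linear width scaling, and the `curvature` parameter are all assumptions, and making the bow proportional to length is only one way to interpret "curved more".

```python
def flow_width(quantity, max_quantity, max_width=10.0, min_width=0.5):
    """Line width proportional to quantity, with a floor so small flows stay visible."""
    return max(min_width, max_width * quantity / max_quantity)


def control_point(start, end, curvature=0.2):
    """Control point for a quadratic Bezier curve from start to end.

    The point is offset perpendicular to the flow at its midpoint by
    curvature * length, so longer flows bow out farther than shorter ones.
    """
    (x0, y0), (x1, y1) = start, end
    dx, dy = x1 - x0, y1 - y0
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    # (-dy, dx) is perpendicular to the flow and as long as the flow itself.
    return (mx - curvature * dy, my + curvature * dx)


# Hypothetical flows: (origin, destination, quantity), in map units.
flows = [
    ((0, 0), (10, 0), 500),
    ((0, 0), (2, 1), 50),
]
largest = max(q for _, _, q in flows)
for start, end, quantity in flows:
    ctrl = control_point(start, end)
    width = flow_width(quantity, largest)
    print(f"{start} -> {end}: control point {ctrl}, width {width:.1f}")
```

The start point, control point, and end point can then be handed to whatever drawing library you like as a quadratic curve, with an arrowhead at the destination end.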