Tag Archives: visualization

most popular R books of 2023

Here is something useful (to me, personally, and maybe to others), and thankfully not too pessimistic or morally fraught.

A Crash Course in Geographic Information Systems (GIS) using R – yes, please! We must end the tyranny of the monopolistic Environmental Systems “Research Institute”. Okay, they make some nice products, but just admit you are a rapacious for-profit corporation, please!

A ggplot2 Tutorial for Beautiful Plotting in R – Who doesn’t need to improve their data visualization and communication game?

just start your y-axis at zero

Seriously, just do that and it will work out most of the time. The only exception in my mind is if you are comparing the range or spread of two data sets and neither one is close to zero.
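Here is a quick way to see why the zero baseline matters. When bars start at a baseline other than zero, the ratio of the drawn heights is no longer the ratio of the values. The sketch below measures how much a truncated axis overstates a difference. It is similar in spirit to Edward Tufte's "lie factor". The function names are my own, and it only covers bar lengths, not every chart type.

```python
def apparent_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """Ratio of the drawn bar lengths when bars start at `baseline` instead of zero."""
    if not (a > baseline and b > baseline):
        raise ValueError("both values must exceed the baseline")
    return (b - baseline) / (a - baseline)

def exaggeration(a: float, b: float, baseline: float = 0.0) -> float:
    """How much the drawn ratio overstates the true ratio b / a (1.0 means honest)."""
    return apparent_ratio(a, b, baseline) / (b / a)

# A value of 55 is only 10% larger than 50, but with the axis starting at 45
# the second bar is drawn twice as tall as the first.
print(apparent_ratio(50, 55, baseline=45))
print(exaggeration(50, 55, baseline=45))
```

This is the comparison-of-magnitudes case. The spread-comparison exception above is a different question, where the distance from zero isn't what the reader is judging.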

Snopes

I’ve been to Indonesia, and people there are normal human beings who are in fact somewhat shorter than Europeans on average. But their heads were typically around my shoulder height, not my knees. Some political violence has occurred there in the not-so-distant past, but I found the culture warm and hospitable. As in almost any country not at war, the biggest risk to your physical safety is probably being in a car accident or hit by a car. The next biggest, if you are there for any length of time, might be air pollution and secondhand smoke. Once an Indonesian woman yelled at me to not sit next to her on a ferry. The ferry was crowded and there was nowhere else to sit, but I was eventually able to solve the problem by swapping seats with another woman (apparently it was my gender that made her uncomfortable). Other times, groups of female Indonesian tourists stopped me on the street and asked to take vacation pictures with me to show their friends back home. This was when I was quite a bit younger than I am now.

tile maps

Tile maps, which show regions of unequal area as tiles of equal size, are appropriate when you don’t want differences in geographic area to distort the message you are trying to communicate. For example, you might want to show a variable by congressional district; districts have (roughly) equal populations but widely varying spatial areas.

A couple of other ideas with tile maps are (1) to use rectangles of equal area but different length/width ratios, and (2) to use words arranged spatially, with a variety of properties (font, size, color) denoting different variables.
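Here is a minimal sketch of both layouts. It assumes each region has already been given a (row, column) cell by hand, which is usually the hard part of making a tile map. The district names and grid positions are hypothetical. The second function gives a rectangle a fixed area but a chosen width/height ratio, which is one way to encode a second variable in idea (1).

```python
import math
from typing import Dict, Tuple

# Hypothetical (row, col) grid cells for a few districts; a real tile map would
# place every unit so that neighbors stay roughly adjacent.
GRID: Dict[str, Tuple[int, int]] = {
    "D1": (0, 0), "D2": (0, 1), "D3": (1, 0), "D4": (1, 1), "D5": (1, 2),
}

def tile_rects(grid: Dict[str, Tuple[int, int]], size: float = 1.0, gap: float = 0.1):
    """Give every region the same square (x, y, width, height), regardless of its real area."""
    side = size - gap
    return {name: (col * size, row * size, side, side) for name, (row, col) in grid.items()}

def equal_area_rect(area: float, aspect: float) -> Tuple[float, float]:
    """Width and height of a rectangle with the given area and width/height ratio."""
    width = math.sqrt(area * aspect)
    return width, area / width

rects = tile_rects(GRID)
print(rects["D5"])                # x, y, width, height of one tile
print(equal_area_rect(4.0, 4.0))  # a wide, flat tile with area 4: (4.0, 1.0)
```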

538 – best charts of 2022

There is nothing in 538’s best charts of 2022 that truly bowled me over. I mean, there are some graphics and maps that are effective at telling a story about their underlying data. There just aren’t any types of charts or applications of old types of charts that were a big surprise to me and that I thought I would want to copy if I could. Just purely for personal interest in the subject matter, the one I found most interesting was the map showing how college football conferences are losing all geographic meaning. I find myself slowly being less interested in college football with each passing year, and this is one reason why. My team’s losing campaign, loss to the NFL or “transfer portal” of many of their best players, blowout of the junior varsity squad in the mid-December bowl game they were lucky to even be selected for, and lackluster recruiting class are other reasons.

jobs, jobs, jobs, families, infrastructure, and more jobs…and Richard Nixon, from the bottom of my heart go fuck yourself!

Adam Tooze has a nice visualization of Biden’s spending proposals. Is this a tree plot? A cartogram? I’m not sure; experts, please weigh in. A few things I noticed:

  • What Biden talks about most and least does not always match the largest and smallest proposed spending amounts. I think this is called “messaging”. For example, more would be spent on electric vehicle subsidies than on community college.
  • There is no clear line between the infrastructure package and the families package. For example, there is spending on public schools in the former and child care facilities in the latter.

That’s just scratching the surface. You could (and should) stare at this graphic for hours, and then there is a long article to go with it. But I have to go make breakfast now because I can hear the children getting grumpy, which means my precious little bit of early morning quiet thinking time as a working-parent-of-small-children-with-no-childcare-or-grandparent-support is now over. If Biden gets this stuff through our dysfunctional Congress, it will be mostly too late to help my family but I hope it helps others. Thanks Obama…Bush, Clinton, other Bush, Reagan, Carter, Ford, and Nixon at least. Especially Nixon, fuck you – a quick skim of the article reminded me of the bipartisan childcare program of the 1970s that you vetoed. Oh and also, fuck you Ralph Nader because maybe Al Gore would have gotten some of this stuff back on track 20 years ago. And last but not least, thank you once again Bernie Sanders for not pulling a Nader.

junkiest junk charts of 2020

Junk Charts is a great blog that takes an example of a data visualization, critiques it systematically, and then either improves it or shows a different way of displaying the same data. The site doesn’t go for overly elaborate graphics, just clear and effective ones. This post has a roundup of the most viewed posts and the author’s favorite posts of 2020.

One thing you probably shouldn’t do is describe interesting graphics in words. Nonetheless, here is some data, which I am not putting in a visual form because it would take exponentially longer than just listing it out:

  • There are 12 graphics covered by the post.
    • 2 scatter plots
    • 3 bar charts
      • 2 horizontal, not stacked – one of these gets changed to a bump chart
      • 1 horizontal, stacked – actually this is more of a “tree plot” where two data points are stacked and then a third is placed underneath
    • 2 pie charts
      • 1 3D pie chart – gets converted to a bump chart
      • 1 is allowed to continue to exist as a pie chart, with minor tweaks
    • 1 “dot matrix” (I’m not even sure if this is the best name, but basically you have empty squares or circles showing the total number of a thing, then some of them get filled in to illustrate how many of that thing fit a certain category)
    • 3 time series plots
      • 2 conventional – although one has two vertical axes, and the author illustrates how the limits can be manipulated to suggest to the eye that two trends are related, or not
      • 1 showing shaded regions over time – basically a stacked bar changing over time
    • 538’s election snake

There is something intuitive about pie charts – that is why we explain fractions and percentages to children in terms of pizza or pie, and they grasp it instantly. Pie charts are obviously the wrong way to compare the absolute magnitudes of things.

I do like tree plots. I made one in 2020 and I was proud of myself – it showed the number of acres served by stormwater management controls implemented by three different administrative programs. And then I made a second one where I broke the numbers down further within each of the categories. This was very effective in conveying how much is actually achieved by each of the programs compared to the effort and expense that goes into them.
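Tree plots like this are more commonly called treemaps. The simplest layout, usually called slice-and-dice, cuts a rectangle into strips proportional to each value. It then cuts each strip the other way for the next level of detail, which matches the two-level breakdown described above. Below is a minimal sketch using made-up program names and acreages, not my actual 2020 numbers. Real tools usually use a "squarified" layout instead, which avoids long thin slivers.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # x, y, width, height

def slice_and_dice(values: List[float], rect: Rect, vertical: bool = True) -> List[Rect]:
    """Split rect into strips whose areas are proportional to values."""
    x, y, w, h = rect
    total = sum(values)
    out: List[Rect] = []
    offset = 0.0  # fraction of the rectangle already used
    for v in values:
        frac = v / total
        if vertical:  # strips side by side along the x axis
            out.append((x + offset * w, y, frac * w, h))
        else:         # strips stacked along the y axis
            out.append((x, y + offset * h, w, frac * h))
        offset += frac
    return out

# Hypothetical acres served by three programs (illustrative numbers only).
programs = {"Program A": 600.0, "Program B": 300.0, "Program C": 100.0}
top = slice_and_dice(list(programs.values()), (0.0, 0.0, 100.0, 100.0))

# Second level: break Program A's rectangle into two hypothetical sub-categories,
# cutting in the other direction.
sub = slice_and_dice([400.0, 200.0], top[0], vertical=False)

for name, (x, y, w, h) in zip(programs, top):
    print(f"{name}: {w * h:.0f} of 10000 area units")
```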

Resolution for 2021 is to play with “dot matrix” plots at some point (and maybe learn what the best name for these is). I think these are effective in putting numbers in context of bigger numbers, regardless of units. For example, my city has around 80,000 cumulative confirmed coronavirus cases, maybe 5,000 confirmed active infections (about the number of confirmed cases in the last 10 days), maybe between 80,000 and 800,000 actual cumulative infections, and a population of about 1.6 million. I don’t know how many have been vaccinated at this point, but probably a few thousand. So maybe I would make 16 or 160 boxes, each representing 100,000 or 10,000 people respectively, and start coloring them in. Then we could see at a glance how much of the population might have some immunity to the virus right now, and how much does not. You could slice and dice the data many ways. Of course, some people died or moved away, and others were born or moved in. Incidentally, about 2,600 people died of Covid, 400 were murdered, and 120 died in and around motor vehicles. I haven’t seen numbers on suicides or drug overdoses, but they are always horrifying. Around 1% of any given population dies in any given year from a combination of preventable and not preventable causes, which is sad but news flash: we are mortal beings.
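These are often called waffle charts (or unit charts). Here is a minimal text-only sketch using the rough numbers above. About 1.6 million people at 10,000 people per box gives 160 boxes. The 80,000 confirmed cases fill 8 of them, and the upper guess of 800,000 actual infections would fill 80. The function names are my own, and a real chart would use colored squares rather than characters.

```python
def boxes_for(count: int, people_per_box: int) -> int:
    """Convert a head count to a whole number of boxes (rounded to nearest)."""
    return round(count / people_per_box)

def waffle(filled: int, total: int, per_row: int = 20, on: str = "#", off: str = ".") -> str:
    """Text waffle chart: `total` boxes, the first `filled` of them colored in."""
    if not 0 <= filled <= total:
        raise ValueError("filled must be between 0 and total")
    boxes = on * filled + off * (total - filled)
    return "\n".join(boxes[i:i + per_row] for i in range(0, total, per_row))

PER_BOX = 10_000
population = boxes_for(1_600_000, PER_BOX)  # 160 boxes
confirmed = boxes_for(80_000, PER_BOX)      # 8 boxes
upper_guess = boxes_for(800_000, PER_BOX)   # 80 boxes

print(waffle(confirmed, population))
print()
print(waffle(upper_guess, population))
```

At this scale, a few thousand vaccinations would round to zero boxes. Small categories like that need a finer grid or a note next to the chart.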

This site doesn’t do maps, which is fine. I am a big fan of maps. But I have a very simple test – is the data geographic in nature? Then make a map. But often, some other types of graphs and tables will further illuminate the data, and those often work well alongside your map rather than being shoehorned into your map where they don’t really belong. And I also find it clunky trying to do any type of mathematical analysis in mapping software when the analysis is not spatial in nature.

2020 visualizations from FiveThirtyEight

Fivethirtyeight.com has a roundup of interesting visualizations they did in 2020. There’s a lot here, but one theme I think I would like to try to make use of is pretty simple. When you are counting something, put the count in context by first showing a bunch of empty squares that represent the potential or total number of something (voters, or citizens stopped by police, or human beings with potential Covid exposure). Then put dots in some of the boxes, or color in some of the boxes, to illustrate the count. If you want to introduce some additional categories, you can use colors or put boxes around the boxes, or to get really fancy, put groups of boxes on a map. This technique undoubtedly has a name, but the article doesn’t tell me what the name is.
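When several categories share one grid of boxes, rounding each category's box count on its own can leave the total one box too many or too few. One common fix is largest-remainder rounding. Here is a minimal sketch that allocates boxes that way and then fills the grid category by category, with a different character for each. The function names and the category labels in the example are hypothetical.

```python
from typing import Dict

def allocate_boxes(counts: Dict[str, int], total_boxes: int) -> Dict[str, int]:
    """Largest-remainder rounding: boxes proportional to counts, summing to total_boxes."""
    grand = sum(counts.values())
    exact = {k: v * total_boxes / grand for k, v in counts.items()}
    boxes = {k: int(x) for k, x in exact.items()}  # round down (counts are non-negative)
    leftover = total_boxes - sum(boxes.values())
    # Give the remaining boxes to the categories with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda k: exact[k] - boxes[k], reverse=True)
    for k in by_remainder[:leftover]:
        boxes[k] += 1
    return boxes

def category_waffle(boxes: Dict[str, int], symbols: Dict[str, str], per_row: int = 10) -> str:
    """Fill the grid category by category, one symbol per category."""
    cells = "".join(symbols[k] * n for k, n in boxes.items())
    return "\n".join(cells[i:i + per_row] for i in range(0, len(cells), per_row))

# Hypothetical counts: three equal groups cannot split 100 boxes evenly.
counts = {"group A": 1, "group B": 1, "group C": 1}
boxes = allocate_boxes(counts, 100)
print(boxes)
print(category_waffle(boxes, {"group A": "#", "group B": "o", "group C": "."}))
```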

one more covid tracker

I thought I was over covid trackers, but I just can’t help it. I know this isn’t my first “one more”, and it might not be my last. This one plots new cases over the past week on the vertical axis vs. total confirmed cases on the horizontal, then animates over time. You can add any country or U.S. state. The simulation starts when 10 cases have been reported in a location, and you can see each location’s cases grow exponentially at first and then deviate from the line when it starts to get the virus under control. You can pick a log or arithmetic axis – log is good for the math, but it kind of lets you forget that there is a difference between 10 people dying and 10,000 people dying. Anyway, it’s nice, and thanks to this person for posting it for free.
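I don't know exactly how that tracker computes its points, but the idea can be sketched from a series of daily cumulative case counts. For each day, plot the total so far against the new cases in the trailing week, starting once the total reaches 10. While growth is exponential, weekly new cases stay proportional to the total, so the points fall on a straight line of slope 1 on log-log axes. Points dropping below that line are the deviation described above. The function names and the synthetic doubling series are assumptions for illustration.

```python
import math
from typing import List, Tuple

def trajectory(cumulative: List[int], window: int = 7, start_at: int = 10) -> List[Tuple[int, int]]:
    """(total cases, new cases in the trailing window) pairs, starting once total >= start_at."""
    points = []
    for day, total in enumerate(cumulative):
        if total < start_at:
            continue
        # Assume zero cumulative cases before the series begins.
        past = cumulative[day - window] if day >= window else 0
        points.append((total, total - past))
    return points

def log_slope(p: Tuple[int, int], q: Tuple[int, int]) -> float:
    """Slope between two points on log-log axes; about 1 while growth is exponential."""
    return (math.log10(q[1]) - math.log10(p[1])) / (math.log10(q[0]) - math.log10(p[0]))

# Synthetic data: cases doubling every day for 30 days (not real data).
series = [2 ** d for d in range(30)]
pts = trajectory(series)
print(pts[:3])
print(round(log_slope(pts[5], pts[20]), 3))
```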

March 2020 in Review

To state the obvious, March 2020 was all about the coronavirus. At the beginning of the month, we here in the U.S. watched with horror as it spread through Europe. We were hearing about a few cases in Seattle and California, and stories about people flying back from Italy and entering the greater New York area and other U.S. cities without medical screening. It was horrible, but still something happening mostly to other people far away on TV. In the middle of the month, schools and offices started to close. By the end of the month, it was a full blown crisis overwhelming hospitals in New York and New Jersey and starting to ramp up in other U.S. cities. It’s a little hard to follow my usual format this month but I’ll try. Most frightening and/or depressing story:
  • Hmm…could it be…THE CORONAVIRUS??? The way the CDC dropped the ball on testing and tracking, after preparing for this for years, might be the single most maddening thing of all. There are big mistakes, there are enormously unfathomable mistakes, and then there are mistakes that kill hundreds of thousands of people (at least) and cost tens of trillions of dollars. I got over-excited about Coronavirus dashboards and simulations towards the beginning of the month, and kind of tired of looking at them by the end of the month.
Most hopeful story:
  • Some diabetics are hacking their own insulin pumps. Okay, I don’t know if this is a good thing. But if medical device companies are not meeting their patient/customers’ needs, and some of those customers are savvy enough to write software that meets their needs, maybe the medical device companies could learn something.
Most interesting story, that was not particularly frightening or hopeful, or perhaps was a mixture of both:
  • I studied up a little on the emergency powers available to local, state, and the U.S. federal government in a health crisis. Local jurisdictions are generally subordinate to the state, and that is more or less the way it has played out in Pennsylvania. For the most part, the state governor made the policy decisions and Philadelphia added a few details and implemented them. The article I read said that states could choose to put their personnel under CDC direction, but that hasn’t happened. In fact, the CDC seems somewhat absent in all this other than as a provider of public service announcements. The federal government officials we see on TV are from the National Institute of Allergy and Infectious Diseases, which most people had never heard of, and to a certain extent the surgeon general. I suppose my expectations on this were created mostly by Hollywood, and if this were a movie the CDC would be swooping in with white suits and saving us, or possibly incinerating the few to save the many. If this were a movie, the coronavirus would also be mutating into a fog that would seep into my living room and turn me inside out, so at least there’s that.
https://www.youtube.com/watch?v=4chSOb3bY6Y

hospital capacity data visualization

I was going to stop posting coronavirus tracker apps, but this one looks really useful. Now that we know most infected people aren’t tested, the number of confirmed cases isn’t all that helpful as a metric except maybe to look at trends over time. The number of people in the hospital, on the other hand, is a hard number, and comparing that number to hospital capacity is very useful. This app from the University of Washington does that. It also forecasts future hospitalizations and gives a confidence range (which is quite wide, but there it is to ponder).

This is by state, which is a slightly big and arbitrary geographic unit. Looking at my home state of Pennsylvania, things look almost reassuring, but then looking at New Jersey, they look dire. It would take me five hours to drive to Pittsburgh, Pennsylvania but I could almost spit on Camden, New Jersey. There will clearly be pressure to move patients across state lines within and between nearby metro areas, and in fact that is already in the news this morning.

The situation in New York looks just awful. I didn’t look at all 50 states, but a quick sampling suggests that states with large cities (and by proxy, probably large hospital systems), and states that started social distancing relatively early, are likely to do a lot better. People might think they would be safer in more rural areas, and perhaps it is true that your odds of infection are much lower, but your chances of survival if you do get infected could also be much lower. This is partially speculation and based on a few anecdotes I have heard, but I do know that this trend holds for car accidents and gunshot wounds.

To this water resource engineer, the differences in capacity use between states and the differences in the timing of available capacity suggest that you could move patients around, or move equipment and medical staff around, between regions in an organized way and save lives. Maybe somebody should get on that if they haven’t already.
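As a rough illustration of that idea, here is a greedy sketch. It compares each region's hospitalized patients to its bed capacity, then moves overflow patients from the most overloaded regions to the regions with the most spare beds. The region names and numbers are hypothetical, not taken from the University of Washington app. The sketch ignores travel distance, staffing, and timing, all of which a real plan would have to handle. A transportation-style optimization would be the more rigorous version.

```python
from typing import Dict, List, Tuple

def utilization(patients: Dict[str, int], beds: Dict[str, int]) -> Dict[str, float]:
    """Fraction of capacity in use per region (above 1.0 means over capacity)."""
    return {r: patients[r] / beds[r] for r in patients}

def rebalance(patients: Dict[str, int], beds: Dict[str, int]) -> List[Tuple[str, str, int]]:
    """Greedy transfers (from_region, to_region, count) from overflow to spare capacity."""
    overflow = {r: patients[r] - beds[r] for r in patients if patients[r] > beds[r]}
    spare = {r: beds[r] - patients[r] for r in patients if beds[r] > patients[r]}
    transfers = []
    # Most overloaded region first, sending patients to the emptiest regions first.
    for src in sorted(overflow, key=overflow.get, reverse=True):
        for dst in sorted(spare, key=spare.get, reverse=True):
            if overflow[src] == 0:
                break
            moved = min(overflow[src], spare[dst])
            if moved > 0:
                transfers.append((src, dst, moved))
                overflow[src] -= moved
                spare[dst] -= moved
    return transfers

# Hypothetical regions and counts (illustrative only).
patients = {"Region A": 120, "Region B": 50, "Region C": 70}
beds = {"Region A": 100, "Region B": 80, "Region C": 75}
print(utilization(patients, beds))
print(rebalance(patients, beds))
```

If total overflow exceeds total spare capacity, some patients are simply left unplaced. That is where moving equipment and staff, rather than patients, would come in.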