
Mercury in Chlor-Alkali Plants Mapped with CartoDB

The other day I learned that wordpress.com now supports embeds of CartoDB maps. This is pretty cool, and it inspired me to finish up a little project that I’ve been tinkering with for a while, in order to try out the new feature.

By the way, CartoDB is a web mapping tool that I think is one of the best interfaces available for creating interactive maps. You can make great looking maps quickly and easily, but there is also enough functionality to do more advanced stuff, like mess around with the CSS code.

This map shows estimates of how much mercury is on site at chlor-alkali plants per country. It distinguishes between countries that ban the export of mercury and those that don’t. This is important because chlor-alkali plants often contain hundreds of tons of mercury. When the facilities close, the mercury can enter the commodity market, where it can be used in artisanal gold mining.

The size of the bubbles reflects how many tons of mercury are estimated to be in chlor-alkali facilities in each country. Scroll, zoom, hover, or click for more details. The data are from the UNEP Global Mercury Partnership chlor-alkali inventory.

Technical CartoDB note: In order to distinguish (by bubble color) countries with and without export bans, I made two layers from the data table. However, because each set had a different range of values, the scale for the bubble size was different for each color. To fix this I manually changed the bubble size distribution cutoffs in the CSS tab. Is there an easier solution that I am missing?
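One possible alternative to hand-editing the cutoffs, sketched in Python rather than in CartoDB itself: compute a single set of size-class cutoffs from the combined values of both layers, then reuse those same cutoffs for each color. The function names and tonnage values here are made up for illustration, and the choice of quantile-based classes is an assumption, not necessarily what CartoDB does by default.

```python
import statistics

def shared_cutoffs(values_a, values_b, n_bins=5):
    """Return n_bins - 1 quantile cutoffs computed on both layers combined."""
    combined = sorted(values_a + values_b)
    return statistics.quantiles(combined, n=n_bins, method="inclusive")

def size_class(value, cutoffs):
    """Index of the bubble size class (0 = smallest) that a value falls into."""
    return sum(value > c for c in cutoffs)

# Made-up tonnages for illustration only (not the UNEP inventory values).
export_ban = [10, 50, 200, 900]
no_export_ban = [5, 20, 80, 300, 1500]

cutoffs = shared_cutoffs(export_ban, no_export_ban)
classes_ban = [size_class(v, cutoffs) for v in export_ban]
classes_no_ban = [size_class(v, cutoffs) for v in no_export_ban]
```

Because both layers are classified against the same cutoffs, a bubble of a given size means the same tonnage range regardless of its color.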

Oh yeah, this is how you do the embed.


The Rio Declaration and the Decline of Multilateral Environmental Agreements

It’s been quite some time since my last post. I have been busy with a young child, new job, and an international move. But I’m hoping to get back into posting and making visualizations on a regular basis.

The reason for this post is that I came across an interesting resource called the International Environmental Agreements Database Project, hosted at the University of Oregon. The database contains information on about 1100 multilateral environmental agreements (MEAs) dating back to 1857. The data include the title, type (an original agreement or a protocol or amendment to an existing agreement), dates of signature and entry into force, and the parties. For some agreements there is even data on performance as well as coding to allow for comparison of the actual legal components.

As an initial exploration, I simply looked at how many agreements were concluded over time. The plot below shows the results for the last 100 years. Click for the interactive and shareable plot.ly version.

100 Years of Multilateral Environmental Agreements

Click for interactive version
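The count behind a plot like this is simple to reproduce. Here is a minimal Python sketch, assuming each agreement is a record with a `signature_year` field; that field name and the toy records are hypothetical and do not reflect the actual layout of the IEA database export.

```python
from collections import Counter

def agreements_per_year(records, start_year, end_year):
    """Count agreements by signature year, keeping years with zero agreements."""
    counts = Counter(r["signature_year"] for r in records)
    return {year: counts.get(year, 0) for year in range(start_year, end_year + 1)}

# Toy records, invented for illustration only.
toy_records = [
    {"title": "Agreement A", "signature_year": 1972},
    {"title": "Agreement B", "signature_year": 1992},
    {"title": "Protocol to B", "signature_year": 1992},
]

per_year = agreements_per_year(toy_records, 1990, 1993)
```

Filling in the zero years matters for a time series plot, since otherwise quiet years silently disappear from the x-axis.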

There is a pretty interesting pattern. From the early 20th century until the 1950s there are not that many MEAs. Then the pace picks up in mid-century, peaking in the early 1990s, and declining considerably after that.

What’s going on? Have all the easy agreements been reached, leaving nothing more for countries to negotiate? Maybe that’s part of it, but I think it has something to do with an event that coincided with the peak in MEAs – the 1992 Earth Summit and the resulting Rio Declaration on Environment and Development.

The Earth Summit was a huge event in the global environmental community, and occurred at a high point of optimism about multilateralism. There was a flurry of MEA activity around this time. But there was also a building movement to ensure that international environmental diplomacy was benefiting the poor, and in particular, developing countries.

The Rio Declaration enshrined the principle of common but differentiated responsibilities. This is the idea that while all nations have a responsibility to protect the global environment, rich nations should shoulder a greater share of the burden.

It is a noble sentiment, and one that in my view makes a lot of sense. But it had the effect of making it more difficult to reach agreements in international environmental negotiations. Developing countries started going into the negotiations expecting more support, in the form of funding, reduced obligations, or technology transfer, from the developed world. Common but differentiated responsibilities is at the root of a major sticking point in global climate talks. Should China, India, and other rapidly developing nations have the same stringent obligations as more mature economies?

I certainly don’t think this is the only cause of the decline in new MEAs in the last 20 years. And neither can I claim to be the first to think about the Rio Declaration’s impact on MEAs. There’s an entire literature on it. For example, Richard Benedick discussed this theme at length in reference to the Montreal Protocol and its aftermath in his book Ozone Diplomacy.

As a final disclaimer, for this analysis it would be best to filter the IEA database to exclude those MEAs that only have a few parties. That way you could really focus on the rate of global or large regional MEAs over time. Perhaps I’ll do that next.
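A rough sketch of that filter in Python, assuming each record carries a list of `parties`. The field name, the threshold of 20, and the toy records are all assumptions for illustration.

```python
def filter_by_party_count(records, min_parties):
    """Keep only agreements with at least min_parties parties."""
    return [r for r in records if len(r["parties"]) >= min_parties]

# Toy records, invented for illustration only.
toy_records = [
    {"title": "Small regional agreement", "parties": ["X", "Y", "Z"]},
    {"title": "Broad agreement", "parties": [f"Country {i}" for i in range(40)]},
]

broad = filter_by_party_count(toy_records, min_parties=20)
```

Where to set the threshold is a judgment call; trying a few values and checking whether the early-1990s peak survives would be a reasonable robustness check.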

But in any case, it’s an interesting dataset and an interesting pattern. And a good excuse to step back and think about the big picture in global environmental politics.


Hurricanes and Baby Names

Recently there has been a bit of buzz about a study claiming that female-named hurricanes cause more fatalities, on average, than male-named ones. The authors suggested that the discrepancy is attributable to gender bias. Female-named hurricanes do not seem as threatening to people, so presumably they take fewer precautions. From the start this seemed pretty far-fetched, and in fact a number of problems have been found with the study.

But it got me thinking about hurricane names. A more likely effect of a hurricane’s name would be to discourage parents from giving their children that name, if the hurricane is associated with death and destruction. Fortunately, there is readily available data with which to test this hypothesis. For hurricanes, I used the same data as the hurricane gender study described above (they may have had some problems with their methodology, but at least they released their data). It contains data on 92 Atlantic hurricanes that made landfall in the U.S. since 1950*. For baby names I turned to the Social Security Administration. There is a great R package called babynames that makes the yearly SSA data available in a readily accessible format for use in R. As an aside, the SSA baby names data is the source of all sorts of interesting visualizations and analyses, such as the baby name voyager and this article from fivethirtyeight.com on predicting a person’s age based on their name.

The tricky part of this analysis is deciding how to define a decrease in name usage after a hurricane. The simplest way would be to look at how many times a name was given in the year of a hurricane versus how many times that name was given the following year. For example, how many baby Katrinas were there in 2005 versus 2006? However, this method does not take into account that most names are either decreasing or increasing in popularity as part of a longer-term trend. So you have to look at how the popularity of a name was changing before the hurricane as well. To see why, look at this plot of the number of babies named Katrina over time.

katrina

Katrina peaked in popularity in 1980 and has been declining ever since. But from 2004-2005 the number of Katrinas actually increased about 13%. From 2005-2006, however, it decreased dramatically – by 26%. It’s a pretty good bet that this rapid decrease was due to the hurricane.

To quantify the change in a name’s usage after a hurricane, I made the assumption that the best predictor of how a name’s popularity will change in a given year is how it changed last year. To calculate the post-hurricane change in name usage I subtracted the percent change in name usage in the year before the hurricane from the percent change after the hurricane. In the Katrina example the post hurricane change would be (-26%) – (13%) = -39%. This post-hurricane percent change value is what I use in the analysis below.
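The calculation described above can be written out in a few lines. This is a Python sketch rather than the R code used for the post, and the three yearly counts are made-up numbers chosen only to reproduce the roughly +13% and -26% changes from the Katrina example; they are not the actual SSA counts.

```python
def percent_change(old, new):
    """Percent change from old to new."""
    return 100.0 * (new - old) / old

def post_hurricane_change(count_before, count_landfall, count_after):
    """Percent change after the hurricane minus percent change the year before it."""
    change_before = percent_change(count_before, count_landfall)
    change_after = percent_change(count_landfall, count_after)
    return change_after - change_before

# Hypothetical counts for the year before, the landfall year, and the year after.
result = post_hurricane_change(1000, 1130, 836)
```

With these numbers the change before landfall is +13% and the change after is about -26%, so `result` comes out at roughly -39 percentage points.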

Before we get to the results, let’s take a look at the fascinating case of Carla:

carla

Hurricane Carla was an extremely intense storm that hit Texas in 1961, killing 43. The name “Carla” had been surging in popularity, but after 1961 it started a decline in popularity from which it never recovered. It seems a pretty good bet that the hurricane had a major role in Carla’s decline. Interestingly, the first live television broadcast of a hurricane was of Carla, with a young Dan Rather himself reporting from Galveston. Could the shock of the American TV-viewing public seeing footage of the storm in their living rooms have contributed to the demise of Carla as a name?

Back to the analysis. Indeed, the hurricane baby name effect seems real. After running the numbers, I found that names associated with a landfalling hurricane were about 15 percent less common in the year after the hurricane. Out of the 93 hurricanes in the data set, 65 were associated with a decrease in the popularity of their names, and only 21 were followed by increasing name usage. (Seven hurricane names were not found in the SSA data in their landfall year).

So far this is pretty intuitive. Of course people are less likely to name their dear infant after a natural disaster. Based on this reasoning, you’d expect that the more fatalities caused by a hurricane, the greater the baby name effect. Let’s test that.

names.fatal

The effect is quite small. When we take Katrina out (a massive outlier in terms of fatalities), it’s smaller still:

name.fatal.ex.kat.rug
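One way to put a number on "quite small" is to fit a straight line and compare the slope with and without the outlier. The sketch below does this in plain Python with an ordinary least-squares slope. It is an assumption that a linear fit is the right summary here, and the data values are invented for illustration, not taken from the hurricane data set.

```python
def ols_slope(xs, ys):
    """Least-squares slope of y on x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def drop_max_x(xs, ys):
    """Remove the observation with the largest x (e.g. the deadliest storm)."""
    i = max(range(len(xs)), key=xs.__getitem__)
    return xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]

# Invented values: fatalities and post-hurricane percent change in name usage.
fatalities = [0, 3, 5, 21, 43, 1800]
name_change = [-5.0, -12.0, 2.0, -20.0, -30.0, -39.0]

slope_all = ols_slope(fatalities, name_change)
trimmed_x, trimmed_y = drop_max_x(fatalities, name_change)
slope_without_outlier = ols_slope(trimmed_x, trimmed_y)
```

A single extreme point can dominate a least-squares fit, which is why it is worth reporting both slopes side by side.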

So the correlation between change in baby name usage and hurricane fatalities is quite weak. Finally, I had to see if the gender of the hurricane name affected this relationship. Were more deadly female-named hurricanes more or less likely than male-named ones to affect baby name popularity? Maybe I’d even find that male baby name usage goes up with hurricane fatalities because parents associate the names with strength? I can see the Slate headline now! Alas, there is no significant difference:

names.fatal.ex.kat.bysex

By the way, there are more female names because from 1950 – 1979 all Atlantic hurricanes were given female names.

There’s an almost endless amount of interesting things to glean from the baby names data. My ultimate dream is an algorithm to determine the perfect name for your baby based on a number of criteria chosen by the expectant parents. It would really take the stress out of the naming process. One of the criteria would certainly be that the name is not on the World Meteorological Organization’s list of tropical storm names!

Data and code available on github.

* The authors of the hurricane fatalities study did not include Katrina in their data set. I added it in with data from Wikipedia.

 


Graphics for Fitness Motivation using Plot.ly

This post is intended to illustrate the cool things you can do with plot.ly’s API for R. Plot.ly is a web-based tool for making interactive graphs. It uses the D3.js visualization library, and lets you create very attractive plots that can be easily shared or embedded in a web page. With the R API you can manipulate data in R and then send it over to plot.ly to create an interactive graph. There’s also a function that lets you create a plot in R using ggplot2, and then shoot the result directly over to plot.ly (summarized nicely here).

I have a great little free app on my iPhone called Pedometer++ that keeps track of how many steps I take each day. I exported the data, plotted up a time series with ggplot2, and used the API to make the graph in plot.ly. It worked quite nicely. The only hiccup was that plot.ly did not recognize the local regression curve, so I had to add that separately.

You can see from the plot that I’m not consistently meeting my 10,000 step goal. In fact, I averaged 7,002 steps over this period. That still comes out to a total of 1,470,463 steps. From October through February my step count was trending slightly downward, but since then it’s picked up. Maybe that had something to do with the cold winter. Hopefully as the weather (and my motivation) improves, I’ll hit my goal.

steps_taken_per_day_october_2014_-_november_2014

Click to see the interactive version

And here’s a bonus box plot showing steps taken by day of the week (also using the R API):

steps_per_day2c_october_2013_-_may_2014

Click to see the interactive version
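For anyone who wants to run similar numbers on their own export, here is a minimal Python sketch of the two summaries used above: the overall average and total against a daily goal, and the grouping by day of the week that feeds the box plot. It is not the R code behind these plots. The dictionary-of-dates input format is an assumption, since the layout of the Pedometer++ export is not described here, and the step counts are invented.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def summarize(steps_by_date, goal=10_000):
    """Total steps, daily mean, and number of days at or above the goal."""
    counts = list(steps_by_date.values())
    return {
        "total": sum(counts),
        "mean": mean(counts),
        "days_at_goal": sum(c >= goal for c in counts),
    }

def by_weekday(steps_by_date):
    """Group daily step counts by weekday name (one list per weekday)."""
    groups = defaultdict(list)
    for day, steps in steps_by_date.items():
        groups[WEEKDAYS[day.weekday()]].append(steps)
    return dict(groups)

# Invented sample data.
sample = {
    date(2014, 3, 3): 12000,   # a Monday
    date(2014, 3, 4): 6000,    # a Tuesday
    date(2014, 3, 10): 8000,   # the following Monday
}
summary = summarize(sample)
weekday_groups = by_weekday(sample)
```

Each list in `weekday_groups` is the raw material for one box in a day-of-week box plot.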

If there are any pedometer users out there who are interested, let me know and I can post the code.


Updated Global Mercury Pollution Viz and Graphics

One of the first posts on this blog was about using Tableau to visualize data on global emissions of mercury.  I’ve gotten suggestions from a few people and given the graphic a bit of a face lift. Click on the image to see the interactive viz:

Dashboard_1 (3)

Click for interactive graphic

I also used the same dataset to make some static graphics using ggplot2 and the ggthemes package. I’d love any input on how to improve the look and feel of both these and the Tableau viz. I’ve always found picking good colors very challenging, so thoughts on the palettes are especially welcome.

hg.emissions.bysec

The 8 industry sectors with the highest global mercury emissions. Data for 2010 from the 2013 UNEP Global Mercury Assessment.

hg.emissions.bycty_fewm

Countries with the highest mercury emissions. Data for 2010 from the 2013 UNEP Global Mercury Assessment.


Visualizations about Data Visualization

It’s no secret that interest in data visualization has been growing in recent years. Don’t believe me? Let me show you a graph:

google trends

From Google Trends

Sure, humans have been presenting information graphically for hundreds, if not thousands, of years, with increasing sophistication. We still study the work of people like John Snow, William Playfair and Florence Nightingale for their innovations in graphical presentation. Today, however, with the increasing availability of large, rich, and easily accessible datasets, and the proliferation of software tools for creating graphics, we are seeing an explosion in the number of data visualizations. This is a great development. I obviously think so, since I jumped on the bandwagon.

The recent ubiquity of data visualization brings with it a new subgenre, the meta-visualization. Visualizations about visualizations. Some of these describe what data visualization is, or should be. Some present information about common types or characteristics of visualizations. Still others poke fun at cliches, poor practices, and the very pervasiveness of visualization as a medium for communicating information. Let’s take a look at some examples.

First, here’s the Infographic of Infographics:

Then there’s this periodic table of types of visualizations:

periodic viz

 

Robert Kosara is not amused. For a nice take on the actual periodic table (the one with the elements), have a look at this.

Continuing with the periodic table theme, here is a periodic table of periodic tables. This is very meta.

The Periodic Table of Periodic Tables

But does this periodic table of periodic tables contain itself? (It does.) And, more importantly, should a periodic table of all periodic tables that do not contain themselves contain itself?

Some graphics attempt to illustrate what characteristics a good data visualization should have. Like this 4-set Venn diagram, for example:

Or like this Venn-like diagram, which I’m not quite sure how to read:

Now if you really want to turn it up to 11, or more accurately, up to seven, you could employ this epic 7-set Venn diagram:

7venn

Click on this. You won’t regret it.

Another category of meta-visualizations contains humorous or satirical ones. These are not literally visualizations of other visualizations, but they are about visualization as a medium. These are funny, self-aware takes on the cliches and excesses in the field. Pie charts that skewer the graphical form of the pie chart itself are practically a sub-subgenre in themselves:

pie-i-have-eaten-chart

Really, nobody seems to have any love for the pie chart.

Or, you know how there are like a million maps on the internet showing which state or country is the most this, or the most like that? Well that’s the set up for this brilliant satirical tweet:

And on the topic of maps, here’s a gem from xkcd:

It’s funny because it’s true!

Finally, we venture into silliness with one of my all-time favorites, All You Need to Know about Lady Gaga’s Hit “Bad Romance” in One Chart:

To sum up, here is a word cloud visualization of this post:

viz word cloud


Getting to Know the Worldwide Governance Indicators

A while ago I wrote a post suggesting that Ukraine’s propensity for revolution might have something to do with its high level of government corruption in combination with its relatively well-developed civil society. As evidence for this, I showed that Ukraine (together with Kyrgyzstan and Moldova, two countries that have also recently experienced political unrest) was an outlier among post-Soviet states with respect to the relationship between corruption perceptions and authoritarianism. This finding was interesting, but by no means robust enough to warrant broad generalizations about corruption and democracy and revolution.

Since then, a few others chimed in with some ideas. Ben Jones suggested looking at corruption and authoritarianism in countries that experienced revolutions over time. Cavendish McCay looked at corruption and authoritarianism data from the same sources but over the entire globe, and produced a very cool visualization. He also pointed me to the World Bank’s Worldwide Governance Indicators, which contains measures of democracy, corruption, and political stability. Perhaps it would be possible to test my hypothesis empirically using these data. This could be done for individual regions or for the whole world, and could also have a temporal component (the indicators have been published since 1996).

In order to determine if such an analysis is feasible, I decided to take a closer look at the dataset (which is free and downloadable from the website). The Worldwide Governance Indicators (WGI) project is an ambitious one. The authors compile data from 31 different sources (such as think tanks, NGOs, private firms) and produce annual scores for every country for six indicators of the quality of governance. The indicators are:

  • Voice and Accountability
  • Political Stability and Absence of Violence
  • Government Effectiveness
  • Regulatory Quality
  • Rule of Law
  • Control of Corruption

First off, we can look at the data on a map. Fortunately the WGI website has a series of nice Tableau interactive graphics, including maps:

Screen Shot 2014-04-27 at 2.17.49 PM

Looking at the indicators geographically is helpful. But to evaluate whether they can be used to test the hypothesis, I want to see how each indicator is correlated with all the others. For this, we’ll turn to R. Here is a correlation matrix of the six indicators as calculated for 2012. Positive correlations are reflected as positive values. The closer the number is to one, the stronger the correlation.

wgi.corrplot

As you can see, all the indicators are positively correlated with each other, some very strongly. This is not surprising. We would expect well-governed countries to get high marks for rule of law, regulatory quality, control of corruption, etc. One interesting observation here is that Control of Corruption actually has the lowest correlations of all the indicators. A scatter plot matrix is a good way to look at the data in more detail:

wbi.splom.plot
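For readers who want to see what goes into a correlation matrix like the one above, here is a small Python sketch that computes pairwise Pearson correlations between named columns. It is not the R code used for these plots. Using Pearson's coefficient is an assumption, and the three toy columns are invented rather than taken from the WGI data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def correlation_matrix(columns):
    """Nested dict of pairwise correlations between named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Invented scores for four hypothetical countries.
toy_columns = {
    "indicator_x": [1.0, 2.0, 3.0, 4.0],
    "indicator_y": [2.0, 4.0, 6.0, 8.5],
    "indicator_z": [4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(toy_columns)
```

The matrix is symmetric with ones on the diagonal, which is why plots like the scatter plot matrix can use the upper and lower triangles for different things. A column with no variation would cause a division by zero here, so real data should be checked for that first.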

The idea for this variation on the scatter plot matrix comes from Winston Chang’s R Graphics Cookbook. Its structure is similar to the correlation matrix in that all of the indicators are plotted against each other. The lower panels show scatter plots with LOESS regression lines for each indicator pair. This plot has some extra bells and whistles thrown in – histograms of the distribution of each indicator in the diagonal panels and correlation coefficients (just like the correlation matrix) in the upper panels. The scatter plots show the strong to moderate correlations that we already saw in the correlation matrix, but allow us to make out some curious features of the data, like the non-linear relationship between Voice and Accountability and many of the other indicators.

The indicator values are in units of a standard normal distribution. A value of zero is the mean, while a value of one is one standard deviation higher than the mean. Given the distributions,  the indicator values range from about -2.5 to 2.5.  Positive values represent better governance, negative represent worse. Because each indicator is measured on the same scale, we can simply sum all six to determine the overall “best governed” country. The top six are:

Country     sum
FINLAND     11.19
SWEDEN      10.94
NEW ZEALAND 10.83
NORWAY      10.67
DENMARK     10.59
SWITZERLAND 10.57

And the bottom six:

SOMALIA              -13.65
CONGO, DEM. REP.     -9.76
SUDAN                -9.74
SYRIAN ARAB REPUBLIC -9.53
AFGHANISTAN          -9.48
KOREA, DEM. REP.     -9.35
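The ranking above is just a sum across the six indicators followed by a sort. A minimal Python sketch of that step, with invented scores for three hypothetical countries (the real values come from the WGI download, and the R code for the post is not reproduced here):

```python
INDICATORS = [
    "Voice and Accountability",
    "Political Stability and Absence of Violence",
    "Government Effectiveness",
    "Regulatory Quality",
    "Rule of Law",
    "Control of Corruption",
]

def rank_by_total(scores):
    """Sum the six indicators per country and sort from best to worst."""
    totals = {
        country: sum(values[name] for name in INDICATORS)
        for country, values in scores.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Invented scores: each hypothetical country gets the same value on all six.
toy_scores = {
    "Country A": dict.fromkeys(INDICATORS, 1.5),
    "Country B": dict.fromkeys(INDICATORS, -0.5),
    "Country C": dict.fromkeys(INDICATORS, 0.25),
}
ranking = rank_by_total(toy_scores)
```

Summing gives each indicator equal weight, which is a simplifying choice rather than anything the WGI project itself recommends.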

I got a bit carried away examining the correlations between the governance indicators, but in a subsequent post I hope to look more closely at the democracy – corruption – stability hypothesis. I’m still not quite sure what statistical tests to use and how to apply them, and I’d welcome any ideas. Data and code are posted on Github (github.com/caluchko/wgi).

 


Another Way to Look at Mercury in Seafood

In the previous post, I used Tableau Public to create a visualization of the Seafood Hg Database. That graphic showed the mean mercury content and number of samples by seafood category. But there are several other dimensions in the database, including the year of the study and the particular species of seafood sampled. I couldn’t resist playing around with the data a little more, this time using the lattice package in R.

The plot below shows the mean mercury concentration (y-axis) in studies of the 12 seafood categories with the highest median mercury concentration. The x-axis shows the date of the study. I’ve also plotted a trend line for each panel. This is a nice way to visualize the data, but I wouldn’t read too much into this plot. For one thing, many of the seafood categories contain multiple species, some of which are higher than others in mercury. Also, this plot does not account for the geographical region where the fish were sampled.
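Picking the panels for a plot like this comes down to ranking categories by the median of their study means. A short Python sketch of that selection step, assuming the data has been reduced to (category, mean concentration) pairs; the category names and values below are placeholders, not entries from the Seafood Hg Database.

```python
from collections import defaultdict
from statistics import median

def top_categories_by_median(studies, n):
    """Return the n categories with the highest median of study means."""
    by_category = defaultdict(list)
    for category, mean_hg in studies:
        by_category[category].append(mean_hg)
    medians = {c: median(values) for c, values in by_category.items()}
    return sorted(medians, key=medians.get, reverse=True)[:n]

# Placeholder data for illustration only.
toy_studies = [
    ("Category A", 0.9), ("Category A", 1.1),
    ("Category B", 0.1), ("Category B", 0.3), ("Category B", 0.2),
    ("Category C", 0.5),
]
top_two = top_categories_by_median(toy_studies, 2)
```

Ranking by median rather than mean keeps one unusually high study from pulling a category into the top group.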

fish.hg.latticeplot

We can tease a little more from the dataset by looking at the individual species within a seafood category. Here is a plot of the six tuna species with the greatest number of studies. The larger species, like bluefin, seem to have higher mercury contents than the smaller ones, like skipjack. One curious feature of the dataset is also visible here: there were very few studies of mercury in seafood in the 1980s.

fish.hg.tunaplot


How Much Mercury is in Your Favorite Seafood?

I’ve written before about mercury emissions, mercury as a commodity, and mercury use in artisanal mining. But the reason we pay so much attention to mercury is its human health impacts, which are primarily caused by eating contaminated seafood.

Different types of seafood have different amounts of mercury. Because mercury is bioaccumulative, organisms that are higher on the food chain tend to have greater mercury concentrations. Of course, the particular environment where the organism lives also plays a big part.

Scientists have been interested in the mercury content of seafood for decades. Recently, a group of researchers undertook the herculean task of aggregating data from almost 300 studies. The result is the Seafood Hg Database (and an accompanying paper). The database contains the mean mercury concentrations measured in each study for one or more of 62 seafood categories. Overall, the database represents over 62,000 individual measurements from around the world.

It’s a great dataset to play around with and experiment with visualizations. In the graphic below, I plot mercury concentrations for a subset of common seafood types. Each circle represents the mean concentration measured in one study, and the size of the circle is proportional to the number of samples in that study. I’ve overlaid box plots for each seafood category that show the median of all the means, as well as first and third quartiles (whiskers go to 1.5x the IQR).
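To make the box plot description concrete, here is a small Python sketch of the statistics involved: the quartiles, the 1.5x IQR fences, and whiskers that stop at the most extreme data points inside those fences. The quartile interpolation method is an assumption, since Tableau may compute quartiles slightly differently, and the sample values are invented.

```python
import statistics

def box_plot_stats(values):
    """Quartiles, whisker ends, and outliers using the 1.5 x IQR rule."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in values if v < lower_fence or v > upper_fence],
    }

# Invented study means for one seafood category.
stats = box_plot_stats([1, 2, 3, 4, 5, 100])
```

In this toy example the value 100 falls outside the upper fence, so the whisker stops at 5 and 100 is reported as an outlier.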

I think this is much more instructive than simply plotting the grand mean (average of all the study averages) for each seafood category. For one thing, you lose a lot of information on how much mercury concentration varies within a category. Take tilefish, for example. This is one of the species that EPA and FDA advise pregnant women not to eat. But there are relatively few studies of tilefish, and the mean mercury concentrations they measured vary by an order of magnitude.

Click on the image below to bring up the full interactive Tableau Public visualization:

Hg in seafood

Click on the image to see full version


Satellite Image Time Lapse of Artisanal Mining in Peru

My last post was about gold and mercury prices, and how we might measure their relationship. We would expect a relationship between prices of these metals because mercury is used in artisanal and small-scale gold mining (ASGM). We may or may not see a signal in mercury prices related to ASGM, but we most definitely see the effects of ASGM on the landscape on a massive scale. Using the Landsat Annual Timelapse tool in Google Earth Engine, I created this animation showing the explosive growth of ASGM and associated deforestation near Huaypetue in the Madre de Dios region of Peru. Click on the image below to view the animation.

asgm landsat anim

You can see that beginning in the late 1990s, large areas around rivers turn from green (rain forest), to brown (cleared areas for mining). The trend seems to accelerate in the last 10-15 years. You can explore the region as it appears today in Google Maps:

And because it’s fun to play with Google Maps, here is a striking oblique image of the region.

Zooming in a bit closer, seen from a plane flown by the Carnegie Airborne Observatory, the impacts of mining come into even sharper view:

The scenes on the ground look every bit as desolate as you would expect from the satellite and airborne imagery:

If you are looking for more information on artisanal mining in Madre de Dios, this article in Nature is a good place to start. The Guardian has also been covering this region. This piece focuses on mercury use in mining and its toxic impacts.