Here’s a little project I created to try out the free online graphic design package Canva. While it won’t replace a full-service tool like Illustrator, Canva makes it very easy to create attractive presentations, posters, and simple infographics. It’s definitely worth a try.
In my previous post, I explored a dataset on fatal avalanches in Switzerland from the Swiss Institute for Snow and Avalanche Research (SLF). The dataset also contains the location of each avalanche, and here I’ll explore a few ways to show the data geographically.
In the map above, the location and date of each avalanche are used to make a time lapse with CartoDB’s Torque function. Each flashing white marker is one fatal avalanche. Besides the general location of avalanche risk in Switzerland and the seasonal pulsation of events, this map does not convey all that much information. However, I think it is worthwhile because it drives home the sheer number of deadly avalanches – 361 – during this period. We have to keep in mind that each of these flashing markers is a separate tragedy; together they represent the loss of 465 lives.
This map shows the geographical distribution of fatal avalanches by the activity or location involved in the accident. As I discussed in the last post, the great majority occurred in open country during recreational activities like backcountry touring or off-piste skiing. The map illustrates that backcountry touring accidents are distributed fairly evenly across the high Alps, while off-piste skiing and snowboarding accidents tend to be clustered. Closer inspection reveals that these clusters occur around high mountain lifts, such as the largest cluster, on the north slopes of Mt. Gele and Mt. Fort near the resort of Verbier:
This map also lends itself well to exploration. The OpenStreetMap base map shows great detail as you zoom in, and you can click on each point to get more information about that avalanche, such as elevation, aspect, date, and number of fatalities.
Finally, here’s a heatmap showing the density of fatal avalanches, with red areas having the highest densities. The cantons of Valais (in the southwest) and Grisons (in the east) have the highest concentrations of deadly avalanche accidents. I used a Landsat mosaic as a base map, which allows for comparison of the relationship between terrain and avalanche density.
All avalanche data from WSL Institute for Snow and Avalanche Research SLF, 25 March 2016. Data and code available here. Maps generated using CartoDB.
Snow-covered mountains are one of the most beautiful sights in nature, but in the wrong circumstances they can kill you. Skiers and other mountain enthusiasts sometimes refer to avalanches as the “white death”, and for good reason. Hundreds die in avalanches every year, and a great deal of effort is spent on trying to understand the factors that cause avalanches in the hope of decreasing this toll.
Located in the Alps and a mecca for winter sports, Switzerland takes avalanches seriously. The Swiss Institute for Snow and Avalanche Research (SLF) monitors snow conditions, issues warnings, and collects data on avalanches. Their website is well worth a visit for anyone interested in winter sports in the Alps; I find the snow maps particularly useful. But for this post I will use their data on fatal Swiss avalanches in the last 20 years to experiment with different ways to visualize some patterns and relationships.
The dataset includes information on the date, location, elevation, and number of fatalities, in addition to the slope aspect, type of activity involved (e.g. off-piste skiing), and danger level at the time of the avalanche. Over the last 20 years there have been 361 fatal avalanches in Switzerland, for a total of 465 deaths. Most avalanches killed only one victim.
Because I wanted to experiment with radial plots, I’ll focus on the variable of slope aspect in this post. Aspect is the compass direction that a slope faces. In this case we’re looking at the slope where the avalanche occurred. In Switzerland, the majority of avalanches occur on slopes facing NW – NE, as you can see from this plot:
This pattern is common in the temperate latitudes of the northern hemisphere. Avalanches are more common on north-facing slopes because they are more shaded and therefore colder, which allows snowfall to remain unconsolidated for longer. When more snow falls, these unconsolidated layers can act as planes of weakness on which the snow above can slide. It’s much more complicated than that, with factors like wind and frost layers coming into play. To learn more about how aspect affects avalanches, see here. The pattern is unmistakable, but does it hold all year long? I separated the data by month to find out:
A few interesting insights emerge from this plot. First, February is clearly the deadliest month for avalanches. In December there are actually quite a few avalanches on SE-facing slopes, but by January the predominant direction is centered around NW. In February, and to some extent in March, it shifts to N – NE. In April it’s NW again, but by then there are significantly fewer avalanches. So there are some monthly patterns, but I’m not exactly sure what explains them. Of course, to really nail this down we’d want to do some statistics as well.
One pattern I expected, but did not see, was a decrease in the dominance of northern aspects later in the spring. I expected this because as the days get longer, the shading effect of north-facing slopes decreases. It’s important to remember that these are fatal avalanches, and a dataset of all avalanches would look different. For example, there are probably a lot of wet avalanches on south-facing slopes in the spring. But these are much less dangerous than slab and dry powder avalanches, and are therefore not reflected in the fatality data.
The rose style plots above are useful, but I wanted to try to illustrate more variables at once. So I tried a radial scatter plot:
Click on the image for the interactive Plot.ly version
This plot is similar to the previous ones in that the angular axis represents compass direction (e.g. 90 degrees means an east-facing slope). The radial axis (the distance from the center) represents the elevation where the avalanche occurred, and color represents the type of activity that resulted in the fatality or fatalities. Each point is one avalanche. The data are jittered (random variations in aspect) to minimize overplotting. This is necessary because the aspect data are recorded by compass direction (e.g. NE or ESE), so without jitter many avalanches would sit at exactly the same angle. The density of the points clearly illustrates the dominance of north-facing aspects. It’s also clear that most avalanches occur between 2000 and 3000 meters (in fact the mean is 2507 m). In terms of activity, backcountry touring and off-piste skiing and boarding dominate. And avalanches at very high altitudes are mostly associated with backcountry touring, which makes sense, as not many lifts go above 3000 m. Perhaps an especially perceptive viewer can make out other patterns in the relationships between variables, but I can’t. Any thoughts on the usefulness of this plot for this dataset?
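For anyone who wants to reproduce the jittering, here is a minimal Python sketch (not the code behind the Plot.ly chart). It maps 16-point compass abbreviations to degrees, with north at 0 and east at 90 as in the plot, then spreads each point uniformly within its compass sector. The 22.5-degree sector width and the uniform jitter of up to half a sector on either side are assumptions; the post does not say how much jitter was applied.

```python
import random

# 16-point compass rose; each step is 22.5 degrees clockwise from north.
COMPASS_POINTS = [
    "N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
    "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW",
]
SECTOR_WIDTH = 360 / len(COMPASS_POINTS)  # 22.5 degrees


def aspect_to_degrees(aspect: str) -> float:
    """Convert a compass abbreviation such as 'NE' to degrees (N = 0, E = 90)."""
    return COMPASS_POINTS.index(aspect.upper()) * SECTOR_WIDTH


def jittered_aspect(aspect: str, rng: random.Random) -> float:
    """Assumed jitter scheme: move the point uniformly within its own sector,
    i.e. up to half a sector either side, wrapping around at 360 degrees."""
    half = SECTOR_WIDTH / 2
    return (aspect_to_degrees(aspect) + rng.uniform(-half, half)) % 360


rng = random.Random(42)  # fixed seed so the jitter is reproducible
for a in ["N", "NE", "ESE", "NW"]:
    print(a, aspect_to_degrees(a), round(jittered_aspect(a, rng), 1))
```

Keeping the jitter inside each sector means no point drifts into a neighboring compass direction, so the overall rose pattern stays the same while stacked points spread out.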
Finally, I want to share a couple of graphics from SLF (available here). Here is a timeline of avalanche fatalities in Switzerland since 1936:
The average number of deaths per year is 25, but this has decreased a bit in the last 20 years. There were also more deaths in buildings and on transportation routes prior to about 1985. Presumably improvements in avalanche control and warnings reduced fatalities in those areas. And what happened in the 1950/51 season? That was the infamous Winter of Terror. The next plot shows the distribution of fatalities by the warning level in place when the avalanche occurred:
Interestingly, the great majority of deaths happened when warning levels were moderate or considerable. There were significantly fewer deaths during high or very high warning periods. One reason must be that high/very high warnings don’t occur that frequently, but it’s also likely that skiers and mountaineers exercise greater caution or even stay off the mountain during these exceptionally dangerous times. There’s probably some risk compensation going on here. To really quantify risk, you have to know more than just the number of deaths at a given time or place. You also have to know how many people engaged in activities in avalanche country without dying. One clever approach is to use social media to estimate activity levels, as demonstrated in this paper.
Have fun in the mountains and stay safe!
Data and code from this post available here.
All data from WSL Institute for Snow and Avalanche Research SLF, 25 March 2016
This map shows the current status of ratifications of the Minamata Convention on Mercury. Although I update it frequently, check mercuryconvention.org for the most recent status. The map also shows countries engaged in Minamata initial assessment (MIA) and artisanal and small-scale gold mining national action plan (NAP) projects funded by the Global Environment Facility (GEF), along with the implementing agencies. Use the “Visible layers” function on the map to toggle between ratification status, MIAs, and NAPs. The full screen button, located below the zoom controls, is also useful.
Data on ratification and GEF project status from the Interim Secretariat of the Minamata Convention and UNEP. Country boundaries from Natural Earth. Mapping done in CartoDB using Robinson projection.
The terrorist attacks in Paris on November 13 brought renewed attention to the movement of refugees from Syria to the West. Unfortunately, much of this attention has been negative, despite the fact that refugees are fleeing the very brutality that was unleashed on Paris. The rhetoric from the Republican presidential candidates in the U.S. has been particularly vile. However, many people around the world continue to welcome refugees and show compassion. That’s why I made this visualization:
This map shows positive media coverage of refugees over the past 24 hours (updated hourly). Each animated marker represents one positive media mention about refugees in a particular location.
The data comes from GDELT (The Global Database of Events, Language, and Tone). GDELT’s Global Knowledge Graph monitors media in 65 languages around the world and uses algorithms to measure the emotions and tone of the texts. The map shows results on the theme of “refugees” with a tone of greater than two. Tone is the most basic GDELT parameter, and measures how positive or negative a media article is. So, for example, this article about how churches in Kansas and Nebraska are ready to help refugees is included in the dataset.
How I made the map
This map is a nice demonstration of some useful CartoDB features, such as sync tables, animation, and custom map projections.
The GDELT API query returns a GeoJSON file with all the results over the last 24 hours tagged with the “refugees” theme. Using CartoDB’s sync tables, you can set the data table to update automatically. Mine updates every hour.
I filtered the results to only include articles with a tone score of greater than two (positive coverage), and then used CartoDB’s Torque tool to create the animation with a custom marker (the heart).
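The tone filter can also be sketched outside CartoDB in a few lines of Python. This assumes each GeoJSON feature carries a numeric tone under a property named "tone"; that property name and the three sample features are hypothetical placeholders, not real GDELT output. Only features with a tone strictly greater than two are kept.

```python
import json

# Hypothetical sample: real GDELT GeoJSON may name and structure its
# properties differently; "tone" is an assumed property name.
sample = """
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "geometry": {"type": "Point", "coordinates": [-97.0, 38.5]},
   "properties": {"name": "Place A", "tone": 3.4}},
  {"type": "Feature", "geometry": {"type": "Point", "coordinates": [2.35, 48.85]},
   "properties": {"name": "Place B", "tone": -5.1}},
  {"type": "Feature", "geometry": {"type": "Point", "coordinates": [13.4, 52.5]},
   "properties": {"name": "Place C", "tone": 2.0}}
]}
"""


def positive_features(collection: dict, threshold: float = 2.0) -> dict:
    """Return a new FeatureCollection keeping only features with tone > threshold.
    Features without a tone value are dropped."""
    kept = [
        f for f in collection["features"]
        if f.get("properties", {}).get("tone", float("-inf")) > threshold
    ]
    return {"type": "FeatureCollection", "features": kept}


filtered = positive_features(json.loads(sample))
print([f["properties"]["name"] for f in filtered["features"]])
```

Here "Place C" is excluded because a tone of exactly 2.0 is not greater than two, matching the "greater than two" rule described above.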
Inspiration came from this blog post, and this tutorial was very helpful in figuring out how to use the GDELT API. You can access the data from my CartoDB page here and easily create a map of your own.
A while back I was thinking about European colonialism and the enormous impact it’s had on world history. Wouldn’t it be nice to have a simple visualization to illustrate colonization and decolonization around the world? It occurred to me that a dumbbell dot plot would work well for this task. Here’s what I came up with:
The chart shows the dates of colonization and independence of 100 current nations. The countries are organized into broad regions (Asia, Africa, and the Americas), and sorted by date of independence. Color represents the principal colonial power, generally the occupier for the greatest amount of time.
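The basic layout of a dumbbell dot plot can be sketched in a few lines of Python: one line per country running from the colonization year to the independence year, grouped by region and sorted by independence date. This text-only version is an illustration, not the R code behind the chart. The country records are hypothetical placeholders, regions are simply sorted alphabetically, and color coding by colonial power is left out.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Country:
    name: str
    region: str
    colonized: int    # chosen start year of colonization
    independent: int  # year of independence


def dumbbell_rows(countries, start=1500, end=2000, step=10):
    """Return text rows: a region header, then one 'o----o' dumbbell per country,
    sorted by independence year within each region (regions sorted by name)."""
    width = (end - start) // step + 1
    rows = []
    ordered = sorted(countries, key=lambda c: (c.region, c.independent))
    for region, group in groupby(ordered, key=lambda c: c.region):
        rows.append(f"[{region}]")
        for c in group:
            # Convert years to character positions, clamped to the axis range.
            a = min(max((c.colonized - start) // step, 0), width - 1)
            b = min(max((c.independent - start) // step, 0), width - 1)
            cells = [" "] * width
            for i in range(a, b + 1):
                cells[i] = "-"
            cells[a] = cells[b] = "o"
            rows.append(f"{c.name:<10}|{''.join(cells)}|")
    return rows


# Placeholder records for illustration only, not real colonial dates.
demo = [
    Country("Country A", "Africa", 1880, 1960),
    Country("Country B", "Africa", 1650, 1975),
    Country("Country C", "Americas", 1520, 1821),
]
for row in dumbbell_rows(demo):
    print(row)
```

In a graphical version, each dumbbell end would be a dot and the connecting line would take the color of the principal colonial power.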
There are many interesting patterns visible in the chart. For example, you can clearly see Spain’s rapid conquest of Central and South America, and then even more rapid loss of its colonies in the 1820s. The scramble for Africa in the late 19th century stands out well, as does the rapid decolonization phase of the late 1950s through early 1970s.
About the Data
To reduce complexity to a manageable level, I set some limitations on which countries to include. First, the chart shows only countries that were victims of Western European colonialism. I don’t include Ottoman, Japanese, Russian, American, or other colonial empires. I also don’t include territories that are still governed by former colonial powers (e.g. Gibraltar); this gets controversial and complicated. Countries that were uninhabited upon discovery by colonial powers are also not included, and the same goes for countries that later gained independence from a post-colonial state (e.g. South Sudan).
The dates of independence come from the CIA World Factbook (here). Dates of colonization were derived from my own research, mostly from Wikipedia country pages. I quickly found that establishing a date of colonization is a somewhat subjective decision. Do you choose the date of first European contact? Formal incorporation of the territory into the colonial empire? For the most part, I chose the date of the first permanent European settlement. Notes on the rationale for the date chosen are included in the data spreadsheet (below). In looking at the chart, it’s important to remember that in many cases colonial subjugation was a long process, moving from initial contact, to trade, conquest, settlement, and incorporation.
Constructing the Plot
I wanted to make this plot using R, but was not sure about the best approach. So I reached out on Twitter to dataviz guru and dot plot enthusiast @
The response from the #rstats and dataviz community was extremely constructive and useful. Users @, @, @, and @ all provided great examples (here, here, here, and here, respectively). In the end, I chose to adapt the approach taken by @.
A quick note on color: I chose colors from the flags of the principal colonial powers to represent them on the plot (except for the Netherlands, for which I picked orange). The idea is to make it easier for the viewer to match the color with the country without having to keep going back to the legend. I’d be interested in any reactions to this approach. In general, I’d be thrilled with any feedback on how to make this plot better.
Data and code for the plot:
Etymology of “tomato” in Europe and the Mediterranean
It’s been an extremely hot summer, which has led to a bumper crop of tomatoes. The harvest is so big that I’ve been bringing them to work to give to colleagues. I work in a very international office, and recently the discussion turned to how to say “tomato” in everyone’s native language. The results were interesting, and inspired this map (mouse over each country for more details):
The tomato plant is native to South America, but was first domesticated by the Aztecs in present-day Mexico. Their word for the fruit was tomatl*, which means something like “the swelling fruit”. The Spanish brought it to Europe in the 16th century, calling it a tomate.
Many languages still use a derivative of the Spanish word tomate, but another name arose in Italy. The Italian word for tomato is pomodoro, which came from pomo d’oro, or golden apple. Somehow** that name spread to Poland, where they say pomidor, and from there to Russian, Ukrainian, and several other languages.
A different name arose in some German dialects: Paradiesapfel, or “apple of paradise”, which, for anyone who has eaten a ripe one right off the vine, is an apt description. Although modern Germans say Tomate, Austrians call it a Paradeiser, and variants of this were adopted into Czech, Slovak, Hungarian, Serbian, and others.
In Arabic, it seems there are two common ways to say “tomato” (at least that’s what my friends tell me; I’d be happy to get feedback from any Arabic linguists out there). There’s tamatim (طماطم), which is used in North Africa. That, of course, comes from tomate. But in the Near East (Syria, Jordan, Lebanon), the common term is banadora (بندورة), from the pomo d’oro family.
It gets really interesting in Hebrew, which has a word for tomato unlike any other language. The word is agvania (עגבניה). It was coined only in 1886 and has as its root the Hebrew word for “to love, desire”. This name was chosen because of the archaic English term “love apple”, a nod to the tomato’s supposed aphrodisiac properties. More on the story of the Hebrew word here.
So there you have it. Pretty interesting for a fruit (vegetable?) only introduced to much of the world a few hundred years ago. Sources for map include Google Translate and Cultivated Vegetables of the World: A Multilingual Onomasticon, an actual book that actually exists. I made the map in CartoDB using the Watercolor base map from Stamen Design. If you want to see more etymology maps, there’s a subreddit dedicated to the topic.
And if all that hasn’t made you hungry for some apples of paradise, this will:
UPDATE: A few readers have correctly pointed out that what I have is a map of nation states, not a map of languages. For the sake of simplicity I am using national borders as a proxy for language regions. I should have specified that I selected the language for each country based on the official language, or if there is more than one, the most commonly spoken language. One negative consequence of that approach is that several stateless languages did not make it onto the map (e.g. Basque (tomate or tomatea) and Kurdish (temate)).
* More precisely, “tomatl” comes from the Nahuatl words “tomohuac” (swelling, roundness, fatness) and “atl” (water).
** I have subsequently been informed that “pomodoro” was introduced to Poland by the Italian noblewoman Bona Sforza, who became Queen of Poland by marriage in 1518.
Thanks to the members of reddit.com/r/etymologymaps for the helpful feedback and corrections
In honor of Swiss National Day I made a map of all the mountains in Switzerland accessible by public transport (cable car, gondola, cog-wheel railroad, funicular, and chairlift). With the Swiss transportation system you can get to almost all the base stations by train or bus. Having such great access to high places is one of my favorite things about Switzerland.
I made the map in CartoDB using data from a great Wikipedia page. There are about 100 peaks on the list, all with an elevation of at least 800 m, a topographic prominence of at least 30 m, and a transport station within 120 m of the summit. The highest is the Klein Matterhorn, where you can take a cable car to within 20 m of the 3,883 m summit. The current weather at the time of writing? You guessed it – snow. Here’s the webcam.
For the base map, I used OpenStreetMap Switzerland (easily done in CartoDB using XYZ map tiles). While it’s a little more cluttered than I’d like for a base map, the level of detail in the mountain areas is great. You can really zoom in to plan your trip.
One more feature I played with in CartoDB is the customizable infowindows. I added a photo and a link to the Wikipedia page of each peak so it’s more enjoyable to explore the map and use it as a tool for planning your summit assaults.
I’m going to try something new on this blog: a book review. For my younger readers, a book is an object made of a series of static screen images printed on cellulose fiber. Think of it as a collection of thousands of tweets, Snapchat screenshots, and Facebook status updates all related by a common “narrative”. Or just ask your parents.
Despite its silly title, The Grapes of Math: How Life Reflects Numbers and Numbers Reflect Life by Alex Bellos is a fascinating look at some of the most interesting developments in mathematics throughout history. Math books often come in one of two flavors. There are the hard-core textbook-style books that quickly get over my head, despite having words like “elementary” and “introduction to” in the title. Then there are the overly simplified and popularized books that lack sufficient depth or patronize readers by refusing to ever show an equation. For me, The Grapes of Math hits the sweet spot between these extremes and does an extraordinary job of providing clear explanations of some really complex and abstract math, while still challenging a numerate reader.
Grapes is divided into chapters each dedicated to a broad topic in mathematics, like number theory, power laws, trigonometry, imaginary and complex numbers, exponential functions, complex systems, etc. Each chapter covers some of the history of the ideas, some explanations of the ideas themselves, and some modern applications or research. It’s by no means a comprehensive review of the history of mathematics or even all the big ideas. But there is plenty of fascinating material, not to mention amusing anecdotes about history’s parade of quirky mathematicians and improbable discoveries.
Rather than try to summarize everything, I’ll just highlight a few of the most interesting bits (for me at least).
One of my favorite chapters, about power laws, starts by introducing Benford’s law. This law is all around us, present in many real-world datasets, but it’s so unexpected and counterintuitive that it was only discovered a little over a century ago. Benford’s law states that for many real-life datasets, the first digits of the numbers are not equally distributed as you might expect. In fact, the small digits (e.g. 1, 2, 3) occur much more frequently than the large ones (7, 8, 9). In almost any dataset that spans several orders of magnitude (and meets a few other criteria), about 30% of the first digits are one, and less than 5% are nine. This is really weird! Here is the distribution of first digits under Benford’s law:
The law was discovered by observing that the books of logarithm tables were more worn on pages with tables of numbers starting with the smaller digits. The log books phenomenon was first noticed in 1880, but then rediscovered by Frank Benford in 1938. Benford found this distribution in all sorts of totally unrelated data sets, like the populations of US cities, areas of river basins, atomic weights of the elements, even baseball statistics.
It turns out that this phenomenon is so widespread that Benford’s Law is used by forensic accountants (yes, that’s really a thing) to look for falsified or manipulated data.
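To make the numbers concrete: Benford’s law gives the probability of leading digit d as P(d) = log10(1 + 1/d), which works out to about 30.1% for 1 and 4.6% for 9. The Python sketch below computes those expected shares and compares them with the observed first-digit shares in a dataset, which is the basic comparison behind a Benford check. As test data it uses the powers of two from 2^1 to 2^1000, a sequence known to follow Benford’s law; a real forensic analysis would add a formal goodness-of-fit test, which is not shown here.

```python
import math
from collections import Counter


def benford_expected() -> dict[int, float]:
    """Benford's law: P(d) = log10(1 + 1/d) for leading digit d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x: float) -> int:
    """Leading digit of a nonzero number, e.g. 0.0372 -> 3, 4810 -> 4."""
    x = abs(x)
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)


def first_digit_shares(values) -> dict[int, float]:
    """Observed share of each leading digit among the nonzero values."""
    digits = [first_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}


expected = benford_expected()
# Powers of two span hundreds of orders of magnitude and follow Benford's law.
observed = first_digit_shares(2 ** n for n in range(1, 1001))
for d in range(1, 10):
    print(d, f"{expected[d]:.3f}", f"{observed[d]:.3f}")
```

Large gaps between the expected and observed columns are what would prompt a closer look at a dataset; small gaps alone prove nothing.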
If you want an explanation for why Benford’s Law occurs, you’ll have to read the book.
Another of my favorite parts of the book is about the famous Mandelbrot set. Stunning computer-generated images of the Mandelbrot set, like the one below, are often used to illustrate fractal geometry and chaotic behavior in numeric systems. A fractal is an object (or a set of numbers) that looks similar no matter what the scale. In nature, examples of fractal geometries include the trace of a coastline, the drainage patterns in river basins, the topography of mountain ranges and geologic fault systems.
When you approach the edges of the Mandelbrot set you see amazingly complex patterns that just keep going (and changing) no matter how far you zoom. There’s really no way to explain it in words. You need to see it:
But what is the Mandelbrot set? I must confess that before reading The Grapes of Math I didn’t really know, despite working on fractals as a graduate student in geology. Bellos gives a clear explanation of how the set is generated, which is at the same time incredibly simple and very counterintuitive. The Mandelbrot set consists of the complex numbers $c$ for which iterating the quadratic equation $z_{n+1} = z_n^2 + c$, starting from $z_0 = 0$, does not go to infinity. Those numbers are members of the set; all others are not. The pictures are generated by plotting the set on the complex plane (and sometimes adding color to the edges). That’s all it takes to generate this image of extraordinary, indeed infinite, complexity!
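The membership rule fits in a short Python sketch using the standard escape-time approach. It iterates z → z² + c from z = 0 and treats c as outside the set as soon as |z| exceeds 2, since once that happens the sequence is guaranteed to diverge. The cap of 100 iterations and the plotting window are arbitrary choices, so points very close to the boundary may be misclassified, and the output is a crude ASCII picture rather than a colored rendering.

```python
def in_mandelbrot(c: complex, max_iter: int = 100) -> bool:
    """Iterate z -> z*z + c from z = 0; treat c as a member if |z| never exceeds 2.

    A finite max_iter makes this an approximation near the boundary of the set.
    """
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2:
            return False  # escaped, so the sequence goes to infinity
    return True


def render(width: int = 60, height: int = 24) -> str:
    """ASCII picture of the set: real part -2..0.5, imaginary part -1.2..1.2."""
    rows = []
    for j in range(height):
        im = 1.2 - 2.4 * j / (height - 1)
        row = ""
        for i in range(width):
            re = -2.0 + 2.5 * i / (width - 1)
            row += "#" if in_mandelbrot(complex(re, im)) else "."
        rows.append(row)
    return "\n".join(rows)


print(render())
```

For example, c = -1 cycles between 0 and -1 forever and belongs to the set, while c = 1 runs 0, 1, 2, 5, ... off to infinity and does not.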
Despite the simplicity of the algorithm, the discovery and implications of the Mandelbrot set were far from trivial. The work of Benoit Mandelbrot (who made the first computer image of the set) helped usher in an entirely new understanding of chaos in deterministic systems.
And if you don’t remember anything about imaginary or complex numbers from high school or college math (I needed a refresher), don’t worry. The Grapes of Math does a good job of walking the reader through it. In fact, the development of imaginary numbers is an extraordinary story in and of itself.
Speaking of high school or college math, this book introduced me to a beautiful equation that I can’t believe I never learned before:
$$e^{i\pi} + 1 = 0$$
This is Euler’s identity, discovered by the brilliant Swiss mathematician Leonhard Euler. It links five of the most basic numbers in mathematics: π (pi), e (the exponential constant), i (the square root of negative one), zero, and one. Why am I only finding out about this now?
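A quick numerical check makes the identity tangible. In Python, cmath.exp(1j * cmath.pi) + 1 should come out as zero; because the floating-point value of π is only an approximation, the result is a tiny rounding residue on the order of 1e-16 rather than exactly zero.

```python
import cmath

# Euler's formula: e^(i*theta) = cos(theta) + i*sin(theta); at theta = pi this is -1.
value = cmath.exp(1j * cmath.pi) + 1

# Floating-point pi is not exact, so expect a residue near 1e-16, not an exact 0.
print(value)
print(abs(value) < 1e-15)
```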
Euler’s identity is an example of math at its most elegant and mysterious. Mathematician Benjamin Peirce once said that Euler’s identity is “absolutely paradoxical; we cannot understand it, and we don’t know what it means, but we have proved it, and therefore we know it must be the truth”.
This is a fitting description for many of the mathematical concepts discussed in The Grapes of Math. The book shows how the history of math is a progression toward greater abstraction, from what we can physically see and count and measure, to concepts like Euler’s identity that cannot be intuitively understood, only discovered through applying mathematical logic.
I highly recommend this book. It’s perfect for summer beach reading, if you’re the kind of person who likes to draw curves and equations in the wet sand by the shore of a fractal coastline.
My last post was about the 1960 Chile megathrust earthquake, and how much energy it released (about 1/3 of all seismic energy on earth over the last 100 years). I used data from USGS on all earthquakes greater than magnitude 6 from 1915-2015. Since I had this nice dataset (about 10,500 quakes), I could not resist playing around in CartoDB to make some nice visualizations.
This is an animated map of all earthquakes since 1915 using the Torque function in CartoDB. I know this has been done many times, but it makes such a striking image it’s hard to resist. If you watch closely you’ll notice that the earthquakes seem to occur more frequently towards the end of the time lapse (starting in the 1960s). That’s because seismologists got better at measuring and recording earthquakes, not because the quakes actually became more frequent.
This is a heatmap of all quakes in the dataset. The Pacific ring of fire (the arcs of subduction zones encircling much of the Pacific Ocean) dominates the global pattern. The mid-ocean spreading centers are also visible, but not as pronounced as the ring of fire. There are fewer big earthquakes in the extensional spreading centers than in the compressional subduction zones. There is also a broad zone of earthquake activity that stretches from Italy and Greece through Asia Minor, Iran, Central Asia, and the Himalayas into China. This is a huge zone of compression caused by the African, Indian, and other small plates colliding with Eurasia.
This map shows earthquake depth, with deep earthquakes in red, intermediate depth in orange, and shallow in yellow. Plotting earthquake depth on a map illustrates the geometry of subduction zones. For example, in South America, the ocean crust of the Nazca plate (under the Pacific Ocean) is subducting under the South American plate. As the Nazca plate plunges eastward at an angle, the earthquakes produced get deeper with distance to the east.
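Coloring points by depth comes down to binning each hypocenter depth into a class. The post does not give the cutoffs used, so the sketch below assumes the commonly used boundaries of 70 km and 300 km between shallow, intermediate, and deep earthquakes, and maps the classes to the same yellow/orange/red scheme as the map.

```python
# Assumed cutoffs (commonly used in seismology); the map's actual bins may differ.
SHALLOW_MAX_KM = 70
INTERMEDIATE_MAX_KM = 300

COLORS = {"shallow": "yellow", "intermediate": "orange", "deep": "red"}


def depth_class(depth_km: float) -> str:
    """Bin a hypocenter depth: shallow < 70 km, intermediate 70-300 km, deep > 300 km."""
    if depth_km < SHALLOW_MAX_KM:
        return "shallow"
    if depth_km <= INTERMEDIATE_MAX_KM:
        return "intermediate"
    return "deep"


for depth in [10, 150, 600]:
    cls = depth_class(depth)
    print(depth, cls, COLORS[cls])
```

Along a subduction zone such as the Nazca plate, applying this binning to each quake would show the colors shifting from yellow to orange to red with distance inland.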
You can pan and zoom right in the embedded maps if you are keen to explore. You can also make the maps full screen using the button on the upper left.