An Easy Way to Map Data in R with Plotly

A couple of years ago, I wrote The complete n00bs guide to mapping in R, my first adventure into R. While that tutorial still holds up, if you’re looking to make a state-level choropleth map, there really isn’t anything easier than working with Plotly in R.

Once you get R and RStudio installed and set up, there are only a few steps you need to take. If you have a spreadsheet of state-level data, or can make one easily enough, like this ranking of mental health and access in the USA by state, you only need a couple of lines of code (minus all the comments that follow the #s).

Let’s start by getting plotly set up.
install.packages("plotly") #installs the Plotly library for R
library("plotly") #tells R you want to use the Plotly library package

I just made a CSV file by copying the information from the website. One column was “state” and the other was “rank.” Because the data was simply each state and its ranking, this was straightforward. Warning: in this case, the states do need to be entered as postal abbreviations for this to work. Plotly can also map countries; check out their documentation for the changes you’ll need to make.

Import your spreadsheet and give it a name to use in R:
mh<- read.csv("C:\\Users\\Documents\\mentalhealth.csv", header = TRUE, sep = ",")

And we're ready to plot it!
plot_ly(type="choropleth", locations=mh$state, locationmode="USA-states", z=mh$rank) %>% layout(geo=list(scope="usa"))

And did I mention it's interactive!

A map of the US with the states shaded in colors ranging from dark blue to yellow
Click for a full size map

Another Example
Because Plotly makes the mapping so simple, I finally got around to looking at the geographic distribution of the All-America City Award. The award started in 1949, and the city I grew up in (Grand Rapids, MI) was one of the inaugural winners. Since finding out about the award, I’d been curious whether there were any states that did particularly well. However, it was one of those curiosities that was never really worth the effort. Until Plotly made it super easy!

I copied the table straight from Wikipedia and stripped it down to just the state column. A few metropolitan areas are listed with winners in multiple states, so I split those entries, giving each state its own line in the data. I also deleted Puerto Rico (sorry, Puerto Rico). I then brought in the file as above.
usacities<- read.csv("C:\\Users\\Documents\\allamericanwins.csv", header = TRUE, sep = ",")

This time, I took a few extra steps. Because my data was just a list of states over and over again (Alabama, Alabama, Wisconsin, North Carolina, Wisconsin, etc.), I needed to count how many times each state appeared in the file. R makes it easy to generate a frequency table.
(I told you it was easy)
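The counting step can be sketched in a couple of lines; the state values here are made up just to show the shape of the data:

```r
# Toy stand-in for the column of repeated state abbreviations in my CSV
states <- c("AL", "AL", "WI", "NC", "WI")

# table() counts how many times each value appears
freq <- table(states)

# Turn the counts into a data frame with "state" and "wins" columns
wins <- as.data.frame(freq, stringsAsFactors = FALSE)
names(wins) <- c("state", "wins")
```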

I then made a new file out of the frequency table because that's how I roll.
write.csv(table(usacities), file = "C:\\Users\\Documents\\allamericanwinsfreq.csv")

I changed the state names to postal abbreviations and added NV and UT, which had zero wins. Then I was ready to bring in the file and plot the map.
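Incidentally, if you’d rather not edit the names by hand, base R’s built-in state.name and state.abb vectors can do the conversion for you (a quick sketch with made-up rows; my actual file was edited manually):

```r
# Base R ships with matching state.name and state.abb vectors,
# so matching full names against state.name yields the abbreviations
full_names <- c("Alaska", "Michigan", "Texas")
abbrevs <- state.abb[match(full_names, state.name)]
```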
usa<- read.csv("C:\\Users\\Documents\\allamericanwins1.csv", header = TRUE, sep = ",")
plot_ly(type="choropleth", locations=usa$state, locationmode="USA-states", z=usa$wins) %>% layout(geo=list(scope="usa"))
I suppose I could have done the counting and renaming at the beginning and skipped the whole writing-a-new-document thing, but hey, that's hindsight for you.

A map of the US with states shaded from dark red to light grey
Click for a full size map

It looks like the upper Midwest and North Carolina/Virginia are the big winners of the award. Because I wanted to get an idea of how this fit with each state's population, I did some more simple calculations (finding the difference between each state's rank in number of award wins and its rank in total population). This map shows Alaska as a big winner. The Great Plains stays a winner, and this time the South is shown to be a loser when it comes to this award. New England did OK too.
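The rank-difference calculation can be sketched in base R; the numbers here are invented purely for illustration:

```r
# Invented numbers, just to show the shape of the calculation
wins       <- c(AK = 8, MI = 12, TX = 5)    # award wins per state
population <- c(AK = 0.7, MI = 10, TX = 29) # population in millions

# Rank both (1 = highest), then subtract: positive values mean a state
# wins the award more often than its population rank would predict
win_rank <- rank(-wins)
pop_rank <- rank(-population)
overperformance <- pop_rank - win_rank
```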

A map of the US with states shaded in colors ranging from blue to red
Click for a full size map

Now go forth and create choropleth maps!

The Complete n00b’s Guide to Gephi

Because my last tutorial, The Complete n00b’s Guide to Mapping in R, received a positive response, I decided to create another beginner’s guide to visualizing data. For this edition, I’ve chosen Gephi, an excellent and simple tool to do social network analysis. This tutorial is meant to get you started quickly and provide the basics of using Gephi.

Step 1: Get set up
Download Gephi, install it, open it up and start a new project.

Step 2: Import a Spreadsheet
So you have a spreadsheet, maybe one like this list of bankers in Grand Rapids from 1902 (gleaned from Google Books). You will need to have your spreadsheet saved as a CSV file (comma-separated values).

We will be importing an “Edges” table, meaning the spreadsheet will have the necessary data to establish relationships between nodes. Your CSV file will need two columns, “Source” and “Target.” In my spreadsheet, I’ve made the following changes: Name –> Source / Bank –> Target. In my data, there is no direction of the relationship. If your data does have a specific direction you will need to carefully select which is Source and which is Target.

To import your spreadsheet, make sure you have a “Data Table” tab open (1). If you don’t, click on “Window” (2) in the menu and select “Data Table” to open it. My spreadsheet is an Edges table, so make sure you have clicked on the “Edges” (3) button on the left side of the window. This will allow you to view the edges once you import them. Then click on the “Import Spreadsheet” button (4) in the middle of the Data Table tab and browse for your file (5). Make sure the file will be imported as an edges table (6).

Screenshot of importing a spreadsheet

After hitting “Next” you will have options as to which columns to import. For my spreadsheet, it is not necessary to import the other columns so you can uncheck those boxes. Keep the “Create missing nodes” box checked and you will not need to import another spreadsheet.

Tip: If you have a separate spreadsheet (or spreadsheets) of data relating to your visualization, you can repeat the above steps. If the data only relates to your nodes and not the relationships between them (edges), save it as a “Nodes” table instead of an Edges table (6 – above). When importing multiple spreadsheets, be careful not to create nodes twice. The best way to avoid duplication is by having an “id” column. The “id” column tells Gephi the data all refers to the same nodes instead of creating new nodes each time you import it. Like the “Create missing nodes” box in Edges tables, there is a “Force nodes to be created as new ones” option when importing Nodes. If you’ve already created nodes with your edges spreadsheet, you should probably avoid creating new nodes with a nodes table.

Step 3: Layout your Visualization
Make sure you have a Graph tab open (1) and then check out your visualization. It will resemble a grey blob (2). Now you can run some data analysis. Gephi has a number of options on the right side of the window. Gephi will give you short explanations after you choose to run them. For my example, I will select Eigenvector Centrality (undirected) (3) and Modularity (without weights) (4). After selecting options and running the tests a graph will appear. You can look at it or ignore it and continue, your call.

You can also choose from a number of layouts to better organize your visualizations (5). For my spreadsheet, I’ve selected a Yifan Hu layout. Click run (6) and watch the nodes scramble to new locations.

Gephi screenshots of laying out a visualization

Step 4: Make it Pretty
Once you have tests run, you can begin altering the color and size of your nodes or edges. For my example, we will click on the Nodes button (1) to change the color (2) and size (3) of Nodes. I have selected to use Eigenvector Centrality for the size and modularity class for the color using the drop down menu (4) and clicking apply (5) after each one. Your resulting visualizations (6) will look much better.

Screenshot of changing color and size of nodes

Step 5: Apply Polish
To get a better look at your visualization, click on the “Preview” tab (1). You may need to hit “refresh” first (2). There are a number of default settings (3) that you can explore and customize.

Screenshot of Preview

Node labels are a valuable tool to help your visualization, but we have not set them up yet and they were not in our original spreadsheet. To create these labels, we will go back to the “Data Table” tab (1) and copy data (2) from “Id” to “Label” (3). Once you return to the preview tab, you can add node labels without any problem.

Screenshot of adding label

Step 6: Export
Clicking on File > Export will give you a number of options to export your image. Another useful tool to use is Sigmajs Exporter, a plugin for Gephi that will allow you to export your visualization as a dynamic webpage.

Hopefully, this tutorial has given you a quick way to understand the basics of working with Gephi. There are a number of ways to customize your visualizations so keep exploring!


What makes your city famous?

Last weekend, I was watching Clueless and looked up Pismo Beach, California on Wikipedia after Cher spearheads the disaster relief efforts for that city. I was pleasantly surprised to find that Pismo Beach claims to be the “Clam Capital of the World.” My dissertation examines the identities of cities claiming to be the “Capital of the World” in various industries during the Gilded Age and Progressive Era and I recently released a digital project examining how Grand Rapids took on the identity of “Furniture City”. City identities, particularly ones based on economics, have been an interest of mine for a while now and often produce lovably quirky nicknames and slogans.

Wondering what other cities and towns integrate specific products into their urban identity, I am launching a project where I ask What Makes Your City Famous? I have created a simple form that asks for the City, Zip Code, City Nickname or Slogan and Source (where I can verify the information). Once the information is collected, I use Google Fusion Tables to create an interactive map that displays the cities and their nicknames. I chose Google Fusion Tables because of the ease in transitioning from a Google Form to Map and used zip codes to help cities display multiple nicknames. I populated the map with a few cities to start, but hope that this project will draw on user knowledge from across the country to uncover a variety of local identities.

So please, tell me What Makes Your City Famous?

Constructing Furniture City

Last year, I had a wonderful opportunity to be one of the initial fellows of the Center for Digital Research in the Humanities’ Digital Scholarship Incubator. I pitched an ambitious agenda during which I would create many varied visualizations, all of which would evaluate the industrial ability of certain cities during the Gilded Age and Progressive Era. My final product, however, is quite different.

A few weeks into the incubator, I presented at a “spring showcase” for student projects where I briefly discussed some initial maps and some of the issues I encountered working with quantitative data and qualitative concepts. What struck me from the audience’s very helpful comments and questions was that I needed more context, both in terms of my historical narrative and argument as well as my methodology and the thought process behind my editorial decisions. The search for context would deeply shift my focus throughout the course of my time as a fellow.

In terms of needing more historical context, I eventually settled on building a project framework that would integrate visualizations into narrative. Traditional scholarly questions would drive my digital research. This led to the creation of Constructing Furniture City, the project that houses my work developed while part of the Incubator, and its parent project The Rise and Fall of the American Small City, which will house my various dissertation-related digital history projects.

When developing the narrative projects, I carved out substantial space devoted to explaining my methodology. By allowing myself to expand on the editorial decisions behind each visualization, I believe that I have expanded my audience. An expanded methodology opens my project to those interested in the digital questions as well as those interested in the historical ones.

In presenting Constructing Furniture City to the “fall showcase,” I emphasized this journey from a project focused primarily on visualization production to one of historical narrative. While both types of projects have their merits, my final project works to bridge the gap between them, allowing users to explore the history, the methods, or both.

Quantifying Prestige

As with any scholarly project, in my dissertation on the development of small cities during the Gilded Age and Progressive Era I need to explain why it matters. I argue these cities are worth the time and effort of a dissertation because they provide a different narrative of urbanization and industrialization. Key to this alternative narrative is the dominant role of niche industries within each city. The cities built an urban identity around these industries, often claiming to be the “capital of the world” in crafting a certain product. In addition to being catchy, these city slogans are actually quite central to my argument. As part of my work as a Digital Scholarship Incubator Fellow at UNL’s CDRH, I decided to use business and organizational records to attempt to see whether or not my case studies were in fact leaders in their respective industries. Rather quickly, I found calculating prestige would be more difficult than I had thought. While many sources frequently discuss these cities as industry leaders, quantifying this anecdotal evidence is a more complex project.

I began with Grand Rapids, in part because the sources I had were the easiest to copy into spreadsheets. My initial data set was the attendance records of the city’s furniture markets, during which buyers would travel to see new products. These furniture markets were key in building up Grand Rapids as a leader in the industry. Though the markets dated back to 1878, my records began in 1923 (I have the numbers for many later years, but I decided to use 1933 as a cutoff date because my dissertation’s focus ends around the Great Depression). Working with the numbers as spreadsheets, I noticed a decline as the economy worsened, as I expected from prior research about the city and its industry.

I also noticed the most buyers attended from Michigan and nearby states: Illinois, Pennsylvania, New York, and Ohio. Although not terribly surprising, it did raise an important question: were these markets a sign of influence on the industry’s national stage, or were they simply large regional events? Could they be both? As I presented my early stage research to the UNL DH community and toyed with mapping the data, the question became more ambiguous. Cities like Grand Rapids did have a larger market share than their population size would indicate, but the city’s reputation was built on quality, not quantity. I had set out to measure abstract concepts like reputation, influence, and prestige while using very concrete numbers.

This dilemma became painfully clear as I created choropleth maps from the furniture market data. While thankfully R allowed me a quick and easy way to create these maps, how I colored them deeply affected the way in which the reader would perceive the data.

Continuous Scale Map

Using a sliding color scale, the dominance of Michigan and the surrounding states is clear. The vast majority of the nation remains close to zero while the hundreds of delegates from a few select states clearly dominated the markets. However, breaking the map into buckets shows a larger base from which the markets pulled buyers, suggesting it may be more than a purely regional event.

Map with Buckets

Even determining the size of the buckets was a difficult judgment call. How many buyers from each state does it take to make the event “national”? Obviously, the hundreds from Michigan are noteworthy, but what about the three dozen from Oklahoma? Are they insignificant? The next step seems to be weighting the attendance records by population, furniture production/consumption, or some other metric. While I ponder how to proceed with this question of measuring reputation, for the time being I’m moving on to working with spreadsheet data for other visualizations, but I would greatly welcome any advice (in the comments, via email, Twitter, etc.).

The complete n00b’s guide to mapping in R

You should also check out the next tutorial in the series: The Complete n00b’s Guide to Gephi

A few weeks ago, I presented to the UNL DH community about a project that I’m beginning while a fellow at the CDRH’s Digital Scholarship Incubator. The project is an effort to utilize digital tools to visualize business and organizational records related to my dissertation on industrialization in small cities. During my talk, I noted I was still uncertain as to what tool to use to create my maps, but thankfully, James Austin Wehrwein was also presenting. Afterwards he suggested I consider R and check out his tutorial on creating a density map in R.

Frankly, I was blown away by how easy it was to create a map in R. His tutorial was easy to follow and acclimated me to R rather quickly. On top of this tutorial, I realized that the data I had used initially in Gephi already contained the coordinates for each geographic location and I would not need to clean up my data, reducing the number of steps even further. Convinced R was my new best friend, I began looking around for a way to create choropleth maps, which were another type of visualization I wanted in my project. I was thrilled to find someone had already done much of the heavy lifting and there was a package that made the process so easy even I could create maps without pulling my hair out.

In the interest of helping out other n00bs, I’ve posted my steps for creating these maps below:

Creating a Density Map

Packages you’ll need:

> library("ggmap", lib.loc="C:/Users/Home/Documents/R/win-library/3.1")

Import Spreadsheet:

> ph<- read.csv("C:\\Users\\Home\\Documents\\school\\shipping.csv", header = TRUE, sep = ",")

ph is just a placeholder; use whatever name you want.

Create Map:

> map<-get_map(location='united states', zoom=4, maptype='roadmap')
> ggmap(map)+geom_point(aes(x=longitude, y=latitude, size=(total.cost)), data=ph, alpha=.5)

This is all you need to do if you already have the longitude and latitude coordinates. Again, see this tutorial on creating a density map in R if you don’t already have clean data.

That was easy!

Creating a Choropleth Map
This user guide is how I figured it out, and it has much more information than I give here.

Packages you’ll need:

> library("choroplethr", lib.loc="C:/Users/Home/Documents/R/win-library/3.1")
> library("Hmisc", lib.loc="C:/Users/Home/Documents/R/win-library/3.1")

Import Spreadsheet:

> df<- read.csv("C:\\Users\\Home\\Desktop\\W1923.csv", header = TRUE, sep = ",")

You'll see that it's the same process as above; I've just switched the letters I'm using as the name to help confuse you. The beauty of choroplethr is that you don't need any latitude or longitude coordinates. The program can identify states by either full name or postal abbreviation, counties by FIPS code, and even zip codes. For your spreadsheet, you'll just need to create two columns: "value", which has your data, and "region", which is your state/county code/zip code.

To Create a Choropleth Automatically:

> choroplethr(df, "state", num_buckets = 6, title = "W1923", scaleName = "Buyers", showLabels = T)

The size of the buckets will be automatically configured, but you can also have a continuous scale if you designate the number of buckets as 1. Here you should change the title and scaleName to whatever you want them to say. Note that “df” tells the program the name of my spreadsheet and “state” tells the program what kind of “region” to look for in my data. Your line would read “county” or “zip” if you are not using state names.
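If it helps, here’s a minimal sketch of the data frame shape choroplethr expects, built inline rather than read from a CSV (the values are invented):

```r
# choroplethr looks for exactly these two columns: "region" and "value"
df <- data.frame(
  region = c("nebraska", "michigan", "ohio"),
  value  = c(12, 340, 95)
)
```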

It’s a continuous scale!

Sizing your buckets
Now if you don’t want the program to automatically determine the size of your buckets, you can do the following:

> df_map = bind_df_to_map(df, "state")
> df_map$value = cut2(df_map$value, cuts=c(0,50,100,150,Inf))
> render_choropleth(df_map, "state", "Grand Rapids Winter Market 1923", "Buyers Attending")

Here I’ve told the program to create buckets with dividing lines at 0, 50, 100, and 150. Simply add or subtract numbers here to create the desired number and size of buckets. Also notice that the data you are pulling from has changed from df to df_map, which I created with “bind_df_to_map”. Again, “state” would be replaced with “county” or “zip” if using one of them.
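If you don’t want to pull in Hmisc, base R’s cut() does the same bucketing idea (a sketch with made-up buyer counts, not part of the original tutorial):

```r
values <- c(3, 47, 88, 152, 410)  # made-up buyer counts per state

# Same dividing lines as the cut2() call above: 0, 50, 100, 150, and beyond
buckets <- cut(values, breaks = c(0, 50, 100, 150, Inf))
```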

Ta Da!

I’m still learning R and figuring out how to better improve these maps, but if you’re looking for something quick and effective these maps are hard to beat.

The only question that remains is: R you ready to give it a try?

(Be thankful I only included one “R” based pun)


Lincoln Eats (and Drinks)

With people in town for DH 2013, I thought I’d try to be useful and offer some quick impressions of local restaurants and bars, in hopes that visitors leave thinking Nebraska has things other than chain sandwich shops (seriously, there are way too many downtown).

General Geography
“O” Street: a dozen or so square blocks located directly south of UNL; undergrad-focused places, so quick lunch spots and bars with cheap drinks
Haymarket District: a slightly smaller area just west of campus, a bit fancier on the whole; date-night locations and hangouts for people with jobs, so think sit-down meals and cocktails

“O” Street
Grateful Greens: Big salads made to order and sandwiches, get the salad (duh)
The Sultan’s Kite: If Chipotle had a gyro making sibling, get the falafel
The Watering Hole: Best wings in town and very good burgers, but not great if you’re a vegetarian, get the grilled wings
Bisonwitches: A good spot for soup and sandwiches, get a half sandwich and cup of soup (your choice)
Misty’s: Go here if you have someone footing the bill for your food, get a steak (it’s Nebraska after all)
The Dish: See above
Yia Yia’s: Specialty pizzas and a large beer selection, but beware of hipsters, get a PBR tall boy (so you blend in)
Barrymore’s: It’s hidden down an alley, but it’s worth looking for, one of the nicer bars in the O street area, a good place to get a cocktail
Zen’s: Similar to Barrymore’s but the entrance is on the street
The Zoo Bar: Best place to catch live music in town, get a drink that goes well with blues music

Haymarket District

Bread & Cup: Delicious food, freshly prepared, get everything (seriously, go there and order one of everything)
Maggie’s Vegetarian Wraps: More than just wraps, lots of vegan options, get something vegetarian, but make sure to bring cash
Ivanna Cone: Go here for your after dinner cone of ice cream, get the strangest flavor you can find on the menu
Lazlo’s: A menu of typical Applebee’s-ish American food but with their own brewery, get something to match your choice of beer
Buzzard Billy’s: A sports bar with some southern flair, get the alligator
The Oven: One of the nicer restaurants in Lincoln, great Indian food and extensive wine list, get a bunch of things to share but make sure to order the soup
Starlight Lounge: A retro themed bar located below street level, great cocktails, get a martini
The Cask: I just found this place, a smaller bar with a good selection of liquor, get a whiskey

For the More Adventurous
Valentino’s: Local chain of pizza, get the bacon cheeseburger pizza delivered or go to the mega-buffet on 70th Street
Ming’s House: Hands down my favorite place to eat in Lincoln, it’s outside of downtown and doesn’t deliver so it only really works if you have access to a car, I’ve spent literally hundreds of dollars on Kung Pao chicken and crab rangoons while living here
Heoya: One of Lincoln’s food trucks, so you’ll have to check out their Facebook or Twitter accounts to track them down, but they’re worth the effort, Asian fusion type food, get a few different types of the tacos and firecracker rangoons

Bars where you’ll be likely to run into a grad student
Duffy’s, O’Rourkes: Dive-ier bars but cheap drinks and friendly people, two of my more frequent hangouts
The Tavern: Smaller bar with a nice area for sitting outside, no draft beer which always bugs me but a good selection of whiskeys and bottled beer

Suggestions from Twitter [Updated section]
Blue Orchid: Best Thai food in town, located downtown
HF Crave: Locally sourced burgers, located in South Lincoln (outside of downtown)

DH Forum

Attending part of UNL’s Digital Humanities Forum last Friday, a rather simple concept struck me as deeply important. As scholars, how certain are we of our conclusions? What percentage? Using a specific measurement, can we express our certainty?

In a sense, historical arguments are mostly circumstantial. Historians use sources to describe societies, ideas, and events, but complete reconstruction or replication is impossible. Instead, we build a case for our arguments with supporting evidence to convince our audience; that’s why there are so many large monographs with extensive citations.

Fields in which replication of experiments is possible seem to write less. My limited interaction with social scientists, for example, suggests a big difference in the main form of scholarly publication: books and some articles for historians, articles and maybe some books for other fields. At the core, we are doing much of the same work: collecting data to build an argument about some experience. Historians, though, rely on examples from large collections of sources instead of running experiments. While most social scientists will quantify their results through statistics, most historians utilize more qualitative methods.

Certainly, historians must not abandon narrative. In fact, if anything, historical scholarship needs more narrative. But wouldn’t it be nice to have a quantifiable measure of certainty supporting an argument? Quantitative history was fashionable for a short period of time, but using complex statistics correctly and writing a rich narrative are difficult enough to do on their own, let alone together.

Though a more quantitative argument could prove alienating to those unfamiliar with statistics, I think it would actually provide scholars with an easier way to engage the public and distinguish their work from less historically rigorous popular history books and wild claims. Obviously, these are preliminary thoughts (with no supporting statistics), but the question of certainty is one historians must not take for granted.

Is DH Hipster?

As a self-described digital humanist with admittedly hipster tendencies (I have a record player after all) this question may be entirely self-serving. However, I’m not the first person to put the two together, so I thought I would throw the comparison out there.

Hipsters like organic and local.
DHers like open access and open source.

Hipsters like old things (record players, typewriters, old clothing etc).
DHers like old things (especially the historians).

Hipsters like new technology (i.e. Apple products).
DHers like new technology (it’s the “digital” part).

Hipsters listen to music you’ve never heard of.
DHers have #dhmusic.

Hipsters fight over who is a hipster.
DHers fight over who is in DH.

Hipsters “tend to have obscure, intellectual or artsy college degrees” and “have a certain attitude — a blend of indifference, sleepiness and snobbery”.
DHers went to grad school. And I mean if this guy’s profile picture doesn’t epitomize a blend of indifference, sleepiness and snobbery, I don’t know what does.


Every couple of months it seems that one of my friends teases me about one of my first blog posts [re-posted here]. I’ll admit liking Pomplamoose is pretty hipster, but hey, I like the music. I also really like their success in going around the traditional gatekeepers of the music industry. They first gained success by posting music on YouTube (free of charge, of course), but what really struck me was when I saw that half of Pomplamoose (Nataly Dawn) raised over $100,000 to record a solo album after asking for $20,000 on Kickstarter. Then a gaming company raised over $1 million for a game console in 8 hours (that’s over $2,000 a minute). Clearly people will pay to fund the things they like. And this got me thinking: would people fund my research? Would someone give me $25 to present at a conference if I sent them the conference paper and talked with them about it online? Would someone give me $50 to give a sample run of my presentation to them over Skype? Would someone give me $100 for research if I thanked them in an article or my dissertation?

Well, Kickstarter doesn’t allow “fund my life” projects, which is what asking for funds for research or conference presentations would likely be classified as. And anyways there are already funds for travel and research in academia (from departments, institutions, and external agencies), right? Of course, but it always seems like humanities funding is hanging by a thread. The humanities always need to justify their importance in the changing modern world and what better way to justify your work than to have the public prove it with their support?

Kickstarter is filled with people self-publishing books, magazines, films, albums and other things. While self-publishing is a big no-no in academia, could a journal get funded through a Kickstarter project? (I’d be interested in hearing about it if one has.) What if someone made an academic version of Kickstarter? Maybe only projects with built-in fanbases of buffs (like the American Civil War) would get funded, but I’m sure those historians wouldn’t mind. One thing that I have put into many different blog posts is that many people love history, and academic historians seem unable or unwilling to take advantage of this. Browsing Kickstarter, I just see so much money being moved around for creative projects and I can’t help but think academics are missing out on a powerful funding force (the people).