An Easy Way to Map Data in R with Plotly

A couple of years ago, I wrote The complete n00b’s guide to mapping in R, my first adventure into R. While that tutorial still holds up, if you’re looking to make a state-level choropleth map, there really isn’t anything easier than working with Plotly in R.

Once you get R and RStudio installed and set up, there are only a few steps that you need to take. If you have a spreadsheet of state-level data, or can make one easily enough, like this ranking of mental health and access in the USA by state, you only need a couple of lines of code (minus all the comments that follow the #s).

Let’s start by getting plotly set up.
install.packages("plotly") #installs the Plotly library for R
library("plotly") #tells R you want to use the Plotly library package

I just made a CSV file by copying the information from the website. One column was “state” and the other was “rank.” Because the data was just each state and its ranking, it was simple. Warning: In this case, the states do need to be entered as postal abbreviations for this to work. Plotly can also do countries. Check out their documentation for the changes you’ll need to make.
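
If your source lists full state names rather than postal abbreviations, the state.name and state.abb vectors that ship with R can do the conversion. This is a minimal sketch with made-up rows; the data frame and its rank values are stand-ins, not the actual ranking data.

```r
# Toy stand-in for a spreadsheet that uses full state names (ranks are made up).
mh <- data.frame(state = c("Michigan", "Nebraska", "North Carolina"),
                 rank  = c(1, 2, 3),
                 stringsAsFactors = FALSE)

# state.name and state.abb are built into R and share the same order,
# so match() finds each name's position and picks the matching abbreviation.
mh$state <- state.abb[match(mh$state, state.name)]

mh$state
```

Anything that is not one of the 50 state names (the District of Columbia or a misspelling, for example) comes back as NA, so it is worth checking for NAs before plotting.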

Import your spreadsheet and give it a name to use in R
mh<- read.csv("C:\\Users\\Documents\\mentalhealth.csv", header = TRUE, sep = ",")

And we're ready to plot it!
plot_ly(type="choropleth", locations=mh$state, locationmode="USA-states", z=mh$rank) %>% layout(geo=list(scope="usa"))

Boom!
And did I mention it's interactive!

A map of the US with the states shaded in colors ranging from dark blue to yellow
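
The same kind of map can also be built by handing plot_ly the data frame and pointing at its columns with formulas, which keeps things tidy once you want to add a title. The sketch below uses a toy data frame with made-up ranks, and it only builds the map if the plotly package is installed.

```r
# Toy stand-in for the state/rank spreadsheet (rank values are made up).
mh <- data.frame(state = c("MI", "NE", "NC"),
                 rank  = c(1, 2, 3),
                 stringsAsFactors = FALSE)

# Only build the map when the plotly package is available.
if (requireNamespace("plotly", quietly = TRUE)) {
  # ~state and ~rank refer to columns of the data frame passed first.
  p <- plotly::plot_ly(mh, type = "choropleth", locations = ~state,
                       locationmode = "USA-states", z = ~rank)
  # The title text here is just a placeholder.
  p <- plotly::layout(p, geo = list(scope = "usa"),
                      title = "Toy ranking by state")
}
```

Typing p at the console afterwards should display the interactive map.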

Another Example
Because Plotly makes mapping so simple, I finally got around to looking at the geographic distribution of the All-America City Award. It started in 1949, and the city I grew up in (Grand Rapids, MI) was one of the inaugural winners. Since finding out about the award, I had been curious whether any states did particularly well. However, it was one of those curiosities that was never really worth the effort. Until Plotly made it super easy!

I copied the table straight from Wikipedia and stripped it down to just the state column. A few metropolitan areas are listed with winners from multiple states, so I split those entries, giving each state its own line in the data. I also deleted Puerto Rico (sorry Puerto Rico). I then brought in the file as above.
usacities<- read.csv("C:\\Users\\Documents\\allamericanwins.csv", header = TRUE, sep = ",")

This time, I took a few extra steps. Because my data was just a list of states over and over again (Alabama, Alabama, Wisconsin, North Carolina, Wisconsin, etc), I needed to count how many times each state was in the file. R makes it easy to generate a frequency table.
table(usacities)
(I told you it was easy)

I then made a new file out of the frequency table because that's how I roll.
write.csv(table(usacities), file = "C:\\Users\\Documents\\allamericanwinsfreq.csv")

I changed the state names to postal abbreviations and added NV and UT, which had zero wins. Then I was ready to bring in the file and plot the map.
usa<- read.csv("C:\\Users\\Documents\\allamericanwins1.csv", header = TRUE, sep = ",")
plot_ly(type="choropleth", locations=usa$state, locationmode="USA-states", z=usa$wins) %>% layout(geo=list(scope="usa"))
I suppose I could have done this at the beginning and skipped the whole writing a new document thing but hey that's hindsight for you.
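
For the curious, here is roughly what skipping the extra file could look like. This is a sketch and not what I actually ran: the rows are a toy stand-in for the list of winners, and it assumes the column holds full state names.

```r
# Toy stand-in for the one-column file of winning states (made-up rows).
usacities <- data.frame(state = c("Alabama", "Alabama", "Wisconsin",
                                  "North Carolina", "Wisconsin"),
                        stringsAsFactors = FALSE)

# Using all 50 state names as factor levels keeps states with zero wins
# in the table, so nothing has to be added by hand afterwards.
wins <- table(factor(usacities$state, levels = state.name))

# Turn the table into a data frame keyed by postal abbreviation.
usa <- data.frame(state = state.abb[match(names(wins), state.name)],
                  wins  = as.vector(wins),
                  stringsAsFactors = FALSE)

# Show only the states that won at least once.
usa[usa$wins > 0, ]
```

The resulting usa data frame has the same state and wins columns the plotting line expects.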

A map of the US with states shaded from dark red to light grey

It looks like the upper Midwest and North Carolina/Virginia are the big winners of the award. Because I wanted to get an idea of how this fit with the states’ populations, I did some more simple calculations (finding the difference between each state’s rank in number of award wins and its rank in total population). This map shows Alaska as a big winner. The Great Plains stays a winner, and this time the South is shown to be a loser when it comes to this award. New England did OK too.
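
A rank comparison like that can be sketched in a few lines. The labels and numbers below are invented purely for illustration, and the direction of the subtraction is my assumption here: a positive difference means a state ranks better on wins than on population.

```r
# Invented labels and numbers, just to show the calculation.
toy <- data.frame(state      = c("AA", "BB", "CC", "DD"),
                  wins       = c(10, 3, 7, 1),
                  population = c(2e6, 9e6, 1e6, 5e6),
                  stringsAsFactors = FALSE)

# rank() ranks from smallest to largest, so negate to make 1 = most.
toy$wins_rank <- rank(-toy$wins, ties.method = "min")
toy$pop_rank  <- rank(-toy$population, ties.method = "min")

# Positive = more wins than the state's size would suggest.
toy$rank_diff <- toy$pop_rank - toy$wins_rank

toy
```

The rank_diff column is what would go in z when plotting the map.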

A map of the US with states shaded in colors ranging from blue to red



Now go forth and create choropleth maps!

The Complete n00b’s Guide to Gephi

Because my last tutorial, The Complete n00b’s Guide to Mapping in R, received a positive response, I decided to create another beginner’s guide to visualizing data. For this edition, I’ve chosen Gephi, an excellent and simple tool to do social network analysis. This tutorial is meant to get you started quickly and provide the basics of using Gephi.

Step 1: Get set up
Download Gephi, install it, open it up and start a new project.

Step 2: Import a Spreadsheet
So you have a spreadsheet, maybe one like this list of bankers in Grand Rapids from 1902 (gleaned from Google Books). You will need to have your spreadsheet saved as a CSV file (comma-separated values).

We will be importing an “Edges” table, meaning the spreadsheet will have the necessary data to establish relationships between nodes. Your CSV file will need two columns, “Source” and “Target.” In my spreadsheet, I’ve made the following changes: Name –> Source / Bank –> Target. In my data, there is no direction of the relationship. If your data does have a specific direction you will need to carefully select which is Source and which is Target.
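
If your spreadsheet is already sitting in R, the renaming can be done there before handing the file to Gephi. This is a rough sketch: the names are invented stand-ins for the banker list, and a temporary file stands in for your own path.

```r
# Invented rows standing in for the Name/Bank spreadsheet.
bankers <- data.frame(Name = c("Person A", "Person B", "Person A"),
                      Bank = c("Bank X", "Bank X", "Bank Y"),
                      stringsAsFactors = FALSE)

# Gephi looks for columns called Source and Target in an edges table.
edges <- data.frame(Source = bankers$Name,
                    Target = bankers$Bank,
                    stringsAsFactors = FALSE)

# row.names = FALSE keeps R from adding an extra unnamed first column.
edges_file <- file.path(tempdir(), "edges.csv")
write.csv(edges, edges_file, row.names = FALSE)
```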

To import your spreadsheet, make sure you have a “Data Table” tab open (1). If you don’t, click on “Window” (2) in the menu and select “Data Table” to open it. My spreadsheet is an Edges table, so make sure you have clicked on the “Edges” (3) button on the left side of the window. This will allow you to view the edges once you import them. Then click on the “Import Spreadsheet” button (4) in the middle of the Data Table tab and browse for your file (5). Make sure the file will be imported as an edges table (6).

Screenshot of importing a spreadsheet

After hitting “Next” you will have options as to which columns to import. For my spreadsheet, it is not necessary to import the other columns so you can uncheck those boxes. Keep the “Create missing nodes” box checked and you will not need to import another spreadsheet.

Tip: If you have a separate spreadsheet (or several) of data relating to your visualization, you can repeat the above steps. If the data only relates to your nodes and not the relationships between them (edges), import it as a “Nodes” table instead of an Edges table (6 – above). When importing multiple spreadsheets, be careful not to create the same nodes twice. The best way to avoid duplication is by having an “id” column. The “id” column will tell Gephi the data is all the same nodes instead of creating new nodes each time you import it. Like the “Create missing nodes” box in Edges tables, there is a “Force nodes to be created as new ones” option when importing Nodes. If you’ve already created nodes with your edges spreadsheet, you should probably avoid creating new nodes with a nodes table.
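
One way to get a duplicate-free nodes table is to build it from the edges themselves. The sketch below does that in R with invented names; the “Kind” column is a hypothetical attribute added only to show where extra node data would go.

```r
# Invented edges, standing in for the Source/Target spreadsheet.
edges <- data.frame(Source = c("Person A", "Person B", "Person A"),
                    Target = c("Bank X", "Bank X", "Bank Y"),
                    stringsAsFactors = FALSE)

# One row per distinct name, so no node can appear twice.
ids <- unique(c(edges$Source, edges$Target))

nodes <- data.frame(Id    = ids,
                    Label = ids,  # reuse the id as the display label
                    Kind  = ifelse(ids %in% edges$Source, "person", "bank"),
                    stringsAsFactors = FALSE)

# A temporary file stands in for your own path.
nodes_file <- file.path(tempdir(), "nodes.csv")
write.csv(nodes, nodes_file, row.names = FALSE)
```

Because the Id values are exactly the names used in the edges table, Gephi should be able to match them up rather than create a second set of nodes.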

Step 3: Layout your Visualization
Make sure you have a Graph tab open (1) and then check out your visualization. It will resemble a grey blob (2). Now you can run some data analysis. Gephi has a number of options on the right side of the window. Gephi will give you short explanations after you choose to run them. For my example, I will select Eigenvector Centrality (undirected) (3) and Modularity (without weights) (4). After selecting options and running the tests a graph will appear. You can look at it or ignore it and continue, your call.
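
If you are wondering what Eigenvector Centrality actually measures, here is a tiny sketch in R. It is not how Gephi computes it internally (that is an assumption-free zone for me); it simply takes the leading eigenvector of the adjacency matrix for an invented, undirected toy network.

```r
# Invented edges: two people and two banks.
edges <- data.frame(Source = c("Person A", "Person B", "Person A"),
                    Target = c("Bank X", "Bank X", "Bank Y"),
                    stringsAsFactors = FALSE)

# Build a symmetric 0/1 adjacency matrix (undirected network).
nodes <- unique(c(edges$Source, edges$Target))
adj <- matrix(0, nrow = length(nodes), ncol = length(nodes),
              dimnames = list(nodes, nodes))
adj[cbind(edges$Source, edges$Target)] <- 1
adj[cbind(edges$Target, edges$Source)] <- 1

# The eigenvector for the largest eigenvalue gives the centrality scores;
# rescale so the best-connected node scores 1.
leading <- eigen(adj, symmetric = TRUE)$vectors[, 1]
centrality <- abs(leading) / max(abs(leading))
names(centrality) <- nodes

round(centrality, 3)
```

Person A and Bank X come out on top because each is connected to two nodes, one of which is itself well connected.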

You can also choose from a number of layouts to better organize your visualizations (5). For my spreadsheet, I’ve selected a Yifan Hu layout. Click run (6) and watch the nodes scramble to new locations.

Gephi screenshots of laying out a visualization

Step 4: Make it Pretty
Once you have tests run, you can begin altering the color and size of your nodes or edges. For my example, we will click on the Nodes button (1) to change the color (2) and size (3) of Nodes. I have selected to use Eigenvector Centrality for the size and modularity class for the color using the drop down menu (4) and clicking apply (5) after each one. Your resulting visualizations (6) will look much better.

Screenshot of changing color and size of nodes

Step 5: Apply Polish
To get a better look at your visualization, click on the “Preview” tab (1). You may need to hit “refresh” first (2). There are a number of default settings (3) that you can explore and customize.

Screenshot of Preview

Node labels are a valuable tool to help your visualization, but we have not set them up yet and they were not in our original spreadsheet. To create these labels, we will go back to the “Data Table” tab (1) and copy data (2) from “Id” to “Label” (3). Once you return to the preview tab, you can add node labels without any problem.

Screenshot of adding label

Step 6: Export
Clicking on File > Export will give you a number of options to export your image. Another useful tool to use is Sigmajs Exporter, a plugin for Gephi that will allow you to export your visualization as a dynamic webpage.

Hopefully, this tutorial has given you a quick way to understand the basics of working with Gephi. There are a number of ways to customize your visualizations so keep exploring!





Brian is a participant in the Amazon Services LLC Associates Program, an affiliate advertising program designed to provide a means for sites to earn advertising fees by advertising and linking to Amazon.com




The complete n00b’s guide to mapping in R

You should also check out the next tutorial in the series: The Complete n00b’s Guide to Gephi

A few weeks ago, I presented to the UNL DH community about a project that I’m beginning while a fellow at the CDRH’s Digital Scholarship Incubator. The project is an effort to utilize digital tools to visualize business and organizational records related to my dissertation on industrialization in small cities. During my talk, I noted I was still uncertain as to what tool to use to create my maps, but thankfully, James Austin Wehrwein was also presenting. Afterwards he suggested I consider R and check out his tutorial on creating a density map in R.

Frankly, I was blown away by how easy it was to create a map in R. His tutorial was easy to follow and acclimated me to R rather quickly. On top of this tutorial, I realized that the data I had used initially in Gephi already contained the coordinates for each geographic location and I would not need to clean up my data, reducing the number of steps even further. Convinced R was my new best friend, I began looking around for a way to create choropleth maps, which were another type of visualization I wanted in my project. I was thrilled to find someone had already done much of the heavy lifting and there was a package that made the process so easy even I could create maps without pulling my hair out.

In the interest of helping out other n00bs, I’ve posted my steps in creating these maps below:

Creating a Density Map

Packages you’ll need:

> library("ggmap", lib.loc="C:/Users/Home/Documents/R/win-library/3.1")

Import Spreadsheet:

> ph<- read.csv("C:\\Users\\Home\\Documents\\school\\shipping.csv", header = TRUE, sep = ",")

ph is just a placeholder; use whatever name you want.

Create Map:

> map<-get_map(location='united states', zoom=4, maptype='roadmap')
> ggmap(map)+geom_point(aes(x=longitude, y=latitude, size=(total.cost)), data=ph, alpha=.5)

This is all you need to do if you already have the longitude and latitude coordinates. Again, see this tutorial if you don’t already have clean data.
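
Before plotting, a quick check that the spreadsheet has the columns the plotting line expects can save some head scratching. This is a small sketch using a made-up two-row stand-in for the shipping data; the column names are the ones used in the geom_point call, and the values are invented.

```r
# Made-up stand-in for the shipping spreadsheet.
ph <- data.frame(longitude  = c(-85.67, -96.70),
                 latitude   = c(42.96, 40.81),
                 total.cost = c(120, 45))

# Columns the plotting line refers to.
required <- c("longitude", "latitude", "total.cost")
missing_cols <- setdiff(required, names(ph))

# TRUE for each row whose coordinates fall inside valid ranges.
coords_ok <- ph$longitude >= -180 & ph$longitude <= 180 &
             ph$latitude  >= -90  & ph$latitude  <= 90

missing_cols
coords_ok
```

An empty missing_cols and all-TRUE coords_ok mean the file is ready to map.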


That was easy!



Creating a Choropleth Map
This user guide is how I figured it out and has much more information than I give.

Packages you’ll need:

> library("choroplethr", lib.loc="C:/Users/Home/Documents/R/win-library/3.1")
> library("Hmisc", lib.loc="C:/Users/Home/Documents/R/win-library/3.1")

Import Spreadsheet:

> df<- read.csv("C:\\Users\\Home\\Desktop\\W1923.csv", header = TRUE, sep = ",")

You'll see that it's the same process as above; I've just switched the letters I'm using as the name to help confuse you. The beauty of choroplethr is that you don't need any latitude or longitude coordinates. The program can identify states by either full name or postal abbreviation, counties by FIPS code, and even zip codes. For your spreadsheet, you'll just need to create two columns: "value", which has your data, and "region", which is your state/county code/zip code.

To Create a Choropleth Automatically:

> choroplethr(df, "state", num_buckets = 6, title = "W1923", scaleName = "Buyers", showLabels = T, states = state.abb)

The size of the buckets will be automatically configured, but you can also have a continuous scale if you designate the number of buckets as 1. Here you should change the title and scaleName to whatever you want them to say. Note “df” tells the program the name of my spreadsheet and “state” tells the program what kind of “region” to look for in my data. Your line would read “county” or “zip” if you are not using state names.
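
If your spreadsheet came with its own column names, you can rename them in R instead of editing the file. A small sketch, assuming a made-up spreadsheet with “State” and “Buyers” columns (both names and the counts are invented for illustration):

```r
# Made-up stand-in for a spreadsheet with its own column names.
raw <- data.frame(State  = c("MI", "IL", "OH"),
                  Buyers = c(120, 45, 30),
                  stringsAsFactors = FALSE)

# Rebuild it with the two column names the mapping function looks for.
df <- data.frame(region = raw$State,
                 value  = raw$Buyers,
                 stringsAsFactors = FALSE)

# Sanity check: every region should be a real postal abbreviation.
all(df$region %in% state.abb)
```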


It’s a continuous scale!



Sizing your buckets
Now if you don’t want the program to automatically determine the size of your buckets, you can do the following:

> df.map = bind_df_to_map(df, "state")
> df.map$value = cut2(df.map$value, cuts=c(0,50,100,150,Inf))
> render_choropleth(df.map, "state", "Grand Rapids Winter Market 1923", "Buyers Attending")

Here I’ve told the program to create buckets with dividing lines at 0, 50, 100, and 150. Simply add or remove numbers here to create the desired number and size of your buckets. Also notice that the data you are pulling from has changed from df to df.map, which I created with “bind_df_to_map”. Again, “state” would be replaced with “county” or “zip” if using one of them.
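
To preview what those dividing lines do without loading any mapping packages, base R’s cut() can stand in for cut2(). This is only a sketch with made-up values, and the bucket labels may be formatted differently from what cut2() produces.

```r
# Made-up values to sort into buckets.
values <- c(3, 50, 75, 149, 150, 400)

# The same dividing lines as above.
breaks <- c(0, 50, 100, 150, Inf)

# right = FALSE makes each bucket include its lower edge,
# so 50 lands in the 50-to-100 bucket rather than the 0-to-50 one.
buckets <- cut(values, breaks = breaks, right = FALSE)

# Count how many values fall in each bucket.
table(buckets)
```

Changing the numbers in breaks and rerunning table() is a quick way to try out bucket sizes before drawing the map.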


Ta Da!



I’m still learning R and figuring out how to better improve these maps, but if you’re looking for something quick and effective these maps are hard to beat.

The only question that remains is: R you ready to give it a try?

(Be thankful I only included one “R” based pun)



