Because my last tutorial, The Complete n00b’s Guide to Mapping in R, received a positive response, I decided to create another beginner’s guide to visualizing data. For this edition, I’ve chosen Gephi, an excellent and easy-to-use tool for social network analysis. This tutorial is meant to get you started quickly and cover the basics of using Gephi.
Step 1: Get set up
Download Gephi, install it, open it up and start a new project.
Step 2: Import a Spreadsheet
So you have a spreadsheet, maybe one like this list of bankers in Grand Rapids from 1902 (gleaned from Google Books). You will need to save your spreadsheet as a CSV (comma-separated values) file.
We will be importing an “Edges” table, meaning the spreadsheet contains the data needed to establish relationships between nodes. Your CSV file will need two columns, “Source” and “Target.” In my spreadsheet, I renamed the columns as follows: Name –> Source / Bank –> Target. In my data, the relationships have no direction. If your data does have a specific direction, you will need to choose carefully which column is Source and which is Target.
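You may prefer to script the column renaming instead of doing it by hand. Below is a minimal Python sketch that uses only the standard library. It turns rows with “Name” and “Bank” columns into the two-column Source/Target CSV described above. The sample rows are hypothetical placeholders, not the actual 1902 data. The default column names are assumptions, so adjust them to match your own file.

```python
import csv
import io


def to_edges_csv(rows, source_col="Name", target_col="Bank"):
    """Return CSV text with only the Source and Target columns for an edges import."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    for row in rows:
        # Extra columns (like "Title" below) are simply dropped.
        writer.writerow([row[source_col], row[target_col]])
    return buf.getvalue()


# Hypothetical sample rows standing in for the original spreadsheet.
rows = [
    {"Name": "A. Smith", "Bank": "First National", "Title": "President"},
    {"Name": "B. Jones", "Bank": "First National", "Title": "Cashier"},
    {"Name": "A. Smith", "Bank": "City Trust", "Title": "Director"},
]

print(to_edges_csv(rows))
```

To save the result, write the returned text to a file such as `edges.csv` (a hypothetical name) and import that file in Gephi.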
To import your spreadsheet, make sure you have a “Data Table” tab open (1). If you don’t, click “Window” (2) in the menu and select “Data Table” to open it. My spreadsheet is an Edges table, so make sure you have clicked the “Edges” button (3) on the left side of the window; this lets you view the edges once you import them. Then click the “Import Spreadsheet” button (4) in the middle of the Data Table tab and browse for your file (5). Make sure the file will be imported as an edges table (6).
After hitting “Next,” you can choose which columns to import. For my spreadsheet, the other columns are not needed, so you can uncheck those boxes. Keep the “Create missing nodes” box checked: Gephi will then create a node for every name and bank listed in the edges, so you will not need to import a separate nodes spreadsheet.
Tip: If you have separate spreadsheets of data relating to your visualization, you can repeat the above steps. If the data describes only your nodes and not the relationships between them (edges), import it as a “Nodes” table instead of an Edges table (6, above). When importing multiple spreadsheets, be careful not to create the same nodes twice. The best way to avoid duplication is to include an “id” column. It tells Gephi that the rows refer to nodes that already exist, instead of creating new nodes each time you import. Edges tables have a “Create missing nodes” box, and Nodes imports have a similar “Force nodes to be created as new ones” option. If you’ve already created nodes with your edges spreadsheet, you should probably leave that option unchecked so the nodes table does not create duplicates.
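If you do import a separate Nodes table, one way to avoid duplicates is to generate it from the same Source and Target values. That way every id matches a node Gephi already knows. The Python sketch below fills “Label” with the id and adds a hypothetical “Kind” column (person or bank) as example node data. Both are illustrative choices, not something Gephi requires. The header “Id” follows the column name Gephi shows in its Data Table.

```python
import csv
import io


def nodes_csv_from_edges(edges):
    """Build a nodes CSV with one row per unique id found in (source, target) pairs."""
    kinds = {}  # insertion-ordered: each id is recorded once, the first time it appears
    for source, target in edges:
        kinds.setdefault(source, "person")  # hypothetical: sources are people
        kinds.setdefault(target, "bank")    # hypothetical: targets are banks
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Id", "Label", "Kind"])
    for node_id, kind in kinds.items():
        writer.writerow([node_id, node_id, kind])  # Label simply repeats the Id
    return buf.getvalue()


# Hypothetical edges matching the Source/Target columns of an edges file.
edges = [
    ("A. Smith", "First National"),
    ("B. Jones", "First National"),
    ("A. Smith", "City Trust"),
]

print(nodes_csv_from_edges(edges))
```

"A. Smith" and "First National" each appear twice in the edges but only once in the output. Importing this table therefore adds attributes to existing nodes rather than creating duplicates.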
Step 3: Layout your Visualization
Make sure you have a Graph tab open (1) and then check out your visualization. It will resemble a grey blob (2). Now you can run some data analysis. Gephi offers a number of statistics on the right side of the window and gives you a short explanation of each one after you run it. For my example, I will select Eigenvector Centrality (undirected) (3) and Modularity (without weights) (4). After you choose the options and run each test, a report with a chart will appear. You can look at it or close it and continue; your call.
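Gephi computes these statistics for you, but a small sketch can clarify what the numbers mean. Eigenvector centrality scores a node highly when it is connected to other high-scoring nodes. Modularity measures how much denser the connections inside each community are than you would expect by chance.

The Python below is a minimal illustration under several assumptions, not Gephi's implementation:

- Centrality is normalized so the largest score is 1.
- Power iteration uses the adjacency matrix plus the identity, which keeps bipartite graphs (such as people-to-banks) from oscillating.
- `modularity` only scores a partition you supply. Gephi's Modularity statistic goes further and searches for the communities itself; its documentation credits the Louvain method.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from (source, target) pairs; self-loops are skipped."""
    adj = defaultdict(set)
    for s, t in edges:
        if s != t:
            adj[s].add(t)
            adj[t].add(s)
    return adj


def eigenvector_centrality(edges, iterations=100, tol=1e-9):
    """Power iteration on (A + I); scores are scaled so the maximum is 1."""
    adj = build_adjacency(edges)
    if not adj:
        return {}
    x = {n: 1.0 for n in adj}
    for _ in range(iterations):
        # New score = own score + sum of neighbours' scores, i.e. (A + I) x.
        new = {n: x[n] + sum(x[m] for m in adj[n]) for n in adj}
        top = max(new.values())
        new = {n: v / top for n, v in new.items()}
        if max(abs(new[n] - x[n]) for n in adj) < tol:
            return new
        x = new
    return x


def modularity(edges, community):
    """Q = sum over communities c of (L_c / m - (d_c / 2m)^2).

    L_c is the number of edges inside c, d_c the total degree of c's nodes,
    and m the number of distinct undirected edges.
    """
    adj = build_adjacency(edges)
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    degree_sum = defaultdict(float)
    for n, nbrs in adj.items():
        degree_sum[community[n]] += len(nbrs)
    intra = defaultdict(float)
    # Count each undirected edge once, ignoring duplicates and self-loops.
    for s, t in {tuple(sorted(e)) for e in edges if e[0] != e[1]}:
        if community[s] == community[t]:
            intra[community[s]] += 1
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two triangles joined by a single edge (c-d): a toy graph, not the banker data.
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"),
             ("d", "e"), ("e", "f"), ("f", "d"), ("c", "d")]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

print(eigenvector_centrality(toy_edges))
print(modularity(toy_edges, split))
```

On this toy graph, splitting the nodes into the two triangles gives a modularity of 6/7 − 1/2 ≈ 0.36. Putting every node in one community gives 0. In the centrality scores, the bridge nodes c and d come out highest.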
You can also choose from a number of layouts to better organize your visualizations (5). For my spreadsheet, I’ve selected a Yifan Hu layout. Click run (6) and watch the nodes scramble to new locations.
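Yifan Hu, like most of Gephi's layouts, is force-directed. Connected nodes pull toward each other, all nodes push each other apart, and the positions settle once the forces balance. The sketch below is a simplified Fruchterman-Reingold-style layout in plain Python, not the Yifan Hu algorithm itself, which adds refinements for speed and quality. Gephi also offers a Fruchterman Reingold layout in the same menu. The starting area, maximum step, cooling rate, and seed are arbitrary illustrative choices.

```python
import math
import random


def force_layout(nodes, edges, iterations=200, seed=1):
    """Simplified Fruchterman-Reingold layout; returns {node: (x, y)}."""
    nodes = list(nodes)
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    k = math.sqrt(4.0 / len(nodes))  # ideal edge length for a 2 x 2 starting area
    temperature = 0.2                # maximum distance a node may move per step
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion: every pair of nodes pushes apart with force k^2 / d.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Attraction: each edge pulls its endpoints together with force d^2 / k.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each node along its net force, capped by the current temperature.
        for n in nodes:
            dx, dy = disp[n]
            d = math.hypot(dx, dy)
            if d > 0:
                step = min(d, temperature)
                pos[n][0] += dx / d * step
                pos[n][1] += dy / d * step
        temperature *= 0.97  # cool down so the layout settles
    return {n: (x, y) for n, (x, y) in pos.items()}


toy_nodes = ["a", "b", "c", "d", "e", "f"]
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"),
             ("d", "e"), ("e", "f"), ("f", "d"), ("c", "d")]
print(force_layout(toy_nodes, toy_edges))
```

The shrinking temperature gives the "nodes scramble, then settle" behaviour you see when you click Run. Every pair of nodes is compared on every step, which is fine for a few hundred nodes but slow for large graphs. Faster layouts such as Yifan Hu exist to avoid that cost.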
Step 4: Make it Pretty
Once you have run the tests, you can begin altering the color and size of your nodes or edges. For my example, click the Nodes button (1) to change the color (2) and size (3) of the nodes. Using the drop-down menu (4), I selected Eigenvector Centrality for the size and Modularity Class for the color, clicking Apply (5) after each one. The resulting visualization (6) will look much better than the grey blob.
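Sizing by a ranking means mapping each node's score onto a range between a minimum and maximum size. Coloring by partition means giving each modularity class its own color. Below is a minimal Python sketch of both ideas. The linear interpolation, the 10 to 50 size range, and the hex palette are assumptions for illustration. In Gephi itself, you choose the size range and colors in the panel.

```python
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"]  # hypothetical colors


def scale_sizes(scores, min_size=10.0, max_size=50.0):
    """Linearly map scores onto [min_size, max_size]; equal scores all get min_size."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {
        n: min_size if span == 0 else min_size + (v - lo) / span * (max_size - min_size)
        for n, v in scores.items()
    }


def color_by_class(classes, palette=PALETTE):
    """Give each distinct class one palette color, cycling if there are more classes than colors."""
    ordered = sorted(set(classes.values()))
    lookup = {c: palette[i % len(palette)] for i, c in enumerate(ordered)}
    return {n: lookup[c] for n, c in classes.items()}


# Hypothetical centrality scores and modularity classes for three nodes.
print(scale_sizes({"a": 0.5, "b": 1.0, "c": 0.75}))
print(color_by_class({"a": 0, "b": 1, "c": 0}))
```

A linear mapping keeps size differences proportional to score differences. A curved mapping can make small differences among low-ranked nodes easier to see, at the cost of that proportionality.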
Step 5: Apply Polish
To get a better look at your visualization, click on the “Preview” tab (1). You may need to hit “refresh” first (2). There are a number of default settings (3) that you can explore and customize.
Node labels make a visualization much easier to read. Our original spreadsheet had no label column, though, so we have not set them up yet. To create these labels, go back to the “Data Table” tab (1) and copy data (2) from the “Id” column to the “Label” column (3). When you return to the Preview tab, you can turn on node labels without any problem.
Step 6: Export
Clicking File > Export gives you a number of options for exporting your image. Another useful tool is Sigmajs Exporter, a Gephi plugin that exports your visualization as an interactive webpage.
Hopefully, this tutorial has given you a quick way to understand the basics of working with Gephi. There are many ways to customize your visualizations, so keep exploring!
Brian is a participant in the Amazon Services LLC Associates Program, an affiliate advertising program designed to provide a means for sites to earn advertising fees by advertising and linking to Amazon.com