The Complete n00b’s Guide to Gephi

Because my last tutorial, The Complete n00b’s Guide to Mapping in R, received a positive response, I decided to create another beginner’s guide to visualizing data. For this edition, I’ve chosen Gephi, an excellent and simple tool to do social network analysis. This tutorial is meant to get you started quickly and provide the basics of using Gephi.

Step 1: Get set up
Download Gephi, install it, open it up and start a new project.

Step 2: Import a Spreadsheet
So you have a spreadsheet, maybe one like this list of bankers in Grand Rapids from 1902 (gleaned from Google Books). You will need to have your spreadsheet saved as a CSV file (comma-separated values).

We will be importing an “Edges” table, meaning the spreadsheet will have the necessary data to establish relationships between nodes. Your CSV file will need two columns, “Source” and “Target.” In my spreadsheet, I’ve made the following changes: Name –> Source / Bank –> Target. In my data, there is no direction of the relationship. If your data does have a specific direction you will need to carefully select which is Source and which is Target.
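If your spreadsheet still has its original column names, a few lines of R can do the renaming before you import anything into Gephi. This is only a minimal sketch: the sample rows are made up, the “Name” and “Bank” column names are assumed from my own spreadsheet, and the output file name is a placeholder.

```r
# Hypothetical sample rows standing in for the bankers spreadsheet
bankers <- data.frame(
  Name = c("A. Smith", "B. Jones", "A. Smith"),
  Bank = c("First Bank", "First Bank", "Second Bank"),
  stringsAsFactors = FALSE
)

# Rename the columns to the headers Gephi expects for an edges table
edges <- data.frame(Source = bankers$Name, Target = bankers$Bank,
                    stringsAsFactors = FALSE)

# Write a CSV without row names so the file has only the two columns
edges_file <- file.path(tempdir(), "edges.csv")
write.csv(edges, edges_file, row.names = FALSE)
```

Swap `tempdir()` for whatever folder you want the file saved in.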

To import your spreadsheet, make sure you have a “Data Table” tab open (1). If you don’t, click on “Window” (2) in the menu and select “Data Table” to open it. My spreadsheet is an Edges table, so make sure you have clicked on the “Edges” (3) button on the left side of the window. This will allow you to view the edges once you import them. Then click on the “Import Spreadsheet” button (4) in the middle of the Data Table tab and browse for your file (5). Make sure the file will be imported as an edges table (6).

Screenshot of importing a spreadsheet

After hitting “Next” you will have options as to which columns to import. For my spreadsheet, it is not necessary to import the other columns so you can uncheck those boxes. Keep the “Create missing nodes” box checked and you will not need to import another spreadsheet.

Tip: If you have a separate spreadsheet (or several) of data relating to your visualization, you can repeat the above steps. If the data only relates to your nodes and not the relationships between them (edges), import it as a “Nodes” table instead of an Edges table (6 – above). When importing multiple spreadsheets, be careful not to create the same nodes twice. The best way to avoid duplication is by having an “id” column. The “id” column tells Gephi that the data describes the same nodes, instead of creating new nodes each time you import it. Like the “Create missing nodes” box in Edges tables, there is a “Force nodes to be created as new ones” option when importing Nodes. If you’ve already created nodes with your edges spreadsheet, you should probably avoid creating new nodes with a nodes table.
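One way to keep the ids consistent is to build the nodes table from the edges table itself, so every id in one file is guaranteed to match the other. A rough sketch in R, assuming a made-up edges table; the “Type” column is a hypothetical extra attribute, and the “Id” and “Label” headers are what I understand Gephi’s spreadsheet importer to look for.

```r
# Hypothetical edges table (Source = person, Target = bank)
edges <- data.frame(
  Source = c("A. Smith", "B. Jones", "A. Smith"),
  Target = c("First Bank", "First Bank", "Second Bank"),
  stringsAsFactors = FALSE
)

# Every distinct name that appears in either column becomes one node
ids <- unique(c(edges$Source, edges$Target))

# Use the same value for Id and Label; "Type" is an extra, made-up attribute
nodes <- data.frame(
  Id = ids,
  Label = ids,
  Type = ifelse(ids %in% edges$Source, "Person", "Bank"),
  stringsAsFactors = FALSE
)

# Save next to the edges file; the path here is only a placeholder
write.csv(nodes, file.path(tempdir(), "nodes.csv"), row.names = FALSE)
```

Because each name appears only once in `ids`, importing this file should update the existing nodes rather than add duplicates, provided the “Force nodes to be created as new ones” box stays unchecked.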

Step 3: Lay Out Your Visualization
Make sure you have a Graph tab open (1) and then check out your visualization. It will resemble a grey blob (2). Now you can run some data analysis. Gephi has a number of options on the right side of the window. Gephi will give you short explanations after you choose to run them. For my example, I will select Eigenvector Centrality (undirected) (3) and Modularity (without weights) (4). After selecting options and running the tests a graph will appear. You can look at it or ignore it and continue, your call.
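If you are curious what Eigenvector Centrality actually measures, the idea can be sketched in a few lines of R: a node scores highly when it is connected to other high-scoring nodes, which works out to the leading eigenvector of the graph’s adjacency matrix. The tiny edge list below is invented, and this is only an illustration of the concept; Gephi’s own calculation may iterate or normalize differently.

```r
# Hypothetical undirected edge list
edges <- data.frame(
  Source = c("A", "A", "A", "B"),
  Target = c("B", "C", "D", "C"),
  stringsAsFactors = FALSE
)

nodes <- sort(unique(c(edges$Source, edges$Target)))

# Build a symmetric adjacency matrix (1 = the two nodes share an edge)
adj <- matrix(0, nrow = length(nodes), ncol = length(nodes),
              dimnames = list(nodes, nodes))
adj[cbind(edges$Source, edges$Target)] <- 1
adj[cbind(edges$Target, edges$Source)] <- 1

# The eigenvector for the largest eigenvalue gives each node's score
leading <- eigen(adj, symmetric = TRUE)$vectors[, 1]

# Rescale so the most central node scores 1
centrality <- setNames(abs(leading) / max(abs(leading)), nodes)
```

In this toy graph, A touches every other node and ends up with the top score, while D, which only connects to A, ends up lowest.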

You can also choose from a number of layouts to better organize your visualizations (5). For my spreadsheet, I’ve selected a Yifan Hu layout. Click run (6) and watch the nodes scramble to new locations.

Gephi screenshots of laying out a visualization

Step 4: Make it Pretty
Once you have run the tests, you can begin altering the color and size of your nodes or edges. For my example, we will click on the Nodes button (1) to change the color (2) and size (3) of the nodes. I have chosen to use Eigenvector Centrality for the size and modularity class for the color, using the drop-down menu (4) and clicking apply (5) after each one. Your resulting visualization (6) will look much better.

Screenshot of changing color and size of nodes

Step 5: Apply Polish
To get a better look at your visualization, click on the “Preview” tab (1). You may need to hit “refresh” first (2). There are a number of default settings (3) that you can explore and customize.

Screenshot of Preview

Node labels are a valuable tool to help your visualization, but we have not set them up yet and they were not in our original spreadsheet. To create these labels, we will go back to the “Data Table” tab (1) and copy data (2) from “Id” to “Label” (3). Once you return to the preview tab, you can add node labels without any problem.

Screenshot of adding label

Step 6: Export
Clicking on File > Export will give you a number of options to export your image. Another useful tool to use is Sigmajs Exporter, a plugin for Gephi that will allow you to export your visualization as a dynamic webpage.

Hopefully, this tutorial has given you a quick way to understand the basics of working with Gephi. There are a number of ways to customize your visualizations so keep exploring!




Try Audible and Get Two Free Audiobooks

Brian is a participant in the Amazon Services LLC Associates Program, an affiliate advertising program designed to provide a means for sites to earn advertising fees by advertising and linking to Amazon.com




Forever Student Syndrome

Graduate student life has many upsides. You largely make your own schedule. You can do most of your work from any location. And you still get student discounts at the movies. But you do not want to be a graduate student forever. The life of the mind is an alluring idea, but at some point you will want a job you can explain to your relatives, a livable salary, and a retirement account.

With little oversight, no standard timetable, and vague requirements, even students who want to finish their degrees in a timely fashion have difficulty completing their programs on time. I call this affliction “Forever Student Syndrome.” This disease can strike anyone. You may even know someone right now with the ailment! They might not even know they have it! The great danger of Forever Student Syndrome is not knowing you have it. You feel like you are making advances in starting your career, but you are actually only delaying the beginning.

Much like a body whose four humors are out of balance, those afflicted with Forever Student Syndrome lack a balance between academia’s building blocks. A graduate student is a combination of student, professor, and employee. The key to being successful is progressing through the stages of your program as a student and striking a balance between research, service, and teaching while being an effective independent worker. The biggest danger of falling into Forever Student Syndrome is the feeling that you need to be perfect. Do not worry about perfection. Do good work, finish, and move on.

Though most Forever Students have a mixture of problems, I will briefly describe the main types: the Sloth, Student, Researcher, Committee Member, and Adjunct. Though obviously exaggerated, the descriptions below stem from my graduate student experience. At different times during graduate school, I have noticed parts of each one of these Forever Students in myself. You will likely also find yourself in a position as a Sloth, Student, Researcher, Committee Member, and Adjunct, or may have already. The trick to avoiding Forever Student Syndrome is recognizing the affliction and changing your ways to find balance again.

Content to watch another episode of Judge Judy rather than sit down and write, the Sloth is the most obvious kind of Forever Student. They find any excuse to avoid work, aside from pressing deadlines, and the fact that they will not begin to think of finishing their degree in the next few years paralyzes them in the present. Unlike other Forever Students, they understand that their behaviors are not beneficial. Often an early-stage graduate student, the Sloth feels like there will be plenty of time for work later. The Sloth may not be adequately prepared for treating the program as a career and may treat graduate school as a means to extend the college experience and avoid the workforce. The Sloth will likely regret their inaction later in their program, if they make it through.

Another early-stage Forever Student takes every course offering they can find. A love of learning drove them to graduate school. They find comfort and self-worth in report cards. Often a nice revenue source for the department, they resist ending coursework. Afraid to leave the structure of the classroom that has comforted them since elementary school, the Student continues searching for their professors’ approval. Feeling they need to take every relevant course offered in their field and reluctant to closely read the course requirements, they pick courses exclusively based on their interests, leaving their program of studies to another day. The Student may also get held up when faced with comprehensive exams, feeling they must master every detail of the preparation before continuing, further delaying their candidacy.

Always with their nose in a book, the Researcher went to graduate school for a love of learning and a distaste for small talk. Their lengthy publication list looks impressive at first glance. A closer look, however, reveals a number of smaller publications of questionable value. Book reviews, encyclopedia entries, and other acts of free labor with the hope of gaining some vague sense of “exposure.” Afraid of missing “something,” the Researcher feels that they cannot begin writing their dissertation until they have collected every last bit of their materials. Constantly on research trips and endlessly submitting Interlibrary Loan requests, they must find and read every piece of writing on their topic. Even that tangentially related article from the 1931 edition of the Rhode Island Review of Boats.

Eternally swamped with meetings, the Committee Member is well-respected within the department. A master of friendly chit-chat and warm smiles, the Committee Member seems like the ideal student. The inability to say the word “no,” however, bogs down the Committee Member’s day-to-day life with endless time spent discussing the number of faculty parking spaces or how to facilitate a quicker time to degree for graduate students, without actually having time for their own degree. The more enterprising Committee Member may even organize a campus group or event, while the less motivated only find themselves attending the university’s professional development workshops. All of them. Believing that constantly being “busy” is a sign of productivity, the Committee Member pads their resume with impressive service, but resists actually doing their work. After all, they have a day full of meetings on campus tomorrow.

The Adjunct spends half their time in the car, driving between colleges to teach intro-level courses and holding office hours from the driver’s seat. The most sympathetic of Forever Students, they may feel forced into the endless procession of temporary teaching positions because of financial and family obligations. At first, gaining teaching experience and making a wage that nearly covers basic living expenses with no benefits or job security feels like the beginning of a career as a professor. Quickly, however, the constant course prep, endless emails, and piles of papers to grade overwhelm any hope of finding time to do any research or writing. After years of being the starving artist, the Adjunct may find a permanent position at a teaching college impressed with their experience. Or, more likely, continue starving as they continue hoping for their big break.

At the core of Forever Student Syndrome is a lack of prioritization. In graduate school, I have learned that important and urgent are not the same thing. While all graduate students will encounter periods of their program in which they fall victim to being one type or another of a Forever Student, maintaining balance allows you to continue your progress and avoid catching a serious case of Forever Student Syndrome.





The complete n00b’s guide to mapping in R

You should also check out the next tutorial in the series: The Complete n00b’s Guide to Gephi

A few weeks ago, I presented to the UNL DH community about a project that I’m beginning while a fellow at the CDRH’s Digital Scholarship Incubator. The project is an effort to utilize digital tools to visualize business and organizational records related to my dissertation on industrialization in small cities. During my talk, I noted I was still uncertain as to what tool to use to create my maps, but thankfully, James Austin Wehrwein was also presenting. Afterwards he suggested I consider R and check out his tutorial on creating a density map in R.

Frankly, I was blown away by how easy it was to create a map in R. His tutorial was easy to follow and acclimated me to R rather quickly. On top of this tutorial, I realized that the data I had used initially in Gephi already contained the coordinates for each geographic location and I would not need to clean up my data, reducing the number of steps even further. Convinced R was my new best friend, I began looking around for a way to create choropleth maps, which were another type of visualization I wanted in my project. I was thrilled to find someone had already done much of the heavy lifting and there was a package that made the process so easy even I could create maps without pulling my hair out.

In the interest of helping out other n00bs, I’ve posted my steps in creating these maps below:

Creating a Density Map

Packages you’ll need:

> library("ggmap", lib.loc="C:/Users/Home/Documents/R/win-library/3.1")

Import Spreadsheet:

> ph <- read.csv("C:\\Users\\Home\\Documents\\school\\shipping.csv", header = TRUE, sep = ",")

ph is just a placeholder; use whatever name you want.

Create Map:

> map <- get_map(location='united states', zoom=4, maptype='roadmap')
> ggmap(map) + geom_point(aes(x=longitude, y=latitude, size=(total.cost)), data=ph, alpha=.5)

This is all you need to do if you already have the longitude and latitude coordinates. Again, see this tutorial if you don’t already have clean data.


That was easy!



Creating a Choropleth Map
This user guide is how I figured it out and has much more information than I give.

Packages you’ll need:

> library("choroplethr", lib.loc="C:/Users/Home/Documents/R/win-library/3.1")
> library("Hmisc", lib.loc="C:/Users/Home/Documents/R/win-library/3.1")

Import Spreadsheet:

> df <- read.csv("C:\\Users\\Home\\Desktop\\W1923.csv", header = TRUE, sep = ",")

You’ll see that it’s the same process as above; I’ve just switched the letters I’m using as the name to help confuse you. The beauty of choroplethr is that you don’t need any latitude or longitude coordinates. The program can identify states by either full name or postal abbreviation, counties by FIPS code, and even zip codes. For your spreadsheet, you’ll just need to create two columns: “value,” which has your data, and “region,” which is your state/county code/zip code.

To Create a Choropleth Automatically:

> choroplethr(df, "state", num_buckets = 6, title = "W1923", scaleName = "Buyers", showLabels = T, states = state.abb)

The size of the buckets will be automatically configured, but you can also have a continuous scale if you set the number of buckets to 1. Here you should change the title and scaleName to whatever you want them to say. Note that “df” tells the program the name of my spreadsheet and “state” tells the program what kind of “region” to look for in my data. Your line would read “county” or “zip” if you are not using state names.
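If your data starts out as one row per record rather than one row per state, the two-column “region” and “value” layout can be put together in base R before calling choroplethr. A small sketch with invented rows; the `buyers` table and its `state` column are assumptions for illustration, not my actual file.

```r
# Hypothetical raw records: one row per buyer, with the buyer's home state
buyers <- data.frame(
  state = c("MI", "MI", "IL", "OH", "IL", "MI"),
  stringsAsFactors = FALSE
)

# Count how many buyers came from each state
counts <- table(buyers$state)

# Reshape the counts into the "region" and "value" columns
df <- data.frame(
  region = names(counts),
  value = as.integer(counts),
  stringsAsFactors = FALSE
)
```

The resulting `df` has one row per state, which is the shape the commands in this section work from.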


It’s a continuous scale!



Sizing your buckets
Now if you don’t want the program to automatically determine the size of your buckets, you can do the following:

> df.map = bind_df_to_map(df, "state")
> df.map$value = cut2(df.map$value, cuts=c(0,50,100,150,Inf))
> render_choropleth(df.map, "state", "Grand Rapids Winter Market 1923", "Buyers Attending")

Here I’ve told the program to create buckets with dividing lines at 0, 50, 100, and 150. Simply add or subtract numbers here to create the desired number and size of your buckets. Also notice that the data you are pulling from has changed from df to df.map, which I created with “bind_df_to_map”. Again, “state” would be replaced with “county” or “zip” if you are using one of them.
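To check where your dividing lines will put each value before you draw the map, you can try the same cuts on a plain vector. The sketch below uses base R’s `cut` as a stand-in for Hmisc’s `cut2` (so it runs without extra packages), with `right = FALSE` so that each bucket includes its lower bound, which is how I understand `cut2` to treat its cuts. The state counts are invented.

```r
# Hypothetical buyer counts per state
value <- c(MI = 212, IL = 87, OH = 50, NY = 12, WI = 0)

# Same dividing lines as above; right = FALSE makes each bucket include its
# lower bound, so 50 falls in [50,100) rather than [0,50)
buckets <- cut(value, breaks = c(0, 50, 100, 150, Inf), right = FALSE)

# How many states land in each bucket
table(buckets)
```

If a bucket comes back empty or overloaded, adjust the numbers in `breaks` and run it again before rendering the map.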


Ta Da!



I’m still learning R and figuring out how to better improve these maps, but if you’re looking for something quick and effective these maps are hard to beat.

The only question that remains is: R you ready to give it a try?

(Be thankful I only included one “R” based pun)





A Reflection on Coursework

Normally, I try to blog every week, but I haven’t blogged for three months (not the most successful run in the history of blogging). Last semester was pretty busy, with organizing a conference and taking an extra course so I could finish up this summer (my final week of coursework is this week—woot). I hope the more flexible scheduling of studying for comps will allow me to return to more regular blogging.

After finishing up all my history courses this spring, I did take some time to think about my coursework experience. I learned a great deal and I think my writing improved drastically, but part of me was still a bit disappointed. I had some really good assignments, like trying to make an iPad app, constructing a digital project, writing a grant application, and literally applying theory to the real world (using Lincoln’s physical environment to explore trends in urban history). However, most of my assignments were the traditional exercises in historical writing (book review, research paper, synthetic essay). Obviously these assignments are important (I remarked to another grad student that all the papers have been preparing me for comps and ultimately the dissertation), but there was a point where I did feel a bit bored. How helpful was the 50th book review in comparison to the 15th? Why not use assignments that build similar historical skills but will aid me in other parts of being an academic not normally taught in graduate school (e.g. grant writing)? Why not challenge students to do something really non-traditional, even if they will fail (e.g. an iPad app)?

Of course, I write this blog post knowing I was a coward during coursework. I am sure if I had proposed some alternate assignment (a digital project instead of a traditional research paper for example) many of my professors would have allowed me. I’ll admit, I was lazy in that regard, as doing something innovative is usually more taxing than cranking out yet another paper. I like to rationalize it by saying grad school is a marathon (I mean after three years of coursework, I’m closer to the halfway point than the finish line). But like I said, that’s my cowardice talking.

I look forward to getting into the research on my dissertation, but even then I’ve gotten the impression from many others that finishing the dissertation is the most important part of writing it. Again, that seems to disincentivize innovation. Why take longer on a non-traditional dissertation that some departments may not even appreciate? While I want to do at least some research that is creative, I can only really hope I’m brave enough.





My View on Historical Reenactments

I’ll just admit it up front: I’m not a big fan of historical reenactments. I always tend to look at them the way this Monty Python sketch portrays them. That being said, reenactments are not innately bad, just very hard to do well. In a pure sense, reenactments are another attempt at understanding the world of the past, just as academic scholarship should attempt to do. The problem seems to be in execution. Bad reenactments can innocently allow specific details, like clothing, to overtake the importance of understanding the meanings of the event being reenacted or, more sinisterly, whitewash history with patently wrong interpretations. With an admittedly pessimistic view towards reenactments, I break them down into three main categories (in my experience):

Reenacting the plainness of the everyday

Certainly, these demonstrations sanitize history by rarely showing the ugly side of the everyday, but I still call this category, on average, the least harmful kind of reenactment, though also the least useful. In this category, I am primarily thinking of the places that have “living histories” of the 17th, 18th, or 19th centuries. Grade school kids often go and see a man working at a blacksmith shop and a woman making clothes and have a sack lunch outside. That has largely been my experience with these types of reenactments.

However, even everyday activities can horribly mangle the historical record. The Golden Era Society, a group that celebrates the “dress, music, manners and lifestyles” of the 1920-1950s “Golden Era,” at best ignores (at worst cherishes) the rampant racial, class, and sex discrimination of the period, not to mention that it places the Great Depression within the “Golden Era.” This sort of ahistoricism, even when restricted to a small group of people obsessed with very small slices of the past, endangers the historical memory. Keep in mind that while it may seem silly to call the 1920-1950s a “Golden Era,” many people do not object to calling much of the same time period home to the “Greatest Generation.”

Reenacting the ugliness of the everyday

While attempting to tackle the ugliness of the past is a tricky assignment, these reenactments hold the potential for the most benefit. Tanya Roth has a great piece about a recent reenactment of a slave auction in St. Louis. Understanding the impossibility of truly reenacting a slave auction, the event opened with a historical framing and ended with an open forum, which likely helped guide those in attendance to a useful educational experience. In her piece, Tanya raises the important point that while Civil War reenactments are frequent (see the next category), reenactments of the ugly side of the Civil War Era (i.e. slavery) are few and far between.

The slave auction last week was imperfect in its execution, but I think it drew important attention to an issue that seems to be getting buried in the politics of popular culture. We’re about to embark on a four-year commemoration of one of the most popular events in American history: sometimes it seems that everyone’s a Civil War armchair historian.

As we spend these four years remembering our past, where will slavery fit in to the narrative – and what will the placement of slavery in that narrative tell us about modern America?

Another somewhat recent event that I would place in this category was the sixth grade class in South Carolina that “simulated” the discrimination against Jews in 1930s Germany. Though it was reportedly conducted well, I still have my reservations. I am sure reading “No Jews and Dogs Allowed” is eye-opening for a middle-school student and I am sure these students understand discrimination better after the exercise. However, the dehumanization and persecution that Jews experienced was nothing near what you could (humanely) simulate. Good teaching should try to cover this gap between reality and experience, like the slave auction’s introduction and forum. I just worry that the gap is not being covered in every classroom that may try a similar exercise.

Most revealing, however, is that the South Carolina school simulated 1930s Germany when teaching the Jim Crow South would have been a more powerful lesson. Removing the problem of discrimination to another place (and frankly another time) obscures the United States’ long history of dehumanization, discrimination, and infliction of terror. To be fair, I am sure there are valid reasons for choosing 1930s Germany, among them the fact that I doubt parents would be alright with the activity if it had been the Jim Crow South that was being “simulated,” even though the activity would have proceeded in essentially the same way.

Reenacting war

I am sure there are plenty of good groups of upstanding citizens reenacting battles for good and educational reasons, but reenactments of war are the most common example of reenactments and provide the richest example of bad reenactments.

The details battle reenactors often cherish (uniforms, guns, supplies) are better suited to reenacting the everyday, not an extraordinary circumstance that is primarily about killing, maiming, and other not-so-nice intentions. Too many of these groups are not educational, but only concerned with guns and clothing, as if wearing the same clothes and carrying the same items somehow replicates the experience of war (I’ve never been in war, but I would imagine some sort of fearing for your life is included in the experience). Instead of accurately representing much of anything, too many of these groups obscure large issues behind war (like imprisoning or killing people based on race), which only does a disservice to society (and its history).

My rant being somewhat complete, commemorating war is still a very common phenomenon in countries across the globe, but how some of these groups commemorate war only butchers history. I will point you towards this group of Italian reenactors of a Louisiana infantry division, partly just because I found it so strange: the group’s homepage asks you if you want the Italian version of the site (with an Italian flag) or the English (with a Confederate flag).

Like I said in the beginning, historical reenactments are not innately bad, but when the “HISTORY OF THE REAL 14th LOUISIANA INFANTRY” only lists the battles in which the group participated, it is fair to call the reenactments bad history and a negative educational experience (sorry, history is much more than names, dates, and numbers). An even more appalling example was the Congressman who served as a Nazi reenactor. At best he completely misses the historical reality of the SS (which raises the question of whether this man is intelligent enough to help lead the country) and at worst tries to use “history” to cover up bigotry (which prompts the same question).

A final disclaimer and conclusion

I will repeat it once more as a disclaimer, though I may have already lost any pro-reenactments readers, historical reenactments are not innately bad. Reenactments are another way people try to replicate the historical experiences. Just as the words of historians attempt to understand and convey the historical experience, so do these reenactments. The problem of reenactments comes when the present day experience of reenacting becomes more important than the message being sent to the audience. Reenactments need to understand and acknowledge they are not perfect and seek only to convey the important aspects of the history they are representing. The meanings of the Civil War are important, not the guns used or clothing worn.



