The Wu-Tang Clan Network

December 10, 2018

All code can be found here

In honor of Wu-Tang Clan day, I dug out this old post I created for one of my grad school classes. We were learning about graph analysis and this is what I put together. All of the code is in Python, using the networkx and graphlab packages. Graphlab is pretty great.

For this project, we were tasked with exploring and analyzing a bi-modal network. It was important to me to choose a data set that I was familiar with. Familiarity with the data helped me make sure the calculations produced coherent results.

So why Wu-Tang Clan? The Wu-Tang Clan is a rap group that formed in the early 1990s in Staten Island, New York. There are 9 members of the group and they all went on to produce solo albums with varying degrees of success. A feature of almost all rap albums is that there are many collaborators. Using the Wu-Tang Clan and its members’ discography, it is possible to create an artist network that can be explored and analyzed.

Analyzing this network should allow us to answer some Wu-Tang related questions.

I was a teacher for 10 years. This data set is a good introductory data set because it is small, the data analysis should dovetail with the real life Wu-Tang story, and it is original. And also, for me it’s personal.

I moved to Staten Island in 1993. This was the same year that the Wu-Tang released their platinum-selling debut album, Enter the Wu-Tang (36 Chambers). Before moving to Staten Island’s North Shore, I lived in a small town in upstate New York where most of the kids were listening to Alt-Rock. The music that the kids from Staten Island were listening to was radically different. I don’t really listen to rap music anymore, but in the mid-90s on Staten Island the Wu-Tang was where it was at.

#read the file
import pandas as pd

#use the raw file URL; the github.com/.../tree/... page returns HTML, not CSV
url = "https://raw.githubusercontent.com/capstat/postups/master/wutang/data/wutang.csv"
wutang = pd.read_csv(url, encoding='utf8')

I manually created a CSV file that lists every artist featured on every Wu-Tang album or Wu-Tang member’s album from 1993 to 2007. For each artist on every album, the number of appearances is listed along with some information about the album (Year, Number of Songs, and the RIAA certification - Gold, Platinum, Multi-Platinum, None). I decided not to scrape the internet for this information because the data set was small and I believed I would have spent just as much time coding and checking the results as just typing.

I stopped collecting data after 2007 because Wikipedia stopped listing the Wu-Tang member featured on each song after 2007. If I wanted to continue the data collection, I could get the information for their penultimate album from other websites. However, I decided that I had collected enough data for the aim of this project. Also, it would be pointless to try to collect the info from their most recent album. They only created one copy and the track info is still unknown.

The edge weights are the proportion of an album’s songs that an artist appears on. The artist of a solo album was assigned a weight of 1.0. This assumes that the solo artist appears on every track of their own album. This assumption is not true. For example, only Raekwon raps on the track “The Faster Blade” from Ghostface’s Ironman album. I was not about to listen to every song to verify that the solo artists were on all of their own tracks. I think if I did this again, I would scrape the lyrics and info from here.

#filter out any 0 appearances (copy so the new column below is added to a real frame, not a view)
wutang = wutang.loc[wutang['Appearances'] != 0].copy()
#create edge weights
wutang['App_Prop'] = wutang['Appearances']/wutang['Songs_on_Album']
wutang[wutang['App_Prop'] != 1].sort_values('App_Prop', ascending=False).head()

Above are the artists that appeared on the highest proportion of songs on an album, excluding artists’ own solo albums.

From 1993 to 2007, members of the Wu-Tang Clan put out 33 albums either as a group or as solo artists. Astonishingly, from 1993 to 1998, the Wu-Tang Clan released 9 albums, all of which were certified Gold by the RIAA and 6 of which were certified Platinum. The only rapper with more Platinum albums is Eminem, and it took him 20 years.

Let’s take a look at the Wu-Tang universe!

import graphlab as gl

sf = gl.SFrame(wutang)
g = gl.SGraph()
g = g.add_edges(sf,

g.show(vlabel='__id', vlabel_hover=True, highlight=sf['Artist'])
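
For anyone without graphlab, here is a rough sketch of how the same kind of bi-modal network could be put together with networkx. This is not the code from the original project. The rows below are made-up placeholders (not values from wutang.csv), the 'Album' column name is an assumption, and build_bimodal_graph is a hypothetical helper written just for this example. Only 'Artist' and 'App_Prop' come from the code above.

```python
import networkx as nx
import pandas as pd
from networkx.algorithms import bipartite

# Placeholder rows for illustration only; NOT taken from wutang.csv.
# 'Album' is an assumed column name; 'Artist' and 'App_Prop' appear in the post.
toy = pd.DataFrame({
    "Artist": ["Artist A", "Artist B", "Artist A", "Artist C"],
    "Album": ["Album X", "Album X", "Album Y", "Album Y"],
    "App_Prop": [1.0, 0.25, 0.5, 1.0],
})


def build_bimodal_graph(df, artist_col="Artist", album_col="Album",
                        weight_col="App_Prop"):
    """Hypothetical helper: one node set for artists, one for albums."""
    g = nx.Graph()
    for artist, album, weight in zip(df[artist_col], df[album_col],
                                     df[weight_col]):
        # Tag node ids with their mode so an artist and a self-titled
        # album can never collide.
        artist_node = ("artist", artist)
        album_node = ("album", album)
        g.add_node(artist_node, bipartite=0)
        g.add_node(album_node, bipartite=1)
        # Edge weight = proportion of the album's songs the artist is on.
        g.add_edge(artist_node, album_node, weight=float(weight))
    return g


g = build_bimodal_graph(toy)

# Collapse the two-mode graph to artists only: two artists are linked
# if they share at least one album, weighted by the number of shared albums.
artist_nodes = [n for n, d in g.nodes(data=True) if d["bipartite"] == 0]
artist_net = bipartite.weighted_projected_graph(g, artist_nodes)
```

The artist-only projection is the one-mode "who worked with whom" view. Note that the projection counts shared albums and ignores the App_Prop weights, so it is a simplification of the weighted network described above.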