This repository contains Python and Scala scripts that run the Connected Components algorithm proposed in Kiveris et al. (2015). All of them are designed to run on a cluster: the Python script requires Hadoop (it runs through Hadoop Streaming), and the Scala code requires Spark.
kelly_graph_proj.pdf describes the results of my experiments.
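
For orientation, here is a minimal single-machine sketch of the kind of algorithm the cited paper describes. It assumes the alternating large-star / small-star scheme, which may not be the exact variant the scripts in this repository implement. It runs in memory rather than on Hadoop or Spark, and the function names (`large_star`, `small_star`, `connected_components`) are hypothetical and are not taken from the repository's code.

```python
from collections import defaultdict


def _adjacency(edges):
    """Build an undirected adjacency map, ignoring self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def large_star(edges):
    """Link every strictly larger neighbour of u to the minimum of u and its neighbours."""
    adj = _adjacency(edges)
    out = set()
    for u, nbrs in adj.items():
        m = min(nbrs | {u})
        out.update((v, m) for v in nbrs if v > u)
    return out


def small_star(edges):
    """Link u and its strictly smaller neighbours to the smallest of those neighbours."""
    adj = _adjacency(edges)
    out = set()
    for u, nbrs in adj.items():
        smaller = {v for v in nbrs if v < u}
        if smaller:
            m = min(smaller)
            out.update((v, m) for v in smaller | {u} if v != m)
    return out


def connected_components(edges):
    """Return {node: smallest node id in its component} for nodes that have an edge.

    Assumption: node ids are mutually comparable (e.g. integers). Nodes that
    appear only in self-loops are dropped by this sketch.
    """
    # Store each edge as (larger endpoint, smaller endpoint).
    current = {(max(u, v), min(u, v)) for u, v in edges if u != v}
    while True:
        updated = small_star(large_star(current))
        if updated == current:  # a full round changed nothing: every component is a star
            break
        current = updated
    labels = {}
    for v, m in current:
        labels[v] = m
        labels[m] = m
    return labels
```

For example, `connected_components([(1, 5), (5, 2), (2, 4), (4, 3), (7, 9)])` labels nodes 1 to 5 with 1 and nodes 7 and 9 with 7. In a distributed setting, each of the two star operations would correspond to one map/reduce round keyed by node id; this sketch only mimics that with in-memory dictionaries.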