Saturday, August 27, 2011

DIY @mention constellations, Part IV

So far there have been Parts I, II and III, and now we're going to take a look at applying Graphviz to large Twitter mention graphs.

Getting the data is easy enough: see the embedded 39-line Python script in the sample code. Say you run that script to collect 50,000 random mentions from the Twitter firehose. Then, having sort'd and uniq'd the output and added a one-line header and a one-line footer, you have, like I do, a 47,375-line file which begins with

digraph mentions {
"0001am_" -> "kaaly_"
"000eca000" -> "000eca000"
"000eca000" -> "kira_moka"
"000parra" -> "amolosflips_"
"00_dag" -> "nishinoakihiro"
"00alliesmeaton" -> "caitlanpratt"
"00kuro" -> "sena1029"
"00nelht" -> "gvwriters"
"00rico00" -> "tsubo0307"

and ends with

"zwackleby" -> "gauravh1"
"zxicee" -> "parnnnparnnn"
"zyhafiyah" -> "amyshaheera"
"zyhnlyh" -> "elsaaps"
"zymecca" -> "wowkonyol"
"zz0_ee" -> "becky_aisha"
"zzangfia" -> "somin_somu"
"zzz_ho" -> "dewwanna"
"zzzoob" -> "ko5712"

that is, you have a directed graph in DOT format representing the unique mentions amongst a random sample of 50,000 from Twitter.
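
If you'd rather do the sort, uniq, header and footer steps in one go, here is a minimal Python sketch. It is not the 39-line script from the sample code: the function name `mentions_to_dot` and the idea of passing in a list of (source, target) pairs are my own assumptions for illustration, and the footer is taken to be the closing brace that DOT requires. Python's string ordering may also differ slightly from what your locale's `sort` gives you.

```python
def mentions_to_dot(pairs, graph_name="mentions"):
    """Return DOT text for the unique (source, target) mention pairs.

    `pairs` is a hypothetical input: an iterable of (source, target)
    username tuples, one per mention, duplicates allowed.
    """
    # set() drops duplicate mentions (the "uniq" step);
    # sorted() orders them (the "sort" step).
    unique_sorted = sorted(set(pairs))

    lines = ["digraph %s {" % graph_name]          # one-line header
    for source, target in unique_sorted:
        # Twitter usernames are letters, digits and underscores,
        # so no escaping of the quoted names is attempted here.
        lines.append('"%s" -> "%s"' % (source, target))
    lines.append("}")                              # one-line footer
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [
        ("000eca000", "kira_moka"),
        ("0001am_", "kaaly_"),
        ("000eca000", "000eca000"),
        ("0001am_", "kaaly_"),  # duplicate mention, removed by set()
    ]
    print(mentions_to_dot(sample), end="")
```

Writing the returned text to `mentions.gv` would give a file of the same shape as the one above, just a lot shorter.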

Having seen how simple Graphviz is, you'll probably render a graph directly from this file, with a command like

sfdp -Gbgcolor=black -Ncolor=white -Ecolor=white -Nwidth=0.02 \
    -Nheight=0.02 -Nfixedsize=true -Nlabel='' -Earrowsize=0.4 \
    -Gsize=75 -Gratio=fill -Tpng mentions.gv > mentions.png

from which you'd get this image, after waiting a long, long time (on my machine, about four hours):
Full Mention Graph
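
If you want to drive that same command from Python and time it, something like the following sketch would do. The function names `build_sfdp_command` and `render` are hypothetical, the flags are copied from the command above, and it assumes `sfdp` is on your PATH.

```python
import subprocess
import time


def build_sfdp_command(dot_path):
    """Return the sfdp argument list used in the post, for `dot_path`."""
    return [
        "sfdp",
        "-Gbgcolor=black",
        "-Ncolor=white",
        "-Ecolor=white",
        "-Nwidth=0.02",
        "-Nheight=0.02",
        "-Nfixedsize=true",
        "-Nlabel=",          # what the shell passes for -Nlabel=''
        "-Earrowsize=0.4",
        "-Gsize=75",
        "-Gratio=fill",
        "-Tpng",
        dot_path,
    ]


def render(dot_path, png_path):
    """Run sfdp, write the PNG to `png_path`, return elapsed seconds."""
    command = build_sfdp_command(dot_path)
    started = time.time()
    # sfdp writes the image to stdout, so redirect stdout to the file,
    # just as `> mentions.png` does in the shell.
    with open(png_path, "wb") as png_file:
        subprocess.run(command, stdout=png_file, check=True)
    return time.time() - started
```

Calling `render("mentions.gv", "mentions.png")` should behave like the shell command, so expect it to take just as long on a graph of this size.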

Most likely you'd be agog, like I was when I ran this process for the first time. And then, like I did, you'd wonder how to make it faster and how to get rid of the fairly dull stuff round the edge. You might also wonder, like I did, what's inside that blob in the center.

That's for next time.

Next: Part V
