At work I wrote a document entitled "Pig for Dilettantes and Cargo-Culters". If you're the kind of person who has used Terminal.app on the Mac at least once, but knows little about distributed computing or Twitter's big data schemata, then following the steps in that document is probably the fastest way to start extracting meaningful data from the Twitter Hadoop cluster. From there you can explore, tweak the scripts, and eventually get the data that you're actually interested in.
In a similar vein I present this post. If you've never opened Terminal.app on the Mac then this probably isn't for you. If you know basically what's going on at the command line, and you're a hardy explorer/experimenter, then read on.
First of all, I presume you've read Part I and have installed Graphviz. Both are required, I'm afraid. Not strictly required is a Mac, but if you're running something other than OS X then you're likely going to need to make some small adaptations for your platform.
So, with Graphviz installed, check out the mention-graph shell script I put on pastebin. Copy it, save it to your Mac as mention-graph, chmod +x it, and you're set.
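If you'd like the setup step spelled out, here is a minimal sketch. It assumes you saved the script as mention-graph in the current directory. The function name make_runnable is made up for this illustration, and checking for Graphviz's dot command is only a rough way to confirm the install (the script itself may well call a different Graphviz tool).

```shell
#!/bin/sh
# make_runnable: hypothetical helper, not part of the mention-graph script.
# Marks the saved script as executable and warns if Graphviz looks missing.
make_runnable() {
    script=${1:-./mention-graph}

    # The script has to be saved to disk before it can be made executable.
    if [ ! -f "$script" ]; then
        echo "cannot find $script; save the pastebin script there first" >&2
        return 1
    fi

    chmod +x "$script"

    # Graphviz installs a "dot" command; warn (but carry on) if it is absent.
    if ! command -v dot >/dev/null 2>&1; then
        echo "warning: Graphviz 'dot' not found on PATH" >&2
    fi
    return 0
}
```

Calling make_runnable ./mention-graph does the same thing as the chmod +x above, plus the Graphviz check.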
Using this script, I just rendered a constellation of 50,000 live mentions from the Twitter Streaming API, 75 inches square (72 dpi), by running "./mention-graph -n 50000 -u isaach -o -v -s 75". The script dropped its output as mention-graph.png.
You need to supply your Twitter credentials: edit the above command to use your own username, and it will ask for your password. Note that this script sends them in the clear to Twitter. If this worries you then feel free to either edit the script to meet your security standards, or create a Twitter account dedicated to this kind of use, separate from your primary account.
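For the explorers: the idea behind a mention graph can be sketched in a few lines of shell. This is not the pastebin script, just an illustration under assumptions of my own choosing. The input format (one tweet per line, author and text separated by a tab) and the function name mentions_to_dot are both hypothetical. Each @mention becomes one edge in Graphviz's DOT language.

```shell
#!/bin/sh
# mentions_to_dot: hypothetical helper, not part of the mention-graph script.
# Reads lines of the assumed form "author<TAB>tweet text" on stdin and writes
# a Graphviz digraph with one edge per @mention.
mentions_to_dot() {
    awk -F '\t' '
        BEGIN { print "digraph mentions {" }
        {
            text = $2
            # Pull out each @username in turn, left to right.
            while (match(text, /@[A-Za-z0-9_]+/)) {
                # Skip the leading "@" when copying the name.
                target = substr(text, RSTART + 1, RLENGTH - 1)
                printf "  \"%s\" -> \"%s\";\n", $1, target
                # Continue searching after this mention.
                text = substr(text, RSTART + RLENGTH)
            }
        }
        END { print "}" }
    '
}

# Example: one tweet by alice mentioning two people.
printf 'alice\thello @bob and @carol\n' | mentions_to_dot
```

Piping that output into something like dot -Tpng -o mention-graph.png should render it as an image, though which Graphviz layout and options suit 50,000 mentions is a separate question.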
Next time: what this all means and how to take it further. In the meantime, let me know on Twitter how you get on.