Saturday, February 12, 2011

Yahoo! and Hadoop

I loved this recent post from Eric Baldeschwieler on the Yahoo! Developer Blog. Eric heads up the Hadoop effort at Yahoo!, and sums up why the company has nearly 100 people working on the project. What does Yahoo! get from this investment in open source?

  • help recruiting world-class scientists;
  • help building Hadoop and new tools;
  • access to trained talent, and easier collaboration;
  • avoiding obsolescence; and
  • good will from doing good.

This was music to my ears. A year ago, while making the transition from Google to Twitter, I published a couple of posts about this exact issue. In "Learning or Earning" I wrote:

Over time, though, [the proprietary] advantage naturally erodes in relative terms. The open source stack grows ever thicker and by now includes pieces of technology like Cassandra, ZooKeeper, HDFS and Pig (and indeed the Hadoop project in general). The principles of huge-scale computing on commodity hardware are being better understood, and the open stack becomes ever-more viable for real-world work.

As the gap between commodity and proprietary narrows, the downsides of a homegrown stack become increasingly palpable. It takes longer to migrate acquired companies to your platform. There's no liquid talent market into which to tap when hiring. Maintaining a custom toolchain becomes burdensome. You risk making your engineers feel like outsiders in the broader tech community—ironically despite the hyper-advanced technology with which they work. Your existing employees may even resist or resent developing skills which aren't marketable elsewhere.
and I'm chuffed to hear directly from Yahoo! that the theory holds in practice and converts into measurable value at the company level.

At Twitter I use Hadoop to analyze interesting tweeting phenomena. I get personal value from the public community of Pig and Hadoop users, who produce amazing documentation, best practices, tips and tricks, and tutorials on the technology. You don't get that when working on a proprietary framework.
