Hadoop: Big Data Opportunity, Powered by Open Source

Re-Blogged from thevarguy.com


Wondering what comes after the cloud? Literally, usually sunshine. But metaphorically speaking, the next great frontier may well be big data. And Hadoop, an open-source project that has enjoyed ever-increasing buzz of late, will likely be at the fore as that niche evolves. If you don't know much about Hadoop, it's time to learn. Here are the basics.

Named after a toy elephant, Hadoop at first appearance may not inspire the greatest confidence among users. But it deserves to be taken seriously. As an open-source, freely distributable solution for distributed computing, it has myriad mission-critical applications in data centers across the world — particularly when it comes to crunching big data by spreading the workload across hundreds or thousands of machines, as more and more organizations are apt to do these days.
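The workload-spreading the article describes follows the MapReduce programming model at Hadoop's core: a map phase that turns each chunk of input into key/value pairs, and a reduce phase that aggregates the values for each key. As a rough illustration only (not actual Hadoop code), here is the classic word-count example collapsed into a single Python process; on a real cluster, each input split's map phase would run on a different machine:

```python
from collections import defaultdict

def map_phase(document):
    # Mapper: emit a (word, 1) pair for every word in one input split.
    for word in document.split():
        yield (word.lower(), 1)

def reduce_phase(pairs):
    # Shuffle + reduce: group the pairs by key, then sum each group's counts.
    groups = defaultdict(int)
    for key, value in pairs:
        groups[key] += value
    return dict(groups)

# In a cluster, each split is mapped on a separate node; here everything
# runs in one process purely to show the data flow.
splits = ["big data is big", "data is everywhere"]
pairs = [p for doc in splits for p in map_phase(doc)]
counts = reduce_phase(pairs)
# counts == {"big": 2, "data": 2, "is": 2, "everywhere": 1}
```

The point of the model is that neither phase needs to know how many machines are involved, which is what lets Hadoop scale the same logic from one node to thousands.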

Hadoop: Past, Present and Future

The conceptual origins of Hadoop can be traced to research published by Google, but since 2007 the Apache Software Foundation has been responsible for the software's development. In that time the framework has evolved steadily, regularly churning out new releases and new features as big data has become ever bigger and more demanding.

Hadoop’s user base, meanwhile, has expanded rapidly to include a wide variety of organizations, both large and small. Among them are dozens of big names, including Facebook, eBay and Yahoo, where much of Hadoop’s early development took place.

Yet as popular as Hadoop has already become, it sits at the center of an array of opportunities waiting to be exploited as the big-data channel develops further. In other words, the really interesting side of Hadoop isn’t the software itself — although its innards are certainly fascinating for the programming buffs out there — or its current users, but the way the big-data channel is poised to grow around it.

For one, like most open-source projects, Apache’s distribution of Hadoop is rough around the edges. That isn’t to say it’s not stable — it is — but it doesn’t come with many user-friendly frills to make life easier for organizations looking to deploy the software. And that means demand will abound for value-added solutions from third parties for setting up and maintaining Hadoop deployments. Commercial entities like Cloudera are already providing some of these services, but look for this market to expand as Hadoop grows more popular and more mission-critical.

In addition, clear demand is also developing for simpler Hadoop interface tools — at least, that’s what companies like Talend, which has made seamless integration with Hadoop a major focus of its Open Studio data-management platform, are betting. This is another area poised to grow as Hadoop deployments become more complex. After all, the last thing most IT administrators have time for is learning yet another programming language, a headache that tools like Talend’s spare them.
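Talend's tooling aside, Hadoop itself already offers one way to sidestep Java: its Streaming interface, which lets any executable that reads lines on stdin and writes lines on stdout serve as a mapper or reducer. A streaming reducer receives its input sorted by key, so it can aggregate with a single running total. Below is a hypothetical word-count pair written in Python, with Hadoop's `cat input | mapper | sort | reducer` pipeline simulated in one process for illustration:

```python
def mapper(lines):
    # Streaming mapper: read raw text lines, emit tab-separated "key\t1" lines.
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"

def reducer(sorted_lines):
    # Streaming reducer: input arrives sorted by key, so a single running
    # total per key is enough; flush it whenever the key changes.
    current, total = None, 0
    for line in sorted_lines:
        key, value = line.split("\t")
        if key != current:
            if current is not None:
                yield f"{current}\t{total}"
            current, total = key, 0
        total += int(value)
    if current is not None:
        yield f"{current}\t{total}"

# Simulate the pipeline Hadoop Streaming distributes across a cluster:
#   cat input | mapper | sort | reducer
mapped = list(mapper(["big data is big", "data is everywhere"]))
result = list(reducer(sorted(mapped)))
# result == ["big\t2", "data\t2", "everywhere\t1", "is\t2"]
```

In an actual deployment these two functions would be standalone scripts passed to the `hadoop-streaming` jar; the appeal is that the same scripts can be debugged locally with ordinary shell pipes before ever touching a cluster.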

And generally speaking, there can be little doubt that Hadoop deployments will continue to grow larger and more complex, along with data itself. Like death and taxes, big data loads will always be around — and in most cases, they will expand in scale by leaps and bounds as computing power, and the sheer accumulation of information itself, increase.

Hadoop, then, is something worth getting to know not only for organizations with lots of data to manage, but for anyone interested in preparing for what comes after the cloud is passé. Even if you don’t understand the software itself, understand its implications, because this pillar of the big-data and open-source channels may soon be as big a deal as, say, Linux itself.