There's plenty of hoopla about Hadoop this week as three new solutions come to market. EMC, Hortonworks and Intel have all announced new Hadoop products.
An open-source framework for storing and processing large volumes of diverse data on a scalable cluster of servers, Hadoop has rapidly emerged as the preferred solution for Big Data analytics applications. That's because Hadoop is flexible, scalable, inexpensive and fault-tolerant, and it enjoys rapid adoption and a rich ecosystem backed by massive investment.
However, customers face high hurdles to adopting Hadoop broadly as their single data repository. Among the challenges is a lack of useful interfaces and high-level tooling for Business Intelligence and data mining, components that are critical to data analytics and to building a data-driven organization.
How are EMC, Intel and Hortonworks tackling those Big Data mountains with their Hadoop solutions? Each in its own way.
Three New Hadoop Solutions
EMC announced Pivotal HD, which features native integration of its Greenplum massively parallel processing (MPP) database with Apache Hadoop. According to EMC, the new Greenplum-developed HAWQ technology brings 10 years of large-scale data management research and development to Hadoop and delivers performance improvements of more than 100 times compared with existing SQL-like services on top of Hadoop.
Intel's entry is called the Intel Distribution for Apache Hadoop. The distribution, which includes Intel Manager for Apache Hadoop software, is built from the silicon up to deliver what Intel calls industry-leading performance, along with improved security features. Intel says it is the first distribution to provide complete encryption, with support for Intel AES New Instructions (AES-NI) in Intel Xeon processors.
Hortonworks recently released Hortonworks Data Platform (HDP) for Windows. The company says it is the first and only Hadoop-based platform available on both Windows and Linux, and it provides interoperability across Windows, Linux and Windows Azure.
EMC Turns Heads
Charles King, principal analyst at Pund-IT, told us EMC, Intel and Hortonworks are taking different approaches and targeting different audiences.
"EMC's Pivotal HD is the likely headline leader of the three, especially given the stunning 10 times to 100 times-plus performance improvements -- in concert with Greenplum's MPP database with Apache Hadoop -- it offers compared to other SQL-like services for Hadoop," King said. "But the company's new HAWQ promises to make a potentially greater impact on the commercial Big Data market."
King said that by creating a true SQL parallel database running on top of the Hadoop Distributed File System (HDFS), EMC Greenplum is extending the considerable value of Hadoop to organizations that have invested in SQL training and personnel, which is to say virtually every commercial business. That, he said, means Big Data benefits could and should become far more accessible than ever before, helping to overcome the skills shortage often associated with Big Data.
"With its new HDP for Windows, Hortonworks is taking a proletarian approach to enabling companies' Big Data aspirations by supporting solutions that are the operating environment and applications of choice among tens of thousands of businesses and other organizations," King said. "That makes perfect sense strategically given the size of the potential market. But it is also technologically sensible, since Hortonworks helped Microsoft develop its own Hadoop-compatible HDInsight Server for Windows and Windows Azure HDInsight Service offerings."
The Intel Question
The open question is how well Intel's approach will fit into a market where numerous Big Data vendors are struggling to find an identity. Pre-integrated silicon and security optimization for Hadoop seem like a no-brainer, King said, especially considering the difficulty and complexity vendors face in developing similar capabilities on their own.
"Just as clearly, many commercial Big Data players are far more interested in standing apart from the crowd than they are in being seen as complementary toward or seamlessly plugging into other Apache Hadoop distributions," King said. "In the end, that may not matter much to the myriad vendors, big and small, that believe they will benefit from Intel's Apache Hadoop Distribution and Manager software offerings."