EMC just rolled out version 4.2 of its Greenplum Database. Available now, the latest iteration of the in-database analytics platform promises new levels of Big Data integration.
Practically speaking, EMC said that means enterprises can run massive-scale, mission-critical analysis faster than with the previous version of the software. The idea is to boost analytic productivity, business value and the speed of business decision-making.
EMC aims to accomplish those goals by adding the following to version 4.2: a high-performance gNet for Hadoop; language and compatibility enhancements for faster migrations; simpler, scalable backup with EMC Data Domain Boost; an extension framework and turnkey in-database analytics; and targeted performance optimizations.
A Hadoop First
Let's start with the Hadoop play. Greenplum 4.2 now enables parallel import and export of data, both compressed and uncompressed, to and from Hadoop using gNet for Hadoop, a parallel communications transport. According to EMC, this marks the first direct query interoperability between Greenplum Database and Hadoop.
Another new feature is advanced integration with EMC Data Domain de-duplication storage systems via EMC Data Domain Boost. EMC claims this makes backups faster and more efficient.
Essentially, the integration distributes parts of the de-duplication process to Greenplum database servers. That makes it possible to send only unique data to the Data Domain system, which increases aggregate throughput and reduces the amount of data transferred over the network. It also eliminates the need to create and manage virtual drives.
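The idea of doing part of the de-duplication work on the database servers, so that only unique data crosses the network, can be illustrated with a minimal Python sketch. This is a generic illustration of source-side de-duplication under simple assumptions (fixed-size chunks, SHA-256 fingerprints, an in-memory stand-in for the storage system); it is not the DD Boost protocol or API, and the names `DedupTarget`, `backup` and `restore` are hypothetical.

```python
import hashlib

CHUNK_SIZE = 8  # deliberately tiny so the example is easy to trace


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut the byte stream into fixed-size chunks (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class DedupTarget:
    """Hypothetical in-memory stand-in for a de-duplicating storage system."""

    def __init__(self) -> None:
        self.store: dict[str, bytes] = {}  # fingerprint -> chunk bytes

    def missing(self, fingerprints: list[str]) -> set[str]:
        """Report which fingerprints the target has never stored."""
        return {fp for fp in fingerprints if fp not in self.store}

    def put(self, fp: str, chunk: bytes) -> None:
        self.store[fp] = chunk


def backup(data: bytes, target: DedupTarget) -> tuple[int, list[str]]:
    """Fingerprint on the source side; ship only chunks the target lacks.

    Returns the number of bytes actually sent and the ordered list of
    fingerprints (the "recipe") needed to rebuild the stream later.
    """
    chunks = split_chunks(data)
    fps = [hashlib.sha256(c).hexdigest() for c in chunks]
    needed = target.missing(fps)  # ask once which chunks are new
    sent = 0
    for fp, c in zip(fps, chunks):
        if fp in needed:
            target.put(fp, c)
            needed.discard(fp)  # a repeat within this stream is not sent again
            sent += len(c)
    return sent, fps


def restore(recipe: list[str], target: DedupTarget) -> bytes:
    """Reassemble the original stream from the stored chunks."""
    return b"".join(target.store[fp] for fp in recipe)
```

Backing up `b"ABCDEFGH" * 4` to an empty target sends only 8 of its 32 bytes, because all four chunks are identical; backing up the same data a second time sends nothing, yet `restore` still rebuilds the full stream from the recipe.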
With release 4.2, Greenplum also offers turnkey in-database analytics via Greenplum extensions. Release 4.2 also supports dynamic partition elimination, which aims to reduce the amount of data scanned for a query, and query memory optimization. EMC said these features accelerate query processing and enable greater concurrency.
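Dynamic partition elimination can be illustrated with a small Python sketch. In this hypothetical example (the monthly partitioning scheme, the function names and the data are illustrative, not Greenplum internals), the set of dates a query needs is known only at run time, for instance because it is produced by the other side of a join, so the executor works out which partitions could contain matches and scans only those.

```python
from collections import defaultdict
from datetime import date


def partition_key(d: date) -> tuple[int, int]:
    """Hypothetical monthly range partitioning: one partition per (year, month)."""
    return (d.year, d.month)


def load_partitioned(rows):
    """Place (sale_date, amount) rows into their monthly partitions."""
    parts = defaultdict(list)
    for sale_date, amount in rows:
        parts[partition_key(sale_date)].append((sale_date, amount))
    return parts


def join_with_elimination(parts, wanted_dates):
    """Join fact rows against a set of dates that is known only at run time.

    Pruning happens during execution: the partition keys are derived from
    the runtime values, and every other partition is skipped entirely.
    Returns (matching rows, number of partitions scanned).
    """
    wanted = set(wanted_dates)
    keys_needed = {partition_key(d) for d in wanted}
    scanned = 0
    result = []
    for key in sorted(keys_needed):
        if key not in parts:  # no partition holds this month
            continue
        scanned += 1
        for sale_date, amount in parts[key]:
            if sale_date in wanted:
                result.append((sale_date, amount))
    return result, scanned
```

With one row per month for 2011 and three wanted dates falling in March and July, only 2 of the 12 partitions are scanned; a plan without elimination would read all 12 and discard most of the rows.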
Finally, EMC announced the Greenplum Command Center, a Web-based Big Data infrastructure management console that offers a unified administration and real-time/historical health-monitoring dashboard for all currently shipping Greenplum products.
EMC Tackling Thorny Problems
Wayne Kernochan, an analyst at Infostructure Associates, said EMC is tackling the thorny problems of scaling up Big Data and connecting it to the less-risky data stores that form the greater part of EMC's present market.
"The result should be, I believe, a customizable, flexible, scalable, value-added front end to Big Data stores out there on the Web, and one that should clearly improve your ability to handle my three barriers to IT Big Data success," he said.
"And, EMC, now that you know how much fun the briar-patch experience can be, maybe you can consider jumping into a few more," Kernochan said. "Like combining Greenplum's columnar-relational data compression with a variant of your storage-data compression -- as I suggested, oh, what was it, a couple of years ago?"
As Kernochan sees it, the mere fact that such compression advances have already shown they can deliver as much additional scalability as two years' worth of new database versions should in no way cause IT buyers to change their minds and leap into yet another thorny, painful, and pointless technology implementation.
EMC's Greenplum Big Data solution should be quite sufficient for some key IT Big Data needs of today, Kernochan said, and it is definitely worth a look, or even a data-insight-rich, mature-plum-producing implementation.