New Releases - March 30/2015



Cloudera announced a maintenance release of Apache Accumulo for CDH 5 to fix the POODLE vulnerability.



Version 1.1.2 of Luigi, the workflow management tool, was recently released. The new version includes improved support for Spark.



The SDK for Google's Cloud Dataflow (similar to many DSLs like Scalding and Spark) is open source. The main "runner" implementation uses the Google Cloud Platform, but there's also implementation for Apache Spark. This week, the Apache Flink project announced a runner, which allows any pipeline written for Cloud Dataflow to run on a Flink cluster.



MicroStrategy announced that Apache Drill is certified with the MicroStrategy Analytics Enterprise Platform. The MapR blog has a brief introduction of how to configure the integration.



EMC has announced the Federation Business Data Lake, which combines several pieces of software with hardware. The software includes Pivotal HD (with mention of the Open Data Platform) and hardware includes EMC Isilon.



Cloudera Director 1.1.1 was released this week. Cloudera Director is a tool for provisioning and managing Hadoop clusters in AWS. This release includes several bug fixes and documentation updates.



Cask has announced version 2.8.0 of the Cask Data Application Platform (CDAP). The new version adds namespaces, fork/join for the workflow system, a new metrics layer, and more.



Sematext, makes of the SPM Performance Monitoring system, have announced that SPM now supports monitoring, alerting, anomaly detection for Apache HBase 0.98. The tool monitors a number of metrics including cache, replication, the WAL, and much more (290 metrics in total).



LinkedIn has open-sourced a golang library for Apache Avro. The library, called Goavro, supports decoding and encoding of data according to version 1.7.7 of the Avro specification. More details (including a few limitations) are described on the github site.



Version 0.9.6 of RDMA for Apache Hadoop was released this week.  The package is a derivative of Apache Hadoop that allows a cluster to use remote direct memory access (RDMA) interconnects to improve performance. It supports a Lustre and a hybrid file system where data is stored both in memory and on disk.