AZURE MARKETPLACE
The Azure Marketplace offering is IaaS based. As its name suggests, a consumer can load their preferred version of Hadoop onto virtual machines (VMs) that they control. Each VM has dedicated storage, with a conventional Hadoop landscape built around it. The IaaS nature of Marketplace provides complete compatibility with on-premises installations but does not take advantage of the features that MS Azure offers. Additionally, an IaaS-focused solution requires significantly more administration, both from a Hadoop perspective and from an IT Delivery/Operations/Security perspective.
AZURE HDINSIGHT
The Azure HDInsight offering is a hybrid PaaS offering in Azure that provides a Hortonworks distribution as a service. It covers almost the entire Hortonworks stack, including Ambari, Spark, Storm, Kafka, and Ranger with Active Directory integration. HDInsight allows you to scale elastically and takes advantage of the separation between compute and storage. The compute nodes can be destroyed (saving significant cost) and then spun back up and reconnected to the storage as needed. This cycle can even be implemented in SSIS and Azure Data Factory as part of a larger data pipeline. In this model, there is no need to manage HDFS, since that capability is covered by Azure storage and leveraged by the HDInsight cluster. This PaaS solution requires minimal Hadoop administration and even less IT Delivery/Operations/Security support.
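To make the cost argument concrete, here is a minimal Python sketch comparing a cluster that runs all month with one whose compute nodes exist only while jobs run. The node count, hourly rate, storage cost, and the `ClusterPlan` class are all hypothetical values and names chosen for illustration; they are not Azure list prices or part of any Azure SDK.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # average number of hours in a month


@dataclass
class ClusterPlan:
    """Hypothetical cluster description; the prices are illustrative only."""
    node_count: int
    hourly_rate_per_node: float    # assumed compute price per node-hour
    storage_cost_per_month: float  # storage persists whether or not compute exists


def always_on_cost(plan: ClusterPlan) -> float:
    """Monthly cost when compute nodes run around the clock."""
    compute = plan.node_count * plan.hourly_rate_per_node * HOURS_PER_MONTH
    return compute + plan.storage_cost_per_month


def on_demand_cost(plan: ClusterPlan, compute_hours: float) -> float:
    """Monthly cost when compute nodes are destroyed between jobs.

    Storage is still billed for the full month because the data stays
    in place while the compute nodes are gone.
    """
    compute = plan.node_count * plan.hourly_rate_per_node * compute_hours
    return compute + plan.storage_cost_per_month


if __name__ == "__main__":
    plan = ClusterPlan(node_count=4, hourly_rate_per_node=0.50,
                       storage_cost_per_month=20.0)
    full = always_on_cost(plan)
    partial = on_demand_cost(plan, compute_hours=60)
    print(f"always on: {full:.2f}")
    print(f"on demand: {partial:.2f}")
    print(f"savings:   {full - partial:.2f}")
```

Under these assumed figures, the only term that changes is compute hours; the storage line is identical in both cases, which is the point of separating the two.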
AZURE DATA LAKE ANALYTICS
The Azure Data Lake Analytics offering is a PaaS offering and the most flexible of the MS Azure options. It is Microsoft’s Big Data-as-a-Service offering. There are no clusters or storage to configure; it is truly a “plug and play” solution. Jobs are created in Azure Data Lake Analytics to transform data previously landed in Azure Blob storage or Azure Data Lake Store. In this scenario, no Hadoop or IT Delivery administration is required at all.
In a nutshell, the future of Hadoop and HDFS in the cloud is already here. By walking through the various MS Azure flavors, we have aimed to dispel any concerns about cloud/vendor lock-in. We have also covered the considerations (e.g., control vs. ease of use) for companies selecting among Hadoop offerings. We hope this information has been helpful in charting where Hadoop is heading. It is encouraging to know that the technologies themselves have become commodities and are offered without the need for deep technology expertise.
Our next blog post in this series will focus on the second question that was raised: “What drives this concern (why do you care)?”