The cloud has enabled an unprecedented race towards commoditization. Data-enabling technologies are being enhanced, simplified and abstracted so that they are easier to implement and support. One of the clearest examples of this process is Hadoop and its distributed file system, HDFS. In Azure today, most, if not all, of the capabilities that Hadoop and HDFS offer are built into the platform's core storage services. In the recent past, we would have recommended Hadoop when we needed:
1. A highly redundant way to store structured, semi-structured and unstructured data
2. The ability to run batch processing over large amounts of data
Today, both of these needs are met by capabilities that are standard within Azure. Microsoft has invested heavily in Azure storage, and its latest offering is Azure Data Lake Storage (ADLS) Gen2. ADLS Gen2 has become a commercially viable option for hosting a data lake because of the following features:
- ADLS Gen2 can be set up, deployed and configured with a few mouse clicks (or scripted as part of a DevOps pipeline).
- Data stored in ADLS Gen2 is protected by encryption at rest, firewalls, Azure Active Directory integration and POSIX-style access control lists (ACLs), and the service adds virtually unlimited scalability, a hierarchical namespace, high availability backed by Azure's storage SLAs and options for seamless disaster recovery.
- All of these capabilities are built in and do not require a significant investment of time or resources to implement and support.
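ADLS Gen2 writes POSIX-style ACLs in a short form such as `user::rwx,group::r-x,mask::r-x,other::---`, where an empty qualifier refers to the owning user or owning group and named entries carry a principal ID. To make the access-control point more concrete, here is a minimal Python sketch of how such an ACL might be evaluated for a single request, following the POSIX order of owner, named user, groups, then other. It is a simplified model, not the service's implementation: it ignores default ACLs, superusers and the sticky bit, it does not check execute permission on parent directories, and principal names such as `alice`, `analysts` and `svc-etl` are hypothetical stand-ins for Azure AD object IDs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AclEntry:
    kind: str       # "user", "group", "mask" or "other"
    qualifier: str  # "" for owner, owning group, mask and other; otherwise a principal ID
    perms: str      # three characters such as "r-x"


def parse_acl(acl: str) -> list[AclEntry]:
    """Parse a short-form ACL such as 'user::rwx,group::r-x,other::---'."""
    entries = []
    for raw in acl.split(","):
        kind, qualifier, perms = raw.strip().split(":")
        entries.append(AclEntry(kind, qualifier, perms))
    return entries


def _apply_mask(perms: str, mask: str | None) -> str:
    # The mask caps named users and all group entries; "-" in the mask removes that bit.
    if mask is None:
        return perms
    return "".join("-" if m == "-" else p for p, m in zip(perms, mask))


def _grants(perms: str, requested: str) -> bool:
    # Every requested letter ("r", "w", "x") must be present in the effective perms.
    return all(letter in perms for letter in requested)


def check_access(acl: str, owner: str, owning_group: str,
                 user: str, user_groups: set[str], requested: str) -> bool:
    entries = parse_acl(acl)
    mask = next((e.perms for e in entries if e.kind == "mask"), None)

    # 1. The owning user is matched against the owner entry, which is never masked.
    if user == owner:
        owner_entry = next(e for e in entries if e.kind == "user" and e.qualifier == "")
        return _grants(owner_entry.perms, requested)

    # 2. A named user entry for this principal, capped by the mask.
    for e in entries:
        if e.kind == "user" and e.qualifier == user:
            return _grants(_apply_mask(e.perms, mask), requested)

    # 3. Owning group and named groups: grant if any matching group entry allows it.
    matched_group = False
    for e in entries:
        if e.kind != "group":
            continue
        group_id = owning_group if e.qualifier == "" else e.qualifier
        if group_id in user_groups:
            matched_group = True
            if _grants(_apply_mask(e.perms, mask), requested):
                return True
    if matched_group:
        return False

    # 4. Everyone else is evaluated against the "other" entry.
    other = next(e for e in entries if e.kind == "other")
    return _grants(other.perms, requested)


ACL = "user::rwx,user:alice:rwx,group::r-x,group:analysts:r--,mask::r-x,other::---"
print(check_access(ACL, "svc-etl", "etl", "alice", set(), "w"))       # False: mask removes write
print(check_access(ACL, "svc-etl", "etl", "bob", {"analysts"}, "r"))  # True: named group grants read
print(check_access(ACL, "svc-etl", "etl", "carol", set(), "r"))       # False: "other" has no bits
```

The mask is what makes this model useful in practice: granting a named user `rwx` does not give write access while the mask is `r-x`, which mirrors how the effective permissions of named users and groups are capped in POSIX ACLs.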
ADLS Gen2 integrates deeply with other Azure technologies, which increases the functionality that a data professional can apply to a large data set. We see many customers use ADLS Gen2 to eliminate data hops and easily integrate structured, semi-structured and unstructured data into Azure SQL and Azure SQL Data Warehouse as part of their larger data ecosystem. It also integrates easily with Azure Cosmos DB, Power BI and the Azure Advanced Analytics suite, and it is a standard source and destination for Azure Data Factory and Azure Databricks.
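Spark-based tools such as Azure Databricks typically address ADLS Gen2 through the ABFS driver, using URIs of the form `abfss://<file-system>@<account>.dfs.core.windows.net/<path>`. A small helper like the hypothetical `abfss_uri` below is one way to keep those paths consistent across notebooks and pipelines. The account and file system names are placeholders, and the validation rules encode the standard Azure naming constraints for storage accounts and containers.

```python
import re

# Storage account names: 3-24 lowercase letters and digits.
_ACCOUNT_RE = re.compile(r"[a-z0-9]{3,24}")
# File system (container) names: 3-63 chars of lowercase letters, digits and single
# hyphens, starting and ending with a letter or digit.
_FILE_SYSTEM_RE = re.compile(r"[a-z0-9](?!.*--)[a-z0-9-]{1,61}[a-z0-9]")


def abfss_uri(account: str, file_system: str, path: str = "") -> str:
    """Build an ABFS driver URI for a path inside an ADLS Gen2 file system."""
    if not _ACCOUNT_RE.fullmatch(account):
        raise ValueError(f"invalid storage account name: {account!r}")
    if not _FILE_SYSTEM_RE.fullmatch(file_system):
        raise ValueError(f"invalid file system name: {file_system!r}")
    # Drop empty segments so "/sales//2024/" and "sales/2024" produce the same URI.
    clean_path = "/".join(part for part in path.split("/") if part)
    return f"abfss://{file_system}@{account}.dfs.core.windows.net/{clean_path}"


print(abfss_uri("contosolake", "raw", "/sales//2024/"))
# abfss://raw@contosolake.dfs.core.windows.net/sales/2024
```

In a Databricks notebook the result could be passed directly to a reader, for example `spark.read.parquet(abfss_uri("contosolake", "raw", "sales/2024"))`, once authentication to the storage account has been configured separately.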
Cost pressure is another significant driver of technology commoditization. For our clients, we see that the technology is not only more powerful and easier to use, but also much less expensive than purpose-built solutions on premises. Hosting large data sets in ADLS Gen2 is far more cost-effective than hosting them in dedicated Hadoop clusters because storage is billed separately from compute: you pay for the capacity and transactions you use rather than keeping cluster nodes running just to hold data. ADLS Gen2 also supports hot, cool and archive access tiers, and lifecycle management policies can move data between them automatically, which helps data users minimize the total cost of ownership of their data. Because ADLS Gen2 handles data hosting and integration with other tools, IT departments can deliver projects with fewer specialized technical skills and focus on higher-value business projects instead of keeping expensive technology running. We have seen a dramatic increase in speed-to-value for clients who have been able to devote more time to value-added activities after adopting this approach, while managing costs in a flexible pay-as-you-go model.
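Automatic tiering of this kind is typically configured through a lifecycle management policy on the storage account. The Python sketch below only builds such a policy as a JSON document; the rule name, the `raw/sales/` prefix and the 30/180-day thresholds are hypothetical values chosen for illustration, not recommendations. The resulting JSON could then be applied with tooling such as the Azure CLI's `az storage account management-policy create` command.

```python
from __future__ import annotations

import json


def tiering_policy(rule_name: str, prefix: str, cool_after_days: int,
                   archive_after_days: int, delete_after_days: int | None = None) -> dict:
    """Build a lifecycle management policy that ages block blobs under `prefix`
    from hot to cool to archive, and optionally deletes them."""
    if not 0 <= cool_after_days < archive_after_days:
        raise ValueError("cool tiering must happen before archive tiering")
    actions = {
        "tierToCool": {"daysAfterModificationGreaterThan": cool_after_days},
        "tierToArchive": {"daysAfterModificationGreaterThan": archive_after_days},
    }
    if delete_after_days is not None:
        if delete_after_days <= archive_after_days:
            raise ValueError("deletion must happen after archive tiering")
        actions["delete"] = {"daysAfterModificationGreaterThan": delete_after_days}
    return {
        "rules": [
            {
                "enabled": True,
                "name": rule_name,
                "type": "Lifecycle",
                "definition": {
                    "actions": {"baseBlob": actions},
                    # prefixMatch values start with the container (file system) name.
                    "filters": {"blobTypes": ["blockBlob"], "prefixMatch": [prefix]},
                },
            }
        ]
    }


policy = tiering_policy("ageOutRawSales", "raw/sales/", cool_after_days=30, archive_after_days=180)
print(json.dumps(policy, indent=2))
```

Archiving trades the lowest storage price for retrieval latency, since archived blobs must be rehydrated before they can be read, so thresholds like these should follow how often each area of the lake is actually queried.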