Microsoft readying Hadoop for Microsoft Azure


Hadoop offers a massive data store on which developers can run map/reduce jobs. It also manages compute clusters and the distributed file system that spans them. Microsoft will offer Hadoop within a "few months," said a Microsoft executive who asked not to be named.

The technology lets applications analyze petabytes of both structured and unstructured data. The data is spread across the machines in a cluster, and applications process it programmatically.

"They are probably seeing Hadoop adoption trending up, and possibly have some large customers demanding it," said Forrester principal analyst Jeffrey Hammond. "Microsoft is all about money first; PHP support with IIS and the Web PI initiative were all about numbers and creating platform demand. If Hadoop support helps create platform demand for Azure, why not support it? Easiest way to lead a parade is to find one and get in front of it."

Further, AppFabric, a Microsoft Azure platform for developing composite applications, currently lacks support for data grids. Microsoft has had difficulty porting Velocity, a distributed in-memory application cache platform, to Windows Azure because Velocity requires administrative privileges to install, the anonymous executive told SD Times.

"Do they feel so 'way behind' that they are rolling out a Java-based product without a .NET-based 'superior' alternative ready to go?" asked Larry O’Brien, a private consultant and author of the "Windows & .NET Watch" column for SD Times. "Perhaps they feel that distributed map/reduce is not really all that important, that they can put Hadoop on the 'check-off box' and it won't be embarrassing that it gives Java developers a capability that .NET developers don't have?"

Microsoft's map/reduce solution, codenamed "Dryad," is still a reference architecture and not a production technology.

Microsoft's Velocity caching technology resides in the business logic layer, in the sense that it is in-memory and object-oriented, said William Bain, CEO of ScaleOut Software. Dryad emphasizes its parallel computation model, which can be integrated with file storage, he added. Hadoop, by contrast, integrates parallel data analysis with the data storage layer rather than residing solely in the business logic layer. Hadoop's use of its own distributed file system, from which it reads its data, "creates its storage integration and the associated complexity," Bain claimed.

"Map/reduce is definitely lower-level than SQL," said O'Brien. "However, because it's more like a powerful mathematical technique than a black box whose internal workings are unclear, it's not necessarily 'more complex' to think through a tough data analysis problem in terms of map/reduce. Map: 'Farm out all of this identical work to a whole bunch of different computers.' Reduce: 'Gather those results and combine duplicates.' Feed the reduced dataset into another map/reduction process. Conceptually, that's pretty easy to reason with!"
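To make O'Brien's description concrete, here is a minimal, single-process sketch of the map/reduce pattern in Python. It is not Hadoop's API: the names `map_reduce`, `word_mapper` and `sum_reducer` are hypothetical, and a thread pool stands in for the "whole bunch of different computers" of a real cluster. Between map and reduce sits a grouping ("shuffle") step that collects values by key, which is where duplicates get combined. The second call feeds the reduced dataset into another map/reduce round, as O'Brien suggests.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_reduce(records, mapper, reducer, workers=4):
    """Run one in-process map/reduce round (illustrative only).

    mapper(record) yields (key, value) pairs;
    reducer(key, values) returns one combined value for that key.
    """
    # Map: farm out identical work; threads stand in for cluster machines.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(lambda r: list(mapper(r)), records))

    # Shuffle: gather every emitted value under its key.
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)

    # Reduce: combine the values that share a key.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    # Emit (word, 1) for every word in a line of text.
    for word in line.lower().split():
        yield word, 1


def sum_reducer(key, values):
    # Combine duplicates by adding up their counts.
    return sum(values)


lines = ["the quick brown fox", "the lazy dog", "the quick cat"]

# Round 1: word counts.
counts = map_reduce(lines, word_mapper, sum_reducer)
print(counts["the"])  # 3

# Round 2: feed the reduced dataset back in to ask
# "how many distinct words occur n times?"
histogram = map_reduce(counts.items(), lambda kv: [(kv[1], 1)], sum_reducer)
print(histogram)  # {3: 1, 2: 1, 1: 5}
```

A framework such as Hadoop adds what this sketch leaves out: distributing records across machines, moving grouped data over the network, and recovering from failed workers.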

Additionally, there are open-source map/reduce solutions for .NET. A project called hadoopdotnet ports Hadoop to the .NET platform and is available under the Apache 2.0 open-source license. MySpace Qizmt, a map/reduce framework built with .NET, is another alternative; it is licensed under the GNU General Public License v3.

Microsoft is preparing to provide Hadoop, a Java software framework for data-intensive distributed applications, for Microsoft Azure customers.
