Introduction

Apache Hadoop is an open-source software framework used to store and analyze large datasets. It runs on clusters built from commodity hardware. Hadoop provides the Hadoop Distributed File System (HDFS) as well as a Java API that allows parallel processing across the nodes of the cluster. In the world of big data, the memory of a single laptop is never enough, and sometimes not even close. Hadoop makes it possible to store more data, and to process that data faster and more efficiently, by adding new hardware to the cluster as needed.

Hadoop Distributions

Hortonworks Data Platform: Hortonworks was named in the list of Top 100 winners of “Red Herring”.
Hortonworks is a Hadoop company that drives open-source Hadoop distributions in the IT market. The main goal of Hortonworks is to provide all its innovations through the Hadoop open data platform and to build an ecosystem of partners that speeds up Hadoop adoption among enterprises. Apache Ambari is a Hadoop cluster management console developed by Hortonworks for provisioning and managing Hadoop clusters.
Hortonworks works with some giant accounts such as Samsung, Spotify, Bloomberg and eBay. It has also built strong collaborations with Red Hat, Microsoft, SAP and Teradata.

Cloudera Hadoop Distribution: Cloudera has provided Hadoop clusters since 2008 and ranks at the top of the big data business. Cloudera was formed by a group of engineers from Yahoo, Google and Facebook. It focuses on providing Hadoop solutions together with customer support and training. Cloudera has close to 350 clients, including the U.S. Army, Allstate and Monsanto.
Some of them boast deployments of 1,000 nodes on a Hadoop cluster to run big data analytics over one petabyte of data. Cloudera's partners include Oracle, IBM, HP, NetApp and MongoDB.

MapR Hadoop Distribution: MapR is known for its Hadoop distributions.
MapR was named in the Gartner report “Cool Vendors in Information Infrastructure and Big Data, 2012.” MapR ranked first among all vendors for its Hadoop distributions. MapR has invested heavily in overcoming the obstacles to worldwide adoption of Hadoop, which include enterprise-grade reliability, data security, easy integration of Hadoop into existing environments and infrastructure, and support for real-time operations. In 2015, MapR made further investments to maintain its place in the list of Hadoop vendors. MapR plans to announce technical innovations for Hadoop intended to support ‘business-as-it-happens’, with the aim of increasing investments, minimizing risks and reducing expenditure.

Amazon Web Services: Amazon is one of the oldest Hadoop vendors, and it is known for innovation in Hadoop distributions on the open data platform. AWS Elastic MapReduce (EMR) offers an easy-to-use and well-organized data analytics platform built on the powerful HDFS architecture. The major focus of AWS is on MapReduce queries, and AWS EMR provides its users with a highly scalable and secure platform.
Amazon Web Services EMR is one of the top commercial Hadoop distributions, with the highest share of the global market.

Comparison Between Hortonworks, Cloudera & MapR

Feature            Hortonworks              Cloudera                 MapR
Data ingest        Batch                    Batch                    Batch with streaming writes
HBase              Latency spikes           Latency spikes           Consistent low latency
NoSQL application  Batch application        Batch application        Batch application with real-time applications
High availability  Single failure recovery  Single failure recovery  Self-healing across multiple failures
Disaster recovery  No                       File copy scheduling     Mirroring
Volume support     No                       No                       Yes
Management tools   Ambari                   Cloudera Manager         MapR Control System

Fig 2.1

Conclusion

With the demand for big data technologies expanding rapidly, Apache Hadoop is at the heart of the big data revolution.
Yahoo is the biggest user of Hadoop. From Fig 2.1 it can be seen that the MapR Hadoop distribution compares favorably with Hortonworks and Cloudera, as it provides features such as real-time applications, volume support and better high availability, among others. The MapR Hadoop distribution can be used for attendance monitoring in schools, and it can further be used by companies for market analysis.

References

1 Ronald Taylor, “An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics”, Bioinformatics Open Source Conference, 2010.
2 “Welcome to Apache Hadoop!”. hadoop.apache.org. Retrieved 2016-08-25.
3 Deepak Motwani, V. K. Chaubey, A. S. Saxena, “Hadoop based Information Extract from Text Document”, IJSRSET, January 2016.
4 Preeti Joshi, Arati Bhandari, Kalyani Jamunkar, Kanchan Warghade, Priyanka Lokhande, “Network Traffic Analysis Measurement and Classification Using Hadoop”, IJARCCE, March 2016.
5 Prity Vijay, Bright Keshwani, “Emergence of Big Data with Hadoop”, IOSRJEN, March 2016.