by Tony Harris • December 8, 2020
Ever wondered “What is Hadoop?” This article is for you. Below, we’ll go over everything you need to know, including the features of Hadoop, the pros and cons of Hadoop, and what real users have to say in Hadoop reviews. In particular, we’ll discuss:
What is Hadoop?
Features
Pros and Cons
Hadoop Reviews
Apache Hadoop is a software framework for distributed processing of very large data sets. It is maintained by the Apache Software Foundation, a non-profit open-source software development community. The Hadoop project was first released to the public in 2006, growing out of work by Doug Cutting and Mike Cafarella on the Apache Nutch web crawler; Cutting continued developing it after joining Yahoo.
Today, hundreds of major tech companies use Hadoop as part of their tech stack, from IBM and Amazon to Uber and Airbnb. Reportedly, Yahoo and Facebook have the largest Hadoop clusters in the world: Yahoo, for one, has more than 100,000 CPUs in 40,000 servers running Hadoop, with a total data storage size of 455 petabytes (455,000,000 gigabytes).
Hadoop consists of five different modules:
The pros of using Hadoop include:
The cons of using Hadoop include:
With all that said, is Hadoop the right choice for you? Hadoop user reviews are generally positive. On the software review website TrustRadius, for example, Hadoop has an average of 8.5 out of 10 stars, based on 244 ratings. Senior network administrator Mark McCully gives Hadoop a score of 9 out of 10, writing:
“Hadoop has allowed us to scale out a few of our tier-1, customer facing applications to provide very fast access to reports and analytics. It was easily implemented by our Linux team and onboarded by our Hadoop admins. Hadoop has been a very stable platform and only goes down due to server patching or other maintenance.”
Meanwhile, data engineer Johanes Siregar gives Hadoop a score of 8 out of 10, writing:
“Scalability is one of the main reasons we decided to use Hadoop. Storage and processing power can be seamlessly increased by simply adding more nodes… Using commodity hardware as a node in a Hadoop cluster can reduce cost and eliminates dependency on a particular proprietary technology.”
However, he also notes a few issues with the platform, adding: “User and access management are still challenging to implement in Hadoop… Processing a large number of small files also becomes a problem on a very large cluster with hundreds of nodes.”
Hadoop is one of the most popular choices for distributed processing of very large data sets, which is why DreamFactory has decided to begin developing a Hadoop connector as part of its integration suite. DreamFactory makes it easy to automatically generate and manage APIs without writing a single line of code, making you more productive and profitable.
Want to learn how DreamFactory can digitally transform your business? Get in touch with our team today for a chat about your business needs and objectives, or to start your free trial of the DreamFactory platform.