What is Hadoop? Features, Pros and Cons, and Reviews

Written by Spencer Nguyen | December 9, 2020

Ever wondered “What is Hadoop?” This article is for you. Below, we’ll go over everything you need to know, including the features of Hadoop, the pros and cons of Hadoop, and what real users have to say in Hadoop reviews. In particular, we'll discuss:

  • What is Hadoop?
  • Features
  • Pros and Cons
  • Hadoop Reviews

Did you know you can generate a full-featured, documented, and secure REST API in minutes using DreamFactory? Sign up for our free 14 day hosted trial to learn how! Our guided tour will show you how to create an API using an example MySQL database provided to you as part of the trial!

Create Your RESTful API Now

What is Hadoop?

Apache Hadoop is a software framework for distributed processing of very large data sets. It is maintained by the Apache Software Foundation, a non-profit open-source software community. The Hadoop project was first released to the public in 2006. It grew out of work by Doug Cutting and Mike Cafarella on the Apache Nutch web crawler, and Cutting continued developing it at Yahoo.

Today, hundreds of major tech companies use Hadoop as part of their tech stack, from IBM and Amazon to Uber and Airbnb. Reportedly, Yahoo and Facebook have the largest Hadoop clusters in the world: Yahoo, for one, has more than 100,000 CPUs in 40,000 servers running Hadoop, with a total data storage size of 455 petabytes (455,000,000 gigabytes).

Features

Hadoop consists of five different modules:

  • Hadoop Common: The collection of common utilities and libraries that support the other Hadoop modules.
  • Hadoop Distributed File System (HDFS): A distributed file system for storing very large data sets, including unstructured data, across a cluster. It runs on commodity hardware, so a cluster's storage can grow by adding inexpensive machines.
  • Hadoop YARN: Short for “Yet Another Resource Negotiator”; used for job scheduling and cluster resource management.
  • Hadoop MapReduce: A programming model for processing very large data sets in parallel. It consists of two steps: a map step that applies a function to each input record and emits intermediate key-value pairs, and a reduce step that combines all values sharing the same key into a smaller set of results.
  • Hadoop Ozone: A distributed object store designed for big data.
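To make the map and reduce steps more concrete, here is a minimal single-process sketch in Python of the classic word-count job. It only simulates the model: the function names are hypothetical, and a real job would use Hadoop's Java MapReduce API or Hadoop Streaming, run the steps in parallel across a cluster, and let Hadoop handle the grouping ("shuffle") between them.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step (hypothetical helper): emit (word, 1) for each word in one record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key; Hadoop does this between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step (hypothetical helper): combine all values for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map -> shuffle -> reduce sequentially over all input lines."""
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["Hadoop stores big data", "Hadoop processes big data"]
    print(word_count(docs))
    # {'big': 2, 'data': 2, 'hadoop': 2, 'processes': 1, 'stores': 1}
```

Because each map call looks at only one record and each reduce call looks at only one key, both steps can be spread across many machines, which is what lets Hadoop parallelize the work.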

Pros and Cons

The pros of using Hadoop include:

  • Cost-effective: Hadoop is a free and open-source project—you don’t have to pay a cent to use it, and you can modify its source code as necessary. What’s more, Hadoop was designed to run on low-cost commodity hardware, not massive supercomputers, so even businesses with limited IT budgets can use it.
  • Highly scalable: Hadoop divides storage and computation among many machines, so scalability is one of the project's primary strengths. It's easy to improve the speed and capacity of a Hadoop system through horizontal scaling, that is, by adding more nodes to the cluster rather than upgrading individual servers.
  • Flexible: Hadoop is designed to process many different data types, including structured, semi-structured, and unstructured data. This means that Hadoop has a wide variety of applications, from analyzing social media activity to performing fraud detection.
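As a rough illustration of why adding nodes helps, the sketch below uses a deliberately simplified model. It assumes one map task per 128 MB HDFS block (the default block size in recent Hadoop versions) and a hypothetical fixed number of concurrent task slots per node, then counts how many "waves" of map tasks a job needs. Real job times also depend on data locality, the reduce phase, and scheduling overhead, so treat the numbers as illustrative only.

```python
import math

# Assumption: 128 MB blocks (the usual dfs.blocksize default); clusters can change this.
BLOCK_SIZE_MB = 128


def map_tasks(input_size_mb: float, block_size_mb: int = BLOCK_SIZE_MB) -> int:
    """Simplified model: roughly one map task per HDFS block of input."""
    return math.ceil(input_size_mb / block_size_mb)


def map_waves(input_size_mb: float, nodes: int, slots_per_node: int) -> int:
    """Rounds of map tasks needed if each node runs `slots_per_node` tasks at once.

    `slots_per_node` is a hypothetical parameter for this sketch; fewer waves
    roughly means a shorter map phase.
    """
    if nodes <= 0 or slots_per_node <= 0:
        raise ValueError("nodes and slots_per_node must be positive")
    concurrent = nodes * slots_per_node
    return math.ceil(map_tasks(input_size_mb) / concurrent)


if __name__ == "__main__":
    one_tb_mb = 1024 * 1024  # 1 TiB expressed in MB
    for nodes in (10, 20, 40):
        # Doubling the node count roughly halves the number of waves.
        print(nodes, "nodes ->", map_waves(one_tb_mb, nodes, slots_per_node=8), "waves")
```

In this model, 1 TiB of input becomes 8,192 map tasks; 10 nodes with 8 slots each need 103 waves, while 40 such nodes need only 26.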

The cons of using Hadoop include:

  • Lack of support: The downside to Hadoop being free and open-source is that the Apache project itself doesn't come with dedicated support and maintenance. While the Hadoop community is large and helpful, businesses that depend on Hadoop for their daily operations will likely need to use a paid “Hadoop as a service” offering.
  • Limited use cases: Hadoop performs best on a small number of very large files. Using it for smaller data sets, or for a large number of small files, limits the performance gains you'll see: the HDFS NameNode keeps metadata for every file and block in memory, and each small file typically becomes its own map task, so the overhead can outweigh the benefit of parallel processing.
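A back-of-the-envelope calculation shows why many small files hurt. The sketch below assumes the widely quoted rule of thumb of roughly 150 bytes of NameNode memory per file or block object and a 128 MB block size, so the absolute numbers are approximations. What matters is the ratio: the same 1 GiB of data costs far more metadata when it is split into thousands of files.

```python
import math

# Assumptions for this sketch (rule-of-thumb values, not exact measurements):
BYTES_PER_OBJECT = 150            # approx. NameNode heap per file or block object
BLOCK_SIZE = 128 * 1024 * 1024    # 128 MB HDFS block size


def namenode_objects(num_files: int, file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Objects the NameNode tracks: one per file plus one per block of each file."""
    blocks_per_file = math.ceil(file_size / block_size)
    return num_files * (1 + blocks_per_file)


def namenode_memory_bytes(num_files: int, file_size: int) -> int:
    """Approximate NameNode memory for these files under the rule of thumb."""
    return namenode_objects(num_files, file_size) * BYTES_PER_OBJECT


if __name__ == "__main__":
    one_gib = 1024 ** 3
    # The same 1 GiB of data, stored two different ways:
    big = namenode_memory_bytes(1, one_gib)            # 1 file, 8 blocks
    small = namenode_memory_bytes(8192, 128 * 1024)    # 8,192 files of 128 KiB
    print(f"one large file: {big:,} bytes of metadata")
    print(f"8,192 small files: {small:,} bytes of metadata")
```

Here one 1 GiB file needs about 9 objects (1,350 bytes), while 8,192 files of 128 KiB need 16,384 objects (about 2.4 MB), more than 1,800 times as much metadata for the same data.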

Hadoop Reviews

With all that said, is Hadoop the right choice for you? Hadoop user reviews are generally positive. On the software review website TrustRadius, for example, Hadoop has an average of 8.5 out of 10 stars, based on 244 ratings. Senior network administrator Mark McCully gives Hadoop a score of 9 out of 10, writing:

“Hadoop has allowed us to scale out a few of our tier-1, customer facing applications to provide very fast access to reports and analytics. It was easily implemented by our Linux team and onboarded by our Hadoop admins. Hadoop has been a very stable platform and only goes down due to server patching or other maintenance.”

Meanwhile, data engineer Johanes Siregar gives Hadoop a score of 8 out of 10, writing:

“Scalability is one of the main reasons we decided to use Hadoop. Storage and processing power can be seamlessly increased by simply adding more nodes… Using commodity hardware as a node in a Hadoop cluster can reduce cost and eliminates dependency on a particular proprietary technology.”

However, he also notes a few issues with the platform, adding: “User and access management are still challenging to implement in Hadoop… Processing a large number of small files also becomes a problem on a very large cluster with hundreds of nodes.”

Conclusion

Hadoop is one of the most popular choices for distributed processing of very large data sets, which is why DreamFactory has begun developing a Hadoop connector as part of its integration suite. DreamFactory makes it easy to automatically generate and manage APIs without writing a single line of code, making you more productive and profitable.

Want to learn how DreamFactory can digitally transform your business? Get in touch with our team today for a chat about your business needs and objectives, or to start your free trial of the DreamFactory platform.