Questions tagged [hadoop]


The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thereby delivering a highly available service on top of a cluster of computers, each of which may be prone to failures.
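The classic example of these "simple programming models" is MapReduce. The sketch below is a rough, single-process illustration in plain Python, not Hadoop's actual Java API: input is divided into splits, each split is mapped to key/value pairs, the pairs are grouped by key (the "shuffle"), and each group is reduced. A map task that raises an error is simply re-run, loosely echoing the application-layer failure handling described in the tag wiki. All function names and the `max_attempts` parameter are hypothetical, chosen for illustration.

```python
from collections import defaultdict


def run_map_task(split, mapper, max_attempts=3):
    """Apply `mapper` to every record in one split, retrying the whole task on failure.

    Hypothetical helper: in a real cluster each split would be processed on a
    separate node; here everything runs sequentially in one process.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            pairs = []  # fresh output per attempt, so a failed attempt leaves no partial results
            for record in split:
                pairs.extend(mapper(record))
            return pairs
        except Exception as exc:
            last_error = exc
    raise RuntimeError(f"map task failed after {max_attempts} attempts") from last_error


def shuffle(pairs):
    """Group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def run_job(splits, mapper, reducer):
    """Map every split, shuffle the intermediate pairs, then reduce each key's values."""
    intermediate = []
    for split in splits:
        intermediate.extend(run_map_task(split, mapper))
    return {key: reducer(key, values) for key, values in shuffle(intermediate).items()}


def word_count_mapper(line):
    # Emit (word, 1) for every whitespace-separated word, case-insensitively.
    return [(word.lower(), 1) for word in line.split()]


def word_count_reducer(word, counts):
    # Total occurrences of one word across all splits.
    return sum(counts)


if __name__ == "__main__":
    splits = [["the quick brown fox", "the lazy dog"], ["The fox jumps"]]
    result = run_job(splits, word_count_mapper, word_count_reducer)
    print(sorted(result.items()))
    # → [('brown', 1), ('dog', 1), ('fox', 2), ('jumps', 1), ('lazy', 1), ('quick', 1), ('the', 3)]
```

The mapper and reducer only see one record or one key at a time, which is what lets a framework like Hadoop spread the work across many machines and re-execute individual tasks when a node fails.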

23 questions
4 votes, 2 answers

How to convince others we should move to Hadoop?

Everything I've read about Hadoop seems like exactly the technology we need to make our enterprise more scalable. We have terabytes of raw data that is in non-relational form (text files of some kind). We're quickly approaching the upper limits of…
Ramy
4 votes, 3 answers

Why do HDFS clusters have only a single NameNode?

I'm trying to better understand how Hadoop works, and I'm reading the following: The NameNode is a Single Point of Failure for the HDFS Cluster. HDFS is not currently a High Availability system. When the NameNode goes down, the file system goes offline. There is…
grautur
3 votes, 3 answers

Is Cloudera Hadoop certification worth the investment?

I am considering investing time to learn Hadoop and its related technologies. The problem is that my current day job will not be using Hadoop any time soon, and even if I learn from books, blogs, and personal projects, I will not have much to back up when…
geoaxis
2 votes, 1 answer

How best to implement a Dashboard from data in HDFS/Hadoop

We have a bunch of data (several TB) in Hadoop HDFS, and it's growing. We want to create a dashboard that reports on the contents in there, e.g. counts of different types of objects, trends over time, etc. Our first thought was to use something like…
kellyfj