Learn to write real, working data-driven Java programs that can run in parallel on multiple machines by using Hadoop.
Apache Hadoop is an open-source software framework for the storage and large-scale processing of data sets on clusters of commodity hardware. Hadoop is an Apache top-level project, built and used by a global community of contributors and users, and is licensed under the Apache License 2.0.
All the modules in Hadoop are designed with a fundamental assumption that hardware failures (of individual machines, or racks of machines) are common and thus should be automatically handled in software by the framework. Apache Hadoop’s MapReduce and HDFS components originally derived respectively from Google’s MapReduce and Google File System (GFS) papers.
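To give a flavor of the MapReduce model mentioned above, here is a single-process sketch in plain Java, not Hadoop's actual `Mapper`/`Reducer` API: a map phase emits (word, 1) pairs for a word count, a shuffle groups them by key, and a reduce phase sums each group. The class and method names are illustrative only; a real Hadoop job would subclass Hadoop's library types and run across a cluster.

```java
import java.util.*;
import java.util.stream.*;

// Illustrative sketch of the MapReduce pattern that Hadoop distributes
// across machines, here run in a single JVM (names are hypothetical).
public class WordCountSketch {

    // Map phase: split one input line into (word, 1) pairs.
    static Stream<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\W+"))
                     .filter(w -> !w.isEmpty())
                     .map(w -> Map.entry(w, 1));
    }

    // Shuffle + reduce: group the pairs by word and sum each group's counts.
    static Map<String, Integer> mapReduce(List<String> lines) {
        return lines.parallelStream()              // parallel streams stand in for the cluster
                    .flatMap(WordCountSketch::map)
                    .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        Integer::sum));            // reduce: sum counts per key
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = mapReduce(
            List.of("hadoop stores data", "hadoop processes data"));
        System.out.println(counts);
    }
}
```

In real Hadoop, the framework handles the shuffle, partitioning, and the machine-failure recovery described above; the programmer supplies only the map and reduce functions.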
https://www.classcentral.com/course/udemy-java-parallel-computation-on-hadoop-in-4-ho-66615
