First Hadoop job
Dataset

You may generate something yourself, take some text file, or just google the internet for data sets, ranging from random collections to ones specially prepared for experimenting with machine learning and AI.

Code

I started with the immortal word count. By and large I googled the bigger parts of this code and tried to glue something together from those fragments. The biggest problem was that the API had been subject to change. The code below works on Hadoop 2.6, so I am not sure whether it reflects the most recent API version. By and large the code schema is as follows:

    public class WordCount

        // convenient constant object to initialize the keys in a map
        private final static IntWritable one = new IntWritable(1);

        // the code trigger
        public static void main (String [] args) throws Exception {
            // in version 1.x it was done differently
            Configuration c = new Configuration();
            [..]
            Job j = Job.getInstance(c, "wordcount");
            // main class declaration
            j.setJarByClass(WordCount....
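To see what the job actually computes before touching the Hadoop API, the map/shuffle/reduce flow of word count can be simulated in plain Java. This is a sketch, not Hadoop code: the class and method names (`WordCountSketch`, `map`, `reduce`) are hypothetical, and a real job would use `Mapper`, `Reducer`, and the `Writable` types instead.

```java
import java.util.*;

// Plain-Java simulation of the MapReduce word-count flow (hypothetical
// helper, not the Hadoop API): map emits (word, 1) pairs, the shuffle
// groups them by key, and reduce sums each group.
public class WordCountSketch {

    // "map" phase: emit a (word, 1) pair for every token in the line
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // shuffle + "reduce" phase: group the pairs by key, sum the values
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();  // sorted keys, like Hadoop output
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : new String[] {"to be or not", "to be"}) {
            emitted.addAll(map(line));   // each mapper call processes one line
        }
        System.out.println(reduce(emitted));  // prints {be=2, not=1, or=1, to=2}
    }
}
```

In the real job these two methods become the `map()` of a `Mapper` subclass and the `reduce()` of a `Reducer` subclass, and the framework performs the grouping between them.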