Find the word count for example_data.txt using MapReduce. (The content of the example_data.txt file is: coding,jamming,ice,river,man,driving)
To find the word count for example_data.txt using MapReduce, we look for the unique words in the file and the number of times each of those words appears.
- First, we break the input into three divisions, which shares the work among all the map nodes:
  - coding, ice, jamming
  - river, driving, ice
  - man, ice, jamming
- Then, each mapper tokenizes its words and gives each token a hardcoded value of 1. The value is 1 because every word, by itself, occurs at least once.
- Now, a list of key-value pairs is created, where the key is the individual word and the value is 1. So, for the first division (coding, ice, jamming), we have three key-value pairs: coding, 1; ice, 1; jamming, 1.
- The same mapping process runs on all the nodes.
- Next, the key-value pairs are partitioned, then sorted and shuffled, so that all the tuples with the same key are sent to the same reducer.
- After the sorting and shuffling phase, every reducer has, for each unique key it receives, a list of the values matching that key. For example: jamming, [1,1]; ice, [1,1,1]; and so on.
- Now, each reducer sums the values in that list. For example, the reducer gets the list of values [1,1] for the key jamming, adds up the ones in that list, and gives the final output: jamming, 2.
- Lastly, all the output key-value pairs are collected and written to the output file.
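
The steps above can be sketched in plain Python to make the data flow concrete. This is a minimal single-process simulation, not Hadoop code: the function names `map_phase`, `shuffle_phase`, and `reduce_phase` are hypothetical, the three divisions are hardcoded from the walkthrough instead of being read from example_data.txt, and the output is printed rather than written to a file.

```python
from collections import defaultdict

# The three input divisions from the walkthrough (assumed, hardcoded here).
splits = [
    "coding ice jamming",
    "river driving ice",
    "man ice jamming",
]


def map_phase(split):
    """Tokenize one division and emit a (word, 1) pair for every token."""
    return [(word, 1) for word in split.split()]


def shuffle_phase(mapped_pairs):
    """Group the values by key and sort by key, so each key has one list."""
    grouped = defaultdict(list)
    for word, count in mapped_pairs:
        grouped[word].append(count)
    return dict(sorted(grouped.items()))


def reduce_phase(word, counts):
    """Sum the list of ones for a single key."""
    return word, sum(counts)


# Map: every division is processed independently, as it would be on its own node.
mapped = [pair for split in splits for pair in map_phase(split)]

# Shuffle and sort: all pairs with the same key end up in the same list.
grouped = shuffle_phase(mapped)

# Reduce: one call per unique key.
word_counts = dict(reduce_phase(word, counts) for word, counts in grouped.items())

for word, total in word_counts.items():
    print(f"{word}\t{total}")
```

With these three divisions, `grouped` holds `[1, 1]` for jamming and `[1, 1, 1]` for ice, so the reduce step yields jamming 2 and ice 3, while coding, driving, man, and river each get 1.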