By Joel Westerberg at Follow the data
This is a short tutorial that explains the concept of map/reduce. It can be run on any Unix system, such as Linux or OS X. We’ll first process the data sequentially and then with parallel mapper tasks.

As a simple example, we will compile a list of prime numbers from some text files containing numbers (some prime, some not) and then calculate the sum of all the primes found. Finding primes can be parallelized, since each number can be tested independently of the others, so it belongs on the map side of the algorithm. Calculating the final sum needs the results from all the mappers brought together, so it is our reduce function.

Let’s start by creating some test data that is small and easy to debug, so it’ll run fast. We’ll do this in a terminal shell using Ruby. The -e option tells ruby to evaluate the string that follows it as code, and the “>” redirects the output to the file named after it.