Which process describes the lifecycle of a Mapper?
A. The JobTracker calls the TaskTracker’s configure() method, then its map() method and finally its close() method.
B. The TaskTracker spawns a new Mapper to process all records in a single input split.
C. The TaskTracker spawns a new Mapper to process each key-value pair.
D. The JobTracker spawns a new Mapper to process all records in a single file.
Answer: B
Explanation:
For each map task that runs (one per input split), the TaskTracker creates a new instance of your Mapper.
Note:
* The Mapper is responsible for processing Key/Value pairs obtained from the InputFormat. The Mapper may perform a number of extraction and transformation functions on each Key/Value pair before ultimately outputting zero, one, or many Key/Value pairs of the same or a different Key/Value type (see the example sketched after this note).
* With the new Hadoop API, mappers extend the org.apache.hadoop.mapreduce.Mapper class. This class defines an ‘Identity’ map function by default – every input Key/Value pair obtained from the InputFormat is written out.
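For instance, a custom Mapper might tokenize each input line and emit a (word, 1) pair per token, showing how one input record can produce many outputs and how the output Key/Value types can differ from the input types. This is only an illustrative sketch; the class name TokenCountMapper and the word-count logic are our own, not part of the question or the referenced documentation:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Input is (byte offset, line of text); output is (word, 1), so the
// output Key/Value types differ from the input Key/Value types.
public class TokenCountMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    StringTokenizer tokens = new StringTokenizer(value.toString());
    while (tokens.hasMoreTokens()) {
      word.set(tokens.nextToken());
      context.write(word, ONE);   // may write none, one, or many pairs per input record
    }
  }
}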
Examining the run() method, we can see the lifecycle of the mapper:

/**
 * Expert users can override this method for more complete control over the
 * execution of the Mapper.
 * @param context
 * @throws IOException
 */
public void run(Context context) throws IOException, InterruptedException {
  setup(context);
  while (context.nextKeyValue()) {
    map(context.getCurrentKey(), context.getCurrentValue(), context);
  }
  cleanup(context);
}
setup(Context) – Perform any setup for the mapper. The default implementation is a no-op method.
map(Key, Value, Context) – Perform the map operation on the given Key/Value pair. The default implementation calls Context.write(Key, Value), which gives the ‘Identity’ behaviour described above.
cleanup(Context) – Perform any cleanup for the mapper. The default implementation is a no-op method.
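To make the lifecycle concrete, the sketch below overrides all three hooks: setup() reads a job property once per task, map() filters records, and cleanup() reports a counter after the last record has been processed. The class name, the property min.line.length, and the counter names are illustrative assumptions, not part of the referenced documentation:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class FilteringMapper
    extends Mapper<LongWritable, Text, LongWritable, Text> {

  private int minLength;   // read once per task in setup()
  private long dropped;    // tallied across map() calls, reported in cleanup()

  @Override
  protected void setup(Context context) {
    // Runs once, before the first map() call of this task attempt.
    // "min.line.length" is a made-up job property used for illustration.
    minLength = context.getConfiguration().getInt("min.line.length", 1);
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    if (value.getLength() >= minLength) {
      context.write(key, value);   // pass the record through unchanged
    } else {
      dropped++;                   // otherwise silently drop it
    }
  }

  @Override
  protected void cleanup(Context context) {
    // Runs once, after the last map() call of this task attempt.
    context.getCounter("FilteringMapper", "DroppedRecords").increment(dropped);
  }
}

Because the TaskTracker creates a fresh Mapper instance for each map task (answer B), instance fields such as dropped only ever accumulate state for the records of a single input split.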
Reference: Hadoop/MapReduce/Mapper