Types
Even though the previous pseudo-code is written in terms of string inputs and outputs, conceptually the map and reduce functions supplied by the user have associated types:
    map    (k1, v1)       → list(k2, v2)
    reduce (k2, list(v2)) → list(v2)
I.e., the input keys and values are drawn from a different domain than the output keys and values. Furthermore, the intermediate keys and values are from the same domain as the output keys and values. Our C++ implementation passes strings to and from the user-defined functions and leaves it to the user code to convert between strings and appropriate types.
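The type signatures can be made concrete with a small single-process sketch. This is Python rather than the C++ implementation mentioned above, and every name in it (`run_mapreduce`, `wc_map`, `wc_reduce`) is hypothetical. Word counting is used only as an illustration: the framework stand-in handles nothing but strings, and the user code converts to and from integers itself.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical stand-in for the framework: it only ever sees strings.
MapFn = Callable[[str, str], Iterable[Tuple[str, str]]]   # (k1, v1) -> list(k2, v2)
ReduceFn = Callable[[str, List[str]], Iterable[str]]      # (k2, list(v2)) -> list(v2)


def run_mapreduce(inputs: Iterable[Tuple[str, str]],
                  map_fn: MapFn,
                  reduce_fn: ReduceFn) -> Dict[str, List[str]]:
    """Apply map_fn to each input pair, group by intermediate key, then reduce."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for k1, v1 in inputs:
        for k2, v2 in map_fn(k1, v1):
            groups[k2].append(v2)
    return {k2: list(reduce_fn(k2, values)) for k2, values in groups.items()}


def wc_map(doc_name: str, contents: str) -> Iterable[Tuple[str, str]]:
    # k1 = document name, v1 = document text; k2 = word, v2 = count as a string.
    for word in contents.split():
        yield word, "1"


def wc_reduce(word: str, counts: List[str]) -> Iterable[str]:
    # User code converts strings to integers, sums them, and converts back.
    yield str(sum(int(c) for c in counts))
```

Here the input domain (document name, document text) differs from the output domain (word, count), while the intermediate and output pairs share the same domain, as the signatures require.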
Inverted Index: The map function parses each document and emits a sequence of ⟨word, document ID⟩ pairs. The reduce function accepts all pairs for a given word, sorts the corresponding document IDs, and emits a ⟨word, list(document ID)⟩ pair. The set of all output pairs forms a simple inverted index. It is easy to augment this computation to keep track of word positions.
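A minimal in-memory sketch of the inverted index follows. The function names and the whitespace tokenizer are assumptions made for illustration, and removing duplicate document IDs in the reduce step is a choice made here that the description does not specify.

```python
from collections import defaultdict


def index_map(doc_id, text):
    """Emit one (word, doc_id) pair per word occurrence (naive whitespace split)."""
    for word in text.split():
        yield word, doc_id


def index_reduce(word, doc_ids):
    """Sort the document IDs for a word; duplicates are dropped (an assumption)."""
    return word, sorted(set(doc_ids))


def build_inverted_index(documents):
    """Hypothetical single-process driver: group map output by word, then reduce."""
    groups = defaultdict(list)
    for doc_id, text in documents.items():
        for word, emitted_id in index_map(doc_id, text):
            groups[word].append(emitted_id)
    return dict(index_reduce(word, ids) for word, ids in groups.items())


documents = {2: "map reduce map", 1: "map function"}
index = build_inverted_index(documents)
```

To track word positions, `index_map` could emit `(word, (doc_id, position))` instead, leaving the rest of the structure unchanged.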
Distributed Sort: The map function extracts the key from each record and emits a ⟨key, record⟩ pair. The reduce function emits all pairs unchanged.
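The sort can be sketched in a single process as follows. Because the reduce function is the identity, the ordering has to come from the framework, so this sketch assumes that intermediate keys reach reduce in increasing order; the `sorted` call in the hypothetical driver stands in for that behavior. The record format and all names are illustrative.

```python
from collections import defaultdict


def sort_map(record, key_of):
    """Extract the key from a record and emit a (key, record) pair."""
    yield key_of(record), record


def sort_reduce(key, records):
    """Identity reduce: emit every pair unchanged."""
    for record in records:
        yield key, record


def distributed_sort(records, key_of):
    """Hypothetical driver; visiting keys in sorted order is the assumed framework behavior."""
    groups = defaultdict(list)
    for record in records:
        for key, value in sort_map(record, key_of):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(sort_reduce(key, groups[key]))
    return output


records = ["30,carol", "10,alice", "20,bob"]
result = distributed_sort(records, key_of=lambda r: int(r.split(",")[0]))
```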
