1. What is a SequenceFile?
2. Is there a map input format?
3. In a MapReduce job, you want each of your input files processed by a single map task. How do you configure the job so that a single map task processes each input file, regardless of how many blocks the input file occupies?
4. Which of the following best describes the workings of TextInputFormat?
5. Which of the following statements most accurately describes the relationship between MapReduce and Pig?
6. You need to import a portion of a relational database every day as files to HDFS, and generate Java classes to interact with your imported data. Which of the following tools should you use to accomplish this?
7. You have an employee who is a Data Analyst and is very comfortable with SQL. He would like to run ad-hoc analysis on data in your HDFS cluster. Which of the following is a data warehousing software built on top of Apache Hadoop that defines a simple SQL-like query language well suited to this kind of user?
8. Workflows expressed in Oozie can contain:
9. You need a distributed, scalable data store that allows you random, real-time read/write access to hundreds of terabytes of data. Which of the following would you use?
10. Which of the following utilities allows you to create and run MapReduce jobs with any executable or script as the mapper and/or the reducer?
11. You are running a Hadoop cluster with all monitoring facilities properly configured. Which scenario will go undetected?
12. Which of the following scenarios makes HDFS unavailable?
13. Which MapReduce stage serves as a barrier, where all previous stages must be completed before it may proceed?
14. Which of the following statements most accurately describes the general approach to error recovery when using MapReduce?
15. The Combine stage, if present, must perform the same aggregation operation as Reduce.
16. What is the implementation language of the Hadoop MapReduce framework?
17. Which of the following MapReduce execution frameworks focuses on execution in shared-memory environments?
18. How can a distributed filesystem such as HDFS provide opportunities for optimization of a MapReduce operation?
19. What is the input to the Reduce function?
20. Which MapReduce phase is theoretically able to utilize features of the underlying file system in order to optimize parallel execution?
21. The default block size in HDFS is
22. The switch given to the “hadoop fs” command for detailed help is
23. RPC means
24. Which method of the FileSystem object is used for reading a file in HDFS?
25. How many methods does the Writable interface define?
26. What are the supported programming languages for MapReduce?
27. How does Hadoop process large volumes of data?
28. What are sequence files and why are they important?
29. What are map files and why are they important?
30. How can you use binary data in MapReduce?
31. What is a map-side join?
32. What is a reduce-side join?
33. What is HIVE?
34. What is PIG?
35. How can you disable the reduce step?
36. Why would a developer create a MapReduce job without the reduce step?
37. What is the default input format?
38. How can you overwrite the default input format?
39. What are the common problems with map-side join?
40. Which is faster: Map-side join or Reduce-side join? Why?
41. Will settings made via the Java API override values in the configuration files?
42. What is AVRO?
43. Can you run MapReduce jobs directly on Avro data?
44. What is distributed cache?
45. What is the best performance one can expect from a Hadoop cluster?
46. What is a Writable?
47. The Hadoop API uses basic types such as LongWritable, Text, and IntWritable. They have almost the same features as the default Java classes. What are these Writable data types optimized for?
48. Can a custom type for MapReduce data processing be implemented?
49. What happens if mapper output does not match reducer input?
50. Can you provide multiple input paths to a MapReduce job?
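For question 19 (what the Reduce function receives): each reduce call gets a key together with an iterable over all values emitted for that key, after the shuffle has grouped the mapper output. The following is a minimal in-memory sketch of the map → shuffle → reduce flow in plain Java, with no Hadoop dependency; the class and method names are illustrative only:

```java
import java.util.*;

// Plain-Java sketch of MapReduce word count. The "shuffle" step groups
// mapper output by key, so each reduce call sees (key, list-of-values).
public class WordCountSketch {
    public static Map<String, Integer> run(List<String> lines) {
        // Map phase: emit a (word, 1) pair per word.
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : lines)
            for (String word : line.split("\\s+"))
                mapped.add(new AbstractMap.SimpleEntry<>(word, 1));

        // Shuffle/sort phase: group values by key -- this is what the
        // framework does between map and reduce.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : mapped)
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                   .add(kv.getValue());

        // Reduce phase: each key arrives with ALL of its values.
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> kv : grouped) {
            int sum = 0;
            for (int v : kv.getValue()) sum += v;
            result.put(kv.getKey(), sum);
        }
        return result;
    }
}
```

This also illustrates question 13: the reduce loop cannot start until grouping is complete, which is why the shuffle/sort acts as a barrier.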
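For questions 31, 39, and 40 (map-side joins): the usual approach loads the small relation into memory (on a real cluster, typically shipped to each mapper via the distributed cache of question 44) and joins the large relation's records locally, avoiding the shuffle that a reduce-side join requires. A hypothetical Hadoop-free sketch of the in-memory hash-join step; the table layout is assumed for illustration:

```java
import java.util.*;

// Sketch of a map-side (hash) join: the small relation sits in a hash
// map, the large relation streams through record by record, and the
// join happens in the "map" step with no shuffle or reduce phase.
public class MapSideJoinSketch {
    // smallTable maps id -> city; each largeTable record is {id, name}.
    public static List<String> join(Map<String, String> smallTable,
                                    List<String[]> largeTable) {
        List<String> out = new ArrayList<>();
        for (String[] rec : largeTable) {
            String city = smallTable.get(rec[0]);  // in-memory lookup
            if (city != null)                      // inner join: drop misses
                out.add(rec[1] + "," + city);
        }
        return out;
    }
}
```

The common problem (question 39) is visible here: the small relation must fit in each mapper's memory, otherwise this scheme breaks down and a reduce-side join is needed despite its extra shuffle cost.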
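For questions 46–48 (Writable and custom types): a custom type implements write(DataOutput) and readFields(DataInput), the two methods Hadoop's Writable interface declares. The sketch below uses only JDK I/O types so it compiles without Hadoop on the classpath; in a real job the class would additionally declare implements org.apache.hadoop.io.Writable (or WritableComparable if used as a key). The field names are illustrative:

```java
import java.io.*;

// Sketch of a custom Writable-style type. Hadoop's Writable interface
// declares exactly these two methods; only JDK types appear here so the
// example is self-contained.
public class PersonWritable {
    public String name;
    public int age;

    public PersonWritable(String name, int age) {
        this.name = name;
        this.age = age;
    }

    // Serialize the fields in a fixed order...
    public void write(DataOutput out) throws IOException {
        out.writeUTF(name);
        out.writeInt(age);
    }

    // ...and deserialize them in exactly the same order.
    public void readFields(DataInput in) throws IOException {
        name = in.readUTF();
        age = in.readInt();
    }
}
```

This compact, reusable binary layout is what the Writable types of question 47 are optimized for: fast serialization across the network and to disk, with objects reused across records instead of reallocated.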