Hadoop Interview Questions For Freshers
  1. List characteristics of big data.
  2. Explain Hadoop MapReduce.
  3. Using MapReduce, find the word count for example_data.txt, whose contents are: coding,jamming,ice,river,man,driving (a Java sketch appears after this list).
  4. What is shuffling in MapReduce?
  5. What is Yarn?
  6. List Hadoop HDFS Commands.
  7. What are the differences between regular FileSystem and HDFS?
  8. What are the two types of metadata that a NameNode server holds?
  9. What is the difference between a federation and high availability?
  10. If you have an input file of 350 MB, how many input splits would HDFS create and what would be the size of each input split?
  11. How does rack awareness work in HDFS?
  12. What would happen if you store too many small files in a cluster on HDFS?
  13. When do you use the dfsadmin -refreshNodes and rmadmin -refreshNodes commands?
  14. Is there any way to change the replication of files on HDFS after they are already written to HDFS?
  15. Who takes care of replication consistency in a Hadoop cluster?
  16. What do under-replicated and over-replicated blocks mean?
  17. What role do RecordReader, Combiner, and Partitioner play in a MapReduce operation?
  18. Why is MapReduce slower in processing data in comparison to other processing frameworks?
  19. Is it possible to change the number of mappers to be created in a MapReduce job?
  20. Name some Hadoop-specific data types that are used in a MapReduce program.
  21. What is speculative execution in Hadoop?
  22. How is identity mapper different from chain mapper?
  23. What is the role of the OutputCommitter class in a MapReduce job?
  24. What happens when a node running a map task fails before sending the output to the reducer?
  25. What benefits did YARN bring in Hadoop 2.0 and how did it solve the issues of MapReduce v1?
  26. Can we have more than one ResourceManager in a YARN-based cluster?
  27. Why do we use Hadoop for Big Data?
  28. What are some limitations of Hadoop?
  29. What is indexing? How is indexing done in HDFS?
  30. What is meant by a block and block scanner?
  31. Explain the three core methods of a reducer.
  32. What are the different scheduling policies you can use in YARN?
  33. Why is block size set to 128 MB in Hadoop HDFS?
  34. How is data or a file written into HDFS?
  35. Can multiple clients write into an HDFS file concurrently?
  36. How is data or a file read from HDFS?
  37. Why does HDFS store data on commodity hardware despite the higher chance of failures?
  38. In HDFS, how does the NameNode determine which DataNode to write to?
  39. Why is reading done in parallel but writing is not in HDFS?
  40. What is Mapper in Hadoop?
  41. What is Reducer in Hadoop?
  42. How to set mappers and reducers for MapReduce jobs?
  43. Why are key-value pairs needed to process data in MapReduce?
  44. If no custom partitioner is defined in Hadoop then how is data partitioned before it is sent to the reducer?
  45. How to write a custom partitioner for a Hadoop MapReduce job?
  46. Why can't aggregation be done in the Mapper?
  47. Explain a map-only job.
  48. Define Writable data types in Hadoop MapReduce.
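For question 3 above, here is a minimal word-count sketch in Java, assuming the file sits at /input/example_data.txt and contains the single comma-separated line coding,jamming,ice,river,man,driving; the class and path names (WordCount, TokenMapper, SumReducer, /output/wordcount) are illustrative, not from any particular codebase. The mapper splits each line on commas and emits (word, 1); the reducer sums the counts per word.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: split each line on commas and emit (word, 1).
  public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      for (String token : value.toString().split(",")) {
        word.set(token.trim());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts for each word.
  public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenMapper.class);
    job.setCombinerClass(SumReducer.class);   // optional local aggregation
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path("/input/example_data.txt"));
    FileOutputFormat.setOutputPath(job, new Path("/output/wordcount"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

With the given input, each of the six words appears once, so every line of the output would carry a count of 1.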
Advanced Hadoop Interview Questions
  1. What is the difference between an RDBMS and Hadoop MapReduce?
  2. When is it not recommended to use the MapReduce paradigm for large-scale data processing?
  3. Explain the usage of Context Object.
  4. How many InputSplits will be made by the Hadoop framework?
  5. How is the splitting of a file invoked in Hadoop?
  6. What are the parameters of mappers and reducers?
  7. What is Chain Mapper?
  8. Explain the process of spilling in MapReduce.
  9. How to add or delete a node in an existing cluster?
  10. Is the NameNode machine the same as a DataNode machine in terms of hardware in Hadoop?
  11. How does the NameNode handle DataNode failures in Hadoop?
  12. How many Reducers run for a MapReduce job?
  13. What is a counter in Hadoop MapReduce?
  14. What happens if the number of reducers is set to 0 in Hadoop? (see the driver sketch after this list)
  15. What is KeyValueTextInputFormat in Hadoop?
  16. Explain the partitioning, shuffle, and sort phases in MapReduce.
  17. What is meant by streaming access?
  18. Explain what happens if, during a PUT operation, an HDFS block is assigned a replication factor of 1 instead of the default value of 3.
  19. If the number of DataNodes increases, do we need to upgrade the NameNode in Hadoop?
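For question 14 above, a minimal map-only driver sketch, assuming a hypothetical UpperCaseMapper and command-line input/output paths: setting the number of reducers to 0 means mapper output is written directly to HDFS and the shuffle/sort phase never runs.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapOnlyJob {

  // A trivial mapper: upper-cases each input line and emits it with no value.
  public static class UpperCaseMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
    private final Text out = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      out.set(value.toString().toUpperCase());
      context.write(out, NullWritable.get());
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "map-only example");
    job.setJarByClass(MapOnlyJob.class);
    job.setMapperClass(UpperCaseMapper.class);
    job.setNumReduceTasks(0);               // 0 reducers: mapper output is the final output
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(NullWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```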
Hadoop Interview Questions For Experienced
  1. What is meant by a heartbeat in HDFS?
  2. What is DistCp?
  3. Why are blocks in HDFS huge?
  4. What is the default replication factor?
  5. How can you skip the bad records in Hadoop?
  6. Where does the NameNode server store its two types of metadata?
  7. Explain the purpose of the dfsadmin tool?
  8. Explain the actions followed by a JobTracker in Hadoop.
  9. Explain the Distributed Cache in the MapReduce framework.
  10. List the actions that happen when a DataNode fails.
  11. What are the basic parameters of a mapper?
  12. Mention the main configuration parameters that have to be specified by the user to run MapReduce.
  13. How can you restart NameNode and all the daemons in Hadoop?
  14. What is Apache Flume in Hadoop?
  15. Mention the consequences of Distributed Applications.
  16. Explain how YARN allocates resources to an application with the help of its architecture.
  17. Explain Data Locality in Hadoop?
  18. What is Safemode in Hadoop?
  19. How is security achieved in Hadoop?
  20. Why does one remove or add nodes in a Hadoop cluster frequently?
  21. What is throughput in Hadoop?
  22. What does the jps command do in Hadoop?
  23. What is fsck?
  24. How to debug Hadoop code?
  25. Explain Hadoop streaming?
  26. How does Hadoop’s CLASSPATH play a vital role in starting or stopping Hadoop daemons?
  27. What is configured in /etc/hosts, and what is its role in setting up a Hadoop cluster?
  28. How is the splitting of a file invoked in the Hadoop framework?
  29. How to provide multiple inputs to Hadoop? (a combined sketch for this and the next question appears after this list)
  30. How to write Hadoop job output to multiple directories?
  31. How to copy a file into HDFS with a block size different from the existing block size configuration?
  32. Why does HDFS perform replication, even though it results in data redundancy?
  33. Explain Hadoop Archives?
  34. Explain the Single point of Failure in Hadoop?
  35. Explain Erasure Coding in Hadoop?
  36. What is Disk Balancer in Hadoop?
  37. Explain the difference between a MapReduce InputSplit and HDFS block using an example?
  38. What is a Backup node in Hadoop?
  39. What is active and passive NameNode in Hadoop?
  40. What are the most common OutputFormats in Hadoop?
  41. What is LazyOutputFormat in Hadoop?
  42. How to handle record boundaries in Text files or Sequence files in MapReduce InputSplits?
  43. What is Identity Mapper?
  44. What is Identity reducer?
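For questions 29 and 30 above, a combined driver sketch under assumptions (the paths /logs/app1, /logs/app2, /output/routed and the class names are illustrative): MultipleInputs attaches a mapper to each input path, while MultipleOutputs plus LazyOutputFormat lets the reducer route records into separate named outputs, and hence separate sub-directories, under the job's output path.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MultiInOutJob {

  // Identity-style mapper used for both input paths in this sketch.
  public static class PassThroughMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      context.write(value, NullWritable.get());
    }
  }

  // Reducer routes each record to the "valid" or "errors" named output.
  public static class RoutingReducer extends Reducer<Text, NullWritable, Text, NullWritable> {
    private MultipleOutputs<Text, NullWritable> mos;

    @Override
    protected void setup(Context context) {
      mos = new MultipleOutputs<>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<NullWritable> values, Context context)
        throws IOException, InterruptedException {
      String dir = key.toString().contains("ERROR") ? "errors" : "valid";
      // The trailing base path places the files under a sub-directory of the job output dir.
      mos.write(dir, key, NullWritable.get(), dir + "/part");
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
      mos.close();
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "multiple inputs and outputs");
    job.setJarByClass(MultiInOutJob.class);

    // Two input directories; each could use a different InputFormat/Mapper pair.
    MultipleInputs.addInputPath(job, new Path("/logs/app1"), TextInputFormat.class, PassThroughMapper.class);
    MultipleInputs.addInputPath(job, new Path("/logs/app2"), TextInputFormat.class, PassThroughMapper.class);

    job.setReducerClass(RoutingReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(NullWritable.class);

    // Named outputs for MultipleOutputs; LazyOutputFormat avoids empty default part files.
    MultipleOutputs.addNamedOutput(job, "valid", TextOutputFormat.class, Text.class, NullWritable.class);
    MultipleOutputs.addNamedOutput(job, "errors", TextOutputFormat.class, Text.class, NullWritable.class);
    LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);

    FileOutputFormat.setOutputPath(job, new Path("/output/routed"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The baseOutputPath argument to mos.write is what spreads the results across sub-directories such as /output/routed/valid and /output/routed/errors, while the named outputs control the format and key/value classes for each stream.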