General
  1. What is PageRank in GraphX?
  2. What is the significance of the Sliding Window operation?
  3. What do you understand by Transformations in Spark?
  4. What makes Spark good at low-latency workloads like graph processing and Machine Learning?
  5. How is Streaming implemented in Spark?
  6. Explain Lazy Evaluation in Spark.
  7. Illustrate some demerits of using Spark.
  8. What are the types of Transformation on DStream?
  9. Explain Executor Memory in a Spark application.
  10. Why is BlinkDB used?
  11. Hadoop uses replication to achieve fault tolerance. How is this achieved in Apache Spark?
  12. Explain the key features of Apache Spark.
  13. What are the functions of Spark SQL?
  14. What are Spark Datasets?
  15. Explain Caching in Spark Streaming.
  16. What is Executor Memory in a Spark application?
  17. What does MLlib do?
  18. What is YARN in Spark?
  19. Is there any benefit to learning MapReduce if Spark is better?
  20. What are the benefits of Spark over MapReduce?
  21. What is shuffling in Spark? When does it occur?
  22. What is the difference between createOrReplaceTempView and createGlobalTempView?
  23. How can you trigger automatic clean-ups in Spark to handle accumulated metadata?
  24. Compare map() and flatMap() in Spark (see the sketch after this list).
  25. Define Actions in Spark.
  26. Explain Sparse Vector.
  27. What is a Parquet file?
  28. Explain Pair RDD.
  29. What do you understand by SchemaRDD in Apache Spark?
  30. Explain Lazy Evaluation.
  31. What are the various levels of persistence in Apache Spark?
  32. What are receivers in Apache Spark Streaming?
  33. What is SparkContext in PySpark?
  34. Does Apache Spark provide checkpointing?
  35. Under what scenarios do you use Client and Cluster modes for deployment?
  36. Name the types of Cluster Managers in Spark.
  37. What is a Lineage Graph?
  38. What is an RDD?
  39. What are DStreams?
  40. What are scalar and aggregate functions in Spark SQL?
  41. Explain coalesce() in Spark (see the sketch after this list).
  42. How does Spark Streaming handle caching?
  43. What is RDD Lineage?
  44. Why is there a need for broadcast variables when working with Apache Spark? (See the sketch after this list.)
  45. What are the steps involved in structured API execution in Spark?
  46. Explain immutability in the context of Spark.
  47. What are the various functionalities supported by Spark Core?
  48. What do you understand by a worker node?
  49. Define Partitions in Apache Spark.
  50. Explain receivers in Spark Streaming.
  51. Does Apache Spark provide checkpoints?
  52. What is the role of accumulators in Spark?
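
A minimal PySpark sketch contrasting map() and flatMap(), referenced from question 24. The local SparkContext and the sample lines are illustrative assumptions, not part of the original list.

```python
from pyspark import SparkContext

sc = SparkContext("local[*]", "map-vs-flatmap")           # illustrative local context

lines = sc.parallelize(["hello world", "apache spark"])   # assumed sample data

# map() emits exactly one output element per input element,
# so every line becomes one list of words.
mapped = lines.map(lambda line: line.split(" "))
print(mapped.collect())       # [['hello', 'world'], ['apache', 'spark']]

# flatMap() flattens the per-element results,
# so the output is a single RDD of individual words.
flat_mapped = lines.flatMap(lambda line: line.split(" "))
print(flat_mapped.collect())  # ['hello', 'world', 'apache', 'spark']

sc.stop()
```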
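
A minimal sketch of coalesce(), referenced from question 41; the data and partition counts are arbitrary assumptions chosen only to show the effect.

```python
from pyspark import SparkContext

sc = SparkContext("local[*]", "coalesce-demo")   # illustrative local context

rdd = sc.parallelize(range(100), 8)              # 8 partitions, chosen arbitrarily
print(rdd.getNumPartitions())                    # 8

# coalesce() shrinks the number of partitions without a full shuffle,
# which is why it is usually cheaper than repartition() for downsizing.
smaller = rdd.coalesce(2)
print(smaller.getNumPartitions())                # 2

sc.stop()
```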
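
A minimal sketch of a broadcast variable, referenced from question 44. The lookup table and country codes are hypothetical example data.

```python
from pyspark import SparkContext

sc = SparkContext("local[*]", "broadcast-demo")  # illustrative local context

# A read-only lookup table shipped to every executor once,
# instead of being serialized with every task closure.
country_names = sc.broadcast({"IN": "India", "US": "United States"})

codes = sc.parallelize(["IN", "US", "IN"])       # assumed sample data
names = codes.map(lambda c: country_names.value.get(c, "unknown"))
print(names.collect())                           # ['India', 'United States', 'India']

sc.stop()
```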