Apache Spark

Features

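  • Real-time processing: Spark computes in memory, which keeps latency low enough for near-real-time workloads.
  • Generality: combines SQL, streaming, and complex analytics (such as machine learning and graph processing) in a single engine.
  • Speed: claimed to be up to 100x faster than Hadoop MapReduce for workloads that fit in memory.
  • Deployment: can run in standalone mode or on a cluster manager such as Hadoop YARN, Apache Mesos, or Kubernetes.
  • Ease of use: applications can be written in Java, Scala, Python, R, and SQL.
  • Powerful caching: intermediate datasets can be kept in memory (or spilled to disk) and reused across operations instead of being recomputed.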
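
To make the in-memory and caching points above concrete, here is a minimal toy sketch in plain Python. It is not Spark's API: the class ToyDataset, the function expensive_source, and the counter compute_count are all hypothetical names invented for illustration. It only models the general idea that a cached dataset is computed once and then served from memory, while an uncached one is recomputed for every action.

```python
from typing import Callable, Iterable, List, Optional


class ToyDataset:
    """A toy stand-in for a lazily evaluated dataset (NOT Spark's API)."""

    def __init__(self, source: Callable[[], Iterable[int]]):
        self._source = source                     # function that (re)computes the data
        self._cached: Optional[List[int]] = None  # in-memory copy, if any
        self._cache_enabled = False
        self.compute_count = 0                    # how many times the source was evaluated

    def cache(self) -> "ToyDataset":
        """Ask for the data to be kept in memory after its first computation."""
        self._cache_enabled = True
        return self

    def _materialize(self) -> List[int]:
        if self._cached is not None:
            return self._cached                   # served from memory, no recomputation
        self.compute_count += 1
        data = list(self._source())
        if self._cache_enabled:
            self._cached = data
        return data

    def count(self) -> int:
        return len(self._materialize())

    def sum(self) -> int:
        return sum(self._materialize())


def expensive_source() -> Iterable[int]:
    # Placeholder for an expensive step such as reading and parsing files.
    return (x * x for x in range(1000))


# Without caching, each action recomputes the data from the source.
uncached = ToyDataset(expensive_source)
uncached.count()
uncached.sum()

# With caching, the first action computes the data and later actions reuse it.
cached = ToyDataset(expensive_source).cache()
cached.count()
cached.sum()

print(uncached.compute_count, cached.compute_count)
```

In this toy model the uncached dataset evaluates its source once per action, while the cached one evaluates it only once in total. Spark applies a similar idea across a cluster, which is why iterative and interactive jobs tend to benefit most from caching.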

Questions:

  1. How are Spark and Hadoop connected? I thought they were completely different big data platforms. While downloading Spark, I was asked to "Choose a package type: Pre-built for Apache Hadoop 2.7". What does this mean?
