This message from our mailing list, posted by @fhueske, might be a good skeleton:
Similar to Spark, Stratosphere is a complete data processing system, i.e., it has a programming API, a program compiler (optimizer), and its own execution runtime.
It is also an alternative to Hadoop MapReduce and is quite similar to Spark in several design points:
- Programs are executed as DAGs
- Higher-level programming primitives (compared to Hadoop MR)
- APIs in Scala and Java
- Reads data from external data stores (has no data storage of its own), e.g., HDFS, S3, RDBMS, ...
However, Stratosphere also differs in some respects:
- Database-inspired processing using pipelining, gradually spilling to disk if memory is not sufficient (hybrid hash joins, external sorts)
- Sophisticated cost-based optimizer choosing execution strategies (broadcasting vs. partitioning, sort vs. hash joins, ...)
- Implemented in Java (in contrast to Spark which uses Scala)
- No intermediate result materialization in memory (this is on the roadmap)
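
To illustrate the kind of decision a cost-based optimizer makes between broadcasting and partitioning, here is a minimal sketch in Python. It is not Stratosphere's optimizer: the names `JoinPlan` and `choose_ship_strategy` are hypothetical, and the cost model (network bytes only, with broadcast replicating the smaller input to every parallel instance and repartitioning shipping each input once) is a simplifying textbook assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JoinPlan:
    """Hypothetical result type: chosen strategy and its estimated cost."""
    strategy: str       # "broadcast" or "repartition"
    network_bytes: int  # estimated bytes shipped over the network


def choose_ship_strategy(left_bytes: int, right_bytes: int, parallelism: int) -> JoinPlan:
    """Pick the cheaper shipping strategy for a join under a toy cost model."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    if left_bytes < 0 or right_bytes < 0:
        raise ValueError("input sizes must be non-negative")

    # Broadcast (assumed model): copy the smaller input to every parallel instance.
    broadcast_cost = min(left_bytes, right_bytes) * parallelism

    # Repartition (assumed model): hash-partition both inputs; as an upper
    # bound, every byte of both inputs crosses the network once.
    repartition_cost = left_bytes + right_bytes

    if broadcast_cost < repartition_cost:
        return JoinPlan("broadcast", broadcast_cost)
    return JoinPlan("repartition", repartition_cost)


if __name__ == "__main__":
    # A tiny input joined with a large one favors broadcasting the tiny input.
    print(choose_ship_strategy(10, 1000, parallelism=8))
    # Two inputs of similar size favor repartitioning both.
    print(choose_ship_strategy(500, 1000, parallelism=8))
```

A real optimizer would also weigh local strategies (sort vs. hash join), disk I/O, and existing data properties such as partitioning; this sketch only covers the shipping choice.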
Stratosphere and Spark are best seen as alternatives.
We do not build on any of Spark's components, as we have our own programming API and execution engine.