Answer how Stratosphere compares to Apache Spark

This message from our mailing list, posted by @fhueske, might be a good skeleton:

Similar to Spark, Stratosphere is a complete data processing system, i.e., it has a programming API, a program compiler (optimizer), and its own execution runtime.
It is also an alternative to Hadoop MapReduce and resembles Spark in several design points:
- Programs are executed as DAGs
- Higher-level programming primitives (compared to Hadoop MR)
- APIs in Scala and Java
- Reads data from external data stores (it has no storage layer of its own), e.g., HDFS, S3, RDBMS, ...

However, Stratosphere is also different in some aspects:
- Database-inspired processing using pipelining, gradually spilling to disk if memory is not sufficient (hybrid hash joins, external sorts)
- Sophisticated cost-based optimizer choosing execution strategies (broadcasting vs. partitioning, sort vs. hash joins, ...)
- Implemented in Java (in contrast to Spark, which is implemented in Scala)
- No intermediate result materialization in memory (this is on the roadmap)
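
To make the optimizer point more concrete, here is a minimal sketch of how a cost-based choice between broadcasting and repartitioning a join input could look. This is not Stratosphere's actual optimizer code: the class `JoinStrategySketch`, the enum `ShipStrategy`, and the cost formulas are hypothetical and only count estimated network traffic. A real optimizer would presumably also weigh local strategies (sort vs. hash), disk I/O, and already existing partitionings.

```java
// Toy cost model for illustration only; all names are hypothetical
// and do not come from the Stratosphere code base.
final class JoinStrategySketch {

    enum ShipStrategy { BROADCAST_LEFT, BROADCAST_RIGHT, REPARTITION_BOTH }

    // Estimated bytes sent over the network if one input is replicated
    // to every parallel instance of the join.
    static long broadcastCost(long inputBytes, int parallelism) {
        return inputBytes * parallelism;
    }

    // Estimated bytes sent if both inputs are hash-partitioned on the join key
    // (pessimistic assumption: every record has to move).
    static long repartitionCost(long leftBytes, long rightBytes) {
        return leftBytes + rightBytes;
    }

    // Pick the shipping strategy with the lowest estimated network cost.
    static ShipStrategy choose(long leftBytes, long rightBytes, int parallelism) {
        long broadcastLeft = broadcastCost(leftBytes, parallelism);
        long broadcastRight = broadcastCost(rightBytes, parallelism);
        long repartition = repartitionCost(leftBytes, rightBytes);

        if (repartition <= broadcastLeft && repartition <= broadcastRight) {
            return ShipStrategy.REPARTITION_BOTH;
        }
        return broadcastLeft <= broadcastRight
                ? ShipStrategy.BROADCAST_LEFT
                : ShipStrategy.BROADCAST_RIGHT;
    }

    public static void main(String[] args) {
        // 10 MiB joined with 100 GiB on 32 parallel instances
        System.out.println(choose(10L << 20, 100L << 30, 32));
        // 50 GiB joined with 60 GiB on 32 parallel instances
        System.out.println(choose(50L << 30, 60L << 30, 32));
    }
}
```

Under these assumptions, a very small input is cheaper to broadcast than to reshuffle the large one, while two inputs of similar size are cheaper to repartition.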

Stratosphere and Spark are best seen as alternatives.
We do not build on any of Spark's components, as we have our own programming API and execution engine.


Answer how Stratosphere compares to Apache Spark #36
