- The k-means implementation processes images to determine their most common colour values.

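As a rough illustration of how k-means can pick out the most common colours, here is a minimal sequential sketch in Python. It is not the code from this repository: the function name `dominant_colours`, its parameters, and the representation of an image as a flat list of RGB tuples are all assumptions made for the example.

```python
import random
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]


def _sq_dist(a: Pixel, b: Pixel) -> float:
    """Squared Euclidean distance between two RGB values."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _assign(pixels: Sequence[Pixel], centroids: List[Pixel]) -> List[List[Pixel]]:
    """Group every pixel with its nearest centroid."""
    clusters: List[List[Pixel]] = [[] for _ in centroids]
    for p in pixels:
        nearest = min(range(len(centroids)), key=lambda i: _sq_dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def dominant_colours(pixels: Sequence[Pixel], k: int,
                     iterations: int = 20, seed: int = 0) -> List[Tuple[Pixel, int]]:
    """Hypothetical helper: return (centroid, pixel count) pairs, largest cluster first.

    Assumes the image contains at least k distinct colours.
    """
    rng = random.Random(seed)
    # Start from k distinct colours that actually occur in the image.
    centroids: List[Pixel] = rng.sample(sorted(set(pixels)), k)
    for _ in range(iterations):
        clusters = _assign(pixels, centroids)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        updated = [
            tuple(sum(p[c] for p in cluster) / len(cluster) for c in range(3))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:  # converged
            break
        centroids = updated
    # Final assignment so the counts match the centroids that are returned.
    clusters = _assign(pixels, centroids)
    return sorted(zip(centroids, (len(c) for c in clusters)),
                  key=lambda pair: pair[1], reverse=True)
```

Each returned centroid is an averaged colour, and the count beside it says how many pixels fell closest to it, so the first entry is the most common colour group.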
- The source code for the k-means implementation is in the k-means directory.
- That directory also includes instructions for running it.
- The Reddit comment implementations calculate the average comment score per subreddit and were used to compare frameworks.
- The source code for the Reddit comment implementations is in the reddit-comments directory.
- The code is grouped by framework (CouchDB, Hadoop, Spark, Cloud Haskell).
- The sequential Java version is found within the Hadoop source code or here.
- The data set is taken from here. We uncompressed it and took the first 20,000,000 lines (approximately 11 GB of JSON).
- The latest binaries for all implementations are found zipped on the releases page.
- The release includes input images and video (see the resources directory) and instructions to run, so you can reproduce our results.
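The average comment score per subreddit, as computed by the Reddit comment implementations described above, can be sketched sequentially in a few lines of Python. This is only an illustration, not the repository's code: the function name `average_scores` is hypothetical, and it assumes each input line is one JSON object with `subreddit` and `score` fields.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable


def average_scores(lines: Iterable[str]) -> Dict[str, float]:
    """Hypothetical helper: mean comment score per subreddit.

    Assumes one JSON object per line with "subreddit" and "score" keys.
    """
    totals = defaultdict(lambda: [0, 0])  # subreddit -> [score sum, comment count]
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        comment = json.loads(line)
        entry = totals[comment["subreddit"]]
        entry[0] += comment["score"]
        entry[1] += 1
    return {name: total / count for name, (total, count) in totals.items()}
```

Because the function accepts any iterable of lines, an open file can be passed in directly, so the whole input never has to be held in memory at once.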

Image credit: http://www.well-typed.com/blog/73/