-
Notifications
You must be signed in to change notification settings - Fork 15
Spark Streaming basic
First, we create a JavaStreamingContext object, which is the main entry point for all streaming functionality. We create a local StreamingContext with two execution threads, and a batch interval of 1 second.
import org.apache.spark.*;
import org.apache.spark.api.java.function.*;
import org.apache.spark.streaming.*;
import org.apache.spark.streaming.api.java.*;
import scala.Tuple2;
// Create a local StreamingContext with two working thread and batch interval of 1 second
SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount");
JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));
// Create a DStream that will connect to hostname:port, like localhost:9999
JavaReceiverInputDStream<String> lines = jssc.socketTextStream("localhost", 9999);
// Split each line into words
JavaDStream<String> words = lines.flatMap(
new FlatMapFunction<String, String>() {
@Override public Iterator<String> call(String x) {
return Arrays.asList(x.split(" ")).iterator();
}
});
Note that when these lines are executed, Spark Streaming only sets up the computation it will perform after it is started, and no real processing has started yet. To start the processing after all the transformations have been setup, we finally call start method.
jssc.start(); // Start the computation
jssc.awaitTermination(); // Wait for the computation to terminate
http://spark.apache.org/docs/latest/streaming-programming-guide.html#transformations-on-dstreams
http://spark.apache.org/docs/latest/streaming-programming-guide.html#window-operations