Spark Streaming basics

First, we create a JavaStreamingContext object, which is the main entry point for all streaming functionality. We create a local StreamingContext with two execution threads, and a batch interval of 1 second.

    import java.util.Arrays;
    import java.util.Iterator;

    import org.apache.spark.*;
    import org.apache.spark.api.java.function.*;
    import org.apache.spark.streaming.*;
    import org.apache.spark.streaming.api.java.*;
    import scala.Tuple2;

    // Create a local StreamingContext with two working threads and a batch interval of 1 second
    SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount");
    JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));

    // Create a DStream that will connect to hostname:port, like localhost:9999
    JavaReceiverInputDStream<String> lines = jssc.socketTextStream("localhost", 9999);

    // Split each line into words
    JavaDStream<String> words = lines.flatMap(
        new FlatMapFunction<String, String>() {
            @Override public Iterator<String> call(String x) {
                return Arrays.asList(x.split(" ")).iterator();
            }
        });
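
The linked programming guide continues this example by counting each word in every batch: map each word to a (word, 1) pair and reduce by key. A sketch of that step, using the same Spark 2.x Java API as above:

    // Count each word in each batch
    JavaPairDStream<String, Integer> pairs = words.mapToPair(
        new PairFunction<String, String, Integer>() {
            @Override public Tuple2<String, Integer> call(String s) {
                return new Tuple2<>(s, 1);
            }
        });

    JavaPairDStream<String, Integer> wordCounts = pairs.reduceByKey(
        new Function2<Integer, Integer, Integer>() {
            @Override public Integer call(Integer i1, Integer i2) {
                return i1 + i2;
            }
        });

    // Print the first ten elements of each RDD generated in this DStream to the console
    wordCounts.print();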

Note that when these lines are executed, Spark Streaming only sets up the computation it will perform once it is started; no real processing has happened yet. To start the processing after all the transformations have been set up, we finally call the start method.

    jssc.start();              // Start the computation
    jssc.awaitTermination();   // Wait for the computation to terminate
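
To feed data into this example, the linked guide first starts a simple Netcat server on port 9999 in a separate terminal (nc -lk 9999) and then types lines into it; each line is received by the socket stream and counted in the next batch.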

Transformations on DStreams: http://spark.apache.org/docs/latest/streaming-programming-guide.html#transformations-on-dstreams

Window operations: http://spark.apache.org/docs/latest/streaming-programming-guide.html#window-operations
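
As an illustration of the window operations linked above, the word counts can also be computed over a sliding window. A minimal sketch, assuming the pairs stream from the word-count step earlier on this page, reduces the last 30 seconds of data every 10 seconds:

    // Reduce the last 30 seconds of data, every 10 seconds
    JavaPairDStream<String, Integer> windowedWordCounts = pairs.reduceByKeyAndWindow(
        new Function2<Integer, Integer, Integer>() {
            @Override public Integer call(Integer i1, Integer i2) {
                return i1 + i2;
            }
        }, Durations.seconds(30), Durations.seconds(10));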
