C implementation of FRAMES [0]: data-driven windows
The executables take several inputs that customize the framing operation on the input file. Here is the ordered list of the program arguments (all required):
- input file path (string): path to the csv file representing the events to be processed, formatted as "ts, key, value"
- frame type (int): THRESHOLD = 0 | DELTA = 1 | AGGREGATE = 2
- report policy (int): ON CLOSE = 0 | ON UPDATE = 1
- order policy (int): IN ORDER = 0 | OUT OF ORDER = 1
- buffer type (int): SINGLE BUFFER = 0 | MULTI BUFFER = 1
- X (int): SINGLE BUFFER: evict after X frames are created / MULTI BUFFER: evict after X ms have passed / X = -1 to never evict frames
- Y (int): evict the oldest Y frames
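The argument list above can be read into a configuration record roughly as follows. This is a minimal sketch, not the project's actual parser: the struct, enum and function names (`frame_config`, `parse_args`, `should_evict_single`) are hypothetical. It also assumes that X must be either -1 or positive, and that the single-buffer eviction fires every X created frames.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical names for the documented integer codes. */
enum frame_type    { THRESHOLD_FRAME = 0, DELTA_FRAME = 1, AGGREGATE_FRAME = 2 };
enum report_policy { REPORT_ON_CLOSE = 0, REPORT_ON_UPDATE = 1 };
enum order_policy  { IN_ORDER = 0, OUT_OF_ORDER = 1 };
enum buffer_type   { SINGLE_BUFFER = 0, MULTI_BUFFER = 1 };

struct frame_config {
    const char *input_path;     /* csv with "ts, key, value" rows */
    enum frame_type frame;
    enum report_policy report;
    enum order_policy order;
    enum buffer_type buffer;
    long x;                     /* frames (single) or ms (multi); -1 = no eviction */
    long y;                     /* how many of the oldest frames to evict */
};

/* Parse a base-10 integer in [lo, hi]; returns 0 on success, -1 otherwise. */
static int parse_long(const char *s, long lo, long hi, long *out) {
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0' || v < lo || v > hi)
        return -1;
    *out = v;
    return 0;
}

/* argv[1..7]: input path, frame type, report, order, buffer, X, Y. */
static int parse_args(int argc, char **argv, struct frame_config *cfg) {
    static const long lo[6] = { 0, 0, 0, 0, -1, 0 };
    static const long hi[6] = { 2, 1, 1, 1, LONG_MAX, LONG_MAX };
    long v[6];

    if (argc != 8) {
        fprintf(stderr, "expected 7 arguments: <csv> <frame> <report> <order> <buffer> <X> <Y>\n");
        return -1;
    }
    for (int i = 0; i < 6; i++)
        if (parse_long(argv[i + 2], lo[i], hi[i], &v[i]) != 0)
            return -1;
    if (v[4] == 0)              /* assumed: X is either -1 or positive */
        return -1;

    cfg->input_path = argv[1];
    cfg->frame  = (enum frame_type)v[0];
    cfg->report = (enum report_policy)v[1];
    cfg->order  = (enum order_policy)v[2];
    cfg->buffer = (enum buffer_type)v[3];
    cfg->x = v[4];
    cfg->y = v[5];
    return 0;
}

/* Single buffer, assumed semantics: evict every X created frames. */
static int should_evict_single(const struct frame_config *cfg, long frames_created) {
    return cfg->x > 0 && frames_created > 0 && frames_created % cfg->x == 0;
}
```

With this reading, the arguments `events.csv 0 1 0 1 -1 3` would select Threshold Frames, ON UPDATE reporting, in-order processing, the multi buffer, and no eviction.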
Several defines inside the single/multi_buffer.c files are used to customize the execution:
- THRESHOLD: threshold on the event's value evaluated for the construction of Threshold Frames
- DELTA: a Delta Frame is emitted whenever the difference between the minimum and maximum "value" becomes greater than this parameter
- AGGREGATE: aggregation function used for the construction of Aggregate Frames (AVG = 0 | SUM = 1)
- AGGREGATE THRESHOLD: a new frame is reported when the aggregate value becomes greater than this parameter
- MAX CHARS: maximum number of characters admitted in a line representing an event
- MAX FRAMES: maximum size of the multi-buffer; choose an eviction policy that cannot cause it to overflow
- DEBUG: set to true to print the current frame whenever the report policy is satisfied
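The frame-related defines can be read as closing conditions, sketched below over an array of event values. This is an illustration under stated assumptions, not the code in single/multi_buffer.c: it assumes a Threshold Frame is a run of consecutive events with value strictly greater than THRESHOLD, that the event pushing max - min above DELTA opens the next Delta Frame, and that an Aggregate Frame closes (including the triggering event) once the running AVG or SUM exceeds the aggregate threshold. The numeric values, the underscored name `AGGREGATE_THRESHOLD` and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative values only; the real defines live in single/multi_buffer.c. */
#define THRESHOLD           50.0
#define DELTA               10.0
#define AGGREGATE           1        /* AVG = 0 | SUM = 1 */
#define AGGREGATE_THRESHOLD 100.0

/* Threshold Frame: maximal run of consecutive events with value > THRESHOLD.
   Returns the number of closed frames; a run still open at the end is not counted. */
static size_t threshold_frames(const double *v, size_t n) {
    size_t frames = 0;
    bool open = false;
    for (size_t i = 0; i < n; i++) {
        if (v[i] > THRESHOLD) {
            open = true;
        } else if (open) {      /* first event below the threshold closes the frame */
            open = false;
            frames++;
        }
    }
    return frames;
}

/* Delta Frame: closes when max - min would exceed DELTA; the triggering
   event starts the next frame. Returns the number of closed frames. */
static size_t delta_frames(const double *v, size_t n) {
    if (n == 0)
        return 0;
    size_t frames = 0;
    double min = v[0], max = v[0];
    for (size_t i = 1; i < n; i++) {
        double lo = v[i] < min ? v[i] : min;
        double hi = v[i] > max ? v[i] : max;
        if (hi - lo > DELTA) {
            frames++;
            min = max = v[i];
        } else {
            min = lo;
            max = hi;
        }
    }
    return frames;
}

/* Aggregate Frame: closes (including the current event) once the running
   AVG or SUM becomes greater than AGGREGATE_THRESHOLD. */
static size_t aggregate_frames(const double *v, size_t n) {
    size_t frames = 0, count = 0;
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += v[i];
        count++;
        double agg = (AGGREGATE == 0) ? sum / (double)count : sum;
        if (agg > AGGREGATE_THRESHOLD) {
            frames++;
            sum = 0.0;
            count = 0;
        }
    }
    return frames;
}
```

With these values, the input {10, 60, 70, 20, 55, 40} closes 2 Threshold Frames, 4 Delta Frames and 2 Aggregate (SUM) Frames.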
To perform the evaluation, we measure the execution time of the SECRET [1] methods while framing the input stream of events; we also save the number of tuples and frames created up to that moment, in order to measure the algorithmic complexity of the program.
To customize an evaluation, you can specify which tests will be executed. The /evaluation folder contains two .ini configuration files, one for the single buffer and one for the multi buffer: list in them the command sets for all the configurations you want to run, following the Usage instructions, then run the Python evaluation script (in the main folder) corresponding to the chosen buffer structure. The output files are saved in the evaluation/results folder, in a format that can be processed by the Jupyter notebook available in the evaluation folder.
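A per-method timing record of the kind described above could look like the sketch below. It is only an illustration: `struct sample`, `now_ns`, `finish_sample`, `write_sample` and the "method,ns,tuples,frames" row layout are hypothetical and do not reproduce the format consumed by the notebook. It assumes a POSIX system providing `clock_gettime(CLOCK_MONOTONIC, ...)`.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <time.h>

/* One measurement: which method ran, how long it took, and the counters
   at that moment. Field names and layout are hypothetical. */
struct sample {
    const char *method;
    long long ns;
    long tuples;
    long frames;
};

/* Monotonic timestamp in nanoseconds (POSIX clock_gettime). */
static long long now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + (long long)ts.tv_nsec;
}

/* Close a measurement started at start_ns, tagging it with the counters. */
static struct sample finish_sample(const char *method, long long start_ns,
                                   long tuples, long frames) {
    struct sample s = { method, now_ns() - start_ns, tuples, frames };
    return s;
}

/* Append the sample as a csv row "method,ns,tuples,frames"; 0 on success. */
static int write_sample(FILE *out, const struct sample *s) {
    return fprintf(out, "%s,%lld,%ld,%ld\n",
                   s->method, s->ns, s->tuples, s->frames) < 0 ? -1 : 0;
}
```

Each method call would be wrapped between `now_ns()` and `finish_sample()`, passing the tuple and frame counters at that moment, so that time can later be plotted against the amount of processed data.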
The resources folder contains the dataset used to perform the evaluation. A script, ooo_generator/file_generator.py, is also available to create an input csv file containing out-of-order events: use the config.json file to configure a new input stream file, created from an existing one by delaying some of its events, and make sure to include the name of the new file (with the .csv extension) in the output_dir field.
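To check how many delayed events a generated file actually contains, a small reader for the "ts, key, value" format can count the events whose timestamp is lower than the largest one seen before them. This is a hypothetical helper, not part of the repository: the field types (integer ts, string key, floating-point value), the `MAX_CHARS` value and the function names are assumptions.

```c
#include <stdio.h>

#define MAX_CHARS 256  /* illustrative value for the MAX CHARS define */

/* One "ts, key, value" row; the field types are assumptions. */
struct event {
    long ts;
    char key[64];
    double value;
};

/* Parse one row; returns 0 on success, -1 for headers or malformed lines. */
static int parse_event(const char *line, struct event *e) {
    return sscanf(line, "%ld , %63[^,] , %lf", &e->ts, e->key, &e->value) == 3 ? 0 : -1;
}

/* Count events whose ts is lower than the largest ts seen before them. */
static long count_out_of_order(FILE *in) {
    char line[MAX_CHARS];
    struct event e;
    long late = 0, max_ts = 0;
    int seen = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        if (parse_event(line, &e) != 0)
            continue;           /* skip the header and malformed lines */
        if (seen && e.ts < max_ts) {
            late++;             /* arrived after a newer event */
        } else {
            max_ts = e.ts;
            seen = 1;
        }
    }
    return late;
}
```

For example, the rows with ts 1, 3, 2, 4, 3 contain two late events (the 2 after the 3, and the 3 after the 4).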
A technical report of this project is available in the docs folder.
[0] Grossniklaus, Michael & Maier, David & Miller, James & Moorthy, Sharmadha & Tufte, Kristin. (2016). Frames: Data-driven Windows. 13-24. 10.1145/2933267.2933304.
[1] Botan, Irina & Derakhshan, Roozbeh & Dindar, Nihal & Haas, Laura & Miller, Renée & Tatbul, Nesime. (2010). SECRET: A Model for Analysis of the Execution Semantics of Stream Processing Systems. PVLDB. 3. 232-243.