Stateful Adaptive Streams with Approximate Computing and Elastic Scaling

Title: Stateful Adaptive Streams with Approximate Computing and Elastic Scaling
Publication Type: Conference Paper
Year of Publication: 2023
Authors: Francisco, J, Coimbra, ME, Neto, PFernandes, Freitag, F, Veiga, L
Conference Name: Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing
Publisher: Association for Computing Machinery
Conference Location: New York, NY, USA
ISBN Number: 9781450395175
Keywords: adaptive stream processing, apache flink, approximate computation, stateful functions
Abstract: Approximate computing can be used to increase performance or optimize resource usage in stream and graph processing. It can help satisfy performance requirements (e.g., throughput, lag) in stream processing by reducing the effort that applications need to process datasets. There are currently multiple stream processing platforms, and most of them do not natively support approximate results. A recent one, Stateful Functions, is an API that uses Flink to enable developers to easily build stream and graph processing applications. It also retains Flink's features such as stateful computations, fault tolerance, scalability, control events and its graph processing library Gelly. Herein we present Approxate, an extension of this platform that supports approximate results. It can also support more efficient stream and graph processing by allocating available resources adaptively, driven by user-defined requirements on throughput, lag, and latency. This extension enables flexibility in computational trade-offs such as trading accuracy for performance. The user can choose which metrics should be guaranteed at the cost of other metrics and/or of accuracy. Approxate combines approximate computing (using load shedding) with adaptive accuracy and resource management in state-of-the-art stream processing platforms, a combination not targeted in other related work. It does not require significant modifications to application code, and minimizes imbalance in data source representation when dropping events.
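
The abstract mentions load shedding that is driven by a throughput requirement and that keeps data sources evenly represented when events are dropped. The Python sketch below illustrates that general idea only; it is not taken from the paper and does not use the Stateful Functions or Flink APIs. The function names `drop_fraction_for` and `shed_balanced`, and the `"source"` key on each event, are hypothetical names chosen for this illustration.

```python
import math
from collections import defaultdict


def drop_fraction_for(observed_rate, sustainable_rate):
    """Fraction of events to shed so the kept rate fits the sustainable rate.

    Both rates are in events per second. This is an assumed, simplified
    policy, not the adaptation logic described in the paper.
    """
    if observed_rate <= 0 or observed_rate <= sustainable_rate:
        return 0.0
    return 1.0 - sustainable_rate / observed_rate


def shed_balanced(events, drop_fraction):
    """Drop roughly `drop_fraction` of the events of every source.

    Each event is a dict with a hypothetical "source" key. Quotas are
    tracked per source, so no single source absorbs all of the drops.
    The relative order of the kept events is preserved.
    """
    keep_fraction = 1.0 - drop_fraction
    seen = defaultdict(int)  # events observed so far, per source
    kept = defaultdict(int)  # events kept so far, per source
    result = []
    for event in events:
        source = event["source"]
        seen[source] += 1
        # Keep the event while this source is still below its quota.
        if kept[source] < math.ceil(seen[source] * keep_fraction):
            kept[source] += 1
            result.append(event)
    return result


# Example input: source "a" emits twice as many events as source "b".
sample_events = [{"source": "a", "value": i} for i in range(8)]
sample_events += [{"source": "b", "value": i} for i in range(4)]
sample_kept = shed_balanced(sample_events, drop_fraction_for(200.0, 100.0))
```

In this example half of the events of each source are kept, so the 2:1 ratio between the two sources is unchanged after shedding. Shedding by uniform random sampling would preserve that ratio only on average, which is why the sketch uses per-source quotas.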