


16-11-2021
Authors: Fakrudeen Ali Ahmed, Jianmei Ye, and Jody Arthur
This post offers a behind-the-scenes look at Adobe Experience Platform's process for evaluating streaming frameworks for large-scale event processing to provide better customer experiences with Adobe Experience Cloud.
Adobe Experience Cloud provides personalization, advanced analytics, and other services to help enterprises using Adobe Experience Platform meet the rising customer expectations for personalized experiences. Helping our customers deliver these kinds of experiences in real-time requires an ability to process huge amounts of data while at the same time ensuring reliability and peak performance.
Every day, billions of events are uploaded in batches to Adobe Experience Platform and Adobe Experience Cloud and processed with daily or weekly workflows on Apache Spark. And the volume of data our system receives is growing fast. We evaluated four different streaming frameworks to find the one best suited to the large-scale event processing needed to keep up with the ever-increasing volume of data our customers depend on.
This post describes our methods, which development teams in other organizations can use as a model for evaluating streaming frameworks for their own use cases.
Our primary goal was to convert the static workflow we built for our weekly batch event processing in Adobe Experience Cloud into a real-time streaming workflow with sub-second latency. There are many frameworks available to support event streaming. After doing some initial research to narrow down the list, we chose to evaluate Storm, Flink, Samza, and Spark based on their feature set relative to our particular use case. In this section, we provide a brief description of each of these frameworks in terms of its potential benefits for our use case.
Apache Storm is an open-source real-time streaming framework that integrates with any queueing and database technologies that may already exist in the stack and can be used with any programming language.
Storm is fast, too. Its ability to process streaming data has been measured at more than a million events per second per node. A Storm topology can consume streams of data, process them in arbitrarily complex ways, and repartition them from node to node in any way the workflow requires.
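To make the topology model concrete, here is a minimal sketch of how a Storm topology might be wired up in Java. The spout and bolt classes (EventSpout, ParseBolt, AggregateBolt) are hypothetical placeholders rather than part of our actual pipeline; the point is how streams are declared, parallelized, and repartitioned between components.

import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

public class EventTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();

        // Spout that emits raw events (hypothetical implementation), 2 parallel tasks.
        builder.setSpout("event-spout", new EventSpout(), 2);

        // Bolt that parses events; shuffleGrouping spreads tuples evenly across its 4 tasks.
        builder.setBolt("parse", new ParseBolt(), 4)
               .shuffleGrouping("event-spout");

        // Bolt that aggregates per key; fieldsGrouping repartitions the stream so that
        // all events with the same "userId" land on the same aggregation task.
        builder.setBolt("aggregate", new AggregateBolt(), 4)
               .fieldsGrouping("parse", new Fields("userId"));

        Config conf = new Config();
        conf.setNumWorkers(2);
        StormSubmitter.submitTopology("event-topology", conf, builder.createTopology());
    }
}

The groupings are what make Storm's "repartition in any way required" claim tangible: each edge in the topology declares how tuples are routed to the next component's tasks.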
Figure 1: Example of a streaming dataflow in Storm. (Source: Storm.Apache.org)
Apache Flink supports both streaming and batch processing. Flink runs in all common cluster environments and is capable of performing computations at in-memory speed and at any scale. By precisely controlling time and state, Flink's runtime can run any kind of application on unbounded streams. When configured for high availability, Flink does not have a single point of failure and is capable of scaling to thousands of cores and terabytes of application state, while still delivering high throughput and low latency.
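As a rough illustration (not our production job), a Flink DataStream program that keys an unbounded stream and aggregates it over tumbling windows might look like the sketch below; the socket source and five-second window are arbitrary choices for the example.

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class FlinkEventCount {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.socketTextStream("localhost", 9999)                        // unbounded source (illustrative)
           .map(new MapFunction<String, Tuple2<String, Integer>>() {
               @Override
               public Tuple2<String, Integer> map(String event) {
                   return Tuple2.of(event, 1);                         // one record per incoming event
               }
           })
           .keyBy(value -> value.f0)                                   // repartition the stream by key
           .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))  // 5-second tumbling windows
           .sum(1)                                                     // stateful per-key aggregation
           .print();

        env.execute("flink-event-count");
    }
}

The keyBy/window/sum chain is where Flink's managed state comes in: per-key window aggregates live in Flink's state backend and, with checkpointing enabled, survive failures without a single point of failure in a highly available deployment.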
Figure 2: Example of a streaming dataflow in Flink. (Source: Apache.org)
Figure 3: Overview of Flink streaming. (Source: Apache.org)
Apache Spark Streaming is an extension of the core Spark API. Spark Streaming is very scalable, offering high-throughput, fault-tolerant processing of live data streams. Data can be ingested from many sources, processed and pushed out to filesystems, databases, and live dashboards.
Figure 4: High-level architecture of Spark. (Source: Spark.Apache.org)
It is important to note that Spark with Spark Streaming is not a native streaming framework. Unlike the other three systems we evaluated, Spark Streaming processes data in micro-batches, a variant of traditional batch processing. Because the batches can be very small, they can be processed in rapid succession, closely mimicking real-time streaming of discrete event data.
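The micro-batch model is visible directly in the API: the batch interval passed to the streaming context determines how often buffered events are turned into a small batch job. Below is a minimal, hypothetical sketch with a one-second interval and a socket source, not our actual workload.

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import scala.Tuple2;

public class SparkMicroBatchCount {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("micro-batch-count");

        // The batch interval controls micro-batch size: every second, the events
        // buffered so far become an RDD and are processed as a small batch job.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));

        JavaDStream<String> lines = jssc.socketTextStream("localhost", 9999);
        lines.mapToPair(line -> new Tuple2<>(line, 1))
             .reduceByKey((a, b) -> a + b)
             .print();

        jssc.start();
        jssc.awaitTermination();
    }
}

Shrinking the batch interval pushes the system closer to streaming behavior, but each interval still incurs batch scheduling overhead, which is exactly the trade-off we wanted to measure against the native streaming frameworks.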
It could be argued that comparing Spark Streaming with native streaming frameworks isn't really a fair test. However, Spark is so widely used in the industry that one might reasonably assume it to be a logical choice for building streaming workflows, especially with the availability of Spark Streaming.
But, is it the best choice? Without testing, there was no way for us to know if this option would be a wise choice in our use case. We also wanted to see how micro-batch processing built on a system as reliable as Spark would stack up against other widely-used native streaming frameworks.
Figure 5: Example of streaming data in micro-batches with Spark with Spark Streaming. (Source: Spark.Apache.org)
Apache Samza offers scalable, high-performance local storage in support of stateful stream-processing applications, and stateful stream processing is one of its key features. The ability to maintain state allows Samza to support very sophisticated stream processing jobs, such as joining input streams, grouping messages, and aggregating groups of messages.
In Samza, each task is associated with its own instance of a local database, which stores only the data corresponding to the partitions processed by that task. When scaling a workflow, Samza provides more computational resources by transparently migrating tasks from one machine to another, with each task keeping its own state. This allows tasks to be relocated without impacting overall performance.
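Samza's classic low-level API makes this local state explicit: a task declares a store in its job configuration and reads and writes it directly while processing messages. The sketch below is illustrative only (the store name and counting logic are assumptions, and exact signatures vary slightly between Samza releases), not our pipeline.

import org.apache.samza.config.Config;
import org.apache.samza.storage.kv.KeyValueStore;
import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.task.InitableTask;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.TaskContext;
import org.apache.samza.task.TaskCoordinator;

public class EventCountTask implements StreamTask, InitableTask {
    // Local store scoped to this task's partitions, declared in the job config
    // (e.g. a hypothetical "event-counts" store backed by a key-value factory).
    private KeyValueStore<String, Integer> counts;

    @Override
    @SuppressWarnings("unchecked")
    public void init(Config config, TaskContext context) {
        counts = (KeyValueStore<String, Integer>) context.getStore("event-counts");
    }

    @Override
    public void process(IncomingMessageEnvelope envelope,
                        MessageCollector collector,
                        TaskCoordinator coordinator) {
        String userId = (String) envelope.getKey();
        Integer current = counts.get(userId);
        counts.put(userId, current == null ? 1 : current + 1);  // stateful per-key count
    }
}

Because the store holds only the keys for the partitions assigned to this task, reads and writes stay local; Samza replicates the store to a changelog so the state can be rebuilt if the task is relocated to another machine.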
Figure 7: Illustration of local state management in Samza facilitating streaming processes. (Source: Samza.Apache.org)
Our evaluation included both qualitative and quantitative criteria. While quantitative benchmarking is critical in evaluating a streaming framework, the additional, more qualitative measures included in the list below were also important to consider in choosing a streaming framework to support the real-time needs of Adobe Experience Platform customers using Adobe Experience Cloud:
While these criteria will not be relevant to all use cases, the basic process we describe here for evaluating qualitative criteria can be applied to any situation in which a developer needs to identify the best solution to meet their needs.
The following two tables list the criteria we used for our qualitative evaluation, the factors we considered for each, and some of our results.
Table 1: Qualitative evaluation criteria and the factors evaluated specific to Adobe Experience Platform’s use case.
Table 2: Results for a few of the factors we evaluated for our qualitative criteria, including Spark batch processing changes.
For each framework, we evaluated the relative difficulty in set-up and operational complexity as part of our quantitative benchmarking.
The figure below provides a generalized diagram of our benchmarking setup. Every streaming framework typically has two components: the source, which consumes data from the message queue and sends it to the processors, and the processors, which do the computational work. The interrelationship between these components is important for back-pressure awareness, so that the source can throttle when necessary to keep consumption in line with processing.
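The essence of that interrelationship can be modeled with a bounded buffer between the two components: when the processors fall behind, the buffer fills and the source blocks, which is exactly the throttling behavior back pressure is meant to provide. The sketch below is a single-process toy model of the idea, not how any of these frameworks implement back pressure internally.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BackPressureSketch {
    public static void main(String[] args) {
        // Bounded buffer between source and processor: when the processor lags,
        // put() blocks and the source is throttled automatically.
        BlockingQueue<String> buffer = new ArrayBlockingQueue<>(10_000);

        Thread source = new Thread(() -> {
            long i = 0;
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    buffer.put("event-" + i++);    // blocks when the buffer is full
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });

        Thread processor = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    String event = buffer.take();  // blocks when the buffer is empty
                    process(event);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });

        source.start();
        processor.start();
    }

    private static void process(String event) {
        // Stand-in for the real computational work done by the processors.
    }
}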
Figure 8: Benchmarking setup.
For our performance testing, we ran our event stream processing multiple times, with one million events per run for each test. We then ran a long-running reliability test for three days. We repeated this for each of the four frameworks we were evaluating.
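For each run we tracked process latency per event. One simple way to do this, sketched below under the assumption that every event carries the timestamp at which it was produced, is to collect per-event latencies and report percentiles at the end of the run; this illustrates the measurement idea rather than our actual test harness.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class LatencyStats {
    private final List<Long> latenciesMicros = new ArrayList<>();

    // Called once per processed event with the produce timestamp carried in the payload.
    public void record(long produceTimeNanos) {
        latenciesMicros.add((System.nanoTime() - produceTimeNanos) / 1_000);
    }

    // Percentile over one run (e.g. one million events).
    public long percentile(double p) {
        List<Long> sorted = new ArrayList<>(latenciesMicros);
        Collections.sort(sorted);
        int index = (int) Math.ceil(p / 100.0 * sorted.size()) - 1;
        return sorted.get(Math.max(0, index));
    }

    public void report() {
        System.out.printf("p50=%dus p95=%dus p99=%dus events=%d%n",
                percentile(50), percentile(95), percentile(99), latenciesMicros.size());
    }
}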
Figure 9: Process latency results for our Storm test.
Figure 10: Process latency results for our Flink test.
Figure 11: Process latency results for our Spark test.
Figure 12: Process latency results for our Samza test.
Bringing all of our qualitative and quantitative results into a single table, we were able to easily identify the best streaming framework for the large-scale event processing necessary to support the growing real-time needs of our Adobe Experience Platform customers. Apache Flink was our platform of choice for the following reasons:
Table 3: Results combined to support decision-making.
Adobe Experience Platform processes billions of events each day within Adobe Experience Cloud at more than 200,000 events per second. With so much data coming into the system, we knew we needed to find a better way to process it to support the growing real-time needs of Adobe Experience Platform customers. In order to build a streaming system capable of processing huge amounts of data in real-time, we needed to thoroughly evaluate our options for streaming frameworks.
We believe the evaluation methods we've described here offer a useful model for developer teams in other organizations tasked with choosing a framework that will help them meet the challenges inherent in processing streaming events at high scale.
Follow the Adobe Experience Platform Community Blog for more developer stories and resources, and check out Adobe Developers on Twitter for the latest news and developer products. Sign up here for future Adobe Experience Platform Meetups.
Originally published: Sep 5, 2019