Batch aggregation overview

Abstract

Batch aggregation delivers performance improvements to the xDB aggregation framework.

In the Sitecore Experience Database (xDB), the aggregation process groups and reduces live or historical data from the MongoDB collection database so that it can be used by the SQL Server reporting database and Sitecore reporting applications. In a scalable xDB architecture, aggregation is usually performed on one or more dedicated aggregation servers.

When you configure a processing server for aggregation, you can specify the number of agents or threads that you want to run concurrently. Batch aggregation enables you to group interactions together into batches to improve the performance and throughput of your aggregation processing framework.

Batch processing:

Makes better use of SQL Server resources.
Processes more interactions in fewer SQL Server transactions.
Can improve the performance of your existing aggregation framework.
Can reduce network traffic.
Requires fewer database input/output operations to process interactions.

In xDB, batch aggregation is part of the standard Sitecore installation. The default number of interactions processed in a single batch has been set so that, for each transaction, the cost and execution time per row is low. However, solutions vary, and you may need to configure the batch aggregation settings to suit your own requirements.

You can change the number of interactions that are processed in each batch with the MaximumBatchSize setting, and you can set it separately for live processing and for history processing.

Note

Do not change the configuration files directly. Instead, create a custom configuration patch file that applies the required changes at run time.
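As an illustration of such a patch file, the following minimal sketch raises MaximumBatchSize for the live batch aggregation agent. It assumes the node path used in the default configuration shown in this topic (sitecore/aggregation/aggregator). The file location and the value 128 are hypothetical and are not recommendations.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical patch file, for example: App_Config/Include/zzz/Custom.BatchAggregation.config -->
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <aggregation>
      <!-- Assumption: this element is merged with the default aggregator node by name -->
      <aggregator>
        <!-- Hypothetical value; the default shown in this topic is 64 -->
        <MaximumBatchSize>128</MaximumBatchSize>
      </aggregator>
    </aggregation>
  </sitecore>
</configuration>
```

After you deploy a patch file, check the merged configuration (for example, on the /sitecore/admin/showconfig.aspx page) to confirm that the value was applied to the node that you intended.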

Batch aggregation components

Batch aggregation consists of several components, each with settings that you can change to improve the performance of your solution.

The batch aggregation agent

The batch aggregation agent is a background service that you can schedule to run at regular intervals to process live interactions. Each time it runs, it gathers a batch of interactions from the collection database and runs them through the batch aggregator. When the aggregator has finished, it marks each interaction that it has processed as complete and reschedules any interactions that have failed.

You can configure the batch aggregation agent using the Sitecore.Analytics.Processing.Aggregation.Services.config file.

The following example shows the default batch aggregation configuration:

  <sitecore>
    <aggregation>
      <aggregator type="Sitecore.Analytics.Aggregation.InteractionBatchAggregationAgent, Sitecore.Analytics.Aggregation">
        <Context ref="aggregation/aggregationContexts/interaction/live" />
        <DateTimeStrategy ref="aggregation/dateTimePrecisionStrategy" />
        <Aggregator type="Sitecore.Analytics.Aggregation.InteractionBatchAggregator, Sitecore.Analytics.Aggregation" singleInstance="true">
          <!-- Maximum time that the multiplexer waits for other batch aggregators -->
          <MultiplexingTimeout>0.00:00:01</MultiplexingTimeout>
        </Aggregator>
        <!-- Maximum number of interactions in a single batch -->
        <MaximumBatchSize>64</MaximumBatchSize>
      </aggregator>
    </aggregation>
  </sitecore>

You can change the following settings in the configuration file for the batch aggregation agent:

Configuration node	Description
`Context`	Specify the path or location of the data that you want to aggregate and the location where you want the results saved.
`Aggregator`	Specify the batch aggregator that you want to use to process interactions.
`MaximumBatchSize`	Specify the maximum number of interactions to include in a single batch.

The batch aggregator

The batch aggregator takes one or more interactions at a time, runs the aggregation pipeline for each interaction in the batch, and combines the aggregated data into a larger data set. The combined data set is then saved back to the reporting database.

The multiplexer

The multiplexer reduces the number of requests made to the reporting database by combining the results of individual aggregation threads into a single batch or data set, which can then be saved more efficiently to the reporting database. This can significantly reduce the amount of traffic sent across the network.

The MultiplexingTimeout configuration setting enables you to specify the maximum time that you want the multiplexer to wait for other batch aggregators before saving the data set.
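A minimal sketch of a patch file that changes this timeout for the live agent is shown here. It assumes that the Aggregator node sits under sitecore/aggregation/aggregator, as in the default configuration in this topic, and that the value uses the same days.hours:minutes:seconds pattern as the default 0.00:00:01. The two-second value is hypothetical.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical patch file; the node path is assumed from the default configuration -->
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <aggregation>
      <aggregator>
        <Aggregator>
          <!-- Hypothetical value: wait up to two seconds for other batch aggregators -->
          <MultiplexingTimeout>0.00:00:02</MultiplexingTimeout>
        </Aggregator>
      </aggregator>
    </aggregation>
  </sitecore>
</configuration>
```

A longer timeout gives the multiplexer more time to combine data sets before saving, but it also delays each save, so treat any change as something to measure rather than assume.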

The Microsoft SQL Server reporting storage provider now supports storing batches of aggregation data sets, and its robustness has been improved. It has been optimized to save large data sets in a single transaction while at the same time minimizing the use of system resources.

The history worker

The history worker agent enables you to rebuild the reporting database, and it also comes with support for batch aggregation.

You can configure the history worker agent using the Sitecore.Analytics.Processing.Aggregation.Services.config file.

The history worker supports the same batch settings as the live batch aggregation agent: MultiplexingTimeout and MaximumBatchSize.

The following example shows the default configuration for the history worker:

      <!-- Configure the historyWorker agent: -->
      <historyWorker type="Sitecore.Analytics.Aggregation.Data.Processing.InteractionBatchHistoryWorker, Sitecore.Analytics.Aggregation">
        <HistoryTaskManager ref="aggregation/historyTaskManager" />
        <DateTimePrecisionStrategy ref="aggregation/dateTimePrecisionStrategy" />
        <CollectionData ref="aggregation/collectionData" />
        <AggregationContext ref="aggregation/aggregationContexts/interaction/history" />
        <Aggregator type="Sitecore.Analytics.Aggregation.InteractionBatchAggregator, Sitecore.Analytics.Aggregation" singleInstance="true">
          <MultiplexingTimeout>0.00:00:01</MultiplexingTimeout>
        </Aggregator>
        <MaximumBatchSize>64</MaximumBatchSize>
      </historyWorker>
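
Because the history worker has its own MaximumBatchSize node, you can tune it independently of the live agent. The following sketch assumes that historyWorker is a child of sitecore/aggregation, which the excerpt above does not show explicitly. The value 256 is hypothetical.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical patch file; assumes historyWorker sits directly under aggregation -->
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <aggregation>
      <historyWorker>
        <!-- Hypothetical value: applies to history processing only, not to the live agent -->
        <MaximumBatchSize>256</MaximumBatchSize>
      </historyWorker>
    </aggregation>
  </sitecore>
</configuration>
```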