Module: workloads/bestbuy_wordcount

Measure the performance of the aggregation $merge stage against the BestBuy Developer API data, specifically stressing the exchange logic. On older branches (version 4.4 and earlier), the results are also compared against the mapReduce command; on newer branches (>=4.5), only aggregation pipelines are exercised.

Each operation computes the same thing: a histogram of the words in the 'name' field of each game or software product in the database, producing documents of the form {_id: "word", count: 32}. The results are written to a collection using either $merge or the 'out' option to mapReduce. To stress the exchange optimization in a sharded deployment, that collection is expected to be sharded (unless shard_collections is false), though the test still works in unsharded deployments.
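As a rough illustration of the computation described above, the following Python sketch builds one possible word-count pipeline ending in $merge, together with a pure-Python reference of the histogram it is meant to produce. The whitespace tokenization, the function names, and the choice of "replace"/"insert" merge behavior are assumptions made for illustration; the workload's actual pipeline may differ, and the filter for game and software products is omitted here because the field it uses is not given in this description.

```python
from collections import Counter


def word_count_pipeline(target_collection="target_range_id"):
    """Build a hypothetical word-count pipeline that ends in $merge.

    Assumption: words are separated by single spaces in the 'name' field.
    """
    return [
        # Split each product name into an array of words.
        {"$project": {"words": {"$split": ["$name", " "]}}},
        # Emit one document per word.
        {"$unwind": "$words"},
        # Count the occurrences of each word: {_id: "word", count: N}.
        {"$group": {"_id": "$words", "count": {"$sum": 1}}},
        # Write the histogram into the target collection.
        {"$merge": {"into": target_collection, "on": "_id",
                    "whenMatched": "replace", "whenNotMatched": "insert"}},
    ]


def reference_word_count(docs):
    """Compute the same histogram in plain Python, for sanity checks."""
    counts = Counter()
    for doc in docs:
        counts.update(doc["name"].split(" "))
    return dict(counts)
```

With a driver such as pymongo, a pipeline like this would be passed to the source collection's aggregate() method; the source collection name is not stated here, so it is left out of the sketch.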

Pre-requisite

The dataset must be installed on the target cluster before running the test. The data can be downloaded from here and installed using mongorestore (mongorestore --gzip --archive=bestbuyproducts.bson.gz).

In a sharded cluster (if shard_collections is not false), the target collection ('target_range_id') is expected to be sharded by the key {_id: 1} and have chunks distributed amongst the shards.
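The sharding expectation above can be sketched as the admin command documents that would shard the target collection on {_id: 1}. This is a minimal sketch, not the workload's own setup: the helper name is hypothetical, and the database name is a caller-supplied placeholder because it is not stated in this description. Distributing chunks amongst the shards (for example by pre-splitting) is not covered.

```python
def shard_target_commands(database, collection="target_range_id"):
    """Return admin command documents that shard the target collection.

    Assumption: 'database' is whatever database holds the target
    collection; it is not named in the workload description.
    """
    return [
        # Enable sharding for the database that holds the collection.
        {"enableSharding": database},
        # Shard the target collection by a ranged key on _id.
        {"shardCollection": f"{database}.{collection}", "key": {"_id": 1}},
    ]
```

Each document would be sent to the admin database through a mongos, for instance with pymongo's client.admin.command(cmd).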

Setup

None

Test

Each test runs a simple loop for 2 minutes, repeatedly executing a query that computes the word count in the names of products. The computation is performed in up to three ways: once with $merge with the exchange optimization enabled, once with $merge with the exchange optimization disabled, and, on the older branches that still exercise it (4.4 and earlier), once with the mapReduce command. Each run reports the throughput in documents processed per second.
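The timed loop and the throughput figure can be sketched as follows. This is an illustrative Python outline under stated assumptions, not the workload's implementation: measure_throughput and run_once are hypothetical names, and run_once is assumed to execute one query and return the number of documents it processed.

```python
import time


def measure_throughput(run_once, duration_seconds=120.0, clock=time.monotonic):
    """Call run_once repeatedly for duration_seconds; return documents/second.

    run_once: hypothetical callable that runs one query and returns the
    number of documents processed by that run.
    """
    start = clock()
    documents = 0
    while True:
        documents += run_once()
        elapsed = clock() - start
        # Stop once the time budget is used up (2 minutes by default).
        if elapsed >= duration_seconds:
            return documents / elapsed
```

The clock parameter exists only so the loop can be exercised with a fake clock instead of waiting for the full two minutes.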

Owning-team

mongodb/product-query

Source: