Module: workloads/map_reduce

A simple test of basic Map/Reduce functionality

Test

Execute an aggregation pipeline that sums the amount field grouped by uid. numJobs (default 200) output documents are generated.
The number of input documents equals the product of:

  numJobs * batches * batchSize * statusRange

The default case is:

  200 * 40 * 1000 * 5 = 40,000,000

Results are reported as docs processed per second.
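
As a sketch, the aggregation is equivalent to the following shell pipeline; the collection name mr and the allowDiskUse option are assumptions, not taken from this page:

  // Hypothetical shape of the aggregation under test: one output
  // document per uid, summing that uid's amount values.
  db.getSiblingDB(db_name).mr.aggregate(
      [{$group: {_id: "$uid", total: {$sum: "$amount"}}}],
      {allowDiskUse: true});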

Setup

All variants (standalone, replica, sharded)

Notes

  • This test stage evenly distributes documents over numJobs (default 200) uids; the aggregation pipeline then computes the sum of amount grouped by uid.

Owning-team

mongodb/product-query

Members

(inner) batches

The number of batches. The default is 40.

The actual value in use is injected by run_workloads.py, which reads it from the config file; see the hello world example.

(inner) batchSize

The unorderedBulkOp batch size to use when generating the documents. The default is 1000.

The actual value in use is injected by run_workloads.py, which reads it from the config file; see the hello world example.

(inner) db_name

The destination database name.

(inner) numJobs

The range of uids to generate. The default is 200.

The actual value in use is injected by run_workloads.py, which reads it from the config file; see the hello world example.

(inner) poolSize

The thread pool size to use when generating the documents. The default is 32.

The actual value in use is injected by run_workloads.py, which reads it from the config file; see the hello world example.

(inner) statusRange

The range of status values to use when generating documents. It defaults to 5, so values 0 through 4 are generated in that case.

The actual value in use is injected by run_workloads.py, which reads it from the config file; see the hello world example.

Methods

(inner) createJobs(numJobs, func, db_name, batches, batchSize, statusRange) → {array}

Create an array of jobs to insert the documents for the map/reduce test. In this instance, the job parameters are fixed except for the uid. Each job generates a set of documents for a given uid in the desired range (0 to numJobs - 1).

Parameters:

  Name         Type      Description
  numJobs      integer   the range of uids to generate
  func         function  the staging data function
  db_name      string    the mr database name
  batches      integer   the number of batches to invoke
  batchSize    integer   the size of a batch
  statusRange  integer   the range of status values (0 to statusRange - 1)

Returns:

An array of jobs that can be passed to runJobsInPool. A single job is an array whose first element is the function to call and whose remaining elements are the parameters to that function.

Type
array
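
A minimal sketch of how createJobs could assemble that array; the body below is an assumption consistent with the description, not the module's actual source:

  // Sketch: one job per uid; each job is [func, ...args for func],
  // matching the staging_data parameter order below.
  function createJobs(numJobs, func, db_name, batches, batchSize, statusRange) {
      var jobs = [];
      for (var uid = 0; uid < numJobs; uid++) {
          jobs.push([func, db_name, batches, batchSize, uid, statusRange]);
      }
      return jobs;
  }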

(inner) staging_data(db_name, batches, batchSize, uid, statusRange) → {object}

Create a range of documents for the map/reduce test.

Parameters:

  Name         Type     Description
  db_name      string   the database name
  batches      integer  the number of batches to insert
  batchSize    integer  the number of documents per batch
  uid          integer  the value of the uid field to insert
  statusRange  integer  the range of status values (zero based)

Note: if batchSize is greater than 1000, the bulk operator will transparently create batches of 1000.

Returns:

A JSON document with the following fields:

  ok: if 1, all the insert batches were successful and nInserted is the expected value
  nInserted: the number of documents inserted
  results: an empty array if ok is 1; otherwise it contains all the batch results

Type
object
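
A minimal sketch of the generation loop, assuming the shell's initializeUnorderedBulkOp and Random.randInt helpers; the collection name mr, the amount range, and the inner status loop (which is what makes the totals match the document-count formula in the Test section) are assumptions, not the module's actual source:

  // Sketch: insert batches * batchSize * statusRange documents for one uid.
  function staging_data(db_name, batches, batchSize, uid, statusRange) {
      Random.setRandomSeed();  // the shell requires seeding before randInt
      var coll = db.getSiblingDB(db_name).mr;
      var nInserted = 0, ok = 1, results = [];
      for (var b = 0; b < batches; b++) {
          var bulk = coll.initializeUnorderedBulkOp();
          for (var i = 0; i < batchSize; i++) {
              for (var status = 0; status < statusRange; status++) {
                  bulk.insert({uid: uid,
                               amount: Random.randInt(10000),  // assumed range
                               status: status});
              }
          }
          var res = bulk.execute();
          nInserted += res.nInserted;
          results.push(res);
          if (res.nInserted != batchSize * statusRange) {
              ok = 0;  // a batch fell short of the expected count
          }
      }
      // Per the contract above: results is empty when every batch succeeded.
      return {ok: ok, nInserted: nInserted, results: ok ? [] : results};
  }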