Module: workloads/word_count

A simple test of basic Map/Reduce functionality.

Test

Execute a Map/Reduce task to sum the incidence of all words across a set of documents, each containing a 10-word sentence.

Results are reported as docs processed / sec.

Setup

All variants (standalone, replica, sharded)

Notes

  • Each document contains a 10-word 'sentence'.
  • A word is generated from a random number between 0 and 999, weighted so that values cluster towards the center of the range. If the WORDS array is defined, then the word at that index is used; otherwise the index value itself is used.
  • Up to 1000 documents will be output (as the words are between 0 and 999).
  • To emphasize JavaScript performance, the job runs with jsMode enabled. As a result of SERVER-5448, jsMode only works for non-sharded workloads, so it is disabled for the sharded variant.
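The center-weighted word generation described above can be sketched as follows. The workload's actual generator is not reproduced here; averaging several uniform draws is one common way to cluster values towards the middle of a range, and is an assumption in this sketch.

```javascript
// Sketch (assumption): average three uniform draws so the result clusters
// around the center of [0, 999] rather than being uniformly distributed.
function generateWordIndex() {
    var sum = 0;
    for (var i = 0; i < 3; i++) {
        sum += Math.random() * 1000;
    }
    return Math.floor(sum / 3); // integer in [0, 999], clustered near 500
}

// If a WORDS array is defined, the word at the generated index is used;
// otherwise the index value itself serves as the word.
function generateWord(WORDS) {
    var idx = generateWordIndex();
    return (WORDS && WORDS.length > idx) ? WORDS[idx] : idx;
}
```

Because at most 1000 distinct indices exist, the reduce phase can emit at most 1000 result documents, matching the note above.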

Owning-team

mongodb/product-query

Members

(inner) batches

The number of batches. The default is 150.

The actual values in use are injected by run_workloads.py, which reads them from the config file; see this hello world example.

(inner) batchSize

The unorderedBulkOp batch size to use when generating the documents. The default is 1000.

The actual values in use are injected by run_workloads.py, which reads them from the config file; see this hello world example.

(inner) numJobs

The number of insertion jobs to schedule. The default is 100.

The actual values in use are injected by run_workloads.py, which reads them from the config file; see this hello world example.

(inner) poolSize

The thread pool size to use when generating the documents. The default is 32.

The actual values in use are injected by run_workloads.py, which reads them from the config file; see this hello world example.

(inner) wordsPerSentence

A constant (10) containing the number of words per sentence.

Methods

(inner) createJobs(staging_data, numJobs, db_name, batches, batchSize, wordsPerSentence, words) → {array}

Create an array of jobs to insert the documents for the word count test.

Parameters:

  staging_data (function): the staging data function.
  numJobs (integer): the number of jobs to create.
  db_name (string): the map/reduce database name.
  batches (integer): the number of batches to invoke.
  batchSize (integer): the size of a batch.
  wordsPerSentence (integer): the number of words per sentence.
  words (array): an array of words to select from. If empty, the index is used instead.

Returns:

Returns an array of jobs that can be passed to runJobsInPool. A single job is an array whose first element is the function to call and whose remaining elements are the parameters to that function.

Type
array
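The job format described above can be illustrated with a minimal sketch. runJobsInPool itself is not reproduced here; runJobs below is a hypothetical sequential stand-in that only shows how a stored job is unpacked and invoked.

```javascript
// A job is [fn, arg1, arg2, ...]: the function to call first, then its
// parameters, matching the format returned by createJobs.
function makeJob(fn) {
    var args = Array.prototype.slice.call(arguments, 1);
    return [fn].concat(args);
}

// Hypothetical stand-in for runJobsInPool: run each job by applying the
// stored arguments to the stored function (the real pool would run the
// jobs across a pool of threads rather than sequentially).
function runJobs(jobs) {
    return jobs.map(function (job) {
        return job[0].apply(null, job.slice(1));
    });
}
```

For example, `runJobs([makeJob(Math.max, 1, 5), makeJob(Math.min, 2, 3)])` applies Math.max to (1, 5) and Math.min to (2, 3), returning [5, 2].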

(inner) staging_data(db_name, batches, batchSize, wordsPerSentence) → {object}

Create a range of documents for the word count test.

Parameters:

  db_name (string): the database name.
  batches (integer): the number of batches to insert.
  batchSize (integer): the number of documents per batch. Note: if this value is greater than 1000, then the bulk operator will transparently create batches of 1000.
  wordsPerSentence (integer): the number of words in a sentence.

Returns:

A JSON document with the following fields:

  ok: 1 if all the insert batches were successful and nInserted matches the expected value; 0 otherwise.
  nInserted: the number of documents inserted.
  results: an empty array if ok is 1; otherwise it contains all the batch results.

Type
object
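The return document above can be illustrated with a short sketch. summarizeBatches is a hypothetical helper, not the workload's actual code; the nInserted field on each per-batch result follows the mongo shell's bulk-write result format.

```javascript
// Hypothetical sketch: aggregate per-batch bulk-insert results into the
// {ok, nInserted, results} document described above.
function summarizeBatches(batchResults, expectedInserted) {
    var nInserted = batchResults.reduce(function (sum, r) {
        return sum + r.nInserted;
    }, 0);
    var ok = (nInserted === expectedInserted) ? 1 : 0;
    return {
        ok: ok,
        nInserted: nInserted,
        results: ok === 1 ? [] : batchResults // keep details only on failure
    };
}
```

With batches = 150 and batchSize = 1000 (the defaults), a fully successful run would report ok: 1 and nInserted: 150000.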