Module: workloads/initialsync

Test and measure initial sync performance.

Test

This workload performs the following actions:

  • creates num_dbs databases (we use 1 or 32).
  • creates num_collections collections per database (we use 1 or 32).
  • inserts a fixed number of documents (5,000,000 docs in total).
  • creates a single compound index for each of the collections.
  • applies a write workload while syncing if write_load is true.
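
The way the fixed document count is spread over databases and collections can be sketched as follows. This is a minimal illustration, not the workload's own code: planNamespaces, the "initsync_db" and "coll" name prefixes, and the rounding down of the per-collection count are all assumptions made for the example.

```javascript
// Hypothetical helper: lists every namespace that would be populated and how
// many documents each one receives. The name prefixes are illustrative only.
function planNamespaces(numDocs, numDbs, numCollections) {
    // Assumption: the per-collection count is rounded down.
    var perCollection = Math.floor(numDocs / (numDbs * numCollections));
    var plan = [];
    for (var d = 0; d < numDbs; d++) {
        for (var c = 0; c < numCollections; c++) {
            plan.push({ db: "initsync_db" + d, coll: "coll" + c, docs: perCollection });
        }
    }
    return plan;
}

// The largest configuration mentioned above: 32 databases x 32 collections.
var largePlan = planNamespaces(5000000, 32, 32);
console.log(largePlan.length + " collections, " + largePlan[0].docs + " docs each");
```

With a single database and a single collection, the one namespace receives all 5,000,000 documents.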

On completion of the steps above, a third member is added to the replica set.

The initialsync metric is the number of docs synced per second over the duration of the initial sync. The sync is considered complete when the third node reports that it has transitioned to the 'SECONDARY' state.
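
As a rough sketch of the completion check and the metric described above: hasReachedSecondary and docsPerSecond are hypothetical helper names, and the status object is assumed to have the shape of the document returned by rs.status() (a members array whose entries carry name and stateStr). The workload's actual polling and timing code may differ.

```javascript
// Returns true once the member named `host` reports the SECONDARY state.
// `status` is assumed to look like the output of rs.status().
function hasReachedSecondary(status, host) {
    return status.members.some(function (member) {
        return member.name === host && member.stateStr === "SECONDARY";
    });
}

// Docs synced per second for a sync that ran from startMs to endMs
// (both in milliseconds, e.g. taken from Date.now()).
function docsPerSecond(numDocs, startMs, endMs) {
    return numDocs / ((endMs - startMs) / 1000);
}

// Sample status while the third node is still syncing.
var sampleStatus = {
    members: [
        { name: "10.2.0.190:27017", stateStr: "PRIMARY" },
        { name: "10.2.0.200:27017", stateStr: "STARTUP2" }
    ]
};
console.log("sync finished:", hasReachedSecondary(sampleStatus, "10.2.0.200:27017"));
```

In a mongo shell session the status argument would come from calling rs.status() repeatedly until the check passes, with the timestamps taken before rs.add() and after the transition.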

The other metrics (e.g. cloneDBs) represent the phases of the initial sync. The actual phases vary depending on the MongoDB version and are generally only useful for debugging. They are latencies, reported as negative values so that higher is still considered better.
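
The sign convention for the phase metrics can be illustrated with a small sketch. reportedLatency is a hypothetical helper, and seconds are an assumed unit; the source does not state the unit used.

```javascript
// Converts a phase duration into the value that would be reported.
// Negating the duration keeps the "higher is better" convention:
// a shorter phase yields a larger (less negative) number.
function reportedLatency(durationSecs) {
    return -Math.abs(durationSecs);
}

var fastRun = reportedLatency(12.5);
var slowRun = reportedLatency(40);
console.log("cloneDBs: the faster run scores higher:", fastRun > slowRun);
```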

Results are reported as docs synced per second.

Setup

The starting point for this test is a 2 node replica set:

  • one primary
  • one secondary

Once these 2 nodes are populated, an additional (empty) third data-bearing node is added to the replica set. This additional node is added with rs.add(), which works because it was configured with the same replSetName.

See the Add Members to a Replica Set tutorial for more details.
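
A minimal sketch of how the new member's host string could be built from the settings documented under Members. memberHost is a hypothetical helper; only rs.add() itself is taken from the text above.

```javascript
// Builds the "host:port" string that identifies a replica set member.
function memberHost(addr, port) {
    return addr + ":" + port;
}

// Defaults for empty_node_addr and port, as documented below.
var emptyNode = memberHost("10.2.0.200", 27017);
console.log(emptyNode);

// In a mongo shell connected to the primary, the node would then be added with:
//   rs.add(emptyNode);
```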

Notes

  • Insert 5M (num_docs) documents across num_dbs databases and num_collections collections.
  • Each document contains the following fields:
    • _id: an ObjectId
    • name: the string "Wile E. Coyote"
    • age: an int between 0 and 120
    • i: an int, from 0 to num_docs / (num_dbs * num_collections)
    • address.street: the string "443 W 43rd St"
    • address.zip_code: a random int between 0 and 100000
    • address.city: the string "New York"
    • random: a random int between 0 and 10000000
    • phone_no: a string formed by concatenating a random int between 0 and 1000, "-", and a random int between 0 and 10000
    • long_string: a string created by concatenating a random int between 0 and 100000000 with a run of 'a' characters whose length is the minimum of string_field_size and 1000.
    • other_long_string: an unindexed string created by concatenating a random int between 0 and the number of 'a' characters used as overflow for when string_field_size > 1024.
    • str: a string of 1K 'a's
    • numericField: a random integer in the range 0 to 100M - 1.
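
The document shape listed above can be sketched as a generator function. This is an illustration under stated assumptions, not the workload's implementation: randInt and makeDoc are hypothetical names, random ranges are treated as half-open, "1K" is read as 1000 characters, and other_long_string contains only the overflow 'a' characters because the range of its random prefix is not given. _id is left out so that an ObjectId is assigned on insert.

```javascript
// Uniform random integer in [0, max). Whether the workload's bounds are
// inclusive is not stated, so the half-open range is an assumption.
function randInt(max) {
    return Math.floor(Math.random() * max);
}

// Builds one document for position `i` within its collection.
function makeDoc(i, stringFieldSize) {
    var inlineLen = Math.min(stringFieldSize, 1000);        // goes into long_string
    var overflowLen = Math.max(stringFieldSize - 1000, 0);  // goes into other_long_string
    return {
        name: "Wile E. Coyote",
        age: randInt(120),
        i: i,
        address: {
            street: "443 W 43rd St",
            zip_code: randInt(100000),
            city: "New York"
        },
        random: randInt(10000000),
        phone_no: randInt(1000) + "-" + randInt(10000),
        long_string: randInt(100000000) + "a".repeat(inlineLen),
        // Assumption: overflow characters only, random prefix omitted.
        other_long_string: "a".repeat(overflowLen),
        // Assumption: "1K" is taken to mean 1000 characters.
        str: "a".repeat(1000),
        numericField: randInt(100000000)
    };
}

var sampleDoc = makeDoc(7, 1500);
console.log(Object.keys(sampleDoc).join(", "));
```

With string_field_size set to 1500, long_string carries 1000 'a' characters after its numeric prefix and the remaining 500 land in other_long_string.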

Owning-team

mongodb/replication

Members

(inner) build_user_indexes

Whether to build indexes other than the _id index. The default is true.

The actual value in use is injected by run_workloads.py, which reads it from the config file; see this hello world example.

(inner) db_path

This value represents the directory where the mongod instance stores its data. The default is data/dbs.

The actual value in use is injected by run_workloads.py, which reads it from the config file; see this hello world example.

(inner) empty_node_addr

The IP address of the empty data-bearing node that needs to sync data from the primary. The default is 10.2.0.200.

The actual value in use is injected by run_workloads.py, which reads it from the config file; see this hello world example.

(inner) num_collections

The number of collections to create in each database. The default is 1.

The actual value in use is injected by run_workloads.py, which reads it from the config file; see this hello world example.

(inner) num_dbs

The number of databases to create. The default is 1.

The actual value in use is injected by run_workloads.py, which reads it from the config file; see this hello world example.

(inner) num_docs

This value represents the number of documents inserted. The default is 5 million.

The actual value in use is injected by run_workloads.py, which reads it from the config file; see this hello world example.

(inner) port

The port on which the mongod process runs. The default is 27017.

The actual value in use is injected by run_workloads.py, which reads it from the config file; see this hello world example.

(inner) primary_addr

The IP address of the primary node. The default is 10.2.0.190.

The actual value in use is injected by run_workloads.py, which reads it from the config file; see this hello world example.

(inner) start_mongod

The command used to restart the mongod process during a workload. The default is mongod --config /tmp/mongo_port_27017.conf.

The actual value in use is injected by run_workloads.py, which reads it from the config file; see this hello world example.

(inner) string_field_size

This value represents the size of long_string to be created in each doc. The default is 1KB. When string_field_size > 1000, overflow is added onto other_long_string instead of long_string.

The actual value in use is injected by run_workloads.py, which reads it from the config file; see this hello world example.

(inner) sync_type

This value represents how an empty node should sync data from the primary. The default is initialSync. Valid sync_type values are 'initialSync' and 'rsync'.

The actual value in use is injected by run_workloads.py, which reads it from the config file; see this hello world example.
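
Since only two sync_type values are valid, a guard along the following lines could reject anything else before the workload starts. checkSyncType and VALID_SYNC_TYPES are hypothetical names; whether the workload validates this setting at all is not stated in the source.

```javascript
// The two values documented as valid for sync_type.
var VALID_SYNC_TYPES = ["initialSync", "rsync"];

// Returns the value unchanged if it is valid, otherwise throws.
function checkSyncType(syncType) {
    if (VALID_SYNC_TYPES.indexOf(syncType) === -1) {
        throw new Error("Invalid sync_type '" + syncType +
                        "'; expected one of: " + VALID_SYNC_TYPES.join(", "));
    }
    return syncType;
}

console.log(checkSyncType("initialSync"));
```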

(inner) write_load

This value controls whether a write load is applied while running the workload. If set to true, 100,000 docs are inserted while the initial sync is in progress. The default is false.

The actual value in use is injected by run_workloads.py, which reads it from the config file; see this hello world example.
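
For reference, the documented defaults can be gathered into one object, with overrides layered on top. This sketch does not reproduce how run_workloads.py injects values; resolveSettings is a hypothetical helper, and reading the "1KB" default of string_field_size as 1024 is an assumption.

```javascript
// Defaults as documented for each member above.
var DEFAULTS = {
    build_user_indexes: true,
    db_path: "data/dbs",
    empty_node_addr: "10.2.0.200",
    num_collections: 1,
    num_dbs: 1,
    num_docs: 5000000,
    port: 27017,
    primary_addr: "10.2.0.190",
    start_mongod: "mongod --config /tmp/mongo_port_27017.conf",
    string_field_size: 1024, // assumption: "1KB" read as 1024
    sync_type: "initialSync",
    write_load: false
};

// Returns a new settings object: defaults first, then any overrides.
// Unknown keys are rejected so that typos do not pass silently.
function resolveSettings(overrides) {
    var settings = {};
    Object.keys(DEFAULTS).forEach(function (key) {
        settings[key] = DEFAULTS[key];
    });
    Object.keys(overrides || {}).forEach(function (key) {
        if (!(key in DEFAULTS)) {
            throw new Error("Unknown setting: " + key);
        }
        settings[key] = overrides[key];
    });
    return settings;
}

// Example: the 32-database, 32-collection variant with a write load.
var variant = resolveSettings({ num_dbs: 32, num_collections: 32, write_load: true });
console.log(variant.num_dbs, variant.num_collections, variant.write_load);
```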
