Module: workloads/large_initialsync

Test initial sync performance.

This test begins with a single-node replica set (with a pre-seeded EBS volume for its data path) and an empty data-bearing node that has not yet been added to the replica set.

Once the empty node is added to the replica set, the test tracks how long it takes for a full initial sync to complete.

Periodically, the initial sync progress is printed.

A simultaneous mongoreplay payload is applied to the replica set in order to simulate a real-world environment.

Test

On start, the primary's data has been populated from an EBS volume, which has also been pre-warmed. An empty secondary is then added to the replica set and the test starts.

The state of the secondary is tracked through the replSetGetStatus command.

The initial sync of the secondary will have completed (either successfully or not) when the stateStr of the syncing member is no longer STARTUP2.
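
The completion check described above can be sketched as follows. This is a minimal illustration, not the workload's actual code: the helper names `syncingMemberState` and `initialSyncFinished` are hypothetical, and the status object is assumed to have the shape returned by replSetGetStatus (a `members` array whose entries carry `name` and `stateStr`). Treating a member that is not yet listed as "not finished" is also an assumption.

```javascript
// Hypothetical helper: return the stateStr of the member whose name matches
// "host:port", or null if that member is not listed in the status document.
function syncingMemberState(status, memberName) {
    var members = status.members || [];
    for (var i = 0; i < members.length; i++) {
        if (members[i].name === memberName) {
            return members[i].stateStr;
        }
    }
    return null;
}

// Hypothetical helper: the initial sync is considered finished (successfully
// or not) once the member is listed and is no longer in STARTUP2.
function initialSyncFinished(status, memberName) {
    var state = syncingMemberState(status, memberName);
    return state !== null && state !== "STARTUP2";
}
```

In the mongo shell, the status document could be obtained with `db.adminCommand({replSetGetStatus: 1})` on the primary, and the check repeated with a sleep between polls.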

Results are reported as the duration of the initial sync in milliseconds. Because this is a latency, it is reported as a negative value, so higher values are better.
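
As a small worked illustration of the sign convention, the reported value can be sketched like this. The function name `initialSyncResult` is hypothetical and is not taken from the workload's source.

```javascript
// Hypothetical helper: convert start/end timestamps (in milliseconds) into
// the reported value. The duration is negated so that a faster sync yields
// a higher (less negative) number.
function initialSyncResult(startMs, endMs) {
    var durationMs = endMs - startMs;
    return -durationMs;
}
```

Under this convention, a sync that takes 1,800,000 ms is reported as -1800000, which is higher (better) than the -3600000 reported for a sync that takes 3,600,000 ms.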

Setup

The starting point for this test is a single node replica set with a pre-seeded dbpath.

Before the test is started, the EBS volume is warmed. See the 'WITH_EBS' section of system-setup.sh.

Once the node has been populated and warmed, an additional (empty) data-bearing node is added to the replica set. This additional node must be configured with the same replSetName as the existing replica set if it is to be added successfully with rs.add().

See the Add Members to a Replica Set tutorial for more details.
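
The add step can be sketched as below. The helper names `memberHost` and `sameReplSetName` are hypothetical, and the addresses in the usage note are simply the documented defaults for empty_node_addr and port; this is an illustration of the requirement above, not the workload's actual code.

```javascript
// Hypothetical helper: build the "host:port" string that rs.add() accepts.
function memberHost(addr, port) {
    return addr + ":" + port;
}

// Hypothetical helper: both nodes must have been started with the same,
// non-empty replica set name before the empty node can be added.
function sameReplSetName(primaryName, emptyNodeName) {
    return typeof primaryName === "string" &&
        primaryName.length > 0 &&
        primaryName === emptyNodeName;
}
```

From a mongo shell connected to the primary, the empty node could then be added with `rs.add(memberHost("10.2.0.200", 27017))`.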

mongoreplay is used to apply a load to the primary while the initial sync is ongoing.

Notes

  • initialsync-logkeeper task
    • Dataset contains one database named "buildlogs" and three collections.
    • "tests" collection has 355 million documents. Average document size is 613 KB. Storage size is 25 GB. 2 indexes: "_id", "build_id_1_started_1".
    • "builds" collection has 4 million documents. Average document size is 216 KB. Storage size is 182 MB. 2 indexes: "_id", "buildnum_1_builder_1".
    • "logs" collection has 71 million documents. Average document size is 57 KB. Storage size is 607 GB. 3 indexes: "_id", "build_id_1_test_id_1_seq_1", "build_id_1_started_1".
  • initialsync-logkeeper-short task
    • Uses pre-compressed data files (from a node that was restored from snapshot "snap-09d40a2412085bc5a") to load the primary's data.
    • The snapshot was taken from this dataset.
    • Dataset contains one database named "buildlogs" and one collection named "logs".
    • Collection has 1 million documents. Document size ranges from a few hundred bytes to a few hundred kilobytes.
    • Total collection storage size is ~10 GB. The collection has one index, built on the "_id" field.
  • Population of the replica set is performed with a pre-seeded EBS volume, as configured in the test_control yml file. See the on_workload_client pre_task sections of this file.
  • mongoreplay is used to apply a pre-recorded workload on the primary while the initial sync is ongoing. The mongoreplay process is currently commented out and is unlikely to ever return.

Owning-team

mongodb/replication

Members

(inner) db_path

This value represents the directory where the mongod instance stores its data. The default is data/dbs.

The actual value in use is injected by run_workloads.py, which reads it from the config file; see this hello world example.

(inner) empty_node_addr

The IP address of an empty data-bearing node that needs to sync data from the primary. The default is 10.2.0.200.

The actual value in use is injected by run_workloads.py, which reads it from the config file; see this hello world example.

(inner) port

The port on which the mongod process runs. The default is 27017.

The actual value in use is injected by run_workloads.py, which reads it from the config file; see this hello world example.

(inner) primary_addr

The IP address of the primary node. The default is 10.2.0.190.

The actual value in use is injected by run_workloads.py, which reads it from the config file; see this hello world example.

(inner) start_mongod

The command used to restart the mongod process in a workload. The default is mongod --config /tmp/mongo_port_27017.conf.

The actual value in use is injected by run_workloads.py, which reads it from the config file; see this hello world example.

(inner) sync_type

This value represents how an empty node should sync data from the primary. The default is initialSync. Valid sync_type values are 'initialSync' and 'rsync'.

The actual value in use is injected by run_workloads.py, which reads it from the config file; see this hello world example.
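
Taken together, the members above and their documented defaults can be sketched as a single configuration object. This is an assumption-laden illustration: `DEFAULTS`, `VALID_SYNC_TYPES`, and `resolveConfig` are hypothetical names, the port is assumed to be numeric, and the real injection mechanism used by run_workloads.py is not shown here.

```javascript
// Documented defaults for each member (names and values from this page).
var DEFAULTS = {
    db_path: "data/dbs",
    empty_node_addr: "10.2.0.200",
    port: 27017, // assumed numeric; the page only states "27017"
    primary_addr: "10.2.0.190",
    start_mongod: "mongod --config /tmp/mongo_port_27017.conf",
    sync_type: "initialSync"
};

var VALID_SYNC_TYPES = ["initialSync", "rsync"];

// Hypothetical helper: overlay injected values on the defaults and reject
// any sync_type other than the two documented ones.
function resolveConfig(injected) {
    var config = {};
    Object.keys(DEFAULTS).forEach(function (key) {
        var hasOverride = injected && injected[key] !== undefined;
        config[key] = hasOverride ? injected[key] : DEFAULTS[key];
    });
    if (VALID_SYNC_TYPES.indexOf(config.sync_type) === -1) {
        throw new Error("Invalid sync_type: " + config.sync_type);
    }
    return config;
}
```

For example, `resolveConfig({sync_type: "rsync"})` keeps every default except sync_type, while an unrecognized sync_type raises an error before any node is touched.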

Methods

(inner) printRsyncProgress()

Prints rsync progress.
