CoMD
A Mini-app for Co-Design of Classical Molecular Dynamics.
Measuring Performance

CoMD implements a simple and extensible system of internal timers to measure the performance profile of the code.

As explained in performanceTimers.c, it is easy to create additional timers and associate them with code regions of specific interest. In addition, the getTime() and getTick() functions can easily be reimplemented to take advantage of platform-specific timing resources.
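As an illustration of such a reimplementation, the following is a minimal sketch that uses the POSIX monotonic clock. The function names getTimeMonotonic(), getTickMonotonic(), and elapsedSeconds() are hypothetical stand-ins, and the convention assumed here (an integer tick count plus a seconds-per-tick conversion factor) is an assumption about how getTime() and getTick() divide the work; check performanceTimers.c for the actual signatures before adapting it.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime() under strict -std modes */
#endif

#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical replacement for getTime(): the current time in integer
 * ticks, where one tick is one nanosecond of the POSIX monotonic clock. */
uint64_t getTimeMonotonic(void)
{
   struct timespec ts;
   int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
   assert(rc == 0);  /* assumption: CLOCK_MONOTONIC is available */
   (void)rc;         /* avoid an unused-variable warning under NDEBUG */
   return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
}

/* Hypothetical replacement for getTick(): the length of one tick in seconds. */
double getTickMonotonic(void)
{
   return 1.0e-9;
}

/* Convert an interval measured in ticks to seconds. */
double elapsedSeconds(uint64_t start, uint64_t stop)
{
   return (double)(stop - start) * getTickMonotonic();
}
```

A monotonic clock is chosen in this sketch because it is not affected by adjustments to the system time during a run; other platforms may offer cheaper or higher-resolution counters.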

A timing report is printed at the end of each simulation.

Timings for Rank 0
Timer            # Calls    Avg/Call (s)    Total (s)    % Loop
_______________________________________________________________
total                  1         50.6701      50.6701    100.04
loop                   1         50.6505      50.6505    100.00
timestep               1         50.6505      50.6505    100.00
position           10000          0.0000       0.0441      0.09
velocity           20000          0.0000       0.0388      0.08
redistribute       10001          0.0003       3.4842      6.88
atomHalo           10001          0.0002       2.4577      4.85
force              10001          0.0047      47.0856     92.96
eamHalo            10001          0.0001       1.0592      2.09
commHalo           60006          0.0000       1.7550      3.46
commReduce            12          0.0000       0.0003      0.00

Timing Statistics Across 8 Ranks:
Timer            Rank:   Min(s)   Rank:   Max(s)      Avg(s)    Stdev(s)
________________________________________________________________________
total               3:  50.6697      0:  50.6701     50.6699      0.0001
loop                0:  50.6505      4:  50.6505     50.6505      0.0000
timestep            0:  50.6505      4:  50.6505     50.6505      0.0000
position            2:   0.0437      0:   0.0441      0.0439      0.0001
velocity            2:   0.0380      4:   0.0392      0.0385      0.0004
redistribute        0:   3.4842      1:   3.7085      3.6015      0.0622
atomHalo            0:   2.4577      7:   2.6441      2.5780      0.0549
force               1:  46.8624      0:  47.0856     46.9689      0.0619
eamHalo             3:   0.2269      6:   1.2936      1.0951      0.3344
commHalo            3:   1.0803      6:   2.1856      1.9363      0.3462
commReduce          6:   0.0002      2:   0.0003      0.0003      0.0000
---------------------------------------------------
Average atom update rate: 9.39 us/atom/task
---------------------------------------------------

This report consists of two blocks. The upper block lists the absolute wall-clock time spent in each timer on rank 0 of the job. The lower block reports the minimum, maximum, average, and standard deviation of the times across all tasks. The ranks where the minimum and maximum values occurred are also reported to aid in identifying hotspots or load imbalances.
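To make the lower block concrete, here is a minimal sketch of how the minimum, maximum, and average of one timer, together with the ranks holding the extremes, can be derived from per-rank times. The RankSummary type and both function names are hypothetical. The per-rank values are passed in as a plain array, whereas a parallel code would normally gather them with reductions. The standard deviation is omitted for brevity.

```c
/* Hypothetical summary of one timer across all ranks. */
typedef struct
{
   double min;      /* smallest per-rank time (s) */
   double max;      /* largest per-rank time (s) */
   double avg;      /* mean per-rank time (s) */
   int    minRank;  /* rank where the minimum occurred */
   int    maxRank;  /* rank where the maximum occurred */
} RankSummary;

/* Summarize nRanks per-rank times; assumes nRanks >= 1.
 * Ties are resolved in favor of the lowest rank. */
RankSummary summarizeRanks(const double* seconds, int nRanks)
{
   RankSummary s = { seconds[0], seconds[0], 0.0, 0, 0 };
   double sum = 0.0;
   for (int i = 0; i < nRanks; ++i)
   {
      if (seconds[i] < s.min) { s.min = seconds[i]; s.minRank = i; }
      if (seconds[i] > s.max) { s.max = seconds[i]; s.maxRank = i; }
      sum += seconds[i];
   }
   s.avg = sum / nRanks;
   return s;
}

/* Ratio of the slowest rank to the average; 1.0 means perfectly balanced. */
double imbalanceFactor(const RankSummary* s)
{
   return s->max / s->avg;
}
```

Applied to the eamHalo row of the report above (maximum 1.2936 s, average 1.0951 s), this ratio is roughly 1.18, while the much smaller minimum of 0.2269 s on rank 3 shows that one task spends far less time in that timer than the others.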

The last line of the report gives the atom update rate in microseconds/atom/task. Since this quantity is normalized by both the number of atoms and the number of tasks, it provides a simple figure of merit for comparing performance between runs with different numbers of atoms and different numbers of tasks. Any increase in this number relative to the value obtained with a large number of atoms on a single task represents a loss of parallel efficiency.
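The normalization can be sketched as follows. This section does not state the exact formula used by CoMD, so the definition below (loop wall-clock time in microseconds divided by the product of atoms per task and number of time steps) is an assumption, and both function names are hypothetical.

```c
/* Assumed definition of the atom update rate in microseconds/atom/task:
 * loop wall-clock time divided by (atoms per task * number of steps). */
double atomUpdateRate(double loopSeconds, double nGlobalAtoms,
                      int nRanks, long nSteps)
{
   double atomsPerTask = nGlobalAtoms / nRanks;
   return loopSeconds * 1.0e6 / (atomsPerTask * (double)nSteps);
}

/* Parallel efficiency relative to a reference rate, for example one
 * measured with a large number of atoms on a single task.
 * 1.0 means no loss; smaller values mean lost efficiency. */
double parallelEfficiency(double referenceRate, double measuredRate)
{
   return referenceRate / measuredRate;
}
```

Under this definition, a run whose rate is twice the single-task reference rate has a parallel efficiency of 0.5.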

Choosing the problem size correctly has important implications for the reported performance. Small problem sizes may run entirely in the cache of some architectures, leading to very good performance results. For general characterization of performance, it is probably best to choose problem sizes that force the code to access main memory, even though there may be strong-scaling scenarios where the code does run mainly in cache.