Performance profiling
ADIOS2 provides built-in performance profiling capabilities to help users understand the runtime behavior of their I/O operations and identify potential bottlenecks. This documentation outlines how to interpret the performance profiling features in ADIOS2 and how to enable profiling with external libraries.
JSON Performance File
For file-based transfers, ADIOS2 enables performance profiling by default. During the execution of an ADIOS2 application, a bp folder is created. This folder contains the data and metadata generated by the application. In addition to these files, a profiling.json file is generated within the same directory. This file holds detailed performance information about various internal operations of the ADIOS2 I/O library.
The structure of the profiling.json file is a JSON array, where each element typically corresponds to the profiling information from a single MPI rank. The following is an example of the content of a profiling.json file when an ADIOS2 application is run with two MPI ranks:
{ "rank":0, "start":"Wed_Dec_03_22:24:44_2025","ES_DSB_Broadcast_mus": 0, "ES_DSB_Broadcast":{"mus":0, "nCalls":1},"ES_DSB_mus": 5, "ES_DSB":{"mus":5, "nCalls":1},"ES_DSB_AllGather_mus": 2, "ES_DSB_AllGather":{"mus":2, "nCalls":1},"ES_MDAgg_GatherWriteMeta_mus": 255, "ES_MDAgg_GatherWriteMeta":{"mus":255, "nCalls":1},"ES_MDAgg_AggInfo_MetaInfoBcast_mus": 2, "ES_MDAgg_AggInfo_MetaInfoBcast":{"mus":2, "nCalls":1},"ES_MDAgg_AggInfo_FixedMetaInfoGather_mus": 13428, "ES_MDAgg_AggInfo_FixedMetaInfoGather":{"mus":13428, "nCalls":1},"InitAgg-dsb_GPI_partition_mus": 14, "InitAgg-dsb_GPI_partition":{"mus":14, "nCalls":1},"InitAgg-dsb_GPI_AllGather_mus": 2, "InitAgg-dsb_GPI_AllGather":{"mus":2, "nCalls":1},"InitAgg-dsb_mus": 46, "InitAgg-dsb":{"mus":46, "nCalls":1},"InitAgg-dsb_GPI_mus": 23, "InitAgg-dsb_GPI":{"mus":23, "nCalls":1},"ES_MDAgg_GatherWriteMeta_MDBlocks_mus": 4, "ES_MDAgg_GatherWriteMeta_MDBlocks":{"mus":4, "nCalls":1},"DC_mus": 493, "DC":{"mus":493, "nCalls":1},"WriteMD_Blocks_mus": 10, "WriteMD_Blocks":{"mus":10, "nCalls":1},"WriteMD_mus": 64, "WriteMD":{"mus":64, "nCalls":1},"WriteData_mus": 150467, "WriteData":{"mus":150467, "nCalls":2},"ES_WriteData_mus": 150208, "ES_WriteData":{"mus":150208, "nCalls":1},"BS_mus": 1, "BS":{"mus":1, "nCalls":1},"ES_MDAgg_AggInfo_mus": 13693, "ES_MDAgg_AggInfo":{"mus":13693, "nCalls":1},"ES_MDAgg_mus": 13967, "ES_MDAgg":{"mus":13967, "nCalls":1},"PDW_mus": 264, "PDW":{"mus":264, "nCalls":1},"ES_MDAgg_AggInfo_SelectMetaInfoGather_mus": 3, "ES_MDAgg_AggInfo_SelectMetaInfoGather":{"mus":3, "nCalls":1},"ES_CloseTS_mus": 122, "ES_CloseTS":{"mus":122, "nCalls":1},"ES_mus": 164310, "ES":{"mus":164310, "nCalls":1},"WriteMmD_mus": 88, "WriteMmD":{"mus":88, "nCalls":1}, "databytes":0, "metadatabytes":0, "metametadatabytes":0, "transport_0":{"type":"File_POSIX", "wbytes":335544320, "close":{"mus":186, "nCalls":1}, "write":{"mus":150145, "nCalls":10}, "open":{"mus":90, "nCalls":1}}, "transport_1":{"type":"File_POSIX", "wbytes":10480, 
"close":{"mus":276, "nCalls":1}, "write":{"mus":54, "nCalls":5}, "open":{"mus":50, "nCalls":1}} },
{ "rank":1, "start":"Wed_Dec_03_22:24:44_2025","ES_DSB_Broadcast_mus": 0, "ES_DSB_Broadcast":{"mus":0, "nCalls":1},"ES_DSB_mus": 134, "ES_DSB":{"mus":134, "nCalls":1},"ES_DSB_AllGather_mus": 131, "ES_DSB_AllGather":{"mus":131, "nCalls":1},"ES_MDAgg_GatherWriteMeta_mus": 324, "ES_MDAgg_GatherWriteMeta":{"mus":324, "nCalls":1},"ES_MDAgg_AggInfo_MetaInfoBcast_mus": 17, "ES_MDAgg_AggInfo_MetaInfoBcast":{"mus":17, "nCalls":1},"ES_MDAgg_AggInfo_FixedMetaInfoGather_mus": 3, "ES_MDAgg_AggInfo_FixedMetaInfoGather":{"mus":3, "nCalls":1},"InitAgg-dsb_GPI_partition_mus": 14, "InitAgg-dsb_GPI_partition":{"mus":14, "nCalls":1},"InitAgg-dsb_GPI_AllGather_mus": 85, "InitAgg-dsb_GPI_AllGather":{"mus":85, "nCalls":1},"InitAgg-dsb_mus": 129, "InitAgg-dsb":{"mus":129, "nCalls":1},"InitAgg-dsb_GPI_mus": 106, "InitAgg-dsb_GPI":{"mus":106, "nCalls":1},"DC_mus": 2297, "DC":{"mus":2297, "nCalls":1},"WriteData_mus": 163332, "WriteData":{"mus":163332, "nCalls":2},"ES_WriteData_mus": 162999, "ES_WriteData":{"mus":162999, "nCalls":1},"BS_mus": 2, "BS":{"mus":2, "nCalls":1},"ES_MDAgg_AggInfo_mus": 74, "ES_MDAgg_AggInfo":{"mus":74, "nCalls":1},"ES_MDAgg_mus": 411, "ES_MDAgg":{"mus":411, "nCalls":1},"PDW_mus": 337, "PDW":{"mus":337, "nCalls":1},"ES_MDAgg_AggInfo_SelectMetaInfoGather_mus": 1, "ES_MDAgg_AggInfo_SelectMetaInfoGather":{"mus":1, "nCalls":1},"ES_CloseTS_mus": 105, "ES_CloseTS":{"mus":105, "nCalls":1},"ES_mus": 163667, "ES":{"mus":163667, "nCalls":1}, "databytes":0, "metadatabytes":0, "metametadatabytes":0, "transport_0":{"type":"File_POSIX", "wbytes":335544320, "close":{"mus":2295, "nCalls":1}, "write":{"mus":162966, "nCalls":10}, "open":{"mus":114, "nCalls":1}} },
Each JSON object within the array provides profiling information for a specific rank and includes details such as:
rank: The MPI rank of the process.
start: The timestamp when profiling began for this rank.
<Operation>_mus: The total time, in microseconds, spent in a specific ADIOS2 operation (e.g., ES_mus for EndStep).
<Operation>: A dictionary containing the total time (mus) and the number of calls (nCalls) for that operation.
databytes: The total number of data bytes processed.
metadatabytes: The total number of metadata bytes processed.
metametadatabytes: The total number of meta-metadata bytes processed.
transport_<id>: Details about a specific transport, including its type, the number of bytes transferred (e.g., wbytes), and the time (mus) and number of calls (nCalls) for operations like open, close, read, and write.
Note: The specific ADIOS2 library code regions and operations tracked within the profiling.json file can vary between different versions of ADIOS2. The keys and the level of detail provided in the JSON output might be subject to change as the library evolves.
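As a minimal sketch of how these fields can be consumed, the Python snippet below parses a small, hand-abbreviated profile (two ranks, with a few values copied from the example above) and lists the slowest timed operations per rank. It assumes the content is a valid JSON array as described; the helper name top_operations and the embedded SAMPLE string are illustrative and not part of ADIOS2.

```python
import json

# Abbreviated sample: a few keys per rank, values copied from the example above.
SAMPLE = """
[
  {"rank": 0,
   "ES_mus": 164310, "ES": {"mus": 164310, "nCalls": 1},
   "ES_WriteData_mus": 150208, "ES_WriteData": {"mus": 150208, "nCalls": 1},
   "DC_mus": 493, "DC": {"mus": 493, "nCalls": 1},
   "transport_0": {"type": "File_POSIX", "wbytes": 335544320,
                   "write": {"mus": 150145, "nCalls": 10}}},
  {"rank": 1,
   "ES_mus": 163667, "ES": {"mus": 163667, "nCalls": 1},
   "ES_WriteData_mus": 162999, "ES_WriteData": {"mus": 162999, "nCalls": 1},
   "DC_mus": 2297, "DC": {"mus": 2297, "nCalls": 1},
   "transport_0": {"type": "File_POSIX", "wbytes": 335544320,
                   "write": {"mus": 162966, "nCalls": 10}}}
]
"""


def top_operations(entry, n=3):
    """Return the n slowest timed operations of one rank as (name, mus, nCalls)."""
    ops = [
        (name, value["mus"], value["nCalls"])
        for name, value in entry.items()
        # Timed operations are dictionaries holding both "mus" and "nCalls";
        # transport_<id> entries are dictionaries too, but lack those top-level keys.
        if isinstance(value, dict) and "mus" in value and "nCalls" in value
    ]
    return sorted(ops, key=lambda op: op[1], reverse=True)[:n]


# For a real run, load the profiling.json file from the bp folder instead.
profile = json.loads(SAMPLE)
for entry in profile:
    print("rank", entry["rank"])
    for name, mus, ncalls in top_operations(entry):
        print(f"  {name}: {mus} us over {ncalls} call(s)")
```

Because the set of keys may change between ADIOS2 versions, the sketch filters on the shape of each value rather than on a fixed list of operation names.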
To aid visual analysis of I/O performance, ADIOS provides utility scripts for summarizing and plotting the data contained in these JSON profile files. The scripts, located in the source/utils/profiler_simplified/ directory of the source tree, offer simple command-line interfaces that generate visualizations of common output metrics for each rank for a given step. Example JSON files are in data/. The command-line usage is:
**./writeSummary.sh <f1.json> [<f2.json> ..]** - produces a high-level summary of write metrics for each JSON file provided.
**python3 plot_json.py <f1.json> [<f2.json> ..]** - generates quick, informative plots for top-level time-measurement tags in the provided JSON files.
The common metrics covered by the plotting scripts include PP (PerformPut), PDW (PerformDataWrite), and the EndStep (ES) components: ES_WriteData, ES_MDAgg, ES_CloseTS, etc. Volume metrics, representing the total bytes written to storage by the primary transport layer, are reported under transport_0.wbytes.
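As an example of combining a volume metric with a time metric, a rough per-rank write bandwidth can be estimated by dividing the bytes written by the time recorded for the transport's write calls. The sketch below uses the transport_0 values of rank 0 from the profiling.json example above; the function name is illustrative, and the estimate only reflects time spent inside write calls, not open or close.

```python
def write_bandwidth_mib_s(wbytes, write_mus):
    """Estimate bandwidth in MiB/s from bytes written and write time in microseconds."""
    if write_mus <= 0:
        raise ValueError("write time must be positive to estimate bandwidth")
    seconds = write_mus / 1_000_000
    return (wbytes / (1024 * 1024)) / seconds


# Values taken from transport_0 of rank 0 in the example above.
transport_0 = {"wbytes": 335544320, "write": {"mus": 150145, "nCalls": 10}}

bandwidth = write_bandwidth_mib_s(transport_0["wbytes"], transport_0["write"]["mus"])
size_mib = transport_0["wbytes"] / (1024 * 1024)
print(f"rank 0 wrote {size_mib:.0f} MiB at roughly {bandwidth:.0f} MiB/s")
```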
The example plots are in python_plots/. All plots reflect per-rank measurements with the BP5 engine. Other engines may omit some or all of these metrics.
**ES.png** shows **EndStep** times.
**PDW+PP.png** shows the time spent in **PerformDataWrite** or **PerformPut** calls.
**dataSize.png** shows the volume of data written using transport_0 (data).
**metadataSize.png** shows the volume of data written using transport_1 (md.0).
**ES+PDW+PP+BS+DC.png** shows a breakdown of the time spent in ADIOS calls: **EndStep**, **PerformDataWrite**, **PerformPut**, **BeginStep**, and **DoClose** (total ADIOS I/O impact).
**ES_DSB+ES_WriteData+ES_MDAgg+ES_CloseTS.png** shows how the total EndStep time is broken down, with the relative contribution of each component or sub-stage.
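The EndStep breakdown shown in the last plot can also be checked numerically. The sketch below uses the rank 0 timings from the profiling.json example above and computes each sub-stage's share of the EndStep time. Treating these four tags as the complete set of EndStep components is an assumption based on the plot name.

```python
# Rank 0 timings in microseconds, copied from the profiling.json example above.
rank0 = {
    "ES": 164310,
    "ES_DSB": 5,
    "ES_WriteData": 150208,
    "ES_MDAgg": 13967,
    "ES_CloseTS": 122,
}

components = ["ES_DSB", "ES_WriteData", "ES_MDAgg", "ES_CloseTS"]
total = rank0["ES"]
accounted = sum(rank0[name] for name in components)

for name in components:
    share = 100 * rank0[name] / total
    print(f"{name:>13}: {share:6.2f} % of EndStep")

# Time inside EndStep that none of the four listed components covers.
print(f" unattributed: {total - accounted} us")
```

For this rank almost all of the EndStep time falls under ES_WriteData, which matches the time reported for the write calls of transport_0.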
Examples of how to run the scripts and the resulting output files are available in the ADIOS source directory under source/utils/profiler/tests. A typical execution example plotting the first step for a profile file generated by a run of 512 ranks is shown below, demonstrating how the scripts process the attributes and generate individual rank plots (via plotRanks.py) and aggregated stack plots (via plotStack.py):
$ source 1.sh ../scripts zero ../sample_data/t0/t0.json
Attributes: PP PDW ES ES_AWD ES_aggregate_info MetaInfoBcast FixedMetaInfoGather transport_0.wbytes
Processing ../sample_data/t0/t0.json, PP key= t0
… (processing details truncated) …
outs/t0_secs_PP -> outs/zero/t0_secs_PP
Data extracted, now plotting..
… (plotting details truncated) …
==> plot all the times spent on rank 0: python3 ../scripts/plotStack.py t0 --set dataDir=outs/zero whichRank=0 plotPrefix=plots/single/ews/zero/t0
Script name: ../scripts/plotStack.py
async counter = 0, false false false
Finished. plots are in: plots/single/ews/zero
External Profiling Libraries
ADIOS2 utilizes PERFSTUBS_SCOPED_TIMER hooks at various points within its codebase. These hooks provide a standardized mechanism for external performance analysis tools to instrument and measure the execution time of different ADIOS2 code regions.
One such external library that can leverage these hooks is the Tuning and Analysis Utilities (TAU). TAU is a comprehensive parallel performance analysis toolkit capable of profiling and tracing parallel programs written in various languages, including C, C++, Fortran, and Python. TAU can automatically detect and instrument the PERFSTUBS_SCOPED_TIMER regions within ADIOS2 for all backends.
Example TAU Output:
When TAU is used to profile an ADIOS2 application, the output might look similar to the following:
%Time Exclusive Inclusive Ncalls #threads visits bytes Function Name
----- ----------- ----------- ----------- --------- ---------- ---------- --------------
100.0 0.174 1:04.251 1 1 1 64251713 .TAU application
100.0 1:00.333 1:04.251 1 12490 0 64251539 int taupreload_main(int, char **, char **)
2.5 1,599 1,600 101 2230 <...> 15850 BP5Writer::EndStep
1.6 1,004 1,004 12000 0 <...> 84 MPI_Sendrecv()
1.4 1 902 303 202 <...> 2977 void adios2::format::BP5Serializer::Marshal(void*, const char*, adios2::DataType, std::size_t, std::size_t, const size_t*, const size_t*, const size_t*, const void*, bool, adios2::format::BufferV::BufferPos*)
1.4 901 901 202 0 <...> 4460 void adios2::format::GetMinMax(const void*, std::size_t, adios2::DataType, adios2::MinMaxStruct&, adios2::MemorySpace)
In this example output:
%Time: The percentage of the total execution time spent in the function.
Exclusive: The time spent solely within the function (excluding calls to other functions).
Inclusive: The total time spent within the function, including calls to other functions.
Ncalls: The number of times the function was called.
Function Name: The name of the ADIOS2 function or code region that was instrumented.
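The relationship between exclusive and inclusive time can be illustrated with a small, entirely hypothetical call tree. The region names and timings below are made up and are not taken from the TAU output above. A region's exclusive time is its inclusive time minus the inclusive time of its direct callees.

```python
# Hypothetical call tree: inclusive time in milliseconds and direct callees per region.
inclusive = {"main": 100.0, "end_step": 40.0, "marshal": 25.0, "send": 10.0}
callees = {
    "main": ["end_step"],
    "end_step": ["marshal", "send"],
    "marshal": [],
    "send": [],
}


def exclusive_time(region):
    """Inclusive time of the region minus the inclusive time of its direct callees."""
    return inclusive[region] - sum(inclusive[child] for child in callees[region])


for region in inclusive:
    print(f"{region}: inclusive={inclusive[region]} ms, exclusive={exclusive_time(region)} ms")
```

In such a tree the exclusive times of all regions add up to the inclusive time of the root, which is why the Exclusive column is the one to sort by when looking for where time is actually spent.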
TAU files generated from ADIOS2 applications can then be analyzed using a variety of performance analysis tools, such as the ParaProf Profile Browser or Vampir, to visualize and understand the application’s behavior.
More information about TAU can be found at https://github.com/UO-OACISS/tau2.
Note: The specific ADIOS2 code regions surrounded by hooks can vary between different versions of ADIOS2.
Real-time Performance Monitoring
The TAU performance system now offers a dedicated plugin for ADIOS2, enabling the storage of performance metrics directly within ADIOS files.
When the TAU ADIOS plugin is active, performance metrics from instrumented code regions are recorded as a series of attributes and variables. These data follow a specific naming convention, providing detailed information about the measured performance events. An example of the output generated by the TAU ADIOS plugin might look like this:
string TAU:0:0:MetaData:CPU Cores attr = "64"
string TAU:0:0:MetaData:CWD attr = "kokkos-simulation"
double BP5Writer::EndStep / Calls
double BP5Writer::EndStep / Exclusive TIME
double BP5Writer::EndStep / Inclusive TIME
double Kokkos::parallel_reduce / Calls
double Kokkos::parallel_reduce / Exclusive TIME
double Kokkos::parallel_reduce / Inclusive TIME
double MPI_Sendrecv() / Calls
double MPI_Sendrecv() / Exclusive TIME
double MPI_Sendrecv() / Inclusive TIME
Here, the attributes prefixed with TAU:rank:thread:MetaData: provide contextual information about the profiling run, such as the number of CPU cores or the current working directory.
Subsequent variables capture performance metrics for specific code regions (e.g., BP5Writer::EndStep, Kokkos::parallel_reduce, MPI_Sendrecv()), including the number of calls, exclusive execution time (time spent solely within the function), and inclusive execution time (total time spent within the function including calls to other functions).
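The naming convention can be taken apart with plain string handling. The sketch below is inferred only from the example listing above: it assumes that metadata names have the form TAU:rank:thread:MetaData:key and that timer names have the form "region / metric". The function parse_tau_name is a hypothetical helper, not part of TAU or ADIOS2, and names produced by a real run may include forms it does not cover.

```python
def parse_tau_name(name):
    """Classify a name from the TAU ADIOS plugin output as metadata or timer metric."""
    if name.startswith("TAU:"):
        # Assumed layout: TAU:<rank>:<thread>:MetaData:<key>
        _, rank, thread, _, key = name.split(":", 4)
        return {"kind": "metadata", "rank": int(rank), "thread": int(thread), "key": key}
    # Assumed layout: <region> / <metric>. Split on the last " / " so that
    # region names containing "::" or parentheses stay intact.
    region, metric = name.rsplit(" / ", 1)
    return {"kind": "timer", "region": region, "metric": metric}


names = [
    "TAU:0:0:MetaData:CPU Cores",
    "BP5Writer::EndStep / Inclusive TIME",
    "MPI_Sendrecv() / Calls",
]
for name in names:
    print(parse_tau_name(name))
```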
Having TAU performance metrics stored as ADIOS files offers a couple of advantages for managing and analyzing performance data:
Campaign Integration: Performance files can be seamlessly integrated into campaigns alongside simulation output data.
Near Real-time Streaming: The performance metrics can be streamed in near real time using ADIOS’s streaming capabilities. This enables live performance monitoring and analysis of long-running simulations, providing immediate insights into the application’s behavior as it executes.
A tutorial on how to use TAU with the ADIOS2 plugin can be found here (page 206): https://users.nccs.gov/~pnorbert/ADIOS_tutorial_SC23.pdf.