Supported Engines

This section describes the engines available in ADIOS2 and the engine-specific parameters that give users finer control. Parameters are passed as key-value pairs for:

  1. Engine specific parameters

  2. Engine supported transports and parameters

Parameters can be set in three ways:

  1. In source code (fixed at compile time) with IO::SetParameters (adios2_set_parameter in C, Fortran)

  2. In source code (fixed at compile time) with IO::AddTransport (adios2_set_transport_parameter in C, Fortran)

  3. At run time, through configuration files passed to the ADIOS component.

BP4

The BP4 Engine writes and reads files in ADIOS2 native binary-pack (bp version 4) format. This is a new format for ADIOS 2.x which improves on the metadata operations of the older BP3 format. Compared to the older format, BP4 provides three main advantages:

  • Fast and safe appending of multiple output steps into the same file. Better performance than writing new files each step. Existing steps cannot be corrupted by appending new steps.

  • Streaming through files (i.e. online processing). Consumer apps can read existing steps while the Producer is still writing new steps. Reader’s loop can block (with timeout) and wait for new steps to arrive. Same reader code can read the entire data in post or in situ. No restrictions on the Producer.

  • Burst buffer support for writing data. It can write the output to a local file system on each compute node and drain the data to the parallel file system in a separate asynchronous thread. Streaming reads from the target file system are still supported when data goes through the burst buffer. Appending to an existing file on the target file system is NOT currently supported.

BP4 files have the following structure given a “name” string passed as the first argument of IO::Open:

io.SetEngine("BP4");
adios2::Engine bpFile = io.Open("name", adios2::Mode::Write);

will generate:

% BP4 datasets are always a directory
name.bp/

% data and metadata files
name.bp/
        data.0
        data.1
        ...
        data.M
        md.0
        md.idx
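
The layout above can be sketched as a small helper that lists the paths making up a BP4 dataset. This is only an illustration of the naming scheme in the listing: the function bp4Layout and its parameters are hypothetical and are not part of the ADIOS2 API, and it assumes the name is passed without an extension and that data sub-file indices run from 0 to M.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not an ADIOS2 function): lists the paths that make up
// a BP4 dataset, following the directory listing above.
// `name` is the string passed to IO::Open, assumed to have no extension;
// `lastDataIndex` is M, the index of the last data sub-file.
std::vector<std::string> bp4Layout(const std::string &name, int lastDataIndex)
{
    assert(lastDataIndex >= 0);
    const std::string dir = name + ".bp/";
    std::vector<std::string> paths;
    for (int i = 0; i <= lastDataIndex; ++i)
    {
        paths.push_back(dir + "data." + std::to_string(i));
    }
    // Metadata and metadata index files, as shown in the listing.
    paths.push_back(dir + "md.0");
    paths.push_back(dir + "md.idx");
    return paths;
}
```

Calling bp4Layout("name", 1) under these assumptions yields the two data files followed by md.0 and md.idx, all inside name.bp/.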

Note

BP4 file names are compatible with the Unix (/) and Windows (\) file system naming conventions for directories and files.

This engine allows the user to fine tune the buffering operations through the following optional parameters:

  1. Profile: turns ON/OFF profiling information right after a run

  2. ProfileUnits: set profile units according to the required measurement scale for intensive operations

  3. Threads: number of threads provided from the application for buffering, use this for very large variables in data size

  4. InitialBufferSize: initial memory provided for buffering (minimum is 16Kb)

  5. BufferGrowthFactor: exponential growth factor for initial buffer > 1, default = 1.05.

  6. MaxBufferSize: maximum allowable buffer size (must be larger than 16Kb). If too large adios2 will throw an exception.

  7. FlushStepsCount: users can select how often to produce the more expensive collective metadata file in terms of steps: default is 1. Increase to reduce adios2 collective operations footprint, with the trade-off of reducing checkpoint frequency. Buffer size will increase until first steps count if MaxBufferSize is not set.

  8. SubStreams: (MPI-only) users can select how many sub-streams (M sub-files) are produced during a run. The value ranges between 1 and the number of MPI processes from MPI_Size (N); adios2 will internally aggregate data buffers (N-to-M) to output the required number of sub-files. If SubStreams is out of bounds, it will pick either 1 (SubStreams < 1 -> N-to-1) or N (SubStreams > N -> N-to-N) and ADIOS2 will issue a WARNING message. Use for performance tuning.

  9. OpenTimeoutSecs: (Streaming mode) Reader may want to wait for the creation of the file in io.Open(). By default the Open() function returns with an error if file is not found.

  10. BeginStepPollingFrequencySecs: (Streaming mode) Reader can set how frequently to check the file (and file system) for new steps. Default is 1 second, which may be stressful for the file system and unnecessary for the application.

  11. StatsLevel: Turn on/off calculating statistics for every variable (Min/Max). Default is On. It has some cost to generate this metadata so it can be turned off if there is no need for this information.

  12. StatsBlockSize: Calculate Min/Max for a given size of each process output. Default is one Min/Max per writer. More fine-grained min/max can be useful for querying the data.

  13. NodeLocal or Node-Local: For distributed file system. Every writer process must make sure the .bp/ directory is created on the local file system. Required when writing to local disk/SSD/NVMe in a cluster. Note: the BurstBuffer* parameters are newer and should be used for using the local storage as temporary instead of this parameter.

  14. BurstBufferPath: Redirect output file to another location and drain it to the original target location in an asynchronous thread. The system must allow launching one thread per aggregator (see SubStreams). This feature can be used on machines that have local NVMe/SSDs on each node to accelerate the output writing speed. On Summit at OLCF, use “/mnt/bb/<username>” for the path where <username> is your user account name. Temporary files on the accelerated storage will be automatically deleted after the application closes the output and ADIOS drains all data to the file system, unless draining is turned off (see the next parameter). Note: at this time, this feature cannot be used to append data to an existing dataset on the target system.

  15. BurstBufferDrain: To write only to the accelerated storage but to not drain it to the target file system, set this flag to false. Data will NOT be deleted from the accelerated storage on close. By default, setting the BurstBufferPath will turn on draining.

  16. BurstBufferVerbose: Verbose level 1 will cause each draining thread to print a one line report at the end (to standard output) about where it has spent its time and the number of bytes moved. Verbose level 2 will cause each thread to print a line for each draining operation (file creation, copy block, write block from memory, etc).

=============================  ====================  ==========================================================
Key                            Value Format          Default and Examples
=============================  ====================  ==========================================================
Profile                        string On/Off         On, Off
ProfileUnits                   string                Microseconds, Milliseconds, Seconds, Minutes, Hours
Threads                        integer >= 1          1, 2, 3, 4, 16, 32, 64
InitialBufferSize              float+units >= 16Kb   16Kb, 10Mb, 0.5Gb
MaxBufferSize                  float+units >= 16Kb   at EndStep, 10Mb, 0.5Gb
BufferGrowthFactor             float > 1             1.05, 1.01, 1.5, 2
FlushStepsCount                integer >= 1          1, 5, 1000, 50000
SubStreams                     integer >= 1          MPI_Size (N-to-N), MPI_Size/2, … , 2, (N-to-1) 1
OpenTimeoutSecs                float                 0, 10.0, 5
BeginStepPollingFrequencySecs  float                 1, 10.0
StatsLevel                     integer, 0 or 1       1, 0
StatsBlockSize                 integer > 0           a very big number, 1073741824 for blocks with 1M elements
NodeLocal                      string On/Off         Off, On
Node-Local                     string On/Off         Off, On
BurstBufferPath                string                “”, /mnt/bb/norbert, /ssd
BurstBufferDrain               string On/Off         On, Off
BurstBufferVerbose             integer, 0-2          0, 1, 2
=============================  ====================  ==========================================================
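
The out-of-bounds rule given for SubStreams in the parameter list can be written as a small function. This is a sketch of the documented rule only, not the ADIOS2 implementation; the name effectiveSubStreams is hypothetical, and the WARNING message that ADIOS2 issues in the out-of-bounds cases is not modeled.

```cpp
#include <cassert>

// Hypothetical helper (not part of the ADIOS2 API): returns the number of
// sub-files M produced for a requested SubStreams value when N = mpiSize
// processes write, following the rule in the SubStreams description.
int effectiveSubStreams(int requested, int mpiSize)
{
    assert(mpiSize >= 1);
    if (requested < 1)
    {
        return 1; // SubStreams < 1 -> N-to-1
    }
    if (requested > mpiSize)
    {
        return mpiSize; // SubStreams > N -> N-to-N
    }
    return requested; // in bounds: N-to-M aggregation
}
```

For example, with 8 MPI processes a request of 16 sub-streams falls back to 8 (N-to-N), while a request of 4 is honored as given.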

Only file transport types are supported. Optional parameters for IO::AddTransport or in runtime config file transport field:

Transport type: File

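=======  ============  ===========================================
Key      Value Format  Default and Examples
=======  ============  ===========================================
Library  string        POSIX (UNIX), FStream (Windows), stdio, IME
=======  ============  ===========================================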

The IME transport directly reads and writes files stored on DDN’s IME burst buffer using the IME native API. To use the IME transport, IME must be available on the target system and ADIOS2 needs to be configured with ADIOS2_USE_IME. By default, data written to the IME is automatically flushed to the parallel filesystem at every EndStep() call. You can disable this automatic flush by setting the transport parameter SyncToPFS to OFF.

BP3

The BP3 Engine writes and reads files in ADIOS2 native binary-pack (bp) format. BP files are backwards compatible with ADIOS1.x and have the following structure given a “name” string passed as the first argument of IO::Open:

adios2::Engine bpFile = io.Open("name", adios2::Mode::Write);

will generate:

% collective metadata file
name.bp

% data directory and files
name.bp.dir/
            name.bp.0
            name.bp.1
            ...
            name.bp.M

Note

BP3 file names are compatible with the Unix (/) and Windows (\) file system naming conventions for directories and files.

Caution

The default BP3 engine checks whether the first argument of IO::Open ends with the .bp extension. If it does not, the engine appends .bp for the metadata file and .bp.dir for the data directory.
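
The naming rule in the caution above can be illustrated with a short helper. This is a sketch under stated assumptions, not ADIOS2 code: bp3Names is a hypothetical function, and it assumes that a name already ending in .bp is used unchanged and that the data directory is always the metadata file name plus .dir.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper (not an ADIOS2 function): given the first argument of
// IO::Open, returns {collective metadata file, data directory} following the
// BP3 naming rule described above.
std::pair<std::string, std::string> bp3Names(const std::string &name)
{
    assert(!name.empty());
    const std::string ext = ".bp";
    // True when `name` already ends with ".bp".
    const bool hasExt =
        name.size() >= ext.size() &&
        name.compare(name.size() - ext.size(), ext.size(), ext) == 0;
    const std::string metadataFile = hasExt ? name : name + ext;
    return {metadataFile, metadataFile + ".dir"};
}
```

Under these assumptions, both "name" and "name.bp" map to the metadata file name.bp and the data directory name.bp.dir.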

This engine allows the user to fine tune the buffering operations through the following optional parameters:

  1. Profile: turns ON/OFF profiling information right after a run

  2. ProfileUnits: set profile units according to the required measurement scale for intensive operations

  3. CollectiveMetadata: turns ON/OFF forming collective metadata during run (used by large scale HPC applications)

  4. Threads: number of threads provided from the application for buffering, use this for very large variables in data size

  5. InitialBufferSize: initial memory provided for buffering (minimum is 16Kb)

  6. BufferGrowthFactor: exponential growth factor for initial buffer > 1, default = 1.05.

  7. MaxBufferSize: maximum allowable buffer size (must be larger than 16Kb). If too large adios2 will throw an exception.

  8. FlushStepsCount: users can select how often to produce the more expensive collective metadata file in terms of steps: default is 1. Increase to reduce adios2 collective operations footprint, with the trade-off of reducing checkpoint frequency. Buffer size will increase until first steps count if MaxBufferSize is not set.

  9. SubStreams: (MPI-only) users can select how many sub-streams (M sub-files) are produced during a run. The value ranges between 1 and the number of MPI processes from MPI_Size (N); adios2 will internally aggregate data buffers (N-to-M) to output the required number of sub-files. If SubStreams is out of bounds, it will pick either 1 (SubStreams < 1 -> N-to-1) or N (SubStreams > N -> N-to-N) and ADIOS2 will issue a WARNING message. Use for performance tuning.

  10. Node-Local: For distributed file system. Every writer process must make sure the .bp/ directory is created on the local file system. Required for using local disk/SSD/NVMe in a cluster.

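==================  ====================  ====================================================
Key                 Value Format          Default and Examples
==================  ====================  ====================================================
Profile             string On/Off         On, Off
ProfileUnits        string                Microseconds, Milliseconds, Seconds, Minutes, Hours
CollectiveMetadata  string On/Off         On, Off
Threads             integer >= 1          1, 2, 3, 4, 16, 32, 64
InitialBufferSize   float+units >= 16Kb   16Kb, 10Mb, 0.5Gb
MaxBufferSize       float+units >= 16Kb   at EndStep, 10Mb, 0.5Gb
BufferGrowthFactor  float > 1             1.05, 1.01, 1.5, 2
FlushStepsCount     integer >= 1          1, 5, 1000, 50000
SubStreams          integer >= 1          MPI_Size (N-to-N), MPI_Size/2, … , 2, (N-to-1) 1
Node-Local          string On/Off         Off, On
==================  ====================  ====================================================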

Only file transport types are supported. Optional parameters for IO::AddTransport or in runtime config file transport field:

Transport type: File

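=======  ============  ===========================================
Key      Value Format  Default and Examples
=======  ============  ===========================================
Library  string        POSIX (UNIX), FStream (Windows), stdio, IME
=======  ============  ===========================================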

HDF5

In ADIOS2, the default engine for reading and writing HDF5 files is called “HDF5”. To use this engine, you can either specify it in your xml config file with the tag <engine type=HDF5>, or set it in client code. For example, here is how to create an HDF5 reader:

adios2::IO h5IO = adios.DeclareIO("SomeName");
h5IO.SetEngine("HDF5");
adios2::Engine h5Reader = h5IO.Open(filename, adios2::Mode::Read);

In addition, with an HDF5 distribution greater than or equal to 1.11, one can use the engine HDF5Mixer to write files with the VDS (virtual dataset) feature from HDF5. The corresponding tag in the xml file is: <engine type=HDF5Mixer>

and a sample code for VDS writer is:

adios2::IO h5IO = adios.DeclareIO("SomeName");
h5IO.SetEngine("HDF5Mixer");
adios2::Engine h5Writer = h5IO.Open(filename, adios2::Mode::Write);

To read back the h5 files generated with VDS to ADIOS2, one can use the HDF5 engine. Please make sure you are using the HDF5 library that has version greater than or equal to 1.11 in ADIOS2.

The h5 file generated by ADIOS2 has two levels of groups: the top group, /, and its subgroups Step0 … StepN, where N is the number of steps. All datasets belong to the subgroups.

Any other h5 file can be read back to ADIOS as well. To be consistent, when reading back to ADIOS2, we assume a default Step0, and all datasets from the original h5 file belong to that subgroup. The full path of a dataset (from the original h5 file) is used when represented in ADIOS2.

We can pass options to the HDF5 API from the ADIOS xml configuration. Currently we support CollectionIO (default false) and chunk specifications. The chunk specification uses spaces to separate values and, by default, a valid H5ChunkDim applies to all variables unless H5ChunkVar is specified. Examples:

<parameter key="H5CollectiveMPIO" value="yes"/>
<parameter key="H5ChunkDim" value="200 200"/>
<parameter key="H5ChunkVar" value="VarName1 VarName2"/>

We suggest reading the HDF5 documentation before applying these options.
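
To make the chunk options above concrete, the following sketch shows one way the space-separated values could be interpreted. It is not the parser used by ADIOS2 or HDF5; the functions parseChunkDims and chunkAppliesTo are hypothetical, and the sketch assumes that an empty H5ChunkVar value means "not specified".

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits a space-separated H5ChunkDim value such as
// "200 200" into one chunk size per dimension.
std::vector<std::size_t> parseChunkDims(const std::string &value)
{
    std::istringstream in(value);
    std::vector<std::size_t> dims;
    std::size_t d = 0;
    while (in >> d)
    {
        dims.push_back(d);
    }
    return dims;
}

// Hypothetical helper: decides whether the chunk dimensions apply to
// `variable`. With no H5ChunkVar value they apply to every variable;
// otherwise only to the variables named in the space-separated list.
bool chunkAppliesTo(const std::string &variable, const std::string &chunkVarValue)
{
    assert(!variable.empty());
    if (chunkVarValue.empty())
    {
        return true;
    }
    std::istringstream in(chunkVarValue);
    std::string name;
    while (in >> name)
    {
        if (name == variable)
        {
            return true;
        }
    }
    return false;
}
```

With the example parameters above, parseChunkDims("200 200") gives two dimensions of 200 each, and the chunking would apply to VarName1 and VarName2 only.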

SST Sustainable Staging Transport

In ADIOS2, the Sustainable Staging Transport (SST) is an engine that allows direct connection of data producers and consumers via the ADIOS2 write/read APIs. This is a classic streaming data architecture where the data passed to ADIOS on the write side (via Put() deferred and sync, and similar calls) is made directly available to a reader (via Get(), deferred and sync, and similar calls).

SST is designed for use in HPC environments and can take advantage of RDMA network interconnects to speed the transfer of data between communicating HPC applications; however, it is also capable of operating in a Wide Area Networking environment over standard sockets. SST supports full MxN data distribution, where the number of reader ranks can differ from the number of writer ranks. SST also allows multiple reader cohorts to get access to a writer’s data simultaneously.

To use this engine, you can either specify it in your xml config file with the tag <engine type=SST>, or set it in client code. For example, here is how to create an SST reader:

adios2::IO sstIO = adios.DeclareIO("SomeName");
sstIO.SetEngine("SST");
adios2::Engine sstReader = sstIO.Open(filename, adios2::Mode::Read);

and a sample code for SST writer is:

adios2::IO sstIO = adios.DeclareIO("SomeName");
sstIO.SetEngine("SST");
adios2::Engine sstWriter = sstIO.Open(filename, adios2::Mode::Write);

The general goal of ADIOS2 is to ease the conversion of a file-based application to instead use a non-file streaming interconnect, for example, data producers such as computational physics codes and consumers such as analysis applications. However, there are some uses of ADIOS2 APIs that work perfectly well with the ADIOS2 file engines, but which will not work or will perform badly with streaming. For example, SST is based upon the “step” concept and ADIOS2 applications that use SST must call BeginStep() and EndStep(). On the writer side, the Put() calls between BeginStep and EndStep are the unit of communication and represent the data that will be available between the corresponding Begin/EndStep calls on the reader.

Also, it is recommended that SST-based applications not use the ADIOS2 Get() sync method unless there is only one data item to be read per step. This is because SST implements MxN data transfer (and avoids having to deliver all data to every reader) by queueing data on the writer ranks until it is known which reader rank requires it. Normally this data fetch stage is initiated by PerformGets() or EndStep(), both of which fulfill any pending Get() deferred operations. However, unlike Get() deferred, the semantics of Get() sync require the requested data to be fetched from the writers before the call can return. If there are multiple calls to Get() sync per step, each one may require communication with many writers, something that would have only had to happen once if Get() deferred were used instead. Thus the use of Get() sync is likely to incur a substantial performance penalty.

On the writer side, depending upon the chosen data marshaling option there may be some (relatively small) performance differences between Put() sync and Put() deferred, but they are unlikely to be as substantial as between Get() sync and Get() deferred.

Note that SST readers and writers do not necessarily move in lockstep, but depending upon the queue length parameters and queueing policies specified, differing reader and writer speeds may cause one or the other side to wait for data to be produced or consumed, or data may be dropped if allowed by the queueing policy. However, steps themselves are atomic and no step will be partially dropped, delivered to a subset of ranks, or otherwise divided.

The SST engine allows the user to customize the streaming operations through the following optional parameters:

1. RendezvousReaderCount: Default 1. This integer value specifies the number of readers for which the writer should wait before the writer-side Open() returns. The default of 1 implements an ADIOS1/flexpath style “rendezvous”, in which an early-starting reader will wait for the writer to start, or vice versa. A number >1 will cause the writer to wait for more readers and a value of 0 will allow the writer to proceed without any readers present. This value is interpreted by SST Writer engines only.

2. RegistrationMethod: Default “File”. By default, SST reader and writer engines communicate network contact information via files in a shared filesystem. Specifically, the "filename" parameter in the Open() call is interpreted as a path which the writer uses as the name of a file to which contact information is written, and from which a reader will attempt to read contact information. As with other file-based engines, file creation and access is subject to the usual considerations (directory components are interpreted, but must exist and be traversable, writer must be able to create the file and the reader must be able to read it). Generally the file so created will exist only for as long as the writer keeps the stream Open(), but abnormal process termination may leave “stale” files in those locations. These stray “.sst” files should be deleted to avoid confusing future readers. SST also offers a “Screen” registration method in which writers and readers send their contact information to, and read it from, stdout and stdin respectively. The “screen” registration method doesn’t support batch mode operations in any way, but may be useful when manually starting jobs on machines in a WAN environment that don’t share a filesystem. A future release of SST will also support a “Cloud” registration method where contact information is registered to and retrieved from a network-based third-party server so that both the shared filesystem and interactivity can be avoided. This value is interpreted by both SST Writer and Reader engines.

3. QueueLimit: Default 0. This integer value specifies the number of steps which the writer will allow to be queued before taking specific action (such as discarding data or waiting for readers to consume the data). The default value of 0 is interpreted as no limit. This value is interpreted by SST Writer engines only.

4. QueueFullPolicy: Default “Block”. This value controls what policy is invoked if a non-zero QueueLimit has been specified and new data would cause the queue limit to be reached. Essentially, the “Block” option ensures data will not be discarded, and if the queue fills up the writer will block on EndStep until the data has been read. If there is one active reader, EndStep will block until data has been consumed off the front of the queue to make room for newly arriving data. If there is more than one active reader, data is only removed from the queue when it has been read by all readers, so the slowest reader will dictate progress. NOTE THAT THE NO READERS SITUATION IS A SPECIAL CASE: If there are no active readers, new timesteps are considered to have completed their active queueing immediately upon submission. They may be retained in the “reserve queue” if the ReserveQueueLimit is non-zero. However, if that ReserveQueueLimit parameter is zero, timesteps submitted when there are no active readers will be immediately discarded.

Besides “Block”, the other acceptable value for QueueFullPolicy is “Discard”. When “Discard” is specified, and an EndStep operation would add more than the allowed number of steps to the queue, some step is discarded. If there are no current readers connected to the stream, the oldest data in the queue is discarded. If there are current readers, then the newest data (i.e. the just-created step) is discarded. (The differential treatment is because SST sends metadata for each step to the readers as soon as the step is accepted and cannot reliably prevent the use of that data without a costly all-to-all synchronization operation. Discarding the newest data instead is less satisfying, but has a similar long-term effect upon the set of steps delivered to the readers.) This value is interpreted by SST Writer engines only.

5. ReserveQueueLimit: Default 0. This integer value specifies the number of steps which the writer will keep in the queue for the benefit of late-arriving readers. This may consist of timesteps that have already been consumed by any readers, as well as timesteps that have not yet been consumed. In some sense this is target queue minimum size, while QueueLimit is a maximum size. This value is interpreted by SST Writer engines only.

6. DataTransport: Default varies. This string value specifies the underlying network communication mechanism to use for exchanging data in SST. Generally this is chosen by SST based upon what is available on the current platform. However, specifying this engine parameter allows overriding SST’s choice. Current allowed values are “RDMA” and “WAN”. (ib and fabric are accepted as equivalent to RDMA and evpath is equivalent to WAN.) Generally both the reader and writer should be using the same network transport, and the network transport chosen may be dictated by the situation. For example, the RDMA transport generally operates only between applications running on the same high-performance interconnect (e.g. on the same HPC machine). If communication is desired between applications running on different interconnects, the Wide Area Network (WAN) option should be chosen. This value is interpreted by both SST Writer and Reader engines.

7. ControlTransport: Default tcp. This string value specifies the underlying network communication mechanism to use for performing control operations in SST. SST can be configured to use standard TCP sockets, which are very reliable and efficient, but which are limited in their scalability. Alternatively, SST can use a reliable UDP protocol that is more scalable, but as of ADIOS2 Release 2.4.0 still suffers from some reliability problems. (sockets is accepted as equivalent to tcp, and udp, rudp, and enet are equivalent to scalable.) Generally both the reader and writer should be using the same control transport. This value is interpreted by both SST Writer and Reader engines.

8. NetworkInterface: Default NULL. In situations in which there are multiple possible network interfaces available to SST, this string value specifies which should be used to generate SST’s contact information for writers. Generally this should NOT be specified except for narrow sets of circumstances. It has no effect if specified on Reader engines. If specified, the string value should correspond to the name of a network interface, such as those listed by commands like “netstat -i”. For example, on most Unix systems, setting the NetworkInterface parameter to “lo” (or possibly “lo0”) will result in SST generating contact information that uses the network address associated with the loopback interface (127.0.0.1). This value is interpreted only by the SST Writer engine.

9. ControlInterface: Default NULL. This value is similar to the NetworkInterface parameter, but only applies to the SST layer which does messaging for control (open, close, flow and timestep management, but not actual data transfer). Generally the NetworkInterface parameter can be used to control this, but that also applies to the Data Plane. Use ControlInterface in the event of conflicting specifications.

10. DataInterface: Default NULL. This value is similar to the NetworkInterface parameter, but only applies to the SST layer which does messaging for data transfer, not control (open, close, flow and timestep management). Generally the NetworkInterface parameter can be used to control this, but that also applies to the Control Plane. Use DataInterface in the event of conflicting specifications. In the case of the RDMA data plane, this parameter controls the libfabric interface choice.

11. FirstTimestepPrecious: Default FALSE. FirstTimestepPrecious is a boolean parameter that affects the queueing of the first timestep presented to the SST Writer engine. If FirstTimestepPrecious is TRUE, then the first timestep is effectively never removed from the output queue and will be presented as a first timestep to any reader that joins at a later time. This can be used to convey run parameters or other information that every reader may need despite joining later in a data stream. Note that this queued first timestep does count against the QueueLimit parameter above, so if a QueueLimit is specified, it should be a value larger than 1. Further note that while specifying this parameter guarantees that the preserved first timestep will be made available to new readers, other reader-side operations (like requesting the LatestAvailable timestep in Engine parameters) might still cause the timestep to be skipped. This value is interpreted only by the SST Writer engine.

12. AlwaysProvideLatestTimestep: Default FALSE. AlwaysProvideLatestTimestep is a boolean parameter that affects which of the available timesteps will be provided to the reader engine. If AlwaysProvideLatestTimestep is TRUE, then if there are multiple timesteps available to the reader, older timesteps will be skipped and the reader will see only the newest available upon BeginStep. This value is interpreted only by the SST Reader engine.

13. OpenTimeoutSecs: Default 60. OpenTimeoutSecs is an integer parameter that specifies the number of seconds SST is to wait for a peer connection on Open(). Currently this is only implemented on the Reader side of SST, and is a timeout for locating the contact information file created by Writer-side Open, not for completing the entire Open() handshake. Currently this value is interpreted only by the SST Reader engine.

14. SpeculativePreloadMode: Default AUTO. In some circumstances, SST eagerly sends all data from writers to every reader without first waiting for read requests. Generally this improves performance if every reader needs all the data, but can be very detrimental otherwise. The value AUTO for this engine parameter instructs SST to apply its own heuristic for determining if data should be eagerly sent. The value OFF disables this feature and the value ON causes eager sending regardless of the heuristic. Currently SST’s heuristic is simple: if the size of the reader cohort is less than or equal to the value of the SpecAutoNodeThreshold engine parameter (default value 1), eager sending is initiated. Currently this value is interpreted only by the SST Reader engine.

15. SpecAutoNodeThreshold: Default 1. If the size of the reader cohort is less than or equal to this value and the SpeculativePreloadMode parameter is AUTO, SST will initiate eager data sending of all data from each writer to all readers. Currently this value is interpreted only by the SST Reader engine.
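
The interaction between SpeculativePreloadMode and SpecAutoNodeThreshold described in the last two items can be summarized in a few lines. This is a model of the documented heuristic only, not SST source code; the function eagerSendEnabled is hypothetical, and treating any value other than ON or OFF as AUTO is an assumption of the sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical model (not SST source code) of the eager-send decision
// described for SpeculativePreloadMode and SpecAutoNodeThreshold.
bool eagerSendEnabled(const std::string &mode, int readerCohortSize,
                      int specAutoNodeThreshold = 1)
{
    assert(readerCohortSize >= 1);
    if (mode == "ON")
    {
        return true; // eager sending regardless of the heuristic
    }
    if (mode == "OFF")
    {
        return false; // feature disabled
    }
    // AUTO (assumed for any other value): eager sending only for small
    // reader cohorts.
    return readerCohortSize <= specAutoNodeThreshold;
}
```

With the defaults (AUTO, threshold 1), a single-rank reader would receive data eagerly while a four-rank reader cohort would not.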

===========================  ============  =====================================
Key                          Value Format  Default and Examples
===========================  ============  =====================================
RendezvousReaderCount        integer       1
RegistrationMethod           string        File, Screen
QueueLimit                   integer       0 (no queue limits)
QueueFullPolicy              string        Block, Discard
ReserveQueueLimit            integer       0 (no steps held in reserve)
DataTransport                string        default varies by platform, RDMA, WAN
ControlTransport             string        TCP, Scalable
NetworkInterface             string        NULL
ControlInterface             string        NULL
DataInterface                string        NULL
FirstTimestepPrecious        boolean       FALSE, true, no, yes
AlwaysProvideLatestTimestep  boolean       FALSE, true, no, yes
OpenTimeoutSecs              integer       60
SpeculativePreloadMode       string        AUTO, ON, OFF
SpecAutoNodeThreshold        integer       1
===========================  ============  =====================================
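
The “Discard” behavior of QueueFullPolicy can be modeled with a small queue simulation. This is a simplified sketch and not SST source code: the names DiscardResult and endStepWithDiscard are hypothetical, steps are represented by integers, and neither the “Block” policy nor the reserve queue (ReserveQueueLimit) is modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

enum class DiscardResult
{
    Queued,
    DiscardedOldest,
    DiscardedNewest
};

// Hypothetical model of a writer-side EndStep under QueueFullPolicy=Discard.
// `queueLimit` == 0 means no limit, as for the QueueLimit parameter.
DiscardResult endStepWithDiscard(std::deque<int> &queue, int newStep,
                                 std::size_t queueLimit, bool readersConnected)
{
    // Invariant of the model: the queue never exceeds a non-zero limit.
    assert(queueLimit == 0 || queue.size() <= queueLimit);
    const bool full = queueLimit != 0 && queue.size() >= queueLimit;
    if (!full)
    {
        queue.push_back(newStep);
        return DiscardResult::Queued;
    }
    if (readersConnected)
    {
        // Readers may already hold metadata for queued steps, so the
        // just-created step is the one dropped.
        return DiscardResult::DiscardedNewest;
    }
    // No readers: drop the oldest step to make room for the new one.
    queue.pop_front();
    queue.push_back(newStep);
    return DiscardResult::DiscardedOldest;
}
```

In this model, a full queue holding steps 0 and 1 with QueueLimit 2 stays unchanged when step 2 arrives while readers are connected, and becomes steps 1 and 2 when no readers are connected.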

SSC Strong Staging Coupler

The SSC engine is designed specifically for strong code coupling. Currently SSC only supports a fixed IO pattern, which means that once the first step is finished, users are not allowed to write or read a data block with a start and count that have not been written or read in the first step. SSC uses a combination of one-sided and two-sided MPI methods. In all cases, all user applications must be launched within a single mpirun or mpiexec command, using the MPMD mode.

The SSC engine takes the following parameters:

  1. RendezvousAppCount: Default 2. The number of applications, including both writers and readers, that will work on this stream. The SSC engine’s open function will block until all these applications reach the open call. If there are multiple applications in a workflow, this parameter needs to be set respectively for every application. For example, in a three-app coupling scenario: App 0 writes Stream A to App 1; App 1 writes Stream B to App 0; App 2 writes Stream C to App 1; App 1 writes Stream D to App 2, the parameter RendezvousAppCount for engine instances of every stream should be all set to 2, because for each of the streams, two applications will work on it. In another example, where App 0 writes Stream A to App 1 and App 2; App 1 writes Stream B to App 2, the parameter RendezvousAppCount for engine instances of Stream A and B should be set to 3 and 2 respectively, because three applications will work on Stream A, while two applications will work on Stream B.

  2. MaxStreamsPerApp: Default 1. The maximum number of streams that all applications sharing this MPI_COMM_WORLD can possibly open. This number must be consistent across all ranks from all applications. It is used for pre-allocating the vectors holding MPI handshake information. Due to the fundamental communication mechanism of MPI, this information must be set statically through engine parameters, and the SSC engine cannot provide any mechanism to check if this parameter is set correctly. If this parameter is set incorrectly, the SSC engine’s open function will either exit earlier than expected without gathering all applications’ handshake information, or it will block until timeout. It may cause other unpredictable errors too.

  3. OpenTimeoutSecs: Default 10. Timeout in seconds for opening a stream. The SSC engine’s open function will block until the RendezvousAppCount is reached, or timeout, whichever comes first. If it reaches the timeout, SSC will throw an exception.

  4. MaxFilenameLength: Default 128. The maximum length of filenames across all ranks from all applications. It is used for allocating the handshake buffer. Due to the limitation of MPI communication, this number must be set statically. The default number should work for most use cases. SSC will throw an exception if any rank opens a stream with a filename longer than this number.

  5. MpiMode: Default TwoSided. MPI communication modes to use. Besides the default TwoSided mode using two sided MPI communications, MPI_Isend and MPI_Irecv, for data transport, there are four one sided MPI modes: OneSidedFencePush, OneSidedPostPush, OneSidedFencePull, and OneSidedPostPull. Modes with Push are based on the push model and use MPI_Put for data transport, while modes with Pull are based on the pull model and use MPI_Get. Modes with Fence use MPI_Win_fence for synchronization, while modes with Post use MPI_Win_start, MPI_Win_complete, MPI_Win_post and MPI_Win_wait.

Key                 Value Format  Default and Examples
RendezvousAppCount  integer       2, 3, 5, 10
MaxStreamsPerApp    integer       1, 2, 4, 8
OpenTimeoutSecs     integer       10, 2, 20, 200
MaxFilenameLength   integer       128, 32, 64, 512
MpiMode             string        TwoSided, OneSidedFencePush, OneSidedPostPush, OneSidedFencePull, OneSidedPostPull
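
The rule for choosing RendezvousAppCount in the two coupling scenarios described for that parameter can be sketched as a small counting routine. This is only an illustration of the reasoning, not part of ADIOS2: the Link type and the RendezvousAppCounts function are hypothetical helpers, and the sketch assumes each link connects two different applications.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// One writer-to-reader connection in a coupled workflow.
// Hypothetical helper type for illustration, not an ADIOS2 type.
struct Link
{
    std::string stream; // stream name, e.g. "A"
    int writerApp;      // index of the application writing the stream
    int readerApp;      // index of the application reading the stream
};

// For every stream, count the distinct applications (writers and readers)
// that work on it. Under the description above, this count is the value to
// use for RendezvousAppCount on every engine instance of that stream.
std::map<std::string, int> RendezvousAppCounts(const std::vector<Link> &links)
{
    std::map<std::string, std::set<int>> appsPerStream;
    for (const Link &link : links)
    {
        // Assumption of this sketch: a link connects two different apps.
        assert(link.writerApp != link.readerApp);
        appsPerStream[link.stream].insert(link.writerApp);
        appsPerStream[link.stream].insert(link.readerApp);
    }

    std::map<std::string, int> counts;
    for (const auto &entry : appsPerStream)
    {
        counts[entry.first] = static_cast<int>(entry.second.size());
    }
    return counts;
}
```

Feeding it the second scenario (App 0 writes Stream A to App 1 and App 2, App 1 writes Stream B to App 2) gives 3 for Stream A and 2 for Stream B, matching the values stated for that case.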

DataMan for Wide Area Network Data Staging

The DataMan engine is designed for data staging over a wide area network. It is intended for cases where a few writers send data to a few readers over long distances.

DataMan does NOT guarantee that readers receive EVERY data step from writers. The idea is that for experimental data, which is the target use case of this engine, readers should not slow down the data rate of writers. If readers cannot keep up with the experiment, the experiment should still continue, and the readers should read the latest data steps. The design also helps improve performance because it saves the communication time for checking step completeness, which usually means ~100 milliseconds every step for transoceanic connections.

For wide area data staging applications that require readers to receive EVERY data step, the SST engine is recommended.

The DataMan engine takes the following parameters:

  1. IPAddress: No default value. The IP address of the host where the writer application runs. This parameter is compulsory in wide area network data staging.

  2. Port: Default 50001. The port number on the writer host that will be used for data transfers.

  3. Timeout: Default 5. Timeout in seconds to wait for every send / receive operation. Packages not sent or received within this time are considered lost.

  4. RendezvousReaderCount: Default 1. This integer value specifies the number of readers for which the writer should wait before the writer-side Open() returns. By default, an early-starting writer will wait for the reader to start, or vice versa. A number >1 will cause the writer to wait for more readers, and a value of 0 will allow the writer to proceed without any readers present. This value is interpreted by DataMan Writer engines only.

  5. DoubleBuffer: Default true for reader, false for writer. Whether to use double buffering for caching send and receive operations. Enabling double buffering adds overhead for managing threads and buffer queues, but improves the continuity of data steps for the reader in pub/sub mode. For generic use cases, keep the default values: true for reader and false for writer.

  6. TransportMode: Default fast. The fast mode is optimized for latency-critical applications. It forces readers to receive only the latest step. Therefore, when writers are faster than readers, readers will skip some data steps. The reliable mode ensures that readers receive all steps, at the cost of lower performance than the fast mode.

Key                    Value Format  Default and Examples
IPAddress              string        N/A, 22.195.18.29
Port                   integer       50001, 22000, 33000
Timeout                integer       5, 10, 30
RendezvousReaderCount  integer       1, 0, 3
DoubleBuffer           bool          true for reader, false for writer
TransportMode          string        fast, reliable
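
As a rough sketch of how the parameters above fit together, the helpers below assemble the key-value pairs for a DataMan writer and a DataMan reader. The helper names DataManWriterParams and DataManReaderParams are hypothetical, and the sketch assumes that a std::map of strings is an acceptable argument for IO::SetParameters. The ADIOS2 calls themselves appear only in a comment so that the block compiles without the library.

```cpp
#include <cassert>
#include <map>
#include <string>

// Engine parameters expressed as string key-value pairs.
using Params = std::map<std::string, std::string>;

// Parameters for a DataMan writer. writerIP is the address of the host where
// the writer runs; readers is how many readers the writer-side Open() waits
// for (0 lets the writer proceed without any reader).
Params DataManWriterParams(const std::string &writerIP, int port, int readers)
{
    assert(port > 0 && port < 65536);
    assert(readers >= 0);
    return {{"IPAddress", writerIP},
            {"Port", std::to_string(port)},
            {"Timeout", "5"},          // documented default, shown explicitly
            {"RendezvousReaderCount", std::to_string(readers)},
            {"TransportMode", "fast"}}; // documented default
}

// Parameters for a DataMan reader. The reader also points at the writer's
// host and port. RendezvousReaderCount is omitted because it is interpreted
// by writer engines only; DoubleBuffer is left at its per-role default.
Params DataManReaderParams(const std::string &writerIP, int port)
{
    assert(port > 0 && port < 65536);
    return {{"IPAddress", writerIP},
            {"Port", std::to_string(port)},
            {"Timeout", "5"},
            {"TransportMode", "fast"}};
}

// Intended use (requires ADIOS2, not compiled here):
//   io.SetEngine("DataMan");
//   io.SetParameters(DataManWriterParams("22.195.18.29", 50001, 1));
```

Switching TransportMode to reliable in both helpers would trade latency for delivery of every step, as described for that parameter.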

Inline for zero-copy

The Inline engine provides in-process communication between the writer and reader, and seeks to avoid copying data buffers.

This engine is experimental, and is focused on the N -> N case: N writers share a process with N readers, and the analysis happens ‘inline’ without writing the data to a file or copying to another buffer. It has similar considerations to the streaming SST engine, since analysis must happen per step.

To use this engine, you can either specify it in your XML config file, with the tag <engine type="Inline">, or set it in your application code:

adios2::IO inlineIO = adios.DeclareIO("ioName");
inlineIO.SetEngine("Inline");
inlineIO.SetParameters({{"writerID", "inline_write"}, {"readerID", "inline_read"}});
adios2::Engine inlineWriter = inlineIO.Open("inline_write", adios2::Mode::Write);
adios2::Engine inlineReader = inlineIO.Open("inline_read", adios2::Mode::Read);

Notice that, unlike other engines, the reader and writer share an IO instance. Also, the writerID parameter allows the reader to connect to the writer, and readerID allows the writer to connect to the reader. Both the writer and reader must be opened before either tries to call BeginStep/PerformPuts/PerformGets.

For successful operation, the writer will perform a step, then the reader will perform a step in the same process. Data is decomposed between processes, and the writer can write its portion of the data like other ADIOS engines. When the reader starts its step, the only data it has available is that written by the writer in its process. To select this data in ADIOS, use a block selection. The reader then can retrieve whatever data was written by the writer. The reader does require the use of a new Get() call that was added to the API:

void Engine::Get<T>(
    Variable<T>, typename Variable<T>::Info & info, const Mode);

This version of Get is only used for the inline engine and requires passing a Variable<T>::Info object, which can be obtained from calling the reader’s BlocksInfo(). See the example below for details.

Note

This Get() method is preliminary and may be removed in the future when the span interface on the read side becomes available.

Note

The inline engine does not support Sync mode for writing. In addition, since the inline engine does not do any data copy, the writer should avoid changing the data contents before the reader has read the data.

Typical access pattern:

// ... Application data generation

inlineWriter.BeginStep();
inlineWriter.Put(var, data); // always use deferred mode
inlineWriter.EndStep();
// Unlike other engines, data should not be reused at this point (since ADIOS
// does not copy the data), though ADIOS cannot enforce this.
// Must wait until reader is finished using the data.

inlineReader.BeginStep();
auto blocksInfo = inlineReader.BlocksInfo(var, inlineReader.CurrentStep());
for (auto& info : blocksInfo)
{
    var.SetBlockSelection(info.BlockID);
    inlineReader.Get(var, info);
}
inlineReader.EndStep();

// do application analysis -
// use info.Data() to get the pointer for each element in blocksInfo

// After any desired analysis is finished, writer can now reuse data pointer

Parameters:

  1. writerID: Match the string passed to the IO::Open() call when creating the writer. The reader uses this parameter to fetch the correct writer.

  2. readerID: Match the string passed to the IO::Open() call when creating the reader. The writer uses this parameter to fetch the correct reader.

Key

Value Type

Default and Examples

writerID

string

none, match the writer name

readerID

string

none, match the reader name

InSitu MPI

Coming soon…

Null

The Null Engine bypasses the heavy I/O operations that other Engines might execute, for example memory allocations, buffering, and transport data movement. Calls to the Null engine return immediately without doing any work.

The overall goal is to provide a mechanism to isolate an application's behavior without the ADIOS 2 footprint. Use this engine to estimate the overhead cost of using a certain ADIOS 2 Engine in an application (similar to writing to /dev/null).

Supported Virtual Engine Names

This section describes the Virtual Engines that can be used to set up an actual Engine with specific parameters. These virtual names help beginner users by simplifying the selection of an engine and its parameters. The following I/O use cases are supported by virtual engine names:

  1. File: File I/O (Default engine).

    This sets up the I/O for files. If the file name passed in Open() ends with “.bp”, then the BP4 engine will be used starting in v2.5.0. If it ends with “.h5”, the HDF5 engine will be used. For old .bp files (BP version 3 format), the BP3 engine will be used for reading (v2.4.0 and below).

  2. FileStream: Online processing via files.

    This allows a Consumer to concurrently read the data while the Producer is writing new output steps into it. The Consumer will wait for the appearance of the file itself in Open() (for up to one hour) and wait for the appearance of new steps in the file (in BeginStep(), up to the timeout specified in that function).

  3. InSituAnalysis: Streaming data to another application.

    This sets up ADIOS for transferring data from a Producer to a Consumer application. The Producer and Consumer are synchronized at Open(). The Consumer will receive every single output step from the Producer, therefore, the Producer will block on output if the Consumer is slow.

  4. InSituVisualization: Streaming data to another application without waiting for consumption.

    This sets up ADIOS for transferring data from a Producer to a Consumer without ever blocking the Producer. The Producer will throw away all output steps that are not immediately requested by a Consumer. It will also not wait for a Consumer to connect. This kind of streaming is great for an interactive visualization session where the user wants to see the most current state of the application.

  5. CodeCoupling: Streaming data between two applications for code coupling.

    The Producer and Consumer wait for each other in Open() and every step must be consumed. Currently, this is the same as InSituAnalysis.
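
The extension rule described for the File virtual engine can be sketched as follows. This is a minimal illustration of the rule as stated for v2.5.0 and later, not the library's own selection code: EndsWith and FileEngineFor are hypothetical helpers, and an empty string is returned for any extension the text does not cover.

```cpp
#include <cassert>
#include <string>

// True if name ends with suffix.
bool EndsWith(const std::string &name, const std::string &suffix)
{
    assert(!suffix.empty());
    return name.size() >= suffix.size() &&
           name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Engine that the File virtual engine selects for a name passed to Open(),
// following the rule in the text: ".bp" maps to BP4 and ".h5" maps to HDF5.
// Returns an empty string for extensions the text does not describe.
std::string FileEngineFor(const std::string &name)
{
    if (EndsWith(name, ".bp"))
    {
        return "BP4";
    }
    if (EndsWith(name, ".h5"))
    {
        return "HDF5";
    }
    return "";
}
```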

These virtual engine names are used to select a specific engine and its parameters. In practice, after selecting the virtual engine name, one can modify the settings by adding or overwriting parameters. Eventually, a seasoned user would use the actual Engine names and parameterize them for the specific run.

These are the actual settings in ADIOS when a virtual engine is selected. The parameters below can be modified before the Open call.

  1. File. Refer to the parameter settings of the BP4, BP3 and HDF5 engines earlier in this section.

  2. FileStream. The engine is BP4. The parameters are set to:

Key                            Value Format  Default and Examples
OpenTimeoutSecs                float         3600 (wait for up to an hour)
BeginStepPollingFrequencySecs  float         1 (poll the file system with 1 second frequency)

  3. InSituAnalysis. The engine is SST. The parameters are set to:

Key                          Value Format  Default and Examples
RendezvousReaderCount        integer       1 (Producer waits for the Consumer in Open)
QueueLimit                   integer       1 (only buffer one step)
QueueFullPolicy              string        Block (wait for the Consumer to get every step)
FirstTimestepPrecious        bool          false (SST default)
AlwaysProvideLatestTimestep  bool          false (SST default)

  4. InSituVisualization. The engine is SST. The parameters are set to:

Key                          Value Format  Default and Examples
RendezvousReaderCount        integer       0 (Producer does NOT wait for Consumer in Open)
QueueLimit                   integer       3 (buffer first step + last two steps)
QueueFullPolicy              string        Discard (slow Consumer will miss out on steps)
FirstTimestepPrecious        bool          true (First step is kept around for late Consumers)
AlwaysProvideLatestTimestep  bool          false (SST default)

  5. CodeCoupling. The engine is SST. The parameters are set to:

Key                          Value Format  Default and Examples
RendezvousReaderCount        integer       1 (Producer waits for the Consumer in Open)
QueueLimit                   integer       1 (only buffer one step)
QueueFullPolicy              string        Block (wait for the Consumer to get every step)
FirstTimestepPrecious        bool          false (SST default)
AlwaysProvideLatestTimestep  bool          false (SST default)
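
The presets listed above for FileStream, InSituAnalysis, InSituVisualization and CodeCoupling can be collected into a single lookup, which may help when moving from a virtual engine name to the actual engine and its parameters. This is only a sketch that restates the listed values: EngineSetup and ResolveVirtualEngine are hypothetical names, not ADIOS2 API, and the File virtual engine is left out because its engine depends on the file name.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Engine parameters expressed as string key-value pairs.
using Params = std::map<std::string, std::string>;

// Actual engine and preset parameters behind a virtual engine name.
// Hypothetical helper type for illustration.
struct EngineSetup
{
    std::string engine; // actual engine name, e.g. "SST"
    Params params;      // parameters preset by the virtual engine
};

// Restates the presets listed in this section. Throws std::invalid_argument
// for names this sketch does not cover (including "File").
EngineSetup ResolveVirtualEngine(const std::string &name)
{
    if (name == "FileStream")
    {
        return {"BP4",
                {{"OpenTimeoutSecs", "3600"},
                 {"BeginStepPollingFrequencySecs", "1"}}};
    }
    if (name == "InSituAnalysis" || name == "CodeCoupling")
    {
        // Both names currently share the same blocking SST setup.
        return {"SST",
                {{"RendezvousReaderCount", "1"},
                 {"QueueLimit", "1"},
                 {"QueueFullPolicy", "Block"},
                 {"FirstTimestepPrecious", "false"},
                 {"AlwaysProvideLatestTimestep", "false"}}};
    }
    if (name == "InSituVisualization")
    {
        return {"SST",
                {{"RendezvousReaderCount", "0"},
                 {"QueueLimit", "3"},
                 {"QueueFullPolicy", "Discard"},
                 {"FirstTimestepPrecious", "true"},
                 {"AlwaysProvideLatestTimestep", "false"}}};
    }
    throw std::invalid_argument("unknown virtual engine: " + name);
}
```

A seasoned user could start from the returned parameters, overwrite individual entries such as QueueLimit, and then select the actual engine name directly.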