Interface Components
Components Overview
Note
For simple tasks where performance is not critical, go to the High-Level APIs section for a quick start. If you are an HPC application developer or want to use the full ADIOS2 functionality, read this chapter.
A simple way to understand the big picture of the ADIOS2 unified user interface components is to map each class to the definition of the ADIOS acronym.
Component | Acronym | Function |
---|---|---|
ADIOS | ADaptable | Set MPI comm domain; set runtime settings; own other components |
IO | I/O | Set engine; set variables/attributes; set compile-time settings |
Engine | System | Execute heavy IO tasks; manage system resources |
ADIOS2’s public APIs use the natural choice in each supported language to represent each ADIOS2 component and its interaction with application datatypes:
Language | Component API | Application Data |
---|---|---|
C++ (11/newer) | objects/member functions | pointers/references/std::vector |
C | handler/functions | pointers |
Fortran | handler/subroutines | arrays up to 6D |
Python | objects/member functions | numpy arrays |
The following section provides a common overview to all languages based on the C++11 APIs. For each specific language go to the Full APIs section, but it’s highly recommended to read this section as components map 1-to-1 in other languages.
The following figure depicts the components hierarchy from the application’s point of view.
- ADIOS: the ADIOS component is the starting point between an application and the ADIOS2 library. Applications provide:
  - the scope of the ADIOS object through the MPI communicator,
  - an optional runtime configuration file (in XML format) to allow changing settings without recompiling.

  The ADIOS component serves as a factory of adaptable IO components. Each IO must have a unique name within the scope of the ADIOS class object that created it with the DeclareIO function.
- IO: the IO component is the bridge between application-specific settings and transports. It also serves as a factory of:
  - Variables
  - Attributes
  - Engines
- Variable: Variables are the link between self-describing representation in the ADIOS2 library and data from applications. Variables are identified by unique names in the scope of the particular IO that created them. When the Engine API functions are called, a Variable must be provided along with the application data.
- Attribute: Attributes add extra information to the overall variables dataset defined in the IO class. They can be single or array values.
- Engine: Engines define the actual system executing the heavy IO tasks at Open, BeginStep, Put, Get, EndStep and Close. Due to polymorphism, new IO system solutions can be developed quickly, reusing internal components and the same API. If IO.SetEngine is not called, the default engine is the binary-pack bp file reader and writer: BPFile.
- Operator: These define possible operations to be applied on adios2-managed data, for example, compression. This higher level abstraction is needed to provide support for callbacks, transforms, analytics, data models, etc. Any required task will be executed within the Engine. One or many operators can be associated with any of the adios2 objects or a group of them.
ADIOS
The adios2::ADIOS component is the initial contact point between an application and the ADIOS2 library.
Applications can be classified as MPI and non-MPI based.
We start by focusing on MPI applications as their non-MPI equivalent just removes the MPI communicator.
/** ADIOS class factory of IO class objects */
adios2::ADIOS adios("config.xml", MPI_COMM_WORLD);
This component is created by passing:
- Runtime config file (optional): ADIOS2 XML runtime config file, see Runtime Configuration Files.
- MPI communicator: determines the scope of the ADIOS library components in an application.
adios2::ADIOS objects can be created in MPI and non-MPI (serial) mode.
Optionally, a runtime configuration file can be passed to the constructor indicating the full file path, name and extension.
Constructors for MPI applications
/** Constructors */
// version that accepts an optional runtime adios2 config file
adios2::ADIOS(const std::string configFile,
              MPI_Comm mpiComm = MPI_COMM_SELF);
adios2::ADIOS(MPI_Comm mpiComm = MPI_COMM_SELF);
/** Examples */
adios2::ADIOS adios(MPI_COMM_WORLD);
adios2::ADIOS adios("config.xml", MPI_COMM_WORLD);
Constructors for non-MPI (serial) applications
/** Constructors */
adios2::ADIOS(const std::string configFile);
adios2::ADIOS();
/** Examples */
adios2::ADIOS adios("config.xml");
adios2::ADIOS adios; // Do not use () for empty constructor.
Factory of IO components: Multiple IO components (IO tasks) can be created from within the scope of an ADIOS object by calling the DeclareIO function:
/** Signature */
adios2::IO ADIOS::DeclareIO(const std::string ioName);
/** Examples */
adios2::IO bpWriter = adios.DeclareIO("BPWriter");
adios2::IO bpReader = adios.DeclareIO("BPReader");
This function returns a reference to an existing IO class object that lives inside the ADIOS object that created it.
The ioName string must be unique; declaring two IO objects with the same name will throw an exception.
IO names are used to identify IO components in the runtime configuration file; see Runtime Configuration Files.
As shown in the diagram below, each resulting IO object is self-managed and independent, thus providing an adaptable way to perform different kinds of I/O operations. Users must be careful not to create conflicts between system level unique I/O identifiers: file names, IP address and port, MPI Send/Receive message rank and tag, etc.
Tip
The ADIOS component is the only one whose memory is owned by the application. Thus applications must decide on its scope. Any other component of the ADIOS2 API refers to a component that lives inside the ADIOS component (e.g. IO, Operator) or indirectly in the IO component (Variable, Engine).
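The ownership and unique-name rules described above can be sketched without the library. The classes below, `SketchADIOS` and `SketchIO`, are hypothetical stand-ins written for this illustration; they are not the ADIOS2 implementation and only mimic the documented behavior: the factory owns every IO it declares, returns a reference to it, and throws when a name is declared twice.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for an IO component; this is not the adios2::IO class.
struct SketchIO
{
    explicit SketchIO(std::string ioName) : name(std::move(ioName)) {}
    std::string name;
    std::string engineType = "BPFile"; // default engine named in the text
};

// Hypothetical stand-in for the ADIOS factory: it owns every IO it declares.
class SketchADIOS
{
public:
    // Creates an IO under a unique name and returns a reference to it.
    // Throws std::invalid_argument if the name was already declared.
    SketchIO &DeclareIO(const std::string &ioName)
    {
        const auto result = m_IOs.emplace(ioName, SketchIO(ioName));
        if (!result.second)
        {
            throw std::invalid_argument("IO '" + ioName +
                                        "' is already declared");
        }
        return result.first->second;
    }

    std::size_t Count() const { return m_IOs.size(); }

private:
    // std::map nodes keep their address, so returned references stay valid
    std::map<std::string, SketchIO> m_IOs;
};
```

Because the factory owns the IO objects, the references it hands out are only valid while the `SketchADIOS` object is alive, which mirrors why the application must decide on the scope of the ADIOS component.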
IO
The IO component is where applications set up their input/output options: selecting an Engine and its specific parameters, subscribing variables to data, and setting the supported transport modes for a particular Engine.
Think of IO as a control panel for all the user-defined parameters that applications would like to fine tune.
None of the IO operations are heavyweight until the Open function that generates an Engine is called.
Its API allows
- generation of Variable and Attribute components containing information about the data in the input output process
- setting Engine-specific parameters and adding supported modes of transport
- generation of Engine objects to execute the actual IO tasks.
Note
If two different engine types are needed (e.g. BPFile, SST), you must define two IO objects.
Also, when reading, always define separate IOs to avoid Variable name clashes.
Setting a Particular Engine and its Parameters
Engines execute the heavy operations in ADIOS2.
Each IO may select a type of Engine through the SetEngine function.
If SetEngine is not called, then the BPFile engine is used.
/** Signature */
void adios2::IO::SetEngine( const std::string engineType );
/** Example */
bpIO.SetEngine("BPFile");
Each Engine allows the user to fine tune execution of buffering and output tasks via parameters passed to the IO object.
These parameters are then propagated to the Engine.
For a list of parameters allowed by each engine see Available Engines.
Note
adios2::Params is an alias to std::map<std::string,std::string> to pass parameters as key-value string pairs, which can be initialized with curly-brace initializer lists.
/** Signature */
/** Passing several parameters at once */
void SetParameters(const adios2::Params& parameters);
/** Passing one parameter key-value pair at a time */
void SetParameter(const std::string key, const std::string value);
/** Examples */
io.SetParameters( { {"Threads", "4"},
                    {"ProfilingUnits", "Milliseconds"},
                    {"MaxBufferSize", "2Gb"},
                    {"BufferGrowthFactor", "1.5"},
                    {"FlushStepsCount", "5"}
                  } );
io.SetParameter( "Threads", "4" );
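Since parameters are plain key-value string pairs, numeric settings have to be converted to text before they are passed. The following is a minimal sketch of that conversion using a local `Params` alias that mirrors the documented definition of `adios2::Params`; the helper name `MakeEngineParams` is an assumption made for this example and is not part of ADIOS2.

```cpp
#include <map>
#include <string>

// Local alias mirroring the documented definition of adios2::Params.
using Params = std::map<std::string, std::string>;

// Hypothetical helper: builds a parameter set for an engine.
// Numeric values are converted to text because every key and value
// is passed as a string.
Params MakeEngineParams(unsigned int threads, unsigned int flushStepsCount)
{
    Params params = {{"ProfilingUnits", "Milliseconds"},
                     {"MaxBufferSize", "2Gb"}};
    params["Threads"] = std::to_string(threads);
    params["FlushStepsCount"] = std::to_string(flushStepsCount);
    return params;
}
```

A map built this way has the same type as the argument of `SetParameters` shown above, so under that assumption it could be passed in one call instead of several `SetParameter` calls.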
Adding Supported Transports with Parameters
The AddTransport function allows the user to specify how data is moved through the system, e.g. RDMA, wide-area networks, or files.
It returns an unsigned int handler for each transport that can be used with the Engine::Close function at different times.
The parameters passed to AddTransport must be library-specific settings that the low-level system library interface allows.
/** Signature */
unsigned int AddTransport( const std::string transportType,
const adios2::Params& parameters );
/** Examples */
const unsigned int file1 = io.AddTransport( "File",
{ {"Library", "fstream"},
{"Name","file1.bp" }
} );
const unsigned int file2 = io.AddTransport( "File",
{ {"Library", "POSIX"},
{"Name","file2.bp" }
} );
const unsigned int wan = io.AddTransport( "WAN",
{ {"Library", "Zmq"},
{"IP","127.0.0.1" },
{"Port","80"}
} );
Defining, Inquiring and Removing Variables and Attributes
The template function DefineVariable<T> allows subscribing data into ADIOS2 by returning a reference to a Variable class object whose scope is the same as the IO object that created it.
The user must provide a unique name, the dimensions (MPI global: shape; MPI local: start and count), and optionally a flag indicating that dimensions are known to be constant.
Note: data is not passed at this stage.
This is done by the Engine functions Put and Get for Variables.
See the Variable section for supported types and shapes.
Tip
adios2::Dims is an alias to std::vector<std::size_t>, while adios2::ConstantDims is an alias to bool true. Use them for code clarity.
/** Signature */
adios2::Variable<T>
DefineVariable<T>(const std::string name,
const adios2::Dims &shape = {}, // Shape of global object
const adios2::Dims &start = {}, // Where to begin writing
const adios2::Dims &count = {}, // Number of elements to write per dimension
const bool constantDims = false);
/** Example */
/** global array of floats with constant dimensions */
adios2::Variable<float> varFloats =
io.DefineVariable<float>("bpFloats",
{size * Nx},
{rank * Nx},
{Nx},
adios2::ConstantDims);
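The example above assumes every rank owns exactly Nx elements. When the global length does not divide evenly by the number of ranks, the application still has to compute shape, start and count itself. The sketch below shows one common way to do that for a 1D array; `Decomposition` and `Decompose1D` are names invented for this illustration, and `Dims` is a local alias mirroring `adios2::Dims`.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Local alias mirroring the documented adios2::Dims definition.
using Dims = std::vector<std::size_t>;

// Hypothetical helper type holding the three dimension arguments.
struct Decomposition
{
    Dims shape; // global dimensions
    Dims start; // offset of this rank's block in the global array
    Dims count; // number of elements owned by this rank
};

// Splits `total` elements across `size` ranks as evenly as possible.
// The first (total % size) ranks receive one extra element.
Decomposition Decompose1D(std::size_t rank, std::size_t size,
                          std::size_t total)
{
    const std::size_t base = total / size;
    const std::size_t remainder = total % size;
    const std::size_t count = base + (rank < remainder ? 1 : 0);
    const std::size_t start = rank * base + std::min(rank, remainder);
    return Decomposition{{total}, {start}, {count}};
}
```

The three members map directly onto the shape, start and count arguments of `DefineVariable<T>`. Note that a block size that differs between ranks is fine, but the constant-dimensions flag only makes sense if the block of a given rank does not change over steps.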
Attributes are extra information associated with the current IO object.
The function DefineAttribute<T> allows for defining single value and array attributes.
Keep in mind that Attributes apply to all Engines created by the IO object and, unlike Variables which are passed to each Engine explicitly, their definition contains their actual data.
/** Signatures */
/** Single value */
adios2::Attribute<T> DefineAttribute(const std::string &name,
const T &value);
/** Arrays */
adios2::Attribute<T> DefineAttribute(const std::string &name,
const T *array,
const size_t elements);
In situations in which a variable or attribute has been previously defined, for example 1) a variable/attribute reference goes out of scope, or 2) when reading from an incoming stream, the IO can inquire about the status of variables and attributes.
If the inquired variable/attribute is not found, then the overloaded bool() operator of the returned object returns false.
/** Signature */
adios2::Variable<T> InquireVariable<T>(const std::string &name) noexcept;
adios2::Attribute<T> InquireAttribute<T>(const std::string &name) noexcept;
/** Example */
adios2::Variable<float> varPressure = io.InquireVariable<float>("pressure");
if( varPressure ) // it exists
{
...
}
Note
adios2::Variable overloads operator bool() so that we can check for invalid states (e.g. variables haven’t arrived in a stream, weren’t previously defined, or weren’t written in a file).
Caution
Since InquireVariable and InquireAttribute are template functions, both the name and type must match the data you are looking for.
Opening an Engine
The IO::Open function creates a new derived object of the abstract Engine class and returns a reference handler to the user.
A particular Engine type is set to the current IO component with the IO::SetEngine function.
Engine polymorphism is handled internally by the IO class, which allows subclassing future derived Engine types without changing the basic API.
Engine objects are created in various modes.
The available modes are adios2::Mode::Read, adios2::Mode::ReadRandomAccess, adios2::Mode::Write, adios2::Mode::Append, adios2::Mode::Sync, adios2::Mode::Deferred, and adios2::Mode::Undefined.
/** Signatures */
/** Provide a new MPI communicator other than from ADIOS->IO->Engine */
adios2::Engine adios2::IO::Open(const std::string &name,
const adios2::Mode mode,
MPI_Comm mpiComm );
/** Reuse the MPI communicator from ADIOS->IO->Engine, or non-MPI serial mode */
adios2::Engine adios2::IO::Open(const std::string &name,
const adios2::Mode mode);
/** Examples */
/** Engine derived class, spawned to start Write operations */
adios2::Engine bpWriter = io.Open("myVector.bp", adios2::Mode::Write);
/** Engine derived class, spawned to start Read operations on rank 0 */
if( rank == 0 )
{
adios2::Engine bpReader = io.Open("myVector.bp",
adios2::Mode::Read,
MPI_COMM_SELF);
}
Caution
Always pass MPI_COMM_SELF if an Engine lives in only one MPI process.
Open and Close are collective operations.
Variable
An adios2::Variable is the link between a piece of data coming from an application and its metadata.
This component handles all application variables classified by data type and shape.
Each IO holds a set of Variables, and each Variable is identified with a unique name.
They are created with IO::DefineVariable<T> or retrieved with IO::InquireVariable<T>.
Data Types
Only primitive types are supported in ADIOS2.
Fixed-width types from <cinttypes> and <cstdint> should be preferred when writing portable code. ADIOS2 maps primitive types to equivalent fixed-width types (e.g. int -> int32_t). In C++, acceptable types T in Variable<T> along with their preferred fixed-width equivalent on 64-bit platforms are given below:
Data types supported by ADIOS2 Variable<T>:
std::string (only used for global and local values, not arrays)
char -> int8_t or uint8_t depending on compiler flags
signed char -> int8_t
unsigned char -> uint8_t
short -> int16_t
unsigned short -> uint16_t
int -> int32_t
unsigned int -> uint32_t
long int -> int32_t or int64_t (Linux)
long long int -> int64_t
unsigned long int -> uint32_t or uint64_t (Linux)
unsigned long long int -> uint64_t
float -> always 32-bit = 4 bytes
double -> always 64-bit = 8 bytes
long double -> platform dependent
std::complex<float> -> always 64-bit = 8 bytes = 2 * float
std::complex<double> -> always 128-bit = 16 bytes = 2 * double
Tip
It’s recommended to be consistent when using types for portability.
If data is defined as a fixed-width integer, define variables in ADIOS2 using a fixed-width type, e.g. for int32_t data types use DefineVariable<int32_t>.
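The width of a type on the platform at hand can be checked at compile time, which helps when deciding which fixed-width type to use in `DefineVariable<T>`. The snippet below is a small sketch using only the C++ standard library; `BitsOf` and `LongIs64Bit` are helper names made up for this example.

```cpp
#include <climits>
#include <complex>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: number of bits occupied by type T on this platform.
template <class T>
constexpr std::size_t BitsOf()
{
    return sizeof(T) * CHAR_BIT;
}

// Fixed-width types have the same size wherever they exist.
static_assert(BitsOf<std::int32_t>() == 32, "int32_t must be 32 bits");
static_assert(BitsOf<std::uint64_t>() == 64, "uint64_t must be 64 bits");

// A complex number is stored as two values of its underlying type.
static_assert(BitsOf<std::complex<float>>() == 2 * BitsOf<float>(),
              "complex<float> must hold two floats");

// `long` is the type that differs between platforms:
// this is true on typical 64-bit Linux and false where long is 32 bits.
constexpr bool LongIs64Bit() { return BitsOf<long>() == 64; }
```

A `static_assert` of this kind placed next to a variable definition makes a mismatch between the application data type and the type used for the ADIOS2 variable a compile-time error instead of a portability bug.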
Note
C, Fortran APIs: the enum and parameter adios2_type_XXX only provide fixed-width types.
Note
Python APIs: use the equivalent fixed-width types from numpy.
If dtype is not specified, ADIOS2 handles numpy defaults just fine as long as primitive types are passed.
Shapes
ADIOS2 is designed for MPI applications.
Thus different application data shapes must be supported depending on their scope within a particular MPI communicator.
The shape is defined at creation from the IO object by providing the dimensions (shape, start, count) in the IO::DefineVariable<T> call. The supported shapes are described below.
1. Global Single Value: Only a name is required for their definition. These variables are helpful for storing global information, preferably managed by only one MPI process, that may or may not change over steps: e.g. total number of particles, collective norm, number of nodes/cells, etc.
if( rank == 0 )
{
    adios2::Variable<uint32_t> varNodes = io.DefineVariable<uint32_t>("Nodes");
    adios2::Variable<std::string> varFlag = io.DefineVariable<std::string>("Nodes flag");
    // ...
    engine.Put( varNodes, nodes );
    engine.Put( varFlag, "increased" );
    // ...
}
Note
Variables of type string are defined just like global single values. Multidimensional strings are supported for fixed size strings through variables of type char.
2. Global Array: This is the most common shape used for storing data that lives in several MPI processes. The image below illustrates the definitions of the dimension components in a global array: shape, start, and count.
Warning
Be aware of data ordering in your language of choice (row-major or column-major) as depicted in the figure above. Data decomposition is done by the application, not by ADIOS2.
Start and Count local dimensions can be later modified with the Variable::SetSelection function if it is not a constant dimensions variable.
3. Local Value:
Values that are local to the MPI process.
They are defined by passing the adios2::LocalValueDim enum as follows:
adios2::Variable<int32_t> varProcessID =
    io.DefineVariable<int32_t>("ProcessID", {adios2::LocalValueDim});
//...
engine.Put<int32_t>(varProcessID, rank);
These values become visible on the reader as a single merged 1-D Global Array whose size is determined by the number of writer ranks.
4. Local Array:
Arrays that are local to the MPI process.
These are commonly used to write checkpoint-restart data.
Reading, however, needs to be handled differently: each process’ array has to be read separately, using SetSelection per rank.
The reading application should discover the size of each process selection by inquiring the per-block size information of the variable, and allocate memory accordingly.
Note
Constants are not handled separately from step-varying values in ADIOS2. Simply write them only once from one rank.
5. Joined Array:
Joined arrays are a variation of the Local Array described above.
Where LocalArrays are only available to the reader via their block
number, JoinedArrays are merged into a single global array whose
global dimensions are determined by the sum of the contributions of
each writer rank. Specifically: JoinedArrays are N-dimensional
arrays where one (and only one) specific dimension is the Joined
dimension. (The other dimensions must be constant and the same across
all contributions.) When defining a Joined variable, one specifies a
shape parameter that gives the dimensionality of the array with the
special constant adios2::JoinedDim in the dimension to be joined.
Unlike a Global Array definition, the start parameter must be an empty
Dims value.
For example, the definition below defines a 2-D Joined array where the
first dimension is the one along which blocks will be joined and the
2nd dimension is 5. Here this rank is contributing two rows to this array.
auto var = outIO.DefineVariable<double>("table", {adios2::JoinedDim, 5}, {}, {2, 5});
If each of N writer ranks were to declare a variable like this and do a single Put() in a timestep, the reader-side GlobalArray would have shape {2*N, 5} and all normal reader-side GlobalArray operations would be applicable to it.
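The shape rule for joined arrays can be written down in a few lines. The function below is only an illustration of the arithmetic described above, not how ADIOS2 computes it internally; `JoinedShape` is a hypothetical name and `Dims` is a local alias mirroring `adios2::Dims`.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Local alias mirroring the documented adios2::Dims definition.
using Dims = std::vector<std::size_t>;

// Hypothetical helper: computes the reader-side global shape from the
// count of every contributed block. The joined dimension is summed,
// every other dimension must be identical across all blocks.
Dims JoinedShape(const std::vector<Dims> &blockCounts, std::size_t joinedDim)
{
    if (blockCounts.empty())
    {
        throw std::invalid_argument("at least one block is required");
    }
    Dims shape = blockCounts.front();
    if (joinedDim >= shape.size())
    {
        throw std::invalid_argument("joined dimension is out of range");
    }
    shape[joinedDim] = 0;
    for (const Dims &count : blockCounts)
    {
        if (count.size() != shape.size())
        {
            throw std::invalid_argument("blocks must have the same rank");
        }
        for (std::size_t d = 0; d < count.size(); ++d)
        {
            if (d == joinedDim)
            {
                shape[d] += count[d];
            }
            else if (count[d] != shape[d])
            {
                throw std::invalid_argument(
                    "non-joined dimensions must match");
            }
        }
    }
    return shape;
}
```

With three ranks each contributing a block of count {2, 5} joined along dimension 0, the function returns {6, 5}, matching the {2*N, 5} rule stated above.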
Note
JoinedArrays are currently only supported by BP4 and BP5 engines, as well as the SST engine with BP5 marshalling.
Global Array Capabilities and Limitations
ADIOS2 focuses on writing and reading N-dimensional, distributed, global arrays of primitive types. The basic idea is that, usually, a simulation has such a data structure in memory (distributed across multiple processes) and wants to dump its content regularly as it progresses. ADIOS2 was designed:
- to do this writing and reading as fast as possible
- to enable reading any subsection of the array
The figure above shows a parallel application of 12 processes producing a 2D array. Each process has a 2D array locally and the output is created by placing them into a 4x3 pattern. A reading application’s individual process then can read any subsection of the entire global array. In the figure, a 6 process application decomposes the array in a 3x2 pattern and each process reads a 2D array whose content comes from multiple producer processes.
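A reader selection that spans several producer blocks is, conceptually, a set of box intersections in the global index space. The sketch below shows that geometry for a single pair of boxes. It illustrates the concept only and makes no claim about how any ADIOS2 engine implements it; `Box` and `Intersect` are names introduced for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Local alias mirroring the documented adios2::Dims definition.
using Dims = std::vector<std::size_t>;

// Hypothetical helper type: a block described by start and count.
struct Box
{
    Dims start;
    Dims count;
};

// Returns true and fills `overlap` when the two boxes share at least
// one element; returns false otherwise (or if the ranks differ).
bool Intersect(const Box &a, const Box &b, Box &overlap)
{
    const std::size_t ndims = a.start.size();
    if (a.count.size() != ndims || b.start.size() != ndims ||
        b.count.size() != ndims)
    {
        return false;
    }
    Box result;
    for (std::size_t d = 0; d < ndims; ++d)
    {
        const std::size_t lo = std::max(a.start[d], b.start[d]);
        const std::size_t hi =
            std::min(a.start[d] + a.count[d], b.start[d] + b.count[d]);
        if (hi <= lo)
        {
            return false; // no common element along this dimension
        }
        result.start.push_back(lo);
        result.count.push_back(hi - lo);
    }
    overlap = result;
    return true;
}
```

For a writer block starting at {0, 0} with count {4, 4} and a reader selection starting at {2, 2} with count {4, 4}, the overlap starts at {2, 2} and has count {2, 2}; the rest of the selection would come from other producer blocks.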
The figure hopefully helps to understand the basic concept, but it can also be misleading if it suggests limitations that are not there. A Global Array is simply a boundary in N-dimensional space where processes can place their blocks of data. In the global space:
- one process can place multiple blocks
- the space does NOT need to be fully covered by the blocks
- blocks can overlap
- each process can put a different size of block, or put multiple blocks of different sizes
- some processes may not contribute anything to the global array
Over multiple output steps:
- the processes CAN change the size (and number) of blocks in the array
- the global dimensions CAN change over output steps
Limitations of the ADIOS global array concept:
- Indexing starts from 0
- Cyclic data patterns are not supported; only blocks can be written or read
- Some blocks may fully or partially fall outside of the global boundary; the reader will not be able to read those parts
Note
Technically, the content of the individual blocks is kept in the BP format (but not in HDF5 format) and in staging. If you really, really want to retrieve all the blocks, you need to handle this array as a Local Array and read the blocks one by one.
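To check up front whether a block would be fully readable through the global array, an application can clip the block against the global shape. The following sketch does that per dimension; `ReadableCount` is a hypothetical helper written for this illustration and `Dims` is a local alias mirroring `adios2::Dims`.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Local alias mirroring the documented adios2::Dims definition.
using Dims = std::vector<std::size_t>;

// Hypothetical helper: for each dimension, returns how many elements of
// the block [start, start + count) lie inside the global shape.
// A zero in any dimension means no element of the block is inside.
Dims ReadableCount(const Dims &shape, const Dims &start, const Dims &count)
{
    if (shape.size() != start.size() || shape.size() != count.size())
    {
        throw std::invalid_argument(
            "shape, start and count must have the same rank");
    }
    Dims readable(shape.size(), 0);
    for (std::size_t d = 0; d < shape.size(); ++d)
    {
        if (start[d] < shape[d])
        {
            readable[d] = std::min(count[d], shape[d] - start[d]);
        }
    }
    return readable;
}
```

If the result differs from the count that was passed in, part of the block lies outside the boundary, and treating the variable as a Local Array is the way to reach that data, as the note above explains.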
Attribute
Attributes are extra information associated with a particular IO component.
They can be thought of as a very simplified Variable, but with the goal of adding extra metadata.
The most common use is the addition of human-readable metadata (e.g. "experiment name", "date and time", "04,27,2017", or a schema).
Currently, ADIOS2 supports single values and arrays of primitive types (excluding complex<T>) for the template type in the IO::DefineAttribute<T> and IO::InquireAttribute<T> functions (in C++).
The data types supported for ADIOS2 Attributes are:
std::string
char
signed char
unsigned char
short
unsigned short
int
unsigned int
long int
long long int
unsigned long int
unsigned long long int
float
double
long double
The returned object (from DefineAttribute or InquireAttribute) only serves to inspect the current Attribute<T> information within code.
Engine
The Engine abstraction component serves as the base interface to the actual IO systems executing the heavy-load tasks performed when producing and consuming data.
Engine functionality works around two concepts:
1. Variables are published (Put) and consumed (Get) in “steps” in either “File” random-access (all steps are available) or “Streaming” (steps are available as they are produced in a step-by-step fashion).
2. Variables are published (Put) and consumed (Get) using a “sync” or “deferred” (lazy evaluation) policy.
Caution
The ADIOS2 “step” is a logical abstraction that means different things depending on the application context. Examples: “time step”, “iteration step”, “inner loop step”, or “interpolation step”, “variable section”, etc. It only indicates how the variables were passed into ADIOS2 (e.g. I/O steps) without the user having to index this information on their own.
Tip
Publishing and consuming data is a round-trip in ADIOS2.
Put and Get APIs for write/append and read modes aim to be “symmetric”, reusing functions, objects, and semantics as much as possible.
The rest of the section explains the important concepts.
BeginStep
Begins a logical step and returns the status (via an enum) of the stream to be read/written. In streaming engines BeginStep is where the receiver tries to acquire a new step in the reading process. The full signature allows for mode and timeout parameters. See Supported Engines for more information on what each engine allows. A simplified signature allows each engine to pick reasonable defaults.
// Full signature
StepStatus BeginStep(const StepMode mode,
const float timeoutSeconds = -1.f);
// Simplified signature
StepStatus BeginStep();
EndStep
Ends the logical step and flushes to transports depending on IO parameters and engine default behavior.
Tip
To write portable code for step-by-step access across ADIOS2 engines (file and streaming engines) use BeginStep and EndStep.
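The portable step-by-step pattern is a loop that keeps calling BeginStep until the status is no longer OK. The sketch below shows the control flow with a mock engine so that it is self-contained. `SketchEngine` and `ConsumeAllSteps` are hypothetical names, and the local `StepStatus` enum is an assumption that mirrors the status values the library reports; consult the API reference for the exact enum.

```cpp
#include <cstddef>

// Local mirror of a step status enum (an assumption for this sketch).
enum class StepStatus
{
    OK,
    NotReady,
    EndOfStream,
    OtherError
};

// Hypothetical stand-in for an engine that holds a fixed number of steps.
class SketchEngine
{
public:
    explicit SketchEngine(std::size_t steps) : m_Steps(steps) {}

    // Tries to acquire the next step.
    StepStatus BeginStep()
    {
        if (m_Current >= m_Steps)
        {
            return StepStatus::EndOfStream;
        }
        m_Open = true;
        return StepStatus::OK;
    }

    // Releases the current step and advances to the next one.
    void EndStep()
    {
        if (m_Open)
        {
            ++m_Current;
            m_Open = false;
        }
    }

private:
    std::size_t m_Steps;
    std::size_t m_Current = 0;
    bool m_Open = false;
};

// Consumes steps until the engine reports anything other than OK.
// Returns the number of steps that were processed.
template <class Engine>
std::size_t ConsumeAllSteps(Engine &engine)
{
    std::size_t consumed = 0;
    while (engine.BeginStep() == StepStatus::OK)
    {
        // Get calls for the variables of this step would go here
        engine.EndStep();
        ++consumed;
    }
    return consumed;
}
```

The same loop shape works whether the steps come from a file or arrive from a stream, which is why it is the portable choice. A real streaming reader would typically also handle a not-ready status, for example by retrying, instead of leaving the loop.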
Danger
Accessing random steps in read mode (e.g. Variable<T>::SetStepSelection in file engines) will create a conflict with BeginStep and EndStep and will throw an exception.
In file engines, data is either consumed in a random-access or step-by-step mode, but not both.
Close
Closes the current engine and underlying transports. An Engine object can’t be used after this call.
Put: modes and memory contracts
Put publishes data in ADIOS2.
It is unavailable unless the Engine is created in Write or Append mode.
The most common signature is the one that passes a Variable<T> object for the metadata, a const piece of contiguous memory for the data, and a mode for either Deferred (data may be collected at Put() or not until EndStep/PerformPuts/Close) or Sync (data is reusable immediately).
This is the most common use case in applications.
1. Deferred (default) or Sync mode, data is contiguous memory
void Put(Variable<T> variable, const T* data, const adios2::Mode = adios2::Mode::Deferred);
ADIOS2 Engines also provide direct access to their buffer memory.
Variable<T>::Span is based on a subset of the C++20 std::span, which is a non-owning reference to a block of contiguous memory.
Spans act as a 1D container meant to be filled out by the application.
They provide the standard API of an STL container, providing begin() and end() iterators, operator[] and at(), as well as data() and size().
Variable<T>::Span is helpful in situations in which temporaries are needed to create contiguous pieces of memory from non-contiguous pieces (e.g. tables, arrays without ghost-cells), or just to save memory as the returned Variable<T>::Span can be used for computation, thus avoiding an extra copy from user memory into the ADIOS2 buffer.
Variable<T>::Span combines a hybrid Sync and Deferred mode, in which the initial value and memory allocations are Sync, while data population and metadata collection are done at EndStep/PerformPuts/Close.
Memory contracts are explained later in this chapter followed by examples.
The following Variable<T>::Span signatures are available:
2. Return a span setting a default T() value into a default buffer
Variable<T>::Span Put(Variable<T> variable);
3. Return a span setting an initial fill value into a certain buffer. If span is not returned then the fillValue is fixed for that block.
Variable<T>::Span Put(Variable<T> variable, const size_t bufferID, const T fillValue);
In summary, the following are the current Put signatures for publishing data in ADIOS2:
1. Deferred (default) or Sync mode, data is contiguous memory put in an ADIOS2 buffer.
void Put(Variable<T> variable, const T* data, const adios2::Mode = adios2::Mode::Deferred);
2. Return a span setting a default T() value into a default ADIOS2 buffer. If span is not returned then the default T() is fixed for that block (e.g. zeros).
Variable<T>::Span Put(Variable<T> variable);
3. Return a span setting an initial fill value into a certain buffer. If span is not returned then the fillValue is fixed for that block.
Variable<T>::Span Put(Variable<T> variable, const size_t bufferID, const T fillValue);
The following table summarizes the memory contracts required by ADIOS2 engines between Put signatures and the data memory coming from an application:
Put | Data Memory | Contract |
---|---|---|
Deferred | Pointer | do not modify until PerformPuts/EndStep/Close |
 | Contents | consumed at Put or PerformPuts/EndStep/Close |
Sync | Pointer | modify after Put |
 | Contents | consumed at Put |
Span | Pointer | modified by new Spans, updated span iterators/data |
 | Contents | consumed at PerformPuts/EndStep/Close |
Note
In Fortran (array) and Python (numpy array) avoid operations that modify the internal structure of an array (size) to preserve the address.
Each Engine will give a concrete meaning to each function signature, but all of them must follow the same memory contracts for the “data pointer” (the memory address itself) and the “data contents” (the memory bits, i.e. values).
1. Put in Deferred or lazy evaluation mode (default): this is the preferred mode as it allows Put calls to be “grouped” before potential data transport at the first encounter of PerformPuts, EndStep or Close.
Put(variable, data);
Put(variable, data, adios2::Mode::Deferred);
Deferred memory contracts:
- “data pointer”: do not modify (e.g. resize) until first call to PerformPuts, EndStep or Close.
- “data contents”: may be consumed immediately or at first call to PerformPuts, EndStep or Close. Do not modify data contents after Put.
Usage:
// recommended use:
// set "data pointer" and "data contents"
// before Put
data[0] = 10;
// Puts data pointer into adios2 engine
// associated with current variable metadata
engine.Put(variable, data);
// Modifying data after Put(Deferred) may result in different
// results with different engines
// Any resize of data after Put(Deferred) may result in
// memory corruption or segmentation faults
data[1] = 10;
// "data contents" must not have been changed
// "data pointer" must be the same as in Put
engine.EndStep();
//engine.PerformPuts();
//engine.Close();
// now data pointer can be reused or modified
Tip
It’s recommended practice to set all data contents before Put in deferred mode to minimize the risk of modifying the data pointer (not just the contents) before PerformPuts/EndStep/Close.
2. Put in Sync mode: this is the special case, data pointer becomes reusable right after Put.
Only use it if absolutely necessary (e.g. memory bound application or out of scope data, temporary).
Put(variable, data, adios2::Mode::Sync);
Sync memory contracts:
- “data pointer” and “data contents” can be modified after this call.
Usage:
// set "data pointer" and "data contents"
// before Put in Sync mode
data[0] = 10;
// Puts data pointer into adios2 engine
// associated with current variable metadata
engine.Put(variable, data, adios2::Mode::Sync);
// data pointer and contents can be reused
// in application
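The difference between the two contracts is easiest to see with a toy writer that either copies the values right away or only remembers the pointer. `SketchWriter` below is a hypothetical stand-in, not an ADIOS2 engine. It always consumes deferred data late, which is one of the behaviors the Deferred contract allows; a real engine may also consume at Put.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for an engine's write path.
class SketchWriter
{
public:
    // Sync: the values are copied now, the caller may reuse `data`.
    void PutSync(const double *data, std::size_t n)
    {
        m_Buffer.insert(m_Buffer.end(), data, data + n);
    }

    // Deferred: only the pointer and size are recorded.
    // `data` must stay valid and unchanged until PerformPuts.
    void PutDeferred(const double *data, std::size_t n)
    {
        m_Pending.emplace_back(data, n);
    }

    // Copies every pending block into the internal buffer.
    void PerformPuts()
    {
        for (const auto &block : m_Pending)
        {
            m_Buffer.insert(m_Buffer.end(), block.first,
                            block.first + block.second);
        }
        m_Pending.clear();
    }

    const std::vector<double> &Buffer() const { return m_Buffer; }

private:
    std::vector<std::pair<const double *, std::size_t>> m_Pending;
    std::vector<double> m_Buffer;
};
```

If the application changes a value between `PutDeferred` and `PerformPuts`, this toy writer stores the changed value, while `PutSync` stores the value as it was at the call. Since real engines are free to pick either moment for deferred data, the result of such a modification is engine dependent, which is exactly why the contract forbids it.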
3. Put returning a Span: signature that allows access to adios2 internal buffer.
Use cases:
- population from non-contiguous memory structures
- memory-bound applications
Limitations:
- does not allow operations (compression)
- must keep engine and variables within scope of span usage
Span memory contracts:
- “data pointer”: provided by the engine and returned by span.data(); it might change with the generation of a new span. It follows iterator invalidation rules from std::vector. Use span.data() or iterators, span.begin(), span.end() to keep an updated data pointer.
- span “data contents”: published at the first call to PerformPuts, EndStep or Close.
Usage:
// return a span into a block of memory
// set memory to default T()
adios2::Variable<int32_t>::Span span1 = engine.Put(var1);
// just like with std::vector::data()
// iterator invalidation rules
// dataPtr might become invalid
// always use span1.data() directly
int32_t* dataPtr = span1.data();
// set memory value to -1 in buffer 0
adios2::Variable<float>::Span span2 = engine.Put(var2, 0, -1);
// not returning a span just sets a constant value
engine.Put(var3);
engine.Put(var4, 0, 2);
// fill span1
span1[0] = 0;
span1[1] = 1;
span1[2] = 2;
// fill span2
span2[1] = 1;
span2[2] = 2;
// here collect all spans
// they become invalid
engine.EndStep();
//engine.PerformPuts();
//engine.Close();
// var1 = { 0, 1, 2 };
// var2 = { -1., 1., 2.};
// var3 = { 0, 0, 0};
// var4 = { 2, 2, 2};
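The reason a cached pointer can go stale is that a new span may grow the underlying buffer. The toy buffer below reproduces that situation with a `std::vector`. `SketchBuffer` and its nested `Span` are hypothetical classes written for this illustration and do not reflect how ADIOS2 lays out its buffers; the point is only that a span which recomputes `data()` on every call keeps working after the buffer reallocates.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an engine-owned buffer that hands out spans.
class SketchBuffer
{
private:
    std::vector<int> m_Data;

public:
    // Non-owning view into a region of the buffer.
    class Span
    {
    public:
        Span(SketchBuffer &buffer, std::size_t offset, std::size_t size)
        : m_Buffer(&buffer), m_Offset(offset), m_Size(size)
        {
        }
        // Recomputed on every call, so it stays correct after the
        // buffer grows and moves to a new address.
        int *data() { return m_Buffer->m_Data.data() + m_Offset; }
        std::size_t size() const { return m_Size; }
        int &operator[](std::size_t i) { return data()[i]; }

    private:
        SketchBuffer *m_Buffer;
        std::size_t m_Offset;
        std::size_t m_Size;
    };

    // Reserves `size` elements set to fillValue and returns a span
    // over them. The resize may reallocate the whole buffer.
    Span Put(std::size_t size, int fillValue = 0)
    {
        const std::size_t offset = m_Data.size();
        m_Data.resize(offset + size, fillValue);
        return Span(*this, offset, size);
    }

    const std::vector<int> &Contents() const { return m_Data; }
};
```

A raw pointer obtained from the first span before the second `Put` may point at freed memory afterwards, while indexing through the span itself stays valid. This is the same rule the text states: call `data()` or use the iterators each time instead of caching the pointer.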
The data fed to the Put function is assumed to be allocated on the Host (default mode). In order to use data allocated on the device, the memory space of the variable needs to be set to CUDA.
variable.SetMemorySpace(adios2::MemorySpace::CUDA);
engine.Put(variable, gpuData, mode);
Note
Only CUDA allocated buffers are supported for device data. Only the BP4 and BP5 engines are capable of receiving device allocated buffers.
PerformPuts
Executes all pending Put calls in deferred mode and collects span data. Specifically this call copies Put(Deferred) data into internal ADIOS buffers, as if Put(Sync) had been used instead.
Note
This call allows the reuse of user buffers, but may negatively impact performance on some engines.
PerformDataWrite
If supported by the engine, moves data from prior Put calls to disk.
Note
Currently only supported by the BP5 file engine.
Get: modes and memory contracts
Get is the function for consuming data in ADIOS2. It is available when an Engine is created using Read mode at IO::Open.
ADIOS2 Put and Get semantics are as symmetric as possible considering that they are opposite operations (e.g. Put passes const T*, while Get populates a non-const T*).
The Get signatures are described below.
Deferred (default) or Sync mode, data is contiguous pre-allocated memory:
Get(Variable<T> variable, T* data, const adios2::Mode = adios2::Mode::Deferred);
In this signature, dataV is automatically resized by ADIOS2 based on the Variable selection:
Get(Variable<T> variable, std::vector<T>& dataV, const adios2::Mode = adios2::Mode::Deferred);
The following table summarizes the memory contracts required by ADIOS2 engines between Get signatures and the pre-allocated (except when using C++11 std::vector) data memory coming from an application:
Get | Data Memory | Contract |
---|---|---|
Deferred | Pointer | do not modify until PerformGets/EndStep/Close |
Deferred | Contents | populated at Get or PerformGets/EndStep/Close |
Sync | Pointer | modify after Get |
Sync | Contents | populated at Get |
1. Get in Deferred or lazy evaluation mode (default): this is the preferred mode as it allows Get calls to be “grouped” before potential data transport at the first encounter of PerformGets, EndStep or Close.
Get(variable, data);
Get(variable, data, adios2::Mode::Deferred);
Deferred memory contracts:
“data pointer”: do not modify (e.g. resize) until the first call to PerformGets, EndStep or Close.
“data contents”: populated at Get, or at the first call to PerformGets, EndStep or Close.
Usage:
std::vector<double> data;
// resize memory to expected size
data.resize(varBlockSize);
// valid if all memory is populated
// data.reserve(varBlockSize);
// Gets data pointer to adios2 engine
// associated with current variable metadata
engine.Get(variable, data.data());
// optionally pass data std::vector
// leave resize to adios2
//engine.Get(variable, data);
// "data pointer" must be the same as in Get
engine.EndStep();
// "data contents" are now ready
//engine.PerformGets();
//engine.Close();
// now data pointer can be reused or modified
2. Get in Sync mode: this is the special case, data pointer becomes reusable right after Get. Only use it if absolutely necessary (e.g. memory bound application or out of scope data, temporary).
Get(variable, data, adios2::Mode::Sync);
Sync memory contracts:
“data pointer” and “data contents” can be modified after this call.
Usage:
std::vector<double> data;
// resize memory to expected size
data.resize(varBlockSize);
// valid if all memory is populated
// data.reserve(varBlockSize);
// Gets data pointer to adios2 engine
// associated with current variable metadata
engine.Get(variable, data.data(), adios2::Mode::Sync);
// "data contents" are ready
// "data pointer" can be reused by the application
Note
Get
doesn’t support returning spans.
PerformGets
Executes all pending
Get
calls in deferred mode.
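The difference between the Deferred and Sync contracts for Get can be modeled with a few lines of standard C++. The sketch below uses a hypothetical `ToyReader` class; it is not the ADIOS2 API and performs no real I/O. It only reproduces the contract described in this section: a deferred `Get` merely records the destination pointer, and the contents are populated when `PerformGets` runs.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy reader, NOT the ADIOS2 API: it only models the Deferred/Sync contract.
class ToyReader
{
public:
    enum class Mode
    {
        Deferred,
        Sync
    };

    explicit ToyReader(std::vector<double> source) : m_Source(std::move(source)) {}

    // Sync: fills `data` immediately.
    // Deferred: only remembers the pointer, so the caller must keep it
    // valid and unchanged until PerformGets().
    void Get(double *data, Mode mode = Mode::Deferred)
    {
        if (mode == Mode::Sync)
        {
            Copy(data);
        }
        else
        {
            m_Pending.push_back(data);
        }
    }

    // Populates every pointer recorded by deferred Get calls.
    void PerformGets()
    {
        for (double *data : m_Pending)
        {
            Copy(data);
        }
        m_Pending.clear();
    }

    std::size_t Size() const { return m_Source.size(); }

private:
    void Copy(double *data) const
    {
        for (std::size_t i = 0; i < m_Source.size(); ++i)
        {
            data[i] = m_Source[i];
        }
    }

    std::vector<double> m_Source;    // stands in for data held by the engine
    std::vector<double *> m_Pending; // destinations of deferred Get calls
};
```

With this model, a buffer passed in Deferred mode still holds its old values right after `Get`, while a buffer passed in Sync mode is already filled. Resizing the deferred buffer before `PerformGets` would leave a dangling pointer in `m_Pending`, which is the situation the Deferred memory contract forbids.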
Engine usage example
The following example illustrates the basic API usage in write mode for data generated at each application step:
adios2::Engine engine = io.Open("file.bp", adios2::Mode::Write);
for( size_t i = 0; i < steps; ++i )
{
// ... Application *data generation
engine.BeginStep(); //next "logical" step for this application
engine.Put(varT, dataT, adios2::Mode::Sync);
// dataT memory already consumed by engine
// Application can modify dataT address and contents
// deferred functions return immediately (lazy evaluation),
// dataU, dataV and dataW pointers and contents must not be modified
// until PerformPuts, EndStep or Close.
// 1st batch
engine.Put(varU, dataU);
engine.Put(varV, dataV);
// in this case adios2::Mode::Deferred is redundant,
// as this is the default option
engine.Put(varW, dataW, adios2::Mode::Deferred);
// effectively dataU, dataV, dataW are "deferred"
// possibly until the first call to PerformPuts, EndStep or Close.
// Application MUST NOT modify the data pointer (e.g. resize
// memory) or change data contents.
engine.PerformPuts();
// dataU, dataV, dataW pointers/values can now be reused
// ... Application modifies dataU, dataV, dataW
//2nd batch
dataU[0] = 10;
dataV[0] = 10;
dataW[0] = 10;
engine.Put(varU, dataU);
engine.Put(varV, dataV);
engine.Put(varW, dataW);
// Application MUST NOT modify dataU, dataV and dataW pointers (e.g. resize),
// Contents should also not be modified after Put() and before
// PerformPuts() because ADIOS may access the data immediately
// or not until PerformPuts(), depending upon the engine
engine.PerformPuts();
// dataU, dataV, dataW pointers/values can now be reused
// Puts a varP block of zeros
adios2::Variable<double>::Span spanP = engine.Put<double>(varP);
// Not recommended mixing static pointers,
// span follows
// the same pointer/iterator invalidation
// rules as std::vector
double* p = spanP.data();
// Puts a varMu block of 1e-6
adios2::Variable<double>::Span spanMu = engine.Put<double>(varMu, 0, 1e-6);
// p might be invalidated
// by a new span, use spanP.data() again
foo(spanP.data());
// Puts a varRho block with a constant value of 1.225
engine.Put<double>(varRho, 0, 1.225);
// it's preferable to start modifying spans
// after all of them are created
foo(spanP.data());
bar(spanMu.begin(), spanMu.end());
engine.EndStep();
// spanP, spanMu are consumed by the library
// end of current logical step,
// default behavior: transport data
}
engine.Close();
// engine is unreachable and all data should be transported
...
Tip
Prefer default Deferred (lazy evaluation) functions as they have the potential to group several variables, with the trade-off of not being able to reuse the pointer's memory space until EndStep, PerformPuts, PerformGets, or Close.
Only use Sync if you really have to (e.g. reuse memory space from pointer).
ADIOS2 prefers a step-based IO in which everything is known ahead of time when writing an entire step.
Danger
The default behavior of ADIOS2 Put and Get calls IS NOT synchronized, but rather deferred.
It’s actually the opposite of MPI_Put and more like MPI_Rput.
Do not assume the data pointer is usable after a Put or Get before EndStep, Close or the corresponding PerformPuts/PerformGets.
Avoid using temporaries, r-values, and out-of-scope variables in Deferred mode.
Use adios2::Mode::Sync in these cases.
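Why a deferred Put must not be followed by changes to the user buffer can be shown with a toy model. The sketch below assumes a hypothetical `ToyWriter` class (not the ADIOS2 API, no real I/O): a Sync `Put` copies the data at once, while a Deferred `Put` only stores the pointer and copies when `PerformPuts` is called, so any change made in between ends up in the output.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy writer, NOT the ADIOS2 API: it only models when user data is copied.
class ToyWriter
{
public:
    enum class Mode
    {
        Deferred,
        Sync
    };

    // Sync: copies `count` values immediately.
    // Deferred: records pointer and size, the copy happens in PerformPuts().
    void Put(const double *data, std::size_t count, Mode mode = Mode::Deferred)
    {
        if (mode == Mode::Sync)
        {
            m_Buffer.insert(m_Buffer.end(), data, data + count);
        }
        else
        {
            m_Pending.push_back({data, count});
        }
    }

    // Copies all deferred requests into the internal buffer.
    void PerformPuts()
    {
        for (const Request &request : m_Pending)
        {
            m_Buffer.insert(m_Buffer.end(), request.data,
                            request.data + request.count);
        }
        m_Pending.clear();
    }

    const std::vector<double> &Buffer() const { return m_Buffer; }

private:
    struct Request
    {
        const double *data;
        std::size_t count;
    };

    std::vector<Request> m_Pending; // deferred Put calls, pointers only
    std::vector<double> m_Buffer;   // stands in for the engine's buffer
};
```

In this model, passing a temporary or a local that goes out of scope to a deferred `Put` would leave `m_Pending` holding a dangling pointer. Using the Sync mode for such data avoids the problem because the copy happens before `Put` returns.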
Available Engines
A particular engine is set within the IO object that creates it with the IO::SetEngine function in a case-insensitive manner.
If the SetEngine function is not invoked, the default engine is BPFile.
Application | Engine | Description |
---|---|---|
File | BP5 | DEFAULT write/read ADIOS2 native bp files |
File | HDF5 | write/read interoperability with HDF5 files |
Wide-Area-Network (WAN) | DataMan | write/read TCP/IP streams |
Staging | SST | write/read to a “staging” area: e.g. RDMA |
Engine polymorphism has two goals:
1. Each Engine implements an orthogonal IO scenario targeting a use case (e.g. Files, WAN, InSitu MPI, etc) using a simple, unified API.
2. Allow developers to build their own custom system solution based on their particular requirements in their own playground space. Reusable toolkit objects are available inside ADIOS2 for common tasks: bp buffering, transport management, transports, etc.
A class that extends Engine
must be thought of as a solution to a range of IO applications.
Each engine must provide a list of supported parameters, set in the IO object creating this engine using IO::SetParameters
, and supported transports (and their parameters) in IO::AddTransport
.
Each Engine’s particular options are documented in Supported Engines.
Operator
The Operator abstraction allows ADIOS2 to act upon the user application data, either from an adios2::Variable or a set of Variables in an adios2::IO object.
Current supported operations are:
Data compression/decompression, lossy and lossless.
Callback functions (C++11 bindings only) supported by specific engines
ADIOS2 enables the use of third-party libraries to execute these tasks.
Operators can be attached onto a variable in two modes: private or shared. In most situations, it is recommended to add an operator as a private one, which means it is owned by a certain variable. A simple example code is as follows.
#include <vector>
#include <adios2.h>
int main(int argc, char *argv[])
{
std::vector<double> myData = {
0.0001, 1.0001, 2.0001, 3.0001, 4.0001, 5.0001, 6.0001, 7.0001, 8.0001, 9.0001,
1.0001, 2.0001, 3.0001, 4.0001, 5.0001, 6.0001, 7.0001, 8.0001, 9.0001, 8.0001,
2.0001, 3.0001, 4.0001, 5.0001, 6.0001, 7.0001, 8.0001, 9.0001, 8.0001, 7.0001,
3.0001, 4.0001, 5.0001, 6.0001, 7.0001, 8.0001, 9.0001, 8.0001, 7.0001, 6.0001,
4.0001, 5.0001, 6.0001, 7.0001, 8.0001, 9.0001, 8.0001, 7.0001, 6.0001, 5.0001,
5.0001, 6.0001, 7.0001, 8.0001, 9.0001, 8.0001, 7.0001, 6.0001, 5.0001, 4.0001,
6.0001, 7.0001, 8.0001, 9.0001, 8.0001, 7.0001, 6.0001, 5.0001, 4.0001, 3.0001,
7.0001, 8.0001, 9.0001, 8.0001, 7.0001, 6.0001, 5.0001, 4.0001, 3.0001, 2.0001,
8.0001, 9.0001, 8.0001, 7.0001, 6.0001, 5.0001, 4.0001, 3.0001, 2.0001, 1.0001,
9.0001, 8.0001, 7.0001, 6.0001, 5.0001, 4.0001, 3.0001, 2.0001, 1.0001, 0.0001,
};
adios2::ADIOS adios;
auto io = adios.DeclareIO("TestIO");
auto varDouble = io.DefineVariable<double>("varDouble", {10,10}, {0,0}, {10,10}, adios2::ConstantDims);
// add operator
varDouble.AddOperation("mgard",{{"accuracy","0.01"}});
// end add operator
auto engine = io.Open("hello.bp", adios2::Mode::Write);
engine.Put<double>(varDouble, myData.data());
engine.Close();
return 0;
}
For users who need to attach a single operator onto multiple variables, a shared operator can be defined using the adios2::ADIOS object, and then attached to multiple variables using the reference to the operator object. Note that in this mode, all variables sharing this operator will also share the same configuration map. It should be only used when a number of variables need exactly the same operation. In real world use cases this is rarely seen, so please use this mode with caution.
#include <vector>
#include <adios2.h>
int main(int argc, char *argv[])
{
std::vector<double> myData = {
0.0001, 1.0001, 2.0001, 3.0001, 4.0001, 5.0001, 6.0001, 7.0001, 8.0001, 9.0001,
1.0001, 2.0001, 3.0001, 4.0001, 5.0001, 6.0001, 7.0001, 8.0001, 9.0001, 8.0001,
2.0001, 3.0001, 4.0001, 5.0001, 6.0001, 7.0001, 8.0001, 9.0001, 8.0001, 7.0001,
3.0001, 4.0001, 5.0001, 6.0001, 7.0001, 8.0001, 9.0001, 8.0001, 7.0001, 6.0001,
4.0001, 5.0001, 6.0001, 7.0001, 8.0001, 9.0001, 8.0001, 7.0001, 6.0001, 5.0001,
5.0001, 6.0001, 7.0001, 8.0001, 9.0001, 8.0001, 7.0001, 6.0001, 5.0001, 4.0001,
6.0001, 7.0001, 8.0001, 9.0001, 8.0001, 7.0001, 6.0001, 5.0001, 4.0001, 3.0001,
7.0001, 8.0001, 9.0001, 8.0001, 7.0001, 6.0001, 5.0001, 4.0001, 3.0001, 2.0001,
8.0001, 9.0001, 8.0001, 7.0001, 6.0001, 5.0001, 4.0001, 3.0001, 2.0001, 1.0001,
9.0001, 8.0001, 7.0001, 6.0001, 5.0001, 4.0001, 3.0001, 2.0001, 1.0001, 0.0001,
};
adios2::ADIOS adios;
auto io = adios.DeclareIO("TestIO");
auto varDouble = io.DefineVariable<double>("varDouble", {10,10}, {0,0}, {10,10}, adios2::ConstantDims);
// define operator
auto op = adios.DefineOperator("SharedCompressor","mgard",{{"accuracy","0.01"}});
// add operator
varDouble.AddOperation(op);
// end add operator
auto engine = io.Open("hello.bp", adios2::Mode::Write);
engine.Put<double>(varDouble, myData.data());
engine.Close();
return 0;
}
Warning
Make sure your ADIOS2 library installation used for writing and reading was linked with a compatible version of a third-party dependency when working with operators. ADIOS2 will issue an exception if an operator library dependency is missing.
Runtime Configuration Files
ADIOS2 supports passing an optional runtime configuration file to the ADIOS component constructor (adios2_init
in C, Fortran).
This file contains key-value pairs equivalent to the compile time IO::SetParameters
(adios2_set_parameter
in C, Fortran), and IO::AddTransport
(adios2_set_transport_parameter
in C, Fortran).
Each Engine
and Operator
must provide a set of available parameters as described in the Supported Engines section.
Prior to version v2.6.0 only XML was supported; v2.6.0 and later support both XML and YAML.
Warning
Configuration files must have the corresponding format extension .xml
, .yaml
: config.xml
, config.yaml
, etc.
XML
<?xml version="1.0"?>
<adios-config>
    <io name="IONAME_1">
        <engine type="ENGINE_TYPE">
            <!-- Equivalent to IO::SetParameters -->
            <parameter key="KEY_1" value="VALUE_1"/>
            <parameter key="KEY_2" value="VALUE_2"/>
            <!-- ... -->
            <parameter key="KEY_N" value="VALUE_N"/>
        </engine>
        <!-- Equivalent to IO::AddTransport -->
        <transport type="TRANSPORT_TYPE">
            <!-- Equivalent to the parameters passed to IO::AddTransport -->
            <parameter key="KEY_1" value="VALUE_1"/>
            <parameter key="KEY_2" value="VALUE_2"/>
            <!-- ... -->
            <parameter key="KEY_N" value="VALUE_N"/>
        </transport>
    </io>
    <io name="IONAME_2">
        <!-- ... -->
    </io>
</adios-config>
YAML
Starting with v2.6.0, ADIOS supports YAML configuration files. The syntax follows strict use of the YAML node keywords mapping to the ADIOS2 components hierarchy. If a keyword is unknown ADIOS2 simply ignores it. For an example file refer to adios2 config file example in our repo.
---
# adios2 config.yaml
# IO YAML Sequence (-) Nodes to allow for multiple IO nodes
# IO name referred in code with DeclareIO is mandatory
- IO: "IOName"
  Engine:
    # If Type is missing or commented out, default Engine is picked up
    Type: "BP5"
    # optional engine parameters
    key1: value1
    key2: value2
    key3: value3
  Variables:
    # Variable Name is Mandatory
    - Variable: "VariableName1"
      Operations:
        # Operation Type is mandatory (zfp, sz, etc.)
        - Type: operatorType
          key1: value1
          key2: value2
    - Variable: "VariableName2"
      Operations:
        # Operations sequence of maps
        - {Type: operatorType, key1: value1}
        - {Type: z-checker, key1: value1, key2: value2}
  Transports:
    # Transport sequence of maps
    - {Type: file, Library: fstream}
    - {Type: rdma, Library: ibverbs}
...
Caution
YAML is case sensitive, make sure the node identifiers follow strictly the keywords: IO, Engine, Variables, Variable, Operations, Transports, Type.
Tip
Run a YAML validator or use a YAML editor to make sure the provided file is YAML compatible.
Anatomy of an ADIOS Program
Anatomy of an ADIOS Output
ADIOS adios("config.xml", MPI_COMM_WORLD);
|
| IO io = adios.DeclareIO(...);
| |
| | Variable<...> var = io.DefineVariable<...>(...)
| | Attribute<...> attr = io.DefineAttribute<...>(...)
| | Engine e = io.Open("OutputFileName.bp", adios2::Mode::Write);
| | |
| | | e.BeginStep()
| | | |
| | | | e.Put(var, datapointer);
| | | |
| | | e.EndStep()
| | |
| | e.Close();
| |
| |--> IO goes out of scope
|
|--> ADIOS goes out of scope or adios2_finalize()
The pseudo code above depicts the basic structure of performing output. The ADIOS object is necessary to hold all other objects. It is initialized with an MPI communicator in a parallel program or without in a serial program. Additionally, a config file (XML or YAML format) can be specified here to load runtime configuration. Only one ADIOS object is needed throughout the entire application but you can create as many as you want (e.g. if you need to separate IO objects using the same name in a program that reads similar input from an ensemble of multiple applications).
The IO object is required to hold the variable and attribute definitions, and runtime options for a particular input or output stream. The IO object has a name, which is used only to refer to runtime options in the configuration file. One IO object can only be used in one output or input stream. The only exception where an IO object can be used twice is one input stream plus one output stream where the output is reusing the variable definitions loaded during input.
Variable and Attribute definitions belong to one IO object, which means they can only be used in one output. You need to define new ones for other outputs. Just because a Variable is defined, it will not appear in the output unless an associated Put() call provides the content.
A stream is opened and closed once. The Engine object implements the data movement for the stream. The runtime options of the IO object determine what type of engine is created in the Open() call. One output step is denoted by a BeginStep/EndStep pair.
An output step consists of variables and attributes. Variables are just definitions without content, so one must call a Put() function to provide the application data pointer that contains the data content one wants to write out. Attributes have their content in their definitions so there is no need for an extra call.
Some rules:
Variables can be defined any time, before the corresponding Put() call
Attributes can be defined any time before EndStep
The following functions must be treated as Collective operations
ADIOS
Open
BeginStep
EndStep
Close
Note
If there is only one output step, and we only want to write it to a file on disk and never stream it to another application, then BeginStep and EndStep are not required, but it does not make any difference if they are called.
Anatomy of an ADIOS Input
ADIOS adios("config.xml", MPI_COMM_WORLD);
|
| IO io = adios.DeclareIO(...);
| |
| | Engine e = io.Open("InputFileName.bp", adios2::Mode::Read);
| | |
| | | e.BeginStep()
| | | |
| | | | varlist = io.AvailableVariables(...)
| | | | Variable var = io.InquireVariable(...)
| | | | Attribute attr = io.InquireAttribute(...)
| | | | |
| | | | | e.Get(var, datapointer);
| | | | |
| | | |
| | | e.EndStep()
| | |
| | e.Close();
| |
| |--> IO goes out of scope
|
|--> ADIOS goes out of scope or adios2_finalize()
The difference between input and output is that while we have to define the variables and attributes for an output, we have to retrieve the available variables in an input first as definitions (Variable and Attribute objects).
If we know the particular variable (name and type) in the input stream, we can get the definition using InquireVariable(). Generic tools that process any input must use other functions to retrieve the list of variable names and their types first and then get the individual Variable objects. The same is true for Attributes.
Anatomy of an ADIOS File-only Input
Previously we explored how to read using the input mode adios2::Mode::Read. Nonetheless, ADIOS has another input mode named adios2::Mode::ReadRandomAccess. adios2::Mode::Read mode allows data access only timestep by timestep using BeginStep/EndStep, but generally it is more memory efficient as ADIOS is only required to load metadata for the current timestep. ReadRandomAccess can only be used with file engines and involves loading all the file metadata at once. So it can be more memory intensive than adios2::Mode::Read mode, but allows reading data from any timestep using SetStepSelection(). If you use adios2::Mode::ReadRandomAccess mode, be sure to allocate enough memory to hold multiple steps of the variable content. Note that ADIOS streaming engines (like SST, DataMan, etc.) do not support ReadRandomAccess mode. Also, newer file engines like BP5 do not allow BeginStep/EndStep calls in ReadRandomAccess mode.
ADIOS adios("config.xml", MPI_COMM_WORLD);
|
| IO io = adios.DeclareIO(...);
| |
| | Engine e = io.Open("InputFileName.bp", adios2::Mode::ReadRandomAccess);
| | |
| | | Variable var = io.InquireVariable(...)
| | | | var.SetStepSelection()
| | | | e.Get(var, datapointer);
| | | |
| | |
| | e.Close();
| |
| |--> IO goes out of scope
|
|--> ADIOS goes out of scope or adios2_finalize()
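The advice above about allocating enough memory for multiple steps amounts to a simple size calculation. The helper below is a minimal sketch with a hypothetical name, `ElementsForSelection`; it is not an ADIOS2 function. It assumes the selected block has the same per-dimension extent in every selected step, and that the steps are returned back to back in one contiguous buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of ADIOS2.
// Returns the number of elements a destination buffer needs in order to
// hold `stepCount` steps of a block whose extent per dimension is `count`.
// Assumption: the block has the same extent in every selected step.
inline std::size_t ElementsForSelection(const std::vector<std::size_t> &count,
                                        std::size_t stepCount)
{
    std::size_t perStep = 1; // an empty `count` is treated as a single value
    for (const std::size_t extent : count)
    {
        perStep *= extent;
    }
    return perStep * stepCount;
}
```

For a 10 x 10 block read across 5 steps, the helper returns 500, so a `std::vector<double>` would be resized to 500 elements before its `data()` pointer is handed to a Get call.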