I/O Concepts
This page explains the mechanisms and concepts behind ROOT’s I/O facilities, i.e. how ROOT converts your objects into a stream of bytes and back. It assumes that you have read the introduction on → ROOT files, → I/O of custom classes, and → Trees.
ROOT files, directories, and keys
ROOT stores data in ROOT files (TFile), see → ROOT files.
It does so in a machine-independent format.
Similar to a file system directory, a TFile can contain directories (TDirectory) and objects, accessible through the directory’s keys (TKey).
A TFile is a directory itself: it inherits from TDirectory.
Directories can be “entered” with TDirectory::cd(), created with TDirectory::mkdir(), and removed with TDirectory::rmdir().
The global “current directory”
ROOT uses two globals (thread-local statics, to be precise): gFile, which points to the most recently opened file, and gDirectory, which points to the “current” directory.
Opening a ROOT file also makes it the current directory.
You can change the current directory by assigning to gDirectory or by calling TDirectory::cd(); you can print the current directory with TDirectory::pwd().
Some objects operate (at least by default) on that global directory.
Examples are TObject::Write() (unlike the preferred TDirectory::WriteObject()) and the TTree constructor if no directory is specified.
Object ownership also relates to the current directory, see → Object ownership.
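The bookkeeping described above can be pictured with a toy model in standard C++. This is not ROOT code: ToyDirectory, gToyFile, gToyDirectory, ToyOpen and ToyCd are hypothetical names. The model assumes only what the text states: the globals are thread-local, and opening a file makes it the current directory.

```cpp
#include <string>

// Hypothetical toy type: it only mimics the bookkeeping, no file is touched.
struct ToyDirectory {
    std::string path;
};

// One "current file" and one "current directory" per thread, like gFile and gDirectory.
thread_local ToyDirectory *gToyFile = nullptr;
thread_local ToyDirectory *gToyDirectory = nullptr;

// Opening a file makes it both the most recently opened file and the current directory.
ToyDirectory *ToyOpen(ToyDirectory &file) {
    gToyFile = &file;
    gToyDirectory = &file;
    return &file;
}

// In this toy, entering a directory only updates the current directory.
void ToyCd(ToyDirectory &dir) { gToyDirectory = &dir; }
```

Because both pointers are thread_local, a second thread starts with its own null pointers and does not see the directory the first thread has entered.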
ROOT’s interpreter interfaces (both C++ and Python) make the objects of the current directory available as if they were variables. This is a strong motivation to use valid C++ identifiers as key names, i.e. without spaces, not starting with a digit, etc.
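Whether a key name can be used as a variable in the interpreter can be checked roughly as follows. This is a simplified sketch (IsUsableKeyName is a hypothetical helper) that encodes only the rules mentioned above; for instance, it does not reject C++ keywords such as int or class.

```cpp
#include <cctype>
#include <string>

// Simplified identifier check: letters, digits and '_', not starting with a digit.
// Keywords are not rejected.
bool IsUsableKeyName(const std::string &name) {
    if (name.empty()) return false;
    const unsigned char first = static_cast<unsigned char>(name[0]);
    if (!(std::isalpha(first) || first == '_')) return false; // must not start with a digit
    for (unsigned char c : name)
        if (!(std::isalnum(c) || c == '_')) return false;     // no spaces, dots, dashes, ...
    return true;
}
```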
Example
Here, we show how opening a TFile changes gDirectory, and how one can use an object stored in the current directory as if it were a declared variable.
Rint:/ corresponds to “ROOT’s in-memory” directory, the default directory during startup.
Adding and removing objects from a ROOT file
When writing an object to a ROOT file, ROOT creates a directory “entry” (TKey) representing the object, which mainly consists of a name and the object’s persistent data.
You can think of a TFile as a collection of TKeys, possibly inside nested TDirectories.
The name of the TKey can either be stated explicitly when writing or, for classes inheriting from TObject, be determined from TObject::GetName().
An object read into memory is independent from the object on disk.
Changes of the in-memory object are not propagated to disk.
Instead, a new version of the object needs to be saved, for instance by passing "overwrite" as option to TDirectory::WriteObject(); see the documentation of TDirectoryFile::WriteTObject().
Removing an object from a TDirectory using TDirectory::Delete() will generally not free the corresponding disk space.
Instead, the storage occupied by the deleted object is made available (as a TFree) for subsequent objects written to this file.
Use → hadd to defragment a ROOT file by rewriting it.
Iterating over a directory’s content
TFile gives access to the list of keys through its base class method TDirectoryFile::GetListOfKeys().
Example
Get the list of keys from the demo.root file and print them.
Iterating through all of a directory’s entries is useful, for instance, if you do not know an object’s name upfront.
Example
The output of such an iterate.C ROOT macro could be:
root[] .x iterate.C
key: h0 points to an object of class: TH1F
key: h1 points to an object of class: TH1F
...
Note the concept of name cycles, see → Opening and inspecting a ROOT file.
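The interplay of keys and name cycles can be modeled with a map keyed by (name, cycle). This standalone sketch uses the hypothetical class ToyKeyList, not ROOT’s TKey or TList; it assumes that writing an object under an existing name adds a new cycle instead of replacing the old entry.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical key list: (name, cycle) -> class name, sorted by name, then cycle.
class ToyKeyList {
public:
    // Writing an existing name adds the next cycle; cycles start at 1.
    int Write(const std::string &name, const std::string &className) {
        int cycle = 1;
        for (auto it = fKeys.lower_bound({name, 0}); it != fKeys.end() && it->first.first == name; ++it)
            cycle = it->first.second + 1;
        fKeys[{name, cycle}] = className;
        return cycle;
    }
    // Lists every entry, one line per key and cycle.
    std::string List() const {
        std::ostringstream out;
        for (const auto &[key, cls] : fKeys)
            out << "key: " << key.first << ';' << key.second
                << " points to an object of class: " << cls << '\n';
        return out.str();
    }
private:
    std::map<std::pair<std::string, int>, std::string> fKeys;
};
```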
ROOT’s C++ object serialization: from memory to disk and back
Writing an object to file means writing the current values of the object’s data members. This is done with the help of what ROOT calls a streamer: each serializable class has such a streamer which converts all data members into a buffer of raw bytes (see TBuffer).
Variables of composite data types such as classes, structures, and arrays can be decomposed into simple types such as longs, shorts, floats, and chars. These values are then written out in a machine-independent representation.
This happens recursively: a second streamer will be invoked for a member of class type, to stream that class’s members. A similar recursion happens for all base classes. In the end, the buffer contains all the simple data members of all the classes that make up that particular object.
At runtime, ROOT needs to determine which streamer function to call for a given object.
Using the ClassDef macro inside the class definition makes this lookup faster: without ClassDef, the object’s dynamic type has to be looked up to determine which streamer to invoke.
See also → The ClassDef Macro.
Data members of certain types need special treatment, for instance pointers and references. This will be explained below.
The member functions of a class are not written to the ROOT file; the file contains only the persistent data members.
See also → Restrictions on types ROOT I/O can handle.
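The recursion described above can be sketched in standard C++: each toy streamer handles its base class first, then a class-type member through that member’s own streamer, then the simple members. ToyBuffer, Vertex, Base, Track and Stream are hypothetical names. The fixed byte order chosen here (big-endian) stands in for a machine-independent representation, and the sketch assumes 32-bit IEEE-754 floats.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(float) == 4, "sketch assumes 32-bit floats");

// Toy buffer: appends values most significant byte first, independent of the host.
struct ToyBuffer {
    std::vector<std::uint8_t> bytes;
    void WriteU32(std::uint32_t v) {
        for (int shift = 24; shift >= 0; shift -= 8)
            bytes.push_back(static_cast<std::uint8_t>(v >> shift));
    }
    void WriteFloat(float f) {
        std::uint32_t bits;
        std::memcpy(&bits, &f, sizeof bits); // take the float's bit pattern
        WriteU32(bits);
    }
};

// Hypothetical classes with a base class and a class-type member.
struct Vertex { float x, y, z; };
struct Base { std::uint32_t fId; };
struct Track : Base { Vertex fOrigin; std::uint32_t fCharge; };

void Stream(ToyBuffer &b, const Vertex &v) { b.WriteFloat(v.x); b.WriteFloat(v.y); b.WriteFloat(v.z); }
void Stream(ToyBuffer &b, const Base &o) { b.WriteU32(o.fId); }
void Stream(ToyBuffer &b, const Track &t) {
    Stream(b, static_cast<const Base &>(t)); // base class first
    Stream(b, t.fOrigin);                     // class-type member, recursively
    b.WriteU32(t.fCharge);                    // simple member
}
```

In the end the buffer holds only simple values (here 5 × 4 bytes for a Track); no member functions and no pointers to them are stored.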
Generating streamers
Streamers are C++ functions that are usually created as part of a class’s dictionary, see → I/O of custom classes.
rootcling parses the class definition and determines how to stream the object in an optimal way, and which streamers need to be invoked for base classes and members.
Excluding data members from I/O
To prevent a data member from being written to the file, insert a ! as the first character in a single-line comment (//) following the declaration of the member.
To accommodate doxygen-style documentation, this annotation can also be written as ///<!.
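For illustration, here is a hypothetical class with a transient member, together with a hand-written write/read pair that mimics what a generated streamer does with it: the member is skipped when writing and keeps its default value when reading. Cluster, Write and Read are made-up names, and the plain integer buffer is a simplification.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical class: the comment annotation is what rootcling would read.
struct Cluster {
    std::int32_t fEnergy = 0;
    std::int32_t fCacheHits = 0; //! transient: not written to the file
};

// Only persistent members go into the buffer.
void Write(std::vector<std::int32_t> &out, const Cluster &c) { out.push_back(c.fEnergy); }

// The transient member is not read; it keeps the value set by the constructor.
Cluster Read(const std::vector<std::int32_t> &in, std::size_t &pos) {
    Cluster c;
    c.fEnergy = in.at(pos++);
    return c;
}
```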
Marking pointers as never null
For a small performance benefit, pointer data members can be marked as always pointing to valid memory (never null): this is done with the annotation //-> or ///<->.
A pointer marked as such must not point back to the current object, not even indirectly.
Example
The pointer data member fH is marked as never null: ROOT will be able to perform additional optimizations.
fTracks, instead, will always be checked for nullptr.
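The sketch below declares a hypothetical class with both kinds of pointer members, using the stand-in types Histo and TrackList. The toy writer only suggests why the annotation helps: a nullable pointer needs a check and some marker for the null case, while a never-null one can be written directly. ROOT’s actual encoding is different.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for a histogram class and a track container.
struct Histo { std::int32_t nbins = 10; };
struct TrackList { std::int32_t n = 0; };

struct Event {
    Histo *fH = new Histo;          //-> never null: always points to a valid Histo
    TrackList *fTracks = nullptr;   // may be null, so it has to be checked
    Event() = default;
    Event(const Event &) = delete;
    Event &operator=(const Event &) = delete;
    ~Event() { delete fH; delete fTracks; }
};

// Toy writer, not ROOT's format.
void Write(std::vector<std::int32_t> &out, const Event &e) {
    out.push_back(e.fH->nbins);             // written directly, no null check
    out.push_back(e.fTracks ? 1 : 0);       // presence marker for the nullable pointer
    if (e.fTracks) out.push_back(e.fTracks->n);
}
```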
Array data members of fixed and variable size
ROOT supports I/O of fixed size arrays out of the box. For variable-size arrays, a special comment syntax is available to specify the name of a data member that holds the size of the array. Here is an example:
Example
The comment //[fNVertex] tells ROOT that the length of the array is stored in the data member fNVertex. In general, the syntax is //[LENGTH], where LENGTH must be the name of a data member that is defined before the array member, or in a base class.
Note: Pointers to simple types (e.g. float*, int*) are assumed to be variable-size arrays.
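A hypothetical class with a variable-size array, and a toy streamer that writes the length member first and then exactly that many values, could look as follows. For brevity it uses the native byte order and assumes 4-byte floats; Event, Put, Get, Write and Read are illustrative names, not ROOT API.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(float) == 4, "sketch assumes 4-byte floats");

// Hypothetical class with the //[LENGTH] annotation.
struct Event {
    std::int32_t fNVertex = 0;     // number of entries in fVertex
    float *fVertex = nullptr;      //[fNVertex]
    Event() = default;
    Event(const Event &) = delete;
    Event &operator=(const Event &) = delete;
    ~Event() { delete[] fVertex; }
};

template <typename T> void Put(std::vector<unsigned char> &b, const T &v) {
    const auto *p = reinterpret_cast<const unsigned char *>(&v);
    b.insert(b.end(), p, p + sizeof v);
}
template <typename T> T Get(const std::vector<unsigned char> &b, std::size_t &pos) {
    T v;
    std::memcpy(&v, b.data() + pos, sizeof v);
    pos += sizeof v;
    return v;
}

// The length member is written first, then exactly that many array elements.
void Write(std::vector<unsigned char> &b, const Event &e) {
    Put(b, e.fNVertex);
    for (std::int32_t i = 0; i < e.fNVertex; ++i) Put(b, e.fVertex[i]);
}

// Reading allocates the array with the length that was just read.
void Read(const std::vector<unsigned char> &b, std::size_t &pos, Event &e) {
    delete[] e.fVertex;
    e.fNVertex = Get<std::int32_t>(b, pos);
    e.fVertex = e.fNVertex > 0 ? new float[e.fNVertex] : nullptr;
    for (std::int32_t i = 0; i < e.fNVertex; ++i) e.fVertex[i] = Get<float>(b, pos);
}
```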
Preventing splitting
When data is written to a TTree, compound objects are split by default (see → Splitting), i.e. a column is created for each data member.
If you know that a certain data member will always have to be read as a full object, it can be more performant to prevent its splitting.
To do so, add //|| as an annotation to its declaration.
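The difference between split and unsplit storage can be modeled with two toy containers: one keeps one column per data member, the other keeps whole objects together, which is what //|| asks for (for example on a member declared as Point fVertex; //||). Point, SplitColumns and UnsplitColumn are hypothetical and ignore everything a real TTree does, such as baskets and compression.

```cpp
#include <cstddef>
#include <vector>

struct Point { float x, y; }; // hypothetical member type

// Split: one column per data member, so a reader could load only x.
struct SplitColumns {
    std::vector<float> x, y;
    void Fill(const Point &p) { x.push_back(p.x); y.push_back(p.y); }
    Point Get(std::size_t i) const { return {x[i], y[i]}; } // full object needs every column
};

// Unsplit: whole objects stay together, cheap when they are always read as a whole.
struct UnsplitColumn {
    std::vector<Point> objects;
    void Fill(const Point &p) { objects.push_back(p); }
    Point Get(std::size_t i) const { return objects[i]; }
};
```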
Double32_t: storing doubles with single precision
Some values have inherently reduced precision, yet benefit from double-precision arithmetic.
The type alias Double32_t represents a value that has double precision in memory, but is stored with lower, adjustable precision.
The actual size on disk (before compression) is determined by an optional comment of the form //[min,max,nbits] following the data member declaration.
If the comment is absent or does not contain min, max, nbits, the member is saved with single precision.
The min and max values themselves, if present, are saved with 32-bit precision. min and max can be either a floating-point number or one of the following trivial mathematical expressions: pi, 2*pi, pi/2, pi/4.
If nbits is present, the member is saved with nbits-bit precision. For more details, see this tutorial.
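The min/max/nbits form amounts to storing an integer code on a fixed grid between min and max. The following sketch shows such linear quantization for a member declared, for example, as Double32_t fPhi; //[0,2*pi,12]. It illustrates the idea only and is not ROOT’s code: Pack and Unpack are hypothetical, the grid here uses 2^nbits - 1 steps, and ROOT’s exact scaling and rounding may differ.

```cpp
#include <cmath>
#include <cstdint>

// Code for x on a grid of 2^nbits - 1 equal steps between xmin and xmax.
// Assumes 1 <= nbits <= 31 and xmin < xmax; values outside the range are clamped.
std::uint32_t Pack(double x, double xmin, double xmax, int nbits) {
    const double steps = std::ldexp(1.0, nbits) - 1.0;
    if (x < xmin) x = xmin;
    if (x > xmax) x = xmax;
    return static_cast<std::uint32_t>(std::llround((x - xmin) / (xmax - xmin) * steps));
}

// Inverse mapping: the reconstructed value is off by at most half a grid step.
double Unpack(std::uint32_t code, double xmin, double xmax, int nbits) {
    const double steps = std::ldexp(1.0, nbits) - 1.0;
    return xmin + code * (xmax - xmin) / steps;
}
```

With 12 bits over [0, 2π] the grid step is about 1.5e-3, so every stored angle is reproduced to within roughly 7.7e-4 while taking far fewer bits than a double.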
Custom streamers
Usually, streamers are generated automatically by rootcling (see → I/O of custom classes). However, you can also write your own streamer.
A common use case is a post-read hook, for instance to register a freshly read object with other objects.
You need to tell rootcling not to build a streamer, by ending the #pragma link statement for the class with a -, e.g. #pragma link C++ class Event-;
A customized Streamer function for Event takes a TBuffer as a parameter, and first checks whether the buffer is being read or written.
Note: A class with a custom streamer cannot be split, and its members cannot be stored member-wise.
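A minimal sketch of a hand-written streamer with a post-read hook, in standalone C++: one function serves both directions and branches on the buffer’s mode, as described for Event above. ToyBuffer, Registry and gRegistry are hypothetical and only loosely modeled on TBuffer; a real Streamer would work on a TBuffer and also handle the class version.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical buffer that knows whether it is being read or written.
class ToyBuffer {
public:
    explicit ToyBuffer(bool reading, std::vector<std::int32_t> data = {})
        : fReading(reading), fData(std::move(data)) {}
    bool IsReading() const { return fReading; }
    void Put(std::int32_t v) { fData.push_back(v); }
    std::int32_t Take() { return fData.at(fPos++); }
    const std::vector<std::int32_t> &Data() const { return fData; }
private:
    bool fReading;
    std::vector<std::int32_t> fData;
    std::size_t fPos = 0;
};

// Hypothetical registry that wants to know about every object read from disk.
struct Registry { int count = 0; };
Registry gRegistry;

struct Event {
    std::int32_t fNTrack = 0;
    bool fRegistered = false;   // bookkeeping only, never streamed

    void Streamer(ToyBuffer &b) {
        if (b.IsReading()) {
            fNTrack = b.Take();
            gRegistry.count++;  // post-read hook: register the freshly read object
            fRegistered = true;
        } else {
            b.Put(fNTrack);
        }
    }
};
```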
Disable storage of TObject data members in derived classes
Types do not have to inherit from TObject for ROOT to be able to read and write them: the presence of a dictionary is sufficient.
Classes that do inherit from TObject can exclude TObject’s data members from their I/O by calling MyClass::Class()->IgnoreTObjectStreamer() before any object of type MyClass is written to a ROOT file.
This is useful if you do not use TObject’s fBits and fUniqueID data members and saving some space in the output file is important.
Storing networks of objects pointing at each other
ROOT supports storing multiple objects with complex networks of pointers between them, including in the presence of circular dependencies.
The network of pointers is preserved on disk and recreated when the data is read.
Note one special case: if an object is pointed to and one of its data members is also pointed to directly, that member is serialized twice, once as part of the enclosing object and once independently.
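The bookkeeping that lets shared and circular pointers survive a round trip can be sketched as follows: each object receives an id when it is first met, later references store only that id, and the reader creates all objects before reconnecting the pointers. This is a standalone model with hypothetical names (Node, Record, Write, Read); ROOT’s buffers likewise remember objects that were already written, but their format is different.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <vector>

struct Node { int value = 0; Node *next = nullptr; }; // hypothetical linked objects
struct Record { int value; int nextId; };              // nextId == -1 means nullptr

// Returns the id of n, registering it on first sight.
int Assign(Node *n, std::map<Node *, int> &ids, std::vector<Node *> &order) {
    if (!n) return -1;
    auto [it, inserted] = ids.emplace(n, static_cast<int>(order.size()));
    if (inserted) order.push_back(n);
    return it->second;
}

// Each reachable object is written exactly once, even inside cycles.
std::vector<Record> Write(Node *root) {
    std::map<Node *, int> ids;
    std::vector<Node *> order;
    Assign(root, ids, order);
    std::vector<Record> out;
    for (std::size_t i = 0; i < order.size(); ++i) { // order grows while we walk it
        Node *n = order[i];
        out.push_back({n->value, Assign(n->next, ids, order)});
    }
    return out;
}

// Create all objects first, then restore the pointers by id.
std::vector<std::unique_ptr<Node>> Read(const std::vector<Record> &in) {
    std::vector<std::unique_ptr<Node>> nodes;
    for (const Record &r : in) nodes.push_back(std::make_unique<Node>(Node{r.value, nullptr}));
    for (std::size_t i = 0; i < in.size(); ++i)
        nodes[i]->next = in[i].nextId < 0 ? nullptr : nodes[in[i].nextId].get();
    return nodes;
}
```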
Compression and performance
Compressing data saves disk space, at the cost of additional work for the CPU to write and read the data. If your analysis is one of the rare cases which spends most of the time in CPU work, using uncompressed data might be beneficial.
Most analyses on the other hand will benefit from one of the fast compression algorithms that also reduce the amount of data to be read from disk or transferred over the network.
The compression factor, that is, the saving of storage space, varies with the type of data. A buffer containing N identical values is compressed better than a set of values with higher entropy.
ROOT offers several options, such as LZMA with very high compression ratio, or LZ4 with very high decompression throughput, or ZSTD with a good compromise in performance.
The default compression for RNTuple is ZSTD level 5; for everything else it is zlib with compression level 1.
Algorithm and compression level can be selected using TFile::SetCompressionAlgorithm() and TFile::SetCompressionLevel(), respectively, at the time data is written. A compression level of 0 turns off compression completely. Both algorithm and level can be set at the same time using TFile::SetCompressionSettings().
The recommended algorithm for general purpose analysis data can be set with:
Note that different objects in a ROOT file might have been written with different compression settings. Even different branches or different baskets in a TTree might be using different settings.
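Algorithm and level are assumed here to be combined into the single integer that TFile::SetCompressionSettings() takes, as 100 * algorithm + level; for instance 505 would be ZSTD level 5 and 101 zlib level 1, matching the defaults mentioned above. The sketch also assumes the algorithm numbering zlib = 1, LZMA = 2, LZ4 = 4, ZSTD = 5. Algorithm, Settings, AlgorithmOf and LevelOf are local helpers, so please check the numbering against ROOT::RCompressionSetting in your ROOT version.

```cpp
// Local helper enum mirroring the assumed numbering (not ROOT's own header).
enum class Algorithm : int { kZLIB = 1, kLZMA = 2, kLZ4 = 4, kZSTD = 5 };

// Combined setting as assumed above: 100 * algorithm + level (level 0 = uncompressed).
constexpr int Settings(Algorithm algo, int level) { return 100 * static_cast<int>(algo) + level; }
constexpr int AlgorithmOf(int settings) { return settings / 100; }
constexpr int LevelOf(int settings) { return settings % 100; }

static_assert(Settings(Algorithm::kZSTD, 5) == 505, "ZSTD level 5");
static_assert(Settings(Algorithm::kZLIB, 1) == 101, "zlib level 1");
```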
XML interface
ROOT supports writing objects to XML files instead of ROOT files. While XML files are generally inappropriate for storing data (e.g. worse I/O performance, larger size, no compression), they can be opened with a normal text editor.
Therefore XML files should only be used for small amounts of data, typically histogram files, images, geometries, calibrations. XML files use the same streaming technology as regular ROOT files: any class with a dictionary can be stored in XML format. Contrary to ROOT files, XML files do not support subdirectories or trees.
To create an XML file, specify a filename with .xml extension when calling TFile::Open().
Storing the class data layout
ROOT files store data members’ values together with some related metadata, e.g. their names and types. This allows ROOT to find discrepancies between the class layout in memory at the time of writing and at the time of reading, if the class definition changes over time (enabling schema evolution). It also allows ROOT to read data of classes for which no dictionary is available - potentially even when the corresponding library has not been loaded.
ROOT’s reflection library (TClass) provides the name and type information, which is written to ROOT files in the form of TStreamerInfo objects describing a class’s members and types.
The TStreamerInfo objects for all classes written to a file are accessible through TFile::GetStreamerInfoList().
These class descriptions are versioned: different versions of the same class might be written to different files, and merging those files results in a single file holding multiple versions of the same class description.
Once a ROOT file that contains it has been opened, a class’s TStreamerInfo for a given version can be retrieved through TClass::GetStreamerInfo(int version).
It contains one entry per data member and base class, in the form of TStreamerElement objects.
They can be accessed through TStreamerInfo::GetStreamerElement().
TFile::MakeProject() can use the information from TStreamerInfo objects to construct a C++ header that contains the class data members and their types, but no member functions.
This allows you to create libraries of compiled objects simply from a data file, even if the original library is not available.
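The idea behind TFile::MakeProject() can be illustrated with a toy generator that turns a list of stored member descriptions into a header declaring only data members. MemberDesc and MakeHeader are hypothetical; the real function works on TStreamerInfo objects, and its output contains more than this.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for one stored member description (cf. TStreamerElement).
struct MemberDesc { std::string type, name; };

// Emits a class with data members only: no member functions are known from the file.
std::string MakeHeader(const std::string &className, int version,
                       const std::vector<MemberDesc> &members) {
    std::string h = "// generated from class version " + std::to_string(version) + "\n";
    h += "class " + className + " {\npublic:\n";
    for (const auto &m : members) h += "   " + m.type + " " + m.name + ";\n";
    h += "};\n";
    return h;
}
```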
Abstraction of I/O operations on collections: collection proxy
Instead of implementing dedicated streaming functions for std::vector, std::list, etc., as well as for ROOT’s collection types, ROOT implements an abstraction layer, the collection proxy, for the required I/O functionality, such as creation, insertion, and clearing.
Collection proxies give access to collection data from disk, no matter what the original collection type was, and whether or not a dictionary for that collection (and its specific template specialization) exists.
They can be adapted to custom collections.
The abstract interface (“protocol”) to implement is TVirtualCollectionProxy; a concrete example is TGenMapProxy for std::map.
For a given class, the collection proxy can be queried and set with TClass::GetCollectionProxy() and TClass::SetCollectionProxy(), respectively.
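A toy version of the collection-proxy idea: generic code fills a collection only through an abstract interface, and one small adapter per container type implements it. ToyCollectionProxy, StlProxy and Fill are hypothetical and much simpler than TVirtualCollectionProxy, whose real interface differs.

```cpp
#include <list>
#include <vector>

// Hypothetical abstract interface: all that generic reading code needs.
template <typename T>
struct ToyCollectionProxy {
    virtual ~ToyCollectionProxy() = default;
    virtual void Clear() = 0;
    virtual void Insert(const T &value) = 0;
    virtual std::vector<T> Elements() const = 0;
};

// One adapter per container type.
template <typename Container>
struct StlProxy : ToyCollectionProxy<typename Container::value_type> {
    using T = typename Container::value_type;
    explicit StlProxy(Container &c) : fC(c) {}
    void Clear() override { fC.clear(); }
    void Insert(const T &value) override { fC.insert(fC.end(), value); }
    std::vector<T> Elements() const override { return std::vector<T>(fC.begin(), fC.end()); }
    Container &fC;
};

// Generic "read": fills whatever collection sits behind the proxy.
template <typename T>
void Fill(ToyCollectionProxy<T> &proxy, const std::vector<T> &onDisk) {
    proxy.Clear();
    for (const T &v : onDisk) proxy.Insert(v);
}
```

The same stored values can thus end up in a std::vector or a std::list, which is also what makes reading data into a different collection type possible.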
Dealing with changes in class layouts: schema evolution
With long-lived data, changes in the data layout become a concern. When a class layout (i.e. data member names, their types, order, etc.) changes, existing persistent data may no longer correspond to the foreseen target of the read operation: the in-memory layout of the latest version of a class definition might now differ from the persisted layout. “Schema evolution” is ROOT’s solution to this problem: in the case of a mismatch between the in-memory version and the persistent version of a class, ROOT maps the data in the file to the new layout of the object in memory.
ROOT supports two types of schema evolution: automatic schema evolution, which deals with changes in the class definition (e.g. reordering of data members, changes in their types, etc.), and “I/O rules”, which allow for fine-tuned manual schema evolution.
Automatic schema evolution
Automatic schema evolution supports the following scenarios:
- Change in the order of data members in the class.
- Addition of a data member: the value of the missing member will be left unchanged by the I/O (so usually the value set by the default constructor).
- Removal of a data member: the corresponding data is not read.
- Move of a data member from a derived class to a base class or vice versa.
- Change of the type of a member if it is a simple type or a pointer to a simple type, including Double32_t and Float16_t. A warning is given in case of loss of precision.
- Addition or removal of a base class.
- Change of a member type from T* to T or back.
- Change of a member type from T* to unique_ptr<T> or back.
- Change of a member type from a C-style array (such as int[3]) to its std::array counterpart (such as array<int, 3>).
- Change from a variable-size array and its size (such as float *fArray; //[fSize] and int fSize) to std::vector (such as std::vector<float> fArray;).
- Change between STL collection types, from / to std::vector, std::queue, std::deque, std::list, std::forward_list, std::set, std::multiset, std::unordered_set, std::unordered_multiset, std::valarray, std::bitset.
- Change between STL associative containers, from / to std::map, std::unordered_map, std::multimap, std::unordered_multimap, std::vector<std::pair<key,value>>.
All transformations above are applied transparently, with no intervention required on the part of the user: ROOT automatically recognizes these cases and applies the relevant rules.
Example
Here is an example of the class layout changes that automatic schema evolution supports:
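As an illustration of such changes, here are two versions of a hypothetical class MyClass, placed in namespaces v1 and v2 only so that both compile side by side. In practice there would be a single class whose ClassDef version number is increased; each commented change corresponds to one of the cases listed above.

```cpp
#include <array>
#include <list>
#include <memory>
#include <vector>

namespace v1 {
struct Hit { float e = 0; };
struct MyClass {
    int fA = 0;
    float fB = 0;
    int fOld = 0;                  // removed in v2: its data on file is not read
    Hit *fHit = nullptr;           // raw pointer
    int fDims[3] = {0, 0, 0};      // C-style array
    int fSize = 0;
    float *fValues = nullptr;      //[fSize]
    std::vector<int> fIds;
};
} // namespace v1

namespace v2 {
struct Hit { float e = 0; };
struct MyClass {
    double fB = 0;                 // moved before fA, and float -> double
    int fA = 0;
    double fC = 42;                // added: keeps this default when reading v1 data
    std::unique_ptr<Hit> fHit;     // T* -> unique_ptr<T>
    std::array<int, 3> fDims{};    // C-style array -> std::array
    std::vector<float> fValues;    // variable-size array + size -> std::vector
    std::list<int> fIds;           // std::vector -> std::list
};
} // namespace v2
```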
Manual schema evolution: user-defined I/O customization rules
The automatic schema evolution described above allows reading back the serialized data object if the definition of the classes representing these objects changed in one of the supported ways. It is also possible to manually set rules for arbitrary data transformations upon reading the classes.
ROOT provides two interfaces for users to define the conversion rules. The recommended way is to add a rule to the dictionary by specifying it in the corresponding linkdef file. Alternatively, rules can be inserted into the TClass object using its C++ API.
Specifying I/O customization rules in a linkdef file
I/O customization rules can be part of the generated dictionary for a class. These rules are specified through a linkdef file. The syntax of the rules is as follows:
The arguments in the rules have the following meaning:
- sourceClass (mandatory): The name of the persisted class used as input for the rule.
- source (mandatory): A semicolon-separated list of data member declarations defining the data members of the source class that the rule needs to access.
- version: A comma-separated list of versions or version ranges of the source class, enclosed in square brackets. The rule is only applied to input classes matching any of these versions. One of checksum or version must be present. A version is an integer number, whereas a version range is one of the following:
  - a-b: all the version numbers between and including a and b.
  - -a: all the version numbers <= a.
  - a-: all the version numbers >= a.
- checksum: A comma-separated list of checksums of the source class that this rule is applied to, enclosed in square brackets. One of checksum or version must be present.
- targetClass (mandatory): The name of the in-memory class that this rule is applied to.
- target (mandatory): A comma-separated list of target class data member names that this rule potentially updates.
- embed: If true (the default), the rule is written to the output file if an object of this class is serialized.
- include: A comma-separated list of header files that need to be included for the code snippet.
- code: The C++ code snippet implementing the rule’s actions.
The C++ code snippet has access to the following pre-defined variables:
- newObj: a variable pointing to the target in-memory object.
- oldObj: a variable of type TVirtualObject, behaving as a pointer to the source object.
- variables representing the data members of the target object declared in the target property of the rule.
- onfile.variable_name: variables declared in the source property of the rule.
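A minimal sketch of such a rule in a linkdef file, assuming a hypothetical class MyClass whose version 1 stored two members int fX and int fY, and whose current version replaces them with a single member int fSum. The code snippet reads the old values through onfile and assigns the target member directly.

```cpp
// LinkDef.h (hypothetical class and member names)
#ifdef __CLING__
#pragma link C++ class MyClass+;

// Data written with version 1 of MyClass: combine the old fX and fY into fSum.
#pragma read sourceClass="MyClass" version="[1]" targetClass="MyClass" \
    source="int fX; int fY" target="fSum" \
    code="{ fSum = onfile.fX + onfile.fY; }"
#endif
```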
Specifying I/O customization rules through the C++ API
The schema evolution C++ API consists of the following classes:
- TSchemaRuleSet: objects of this type manage sets of rules and ensure their consistency: a rule set cannot contain conflicting rules. The rule sets are owned by the TClass objects corresponding to the target classes defined in the rules, and can be accessed using TClass::GetSchemaRules() and TClass::AdoptSchemaRules().
- TSchemaRule: represents a single rule; its fields have exactly the same meaning as those of rules specified in the dictionaries (see above).
Schema evolution with custom streamers
If you have written your own Streamer as described in → Custom streamers, you have to manage the evolution of your class yourself: when you add or remove data members, you must modify the Streamer by hand. ROOT assumes that you have increased the class version number in the ClassDef statement and introduced the relevant test in the read part of the Streamer. For example, if a new version of the Event class above includes a new member Int_t fNew, the ClassDef statement should be changed to ClassDef(Event,2), and the read part of the Streamer should read fNew only if the version found in the buffer is greater than 1, and set it to a default value otherwise.
If, in the same new version 2, you remove the member fH, the read part must still consume the histogram stored in data written with version 1: read it into a temporary object and delete it.
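The version test can be sketched with a toy stream that stores the class version before the members. Event, ToyStream, Write and Read are hypothetical; the single integer standing in for the old fH member replaces the histogram object that a real Streamer would read into a temporary and delete.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stream: the class version comes first, then the members.
struct ToyStream {
    std::vector<std::int32_t> data;
    std::size_t pos = 0;
    std::int32_t Take() { return data.at(pos++); }
};

// Version 2 of a hypothetical Event: fH was removed, fNew was added.
struct Event {
    static constexpr std::int32_t kVersion = 2;
    std::int32_t fNTrack = 0;
    std::int32_t fNew = 0;

    void Write(ToyStream &s) const { s.data.insert(s.data.end(), {kVersion, fNTrack, fNew}); }

    void Read(ToyStream &s) {
        const std::int32_t version = s.Take();
        fNTrack = s.Take();
        if (version < 2) {
            s.Take();     // version 1 stored fH here: consume it and discard it
            fNew = 0;     // fNew did not exist yet: use a default value
        } else {
            fNew = s.Take();
        }
    }
};
```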
Our experience with manual schema evolution shows that it is easy to make mistakes: mismatches between Streamer writers and readers are frequent, and they increase as the number of classes increases. We recommend that you use rootcling to automatically generate dictionaries for your classes and profit from automatic schema evolution.