Fields in ytree
ytree supports multiple types of fields, each representing numerical
values associated with each halo in the
Arbor. These include the
native fields stored on disk, alias fields, derived fields, and
analysis fields.
The Field Info Container
Each Arbor contains a dictionary,
called field_info,
with relevant information for each available field. This information
can include the units, type of field, any dependencies or aliases, and
things relevant to reading the data from disk.
>>> import ytree
>>> a = ytree.load("tree_0_0_0.dat")
>>> print (a.field_info["Rvir"])
{'description': 'Halo radius (kpc/h comoving).', 'units': 'kpc/h ', 'column': 11,
'aliases': ['virial_radius']}
>>> print (a.field_info["mass"])
{'type': 'alias', 'units': 'Msun', 'dependencies': ['Mvir']}
Fields on Disk
Every field stored in the dataset’s files should be available within
the Arbor. The field_list
contains a list of all fields on disk
with their native names.
>>> print (a.field_list)
['scale', 'id', 'desc_scale', 'desc_id', 'num_prog', ...]
Alias Fields
Because the various dataset formats use different naming conventions for
similar fields, ytree allows fields to be referred to by aliases. This
allows for a universal set of names for the most common fields. Many are
added by default, including “mass”, “virial_radius”, “position_<xyz>”,
and “velocity_<xyz>”. The list of available alias and derived fields
can be found in the derived_field_list.
>>> print (a.derived_field_list)
['uid', 'desc_uid', 'scale_factor', 'mass', 'virial_mass', ...]
Additional aliases can be added with
add_alias_field.
>>> a.add_alias_field("amount_of_stuff", "mass", units="kg")
>>> print (a["amount_of_stuff"])
[ 1.30720461e+45, 1.05085632e+45, 1.03025691e+45, ...
1.72691772e+42, 1.72691772e+42, 1.72691772e+42]) kg
Derived Fields
Derived fields are functions of existing fields, including other
derived and alias fields. New derived fields are created by
providing a defining function and calling
add_derived_field.
>>> def potential_field(field, data):
... # data.arbor points to the parent Arbor
... return data["mass"] / data["virial_radius"]
...
>>> a.add_derived_field("potential", potential_field, units="Msun/Mpc")
[ 2.88624262e+14 2.49542426e+14 2.46280488e+14, ...
3.47503685e+12 3.47503685e+12 3.47503685e+12] Msun/Mpc
Field functions should take two arguments. The first is a dictionary
that will contain basic information about the field, such as its name.
The second argument represents the data container for which the field
will be defined. It can be used to access field data for any other
available field. This argument will also have access to the parent
Arbor as data.arbor.
Vector Fields
For fields that have x, y, and z components, such as position, velocity,
and angular momentum, a single field can be queried to return an array
with all the components. For example, for fields named “position_x”,
“position_y”, and “position_z”, the field “position” will return the
full vector. ytree will look through all available fields and do
some reasonably robust pattern matching in an attempt to identify
common field names with x/y/z variants.
>>> import ytree
>>> a = ytree.load("AHF_100_tiny/GIZMO-NewMDCLUSTER_0047.snap_128.parameter")
>>> print (a["position"])
[[0.0440018, 0.0672202, 0.9569643],
[0.7383264, 0.1961563, 0.0238852],
[0.7042797, 0.6165487, 0.500576 ],
...
[0.1822363, 0.1324423, 0.1722414],
[0.8649974, 0.4718005, 0.7349876]]) unitary
A list of defined vector fields can be seen by doing:
>>> print (a.field_info.vector_fields)
('Ec_star', 'Ec_gas', 'velocity', 'Ea_gas', 'position', 'L_star', 'Ea_star', 'L_gas', 'Vc', 'Eb_gas', 'Ec', 'Ea', 'Eb_star', 'L', 'Eb')
>>> >>> print (a.field_info["Ea_star"]["vector_components"])
['Eax_star', 'Eay_star', 'Eaz_star']
For all vector fields, a “_magnitude” field also exists, defined as the quadrature sum of the components.
>>> print (a["velocity_magnitude"])
[ 488.26936644 121.97143067 146.81450507, ...
200.74057711 166.13782652 529.7336846 ] km/s
The vector field pattern matching will identify most instances of
three common field names that differ by just x/y/z
(case-insensitively). Any that are missed (for example, if they are
not named with “x/y/z”) can be added manually with
the add_vector_field
function and using the vector_components keyword argument to specify
the component fields.
Vector fields can be added for dimensionality not equal to three in the same manner. This can also be used to create multidimensional fields where the components have nothing to do with x/y/z, provided the component fields have the same units.
>>> a.add_vector_field("mean_z", vector_components=["mean_z_gas", "mean_z_star"])
It will also be necessary to manually add a vector field for any Derived Fields or Analysis Fields created after the arbor is loaded.
Analysis Fields
Analysis fields provide a means for saving the results of complicated
analysis for any halo in the Arbor.
This would be operations beyond derived fields, for example, things that
might require loading the original simulation snapshots. New analysis
fields are created with
add_analysis_field and are
initialized to zero, or to a different default value if one is given
with the default keyword.
>>> a.add_analysis_field("saucer_sections", units="m**2", default=0.)
>>> my_tree = a[0]
>>> print (my_tree["tree", "saucer_sections"])
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0.,] m**2
>>> import numpy as np
>>> for halo in my_tree["tree"]:
... halo["saucer_sections"] = np.random.random() # complicated analysis
...
>>> print (my_tree["tree", "saucer_sections"])
[ 0.33919263 0.79557815 0.38264336 0.53073945 0.09634924 0.6035886, ...
0.9506636 0.9094426 0.85436984 0.66779632 0.58816873] m**2
Analysis fields will be saved when the
TreeNode objects that have been
analyzed are saved with save_arbor
or save_tree.
>>> my_trees = list(a[:]) # all trees
>>> for my_tree in my_trees:
... # do analysis...
>>> a.save_arbor(trees=my_trees)
Note
Note that we do my_trees = list(a[:]) and not just my_trees =
a[:]. This is because a[:] is a generator that will return a new
set of trees each time. The newly generated trees will not retain
changes made to any analysis fields. Thus, we must use list(a[:])
to explicitly store a list of trees.
Re-saving Analysis Fields and Updating Existing Arbors
All analysis fields are saved to sidecar files with the “-analysis” keyword
appended to them. They can be altered and the arbor re-saved as many times
as you like. If you are working from a standard dataset (i.e., one
that was NOT created with
save_arbor or
save_tree), you must first
re-save it with one of the above commands for this option to become
available. It is possible to start working with analysis fields
straight away from a standard dataset, but the first time they are
saved will necessarily create an entirely new dataset. When working
from a saved dataset, the option to update analysis fields in-place
can be disabled by specifying save_in_place=False in the call to
save_arbor or
save_tree. If this option
is used, the newly created dataset will only contained the trees
provided with the trees keyword.
Updating only the Root Nodes
For large datasets, the save_arbor
operation can be expensive as all the trees to be saved must be built. However,
if you have only modified the field value of the root of a tree, the save
operation can be sped up significantly by ignoring the rest of the tree. To
only update the analysis field values for the roots of trees, specify
save_roots_only=True when calling
save_arbor. Note,
save_roots_only=True cannot be set simultaneously with
save_in_place=False.