Working with Merger-Trees¶
Arbor class is responsible for loading
and providing access to merger-tree data. Below, we demonstrate how
to load data and what can be done with it.
Loading Merger-Tree Data¶
ytree can load merger-tree data from multiple sources using
>>> import ytree >>> a = ytree.load("consistent_trees/tree_0_0_0.dat")
This command will determine the correct format and read in the data accordingly. For examples of loading each format, see below.
Working with Merger-Tree Data¶
Very little happens immediately after a dataset has been loaded. All tree construction and data access occurs only on demand. After loading, information such as the simulation box size, cosmological parameters, and the available fields can be accessed.
>>> print (a.box_size) 100.0 Mpc/h >>> print (a.hubble_constant, a.omega_matter, a.omega_lambda) 0.695 0.285 0.715 >>> print (a.field_list) ['scale', 'id', 'desc_scale', 'desc_id', 'num_prog', ...]
How many trees are there?¶
The total number of trees in the arbor can be found using the
attribute. As soon as any information about the collection of trees within the
loaded dataset is requested, arrays will be created containing the metadata
required for generating the root nodes of every tree.
>>> print (a.size) Loading tree roots: 100%|██████| 5105985/5105985 [00:00<00:00, 505656111.95it/s] 327
Field data for all tree roots is accessed by querying the
Arbor in a
>>> print (a["mass"]) Getting root fields: 100%|██████████████████| 327/327 [00:00<00:00, 9108.67it/s] [ 6.57410072e+14 5.28489209e+14 5.18129496e+14 4.88920863e+14, ..., 8.68489209e+11 8.68489209e+11 8.68489209e+11] Msun
ytree uses the unyt package for symbolic units
on NumPy arrays.
>>> print (a["virial_radius"].to("Mpc/h")) [ 1.583027 1.471894 1.462154 1.434253 1.354779 1.341322 1.28617, ..., 0.173696 0.173696 0.173696 0.173696 0.173696] Mpc/h
When dealing with cosmological simulations, care must be taken to distinguish
between comoving and proper reference frames. Please read An Important Note on Comoving and Proper Units before
ytree journey begins.
Accessing Individual Trees¶
Individual trees can be accessed by indexing the
>>> print (a) TreeNode
TreeNode is one halo in a merger-tree.
The number is the universal identifier associated with halo. It is unique
to the whole arbor. Fields can be accessed for any given
TreeNode in the same dictionary-like
>>> my_tree = a >>> print (my_tree["mass"]) 657410071942446.1 Msun
Array slicing can also be used to select multiple
>>> all_trees = a[:] >>> print (all_trees["mass"]) 657410071942446.1 Msun
Arbor object does not
TreeNode objects, it
only generates them. Thus, one must explicitly keep around any
TreeNode object for changes to persist.
This is illustrated below:
>>> # this will not work >>> a.thing = 5 >>> print (a.thing) Traceback (most recent call last): File "<stdin>", line 1, in <module> AttributeError: 'TreeNode' object has no attribute 'thing' >>> # this will work >>> my_tree = a >>> my_tree.thing = 5 >>> print (my_tree.thing) 5
The only exception to this is computing the number of nodes in a tree. This
information will be propagated back to the
Arbor as it can be expensive to compute
for large trees.
>>> my_tree = a print (my_tree.tree_size) # call function to calculate tree size 691 >>> new_tree = a print (new_tree.tree_size) # retrieved from a cache 691
Accessing the Nodes in a Tree or Forest¶
A node is defined as a single halo at a single time in a merger tree.
Throughout these docs, the words halo and node are used interchangeably.
Nodes in a given tree can be accessed in three different ways: by
Accessing All Nodes in a Tree, Accessing All Nodes in a Forest, or Accessing the Progenitor Lineage of a Tree.
Each of these will return a generator of
TreeNode objects or field
values for all
in the tree, forest, or progenitor line.
Accessing All Nodes in a Tree¶
The full lineage of the tree can be accessed by querying any
TreeNode with the tree keyword.
ytree version 3.0, this returns a generator that can be used
to loop through all nodes in the tree.
>>> print (my_tree["tree"]) <generator object TreeNode._tree_nodes at 0x11bbc1f20> >>> # loop over nodes >>> for my_node in my_tree["tree"]: ... print (my_node, my_node["mass"]) TreeNode 657410100000000.0 Msun TreeNode 657410100000000.0 Msun TreeNode 653956900000000.0 Msun TreeNode 650071960000000.0 Msun ...
To store all the nodes in a single structure, convert it to a list:
>>> print (list(my_tree["tree"])) [TreeNode, TreeNode, TreeNode, TreeNode, ... TreeNode]
Fields can be queried for the tree by including the field name.
>>> print (my_tree["tree", "virial_radius"]) [ 2277.73669065 2290.65899281 2301.43165468 2311.47625899 2313.99280576 ... 434.59856115 410.13381295 411.25755396] kpc
The above examples will work for any halo in the tree, not just the final halo. The full tree leading up to any given halo can be accessed in the same way.
>>> tree_nodes = list(my_tree["tree"]) >>> # start with the 3rd halo in the above tree >>> sub_tree = tree_nodes >>> print (list(sub_tree["tree"])) [TreeNode, TreeNode, TreeNode, TreeNode, ... TreeNode] >>> print (sub_tree["tree", "virial_radius"]) [2301.4316 2311.4763 2313.993 2331.413 2345.5454 2349.918 ... 434.59857 410.13382 411.25757] kpc
Accessing All Nodes in a Forest¶
>>> import ytree >>> a = ytree.load("consistent_trees_hdf5/soa/forest.h5", ... access="forest") >>> my_forest = a >>> # all halos in the forest >>> print (list(my_forest["forest"])) [TreeNode, TreeNode, TreeNode, ... TreeNode, TreeNode, TreeNode] >>> # all halo masses in forest >>> print (my_forest["forest", "mass"]) [3.38352524e+11 3.34071450e+11 3.34071450e+11 3.31709477e+11 ... 7.24092117e+09 4.34455270e+09] Msun
To find all of the roots in that forest, i.e., the start of all individual trees contained, one can do:
>>> my_forest = a >>> roots = [node for node in f["forest"] if node["desc_uid"] == -1] >>> print (roots) [TreeNode, TreeNode] >>> # all halos in second tree >>> print (list(roots["tree"])) [TreeNode, TreeNode, TreeNode, ... TreeNode, TreeNode, TreeNode]
Accessing a Halo’s Ancestors and Descendent¶
The direct ancestors of any
TreeNode object can be accessed
>>> my_ancestors = list(my_tree.ancestors) >>> print (my_ancestors) [TreeNode]
A halo’s descendent can be accessed in a similar fashion.
>>> print (my_ancestors.descendent) TreeNode
Accessing the Progenitor Lineage of a Tree¶
Similar to the tree keyword, the prog keyword can be used to access
the line of main progenitors. Just as above, this returns a generator
>>> print (list(my_tree["prog"])) [TreeNode, TreeNode, TreeNode, TreeNode, ... TreeNode]
Fields for the main progenitors can be accessed just like for the whole tree.
>>> print (my_tree["prog", "mass"]) [ 6.57410072e+14 6.57410072e+14 6.53956835e+14 6.50071942e+14 ... 8.29496403e+13 7.72949640e+13 6.81726619e+13 5.99280576e+13] Msun
Progenitor lists and fields can be accessed for any halo in the tree.
>>> tree_nodes = list(my_tree["tree"]) >>> # pick a random halo in the tree >>> my_halo = tree_nodes >>> print (list(my_halo["prog"])) [TreeNode, TreeNode, TreeNode, TreeNode, TreeNode, TreeNode] >>> print (my_halo["prog", "virial_radius"]) [1404.1354 1381.4087 1392.2404 1363.2145 1310.3842 1258.0159] kpc
Customizing the Progenitor Line¶
By default, the progenitor line is defined as the line of the most
massive ancestors. This can be changed by calling the
>>> a.set_selector("max_field_value", "virial_radius")
New selector functions can also be supplied. These functions should
minimally accept a list of ancestors and return a single
>>> def max_value(ancestors, field): ... vals = np.array([a[field] for a in ancestors]) ... return ancestors[np.argmax(vals)] ... >>> ytree.add_tree_node_selector("max_field_value", max_value) >>> >>> a.set_selector("max_field_value", "mass") >>> my_tree = a >>> print (list(my_tree["prog"]))
Searching for Halos¶
select_halos function can be used to
Arbor for halos matching a specific set of criteria.
This is similar to the type of selection done with a relational database.
>>> halos = a.select_halos('tree["tree", "redshift"] > 1', ... fields=["redshift"]) >>> print (halos) [TreeNode, TreeNode, TreeNode, TreeNode, ..., TreeNode, TreeNode, TreeNode]
The selection criteria string should be designed to
TreeNode object named,
fields keyword can
be used to specify a list of fields to preload for speeding up selection.
Saving Arbors and Trees¶
Arbors of any type can be saved to a universal file format with the
save_arbor function. These can be
reloaded with the
load command. This
format is optimized for fast tree-building and field-access and so is
recommended for most situations.
>>> fn = a.save_arbor() Setting up trees: 100%|███████████████████| 327/327 [00:00<00:00, 483787.45it/s] Getting fields [1/1]: 100%|████████████████| 327/327 [00:00<00:00, 36704.51it/s] Creating field arrays [1/1]: 100%|█| 613895/613895 [00:00<00:00, 7931878.47it/s] >>> a2 = ytree.load(fn)
By default, all trees and all fields will be saved, but this can be
customized with the
For convenience, individual trees can also be saved by calling
>>> my_tree = a >>> fn = my_tree.save_tree() Creating field arrays [1/1]: 100%|████| 4897/4897 [00:00<00:00, 13711286.17it/s] >>> a2 = ytree.load(fn)
An Important Note on Comoving and Proper Units¶
yt are likely familiar with conversion from proper to comoving
reference frames by adding “cm” to a unit. For example, proper “Mpc”
becomes comoving with “Mpccm”. This conversion relies on all the data
being associated with a single redshift. This is not possible here
because the dataset has values for multiple redshifts. To account for
this, the proper and comoving unit systems are set to be equal to each
>>> print (a.box_size) 100.0 Mpc/h >>> print (a.box_size.to("Mpccm/h")) 100.0 Mpccm/h
Data should be assumed to be in the reference frame in which it was saved. For length scales, this is typically the comoving frame. When in doubt, the safest unit to use for lengths is “unitary”, which a system normalized to the box size.
>>> print (a.box_size.to("unitary")) 1.0 unitary