.. _ytree_parallel: Parallel Computing with ytree ============================= ``ytree`` provides functions for iterating over trees and nodes in parallel. Underneath, they make use of the :func:`~yt.utilities.parallel_tools.parallel_analysis_interface.parallel_objects` function in ``yt``. This functionality is built on `MPI `__, so it can be used to parallelize analysis across multiple nodes of a distributed computing system. .. note:: Before reading this section, consult the :ref:`yt:parallel-computation` section of the ``yt`` documentation to learn how to configure ``yt`` for running in parallel. Enabling Parallelism and Running in Parallel -------------------------------------------- All parallel computation in ``yt`` (and hence, ``ytree``) begins by importing ``yt`` and calling the :func:`~yt.utilities.parallel_tools.parallel_analysis_interface.enable_parallelism` function. .. code-block:: python import yt yt.enable_parallelism() import ytree In all cases, scripts must be run with ``mpirun`` to work in parallel. For example, to run on 4 processors, do: .. code-block:: bash $ mpirun -np 4 python my_analysis.py where "my_analysis.py" is the name of the script. .. _parallel-performance: An Important Note on Parallel Performance ----------------------------------------- On its own, iterating over trees and nodes in parallel is fairly efficient. What complicates matters is the collection and storage of results with :ref:`analysis-fields`. This must be managed carefully and requires significant work behind the scenes. Two of the three available parallel iterators (:func:`~ytree.utilities.parallel.parallel_trees` and :func:`~ytree.utilities.parallel.parallel_nodes`) have the ability to save results to the arbor as they go. However, this operation is quite expensive. Both of these functions accept a ``collect_results`` keyword argument which can be set to False to skip results collection. If you do not need for results to be collected and/or saved within the arbor, then it is highly recommended to set ``collect_results=False`` as this will greatly improve performance. .. _parallel-iterators: Parallel Iterators ------------------ The three parallel iterator functions discussed below are designed to work in conjunction with analysis that makes use of :ref:`analysis-fields`. Minimally, they can be used to iterate over trees and nodes in parallel, but their main advantage is that they will handle the gathering and organization of new analysis field values so they can be properly saved and reloaded. The most efficient function to use will depend on the nature of your analysis. In the examples below, the "analysis" performed will be facilitated through an :ref:`analysis field `, called "test_field". The following should be assumed to happen before all the examples. .. code-block:: python import yt yt.enable_parallelism() import ytree a = ytree.load("arbor/arbor.h5") if "test_field" not in a.field_list: a.add_analysis_field("test_field", default=-1, units="Msun") .. _tree_parallel: Parallelizing over Trees ^^^^^^^^^^^^^^^^^^^^^^^^ The :func:`~ytree.utilities.parallel.parallel_trees` function will distribute a list of trees to be analyzed over all available processors. Each processor will work on a single tree in serial. .. code-block:: python trees = list(a[:]) for tree in ytree.parallel_trees(trees): for node in tree["forest"]: node["test_field"] = 2 * node["mass"] # this is our analysis At the end of the outer loop, the new values for "test_field" will be collected on the root process (i.e., the process with rank 0) and the arbor will be saved with :func:`~ytree.data_structures.save_arbor.save_arbor`. No additional code is required for the new analysis field values to be collected. By default, each processor will be allocated an equal number of trees. However, this can lead to an unbalanced load if the amount of work varies significantly for each tree. By including the ``dynamic=True`` keyword, trees will be allocated using a task queue, where each processor is only given another tree after it finishes one. Note, though, that the total number of working processes is one fewer than the number being run with as one will act as the server for the task queue. .. code-block:: python trees = list(a[:]) for tree in ytree.parallel_trees(trees, dynamic=True): for node in tree["forest"]: node["test_field"] = 2 * node["mass"] # this is our analysis For various reasons, it may be useful to save results after a certain number of loop iterations rather than only once at the very end. The analysis may take a long time, requiring scripts to be restarted, or keeping results for many trees in memory may be prohibitive. The ``save_every`` keyword can be used to specify a number of iterations before results are saved. The example below will save results every 8 iterations. .. code-block:: python trees = list(a[:]) for tree in ytree.parallel_trees(trees, save_every=8): for node in tree["forest"]: node["test_field"] = 2 * node["mass"] # this is our analysis The default behavior will allocate a tree to a single processor. To allocate more than one processor to each tree, the ``njobs`` keyword can be used to set the total number of process groups for the loop. For example, if running with 8 total processors, setting ``njobs=4`` will create 4 groups of 2 processors each. .. code-block:: python trees = list(a[:]) for tree in ytree.parallel_trees(trees, njobs=4): for node in tree["forest"]: if yt.is_root(): node["test_field"] = 2 * node["mass"] # this is our analysis The :func:`~yt.funcs.is_root` function can be used to determine which process is the root in a group. Only the results recorded by the root process will be collected. In the example above, it is up to the user to properly manage the parallelism within the loop. Before using this function, it is a good idea to read :ref:`parallel-performance`. .. _tree_node_parallel: Parallelizing over Nodes in a Single Tree ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The method presented above in :ref:`tree_parallel` works best when the work done on each node in a tree is small compared to the total number of trees. If the opposite is true, and either the total number of trees is small or the work done on each node is expensive, then it may be better to parallelize over the nodes in a single tree using the :func:`~ytree.utilities.parallel.parallel_tree_nodes` function. The previous example is parallelized over nodes in a tree in the following way. .. code-block:: python trees = list(a[:]) for tree in trees: for node in ytree.parallel_tree_nodes(tree): node["test_field"] = 2 * node["mass"] if yt.is_root(): a.save_arbor(trees=trees) Unlike the :func:`~ytree.utilities.parallel.parallel_trees` and :func:`~ytree.utilities.parallel.parallel_nodes` functions, no saving occurs automatically. Hence, the results must be saved manually, as in the above example. The ``group`` keyword can be set to ``forest`` (the default), ``tree``, or ``prog`` to control which nodes of the tree are looped over. A list of specific nodes can also be provided with the ``nodes`` keyword, in which case the ``group`` keyword will be ignored. Note, if the ``nodes`` keyword is used, all nodes must belong to the tree associated with the node provided in the first argument. The ``dynamic`` and ``njobs`` keywords have similar behavior as in :ref:`tree_parallel`. .. _node_parallel: Parallelizing over Nodes in a List of Trees ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The previous two examples use a nested loop structure, parallelizing either the outer loop over trees or the inner loop over nodes in a given tree. The :func:`~ytree.utilities.parallel.parallel_nodes` function combines these into a single iterator capable of adding parallelism to either the loop over trees, nodes in a tree, or both. With this function, the same example from above becomes: .. code-block:: python trees = list(a[:]) for node in ytree.parallel_nodes(trees): node["test_field"] = 2 * node["mass"] New analysis field values are collected and saved automatically as with the :func:`~ytree.utilities.parallel.parallel_trees` function. Similar to :func:`~ytree.utilities.parallel.parallel_trees`, the ``save_every`` keyword can be used to control the number of full trees to be completed before saving results. As well, the ``group`` keyword can be used to control the nodes iterated over in a tree, similar to how it works in :func:`~ytree.utilities.parallel.parallel_tree_nodes`. You will likely be unsurprised to learn that the :func:`~ytree.utilities.parallel.parallel_nodes` function is implemented using nested calls to :func:`~ytree.utilities.parallel.parallel_trees` and :func:`~ytree.utilities.parallel.parallel_tree_nodes`. The ``dynamic`` and ``njobs`` keywords also work similarly, only that they must be specified as tuples of length 2, where the first values control the loop over trees and the second values control the loop over nodes in a tree. Using this, it is possible to enable task queues for both loops (trees and nodes), as in the following example. .. code-block:: python trees = list(a[:]) for node in ytree.parallel_nodes(trees, save_every=8, njobs=(3, 0), dynamic=(True, True)): node["test_field"] = 2 * node["mass"] If the above example is run with 13 processors, the result will be a task queue with 3 process groups of 4 processors each. Each of those process groups will work on a single tree using its own task queue, consisting of 1 server process and 3 worker processes. What a world we live in. Before using this function, it is a good idea to read :ref:`parallel-performance`.