Using Parallel Maya

Overview

This guide describes the Maya features for accelerating playback and manipulation of animated scenes. It covers key concepts, shares best practices/usage tips, and lists known limitations that we will aim to address in subsequent versions of Maya.

This guide will be of interest to riggers, TDs, and plug-in authors wishing to take advantage of speed enhancements in Maya.

If you would like an overview of related topics prior to reading this document, check out Supercharged Animation Performance in Maya 2016.

Key Concepts

Starting from Maya 2016, Maya accelerates existing scenes by taking better advantage of your hardware. Unlike previous versions of Maya, which were limited to parallelizing individual nodes, Maya now includes a mechanism for scene-level graph analysis and parallelization. For example, if your scene contains different characters that are not constrained to one another, Maya recognizes this and evaluates each character at the same time.

Similarly, if your scene has a single complex character, it may be possible to evaluate sub-sections of the rig simultaneously. As you can imagine, the amount of parallelism depends on how your scene has been constructed. We will get back to this later. For now, let’s focus on understanding key Maya evaluation concepts.

At the heart of Maya’s new evaluation architecture is an Evaluation Manager (EM), responsible for creating a parallel-friendly description of your scene, called the Evaluation Graph (EG). The EM schedules EG nodes across available compute resources.

Prior to evaluating your scene, the EM checks whether a valid EG exists. The EG is a simplified version of the Dependency Graph (DG), consisting of DG nodes and connections. A connection in the EG represents a dependency: a destination node needs data from its source node(s) before it can be evaluated. A valid EG may not exist for various reasons. For example, you may have loaded a new scene and no EG may have been built yet, or you may have changed your scene, invalidating a prior EG.

Maya uses the DG’s dirty propagation mechanism to build the EG. Dirty propagation is the process of walking through the DG, from animation curves to renderable objects, and marking the attributes on DG nodes as needing to be re-evaluated (i.e., dirty). Unlike previous versions of Maya that propagated dirty on every frame, Maya now disables dirty propagation once the EG is built, and reuses the existing EG until it becomes invalid.
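As a rough illustration of this idea, the sketch below walks a toy dependency graph downstream from its animation curves and collects every node it reaches. The graph layout, the node names, and the build_evaluation_graph helper are all hypothetical, and the sketch works on whole nodes, whereas Maya's dirty propagation marks individual attributes.

```python
from collections import deque

def build_evaluation_graph(dg_connections, animation_curves):
    """Collect the nodes reachable downstream of the given animation curves.

    dg_connections maps a source node name to the nodes it feeds.
    Returns the set of visited ("dirtied") nodes and the connections
    between them, a node-level stand-in for the EG.
    """
    dirty = set(animation_curves)
    queue = deque(animation_curves)
    while queue:
        source = queue.popleft()
        for destination in dg_connections.get(source, ()):
            if destination not in dirty:
                dirty.add(destination)
                queue.append(destination)
    eg_edges = {
        (src, dst)
        for src in dirty
        for dst in dg_connections.get(src, ())
    }
    return dirty, eg_edges

# Hypothetical toy scene: one animated arm and one static prop.
dg = {
    "armCurve": ["armCtrl"],
    "armCtrl": ["armJoint"],
    "armJoint": ["bodyMesh"],
    "staticProp": ["propMesh"],
}
eg_nodes, eg_edges = build_evaluation_graph(dg, ["armCurve"])
```

In this toy scene the static prop never receives a dirty message, so it stays out of the resulting graph. This mirrors the statement later in this guide that only animated nodes are included in the EG by default.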

With dirty propagation disabled, computing your scene at a given frame involves walking the EG, scheduling, and evaluating EG nodes. Because the EG encodes node-level dependencies, when evaluating a given EG node, you know that all inputs coming from the nodes it depends on have already been calculated. This further enables pipelining of some operations. Specifically, when we find EG nodes without dependents, we can initiate additional processing (e.g., rendering) since we are guaranteed that no downstream nodes will require computed results.
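The scheduling idea can be sketched in plain Python as a dependency-count walk. This is a simplified illustration, not Maya's scheduler: it groups nodes into "waves" whose inputs are all ready, while a real scheduler would dispatch each node as soon as its own inputs finish. The node names and both helper functions are hypothetical.

```python
def evaluation_waves(edges, nodes):
    """Group nodes into waves; every node in a wave has all inputs ready.

    edges is an iterable of (upstream, downstream) pairs.
    """
    pending_inputs = {node: 0 for node in nodes}
    downstream = {node: [] for node in nodes}
    for src, dst in edges:
        pending_inputs[dst] += 1
        downstream[src].append(dst)

    # Nodes with no inputs can start immediately.
    wave = sorted(n for n, count in pending_inputs.items() if count == 0)
    waves = []
    while wave:
        waves.append(wave)
        next_wave = []
        for node in wave:
            for dst in downstream[node]:
                pending_inputs[dst] -= 1
                if pending_inputs[dst] == 0:
                    next_wave.append(dst)
        wave = sorted(next_wave)
    return waves

def leaf_nodes(edges, nodes):
    """Nodes without dependents, where follow-up work such as drawing can start."""
    sources = {src for src, _ in edges}
    return sorted(n for n in nodes if n not in sources)

# Hypothetical graph: two animated controllers driving one mesh.
nodes = ["curveA", "curveB", "ctrlA", "ctrlB", "mesh"]
edges = [("curveA", "ctrlA"), ("curveB", "ctrlB"),
         ("ctrlA", "mesh"), ("ctrlB", "mesh")]
waves = evaluation_waves(edges, nodes)
```

Here the two curves form the first wave, the two controllers the second, and the mesh the third. The mesh is also the only node without dependents, so it is the point where additional processing could begin.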

Tip. If your scene contains expression nodes that use the getAttr command, the DG will be missing explicit dependencies, which will result in an incomplete EG. In addition to impacting correctness, expression nodes will also reduce the amount of parallelism in your scenes (see Scheduling Types for details).

Depending on how you have built your scene, the EG may contain circular node-level dependencies. If this is the case, the EM creates node clusters. At scene evaluation time, nodes in clusters are evaluated serially before continuing with other parallel parts of the EG. Multiple clusters may be evaluated at the same time. As with previous versions of Maya, you should avoid building scenes with attribute-level cycles as this is unsupported, and leads to unspecified behavior.
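One way to picture cluster creation is as a search for groups of nodes that can all reach one another. The sketch below uses Tarjan's strongly connected components algorithm as an illustrative stand-in; this guide does not say which algorithm the EM uses, and the graph and the find_cycle_clusters helper are hypothetical.

```python
def find_cycle_clusters(graph):
    """Return groups of nodes that form cycles (strongly connected
    components with more than one node), using Tarjan's algorithm."""
    index_of = {}
    low_link = {}
    on_stack = set()
    stack = []
    clusters = []
    counter = [0]

    def visit(node):
        index_of[node] = low_link[node] = counter[0]
        counter[0] += 1
        stack.append(node)
        on_stack.add(node)
        for successor in graph.get(node, ()):
            if successor not in index_of:
                visit(successor)
                low_link[node] = min(low_link[node], low_link[successor])
            elif successor in on_stack:
                low_link[node] = min(low_link[node], index_of[successor])
        # A node whose low link equals its own index roots a component.
        if low_link[node] == index_of[node]:
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            if len(component) > 1:
                clusters.append(sorted(component))

    for node in graph:
        if node not in index_of:
            visit(node)
    return clusters

# Hypothetical rig: chest feeds back into hip, forming a node-level cycle.
graph = {
    "hip": ["spine"],
    "spine": ["chest"],
    "chest": ["hip", "head"],
    "head": [],
    "prop": [],
}
clusters = find_cycle_clusters(graph)
```

In this example hip, spine, and chest end up in one cluster that would be evaluated serially, while head and prop remain free to be scheduled in parallel with other work.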

By default, the EM schedules node evaluation on available CPU resources. However, the EM also provides the ability to override evaluation for sub-sections of the EG, targeting computation to specific runtimes and/or hardware. One example of this is the GPU override feature included in Maya, which uses your graphics card’s graphics processing unit (GPU) to accelerate deformations.

When manipulating your rig, you may notice that performance improves once you have added at least 2 different keys on a controller. By default, only animated nodes are included in the EG. This limit helps keep the EG compact, making it fast to build, schedule, and evaluate. Hence, if you are manipulating a controller that has not yet been keyed, Maya relies on legacy DG evaluation. When 2 or more different keys are added, the EG rebuilds to include the newly-keyed nodes, permitting Parallel evaluation via the EM.

Tip. You can use the controller command to identify objects that will be used as controllers (and therefore animation sources) in your scene. If the Include controllers in evaluation graph option is set (see Windows > Settings/Preferences > Preferences, then Settings > Animation), the objects marked as controllers will automatically be added to the evaluation graph even if they are not animated yet. This will prevent the EG from being rebuilt when these objects are animated and will allow Parallel evaluation for manipulation even if they have not been keyed yet.

Supported Evaluation Modes

Maya starts in Parallel evaluation mode by default. This new evaluation mode replaces the legacy DG-based evaluation. Maya supports 3 evaluation modes:

Mode	What does it do?
DG	Uses the legacy Dependency Graph-based evaluation of your scene. This was the default evaluation mode prior to Maya 2016.
Serial	Evaluation Manager Serial mode. Uses the EG but limits scheduling to a single core. Serial mode is a troubleshooting mode to pinpoint the source of evaluation errors.
Parallel	Evaluation Manager Parallel mode. Uses the EG and schedules evaluation across all available cores. This mode is the new Maya 2016 default.

When using either Serial or Parallel EM modes, you can also activate GPU Override to accelerate deformations on your GPU. You must be in Viewport 2.0 to use this feature (see Custom Evaluators).

To switch between different modes, go to the Preferences window (Windows > Settings/Preferences > Preferences > Animation). You can also use the evaluationManager MEL/Python command; see documentation for supported options.
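A minimal sketch of switching modes from Python follows. It assumes the command accepts a mode flag with the values "off", "serial", and "parallel"; only "off" appears elsewhere in this guide, so check the other two against the evaluationManager documentation for your Maya version. The set_evaluation_mode helper is hypothetical, and it takes the command module as a parameter so that it can also be exercised outside a Maya session.

```python
# Assumed mode names; "off" corresponds to legacy DG evaluation.
VALID_MODES = ("off", "serial", "parallel")

def set_evaluation_mode(cmds_module, mode):
    """Switch the evaluation mode through the given maya.cmds-like module."""
    if mode not in VALID_MODES:
        raise ValueError("unknown evaluation mode: %r" % (mode,))
    cmds_module.evaluationManager(mode=mode)
    return mode

try:
    import maya.cmds as cmds  # only available inside a Maya session
except ImportError:
    cmds = None

if cmds is not None:
    # Drop to Serial while troubleshooting evaluation differences.
    set_evaluation_mode(cmds, "serial")
```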

To see the evaluation options that apply to your scene, turn on the Heads Up Display Evaluation options (Display > Heads Up Display > Evaluation).

First Make it Right Then Make it Fast

Before focusing on understanding how to make your scene fast in Maya using Parallel evaluation, it is important to ensure that evaluation in DG and EM modes generates the same results.

If you observe evaluation errors when you start Maya (that is, what you see in the viewport differs from previous versions of Maya), determine the source of these errors. Errors may be due to an incorrect EG, threading-related problems, or other issues. In the sections that follow, we will review 2 important concepts related to errors: Evaluation Graph Correctness and Thread Safety.

Evaluation Graph Correctness

In the event that you see evaluation errors, first try to test your scene in Serial evaluation mode (see Supported Evaluation Modes). Serial evaluation mode uses the EM to build an EG of your scene, but limits evaluation to a single core to eliminate threading as the possible source of differences. Note that since Serial evaluation mode is provided for debugging, it has not been optimized for speed and scenes may run slower in Serial than in DG evaluation mode. This is expected.

If transitioning to Serial evaluation eliminates evaluation errors, this indicates that the errors in your scene are likely due to a threading-related problem. However, if errors persist even after transitioning to Serial evaluation this indicates that the EM is building an incorrect EG for your scene. There are a few possible reasons for this:

Custom Plug-ins. If your scene uses custom plug-ins that rely on the MPxNode::setDependentsDirty function to manage attribute dirtying, this may be the source of problems. Plug-in authors sometimes use MPxNode::setDependentsDirty to avoid expensive calculations every time MPxNode::compute is called. With this approach, results from previous evaluations are typically cached and MPxNode::setDependentsDirty is used to trigger re-computation.

Since the EM relies on dirty propagation to create the EG, any custom plug-in logic that alters dependencies may interfere with the construction of a correct EG. Furthermore, since the EM evaluation does not propagate dirty messages, any custom caching or computation in MPxNode::setDependentsDirty is not called while the EM is evaluating.

If you suspect that your evaluation errors are related to custom plug-ins, temporarily remove the associated nodes from your scene and validate that both DG and Serial evaluation modes generate the same result. Once you have made sure this is the case, you will need to revisit the plug-in logic. The API Extensions section covers Maya 2016 SDK changes that will help you adapt plug-ins to Parallel evaluation.

Another debugging option is to use scheduling type overrides to force your custom nodes to be scheduled more conservatively, that is, more safely but with less parallelism. This approach can enable the use of Parallel evaluation even if some of the nodes are not thread-safe. Scheduling types are described in more detail in the Thread Safety section.

Errors in Autodesk Nodes. Although we have done our best to ensure that all out-of-the-box Autodesk Maya nodes correctly express dependencies, sometimes a scene uses nodes in an unexpected manner. If this is the case, we ask that you make us aware of scenes where you encounter problems. We will do our best to address problems as quickly as possible.

Thread Safety

Prior to Maya 2016, evaluation was single-threaded and developers did not need to worry about making their code thread-safe. At each frame, they were guaranteed that evaluation would proceed serially and computation would finish for one node prior to moving onto another. This approach allowed for the caching of intermediate results in global memory and using external libraries without considering their ability to work correctly when called simultaneously from multiple threads.

These guarantees no longer apply for Parallel Maya. Developers now working in Maya must update plug-ins to ensure correct behavior during multi-core evaluation.

Two things to consider when updating plug-ins:

Different instances of a node type should not share resources. Correct evaluation requires this because two different nodes of the same type could have their compute() methods called concurrently.
Avoid non-thread-safe lazy evaluation. In the EM, evaluation is scheduled from predecessors to successors on a per-node basis. Once computation has been performed for predecessors, results are cached and made available to successors via connections. Any attempt to perform non-thread-safe lazy evaluation could return different answers to different successors or, depending on the nature of the bug, cause instabilities.

Here’s a concrete example for a simple node network consisting of 4 nodes:

In this graph, evaluation first calculates outputs for Node1 in serial (i.e., Node1.A, Node1.B, Node1.C), followed by parallel evaluation of Nodes 2, 3, and 4 (that is, Read Node1.A to use in Node2, Read Node1.B to use in Node3, etc.).
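The same evaluation order can be mimicked outside Maya with a thread pool. The sketch below is plain Python 3, not Maya API code, and the compute functions and their arithmetic are made up for illustration. The point is that Node1's outputs are fully computed and stored before any successor starts, and that each successor only reads its own input without touching shared mutable state.

```python
from concurrent.futures import ThreadPoolExecutor

def compute_node1(time):
    """Compute all Node1 outputs once, before any successor runs."""
    return {"A": time * 1.0, "B": time * 2.0, "C": time * 3.0}

def compute_successor(name, upstream_value):
    """Successors only read the cached upstream value; they share no state."""
    return name, upstream_value + 1.0

def evaluate_frame(time):
    # Serial step: Node1.A, Node1.B and Node1.C are computed up front.
    node1_outputs = compute_node1(time)
    jobs = [("Node2", "A"), ("Node3", "B"), ("Node4", "C")]
    # Parallel step: Nodes 2, 3 and 4 each read one cached output.
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [
            pool.submit(compute_successor, name, node1_outputs[plug])
            for name, plug in jobs
        ]
        return dict(future.result() for future in futures)

results = evaluate_frame(10.0)
```

If compute_node1 instead filled in its outputs lazily when a successor first asked for them, the three worker threads could race on that shared cache, which is the kind of lazy evaluation the guidelines above warn against.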

Since we know that making legacy code thread-safe requires time, we have added new scheduling types to instruct the EM how to schedule nodes. Scheduling types provide a straightforward migration path, so you do not need to pass up parallelizing opportunities for some parts of your scenes just because a few nodes still need work.

There are 4 scheduling types:

Scheduling Type	What are you telling the scheduler?
Parallel	Asserts that the node and all third-party libraries used by the node are thread-safe. The scheduler may evaluate any instances of this node at the same time as instances of other nodes without restriction.
Serial	Asserts it is safe to run this node with instances of other nodes. However, all nodes with this scheduling type should be executed sequentially within the same evaluation chain.
Globally Serial	Asserts it is safe to run this node with instances of other nodes but only a single instance of this node should be run at a time. Use this type if the node relies on static state, which could lead to unpredictable results if multiple node instances are simultaneously evaluated. The same restriction may apply if third-party libraries store state.
Untrusted	Asserts this node is not thread-safe and that no other nodes should be evaluated while an instance of this node is evaluated. Untrusted nodes are deferred as much as possible (i.e. until there is nothing left to evaluate that does not depend on them), which can introduce costly synchronization.

By default, nodes are scheduled as Serial, which provides a middle ground between performance and stability/safety. In some cases, this is too permissive and nodes must be downgraded to GloballySerial or Untrusted. In other cases, some nodes can be promoted to Parallel. As you can imagine, the more parallelism supported by nodes in your graph, the higher level of concurrency you are likely to obtain.
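The restrictions in the scheduling type table can be read as a rule for whether two node instances may be evaluated at the same time. The sketch below encodes one simplified reading of that table. The Node tuple, the chain label used to stand for "the same evaluation chain", and the may_run_together helper are all hypothetical, and the real scheduler is more involved than a pairwise check.

```python
from collections import namedtuple

PARALLEL, SERIAL, GLOBALLY_SERIAL, UNTRUSTED = (
    "Parallel", "Serial", "GloballySerial", "Untrusted")

# chain is a made-up label standing for "the same evaluation chain".
Node = namedtuple("Node", ["name", "node_type", "scheduling", "chain"])

def may_run_together(a, b):
    """Simplified reading of the scheduling type table for two instances."""
    if UNTRUSTED in (a.scheduling, b.scheduling):
        return False  # nothing else runs alongside an Untrusted node
    if (a.scheduling == b.scheduling == GLOBALLY_SERIAL
            and a.node_type == b.node_type):
        return False  # only one instance of this node type at a time
    if a.scheduling == b.scheduling == SERIAL and a.chain == b.chain:
        return False  # Serial nodes in one chain run sequentially
    return True

arm_ctrl = Node("armCtrl", "transform", SERIAL, "arm")
arm_joint = Node("armJoint", "transform", SERIAL, "arm")
leg_ctrl = Node("legCtrl", "transform", SERIAL, "leg")
expr = Node("expr1", "expression", UNTRUSTED, "arm")

same_chain = may_run_together(arm_ctrl, arm_joint)   # False
other_chain = may_run_together(arm_ctrl, leg_ctrl)   # True
with_untrusted = may_run_together(leg_ctrl, expr)    # False
```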

When testing your plug-ins with parallel Maya, a simple strategy is to schedule nodes with the most restrictive scheduling type (i.e., Untrusted), and then validate that the evaluation produces correct results. Raise individual nodes to the next scheduling level, and repeat the experiment.

You can also alter scheduling behavior dynamically at runtime. For example, Maya currently defaults to scheduling expression nodes as untrusted, since it is unclear ahead of time what actions an expression will perform. However, if Maya detects an expression node that is limited to arithmetic and has outputs that are purely a function of inputs, we can safely promote scheduling of that expression to GloballySerial. We cannot schedule expressions as Parallel since the Maya command interpreter is not thread-safe, because it must store state in order to provide useful logging and error reporting.

There are two ways to alter the scheduling level of your nodes:

MEL/Python Commands. Use the evaluationManager command to change the scheduling type of nodes at runtime. Below, we illustrate how you can change the scheduling of scene transform nodes:

Scheduling Type	Command
Parallel	`evaluationManager -nodeTypeParallel on "transform";`
Serial	`evaluationManager -nodeTypeSerialize on "transform";`
GloballySerial	`evaluationManager -nodeTypeGloballySerialize on "transform";`
Untrusted	`evaluationManager -nodeTypeUntrusted on "transform";`

C++/Python API methods. You can also schedule individual nodes at compile time by overriding the MPxNode::schedulingType function. Functions should return one of the enumerated values specified by MPxNode::SchedulingType. See the Maya 2016 MPxNode class reference for more details.
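The evaluationManager flags from the table above fit naturally with the incremental testing strategy described earlier: start at Untrusted and promote one level at a time. The sketch below only builds the MEL command strings. The helper names and the "myCustomNode" node type are hypothetical, and inside Maya each string could be run with maya.mel.eval.

```python
# Flags copied from the table above, ordered from most to least restrictive.
SCHEDULING_FLAGS = [
    ("Untrusted", "-nodeTypeUntrusted"),
    ("GloballySerial", "-nodeTypeGloballySerialize"),
    ("Serial", "-nodeTypeSerialize"),
    ("Parallel", "-nodeTypeParallel"),
]

def scheduling_command(scheduling_type, node_type):
    """Build the MEL command that sets one scheduling type on a node type."""
    flag = dict(SCHEDULING_FLAGS)[scheduling_type]
    return 'evaluationManager {0} on "{1}";'.format(flag, node_type)

def promotion_plan(node_type):
    """Commands to try in order, validating results after each step."""
    return [scheduling_command(name, node_type)
            for name, _ in SCHEDULING_FLAGS]

# "myCustomNode" is a placeholder for your plug-in's node type name.
plan = promotion_plan("myCustomNode")
```

Stop promoting at the last level for which evaluation still matches the DG result, and keep the node at that scheduling type until the plug-in has been made thread-safe.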

Safe Mode

On rare occasions you may notice that during manipulation or playback, Maya switches from Parallel to Serial evaluation. This is due to Safe Mode, which is an attempt to trap errors that lead to instabilities such as crashes. If Maya detects that multiple threads are attempting to access a single node instance at the same time, the evaluation is forced to Serial execution to prevent problems.

While Safe Mode catches many problems, it cannot catch them all. Therefore, we have also developed a special Analysis Mode that performs a more thorough and costly check of your scene. Analysis mode is designed for riggers and TDs who wish to troubleshoot evaluation problems when creating new rigs. Avoid using Analysis Mode during animation since it will slow down your scene. See Analysis Mode for details.

Tip. If Safe Mode forces your scene into Serial mode, the EM may not produce the expected results when manipulating. In such cases you can either disable the EM:

evaluationManager -mode "off";

or disable EM-accelerated manipulation:

evaluationManager -man 0;

Custom Evaluators

Once the EG has been created, Maya can target the evaluation of node sub-graphs to custom evaluators. In this section, we will review how we have used custom evaluators to accelerate deformations and catch evaluation errors on specific scenes. Currently you cannot author new custom evaluators, but in the future, we may extend OpenMaya to support such extensions.

Tip. Use the evaluator command to query the available/active evaluators or modify currently active evaluators.

import maya.cmds as cmds

# Returns a list of all currently available evaluators.
cmds.evaluator(query=True)
# Result: [u'dynamics',
#          u'ikSystem',
#          u'disabling',
#          u'deformer',
#          u'transformFlattening',
#          u'reference',
#          u'pruneRoots'] #

# Returns a list of all currently enabled evaluators.
cmds.evaluator(query=True, enable=True)
# Result: [u'dynamics',
#          u'ikSystem',
#          u'deformer',
#          u'transformFlattening',
#          u'reference',
#          u'pruneRoots'] #

GPU Override

Maya contains a custom deformer evaluator that targets mesh deformations on the GPU using OpenCL to accelerate deformations in Viewport 2.0. The highly parallel nature of modern GPUs makes them ideal to tackle problems such as deformations that must perform the same operations on streams of data, such as mesh vertices and normals. We have included GPU implementations for 6 of the most commonly-used deformers in animated scenes: skinCluster, blendShape, cluster, tweak, groupParts, and softMod.

Unlike Maya’s previous deformer stack that performed deformations on the CPU and subsequently sent deformed geometry to the graphics card for rendering, the GPU override sends undeformed geometry to the graphics card, performs deformations in OpenCL and hands off the data to Viewport 2.0 for rendering without read-back overhead. We have observed substantial speed improvements from this approach in scenes with dense geometry.

Even if your scene uses only supported deformers, GPU override may not be enabled due to unsupported node features. For example, with the exception of softMod, deformers must currently apply to all vertices; there is no support for incomplete group components. Additional deformer-specific limitations are listed below:

Deformer	Limitation(s)
skinCluster	The following attribute values will be ignored:
	- bindMethod
	- bindPose
	- bindVolume
	- dropOff
	- heatmapFalloff
	- influenceColor
	- lockWeights
	- maintainMaxInfluences
	- maxInfluences
	- nurbsSamples
	- paintTrans
	- smoothness
	- weightDistribution
blendShape	The following attribute values will be ignored:
	- baseOrigin
	- icon
	- normalizationId
	- origin
	- parallelBlender
	- supportNegativeWeights
	- targetOrigin
	- topologyCheck
cluster	n/a
tweak	Only relative mode is supported. relativeTweak must be set to 1.
groupParts	n/a
softMod	Only volume falloff is supported when distance cache is disabled
	Falloff must occur on all axes
	Partial resolution must be disabled

A few other reasons that can prevent GPU override from accelerating your scene:

Meshes not sufficiently dense. Unless meshes have a large number of vertices, it is still faster to perform deformations on the legacy CPU path. This is due to driver-specific overhead incurred when sending data to the GPU for processing. For deformations to happen on the GPU, your mesh needs more than 500 vertices on AMD hardware or more than 2000 vertices on NVIDIA hardware. Use the MAYA_OPENCL_DEFORMER_MIN_VERTS environment variable to change the threshold. Setting the value to 0 causes all meshes connected to supported deformation chains to be processed on the GPU.
Downstream nodes in your graph read deformed mesh results. No node, script, or viewport can read the mesh data computed by the GPU override. This means that GPU override is unable to accelerate portions of the deformation chain upstream of nodes that require information about the deformed mesh, such as follicle or pointOnPolyConstraint. GPU data read-back is a known bottleneck in the area of GPGPU. Evolving standards and hardware aim to remove some of these inefficiencies. We will re-examine this limitation as software/hardware capabilities mature. When diagnosing GPU Override problems, this situation may be reported as an unsupported fan-out pattern. See deformerEvaluator command, below, for details.
Meshes have animated topology changes. If your scene animates the number of mesh edges, vertices, and/or faces during playback, corresponding deformation chains are removed from the GPU deformation path.
Maya Catmull-Clark Smooth Mesh Preview is used. We have included acceleration for OpenSubDiv (OSD)-based smooth mesh preview but there is currently no support for Maya’s legacy Catmull-Clark. To take advantage of OSD OpenCL acceleration, select OpenSubDiv Catmull-Clark as the subdivision method and make sure that OpenCL Acceleration is selected in the OpenSubDiv controls.
Unsupported streams are found. Depending on the drawing mode you select for your geometry (for example, shrunken faces, hedge-hog normals, and so on) and the material assigned to your geometry, Maya sends different information to the graphics card. This requirement means that Maya must allocate different streams. Since we have focused our efforts on the most common settings used in production, Maya does not currently handle all streams. If you determine that your meshes are failing to accelerate due to unsupported streams, change display modes and/or update the material used by the geometry.
Back face culling is enabled.
Driver-related issues. We are aware of various hardware issues related to driver support/stability for OpenCL. To maximize Maya’s stability, we have disabled GPU Override in the specific cases that will lead to problems. We expect to continue to eliminate restrictions in the future and are actively working with hardware vendors to address detected driver problems.
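The mesh density rule from the first item in this list can be sketched as a small check. This is an illustration only: the helper names are hypothetical, the strict "more than" comparison is an assumption based on the wording above, and how Maya actually parses MAYA_OPENCL_DEFORMER_MIN_VERTS is not described in this guide.

```python
import os

# Defaults quoted in the text: more than 500 (AMD) or 2000 (NVIDIA) vertices.
DEFAULT_MIN_VERTS = {"AMD": 500, "NVIDIA": 2000}

def min_vertex_threshold(vendor, environ=None):
    """Return the vertex threshold, honoring the environment override."""
    environ = os.environ if environ is None else environ
    override = environ.get("MAYA_OPENCL_DEFORMER_MIN_VERTS")
    if override is not None:
        return int(override)  # assumed to be a plain integer string
    return DEFAULT_MIN_VERTS[vendor]

def dense_enough_for_gpu(vertex_count, vendor, environ=None):
    """True if the mesh passes the density test; other limitations still apply."""
    threshold = min_vertex_threshold(vendor, environ)
    # A threshold of 0 lets every mesh through.
    return threshold == 0 or vertex_count > threshold

on_amd = dense_enough_for_gpu(1000, "AMD", environ={})
on_nvidia = dense_enough_for_gpu(1000, "NVIDIA", environ={})
forced = dense_enough_for_gpu(
    1000, "NVIDIA", environ={"MAYA_OPENCL_DEFORMER_MIN_VERTS": "0"})
```

With the defaults, a 1000-vertex mesh passes the test on AMD hardware but not on NVIDIA hardware, and setting the variable to 0 removes the density restriction altogether.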

You can also increase support for new custom/proprietary deformers using new API extensions (refer to Custom GPU Deformers for details).

If you have enabled GPU Override and the HUD reports Enabled (0 k), this indicates that no deformations are happening on the GPU. There could be a number of reasons for this, such as those mentioned above.

To troubleshoot factors limiting use of GPU override for your particular scene, use the deformerEvaluator command. Supported options include:

Command	What does it do?
`deformerEvaluator;`	Prints the chain or a reason it is not supported for each selected node.
`deformerEvaluator -chains;`	Prints all active deformation chains.
`deformerEvaluator -meshes;`	Prints a chain for each mesh or a reason if it is not supported.

Dynamics Evaluator

Parallel evaluation in Maya 2016 only had limited support for animated dynamics. Although scenes with Bullet rigid bodies and Bifrost fluids evaluated correctly, legacy dynamics nodes (particles, fluids) and Nucleus nodes (nCloth, nHair, nParticles) disabled the Evaluation Manager, and reverted to DG-based evaluation on playback or manipulation.

Legacy dynamics disabled the EM because, in order to generate repeatable and stable results, they relied on evaluation rules that violated DG evaluation best practices. While these deviations from the safe path were accepted by the DG, once legacy dynamics were evaluated in Parallel, they created problems. The dynamics evaluator was originally created to detect these deviant node types and disable the EM, resorting to DG-based evaluation.

Since Maya 2016, the dynamics evaluator has been improved so that it can handle more complex dynamics setups. Now, it not only detects unsupported nodes and disables Parallel evaluation when it finds them, it also manages the tricky computation rules necessary for proper evaluation. This is one of the ways custom evaluators can be used to change Maya’s default evaluation behavior.

Note. Legacy dynamics nodes (particles, fluids) are still not supported. If the dynamics evaluator finds unsupported node types in the EG, it still disables parallel evaluation and resorts to DG-based evaluation.

By default, the following node types are blacklisted. If the dynamics evaluator finds them, it will disable the EM:

field
fluidShape
geoConnector
nucleus
particle
pointEmitter
rigidSolver
rigidBody

and any type derived from these.

In order for the dynamics evaluator to manage evaluation of dynamics, use the following commands:

evaluator -name dynamics -c "disablingNodes=unsupported";
evaluator -name dynamics -c "handledNodes=dynamics";
evaluator -name dynamics -c "action=evaluate";

The disablingNodes flag specifies the set of nodes that will force the dynamics evaluator to disable the EM, in this case, the nodes it does not support.

The handledNodes flag specifies the set of nodes that are going to be captured by the dynamics evaluator and scheduled in clusters that it will manage, in this case, any node type associated with dynamics.

The action flag specifies how the dynamics evaluator will handle its nodes, in this case, it will perform required evaluation tasks.

In this configuration, the node types that cause EM to be disabled are:

collisionModel
dynController
dynGlobals
dynHolder
fluidEmitter
fluidShape
membrane
particle (unless also an nBase)
rigidNode
rigidSolver
spring

and any type derived from these.

In order to return to the default configuration, use the following commands:

evaluator -name dynamics -c "disablingNodes=legacy2016";
evaluator -name dynamics -c "handledNodes=none";
evaluator -name dynamics -c "action=none";

Tip. To get a list of nodes that will make the dynamics evaluator disable the EM in its present configuration, use the following command:

evaluator -name "dynamics" -valueName "disabledNodes" -query;

You can configure the dynamics evaluator to ignore unsupported nodes. If you want to try Parallel evaluation on a scene where it is disabled because of the presence of unsupported node types, use the following commands:

evaluator -name dynamics -c "disablingNodes=none";
evaluator -name dynamics -c "handledNodes=dynamics";
evaluator -name dynamics -c "action=evaluate";

Note: Using the dynamics evaluator on unsupported nodes may cause evaluation problems and/or application crashes; this is unsupported behavior. Proceed with caution.

Tip. If you want the dynamics evaluator to skip evaluation of all dynamics nodes in the scene, use the following commands:

evaluator -name dynamics -c "disablingNodes=unsupported";
evaluator -name dynamics -c "handledNodes=dynamics";
evaluator -name dynamics -c "action=freeze";

This can be useful to quickly disable dynamics when the simulation has a big impact on animation performance.
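The four configurations described in this section differ only in the values given to disablingNodes, handledNodes, and action. The sketch below collects them in one table and generates the matching MEL commands. The preset names and the dynamics_commands helper are hypothetical; the setting values are the ones used in the commands above.

```python
# Preset name -> (disablingNodes, handledNodes, action).
# The preset names are made up for this sketch.
DYNAMICS_PRESETS = {
    "default":            ("legacy2016",  "none",     "none"),
    "managed":            ("unsupported", "dynamics", "evaluate"),
    "ignore_unsupported": ("none",        "dynamics", "evaluate"),
    "freeze":             ("unsupported", "dynamics", "freeze"),
}

def dynamics_commands(preset):
    """Build the three MEL commands that apply one preset."""
    disabling, handled, action = DYNAMICS_PRESETS[preset]
    settings = [
        ("disablingNodes", disabling),
        ("handledNodes", handled),
        ("action", action),
    ]
    return [
        'evaluator -name dynamics -c "{0}={1}";'.format(key, value)
        for key, value in settings
    ]

freeze_commands = dynamics_commands("freeze")
```

Keep in mind that the "ignore_unsupported" combination corresponds to the unsupported configuration flagged in the note above, so it should only be used for experiments.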

Dynamics simulation with the Evaluation Manager (and therefore the dynamics evaluator) can have slightly different results from DG-based evaluation. Dynamics simulation results often depend on evaluation order, but with DG-based evaluation, the order depends on the sequence in which the data is pulled. For instance, the order in which the renderer draws items in the scene may differ from the order in which a script gets simulation results.

With EM-based evaluation, the EM determines the evaluation order and, although it might differ from DG-based evaluation, it is consistent regardless of whether evaluation occurs in the context of the scene being rendered in a Maya viewport or is baked with Maya Batch.

While this is particularly relevant for dynamics simulation, it also applies to any nodes that have order-dependent evaluation. Although order-dependent evaluation is best avoided because it often leads to unreliable results, the EM keeps the order stable regardless of the context that triggers evaluation. This order can sometimes generate results that slightly differ from legacy DG-based evaluation.

Reference Evaluator

When a reference is unloaded it leaves several nodes in the scene representing reference edits to preserve. Though these nodes may inherit animation from upstream nodes, they do not contribute to what’s rendered and can be safely ignored during evaluation. The reference evaluator ensures all such nodes are not evaluated.

Other Evaluators

In addition to the GPU override and dynamics evaluators, additional evaluators exist for specialized tasks:

Evaluator	What does it do?
ikSystem	Automatically disables the EM when a multi-chain solver is present in the EG. For regular IK chains it will perform any lazy update prior to parallel execution.
disabling	Automatically disables the EM if user-specified nodes are present in the EG. This evaluator is used for troubleshooting purposes. It allows Maya to keep working stably until issues with problem nodes can be addressed.
transformFlattening	Consolidates deep transform hierarchies containing animated parents and static children, leading to faster evaluation. Consolidation takes a snapshot of the relative parent/child transformations, allowing concurrent evaluation of downstream nodes.
pruneRoots	We found that scenes with several thousand paramCurves become bogged down because of scheduling overhead from resulting EG nodes and lose potential gains from increased parallelism. To handle this situation, special clusters are created to group paramCurves into a small number of evaluation tasks, thus reducing overhead.

Custom evaluator names are subject to change as we introduce new evaluators and expand these functionalities.

Evaluator Conflicts

Sometimes, multiple evaluators will want to “claim responsibility” for the same node(s). This can result in conflict, negatively impacting performance. To avoid these conflicts, each evaluator is associated with a priority upon registration and nodes are assigned to the evaluator with the highest priority. Internal evaluators have been ordered to prioritize correctness and stability over speed.
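The conflict resolution rule can be sketched in a few lines. The priority numbers below are invented for illustration, since this guide does not list Maya's real values, and the assign_nodes helper and node names are hypothetical.

```python
def assign_nodes(claims, priorities):
    """Assign each node to the claiming evaluator with the highest priority.

    claims maps a node name to the evaluators that want it.
    priorities maps an evaluator name to a number (higher wins).
    """
    return {
        node: max(evaluators, key=lambda name: priorities[name])
        for node, evaluators in claims.items()
        if evaluators
    }

# Hypothetical priorities; Maya's real values are not given in this guide.
priorities = {"dynamics": 30, "deformer": 20, "transformFlattening": 10}
claims = {
    "clothMesh": ["deformer", "dynamics"],
    "armJoint": ["transformFlattening"],
}
assignment = assign_nodes(claims, priorities)
```

In this made-up setup the dynamics evaluator outranks the deformer evaluator, so the contested mesh goes to dynamics. That is one way an ordering can favor correctness over speed.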

API Extensions

We have added a few API extensions and tools that make the most of the evaluation capabilities to aid your pipeline. This section reviews API extensions for Parallel Evaluation, Custom GPU Deformers, and Profiling Plug-ins.

Parallel Evaluation

If your plug-in plays by the DG rules, you probably will not need many changes to make the plug-in work in Parallel mode. Porting your plug-in so it works in Parallel may be as simple as recompiling it against the latest version of OpenMaya!

If the EM generates different results than DG-based evaluation, make sure that your plug-in:

Overrides MPxNode::compute(). This is especially true of classes extending MPxTransform which previously relied on asMatrix(). See the rockingTransform SDK sample. For classes deriving from MPxDeformerNode and MPxGeometryFilter, override the deform() method.
Handles requests for evaluation at all levels of the plug tree. While the DG can request plug values at any level, the EM always requests the root plug. For example, for plug N.gp[0].p[1] your compute() method must handle requests for evaluation of N.gp, N.gp[0], N.gp[0].p, and N.gp[0].p[1].
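To make the plug tree example concrete, the sketch below expands a plug path into every level a compute() method may be asked about. It is plain string handling, not OpenMaya code, and the plug_levels helper and its parsing rules (simple attribute names with optional numeric indices) are assumptions made for illustration.

```python
import re

def plug_levels(plug_path):
    """List every level of the plug tree, from the top plug down to plug_path."""
    node, _, attribute_path = plug_path.partition(".")
    levels = []
    current = node
    for part in attribute_path.split("."):
        match = re.match(r"^(\w+)((?:\[\d+\])*)$", part)
        if match is None:
            raise ValueError("cannot parse plug component: %r" % (part,))
        name, indices = match.groups()
        # First the attribute itself, then each array element below it.
        current = current + "." + name
        levels.append(current)
        for index in re.findall(r"\[\d+\]", indices):
            current = current + index
            levels.append(current)
    return levels

levels = plug_levels("N.gp[0].p[1]")
# levels == ['N.gp', 'N.gp[0]', 'N.gp[0].p', 'N.gp[0].p[1]']
```

A quick way to audit a plug-in is to generate this list for each output plug and confirm that compute() returns a valid result for every entry, not only for the leaf.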

If your plug-in relies on custom dependency management, you need to use new API extensions to ensure correct results. As described earlier, the EG is built using the legacy dirty-propagation mechanism. Therefore, optimizations used to limit dirty propagation during DG evaluation, such as those found in MPxNode::setDependentsDirty, may introduce errors in the EG. Use MEvaluationManager::graphConstructionActive() to detect if this is occurring.

There are new virtual methods you will want to consider implementing:

MPxNode::preEvaluation. To avoid performing expensive calculations each time MPxNode::compute() is called, one strategy that plug-in authors use is to store results from previous evaluations and then rely on MPxNode::setDependentsDirty to trigger re-computation. As discussed previously, once the EG has been built, dirty propagation is disabled and the EG is re-used. Therefore, any custom logic in your plug-in that depends on setDependentsDirty no longer applies.

MPxNode::preEvaluation allows your plug-in to determine which plugs/attributes are dirty and if any action is needed. Use the new MEvaluationNode class to determine what has been dirtied.

Refer to the simpleEvaluationNode.cpp devkit example for an illustration of how to use MPxNode::preEvaluation.
MPxNode::postEvaluation. Until now it was difficult to determine at which point all processing for a particular node instance was complete. Users sometimes resorted to complex bookkeeping/callbacks schemes to detect this situation and perform additional work, such as custom rendering. This mechanism was cumbersome and error-prone.

A new method, MPxNode::postEvaluation, is called once all computations have been performed on a specific node instance. Since this method is called from a worker thread, it performs calculations for downstream graph operations without blocking other Maya processing tasks of non-dependent nodes.

See the simpleEvaluationDraw devkit example to understand how to use this method. If you run this example in regular evaluation, Maya slows down, since evaluation is blocked whenever expensive calculations are performed. When you run in Parallel Evaluation Mode, a worker thread calls the postEvaluation method and prepares data for subsequent drawing operations. When testing, you will see higher frame rates in Parallel evaluation versus regular or Serial evaluation. Please note that code in postEvaluation should be thread-safe.
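
Returning to the caching strategy described under MPxNode::preEvaluation, the toy model below shows the general idea of invalidating a cached result from a pre-evaluation hook instead of from dirty propagation. It is not Maya API code: the class CachedNode, its pre_evaluation method, and the dirty_attributes argument are all hypothetical stand-ins for what a real plug-in would do with MPxNode::preEvaluation and MEvaluationNode.

```python
class CachedNode:
    """Toy stand-in for a node that caches an expensive result.

    Assumption: dirty_attributes plays the role of the information a plug-in
    would obtain from MEvaluationNode; nothing here calls Maya.
    """

    def __init__(self):
        self.inputs = {"radius": 1.0}
        self._cached_area = None
        self.compute_count = 0

    def pre_evaluation(self, dirty_attributes):
        # Drop the cache only when an input it depends on has been dirtied.
        if "radius" in dirty_attributes:
            self._cached_area = None

    def compute(self):
        if self._cached_area is None:
            self.compute_count += 1   # the expensive work would happen here
            self._cached_area = 3.14159 * self.inputs["radius"] ** 2
        return self._cached_area


node = CachedNode()
node.compute()
node.pre_evaluation({"visibility"})   # unrelated attribute: cache survives
node.compute()
node.inputs["radius"] = 2.0
node.pre_evaluation({"radius"})       # relevant attribute: cache is dropped
node.compute()
print(node.compute_count)             # 2
```

The expensive branch runs twice rather than three times, because the second call reuses the cached value.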

Other recommended best practices include:

Avoid storing state in static variables. Store node state/settings in attributes. This has the additional benefit of automatically saving/restoring the plug-in state when Maya files are written/read.
Node computation should not have any dependencies beyond input values. Maya nodes should be like functions. Output values should be computed from input state and node-specific internal logic. Your node should never walk the graph or try to circumvent the DG.

Custom GPU Deformers

To make GPU Override work on scenes containing custom deformers, Maya 2016 provides new API classes that allow the creation of fast OpenCL deformer back-ends.

Though you will still need to have a CPU implementation for the times when it is not possible to target deformations on the GPU (see GPU Override), you can augment this with an alternate deformer implementation inheriting from MPxGPUDeformer. This applies to your own nodes as well as to standard Maya nodes.

The GPU implementation will need to:

Declare when it is valid to use the GPU-based backend (e.g., you may want to limit your GPU version to cases where various attributes are fixed, omit usage for specific attribute values, etc.)
Extract MDataBlock input values and upload values to the GPU
Define and call the OpenCL kernel to perform needed computation
Register itself with the MGPUDeformerRegistry system. This will tell the system which deformers you are claiming responsibility for.

When you have done this, do not forget to load your plug-in at startup. Two working devkit examples (offsetNode and identityNode) have been provided to get you started.

Tip. To get a sense for the maximum speed increase you can expect by providing a GPU backend for a specific deformer, tell Maya to treat specific nodes as passthrough. Here’s an example applied to polySoftEdge:
   GPUBuiltInDeformerControl
       -name polySoftEdge
       -inputAttribute inputPolymesh 
       -outputAttribute output
       -passthrough;
Although results will be incorrect, this test can confirm if it is worth investing time implementing an OpenCL version of your node.
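
As a rough way to read the result of the passthrough test, the sketch below turns two measured frame times into an upper bound on the achievable speedup. The function name and the timings are made up for illustration; a real GPU implementation still costs some time, so the actual gain will be smaller than this bound.

```python
def max_speedup(baseline_ms, passthrough_ms):
    """Upper bound on the speedup from moving one deformer to the GPU.

    baseline_ms: frame time with the deformer evaluated normally.
    passthrough_ms: frame time with the node treated as passthrough.
    """
    if passthrough_ms <= 0:
        raise ValueError("passthrough_ms must be positive")
    return baseline_ms / passthrough_ms


# Made-up timings: 40 ms per frame normally, 25 ms with passthrough enabled.
print(round(max_speedup(40.0, 25.0), 2))   # 1.6
```

A bound close to 1.0 suggests that the deformer is not the bottleneck and an OpenCL port is unlikely to pay off.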

Profiling Plug-ins

To visualize how long custom plug-ins are taking in the new profiling tools (see Profiling Your Scene), you will need to instrument your code. Maya provides C++, Python, and MEL interfaces for you to do this. Refer to the Profiling using MEL or Python or the API technical docs for more details.

Profiling Your Scene

In the past, it could be challenging to understand where Maya was spending time. To take the guesswork out of performance diagnosis, Maya includes a new integrated profiler that lets you see exactly how long different tasks are taking.

You can open the Profiler by selecting:

Windows > General Editors > Profiler from the Maya menu
Persp/Graph Layout from the Quick Layout buttons and choosing Panel Layout > Profiler.

Once the Profiler window is visible:

Load your scene and start playback
Click Start in the Profiler to record information in the pre-allocated record buffer.
Wait until the record buffer becomes full or click Stop in the Profiler to stop recording. The Profiler shows a graph demonstrating the processing time for your animation.
Try recording the scene in DG, Serial, Parallel, and GPU Override modes.

Tip. By default the profiler allocates a 20MB buffer to store results. The record buffer can be expanded via the UI or using the profiler -b value; command, where value is the desired size in MB. This may be needed for more complex scenes.

The Profiler includes information for all instrumented code, including playback, manipulation, authoring tasks, and UI/Qt events. When profiling your scene, make sure to capture several frames of data to ensure gathered results are representative of scene bottlenecks.

The Profiler supports several views depending on the task you wish to perform. The default Category View, shown below, classifies events by type (e.g., dirty, VP1, VP2, Evaluation, etc). The Thread and CPU views show how function chains are subdivided amongst available compute resources. Currently the Profiler does not support visualization of GPU-based activity.

Now that you have a general sense of what the Profiler tool does, let’s discuss key phases involved in computing results for your scene and how these are displayed. By understanding why scenes are slow, you can target scene optimizations.

Every time Maya updates a frame, it must compute and draw the elements in your scene. Hence, computation can be split into one of two main categories:

Evaluation (i.e., doing the math that determines the most up-to-date values for scene elements)
Rendering (i.e., doing the work that draws your scene in the viewport).

When the main bottleneck in your scene is evaluation, we say the scene is evaluation-bound. When the main bottleneck in your scene is rendering, we say the scene is render-bound.
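
The distinction can be expressed as a small helper that labels a frame from two timings read off the Profiler. This is only a sketch: the function name is hypothetical, the timings are invented, and real frames may overlap evaluation and rendering work.

```python
def classify_frame(evaluation_ms, rendering_ms):
    """Label a frame by whichever phase takes longer.

    A tie is reported as evaluation-bound; that choice is arbitrary.
    """
    if evaluation_ms >= rendering_ms:
        return "evaluation-bound"
    return "render-bound"


# Invented timings for two different scenes.
print(classify_frame(evaluation_ms=30.0, rendering_ms=8.0))
print(classify_frame(evaluation_ms=6.0, rendering_ms=22.0))
```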

Evaluation-Bound Performance

There are several different problems that may lead to evaluation-bound performance.

Lock Contention. When many threads try to access a shared resource you may experience Lock Contention, due to lock management overhead. One clue that this may be happening is that evaluation takes roughly the same duration regardless of which evaluation mode you use. This occurs since threads cannot proceed until other threads are finished using the shared resource.

Here the Profiler shows many separate identical tasks that start at nearly the same time on different threads, each finishing at different times. This type of profile offers a clue that there might be some shared resource that many threads need to access simultaneously.

Below is another image showing a similar problem.

In this case, since several threads were executing Python code, they all had to wait for the Global Interpreter Lock (GIL) to become available. Bottlenecks and performance losses caused by contention issues may be more noticeable when there is a high concurrency level, such as when your computer has many cores.

If you encounter contention issues, try to fix the code in question. For the above example, changing node scheduling converted the above profile to the following one, providing a nice performance gain. For this reason, Python plug-ins are scheduled as Globally Serial by default. As a result, they will be scheduled one after the other and will not block multiple threads waiting for the GIL to become available.
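
The following simplified model illustrates why contended work takes roughly the same time regardless of how many threads are available. It is an assumption-laden sketch, not a description of Maya's scheduler: every task has the same duration, and a task that needs the shared lock holds it for its entire run, much as Python code holds the GIL.

```python
import math


def batch_time_ms(task_count, task_ms, threads, holds_global_lock):
    """Estimate the time needed to finish a batch of identical tasks."""
    if holds_global_lock:
        # Only one task can run at a time, however many threads exist.
        return task_count * task_ms
    # Otherwise the tasks run in waves of at most `threads` at once.
    waves = math.ceil(task_count / threads)
    return waves * task_ms


for threads in (1, 4, 8):
    locked = batch_time_ms(8, 5, threads, holds_global_lock=True)
    free = batch_time_ms(8, 5, threads, holds_global_lock=False)
    print(threads, locked, free)
```

In this model, the locked batch always takes 40 ms, while the lock-free batch drops from 40 ms to 10 ms and then 5 ms as threads are added.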

Clusters. As mentioned earlier, if the EG contains node-level circular dependencies, those nodes will be grouped into a cluster which represents a single unit of work to be scheduled serially. Although multiple clusters may be evaluated at the same time, large clusters limit the amount of work that can be performed simultaneously. Clusters can be identified in the Profiler as bars with the opaqueTaskEvaluation label, shown below.

If your scene contains clusters, analyze your rig’s structure to understand why circularities exist. Ideally, you should strive to remove coupling between parts of your rig, so rig sections (e.g., head, body, etc.) can be evaluated independently.
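
To illustrate what a cycle cluster is, the sketch below finds groups of mutually dependent nodes in a small dependency graph using Tarjan's strongly connected components algorithm. The graph, the node names, and the function find_clusters are hypothetical; this is not how Maya reports clusters, only a way to reason about which parts of a rig are coupled.

```python
def find_clusters(graph):
    """Return groups of nodes that depend on each other in a cycle.

    graph maps each node name to the list of nodes that depend on it.
    Only components with more than one node are reported.
    """
    index = {}
    lowlink = {}
    stack = []
    on_stack = set()
    clusters = []

    def visit(node):
        index[node] = lowlink[node] = len(index)
        stack.append(node)
        on_stack.add(node)
        for successor in graph.get(node, []):
            if successor not in index:
                visit(successor)
                lowlink[node] = min(lowlink[node], lowlink[successor])
            elif successor in on_stack:
                lowlink[node] = min(lowlink[node], index[successor])
        if lowlink[node] == index[node]:
            # node is the root of a component: pop its members off the stack.
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            if len(component) > 1:
                clusters.append(sorted(component))

    for node in graph:
        if node not in index:
            visit(node)
    return clusters


# Hypothetical rig: the arm controls form a cycle, the head does not.
rig = {
    "root": ["spine"],
    "spine": ["armCtrl", "head"],
    "armCtrl": ["elbow"],
    "elbow": ["wrist"],
    "wrist": ["armCtrl"],
    "head": [],
}
print(find_clusters(rig))   # [['armCtrl', 'elbow', 'wrist']]
```

Removing the connection from wrist back to armCtrl would leave no clusters, so every node could be scheduled on its own.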

Tip. When troubleshooting scene performance issues, you can temporarily disable costly nodes using the per-node frozen attribute. This removes specific nodes from the EG. Although the result you see will change, it is a simple way to check that you have found the bottleneck for your scene.

Render-Bound Performance

The following is an illustration of a sample result from the Maya Profiler, zoomed to a single frame measured from a large scene with many animated meshes. Because of the number of objects, different materials, and the amount of geometry, this scene is very costly to render.

The attached profile has four main areas:

Evaluation (A)
GPUOverridePostEval (B)
Vp2BuildRenderLists (C)
Vp2Draw3dBeautyPass (D)

In this scene, a substantial number of meshes are being evaluated with GPU Override and some profiler blocks appear differently from what they would otherwise.

Evaluation. Area A depicts the time spent computing the state of the Maya scene. In this case, the scene is moderately well-parallelized. The blocks in shades of orange and green represent the software evaluation of DG nodes. The blocks in yellow are the tasks that initiate mesh evaluation via GPU Override. Mesh evaluation on the GPU starts with these yellow blocks and continues concurrently with the other work on the CPU.

An example of a parallel bottleneck in the scene evaluation appears in the gap in the center of the evaluation section. The large group of GPU Override blocks on the right depend on a single portion of the scene and must wait until that is complete.

Area A2 (above area A), depicts blue task blocks that show the work that VP2 does in parallel to the scene evaluation. In this scene, most of the mesh work is handled by GPU Override so it is mostly empty. When evaluating software meshes, this section shows the preparation of geometry buffers for rendering.

GPUOverridePostEval. Area B is where GPU Override finalizes some of its work. The amount of time spent in this block varies with different GPU and driver combinations. At some point there will be a wait for the GPU to complete its evaluation if it is heavily loaded. This time may appear here or it may show as additional time spent in the Vp2BuildRenderLists section.

Vp2BuildRenderLists. Area C. Once the scene has been evaluated, VP2 builds the list of objects to render. Time in this section is typically proportional to the number of objects in the scene.

Vp2PrepareToUpdate. Area C2, very small in this profile. VP2 maintains an internal copy of the world and uses it to determine what to draw in the viewport. When it is time to render the scene, we must ensure that the objects in the VP2 database have been modified to reflect changes in the Maya scene. For example, objects may have become visible or hidden, their position or their topology may have changed, and so on. This is done by Vp2PrepareToUpdate.

Vp2PrepareToUpdate is slow when there are shape topology, material, or object visibility changes. In this example, Vp2PrepareToUpdate is almost invisible since the scene objects require little extra processing.

Vp2ParallelEvaluationTask is another profiler block that can appear in this area. If time is spent here, then some object evaluation has been deferred from the main evaluation section of the Evaluation Manager (area A) to be evaluated later. Evaluation in this section uses traditional DG evaluation.

Common cases for which Vp2BuildRenderLists or Vp2PrepareToUpdate can be slow during Parallel Evaluation are:

Large numbers of rendered objects (as in this example)
Mesh topology changes
Object types, such as image planes, requiring legacy evaluation before rendering
3rd party plug-ins that trigger API callbacks

Vp2Draw3dBeautyPass. Area D. Once all data has been prepared, it is time to render the scene. This is where the actual OpenGL or DirectX rendering occurs. This area is broken into subsections depending on viewport effects such as depth peeling, transparency mode, and screen space anti-aliasing.

Vp2Draw3dBeautyPass can be slow if your scene:

Has Many Objects to Render (as in this example).
Uses Transparency. Large numbers of transparent objects can be costly since the default transparency algorithm makes scene consolidation less effective. For very large numbers of transparent objects, setting Transparency Algorithm (in the VP2 settings) to Depth Peeling instead of Object Sorting may be faster. Switching to untextured mode can also bypass this cost.
Uses Many Materials. In VP2, objects are sorted by material prior to rendering, so having many distinct materials makes this time-consuming.
Uses Viewport Effects. Many effects such as SSAO (Screen Space Ambient Occlusion), Depth of Field, Motion Blur, Shadow Maps, or Depth Peeling require additional processing.

Other Considerations. Although the key phases described above apply to all scenes, your scene may have different performance characteristics.

For static scenes with limited animation, or for non-deforming animated objects, consolidation is used to improve performance. Consolidation groups objects that share the same material. This reduces time spent in both Vp2BuildRenderLists and Vp2Draw3dBeautyPass, since there are fewer objects to render.
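
The idea behind consolidation can be sketched as grouping objects by material, so that each group is handled as one unit. The code below is a conceptual illustration only; the object and material names are invented, and VP2's actual consolidation applies further criteria that are not modeled here.

```python
from collections import defaultdict


def consolidate(objects):
    """Group object names by the material they use.

    objects: iterable of (object_name, material_name) pairs.
    """
    batches = defaultdict(list)
    for name, material in objects:
        batches[material].append(name)
    return dict(batches)


# Invented scene contents for illustration.
scene = [
    ("wallA", "brick"),
    ("wallB", "brick"),
    ("floor", "wood"),
    ("door", "wood"),
    ("window", "glass"),
]
batches = consolidate(scene)
print(len(scene), "objects ->", len(batches), "batches")
```

Here five objects collapse into three batches, one per material.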

Troubleshooting Your Scene

Analysis Mode

The purpose of Analysis Mode is to perform more rigorous inspection of your scene to catch evaluation errors. Since Analysis Mode introduces overhead to your scene, only use this during debugging activities; animators should not enable Analysis Mode during their day-to-day work. Note that Analysis Mode is not thread-safe, so it is limited to Serial; you cannot use Analysis Mode while in Parallel evaluation.

The key function of Analysis Mode is to:

Search for errors at each playback frame. This is different from Safe Mode, which only tries to identify problems at the start of parallel execution.
Monitor read-access to node attributes. This ensures that nodes have a correct dependency structure in the EG.
Return diagnostics to better understand which nodes influence evaluation. This is currently limited to reporting one destination node at a time.

Tip. To activate Analysis Mode, use the dbtrace -k evalMgrGraphValid; MEL command.

Once active, error detection occurs after each evaluation. Missing dependencies are saved to a file in your machine’s temporary folder (e.g., %TEMP%\_MayaEvaluationGraphValidation.txt on Windows). The temporary directory on your platform can be determined using the internalVar -utd; MEL command.

To disable Analysis Mode, type: dbtrace -k evalMgrGraphValid -off;

Let’s assume that your scene contains the following three nodes. Because of the dependencies, the evaluation manager must compute the state of nodes B and C prior to calculating the state of A.

Now let’s assume Analysis Mode returns the following report:

Detected missing dependencies on frame 56
{
     A.output <-x- B
     A.output <-x- C [cluster]
}
Detected missing dependencies on frame 57
{
    A.output <-x- B
    A.output <-x- C [cluster]
}

The <-x- symbol indicates the direction of the missing dependency. The [cluster] term indicates that the node is inside a cycle cluster, which means that any node in the cycle could be responsible for accessing the attribute outside of evaluation order.

In the above example, B accesses the output attribute of A, which is incorrect. Dependencies of this type do not appear in the Evaluation Graph and could cause a crash when running an evaluation in Parallel mode.
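
When a report spans many frames, it can help to collect the entries programmatically. The sketch below parses text in the layout shown above. It assumes the file contains nothing but blocks of that shape, which the source does not guarantee, and the function name parse_report is hypothetical.

```python
import re

FRAME_RE = re.compile(r"Detected missing dependencies on frame (\d+)")
EDGE_RE = re.compile(r"(\S+)\s+<-x-\s+(\S+)(\s+\[cluster\])?")


def parse_report(text):
    """Return {frame: [(accessed_plug, accessing_node, in_cluster), ...]}."""
    results = {}
    frame = None
    for line in text.splitlines():
        line = line.strip()
        header = FRAME_RE.fullmatch(line)
        if header:
            frame = int(header.group(1))
            results[frame] = []
            continue
        edge = EDGE_RE.fullmatch(line)
        if edge and frame is not None:
            plug, node, cluster = edge.groups()
            results[frame].append((plug, node, cluster is not None))
    return results


report = """\
Detected missing dependencies on frame 56
{
     A.output <-x- B
     A.output <-x- C [cluster]
}
"""
print(parse_report(report))
```

Each tuple reads as "this node accessed that plug", with the last field marking entries that involve a cycle cluster.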

There are multiple reasons that missing dependencies occur, and how you handle them depends on the cause of the problem. If Analysis Mode discovers errors in your scene from bad dependencies due to:

A user plug-in. Revisit your strategy for managing dirty propagation in your node. Make sure that any attempts to use “clever” dirty propagation dirty the same attributes every time. Avoid using different notification messages to trigger pulling on attributes for computation.
A built-in node. You should communicate this information to us. This may highlight an error that we are unaware of. To help us best diagnose the causes of this bug, we would appreciate if you can provide us with the scene that caused the problem.

Graph Execution Order

There are two primary methods of displaying the graph execution order.

The simplest is to use the ‘compute’ trace object to acquire a recording of the computation order. This can only be used in Serial mode, as explained earlier. The goal of compute trace is to compare DG and EM evaluation results and discover any evaluation differences related to a different ordering or missing execution between these two modes.

Keep in mind that there will be many differences between runs since the EM executes the graph from the roots forward, whereas the DG uses values from the leaves. For example, in the simple graph shown earlier, the EM guarantees that B and C will be evaluated before A, but provides no information about the relative ordering of B and C. However, in the DG, A pulls on the inputs from B and C in a consistent order dictated by the implementation of node A. The EM could show either "B, C, A" or "C, B, A" as their evaluation order and although both might be valid, the user must decide if they are equivalent or not. This ordering of information can be even more useful when debugging issues in cycle computation since in both modes a pull evaluation occurs, which will make the ordering more consistent.
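
To see why both orders are acceptable, the sketch below enumerates every evaluation order in which each node comes after all of its inputs. It is a brute-force illustration suitable only for tiny graphs; the function name valid_orders is hypothetical and the code says nothing about which order the EM will actually pick.

```python
from itertools import permutations


def valid_orders(dependencies):
    """List every order in which each node follows all of its inputs.

    dependencies maps a node to the list of nodes it reads from.
    """
    nodes = sorted(dependencies)
    orders = []
    for order in permutations(nodes):
        position = {node: i for i, node in enumerate(order)}
        if all(position[source] < position[node]
               for node, sources in dependencies.items()
               for source in sources):
            orders.append(order)
    return orders


# The three-node example from the text: A reads from B and C.
graph = {"A": ["B", "C"], "B": [], "C": []}
print(valid_orders(graph))   # [('B', 'C', 'A'), ('C', 'B', 'A')]
```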

The EM Shelf

BonusTools has a special shelf specifically aimed at working with the EM that contains features to query and analyze your scene and to toggle various modes on/off. See the accompanying shelf documentation for a complete list of all shelf features.

Known Limitations

This section lists known limitations for the new evaluation system.

VP2 Motion Blur will disable Parallel evaluation. For Motion Blur to work, the scene must be evaluated at different points in time. Currently the EM does not support this.
Scenes using FBIK will revert to Serial. For several years now Autodesk has been deprecating FBIK. We recommend using HIK for full-body retargeting/solving.
dbtrace will not work in Parallel mode. As stated in the Analysis Mode section, the dbtrace command will only work in Serial evaluation. Having traces enabled in Parallel mode will likely cause Maya to crash.
The DG Profiler crashes in Parallel Mode. Unless you are in DG evaluation mode, you will be unable to use the legacy DG profiler. As time permits, we expect to move features of the DG profiler into the new thread-safe integrated profiler.
Batch rendering scenes with XGen may produce incorrect results.
The Evaluation Manager, in both Serial and Parallel modes, changes the way attributes are cached. This is done to allow safe parallel evaluation and prevent re-computation of the same data by multiple threads. This means that some scenes may evaluate differently if multiple computations of the same attribute occur in one evaluation cycle. With the Evaluation Manager, the first value will be cached.
VP2 Direct update does not work with polySoftEdge nodes.
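
The attribute caching limitation above can be pictured with a toy cache in which the first value computed for a plug during an evaluation cycle is the one that every later request receives. This is a conceptual model only; the class EvaluationCycleCache and its methods are hypothetical and do not correspond to any Maya API.

```python
class EvaluationCycleCache:
    """Toy model of 'first value wins' caching within one evaluation cycle."""

    def __init__(self):
        self._values = {}

    def get(self, plug, compute):
        # Compute only on the first request; later requests reuse the result.
        if plug not in self._values:
            self._values[plug] = compute()
        return self._values[plug]

    def next_cycle(self):
        # A new evaluation cycle starts with an empty cache.
        self._values.clear()


cache = EvaluationCycleCache()
samples = iter([1.0, 2.0, 3.0])   # a computation whose result changes per call

first = cache.get("N.output", lambda: next(samples))
second = cache.get("N.output", lambda: next(samples))   # served from the cache
cache.next_cycle()
third = cache.get("N.output", lambda: next(samples))    # recomputed
print(first, second, third)   # 1.0 1.0 2.0
```

A node whose result depends on how many times it has been computed would therefore behave differently under this scheme than under repeated re-computation.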

Revisions

2016 Extension 2

Added tip about the controller command.
Updated Other Evaluators subsection in the Custom Evaluators section to describe the new evaluators.
New evaluators:
- transformFlattening
- reference
deformer evaluator is now enabled by default.
dynamics evaluator has a new behavior, enabled by default, to support Parallel evaluation of scenes with dynamics.
Updated Evaluator Conflicts subsection in the Custom Evaluators section.
Updated Python plug-ins scheduling to Globally Serial.
Updated Render-Bound Performance subsection in the Profiling Your Scene section.
Added new images for graph examples.
Miscellaneous typo fixes and small corrections.

2016

Initial version of the document.