Thread-Local Storage

Sometimes shaders need to share data and also modify it. The first method works for constant shared data, but it does not permit changing the data during rendering because no consistent writes are possible: multiple instances of the shader running in different threads may write to the same data simultaneously. This can corrupt even a simple operation such as incrementing a variable, because the execution order is unpredictable. The sequence read_A write_A read_B write_B in threads A and B works, but there is no way to prevent the sequence read_A read_B write_A write_B, in which one increment is lost. This problem is called a race condition; it strikes only rarely, which makes it hard to debug. Locking would prevent it but may cause an unacceptable performance loss.

mental ray 2.1 allowed solving this problem by allocating an array with one member per thread in the init shader. The number of threads could be obtained by calling mi_par_nthreads, and it was guaranteed that no thread with a thread number outside that range would ever call a shader. However, mental ray 3.x no longer makes this guarantee; the number of threads may change at any time, so mi_par_nthreads is deprecated. It is still available but always returns 65, which may allow unported shaders to limp along on hosts with few (say, up to 16) CPUs. This means that shaders would have to implement a hashing scheme to use the array method in mental ray 3.x. To simplify this, mental ray 3.1 introduces a standard mechanism for thread-local storage.

Thread-local storage avoids the race condition by providing one copy of the data to each thread. Multiple threads can execute simultaneously, but within any single thread execution is strictly sequential, so the read/write race condition cannot happen. Here is an example that counts shader calls:

     DLLEXPORT miBoolean myshader(     /* main shader */
         miColor         *result,
         miState         *state,
         struct myshader *paras)
     {
         int             *counter;

         mi_query(miQ_FUNC_TLS_GET, state, miNULLTAG, &counter);
         if (!counter) {
             counter = mi_mem_allocate(sizeof(int));
             mi_query(miQ_FUNC_TLS_SET, state, miNULLTAG, &counter);
             *counter = 0;
         }
         (*counter)++;
         ...
         return(miTRUE);
     }

     DLLEXPORT miBoolean myshader_init( /* init shader */
         miState         *state,
         struct myshader *paras,
         miBoolean       *init_req)
     {
         *init_req = miTRUE;
         return(miTRUE);
     }

     DLLEXPORT miBoolean myshader_exit( /* exit shader */
         miState         *state,
         struct myshader *paras)
     {
         int             **counters;
         int             num, i, total = 0;

         if (!paras)
             return(miTRUE);
         mi_query(miQ_FUNC_TLS_GETALL, state, miNULLTAG, &counters, &num);
         for (i=0; i < num; i++) {
             total += *counters[i];
             mi_mem_release(counters[i]);
         }
         mi_info("myshader was called %d times", total);
         return(miTRUE);
     }

The thread-local data here is a single integer that counts shader calls in the current thread. Since init shaders are called once per shader or once per shader instance, but not once for every thread that calls the shader, the data cannot be allocated and initialized in the init shader. Instead, it is created in the main shader body if it does not already exist. This is safe because no two threads are ever handed the same pointer by miQ_FUNC_TLS_GET. (Note that setting *counter to zero is actually redundant because mi_mem_allocate always returns zeroed memory.)

The example exit shader collects the thread-local counters of all threads that installed one, then computes and prints the total. This happens during shader instance exit, not shader exit, which is detected by checking that paras is nonnull. Shader instance init/exit must be enabled in the init function by setting init_req to miTRUE.

This will only work on a single host because each host exits its own shaders, and there is no way to communicate the counters between hosts. Moreover, slave hosts may come and go, and may call their exit shaders multiple times for a single frame.

Thread-local shader storage relies on three new mi_query modes:

miQ_FUNC_TLS_GET
     returns the pointer previously stored with miQ_FUNC_TLS_SET by the calling thread, or a null pointer if this thread has not stored one yet.
miQ_FUNC_TLS_SET
     stores a pointer on behalf of the calling thread.
miQ_FUNC_TLS_GETALL
     returns the array of all pointers stored by any thread, and the number of entries in that array.

The second argument to mi_query must be the shader state, and the third must be a null tag. An mi_query call with these modes in mental ray 2.1 and 3.0 will return miFALSE.



Copyright © 1986-2007 by mental images GmbH