Standard allocation routines such as malloc allocate from a single global heap, so concurrent calls are serialized with locks. When many threads allocate and free memory at the same time, this lock can become a bottleneck, particularly on OS X, where the locking overhead is very high. Thread-aware memory allocators instead maintain multiple local heaps (for example, one per thread or per processor), so most requests are served without contending for the global heap. This avoids serializing memory operations and improves performance. Examples of thread-aware memory allocators are Hoard, SmartHeap and the memory allocator supplied with TBB.
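The idea of a local heap can be illustrated with a minimal sketch. It is not how Hoard, SmartHeap or the TBB allocator are implemented; it only shows the principle of serving requests from thread-local storage and falling back to the global heap when the local cache is empty. The names LocalCache, local_cache, run_workers and the 64-byte block size are assumptions made for this illustration.

```cpp
#include <cstddef>
#include <cstdlib>
#include <thread>
#include <vector>

// Hypothetical fixed block size served by the per-thread cache.
constexpr std::size_t kBlockSize = 64;

// Per-thread cache of freed blocks. Only the owning thread touches it,
// so the fast path needs no lock.
class LocalCache {
public:
    ~LocalCache() {
        // Return cached blocks to the global heap when the thread exits.
        for (void* p : free_) std::free(p);
    }

    void* allocate() {
        if (!free_.empty()) {
            void* p = free_.back();  // fast path: reuse a local block
            free_.pop_back();
            ++hits_;
            return p;
        }
        return std::malloc(kBlockSize);  // slow path: global heap
    }

    void release(void* p) { free_.push_back(p); }

    std::size_t hits() const { return hits_; }

private:
    std::vector<void*> free_;
    std::size_t hits_ = 0;
};

// One cache per thread, created on first use in that thread.
inline LocalCache& local_cache() {
    thread_local LocalCache cache;
    return cache;
}

// Runs `iterations` allocate/release pairs on each of `threads` threads and
// returns how many allocations were served from the local caches.
std::size_t run_workers(unsigned threads, std::size_t iterations) {
    std::vector<std::size_t> hits(threads, 0);
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&hits, t, iterations] {
            LocalCache& cache = local_cache();
            for (std::size_t i = 0; i < iterations; ++i) {
                void* p = cache.allocate();
                if (p != nullptr) cache.release(p);
            }
            hits[t] = cache.hits();
        });
    }
    for (auto& th : pool) th.join();

    std::size_t total = 0;
    for (std::size_t h : hits) total += h;
    return total;
}
```

In this sketch only the first allocation in each thread reaches malloc; every later request is satisfied locally. A production allocator also has to handle variable block sizes, blocks freed by a thread other than the one that allocated them, and returning unused memory to the system.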
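Each thread is given its own stack when it is created. The stack has a fixed maximum size, and exceeding it, for example through deep recursion or large local arrays, causes a runtime failure such as a crash. It may be necessary to increase the stack size for such threads, but since every thread reserves its own stack, making it too large can exhaust memory or address space when many threads are running.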
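On systems that provide POSIX threads, the stack size can be requested per thread through a thread attribute object. The following is a minimal sketch under that assumption; the worker function, the 4 MiB buffer and the helper run_with_stack are hypothetical names and values chosen for illustration.

```cpp
#include <pthread.h>
#include <cstddef>

// Hypothetical worker that needs about 4 MiB of stack for a local buffer.
constexpr std::size_t kBufferBytes = 4u * 1024 * 1024;

void* worker(void* arg) {
    volatile char buffer[kBufferBytes];
    // Touch one byte per 4 KiB so the whole buffer is actually used.
    for (std::size_t i = 0; i < kBufferBytes; i += 4096) {
        buffer[i] = 1;
    }
    *static_cast<bool*>(arg) = true;  // report that the worker completed
    return nullptr;
}

// Creates a thread with the requested stack size and waits for it.
// Returns true only if the size was accepted and the worker ran to completion.
bool run_with_stack(std::size_t stack_bytes) {
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0) return false;

    bool ran = false;
    bool ok = false;
    pthread_t tid;
    // pthread_attr_setstacksize fails (EINVAL) if the size is below the
    // system minimum, so its result is checked before creating the thread.
    if (pthread_attr_setstacksize(&attr, stack_bytes) == 0 &&
        pthread_create(&tid, &attr, worker, &ran) == 0) {
        pthread_join(tid, nullptr);
        ok = ran;
    }
    pthread_attr_destroy(&attr);
    return ok;
}
```

Calling run_with_stack(16u * 1024 * 1024) requests a 16 MiB stack, which leaves room for the 4 MiB buffer. The size should be chosen from the thread's real needs: each thread created this way reserves that much address space, so a generous value multiplied by many threads adds up quickly.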