We examine the claimed strengths of events over threads and show that the weaknesses of threads are artifacts of specific threading implementations and not inherent to the threading paradigm. As evidence, we present a user-level thread package that scales to 100,000 threads and achieves excellent performance in a web server. We also refine the duality argument of Lauer and Needham, which implies that good implementations of thread systems and event systems will have similar performance. Finally, we argue that compiler support for thread systems is a fruitful area for future research. It is a mistake to attempt high concurrency without help from the compiler, and we discuss several enhancements that are enabled by relatively simple compiler changes.
Fixing threads must have been a popular subject back then. I remember that Linux got fast user-space mutexes in December 2003. Python's PEP 342 "Coroutines via Enhanced Generators" is from May 2005. But that design builds on rejected PEP 325 "Resource-Release Support for Generators" from August 2003, which outlined a way to make generators more cooperative. Jeremy Hylton also wrote about "Scaling [Python] to 100,000 Threads" in September 2003. Coroutines are clearly a big part of the paper's solution and Python had them early on.
Node.js brought back the popularity of an event loop in May 2009. Many programmers rejoiced, others lamented its implications. Node is bound to a single thread like Python. It can only take advantage of multiple CPUs using subprocesses, also like Python. It's funny that Python's asyncio now focuses on event loops, like Node, for better asynchronous programming. But Node and JavaScript won't get async coroutines until ECMAScript 7. At least it's got a good JIT and you can use ES6 features today.
Go, released in March 2011, addresses all of the issues from the paper with its goroutines and channels. The paper mentions the importance of dynamic stack growth, which Go addressed with segmented stacks (as of Go 1.4, it uses a stack reallocation model instead). The paper suggests compiler support for preventing race conditions, which Go has built-in. The paper says that threads are bad for dynamic fan-in and fan-out, but Go and Python's asyncio solve those use-cases, too. Seems to me that Go is winning the race these days.
If I could retitle the original paper, I'd call it: "Why Coroutines Are a Good Idea". What's old is new; what's new is old — always.