We often have code that looks like this, especially for dashboards, where we need to do N independent queries to construct a view:
```python
class DashboardHandler(RequestHandler):
    def get(self):
        user = MyUserModel.get_by_id(self.request.get('user_id'))
        context = {}
        posts = MyPosts.query().filter(
            MyPosts.user == user).fetch(10)
        for post in posts:
            pass  # Populate the template context dictionary...
        comments = MyComments.query().filter(
            MyComments.user == user).fetch(10)
        for comment in comments:
            pass  # Populate the template context dictionary...
        self.render('my_template.html', context)
```
Initially this is fine because it's only a couple of queries run in serial. But eventually our DashboardHandler grows to 10 or 20 separate operations that must be joined together to construct the response, and at that point it becomes excruciatingly slow: at, say, 20ms per query, 20 serial operations means 400ms of pure waiting. The handler also gets long, as the various methods that fetch and traverse objects pile up in its get() method.
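To make that cost concrete, here is a toy timing sketch. It is plain Python, not App Engine: the "RPCs" are simulated with sleeps and the overlap is done with threads rather than NDB's event loop, but the latency arithmetic is the same.

```python
# Toy timing sketch: ten independent 20 ms "RPCs" done one after
# another versus overlapped. Threads here are just a stand-in for
# the overlapping that NDB's event loop provides.
import time
from concurrent.futures import ThreadPoolExecutor

def fake_rpc():
    time.sleep(0.02)  # stand-in for one datastore round trip

start = time.time()
for _ in range(10):
    fake_rpc()
serial_seconds = time.time() - start       # roughly 10 * 20 ms

start = time.time()
with ThreadPoolExecutor(max_workers=10) as executor:
    for future in [executor.submit(fake_rpc) for _ in range(10)]:
        future.result()
overlapped_seconds = time.time() - start   # roughly one RPC's worth
```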
Using NDB, we can split this up. We do it by finding the logical units of work in the datastore code (focusing on what's being fetched) and breaking them out into methods that are tasklets, which execute concurrently:
```python
from google.appengine.ext.ndb import Return, tasklet, toplevel

class DashboardHandler(RequestHandler):
    @toplevel
    def get(self):
        user = MyUserModel.get_by_id(self.request.get('user_id'))
        post_data = self.get_posts(user)  # These return futures
        comment_data = self.get_comments(user)
        context = {}
        context.update((yield post_data))
        context.update((yield comment_data))
        self.render('my_template.html', context)

    @tasklet
    def get_posts(self, user):
        context = {}
        posts = yield MyPosts.query().filter(
            MyPosts.user == user).fetch_async(10)
        for post in posts:
            pass  # Populate the local template context dictionary...
        raise Return(context)

    @tasklet
    def get_comments(self, user):
        context = {}
        comments = yield MyComments.query().filter(
            MyComments.user == user).fetch_async(10)
        for comment in comments:
            pass  # Populate the local template context dictionary...
        raise Return(context)
```
Now get_posts() and get_comments() run in parallel: while one tasklet is waiting on its RPC, the event loop runs the other, minimizing the time the single Python thread sits idle and maximizing throughput. Simultaneously we've refactored our code to be more readable, logically separated, and potentially reusable. Yet it still reads like procedural code, and it can be tested synchronously, like this:
```python
class MyTest(TestCase):
    def test_get_posts(self):
        handler = DashboardHandler()
        user = MyUserModel(id='my name')
        post = MyPosts(user=user)
        post.put()
        future = handler.get_posts(user)
        self.assertEqual(dict(posts=post), future.get_result())
```
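The trick that makes both the concurrency and the synchronous testing possible is that a tasklet is just a generator: it yields a future whenever it would block, and an event loop round-robins between suspended generators. Here is a toy sketch of that mechanism (NOT NDB's real implementation, just the shape of the idea):

```python
# Toy sketch of a tasklet scheduler: generators yield futures when
# they would otherwise block on I/O, and the loop interleaves them.
import collections

class ToyFuture(object):
    """Minimal future: a box whose result is filled in later."""
    def __init__(self):
        self.result = None

def run_all(generators):
    """Round-robin step each generator until they all finish."""
    pending = collections.deque(generators)
    while pending:
        gen = pending.popleft()
        try:
            future = next(gen)           # run until the tasklet yields
            future.result = 'rpc-done'   # pretend the RPC completed
            pending.append(gen)          # resume it on a later pass
        except StopIteration:
            pass                         # this tasklet is finished

order = []

def toy_tasklet(name):
    order.append(name + ':start')
    yield ToyFuture()                    # "wait" for an RPC here
    order.append(name + ':end')

run_all([toy_tasklet('posts'), toy_tasklet('comments')])
# Both tasklets start before either finishes, i.e. they interleave.
```

Because the tasklet is an ordinary generator, a test can also drive it to completion directly and inspect the result, which is why MyTest above stays synchronous.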
So this style of refactoring using futures is all win for very little effort. With little more than copying and pasting code sections, you can get tremendous latency improvements through simultaneous I/O. And it's far easier to understand than continuation-passing-style asynchronous programming. In general I wish more APIs worked this way. Maybe Tornado could be paired with something similar for a non-App-Engine solution?
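Outside App Engine, Python's standard library futures (concurrent.futures) already support the same fan-out-and-join shape. The sketch below is a hypothetical analogue of the dashboard handler, with the fetch functions as made-up stand-ins for real blocking queries:

```python
# Same pattern as the NDB handler, using stdlib futures: kick off the
# independent fetches, then join their results into one context dict.
from concurrent.futures import ThreadPoolExecutor

def get_posts(user):
    # Stand-in for a blocking database query.
    return {'posts': ['post-by-%s' % user]}

def get_comments(user):
    return {'comments': ['comment-by-%s' % user]}

def build_dashboard_context(user):
    with ThreadPoolExecutor(max_workers=2) as executor:
        post_future = executor.submit(get_posts, user)  # returns a Future
        comment_future = executor.submit(get_comments, user)
        context = {}
        context.update(post_future.result())    # blocks until done
        context.update(comment_future.result())
    return context
```

The two submit() calls return immediately, so both queries are in flight before the first result() blocks, just as the two yields did in the NDB version.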
As an aside, this also illustrates why I'm not optimistic about Node.js's longevity. Programmers don't understand asynchronous programming, even when they've been warned. Futures are the sanest way to transition a synchronous project to async without using threads. When I see resistance to officially adopting the project that makes Node feel imperative, its future as an application platform looks grim. I think what gets popular is what's easy to learn and solves a real need; what lasts is what's easy to make robust and fast.