While attending Foo Camp this past weekend, I had the opportunity to chat with some smart folks at the forefront of data science. The conference kicked off with each person introducing themselves using three keywords. "Data" was the most popular keyword by far.
What surprised me in conversations was the lack of emphasis on how data science is primarily derivative in nature. When you're analyzing click streams or transaction logs, the goal is to extract some kind of understanding from the raw information. Fundamentally you're trying to get back to the intent or behavior of humans. Why aren't people more upfront about this disconnect?
Probably the most common interaction people have with data science is Google Analytics. It tells you how many visitors your website has, how they arrived there, which content is the most popular, etc. As the publisher (and pseudo-scientist) you navigate through this data and try to figure out what it means. The outcome is a plan of action. You gain the choice to embrace what's working and ditch what's not. In this way, data science is the beginning of the feedback loop that leads to better products and new strategies.
There's just one problem with all of this: website analytics are based on derivative information. They only let you view the world through a lens, intuiting the preferences of people by looking at correlations of activity. Why not just talk to your potential users instead? There are millions of people on the Internet, each of them with tens of thousands of opinions on the whole wealth of human existence. This is the big data of human opinion: literally billions of data points out there with the ground truth. The only hard part is collecting the information and making sure it's accurate.
Doing research by tapping into people's opinions should be a no-brainer. Before crunching logs, measure the boundaries of people's behaviors. Before building a product, listen to what people think are its most important attributes. Before spreading the word, make sure people understand your message. Before pitching a startup, determine people's potential interest. To draw a comparison: these days, if you launched a website without analytics, people would say you're crazy; you'd be flying blind. Not doing direct research should seem just as silly.
I'm Brett Slatkin and this is where I write about programming and related topics.
12 June 2012