Tuesday, September 11, 2012

Brute force computing doesn't replace models

Writing in The New York Times' Bits blog, Quentin Hardy notes that:

The brute force computing model is changing a lot of fields, with more to follow. It makes sense, in a world where more data is available than ever before, and even more is coming online, from a connected, sensor-laden world where data storage and server computing cycles cost almost nothing. In a sense, it is becoming a modification of the old “theorize-model-test-conclude” scientific method. Now the condition is to create and collect a lot of data, look for patterns amid the trash, and try to exploit the best ones.

I rather like the term "brute force computing."

On the one hand, it generalizes beyond Big Data to Big Compute as well. The common thread is that bits of storage and cycles of computing are cheap enough that they don't need to be applied judiciously. The article offers an example from Autodesk. "The idea is to try thousands of different conditions, like temperature, humidity, tensile strength or shape, in just a few seconds. Most of the outcomes will be lousy, a couple of them will probably affirm what a designer thought to begin with, and a few might deliver surprising insights no one had considered."
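
To make that concrete, here's a minimal sketch of what such a brute-force sweep might look like. Everything here is a hypothetical stand-in--the simulate function and the parameter grids are invented for illustration, not Autodesk's actual tooling:

    import itertools
    import random

    def simulate(temperature, humidity, tensile_strength):
        # Hypothetical stand-in for a real physics simulation:
        # returns a quality score for one design condition.
        return random.random() * tensile_strength / (1 + abs(temperature - 20) + humidity)

    # Enumerate thousands of condition combinations and score each one.
    temperatures = range(-40, 101, 5)    # degrees C
    humidities = range(0, 101, 10)       # percent
    strengths = range(100, 1001, 100)    # MPa

    results = [(simulate(t, h, s), (t, h, s))
               for t, h, s in itertools.product(temperatures, humidities, strengths)]

    # Most outcomes are lousy; surface the handful worth a designer's attention.
    for score, params in sorted(results, reverse=True)[:5]:
        print(score, params)

The point isn't the code; it's that when a simulation run costs effectively nothing, exhaustively enumerating the design space becomes a reasonable substitute for carefully choosing which conditions to test.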

On the other hand, "brute force computing" is a narrower term than Big Data, which really speaks to the speed and size of the data rather than the sophistication applied to its analysis. The application of sophisticated models to large realtime data streams may fall under Big Data--but it would be hard to call such an approach merely brute force. That there's such demand for data scientist skills is but one indicator that there's a lot more to data analytics than having a big server farm. Rather, the idea that useful results can simply fall out when lots of CPUs crank on lots of bytes is more akin to the one that Wired's Chris Anderson popularized in his provocative 2008 article "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete."

And that's where I'd have liked to see a bit more counterpoint in Hardy's article. It's not that lots of compute plus lots of data can't yield interesting results. But, as repeatedly discussed at conferences such as O'Reilly's Strata, it isn't that simple. The numbers often don't just speak for themselves. The right questions have to be asked, and the right models--however much they're then refined and tested with data and compute--have to be developed. "Brute force computing" has its place, but it has an even larger one when augmented with intelligence.
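
A quick, purely synthetic sketch of why the numbers don't speak for themselves: brute-force search over enough variables will dredge up "patterns" even in pure noise, which is exactly the failure mode that a well-posed question or a model guards against. The data below is random by construction, yet the search still finds an apparently strong correlation:

    import random
    import statistics

    random.seed(0)

    # 1,000 columns of pure noise, 50 observations each, plus a noise target.
    columns = [[random.gauss(0, 1) for _ in range(50)] for _ in range(1000)]
    target = [random.gauss(0, 1) for _ in range(50)]

    def correlation(xs, ys):
        # Pearson correlation coefficient.
        mx, my = statistics.mean(xs), statistics.mean(ys)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sum((x - mx) ** 2 for x in xs) ** 0.5
        sy = sum((y - my) ** 2 for y in ys) ** 0.5
        return cov / (sx * sy)

    # Brute force over every column finds "patterns" in noise.
    best = max(abs(correlation(col, target)) for col in columns)
    print(f"strongest correlation found in pure noise: {best:.2f}")

Test enough hypotheses against the same data and some will look good by chance alone; it takes a model, or at least a theory of what you're looking for, to tell the surprising insight from the statistical trash.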
