Wednesday, March 28, 2012

More data, more noise, or more rare events

We are in a data splurge. Everyone is interested in data: how to gather it efficiently, how to store it, and most importantly, what to make of it. Data plays a key part in everything from search and ad and movie recommendations to the development of social-media-based products. We want to know more and more about our users so that we can personalize more. So we collect more data, in an attempt to cover every aspect of our users' likes, preferences, and habits.

But how does one decide what data is worth collecting? Or do we just collect everything we can get our hands on? How does one strike a balance between collecting noisy data and capturing the informative events that will give us crucial insight?

