Is Data Quality Overrated?

The inherent limitations of this approach spurred the author of the article, Patrick Meier, and his team to enhance Ushahidi with a set of Twitter classifiers — algorithms that could automatically identify Tweets that were relevant and informative to the crisis at hand. For example, classifiers automatically categorize eyewitness reports, infrastructure-damage assessments, casualties, humanitarian needs, offers of help and so on.

But given the quality of incoming data — terse text with an emphasis on emotion rather than nicety of speech — what results can we expect? Not too bad, as it turns out; initial accuracy rates range between 70% and 90%. Meier and his team are now working on developing more sophisticated algorithms that can be trained to better interpret incoming messages, leading to continued improvements in accuracy. read more

Leave a Reply