Is Data Quality Overrated?

The inherent limitations of this approach spurred the author of the article, Patrick Meier, and his team to enhance Ushahidi with a set of Twitter classifiers — algorithms that could automatically identify Tweets that were relevant and informative to the crisis at hand. For example, classifiers automatically categorize eyewitness reports, infrastructure-damage assessments, casualties, humanitarian needs, offers of help and so on.
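
To make the idea of a message classifier concrete, here is a minimal sketch of one common technique, a naive Bayes text classifier with add-one smoothing. It is not the method Meier's team used, which this post does not describe. The training messages, the category names, and the function names `tokenize`, `train`, and `classify` are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training data: (message text, category label).
# Real crisis data would be far larger and much noisier.
TRAINING = [
    ("bridge collapsed on main road", "infrastructure_damage"),
    ("power lines down and road blocked", "infrastructure_damage"),
    ("we need water and food urgently", "humanitarian_need"),
    ("family needs shelter and medicine", "humanitarian_need"),
    ("i can offer a spare room and blankets", "offer_of_help"),
    ("volunteers available to deliver food", "offer_of_help"),
]


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def train(examples):
    """Count words per label, messages per label, and the vocabulary."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return the label with the highest naive Bayes log-probability."""
    word_counts, label_counts, vocab = model
    total_messages = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        # Log prior: the share of training messages carrying this label.
        score = math.log(count / total_messages)
        # Add-one (Laplace) smoothing keeps unseen words from zeroing a label.
        denominator = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][word] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING)
print(classify("road blocked by collapsed bridge", model))
print(classify("need food and water", model))
```

With only six training messages this is a toy, but it shows the basic shape of the approach: learn word frequencies per category from labeled examples, then assign each new message to the category that makes its words most likely.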

But given the quality of the incoming data (terse text that favors emotion over careful phrasing), what results can we expect? Not too bad, as it turns out: initial accuracy rates range between 70% and 90%. Meier and his team are now developing more sophisticated algorithms that can be trained to better interpret incoming messages, leading to continued improvements in accuracy.
