Real Life Experiment Using Machine Learning to Classify News Stories

I wonder if anyone's done some analytic classification of the daily news.

If I were to classify stories into categories, I'd start with a few basic buckets.

  • Distraction
  • Terror
  • Helplessness
  • Joy

A distracting article would be something like the OJ Simpson trial, or some zany political candidate making outlandish claims with no real intention of running.  These stories capture society's attention with shocking revelations, serving as entertainment and a distraction from the real issues.

A terror article would cover a downed plane, a murder, or some other unfortunate event somewhere on the planet, shocking the audience and leaving it in perpetual fear.

A helplessness article would cover a distressing, ongoing situation, such as global warming, rampant inflation, an eroding middle class, or corruption, where the audience has absolutely no control or influence over the outcome, creating feelings of despair.

A joyful article would be one where an authority figure or a random person did some act of kindness with no expectation of anything in return.  These are quite rare, and give the viewer the occasional uplift.

We could add a few more buckets to capture all events in the media, perhaps.  Then feed the news articles from the past 50 years into a Hadoop cluster, run some machine learning jobs to classify each story into a neatly defined bucket, and extract the ratio of stories per bucket.  Then run some statistical analysis on the result set to produce some meaningful insight into how the news portrays its stories.  And perhaps link that data to the stock market, unemployment rates, or key events during the same time frame to draw possible conclusions.
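
To make that a bit more concrete, here's a rough sketch of the classify-then-count step in plain Python.  It's only an illustration: the training headlines are made up by hand, the bucket names follow the list above, and the tiny Naive Bayes model stands in for whatever machine learning job would really run on the cluster.  The function names (train, classify, bucket_ratios, pearson) are my own inventions for this sketch, not part of any library.

```python
import math
from collections import Counter, defaultdict

# Toy, hand-written training headlines: purely illustrative, not real data.
TRAINING = [
    ("celebrity trial verdict shocks viewers", "distraction"),
    ("candidate makes outlandish claim at rally", "distraction"),
    ("plane crash kills dozens", "terror"),
    ("murder suspect still at large", "terror"),
    ("inflation keeps rising as middle class shrinks", "helplessness"),
    ("corruption probe stalls again", "helplessness"),
    ("stranger pays for family groceries", "joy"),
    ("officer buys shoes for homeless man", "joy"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def train(examples):
    """Count words per bucket for a multinomial Naive Bayes model."""
    word_counts = defaultdict(Counter)
    bucket_counts = Counter()
    vocab = set()
    for text, bucket in examples:
        bucket_counts[bucket] += 1
        for word in tokenize(text):
            word_counts[bucket][word] += 1
            vocab.add(word)
    return word_counts, bucket_counts, vocab


def classify(text, model):
    """Return the bucket with the highest log-probability for this text."""
    word_counts, bucket_counts, vocab = model
    total_docs = sum(bucket_counts.values())
    best_bucket, best_score = None, -math.inf
    for bucket in sorted(bucket_counts):
        score = math.log(bucket_counts[bucket] / total_docs)
        total_words = sum(word_counts[bucket].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # words never seen in training are ignored
            # Add-one (Laplace) smoothing so unseen bucket/word pairs are not zero.
            score += math.log(
                (word_counts[bucket][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_bucket, best_score = bucket, score
    return best_bucket


def bucket_ratios(stories, model):
    """stories is a list of (year, text); returns {year: {bucket: share}}."""
    per_year = defaultdict(Counter)
    for year, text in stories:
        per_year[year][classify(text, model)] += 1
    return {
        year: {bucket: n / sum(counts.values()) for bucket, n in counts.items()}
        for year, counts in per_year.items()
    }


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric series."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


model = train(TRAINING)
```

With something like this, bucket_ratios gives the yearly share of each bucket, and pearson could compare, say, the yearly "terror" share against an unemployment series for the same years.  A correlation like that would only be a starting point for digging, not a conclusion in itself.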

Perhaps we might conclude that 60 straight days of rain in Florida, which is quite unprecedented, followed by a hurricane in the Gulf, amplified the damage and resulted in excessive claims, which caused the insurance companies to double their rates.  Something like that.

And then we find a similar set of events, like the half dozen hurricanes in a single year last decade, that produced the same results.  That's how you find patterns and draw conclusions.

Or you do trend analysis and determine that it rains on every major holiday in the US, based on conclusive evidence derived from the experiment.  Once you have the data to support your claim, you dig in to find cause and effect, do root cause analysis, and cross-reference other data sets.  And link that to the hurricane in New Orleans and the tsunami in Japan and the hurricane in Haiti.  Potential for mind-blowing conclusions.
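
The holiday rain idea is easy to sanity check before digging for causes.  Here's a minimal sketch, assuming you already have one record per day saying whether it was a holiday and whether it rained.  The record format and the function names are assumptions of mine for illustration.  It compares the rain rate on holidays against other days, then shuffles the holiday labels to estimate how often a gap that large would show up by pure chance (a simple permutation test).

```python
import random


def rain_rate(days):
    """Share of days with rain; days is a list of (is_holiday, rained)."""
    return sum(1 for _, rained in days if rained) / len(days)


def holiday_rain_gap(records):
    """Rain rate on holidays minus rain rate on all other days."""
    holidays = [r for r in records if r[0]]
    others = [r for r in records if not r[0]]
    return rain_rate(holidays) - rain_rate(others)


def permutation_p_value(records, trials=2000, seed=0):
    """Fraction of label shuffles whose gap is at least the observed gap."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    observed = holiday_rain_gap(records)
    labels = [is_holiday for is_holiday, _ in records]
    rained = [did_rain for _, did_rain in records]
    hits = 0
    for _ in range(trials):
        rng.shuffle(labels)  # break any real link between holiday and rain
        if holiday_rain_gap(list(zip(labels, rained))) >= observed:
            hits += 1
    return hits / trials
```

A small value from permutation_p_value would suggest the holiday gap is unlikely to be a fluke of the sample, which is the point where looking for cause and effect starts to make sense.  It assumes the data contains at least one holiday and one non-holiday.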

This is simply a thought experiment on how machine learning could be used in a real life scenario as a tool to derive meaningful insights.