by Niranjan Thomas


NEW YORK – Can an aggregated news data set help explain a single catastrophic event and its evolution?

In an era where “data is king,” organizations (both non-profit and for-profit) may be overlooking one of the most powerful types of data available to them: news stories. Aggregating news data from millions of articles can yield insights into meaningful problems that other data sets struggle to address.

To understand how news stories can be applied to a wide variety of events for the benefit of society (and how the same methodology can provide insights into business situations), we examined whether the Dow Jones DNA platform, along with our partners, could show how impact and responses evolved after Hurricane Harvey, which struck the Gulf Coast of the United States in August 2017, and the resulting network effects.

Analysts used the DNA Snapshot API to extract articles associated with Hurricane Harvey from July 2017 to December 2017, a period that began one month before the event and ended four months later.

The next task was understanding the roles and relationships of the different organizations that participated in the recovery efforts.

Presenting the data visually makes these relationships easier to grasp. In this case, the graph represented the entities as circles and the relationships as lines. The more lines that passed through an entity, the greater its impact during the event.


This approach provided a compelling, visual way to immediately identify the major players in the recovery effort. It was also possible to see which entities were related to each other based on the number of lines that connected them on the graph.
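The circles-and-lines reading described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in the Dow Jones analysis: it assumes each article has already been reduced to the set of entities it mentions, it treats a shared mention as a "line," and the organization names and sample articles are invented placeholders.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: each article reduced to the set of entities it mentions.
articles = [
    {"Org A", "Org B", "Org C"},
    {"Org A", "Org B"},
    {"Org A", "Org C"},
    {"Org B", "Org C"},
    {"Org A", "Org D"},
]

def co_mention_edges(articles):
    """Count how many articles mention each pair of entities together."""
    edges = Counter()
    for entities in articles:
        # Sort so that each pair is always stored in the same order.
        for a, b in combinations(sorted(entities), 2):
            edges[(a, b)] += 1
    return edges

def degree(edges):
    """Count the distinct lines (edges) touching each entity."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

edges = co_mention_edges(articles)  # line weights: how strongly two entities are related
deg = degree(edges)                 # line counts: a rough proxy for an entity's prominence
```

Here `deg.most_common()` would rank entities by how many lines touch them, and `edges.most_common()` would rank pairs by how often they appear together.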

In August 2017, during the first few days of the hurricane, three topic clusters emerged.

In the graph, these topics were given different colors to make them easy to identify.

RED represented forecasts:
These were reports of wind speeds and the issuing of warnings, dominated by the U.S. National Hurricane Center.

PURPLE represented impact:
These were reports about the potential, and actual, impact of the hurricane. The articles were denoted by language such as “pummeled Texas with rain” and “disrupted shipments.”

GREEN represented associations:
These were parallels and comparisons made with Hurricane Katrina in 2005, such as death tolls and recovery efforts. The stories used language such as “evoking memories of” Hurricane Katrina.
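The article does not say how the three clusters were derived. As a rough stand-in, simple phrase matching (not true clustering) can show how quoted language maps to a topic. The cue lists below are assumptions for illustration; only the quoted phrases come from the examples above.

```python
# Hypothetical cue phrases per topic; an assumption, not the platform's taxonomy.
TOPIC_CUES = {
    "forecasts": ["wind speed", "warning"],
    "impact": ["pummeled", "disrupted shipments"],
    "associations": ["evoking memories of", "katrina"],
}

def tag_topics(text):
    """Return every topic whose cue phrases appear in the text (case-insensitive)."""
    lowered = text.lower()
    return [
        topic
        for topic, cues in TOPIC_CUES.items()
        if any(cue in lowered for cue in cues)
    ]
```

A story can match more than one topic, or none, which is one reason a real analysis would likely rely on something more robust than fixed phrases.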

By including the dimension of time, as denoted by the dates on the news stories, the graphs showed how the relationships between entities changed as the event unfolded. This provided vital insight into how the organizations worked together in different phases of the relief effort.
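One way to picture the time dimension is to build a separate set of lines for each month and compare them. The sketch below assumes each story carries a publication date and a set of mentioned entities; the records and names are invented placeholders, and grouping by calendar month is an arbitrary choice for illustration.

```python
from collections import defaultdict
from datetime import date
from itertools import combinations

# Hypothetical records: (publication date, entities mentioned in the story).
records = [
    (date(2017, 8, 26), {"Org A", "Org B"}),
    (date(2017, 8, 28), {"Org A", "Org C"}),
    (date(2017, 9, 5), {"Org B", "Org C"}),
    (date(2017, 9, 12), {"Org B", "Org C", "Org D"}),
]

def edges_by_month(records):
    """Group co-mention pairs by the (year, month) in which they were reported."""
    monthly = defaultdict(set)
    for published, entities in records:
        key = (published.year, published.month)
        for pair in combinations(sorted(entities), 2):
            monthly[key].add(pair)
    return dict(monthly)

monthly = edges_by_month(records)

# Relationships reported in September that were absent in August.
new_in_september = monthly[(2017, 9)] - monthly[(2017, 8)]
```

Comparing consecutive months this way highlights which relationships appear or disappear as the relief effort moves from one phase to the next.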

Analysis of data extracted from the DNA platform showed that the Houston Food Bank played a significant leadership role in the recovery efforts and, by identifying major sources of donations to Hurricane Harvey aid, later showed how it could be more effective in future disaster recovery efforts. If the DNA platform had been used to identify this information, or any of the other secondary impacts it found, as it was being reported, the organizations might have been able to identify and address issues even more quickly.

As this shows, a reliable news data set can help people and organizations understand the long-term and secondary impacts of natural disasters.


This article is based on a white paper by Dow Jones, available here