Visual Text Analytics identifies urgency over global warming in COP24 tweets

Some of us are concerned about global warming and are pondering what collective and individual actions we can take to limit it. In October 2018, the United Nations Intergovernmental Panel on Climate Change (IPCC) issued its special report SR1.5, indicating that there are clear benefits to limiting warming to 1.5 °C compared to 2 °C. The SR1.5 report was to be a key scientific input to the COP24 conference. COP24 is the informal name for the 24th Conference of the Parties to the United Nations Framework Convention on Climate Change, held in Katowice, Poland, from 2 to 14 December 2018. A main agreement reached at COP24 was on how governments will measure, report and verify their emissions-cutting efforts.

The topic deserves wide-ranging analyses. To start, I analyzed 8,000 COP24 tweets posted on four different days using SAS Visual Text Analytics (VTA) and SAS Visual Analytics (VA). VTA is the SAS offering designed to extract insights from unstructured data at scale. Currently, VTA supports 32 languages, and it has an open architecture supporting third-party programming interfaces. Offered on the SAS Viya architecture, VTA combines the power of Natural Language Processing (NLP), Machine Learning (ML) and linguistic rules. Because of these capabilities, highly customized models can be developed in VTA. In this article, I use basic linguistic rules to find how renewable energy is mentioned in the tweets.

What is the IPCC?

The Intergovernmental Panel on Climate Change (IPCC) is the UN body for assessing the science related to climate change. It was established by the United Nations Environment Programme (UN Environment) and the World Meteorological Organization (WMO) in 1988 to provide policymakers with regular scientific assessments concerning climate change, its implications and potential future risks, as well as to put forward adaptation and mitigation strategies. It has 195 member states. As mentioned above, its October Special Report on Global Warming was to be a key scientific input to the COP24 conference. However, the U.S., Russia, Saudi Arabia, and Kuwait blocked its acceptance at the conference.

This article describes the main steps of this analysis. For more detailed instructions, see my previous article, Discover Main Topics on #MLKDayofService Tweets Using SAS Visual Text Analytics.

Step One: Bring Twitter Data into Visual Analytics

Twitter’s public API is used to import data. There are limits on which data, and how much of it, SAS can download using the Twitter public search API: each download returns at most 2,000 tweets. Still, this article shows the main steps to follow for this analysis.

In Visual Analytics, you need to specify the search term(s) to import (I used the hashtag #COP24), the maximum tweets to import (up to 2,000) and whether to import retweets.

Step Two: Create one table that contains all tweets to analyze

I used SAS Studio 5 to create one CAS table containing the 8,000 tweets. The table must contain an ID column that has unique values. I ran a Profile to make sure the docID column is unique. If it had not been unique, I would have created a new column with unique values.
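The logic of this step can be sketched in a few lines. The sketch below is in Python rather than SAS, and it is only an illustration of the idea, not the code I ran in SAS Studio: the function names, the `rowID` column and the sample rows are all hypothetical. It stacks several downloads into one table, checks whether the ID column is unique, and adds a sequential ID if it is not.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def combine_batches(batches: List[List[Row]]) -> List[Row]:
    """Stack several tweet downloads into one list of rows."""
    combined: List[Row] = []
    for batch in batches:
        combined.extend(batch)
    return combined


def ids_are_unique(rows: List[Row], id_column: str = "docID") -> bool:
    """Return True when no value in id_column appears more than once."""
    ids = [row[id_column] for row in rows]
    return len(ids) == len(set(ids))


def add_row_id(rows: List[Row], id_column: str = "rowID") -> List[Row]:
    """Return copies of the rows with a new sequential ID starting at 1."""
    return [{**row, id_column: i} for i, row in enumerate(rows, start=1)]


# Hypothetical sample data: two downloads that happen to share a docID.
day_one = [{"docID": "1", "Body": "first tweet"}]
day_two = [{"docID": "1", "Body": "second tweet"}]

table = combine_batches([day_one, day_two])
if not ids_are_unique(table):
    # Fall back to a generated identifier, as described in the text above.
    table = add_row_id(table)
```

In my case the docID column was already unique, so the fallback was not needed.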

Step Three: Bring the table into Visual Analytics and Set a Unique Row Identifier for your data

From the SAS Home Menu, I selected Visual Analytics (VA). I converted docID from a Measure to a Category variable, and set docID as the unique row identifier.

Step Four: Add Text Topics Object and assign Roles

Still working in VA, I followed the steps shown in the photo below:

On the left sidebar, clicked Objects
Selected Text Topics and dropped it onto the report page
Selected English
On the right sidebar, clicked Roles
Added “Body” to the Document collection role, selected English and clicked OK
Under Options, selected “Analyze document sentiment”

Note: VTA will auto-generate categories if you add categorical variables in this step, but each variable added must have fewer than 400 levels. My Twitter data didn’t have any variable that I could use at this step. In VTA 8.4 this limit is removed. Also, VTA 8.4 works with text that has emojis.

The photo below shows the out-of-the-box results obtained in Visual Analytics: a word cloud with the most frequent terms, and the list of main topics that were automatically extracted, along with their sentiments. Notice the topic with the terms “+la, de, en,+el”. Since these terms do not add value to the analysis, I removed them in later steps in VTA. I saved this VA report.

Step Five: Create a Visual Text Analytics project

From the SAS Home Menu, I selected Build Models, which took me to SAS Model Studio, where I selected New Project and entered the following information:

After saving, in the Data tab I assigned the Text role to body, and selected the Pipeline tab to get the default VTA pipeline. In the next steps I will show how to add customizations in the Concepts, Text Parsing and Categories nodes.

Step Six: Develop Customized Concepts

Concepts are key pieces of information: energy, renewable energy, solar, food, carbon claim, transportation, etc. Concepts are useful for analyzing information in context and for extracting useful information.

VTA provides 9 predefined concepts, such as dates, people, places, measurements and mentions of currency, whose rules are already written to save development time. In VTA you can also write rules for recognizing concepts that are important to you, thereby creating custom concepts. I wanted to create custom concepts to find specific information in the tweets. In later steps, I used one of them, “myEnergy”, to build a new category.

LITI rules are SAS proprietary linguistic rules, and they are used to develop custom concepts. Basic Boolean operators are used to define custom concepts and categories.

Here is a very brief introduction to them:

AND/NOT operators are applied to the whole document. There are other operators that search within the same sentence (SENT), the same paragraph (PARA) or a given number of terms (DIST)
Any line that starts with “#” is a comment
Use CLASSIFIER to match a literal sequence
Use CONCEPT_RULE to use Boolean and proximity operators. The term extracted should use _c{ }
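
To make the difference in scope concrete, here is a rough Python sketch. It is not LITI and not how VTA is implemented; it only mimics the idea that AND looks at the whole document while SENT requires the terms to share a sentence. The function names, the naive sentence splitter, the case-insensitive substring matching and the first sample text are all assumptions made for illustration.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def contains(text: str, term: str) -> bool:
    """Case-insensitive substring test (a simplification of token matching)."""
    return term.lower() in text.lower()


def match_and(document: str, terms: List[str]) -> bool:
    """Document scope: every term appears somewhere in the document."""
    return all(contains(document, t) for t in terms)


def match_sent(document: str, terms: List[str]) -> bool:
    """Sentence scope: every term appears within a single sentence."""
    return any(
        all(contains(sentence, t) for t in terms)
        for sentence in split_sentences(document)
    )


# Hypothetical text: the two terms occur in different sentences.
two_sentences = "Solar is booming. Our roof still needs repairs."
print(match_and(two_sentences, ["solar", "roof"]))   # True
print(match_sent(two_sentences, ["solar", "roof"]))  # False

# Tweet quoted later in this article: both terms share one sentence.
tweet = ("Bangladesh has been installing 50,000 #solar systems "
         "on residential roofs a month")
print(match_sent(tweet, ["solar", "roof"]))          # True
```

A narrower scope such as SENT usually gives fewer but more relevant matches than a document-wide AND.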

I developed six custom concepts so I could see the tweets that mention them: myClimateReport, myEnergy, myTransportation, myAction, myForest, and myFood. For some of these concepts, the photos below show the LITI rules and at least one tweet that VTA matched using those rules.

For the myClimateReport concept, notice that out of 8,000 tweets there were 162 where VTA found the literal sequences “Climate Report”, “ClimateReport” or “Intergovernmental Panel on Climate Change”.
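
As a rough analogy for what literal-sequence matching does, the Python sketch below counts how many texts contain at least one of the three sequences. It is only an illustration, not the LITI rules shown in the photo: the function names and sample tweets are hypothetical, and matching case-insensitively on substrings is my simplifying assumption.

```python
from typing import List

# The three literal sequences mentioned in the text above.
LITERALS = [
    "Climate Report",
    "ClimateReport",
    "Intergovernmental Panel on Climate Change",
]


def find_literals(text: str, literals: List[str] = LITERALS) -> List[str]:
    """Return the literal sequences that occur in the text."""
    lowered = text.lower()
    return [lit for lit in literals if lit.lower() in lowered]


def count_matching(tweets: List[str]) -> int:
    """Count tweets that contain at least one literal sequence."""
    return sum(1 for tweet in tweets if find_literals(tweet))


# Hypothetical sample tweets, not taken from the analyzed data.
sample = [
    "The new climate report is sobering #COP24",
    "Read the #ClimateReport before the talks",
    "Trains instead of planes #COP24",
]
print(count_matching(sample))  # 2
```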

In later steps, I used the myEnergy concept to build a category. The photo below shows some tweets that matched the myEnergy concept, recommending using bicycles (which I love) or installing solar systems on residential roofs. The complete tweet was: “Bangladesh has been installing 50,000 #solar systems on residential roofs a month”

The photo below shows that the myForest concept appeared in 56 tweets out of the 8,000 total, and that when I searched for the word “action”, VTA found 3 tweets out of those 56. I selected the tweet shown because it recommends something I can do: reduce consumption and promote renewables, while still planting some trees in my garden. Yes, I know the solution to this problem is not that simple, but it seems that there is something we all can do.

The concept myFood found 155 matches out of 8,000. The first tweet shown reminded me of a BBC article on Global Warming which listed several “individual actions.” I list them later in this post.

Step Seven: Work in the Text Parsing Node

In the Text Parsing Node, unstructured text is parsed and transformed into the structured form of a vector by using NLP and other tools. With the push towards Artificial Intelligence, SAS is applying more machine learning techniques, especially Deep Learning.
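
To give a feel for what "structured form of a vector" means, here is a minimal bag-of-words sketch in Python. It is far simpler than what the Text Parsing node does (no stemming, part-of-speech tagging or entity detection) and is not VTA's algorithm. The function names, the tokenizer pattern and the sample texts are assumptions; the stop terms are the ones I noticed in the VA topic list.

```python
import re
from collections import Counter
from typing import List, Set

# Terms from the uninformative topic noticed in Visual Analytics.
STOP_TERMS: Set[str] = {"la", "de", "en", "el"}


def tokenize(text: str, stop_terms: Set[str] = STOP_TERMS) -> List[str]:
    """Lowercase the text, split it into tokens, and drop stop terms."""
    tokens = re.findall(r"[a-z0-9#@']+", text.lower())
    return [tok for tok in tokens if tok not in stop_terms]


def build_vocabulary(documents: List[str]) -> List[str]:
    """Sorted list of every distinct token kept across all documents."""
    return sorted({tok for doc in documents for tok in tokenize(doc)})


def vectorize(document: str, vocabulary: List[str]) -> List[int]:
    """Count how often each vocabulary term occurs in the document."""
    counts = Counter(tokenize(document))
    return [counts[term] for term in vocabulary]


# Hypothetical sample texts.
docs = ["La accion de clima", "climate action now"]
vocab = build_vocabulary(docs)
print(vocab)                      # ['accion', 'action', 'clima', 'climate', 'now']
print(vectorize(docs[0], vocab))  # [1, 0, 1, 0, 0]
print(vectorize(docs[1], vocab))  # [0, 1, 0, 1, 1]
```

Each tweet becomes a row of counts over the same vocabulary, which is the kind of structure that downstream topic and category models can work with.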

At the end of Step Four, I noticed that Visual Analytics listed the topic “+la, de, en,+el”. Since these terms do not add value to the analysis, I removed them. In this node we can also see TermMaps. I was interested in which terms prompted people to retweet. Here is the TermMap for “rt”:

This made me wonder: who is Greta Thunberg? I searched for tweets with that literal string and found this:

I used the VTA default models for Sentiment Analysis which use SAS proprietary rules that identify and analyze terms, phrases, and character strings that imply sentiment. I continued customizations in the Topics Node.

Step Eight: Work in the Topics Node

These are the topics that VTA automatically generated:

I selected the topic “urgent appeal, appeal, strong agreement, nece, back” and searched for “greta.” (notice that VTA handles emojis 😊)

I also found a retweet to Greta when I selected the topic “climate strike, 14 december, just talk, fridaysf, +strike.”

I decided to promote these two topics to categories as shown in this photo:

Step Nine: Work in the Categories Node

It is possible to create new Categories using LITI and Boolean rules. Notice in the photo below four categories: two that I promoted from the Topics node and two that I created using my previously defined custom concepts. The category CleanEnergy uses the custom concept myEnergy. There are 618 tweets in this category. When I searched for the term Pittsburgh, I found three encouraging tweets.
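
The idea of building a category on top of a concept and then searching inside it can be sketched as follows. This Python sketch is only an analogy, not the category rule shown in the photo: the article does not list the myEnergy rules, so the term list below is purely hypothetical, as are the function names and sample tweets. Substring matching is also a simplification (for example, "wind" would match "window").

```python
from typing import List

# Hypothetical stand-in for the myEnergy concept; the real rules are LITI.
MY_ENERGY_TERMS = ["solar", "wind", "renewable"]


def matches_concept(text: str, terms: List[str]) -> bool:
    """True when the text mentions at least one concept term."""
    lowered = text.lower()
    return any(term in lowered for term in terms)


def assign_category(tweets: List[str], terms: List[str]) -> List[str]:
    """Keep the tweets that belong to the category defined by the concept."""
    return [tweet for tweet in tweets if matches_concept(tweet, terms)]


def search(tweets: List[str], query: str) -> List[str]:
    """Case-insensitive search within an already categorized set."""
    return [tweet for tweet in tweets if query.lower() in tweet.lower()]


# Hypothetical sample tweets, not taken from the analyzed data.
tweets = [
    "Pittsburgh commits to renewable power #COP24",
    "Solar rooftops are spreading fast",
    "Pittsburgh traffic is terrible today",
]
clean_energy = assign_category(tweets, MY_ENERGY_TERMS)
print(len(clean_energy))                   # 2
print(search(clean_energy, "pittsburgh"))  # ['Pittsburgh commits to renewable power #COP24']
```

Searching within the category, rather than across all tweets, is what filters out the unrelated mention in the third sample tweet.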

Recommendations on Individual Actions

This BBC article has great graphics and lists actions, recommended by the IPCC SR1.5 report, that an individual can take:

Buy less meat, milk, cheese and butter and more locally sourced seasonal food - and throw less of it away
Drive electric cars but walk or cycle short distances
Take trains and buses instead of planes
Use videoconferencing instead of business travel
Use a washing line instead of a tumble dryer
Insulate homes
Demand low carbon in every consumer product

Conclusion

Using VTA, I quickly analyzed COP24 tweets, extracted information on terms of interest to me, and found the main themes discussed at the conference.

In VTA one can develop highly customized models by combining the power of Natural Language Processing (NLP), Machine Learning (ML) and Linguistic Rules.

References

Model Studio: SAS® Visual Text Analytics 8.3 User’s Guide