Some of us are concerned about global warming and are pondering what collective and individual actions we can take to limit it. In October 2018, the United Nations Intergovernmental Panel on Climate Change (IPCC) issued its special report SR1.5, indicating that there are clear benefits to limiting warming to 1.5 °C compared to 2 °C. The SR1.5 report was to be a key scientific input to the COP24 conference. COP24 is the informal name for the 24th Conference of the Parties to the United Nations Framework Convention on Climate Change, held in Katowice, Poland, from 2-14 December 2018. A main agreement reached at COP24 was on how governments will measure, report and verify their emissions-cutting efforts.
The topic deserves wide-ranging analyses. To start, I analyzed 8,000 tweets posted during COP24 on four different days using SAS Visual Text Analytics (VTA) and SAS Visual Analytics (VA). VTA is the SAS offering designed to extract insights from unstructured data at scale. Currently, VTA supports 32 languages and has an open architecture supporting third-party programming interfaces. Offered on the SAS Viya architecture, VTA combines the power of Natural Language Processing (NLP), Machine Learning (ML) and Linguistic Rules. Because of these capabilities, highly customized models can be developed in VTA. In this article, I use basic linguistic rules to find how renewable energy is mentioned in the tweets.
What is the IPCC?
The Intergovernmental Panel on Climate Change (IPCC) is the UN body for assessing the science related to climate change. It was established by the United Nations Environment Programme (UN Environment) and the World Meteorological Organization (WMO) in 1988 to provide policymakers with regular scientific assessments concerning climate change, its implications and potential future risks, as well as to put forward adaptation and mitigation strategies. It has 195 member states. As mentioned above, its October Special Report on Global Warming was to be a key scientific input to the COP24 conference. However, the U.S., Russia, Saudi Arabia, and Kuwait blocked its acceptance at the conference.
This article describes the main steps of this analysis. If you would like to see more detailed instructions, you could check my previous article, Discover Main Topics on #MLKDayofService Tweets Using SAS Visual Text Analytics.
I imported the data through Twitter’s public search API. The API limits both which data and how much data SAS can download: each download returns at most 2,000 tweets. Still, this article shows the main steps to follow for this analysis.
In Visual Analytics, you need to specify the search term(s) to import (I used the hashtag #COP24), the maximum tweets to import (up to 2,000) and whether to import retweets.
I used SAS Studio 5 to create one CAS table containing the 8,000 tweets. The table must contain an ID column that has unique values. I ran a Profile to make sure the docID column was unique. Had it not been unique, I would have created a column with unique values.
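For readers who want to see the logic of this step outside SAS, here is a minimal Python sketch. It is not the SAS Studio code used for this article. It only illustrates the idea of combining several downloads into one table and falling back to a freshly generated ID column when the existing IDs are not unique. The function name `combine_batches`, the field names, and the toy tweets are assumptions made for illustration.

```python
from itertools import chain


def combine_batches(batches, id_key="docID"):
    """Concatenate tweet batches into one table with a unique ID per row."""
    # Copy each row so the original batches are left untouched.
    rows = [dict(row) for row in chain.from_iterable(batches)]
    ids = [row.get(id_key) for row in rows]
    if None in ids or len(set(ids)) != len(ids):
        # IDs are missing or duplicated: assign a fresh sequential ID instead.
        for new_id, row in enumerate(rows, start=1):
            row[id_key] = new_id
    return rows


# Toy stand-ins for the downloads (hypothetical data, one tweet per day).
day1 = [{"docID": 1, "body": "Time for climate action #COP24"}]
day2 = [{"docID": 1, "body": "More renewables, please #COP24"}]  # clashes with day1

table = combine_batches([day1, day2])
print([row["docID"] for row in table])  # [1, 2]
```

When the incoming IDs are already unique, the sketch leaves them as they are, which mirrors the decision described above: a new column is only needed if the profile shows duplicates.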
From the SAS Home Menu, I selected Visual Analytics (VA). I converted docid from a Measure to a Category variable, and indicated docid was the unique row identifier.
Continuing in VA, I followed the steps shown in the photo below:
Note: VTA will auto-generate categories if you add categorical variables in this step, but each variable added must have fewer than 400 levels. My Twitter data didn’t have any variable that I could use at this step. In VTA 8.4 this limit is removed. Also, VTA 8.4 works with text that has emojis.
The photo below shows the out-of-the-box results obtained in Visual Analytics: a word cloud with the most frequent terms, and the list of main topics that were automatically extracted, along with their sentiments. Notice the topic with the terms “+la, de, en,+el”. Since these terms do not add value to the analysis, I removed them in later steps in VTA. I saved this VA report.
From the SAS Home Menu, I selected Build Models, which took me to SAS Model Studio, where I selected New Project and entered the following information:
After saving, I went to the Data tab and assigned the role Text to the body variable, then selected the Pipeline tab to get the default VTA pipeline. In the next steps I will show how to add customizations in the Concepts, Text Parsing and Categories nodes.
Concepts are key pieces of information: energy, renewable energy, solar, food, carbon claim, transportation, etc. Concepts are useful for analyzing information in context and for extracting useful information.
VTA provides 9 predefined concepts, such as dates, people, places, measurements, and mentions of currency. Their rules are already written, which saves development time. In VTA you can also write rules for recognizing concepts that are important to you, thereby creating custom concepts. I wanted to create custom concepts to find specific information in the tweets. In later steps, I used one of them, “myEnergy”, to build a new category.
LITI rules are SAS proprietary linguistic rules, and they are used to develop custom concepts. Basic Boolean operators are used to define custom concepts and categories.
Here is a very brief introduction to them:
I developed six custom concepts so I could see the tweets that mention them: myClimateReport, myEnergy, myTransportation, myAction, myForest, and myFood. For some of these concepts, the photos below show the LITI rules and at least one tweet that VTA matched using those rules.
For the myClimateReport concept, notice that out of 8,000 tweets there were 162 where VTA found one of the literal sequences “Climate Report”, “ClimateReport” or “Intergovernmental Panel on Climate Change”.
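To illustrate what matching on literal sequences means, here is a small Python sketch of the same idea. It is not LITI syntax and it does not describe how VTA evaluates rules internally. It simply searches each tweet for any of the three literal strings listed for myClimateReport. The helper names, the toy tweets, and the choice of case-insensitive, whole-word matching are assumptions of this sketch.

```python
import re

# The three literal sequences listed for the myClimateReport concept.
MY_CLIMATE_REPORT = [
    "Climate Report",
    "ClimateReport",
    "Intergovernmental Panel on Climate Change",
]


def build_matcher(literals):
    """Compile one regex that matches any of the given literal strings."""
    # Try longer literals first so the longest alternative is preferred.
    ordered = sorted(literals, key=len, reverse=True)
    pattern = "|".join(re.escape(lit) for lit in ordered)
    # Lookarounds keep a literal from matching inside a longer word.
    return re.compile(rf"(?<!\w)(?:{pattern})(?!\w)", re.IGNORECASE)


def matching_tweets(tweets, literals):
    """Return the tweets that contain at least one of the literals."""
    matcher = build_matcher(literals)
    return [tweet for tweet in tweets if matcher.search(tweet)]


# Hypothetical tweets, for illustration only.
tweets = [
    "The new #ClimateReport is out today",
    "Intergovernmental Panel on Climate Change findings discussed at #COP24",
    "Plant more trees in your garden",
]
hits = matching_tweets(tweets, MY_CLIMATE_REPORT)
print(len(hits))  # 2
```

Counting the returned list is the analogue of the match count reported for the concept, although the real rules run on VTA's own tokenization rather than on a regular expression.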
In later steps, I used the myEnergy concept to build a category. The photo below shows some tweets that matched the myEnergy concept, recommending using bicycles (which I love) or installing solar systems on residential roofs. The complete tweet was: “Bangladesh has been installing 50,000 #solar systems on residential roofs a month”
The photo below shows that the myForest concept appeared in 56 tweets out of the 8,000 total, and that when I searched for the word “action”, VTA found 3 tweets out of those 56. I selected the tweet shown because it recommends something I can do: reduce consumption and promote renewables, while still planting some trees in my garden. Yes, I know the solution to this problem is not that simple, but it seems that there is something we all can do.
The concept myFood found 155 matches out of 8,000. The first tweet shown reminded me of a BBC article on Global Warming which listed several “individual actions.” I list them later in this post.
In the Text Parsing Node, unstructured text is parsed and transformed into the structured form of a vector by using NLP and other tools. With the push towards Artificial Intelligence, SAS is applying more machine learning techniques, especially Deep Learning.
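To make “the structured form of a vector” concrete, here is a minimal Python sketch that turns a few tweets into term-count vectors. VTA's parsing does far more than this (it applies NLP and other tools), so treat the sketch only as an illustration of the general idea. The tokenizer, the function names, and the toy tweets are assumptions. The stop list reuses the terms la, de, en and el from the noise topic seen earlier in Visual Analytics.

```python
import re
from collections import Counter

# Terms from the noise topic noted earlier; dropped before counting.
STOP_TERMS = {"la", "de", "en", "el"}


def tokenize(text):
    """Lowercase the text and split it into words, hashtags and mentions."""
    tokens = re.findall(r"[#@]?\w+", text.lower())
    return [tok for tok in tokens if tok not in STOP_TERMS]


def term_document_matrix(docs):
    """Return the sorted vocabulary and one term-count vector per document."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[count[term] for term in vocab] for count in counts]
    return vocab, matrix


# Hypothetical tweets, for illustration only.
docs = ["rt el clima #COP24", "solar power solar"]
vocab, matrix = term_document_matrix(docs)
print(vocab)   # ['#cop24', 'clima', 'power', 'rt', 'solar']
print(matrix)  # [[1, 1, 0, 1, 0], [0, 0, 1, 0, 2]]
```

Each row of the matrix is one tweet and each column is one term, which is the kind of structure that downstream steps such as topic discovery can work with.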
At the end of Step Four, I noticed that Visual Analytics listed the topic “+la, de, en,+el”. Since these terms do not add value to the analysis, I removed them. In this node we can also see TermMaps. I was interested in which terms prompted people to retweet. Here is the TermMap for “rt”:
This made me wonder: who is Greta Thunberg? I searched for tweets with that literal string and found this:
I used the VTA default models for Sentiment Analysis which use SAS proprietary rules that identify and analyze terms, phrases, and character strings that imply sentiment. I continued customizations in the Topics Node.
These are the topics that VTA automatically generated:
I selected the topic “urgent appeal, appeal, strong agreement, nece, back” and searched for “greta.” (notice that VTA handles emojis 😊)
and found a retweet to Greta when I selected the topic “climate strike, 14 december, just talk, fridaysf, +strike.”
I decided to promote these two topics to categories as shown in this photo:
It is possible to create new Categories using LITI and Boolean rules. Notice in the photo below four categories: two that I promoted from the Topics node and two that I created using my previously defined custom concepts. The category CleanEnergy uses the custom concept myEnergy. There are 618 tweets in this category. When I searched for the term Pittsburgh, I found three encouraging tweets.
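The way a category can be built from custom concepts with Boolean logic can be sketched in a few lines of Python. This is not VTA's category rule syntax. It is only an illustration, assuming each tweet has already been tagged with the set of custom concepts it matched. The function `in_category`, its parameters, and the second rule (energy but not transportation) are hypothetical. Only the link between CleanEnergy and myEnergy comes from the article.

```python
def in_category(found_concepts, any_of=(), all_of=(), none_of=()):
    """Evaluate a simple Boolean rule over the concepts found in one tweet."""
    found = set(found_concepts)
    has_any = not any_of or bool(found & set(any_of))  # OR over any_of
    has_all = set(all_of) <= found                      # AND over all_of
    has_none = not (found & set(none_of))               # NOT over none_of
    return has_any and has_all and has_none


# Hypothetical concept matches per tweet, keyed by docID.
tweet_concepts = {
    1: {"myEnergy"},
    2: {"myFood", "myForest"},
    3: {"myEnergy", "myTransportation"},
}

# CleanEnergy: the tweet matched the myEnergy concept.
clean_energy = [doc for doc, found in tweet_concepts.items()
                if in_category(found, any_of=["myEnergy"])]

# Hypothetical second rule: energy mentioned without transportation.
energy_only = [doc for doc, found in tweet_concepts.items()
               if in_category(found, all_of=["myEnergy"],
                              none_of=["myTransportation"])]

print(clean_energy)  # [1, 3]
print(energy_only)   # [1]
```

Counting the documents that satisfy a rule gives the size of the category, in the same spirit as the tweet count reported for CleanEnergy.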
This BBC article has great graphics and lists actions recommended by the IPCC SR1.5 report that an individual can take:
Using VTA, I quickly analyzed COP24 tweets, extracted information on terms of interest to me, and found the main themes discussed in the Conference.
In VTA one can develop highly customized models by combining the power of Natural Language Processing (NLP), Machine Learning (ML) and Linguistic Rules.
BBC’s article Final call to save the world from 'climate catastrophe'