Did you realize that your old-school SAS programming skills can be super handy in SAS Viya 3.5? This article walks through using SAS Studio on SAS Viya 3.5 to clean up messy data for mapping in SAS Visual Analytics 8.5 and provides sample code.
However. If we call a country by any other name, all kinds of problems ensue. We can’t merge our datasets. Furthermore, Visual Analytics won’t be able to map the data. Visual Analytics relies on standard country names for mapping.
To help you happily map data from a variety of data sources that use slightly different versions of country or region names, this article walks through how to address this problem in either:
Here’s what can happen. Let’s pull some data off the internet and tried to map it in VA 8.5. See below that some of the country names aren’t recognized by Visual Analytics, and thus can’t be mapped.
Select any image to see a larger version.
Mobile users: To view the images, select the "Full" version at the bottom of the page.
Bahamas, The seems easy to solve on our own. As I suspected, if we just get rid of , The then it works. But Cabo Verde? Russia? These seem pretty standard to me! But if we look at our map, we see no data for Russia.
First of all, it’s always a good practice to check our locale. We go to Visual Analytics Settings.
|
Where we choose the locale we want, in my case, English (United States). |
While we are in Settings, let’s double check our map provider. We see it is the default of OpenStreetMap. It will work similarly also for ESRI maps.
Also, while we’re at it, let’s check our browser locale (found at the bottom right of our browser--here Chrome)…
…just to make sure that we didn’t accidentally toggle to a French keyboard by hitting Windows key + space.
Let’s use an example with country population and country CO2 emissions from two separate internet data sets. To complicate things further, our data from the internet are not delimited, they are fixed-width data. So we cannot import them. Instead we use an input and list statement to first identify the structure of the data, and then we can use an input and infile statement to bring in the data.
While we are at it, we can leave out the commas in values, and concatenate the remainder of the data. We will use an input statement to convert character data to numeric.
The first step is to upload our data to the SAS Server, so that it is accessible to SAS Studio. We open the Explorer on the left pane and navigate to a file that we have write access to. In my case, on my course image, I have access to the intviya01\Home\sasuser.viya folder.
IMPORTANT: You must have write access to a folder on the SAS Server in order to upload files.
Next we right click and select upload files.
Select the + icon to upload, then navigate to where your files are located. Mine are located on my client server desktop, as shown below.
Let’s upload our WorldPopulation2018.txt file.
Now the file is visible in the Explorer tree along with the WorldCO2emissionsMetricTons2017est.txt file that I previously uploaded the same way.
Let’s create folders for our code and for our output datasets. Right click to accomplish this.
And name the folders.
The folder now also appears in our Explorer tree under intviya01\Home\sasuser.viya.
To get the path for our libname statement and our infile statement, we right click in the Explorer tree and select Insert as path, as shown below.
See below we have added the correct path to the dataset to our infile statement.
To help us appropriately create our input statement, we can begin by running
input;
list;
to see the data.
Looking at the log, we see a RULE line, which reveals where we need to stop and start our variables. For example, we see that the County starts @9 and the highest CO2 value starts @62 so we will set it @61 in case a future data pull includes a higher value.
We also see that the CO2 value has commas, which we will get rid of using our infile statement to bring it in in parts to concatenate in the next data step. Let’s bring all the variables in as characters, and we will create a numeric value for our CO2 value using a put statement. Notice that I like to use Temp (short for temporary) or Temp1, Temp2 etc. as a suffix for interim dataset names that I will later rename. Just my personal idiosyncratic method to help me keep things straight.
Now that we have brought in the data, we can save it as a SAS sas7bdat data set in our BethOut folder location.
Then we can simply use if-then statements to rename the Countries to the correct names. Ahhh, but what might those correct names be?? Luckily for us, there is a list of the Geographic Look up Values for SAS Visual Analytics 8.5.
We can create an “Other” category for those that don’t match up to any country. We also sometimes find country names with weird characters in them so that our if-then statements don’t work. For these I created substring variables and used those in my if-then statements.
Notice that once we have fixed the names to match my lookup table, we need to load our data set as a global table in CAS to make the data available to Visual Analytics. Here is a close up of example code to accomplish that. (For the complete code, including the if-then statements, read a little farther down.)
cas mysession sessopts=(caslib=casuser timeout=1800 locale="en_US");
proc casutil;
droptable casdata="CO2" quiet;
load data=BethOut.CO2 outcaslib = "casuser" casout="CO2" promote;
quit; *best practice is to close the CAS thread;
run;
To learn more, watch this video about caslibs.
Now let’s go to SAS Visual Analytics; we can use the hamburger icon to view our Applications menu.
And select Explore and Visualize to open the Visual Analytics interface.
Open a New Report (or go to the report I created previously if this is a repeated attempt). Select to add data.
Under the Data Sources, select cas-shared-default, and select the right arrow next to CASUSER(yourusername).
Be sure to hit the refresh button to get the latest version. Check the time on the data set to ensure that it refreshed, select the data set, and hit Ok.
Now lets pull our data into a Visual Analytics geomap, and we will be able to see if we still missed any names that need to be changed to the standard names. First change the Category Variable Country to Geography.
We select Geographic name or code lookup and Country or Region Names. We see that the only Country that did not map was our category of Other, so we are good!
If we did have any outstanding country names that needed to be corrected, we could go back to our SAS Studio program and repeat the steps outlined above.
Once we are satisfied with the results of our data cleaning program in SAS Studio, we can add other datasets and do the same thing.
And once the Country Names are standardized in separate data sets, then, ta dah, we can merge those data sets!
Here we merge CO2 emission data with Human Population data by country. The full program is below:
SAS Studio on Viya Sample Code:
*************************************************************************************************
* *
* COUNTRY NAMES: SAS Studio on Viya 3.5 Sample Code *
* Beth Ebersole *
* 22 JAN 2020 *
* *
*************************************************************************************************;
libname bethout '/home/beeber/sasuser.viya/BethOut';
***CO2 EMISSION DATA Example ***;
*Bring in the data;
data work.CO2temp;
infile '/home/beeber/sasuser.viya/WorldCO2emissionsMetricTons2017est.txt';
*input; *list;
input @1 delete1 $3. @9 CountryTemp $50. @61 part1 $3. @65 part2 $3. @69 part3 $3. @73 part4 $3. @87 delete2 4. @92 delete3 $4.;
run;
*Clean up the data and change nonstandard country names to VA 8.5 country names found at
http://support.sas.com/rnd/datavisualization/vageo/85/va85lookupvalues.html;
data work.CO2temp2;
set work.CO2temp (drop = delete1 delete2 delete3);
CO2temp = catt(part1, part2, part3, part4);
CO2 = input(CO2temp, 11.);
Country = CountryTemp;
First3Letters = substr(CountryTemp,1,3);
Last3Letters = substr(CountryTemp,length(CountryTemp)-2,3);
Country = CountryTemp;
if CountryTemp = "Bahamas, The" or CountryTemp = "The Bahamas" then Country = "Bahamas";
if CountryTemp = "Bolivia" then Country = "Bolivia, Plurinational State of";
if CountryTemp = "Brunei" then Country = "Brunei Darussalam";
if CountryTemp = "Burma" then Country = "Myanmar";
if CountryTemp = "Cabo Verde" or CountryTemp = "Republic of Cabo Verde" then Country = "Cape Verde";
if CountryTemp = "Congo, Democratic Republic of the" then Country = "Democratic Republic of Congo";
if CountryTemp = "Congo, Republic of the" then Country = "Congo";
if CountryTemp = "Cote d'Ivoire" then Country = "Ivory Coast";
if CountryTemp = "Czechia" then Country = "Czech Republic";
if CountryTemp = "Eswatini" then Country = "Swaziland";
if CountryTemp = "Falkland Islands (Islas Malvinas)" then Country = "Falkland Islands";
if CountryTemp = "Gambia, The" then Country = "Gambia";
if CountryTemp = "Guinea-Bissau" then Country = "Guinea Bissau";
if CountryTemp = "Jersey" then Country = "Bailiwick of Jersey";
if CountryTemp = "Korea, South" then Country = "South Korea";
if CountryTemp = "Korea, North" then Country = "North Korea";
if CountryTemp = "Micronesia, Federated States of" then Country = "Micronesia";
if CountryTemp = "Russia" then Country = "Russian Federation";
if CountryTemp = "Saint Helena, Ascension, and Tristan da Cunha" then Country = "Saint Helena, Ascension and Tristan da Cunha";
if CountryTemp = "Saint Vincent and the Grenadines" then Country = "Saint Vincent and Grenadines";
if CountryTemp = "Tanzania" then Country = "Tanzania, United Republic of";
if CountryTemp = "Timor-Leste" then Country = "Timor Leste";
if CountryTemp = "Turks and Caicos Islands" then Country = "Turks and Caicos";
if CountryTemp = "Venezuela" then Country = "Venezuela, Bolivarian Republic of";
if CountryTemp = "Vietnam" then Country = "Viet Nam";
if CountryTemp = "Virgin Islands" then Country = "US Virgin Islands";
if CountryTemp = "Hong Kong" or CountryTemp = "Macau" then Country = "Other";
if First3Letters = "Wes" and Last3Letters = "ank" then Country = "Other";
data work.CO2;
set work.CO2temp2 (drop = part1 part2 part3 part4 CO2temp First3Letters Last3Letters CountryTemp);
*proc contents data=work.CO2;
*proc print data=work.CO2;
run;
***POPULATION***;
data work.POPtemp;
infile '/home/beeber/sasuser.viya/WorldPopulation2018.txt' firstobs=2;
input @1 delete1 $5. @9 CountryTemp $50. @65 part1 $2. @68 part2 $3. @72 part3 $3. @76 part4 $3. @92 year $4.;
*input; * list;
run;
data work.POPtemp2;
set work.POPtemp (drop = delete1 year);
POPtemp = catt(part1, part2, part3, part4);
POP = input(POPtemp, 11.);
Country = CountryTemp;
First3Letters = substr(CountryTemp,1,3);
Last3Letters = substr(CountryTemp,length(CountryTemp)-2,3);
Country = CountryTemp;
if pop > 1384688985 then Country = "China";
if CountryTemp = "Bahamas, The" or CountryTemp = "The Bahamas" then Country = "Bahamas";
if CountryTemp = "Bolivia" then Country = "Bolivia, Plurinational State of";
if CountryTemp = "Brunei" then Country = "Brunei Darussalam";
if CountryTemp = "Burma" then Country = "Myanmar";
if CountryTemp = "Cabo Verde" or CountryTemp = "Republic of Cabo Verde" then Country = "Cape Verde";
if CountryTemp = "Congo, Democratic Republic of the" then Country = "Democratic Republic of Congo";
if CountryTemp = "Congo, Republic of the" then Country = "Congo";
if CountryTemp = "Cote d'Ivoire" then Country = "Ivory Coast";
if CountryTemp = "Czechia" then Country = "Czech Republic";
if CountryTemp = "Eswatini" then Country = "Swaziland";
if CountryTemp = "Falkland Islands (Islas Malvinas)" then Country = "Falkland Islands";
if CountryTemp = "Gambia, The" then Country = "Gambia";
if CountryTemp = "Guernsey" then Country = "Bailiwick of Guernsey";
if CountryTemp = "Guinea-Bissau" then Country = "Guinea Bissau";
if CountryTemp = "Holy See (Vatican City)" then Country = "Vatican City State";
if CountryTemp = "Jersey" then Country = "Bailiwick of Jersey";
if CountryTemp = "Korea, South" then Country = "South Korea";
if CountryTemp = "Korea, North" then Country = "North Korea";
if CountryTemp = "Micronesia, Federated States of" then Country = "Micronesia";
if CountryTemp = "Pitcairn Islands" then Country = "Pitcairn";
if CountryTemp = "Russia" then Country = "Russian Federation";
if CountryTemp = "Saint Helena, Ascension, and Tristan da Cunha" then Country = "Saint Helena, Ascension and Tristan da Cunha";
if CountryTemp = "Saint Vincent and the Grenadines" then Country = "Saint Vincent and Grenadines";
if CountryTemp = "Svalbard" then Country = "Svalbard and Jan Mayen";
if CountryTemp = "Tanzania" then Country = "Tanzania, United Republic of";
if CountryTemp = "Timor-Leste" then Country = "Timor Leste";
if CountryTemp = "Turks and Caicos Islands" then Country = "Turks and Caicos";
if CountryTemp = "Venezuela" then Country = "Venezuela, Bolivarian Republic of";
if CountryTemp = "Vietnam" then Country = "Viet Nam";
if CountryTemp = "Virgin Islands" then Country = "US Virgin Islands";
if CountryTemp = "Hong Kong" or CountryTemp = "Macau" then Country = "Other";
if First3Letters = "Wes" and Last3Letters = "ank" then Country = "Other";
if First3Letters = "Gaz" and Last3Letters = "rip" then Country = "Other";
data work.POP;
set work.POPtemp2 (drop = part1 part2 part3 part4 POPtemp First3Letters Last3Letters CountryTemp);
proc contents data=work.POP;
proc print data=work.POP;
run;
proc sort data = POP; by Country;
proc sort data = CO2; by Country;
data bethout.CO2byPOP;
merge POP CO2; by Country;
CO2byPOP = CO2/Pop;
if CO2 = . then delete;
if POP = . then delete;
proc print;
run;
cas mysession sessopts=(caslib=casuser timeout=1800 locale="en_US");
proc casutil;
droptable casdata="CO2byPOP" quiet;
load data=BethOut.CO2byPOP outcaslib = "casuser" casout="CO2byPOP" promote;
quit; *best practice is to close the CAS thread;
run;
Here are the Human Population results by country.
Notice that even if you have data for the whole world, and you have standardized your country names appropriately, you will see some areas are not filled in with any color. In some cases this is simply because it is part of another country, but not contiguous. For example, French Guiana is part of France, but is located in South America.
Not to be confused with nearby Guyana. Nor French Guinea in West Africa (now called Guinea-Bissau). Nor Guinea, nor Equatorial Guinea, nor Guernsey.
We use a filter in Visual Analytics to exclude Gibralter from the map. Gibralter is excluded because its CO2 emissions per person are orders of magnitude higher than any other country due to bunker fuel sales, https://www.theguardian.com/environment/2012/jul/16/gibraltar-carbon-emissions-distorted-table, and so it hides any other pertinent information in the map.
If you are an old SAS coder like I am, the SAS Studio code looks mighty familiar. Nothing is different except the way we access and save the data.
If you wish to do the same thing using SAS 9, the code is below. In this case we have saved the new standardized and merged dataset to the client desktop. From there we can import it into Visual Analytics easily.
SAS 9 Sample Code:
*************************************************************************************************
* *
* COUNTRY NAMES: SAS9 Sample Code *
* Beth Ebersole *
* 20 JAN 2020 *
* *
*************************************************************************************************;
libname bethout 'C:\Users\student\Desktop\BethOut';
***CO2 EMISSION DATA Example ***;
data work.CO2temp;
infile 'C:\Users\student\Desktop\World\WorldCO2emissionsMetricTons2017est.txt';
input @1 delete1 $3. @9 CountryTemp $50. @61 part1 $3. @65 part2 $3. @69 part3 $3. @73 part4 $3. @87 delete2 4. @92 delete3 $4.;
*list;
run;
data work.CO2temp2;
set work.CO2temp (drop = delete1 delete2 delete3);
CO2temp = catt(part1, part2, part3, part4);
CO2 = input(CO2temp, 11.);
Country = CountryTemp;
First3Letters = substr(CountryTemp,1,3);
Last3Letters = substr(CountryTemp,length(CountryTemp)-2,3);
Country = CountryTemp;
if CountryTemp = "Bahamas, The" or CountryTemp = "The Bahamas" then Country = "Bahamas";
if CountryTemp = "Bolivia" then Country = "Bolivia, Plurinational State of";
if CountryTemp = "Brunei" then Country = "Brunei Darussalam";
if CountryTemp = "Burma" then Country = "Myanmar";
if CountryTemp = "Cabo Verde" or CountryTemp = "Republic of Cabo Verde" then Country = "Cape Verde";
if CountryTemp = "Congo, Democratic Republic of the" then Country = "Democratic Republic of Congo";
if CountryTemp = "Congo, Republic of the" then Country = "Congo";
if CountryTemp = "Cote d'Ivoire" then Country = "Ivory Coast";
if CountryTemp = "Czechia" then Country = "Czech Republic";
if CountryTemp = "Eswatini" then Country = "Swaziland";
if CountryTemp = "Falkland Islands (Islas Malvinas)" then Country = "Falkland Islands";
if CountryTemp = "Gambia, The" then Country = "Gambia";
if CountryTemp = "Guinea-Bissau" then Country = "Guinea Bissau";
if CountryTemp = "Jersey" then Country = "Bailiwick of Jersey";
if CountryTemp = "Korea, South" then Country = "South Korea";
if CountryTemp = "Korea, North" then Country = "North Korea";
if CountryTemp = "Micronesia, Federated States of" then Country = "Micronesia";
if CountryTemp = "Russia" then Country = "Russian Federation";
if CountryTemp = "Saint Helena, Ascension, and Tristan da Cunha" then Country = "Saint Helena, Ascension and Tristan da Cunha";
if CountryTemp = "Saint Vincent and the Grenadines" then Country = "Saint Vincent and Grenadines";
if CountryTemp = "Tanzania" then Country = "Tanzania, United Republic of";
if CountryTemp = "Timor-Leste" then Country = "Timor Leste";
if CountryTemp = "Turks and Caicos Islands" then Country = "Turks and Caicos";
if CountryTemp = "Venezuela" then Country = "Venezuela, Bolivarian Republic of";
if CountryTemp = "Vietnam" then Country = "Viet Nam";
if CountryTemp = "Virgin Islands" then Country = "US Virgin Islands";
if CountryTemp = "Hong Kong" or CountryTemp = "Macau" then Country = "Other";
if First3Letters = "Wes" and Last3Letters = "ank" then Country = "Other";
data work.CO2;
set work.CO2temp2 (drop = part1 part2 part3 part4 CO2temp First3Letters Last3Letters CountryTemp);
*proc contents data=bethout.CO2;
*proc print data=bethout.CO2;
run;
***POPULATION***;
data work.POPtemp;
infile 'C:\Users\student\Desktop\World\WorldPopulation2018.txt' firstobs=2;
input @1 delete1 5. @9 CountryTemp $50. @65 part1 $2. @68 part2 $3. @72 part3 $3. @76 part4 $3. @87 delete2 6. @92;
*input; * list;
run;
data work.POPtemp2;
set work.POPtemp (drop = delete1 delete2);
POPtemp = catt(part1, part2, part3, part4);
POP = input(POPtemp, 11.);
Country = CountryTemp;
First3Letters = substr(CountryTemp,1,3);
Last3Letters = substr(CountryTemp,length(CountryTemp)-2,3);
Country = CountryTemp;
if pop > 1384688985 then Country = "China";
if CountryTemp = "Bahamas, The" or CountryTemp = "The Bahamas" then Country = "Bahamas";
if CountryTemp = "Bolivia" then Country = "Bolivia, Plurinational State of";
if CountryTemp = "Brunei" then Country = "Brunei Darussalam";
if CountryTemp = "Burma" then Country = "Myanmar";
if CountryTemp = "Cabo Verde" or CountryTemp = "Republic of Cabo Verde" then Country = "Cape Verde";
if CountryTemp = "Congo, Democratic Republic of the" then Country = "Democratic Republic of Congo";
if CountryTemp = "Congo, Republic of the" then Country = "Congo";
if CountryTemp = "Cote d'Ivoire" then Country = "Ivory Coast";
if CountryTemp = "Czechia" then Country = "Czech Republic";
if CountryTemp = "Eswatini" then Country = "Swaziland";
if CountryTemp = "Falkland Islands (Islas Malvinas)" then Country = "Falkland Islands";
if CountryTemp = "Gambia, The" then Country = "Gambia";
if CountryTemp = "Guernsey" then Country = "Bailiwick of Guernsey";
if CountryTemp = "Guinea-Bissau" then Country = "Guinea Bissau";
if CountryTemp = "Holy See (Vatican City)" then Country = "Vatican City State";
if CountryTemp = "Jersey" then Country = "Bailiwick of Jersey";
if CountryTemp = "Korea, South" then Country = "South Korea";
if CountryTemp = "Korea, North" then Country = "North Korea";
if CountryTemp = "Micronesia, Federated States of" then Country = "Micronesia";
if CountryTemp = "Pitcairn Islands" then Country = "Pitcairn";
if CountryTemp = "Russia" then Country = "Russian Federation";
if CountryTemp = "Saint Helena, Ascension, and Tristan da Cunha" then Country = "Saint Helena, Ascension and Tristan da Cunha";
if CountryTemp = "Saint Vincent and the Grenadines" then Country = "Saint Vincent and Grenadines";
if CountryTemp = "Svalbard" then Country = "Svalbard and Jan Mayen";
if CountryTemp = "Tanzania" then Country = "Tanzania, United Republic of";
if CountryTemp = "Timor-Leste" then Country = "Timor Leste";
if CountryTemp = "Turks and Caicos Islands" then Country = "Turks and Caicos";
if CountryTemp = "Venezuela" then Country = "Venezuela, Bolivarian Republic of";
if CountryTemp = "Vietnam" then Country = "Viet Nam";
if CountryTemp = "Virgin Islands" then Country = "US Virgin Islands";
if CountryTemp = "Hong Kong" or CountryTemp = "Macau" then Country = "Other";
if First3Letters = "Wes" and Last3Letters = "ank" then Country = "Other";
if First3Letters = "Gaz" and Last3Letters = "rip" then Country = "Other";
data work.POP;
set work.POPtemp2 (drop = part1 part2 part3 part4 POPtemp First3Letters Last3Letters CountryTemp);
*proc contents data=bethout.POP;
*proc print data=bethout.POP;
run;
proc sort data = POP; by Country;
proc sort data = CO2; by Country;
data bethout.CO2byPOP1;
merge POP CO2; by Country;
CO2byPOP = CO2/Pop;
if POP = . then delete;
if CO2 = . then delete;
proc print;
run;
In summary, we can use old school style SAS coding to clean and prepare our data and merge sets in SAS Viya 3.5 using SAS Studio. This is quite handy for old school SAS coders who are already familiar with traditional SAS code, because it is almost identical to the SAS Studio code.
We can also easily upload data to the SAS compute server using the Upload feature, from a shared drive or our local client. (Likewise we could download data from the SAS compute server.)
We can also easily load any data sets we create in SAS Studio to CAS, and access them from the Visual Analytics interface.
Much thanks to Uttam Kumar, Teri Patsilaras, and Allen Cunningham for their help!
NOTE: If you have a SAS Data Quality license or SAS Data Preparation license, see this article by my colleague Mary Kathryn Queen, which enables you to use condensed code to standardize country names.
More tips on mapping
More tips on bringing in data old school style
Data sources
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.