My colleague, Beth Ebersole, wrote an article on preparing messy data for mapping in Visual Analytics 8.5. In it she showed how you could do some "old school" SAS coding to prepare the data to match the Geographical Lookup Values for SAS Visual Analytics. In this article I will show how her code can be condensed if you have a SAS Data Quality license or SAS Data Preparation license (which includes Data Quality). I will also augment the code to output the ISO Numeric Code and ISO Map ID Value (ISO 2-Letter Code) in addition to the Country Name.
The original code used a series of If/Then statements to change the Country names to match the names on the Country or Region Name column on the Geographical Lookup Values for SAS Visual Analytics list.
Select any image to see a larger version.
Mobile users: To view the images, select the "Full" version at the bottom of the page.
If you have a SAS Data Quality license (or a SAS Data Preparation license which includes SAS Data Quality), then you can replace the If/Then statements changing the Country names with a dqstandardize call. The dqstandardize function uses a Standardization definition in the SAS Quality Knowledge Base (QKB) to put data in a common format. There are out-of-the-box definitions to standardize country information to the standard Country name, ISO number, and ISO 2-character format. The standardization definitions to use are: Country, Country (ISO Number), and Country (ISO 2 Char).
However, before you can use the dqstandardize function in your SAS 9 code, you must first use the DQLOAD AUTOCALL Macro. The syntax is:
%DQLOAD(DQLOCALE=(< QKB locale(s) to be used in your code>), DQLSETUPLOC='<path to QKB>');
Below is the declaration I made when modifying the original code:
Now that I have added the %DQLOAD statement, I can replace the series of If/Then statements with some dqstandardize statements. The syntax of the dqstandardize function is:
dqstandardize(<string to standardize>, <QKB standardization definition name>, <QKB locale>);
Below are the dqstandardize statements I used to replace the If/Then statements for changing the values of Country.
The first statement where I use the Country standardization definition is the replacement for the series of If/Then changing the values of Country. The next two statements augment the original code by returning the standardized values for the countries ISO Number and ISO 2-character code. This information is also on the Geographical Lookup Values for SAS Visual Analytics list and will provide more flexibility when using this data for mapping purposes in SAS Visual Analytics.
Finally, to handle cases where the Country values in the source files are not able to be standardized by the out-of-the-box QKB definitions, I added the following code:
Below is a snippet of the output table results by running the revised programming using the SAS Data Quality logic:
The full revised program using the SAS Data Quality logic is below:
****************************************************************************************
* COUNTRY NAMES: SAS9 Sample Code *
* Beth Ebersole and MK Queen *
* 04 FEB 2020 *
* *
****************************************************************************************
/* NEW DQ Code to define locale and QKB location */
%DQLOAD(DQLOCALE=(ENUSA), DQSETUPLOC='/home/beeber/qkb_ci_31/');
libname bethout '/home/beeber/sasuser.viya/BethOut';
***CO2 EMISSION DATA Example ***;
*Bring in the data;
data work.CO2temp;
infile '/home/beeber/sasuser.viya/WorldCO2emissionsMetricTons2017est.txt';
*input; *list;
input @1 delete1 $3. @9 CountryTemp $50. @61 part1 $3. @65 part2 $3. @69 part3 $3. @73 part4 $3. @87 delete2 4. @92 delete3 $4.;
run;
*Clean up the data and change nonstandard country names to VA 8.5 country names found at
http://support.sas.com/rnd/datavisualization/vageo/85/va85lookupvalues.html;
data work.CO2temp2;
set work.CO2temp (drop = delete1 delete2 delete3);
CO2temp = catt(part1, part2, part3, part4);
CO2 = input(CO2temp, 11.);
/*
First3Letters = substr(CountryTemp,1,3);
Last3Letters = substr(CountryTemp,length(CountryTemp)-2,3);
Country = CountryTemp;
if CountryTemp = "Bahamas, The" or CountryTemp = "The Bahamas" then Country = "Bahamas";
if CountryTemp = "Bolivia" then Country = "Bolivia, Plurinational State of";
if CountryTemp = "Brunei" then Country = "Brunei Darussalam";
if CountryTemp = "Burma" then Country = "Myanmar";
if CountryTemp = "Cabo Verde" or CountryTemp = "Republic of Cabo Verde" then Country = "Cape Verde";
if CountryTemp = "Congo, Democratic Republic of the" then Country = "Democratic Republic of Congo";
if CountryTemp = "Congo, Republic of the" then Country = "Congo";
if CountryTemp = "Cote d'Ivoire" then Country = "Ivory Coast";
if CountryTemp = "Czechia" then Country = "Czech Republic";
if CountryTemp = "Eswatini" then Country = "Swaziland";
if CountryTemp = "Falkland Islands (Islas Malvinas)" then Country = "Falkland Islands";
if CountryTemp = "Gambia, The" then Country = "Gambia";
if CountryTemp = "Guinea-Bissau" then Country = "Guinea Bissau";
if CountryTemp = "Jersey" then Country = "Bailiwick of Jersey";
if CountryTemp = "Korea, South" then Country = "South Korea";
if CountryTemp = "Korea, North" then Country = "North Korea";
if CountryTemp = "Micronesia, Federated States of" then Country = "Micronesia";
if CountryTemp = "Russia" then Country = "Russian Federation";
if CountryTemp = "Saint Helena, Ascension, and Tristan da Cunha" then Country = "Saint Helena, Ascension and Tristan da Cunha";
if CountryTemp = "Saint Vincent and the Grenadines" then Country = "Saint Vincent and Grenadines";
if CountryTemp = "Tanzania" then Country = "Tanzania, United Republic of";
if CountryTemp = "Timor-Leste" then Country = "Timor Leste";
if CountryTemp = "Turks and Caicos Islands" then Country = "Turks and Caicos";
if CountryTemp = "Venezuela" then Country = "Venezuela, Bolivarian Republic of";
if CountryTemp = "Vietnam" then Country = "Viet Nam";
if CountryTemp = "Virgin Islands" then Country = "US Virgin Islands";
if CountryTemp = "Hong Kong" or CountryTemp = "Macau" then Country = "Other";
if First3Letters = "Wes" and Last3Letters = "ank" then Country = "Other";
*/
/* BEGIN NEW DQ CODE */
Country = dqstandardize(CountryTemp, 'Country', 'ENUSA');
Country_ISO_Num = dqstandardize(CountryTemp, 'Country (ISO Number)', 'ENUSA');
Country_ISO_2Char = dqstandardize(CountryTemp, 'Country (ISO 2 Char)', 'ENUSA');
/* END NEW DQ CODE */
/* Assign value of "Other" to unresolved cases */
if ((length(Country_ISO_Num) > 3) OR (length(Country_ISO_2Char) > 2)) then do;
Country="Other";
Country_ISO_Num="Other";
Country_ISO_2Char="Other";
end;
data work.CO2;
/*set work.CO2temp2 (drop = part1 part2 part3 part4 CO2temp First3Letters Last3Letters CountryTemp);*/
set work.CO2temp2 (drop = part1 part2 part3 part4 CO2temp CountryTemp);
*proc contents data=work.CO2;
*proc print data=work.CO2;
run;
***POPULATION***;
data work.POPtemp;
infile '/home/beeber/sasuser.viya/WorldPopulation2018.txt' firstobs=2;
input @1 delete1 $5. @9 CountryTemp $50. @65 part1 $2. @68 part2 $3. @72 part3 $3. @76 part4 $3. @92 year $4.;
*input; * list;
run;
data work.POPtemp2;
set work.POPtemp (drop = delete1 year);
POPtemp = catt(part1, part2, part3, part4);
POP = input(POPtemp, 11.);
/* Country = CountryTemp;
First3Letters = substr(CountryTemp,1,3);
Last3Letters = substr(CountryTemp,length(CountryTemp)-2,3);
Country = CountryTemp;
if pop > 1384688985 then Country = "China";
if CountryTemp = "Bahamas, The" or CountryTemp = "The Bahamas" then Country = "Bahamas";
if CountryTemp = "Bolivia" then Country = "Bolivia, Plurinational State of";
if CountryTemp = "Brunei" then Country = "Brunei Darussalam";
if CountryTemp = "Burma" then Country = "Myanmar";
if CountryTemp = "Cabo Verde" or CountryTemp = "Republic of Cabo Verde" then Country = "Cape Verde";
if CountryTemp = "Congo, Democratic Republic of the" then Country = "Democratic Republic of Congo";
if CountryTemp = "Congo, Republic of the" then Country = "Congo";
if CountryTemp = "Cote d'Ivoire" then Country = "Ivory Coast";
if CountryTemp = "Czechia" then Country = "Czech Republic";
if CountryTemp = "Eswatini" then Country = "Swaziland";
if CountryTemp = "Falkland Islands (Islas Malvinas)" then Country = "Falkland Islands";
if CountryTemp = "Gambia, The" then Country = "Gambia";
if CountryTemp = "Guernsey" then Country = "Bailiwick of Guernsey";
if CountryTemp = "Guinea-Bissau" then Country = "Guinea Bissau";
if CountryTemp = "Holy See (Vatican City)" then Country = "Vatican City State";
if CountryTemp = "Jersey" then Country = "Bailiwick of Jersey";
if CountryTemp = "Korea, South" then Country = "South Korea";
if CountryTemp = "Korea, North" then Country = "North Korea";
if CountryTemp = "Micronesia, Federated States of" then Country = "Micronesia";
if CountryTemp = "Pitcairn Islands" then Country = "Pitcairn";
if CountryTemp = "Russia" then Country = "Russian Federation";
if CountryTemp = "Saint Helena, Ascension, and Tristan da Cunha" then Country = "Saint Helena, Ascension and Tristan da Cunha";
if CountryTemp = "Saint Vincent and the Grenadines" then Country = "Saint Vincent and Grenadines";
if CountryTemp = "Svalbard" then Country = "Svalbard and Jan Mayen";
if CountryTemp = "Tanzania" then Country = "Tanzania, United Republic of";
if CountryTemp = "Timor-Leste" then Country = "Timor Leste";
if CountryTemp = "Turks and Caicos Islands" then Country = "Turks and Caicos";
if CountryTemp = "Venezuela" then Country = "Venezuela, Bolivarian Republic of";
if CountryTemp = "Vietnam" then Country = "Viet Nam";
if CountryTemp = "Virgin Islands" then Country = "US Virgin Islands";
if CountryTemp = "Hong Kong" or CountryTemp = "Macau" then Country = "Other";
if First3Letters = "Wes" and Last3Letters = "ank" then Country = "Other";
if First3Letters = "Gaz" and Last3Letters = "rip" then Country = "Other"; */
/* Manual correction for missing China value from original code */
if pop > 1384688985 then CountryTemp = "China";
/* BEGIN NEW DQ CODE */
Country = dqstandardize(CountryTemp, 'Country', 'ENUSA');
Country_ISO_Num = dqstandardize(CountryTemp, 'Country (ISO Number)', 'ENUSA');
Country_ISO_2Char = dqstandardize(CountryTemp, 'Country (ISO 2 Char)', 'ENUSA');
/* END NEW DQ CODE */
/* Assign value of "Other" to unresolved cases */
if ((length(Country_ISO_Num) > 3) OR (length(Country_ISO_2Char) > 2)) then do;
Country="Other";
Country_ISO_Num="Other";
Country_ISO_2Char="Other";
end;
data work.POP;
/*set work.POPtemp2 (drop = part1 part2 part3 part4 POPtemp First3Letters Last3Letters CountryTemp);*/
set work.POPtemp2 (drop = part1 part2 part3 part4 POPtemp CountryTemp);
proc contents data=work.POP;
proc print data=work.POP;
run;
proc sort data = POP; by Country;
proc sort data = CO2; by Country;
data bethout.CO2byPOP_DQ;
merge POP CO2; by Country;
CO2byPOP = CO2/Pop;
if CO2 = . then delete;
if POP = . then delete;
proc print;
run;
cas mysession sessopts=(caslib=casuser timeout=1800 locale="en_US");
proc casutil;
droptable casdata="CO2byPOP_DQ" quiet;
load data=BethOut.CO2byPOP_DQ outcaslib = "casuser" casout="CO2byPOP_DQ" promote;
quit; *best practice is to close the CAS thread;
run;
cas mysession terminate; *best practice is to terminate cas session;
Special thanks to Beth Ebersole for sharing her data and code to aid in the writing of this article. For more information on SAS Data Quality, please refer to its documentation.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.