Hi,
I am trying to filter a huge dataset to locations of interest and here is the syntax I developed. I am not sure why SAS only is filtering my last line and not the whole locations I selected in my IF statements. Any thoughts on this is much appreciated:
data XXX;
set YYY;
IF GEOID=38001 THEN LOCATION="Adams County";
IF GEOID=38003 THEN LOCATION="Barnes County";
IF GEOID=38005 THEN LOCATION="Benson County";
IF GEOID=38007 THEN LOCATION="Billings County";
IF GEOID=38009 THEN LOCATION="Bottineau County";
IF GEOID=38011 THEN LOCATION="Bowman County";
IF GEOID=38013 THEN LOCATION="Burke County";
IF GEOID=38015 THEN LOCATION="Burleigh County";
IF GEOID=38017 THEN LOCATION="Cass County";
IF GEOID=38019 THEN LOCATION="Cavalier County";
IF GEOID=38021 THEN LOCATION="Dickey County";
IF GEOID=38023 THEN LOCATION="Divide County";
IF GEOID=38025 THEN LOCATION="Dunn County";
IF GEOID=38027 THEN LOCATION="Eddy County";
IF GEOID=38029 THEN LOCATION="Emmons County";
IF GEOID=38031 THEN LOCATION="Foster County";
IF GEOID=38033 THEN LOCATION="Golden Valley County";
IF GEOID=38035 THEN LOCATION="Grand Forks County";
IF GEOID=38037 THEN LOCATION="Grant County";
IF GEOID=38039 THEN LOCATION="Griggs County";
IF GEOID=38041 THEN LOCATION="Hettinger County";
IF GEOID=38043 THEN LOCATION="Kidder County";
IF GEOID=38045 THEN LOCATION="LaMoure County";
IF GEOID=38047 THEN LOCATION="Logan County";
IF GEOID=38049 THEN LOCATION="McHenry County";
IF GEOID=38051 THEN LOCATION="McIntosh County";
IF GEOID=38053 THEN LOCATION="McKenzie County";
IF GEOID=38055 THEN LOCATION="McLean County";
IF GEOID=38057 THEN LOCATION="Mercer County";
IF GEOID=38059 THEN LOCATION="Morton County";
IF GEOID=38061 THEN LOCATION="Mountrail County";
IF GEOID=38063 THEN LOCATION="Nelson County";
IF GEOID=38065 THEN LOCATION="Oliver County";
IF GEOID=38067 THEN LOCATION="Pembina County";
IF GEOID=38069 THEN LOCATION="Pierce County";
IF GEOID=38071 THEN LOCATION="Ramsey County";
IF GEOID=38073 THEN LOCATION="Ransom County";
IF GEOID=38075 THEN LOCATION="Renville County";
IF GEOID=38077 THEN LOCATION="Richland County";
IF GEOID=38079 THEN LOCATION="Rolette County";
IF GEOID=38081 THEN LOCATION="Sargent County";
IF GEOID=38083 THEN LOCATION="Sheridan County";
IF GEOID=38085 THEN LOCATION="Sioux County";
IF GEOID=38087 THEN LOCATION="Slope County";
IF GEOID=38089 THEN LOCATION="Stark County";
IF GEOID=38091 THEN LOCATION="Steele County";
IF GEOID=38093 THEN LOCATION="Stutsman County";
IF GEOID=38095 THEN LOCATION="Towner County";
IF GEOID=38097 THEN LOCATION="Traill County";
IF GEOID=38099 THEN LOCATION="Walsh County";
IF GEOID=38101 THEN LOCATION="Ward County";
IF GEOID=38103 THEN LOCATION="Wells County";
IF GEOID=38105 THEN LOCATION="Williams County";
else delete;
run;
That is the expected behavior. Technically, ELSE refers to the most recent IF/THEN condition only and tests:
To connect all of the conditions in the way you would like, you would need to repeat the word ELSE, such as:
IF GEOID=38097 THEN LOCATION="Traill County";
ELSE IF GEOID=38099 THEN LOCATION="Walsh County";
ELSE IF GEOID=38101 THEN LOCATION="Ward County";
ELSE IF GEOID=38103 THEN LOCATION="Wells County";
ELSE IF GEOID=38105 THEN LOCATION="Williams County";
else delete;
For readability, I would suggest switching to SELECT/WHEN instead:
select (GEOID);
when (38097) location = 'Traill County';
when (38099) location = 'Walsh County';
when (38101) location = 'Ward County';
when (38103) location = 'Wells County';
when (38105) location = 'Williams County';
otherwise delete;
end;
Of course you have to add to the program to specify your entire list, not just 5 of the counties.
You could just recode leveraging a feature of Proc Format. Your code will begin where it says /* Start here */
You don't really need to create the Location variable unless you wish to export the data.
data yyy;
input GEOID desc $20.;
datalines;
38001 Adams County
38003 Barnes County
38005 Benson County
38007 Billings County
38009 Bottineau County
38011 Bowman County
;
proc print;
run;
/* Start here */
data crtfmt;
set yyy;
start=GEOID;
label=desc;
fmtname='geof';
run;
proc format library=work cntlin=crtfmt fmtlib;
select geof;
run;
Data xxx;
set yyy;
LOCATION=put(GEOID,geof.) ;
run;
proc print;
ID GEOID;
var LOCATION;
run;
/* End */
That's an example where using formats is beneficial.
@ghosh is basically proposing the same but using a table as source for the format instead of an explicit definition. I felt for your use case and assumed level of SAS expertise below syntax is likely easier to absorb.
proc format;
value GeoId2County
38001="Adams County"
38003="Barnes County"
38005="Benson County"
38007="Billings County"
38009="Bottineau County"
38011="Bowman County"
38013="Burke County"
38015="Burleigh County"
38017="Cass County"
38019="Cavalier County"
38021="Dickey County"
38023="Divide County"
38025="Dunn County"
38027="Eddy County"
38029="Emmons County"
38031="Foster County"
38033="Golden Valley County"
38035="Grand Forks County"
38037="Grant County"
38039="Griggs County"
38041="Hettinger County"
38043="Kidder County"
38045="LaMoure County"
38047="Logan County"
38049="McHenry County"
38051="McIntosh County"
38053="McKenzie County"
38055="McLean County"
38057="Mercer County"
38059="Morton County"
38061="Mountrail County"
38063="Nelson County"
38065="Oliver County"
38067="Pembina County"
38069="Pierce County"
38071="Ramsey County"
38073="Ransom County"
38075="Renville County"
38077="Richland County"
38079="Rolette County"
38081="Sargent County"
38083="Sheridan County"
38085="Sioux County"
38087="Slope County"
38089="Stark County"
38091="Steele County"
38093="Stutsman County"
38095="Towner County"
38097="Traill County"
38099="Walsh County"
38101="Ward County"
38103="Wells County"
38105="Williams County"
other=" "
;
run;
data want;
set have;
if put(geoid,GeoId2County.) = " " then delete;
run;
You may want to investigate use of the FIPS related functions if you are looking for North Dakota areas.
Although only the the first two digits would identify a state.
I might suggest this for the filter.
data xxx; set yyy; if fipstate(floor(geoid/1000))='ND';
run;
You should also have a data set SASHELP.Zipcode that has a county level FIPS code in the variable County that is the numeric county number that the FIPS codes use and the COUNTYNM has the name of the county.
I use these pieces of information to make custom formats so I can display county names based on either the FIPS or ZIP which is again a numeric value.
I have custom numeric and character versions so I don't have to deal with as many conversions when provided external data in either format.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.