BookmarkSubscribeRSS Feed
raheleh22
Obsidian | Level 7

Hi, 

I am trying to filter a huge dataset to locations of interest and here is the syntax I developed. I am not sure why SAS only is filtering my last line and not the whole locations I selected in my IF statements. Any thoughts on this is much appreciated: 

data XXX;
set YYY;
IF GEOID=38001 THEN LOCATION="Adams County";
IF GEOID=38003 THEN LOCATION="Barnes County";
IF GEOID=38005 THEN LOCATION="Benson County";
IF GEOID=38007 THEN LOCATION="Billings County";
IF GEOID=38009 THEN LOCATION="Bottineau County";
IF GEOID=38011 THEN LOCATION="Bowman County";
IF GEOID=38013 THEN LOCATION="Burke County";
IF GEOID=38015 THEN LOCATION="Burleigh County";
IF GEOID=38017 THEN LOCATION="Cass County";
IF GEOID=38019 THEN LOCATION="Cavalier County";
IF GEOID=38021 THEN LOCATION="Dickey County";
IF GEOID=38023 THEN LOCATION="Divide County";
IF GEOID=38025 THEN LOCATION="Dunn County";
IF GEOID=38027 THEN LOCATION="Eddy County";
IF GEOID=38029 THEN LOCATION="Emmons County";
IF GEOID=38031 THEN LOCATION="Foster County";
IF GEOID=38033 THEN LOCATION="Golden Valley County";
IF GEOID=38035 THEN LOCATION="Grand Forks County";
IF GEOID=38037 THEN LOCATION="Grant County";
IF GEOID=38039 THEN LOCATION="Griggs County";
IF GEOID=38041 THEN LOCATION="Hettinger County";
IF GEOID=38043 THEN LOCATION="Kidder County";
IF GEOID=38045 THEN LOCATION="LaMoure County";
IF GEOID=38047 THEN LOCATION="Logan County";
IF GEOID=38049 THEN LOCATION="McHenry County";
IF GEOID=38051 THEN LOCATION="McIntosh County";
IF GEOID=38053 THEN LOCATION="McKenzie County";
IF GEOID=38055 THEN LOCATION="McLean County";
IF GEOID=38057 THEN LOCATION="Mercer County";
IF GEOID=38059 THEN LOCATION="Morton County";
IF GEOID=38061 THEN LOCATION="Mountrail County";
IF GEOID=38063 THEN LOCATION="Nelson County";
IF GEOID=38065 THEN LOCATION="Oliver County";
IF GEOID=38067 THEN LOCATION="Pembina County";
IF GEOID=38069 THEN LOCATION="Pierce County";
IF GEOID=38071 THEN LOCATION="Ramsey County";
IF GEOID=38073 THEN LOCATION="Ransom County";
IF GEOID=38075 THEN LOCATION="Renville County";
IF GEOID=38077 THEN LOCATION="Richland County";
IF GEOID=38079 THEN LOCATION="Rolette County";
IF GEOID=38081 THEN LOCATION="Sargent County";
IF GEOID=38083 THEN LOCATION="Sheridan County";
IF GEOID=38085 THEN LOCATION="Sioux County";
IF GEOID=38087 THEN LOCATION="Slope County";
IF GEOID=38089 THEN LOCATION="Stark County";
IF GEOID=38091 THEN LOCATION="Steele County";
IF GEOID=38093 THEN LOCATION="Stutsman County";
IF GEOID=38095 THEN LOCATION="Towner County";
IF GEOID=38097 THEN LOCATION="Traill County";
IF GEOID=38099 THEN LOCATION="Walsh County";
IF GEOID=38101 THEN LOCATION="Ward County";
IF GEOID=38103 THEN LOCATION="Wells County";
IF GEOID=38105 THEN LOCATION="Williams County";
else delete;
run;

4 REPLIES 4
Astounding
PROC Star

That is the expected behavior.  Technically, ELSE refers to the most recent IF/THEN condition only and tests:

 

  1. Was the most recent IF/THEN condition evaluated?  and
  2. Was it found to be false?

To connect all of the conditions in the way you would like, you would need to repeat the word ELSE, such as:

 

IF GEOID=38097 THEN LOCATION="Traill County";
ELSE IF GEOID=38099 THEN LOCATION="Walsh County";
ELSE IF GEOID=38101 THEN LOCATION="Ward County";
ELSE IF GEOID=38103 THEN LOCATION="Wells County";
ELSE IF GEOID=38105 THEN LOCATION="Williams County";
else delete;

For readability, I would suggest switching to SELECT/WHEN instead:

select (GEOID);
   when (38097)  location = 'Traill County';
   when (38099)  location = 'Walsh County';
   when (38101)  location = 'Ward County';
   when (38103)  location = 'Wells County';
   when (38105)  location = 'Williams County';
   otherwise delete;
end;

Of course you have to add to the program to specify your entire list, not just 5 of the counties.

ghosh
Barite | Level 11

You could just recode leveraging a feature of Proc Format.   Your code will begin where it says /* Start here */

You don't really need to create the Location variable unless you wish to export the data.

data yyy;                                         
    input GEOID desc $20.;                       
datalines;       
38001 Adams County
38003 Barnes County
38005 Benson County
38007 Billings County
38009 Bottineau County
38011 Bowman County
;
proc print;
run;
/* Start here */
data crtfmt;                                        
    set yyy;                                       
    start=GEOID;                                 
    label=desc;                                     
    fmtname='geof';                                
run;                                               
 
proc format library=work cntlin=crtfmt fmtlib;      
    select geof;                                   
run;
Data xxx;
   set yyy;
      LOCATION=put(GEOID,geof.) ;
run;
proc print;
ID GEOID;
var LOCATION;
run; 
/* End */

ghosh_0-1702149660458.png

 

Patrick
Opal | Level 21

That's an example where using formats is beneficial.

@ghosh is basically proposing the same but using a table as source for the format instead of an explicit definition. I felt for your use case and assumed level of SAS expertise below syntax is likely easier to absorb. 

proc format;
  value GeoId2County
    38001="Adams County"
    38003="Barnes County"
    38005="Benson County"
    38007="Billings County"
    38009="Bottineau County"
    38011="Bowman County"
    38013="Burke County"
    38015="Burleigh County"
    38017="Cass County"
    38019="Cavalier County"
    38021="Dickey County"
    38023="Divide County"
    38025="Dunn County"
    38027="Eddy County"
    38029="Emmons County"
    38031="Foster County"
    38033="Golden Valley County"
    38035="Grand Forks County"
    38037="Grant County"
    38039="Griggs County"
    38041="Hettinger County"
    38043="Kidder County"
    38045="LaMoure County"
    38047="Logan County"
    38049="McHenry County"
    38051="McIntosh County"
    38053="McKenzie County"
    38055="McLean County"
    38057="Mercer County"
    38059="Morton County"
    38061="Mountrail County"
    38063="Nelson County"
    38065="Oliver County"
    38067="Pembina County"
    38069="Pierce County"
    38071="Ramsey County"
    38073="Ransom County"
    38075="Renville County"
    38077="Richland County"
    38079="Rolette County"
    38081="Sargent County"
    38083="Sheridan County"
    38085="Sioux County"
    38087="Slope County"
    38089="Stark County"
    38091="Steele County"
    38093="Stutsman County"
    38095="Towner County"
    38097="Traill County"
    38099="Walsh County"
    38101="Ward County"
    38103="Wells County"
    38105="Williams County"
    other=" "
    ;
run;

data want;
  set have;
  if put(geoid,GeoId2County.) = " " then delete;
run;

 

ballardw
Super User

You may want to investigate use of the FIPS related functions if you are looking for North Dakota areas.

Although only the the first two digits would identify a state.

I might suggest this for the filter.

data xxx;
   set yyy;
   if fipstate(floor(geoid/1000))='ND';
run;

You should also have a data set SASHELP.Zipcode that has a county level FIPS code in the variable County that is the numeric county number that the FIPS codes use and the COUNTYNM has the name of the county.

I use these pieces of information to make custom formats so I can display county names based on either the FIPS or ZIP which is again a numeric value.

I have custom numeric and character versions so I don't have to deal with as many conversions when provided external data in either format.

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 711 views
  • 2 likes
  • 5 in conversation