Hi, I simply want to keep only the records with certain qualities.
Can somebody spot the mistake?
I don't understand why my code gives me completely wrong output.
Data Kea.Trails1b(COMPRESS = YES REUSE = YES);
length DIVISION_COUNTY_NME $30;
format DIVISION_COUNTY_NME $30.;
If DIVISION_COUNTY_NME in ('Denver','Douglas','Arapahoe', 'Jefferson', 'Adams', 'Bromfield', 'Elbert', 'Park',
'Clear Creek','Gilpin', 'Larimer' )
THEN COUNTY = "yes" ;
ELSE COUNTY = "no";
Set Kea.Trails1a;
run;
I read that it is easier to introduce a new variable and then delete records based on the value of that variable.
So here I want to keep the records for these counties: 'Denver','Douglas','Arapahoe', 'Jefferson', 'Adams', 'Bromfield', 'Elbert', 'Park',
'Clear Creek','Gilpin', 'Larimer'. They should all get COUNTY = "yes".
Why is this not happening? Please see the screenshot below.
Order of operations - move your SET statement up to either right after your DATA statement or after the LENGTH/FORMAT statements if you need to explicitly define those before you read in the data.
Data Kea.Trails1b(COMPRESS = YES REUSE = YES);
Set Kea.Trails1a;
length DIVISION_COUNTY_NME $30;
format DIVISION_COUNTY_NME $30.;
If DIVISION_COUNTY_NME in ('Denver','Douglas','Arapahoe', 'Jefferson', 'Adams', 'Bromfield', 'Elbert', 'Park',
'Clear Creek','Gilpin', 'Larimer' )
THEN COUNTY = "yes" ;
ELSE COUNTY = "no";
run;
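If the end goal is only to keep the matching records, the flag variable may not be needed at all. The following is a minimal sketch, assuming the same data set names and county spellings as in the question; a WHERE statement filters the rows as they are read. The comparison is case-sensitive, so the values in the list must match the data exactly.

```sas
/* Sketch: keep only the records for the listed counties.          */
/* Data set names and the county list are taken from the question. */
data Kea.Trails1b(compress=yes reuse=yes);
   set Kea.Trails1a;
   /* WHERE is applied while the input data set is read */
   where DIVISION_COUNTY_NME in ('Denver', 'Douglas', 'Arapahoe', 'Jefferson',
                                 'Adams', 'Bromfield', 'Elbert', 'Park',
                                 'Clear Creek', 'Gilpin', 'Larimer');
run;
```

Because WHERE is evaluated as the data are read rather than in statement order, its position relative to SET does not cause the problem described above.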
Try TRIM and see whether it makes any difference:
If trim(DIVISION_COUNTY_NME) in ('Denver','Douglas','Arapahoe', 'Jefferson', 'Adams', 'Bromfield', 'Elbert', 'Park', 'Clear Creek','Gilpin', 'Larimer' )
The SET statement is an EXECUTED statement, not just something that is used during the compilation of the data step. Where you place it in your program makes a difference.
You are creating the new variable COUNTY based on the values of DIVISION_COUNTY_NME from the previous observation.
Try this little program.
data test;
length new_name name $30 ;
new_name = name ;
set sashelp.class ;
run;
proc print;
var name new_name ;
run;
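Another way to see the same effect is to write the value to the log on both sides of the SET statement. This is only a sketch: the LENGTH statement is added so that NAME can be referenced before SET, and OBS=3 just keeps the log short.

```sas
data _null_;
   /* Declare NAME up front so it can be referenced before SET; */
   /* $8 matches its length in sashelp.class.                   */
   length name $8;
   put 'Before SET: ' _n_= name=;
   set sashelp.class(obs=3);
   put 'After SET:  ' _n_= name=;
run;
```

On the first iteration the value before SET should be blank. On later iterations it should be the name from the previous observation, because variables read by SET are retained until the next SET executes.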
Please see this example code for what was happening based on similar logic. The second data step shows the value that was compared and may help you understand why the second Adams was marked correctly.
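data example;
input letter $;
datalines;
a
b
c
d
e
f
;
run;
data example2;
/* Declare letter as character before it is first referenced ($8 is the INPUT default). */
length letter $8;
if letter in ('a','c') then found='yes';
lv = letter;
set example;
label lv='Actual value of letter compared';
run;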