Hi,
I have a dataset with a categorical variable filled with two-letter values. Some of the values are missing.
ID Categ_var
1 AD
1
1 BA
2 BR
2 CB
2
2 BR
3 AZ
I am trying to filter values of Categ_var that have a B in them, like this:
proc sql;
create table want as
select * from have
where categ_var not like '%B%';
quit;
The problem is that doing this causes SAS to remove the rows in which Categ_var is missing as well, but I want to keep those rows. I tried amending it like this:
proc sql;
create table want as
select * from have
where (categ_var=" " or categ_var not like '%B%');
quit;
But the same thing happened.
Any help is much appreciated.
where not(missing) categ_var and categ_var not like '%B%';
where not missing(categ_var) and categ_var not like '%B%';
Yup, that's what happens when I do two things at once and don't test.
Easy to do! 
where categ_var not like '%B%';
should not filter out missing values.
data want;
set have;
where indexc(categ_var,'B') = 0;
run;
Hi. I agree with Chris@NewZealand in that the LIKE condition is SQL does NOT delete the observations with missing values ...
data test;
input id categ_var :$2. @@;
datalines;
1 AD 1 . 1 BA 2 BR 2 CB 2 . 2 BR 3 AZ
;
title "MISSING VALUES STILL IN DATA SET";
proc sql;title "MISSING VALUES STILL IN DATA SET";
select * from test
where categ_var not like '%B%';
quit;
MISSING VALUES STILL IN DATA SET
id categ_var
1 AD
1
2
3 AZ
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
