Hi,
I have a dataset with a categorical variable filled with two-letter values. Some of the values are missing.
ID Categ_var
1 AD
1
1 BA
2 BR
2 CB
2
2 BR
3 AZ
I am trying to filter values of Categ_var that have a B in them, like this:
proc sql;
create table want as
select * from have
where categ_var not like '%B%';
quit;
The problem is that doing this causes SAS to remove the rows in which Categ_var is missing as well, but I want to keep those rows. I tried amending it like this:
proc sql;
create table want as
select * from have
where (categ_var=" " or categ_var not like '%B%');
quit;
But the same thing happened.
Any help is much appreciated.
where not(missing) categ_var and categ_var not like '%B%';
where not missing(categ_var) and categ_var not like '%B%';
Yup, that's what happens when I do two things at once and don't test.
Easy to do!
where categ_var not like '%B%';
should not filter out missing values.
data want;
set have;
where indexc(categ_var,'B') = 0;
run;
Hi. I agree with Chris@NewZealand in that the LIKE condition is SQL does NOT delete the observations with missing values ...
data test;
input id categ_var :$2. @@;
datalines;
1 AD 1 . 1 BA 2 BR 2 CB 2 . 2 BR 3 AZ
;
title "MISSING VALUES STILL IN DATA SET";
proc sql;title "MISSING VALUES STILL IN DATA SET";
select * from test
where categ_var not like '%B%';
quit;
MISSING VALUES STILL IN DATA SET
id categ_var
1 AD
1
2
3 AZ
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.