Hi,
I have a dataset with a categorical variable filled with two-letter values. Some of the values are missing.
ID Categ_var
1 AD
1
1 BA
2 BR
2 CB
2
2 BR
3 AZ
I am trying to filter values of Categ_var that have a B in them, like this:
proc sql;
create table want as
select * from have
where categ_var not like '%B%';
quit;
The problem is that doing this causes SAS to remove the rows in which Categ_var is missing as well, but I want to keep those rows. I tried amending it like this:
proc sql;
create table want as
select * from have
where (categ_var=" " or categ_var not like '%B%');
quit;
But the same thing happened.
Any help is much appreciated.
where not(missing) categ_var and categ_var not like '%B%';
where not missing(categ_var) and categ_var not like '%B%';
Yup, that's what happens when I do two things at once and don't test.
Easy to do!
where categ_var not like '%B%';
should not filter out missing values.
data want;
set have;
where indexc(categ_var,'B') = 0;
run;
Hi. I agree with Chris@NewZealand in that the LIKE condition is SQL does NOT delete the observations with missing values ...
data test;
input id categ_var :$2. @@;
datalines;
1 AD 1 . 1 BA 2 BR 2 CB 2 . 2 BR 3 AZ
;
title "MISSING VALUES STILL IN DATA SET";
proc sql;title "MISSING VALUES STILL IN DATA SET";
select * from test
where categ_var not like '%B%';
quit;
MISSING VALUES STILL IN DATA SET
id categ_var
1 AD
1
2
3 AZ
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.