Hi,
I want to delete rows that have missing values for a list of consecutive variables. For example, from a the a dataset like:
id cat var1 var2 var3 var4 stop
42 a 1 0 0 1 Y
43 b . . . . Y
44 a 1 . 0 1 N
45 b 0 . . 1 Y
I want to delete row 43 which is missing data from var1 to var 4.
Thank you!
My apologies. It looks like you can't use a -- in a WHERE statement. Which I guess makes sense, because it relies on the order of variables in the PDV.
It should work if you change to a subsetting IF, e.g.:
12 data want;
13 set have ;
14 if cmiss(OF var1 -- zvariable) < 5 ; *IF not where;
15 run;
NOTE: There were 1 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 1 observations and 3 variables.
Does it have to be missing values for all variables from var1 to var4?
Yes, missing all values from var 1 to 4.
Try this
data have;
input id cat $ var1 var2 var3 var4 stop $;
datalines;
42 a 1 0 0 1 Y
43 b . . . . Y
44 a 1 . 0 1 N
45 b 0 . . 1 Y
;
data want;
set have;
where n(var1-var4);
run;
Thank you. It didn't work. I forgot the add that the variable names can have different names. For example the names are var1 xvar2 yname3 zvariable. These are all numeric variables.
What defines the range of variables?
You can list them:
data want;
set have;
where n(var1, xvar2, yname3, zvariable) ;
run;
Or if you know the four variables are in order in the PDV you could use:
data want;
set have;
where n(of var1 -- zvariable) ;
run;
If you want to do it based on variable position (which is odd) I would run PROC CONTENTS to figure out the names of the variables in positions 1-4, then put that into a macro variable.
Thanks! I had to list all variables like the first option you gave. And when I ran it, I realized that in the middle of the range there is a character variable. Say yname3 is a character and not numeric. How will you handle that?
Do you want to include the character variable or exclude it?
If you want to exclude it, you can do:
where n(of var1 - numeric - zvariable) ;
That will only include the numeric variables in the range.
If you want to include the character variable, you can switch from the N function to the CMISS function, which can handle both numeric and character variables:
where cmiss(of var1 -- zvariable) < 5 ; *assuming there are 5 variables in this range;
This is very helpful. I would want to include character variables as well. But "where" with cmiss() is not working for character variables.
What do you mean "not working"? Do you get an error, or the wrong result?
Please try making a little example like what @PeterClemmensen posted where there is a data step for making a HAVE dataset with a few records, and then a WANT step using CMISS. If you get errors in the log, please post the log as well. CMISS should work with both numeric and character variables. The documentation for CMISS is: https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lefunctionsref/p1tth4ltf640din1ey86ubo2lky2.h...
This is the code:
data want;
set have;
where cmiss(var1 -- zvariable) <5;
run;
Where zvaribale is a character variable. I get this error in the log:
ERROR: WHERE clause operator requires numeric values.
When you use the -- or - list as an argument to a function, you need to use the word OF at the beginning. Try:
data want;
set have;
where cmiss(OF var1 -- zvariable) < 5 ;
run;
If you don't write OF, SAS will do subtraction:
var1 - (-zvariable)
The error you're getting is because SAS is trying to do subtraction with a character variable.
Thanks for your reply. I added the word OF as advised and I am getting this error in the log:
My apologies. It looks like you can't use a -- in a WHERE statement. Which I guess makes sense, because it relies on the order of variables in the PDV.
It should work if you change to a subsetting IF, e.g.:
12 data want;
13 set have ;
14 if cmiss(OF var1 -- zvariable) < 5 ; *IF not where;
15 run;
NOTE: There were 1 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 1 observations and 3 variables.
It worked! Thank you so much! This is extremely helpful.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.