Hi,
I am hoping you can help me. I have a large set of variables, where some of the variables have missing values (small example shown below). I only want to keep variables where there are no missing values, e.g. I need to delete variable B below.
How can I delete all variables with missing values?
Var A | Var B | Var C |
1 | 2 | 3 |
1 | 2 | 3 |
1 | 2 | 3 |
1 | 2 | 3 |
1 | 2 | 3 |
1 | missing | 3 |
1 | missing | 3 |
Thanks!
One way
data have;
input VarA VarB VarC;
datalines;
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
1 . 3
1 . 3
;
proc transpose data=have out=temp;
var _numeric_;
run;
data temp2;
set temp;
if nmiss(of col:)=0;
run;
proc transpose data=temp2 out=want(drop=_:);
id _name_;
var col:;
run;
One way
data have;
input VarA VarB VarC;
datalines;
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
1 . 3
1 . 3
;
proc transpose data=have out=temp;
var _numeric_;
run;
data temp2;
set temp;
if nmiss(of col:)=0;
run;
proc transpose data=temp2 out=want(drop=_:);
id _name_;
var col:;
run;
Thank you so much! This solved my problem 🙂
Hi @Birgithj,
Alternatively, you can adapt a technique that has been suggested by @yabwon for a similar question yesterday, e.g., like this:
ods output nlevels=_nlev(keep=tablevar nmisslevels);
proc freq data=have nlevels;
tables _all_ / noprint;
run;
data _null_;
call execute('data want; set have; drop _n_');
do until(last);
set _nlev end=last;
if nmisslevels then call execute(tablevar);
end;
call execute('; run;');
stop;
run;
(This works for numeric and character variables.)
Here is a way to do it:
data have;
input Var_A Var_B Var_C Var_D;
cards;
1 2 3 1
1 2 3 .
1 2 3 .
1 2 3 .
1 2 3 .
1 . 3 .
1 . 3 .
run;
%let inds=WORK.HAVE;
data _NULL_;
length missvars $2000;
dsid=open("&inds");
do i=1 to attrn(dsid,'nvars');
var=varname(dsid,i);
dsid2=open(catx(' ',"&inds(where=(",var,'is null))'));
if fetch(dsid2)=0 then
call catx(' ',missvars,var);
dsid2=close(dsid2);
end;
call symputx('missvars',missvars);
run;
%put &missvars;
Data want;
set have(drop=&missvars);
run;
For a task as simple as this, I see no reason to rewrite the original data set even once, let alone twice. Just auto-define a view VHAVE with the unneeded variables dropped and refer to it instead of HAVE in the processing downstream:
data have ;
input VarA VarB VarC VarD VarE VarF ;
cards ;
1 2 3 1 2 3
1 2 3 1 2 3
1 2 3 1 2 .
1 2 3 . 2 .
1 2 3 1 2 3
1 . 3 1 . 3
1 . 3 1 . 3
;
run ;
data _null_ ;
do until (z) ;
set have end = z ;
array v var: ;
length _d $ 32767 ;
do over v ;
if missing (v) and findw (_d, vname(v)) = 0 then _d = catx (" ", _d, vname (v)) ;
end ;
end ;
call execute (cats ("data vhave/view=vhave; set have(drop=", _d, "); run;")) ;
run ;
Now if you read VHAVE instead of HAVE, you'll have what you need. Note that the entire resource consumption with this method amounts to a single read through HAVE.
Kind regards
Paul D.
Do not use PROC TRANSPOSE if you have a big table. ( EDITED )
data have; input VarA VarB VarC; datalines; 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 . 3 1 . 3 ; ods select none; ods output nlevels=temp; proc freq data=have nlevels ; table _all_; run; ods select all; proc sql; select tablevar into : drop separated by ' ' from temp where nmisslevels>0; quit; data want; set have; drop &drop; run; proc print;run;
"Do not use PROC TRANSPOSE if you have a big table."
Agree; TRANSPOSE is quite memory-hungry. Also, if better performance is desired, MEANS does much better than FREQ with NLEVELS (and that without the burden of using ODS), particularly if the VAR variables are numerous; e,g.:
proc means noprint data = have ;
var var: ;
output out = temp (drop = _:) nmiss=;
run ;
data _null_ ;
set temp ;
array nn _numeric_ ;
call symputx ("drop", "") ;
do over nn ;
if nn then call symputx ("drop", catx (" ", symget ("drop"), vname (nn))) ;
end ;
run ;
Kind regards
Paul D.
p.s. As a side note, in your SQL, TableVar should be used instead of NLevels, or else the DATA step consuming the DROP macro variable won't produce the desired result (and bomb to boot on the attempt to use an invalid SAS name in the DROP list).
As a side note: Your last step will bomb on the DROP statement because in the preceding SQL you select NLevels, while it should be TableVar instead.
Hi @hashman ,
Thanks point out that error. it has been fixed.
As you said PROC MEANS can't handle character variable.
Another Alternative way is using PROC SQL, but need some more code (I did't post it ), I believe SQL would have faster speed .
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.