I have a dataset with about 6000 observations and over 1000 variables, some of them may have a value only for one variable - ID, how can I find these observations without listing ALL 1000 variables as missing in my code? something general?
Hope someone knows!!
hi ... try CMISS (idea posted a few times by Linlin) ... if ID is never missing you can take that check out of the IF statement ...
data x;
input id gender :$1. a b c;
datalines;
1 M 9 9 9
2 . . . .
3 F 9 . 9
. F . . .
;
* specify number of variables-1 for CMISS finction;
data y;
set x;
if ^missing(id) and cmiss(of _all_) eq 4;
run;
id gender a b c
2 . . .
Try this (based on a recent SAS-L discussion):
data want;
set have (drop=id);
_n_= ( n(of _numeric_)>0) + (coalescec(of _character_)^=' ');
set have (keep=id);
if _n_;
run;
The revealing part of this (to me) was to learn that ID was not included when the "of _numeric_" and "of _character_" were compiled, so it did not interfere with the testing for all missing values.
one more way.
data x;
input id gender :$1. a b c;
datalines;
1 M 9 9 9
2 . . . .
3 F 9 . 9
. F . . .
;
run;
options missing=' ';
data want;
set x;
if cats(of _all_)=cats(id) ;
run;
Ksharp
And, you can take Ksharp's, or any of the suggestions, and incorporate the logic into code that separates the missing and non-missing records. e.g.:
data have;
input id gender :$1. a b c;
datalines;
1 M 9 9 9
2 . . . .
3 F 9 . 9
. . . . .
. F . . .
;
run;
options missing=' ';
data want missing;
set have;
if missing(cats(of _all_))
or cats(of _all_)=cats(id)
then output missing;
else output want;
run;
" Yes, using the "options missing=' ' in combination with a single "if missing(cats(of _all_))" is seductive,
BUT ...
it only works with plain vanilla numeric missing values. As a counter-example, consider that:
options missing=' ';
data _null_;
x=.A;
if missing(cats(of _all_)) then putlog 'missing';
else putlog 'Not missing';
run;
will produce "Not missing" on the log.
As long as you have any numeric values I don't see how to efficiently avoid separately testing numerics and character variables for 100% missing-ness.
So I'll go back to my earlier suggestion with a minor modification to adapt to the possibility that the incoming data set night have only one type of variable. Lets assume there are 3 ID variables that should be excluded from the testing. Then:
data want (drop=_:);
retain _chr ' ' _num .;
set have;
call missing(id1, id2, id3); ** Avoid false positives in the test for all missing **;
_allmiss= (N(of _numeric_)=0)) and (coalescec(of _character_)=' ');
set have (keep=id1 id2 id3); ** Go back and get the excluded variables **;
if _allmiss;
run;
Changes:
Ha. It looks like we need some more code to be stronger .
data have; input id gender :$1. a b c; datalines; 1 M 9 9 9 2 . . .b ._ 3 F 9 . 9 . . .a . .z . F . . . ; run; options missing=' '; data want missing; set have; array _a{*} _numeric_; do _n_=1 to dim(_a) ; if missing(_a{_n_}) then call missing(_a{_n_}); end; if missing(cats(of _all_)) or cats(of _all_)=cats(id) then output missing; else output want; run; run;
Ksharp
I don't see the benefit of this approach. It confirms the assertion that numeric values have to be treated separately from character values, and it unneccesarily loses information by collapsing all numeric missing codes to a single value.
What if i have 2 variables present and rest all missing and need to delete those type of observations.
For example gender and class should be excluded and need to check the rest all missing observations and delete those which have all missing values.
Pick the solution you want and just include a drop option. e.g.:
data have;
input id gender :$1. class a b c;
datalines;
1 M 1 9 9 9
2 . 1 . . .
3 F 2 9 . 9
. . 3 . . .
. F 3 . . .
;
run;
options missing=' ';
data want missing;
set have (drop=gender class);
if missing(cats(of _all_))
or cats(of _all_)=cats(id)
then output missing;
else output want;
run;
Thanks to all of you for help.
Works nicely.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.