BookmarkSubscribeRSS Feed
GalMed
Calcite | Level 5


I have a dataset with about 6000 observations and over 1000 variables, some of them may have a value only for one variable - ID, how can I find these observations without listing ALL 1000 variables as missing in my code? something general?

Hope someone knows!!

10 REPLIES 10
MikeZdeb
Rhodochrosite | Level 12

hi ... try CMISS (idea posted a few times by Linlin) ... if ID is never missing you can take that check out of the IF statement  ...

data x;

input id gender :$1. a b c;

datalines;

1 M 9 9 9

2 . . . .

3 F 9 . 9

. F . . .

;

* specify number of variables-1 for CMISS finction;

data y;

set x;

if ^missing(id) and cmiss(of _all_) eq  4;

run;


id    gender    a    b    c

2              .    .    .

mkeintz
PROC Star

Try this (based on a recent SAS-L discussion):

data want;

  set have (drop=id);

  _n_=  ( n(of _numeric_)>0)  + (coalescec(of _character_)^=' ');

  set have (keep=id);

  if _n_;

run;

The revealing part of this (to me) was to learn that ID was not included when the "of _numeric_" and "of _character_" were compiled, so it did not interfere with the testing for all missing values.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User

one more way.

data x;
input id gender :$1. a b c;
datalines;
1 M 9 9 9
2 . . . .
3 F 9 . 9
. F . . .
;
run;


options missing=' ';
data want;
 set x;
 if cats(of _all_)=cats(id) ;
run;




Ksharp

art297
Opal | Level 21

And, you can take Ksharp's, or any of the suggestions, and incorporate the logic into code that separates the missing and non-missing records.  e.g.:

data have;

input id gender :$1. a b c;

datalines;

1 M 9 9 9

2 . . . .

3 F 9 . 9

. . . . .

. F . . .

;

run;

options missing=' ';

data want missing;

set have;

if missing(cats(of _all_))

  or cats(of _all_)=cats(id)

  then output missing;

else output want;

run;

mkeintz
PROC Star

" Yes, using the "options missing=' '  in combination with a single "if missing(cats(of _all_))" is seductive,
   BUT ...

it only works with plain vanilla numeric missing values.  As a counter-example, consider that:

options missing=' ';

data _null_;

  x=.A;

  if missing(cats(of _all_)) then putlog 'missing';

  else putlog 'Not missing';

run;


will produce "Not missing" on the log.

As long as you have any numeric values I don't see how to efficiently avoid separately testing numerics and character variables for 100% missing-ness.

So I'll go back to my earlier suggestion with a minor modification to adapt to the possibility that the incoming data set night have only one type of variable.  Lets assume there are 3 ID variables that should be excluded from the testing.  Then:

data want (drop=_:);

   retain _chr ' '  _num .;

   set have;  

   call missing(id1, id2, id3);  ** Avoid false positives in the test for all missing **;

   _allmiss=   (N(of _numeric_)=0)) and (coalescec(of _character_)=' ');

    set have (keep=id1 id2 id3);   ** Go back and get the excluded variables **;

    if _allmiss;

run;

Changes:

  1. The leading RETAIN statement assures that there is at least one variables of each type, so that the N and COALESCEC functions don't produce a "no arguments" error condition.
  2. The first SET statement doesn't have the "drop=id1 id2 id3") option that I used earlier, because I don't want to change the order of variables in WANT.  Hence a "call missing" statement must precede the test for all missing numerics and characters.
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User

Ha. It looks like we need some more code to be stronger .

data have;
input id gender :$1. a b c;
datalines;
1 M 9 9 9
2 . . .b ._
3 F 9 . 9
. . .a . .z
. F . . .
;
run;   
options missing=' ';
data want missing;
 set have;
 array _a{*} _numeric_;
 do _n_=1 to dim(_a) ;
  if missing(_a{_n_}) then call missing(_a{_n_});
 end;

if missing(cats(of _all_))
  or cats(of _all_)=cats(id)
  then output missing;
else output want;
run;
run;

Ksharp

mkeintz
PROC Star

I don't see the benefit of this approach.  It confirms the assertion that numeric values have to be treated separately from character values, and it unneccesarily loses information by collapsing all numeric missing codes to a single value.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
venkatard
Calcite | Level 5

What if i have 2 variables present and rest all missing and need to delete those type of observations.

For example gender and class should be excluded and need to check the rest all missing observations and delete those which have all missing values.

art297
Opal | Level 21

Pick the solution you want and just include a drop option.  e.g.:

data have;

input id gender :$1. class a b c;

datalines;

1 M 1 9 9 9

2 . 1 . . .

3 F 2 9 . 9

. . 3 . . .

. F 3 . . .

;

run;

options missing=' ';

data want missing;

set have (drop=gender class);

if missing(cats(of _all_))

  or cats(of _all_)=cats(id)

  then output missing;

else output want;

run;

GalMed
Calcite | Level 5

Thanks to all of you for help.

Works nicely.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 4828 views
  • 0 likes
  • 6 in conversation