Solved: delete group in SAS if all observations are missing in that group for ...

nonlinear999 · Posted 07-03-2022 09:06 AM

Hi,

The following is an example of my dataset.

data have;
    input ric $ price n;
datalines;
abc 5 .
abc 6 .
bcd 100 10
efg 30 20 
cba 40 10
hba 10 10 
hba 12 . 
cab 6 20
cab 7 20
run;

I would like to remove the ric group when all the n for the ric is missing. For example, abc should be removed but not hba (not all n is missing).

The output would look like the following-

data want;
input ric $ price n;
datalines;
bcd 100 10
efg 30 20 
cba 40 10
hba 10 10 
hba 12 . 
cab 6 20
cab 7 20
run;

This is my first post. Any help is appreciated. Thanks.

mkeintz · Posted 07-03-2022 02:13 PM

As others have said, there could be RIC groups, in which all obs except the last are missing. Such an RIC should be kept, according to your criteria. But you have to read the entire RIC group to determine that outcome.

The program below does that. It then rereads each group and outputs it depending on the results of the first pass of the RIC group.

  data have;
    input ric $ price n;
datalines;
abc 5 .
abc 6 .
bcd 100 10
efg 30 20 
cba 40 10
hba 10 10 
hba 12 . 
cab 6 20
cab 7 20
run;

data want (drop=_:);
  do until (last.ric);
    set have;
    by ric notsorted;
    if n^=. then _keepdummy=1;
  end;
  do until (last.ric);
    set have;
    by ric notsorted;
    if _keepdummy=1 then output;
  end;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

nonlinear999 · Posted 07-03-2022 09:42 AM

I actually solved it. Someone may find it useful. This is the code that I have adapted (reference community link )

data want;
set have;
by ric;
retain flag;
if first.ric then call missing(flag);
if first.ric and n^=. then flag=1;
if flag;
drop flag;
run;

Tom · Posted 07-03-2022 11:41 AM

That is not a general solution. It worked for your sample because non of the groups to be kept had a missing value on the first observations, just on later observations.

If you sorted

by ric descending n;

then your logic would work.

In general you need to check all of the observations in the group before deciding whether or not to delete the observations for that group.

You could use SQL code

proc sql;
create table want as
  select * from have
  group by ric
  having not missing(max(n))
;
quit;

Or if the data is sorted a double DOW loop.

data want;
  do until (last.ric);
     set have;
     by ric;
     if not missing(n) then flag=1;
  end;
  do until (last.ric);
     set have;
     by ric;
     if flag then output;
  end;
run;

Spintu · Posted 07-03-2022 09:44 AM

proc sql;
  create table want as
  select *
  from have
  where cats(ric) not in 
  (select cats(ric) from have where n =.);
quit;

I didn't tested may be you can for this approach.

qatman28 · Posted 07-03-2022 01:19 PM

I find it easiest to identify the values in a separate step, then merge to include or exclude the ones you want. Here is an example:

proc sort data=have;
  by ric;

** Select all distinct values of ric that have at least one nonmissing value for n **;
proc sort data=have (where=(^missing(n))) out=nonmissing (keep=ric) nodupkey;
  by ric;

** Restrict 'have' to only include those ric values with at least one nonmissing value for n **;
data want;
  merge have (in=in1)
        nonmissing (in=in2)
        ;
  by ric;
  ** Technically you only need 'in2' here, but as a general practice I like to be explicit with these statements **;
  if in1 & in2;
run;

mkeintz · Posted 07-03-2022 02:13 PM

As others have said, there could be RIC groups, in which all obs except the last are missing. Such an RIC should be kept, according to your criteria. But you have to read the entire RIC group to determine that outcome.

The program below does that. It then rereads each group and outputs it depending on the results of the first pass of the RIC group.

  data have;
    input ric $ price n;
datalines;
abc 5 .
abc 6 .
bcd 100 10
efg 30 20 
cba 40 10
hba 10 10 
hba 12 . 
cab 6 20
cab 7 20
run;

data want (drop=_:);
  do until (last.ric);
    set have;
    by ric notsorted;
    if n^=. then _keepdummy=1;
  end;
  do until (last.ric);
    set have;
    by ric notsorted;
    if _keepdummy=1 then output;
  end;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

nonlinear999 · Posted 07-05-2022 07:42 PM

So this makes more sense and the code seems to work well. You are right (I have missed that part in my explanation). Thank you.

delete group in SAS if all observations are missing in that group for a column

Re: delete group in SAS if all observations are missing in that group for a column

Re: delete group in SAS if all observations are missing in that group for a column

Re: delete group in SAS if all observations are missing in that group for a column

Re: delete group in SAS if all observations are missing in that group for a column

Re: delete group in SAS if all observations are missing in that group for a column

Re: delete group in SAS if all observations are missing in that group for a column

Re: delete group in SAS if all observations are missing in that group for a column

Registration is open

SAS Training: Just a Click Away