BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
nonlinear999
Fluorite | Level 6

Hi,

The following is an example of my dataset. 

data have;
    input ric $ price n;
datalines;
abc 5 .
abc 6 .
bcd 100 10
efg 30 20 
cba 40 10
hba 10 10 
hba 12 . 
cab 6 20
cab 7 20
run;

I would like to remove the ric group when all the n for the ric is missing. For example, abc should be removed but not hba (not all n is missing). 

The output would look like the following- 

data want;
input ric $ price n;
datalines;
bcd 100 10
efg 30 20 
cba 40 10
hba 10 10 
hba 12 . 
cab 6 20
cab 7 20
run;

This is my first post. Any help is appreciated. Thanks. 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

As others have said, there could be RIC groups, in which all obs except the last are missing.  Such an RIC should be kept, according to your criteria.  But you have to read the entire RIC group to determine that outcome.

 

The program below does that.  It then rereads each group and outputs it depending on the results of the first pass of the RIC group.

 

  data have;
    input ric $ price n;
datalines;
abc 5 .
abc 6 .
bcd 100 10
efg 30 20 
cba 40 10
hba 10 10 
hba 12 . 
cab 6 20
cab 7 20
run;

data want (drop=_:);
  do until (last.ric);
    set have;
    by ric notsorted;
    if n^=. then _keepdummy=1;
  end;
  do until (last.ric);
    set have;
    by ric notsorted;
    if _keepdummy=1 then output;
  end;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

6 REPLIES 6
nonlinear999
Fluorite | Level 6

I actually solved it. Someone may find it useful. This is the code that I have adapted (reference community link ) 

data want;
set have;
by ric;
retain flag;
if first.ric then call missing(flag);
if first.ric and n^=. then flag=1;
if flag;
drop flag;
run;
Tom
Super User Tom
Super User

That is not a general solution.  It worked for your sample because non of the groups to be kept had a missing value on the first observations, just on later observations.

If you sorted 

by ric descending n;

then your logic would work.

 

In general you need to check all of the observations in the group before deciding whether or not to delete the observations for that group.

You could use SQL code

proc sql;
create table want as
  select * from have
  group by ric
  having not missing(max(n))
;
quit;

Or if the data is sorted a double DOW loop.

data want;
  do until (last.ric);
     set have;
     by ric;
     if not missing(n) then flag=1;
  end;
  do until (last.ric);
     set have;
     by ric;
     if flag then output;
  end;
run;

 

Spintu
Quartz | Level 8
proc sql;
  create table want as
  select *
  from have
  where cats(ric) not in 
  (select cats(ric) from have where n =.);
quit;

I didn't tested may be you can for this approach.

qatman28
Obsidian | Level 7

I find it easiest to identify the values in a separate step, then merge to include or exclude the ones you want. Here is an example:

 

proc sort data=have;
  by ric;

** Select all distinct values of ric that have at least one nonmissing value for n **;
proc sort data=have (where=(^missing(n))) out=nonmissing (keep=ric) nodupkey;
  by ric;

** Restrict 'have' to only include those ric values with at least one nonmissing value for n **;
data want;
  merge have (in=in1)
        nonmissing (in=in2)
        ;
  by ric;
  ** Technically you only need 'in2' here, but as a general practice I like to be explicit with these statements **;
  if in1 & in2;
run;


mkeintz
PROC Star

As others have said, there could be RIC groups, in which all obs except the last are missing.  Such an RIC should be kept, according to your criteria.  But you have to read the entire RIC group to determine that outcome.

 

The program below does that.  It then rereads each group and outputs it depending on the results of the first pass of the RIC group.

 

  data have;
    input ric $ price n;
datalines;
abc 5 .
abc 6 .
bcd 100 10
efg 30 20 
cba 40 10
hba 10 10 
hba 12 . 
cab 6 20
cab 7 20
run;

data want (drop=_:);
  do until (last.ric);
    set have;
    by ric notsorted;
    if n^=. then _keepdummy=1;
  end;
  do until (last.ric);
    set have;
    by ric notsorted;
    if _keepdummy=1 then output;
  end;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
nonlinear999
Fluorite | Level 6

So this makes more sense and the code seems to work well. You are right (I have missed that part in my explanation). Thank you. 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1879 views
  • 3 likes
  • 5 in conversation