BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
xiangpang
Quartz | Level 8

Hello,

I want to remove some ID because the first DM is missing or all other DM is missing besides first DM (such as ID1 and ID3). However, my code did not work. anyone can help me to figure it out? 

Thanks

data eq;
input ID y dm z w;
cards;
1 1 . 40 8
1 0 . 29 37
1 1 30 . 25
1 1 38 38 23
2 1 23 45 19
2 0 32 20 .
2 1 67 . .
2 1 . 27 .
3 0 33 23 46
3 1 . 12 56
3 0 . . 34
3 1 . 45 . 
4 1 56 45 23
4 0 67 13 67  
4 0 . 35 13
4 1 48 35 56 
;
run;

proc sql;                                                                                                                              
create table a as   
select * 
when first.dm ^= . 
when nmiss(of _numeric_) ^= count(of _numeric_)-1                                                                                                 
from eq; 
group by id; 
quit;  

PROC SORT data=eq;
by id;
RUN;

data b;                                                                                                                      set eq;  
by id ;
if first.dm=. then delete;
if nmiss(of _numeric_) ^= count(of _numeric_)-1   then delete;                                                                                                 
by id;                                                                                                                     
run;
1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

@xiangpang wrote:

Thanks, but this is not what I want. I want to remove ID with the first DM missing in each ID or only have first DM in each ID. Your code only remove the ID with all DM missing.


That is still a little fuzzy. Sounds like you want delete all of the observations for an ID if either 

  1. The first value of DM is missing.
  2. The first value of DM is the only DISTINCT non-missing value of DM for that ID.

You can use RETAIN to keep the first value.  You can also use RETAIN to keep a flag variable to indicate if condition is met.

 

data drop_list ;
  set eq ;
  by id ;
  retain first_dm any_different;
  if first.id then first_dm = dm ;
  if first.id then any_different=0 ;
  if dm ne . and dm ne first_dm then any_different=1;
  if last.id and (first_dm=. or any_different=0);
  keep id;
run;

data want ;
  merge eq drop_list(in=in2);
  by id ;
  if in2 then delete;
run;

 

View solution in original post

8 REPLIES 8
Tom
Super User Tom
Super User

You don't seem to understand how BY processing works in a data step.  The FIRST. and LAST. automatic variables are binary flags to indicate whether this observations is the first (or last) observation in the by group.  So they can only ever be 1 (true) or 0 (false). They will never be missing.

xiangpang
Quartz | Level 8

Thanks for clarification about first. if I want to remove ID with the first DM missing in each ID or only have first DM in each ID, what code should I use.

 

Thanks again

RW9
Diamond | Level 26 RW9
Diamond | Level 26

You have overthought it, simply using a subquery would be simpler:

proc sql;
  create table want as
  select *
  from   eq
  where id in (select distinct id from eq where dm ne .);
quit;

This will filter off data where no dm value is present.

xiangpang
Quartz | Level 8

Thanks, but this is not what I want. I want to remove ID with the first DM missing in each ID or only have first DM in each ID. Your code only remove the ID with all DM missing.

Tom
Super User Tom
Super User

@xiangpang wrote:

Thanks, but this is not what I want. I want to remove ID with the first DM missing in each ID or only have first DM in each ID. Your code only remove the ID with all DM missing.


That is still a little fuzzy. Sounds like you want delete all of the observations for an ID if either 

  1. The first value of DM is missing.
  2. The first value of DM is the only DISTINCT non-missing value of DM for that ID.

You can use RETAIN to keep the first value.  You can also use RETAIN to keep a flag variable to indicate if condition is met.

 

data drop_list ;
  set eq ;
  by id ;
  retain first_dm any_different;
  if first.id then first_dm = dm ;
  if first.id then any_different=0 ;
  if dm ne . and dm ne first_dm then any_different=1;
  if last.id and (first_dm=. or any_different=0);
  keep id;
run;

data want ;
  merge eq drop_list(in=in2);
  by id ;
  if in2 then delete;
run;

 

xiangpang
Quartz | Level 8

Thanks a lot. This result is exact what I want. Is there a way to do that in one step? what if I don't use sql, how to do it in another way? 

 

 

Tom
Super User Tom
Super User

@xiangpang wrote:

Thanks a lot. This result is exact what I want. Is there a way to do that in one step? what if I don't use sql, how to do it in another way?  

 


It would be very hard to do in SQL since SQL has no concept of first observation.  If you had some type of record counter or other way of finding the first value of DM you could do it.

 

You could use a double DOW loop to do this in a single data step.  The first loop would gather the information need for the selection criteria and then the second loop would re-read the data for the by group and conditionally output them. Since you end up processing all of a by group in one pass of the data step there is no need to RETAIN the variables.  But you will probably want to drop them.

data want2 ;
  do until(last.id);
    set eq ;
    by id ;
    if first.id then first_dm = dm ;
    if first.id then any_different=0 ;
    if dm ne . and dm ne first_dm then any_different=1;
  end;
  do until(last.id);
    set eq ;
    by id ;
    if not (first_dm=. or any_different=0) then output;
  end;
  drop first_dm any_different ;
run;
xiangpang
Quartz | Level 8

Thanks a lot. I never think a way like this to solve the problem. I am happy to learn it. 

 

Thanks again. Have a good day!

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 2485 views
  • 1 like
  • 3 in conversation