DATA Step, Macro, Functions and more

how to remove some data?

Accepted Solution Solved
Reply
Contributor
Posts: 60
Accepted Solution

how to remove some data?

Hello,

I want to remove some ID because the first DM is missing or all other DM is missing besides first DM (such as ID1 and ID3). However, my code did not work. anyone can help me to figure it out? 

Thanks

data eq;
input ID y dm z w;
cards;
1 1 . 40 8
1 0 . 29 37
1 1 30 . 25
1 1 38 38 23
2 1 23 45 19
2 0 32 20 .
2 1 67 . .
2 1 . 27 .
3 0 33 23 46
3 1 . 12 56
3 0 . . 34
3 1 . 45 . 
4 1 56 45 23
4 0 67 13 67  
4 0 . 35 13
4 1 48 35 56 
;
run;

proc sql;                                                                                                                              
create table a as   
select * 
when first.dm ^= . 
when nmiss(of _numeric_) ^= count(of _numeric_)-1                                                                                                 
from eq; 
group by id; 
quit;  

PROC SORT data=eq;
by id;
RUN;

data b;                                                                                                                      set eq;  
by id ;
if first.dm=. then delete;
if nmiss(of _numeric_) ^= count(of _numeric_)-1   then delete;                                                                                                 
by id;                                                                                                                     
run;

Accepted Solutions
Solution
‎03-09-2018 10:07 AM
Super User
Super User
Posts: 7,844

Re: how to remove some data?

[ Edited ]
Posted in reply to xiangpang

@xiangpang wrote:

Thanks, but this is not what I want. I want to remove ID with the first DM missing in each ID or only have first DM in each ID. Your code only remove the ID with all DM missing.


That is still a little fuzzy. Sounds like you want delete all of the observations for an ID if either 

  1. The first value of DM is missing.
  2. The first value of DM is the only DISTINCT non-missing value of DM for that ID.

You can use RETAIN to keep the first value.  You can also use RETAIN to keep a flag variable to indicate if condition is met.

 

data drop_list ;
  set eq ;
  by id ;
  retain first_dm any_different;
  if first.id then first_dm = dm ;
  if first.id then any_different=0 ;
  if dm ne . and dm ne first_dm then any_different=1;
  if last.id and (first_dm=. or any_different=0);
  keep id;
run;

data want ;
  merge eq drop_list(in=in2);
  by id ;
  if in2 then delete;
run;

 

View solution in original post


All Replies
Super User
Super User
Posts: 7,844

Re: how to remove some data?

Posted in reply to xiangpang

You don't seem to understand how BY processing works in a data step.  The FIRST. and LAST. automatic variables are binary flags to indicate whether this observations is the first (or last) observation in the by group.  So they can only ever be 1 (true) or 0 (false). They will never be missing.

Contributor
Posts: 60

Re: how to remove some data?

Thanks for clarification about first. if I want to remove ID with the first DM missing in each ID or only have first DM in each ID, what code should I use.

 

Thanks again

Super User
Super User
Posts: 9,193

Re: how to remove some data?

Posted in reply to xiangpang

You have overthought it, simply using a subquery would be simpler:

proc sql;
  create table want as
  select *
  from   eq
  where id in (select distinct id from eq where dm ne .);
quit;

This will filter off data where no dm value is present.

Contributor
Posts: 60

Re: how to remove some data?

Thanks, but this is not what I want. I want to remove ID with the first DM missing in each ID or only have first DM in each ID. Your code only remove the ID with all DM missing.

Solution
‎03-09-2018 10:07 AM
Super User
Super User
Posts: 7,844

Re: how to remove some data?

[ Edited ]
Posted in reply to xiangpang

@xiangpang wrote:

Thanks, but this is not what I want. I want to remove ID with the first DM missing in each ID or only have first DM in each ID. Your code only remove the ID with all DM missing.


That is still a little fuzzy. Sounds like you want delete all of the observations for an ID if either 

  1. The first value of DM is missing.
  2. The first value of DM is the only DISTINCT non-missing value of DM for that ID.

You can use RETAIN to keep the first value.  You can also use RETAIN to keep a flag variable to indicate if condition is met.

 

data drop_list ;
  set eq ;
  by id ;
  retain first_dm any_different;
  if first.id then first_dm = dm ;
  if first.id then any_different=0 ;
  if dm ne . and dm ne first_dm then any_different=1;
  if last.id and (first_dm=. or any_different=0);
  keep id;
run;

data want ;
  merge eq drop_list(in=in2);
  by id ;
  if in2 then delete;
run;

 

Contributor
Posts: 60

Re: how to remove some data?

Thanks a lot. This result is exact what I want. Is there a way to do that in one step? what if I don't use sql, how to do it in another way? 

 

 

Super User
Super User
Posts: 7,844

Re: how to remove some data?

Posted in reply to xiangpang

@xiangpang wrote:

Thanks a lot. This result is exact what I want. Is there a way to do that in one step? what if I don't use sql, how to do it in another way?  

 


It would be very hard to do in SQL since SQL has no concept of first observation.  If you had some type of record counter or other way of finding the first value of DM you could do it.

 

You could use a double DOW loop to do this in a single data step.  The first loop would gather the information need for the selection criteria and then the second loop would re-read the data for the by group and conditionally output them. Since you end up processing all of a by group in one pass of the data step there is no need to RETAIN the variables.  But you will probably want to drop them.

data want2 ;
  do until(last.id);
    set eq ;
    by id ;
    if first.id then first_dm = dm ;
    if first.id then any_different=0 ;
    if dm ne . and dm ne first_dm then any_different=1;
  end;
  do until(last.id);
    set eq ;
    by id ;
    if not (first_dm=. or any_different=0) then output;
  end;
  drop first_dm any_different ;
run;
Contributor
Posts: 60

Re: how to remove some data?

Thanks a lot. I never think a way like this to solve the problem. I am happy to learn it. 

 

Thanks again. Have a good day!

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 96 views
  • 1 like
  • 3 in conversation