BookmarkSubscribeRSS Feed
hwkim286
Calcite | Level 5

Hello.

 

I am currently trying to exclude all observations within an ID if the first observation of a variable is larger than 0.

 

So for example,

 

PID         Gender        Income         Year          Exercise

1               1                   300            2000            0

1               1                   400            2002            1

1               1                   50001        2004            2

2               2                  90000         2000            4

2               2                  5999           2002           10

2               2                  3443           2004            0 

3               2                  1333           2000            2

3               2                  4000           2002            3

3               2                  5000           2004            10

 

For PID 2 and 3, since exercise at year 2000 (the first year), I would like to delete everything under PID 2 and 3 (including the IDs) so that they won't show as having missing values.

 

Thank you.

 

3 REPLIES 3
PeterClemmensen
Tourmaline | Level 20

I assume that your data is sorted by PID and Year

 

data have;
input PID Gender Income Year Exercise;
datalines;
1 1 300 2000 0
1 1 400 2002 1
1 1 50001 2004 2
2 2 90000 2000 4
2 2 5999 2002 10
2 2 3443 2004 0 
3 2 1333 2000 2
3 2 4000 2002 3
3 2 5000 2004 10
;

data want(drop=dummy);
    dummy=0;
    do until (last.PID);
        set have;
        by PID;
        if first.PID & Exercise gt 0 then dummy=1;
    end;
    do until (last.PID);
        set have;
        by PID;
        if dummy=0 then output;
    end;
run;
hashman
Ammonite | Level 13

@hwkim286 :

Since you condition is "first observation" for each ID, it's a single-pass job:

data have ;                                        
  input PID Gender Income Year Exercise ;          
  cards ;                                          
1  1    300  2000   0                              
1  1    400  2002   1                              
1  1  50001  2004   2                              
2  2  90000  2000   4                              
2  2   5999  2002  10                              
2  2   3443  2004   0                              
3  2   1333  2000   2                              
3  2   4000  2002   3                              
3  2   5000  2004  10                              
run ;                                              
                                                   
data want ;                                        
  set have ;                                       
  by pid ;                                         
  if first.pid then _iorc_ = ifn (exercise, 0, 1) ;
  if _iorc_ ;                                      
run ;                                              

Kind regards

Paul D.

Jakub_O
Calcite | Level 5

Please note that this condition

 

  if first.pid then _iorc_ = ifn (exercise, 0, 1) ;

will not work as specified in the initial post if the variable exercise takes negative values.

To work around that the condition should be

  if first.pid then _iorc_ = ifn (exercise gt 0, 0, 1) ;

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 5065 views
  • 2 likes
  • 4 in conversation