BookmarkSubscribeRSS Feed
hwkim286
Calcite | Level 5

Hello.

 

I am currently trying to exclude all observations within an ID if the first observation of a variable is larger than 0.

 

So for example,

 

PID         Gender        Income         Year          Exercise

1               1                   300            2000            0

1               1                   400            2002            1

1               1                   50001        2004            2

2               2                  90000         2000            4

2               2                  5999           2002           10

2               2                  3443           2004            0 

3               2                  1333           2000            2

3               2                  4000           2002            3

3               2                  5000           2004            10

 

For PID 2 and 3, since exercise at year 2000 (the first year), I would like to delete everything under PID 2 and 3 (including the IDs) so that they won't show as having missing values.

 

Thank you.

 

3 REPLIES 3
PeterClemmensen
Tourmaline | Level 20

I assume that your data is sorted by PID and Year

 

data have;
input PID Gender Income Year Exercise;
datalines;
1 1 300 2000 0
1 1 400 2002 1
1 1 50001 2004 2
2 2 90000 2000 4
2 2 5999 2002 10
2 2 3443 2004 0 
3 2 1333 2000 2
3 2 4000 2002 3
3 2 5000 2004 10
;

data want(drop=dummy);
    dummy=0;
    do until (last.PID);
        set have;
        by PID;
        if first.PID & Exercise gt 0 then dummy=1;
    end;
    do until (last.PID);
        set have;
        by PID;
        if dummy=0 then output;
    end;
run;
hashman
Ammonite | Level 13

@hwkim286 :

Since you condition is "first observation" for each ID, it's a single-pass job:

data have ;                                        
  input PID Gender Income Year Exercise ;          
  cards ;                                          
1  1    300  2000   0                              
1  1    400  2002   1                              
1  1  50001  2004   2                              
2  2  90000  2000   4                              
2  2   5999  2002  10                              
2  2   3443  2004   0                              
3  2   1333  2000   2                              
3  2   4000  2002   3                              
3  2   5000  2004  10                              
run ;                                              
                                                   
data want ;                                        
  set have ;                                       
  by pid ;                                         
  if first.pid then _iorc_ = ifn (exercise, 0, 1) ;
  if _iorc_ ;                                      
run ;                                              

Kind regards

Paul D.

Jakub_O
Calcite | Level 5

Please note that this condition

 

  if first.pid then _iorc_ = ifn (exercise, 0, 1) ;

will not work as specified in the initial post if the variable exercise takes negative values.

To work around that the condition should be

  if first.pid then _iorc_ = ifn (exercise gt 0, 0, 1) ;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 3825 views
  • 2 likes
  • 4 in conversation