BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
NewUsrStat
Lapis Lazuli | Level 10

Hi guys, 

suppose to have the following: 

 

     ID             my_value                  

      1                      

      1                     1

      1                      

      1                     1 

      2                    1

      2                    0 

      2                    0

      2             

      3                    

      3                    0

      3                                                        

      4

      4

      4

Is there a way to subset according to the following rules:

 

1) if in my_value column for each ID there is at least one "1" then retain that row and delete the other rows for that ID (regardless if they contain a missing or 0). If there is more than one row with "1" it doesn't matter. Give me the ID one time with "1" in column my_value. (case of ID = 1)

2) if there's no 1 but only 0 and missing then retain 0 (case of ID = 3)

3) if only missing then give me one time that row with missing corresponding to my_value (case of ID = 4) 

 

Desired output: 

 

     ID             my_value                  

      1                     1

      2                    1

      3                   0

      4 

 

Thank you in advance

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @NewUsrStat,

 

You can use PROC SUMMARY to create an output dataset with the maximum of MY_VALUE per ID:

proc summary data=have;
by id;
var my_value;
output out=want(drop=_:) max=;
run;

If your real dataset is only grouped, but not sorted by ID, add the NOTSORTED option to the BY statement:

by id notsorted;

If it is not grouped either, use the NWAY option and the CLASS statement instead of the BY statement:

proc summary data=have nway;
class id;
...

View solution in original post

4 REPLIES 4
FreelanceReinh
Jade | Level 19

Hello @NewUsrStat,

 

You can use PROC SUMMARY to create an output dataset with the maximum of MY_VALUE per ID:

proc summary data=have;
by id;
var my_value;
output out=want(drop=_:) max=;
run;

If your real dataset is only grouped, but not sorted by ID, add the NOTSORTED option to the BY statement:

by id notsorted;

If it is not grouped either, use the NWAY option and the CLASS statement instead of the BY statement:

proc summary data=have nway;
class id;
...
NewUsrStat
Lapis Lazuli | Level 10
Thank you very much FreelanceReinh!
Kurt_Bremser
Super User

There is an important lesson to be learned here: you looked at the problem as one of removal, while @FreelanceReinh sees it as one of keeping. It is always good to consider an issue from the diametrically opposed POV, as it may (and often does) lead to a much simpler solution.

Whenever you deal with conditional code, ask yourself if it isn't better to use the negated condition and perform the opposite action.

NewUsrStat
Lapis Lazuli | Level 10
Thank you Kurt_Bremser. It is a good idea the look at the problem with the opposite perspective. Unusual to think in this way but sometimes decisive.