I have a data set of the following type:
I want to have a single row per ID in such a way that it picks the maximum value of the status at its first occurrence and the month corresponding to the row that has max value of status.
The values of the remaining variables should be picked from row 1.
The new data set should look like
ID Status Amount Month
125 2 1200 2
127 2 1234 2
130 0 212 5
Thanks!
Thanks!
A datastep works well in this scenario.
data want;
set have;
by id;
retain new_status;
if first.id then do;
new_amount=amount;
new_status=status;
new_month=month;
end;
if status > new_status then do;
new_status=status;
new_month=month;
end;
if last.id then output;
drop status account month;
run;
Here is a SQL approach:
Proc sql;
create table want as
select * from
(select * from have group by id having status=max(status))
group by id, status having month=min(month)
;
quit;
Haikuo
PROC MEAN/SUMMARY has features that are specifically designed for this problem.
borrowed DN's data. the code bellow is inefficient, but it is easy to understand:
data odd;
input id $ status amount month;
cards;
125 1 1200 1
125 2 1300 2
125 1 1300 3
125 2 1300 4
127 0 1234 1
127 2 1000 2
127 2 1001 3
127 0 999 4
130 0 1212 5
130 0 1200 6
;;;;
data temp1(keep=id amount);
set odd;
by id notsorted;
if first.id;
run;
proc sort data=odd out=temp2;
by id descending status month;
data temp3(drop=amount);
set temp2;
by id;
if first.id;
run;
data want;
merge temp1 temp3;
by id;
proc print;run;
Obs id amount status month
1 125 1200 2 2
2 127 1234 2 2
3 130 1212 0 5
Thanks!
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.