Hi,
I have a large dataset that I would like to transform to ID level using the entire history. I am unsure on the most efficient way to do this but it's probably better explaining using the data below.
I am only interested in capturing where flag is populated (flag=1).
ID | Mon | Flag | Amt |
1 | Jan-18 | 100 | |
1 | Feb-18 | 200 | |
1 | Mar-18 | 250 | |
1 | Apr-18 | 1 | 400 |
1 | May-18 | 300 | |
2 | Jan-15 | 50 | |
2 | Feb-15 | 75 | |
2 | Mar-15 | 1 | 125 |
2 | Apr-15 | 300 | |
2 | May-15 | 500 | |
3 | Dec-14 | 600 | |
3 | Jan-15 | 700 | |
3 | Feb-15 | 650 | |
3 | Mar-15 | 400 | |
3 | Apr-15 | 300 | |
3 | May-15 | 1 | 100 |
3 | Jun-15 | 0 | |
4 | Jan-18 | 500 | |
4 | Feb-18 | 600 | |
4 | Mar-18 | 700 | |
4 | Apr-18 | 800 |
I want to transform to:
ID | Amt | Mon_flag |
1 | 250 | Apr-18 |
2 | 75 | Mar-15 |
3 | 300 | May-15 |
The 'Amt' is taken from the previous month prior to the flag=1 event.
'Mon_flag' is the month the flag was triggered.
I did start by selecting where flag=1 and then merging back into the history by previous month but it took a long time!
What is the most efficient way to carry this out in a reduced number of steps?
Thanks in advance.
Try this
data have;
input ID Mon $ Flag Amt;
infile datalines dsd dlm=',';
datalines;
1,Jan-18,,100
1,Feb-18,,200
1,Mar-18,,250
1,Apr-18,1,400
1,May-18,,300
2,Jan-15,,50
2,Feb-15,,75
2,Mar-15,1,125
2,Apr-15,,300
2,May-15,,500
3,Dec-14,,600
3,Jan-15,,700
3,Feb-15,,650
3,Mar-15,,400
3,Apr-15,,300
3,May-15,1,100
3,Jun-15,,0
4,Jan-18,,500
4,Feb-18,,600
4,Mar-18,,700
4,Apr-18,,800
;
data want(keep=Id Amt Mon_flag);
set have;
by ID;
lagAmt = lag1(amt);
if not first.ID & flag=1 then do;
amt=lagAmt;
output;
end;
rename Mon=Mon_flag;
run;
Try this
data have;
input ID Mon $ Flag Amt;
infile datalines dsd dlm=',';
datalines;
1,Jan-18,,100
1,Feb-18,,200
1,Mar-18,,250
1,Apr-18,1,400
1,May-18,,300
2,Jan-15,,50
2,Feb-15,,75
2,Mar-15,1,125
2,Apr-15,,300
2,May-15,,500
3,Dec-14,,600
3,Jan-15,,700
3,Feb-15,,650
3,Mar-15,,400
3,Apr-15,,300
3,May-15,1,100
3,Jun-15,,0
4,Jan-18,,500
4,Feb-18,,600
4,Mar-18,,700
4,Apr-18,,800
;
data want(keep=Id Amt Mon_flag);
set have;
by ID;
lagAmt = lag1(amt);
if not first.ID & flag=1 then do;
amt=lagAmt;
output;
end;
rename Mon=Mon_flag;
run;
Thanks for the quick reply Draycut, this is so much more cleaner efficient.
Anytime, glad you found your answer 🙂
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.