Hi,
I have a large dataset that I would like to transform to ID level using the entire history. I am unsure on the most efficient way to do this but it's probably better explaining using the data below.
I am only interested in capturing where flag is populated (flag=1).
| ID | Mon | Flag | Amt |
| 1 | Jan-18 | 100 | |
| 1 | Feb-18 | 200 | |
| 1 | Mar-18 | 250 | |
| 1 | Apr-18 | 1 | 400 |
| 1 | May-18 | 300 | |
| 2 | Jan-15 | 50 | |
| 2 | Feb-15 | 75 | |
| 2 | Mar-15 | 1 | 125 |
| 2 | Apr-15 | 300 | |
| 2 | May-15 | 500 | |
| 3 | Dec-14 | 600 | |
| 3 | Jan-15 | 700 | |
| 3 | Feb-15 | 650 | |
| 3 | Mar-15 | 400 | |
| 3 | Apr-15 | 300 | |
| 3 | May-15 | 1 | 100 |
| 3 | Jun-15 | 0 | |
| 4 | Jan-18 | 500 | |
| 4 | Feb-18 | 600 | |
| 4 | Mar-18 | 700 | |
| 4 | Apr-18 | 800 |
I want to transform to:
| ID | Amt | Mon_flag |
| 1 | 250 | Apr-18 |
| 2 | 75 | Mar-15 |
| 3 | 300 | May-15 |
The 'Amt' is taken from the previous month prior to the flag=1 event.
'Mon_flag' is the month the flag was triggered.
I did start by selecting where flag=1 and then merging back into the history by previous month but it took a long time!
What is the most efficient way to carry this out in a reduced number of steps?
Thanks in advance.
Try this
data have;
input ID Mon $ Flag Amt;
infile datalines dsd dlm=',';
datalines;
1,Jan-18,,100
1,Feb-18,,200
1,Mar-18,,250
1,Apr-18,1,400
1,May-18,,300
2,Jan-15,,50
2,Feb-15,,75
2,Mar-15,1,125
2,Apr-15,,300
2,May-15,,500
3,Dec-14,,600
3,Jan-15,,700
3,Feb-15,,650
3,Mar-15,,400
3,Apr-15,,300
3,May-15,1,100
3,Jun-15,,0
4,Jan-18,,500
4,Feb-18,,600
4,Mar-18,,700
4,Apr-18,,800
;
data want(keep=Id Amt Mon_flag);
set have;
by ID;
lagAmt = lag1(amt);
if not first.ID & flag=1 then do;
amt=lagAmt;
output;
end;
rename Mon=Mon_flag;
run;
Try this
data have;
input ID Mon $ Flag Amt;
infile datalines dsd dlm=',';
datalines;
1,Jan-18,,100
1,Feb-18,,200
1,Mar-18,,250
1,Apr-18,1,400
1,May-18,,300
2,Jan-15,,50
2,Feb-15,,75
2,Mar-15,1,125
2,Apr-15,,300
2,May-15,,500
3,Dec-14,,600
3,Jan-15,,700
3,Feb-15,,650
3,Mar-15,,400
3,Apr-15,,300
3,May-15,1,100
3,Jun-15,,0
4,Jan-18,,500
4,Feb-18,,600
4,Mar-18,,700
4,Apr-18,,800
;
data want(keep=Id Amt Mon_flag);
set have;
by ID;
lagAmt = lag1(amt);
if not first.ID & flag=1 then do;
amt=lagAmt;
output;
end;
rename Mon=Mon_flag;
run;
Thanks for the quick reply Draycut, this is so much more cleaner efficient.
Anytime, glad you found your answer 🙂
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.