Hi,
I have a large dataset that I would like to transform to ID level using the entire history. I am unsure on the most efficient way to do this but it's probably better explaining using the data below.
I am only interested in capturing where flag is populated (flag=1).
ID | Mon | Flag | Amt |
1 | Jan-18 | 100 | |
1 | Feb-18 | 200 | |
1 | Mar-18 | 250 | |
1 | Apr-18 | 1 | 400 |
1 | May-18 | 300 | |
2 | Jan-15 | 50 | |
2 | Feb-15 | 75 | |
2 | Mar-15 | 1 | 125 |
2 | Apr-15 | 300 | |
2 | May-15 | 500 | |
3 | Dec-14 | 600 | |
3 | Jan-15 | 700 | |
3 | Feb-15 | 650 | |
3 | Mar-15 | 400 | |
3 | Apr-15 | 300 | |
3 | May-15 | 1 | 100 |
3 | Jun-15 | 0 | |
4 | Jan-18 | 500 | |
4 | Feb-18 | 600 | |
4 | Mar-18 | 700 | |
4 | Apr-18 | 800 |
I want to transform to:
ID | Amt | Mon_flag |
1 | 250 | Apr-18 |
2 | 75 | Mar-15 |
3 | 300 | May-15 |
The 'Amt' is taken from the previous month prior to the flag=1 event.
'Mon_flag' is the month the flag was triggered.
I did start by selecting where flag=1 and then merging back into the history by previous month but it took a long time!
What is the most efficient way to carry this out in a reduced number of steps?
Thanks in advance.
Try this
data have;
input ID Mon $ Flag Amt;
infile datalines dsd dlm=',';
datalines;
1,Jan-18,,100
1,Feb-18,,200
1,Mar-18,,250
1,Apr-18,1,400
1,May-18,,300
2,Jan-15,,50
2,Feb-15,,75
2,Mar-15,1,125
2,Apr-15,,300
2,May-15,,500
3,Dec-14,,600
3,Jan-15,,700
3,Feb-15,,650
3,Mar-15,,400
3,Apr-15,,300
3,May-15,1,100
3,Jun-15,,0
4,Jan-18,,500
4,Feb-18,,600
4,Mar-18,,700
4,Apr-18,,800
;
data want(keep=Id Amt Mon_flag);
set have;
by ID;
lagAmt = lag1(amt);
if not first.ID & flag=1 then do;
amt=lagAmt;
output;
end;
rename Mon=Mon_flag;
run;
Try this
data have;
input ID Mon $ Flag Amt;
infile datalines dsd dlm=',';
datalines;
1,Jan-18,,100
1,Feb-18,,200
1,Mar-18,,250
1,Apr-18,1,400
1,May-18,,300
2,Jan-15,,50
2,Feb-15,,75
2,Mar-15,1,125
2,Apr-15,,300
2,May-15,,500
3,Dec-14,,600
3,Jan-15,,700
3,Feb-15,,650
3,Mar-15,,400
3,Apr-15,,300
3,May-15,1,100
3,Jun-15,,0
4,Jan-18,,500
4,Feb-18,,600
4,Mar-18,,700
4,Apr-18,,800
;
data want(keep=Id Amt Mon_flag);
set have;
by ID;
lagAmt = lag1(amt);
if not first.ID & flag=1 then do;
amt=lagAmt;
output;
end;
rename Mon=Mon_flag;
run;
Thanks for the quick reply Draycut, this is so much more cleaner efficient.
Anytime, glad you found your answer 🙂
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.