Solved
New Contributor
Posts: 3

# Need help! Lag in loop

Hello,

I have a file with 72 months of data for thousands of accounts, which makes the total # of records millions. each account has up to 72 rows of records, each of which represents one month. in each month's record, amt1 is for current month only, while the amt2 is from another table and always the same amount for all month for the same account. amt2 is the cumulative amount as of now, and this is not available at each month, therefore, when I merged the data file to the am2 table, each acct gets this same cumulative amt2. However, I need the correct cumulative amt2 as of each month, therefore, I need to back out the current amt1 from amt2 for the prior month's amt2.  I hope I'm not too confusing, but I made up some test data just to test.

here is my code, it works, but thinking about if I apply this in my real program, it might take forever to run? because it has millions of records?  I don't think this is the way to do, since the loop is to run the data step 72 times (6*12mons). I wonder if anyone has a much better way to do this? I need this asap. Thank you all for your help!

**** note:  assume acct is account no,  date is the month, c1 and c2 are the current month amounts, c2 and r2 are the cumulative amounts as of 20130600,, right now c2 and r2 are same for all months, I need to back out the c1 and r1 from c2 and r2 to get the correct c2 and r2 amounts for each prior month. i.e., for acct 11111 at 20130100, c2 should have been 500 instead of 1000, and 1000 for 20130200 after adding the 500 from c1 for the same acct, r2 as of 20130100 should have been -510, after you back out all r1 from 20130600 to 20130200.

data test;

input acct \$5. date  c1 c2 r1 r2 8.;

datalines;

11111 20130100 500 1000 0 -500

11111 20130200 500 1000 0 -500

11111 20130300 0 1000 -50 -500

11111 20130400 0 1000 0 -500

11111 20130500 0 1000 60 -500

11111 20130600 0 1000 0 -500

11112 20130200 100 2000 0 0

11113 20130100 0 200 -500 -1000

11113 20130200 0 200 0 -1000

11113 20130300 0 200 400 -1000

11113 20130400 100 200 200 -1000

11113 20130500 0 200 0 -1000

11113 20130600 0 200 -200 -1000

;

run;

proc sort data=test ; by acct descending date ; run;

data out;

set test;

by acct ;

run;

%macro crazyloop;

%do i=1 %to 10;

data out;

set out;

by acct ;

newc2=ifn(first.acct,c2,lag(newc2)-lag(c1));

newr2=ifn(first.acct,r2,lag(newr2)-lag(r1));

run;

%end;

%mend;

%crazyloop;

Accepted Solutions
Solution
‎09-25-2013 01:33 PM
Super User
Posts: 6,781

## Re: Need help! Lag in loop

It's a little confusing, but I think this is what you are trying to do.  This would replace your final DATA step, so you wouldn't need a macro:

data want;

total_c_to_back_out = 0;

total_r_to_back_out = 0;

do until (last.id);

set out (keep=id r1 c1);

by id;

total_r_to_back_out + r1;

total_c_to_back_out + c1;

end;

do until (last.id);

set out;

by id;

if first.id then do;

new_c = c2 - total_c_to_back_out;

new_r = r2 - total_r_to_back_out;

end;

new_c + c1;

new_r + r1;

output;

end;

run;

All Replies
Solution
‎09-25-2013 01:33 PM
Super User
Posts: 6,781

## Re: Need help! Lag in loop

It's a little confusing, but I think this is what you are trying to do.  This would replace your final DATA step, so you wouldn't need a macro:

data want;

total_c_to_back_out = 0;

total_r_to_back_out = 0;

do until (last.id);

set out (keep=id r1 c1);

by id;

total_r_to_back_out + r1;

total_c_to_back_out + c1;

end;

do until (last.id);

set out;

by id;

if first.id then do;

new_c = c2 - total_c_to_back_out;

new_r = r2 - total_r_to_back_out;

end;

new_c + c1;

new_r + r1;

output;

end;

run;

New Contributor
Posts: 3

## Re: Need help! Lag in loop

Cool !  I just tested it, works like a charm. produced exact same result as my stupid macro. Thank you!

Super Contributor
Posts: 339

## Re: Need help! Lag in loop

Don't use lag function, it is a queue and not an actual access to the previous row. Conditioning on piles makes it so that you don't push new values in/take old values out whenever your condition is false so you end up with an empty pile on the 2nd row. Regardless that it is possible to do with lag, it is much easier with a retain statement.

With the descending sort you've used, you can simply use a retain statement and substract the values as you go down the rows.

data want;

set have; /* sorted descending */

by acct;

retain newc2 newr2;

if first.acct then do;

newc2=newc2 = c2;

newr2=newr2 = r2;

end;

else do;

newc2=newc2-c1;

newr2=newr2-r1;

end;

run;

If you actually wish to substract the values from previous row as your lag logic reads, then instead do:

data want;

set have; /* sorted descending */

retain newc2 newr2;

by acct;

if first.acct then do;

newc2 = c2;

newr2 = r2;

output;

newc2=newc2-c1;

newr2=newr2-r1;

end;

else if not last.acct do;

output;

newc2=newc2-c1;

newr2=newr2-r1;

end;

else do;

output;

end;

run;

Only tested one of the 2 but logic is there. The real question is whether c2 and r2 are supposed to be begining of period running totals or end of period running totals.

Vince

New Contributor
Posts: 3