BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SasDewd
Obsidian | Level 7

Hello,

 

I have a data set that includes Period (20190930, 20190831, 20190731,....), AccountNumber, CurrentBalance, and some other columns.

Let's say the data set goes back 2 years or back to Period 20180131.

What I want to do is I want to add a column to each row in the data set, called PriorPeriodCurrentBalance, and populate it with, well, the current balance amount from the prior month.

I'm pretty sure this can be done with the LAG function and I found some examples but I think I may need to sort my data set first and I can't find examples that do just that.

Anyone has any ideas?

 

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

What exactly are you having trouble with?

The short,simple way is to process the data in order and use the LAG() function to remember the value of the variable from the previous call to the LAG() function.  Remember to clear the result when you start a new account, otherwise your variable will have the last balance from the previous account.

data want;
  set have;
  by AccountNumber Period;
  PriorPeriodCurrentBalance=lag(CurrentBalance);
  if first.AccountNumber then call missing(PriorPeriodCurrentBalance);
run;

 

View solution in original post

4 REPLIES 4
Krueger
Pyrite | Level 9

For sorting (you may have to play with it depending on your data) although it's not 100% necessary, it is extremely useful so you can ensure your LAG function is working correctly.

 

PROC SORT DATA = Have OUT = Want;
	BY AccountNumber Period;
RUN;

Once your data is sorted how you want it, I would recommend inputting ROW_NUMBER() function with PROC SQL. I'm assuming you want the LAG to be the prior period for the given Account Number?

 

If so you would do something like this:

PROC SQL;
CREATE TABLE Want AS
SELECT *
       ,ROW_NUMBER() OVER(PARTITION BY Period ORDER BY AccountNumber)
   FROM Have;
QUIT;

 Once your ROW_NUMBER() function is lining up with the data the way you want it you can then implement LAG() using the same OVER(PARTITION BY) as your ROW_NUMBER. 

 

Sometimes in these cases your data will be reversed so the PRIOR period you want will actually be row 3 while the current period is row 2. If this is the case simply switch your ORDER  BY in the ROW_NUMBER (or LAG) to DESC/ASC and that should fix it.

Tom
Super User Tom
Super User

What exactly are you having trouble with?

The short,simple way is to process the data in order and use the LAG() function to remember the value of the variable from the previous call to the LAG() function.  Remember to clear the result when you start a new account, otherwise your variable will have the last balance from the previous account.

data want;
  set have;
  by AccountNumber Period;
  PriorPeriodCurrentBalance=lag(CurrentBalance);
  if first.AccountNumber then call missing(PriorPeriodCurrentBalance);
run;

 

mkeintz
PROC Star

Your data is apparently  monthly date sorted by AccountNumber and descending date.   So yes, you could sort in ascending order, then use the LAG function to retrieve the value for the prior period, making sure you  don't contaminate the start of one accountnumber with lagged values from the prior account.  At its simplest such a program would look like:

 

proc sort data=have  out=have_ascending;
  by accountnumber period;
run;

data want;
   set have_ascending;
   by accountnumber;
   lagged_bal=lag(current_balance);
   if first.accountnumber=1 then lagged_bal=.;
run;

 

 

Now, if for some reason you need the data to be restored to original order, implying a second sort, you could avoid sorting altogether by something like:

 

data want;
  set have (keep=accountnumber);
  by accountnumber;
  merge have
        have (firstobs=2 keep=currentbalance rename=(currentbalance=prior_balance));
**  if first.accountnumber the prior_balance=.;
  if last.accountnumber then prior_balannce=.;
  run;

 

Correction above, thanks to @Tom .

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Tom
Super User Tom
Super User

If reading the data in descending date order then it is LAST observation for the account that should not have a prior value instead of the FIRST.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 733 views
  • 2 likes
  • 4 in conversation