DATA Step, Macro, Functions and more

Lag statement

Accepted Solution Solved
Reply
Regular Contributor
Posts: 216
Accepted Solution

Lag statement

Hi Everyone

Using the lag function, I am trying to create a declining balance in a numeric column based on the value from the previous record, assuming it meets one criteria. Below is the code for this. However, the result is actually accumulating rather than declining. I have tried various iterations of this and it does not work. However, I have lag in the past without issue. Does anyone know if this is logically wrong or a syntax issue?

Attached is part of the result also.

Paul

data test2;

set &a3;

by &a6;

if first.&a7 then CNumber=CohortYearTotalAge;

if ExitMonthCategory > .5 then do;

CNumber=lag(CNumber)-DistributiveNumber;

end;

run;

Attachment

Accepted Solutions
Solution
‎01-24-2014 11:41 AM
New Contributor
Posts: 4

Re: Lag statement

Maybe this?

proc sort data=test out=test1 (drop=CNumber);

by county year decending CohortYearTotalAge ExitMonthCategory;

run;

data test2;

  set test1;

  by county year descending CohortYearTotalAge;

  retain CNumber;

  if first.CohortYearTotalAge then CNumber=CohortYearTotalAge;

  CNumber = ifn( ExitMonthCategory > .5, (CNumber-DistributiveNumber), (CNumber) );

run;

View solution in original post


All Replies
Super User
Posts: 19,789

Re: Lag statement

Does this do what you want?

data test2;

set &a3;

by &a6;

retain cnumber;

if first.&a7 then CNumber=CohortYearTotalAge;

if ExitMonthCategory > .5 then do;

CNumber=cnumber-DistributiveNumber;

end;

run;

Regular Contributor
Posts: 216

Re: Lag statement

Thanks Reeza, but no, it is the same result actually.

Paul

Super User
Posts: 19,789

Re: Lag statement

Then I think your logic is actually what's incorrect. Post a Have dataset and want. Preferably in here, rather than in CSV files.

Regular Contributor
Posts: 216

Re: Lag statement

It is simply this (the only difference is the CNumber column):

Input:

CountyYearExitMonthCategoryCohortYearTotalAgeDistributiveNumberCNumber
1 - New York State20060.546320
1 - New York State2006146322
1 - New York State2006246320
1 - New York State2006346320
1 - New York State2006446320
1 - New York State2006546320
1 - New York State2006646322
1 - New York State20061246322
1 - New York State20061846326
1 - New York State20062446324
1 - New York State20063046322
1 - New York State20063646327
1 - New York State20060.523500
1 - New York State2006123502
1 - New York State2006223500
1 - New York State2006323500
1 - New York State2006423500
1 - New York State2006523500
1 - New York State2006623502
1 - New York State20061223502
1 - New York State20061823506
1 - New York State20062423504
1 - New York State20063023502
1 - New York State20063623507

End:

CountyYearExitMonthCategoryCohortYearTotalAgeDistributiveNumberCNumber
1 - New York State20060.5463204632
1 - New York State20061463224630
1 - New York State20062463204630
1 - New York State20063463204630
1 - New York State20064463204630
1 - New York State20065463204630
1 - New York State20066463224628
1 - New York State200612463224626
1 - New York State200618463264620
1 - New York State200624463244616
1 - New York State200630463224614
1 - New York State200636463274607
1 - New York State20060.5235002350
1 - New York State20061235022348
1 - New York State20062235002348
1 - New York State20063235002348
1 - New York State20064235002348
1 - New York State20065235002348
1 - New York State20066235022346
1 - New York State200612235022344
1 - New York State200618235062338
1 - New York State200624235042334
1 - New York State200630235022332
1 - New York State200636235072325
Super User
Posts: 19,789

Re: Lag statement

Drop CNumber from your original code and try it again.

data test2;

set &a3 (drop= Cnumber);

by &a6;

retain cnumber;

if first.&a7 then CNumber=CohortYearTotalAge;

if ExitMonthCategory > .5 then do;

CNumber=cnumber-DistributiveNumber;

end;

run;

Regular Contributor
Posts: 216

Re: Lag statement

Nothing prints for CNumber after the initial value in the 'first' statement (if first.&a7 then CNumber=CohortYearTotalAgeSmiley Wink.

Paul

PROC Star
Posts: 1,322

Re: Lag statement

What are the values of &a6 and &a7 ?

Unless there is a typo in the sample data you posted, it looks like the file is sorted by County CohortYearTotalAge ExitMonthCategory???   Or should year for second half be 2007, so that it would be sorted by County YEar ExitMonthCategory?

Regardless, for the sample data (assuming you are happy with the current sort order), it looks like when ExitMonthCategory=.5 you want to set Cnumber to the value of CohortYearTotalAge.  For any other value of ExitMonthCategory you want to decrement Cnumber by the value of DistributiveNumber.  If I've got the right logic, I think you could do that with:

data want ;

set have (drop=Cnumber) ;

if ExitMonthCategory < 1 then Cnumber = CohortYearTotalAge ;

else Cnumber +- DistributiveNumber ;

run;

So that's using the Sum statement (typically count+1 or some such), which automatically retains Cnumber, and in this case is used to decrement rather than increment.

HTH

Regular Contributor
Posts: 216

Re: Lag statement

Hi Quentin

You have the logic right and that is what I have done. However the resulting data set just adds a negative to the DistributiveNumber value and does not subtract it from the CohortYearTotalAge value:

CountyYearExitMonthCategoryCohortYearTotalAgeDistributiveNumberCNumber
1 - New York State20060.5463204632
1 - New York State2006146322-2
1 - New York State20062463200
1 - New York State20063463200
1 - New York State20064463200
1 - New York State20065463200
1 - New York State2006646322-2
1 - New York State20061246322-2
1 - New York State20061846326-6
1 - New York State20062446324-4
1 - New York State20063046322-2
1 - New York State20063646327-7
1 - New York State200642463200
1 - New York State200648463200
1 - New York State20065446323-3
1 - New York State200660463200
1 - New York State200666463200
1 - New York State200672463200
1 - New York State200678463200
1 - New York State200684463200
1 - New York State200687463200
1 - New York State20060.5463204632
1 - New York State20061463200
1 - New York State20062463200
1 - New York State20063463200
1 - New York State2006446321-1
1 - New York State2006546321-1
1 - New York State2006646323-3
1 - New York State20061246327-7
1 - New York State200618463226-26
1 - New York State200624463223-23
1 - New York State200630463222-22
1 - New York State200636463218-18
1 - New York State200642463215-15
1 - New York State20064846328-8
PROC Star
Posts: 1,322

Re: Lag statement

Hi,

Do you have Cnumber + - DistributiveNumber?  That does it for me:

data have;
  input County Year ExitMonthCategory CohortYearTotalAge DistributiveNumber;
  cards;
1 2006 0.5 4632 0
1 2006 1   4632 2
1 2006 2   4632 0
1 2006 3   4632 0
1 2006 0.5 4632 0
1 2006 1   4632 7
1 2006 2   4632 3
1 2006 3   4632 0

;
run;

data want ;
set have;
if ExitMonthCategory < 1 then Cnumber = CohortYearTotalAge ;
else Cnumber +- DistributiveNumber ;

put (_all_)(=);
run;

*log: ;

121  data want ;
122  set have;
123  if ExitMonthCategory < 1 then Cnumber = CohortYearTotalAge ;
124  else Cnumber +- DistributiveNumber ;
125
126  put (_all_)(=);
127  run;

County=1 Year=2006 ExitMonthCategory=0.5 CohortYearTotalAge=4632 DistributiveNumber=0 Cnumber=4632
County=1 Year=2006 ExitMonthCategory=1 CohortYearTotalAge=4632 DistributiveNumber=2 Cnumber=4630
County=1 Year=2006 ExitMonthCategory=2 CohortYearTotalAge=4632 DistributiveNumber=0 Cnumber=4630
County=1 Year=2006 ExitMonthCategory=3 CohortYearTotalAge=4632 DistributiveNumber=0 Cnumber=4630
County=1 Year=2006 ExitMonthCategory=0.5 CohortYearTotalAge=4632 DistributiveNumber=0 Cnumber=4632
County=1 Year=2006 ExitMonthCategory=1 CohortYearTotalAge=4632 DistributiveNumber=7 Cnumber=4625
County=1 Year=2006 ExitMonthCategory=2 CohortYearTotalAge=4632 DistributiveNumber=3 Cnumber=4622
County=1 Year=2006 ExitMonthCategory=3 CohortYearTotalAge=4632 DistributiveNumber=0 Cnumber=4622
NOTE: There were 8 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 8 observations and 6 variables.



Regular Contributor
Posts: 216

Re: Lag statement

Yes, I tried it exactly as you have it and tried with quotes as well. Neither worked. The only item I don't have is your statement below:

put (_all_)(=);

Paul

PROC Star
Posts: 1,322

Re: Lag statement

Not sure what's happening.

My last post had sample data creating HAVE, and code creating WANT, and I think the results are what you are looking for, as shown in the log. Right? If you run that code, do you get the same results I get?

If so, then I would double check that your code is really the same as what I posted.

If not, then perhaps you've found some odd bug (rarely the case...)

--Q.

Solution
‎01-24-2014 11:41 AM
New Contributor
Posts: 4

Re: Lag statement

Maybe this?

proc sort data=test out=test1 (drop=CNumber);

by county year decending CohortYearTotalAge ExitMonthCategory;

run;

data test2;

  set test1;

  by county year descending CohortYearTotalAge;

  retain CNumber;

  if first.CohortYearTotalAge then CNumber=CohortYearTotalAge;

  CNumber = ifn( ExitMonthCategory > .5, (CNumber-DistributiveNumber), (CNumber) );

run;

Regular Contributor
Posts: 216

Re: Lag statement

Nolaness, for some reason, that actually worked!! THANK YOU.

It didn't work until I added the drop statement in the prior Sort, but did thereafter. I have no idea why this worked and the 10+ variations of the code posted here and my own did not.

Paul

Frequent Contributor
Posts: 106

Re: Lag statement

data want;

set have;

by descending CohortYearTotalAge ;

if first.CohortYearTotalAge then cnumber=CohortYearTotalAge;

else if ExitMonthCategory > .5 then cnumber+-DistributiveNumber;

run;

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 17 replies
  • 503 views
  • 0 likes
  • 5 in conversation