BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
u63469428
Calcite | Level 5

I have created a sample datastep and am tyring to fill in the missing data starting with the May18 record. It is similar to It is simply a survival analysis step..

 

In order to calculate the current month surviving population (activeflg_final), you subtract what went to loss this month (UnitLossThisMTHPred) and interatively calculate for the future months.

 

In other words I am trrying to take was was surviving as of last month(activeflg_final) multiply by the current month loss rate (Unit_Loss_Perc) to calculate the total losses (unitlossthismthpred). Then you calculate what survived this month (activeflg_final) by subtracting the prior month surviving (lag of activeflg_final) and the total losses for this month (unitlossthismthpred).

 

This needs to be iteratively calculated for the future months starting with May18.

 

I normally could solve this by using the datastep debugger but my company switched to SAS studio and my version does not have the functionality.

 

The calculations are in the second datastep and also pasted directly below:

 

activeflg_lag = lag(activeflg_final);           

UnitLossThisMTHPred   = activeflg_lag * Unit_Loss_Perc;

activeflg_final = activeflg_lag - UnitLossThisMTHPred;    

 

 

 

data cards;

infile datalines delimiter=',';

input Dates         $ activeflg_final UnitLossThisMTHPred Unit_Loss_Perc;

datalines;

Mar18,19202,11,0.017

Apr18,11873,9,0.017

May18, , ,0.015

Jun18, , ,0.018

Jul18, , ,0.015

Aug18, , ,0.017

Sep18, , ,0.015

Oct18, , ,0.017

Nov18, , ,0.018

Dec18, , ,0.017

Jan19, , ,0.015

Feb19, , ,0.018

Mar19, , ,0.015

Apr19, , ,0.017

;

run;

 

 

data sample1;   

retain activeflg_final activeflg_lag UnitLossThisMTHPred;             

set cards;            

activeflg_lag = lag(activeflg_final);           

UnitLossThisMTHPred   = activeflg_lag * Unit_Loss_Perc;

activeflg_final = activeflg_lag - UnitLossThisMTHPred;    

run;       

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

In the first observation you read, lag(activeflg_final) will be missing; this means that every calculation using activeflg_lag will also result in missing values.

Your RETAIN statement has no meaning, as all variables are either contained in the input dataset, or calculated from another value in the first statement of each iteration.

The LAG value of activeflg_final is the one present in the data at the moment the LAG function is called; it will never be the on that is calculated in the last statement of the DATA step.

To get more insight into the evolution of your values, add PUT statements for variables of interest and read the log.

View solution in original post

10 REPLIES 10
Kurt_Bremser
Super User

In the first observation you read, lag(activeflg_final) will be missing; this means that every calculation using activeflg_lag will also result in missing values.

Your RETAIN statement has no meaning, as all variables are either contained in the input dataset, or calculated from another value in the first statement of each iteration.

The LAG value of activeflg_final is the one present in the data at the moment the LAG function is called; it will never be the on that is calculated in the last statement of the DATA step.

To get more insight into the evolution of your values, add PUT statements for variables of interest and read the log.

u63469428
Calcite | Level 5

Thank you for the quick response.

 

I do understand your comments and I have revised my logic a bit. I am still not able to get the redcords to populate. 

 

I understandyour first point..the first observation will be missing..but it still doesnt explain why the future observations are missing.

 

Once we get to the second record, the lag variable should be populated for the remaining records. and it DOES seem to be populated (after using the put statements you can see that the variables are be calculated), but once I move past the may18 record, everything becomes null again...I dont get it..The lag function should be populating a valid number and I calculate the unitlosspredthismth and activeflg_final based on that. but all of the future months are still missing..Below is a snippet from log for reference

 

1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
68
69
70
71 data sample1; set cards;
72
73 activeflg_lag = lag(activeflg_final);
74
75 UnitLossThisMTHPred = activeflg_lag * Unit_Loss_Perc;
76
77 activeflg_final = activeflg_lag - UnitLossThisMTHPred;
78
79 put activeflg_lag UnitLossThisMTHPred activeflg_final;
80 run;
 
. . .
19202 326.434 18875.566
11873 178.095 11694.905
. . .
. . .
. . .

 

 

You can see that it recalculates the activeflg_final and the unitlossthismthpred for observations 2,3 BUT then again everything is missing after that. 

 

P.S. I hit accept as solution as a mistake hopefully you will still respond..thanks for any more help

u63469428
Calcite | Level 5

I recreated the datastep and removed the unneccessary retain statement and added the put statement at the end

 

data sample1; set cards;

activeflg_lag = lag(activeflg_final);

UnitLossThisMTHPred = activeflg_lag * Unit_Loss_Perc;

activeflg_final = activeflg_lag - UnitLossThisMTHPred;

put activeflg_lag UnitLossThisMTHPred activeflg_final;
run;

 

 

u63469428
Calcite | Level 5

I guess it didnt post my follow-up question. 

 

I understand your point about the first observation being missing..but for any observation after that it should be a valide nuber to calculate off of right?

 

I see in the data step it does in fact calculate record 2 and 3 but after that everything is missing again..for some reasonit is not retaining the value or it is somehow getting reset to missing due to an error in my logic.

 

Any suggestions?

u63469428
Calcite | Level 5

Here is the log

 

1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
68
69 data sample1; set cards;
70
71 activeflg_lag = lag(activeflg_final);
72
73 UnitLossThisMTHPred = activeflg_lag * Unit_Loss_Perc;
74
75 activeflg_final = activeflg_lag - UnitLossThisMTHPred;
76
77 put activeflg_lag UnitLossThisMTHPred activeflg_final;
78 run;
 
. . .
19202 326.434 18875.566
11873 178.095 11694.905
. . .
. . .
. . .
 
It just shows the table with the missing values..but I have NO idea why it is missing still.
 
Maybe I am missing something?? What should I be looking for in the log?
Kurt_Bremser
Super User

Since you use missing values to the right of the assignment, the outcome of the calculation will result in a missing value on the left.

 

It will be very helpful if you showed us the values you expect from your example data.

u63469428
Calcite | Level 5

Sorry for the hassle.

 

Here is what the base data looks like:

u63469428_3-1687180293302.png

 

 

The goal is to fill in the missing observations 

u63469428_0-1687180093254.png

 

Using the prior observation "activeflg_final" MULTIPLIED BY the current observation "unit_loss_perc" should give me the UnitLossPredthismth

 

u63469428_1-1687180158674.png

 

Once I have the UnitLossPredthismthI again use the prior observation "activeflg_final" and subtract the newly calculated unitlosspredthismth to get the current observation activeflg_final

 

u63469428_2-1687180174897.png

 

 

 

I understand my datatep is ALSO recreating the activeflg_final and unitlosspredthismth for the "already populated rows" but my main issue is that the next observations all have missing values. For some reason my calculations to get activeflg_final and unitlosspredthimth are not working..

 

Kurt_Bremser
Super User

So you basically want a variation of LOCF (last observation carry forward) if you have missing values?

 

This should do it:

data sample1;
set cards;
retain _ac_flg;
if missing(activeflg_final)
then do;
  UnitLossThisMTHPred = _ac_flg * Unit_Loss_Perc;
  activeflg_final = _ac_flg - UnitLossThisMTHPred;
end;
_ac_flg = activeflg_final;
drop _ac_flg;
run;
u63469428
Calcite | Level 5
Ok so that did not work exactly as expectedbut with your help on identifying this is an offshoot of Last observation carry-forward...I was able to work with SAS and get the following results. This one creates a temporary dataset and fills in the missing data as expected..Thank you for all the help and I hope this data step helps others who are struggling


data cards;

infile datalines delimiter=',';

input Dates $ activeflg_final UnitLossThisMTHPred Unit_Loss_Perc;

datalines;

Mar18,19202,11,0.017

Apr18,11873,9,0.017

May18, , ,0.015

Jun18, , ,0.018

Jul18, , ,0.015

Aug18, , ,0.017

;

run;

data sample1;

retain activeflg_lag temp_activeflg temp_unitloss;

set cards;

if activeflg_final ne . then do;

temp_activeflg=activeflg_final;

temp_unitloss=unitlossthismthpred;

end;

if activeflg_final=. then do;

temp_unitloss= activeflg_lag * Unit_Loss_Perc;

temp_activeflg = activeflg_lag - temp_unitloss;

activeflg_final=temp_activeflg;

unitlossthismthpred=temp_unitloss;

end;

output;

activeflg_lag=temp_activeflg;

drop activeflg_lag temp_activeflg temp_unitloss;

run;



proc print;

var Dates activeflg_final UnitLossThisMTHPred Unit_Loss_Perc;

run;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 2094 views
  • 0 likes
  • 2 in conversation