I have created a sample datastep and am tyring to fill in the missing data starting with the May18 record. It is similar to It is simply a survival analysis step..
In order to calculate the current month surviving population (activeflg_final), you subtract what went to loss this month (UnitLossThisMTHPred) and interatively calculate for the future months.
In other words I am trrying to take was was surviving as of last month(activeflg_final) multiply by the current month loss rate (Unit_Loss_Perc) to calculate the total losses (unitlossthismthpred). Then you calculate what survived this month (activeflg_final) by subtracting the prior month surviving (lag of activeflg_final) and the total losses for this month (unitlossthismthpred).
This needs to be iteratively calculated for the future months starting with May18.
I normally could solve this by using the datastep debugger but my company switched to SAS studio and my version does not have the functionality.
The calculations are in the second datastep and also pasted directly below:
activeflg_lag = lag(activeflg_final);
UnitLossThisMTHPred = activeflg_lag * Unit_Loss_Perc;
activeflg_final = activeflg_lag - UnitLossThisMTHPred;
data cards;
infile datalines delimiter=',';
input Dates $ activeflg_final UnitLossThisMTHPred Unit_Loss_Perc;
datalines;
Mar18,19202,11,0.017
Apr18,11873,9,0.017
May18, , ,0.015
Jun18, , ,0.018
Jul18, , ,0.015
Aug18, , ,0.017
Sep18, , ,0.015
Oct18, , ,0.017
Nov18, , ,0.018
Dec18, , ,0.017
Jan19, , ,0.015
Feb19, , ,0.018
Mar19, , ,0.015
Apr19, , ,0.017
;
run;
data sample1;
retain activeflg_final activeflg_lag UnitLossThisMTHPred;
set cards;
activeflg_lag = lag(activeflg_final);
UnitLossThisMTHPred = activeflg_lag * Unit_Loss_Perc;
activeflg_final = activeflg_lag - UnitLossThisMTHPred;
run;
In the first observation you read, lag(activeflg_final) will be missing; this means that every calculation using activeflg_lag will also result in missing values.
Your RETAIN statement has no meaning, as all variables are either contained in the input dataset, or calculated from another value in the first statement of each iteration.
The LAG value of activeflg_final is the one present in the data at the moment the LAG function is called; it will never be the on that is calculated in the last statement of the DATA step.
To get more insight into the evolution of your values, add PUT statements for variables of interest and read the log.
In the first observation you read, lag(activeflg_final) will be missing; this means that every calculation using activeflg_lag will also result in missing values.
Your RETAIN statement has no meaning, as all variables are either contained in the input dataset, or calculated from another value in the first statement of each iteration.
The LAG value of activeflg_final is the one present in the data at the moment the LAG function is called; it will never be the on that is calculated in the last statement of the DATA step.
To get more insight into the evolution of your values, add PUT statements for variables of interest and read the log.
Thank you for the quick response.
I do understand your comments and I have revised my logic a bit. I am still not able to get the redcords to populate.
I understandyour first point..the first observation will be missing..but it still doesnt explain why the future observations are missing.
Once we get to the second record, the lag variable should be populated for the remaining records. and it DOES seem to be populated (after using the put statements you can see that the variables are be calculated), but once I move past the may18 record, everything becomes null again...I dont get it..The lag function should be populating a valid number and I calculate the unitlosspredthismth and activeflg_final based on that. but all of the future months are still missing..Below is a snippet from log for reference
You can see that it recalculates the activeflg_final and the unitlossthismthpred for observations 2,3 BUT then again everything is missing after that.
P.S. I hit accept as solution as a mistake hopefully you will still respond..thanks for any more help
I recreated the datastep and removed the unneccessary retain statement and added the put statement at the end
data sample1; set cards;
activeflg_lag = lag(activeflg_final);
UnitLossThisMTHPred = activeflg_lag * Unit_Loss_Perc;
activeflg_final = activeflg_lag - UnitLossThisMTHPred;
put activeflg_lag UnitLossThisMTHPred activeflg_final;
run;
I guess it didnt post my follow-up question.
I understand your point about the first observation being missing..but for any observation after that it should be a valide nuber to calculate off of right?
I see in the data step it does in fact calculate record 2 and 3 but after that everything is missing again..for some reasonit is not retaining the value or it is somehow getting reset to missing due to an error in my logic.
Any suggestions?
So now you can see what your step is doing (read the log), and take corrective measures if and as needed.
Here is the log
Since you use missing values to the right of the assignment, the outcome of the calculation will result in a missing value on the left.
It will be very helpful if you showed us the values you expect from your example data.
Sorry for the hassle.
Here is what the base data looks like:
The goal is to fill in the missing observations
Using the prior observation "activeflg_final" MULTIPLIED BY the current observation "unit_loss_perc" should give me the UnitLossPredthismth
Once I have the UnitLossPredthismthI again use the prior observation "activeflg_final" and subtract the newly calculated unitlosspredthismth to get the current observation activeflg_final
I understand my datatep is ALSO recreating the activeflg_final and unitlosspredthismth for the "already populated rows" but my main issue is that the next observations all have missing values. For some reason my calculations to get activeflg_final and unitlosspredthimth are not working..
So you basically want a variation of LOCF (last observation carry forward) if you have missing values?
This should do it:
data sample1;
set cards;
retain _ac_flg;
if missing(activeflg_final)
then do;
UnitLossThisMTHPred = _ac_flg * Unit_Loss_Perc;
activeflg_final = _ac_flg - UnitLossThisMTHPred;
end;
_ac_flg = activeflg_final;
drop _ac_flg;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Ready to level-up your skills? Choose your own adventure.