Hello,
I am trying to create a data set that includes a running total. In some observations, the value that is being added to the running total is missing. Instead of keeping the previous running total, the new running total value for the record that is missing is also missing.
I have been researching all day. I have tried the retain statement, just adding the running total to the next value (assuming that a missing value would act as Running Total=0) and nothing is working. I couldn't get it to work with proc sql either.
Any help would be appreciated.
Here is the code and current results I am getting:
Your code and the results are not matching. How the first retianed variable is bigger than the first not missing values???
and many unnessesary statment is included like a=a;
The summing function and your code logic should handle the missing values?
do you have any other proc or data step before or after this code in the workflow??
Just the proc print I posted below
I would first try only using one RETAIN for each variable. Multiples, especially when you have one initializing values and the other not may not be quite what you want.
Also, please post code, not pictures of code as we can't copy and paste elements for correction or suggestion.
I am not at all sure why there are statements like:
Fallout_date=Fallout_date;
Note_balance_20yr=Note_balance_20yr;
Waste of space.
Here is the code.
As far as the waste of space issue, I was naming the variable for my new dataset (i.e Fallout_date=Fallout_date).
DATA FLOWFRED.FMFLOW_LOANSDROPPED_TOT_RPT;
SET FLOWFRED.FMFLOW_LOANSDROPPED_RUN_TOT;
RETAIN RUNNING_NOTE_BALANCE_15YR;
RETAIN RUNNING_NOTE_BALANCE_20YR;
RETAIN RUNNING_NOTE_BALANCE_30YR;
FALLOUT_DATE=FALLOUT_DATE;
NOTE_BALANCE_15YR=NOTE_BALANCE_15YR;
retain RUNNING_NOTE_BALANCE_15YR 0;
RUNNING_NOTE_BALANCE_15YR=SUM(RUNNING_NOTE_BALANCE_15YR,NOTE_BALANCE_15YR);
NOTE_BALANCE_20YR=NOTE_BALANCE_20YR;
retain RUNNING_NOTE_BALANCE_20YR 0;
RUNNING_NOTE_BALANCE_20YR=SUM(RUNNING_NOTE_BALANCE_20YR,NOTE_BALANCE_20YR);
NOTE_BALANCE_30YR=NOTE_BALANCE_30YR;
retain RUNNING_NOTE_BALANCE_30YR 0;
RUNNING_NOTE_BALANCE_30YR=SUM(RUNNING_NOTE_BALANCE_30YR,NOTE_BALANCE_30YR);
run;
proc print data=flowfred.fmflow_loansdropped_tot_rpt;
run;
Can you post snapshot from the first rows of the original dataset "FLOWFRED.FMFLOW_LOANSDROPPED_RUN_TOT"?
Sure, thanks.
As you can see... the running total works in the original data set (created with proc sql). The problem I have is that the running total doesn't carry forward if there is a missing value in NOTE_BALANCE_15YR. For example, on row 7 there is a missing value for NOTE_BALANCE_15YR so the RUNNING_NOTE_BALANCE_15YR is also missing. I need the RUNNING_NOTE_BALANCE_15YR from row 6 (FALLOUT_DATE 8/12/2015) to carry forward until the next record with a value occurs (row 12, FALLOUT_DATE 8/25/2015).
Now i understand your problem.
The retained variables are already existing in your original data set and also contains data.
DATA FLOWFRED.FMFLOW_LOANSDROPPED_TOT_RPT;
SET FLOWFRED.FMFLOW_LOANSDROPPED_RUN_TOT;
RETAIN _RUNNING_NOTE_BALANCE_15YR;
RETAIN _RUNNING_NOTE_BALANCE_20YR;
RETAIN _RUNNING_NOTE_BALANCE_30YR;
_RUNNING_NOTE_BALANCE_15YR=SUM(_RUNNING_NOTE_BALANCE_15YR,NOTE_BALANCE_15YR);
_RUNNING_NOTE_BALANCE_20YR=SUM(_RUNNING_NOTE_BALANCE_20YR,NOTE_BALANCE_20YR);
_RUNNING_NOTE_BALANCE_30YR=SUM(_RUNNING_NOTE_BALANCE_30YR,NOTE_BALANCE_30YR);
run;
Run the above code and you find the correct result in the new created variables starting with _ .
Does your input data set FLOWFRED.FMFLOW_LOANSDROPPED_RUN_TOT have a variable already named Running_Note_Balance_15Yr? If so, then each time the input set is read it pulls that value from the input set.
If that is the case then either DROP the existing Running total variables on the set statement (data set option DROP)
SET FLOWFRED.FMFLOW_LOANSDROPPED_RUN_TOT (drop = RUNNING_NOTE_BALANCE_15YR RUNNING_NOTE_BALANCE_20YR RUNNING_NOTE_BALANCE_30YR );
or use entirely new variables in your Retain statement and calculations. And use only one retain statement per variable. And all of the variables could be on one statement:
Retain RUNNING_NOTE_BALANCE_15YR RUNNING_NOTE_BALANCE_20YR RUNNING_NOTE_BALANCE_30YR 0;
Thanks mohamed_zaki and ballardw... I will try this as soon as I can. I appreciate the help. I will mark this solved asap.
Happy New Year!
Thanks again, this worked great. However, now that I have the running totals, I would like to have everything sorted in descending order with the most recent date. That means I would like the running total descending along with the date. I can get the date sorted descending easily but the running total will not sort that way with the date. Any ideas?
Thanks Reeza. After taking the sort out of my initial query this worked. I appreciate all of the answers everyone!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.