Obsidian | Level 7

## I need help to calculate variable's value based on the previous value in the same variable

I have excel table and my goal is to convert this table to SAS.

The table has almost 100 columns with different statistic formula in each column, when some formulas depend on other columns.

This is the small sample of the table:

A                        B                       C                                D                    E

5  |Random number| Year | Normal Gross return | Gross Fund | Gross IRR |

| ------------------|------|-------------------------|--------------|------------|
6  |                               |        0|                                         |                        |                    |
7  |     1.294000103 |        1 |                         -18.13%  |              76.96|     -18.13% |
8  |   -0.330857931 |        2 |                        25.50%  |            214.55|         9.14% |
9  |  -0.003978838 |        3 |                          8.55%  |            334.92|        8.85% |
10|  -0.130874637  |        4 |                        -6.42%  |            401.40|        2.63% |
11 | -0.508842621   |       5  |                          9.07%  |            540.35|       4.69% |

The excel formula to calculate column D (Gross Fund) is:
`Cell(D7)=D6*(1+C7)+Premium*(1-expenseload)*(1+C7)`
Niotice, that in this column I have to take value from previous row.
I saw previous topics about Lag function and tried different approaches to solve the problem, but nothing gone well and I don't get the correct result.
The current SAS code in my program (Within Data step) is:
``GrossFund = Lag(GrossFund)*(1+NGrossReturn)+&GrossAnnualPremium*(1-&ExpenceLoad)*(1+NGrossReturn);``
It returns Nulls only.

I need help to find the simplest solution to make the calculation with less steps as possible, because, as I noticed, i need to calculate almost a hundred additional variables. and i don't want the code to be to complicated and unreadable.

1 ACCEPTED SOLUTION

Accepted Solutions
Super User

## Re: I need help to calculate variable's value based on the previous value in the same variable

So your inputs that vary per observation are YEAR and NGROSSFUND .

``````data have;
input Year NGrossReturn;
datalines;
0 0
1 -0.181279377
2 0.25500056
3 0.085457165
4 -0.06416826
;``````

And your inputs that do not vary are:

``  retain GrossAnnualPremium 100 ExpenceLoad 0.06  ;``

So your data step to calculate GROSSFUND could be something like this:

``````data want;
set have;
if year=0 then grossfund=0;
else GrossFund = GrossFund*(1+NGrossReturn)
;
run;``````

Result:

```                            Gross
NGross      Annual    Expence      Gross

1       0      0.00000      100        0.06       0.000
2       1     -0.18128      100        0.06      76.960
3       2      0.25500      100        0.06     214.555
4       3      0.08546      100        0.06     334.923
5       4     -0.06417      100        0.06     401.400
```

NOTE: For the RETAIN of GROSSFUND to work it cannot be in the input dataset.  Because then when the SET statement reads the next observation the value read from the input dataset will overwrite the retained value.

22 REPLIES 22
Diamond | Level 26

## Re: I need help to calculate variable's value based on the previous value in the same variable

The excel formula to calculate column D (Gross Fund) is:
`Cell(D7)=D6*(1+C7)+Premium*(1-expenseload)*(1+C7)`

Since D6 is blank, I have no idea how the calculation works to give a value in D7. Please explain this.

--
Paige Miller
Obsidian | Level 7

## Re: I need help to calculate variable's value based on the previous value in the same variable

Because of cell D6 in Excel is blank it's value in excel formula turns to be 0(zero).
Diamond | Level 26

## Re: I need help to calculate variable's value based on the previous value in the same variable

Then you want to write code in SAS that says if any column D cell is missing, then you set it to zero.

--
Paige Miller
Obsidian | Level 7

## Re: I need help to calculate variable's value based on the previous value in the same variable

I tried this as well.

This is the code I tried:

``````If Year = 0 Then GrossFund = 0;

And this is result it return:

```|    0|             0|    0.00%  |            . |
|    1| 0.5126916173 | ( 18.13%) |   76.95973857|
|    2| 0.9550716021 |   25.50%  |             .|
|    3| -0.875237107 |    8.55%  |             .|
|    4| 0.8329493204 | ( 6.42%)  |             .|```

As you can see this code works just once.

Diamond | Level 26

## Re: I need help to calculate variable's value based on the previous value in the same variable

Since you obviously have the data as a SAS data set, we need the SAS data set and not typed in Excel-clone data. Please provide the data as a working SAS data step code by following these instructions. We don't need all 100 columns, just the relevant ones.

--
Paige Miller
Obsidian | Level 7

## Re: I need help to calculate variable's value based on the previous value in the same variable

@PaigeMiller wrote:

Since you obviously have the data as a SAS data set, we need the SAS data set and not typed in Excel-clone data. Please provide the data as a working SAS data step code by following these instructions. We don't need all 100 columns, just the relevant ones.

Hi,

The description is not so clear to me. Anyway, I prepared the code that build first 100 rows of the data set I'm working on.

To notice that the full set has:

6 Tranches (From 1 To 6),

Each tranche contains 15000 Iterations (From 1 to 15000),

Each Iteration contains 42 years (From 0 to 41) Where Rows with Year=0 must be blank or zero values.

``````%let Stdev_base = 0.187493224932249;
%let weight_to_equities = 0.5;
%let Stdev_ToUse = &Stdev_base*&weight_to_equities;
%let Term_Interval = 1;

Data Test;
Infile datalines dlm="09"x dsd;
Input Tranche Iteration Year NormInv  NetReturn Tresv NGrossReturn;
datalines;
1	1	0	0			0
1	1	1	-0.401683176	-0.192583212	16	-0.181279377
1	1	2	-2.188102767	0.237673136	16	0.25500056
1	1	3	-1.082351454	0.070470576	16	0.085457165
1	1	4	0.502844303	-0.077089014	16	-0.06416826
1	1	5	-0.764272142	0.075668747	16	0.09072811
1	1	6	0.832236652	0.080999439	16	0.096133431
1	1	7	-0.248428708	0.037800642	16	0.052329851
1	1	8	-0.696929109	0.012473564	16	0.026648194
1	1	9	-0.177708172	0.022553857	16	0.036869611
1	1	10	0.361635108	0.060861948	16	0.075714016
1	1	11	0.968456872	-0.03329219	16	-0.019758281
1	1	12	0.507503347	0.100902401	16	0.116315034
1	1	13	0.094918729	0.044654956	16	0.059280125
1	1	14	-0.268100788	0.100029068	16	0.115429475
1	1	15	1.114988618	-0.095820118	16	-0.0831616
1	1	16	0.621812038	0.020710059	16	0.035
1	1	17	-0.240327682			-0.03444714
1	1	18	-0.518506761			-0.05930167
1	1	19	-0.380628296			-0.047063621
1	1	20	0.216490394			0.007800988
1	1	21	-0.865064972			-0.08937257
1	1	22	0.516558723			0.036553334
1	1	23	0.635065672			0.048133258
1	1	24	0.614818545			0.046145683
1	1	25	1.703085114			0.15850912
1	1	26	0.306297412			0.016321587
1	1	27	1.054664304			0.090184211
1	1	28	-0.927067255			-0.094650243
1	1	29	-0.378329899			-0.046858273
1	1	30	1.37836213			0.123773543
1	1	31	0.698802551			0.054414724
1	1	32	1.877871249			0.177648372
1	1	33	-1.801178849			-0.165880781
1	1	34	-2.301837034			-0.204125642
1	1	35	-0.064140869			-0.01836675
1	1	36	-0.011118909			-0.013475268
1	1	37	0.16684725			0.003121704
1	1	38	-0.907010833			-0.092946383
1	1	39	-0.5522686			-0.062274328
1	1	40	-0.926762702			-0.094624394
1	1	41	-0.49304509			-0.057053593
1	2	0	0			0
1	2	1	0.272052109	-0.192583212	16	-0.181279377
1	2	2	-1.281898069	0.237673136	16	0.25500056
1	2	3	-0.647768883	0.070470576	16	0.085457165
1	2	4	-0.565498028	-0.077089014	16	-0.06416826
1	2	5	0.41197637	0.075668747	16	0.09072811
1	2	6	-0.779413048	0.080999439	16	0.096133431
1	2	7	0.855643268	0.037800642	16	0.052329851
1	2	8	-0.866463325	0.012473564	16	0.026648194
1	2	9	1.07588221	0.022553857	16	0.036869611
1	2	10	0.163612115	0.060861948	16	0.075714016
1	2	11	-0.033478564	-0.03329219	16	-0.019758281
1	2	12	-0.260679455	0.100902401	16	0.116315034
1	2	13	1.02590182	0.044654956	16	0.059280125
1	2	14	-2.021439569	0.100029068	16	0.115429475
1	2	15	-0.266694607	-0.095820118	16	-0.0831616
1	2	16	1.394018033	0.020710059	16	0.035
1	2	17	-0.042766779			-0.016397831
1	2	18	0.224547946			0.008562537
1	2	19	-2.84164275			-0.243398743
1	2	20	-0.529668341			-0.060285465
1	2	21	1.249656009			0.110295831
1	2	22	-0.032835878			-0.015481683
1	2	23	-0.336196189			-0.04308602
1	2	24	0.321693245			0.01778951
1	2	25	0.664098464			0.050989875
1	2	26	-0.969997516			-0.098286561
1	2	27	1.234485887			0.108717949
1	2	28	-0.371113375			-0.046213231
1	2	29	-0.217901648			-0.03241506
1	2	30	-0.487483006			-0.056561787
1	2	31	-0.462498733			-0.054349485
1	2	32	-0.425287629			-0.051044903
1	2	33	1.539378448			0.140865279
1	2	34	-0.458411447			-0.053987072
1	2	35	0.871198126			0.071594097
1	2	36	1.278369561			0.11328855
1	2	37	-0.232312352			-0.033721341
1	2	38	0.435834918			0.028738736
1	2	39	1.057741102			0.090498708
1	2	40	0.987034372			0.083294223
1	2	41	0.249422192			0.010917124
1	3	0	0			0
1	3	1	-0.24711478	-0.192583212	16	-0.181279377
1	3	2	0.193853784	0.237673136	16	0.25500056
1	3	3	0.722182005	0.070470576	16	0.085457165
1	3	4	-0.524994456	-0.077089014	16	-0.06416826
1	3	5	0.597940421	0.075668747	16	0.09072811
1	3	6	1.822602911	0.080999439	16	0.096133431
1	3	7	-0.66550602	0.037800642	16	0.052329851
1	3	8	0.229251942	0.012473564	16	0.026648194
1	3	9	0.515539259	0.022553857	16	0.036869611
1	3	10	-0.464778008	0.060861948	16	0.075714016
1	3	11	-2.071355087	-0.03329219	16	-0.019758281
1	3	12	-0.290791577	0.100902401	16	0.116315034
1	3	13	-0.092066762	0.044654956	16	0.059280125
1	3	14	0.564862272	0.100029068	16	0.115429475
1	3	15	-0.251988426	-0.095820118	16	-0.0831616
;
run;``````
Super User

## Re: I need help to calculate variable's value based on the previous value in the same variable

I don't understand.

You seem to have all of the observations already in the source data.

What are the variables where you already HAVE the data on the observation.

What are the variables where you need to CREATE a value on the observation.

What is the formula for creating the values?

Do you want to GENERATE 6 'tranches'?   Or do you already have them?  Could we just pull out the first observations per "tranch" from your giant file and have everything we need to create the other observations?

Super User

## Re: I need help to calculate variable's value based on the previous value in the same variable

So if your input is this data:

``````data have;
infile datalines dsd dlm='|' truncover;
input Tranche Iteration Year NormInv  NetReturn Tresv NGrossReturn;
datalines;
1|1|0|0|||0
1|1|1|-0.401683176|-0.192583212|16|-0.181279377
1|1|2|-2.188102767|0.237673136|16|0.25500056
1|1|3|-1.082351454| 0.070470576|16|0.085457165
1|1|4| 0.502844303|-0.077089014|16|-0.06416826
;``````

What is your OUTPUT.  Please share the desired output for this 6 observation input.

Explain how you got from the INPUT to the OUTPUT.

Do you want to add variables?  Which variables? How are they calculated?

Do you want to change the values of some variables? Which variables? On which observations?

Do you want to add observations?  How many?  How is that determined?

PROC Star

## Re: I need help to calculate variable's value based on the previous value in the same variable

Does your log show the message NOTE: Missing values were generated as a result of performing an operation on missing values?

Check out the Boston Area SAS Users Group (BASUG) video archives: https://www.basug.org/videos.
Obsidian | Level 7

## Re: I need help to calculate variable's value based on the previous value in the same variable

@Quentin wrote:

Does your log show the message NOTE: Missing values were generated as a result of performing an operation on missing values?

Yes. Here is the whole NOTE:

```NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
2445000 at 65:36   2445000 at 65:47   2445000 at 65:62   3645000 at 72:27   105000 at 72:30
NOTE: There were 3780000 observations read from the data set WORK.SIMULATOR1.
NOTE: The data set WORK.SIMULATOR_FINAL has 3780000 observations and 6 variables.
NOTE: DATA statement used (Total process time):
real time           1.96 seconds
cpu time            1.90 seconds```

Super User

## Re: I need help to calculate variable's value based on the previous value in the same variable

Where do you set the value of the macro variables &grossannualpremium and &expenceload?

Set to what?

Is there some grouping of data, such as client, company, fund or whatever?

Is this use of LAG inside anything resembling an If/then/else block?

ENTIRE data step code, not snippets because other statements prior or after your shown statment(s) could quite easily effect the result.

Example data in the form of a data step is a good idea.

Note that anyplace you use + if a term on either side is missing the result is missing

Super User

## Re: I need help to calculate variable's value based on the previous value in the same variable

Looks to me like you are trying to BUILD data.  So you start with some initial values and that loop YEAR from 0 to whatever and apply the same formula over and over.

So something like this:

``````data initial_values;
GrossFund = 1000;
NGrossReturn = 0.10 ;
run;

data want;
set initial_values;
do year=0 to 10;
output;
end;
run;``````

Result

```                             Gross
Gross     NGross     Annual    Expence

1    1000.00      0.1       200        0.25       0
2    1265.00      0.1       200        0.25       1
3    1556.50      0.1       200        0.25       2
4    1877.15      0.1       200        0.25       3
5    2229.87      0.1       200        0.25       4
6    2617.85      0.1       200        0.25       5
7    3044.64      0.1       200        0.25       6
8    3514.10      0.1       200        0.25       7
9    4030.51      0.1       200        0.25       8
10    4598.56      0.1       200        0.25       9
11    5223.42      0.1       200        0.25      10
```
Obsidian | Level 7

Hi Tom!

Super User

## Re: I need help to calculate variable's value based on the previous value in the same variable

I am starting to think that you have a misunderstanding of the LAG function.

`GrossFund = Lag(GrossFund)*(1+NGrossReturn)+&GrossAnnualPremium*(1-&ExpenceLoad)*(1+NGrossReturn);`

(besides not showing any value for two macro variables) will not have a lagged value Grossfund if it does not exist in the input data set OR was not successfully assigned a value for Grossfund prior.

Consider:

```data have;
input x ;
datalines;
1
2
3
;

data example;
set have;
y= lag(y)+x;
z= sum( lag(z),x);
q = lag(q) * x;
run;```

on the first iteration of the data step there are no lag values for any of y, z or q.

So on the first iteration y = <missing> + x. Which results in a missing value. That will be the lagged value of y in the next iteration.

the SUM function however allows a result for missing plus actual values. So Z on the first step gets assigned a total of 1.

Q is assigned the value of <missing> * x. Which is missing. That will be lagged value of Q.

On the second iteration the lagged values of y and q are still missing. So result in missing values. And will continue to do so because the calculation used for those two variables in this example will always yield a missing result.

Z however does accumulate.

Lets work through your GrossFund calculation. IF Grossfund does not have a value from another data step or previous assignment the first use is the Lag(grossfund) function. Returning missing. Multiply that by some value: result is missing. Add using + the result of another group of terms. The result is missing. So the Grossfund is assigned missing.

Repeat the next time because no actual value other than missing was ever assigned to Grossfund.

If the expectation for the FIRST execution of Grossfund is the part after the + sign the fix may be easy:

`GrossFund =Sum( Lag(GrossFund)*(1+NGrossReturn), &GrossAnnualPremium*(1-&ExpenceLoad)*(1+NGrossReturn)  );`

Maybe. Still may have issues depending what is actually in the 2 macro variables .

If you do not want the First assignment to Grossfund to be

`&GrossAnnualPremium*(1-&ExpenceLoad)*(1+NGrossReturn)`

then more explanation is needed. And actual complete code used to "build" the data set. And the definitions of those macro variables.

Discussion stats
• 22 replies
• 588 views
• 4 likes
• 6 in conversation