Solved: I need help to calculate variable's value based on the previous value ...

IgorR · Posted 08-16-2023 05:20 AM

I have excel table and my goal is to convert this table to SAS.

The table has almost 100 columns with different statistic formula in each column, when some formulas depend on other columns.

This is the small sample of the table:

A B C D E

| ------------------|------|-------------------------|--------------|------------|

6 | | 0| | | |

7 | 1.294000103 | 1 | -18.13% | 76.96| -18.13% |

8 | -0.330857931 | 2 | 25.50% | 214.55| 9.14% |

9 | -0.003978838 | 3 | 8.55% | 334.92| 8.85% |

10| -0.130874637 | 4 | -6.42% | 401.40| 2.63% |

11 | -0.508842621 | 5 | 9.07% | 540.35| 4.69% |

The excel formula to calculate column D (Gross Fund) is:

Cell(D7)=D6*(1+C7)+Premium*(1-expenseload)*(1+C7)

Niotice, that in this column I have to take value from previous row.

"Premium" and "ExpenceLoad" are named ranges in excel.

I saw previous topics about Lag function and tried different approaches to solve the problem, but nothing gone well and I don't get the correct result.

The current SAS code in my program (Within Data step) is:

GrossFund = Lag(GrossFund)*(1+NGrossReturn)+&GrossAnnualPremium*(1-&ExpenceLoad)*(1+NGrossReturn);

It returns Nulls only.

I need help to find the simplest solution to make the calculation with less steps as possible, because, as I noticed, i need to calculate almost a hundred additional variables. and i don't want the code to be to complicated and unreadable.

Thanks in advance!

Tom · Posted 08-20-2023 04:22 PM

So your inputs that vary per observation are YEAR and NGROSSFUND .

data have;
  input Year NGrossReturn;
datalines;
0 0
1 -0.181279377
2 0.25500056
3 0.085457165
4 -0.06416826
;

And your inputs that do not vary are:

  retain GrossAnnualPremium 100 ExpenceLoad 0.06  ;

So your data step to calculate GROSSFUND could be something like this:

data want;
  set have;
  retain GrossAnnualPremium 100 ExpenceLoad 0.06 GrossFund 0;
  if year=0 then grossfund=0;
  else GrossFund = GrossFund*(1+NGrossReturn)
            + GrossAnnualPremium*(1-ExpenceLoad)*(1+NGrossReturn)
  ;
run;

Result:

                            Gross
                NGross      Annual    Expence      Gross
Obs    Year     Return     Premium      Load        Fund

 1       0      0.00000      100        0.06       0.000
 2       1     -0.18128      100        0.06      76.960
 3       2      0.25500      100        0.06     214.555
 4       3      0.08546      100        0.06     334.923
 5       4     -0.06417      100        0.06     401.400

NOTE: For the RETAIN of GROSSFUND to work it cannot be in the input dataset. Because then when the SET statement reads the next observation the value read from the input dataset will overwrite the retained value.

View solution in original post

PaigeMiller · Posted 08-16-2023 05:31 AM

The excel formula to calculate column D (Gross Fund) is:

Cell(D7)=D6*(1+C7)+Premium*(1-expenseload)*(1+C7)

Since D6 is blank, I have no idea how the calculation works to give a value in D7. Please explain this.

--
Paige Miller

IgorR · Posted 08-16-2023 05:36 AM

Because of cell D6 in Excel is blank it's value in excel formula turns to be 0(zero).

PaigeMiller · Posted 08-16-2023 06:06 AM

Then you want to write code in SAS that says if any column D cell is missing, then you set it to zero.

--
Paige Miller

IgorR · Posted 08-16-2023 06:33 AM

I tried this as well.

This is the code I tried:

If Year = 0 Then GrossFund = 0;
GrossFund = Lag(GrossFund)*(1+NGrossReturn)+&GrossAnnualPremium*(1-&ExpenceLoad)*(1+NGrossReturn);

And this is result it return:

|    0|             0|    0.00%  |            . |
|    1| 0.5126916173 | ( 18.13%) |   76.95973857|
|    2| 0.9550716021 |   25.50%  |             .|
|    3| -0.875237107 |    8.55%  |             .|
|    4| 0.8329493204 | ( 6.42%)  |             .|

As you can see this code works just once.

PaigeMiller · Posted 08-16-2023 07:12 AM

Since you obviously have the data as a SAS data set, we need the SAS data set and not typed in Excel-clone data. Please provide the data as a working SAS data step code by following these instructions. We don't need all 100 columns, just the relevant ones.

--
Paige Miller

IgorR · Posted 08-17-2023 03:05 AM

@PaigeMiller wrote:
Since you obviously have the data as a SAS data set, we need the SAS data set and not typed in Excel-clone data. Please provide the data as a working SAS data step code by following these instructions. We don't need all 100 columns, just the relevant ones.

Hi,

The description is not so clear to me. Anyway, I prepared the code that build first 100 rows of the data set I'm working on.

To notice that the full set has:

6 Tranches (From 1 To 6),

Each tranche contains 15000 Iterations (From 1 to 15000),

Each Iteration contains 42 years (From 0 to 41) Where Rows with Year=0 must be blank or zero values.

%let Stdev_base = 0.187493224932249;
%let weight_to_equities = 0.5;
%let Stdev_ToUse = &Stdev_base*&weight_to_equities;
%let Term_Interval = 1;

Data Test;
Infile datalines dlm="09"x dsd;
Input Tranche Iteration Year NormInv  NetReturn Tresv NGrossReturn;
datalines;
1	1	0	0			0
1	1	1	-0.401683176	-0.192583212	16	-0.181279377
1	1	2	-2.188102767	0.237673136	16	0.25500056
1	1	3	-1.082351454	0.070470576	16	0.085457165
1	1	4	0.502844303	-0.077089014	16	-0.06416826
1	1	5	-0.764272142	0.075668747	16	0.09072811
1	1	6	0.832236652	0.080999439	16	0.096133431
1	1	7	-0.248428708	0.037800642	16	0.052329851
1	1	8	-0.696929109	0.012473564	16	0.026648194
1	1	9	-0.177708172	0.022553857	16	0.036869611
1	1	10	0.361635108	0.060861948	16	0.075714016
1	1	11	0.968456872	-0.03329219	16	-0.019758281
1	1	12	0.507503347	0.100902401	16	0.116315034
1	1	13	0.094918729	0.044654956	16	0.059280125
1	1	14	-0.268100788	0.100029068	16	0.115429475
1	1	15	1.114988618	-0.095820118	16	-0.0831616
1	1	16	0.621812038	0.020710059	16	0.035
1	1	17	-0.240327682			-0.03444714
1	1	18	-0.518506761			-0.05930167
1	1	19	-0.380628296			-0.047063621
1	1	20	0.216490394			0.007800988
1	1	21	-0.865064972			-0.08937257
1	1	22	0.516558723			0.036553334
1	1	23	0.635065672			0.048133258
1	1	24	0.614818545			0.046145683
1	1	25	1.703085114			0.15850912
1	1	26	0.306297412			0.016321587
1	1	27	1.054664304			0.090184211
1	1	28	-0.927067255			-0.094650243
1	1	29	-0.378329899			-0.046858273
1	1	30	1.37836213			0.123773543
1	1	31	0.698802551			0.054414724
1	1	32	1.877871249			0.177648372
1	1	33	-1.801178849			-0.165880781
1	1	34	-2.301837034			-0.204125642
1	1	35	-0.064140869			-0.01836675
1	1	36	-0.011118909			-0.013475268
1	1	37	0.16684725			0.003121704
1	1	38	-0.907010833			-0.092946383
1	1	39	-0.5522686			-0.062274328
1	1	40	-0.926762702			-0.094624394
1	1	41	-0.49304509			-0.057053593
1	2	0	0			0
1	2	1	0.272052109	-0.192583212	16	-0.181279377
1	2	2	-1.281898069	0.237673136	16	0.25500056
1	2	3	-0.647768883	0.070470576	16	0.085457165
1	2	4	-0.565498028	-0.077089014	16	-0.06416826
1	2	5	0.41197637	0.075668747	16	0.09072811
1	2	6	-0.779413048	0.080999439	16	0.096133431
1	2	7	0.855643268	0.037800642	16	0.052329851
1	2	8	-0.866463325	0.012473564	16	0.026648194
1	2	9	1.07588221	0.022553857	16	0.036869611
1	2	10	0.163612115	0.060861948	16	0.075714016
1	2	11	-0.033478564	-0.03329219	16	-0.019758281
1	2	12	-0.260679455	0.100902401	16	0.116315034
1	2	13	1.02590182	0.044654956	16	0.059280125
1	2	14	-2.021439569	0.100029068	16	0.115429475
1	2	15	-0.266694607	-0.095820118	16	-0.0831616
1	2	16	1.394018033	0.020710059	16	0.035
1	2	17	-0.042766779			-0.016397831
1	2	18	0.224547946			0.008562537
1	2	19	-2.84164275			-0.243398743
1	2	20	-0.529668341			-0.060285465
1	2	21	1.249656009			0.110295831
1	2	22	-0.032835878			-0.015481683
1	2	23	-0.336196189			-0.04308602
1	2	24	0.321693245			0.01778951
1	2	25	0.664098464			0.050989875
1	2	26	-0.969997516			-0.098286561
1	2	27	1.234485887			0.108717949
1	2	28	-0.371113375			-0.046213231
1	2	29	-0.217901648			-0.03241506
1	2	30	-0.487483006			-0.056561787
1	2	31	-0.462498733			-0.054349485
1	2	32	-0.425287629			-0.051044903
1	2	33	1.539378448			0.140865279
1	2	34	-0.458411447			-0.053987072
1	2	35	0.871198126			0.071594097
1	2	36	1.278369561			0.11328855
1	2	37	-0.232312352			-0.033721341
1	2	38	0.435834918			0.028738736
1	2	39	1.057741102			0.090498708
1	2	40	0.987034372			0.083294223
1	2	41	0.249422192			0.010917124
1	3	0	0			0
1	3	1	-0.24711478	-0.192583212	16	-0.181279377
1	3	2	0.193853784	0.237673136	16	0.25500056
1	3	3	0.722182005	0.070470576	16	0.085457165
1	3	4	-0.524994456	-0.077089014	16	-0.06416826
1	3	5	0.597940421	0.075668747	16	0.09072811
1	3	6	1.822602911	0.080999439	16	0.096133431
1	3	7	-0.66550602	0.037800642	16	0.052329851
1	3	8	0.229251942	0.012473564	16	0.026648194
1	3	9	0.515539259	0.022553857	16	0.036869611
1	3	10	-0.464778008	0.060861948	16	0.075714016
1	3	11	-2.071355087	-0.03329219	16	-0.019758281
1	3	12	-0.290791577	0.100902401	16	0.116315034
1	3	13	-0.092066762	0.044654956	16	0.059280125
1	3	14	0.564862272	0.100029068	16	0.115429475
1	3	15	-0.251988426	-0.095820118	16	-0.0831616
;
run;

Tom · Posted 08-17-2023 11:49 AM

I don't understand.

You seem to have all of the observations already in the source data.

What are the variables where you already HAVE the data on the observation.

What are the variables where you need to CREATE a value on the observation.

What is the formula for creating the values?

Do you want to GENERATE 6 'tranches'? Or do you already have them? Could we just pull out the first observations per "tranch" from your giant file and have everything we need to create the other observations?

Tom · Posted 08-17-2023 12:08 PM

So if your input is this data:

data have;
  infile datalines dsd dlm='|' truncover;
  input Tranche Iteration Year NormInv  NetReturn Tresv NGrossReturn;
datalines;
1|1|0|0|||0
1|1|1|-0.401683176|-0.192583212|16|-0.181279377
1|1|2|-2.188102767|0.237673136|16|0.25500056
1|1|3|-1.082351454| 0.070470576|16|0.085457165
1|1|4| 0.502844303|-0.077089014|16|-0.06416826
;

What is your OUTPUT. Please share the desired output for this 6 observation input.

Explain how you got from the INPUT to the OUTPUT.

Do you want to add variables? Which variables? How are they calculated?

Do you want to change the values of some variables? Which variables? On which observations?

Do you want to add observations? How many? How is that determined?

Quentin · Posted 08-16-2023 11:00 AM

Does your log show the message NOTE: Missing values were generated as a result of performing an operation on missing values?

The Boston Area SAS Users Group (BASUG) is hosting our in person SAS Blowout on Oct 18!
This full-day event in Cambridge, Mass features four presenters from SAS, presenting on a range of SAS 9 programming topics. Pre-registration by Oct 15 is required.
Full details and registration info at https://www.basug.org/events.

IgorR · Posted 08-17-2023 01:29 AM

@Quentin wrote:
Does your log show the message NOTE: Missing values were generated as a result of performing an operation on missing values?

Yes. Here is the whole NOTE:

NOTE: Missing values were generated as a result of performing an operation on missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      2445000 at 65:36   2445000 at 65:47   2445000 at 65:62   3645000 at 72:27   105000 at 72:30    
NOTE: There were 3780000 observations read from the data set WORK.SIMULATOR1.
NOTE: The data set WORK.SIMULATOR_FINAL has 3780000 observations and 6 variables.
NOTE: DATA statement used (Total process time):
      real time           1.96 seconds
      cpu time            1.90 seconds

ballardw · Posted 08-16-2023 10:55 AM

Where do you set the value of the macro variables &grossannualpremium and &expenceload?

Set to what?

Is there some grouping of data, such as client, company, fund or whatever?

Is this use of LAG inside anything resembling an If/then/else block?

ENTIRE data step code, not snippets because other statements prior or after your shown statment(s) could quite easily effect the result.

Example data in the form of a data step is a good idea.

Note that anyplace you use + if a term on either side is missing the result is missing

Tom · Posted 08-16-2023 11:05 AM

Looks to me like you are trying to BUILD data. So you start with some initial values and that loop YEAR from 0 to whatever and apply the same formula over and over.

So something like this:

data initial_values;
  GrossFund = 1000;
  NGrossReturn = 0.10 ;
  GrossAnnualPremium = 200 ;
  ExpenceLoad = 0.25 ;
run;

data want;
  set initial_values;
  do year=0 to 10;
    output;
    GrossFund = GrossFund*(1+NGrossReturn)+GrossAnnualPremium*(1-ExpenceLoad)*(1+NGrossReturn);
  end;
run;

Result

                             Gross
        Gross     NGross     Annual    Expence
Obs      Fund     Return    Premium      Load     year

  1    1000.00      0.1       200        0.25       0
  2    1265.00      0.1       200        0.25       1
  3    1556.50      0.1       200        0.25       2
  4    1877.15      0.1       200        0.25       3
  5    2229.87      0.1       200        0.25       4
  6    2617.85      0.1       200        0.25       5
  7    3044.64      0.1       200        0.25       6
  8    3514.10      0.1       200        0.25       7
  9    4030.51      0.1       200        0.25       8
 10    4598.56      0.1       200        0.25       9
 11    5223.42      0.1       200        0.25      10

IgorR · Posted 08-17-2023 03:07 AM

Hi Tom!

Please, see my reply to @PaigeMiller

ballardw · Posted 08-17-2023 11:45 AM

I am starting to think that you have a misunderstanding of the LAG function.

Your original post with

GrossFund = Lag(GrossFund)*(1+NGrossReturn)+&GrossAnnualPremium*(1-&ExpenceLoad)*(1+NGrossReturn);

(besides not showing any value for two macro variables) will not have a lagged value Grossfund if it does not exist in the input data set OR was not successfully assigned a value for Grossfund prior.

Consider:

data have;
   input x ;
datalines;
1 
2 
3 
;

data example;
   set have;
   y= lag(y)+x;
   z= sum( lag(z),x);
   q = lag(q) * x;
run;

on the first iteration of the data step there are no lag values for any of y, z or q.

So on the first iteration y = <missing> + x. Which results in a missing value. That will be the lagged value of y in the next iteration.

the SUM function however allows a result for missing plus actual values. So Z on the first step gets assigned a total of 1.

Q is assigned the value of <missing> * x. Which is missing. That will be lagged value of Q.

On the second iteration the lagged values of y and q are still missing. So result in missing values. And will continue to do so because the calculation used for those two variables in this example will always yield a missing result.

Z however does accumulate.

Lets work through your GrossFund calculation. IF Grossfund does not have a value from another data step or previous assignment the first use is the Lag(grossfund) function. Returning missing. Multiply that by some value: result is missing. Add using + the result of another group of terms. The result is missing. So the Grossfund is assigned missing.

Repeat the next time because no actual value other than missing was ever assigned to Grossfund.

If the expectation for the FIRST execution of Grossfund is the part after the + sign the fix may be easy:

GrossFund =Sum( Lag(GrossFund)*(1+NGrossReturn), &GrossAnnualPremium*(1-&ExpenceLoad)*(1+NGrossReturn)  );

Maybe. Still may have issues depending what is actually in the 2 macro variables .

If you do not want the First assignment to Grossfund to be

&GrossAnnualPremium*(1-&ExpenceLoad)*(1+NGrossReturn)

then more explanation is needed. And actual complete code used to "build" the data set. And the definitions of those macro variables.

I need help to calculate variable's value based on the previous value in the same variable

Re: I need help to calculate variable's value based on the previous value in the same variable

Re: I need help to calculate variable's value based on the previous value in the same variable

Re: I need help to calculate variable's value based on the previous value in the same variable

Re: I need help to calculate variable's value based on the previous value in the same variable

Re: I need help to calculate variable's value based on the previous value in the same variable

Re: I need help to calculate variable's value based on the previous value in the same variable

Re: I need help to calculate variable's value based on the previous value in the same variable

Re: I need help to calculate variable's value based on the previous value in the same variable

Re: I need help to calculate variable's value based on the previous value in the same variable

Re: I need help to calculate variable's value based on the previous value in the same variable

Re: I need help to calculate variable's value based on the previous value in the same variable

Re: I need help to calculate variable's value based on the previous value in the same variable

Re: I need help to calculate variable's value based on the previous value in the same variable

Re: I need help to calculate variable's value based on the previous value in the same variable

Re: I need help to calculate variable's value based on the previous value in the same variable

SAS Innovate 2025: Call for Content

Classroom Training Available!