I have excel table and my goal is to convert this table to SAS.
The table has almost 100 columns with different statistic formula in each column, when some formulas depend on other columns.
This is the small sample of the table:
A B C D E
5 |Random number| Year | Normal Gross return | Gross Fund | Gross IRR |
Cell(D7)=D6*(1+C7)+Premium*(1-expenseload)*(1+C7)Niotice, that in this column I have to take value from previous row.
GrossFund = Lag(GrossFund)*(1+NGrossReturn)+&GrossAnnualPremium*(1-&ExpenceLoad)*(1+NGrossReturn);
It returns Nulls only.
So your inputs that vary per observation are YEAR and NGROSSFUND .
data have;
input Year NGrossReturn;
datalines;
0 0
1 -0.181279377
2 0.25500056
3 0.085457165
4 -0.06416826
;
And your inputs that do not vary are:
retain GrossAnnualPremium 100 ExpenceLoad 0.06 ;
So your data step to calculate GROSSFUND could be something like this:
data want;
set have;
retain GrossAnnualPremium 100 ExpenceLoad 0.06 GrossFund 0;
if year=0 then grossfund=0;
else GrossFund = GrossFund*(1+NGrossReturn)
+ GrossAnnualPremium*(1-ExpenceLoad)*(1+NGrossReturn)
;
run;
Result:
Gross NGross Annual Expence Gross Obs Year Return Premium Load Fund 1 0 0.00000 100 0.06 0.000 2 1 -0.18128 100 0.06 76.960 3 2 0.25500 100 0.06 214.555 4 3 0.08546 100 0.06 334.923 5 4 -0.06417 100 0.06 401.400
NOTE: For the RETAIN of GROSSFUND to work it cannot be in the input dataset. Because then when the SET statement reads the next observation the value read from the input dataset will overwrite the retained value.
The excel formula to calculate column D (Gross Fund) is:
Cell(D7)=D6*(1+C7)+Premium*(1-expenseload)*(1+C7)
Since D6 is blank, I have no idea how the calculation works to give a value in D7. Please explain this.
Then you want to write code in SAS that says if any column D cell is missing, then you set it to zero.
I tried this as well.
This is the code I tried:
If Year = 0 Then GrossFund = 0;
GrossFund = Lag(GrossFund)*(1+NGrossReturn)+&GrossAnnualPremium*(1-&ExpenceLoad)*(1+NGrossReturn);
And this is result it return:
| 0| 0| 0.00% | . | | 1| 0.5126916173 | ( 18.13%) | 76.95973857| | 2| 0.9550716021 | 25.50% | .| | 3| -0.875237107 | 8.55% | .| | 4| 0.8329493204 | ( 6.42%) | .|
As you can see this code works just once.
Since you obviously have the data as a SAS data set, we need the SAS data set and not typed in Excel-clone data. Please provide the data as a working SAS data step code by following these instructions. We don't need all 100 columns, just the relevant ones.
@PaigeMiller wrote:Since you obviously have the data as a SAS data set, we need the SAS data set and not typed in Excel-clone data. Please provide the data as a working SAS data step code by following these instructions. We don't need all 100 columns, just the relevant ones.
Hi,
The description is not so clear to me. Anyway, I prepared the code that build first 100 rows of the data set I'm working on.
To notice that the full set has:
6 Tranches (From 1 To 6),
Each tranche contains 15000 Iterations (From 1 to 15000),
Each Iteration contains 42 years (From 0 to 41) Where Rows with Year=0 must be blank or zero values.
%let Stdev_base = 0.187493224932249;
%let weight_to_equities = 0.5;
%let Stdev_ToUse = &Stdev_base*&weight_to_equities;
%let Term_Interval = 1;
Data Test;
Infile datalines dlm="09"x dsd;
Input Tranche Iteration Year NormInv NetReturn Tresv NGrossReturn;
datalines;
1 1 0 0 0
1 1 1 -0.401683176 -0.192583212 16 -0.181279377
1 1 2 -2.188102767 0.237673136 16 0.25500056
1 1 3 -1.082351454 0.070470576 16 0.085457165
1 1 4 0.502844303 -0.077089014 16 -0.06416826
1 1 5 -0.764272142 0.075668747 16 0.09072811
1 1 6 0.832236652 0.080999439 16 0.096133431
1 1 7 -0.248428708 0.037800642 16 0.052329851
1 1 8 -0.696929109 0.012473564 16 0.026648194
1 1 9 -0.177708172 0.022553857 16 0.036869611
1 1 10 0.361635108 0.060861948 16 0.075714016
1 1 11 0.968456872 -0.03329219 16 -0.019758281
1 1 12 0.507503347 0.100902401 16 0.116315034
1 1 13 0.094918729 0.044654956 16 0.059280125
1 1 14 -0.268100788 0.100029068 16 0.115429475
1 1 15 1.114988618 -0.095820118 16 -0.0831616
1 1 16 0.621812038 0.020710059 16 0.035
1 1 17 -0.240327682 -0.03444714
1 1 18 -0.518506761 -0.05930167
1 1 19 -0.380628296 -0.047063621
1 1 20 0.216490394 0.007800988
1 1 21 -0.865064972 -0.08937257
1 1 22 0.516558723 0.036553334
1 1 23 0.635065672 0.048133258
1 1 24 0.614818545 0.046145683
1 1 25 1.703085114 0.15850912
1 1 26 0.306297412 0.016321587
1 1 27 1.054664304 0.090184211
1 1 28 -0.927067255 -0.094650243
1 1 29 -0.378329899 -0.046858273
1 1 30 1.37836213 0.123773543
1 1 31 0.698802551 0.054414724
1 1 32 1.877871249 0.177648372
1 1 33 -1.801178849 -0.165880781
1 1 34 -2.301837034 -0.204125642
1 1 35 -0.064140869 -0.01836675
1 1 36 -0.011118909 -0.013475268
1 1 37 0.16684725 0.003121704
1 1 38 -0.907010833 -0.092946383
1 1 39 -0.5522686 -0.062274328
1 1 40 -0.926762702 -0.094624394
1 1 41 -0.49304509 -0.057053593
1 2 0 0 0
1 2 1 0.272052109 -0.192583212 16 -0.181279377
1 2 2 -1.281898069 0.237673136 16 0.25500056
1 2 3 -0.647768883 0.070470576 16 0.085457165
1 2 4 -0.565498028 -0.077089014 16 -0.06416826
1 2 5 0.41197637 0.075668747 16 0.09072811
1 2 6 -0.779413048 0.080999439 16 0.096133431
1 2 7 0.855643268 0.037800642 16 0.052329851
1 2 8 -0.866463325 0.012473564 16 0.026648194
1 2 9 1.07588221 0.022553857 16 0.036869611
1 2 10 0.163612115 0.060861948 16 0.075714016
1 2 11 -0.033478564 -0.03329219 16 -0.019758281
1 2 12 -0.260679455 0.100902401 16 0.116315034
1 2 13 1.02590182 0.044654956 16 0.059280125
1 2 14 -2.021439569 0.100029068 16 0.115429475
1 2 15 -0.266694607 -0.095820118 16 -0.0831616
1 2 16 1.394018033 0.020710059 16 0.035
1 2 17 -0.042766779 -0.016397831
1 2 18 0.224547946 0.008562537
1 2 19 -2.84164275 -0.243398743
1 2 20 -0.529668341 -0.060285465
1 2 21 1.249656009 0.110295831
1 2 22 -0.032835878 -0.015481683
1 2 23 -0.336196189 -0.04308602
1 2 24 0.321693245 0.01778951
1 2 25 0.664098464 0.050989875
1 2 26 -0.969997516 -0.098286561
1 2 27 1.234485887 0.108717949
1 2 28 -0.371113375 -0.046213231
1 2 29 -0.217901648 -0.03241506
1 2 30 -0.487483006 -0.056561787
1 2 31 -0.462498733 -0.054349485
1 2 32 -0.425287629 -0.051044903
1 2 33 1.539378448 0.140865279
1 2 34 -0.458411447 -0.053987072
1 2 35 0.871198126 0.071594097
1 2 36 1.278369561 0.11328855
1 2 37 -0.232312352 -0.033721341
1 2 38 0.435834918 0.028738736
1 2 39 1.057741102 0.090498708
1 2 40 0.987034372 0.083294223
1 2 41 0.249422192 0.010917124
1 3 0 0 0
1 3 1 -0.24711478 -0.192583212 16 -0.181279377
1 3 2 0.193853784 0.237673136 16 0.25500056
1 3 3 0.722182005 0.070470576 16 0.085457165
1 3 4 -0.524994456 -0.077089014 16 -0.06416826
1 3 5 0.597940421 0.075668747 16 0.09072811
1 3 6 1.822602911 0.080999439 16 0.096133431
1 3 7 -0.66550602 0.037800642 16 0.052329851
1 3 8 0.229251942 0.012473564 16 0.026648194
1 3 9 0.515539259 0.022553857 16 0.036869611
1 3 10 -0.464778008 0.060861948 16 0.075714016
1 3 11 -2.071355087 -0.03329219 16 -0.019758281
1 3 12 -0.290791577 0.100902401 16 0.116315034
1 3 13 -0.092066762 0.044654956 16 0.059280125
1 3 14 0.564862272 0.100029068 16 0.115429475
1 3 15 -0.251988426 -0.095820118 16 -0.0831616
;
run;
I don't understand.
You seem to have all of the observations already in the source data.
What are the variables where you already HAVE the data on the observation.
What are the variables where you need to CREATE a value on the observation.
What is the formula for creating the values?
Do you want to GENERATE 6 'tranches'? Or do you already have them? Could we just pull out the first observations per "tranch" from your giant file and have everything we need to create the other observations?
So if your input is this data:
data have;
infile datalines dsd dlm='|' truncover;
input Tranche Iteration Year NormInv NetReturn Tresv NGrossReturn;
datalines;
1|1|0|0|||0
1|1|1|-0.401683176|-0.192583212|16|-0.181279377
1|1|2|-2.188102767|0.237673136|16|0.25500056
1|1|3|-1.082351454| 0.070470576|16|0.085457165
1|1|4| 0.502844303|-0.077089014|16|-0.06416826
;
What is your OUTPUT. Please share the desired output for this 6 observation input.
Explain how you got from the INPUT to the OUTPUT.
Do you want to add variables? Which variables? How are they calculated?
Do you want to change the values of some variables? Which variables? On which observations?
Do you want to add observations? How many? How is that determined?
Does your log show the message NOTE: Missing values were generated as a result of performing an operation on missing values?
@Quentin wrote:Does your log show the message NOTE: Missing values were generated as a result of performing an operation on missing values?
Yes. Here is the whole NOTE:
NOTE: Missing values were generated as a result of performing an operation on missing values. Each place is given by: (Number of times) at (Line):(Column). 2445000 at 65:36 2445000 at 65:47 2445000 at 65:62 3645000 at 72:27 105000 at 72:30 NOTE: There were 3780000 observations read from the data set WORK.SIMULATOR1. NOTE: The data set WORK.SIMULATOR_FINAL has 3780000 observations and 6 variables. NOTE: DATA statement used (Total process time): real time 1.96 seconds cpu time 1.90 seconds
Where do you set the value of the macro variables &grossannualpremium and &expenceload?
Set to what?
Is there some grouping of data, such as client, company, fund or whatever?
Is this use of LAG inside anything resembling an If/then/else block?
ENTIRE data step code, not snippets because other statements prior or after your shown statment(s) could quite easily effect the result.
Example data in the form of a data step is a good idea.
Note that anyplace you use + if a term on either side is missing the result is missing
Looks to me like you are trying to BUILD data. So you start with some initial values and that loop YEAR from 0 to whatever and apply the same formula over and over.
So something like this:
data initial_values;
GrossFund = 1000;
NGrossReturn = 0.10 ;
GrossAnnualPremium = 200 ;
ExpenceLoad = 0.25 ;
run;
data want;
set initial_values;
do year=0 to 10;
output;
GrossFund = GrossFund*(1+NGrossReturn)+GrossAnnualPremium*(1-ExpenceLoad)*(1+NGrossReturn);
end;
run;
Result
Gross Gross NGross Annual Expence Obs Fund Return Premium Load year 1 1000.00 0.1 200 0.25 0 2 1265.00 0.1 200 0.25 1 3 1556.50 0.1 200 0.25 2 4 1877.15 0.1 200 0.25 3 5 2229.87 0.1 200 0.25 4 6 2617.85 0.1 200 0.25 5 7 3044.64 0.1 200 0.25 6 8 3514.10 0.1 200 0.25 7 9 4030.51 0.1 200 0.25 8 10 4598.56 0.1 200 0.25 9 11 5223.42 0.1 200 0.25 10
Hi Tom!
Please, see my reply to @PaigeMiller
I am starting to think that you have a misunderstanding of the LAG function.
Your original post with
GrossFund = Lag(GrossFund)*(1+NGrossReturn)+&GrossAnnualPremium*(1-&ExpenceLoad)*(1+NGrossReturn);
(besides not showing any value for two macro variables) will not have a lagged value Grossfund if it does not exist in the input data set OR was not successfully assigned a value for Grossfund prior.
Consider:
data have; input x ; datalines; 1 2 3 ; data example; set have; y= lag(y)+x; z= sum( lag(z),x); q = lag(q) * x; run;
on the first iteration of the data step there are no lag values for any of y, z or q.
So on the first iteration y = <missing> + x. Which results in a missing value. That will be the lagged value of y in the next iteration.
the SUM function however allows a result for missing plus actual values. So Z on the first step gets assigned a total of 1.
Q is assigned the value of <missing> * x. Which is missing. That will be lagged value of Q.
On the second iteration the lagged values of y and q are still missing. So result in missing values. And will continue to do so because the calculation used for those two variables in this example will always yield a missing result.
Z however does accumulate.
Lets work through your GrossFund calculation. IF Grossfund does not have a value from another data step or previous assignment the first use is the Lag(grossfund) function. Returning missing. Multiply that by some value: result is missing. Add using + the result of another group of terms. The result is missing. So the Grossfund is assigned missing.
Repeat the next time because no actual value other than missing was ever assigned to Grossfund.
If the expectation for the FIRST execution of Grossfund is the part after the + sign the fix may be easy:
GrossFund =Sum( Lag(GrossFund)*(1+NGrossReturn), &GrossAnnualPremium*(1-&ExpenceLoad)*(1+NGrossReturn) );
Maybe. Still may have issues depending what is actually in the 2 macro variables .
If you do not want the First assignment to Grossfund to be
&GrossAnnualPremium*(1-&ExpenceLoad)*(1+NGrossReturn)
then more explanation is needed. And actual complete code used to "build" the data set. And the definitions of those macro variables.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.