BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
IgorR
Obsidian | Level 7

I have excel table and my goal is to convert this table to SAS.

The table has almost 100 columns with different statistic formula in each column, when some formulas depend on other columns.

This is the small sample of the table:

              A                        B                       C                                D                    E

5  |Random number| Year | Normal Gross return | Gross Fund | Gross IRR |

    | ------------------|------|-------------------------|--------------|------------|
6  |                               |        0|                                         |                        |                    |
7  |     1.294000103 |        1 |                         -18.13%  |              76.96|     -18.13% |
8  |   -0.330857931 |        2 |                        25.50%  |            214.55|         9.14% |
9  |  -0.003978838 |        3 |                          8.55%  |            334.92|        8.85% |
10|  -0.130874637  |        4 |                        -6.42%  |            401.40|        2.63% |
11 | -0.508842621   |       5  |                          9.07%  |            540.35|       4.69% |
 
The excel formula to calculate column D (Gross Fund) is:
Cell(D7)=D6*(1+C7)+Premium*(1-expenseload)*(1+C7)
Niotice, that in this column I have to take value from previous row.
"Premium" and "ExpenceLoad" are named ranges in excel.
I saw previous topics about Lag function and tried different approaches to solve the problem, but nothing gone well and I don't get the correct result.
The current SAS code in my program (Within Data step) is:
GrossFund = Lag(GrossFund)*(1+NGrossReturn)+&GrossAnnualPremium*(1-&ExpenceLoad)*(1+NGrossReturn);
It returns Nulls only.
 
I need help to find the simplest solution to make the calculation with less steps as possible, because, as I noticed, i need to calculate almost a hundred additional variables. and i don't want the code to be to complicated and unreadable.
 
Thanks in advance!
 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

So your inputs that vary per observation are YEAR and NGROSSFUND .

data have;
  input Year NGrossReturn;
datalines;
0 0
1 -0.181279377
2 0.25500056
3 0.085457165
4 -0.06416826
;

And your inputs that do not vary are:

  retain GrossAnnualPremium 100 ExpenceLoad 0.06  ;

So your data step to calculate GROSSFUND could be something like this:

data want;
  set have;
  retain GrossAnnualPremium 100 ExpenceLoad 0.06 GrossFund 0;
  if year=0 then grossfund=0;
  else GrossFund = GrossFund*(1+NGrossReturn)
            + GrossAnnualPremium*(1-ExpenceLoad)*(1+NGrossReturn)
  ;
run;

Result:

                            Gross
                NGross      Annual    Expence      Gross
Obs    Year     Return     Premium      Load        Fund

 1       0      0.00000      100        0.06       0.000
 2       1     -0.18128      100        0.06      76.960
 3       2      0.25500      100        0.06     214.555
 4       3      0.08546      100        0.06     334.923
 5       4     -0.06417      100        0.06     401.400

NOTE: For the RETAIN of GROSSFUND to work it cannot be in the input dataset.  Because then when the SET statement reads the next observation the value read from the input dataset will overwrite the retained value.

 

View solution in original post

22 REPLIES 22
PaigeMiller
Diamond | Level 26
The excel formula to calculate column D (Gross Fund) is:
Cell(D7)=D6*(1+C7)+Premium*(1-expenseload)*(1+C7)

 

Since D6 is blank, I have no idea how the calculation works to give a value in D7. Please explain this.

--
Paige Miller
IgorR
Obsidian | Level 7
Because of cell D6 in Excel is blank it's value in excel formula turns to be 0(zero).
PaigeMiller
Diamond | Level 26

Then you want to write code in SAS that says if any column D cell is missing, then you set it to zero.

--
Paige Miller
IgorR
Obsidian | Level 7

I tried this as well.

This is the code I tried:

If Year = 0 Then GrossFund = 0;
GrossFund = Lag(GrossFund)*(1+NGrossReturn)+&GrossAnnualPremium*(1-&ExpenceLoad)*(1+NGrossReturn);

And this is result it return:

|    0|             0|    0.00%  |            . |
|    1| 0.5126916173 | ( 18.13%) |   76.95973857|
|    2| 0.9550716021 |   25.50%  |             .|
|    3| -0.875237107 |    8.55%  |             .|
|    4| 0.8329493204 | ( 6.42%)  |             .|

As you can see this code works just once.

PaigeMiller
Diamond | Level 26

Since you obviously have the data as a SAS data set, we need the SAS data set and not typed in Excel-clone data. Please provide the data as a working SAS data step code by following these instructions. We don't need all 100 columns, just the relevant ones.

--
Paige Miller
IgorR
Obsidian | Level 7

@PaigeMiller wrote:

Since you obviously have the data as a SAS data set, we need the SAS data set and not typed in Excel-clone data. Please provide the data as a working SAS data step code by following these instructions. We don't need all 100 columns, just the relevant ones.


Hi,

The description is not so clear to me. Anyway, I prepared the code that build first 100 rows of the data set I'm working on.

To notice that the full set has:

6 Tranches (From 1 To 6), 

Each tranche contains 15000 Iterations (From 1 to 15000),

Each Iteration contains 42 years (From 0 to 41) Where Rows with Year=0 must be blank or zero values.

 

%let Stdev_base = 0.187493224932249;
%let weight_to_equities = 0.5;
%let Stdev_ToUse = &Stdev_base*&weight_to_equities;
%let Term_Interval = 1;

Data Test;
Infile datalines dlm="09"x dsd;
Input Tranche Iteration Year NormInv  NetReturn Tresv NGrossReturn;
datalines;
1	1	0	0			0
1	1	1	-0.401683176	-0.192583212	16	-0.181279377
1	1	2	-2.188102767	0.237673136	16	0.25500056
1	1	3	-1.082351454	0.070470576	16	0.085457165
1	1	4	0.502844303	-0.077089014	16	-0.06416826
1	1	5	-0.764272142	0.075668747	16	0.09072811
1	1	6	0.832236652	0.080999439	16	0.096133431
1	1	7	-0.248428708	0.037800642	16	0.052329851
1	1	8	-0.696929109	0.012473564	16	0.026648194
1	1	9	-0.177708172	0.022553857	16	0.036869611
1	1	10	0.361635108	0.060861948	16	0.075714016
1	1	11	0.968456872	-0.03329219	16	-0.019758281
1	1	12	0.507503347	0.100902401	16	0.116315034
1	1	13	0.094918729	0.044654956	16	0.059280125
1	1	14	-0.268100788	0.100029068	16	0.115429475
1	1	15	1.114988618	-0.095820118	16	-0.0831616
1	1	16	0.621812038	0.020710059	16	0.035
1	1	17	-0.240327682			-0.03444714
1	1	18	-0.518506761			-0.05930167
1	1	19	-0.380628296			-0.047063621
1	1	20	0.216490394			0.007800988
1	1	21	-0.865064972			-0.08937257
1	1	22	0.516558723			0.036553334
1	1	23	0.635065672			0.048133258
1	1	24	0.614818545			0.046145683
1	1	25	1.703085114			0.15850912
1	1	26	0.306297412			0.016321587
1	1	27	1.054664304			0.090184211
1	1	28	-0.927067255			-0.094650243
1	1	29	-0.378329899			-0.046858273
1	1	30	1.37836213			0.123773543
1	1	31	0.698802551			0.054414724
1	1	32	1.877871249			0.177648372
1	1	33	-1.801178849			-0.165880781
1	1	34	-2.301837034			-0.204125642
1	1	35	-0.064140869			-0.01836675
1	1	36	-0.011118909			-0.013475268
1	1	37	0.16684725			0.003121704
1	1	38	-0.907010833			-0.092946383
1	1	39	-0.5522686			-0.062274328
1	1	40	-0.926762702			-0.094624394
1	1	41	-0.49304509			-0.057053593
1	2	0	0			0
1	2	1	0.272052109	-0.192583212	16	-0.181279377
1	2	2	-1.281898069	0.237673136	16	0.25500056
1	2	3	-0.647768883	0.070470576	16	0.085457165
1	2	4	-0.565498028	-0.077089014	16	-0.06416826
1	2	5	0.41197637	0.075668747	16	0.09072811
1	2	6	-0.779413048	0.080999439	16	0.096133431
1	2	7	0.855643268	0.037800642	16	0.052329851
1	2	8	-0.866463325	0.012473564	16	0.026648194
1	2	9	1.07588221	0.022553857	16	0.036869611
1	2	10	0.163612115	0.060861948	16	0.075714016
1	2	11	-0.033478564	-0.03329219	16	-0.019758281
1	2	12	-0.260679455	0.100902401	16	0.116315034
1	2	13	1.02590182	0.044654956	16	0.059280125
1	2	14	-2.021439569	0.100029068	16	0.115429475
1	2	15	-0.266694607	-0.095820118	16	-0.0831616
1	2	16	1.394018033	0.020710059	16	0.035
1	2	17	-0.042766779			-0.016397831
1	2	18	0.224547946			0.008562537
1	2	19	-2.84164275			-0.243398743
1	2	20	-0.529668341			-0.060285465
1	2	21	1.249656009			0.110295831
1	2	22	-0.032835878			-0.015481683
1	2	23	-0.336196189			-0.04308602
1	2	24	0.321693245			0.01778951
1	2	25	0.664098464			0.050989875
1	2	26	-0.969997516			-0.098286561
1	2	27	1.234485887			0.108717949
1	2	28	-0.371113375			-0.046213231
1	2	29	-0.217901648			-0.03241506
1	2	30	-0.487483006			-0.056561787
1	2	31	-0.462498733			-0.054349485
1	2	32	-0.425287629			-0.051044903
1	2	33	1.539378448			0.140865279
1	2	34	-0.458411447			-0.053987072
1	2	35	0.871198126			0.071594097
1	2	36	1.278369561			0.11328855
1	2	37	-0.232312352			-0.033721341
1	2	38	0.435834918			0.028738736
1	2	39	1.057741102			0.090498708
1	2	40	0.987034372			0.083294223
1	2	41	0.249422192			0.010917124
1	3	0	0			0
1	3	1	-0.24711478	-0.192583212	16	-0.181279377
1	3	2	0.193853784	0.237673136	16	0.25500056
1	3	3	0.722182005	0.070470576	16	0.085457165
1	3	4	-0.524994456	-0.077089014	16	-0.06416826
1	3	5	0.597940421	0.075668747	16	0.09072811
1	3	6	1.822602911	0.080999439	16	0.096133431
1	3	7	-0.66550602	0.037800642	16	0.052329851
1	3	8	0.229251942	0.012473564	16	0.026648194
1	3	9	0.515539259	0.022553857	16	0.036869611
1	3	10	-0.464778008	0.060861948	16	0.075714016
1	3	11	-2.071355087	-0.03329219	16	-0.019758281
1	3	12	-0.290791577	0.100902401	16	0.116315034
1	3	13	-0.092066762	0.044654956	16	0.059280125
1	3	14	0.564862272	0.100029068	16	0.115429475
1	3	15	-0.251988426	-0.095820118	16	-0.0831616
;
run;
Tom
Super User Tom
Super User

I don't understand.

You seem to have all of the observations already in the source data.

What are the variables where you already HAVE the data on the observation.

What are the variables where you need to CREATE a value on the observation.

What is the formula for creating the values?

 

Do you want to GENERATE 6 'tranches'?   Or do you already have them?  Could we just pull out the first observations per "tranch" from your giant file and have everything we need to create the other observations?

 

Tom
Super User Tom
Super User

So if your input is this data:

data have;
  infile datalines dsd dlm='|' truncover;
  input Tranche Iteration Year NormInv  NetReturn Tresv NGrossReturn;
datalines;
1|1|0|0|||0
1|1|1|-0.401683176|-0.192583212|16|-0.181279377
1|1|2|-2.188102767|0.237673136|16|0.25500056
1|1|3|-1.082351454| 0.070470576|16|0.085457165
1|1|4| 0.502844303|-0.077089014|16|-0.06416826
;

What is your OUTPUT.  Please share the desired output for this 6 observation input.

 

Explain how you got from the INPUT to the OUTPUT.

Do you want to add variables?  Which variables? How are they calculated? 

Do you want to change the values of some variables? Which variables? On which observations?

Do you want to add observations?  How many?  How is that determined?

Quentin
PROC Star

Does your log show the message NOTE: Missing values were generated as a result of performing an operation on missing values?

Check out the Boston Area SAS Users Group (BASUG) video archives: https://www.basug.org/videos.
IgorR
Obsidian | Level 7

@Quentin wrote:

Does your log show the message NOTE: Missing values were generated as a result of performing an operation on missing values?


Yes. Here is the whole NOTE:

NOTE: Missing values were generated as a result of performing an operation on missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      2445000 at 65:36   2445000 at 65:47   2445000 at 65:62   3645000 at 72:27   105000 at 72:30    
NOTE: There were 3780000 observations read from the data set WORK.SIMULATOR1.
NOTE: The data set WORK.SIMULATOR_FINAL has 3780000 observations and 6 variables.
NOTE: DATA statement used (Total process time):
      real time           1.96 seconds
      cpu time            1.90 seconds



ballardw
Super User

Where do you set the value of the macro variables &grossannualpremium and &expenceload?

Set to what?

 

Is there some grouping of data, such as client, company, fund or whatever?

Is this use of LAG inside anything resembling an If/then/else block?

 

ENTIRE data step code, not snippets because other statements prior or after your shown statment(s) could quite easily effect the result.

Example data in the form of a data step is a good idea.

 

Note that anyplace you use + if a term on either side is missing the result is missing

Tom
Super User Tom
Super User

Looks to me like you are trying to BUILD data.  So you start with some initial values and that loop YEAR from 0 to whatever and apply the same formula over and over.

So something like this:

data initial_values;
  GrossFund = 1000;
  NGrossReturn = 0.10 ;
  GrossAnnualPremium = 200 ;
  ExpenceLoad = 0.25 ;
run;

data want;
  set initial_values;
  do year=0 to 10;
    output;
    GrossFund = GrossFund*(1+NGrossReturn)+GrossAnnualPremium*(1-ExpenceLoad)*(1+NGrossReturn);
  end;
run;

Result

                             Gross
        Gross     NGross     Annual    Expence
Obs      Fund     Return    Premium      Load     year

  1    1000.00      0.1       200        0.25       0
  2    1265.00      0.1       200        0.25       1
  3    1556.50      0.1       200        0.25       2
  4    1877.15      0.1       200        0.25       3
  5    2229.87      0.1       200        0.25       4
  6    2617.85      0.1       200        0.25       5
  7    3044.64      0.1       200        0.25       6
  8    3514.10      0.1       200        0.25       7
  9    4030.51      0.1       200        0.25       8
 10    4598.56      0.1       200        0.25       9
 11    5223.42      0.1       200        0.25      10
ballardw
Super User

I am starting to think that you have a misunderstanding of the LAG function.

Your original post with

GrossFund = Lag(GrossFund)*(1+NGrossReturn)+&GrossAnnualPremium*(1-&ExpenceLoad)*(1+NGrossReturn);

(besides not showing any value for two macro variables) will not have a lagged value Grossfund if it does not exist in the input data set OR was not successfully assigned a value for Grossfund prior.

 

Consider:

data have;
   input x ;
datalines;
1 
2 
3 
;

data example;
   set have;
   y= lag(y)+x;
   z= sum( lag(z),x);
   q = lag(q) * x;
run;

on the first iteration of the data step there are no lag values for any of y, z or q. 

So on the first iteration y = <missing> + x. Which results in a missing value. That will be the lagged value of y in the next iteration.

the SUM function however allows a result for missing plus actual values. So Z on the first step gets assigned a total of 1.

Q is assigned the value of <missing> * x. Which is missing. That will be lagged value of Q.

On the second iteration the lagged values of y and q are still missing. So result in missing values. And will continue to do so because the calculation used for those two variables in this example will always yield a missing result.

Z however does accumulate.

 

Lets work through your GrossFund calculation. IF Grossfund does not have a value from another data step or previous assignment the first use is the Lag(grossfund) function. Returning missing. Multiply that by some value: result is missing. Add using + the result of another group of terms. The result is missing. So the Grossfund is assigned missing.

Repeat the next time because no actual value other than missing was ever assigned to Grossfund.

 

If the expectation for the FIRST execution of Grossfund is the part after the + sign the fix may be easy:

GrossFund =Sum( Lag(GrossFund)*(1+NGrossReturn), &GrossAnnualPremium*(1-&ExpenceLoad)*(1+NGrossReturn)  );

Maybe. Still may have issues depending what is actually in the 2 macro variables .

If you do not want the First assignment to Grossfund to be

&GrossAnnualPremium*(1-&ExpenceLoad)*(1+NGrossReturn)

then more explanation is needed. And actual complete code used to "build" the data set. And the definitions of those macro variables.

SAS INNOVATE 2024

Innovate_SAS_Blue.png

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Get the $99 certification deal.jpg

 

 

Back in the Classroom!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 22 replies
  • 588 views
  • 4 likes
  • 6 in conversation