BookmarkSubscribeRSS Feed
Zeus_Olympous
Obsidian | Level 7

Dear all,

 

I have the following dataset, DSN_1

 

pic1.jpg

 

 

 

and a second DSN_2

pic2.jpg

 

 

I would like to create a new variable Calcl in DSN_1,

 

pic3.jpg

 

that will calculate the following quantities per each obs of DSN_1

 

 

for ID = 18798  Calcl =1,100  +  1,100 * 0.25 *( p15 + p15*p16 + p15*p16*p17 + p15*p16*p17*p18)

 

for ID = 21345  Calcl =2 * 250  +  250 * 0.25 *( g9 + g9*g10 + g9*g10*g11 + g9*g10*g11*g12 + g9*g10*g11*g12*g13 + g9*g10*g11*g12*g13*g14)

 

for ID = 32178  Calcl =1,000  +  1,000 * 0.25 *( k4 + k4*k5 + k4*k5*k6)

 

Each calculation will start at (DIFF_MNTHS_FROM + 1) obs. and will have (Tot_MNTHS - 1) terms

i.e. for the 1st Calcl. will start at the 15th obs. of DSN_2 for cat1 and it will have 4 terms

      for the 2nd Calcl. will start at the 4th obs. of DSN_2 for cat2 and it will have 3 terms

      for the 3rd Calcl. will start at the 9th obs. of DSN_2 for cat3 and it will have 6 terms

 

additionally when variable DBL_SM_FRST_MONT equals "YES" then the very first term will be doubled

i.e. 2 * 250 for the second Calcl.

 

Any hints will be more than welcome.

 

Thanking you in advance.

 

Kind regards

8 REPLIES 8
Zeus_Olympous
Obsidian | Level 7

Forgive me but the wanted calculations should read as follows :

 

=1,100  +  1,100 * ( p15 + (0.25)^1 * p15*p16 + (0.25)^ 2 * p15*p16*p17 + (0.25)^3 * p15*p16*p17*p18)

 

=2 * 250  +  250 *( g9 + (0.25)^1 * g9*g10 + (0.25)^2 * g9*g10*g11 + (0.25)^3 * g9*g10*g11*g12 + (0.25)^4 * g9*g10*g11*g12*g13 + (0.25)^5 * g9*g10*g11*g12*g13*g14)

 

=1,000  +  1,000 * ( k4 + (0.25)^1 * k4*k5 + (0.25)^2 * k4*k5*k6)

 

Thank you for your attention

Zeus_Olympous
Obsidian | Level 7

Dear,

 

Please find below the datasets I have DSN_1 and DSN_2 .

 

Additionally I have added the dataset WANT with the required CALCL variable with values produced by the posted summation+products.

 

In fact what I need is for each observation of DSN_1 to "lookup" table DSN_2  values for the appropriate CAT(I) (as dictated by variable FLG_A and perform the calculations

 

=1,100  +  1,100 * ( p15 + (0.25)^1 * p15*p16 + (0.25)^ 2 * p15*p16*p17 + (0.25)^3 * p15*p16*p17*p18)

 

=2 * 250  +  250 *( g9 + (0.25)^1 * g9*g10 + (0.25)^2 * g9*g10*g11 + (0.25)^3 * g9*g10*g11*g12 + (0.25)^4 * g9*g10*g11*g12*g13 + (0.25)^5 * g9*g10*g11*g12*g13*g14)

 

=1,000  +  1,000 * ( k4 + (0.25)^1 * k4*k5 + (0.25)^2 * k4*k5*k6)

 

in fact there are only two conditions

 

1st cond:

 

Each calculation will start at (DIFF_MNTHS_FROM + 1) obs. and will have (Tot_MNTHS - 1) terms

i.e. for the 1st  Calcl. will start at the 15th obs. of DSN_2 for cat1 and it will have 4 terms

 

  (=1100+1100*(0,9711+(0,25**1)*0,9711*0,9735 +(0,25**2)*0,9711*0,9735 *0,9755 +0,25^3*0,9711*0,9735 *0,9755 *0,9777 )

 

      for the 2nd Calcl. will start at the 9th obs. of  DSN_2 for cat3 and it will have 6 terms

 

(=2*250+250*(0,9711 +(0,25**1)*0,9711 *0,9734 +(0,25**2)*0,9711 *0,9734 *0,9755 +(0,25**3)*0,9711 *0,9734 *0,9755 *0,9775 +(0,25**4)*0,9711 *0,9734 *0,9755 *0,9775 *0,9794 +(0,25**5)*0,9711 *0,9734 *0,9755 *0,9775 *0,9794 *0,9812 )

 

      for the 3rd Calcl. will start at the 4th obs. of   DSN_2 for cat2 and it will have 3 terms

(=1000+1000*(0,9367 + (0,25**1)*0,9367*0,9404 +(0,25**2)*0,9367*0,9404 *0,9438 )

 

 2nd cond.

 

when variable DBL_SM_FRST_MONT equals "YES" then the very first term will be doubled

i.e. 2 * 250 for the second Calcl.

 

I hope this clarifies my question.

Thanking you in advance

 

==================================================

 

data DSN_1;
input ID $ Start_Date End_Date TODAY :YYMMDD10. No C1 $ Year2 $ C3 $ Date :DDMMYY10.
       DIFF_MNTHS_FROM DIFF_MNTHS_TO Tot_MNTHS Amount Mult FLG_A $ DBL_SM_FRST_MONT $;
cards;
18798 2016/09/01 2018/04/07 2017/11/13 14 19 5    500  0,25 cat1 NO
21345 2017/03/11 2018/06/12 2017/11/13  8 15 7    250  0,25 cat3 YES
32178 2017/08/01 2017/12/01 2017/11/13  3  7 4  1.000  0,25 cat2 NO
....
....
....
....
;
run;


data DSN_2;
input nr cat1 cat2 cat3 ;
cards;
0  1,0000   1,0000   1,0000
1  0,9160   0,9755   0,9604
2  0,9334   0,7908   0,9525
3  0,9364   0,9329   0,9554
4  0,9394   0,9367   0,9582
5  0,9424   0,9404   0,9609
6  0,9455   0,9438   0,9636
7  0,9486   0,9469   0,9662
8  0,9516   0,9499   0,9687
9  0,9547   0,9527   0,9711
10  0,9577   0,9556   0,9734
11  0,9605   0,9586   0,9755
12  0,9633   0,9616   0,9775
13  0,9661   0,9649   0,9794
14  0,9687   0,9683   0,9812
15  0,9711   0,9717   0,9829
16  0,9735   0,9749   0,9844
17  0,9755   0,9778   0,9888
18  0,9777   0,9805   0,9899
19  0,9798   0,9827   0,9907
20  0,9818   0,9845   0,9916
....
....
....
....
;
run;


data WANT;
input ID $ Start_Date End_Date TODAY :YYMMDD10. No C1 $ Year2 $ C3 $ Date :DDMMYY10.
       DIFF_MNTHS_FROM DIFF_MNTHS_TO Tot_MNTHS Amount Mult FLG_A $ DBL_SM_FRST_MONT $ CALCL;
cards;
18798 2016/09/01 2018/04/07 2017/11/13 14 19 5    500  0,25 cat1 NO    2.507,12
21345 2017/03/11 2018/06/12 2017/11/13  8 15 7    250  0,25 cat3 YES     820,86
32178 2017/08/01 2017/12/01 2017/11/13  3  7 4  1.000  0,25 cat2 NO    2.208,93
....
....
....
....
;
run;

mkeintz
PROC Star

@Zeus_Olympous

 

Thanks for posting data as SAS data, but I suspect you apparently didn't run your sample DATA steps.  DSN1 has 16 vars defined, but only 11 data values per line.  WANT has 17 vars defined but only 12 data values per line.

 


@Zeus_Olympous wrote:

Dear,

 

Please find below the datasets I have DSN_1 and DSN_2 .

 

Additionally I have added the dataset WANT with the required CALCL variable with values produced by the posted summation+products.

 

In fact what I need is for each observation of DSN_1 to "lookup" table DSN_2  values for the appropriate CAT(I) (as dictated by variable FLG_A and perform the calculations

 

=1,100  +  1,100 * ( p15 + (0.25)^1 * p15*p16 + (0.25)^ 2 * p15*p16*p17 + (0.25)^3 * p15*p16*p17*p18)

 

=2 * 250  +  250 *( g9 + (0.25)^1 * g9*g10 + (0.25)^2 * g9*g10*g11 + (0.25)^3 * g9*g10*g11*g12 + (0.25)^4 * g9*g10*g11*g12*g13 + (0.25)^5 * g9*g10*g11*g12*g13*g14)

 

=1,000  +  1,000 * ( k4 + (0.25)^1 * k4*k5 + (0.25)^2 * k4*k5*k6)

 

in fact there are only two conditions

 

1st cond:

 

Each calculation will start at (DIFF_MNTHS_FROM + 1) obs. and will have (Tot_MNTHS - 1) terms

i.e. for the 1st  Calcl. will start at the 15th obs. of DSN_2 for cat1 and it will have 4 terms

 

  (=1100+1100*(0,9711+(0,25**1)*0,9711*0,9735 +(0,25**2)*0,9711*0,9735 *0,9755 +0,25^3*0,9711*0,9735 *0,9755 *0,9777 )

 

      for the 2nd Calcl. will start at the 9th obs. of  DSN_2 for cat3 and it will have 6 terms

 

(=2*250+250*(0,9711 +(0,25**1)*0,9711 *0,9734 +(0,25**2)*0,9711 *0,9734 *0,9755 +(0,25**3)*0,9711 *0,9734 *0,9755 *0,9775 +(0,25**4)*0,9711 *0,9734 *0,9755 *0,9775 *0,9794 +(0,25**5)*0,9711 *0,9734 *0,9755 *0,9775 *0,9794 *0,9812 )

 

      for the 3rd Calcl. will start at the 4th obs. of   DSN_2 for cat2 and it will have 3 terms

(=1000+1000*(0,9367 + (0,25**1)*0,9367*0,9404 +(0,25**2)*0,9367*0,9404 *0,9438 )

 

 2nd cond.

 

when variable DBL_SM_FRST_MONT equals "YES" then the very first term will be doubled

i.e. 2 * 250 for the second Calcl.

 

I hope this clarifies my question.

Thanking you in advance

 

==================================================

 

data DSN_1;
input ID $ Start_Date End_Date TODAY :YYMMDD10. No C1 $ Year2 $ C3 $ Date :DDMMYY10.
       DIFF_MNTHS_FROM DIFF_MNTHS_TO Tot_MNTHS Amount Mult FLG_A $ DBL_SM_FRST_MONT $;
cards;
18798 2016/09/01 2018/04/07 2017/11/13 14 19 5    500  0,25 cat1 NO
21345 2017/03/11 2018/06/12 2017/11/13  8 15 7    250  0,25 cat3 YES
32178 2017/08/01 2017/12/01 2017/11/13  3  7 4  1.000  0,25 cat2 NO
....
....
....
....
;
run;


data DSN_2;
input nr cat1 cat2 cat3 ;
cards;
0  1,0000   1,0000   1,0000
1  0,9160   0,9755   0,9604
2  0,9334   0,7908   0,9525
3  0,9364   0,9329   0,9554
4  0,9394   0,9367   0,9582
5  0,9424   0,9404   0,9609
6  0,9455   0,9438   0,9636
7  0,9486   0,9469   0,9662
8  0,9516   0,9499   0,9687
9  0,9547   0,9527   0,9711
10  0,9577   0,9556   0,9734
11  0,9605   0,9586   0,9755
12  0,9633   0,9616   0,9775
13  0,9661   0,9649   0,9794
14  0,9687   0,9683   0,9812
15  0,9711   0,9717   0,9829
16  0,9735   0,9749   0,9844
17  0,9755   0,9778   0,9888
18  0,9777   0,9805   0,9899
19  0,9798   0,9827   0,9907
20  0,9818   0,9845   0,9916
....
....
....
....
;
run;


data WANT;
input ID $ Start_Date End_Date TODAY :YYMMDD10. No C1 $ Year2 $ C3 $ Date :DDMMYY10.
       DIFF_MNTHS_FROM DIFF_MNTHS_TO Tot_MNTHS Amount Mult FLG_A $ DBL_SM_FRST_MONT $ CALCL;
cards;
18798 2016/09/01 2018/04/07 2017/11/13 14 19 5    500  0,25 cat1 NO    2.507,12
21345 2017/03/11 2018/06/12 2017/11/13  8 15 7    250  0,25 cat3 YES     820,86
32178 2017/08/01 2017/12/01 2017/11/13  3  7 4  1.000  0,25 cat2 NO    2.208,93
....
....
....
....
;
run;


 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Zeus_Olympous
Obsidian | Level 7

My apologies.

 

=============================================

DSN_1 should read as

 

data DSN_1;
input ID $ Start_Date End_Date TODAY :YYMMDD10.
       DIFF_MNTHS_FROM DIFF_MNTHS_TO Tot_MNTHS Amount Mult FLG_A $ DBL_SM_FRST_MONT $;
cards;
18798 2016/09/01 2018/04/07 2017/11/13 14 19 5    500  0,25 cat1 NO
21345 2017/03/11 2018/06/12 2017/11/13  8 15 7    250  0,25 cat3 YES
32178 2017/08/01 2017/12/01 2017/11/13  3  7 4  1.000  0,25 cat2 NO
....
....
....
....
;
run;

 

and dataset WANT as:

 

data WANT;
input ID $ Start_Date End_Date TODAY :YYMMDD10.
       DIFF_MNTHS_FROM DIFF_MNTHS_TO Tot_MNTHS Amount Mult FLG_A $ DBL_SM_FRST_MONT $ CALCL;
cards;
18798 2016/09/01 2018/04/07 2017/11/13 14 19 5    500  0,25 cat1 NO    2.507,12
21345 2017/03/11 2018/06/12 2017/11/13  8 15 7    250  0,25 cat3 YES     820,86
32178 2017/08/01 2017/12/01 2017/11/13  3  7 4  1.000  0,25 cat2 NO    2.208,93
....
....
....
....
;
run;

=============================================

Thanks

mkeintz
PROC Star
  1. It looks like you still haven't actually run the data steps with your sample data.  I get notes like the following from them (even after removing the "...." lines:
      

    1 data DSN_1;

    2 input ID $ Start_Date End_Date TODAY :YYMMDD10.

    3 DIFF_MNTHS_FROM DIFF_MNTHS_TO Tot_MNTHS Amount Mult FLG_A $ DBL_SM_FRST_MONT $;

    4 cards;

    NOTE: Invalid data for Start_Date in line 5 7-16.

    NOTE: Invalid data for End_Date in line 5 18-27.

    NOTE: Invalid data for Mult in line 5 56-59.

    RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8

    5 18798 2016/09/01 2018/04/07 2017/11/13 14 19 5 500 0,25 cat1 NO

    ID=18798 Start_Date=. End_Date=. TODAY=21136 DIFF_MNTHS_FROM=14 DIFF_MNTHS_TO=19 Tot_MNTHS=5 Amount=500 Mult=. FLG_A=cat1

    DBL_SM_FRST_MONT=NO _ERROR_=1 _N_=1

  2. Where do you get the 1100 in the first observation?
  3. This looks like a homework problem.  What have you tried so far?

 

Here's a hint:  consider using the POINT= options of the SET statement. Put that inside a loop that traverses the needed observations from DSN2  (and note that OBSERVATION 15 is for NR=14):

 

data want;

   set dsn1;

   if something then cacls=2*amount;

   else cacls=amount;

 

 

   do P=   somewhere to somewhere_else;

      set dsn2 point=p;

      if flga='CAT1' then x=cat1;
      else if flga='CAT2' then  x=cat2;

      else if flga='CAT3' then x=cat3;

     cacls=cacls + some-function-of-X and multiplier;

   end;

run;

    

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Reeza
Super User

Are there rules that come with this, or are these values hardcoded, i.e. you'e doing the calculation once without need to expand this beyond this exact example?

 

And you need to post data as text, not images. 


@Zeus_Olympous wrote:

 

 

that will calculate the following quantities per each obs of DSN_1

 

 

for ID = 18798  Calcl =1,100  +  1,100 * 0.25 *( p15 + p15*p16 + p15*p16*p17 + p15*p16*p17*p18)

 

for ID = 21345  Calcl =2 * 250  +  250 * 0.25 *( g9 + g9*g10 + g9*g10*g11 + g9*g10*g11*g12 + g9*g10*g11*g12*g13 + g9*g10*g11*g12*g13*g14)

 

for ID = 32178  Calcl =1,000  +  1,000 * 0.25 *( k4 + k4*k5 + k4*k5*k6)

 

Each calculation will start at (DIFF_MNTHS_FROM + 1) obs. and will have (Tot_MNTHS - 1) terms

i.e. for the 1st Calcl. will start at the 15th obs. of DSN_2 for cat1 and it will have 4 terms

      for the 2nd Calcl. will start at the 4th obs. of DSN_2 for cat2 and it will have 3 terms

      for the 3rd Calcl. will start at the 9th obs. of DSN_2 for cat3 and it will have 6 terms

 

additionally when variable DBL_SM_FRST_MONT equals "YES" then the very first term will be doubled

i.e. 2 * 250 for the second Calcl.

 

Any hints will be more than welcome.

 

Thanking you in advance.

 

Kind regards


 

Zeus_Olympous
Obsidian | Level 7

Hello ,

 

I provided some clarifications above.

 

Thznk you.

mkeintz
PROC Star

Dear Zeus:

 

Please come down from Olympus and show us mortals SAS datasets that describe your problem, and desired solution.

 

Athena is no doubt lurking around with the necessary wisdom, but will likely be more helpful to a well-defined problem placed on the alter of the sas forum temple.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1440 views
  • 2 likes
  • 3 in conversation