Dear all,
I have the following dataset, DSN_1
and a second DSN_2
I would like to create a new variable Calcl in DSN_1,
that will calculate the following quantities per each obs of DSN_1
for ID = 18798 Calcl =1,100 + 1,100 * 0.25 *( p15 + p15*p16 + p15*p16*p17 + p15*p16*p17*p18)
for ID = 21345 Calcl =2 * 250 + 250 * 0.25 *( g9 + g9*g10 + g9*g10*g11 + g9*g10*g11*g12 + g9*g10*g11*g12*g13 + g9*g10*g11*g12*g13*g14)
for ID = 32178 Calcl =1,000 + 1,000 * 0.25 *( k4 + k4*k5 + k4*k5*k6)
Each calculation will start at (DIFF_MNTHS_FROM + 1) obs. and will have (Tot_MNTHS - 1) terms
i.e. for the 1st Calcl. will start at the 15th obs. of DSN_2 for cat1 and it will have 4 terms
for the 2nd Calcl. will start at the 4th obs. of DSN_2 for cat2 and it will have 3 terms
for the 3rd Calcl. will start at the 9th obs. of DSN_2 for cat3 and it will have 6 terms
additionally when variable DBL_SM_FRST_MONT equals "YES" then the very first term will be doubled
i.e. 2 * 250 for the second Calcl.
Any hints will be more than welcome.
Thanking you in advance.
Kind regards
Forgive me but the wanted calculations should read as follows :
=1,100 + 1,100 * ( p15 + (0.25)^1 * p15*p16 + (0.25)^ 2 * p15*p16*p17 + (0.25)^3 * p15*p16*p17*p18)
=2 * 250 + 250 *( g9 + (0.25)^1 * g9*g10 + (0.25)^2 * g9*g10*g11 + (0.25)^3 * g9*g10*g11*g12 + (0.25)^4 * g9*g10*g11*g12*g13 + (0.25)^5 * g9*g10*g11*g12*g13*g14)
=1,000 + 1,000 * ( k4 + (0.25)^1 * k4*k5 + (0.25)^2 * k4*k5*k6)
Thank you for your attention
Dear,
Please find below the datasets I have DSN_1 and DSN_2 .
Additionally I have added the dataset WANT with the required CALCL variable with values produced by the posted summation+products.
In fact what I need is for each observation of DSN_1 to "lookup" table DSN_2 values for the appropriate CAT(I) (as dictated by variable FLG_A and perform the calculations
=1,100 + 1,100 * ( p15 + (0.25)^1 * p15*p16 + (0.25)^ 2 * p15*p16*p17 + (0.25)^3 * p15*p16*p17*p18)
=2 * 250 + 250 *( g9 + (0.25)^1 * g9*g10 + (0.25)^2 * g9*g10*g11 + (0.25)^3 * g9*g10*g11*g12 + (0.25)^4 * g9*g10*g11*g12*g13 + (0.25)^5 * g9*g10*g11*g12*g13*g14)
=1,000 + 1,000 * ( k4 + (0.25)^1 * k4*k5 + (0.25)^2 * k4*k5*k6)
in fact there are only two conditions
1st cond:
Each calculation will start at (DIFF_MNTHS_FROM + 1) obs. and will have (Tot_MNTHS - 1) terms
i.e. for the 1st Calcl. will start at the 15th obs. of DSN_2 for cat1 and it will have 4 terms
(=1100+1100*(0,9711+(0,25**1)*0,9711*0,9735 +(0,25**2)*0,9711*0,9735 *0,9755 +0,25^3*0,9711*0,9735 *0,9755 *0,9777 )
for the 2nd Calcl. will start at the 9th obs. of DSN_2 for cat3 and it will have 6 terms
(=2*250+250*(0,9711 +(0,25**1)*0,9711 *0,9734 +(0,25**2)*0,9711 *0,9734 *0,9755 +(0,25**3)*0,9711 *0,9734 *0,9755 *0,9775 +(0,25**4)*0,9711 *0,9734 *0,9755 *0,9775 *0,9794 +(0,25**5)*0,9711 *0,9734 *0,9755 *0,9775 *0,9794 *0,9812 )
for the 3rd Calcl. will start at the 4th obs. of DSN_2 for cat2 and it will have 3 terms
(=1000+1000*(0,9367 + (0,25**1)*0,9367*0,9404 +(0,25**2)*0,9367*0,9404 *0,9438 )
2nd cond.
when variable DBL_SM_FRST_MONT equals "YES" then the very first term will be doubled
i.e. 2 * 250 for the second Calcl.
I hope this clarifies my question.
Thanking you in advance
==================================================
data DSN_1;
input ID $ Start_Date End_Date TODAY :YYMMDD10. No C1 $ Year2 $ C3 $ Date :DDMMYY10.
DIFF_MNTHS_FROM DIFF_MNTHS_TO Tot_MNTHS Amount Mult FLG_A $ DBL_SM_FRST_MONT $;
cards;
18798 2016/09/01 2018/04/07 2017/11/13 14 19 5 500 0,25 cat1 NO
21345 2017/03/11 2018/06/12 2017/11/13 8 15 7 250 0,25 cat3 YES
32178 2017/08/01 2017/12/01 2017/11/13 3 7 4 1.000 0,25 cat2 NO
....
....
....
....
;
run;
data DSN_2;
input nr cat1 cat2 cat3 ;
cards;
0 1,0000 1,0000 1,0000
1 0,9160 0,9755 0,9604
2 0,9334 0,7908 0,9525
3 0,9364 0,9329 0,9554
4 0,9394 0,9367 0,9582
5 0,9424 0,9404 0,9609
6 0,9455 0,9438 0,9636
7 0,9486 0,9469 0,9662
8 0,9516 0,9499 0,9687
9 0,9547 0,9527 0,9711
10 0,9577 0,9556 0,9734
11 0,9605 0,9586 0,9755
12 0,9633 0,9616 0,9775
13 0,9661 0,9649 0,9794
14 0,9687 0,9683 0,9812
15 0,9711 0,9717 0,9829
16 0,9735 0,9749 0,9844
17 0,9755 0,9778 0,9888
18 0,9777 0,9805 0,9899
19 0,9798 0,9827 0,9907
20 0,9818 0,9845 0,9916
....
....
....
....
;
run;
data WANT;
input ID $ Start_Date End_Date TODAY :YYMMDD10. No C1 $ Year2 $ C3 $ Date :DDMMYY10.
DIFF_MNTHS_FROM DIFF_MNTHS_TO Tot_MNTHS Amount Mult FLG_A $ DBL_SM_FRST_MONT $ CALCL;
cards;
18798 2016/09/01 2018/04/07 2017/11/13 14 19 5 500 0,25 cat1 NO 2.507,12
21345 2017/03/11 2018/06/12 2017/11/13 8 15 7 250 0,25 cat3 YES 820,86
32178 2017/08/01 2017/12/01 2017/11/13 3 7 4 1.000 0,25 cat2 NO 2.208,93
....
....
....
....
;
run;
Thanks for posting data as SAS data, but I suspect you apparently didn't run your sample DATA steps. DSN1 has 16 vars defined, but only 11 data values per line. WANT has 17 vars defined but only 12 data values per line.
@Zeus_Olympous wrote:
Dear,
Please find below the datasets I have DSN_1 and DSN_2 .
Additionally I have added the dataset WANT with the required CALCL variable with values produced by the posted summation+products.
In fact what I need is for each observation of DSN_1 to "lookup" table DSN_2 values for the appropriate CAT(I) (as dictated by variable FLG_A and perform the calculations
=1,100 + 1,100 * ( p15 + (0.25)^1 * p15*p16 + (0.25)^ 2 * p15*p16*p17 + (0.25)^3 * p15*p16*p17*p18)
=2 * 250 + 250 *( g9 + (0.25)^1 * g9*g10 + (0.25)^2 * g9*g10*g11 + (0.25)^3 * g9*g10*g11*g12 + (0.25)^4 * g9*g10*g11*g12*g13 + (0.25)^5 * g9*g10*g11*g12*g13*g14)
=1,000 + 1,000 * ( k4 + (0.25)^1 * k4*k5 + (0.25)^2 * k4*k5*k6)
in fact there are only two conditions
1st cond:
Each calculation will start at (DIFF_MNTHS_FROM + 1) obs. and will have (Tot_MNTHS - 1) terms
i.e. for the 1st Calcl. will start at the 15th obs. of DSN_2 for cat1 and it will have 4 terms
(=1100+1100*(0,9711+(0,25**1)*0,9711*0,9735 +(0,25**2)*0,9711*0,9735 *0,9755 +0,25^3*0,9711*0,9735 *0,9755 *0,9777 )
for the 2nd Calcl. will start at the 9th obs. of DSN_2 for cat3 and it will have 6 terms
(=2*250+250*(0,9711 +(0,25**1)*0,9711 *0,9734 +(0,25**2)*0,9711 *0,9734 *0,9755 +(0,25**3)*0,9711 *0,9734 *0,9755 *0,9775 +(0,25**4)*0,9711 *0,9734 *0,9755 *0,9775 *0,9794 +(0,25**5)*0,9711 *0,9734 *0,9755 *0,9775 *0,9794 *0,9812 )
for the 3rd Calcl. will start at the 4th obs. of DSN_2 for cat2 and it will have 3 terms
(=1000+1000*(0,9367 + (0,25**1)*0,9367*0,9404 +(0,25**2)*0,9367*0,9404 *0,9438 )
2nd cond.
when variable DBL_SM_FRST_MONT equals "YES" then the very first term will be doubled
i.e. 2 * 250 for the second Calcl.
I hope this clarifies my question.
Thanking you in advance
==================================================
data DSN_1;
input ID $ Start_Date End_Date TODAY :YYMMDD10. No C1 $ Year2 $ C3 $ Date :DDMMYY10.
DIFF_MNTHS_FROM DIFF_MNTHS_TO Tot_MNTHS Amount Mult FLG_A $ DBL_SM_FRST_MONT $;
cards;
18798 2016/09/01 2018/04/07 2017/11/13 14 19 5 500 0,25 cat1 NO
21345 2017/03/11 2018/06/12 2017/11/13 8 15 7 250 0,25 cat3 YES
32178 2017/08/01 2017/12/01 2017/11/13 3 7 4 1.000 0,25 cat2 NO
....
....
....
....
;
run;
data DSN_2;
input nr cat1 cat2 cat3 ;
cards;
0 1,0000 1,0000 1,0000
1 0,9160 0,9755 0,9604
2 0,9334 0,7908 0,9525
3 0,9364 0,9329 0,9554
4 0,9394 0,9367 0,9582
5 0,9424 0,9404 0,9609
6 0,9455 0,9438 0,9636
7 0,9486 0,9469 0,9662
8 0,9516 0,9499 0,9687
9 0,9547 0,9527 0,9711
10 0,9577 0,9556 0,9734
11 0,9605 0,9586 0,9755
12 0,9633 0,9616 0,9775
13 0,9661 0,9649 0,9794
14 0,9687 0,9683 0,9812
15 0,9711 0,9717 0,9829
16 0,9735 0,9749 0,9844
17 0,9755 0,9778 0,9888
18 0,9777 0,9805 0,9899
19 0,9798 0,9827 0,9907
20 0,9818 0,9845 0,9916
....
....
....
....
;
run;
data WANT;
input ID $ Start_Date End_Date TODAY :YYMMDD10. No C1 $ Year2 $ C3 $ Date :DDMMYY10.
DIFF_MNTHS_FROM DIFF_MNTHS_TO Tot_MNTHS Amount Mult FLG_A $ DBL_SM_FRST_MONT $ CALCL;
cards;
18798 2016/09/01 2018/04/07 2017/11/13 14 19 5 500 0,25 cat1 NO 2.507,12
21345 2017/03/11 2018/06/12 2017/11/13 8 15 7 250 0,25 cat3 YES 820,86
32178 2017/08/01 2017/12/01 2017/11/13 3 7 4 1.000 0,25 cat2 NO 2.208,93
....
....
....
....
;
run;
My apologies.
=============================================
DSN_1 should read as
data DSN_1;
input ID $ Start_Date End_Date TODAY :YYMMDD10.
DIFF_MNTHS_FROM DIFF_MNTHS_TO Tot_MNTHS Amount Mult FLG_A $ DBL_SM_FRST_MONT $;
cards;
18798 2016/09/01 2018/04/07 2017/11/13 14 19 5 500 0,25 cat1 NO
21345 2017/03/11 2018/06/12 2017/11/13 8 15 7 250 0,25 cat3 YES
32178 2017/08/01 2017/12/01 2017/11/13 3 7 4 1.000 0,25 cat2 NO
....
....
....
....
;
run;
and dataset WANT as:
data WANT;
input ID $ Start_Date End_Date TODAY :YYMMDD10.
DIFF_MNTHS_FROM DIFF_MNTHS_TO Tot_MNTHS Amount Mult FLG_A $ DBL_SM_FRST_MONT $ CALCL;
cards;
18798 2016/09/01 2018/04/07 2017/11/13 14 19 5 500 0,25 cat1 NO 2.507,12
21345 2017/03/11 2018/06/12 2017/11/13 8 15 7 250 0,25 cat3 YES 820,86
32178 2017/08/01 2017/12/01 2017/11/13 3 7 4 1.000 0,25 cat2 NO 2.208,93
....
....
....
....
;
run;
=============================================
Thanks
1 data DSN_1;
2 input ID $ Start_Date End_Date TODAY :YYMMDD10.
3 DIFF_MNTHS_FROM DIFF_MNTHS_TO Tot_MNTHS Amount Mult FLG_A $ DBL_SM_FRST_MONT $;
4 cards;
NOTE: Invalid data for Start_Date in line 5 7-16.
NOTE: Invalid data for End_Date in line 5 18-27.
NOTE: Invalid data for Mult in line 5 56-59.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8
5 18798 2016/09/01 2018/04/07 2017/11/13 14 19 5 500 0,25 cat1 NO
ID=18798 Start_Date=. End_Date=. TODAY=21136 DIFF_MNTHS_FROM=14 DIFF_MNTHS_TO=19 Tot_MNTHS=5 Amount=500 Mult=. FLG_A=cat1
DBL_SM_FRST_MONT=NO _ERROR_=1 _N_=1
Here's a hint: consider using the POINT= options of the SET statement. Put that inside a loop that traverses the needed observations from DSN2 (and note that OBSERVATION 15 is for NR=14):
data want;
set dsn1;
if something then cacls=2*amount;
else cacls=amount;
do P= somewhere to somewhere_else;
set dsn2 point=p;
if flga='CAT1' then x=cat1;
else if flga='CAT2' then x=cat2;
else if flga='CAT3' then x=cat3;
cacls=cacls + some-function-of-X and multiplier;
end;
run;
Are there rules that come with this, or are these values hardcoded, i.e. you'e doing the calculation once without need to expand this beyond this exact example?
And you need to post data as text, not images.
@Zeus_Olympous wrote:
that will calculate the following quantities per each obs of DSN_1
for ID = 18798 Calcl =1,100 + 1,100 * 0.25 *( p15 + p15*p16 + p15*p16*p17 + p15*p16*p17*p18)
for ID = 21345 Calcl =2 * 250 + 250 * 0.25 *( g9 + g9*g10 + g9*g10*g11 + g9*g10*g11*g12 + g9*g10*g11*g12*g13 + g9*g10*g11*g12*g13*g14)
for ID = 32178 Calcl =1,000 + 1,000 * 0.25 *( k4 + k4*k5 + k4*k5*k6)
Each calculation will start at (DIFF_MNTHS_FROM + 1) obs. and will have (Tot_MNTHS - 1) terms
i.e. for the 1st Calcl. will start at the 15th obs. of DSN_2 for cat1 and it will have 4 terms
for the 2nd Calcl. will start at the 4th obs. of DSN_2 for cat2 and it will have 3 terms
for the 3rd Calcl. will start at the 9th obs. of DSN_2 for cat3 and it will have 6 terms
additionally when variable DBL_SM_FRST_MONT equals "YES" then the very first term will be doubled
i.e. 2 * 250 for the second Calcl.
Any hints will be more than welcome.
Thanking you in advance.
Kind regards
Hello ,
I provided some clarifications above.
Thznk you.
Dear Zeus:
Please come down from Olympus and show us mortals SAS datasets that describe your problem, and desired solution.
Athena is no doubt lurking around with the necessary wisdom, but will likely be more helpful to a well-defined problem placed on the alter of the sas forum temple.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.