Hi everyone, SAS learner. I have a dataset from which I need to do some calculations on some of the columns and output in a new columns. The data set temp1 is already been sorted and group by with ID and Date and I generated a counters for different categories. For each ID on that date, I need to do two calculations for category C (Customer) Form1C = Princ * Rate2; Form2C= (Princ - Div ) * Rate1; Output for C category : Diff_C = Form 2C- Form1C Similary for the category D (Div Customer) Form1D = Princ * Rate2/100; Form2D =( Princ + Gains) * Rate1 Output for 😧 Diff_D = Form2D - Form1D ** There are certain conditions to consider when calculating if the first record for any given ID has a Rate1 =0 and Flag set as 'Y' then Diff_C and Diff_C should be set to Initial Amt. ** If it is not first record for that ID and if the Rate1=0 and Flag = 'Y' Then the Diff_C and Diff_D should take the previous value calculated for that ID. /*Sample dataset */
data temp1;
input ID Date $ Initial Amt Princ Div Gains Flag $ Rn RnG Category $ Rate1 Rate2;
datalines;
934 18-Jul-17 1.67 16694.38 300 0 N 1 0 C 8 7
934 18-Aug-17 1.67 16662.95 298 0 N 2 0 C 8 7
934 17-Sep-17 1.67 16619.13 296 0 N 3 0 C 0 6
934 18-Oct-17 1.67 16591.16 295 0 N 4 0 C 8 7
484 4-Jul-17 0 6982 200 0 N 1 0 C 13 12
484 4-Aug-17 0 6900 198 0 N 2 0 C 13 12
484 4-Sep-17 0 6818 196 0 N 3 0 C 0 12
484 4-Oct-17 0 6736 194 0 N 4 0 C 13 12
404 18-Feb-17 1.56 1200 165 0 Y 1 1 D 6.5 5.5
404 18-Mar-17 1.56 1180 0 9 Y 2 2 D 0 5.5
404 18-Apr-17 1.56 1160 0 8.98 Y 3 3 D 0 5.5
404 18-May-17 1.56 1140 0 8.96 Y 4 4 D 0 5.5
404 18-Jun-17 1.56 1120 0 8.94 Y 5 5 D 6.5 5.5
404 18-Jul-17 1.56 1100 0 8.92 Y 6 6 D 6.5 5.5
;
run
Here is the code that I wrote which is working. However I would like to know how it can implemented in the most efficient way since the data set will have records in millions. Appreciate any help. /** My sample code ****/
proc sql;
create table ctemp_1 as
select
R.*,
(Form2C-Form1C) as Diff_C,
(Form2D-Form1D) as Diff_D
FROM (
select
ID,
Date,
Initial Amt,
Princ,
Div,
Gains,
Flag,
Rn,
RnG,
Category,
Rate1,
Rate2,
Princ* Rate2 as Form1C,
(Princ- Div ) * Rate1 as Form2C,
Princ* (Rate2/100) as Form1D,
( Princ+ Gains) * Rate1 as Form2D
FROM temp1 ) R
order by 1, 2;
QUIT;
/** I am update the records to . if the Rate1=0 and Flag is Y ***/
proc sql;
update ctemp_1
set Diff_C = .,
Diff_D = .
where Rate1 =0 and Flag = 'Y' ;
Quit;
/*** My final Output here ************/
data want;
set ctemp_1;
by ID Date;
retain _Diff_C _Diff_D;
if first.ID and Rate1=0 and Rn =1 and RnG =0 Then Diff_C = Initial Amt;
if first.ID and Rate1=0 and Rn =1 and RnG =1 Then Diff_D = Initial Amt;
if not missing(Diff_C) then _Diff_C = Diff_C;
If not missing(Diff_D) then _Diff_D = Diff_D;
if missing (Diff_C) then Diff_C = _Diff_C;
if missing (Diff_D) then Diff_D = _Diff_D;
drop _:;
run; Sample output ID Date Flag Rn RnG Category Rate1 Diff_C Diff_D 934 18-Jul-17 N 1 0 C 8 14294.38 132386.4 934 18-Aug-17 N 2 0 C 8 14278.95 132137.2 934 17-Sep-17 N 3 0 C 0 14278.95 132137.2 934 18-Oct-17 N 4 0 C 8 14231.16 131567.9 484 4-Jul-17 N 1 0 C 13 4382 89928.16 484 4-Aug-17 N 2 0 C 13 4326 88872 484 4-Sep-17 N 3 0 C 0 4326 88872 484 4-Oct-17 N 4 0 C 13 4214 86759.68 404 18-Feb-17 Y 1 1 D 6.5 127.5 7734 404 18-Mar-17 Y 2 2 D 0 127.5 7734 404 18-Apr-17 Y 3 3 D 0 127.5 7734 404 18-May-17 Y 4 4 D 0 127.5 7734 404 18-Jun-17 Y 5 5 D 6.5 1120 7276.51 404 18-Jul-17 Y 6 6 D 6.5 1100 7147.48
... View more