Solved: Loop through and calculate the fields. If missing, fill with the previ...

sasnew_484 · Posted 08-20-2021 12:02 PM

Hi everyone, SAS learner. I have a dataset from which I need to do some calculations on some of the columns and output in a new columns. The data set temp1 is already been sorted and group by with ID and Date and I generated a counters for different categories.

For each ID on that date, I need to do two calculations for category C (Customer)

Form1C = Princ * Rate2;

Form2C= (Princ - Div ) * Rate1;

Output for C category :

Diff_C = Form 2C- Form1C

Similary for the category D (Div Customer)

Form1D = Princ * Rate2/100;

Form2D =( Princ + Gains) * Rate1

Output for 😧

Diff_D = Form2D - Form1D

** There are certain conditions to consider when calculating if the first record for any given ID has a Rate1 =0 and Flag set as 'Y' then Diff_C and Diff_C should be set to Initial Amt.

** If it is not first record for that ID and if the Rate1=0 and Flag = 'Y' Then the Diff_C and Diff_D should take the previous value calculated for that ID.

/*Sample dataset */ 

data temp1;
 input ID	Date $	Initial Amt	Princ	Div	Gains	Flag $	Rn	RnG	Category $	Rate1	Rate2;

datalines;
934	18-Jul-17	1.67	16694.38	300	0	N	1	0	C	8	7
934	18-Aug-17	1.67	16662.95	298	0	N	2	0	C	8	7
934	17-Sep-17	1.67	16619.13	296	0	N	3	0	C	0	6
934	18-Oct-17	1.67	16591.16	295	0	N	4	0	C	8	7
484	4-Jul-17	0	6982	200	0	N	1	0	C	13	12
484	4-Aug-17	0	6900	198	0	N	2	0	C	13	12
484	4-Sep-17	0	6818	196	0	N	3	0	C	0	12
484	4-Oct-17	0	6736	194	0	N	4	0	C	13	12
404	18-Feb-17	1.56	1200	165	0	Y	1	1	D	6.5	5.5
404	18-Mar-17	1.56	1180	0	9	Y	2	2	D	0	5.5
404	18-Apr-17	1.56	1160	0	8.98	Y	3	3	D	0	5.5
404	18-May-17	1.56	1140	0	8.96	Y	4	4	D	0	5.5
404	18-Jun-17	1.56	1120	0	8.94	Y	5	5	D	6.5	5.5
404	18-Jul-17	1.56	1100	0	8.92	Y	6	6	D	6.5	5.5
; 
run

Here is the code that I wrote which is working. However I would like to know how it can implemented in the most efficient way since the data set will have records in millions. Appreciate any help.

/** My sample code ****/

proc sql;
create table ctemp_1 as
select 
R.*, 
(Form2C-Form1C) as Diff_C,
(Form2D-Form1D) as Diff_D
FROM (
select 
ID,
Date,
Initial Amt,
Princ,
Div,
Gains,
Flag,
Rn,
RnG,
Category,
Rate1,
Rate2,
Princ* Rate2 as Form1C, 
(Princ- Div ) * Rate1 as Form2C,
Princ* (Rate2/100) as Form1D,
( Princ+ Gains) * Rate1 as Form2D
FROM temp1 ) R
order by 1, 2;
QUIT;

/** I am update the records to . if the Rate1=0 and Flag is Y ***/
proc sql;
update ctemp_1
 set Diff_C = .,
Diff_D = .
where Rate1 =0 and Flag = 'Y' ;
Quit;
/*** My final Output here ************/
data want;
set ctemp_1;
by ID Date;
retain _Diff_C _Diff_D;
if first.ID and Rate1=0 and Rn =1 and RnG =0 Then Diff_C = Initial Amt; 
if first.ID and Rate1=0 and Rn =1 and RnG =1 Then Diff_D = Initial Amt;
if not missing(Diff_C) then _Diff_C = Diff_C;
If not missing(Diff_D) then _Diff_D = Diff_D;
if missing (Diff_C) then Diff_C = _Diff_C;
if missing (Diff_D) then Diff_D = _Diff_D;
drop _:;
run;

Sample output

ID	Date	Flag	Rn	RnG	Category	Rate1	Diff_C	Diff_D
934	18-Jul-17	N	1	0	C	8	14294.38	132386.4
934	18-Aug-17	N	2	0	C	8	14278.95	132137.2
934	17-Sep-17	N	3	0	C	0	14278.95	132137.2
934	18-Oct-17	N	4	0	C	8	14231.16	131567.9
484	4-Jul-17	N	1	0	C	13	4382	89928.16
484	4-Aug-17	N	2	0	C	13	4326	88872
484	4-Sep-17	N	3	0	C	0	4326	88872
484	4-Oct-17	N	4	0	C	13	4214	86759.68
404	18-Feb-17	Y	1	1	D	6.5	127.5	7734
404	18-Mar-17	Y	2	2	D	0	127.5	7734
404	18-Apr-17	Y	3	3	D	0	127.5	7734
404	18-May-17	Y	4	4	D	0	127.5	7734
404	18-Jun-17	Y	5	5	D	6.5	1120	7276.51
404	18-Jul-17	Y	6	6	D	6.5	1100	7147.48

ballardw · Posted 08-20-2021 04:34 PM

@sasnew_484 wrote:

Rules? Such as within a an Id (seems likely)? If so, what if the value is missing for the first record of an Id? The previous would be from a different ID.

>> if the value is missing or the rate1 is zero for the first record of an ID, then it would have the initial amount instead. However, if the record is not the first and the rate1 =0, then the computed value in this case Diff_C and Diff_D will have its previous value.

Sorry, I am super new to Sas, I usually work using Tsql and Python. I am not aware of few things. The dates in the actual data are represent as 'ddmmyyyy' format. The sample data when I used on excel it came out as dd-Mon-yy.

When you use Proc Contents or other tools to examine the properties of the SAS variable does it have a FORMAT such as ddmmyy assigned? In SAS FORMAT has a very specific meaning as the way a value is displayed. The same value of a variable can be displayed using a different SAS format at different times. So "looks like" is not a Format.

You can run this code and look at the log for some example of displaying the same numeric value using different Formats.

data _null_;
   x=1;
   put "X with date9 format " x= date9.;
   put "X with Z8.1 format "  x= z8.1 ;
   put "X with datetime18. format " x= datetime18.;
   put "X with time8. format "   x= time8.;
run;

Dates, time and datetime values are numeric with certain ranges. The Format is so the value makes sense to humans. If your variable is not numeric then it is not a date value.

View solution in original post

ballardw · Posted 08-20-2021 01:20 PM

@sasnew_484 wrote:

Hi everyone, SAS learner. I have a dataset from which I need to do some calculations on some of the columns and output in a new columns. The data set temp1 is already been sorted and group by with ID and Date and I generated a counters for different categories.

For each ID on that date, I need to do two calculations for category C (Customer)

Form1C = Princ * Rate2;

Form2C= (Princ - Div ) * Rate1;

Output for C category :

Diff_C = Form 2C- Form1C

Similary for the category D (Div Customer)

Form1D = Princ * Rate2/100;

Form2D =( Princ + Gains) * Rate1

Output for 😧

Diff_D = Form2D - Form1D

** There are certain conditions to consider when calculating if the first record for any given ID has a Rate1 =0 and Flag set as 'Y' then Diff_C and Diff_C should be set to Initial Amt.

** If it is not first record for that ID and if the Rate1=0 and Flag = 'Y' Then the Diff_C and Diff_D should take the previous value calculated for that ID.
/*Sample dataset */ 

data temp1;
 input ID	Date $	Initial Amt	Princ	Div	Gains	Flag $	Rn	RnG	Category $	Rate1	Rate2;

datalines;
934	18-Jul-17	1.67	16694.38	300	0	N	1	0	C	8	7
934	18-Aug-17	1.67	16662.95	298	0	N	2	0	C	8	7
934	17-Sep-17	1.67	16619.13	296	0	N	3	0	C	0	6
934	18-Oct-17	1.67	16591.16	295	0	N	4	0	C	8	7
484	4-Jul-17	0	6982	200	0	N	1	0	C	13	12
484	4-Aug-17	0	6900	198	0	N	2	0	C	13	12
484	4-Sep-17	0	6818	196	0	N	3	0	C	0	12
484	4-Oct-17	0	6736	194	0	N	4	0	C	13	12
404	18-Feb-17	1.56	1200	165	0	Y	1	1	D	6.5	5.5
404	18-Mar-17	1.56	1180	0	9	Y	2	2	D	0	5.5
404	18-Apr-17	1.56	1160	0	8.98	Y	3	3	D	0	5.5
404	18-May-17	1.56	1140	0	8.96	Y	4	4	D	0	5.5
404	18-Jun-17	1.56	1120	0	8.94	Y	5	5	D	6.5	5.5
404	18-Jul-17	1.56	1100	0	8.92	Y	6	6	D	6.5	5.5
; 
run
Here is the code that I wrote which is working. However I would like to know how it can implemented in the most efficient way since the data set will have records in millions. Appreciate any help.
/** My sample code ****/

proc sql;
create table ctemp_1 as
select 
R.*, 
(Form2C-Form1C) as Diff_C,
(Form2D-Form1D) as Diff_D
FROM (
select 
ID,
Date,
Initial Amt,
Princ,
Div,
Gains,
Flag,
Rn,
RnG,
Category,
Rate1,
Rate2,
Princ* Rate2 as Form1C, 
(Princ- Div ) * Rate1 as Form2C,
Princ* (Rate2/100) as Form1D,
( Princ+ Gains) * Rate1 as Form2D
FROM temp1 ) R
order by 1, 2;
QUIT;

/** I am update the records to . if the Rate1=0 and Flag is Y ***/
proc sql;
update ctemp_1
 set Diff_C = .,
Diff_D = .
where Rate1 =0 and Flag = 'Y' ;
Quit;
/*** My final Output here ************/
data want;
set ctemp_1;
by ID Date;
retain _Diff_C _Diff_D;
if first.ID and Rate1=0 and Rn =1 and RnG =0 Then Diff_C = Initial Amt; 
if first.ID and Rate1=0 and Rn =1 and RnG =1 Then Diff_D = Initial Amt;
if not missing(Diff_C) then _Diff_C = Diff_C;
If not missing(Diff_D) then _Diff_D = Diff_D;
if missing (Diff_C) then Diff_C = _Diff_C;
if missing (Diff_D) then Diff_D = _Diff_D;
drop _:;
run;
Sample output

ID Date Flag Rn RnG Category Rate1 Diff_C Diff_D

934 18-Jul-17 N 1 0 C 8 14294.38 132386.4

934 18-Aug-17 N 2 0 C 8 14278.95 132137.2

934 17-Sep-17 N 3 0 C 0 14278.95 132137.2

934 18-Oct-17 N 4 0 C 8 14231.16 131567.9

484 4-Jul-17 N 1 0 C 13 4382 89928.16

484 4-Aug-17 N 2 0 C 13 4326 88872

484 4-Sep-17 N 3 0 C 0 4326 88872

484 4-Oct-17 N 4 0 C 13 4214 86759.68

404 18-Feb-17 Y 1 1 D 6.5 127.5 7734

404 18-Mar-17 Y 2 2 D 0 127.5 7734

404 18-Apr-17 Y 3 3 D 0 127.5 7734

404 18-May-17 Y 4 4 D 0 127.5 7734

404 18-Jun-17 Y 5 5 D 6.5 1120 7276.51

404 18-Jul-17 Y 6 6 D 6.5 1100 7147.48

The data in your data step code has no "missing" values. So you need to define what you mean by missing and likely "previous value" as well.

sasnew_484 · Posted 08-20-2021 01:34 PM

Thanks for your response. My bad, I should have edited it properly. I meant for the computed values such as Diff_C and Diff_D, if the Rate1 has zero value then fill them with previous computed value.

ballardw · Posted 08-20-2021 03:28 PM

@sasnew_484 wrote:
Thanks for your response. My bad, I should have edited it properly. I meant for the computed values such as Diff_C and Diff_D, if the Rate1 has zero value then fill them with previous computed value.

Rules? Such as within a an Id (seems likely)? If so, what if the value is missing for the first record of an Id? The previous would be from a different ID.

Are your dates really character values? That makes "previous" moderately obnoxious to determine because a sort order would have '17-Sep-17' before '18-Jul-17'. BTW you really should use 4 digit years to remove any possible confusion.

Proc SQL is generally not to the tool to use when you need to process records in a given order.

sasnew_484 · Posted 08-20-2021 03:44 PM

Rules? Such as within a an Id (seems likely)? If so, what if the value is missing for the first record of an Id? The previous would be from a different ID.

>> if the value is missing or the rate1 is zero for the first record of an ID, then it would have the initial amount instead. However, if the record is not the first and the rate1 =0, then the computed value in this case Diff_C and Diff_D will have its previous value.

Sorry, I am super new to Sas, I usually work using Tsql and Python. I am not aware of few things. The dates in the actual data are represent as 'ddmmyyyy' format. The sample data when I used on excel it came out as dd-Mon-yy.

ballardw · Posted 08-20-2021 04:34 PM

@sasnew_484 wrote:

Rules? Such as within a an Id (seems likely)? If so, what if the value is missing for the first record of an Id? The previous would be from a different ID.

>> if the value is missing or the rate1 is zero for the first record of an ID, then it would have the initial amount instead. However, if the record is not the first and the rate1 =0, then the computed value in this case Diff_C and Diff_D will have its previous value.

Sorry, I am super new to Sas, I usually work using Tsql and Python. I am not aware of few things. The dates in the actual data are represent as 'ddmmyyyy' format. The sample data when I used on excel it came out as dd-Mon-yy.

When you use Proc Contents or other tools to examine the properties of the SAS variable does it have a FORMAT such as ddmmyy assigned? In SAS FORMAT has a very specific meaning as the way a value is displayed. The same value of a variable can be displayed using a different SAS format at different times. So "looks like" is not a Format.

You can run this code and look at the log for some example of displaying the same numeric value using different Formats.

data _null_;
   x=1;
   put "X with date9 format " x= date9.;
   put "X with Z8.1 format "  x= z8.1 ;
   put "X with datetime18. format " x= datetime18.;
   put "X with time8. format "   x= time8.;
run;

Dates, time and datetime values are numeric with certain ranges. The Format is so the value makes sense to humans. If your variable is not numeric then it is not a date value.

Loop through and calculate the fields. If missing, fill with the previous value.

Re: Loop through and calculate the fields. If missing, fill with the previous value.

Re: Loop through and calculate the fields. If missing, fill with the previous value.

Re: Loop through and calculate the fields. If missing, fill with the previous value.

Re: Loop through and calculate the fields. If missing, fill with the previous value.

Re: Loop through and calculate the fields. If missing, fill with the previous value.

Re: Loop through and calculate the fields. If missing, fill with the previous value.

SAS Innovate 2025: Call for Content

Classroom Training Available!