Fluorite | Level 6

## Loop through and calculate the fields. If missing, fill with the previous value.

Hi everyone, SAS learner. I have a dataset from which I need to do some calculations on some of the columns and output in a new columns. The data set temp1 is already been sorted and group by with ID and Date and I generated a counters for different categories.

For each ID on that date, I need to do two calculations for category C (Customer)

Form1C = Princ * Rate2;

Form2C= (Princ - Div ) * Rate1;

Output for C category :

Diff_C = Form 2C- Form1C

Similary for the category D (Div Customer)

Form1D = Princ * Rate2/100;

Form2D =( Princ + Gains) * Rate1

Output for 😧

Diff_D = Form2D - Form1D

** There are certain conditions to consider when calculating if the first record for any given ID has  a Rate1 =0 and Flag set as 'Y' then Diff_C and Diff_C should be set to Initial Amt.

** If it is not first record for that ID and if the Rate1=0 and Flag = 'Y' Then the Diff_C and Diff_D should take the previous value calculated for that ID.

``````/*Sample dataset */

data temp1;
input ID	Date \$	Initial Amt	Princ	Div	Gains	Flag \$	Rn	RnG	Category \$	Rate1	Rate2;

datalines;
934	18-Jul-17	1.67	16694.38	300	0	N	1	0	C	8	7
934	18-Aug-17	1.67	16662.95	298	0	N	2	0	C	8	7
934	17-Sep-17	1.67	16619.13	296	0	N	3	0	C	0	6
934	18-Oct-17	1.67	16591.16	295	0	N	4	0	C	8	7
484	4-Jul-17	0	6982	200	0	N	1	0	C	13	12
484	4-Aug-17	0	6900	198	0	N	2	0	C	13	12
484	4-Sep-17	0	6818	196	0	N	3	0	C	0	12
484	4-Oct-17	0	6736	194	0	N	4	0	C	13	12
404	18-Feb-17	1.56	1200	165	0	Y	1	1	D	6.5	5.5
404	18-Mar-17	1.56	1180	0	9	Y	2	2	D	0	5.5
404	18-Apr-17	1.56	1160	0	8.98	Y	3	3	D	0	5.5
404	18-May-17	1.56	1140	0	8.96	Y	4	4	D	0	5.5
404	18-Jun-17	1.56	1120	0	8.94	Y	5	5	D	6.5	5.5
404	18-Jul-17	1.56	1100	0	8.92	Y	6	6	D	6.5	5.5
;
run

``````

Here is the code that I wrote which is working. However I would like to know how it can implemented in the most efficient way since the data set will have records in millions. Appreciate any help.

``````/** My sample code ****/

proc sql;
create table ctemp_1 as
select
R.*,
(Form2C-Form1C) as Diff_C,
(Form2D-Form1D) as Diff_D
FROM (
select
ID,
Date,
Initial Amt,
Princ,
Div,
Gains,
Flag,
Rn,
RnG,
Category,
Rate1,
Rate2,
Princ* Rate2 as Form1C,
(Princ- Div ) * Rate1 as Form2C,
Princ* (Rate2/100) as Form1D,
( Princ+ Gains) * Rate1 as Form2D
FROM temp1 ) R
order by 1, 2;
QUIT;

/** I am update the records to . if the Rate1=0 and Flag is Y ***/
proc sql;
update ctemp_1
set Diff_C = .,
Diff_D = .
where Rate1 =0 and Flag = 'Y' ;
Quit;
/*** My final Output here ************/
data want;
set ctemp_1;
by ID Date;
retain _Diff_C _Diff_D;
if first.ID and Rate1=0 and Rn =1 and RnG =0 Then Diff_C = Initial Amt;
if first.ID and Rate1=0 and Rn =1 and RnG =1 Then Diff_D = Initial Amt;
if not missing(Diff_C) then _Diff_C = Diff_C;
If not missing(Diff_D) then _Diff_D = Diff_D;
if missing (Diff_C) then Diff_C = _Diff_C;
if missing (Diff_D) then Diff_D = _Diff_D;
drop _:;
run;``````

Sample output

 ID Date Flag Rn RnG Category Rate1 Diff_C Diff_D 934 18-Jul-17 N 1 0 C 8 14294.38 132386.4 934 18-Aug-17 N 2 0 C 8 14278.95 132137.2 934 17-Sep-17 N 3 0 C 0 14278.95 132137.2 934 18-Oct-17 N 4 0 C 8 14231.16 131567.9 484 4-Jul-17 N 1 0 C 13 4382 89928.16 484 4-Aug-17 N 2 0 C 13 4326 88872 484 4-Sep-17 N 3 0 C 0 4326 88872 484 4-Oct-17 N 4 0 C 13 4214 86759.68 404 18-Feb-17 Y 1 1 D 6.5 127.5 7734 404 18-Mar-17 Y 2 2 D 0 127.5 7734 404 18-Apr-17 Y 3 3 D 0 127.5 7734 404 18-May-17 Y 4 4 D 0 127.5 7734 404 18-Jun-17 Y 5 5 D 6.5 1120 7276.51 404 18-Jul-17 Y 6 6 D 6.5 1100 7147.48
1 ACCEPTED SOLUTION

Accepted Solutions
Super User

## Re: Loop through and calculate the fields. If missing, fill with the previous value.

@sasnew_484 wrote:

Rules? Such as within a an Id (seems likely)? If so, what if the value is missing for the first record of an Id? The previous would be from a different ID.

>> if the value is missing or the rate1 is zero for the first record of an ID, then it would have the initial amount instead. However, if the record is not the first and the rate1 =0, then the computed value in this case Diff_C and Diff_D will have its previous value.

Sorry, I am super new to Sas, I usually work using Tsql and Python. I am not aware of few things. The dates in the actual data are represent as 'ddmmyyyy' format. The sample data when I used on excel it came out as dd-Mon-yy.

When you use Proc Contents or other tools to examine the properties of the SAS variable does it have a FORMAT such as ddmmyy assigned? In SAS FORMAT has a very specific meaning as the way  a value is displayed. The same value of a variable can be displayed using a different SAS format at different times. So "looks like" is not a Format.

You can run this code and look at the log for some example of displaying the same numeric value using different Formats.

```data _null_;
x=1;
put "X with date9 format " x= date9.;
put "X with Z8.1 format "  x= z8.1 ;
put "X with datetime18. format " x= datetime18.;
put "X with time8. format "   x= time8.;
run;```

Dates, time and datetime values are numeric with certain ranges. The Format is so the value makes sense to humans. If your variable is not numeric then it is not a date value.

5 REPLIES 5
Super User

## Re: Loop through and calculate the fields. If missing, fill with the previous value.

@sasnew_484 wrote:

Hi everyone, SAS learner. I have a dataset from which I need to do some calculations on some of the columns and output in a new columns. The data set temp1 is already been sorted and group by with ID and Date and I generated a counters for different categories.

For each ID on that date, I need to do two calculations for category C (Customer)

Form1C = Princ * Rate2;

Form2C= (Princ - Div ) * Rate1;

Output for C category :

Diff_C = Form 2C- Form1C

Similary for the category D (Div Customer)

Form1D = Princ * Rate2/100;

Form2D =( Princ + Gains) * Rate1

Output for 😧

Diff_D = Form2D - Form1D

** There are certain conditions to consider when calculating if the first record for any given ID has  a Rate1 =0 and Flag set as 'Y' then Diff_C and Diff_C should be set to Initial Amt.

** If it is not first record for that ID and if the Rate1=0 and Flag = 'Y' Then the Diff_C and Diff_D should take the previous value calculated for that ID.

``````/*Sample dataset */

data temp1;
input ID	Date \$	Initial Amt	Princ	Div	Gains	Flag \$	Rn	RnG	Category \$	Rate1	Rate2;

datalines;
934	18-Jul-17	1.67	16694.38	300	0	N	1	0	C	8	7
934	18-Aug-17	1.67	16662.95	298	0	N	2	0	C	8	7
934	17-Sep-17	1.67	16619.13	296	0	N	3	0	C	0	6
934	18-Oct-17	1.67	16591.16	295	0	N	4	0	C	8	7
484	4-Jul-17	0	6982	200	0	N	1	0	C	13	12
484	4-Aug-17	0	6900	198	0	N	2	0	C	13	12
484	4-Sep-17	0	6818	196	0	N	3	0	C	0	12
484	4-Oct-17	0	6736	194	0	N	4	0	C	13	12
404	18-Feb-17	1.56	1200	165	0	Y	1	1	D	6.5	5.5
404	18-Mar-17	1.56	1180	0	9	Y	2	2	D	0	5.5
404	18-Apr-17	1.56	1160	0	8.98	Y	3	3	D	0	5.5
404	18-May-17	1.56	1140	0	8.96	Y	4	4	D	0	5.5
404	18-Jun-17	1.56	1120	0	8.94	Y	5	5	D	6.5	5.5
404	18-Jul-17	1.56	1100	0	8.92	Y	6	6	D	6.5	5.5
;
run

``````

Here is the code that I wrote which is working. However I would like to know how it can implemented in the most efficient way since the data set will have records in millions. Appreciate any help.

``````/** My sample code ****/

proc sql;
create table ctemp_1 as
select
R.*,
(Form2C-Form1C) as Diff_C,
(Form2D-Form1D) as Diff_D
FROM (
select
ID,
Date,
Initial Amt,
Princ,
Div,
Gains,
Flag,
Rn,
RnG,
Category,
Rate1,
Rate2,
Princ* Rate2 as Form1C,
(Princ- Div ) * Rate1 as Form2C,
Princ* (Rate2/100) as Form1D,
( Princ+ Gains) * Rate1 as Form2D
FROM temp1 ) R
order by 1, 2;
QUIT;

/** I am update the records to . if the Rate1=0 and Flag is Y ***/
proc sql;
update ctemp_1
set Diff_C = .,
Diff_D = .
where Rate1 =0 and Flag = 'Y' ;
Quit;
/*** My final Output here ************/
data want;
set ctemp_1;
by ID Date;
retain _Diff_C _Diff_D;
if first.ID and Rate1=0 and Rn =1 and RnG =0 Then Diff_C = Initial Amt;
if first.ID and Rate1=0 and Rn =1 and RnG =1 Then Diff_D = Initial Amt;
if not missing(Diff_C) then _Diff_C = Diff_C;
If not missing(Diff_D) then _Diff_D = Diff_D;
if missing (Diff_C) then Diff_C = _Diff_C;
if missing (Diff_D) then Diff_D = _Diff_D;
drop _:;
run;``````

Sample output

 ID Date Flag Rn RnG Category Rate1 Diff_C Diff_D 934 18-Jul-17 N 1 0 C 8 14294.38 132386.4 934 18-Aug-17 N 2 0 C 8 14278.95 132137.2 934 17-Sep-17 N 3 0 C 0 14278.95 132137.2 934 18-Oct-17 N 4 0 C 8 14231.16 131567.9 484 4-Jul-17 N 1 0 C 13 4382 89928.16 484 4-Aug-17 N 2 0 C 13 4326 88872 484 4-Sep-17 N 3 0 C 0 4326 88872 484 4-Oct-17 N 4 0 C 13 4214 86759.68 404 18-Feb-17 Y 1 1 D 6.5 127.5 7734 404 18-Mar-17 Y 2 2 D 0 127.5 7734 404 18-Apr-17 Y 3 3 D 0 127.5 7734 404 18-May-17 Y 4 4 D 0 127.5 7734 404 18-Jun-17 Y 5 5 D 6.5 1120 7276.51 404 18-Jul-17 Y 6 6 D 6.5 1100 7147.48

The data in your data step code has no "missing" values. So you need to define what you mean by missing and likely "previous value" as well.

Fluorite | Level 6

## Re: Loop through and calculate the fields. If missing, fill with the previous value.

Thanks for your response. My bad, I should have edited it properly. I meant for the computed values such as Diff_C and Diff_D, if the Rate1 has zero value then fill them with previous computed value.
Super User

## Re: Loop through and calculate the fields. If missing, fill with the previous value.

@sasnew_484 wrote:
Thanks for your response. My bad, I should have edited it properly. I meant for the computed values such as Diff_C and Diff_D, if the Rate1 has zero value then fill them with previous computed value.

Rules? Such as within a an Id (seems likely)? If so, what if the value is missing for the first record of an Id? The previous would be from a different ID.

Are your dates really character values? That makes "previous" moderately obnoxious to determine because a sort order would have '17-Sep-17' before '18-Jul-17'.  BTW you really should use 4 digit years to remove any possible confusion.

Proc SQL is generally not to the tool to use when you need to process records in a given order.

Fluorite | Level 6

## Re: Loop through and calculate the fields. If missing, fill with the previous value.

Rules? Such as within a an Id (seems likely)? If so, what if the value is missing for the first record of an Id? The previous would be from a different ID.

>> if the value is missing or the rate1 is zero for the first record of an ID, then it would have the initial amount instead. However, if the record is not the first and the rate1 =0, then the computed value in this case Diff_C and Diff_D will have its previous value.

Sorry, I am super new to Sas, I usually work using Tsql and Python. I am not aware of few things. The dates in the actual data are represent as 'ddmmyyyy' format. The sample data when I used on excel it came out as dd-Mon-yy.

Super User

## Re: Loop through and calculate the fields. If missing, fill with the previous value.

@sasnew_484 wrote:

Rules? Such as within a an Id (seems likely)? If so, what if the value is missing for the first record of an Id? The previous would be from a different ID.

>> if the value is missing or the rate1 is zero for the first record of an ID, then it would have the initial amount instead. However, if the record is not the first and the rate1 =0, then the computed value in this case Diff_C and Diff_D will have its previous value.

Sorry, I am super new to Sas, I usually work using Tsql and Python. I am not aware of few things. The dates in the actual data are represent as 'ddmmyyyy' format. The sample data when I used on excel it came out as dd-Mon-yy.

When you use Proc Contents or other tools to examine the properties of the SAS variable does it have a FORMAT such as ddmmyy assigned? In SAS FORMAT has a very specific meaning as the way  a value is displayed. The same value of a variable can be displayed using a different SAS format at different times. So "looks like" is not a Format.

You can run this code and look at the log for some example of displaying the same numeric value using different Formats.

```data _null_;
x=1;
put "X with date9 format " x= date9.;
put "X with Z8.1 format "  x= z8.1 ;
put "X with datetime18. format " x= datetime18.;
put "X with time8. format "   x= time8.;
run;```

Dates, time and datetime values are numeric with certain ranges. The Format is so the value makes sense to humans. If your variable is not numeric then it is not a date value.

Discussion stats
• 5 replies
• 982 views
• 0 likes
• 2 in conversation