BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sasnew_484
Fluorite | Level 6

Hi everyone, SAS learner. I have a dataset from which I need to do some calculations on some of the columns and output in a new columns. The data set temp1 is already been sorted and group by with ID and Date and I generated a counters for different categories. 

 

For each ID on that date, I need to do two calculations for category C (Customer)

Form1C = Princ * Rate2;

Form2C= (Princ - Div ) * Rate1;

Output for C category : 

Diff_C = Form 2C- Form1C

Similary for the category D (Div Customer)

Form1D = Princ * Rate2/100;

Form2D =( Princ + Gains) * Rate1

Output for 😧 

Diff_D = Form2D - Form1D

 

** There are certain conditions to consider when calculating if the first record for any given ID has  a Rate1 =0 and Flag set as 'Y' then Diff_C and Diff_C should be set to Initial Amt. 

** If it is not first record for that ID and if the Rate1=0 and Flag = 'Y' Then the Diff_C and Diff_D should take the previous value calculated for that ID. 

  

/*Sample dataset */ 

data temp1;
 input ID	Date $	Initial Amt	Princ	Div	Gains	Flag $	Rn	RnG	Category $	Rate1	Rate2;

datalines;
934	18-Jul-17	1.67	16694.38	300	0	N	1	0	C	8	7
934	18-Aug-17	1.67	16662.95	298	0	N	2	0	C	8	7
934	17-Sep-17	1.67	16619.13	296	0	N	3	0	C	0	6
934	18-Oct-17	1.67	16591.16	295	0	N	4	0	C	8	7
484	4-Jul-17	0	6982	200	0	N	1	0	C	13	12
484	4-Aug-17	0	6900	198	0	N	2	0	C	13	12
484	4-Sep-17	0	6818	196	0	N	3	0	C	0	12
484	4-Oct-17	0	6736	194	0	N	4	0	C	13	12
404	18-Feb-17	1.56	1200	165	0	Y	1	1	D	6.5	5.5
404	18-Mar-17	1.56	1180	0	9	Y	2	2	D	0	5.5
404	18-Apr-17	1.56	1160	0	8.98	Y	3	3	D	0	5.5
404	18-May-17	1.56	1140	0	8.96	Y	4	4	D	0	5.5
404	18-Jun-17	1.56	1120	0	8.94	Y	5	5	D	6.5	5.5
404	18-Jul-17	1.56	1100	0	8.92	Y	6	6	D	6.5	5.5
; 
run

 

 

Here is the code that I wrote which is working. However I would like to know how it can implemented in the most efficient way since the data set will have records in millions. Appreciate any help. 

 

 

/** My sample code ****/

proc sql;
create table ctemp_1 as
select 
R.*, 
(Form2C-Form1C) as Diff_C,
(Form2D-Form1D) as Diff_D
FROM (
select 
ID,
Date,
Initial Amt,
Princ,
Div,
Gains,
Flag,
Rn,
RnG,
Category,
Rate1,
Rate2,
Princ* Rate2 as Form1C, 
(Princ- Div ) * Rate1 as Form2C,
Princ* (Rate2/100) as Form1D,
( Princ+ Gains) * Rate1 as Form2D
FROM temp1 ) R
order by 1, 2;
QUIT;

/** I am update the records to . if the Rate1=0 and Flag is Y ***/
proc sql;
update ctemp_1
 set Diff_C = .,
Diff_D = .
where Rate1 =0 and Flag = 'Y' ;
Quit;
/*** My final Output here ************/
data want;
set ctemp_1;
by ID Date;
retain _Diff_C _Diff_D;
if first.ID and Rate1=0 and Rn =1 and RnG =0 Then Diff_C = Initial Amt; 
if first.ID and Rate1=0 and Rn =1 and RnG =1 Then Diff_D = Initial Amt;
if not missing(Diff_C) then _Diff_C = Diff_C;
If not missing(Diff_D) then _Diff_D = Diff_D;
if missing (Diff_C) then Diff_C = _Diff_C;
if missing (Diff_D) then Diff_D = _Diff_D;
drop _:;
run;

 

  Sample output

 

IDDateFlagRnRnGCategoryRate1Diff_CDiff_D
93418-Jul-17N10C814294.38132386.4
93418-Aug-17N20C814278.95132137.2
93417-Sep-17N30C014278.95132137.2
93418-Oct-17N40C814231.16131567.9
4844-Jul-17N10C13438289928.16
4844-Aug-17N20C13432688872
4844-Sep-17N30C0432688872
4844-Oct-17N40C13421486759.68
40418-Feb-17Y11D6.5127.57734
40418-Mar-17Y22D0127.57734
40418-Apr-17Y33D0127.57734
40418-May-17Y44D0127.57734
40418-Jun-17Y55D6.511207276.51
40418-Jul-17Y66D6.511007147.48
1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

@sasnew_484 wrote:

Rules? Such as within a an Id (seems likely)? If so, what if the value is missing for the first record of an Id? The previous would be from a different ID.

>> if the value is missing or the rate1 is zero for the first record of an ID, then it would have the initial amount instead. However, if the record is not the first and the rate1 =0, then the computed value in this case Diff_C and Diff_D will have its previous value.

Sorry, I am super new to Sas, I usually work using Tsql and Python. I am not aware of few things. The dates in the actual data are represent as 'ddmmyyyy' format. The sample data when I used on excel it came out as dd-Mon-yy.


When you use Proc Contents or other tools to examine the properties of the SAS variable does it have a FORMAT such as ddmmyy assigned? In SAS FORMAT has a very specific meaning as the way  a value is displayed. The same value of a variable can be displayed using a different SAS format at different times. So "looks like" is not a Format.

You can run this code and look at the log for some example of displaying the same numeric value using different Formats.

data _null_;
   x=1;
   put "X with date9 format " x= date9.;
   put "X with Z8.1 format "  x= z8.1 ;
   put "X with datetime18. format " x= datetime18.;
   put "X with time8. format "   x= time8.;
run;

Dates, time and datetime values are numeric with certain ranges. The Format is so the value makes sense to humans. If your variable is not numeric then it is not a date value.

 

View solution in original post

5 REPLIES 5
ballardw
Super User

@sasnew_484 wrote:

Hi everyone, SAS learner. I have a dataset from which I need to do some calculations on some of the columns and output in a new columns. The data set temp1 is already been sorted and group by with ID and Date and I generated a counters for different categories. 

 

For each ID on that date, I need to do two calculations for category C (Customer)

Form1C = Princ * Rate2;

Form2C= (Princ - Div ) * Rate1;

Output for C category : 

Diff_C = Form 2C- Form1C

Similary for the category D (Div Customer)

Form1D = Princ * Rate2/100;

Form2D =( Princ + Gains) * Rate1

Output for 😧 

Diff_D = Form2D - Form1D

 

** There are certain conditions to consider when calculating if the first record for any given ID has  a Rate1 =0 and Flag set as 'Y' then Diff_C and Diff_C should be set to Initial Amt. 

** If it is not first record for that ID and if the Rate1=0 and Flag = 'Y' Then the Diff_C and Diff_D should take the previous value calculated for that ID. 

  

/*Sample dataset */ 

data temp1;
 input ID	Date $	Initial Amt	Princ	Div	Gains	Flag $	Rn	RnG	Category $	Rate1	Rate2;

datalines;
934	18-Jul-17	1.67	16694.38	300	0	N	1	0	C	8	7
934	18-Aug-17	1.67	16662.95	298	0	N	2	0	C	8	7
934	17-Sep-17	1.67	16619.13	296	0	N	3	0	C	0	6
934	18-Oct-17	1.67	16591.16	295	0	N	4	0	C	8	7
484	4-Jul-17	0	6982	200	0	N	1	0	C	13	12
484	4-Aug-17	0	6900	198	0	N	2	0	C	13	12
484	4-Sep-17	0	6818	196	0	N	3	0	C	0	12
484	4-Oct-17	0	6736	194	0	N	4	0	C	13	12
404	18-Feb-17	1.56	1200	165	0	Y	1	1	D	6.5	5.5
404	18-Mar-17	1.56	1180	0	9	Y	2	2	D	0	5.5
404	18-Apr-17	1.56	1160	0	8.98	Y	3	3	D	0	5.5
404	18-May-17	1.56	1140	0	8.96	Y	4	4	D	0	5.5
404	18-Jun-17	1.56	1120	0	8.94	Y	5	5	D	6.5	5.5
404	18-Jul-17	1.56	1100	0	8.92	Y	6	6	D	6.5	5.5
; 
run

 

 

Here is the code that I wrote which is working. However I would like to know how it can implemented in the most efficient way since the data set will have records in millions. Appreciate any help. 

 

 

/** My sample code ****/

proc sql;
create table ctemp_1 as
select 
R.*, 
(Form2C-Form1C) as Diff_C,
(Form2D-Form1D) as Diff_D
FROM (
select 
ID,
Date,
Initial Amt,
Princ,
Div,
Gains,
Flag,
Rn,
RnG,
Category,
Rate1,
Rate2,
Princ* Rate2 as Form1C, 
(Princ- Div ) * Rate1 as Form2C,
Princ* (Rate2/100) as Form1D,
( Princ+ Gains) * Rate1 as Form2D
FROM temp1 ) R
order by 1, 2;
QUIT;

/** I am update the records to . if the Rate1=0 and Flag is Y ***/
proc sql;
update ctemp_1
 set Diff_C = .,
Diff_D = .
where Rate1 =0 and Flag = 'Y' ;
Quit;
/*** My final Output here ************/
data want;
set ctemp_1;
by ID Date;
retain _Diff_C _Diff_D;
if first.ID and Rate1=0 and Rn =1 and RnG =0 Then Diff_C = Initial Amt; 
if first.ID and Rate1=0 and Rn =1 and RnG =1 Then Diff_D = Initial Amt;
if not missing(Diff_C) then _Diff_C = Diff_C;
If not missing(Diff_D) then _Diff_D = Diff_D;
if missing (Diff_C) then Diff_C = _Diff_C;
if missing (Diff_D) then Diff_D = _Diff_D;
drop _:;
run;

 

  Sample output

 

ID Date Flag Rn RnG Category Rate1 Diff_C Diff_D
934 18-Jul-17 N 1 0 C 8 14294.38 132386.4
934 18-Aug-17 N 2 0 C 8 14278.95 132137.2
934 17-Sep-17 N 3 0 C 0 14278.95 132137.2
934 18-Oct-17 N 4 0 C 8 14231.16 131567.9
484 4-Jul-17 N 1 0 C 13 4382 89928.16
484 4-Aug-17 N 2 0 C 13 4326 88872
484 4-Sep-17 N 3 0 C 0 4326 88872
484 4-Oct-17 N 4 0 C 13 4214 86759.68
404 18-Feb-17 Y 1 1 D 6.5 127.5 7734
404 18-Mar-17 Y 2 2 D 0 127.5 7734
404 18-Apr-17 Y 3 3 D 0 127.5 7734
404 18-May-17 Y 4 4 D 0 127.5 7734
404 18-Jun-17 Y 5 5 D 6.5 1120 7276.51
404 18-Jul-17 Y 6 6 D 6.5 1100 7147.48

The data in your data step code has no "missing" values. So you need to define what you mean by missing and likely "previous value" as well.

sasnew_484
Fluorite | Level 6
Thanks for your response. My bad, I should have edited it properly. I meant for the computed values such as Diff_C and Diff_D, if the Rate1 has zero value then fill them with previous computed value.
ballardw
Super User

@sasnew_484 wrote:
Thanks for your response. My bad, I should have edited it properly. I meant for the computed values such as Diff_C and Diff_D, if the Rate1 has zero value then fill them with previous computed value.

Rules? Such as within a an Id (seems likely)? If so, what if the value is missing for the first record of an Id? The previous would be from a different ID.

 

Are your dates really character values? That makes "previous" moderately obnoxious to determine because a sort order would have '17-Sep-17' before '18-Jul-17'.  BTW you really should use 4 digit years to remove any possible confusion.

 

Proc SQL is generally not to the tool to use when you need to process records in a given order.

sasnew_484
Fluorite | Level 6

Rules? Such as within a an Id (seems likely)? If so, what if the value is missing for the first record of an Id? The previous would be from a different ID.

>> if the value is missing or the rate1 is zero for the first record of an ID, then it would have the initial amount instead. However, if the record is not the first and the rate1 =0, then the computed value in this case Diff_C and Diff_D will have its previous value.

Sorry, I am super new to Sas, I usually work using Tsql and Python. I am not aware of few things. The dates in the actual data are represent as 'ddmmyyyy' format. The sample data when I used on excel it came out as dd-Mon-yy.

ballardw
Super User

@sasnew_484 wrote:

Rules? Such as within a an Id (seems likely)? If so, what if the value is missing for the first record of an Id? The previous would be from a different ID.

>> if the value is missing or the rate1 is zero for the first record of an ID, then it would have the initial amount instead. However, if the record is not the first and the rate1 =0, then the computed value in this case Diff_C and Diff_D will have its previous value.

Sorry, I am super new to Sas, I usually work using Tsql and Python. I am not aware of few things. The dates in the actual data are represent as 'ddmmyyyy' format. The sample data when I used on excel it came out as dd-Mon-yy.


When you use Proc Contents or other tools to examine the properties of the SAS variable does it have a FORMAT such as ddmmyy assigned? In SAS FORMAT has a very specific meaning as the way  a value is displayed. The same value of a variable can be displayed using a different SAS format at different times. So "looks like" is not a Format.

You can run this code and look at the log for some example of displaying the same numeric value using different Formats.

data _null_;
   x=1;
   put "X with date9 format " x= date9.;
   put "X with Z8.1 format "  x= z8.1 ;
   put "X with datetime18. format " x= datetime18.;
   put "X with time8. format "   x= time8.;
run;

Dates, time and datetime values are numeric with certain ranges. The Format is so the value makes sense to humans. If your variable is not numeric then it is not a date value.

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1559 views
  • 0 likes
  • 2 in conversation