Hi,
I have a dataset (called "ac.test") that has three million records and there are three variables: DATE, VALUE_1 and VALUE_2, all of which are numeric variables. Is there any code that could help to efficiently create a new variable "Value_1_times_2" by calculating VALUE_1 multiplied by VALUE_2 for all records for each month (the screenshot below only part of the dataset, with dates from June 2017-January 2019 in the complete dataset), then summing up all of the individual accounts for that month using proc summary?
Steps:
1. Create new variable, "Value_1_times_2". Value_1_times_2 = Value_1*Value_2.
2. Sum up all of "Value_1_times_2" for that month e.g. in the screenshot below, proc summary would show that the sum of all "Value_1_times_2" for August 2018 (note that all records for each month only have the end date of the month) i.e. 46.
3. Repeat the process/loop for all months in the dataset in order to show that the sum of "Value_1_times_2" in August 2018 is 46, 94 in September 2018, 53 in October 2018, 79 in November 2018 and 77 in December 2018 etc.?
Example:
data have;
set ac.test;
value_1_times_2 = value1*value2;
run;
proc summary data=have nway;
class month;
var value_1_times_2;
output out=_sum_ sum=;
run;
No looping needed. This assumes that, as shown in your sample data set, the month values are always the last day of the month. Otherwise, if there are other days in the data set that are not the last day of the month, a simple modification of the code will fix that to get monthly results.
Hi @jeremy4
Does the following code meet your expectations?
data test;
input Date:date9. Value_1 Value_2;
format Date date9.;
datalines;
30AUG2018 3 8
30AUG2018 2 7
30AUG2018 4 2
30SEP2018 8 8
30SEP2018 3 3
30SEP2018 7 3
30OCT2018 4 9
30OCT2018 1 5
30OCT2018 2 6
30NOV2018 5 10
30NOV2018 2 4
30NOV2018 3 7
31DEC2018 2 8
31DEC2018 5 9
31DEC2018 8 2
;
run;
data want;
set test;
by Date;
Value_1_times_2 = Value_1 * Value_2;
run;
proc means data=want noprint;
var Value_1_times_2;
class Date;
ways 1;
output out=want_stat (drop=_:) sum=sum_Value_1_times_2;
run;
data have;
set ac.test;
value_1_times_2 = value1*value2;
run;
proc summary data=have nway;
class month;
var value_1_times_2;
output out=_sum_ sum=;
run;
No looping needed. This assumes that, as shown in your sample data set, the month values are always the last day of the month. Otherwise, if there are other days in the data set that are not the last day of the month, a simple modification of the code will fix that to get monthly results.
Thanks for your help! Just as a reference, please can you tell me how the code would change if the dates were not all from the end of the month (i.e. the best way make sure that different dates belong to that one month and in the proc summary, have the results grouped by that one month)?
@jeremy4 wrote:
Thanks for your help! Just as a reference, please can you tell me how the code would change if the dates were not all from the end of the month (i.e. the best way make sure that different dates belong to that one month and in the proc summary, have the results grouped by that one month)?
You create a new variable that either contains the last day of a given month
period = intnx('month',date,0,'e');
or the period as string
period = put(date,yymmn6.);
and use that as class variable in proc summary.
With all due respect to @Kurt_Bremser , I am not fond of his solution. I prefer to apply a format to the variable DATE. Such as:
format date yymm.;
I like to create a new variable, because it can also be used in a data step BY or in a SQL GROUP BY and yield the same groups. But that's just me.
What you are asking for is the weighted sum of value1, weighted by value2 (or value2 weighted by value1) classified by date. PROC SUMMARY can do this, without doing a multiplication step;
proc summary data=have;
class date;
var value1;
weight value2;
output out=want sum=value1_time_value2;
run;
I would add nway, so we only get the group sums:
proc summary data=have nway;
class date;
var value1;
weight value2;
output
out=want (drop=_type_ _freq_)
sum=value1_times_2
;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.