- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I have a dataset (called "ac.test") that has three million records and there are three variables: DATE, VALUE_1 and VALUE_2, all of which are numeric variables. Is there any code that could help to efficiently create a new variable "Value_1_times_2" by calculating VALUE_1 multiplied by VALUE_2 for all records for each month (the screenshot below only part of the dataset, with dates from June 2017-January 2019 in the complete dataset), then summing up all of the individual accounts for that month using proc summary?
Steps:
1. Create new variable, "Value_1_times_2". Value_1_times_2 = Value_1*Value_2.
2. Sum up all of "Value_1_times_2" for that month e.g. in the screenshot below, proc summary would show that the sum of all "Value_1_times_2" for August 2018 (note that all records for each month only have the end date of the month) i.e. 46.
3. Repeat the process/loop for all months in the dataset in order to show that the sum of "Value_1_times_2" in August 2018 is 46, 94 in September 2018, 53 in October 2018, 79 in November 2018 and 77 in December 2018 etc.?
Example:
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
set ac.test;
value_1_times_2 = value1*value2;
run;
proc summary data=have nway;
class month;
var value_1_times_2;
output out=_sum_ sum=;
run;
No looping needed. This assumes that, as shown in your sample data set, the month values are always the last day of the month. Otherwise, if there are other days in the data set that are not the last day of the month, a simple modification of the code will fix that to get monthly results.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @jeremy4
Does the following code meet your expectations?
data test;
input Date:date9. Value_1 Value_2;
format Date date9.;
datalines;
30AUG2018 3 8
30AUG2018 2 7
30AUG2018 4 2
30SEP2018 8 8
30SEP2018 3 3
30SEP2018 7 3
30OCT2018 4 9
30OCT2018 1 5
30OCT2018 2 6
30NOV2018 5 10
30NOV2018 2 4
30NOV2018 3 7
31DEC2018 2 8
31DEC2018 5 9
31DEC2018 8 2
;
run;
data want;
set test;
by Date;
Value_1_times_2 = Value_1 * Value_2;
run;
proc means data=want noprint;
var Value_1_times_2;
class Date;
ways 1;
output out=want_stat (drop=_:) sum=sum_Value_1_times_2;
run;
- Tags:
- PROC MEANS
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
set ac.test;
value_1_times_2 = value1*value2;
run;
proc summary data=have nway;
class month;
var value_1_times_2;
output out=_sum_ sum=;
run;
No looping needed. This assumes that, as shown in your sample data set, the month values are always the last day of the month. Otherwise, if there are other days in the data set that are not the last day of the month, a simple modification of the code will fix that to get monthly results.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for your help! Just as a reference, please can you tell me how the code would change if the dates were not all from the end of the month (i.e. the best way make sure that different dates belong to that one month and in the proc summary, have the results grouped by that one month)?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@jeremy4 wrote:
Thanks for your help! Just as a reference, please can you tell me how the code would change if the dates were not all from the end of the month (i.e. the best way make sure that different dates belong to that one month and in the proc summary, have the results grouped by that one month)?
You create a new variable that either contains the last day of a given month
period = intnx('month',date,0,'e');
or the period as string
period = put(date,yymmn6.);
and use that as class variable in proc summary.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
With all due respect to @Kurt_Bremser , I am not fond of his solution. I prefer to apply a format to the variable DATE. Such as:
format date yymm.;
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I like to create a new variable, because it can also be used in a data step BY or in a SQL GROUP BY and yield the same groups. But that's just me.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
What you are asking for is the weighted sum of value1, weighted by value2 (or value2 weighted by value1) classified by date. PROC SUMMARY can do this, without doing a multiplication step;
proc summary data=have;
class date;
var value1;
weight value2;
output out=want sum=value1_time_value2;
run;
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I would add nway, so we only get the group sums:
proc summary data=have nway;
class date;
var value1;
weight value2;
output
out=want (drop=_type_ _freq_)
sum=value1_times_2
;
run;