BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jeremy4
Quartz | Level 8

Hi,

 

I have a dataset (called "ac.test") that has three million records and there are three variables: DATE, VALUE_1 and VALUE_2, all of which are numeric variables. Is there any code that could help to efficiently create a new variable "Value_1_times_2" by calculating VALUE_1 multiplied by VALUE_2 for all records for each month (the screenshot below only part of the dataset, with dates from June 2017-January 2019 in the complete dataset), then summing up all of the individual accounts for that month using proc summary?

 

Steps:

1. Create new variable, "Value_1_times_2". Value_1_times_2 = Value_1*Value_2.

2. Sum up all of "Value_1_times_2" for that month e.g. in the screenshot below, proc summary would show that the sum of all "Value_1_times_2" for August 2018 (note that all records for each month only have the end date of the month) i.e. 46.

3. Repeat the process/loop for all months in the dataset in order to show that the sum of "Value_1_times_2" in August 2018 is 46, 94 in September 2018, 53 in October 2018, 79 in November 2018 and 77 in December 2018 etc.?

 

Example:

SAS multiply two values 2.JPG

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26
data have;
    set ac.test;
    value_1_times_2 = value1*value2;
run;
proc summary data=have nway;
    class month;
    var value_1_times_2;
    output out=_sum_ sum=;
run;

No looping needed. This assumes that, as shown in your sample data set, the month values are always the last day of the month. Otherwise, if there are other days in the data set that are not the last day of the month, a simple modification of the code will fix that to get monthly results.

--
Paige Miller

View solution in original post

8 REPLIES 8
ed_sas_member
Meteorite | Level 14

Hi @jeremy4 

 

Does the following code meet your expectations?

 

data test;
	input Date:date9. Value_1 Value_2;
	format Date date9.;
	datalines;
30AUG2018 3 8
30AUG2018 2 7
30AUG2018 4 2
30SEP2018 8 8
30SEP2018 3 3
30SEP2018 7 3
30OCT2018 4 9
30OCT2018 1 5
30OCT2018 2 6
30NOV2018 5 10
30NOV2018 2 4
30NOV2018 3 7
31DEC2018 2 8
31DEC2018 5 9
31DEC2018 8 2
;
run;

data want;
	set test;
	by Date;
	Value_1_times_2 = Value_1 * Value_2;
run;

proc means data=want noprint;
	var Value_1_times_2;
	class Date;
	ways 1;
	output out=want_stat (drop=_:) sum=sum_Value_1_times_2;
run;

Capture d’écran 2020-02-27 à 14.23.59.png

 

Capture d’écran 2020-02-27 à 14.24.11.png

 

PaigeMiller
Diamond | Level 26
data have;
    set ac.test;
    value_1_times_2 = value1*value2;
run;
proc summary data=have nway;
    class month;
    var value_1_times_2;
    output out=_sum_ sum=;
run;

No looping needed. This assumes that, as shown in your sample data set, the month values are always the last day of the month. Otherwise, if there are other days in the data set that are not the last day of the month, a simple modification of the code will fix that to get monthly results.

--
Paige Miller
jeremy4
Quartz | Level 8

Thanks for your help! Just as a reference, please can you tell me how the code would change if the dates were not all from the end of the month (i.e. the best way make sure that different dates belong to that one month and in the proc summary, have the results grouped by that one month)?

Kurt_Bremser
Super User

@jeremy4 wrote:

Thanks for your help! Just as a reference, please can you tell me how the code would change if the dates were not all from the end of the month (i.e. the best way make sure that different dates belong to that one month and in the proc summary, have the results grouped by that one month)?


You create a new variable that either contains the last day of a given month

period = intnx('month',date,0,'e');

or the period as string

period = put(date,yymmn6.);

and use that as class variable in proc summary.

PaigeMiller
Diamond | Level 26

With all due respect to @Kurt_Bremser , I am not fond of his solution. I prefer to apply a format to the variable DATE. Such as: 

 

format date yymm.;
--
Paige Miller
mkeintz
PROC Star

What  you are asking for is the weighted sum of value1, weighted by value2  (or value2 weighted by value1) classified  by date.  PROC SUMMARY can do this, without doing a multiplication step;

 

proc summary  data=have;
  class date;
  var value1;
  weight value2;
  output out=want sum=value1_time_value2;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Kurt_Bremser
Super User

I would add nway, so we only get the group sums:

proc summary data=have nway;
class date;
var value1;
weight value2;
output
  out=want (drop=_type_ _freq_)
  sum=value1_times_2
;
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 7985 views
  • 5 likes
  • 5 in conversation