DATA Step, Macro, Functions and more

How to multiply two dataset and calculate an average?

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 6
Accepted Solution

How to multiply two dataset and calculate an average?

I have two large datasets. Dataset 1 with returns. Dataset 2 with 0 or 1. I need to have an output where I calculate the average of the returns by row.

 

A mini example could be:

 

N=2

 

DATASET 1                                             DATASET 2                                               Average (Output)

        Var1       Var2       Var3                                   Var1       Var2       Var3

1       0.5         0.2          0.1                          1          1             0            1                       (0.5+0.1)/N

2       0.1         0.2          0.0                          2          1             1            0                       (0.1+0.2)/N

3       0.2         0.5          0.3                          3          0             1            1                       (0.5+0.3)/N

 

How do I do this?


Accepted Solutions
Solution
‎07-24-2016 05:20 PM
Super User
Posts: 17,837

Re: How to multiply two dataset and calculate an average?

An ID variable tells SAS which rows to match from A with rows from B.

 

1. Merge datasets by ID variable - you can't have the same variable names in each so you'll need to rename them.

2. Declare Arrays for each

3. Mulitple/Average

 

It would have been better if you'd explicitly written out your formula as

0.5*1 + 0.2*0 + 0.1*1/N

 

What is N by the way, 2 or 3?

 

data want;
merge a b(rename=var1-var3=ind1-ind3);
by ID;

array table1(*) var1-var3;
array table2(*) ind1-ind3;
total=0;
n=0; do i=1 to dim(table1); total=table1(i)*table2(*) + total; n=n+1; end; average=total/n; run;

View solution in original post


All Replies
Super User
Posts: 5,257

Re: How to multiply two dataset and calculate an average?

First join the two data sets. I can't see an id variable but I guess you left that our to male the example as simple as possible...
The You could use a two dimensional array to make your program flexible (if in the future more variables will be used). Of use hard coded variable name to the calculation sum(var1_ds1*var1_ds2, var2 etc) / N
Data never sleeps
Occasional Contributor
Posts: 6

Re: How to multiply two dataset and calculate an average?

I'm not sure what an id variable is.

Can you make an example of a dynamic code?
Solution
‎07-24-2016 05:20 PM
Super User
Posts: 17,837

Re: How to multiply two dataset and calculate an average?

An ID variable tells SAS which rows to match from A with rows from B.

 

1. Merge datasets by ID variable - you can't have the same variable names in each so you'll need to rename them.

2. Declare Arrays for each

3. Mulitple/Average

 

It would have been better if you'd explicitly written out your formula as

0.5*1 + 0.2*0 + 0.1*1/N

 

What is N by the way, 2 or 3?

 

data want;
merge a b(rename=var1-var3=ind1-ind3);
by ID;

array table1(*) var1-var3;
array table2(*) ind1-ind3;
total=0;
n=0; do i=1 to dim(table1); total=table1(i)*table2(*) + total; n=n+1; end; average=total/n; run;
Super Contributor
Posts: 254

Re: How to multiply two dataset and calculate an average?

What is your N for Large Datasets?
Valued Guide
Posts: 505

Re: How to multiply two dataset and calculate an average?

 OT

 

I provide a SAS/IML, R and WPS solution for moderate size datasets.

For huge datasets the array method may be more appropriate.

 

see

https://listserv.uga.edu/cgi-bin/wa?A2=SAS-L;5d05f7f.1607d

Super User
Posts: 17,837

Re: How to multiply two dataset and calculate an average?

[ Edited ]

@rogerjdeangelis You have to log in for that link, is there a link that doesn't require a log in?

 

EDIT:

I'm not very knowledgable about performance, but I think IML would be faster than arrays.

 

 

SAS Employee
Posts: 106

Re: How to multiply two dataset and calculate an average?

Here is an IML solution (it uses elementwise multiplication).

 

 


data dataset1;
input var1 var2 var3; 
datalines;
.5 .2 .1
.1 .2 0
.2 .5 .3
;



data dataset2;
input var1 var2 var3; 
datalines;
1 0 1
1 1 0 
0 1 1 
;



proc iml;


	use dataset1; 
	read all into ds1; 

	n = nrow(ds1);

	use dataset2; 
	read all into ds2; 

	elementProd = ds1#ds2;

	want = elementProd[,+] / n ; 

	print want; 


quit;

 

 

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 356 views
  • 2 likes
  • 6 in conversation