I have two large datasets. Dataset 1 with returns. Dataset 2 with 0 or 1. I need to have an output where I calculate the average of the returns by row.
A mini example could be:
N=2
DATASET 1 DATASET 2 Average (Output)
Var1 Var2 Var3 Var1 Var2 Var3
1 0.5 0.2 0.1 1 1 0 1 (0.5+0.1)/N
2 0.1 0.2 0.0 2 1 1 0 (0.1+0.2)/N
3 0.2 0.5 0.3 3 0 1 1 (0.5+0.3)/N
How do I do this?
An ID variable tells SAS which rows to match from A with rows from B.
1. Merge datasets by ID variable - you can't have the same variable names in each so you'll need to rename them.
2. Declare Arrays for each
3. Mulitple/Average
It would have been better if you'd explicitly written out your formula as
0.5*1 + 0.2*0 + 0.1*1/N
What is N by the way, 2 or 3?
data want;
merge a b(rename=var1-var3=ind1-ind3);
by ID;
array table1(*) var1-var3;
array table2(*) ind1-ind3;
total=0;
n=0;
do i=1 to dim(table1);
total=table1(i)*table2(*) + total;
n=n+1;
end;
average=total/n;
run;
An ID variable tells SAS which rows to match from A with rows from B.
1. Merge datasets by ID variable - you can't have the same variable names in each so you'll need to rename them.
2. Declare Arrays for each
3. Mulitple/Average
It would have been better if you'd explicitly written out your formula as
0.5*1 + 0.2*0 + 0.1*1/N
What is N by the way, 2 or 3?
data want;
merge a b(rename=var1-var3=ind1-ind3);
by ID;
array table1(*) var1-var3;
array table2(*) ind1-ind3;
total=0;
n=0;
do i=1 to dim(table1);
total=table1(i)*table2(*) + total;
n=n+1;
end;
average=total/n;
run;
OT
I provide a SAS/IML, R and WPS solution for moderate size datasets.
For huge datasets the array method may be more appropriate.
see
@rogerjdeangelis You have to log in for that link, is there a link that doesn't require a log in?
EDIT:
I'm not very knowledgable about performance, but I think IML would be faster than arrays.
Here is an IML solution (it uses elementwise multiplication).
data dataset1;
input var1 var2 var3;
datalines;
.5 .2 .1
.1 .2 0
.2 .5 .3
;
data dataset2;
input var1 var2 var3;
datalines;
1 0 1
1 1 0
0 1 1
;
proc iml;
use dataset1;
read all into ds1;
n = nrow(ds1);
use dataset2;
read all into ds2;
elementProd = ds1#ds2;
want = elementProd[,+] / n ;
print want;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.