I have two large datasets. Dataset 1 with returns. Dataset 2 with 0 or 1. I need to have an output where I calculate the average of the returns by row.
A mini example could be:
N=2
DATASET 1 DATASET 2 Average (Output)
Var1 Var2 Var3 Var1 Var2 Var3
1 0.5 0.2 0.1 1 1 0 1 (0.5+0.1)/N
2 0.1 0.2 0.0 2 1 1 0 (0.1+0.2)/N
3 0.2 0.5 0.3 3 0 1 1 (0.5+0.3)/N
How do I do this?
An ID variable tells SAS which rows to match from A with rows from B.
1. Merge datasets by ID variable - you can't have the same variable names in each so you'll need to rename them.
2. Declare Arrays for each
3. Mulitple/Average
It would have been better if you'd explicitly written out your formula as
0.5*1 + 0.2*0 + 0.1*1/N
What is N by the way, 2 or 3?
data want;
merge a b(rename=var1-var3=ind1-ind3);
by ID;
array table1(*) var1-var3;
array table2(*) ind1-ind3;
total=0;
n=0;
do i=1 to dim(table1);
total=table1(i)*table2(*) + total;
n=n+1;
end;
average=total/n;
run;
An ID variable tells SAS which rows to match from A with rows from B.
1. Merge datasets by ID variable - you can't have the same variable names in each so you'll need to rename them.
2. Declare Arrays for each
3. Mulitple/Average
It would have been better if you'd explicitly written out your formula as
0.5*1 + 0.2*0 + 0.1*1/N
What is N by the way, 2 or 3?
data want;
merge a b(rename=var1-var3=ind1-ind3);
by ID;
array table1(*) var1-var3;
array table2(*) ind1-ind3;
total=0;
n=0;
do i=1 to dim(table1);
total=table1(i)*table2(*) + total;
n=n+1;
end;
average=total/n;
run;
OT
I provide a SAS/IML, R and WPS solution for moderate size datasets.
For huge datasets the array method may be more appropriate.
see
@rogerjdeangelis You have to log in for that link, is there a link that doesn't require a log in?
EDIT:
I'm not very knowledgable about performance, but I think IML would be faster than arrays.
Here is an IML solution (it uses elementwise multiplication).
data dataset1;
input var1 var2 var3;
datalines;
.5 .2 .1
.1 .2 0
.2 .5 .3
;
data dataset2;
input var1 var2 var3;
datalines;
1 0 1
1 1 0
0 1 1
;
proc iml;
use dataset1;
read all into ds1;
n = nrow(ds1);
use dataset2;
read all into ds2;
elementProd = ds1#ds2;
want = elementProd[,+] / n ;
print want;
quit;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.