Contributor
Posts: 32

# pairwise product of variables and its average

Greetings SAS users.

Can someone help me to generate pairwise product of multiple variables ( more than 40 variables..)

and calculate average pairwise overlap?

Below is very very small part of my original dataset. with 4 dummy variables

 family stock port1 port2 port3 port4 1 1 1 0 1 . 1 2 1 1 0 . 1 3 1 0 1 . 1 4 1 1 1 . 1 5 0 1 1 . 1 6 1 1 1 . 1 7 0 1 0 . 1 9 0 1 0 . 1 11 1 0 0 . 1 12 1 1 1 . 1 13 1 1 1 . 1 15 1 1 0 . 1 16 0 1 0 . 1 17 1 1 0 . 1 18 1 1 0 . 1 19 0 1 0 . 1 21 1 1 0 . 1 23 1 0 0 . 1 24 1 0 0 . 2 1 1 1 1 1 2 2 1 1 0 1 2 3 1 1 0 1 2 4 1 1 0 1 2 5 1 1 0 1 2 6 1 0 0 1 2 7 1 1 1 0 2 8 1 0 0 0 2 9 0 1 1 0 2 11 0 1 1 0 2 12 0 0 1 0 2 27 0 0 1 1 2 28 0 0 1 1 2 29 0 0 1 1

What I want to do is to generate,

port1_2 which is port1*port2

port1_3 which is port1*port3

port2_1 which is port2*port1 (this is same with port1_2, but I need separate variable with different name)

port2_3 which is port2*port3

port3_1 which is port3*port1

port3_2 which is port3*port2

(There is no port4 for family=1)

port4_1          = port4*port1

port4_2          = port4*port2

port4_3          = port4*port3

After generating those pairwise products of dummy variables, I want to calculate average sum of ones of port1, port2, port3, port4

For example,

avg_port1 = average ( sum of ones in port1_2, sum of ones in port1_3, sum of ones in port1_4)  FOR EACH FAMILY 1,2

* If you have missing column, just ignore it.

->There is no port4 for family=1 so that for family=1 avg_port1 is just equal to average(sum of ones in port1_2, sum of ones in port1_3)

avg_port2= average ( sum of ones in port2_1, sum of ones in port2_3, sum of ones in port2_4)  FOR EACH FAMILY 1,2

avg_port3= average ( sum of ones in port3_1, sum of ones in port3_2, sum of ones in port3_4)  FOR EACH FAMILY 1,2

avg_port4= average ( sum of ones in port4_1, sum of ones in port4_2, sum of ones in port4_3)  FOR EACH FAMILY 1,2

Hence my final dataset looks like below.

Actually, I just need avg_port1, avg_port2, avg_port3, avg_port4family, stock nothing more.

 family stock port1 port2 port3 port4 port1_2 port1_3 port1_4 port2_1 port2_3 port2_4 port3_1 port3_2 port3_4 port4_1 port4_2 port4_3 avg_port1 avg_port2 avg_port3 avg_port4 1 1 1 0 1 0 1 0 0 1 0 5 4.666667 3.666667 1 2 1 1 0 1 0 1 0 0 0 5 4.666667 3.666667 1 3 1 0 1 0 1 0 0 1 0 5 4.666667 3.666667 1 4 1 1 1 1 1 1 1 1 1 5 4.666667 3.666667 1 5 0 1 1 0 0 0 1 0 1 5 4.666667 3.666667 1 6 1 1 1 1 1 1 1 1 1 5 4.666667 3.666667 1 7 0 1 0 0 0 0 0 0 0 5 4.666667 3.666667 1 8 0 0 0 0 0 0 0 0 0 5 4.666667 3.666667 1 9 0 1 0 0 0 0 0 0 0 5 4.666667 3.666667 1 10 0 0 0 0 0 0 0 0 0 5 4.666667 3.666667 1 11 1 0 0 0 0 0 0 0 0 5 4.666667 3.666667 1 12 1 1 1 1 1 1 1 1 1 5 4.666667 3.666667 1 13 1 1 1 1 1 1 1 1 1 5 4.666667 3.666667 1 14 0 0 0 0 0 0 0 0 0 5 4.666667 3.666667 1 15 1 1 0 1 0 1 0 0 0 5 4.666667 3.666667 1 16 0 1 0 0 0 0 0 0 0 5 4.666667 3.666667 1 17 1 1 0 1 0 1 0 0 0 5 4.666667 3.666667 1 18 1 1 0 1 0 1 0 0 0 5 4.666667 3.666667 1 19 0 1 0 0 0 0 0 0 0 5 4.666667 3.666667 1 20 0 0 0 0 0 0 0 0 0 5 4.666667 3.666667 1 21 1 1 0 1 0 1 0 0 0 5 4.666667 3.666667 1 22 0 0 0 0 0 0 0 0 0 5 4.666667 3.666667 1 23 1 0 0 0 0 0 0 0 0 5 4.666667 3.666667 1 24 1 0 0 0 0 0 0 0 0 5 4.666667 3.666667 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4.666667 7.333333 4.666667 6 2 2 1 1 0 1 1 0 1 1 0 1 0 0 0 1 1 0 4.666667 7.333333 4.666667 6 2 3 1 1 0 1 1 0 1 1 0 1 0 0 0 1 1 0 4.666667 7.333333 4.666667 6 2 4 1 1 0 1 1 0 1 1 0 1 0 0 0 1 1 0 4.666667 7.333333 4.666667 6 2 5 1 1 0 1 1 0 1 1 0 1 0 0 0 1 1 0 4.666667 7.333333 4.666667 6 2 6 1 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 4.666667 7.333333 4.666667 6 2 7 1 1 1 0 1 1 0 1 1 0 1 1 0 0 0 0 4.666667 7.333333 4.666667 6 2 8 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4.666667 7.333333 4.666667 6 2 9 0 1 1 0 0 0 0 0 1 0 0 1 0 0 0 0 4.666667 7.333333 4.666667 6 2 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4.666667 7.333333 4.666667 6 2 11 0 1 1 0 0 0 0 0 1 0 0 1 0 0 0 0 4.666667 7.333333 4.666667 6 2 12 0 1 1 0 0 0 0 0 1 0 0 1 0 0 0 0 4.666667 7.333333 4.666667 6 2 27 0 1 1 1 0 0 0 0 1 1 0 1 1 0 1 1 4.666667 7.333333 4.666667 6 2 28 0 1 1 1 0 0 0 0 1 1 0 1 1 0 1 1 4.666667 7.333333 4.666667 6 2 29 0 1 1 1 0 0 0 0 1 1 0 1 1 0 1 1 4.666667 7.333333 4.666667 6

How do I get the results EFFICIENTLY? My dataset is so large that there could be over 40 columns (e.g. port1, port2, port3, ..... port44) so that columns can be extended to 44*43 something else.... So I desperately need EFFICIENCY.

Super User
Posts: 10,850

## Re: pairwise product of variables and its average

``````data have ;
infile cards expandtabs ;
input family	stock	port1	port2	port3;
cards;
1	1	1	0	1	.
1	2	1	1	0	.
1	3	1	0	1	.
1	4	1	1	1	.
1	5	0	1	1
;
run;
data temp;
do i=1 to 3;
do j=1 to 3;
if i ne j then do;name=cats('_port',i,'_',j);output;end;
end;
end;
run;
proc sql noprint;
select name into : name separated by ' ' from temp;
quit;

data want;
set have;
array x{*} &name;
array y{*} port1-port3;
n=0;
do i=1 to dim(y);
do j=1 to dim(y);
if i ne j then do;n+1;x{n}=y{i}*y{j};end;
end;
end;
run;
``````
Contributor
Posts: 36

## Re: pairwise product of variables and its average

[ Edited ]

Here is a SQL based solution . Just change dataset name and library name as indicated.

``````proc sql;
SELECT catx( "   " ,catt(A,'*',B ,'  AS  ' ), Varlist)  into : myvar separated by ', '
FROM (
SELECT  a.Vlist as A,
b.Vlist as B,
cats(a.Vlist,'_',b.Vlist) as Varlist
FROM(	select name as vlist
from dictionary.columns
where memname = upcase("have") /*DATASET NAME*/
and libname =upcase("Work")  /*LiBRARY NAME*/
and name like 'p%' ) a
CROSS JOIN
(	select name as vlist
from dictionary.columns
where memname = upcase("have") /*DATASET NAME*/
and libname =upcase("Work") /*LiBRARY NAME*/
and name like 'p%' ) b
WHERE a.vlist ne b.vlist
)
;quit;

PROC SQL;
Create table want as
SELECT family,stock,port1,port2,port3,port4,&myvar
from have;
Run;``````

Discussion stats
• 2 replies
• 107 views
• 0 likes
• 3 in conversation