Greetings SAS users.
Can someone help me to generate pairwise product of multiple variables ( more than 40 variables..)
and calculate average pairwise overlap?
Below is very very small part of my original dataset. with 4 dummy variables
family | stock | port1 | port2 | port3 | port4 |
1 | 1 | 1 | 0 | 1 | . |
1 | 2 | 1 | 1 | 0 | . |
1 | 3 | 1 | 0 | 1 | . |
1 | 4 | 1 | 1 | 1 | . |
1 | 5 | 0 | 1 | 1 | . |
1 | 6 | 1 | 1 | 1 | . |
1 | 7 | 0 | 1 | 0 | . |
1 | 9 | 0 | 1 | 0 | . |
1 | 11 | 1 | 0 | 0 | . |
1 | 12 | 1 | 1 | 1 | . |
1 | 13 | 1 | 1 | 1 | . |
1 | 15 | 1 | 1 | 0 | . |
1 | 16 | 0 | 1 | 0 | . |
1 | 17 | 1 | 1 | 0 | . |
1 | 18 | 1 | 1 | 0 | . |
1 | 19 | 0 | 1 | 0 | . |
1 | 21 | 1 | 1 | 0 | . |
1 | 23 | 1 | 0 | 0 | . |
1 | 24 | 1 | 0 | 0 | . |
2 | 1 | 1 | 1 | 1 | 1 |
2 | 2 | 1 | 1 | 0 | 1 |
2 | 3 | 1 | 1 | 0 | 1 |
2 | 4 | 1 | 1 | 0 | 1 |
2 | 5 | 1 | 1 | 0 | 1 |
2 | 6 | 1 | 0 | 0 | 1 |
2 | 7 | 1 | 1 | 1 | 0 |
2 | 8 | 1 | 0 | 0 | 0 |
2 | 9 | 0 | 1 | 1 | 0 |
2 | 11 | 0 | 1 | 1 | 0 |
2 | 12 | 0 | 0 | 1 | 0 |
2 | 27 | 0 | 0 | 1 | 1 |
2 | 28 | 0 | 0 | 1 | 1 |
2 | 29 | 0 | 0 | 1 | 1 |
What I want to do is to generate,
port1_2 which is port1*port2
port1_3 which is port1*port3
port2_1 which is port2*port1 (this is same with port1_2, but I need separate variable with different name)
port2_3 which is port2*port3
port3_1 which is port3*port1
port3_2 which is port3*port2
(There is no port4 for family=1)
port4_1 = port4*port1
port4_2 = port4*port2
port4_3 = port4*port3
After generating those pairwise products of dummy variables, I want to calculate average sum of ones of port1, port2, port3, port4
For example,
avg_port1 = average ( sum of ones in port1_2, sum of ones in port1_3, sum of ones in port1_4) FOR EACH FAMILY 1,2
* If you have missing column, just ignore it.
->There is no port4 for family=1 so that for family=1 avg_port1 is just equal to average(sum of ones in port1_2, sum of ones in port1_3)
avg_port2= average ( sum of ones in port2_1, sum of ones in port2_3, sum of ones in port2_4) FOR EACH FAMILY 1,2
avg_port3= average ( sum of ones in port3_1, sum of ones in port3_2, sum of ones in port3_4) FOR EACH FAMILY 1,2
avg_port4= average ( sum of ones in port4_1, sum of ones in port4_2, sum of ones in port4_3) FOR EACH FAMILY 1,2
Hence my final dataset looks like below.
Actually, I just need avg_port1, avg_port2, avg_port3, avg_port4, family, stock nothing more.
family | stock | port1 | port2 | port3 | port4 | port1_2 | port1_3 | port1_4 | port2_1 | port2_3 | port2_4 | port3_1 | port3_2 | port3_4 | port4_1 | port4_2 | port4_3 | avg_port1 | avg_port2 | avg_port3 | avg_port4 |
1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 5 | 4.666667 | 3.666667 | ||||||||
1 | 2 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 5 | 4.666667 | 3.666667 | ||||||||
1 | 3 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 5 | 4.666667 | 3.666667 | ||||||||
1 | 4 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 5 | 4.666667 | 3.666667 | ||||||||
1 | 5 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 5 | 4.666667 | 3.666667 | ||||||||
1 | 6 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 5 | 4.666667 | 3.666667 | ||||||||
1 | 7 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 4.666667 | 3.666667 | ||||||||
1 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 4.666667 | 3.666667 | ||||||||
1 | 9 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 4.666667 | 3.666667 | ||||||||
1 | 10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 4.666667 | 3.666667 | ||||||||
1 | 11 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 4.666667 | 3.666667 | ||||||||
1 | 12 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 5 | 4.666667 | 3.666667 | ||||||||
1 | 13 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 5 | 4.666667 | 3.666667 | ||||||||
1 | 14 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 4.666667 | 3.666667 | ||||||||
1 | 15 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 5 | 4.666667 | 3.666667 | ||||||||
1 | 16 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 4.666667 | 3.666667 | ||||||||
1 | 17 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 5 | 4.666667 | 3.666667 | ||||||||
1 | 18 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 5 | 4.666667 | 3.666667 | ||||||||
1 | 19 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 4.666667 | 3.666667 | ||||||||
1 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 4.666667 | 3.666667 | ||||||||
1 | 21 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 5 | 4.666667 | 3.666667 | ||||||||
1 | 22 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 4.666667 | 3.666667 | ||||||||
1 | 23 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 4.666667 | 3.666667 | ||||||||
1 | 24 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 4.666667 | 3.666667 | ||||||||
2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 4.666667 | 7.333333 | 4.666667 | 6 |
2 | 2 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 4.666667 | 7.333333 | 4.666667 | 6 |
2 | 3 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 4.666667 | 7.333333 | 4.666667 | 6 |
2 | 4 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 4.666667 | 7.333333 | 4.666667 | 6 |
2 | 5 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 4.666667 | 7.333333 | 4.666667 | 6 |
2 | 6 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 4.666667 | 7.333333 | 4.666667 | 6 |
2 | 7 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 4.666667 | 7.333333 | 4.666667 | 6 |
2 | 8 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4.666667 | 7.333333 | 4.666667 | 6 |
2 | 9 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 4.666667 | 7.333333 | 4.666667 | 6 |
2 | 10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4.666667 | 7.333333 | 4.666667 | 6 |
2 | 11 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 4.666667 | 7.333333 | 4.666667 | 6 |
2 | 12 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 4.666667 | 7.333333 | 4.666667 | 6 |
2 | 27 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 4.666667 | 7.333333 | 4.666667 | 6 |
2 | 28 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 4.666667 | 7.333333 | 4.666667 | 6 |
2 | 29 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 4.666667 | 7.333333 | 4.666667 | 6 |
How do I get the results EFFICIENTLY? My dataset is so large that there could be over 40 columns (e.g. port1, port2, port3, ..... port44) so that columns can be extended to 44*43 something else.... So I desperately need EFFICIENCY.
Please help me.
data have ;
infile cards expandtabs ;
input family stock port1 port2 port3;
cards;
1 1 1 0 1 .
1 2 1 1 0 .
1 3 1 0 1 .
1 4 1 1 1 .
1 5 0 1 1
;
run;
data temp;
do i=1 to 3;
do j=1 to 3;
if i ne j then do;name=cats('_port',i,'_',j);output;end;
end;
end;
run;
proc sql noprint;
select name into : name separated by ' ' from temp;
quit;
data want;
set have;
array x{*} &name;
array y{*} port1-port3;
n=0;
do i=1 to dim(y);
do j=1 to dim(y);
if i ne j then do;n+1;x{n}=y{i}*y{j};end;
end;
end;
run;
Here is a SQL based solution . Just change dataset name and library name as indicated.
proc sql;
SELECT catx( " " ,catt(A,'*',B ,' AS ' ), Varlist) into : myvar separated by ', '
FROM (
SELECT a.Vlist as A,
b.Vlist as B,
cats(a.Vlist,'_',b.Vlist) as Varlist
FROM( select name as vlist
from dictionary.columns
where memname = upcase("have") /*DATASET NAME*/
and libname =upcase("Work") /*LiBRARY NAME*/
and name like 'p%' ) a
CROSS JOIN
( select name as vlist
from dictionary.columns
where memname = upcase("have") /*DATASET NAME*/
and libname =upcase("Work") /*LiBRARY NAME*/
and name like 'p%' ) b
WHERE a.vlist ne b.vlist
)
;quit;
PROC SQL;
Create table want as
SELECT family,stock,port1,port2,port3,port4,&myvar
from have;
Run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.