DATA Step, Macro, Functions and more

pairwise product of variables and its average

Reply
Contributor
Posts: 32

pairwise product of variables and its average

Greetings SAS users.

Can someone help me to generate pairwise product of multiple variables ( more than 40 variables..)

and calculate average pairwise overlap?

 

 

 

 

Below is very very small part of my original dataset. with 4 dummy variables

 

familystockport1port2port3port4
11101.
12110.
13101.
14111.
15011.
16111.
17010.
19010.
111100.
112111.
113111.
115110.
116010.
117110.
118110.
119010.
121110.
123100.
124100.
211111
221101
231101
241101
251101
261001
271110
281000
290110
2110110
2120010
2270011
2280011
2290011

 

 

 

 

 

 

What I want to do is to generate,

                                           port1_2 which is port1*port2

                                           port1_3 which is port1*port3

                                           port2_1 which is port2*port1 (this is same with port1_2, but I need separate variable with different name)

                                           port2_3 which is port2*port3

                                           port3_1 which is port3*port1

                                           port3_2 which is port3*port2

                                           (There is no port4 for family=1)

                                           port4_1          = port4*port1

                                           port4_2          = port4*port2

                                           port4_3          = port4*port3

 

After generating those pairwise products of dummy variables, I want to calculate average sum of ones of port1, port2, port3, port4

 

For example,

 

                     avg_port1 = average ( sum of ones in port1_2, sum of ones in port1_3, sum of ones in port1_4)  FOR EACH FAMILY 1,2

* If you have missing column, just ignore it.         

->There is no port4 for family=1 so that for family=1 avg_port1 is just equal to average(sum of ones in port1_2, sum of ones in port1_3)

                     

 

                   avg_port2= average ( sum of ones in port2_1, sum of ones in port2_3, sum of ones in port2_4)  FOR EACH FAMILY 1,2

                   avg_port3= average ( sum of ones in port3_1, sum of ones in port3_2, sum of ones in port3_4)  FOR EACH FAMILY 1,2

                   avg_port4= average ( sum of ones in port4_1, sum of ones in port4_2, sum of ones in port4_3)  FOR EACH FAMILY 1,2

                    

 

Hence my final dataset looks like below.

Actually, I just need avg_port1, avg_port2, avg_port3, avg_port4family, stock nothing more.

 

familystockport1port2port3port4port1_2port1_3port1_4port2_1port2_3port2_4port3_1port3_2port3_4port4_1port4_2port4_3avg_port1avg_port2avg_port3avg_port4
11101 01 00 10    54.6666673.666667 
12110 10 10 00    54.6666673.666667 
13101 01 00 10    54.6666673.666667 
14111 11 11 11    54.6666673.666667 
15011 00 01 01    54.6666673.666667 
16111 11 11 11    54.6666673.666667 
17010 00 00 00    54.6666673.666667 
18000 00 00 00    54.6666673.666667 
19010 00 00 00    54.6666673.666667 
110000 00 00 00    54.6666673.666667 
111100 00 00 00    54.6666673.666667 
112111 11 11 11    54.6666673.666667 
113111 11 11 11    54.6666673.666667 
114000 00 00 00    54.6666673.666667 
115110 10 10 00    54.6666673.666667 
116010 00 00 00    54.6666673.666667 
117110 10 10 00    54.6666673.666667 
118110 10 10 00    54.6666673.666667 
119010 00 00 00    54.6666673.666667 
120000 00 00 00    54.6666673.666667 
121110 10 10 00    54.6666673.666667 
122000 00 00 00    54.6666673.666667 
123100 00 00 00    54.6666673.666667 
124100 00 00 00    54.6666673.666667 
2111111111111111114.6666677.3333334.6666676
2211011011010001104.6666677.3333334.6666676
2311011011010001104.6666677.3333334.6666676
2411011011010001104.6666677.3333334.6666676
2511011011010001104.6666677.3333334.6666676
2610010010000001004.6666677.3333334.6666676
2711101101101100004.6666677.3333334.6666676
2810000000000000004.6666677.3333334.6666676
2901100000100100004.6666677.3333334.6666676
21000000000000000004.6666677.3333334.6666676
21101100000100100004.6666677.3333334.6666676
21201100000100100004.6666677.3333334.6666676
22701110000110110114.6666677.3333334.6666676
22801110000110110114.6666677.3333334.6666676
22901110000110110114.6666677.3333334.6666676

 

 

 

How do I get the results EFFICIENTLY? My dataset is so large that there could be over 40 columns (e.g. port1, port2, port3, ..... port44) so that columns can be extended to 44*43 something else.... So I desperately need EFFICIENCY.

 

Please help me.

Super User
Posts: 10,850

Re: pairwise product of variables and its average

data have ;
infile cards expandtabs ;
input family	stock	port1	port2	port3;
cards;
1	1	1	0	1	.
1	2	1	1	0	.
1	3	1	0	1	.
1	4	1	1	1	.
1	5	0	1	1
;
run;
data temp;
 do i=1 to 3;
  do j=1 to 3;
   if i ne j then do;name=cats('_port',i,'_',j);output;end;
  end;
 end;
run;
proc sql noprint;
select name into : name separated by ' ' from temp;
quit;

data want;
 set have;
 array x{*} &name;
 array y{*} port1-port3;
 n=0;
 do i=1 to dim(y);
  do j=1 to dim(y);
   if i ne j then do;n+1;x{n}=y{i}*y{j};end;
  end;
 end;
run;
Contributor
Posts: 36

Re: pairwise product of variables and its average

[ Edited ]

Here is a SQL based solution . Just change dataset name and library name as indicated.

 

proc sql;
SELECT catx( "   " ,catt(A,'*',B ,'  AS  ' ), Varlist)  into : myvar separated by ', '
FROM (
		SELECT  a.Vlist as A,
				b.Vlist as B,
				cats(a.Vlist,'_',b.Vlist) as Varlist
		FROM(	select name as vlist 
		        from dictionary.columns
		        where memname = upcase("have") /*DATASET NAME*/
				and libname =upcase("Work")  /*LiBRARY NAME*/
				and name like 'p%' ) a
		CROSS JOIN 
			(	select name as vlist 
		        from dictionary.columns
		        where memname = upcase("have") /*DATASET NAME*/
				and libname =upcase("Work") /*LiBRARY NAME*/
				and name like 'p%' ) b
		WHERE a.vlist ne b.vlist
	)
;quit;

PROC SQL;
Create table want as
SELECT family,stock,port1,port2,port3,port4,&myvar
from have;
Run;

 

Ask a Question
Discussion stats
  • 2 replies
  • 107 views
  • 0 likes
  • 3 in conversation