I have a data of 200,000 firm-year observations with s1-s1000,1000 numeric variables (or, you can perceive them as 1-by-1000 vector for each firm-year observation). I would like to calculate the sum of each firm-year observation's 1-by-1000 vector's inner product with all other firm-year observations' 1-by-1000 vectors matching on the same year. Since I find it is difficult to directly perform vector calculations in SAS for my case, I perform the simple multiplications by the following sas code: %macro process; %do i=1980 %to 2017; * I intend to avoid over-capacity of SAS by merging observations within each year; data t1;set x6;if year=&i;run; data t2;set x6_copy;if year=&i;run; * x6_copy.sas is the copy of x6.sas, except that I rename s1-s1000 to be ss1-ss1000 for merging; proc sql;create table t3 as select a.*, b.* from t1 a left join t2 b on a.firm^=b.firm;quit; * merge each firm-year with all other firm-years to add ss1-ss1000; data t3;set t3;array x {1000} s1-s1000;array y {1000} ss1-ss1000; d=0;do i=1 to 1000;d=d+x{i}*y{i};end;drop i;run; * mechanical calculation of inner product; proc sort data=t3;by firm year;run; proc means data=t3 noprint;var d;by firm year;output out=folder.x&i sum=d;run; * sum up across each firm-year; proc datasets lib=work noprint;delete t1 t2 t3;quit; %end; %mend; %process(); In practice, to avoid "insufficient space" problem, I even further divide 1000 variables into "300, 300, 300, 100" 4 groups to reduce the one-time workload of SAS. But it still takes my three computers at least two days to run the code. So I really appreciate it if anyone could let me know how I can improve my code to save some time. Thank you very much in advance!
... View more