What I need to do, boils down (more or less) to joining two huge datasets A and B. The first approach is of course: proc sql;
create table AB as select * from A join B on 1;
quit; But this is infeasible due to memory restrictions. The obvious alternative is to loop, for each observation of A, through each observation of B. My current naive approach (ab)uses the POINT option of the SET statement: data AB;
set A;
do point_B = 1 to nobs_B;
set B point = point_B nobs = nobs_B;
output;
end;
run; This works - but is incredibly slow. Certainly, this can be remedied by replacing the random access of POINT by sequential access. Unfortunately, I was not able to figure out how this can be achieved. So any suggestions are greatly appreciated. For instance, what would be nice to have is some option RESET (similar to KEYRESET) making the following code work after uncommenting reset = end_A : data AB;
do while(not end_A);
set A end = end_A;
do while(not end_B);
set B end = end_B /* reset = end_A */;
output;
end;
end;
run; (If there is no other solution, I might have to split A into multiple smaller datasets first...)
... View more