Before I could release the code, I had to sanitize it. The names are consistent stand-ins for the original variables, and each one is surrounded by "<>" to mark it as a sanitized name.
The code is fairly simple, but it processes our data very quickly:
proc sql;
create table <localFlagSource> (replace=yes) as
select <elementType>, <flagName> from <serverDataSource1> order by <elementType>;
Create table <localFormulaSource> (replace=yes) as
select <formulaType>, <SASFormula> from <serverDataSource2> order by <formulaType>;
quit;
%let <elementTypeMacroVar>=;
%let <flagListMacroVar>=;
%let <formulaSourceMacroVar>=;
%let <formulaMacroVar>=;
proc sql NOPRINT;
select <elementType>, <flagName> into :<elementTypeMacroVar> separated by ' ', :<flagListMacroVar> separated by ' ' from <localFlagSource> order by <elementType>;
/* separating by nothing is correct when loading <formulaMacroVar>: it packs the formulas together, and each formula is terminated by a semicolon */
select <formulaType>, <SASFormula> into :<formulaSourceMacroVar> separated by ' ', :<formulaMacroVar> separated by '' from <localFormulaSource> order by <formulaType>;
quit;
%let <formulaMacroVar>=%SUPERQ(<formulaMacroVar>);
%putTime;
data localSAS.test (replace=yes);
<calculatedField>=0;
array flag[*] &<flagListMacroVar>.;
retain &<flagListMacroVar>.;
set localSAS.<localSource> /*(obs=100000000)*/ end=eof;
by <uniqueKey>;
if first.<uniqueKey> then do; /* blank our in-memory array for each new grouping */
   do i=1 to Dim(flag);
      flag[i]=0;
   end;
end;
flag[findw("&<elementTypeMacroVar>.", strip(put(<elementType>, best.)), ' ', 'E')]=1; /* reconstitute our data array in memory only */
if last.<uniqueKey> then do;
   &<formulaMacroVar>.; /* expands to the entire list of formulas; each clause that evaluates true contains an OUTPUT statement */
end;
keep <uniqueKey> <calculatedField>;
run;
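The lookup trick works because two macro variables are built from the same rows in the same order. SELECT ... INTO :var SEPARATED BY ' ' joins a column into one space-delimited string. FINDW with the 'E' modifier then returns the word number of an element type, and that number doubles as the subscript of the matching flag variable. The sketch below models this in Python. The element types, the flag names (F_A, F_B, F_C), and the findw_e helper are all hypothetical. It also assumes <elementType> holds integers, so that str() matches STRIP(PUT(x, BEST.)).

```python
# Hypothetical stand-in for <localFlagSource>: (elementType, flagName) rows,
# already sorted by elementType as the ORDER BY clause guarantees.
flag_source = [(101, "F_A"), (205, "F_B"), (310, "F_C")]

# SELECT ... INTO :var SEPARATED BY ' ' joins every value of a column into
# one space-delimited string; these model the two macro variables.
element_type_list = " ".join(str(t) for t, _ in flag_source)
flag_list = " ".join(name for _, name in flag_source)


def findw_e(text: str, word: str) -> int:
    """Model FINDW(text, word, ' ', 'E'): return the 1-based number of the
    first word in text equal to word, or 0 when it is absent."""
    for position, candidate in enumerate(text.split(), start=1):
        if candidate == word:
            return position
    return 0


# str() stands in for STRIP(PUT(<elementType>, BEST.)), assuming integer types.
slot = findw_e(element_type_list, str(205))
print(slot, flag_list.split()[slot - 1])  # 2 F_B
```

Because both lists come from the same sorted rows, word N of the element-type list lines up with array element N. An element type missing from the list would return 0. In SAS that is an out-of-range subscript for a default 1-based array.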
The DATA step is only a few lines long, which may be a little misleading. Approximately 400 formulas are loaded into <formulaMacroVar> and expanded inline into the DATA step, where they are evaluated in memory for each group. Our data array exists only in memory. It is declared from the macro variable that holds the list of flag variable names, and the FINDW lookup against the element-type list sets the matching element to 1. That is what allows the formulas to work, since each formula looks for a 1 in the elements it uses in its calculation. The array is cleared at the beginning of each BY group (each new <uniqueKey>) in the DATA step's source data set.
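Below is a rough Python model of that per-group flow. It is a sketch of the control flow only, not the SAS code itself. The flag names, element types, and the two formulas are hypothetical. Each formula is modeled as a test on the flag array plus the value it assigns to the calculated field before outputting a row. itertools.groupby plays the role of BY <uniqueKey>, and like BY processing it needs input sorted by key.

```python
from itertools import groupby

FLAG_NAMES = ["F_A", "F_B", "F_C"]     # hypothetical <flagListMacroVar> contents
ELEMENT_TYPES = ["101", "205", "310"]  # hypothetical <elementTypeMacroVar> contents

# Hypothetical formulas: (test on the flag array, value assigned to
# calculatedField when the test is true and a row is output).
FORMULAS = [
    (lambda f: f["F_A"] == 1 and f["F_B"] == 1, 1),
    (lambda f: f["F_C"] == 1, 2),
]


def process(rows):
    """rows: (unique_key, element_type) pairs sorted by unique_key."""
    output = []
    for key, group in groupby(rows, key=lambda r: r[0]):
        # first.<uniqueKey>: blank the in-memory flag array.
        flags = dict.fromkeys(FLAG_NAMES, 0)
        for _, element_type in group:
            # Lookup of the element type's position (0-based here), then set its flag.
            flags[FLAG_NAMES[ELEMENT_TYPES.index(str(element_type))]] = 1
        # last.<uniqueKey>: evaluate every formula; each true one outputs a row.
        for test, value in FORMULAS:
            if test(flags):
                output.append((key, value))
    return output


print(process([(1, 101), (1, 205), (2, 310), (3, 101)]))  # [(1, 1), (2, 2)]
```

Only (key, value) pairs are returned, mirroring the KEEP of <uniqueKey> and <calculatedField>. A group whose flags satisfy no formula produces no output row, like key 3 above.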
The processing is done locally on my workstation:
Processor: Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz, 2394 MHz, 10 cores, 20 logical processors
On this machine, the DATA step processes our source at a rate of about 12.7 million observations per minute.
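To put the reported rate in other units, here is a back-of-the-envelope conversion. It is simple arithmetic on the 12.7 million observations per minute figure, not a measurement. It assumes throughput stays constant regardless of input size. The 100,000,000-observation example reuses the OBS= limit that is commented out on the SET statement.

```python
RATE_PER_MINUTE = 12_700_000  # reported DATA step throughput (observations/minute)


def per_second(rate_per_minute: float) -> float:
    """Convert an observations-per-minute rate to observations per second."""
    return rate_per_minute / 60


def minutes_for(observations: int, rate_per_minute: float = RATE_PER_MINUTE) -> float:
    """Estimated minutes to process a number of observations, assuming a constant rate."""
    return observations / rate_per_minute


print(round(per_second(RATE_PER_MINUTE)))  # 211667
print(round(minutes_for(100_000_000), 2))  # 7.87
```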