Excerpts from above:
%let M1=1;
%let M2=2;
%let M3=3;
%let M4=4;
%let M5=5;
%let vars = a b c d e;
This is what PROC SCORE can do. This example uses the data and macro vars from Alpay's example.
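data coef;
   /* never executed: only copies the attributes of &vars into the PDV */
   if 0 then set new(keep=&vars);
   array _n &vars;
   length _TYPE_ _NAME_ $8;
   retain _TYPE_ 'SCORE' _NAME_ 'RESULT';
   /* load the coefficients from macro variables M1..M5 */
   do i = 1 to dim(_n);
      _n[i] = symgetn(cats('M',i));
   end;
   output;
   stop;
   drop i;
run;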
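For completeness, the coef data set would then be fed to PROC SCORE. The step below is only a sketch of how that might look: the output data set name scored is made up for illustration, and it assumes new holds the numeric variables listed in &vars.

```sas
/* Sketch only: SCORED is a hypothetical output name.          */
/* Assumes NEW contains the numeric variables listed in &vars. */
proc score data=new score=coef out=scored type=SCORE;
   var &vars;   /* a b c d e */
run;
```

With the macro values above, each row of scored should carry a new variable RESULT (named by _NAME_) equal to 1*a + 2*b + 3*c + 4*d + 5*e.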
So, reviewing the data coef datastep:
1) No records are read from set new. It's only used to set the PDV variable attributes for &vars (a b c d e).
2) Create an array for a b c d e.
3) Create additional variables _TYPE_='SCORE' and _NAME_='RESULT'.
4) Loop over the 5-element array, setting a b c d e to &m1, &m2, ..., &m5 respectively.
5) Output the record.
6) Stop the data step (actually not needed, but it makes the code crystal clear).
So, there is only one pass through the data step, and symgetn is called 5 times. No records are read from an input dataset. I gotta agree with Data: where do you think the poor performance would lie? Your statement that symgetn is slow is correct in principle, but it doesn't matter in this instance because of how the data step is constructed. On the other hand, your code:
data _null_;
set sashelp.vcolumn(keep=libname name memname type where=(libname='SASHELP' and memname='CLASS' and type='num')) end=last;
length list $ 4000;
retain list;
list=catx(',',list,cats(name,'*','&m ',_n_));
if last then call symputx('list',list);
run;
will need to process *every column* from *every dataset* in *every allocated library*! And if some of those libraries are RDBMS libraries, it will perform even worse. Even though you're applying a where clause, the sashelp view will not use indexes, and in general it performs badly. Honestly, allocate 20 libraries or so, with say 100 datasets per library and an average of 20 variables per dataset (roughly 40,000 column rows), and see how sashelp.vcolumn performs.
If you're going to use your approach, at least consider running PROC CONTENTS against just the dataset you want, i.e. proc contents data=sashelp.class out=columns, and then apply your approach to the columns dataset. But of the two approaches shown, yours and Data's, I believe Data's will perform much faster.
Regards, Scott
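For reference, the PROC CONTENTS variant suggested above might look something like the following. It is only a sketch: the sort by varnum is my addition to keep the variables in dataset order, and note that the OUT= dataset codes TYPE as a number (1 = numeric, 2 = character) rather than 'num'/'char' as in sashelp.vcolumn.

```sas
/* Sketch only: get column metadata for one dataset instead of scanning sashelp.vcolumn */
proc contents data=sashelp.class out=columns(keep=name type varnum) noprint;
run;

/* keep the variables in their original dataset order */
proc sort data=columns;
   by varnum;
run;

data _null_;
   set columns(where=(type=1)) end=last;   /* type=1 means numeric here */
   length list $ 4000;
   retain list;
   /* single quotes keep &m unresolved until &list is used later */
   list=catx(',',list,cats(name,'*','&m',_n_));
   if last then call symputx('list',list);
run;
```

This only ever touches the metadata of sashelp.class, so its cost should not grow with the number of allocated libraries.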