Hello.
I have a gradient boosting model (PROC TREEBOOST) and want to get a list of all the variables used in it. I currently have "IMPORTANCE NVARS=40", but is there a way to set this to ALL or MAX so that the same code will work on any data set? Or do I need to count the variables in the data set and put that number into a macro variable?
Here is the code:
PROC TREEBOOST data=&abt._
    CATEGORICALBINS=&CATEGORICALBINS.
    INTERVALBINS=&INTERVALBINS.
    EXHAUSTIVE=&EXHAUSTIVE.
    INTERVALDECIMALS=&INTERVALDECIMALS.
    LEAFSIZE=&LEAFSIZE.
    MAXBRANCHES=&MAXBRANCHES.
    ITERATIONS=&ITERATIONS.
    MINCATSIZE=&MINCATSIZE.
    MISSING=&MISSING.
    LEAFFRACTION=&LEAFFRACTION.
    SEED=12345
    SHRINKAGE=&SHRINKAGE.
    SPLITSIZE=&SPLITSIZE.
    TRAINPROPORTION=&TRAINPROPORTION.;
  INPUT &charvars. &flgvars. / level=nominal;
  INPUT &numvars. / level=interval;
  TARGET target_flg / level=binary;
  IMPORTANCE NVARS=40 OUTFIT=imp_VARS out=imp20;
  SUBSERIES BEST;
  CODE FILE="&save_path.\&save_file.";
  SAVE MODEL=GBoost_Test FIT=fit IMPORTANCE=imp RULES=rules;
RUN;
It doesn't look like NVARS= accepts a value such as ALL, but you could either set it to a very high value or, as you suggested, use a macro variable to hold the number of inputs, something like:
/* Count the nominal and interval inputs held in the macro variable lists */
%let nclass = %sysfunc(countw(&charvars. &flgvars.));
%let ninterval = %sysfunc(countw(&numvars.));
%let totnvars = %eval(&nclass + &ninterval);
proc treeboost...
importance nvars=&totnvars...;
...
run;