Hello.
I have a gradient boosting tree model and I want to get a list of all the variables used in it. I currently have "IMPORTANCE NVARS=40", but is there a way to set this to ALL or MAX so that the same code will work on any dataset? Or do I need to count the variables in the data set and put that number into a macro variable?
Here is the code:
PROC TREEBOOST data=&abt._
    CATEGORICALBINS=&CATEGORICALBINS.
    INTERVALBINS=&INTERVALBINS.
    EXHAUSTIVE=&EXHAUSTIVE.
    INTERVALDECIMALS=&INTERVALDECIMALS.
    LEAFSIZE=&LEAFSIZE.
    MAXBRANCHES=&MAXBRANCHES.
    ITERATIONS=&ITERATIONS.
    MINCATSIZE=&MINCATSIZE.
    MISSING=&MISSING.
    LEAFFRACTION=&LEAFFRACTION.
    SEED=12345
    SHRINKAGE=&SHRINKAGE.
    SPLITSIZE=&SPLITSIZE.
    TRAINPROPORTION=&TRAINPROPORTION.
    ;
  INPUT &charvars. &flgvars. / level=nominal;
  INPUT &numvars. / level=interval;
  TARGET target_flg / level=binary;
  IMPORTANCE NVARS=40 OUTFIT=imp_VARS OUT=imp20;
  SUBSERIES BEST;
  CODE FILE="&save_path.\&save_file.";
  SAVE MODEL=GBoost_Test FIT=fit IMPORTANCE=imp RULES=rules;
RUN;
It doesn't look like NVARS has an option like "ALL", but you could either set it to a very high value or, as you suggested, use a macro variable that holds the number of inputs, something like:
%let nclass = %sysfunc(countw(&charvars. &flgvars.));
%let ninterval = %sysfunc(countw(&numvars.));
%let totnvars = %eval(&nclass + &ninterval);
proc treeboost...
importance nvars=&totnvars...;
...
run;
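One caveat with the snippet above: if one of the lists is empty (for example, a dataset with no interval inputs), %sysfunc(countw()) is called with no text to count and can raise an error. A minimal sketch of a guard follows. The macro name count_inputs is hypothetical and not part of SAS or of the original code, and the sketch assumes the variable lists are space-delimited with no commas.

```sas
/* Hypothetical helper: returns the number of space-delimited words in a list, */
/* or 0 when the list is empty. Assumes the list contains no commas.           */
%macro count_inputs(list);
  %if %length(&list) = 0 %then 0;
  %else %sysfunc(countw(&list, %str( )));
%mend count_inputs;

/* Same total as before, but an empty list contributes 0 instead of failing. */
%let nclass    = %count_inputs(&charvars. &flgvars.);
%let ninterval = %count_inputs(&numvars.);
%let totnvars  = %eval(&nclass + &ninterval);

%put NOTE: totnvars=&totnvars;
```

You can then use nvars=&totnvars on the IMPORTANCE statement exactly as shown above.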