jeff_b
Fluorite | Level 6

Hello,

I am trying to obtain a variable importance ranking plot from the proc hpforest procedure in SAS.  Here is my code:

 

proc hpforest data=mydata maxtrees=1000 vars_to_try=10 seed=2021
trainfraction=0.7 maxdepth=50 leafsize=6 alpha=0.5 ;
target outcome /level=binary;
input variable_list  / level = interval;
ods output VariableImportance = Variable_Importance;
run;

 

Then I use PROC SGPLOT to plot this output, but I am not getting the plot I want: the values are not in ascending order. I would also like the plot to show the variable labels instead of the variable names, since the names make the plot hard to read.

 

proc sgplot data = Variable_Importance;

title "The Variable of Importance Plot";
series x=Gini y=Variable /lineattrs=(pattern=shortdash color=royalblue thickness=3) legendlabel='Train Gini' GROUPORDER=ASCENDING ;
xaxis label='Train Gini' display=ALL;
yaxis label='Variable' display=ALL;
run;

The plot I want, except that I also want labels instead of the variable names:

jeff_b_0-1615318672599.png

 

The plot I am getting:

jeff_b_1-1615318714238.png

 

 

Accepted Solutions
jeff_b
Fluorite | Level 6

Hi @ballardw !

Thank you for the help.

 

With some modification, the code you provided worked. I had to create another variable called "rank" from the variable importance output and merge it back into that dataset. I used "rank" in the sort, since I didn't have the Gini variable in the dataset.

 

data ranking;
length Variable $100;
INPUT rank Variable $;
CARDS;
1 high_PA_baseline
2 early_weight_loss

....

;
RUN;

 

* Sort both data sets by the merge key;
proc sort data=Variable_Importance; by Variable; run;
proc sort data=ranking; by Variable; run;
* Merge, keeping only variables present in both data sets;
data plot;
length Variable $100;
merge Variable_Importance(in=x) ranking(in=y);
by Variable;
if x and y;
run;

 

proc sgplot data=plot;
scatter x=AAEOOB y=Variable ;
run;

 

proc sort data=plot out=plot2;
by AAEOOB descending rank;
run;

 

proc sgplot data=plot2;
scatter x=AAEOOB y=Variable ;
run;
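
The hand-typed ranking data set might be avoidable. Here is a minimal sketch, assuming the rows of the ODS output data set are still in the order PROC HPFOREST printed them, so that the observation number can serve as the rank. The data set names ranked and plot3 are placeholders, not from the thread.

```sas
/* Assumption: Variable_Importance keeps the procedure's printed row order. */
data ranked;
   set Variable_Importance;
   rank = _n_;          /* 1 = first row of the importance table */
run;

/* Same sort keys as above, now using the derived rank. */
proc sort data=ranked out=plot3;
   by AAEOOB descending rank;
run;

proc sgplot data=plot3;
   scatter x=AAEOOB y=Variable;
   yaxis discreteorder=data;   /* request data order on the discrete axis */
run;
```

If the axis comes out upside down, adding REVERSE to the YAXIS statement should flip it.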


4 REPLIES
ballardw
Super User

Some example data used for the plot would be nice.

Then we wouldn't have to ask such things as

What variable holds the label text you want to display on the axis? (you may need to go back and add this)

If you have a variable holding the label text for each variable, use that as the axis variable. Otherwise, your best approach is to create a custom character format that maps each variable name (the start value) to its variable label (the display value), and use it with the VALUESFORMAT option of the YAXIS statement.
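
A rough sketch of the custom-format idea, under a few assumptions: the model inputs live in a data set named mydata (as in the question), the names in the importance table match the PROC CONTENTS names in case, and the format name varlbl and the data sets var_labels and label_fmt are made up for illustration. This version attaches the format with a FORMAT statement, which is a common way to relabel a discrete axis; VALUESFORMAT= on the YAXIS statement could be tried instead.

```sas
/* Pull variable names and labels from the training data (assumed: mydata). */
proc contents data=mydata noprint
              out=var_labels(keep=name label);
run;

/* Build a CNTLIN data set: START = variable name, LABEL = display text. */
data label_fmt;
   set var_labels(rename=(name=start));
   retain fmtname 'varlbl' type 'C';       /* hypothetical format name */
   if missing(label) then label = start;   /* fall back to the name */
run;

proc format cntlin=label_fmt;
run;

/* Show labels instead of names on the Y axis. */
proc sgplot data=Variable_Importance;
   format Variable $varlbl.;
   scatter x=AAEOOB y=Variable;
   yaxis label='Variable';
run;
```

If the names differ in case between the two data sets, applying UPCASE to both before building the format would be one way to make them match.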

What is the sort order of the variables used for the x and y coordinates? Since Variable is character, there is no natural order for Series (or possibly Scatter) plots to use, so the order of the x-y pairs needs to be set by sorting on x and a second variable so that the text appears in the proper order.

 

The documentation on the GROUPORDER option for the Series statement is pretty clear:

Interaction: This option is ignored unless GROUP= is specified.

Your existing code does not supply a GROUP variable:

proc sgplot data = Variable_Importance;

title "The Variable of Importance Plot";
series x=Gini y=Variable /lineattrs=(pattern=shortdash color=royalblue thickness=3) 
     legendlabel='Train Gini' 
    GROUPORDER=ASCENDING
 ;
xaxis label='Train Gini' display=ALL;
yaxis label='Variable' display=ALL;
run;

 

jeff_b
Fluorite | Level 6

Hi @ballardw ,

Thank you for the reply to my question!

 

The variable importance output from PROC HPFOREST, sorted by AAEOOB:

jeff_b_2-1615394363479.png

Yes, I tried creating a label variable from the PROC CONTENTS output and merging it BY VARIABLE with the variable importance output. However, since the label variable is a character variable, I could not use it on the Y axis in PROC SGPLOT. I will try the VALUESFORMAT option. Thanks for the suggestion.

For sorting, I used the AAEOOB variable, which is the out-of-bag absolute error. I think sorting by this or any other variable in the variable importance output is incorrect, because PROC HPFOREST uses its own method to rank the variables (I guess by dividing the mean error rate by the SD, or something like that), as in the figure below. When I sort the output by any variable (MSE, OOB MSE, AE or AAEOOB), the order of the variables changes and the plot doesn't look the way it is supposed to. I am not sure why PROC HPFOREST doesn't produce variable importance plots itself. I guess I have to use R to get the plot, but I wanted to see whether I can get one in SAS, too. Thank you for any suggestions!

The variable importance output from PROC HPFOREST:

jeff_b_1-1615393694173.png

 

 
 
ballardw
Super User

Since your sort order example doesn't include the Gini variable used for the X axis, it is hard to tell what you actually need. It likely means that the data needs a different sort order. I would be tempted to sort by Gini descending <other variable>, where the other variable places the variable name you want at the top.

 

 

 

A brief example with a data set you should have available:

proc sgplot data=sashelp.class;
   scatter x=height y=name ;
run;

proc sort data=sashelp.class
       out=work.class;
   by descending height;
run;

proc sgplot data=work.class;
   scatter x=height y=name ;
run;

When you have a character variable on an axis, there isn't a "natural" order. The order of the observations in the data set, which is controlled by the values of something else, can affect the order in which the values appear on the axis. This depends to some extent on the type of graph; in some, you can order by a result calculated by SGPLOT.
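
As one illustration of a graph type that can order categories by a computed result, here is a hedged sketch using the DOT statement with CATEGORYORDER=, which should be available in recent SAS 9.4 releases. The second step assumes the importance table has one row per variable, so the summary statistic is just that row's value.

```sas
/* Runnable with the shipped SASHELP.CLASS data set. */
proc sgplot data=sashelp.class;
   dot name / response=height categoryorder=respdesc;
run;

/* Assumption: one row per variable in Variable_Importance. */
proc sgplot data=Variable_Importance;
   dot Variable / response=AAEOOB categoryorder=respdesc;
   xaxis label='AAEOOB';
   yaxis label='Variable';
run;
```

With this approach no separate PROC SORT step is needed, because SGPLOT orders the categories by the response value itself.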

 

 
