sarafrass
Fluorite | Level 6

Good morning, sorry for the bother. 

I'm running an analysis in SAS and, using the open-source code with R, I plotted the Mean-Squared Error of the forest.

I would like to understand what the red, green, and black lines mean. Could you please help me?


# Note that a few lines of Python or R code are added before your code; for example:
# R:
dm_class_input <- c("class_var_1", "class_var_2")
dm_interval_input <- c("numeric_var_1", "numeric_var_2")

library(randomForest)

# Fit RandomForest model w/ training data
dm_model <- randomForest(dm_model_formula, ntree=200, data=dm_traindf, importance=TRUE)

# Save MSE plot to PNG
png("rpt_forestMsePlot.png")
plot(dm_model, main='randomForest Mean-Squared Error Plot')
dev.off()

# Save VariableImportance to CSV
write.csv(importance(dm_model), file="rpt_forestIMP.csv", row.names=TRUE)

# Score full data
dm_scoreddf <- data.frame(predict(dm_model, dm_inputdf, type="prob"))
colnames(dm_scoreddf) <- c("P_Dec_Tree", "P_Dec_Tree1")

ACCEPTED SOLUTION
Rick_SAS
SAS Super FREQ

Too bad the package writer didn't include a legend!

 

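See this Cross Validated thread, which asks a similar question:

Random Forest in R: intepret the plot - Cross Validated (stackexchange.com)

One of the lines is the out-of-bag (OOB) error rate, which is equal to dm_model$err.rate[,1]. It should be the black line, because plot() draws the columns of err.rate in order with R's default colors and the OOB column comes first, but you can verify by plotting that quantity in a separate graph. According to the thread, the red and green lines are the error rates for the two levels (classes) of your target variable. Since this is a classification forest, all three lines are misclassification rates rather than mean-squared errors, despite the plot title. If you run head(dm_model$err.rate) you can see the actual values and the column names, and match the class levels to the colors in your graph.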
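To make the matching explicit, here is a minimal sketch that adds the missing legend. It is not the original poster's data: the `toy` data frame (two classes taken from the built-in iris data) and the model name `fit` are stand-ins I made up for dm_traindf and dm_model. It assumes plot() assigns colors and line types 1, 2, 3, ... to the columns of err.rate in order, which is what the default matplot() settings do.

```r
library(randomForest)

# Hypothetical two-class data standing in for dm_traindf
set.seed(1)
toy <- droplevels(iris[iris$Species != "setosa", ])
fit <- randomForest(Species ~ ., data = toy, ntree = 200)

# First column is the OOB error; the remaining columns are the
# per-class error rates, one per level of the target
print(colnames(fit$err.rate))
print(head(fit$err.rate))

# Redraw the error plot and add a legend, assuming the default
# colors and line types are assigned to the columns in order
n_lines <- ncol(fit$err.rate)
plot(fit, main = "randomForest error rates")
legend("topright",
       legend = colnames(fit$err.rate),
       col = seq_len(n_lines),
       lty = seq_len(n_lines))

# Separate check: the OOB column on its own, to compare with the black line
plot(fit$err.rate[, "OOB"], type = "l",
     xlab = "trees", ylab = "OOB error")
```

With your own model, replace fit with dm_model; the legend labels will then be "OOB" followed by the levels of your target.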

sarafrass
Fluorite | Level 6

Thank you so much @Rick_SAS !

I solved it by doing as you suggested and printing the head.

 

Thank you again!

Sara

 
