sarafrass
Fluorite | Level 6

Good morning, sorry for the bother. 

I'm performing an analysis in SAS, and I used the open source code with R to plot the Mean-Squared Error for the forest.

I would like to understand what the red, green, and black lines mean. Could you please help me?


# Note that a few lines of Python or R code are added before your code; for example:
# R:
dm_class_input <- c("class_var_1", "class_var_2")
dm_interval_input <- c("numeric_var_1", "numeric_var_2")

library(randomForest)

# Fit RandomForest model w/ training data
dm_model <- randomForest(dm_model_formula, ntree=200, data=dm_traindf, importance=TRUE)

# Save MSE plot to PNG
png("rpt_forestMsePlot.png")
plot(dm_model, main='randomForest Mean-Squared Error Plot')
dev.off()

# Save VariableImportance to CSV
write.csv(importance(dm_model), file="rpt_forestIMP.csv", row.names=TRUE)

# Score full data
dm_scoreddf <- data.frame(predict(dm_model, dm_inputdf, type="prob"))
colnames(dm_scoreddf) <- c("P_Dec_Tree", "P_Dec_Tree1")

Accepted Solution
Rick_SAS
SAS Super FREQ

Too bad the package writer didn't include a legend!

 

See this CrossValidated article, which is a similar question:

Random Forest in R: intepret the plot - Cross Validated (stackexchange.com)

 

One of the lines is the out-of-bag (OOB) error rate, which is equal to dm_model$err.rate[,1]. I think it is the black line, but you can verify by plotting that quantity in a separate graph. According to the article, the red and green lines are the error rates for the two levels of the target factor in your data. If you run head(dm_model$err.rate), you can see the actual values and match the factor levels to the colors in your graph.
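
As a rough sketch of that check, the snippet below fits a small two-class forest, prints the column names of err.rate, and redraws the plot with a legend. It is not the SAS-generated code: the iris subset and the names train, fit, err, and line_cols are stand-ins I made up for dm_traindf and dm_model. It also assumes that plot() for a randomForest object draws the err.rate columns in order with the default matplot colors and line types (1, 2, 3, ...), which is why the legend uses the same sequence.

```r
library(randomForest)

# Hypothetical stand-in for dm_traindf: a two-class subset of iris
set.seed(42)
train <- droplevels(subset(iris, Species != "setosa"))

# Classification forest, same ntree as in the question
fit <- randomForest(Species ~ ., data = train, ntree = 200)

# err.rate has one row per tree; the first column is the OOB error,
# the remaining columns are the per-class error rates
err <- fit$err.rate
print(colnames(err))
print(head(err))

# Redraw the plot and add the missing legend.
# Assumption: columns are drawn in order with colors/line types 1, 2, 3, ...
line_cols <- seq_len(ncol(err))
plot(fit, main = "randomForest error rates")
legend("topright", legend = colnames(err), col = line_cols, lty = line_cols)
```

With your own model, replace fit with dm_model. The legend labels come straight from colnames(dm_model$err.rate), so they show your actual target levels next to each color.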

sarafrass
Fluorite | Level 6

Thank you so much @Rick_SAS !

I solved it by doing as you suggested and plotting the head.

 

Thank you again!

Sara