About Zachary

Zachary · 09-15-2015

I am getting varied results when I vary the Number of vars to consider in split search property in my HP Forest node. I vaguely recall the definition of the default here, but I cannot find that documentation again. Thank you.
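For comparison outside the node, a minimal PROC HPFOREST sketch is below. It assumes that the node's "Number of vars to consider in split search" property corresponds to the procedure's VARS_TO_TRY= option. My understanding is that VARS_TO_TRY= defaults to the square root of the number of input variables (about 33 for 1,117 inputs), but that should be confirmed against the HPFOREST documentation for your release. TRAIN, FLAG, and X1-X1117 are hypothetical names. SEED= is fixed so that reruns with identical settings are reproducible, which helps separate the effect of changing the parameter from ordinary run-to-run randomness.

```sas
/* Hypothetical names: TRAIN (training data), FLAG (0/1 target),   */
/* X1-X1117 (interval inputs).                                     */
/* VARS_TO_TRY= sets how many randomly chosen inputs are candidates */
/* at each split; if omitted, the default is (as I understand it)   */
/* sqrt(number of inputs), roughly 33 for 1,117 inputs.             */
/* SEED= makes repeated runs with the same settings reproducible.   */
proc hpforest data=train maxtrees=100 vars_to_try=33 seed=12345;
  target FLAG / level=binary;
  input X1-X1117 / level=interval;
run;
```

Rerunning with the same SEED= and changing only VARS_TO_TRY= shows how much of the variation comes from that parameter alone.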

Zachary · 09-11-2015

Thank you very much, again. Below is a list of the various issues we are up against, but first I thought I would talk a little about the error.

Regarding ULTIMATE_LITIGATION and ULTIMATE_RTW: these were originally set to Drop in the data. What I did was make sure they were explicitly Rejected as well, and that prevented the error from happening. I would think Drop and Rejected would be synonymous, but...

Regarding the HP Forest node itself: does it use some form of bootstrapping, which would explain the varying results? I am a little worried that my cutoff results may be different after running it a second time. Then again, I have it creating a maximum of 100 trees, so theoretically it should converge.

The Report.pdf file: am I correct in assuming that the scoring is developed solely from the Training data?

Ultimately I need to come up with a set of scoring rules to submit to our IT department so that they can score the model outside of SAS. Originally I had 1,117 variables, and the file says it selected 1,093. That is probably just too many. Honestly, I am trying to:
(1) Come up with a decent model.
(2) Possibly select a subset of the predictors that comes close to reproducing the final model. An analogy would be using discriminant analysis to predict segments that I originally generated on a much fuller set of data.
1,000+ variables is just too many for my IT department to work with.

The Selected Variable Importance table does not list the names of all the variables, but I think their details appear in alphabetical order below it. I can use that information, but how does it rank the variables in terms of their importance? Ideally I could say that the top 20 or 30 variables are "good enough" for approximating the overall HP Forest model. I hope that makes sense. Is there enough information in here to converge on the solution?

I am also going to experiment with limiting my variable splits to 2 rather than the larger number that happens by default. Am I correct that this is what the Max Categories in Split Search property controls? I can set it to 2, down from the default of 30, but it says it only applies to nominal variables.

Lastly, I am assuming that the Scorecard Points provide the information our programmers need to program all of this into our system? What do they mean specifically? Could you provide an example of how this works? And how did the Reporter node come up with this single tree?

I apologize for all of the questions, but I guess I am back to being a little scared about the utility of HP Forests. I like the stability, and the solution is good in terms of my hit rates and false positives. Now I just need to see how it will be implemented in reality, and whether working with a basic subset of variables will be close enough to get us where we need to go. Thank you again, as usual.

Zachary · 09-10-2015

One more question: I am intrigued by your point that there is a tradeoff between interpretability and other factors. Can records be scored with an HP Forest model? I ask because our IT department will ultimately need to build algorithms outside of SAS to score each model. Without that capability, I cannot really use the HP Forest model. Thank you.

Zachary · 09-10-2015

Thank you very much for that solution. Unfortunately, I get errors when I run it. I am attaching a picture of my workspace as well as a log of the errors. Do you know how to diagnose what is going wrong?

Zachary · 09-09-2015

I am using Enterprise Miner to build models that predict a dichotomous criterion, and I am using the HP Forest node as a way of simulating multiple decision tree runs.

First, I am looking for an easy way to summarize the results of the decision trees. Any ideas or references here?

I also adjusted my cutoff to the point where event precision and recall are equal, which is at 0.21, even though 10% of my data has the criterion (the outcome). At this cutoff my hit rate (true positive rate) is about 50% with a false positive rate of about 5%. That hit rate is lower than I expected; I was thinking it would be in the 70s. If I move the cutoff closer to 0.10 (down from 0.21, matching the base rate of my criterion), my hit rate is 95% but my false positive rate rises to 41%!

Another approach is to choose the cutoff based on the hit rate I would consider decent. On that basis I can get to around 70% using a 0.15 cutoff, although the false positive rate climbs to about 14%. So if I had to guess, my cutoff should be somewhere between 0.10 and 0.21.

Should the cutoff be chosen based on the hit rate, the false positive rate, and the classification rate? WWSCMD [What Would SAS Community Members Do]? Thank you very much in advance.
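One way to make this tradeoff concrete is to tabulate the hit rate and false positive rate at every candidate cutoff from 0.10 to 0.21 and read them side by side. This is a minimal sketch outside the Cutoff node, not a description of what the node does internally. It assumes a scored data set SCORED with a numeric 0/1 target FLAG and a predicted event probability P_FLAG1; all three names are hypothetical, patterned on Enterprise Miner's P_ naming.

```sas
/* Candidate cutoffs 0.10, 0.11, ..., 0.21. An integer loop avoids */
/* floating-point drift that a BY 0.01 increment could introduce.  */
data cutoffs;
  do i = 10 to 21;
    cutoff = i / 100;
    output;
  end;
  drop i;
run;

/* Hypothetical names: SCORED, FLAG (0/1), P_FLAG1 = P(FLAG = 1).    */
/* Hit rate        = share of actual 1s scored at or above cutoff.  */
/* False pos. rate = share of actual 0s scored at or above cutoff.  */
/* In PROC SQL a comparison evaluates to 1 or 0, so SUM() counts.   */
proc sql;
  create table rates as
  select c.cutoff,
         sum(s.FLAG = 1 and s.P_FLAG1 >= c.cutoff) / sum(s.FLAG = 1)
           as hit_rate format=percent8.1,
         sum(s.FLAG = 0 and s.P_FLAG1 >= c.cutoff) / sum(s.FLAG = 0)
           as false_pos_rate format=percent8.1
  from cutoffs as c, scored as s
  group by c.cutoff
  order by c.cutoff;
quit;

proc print data=rates noobs;
run;
```

Scanning RATES then shows, for instance, the lowest cutoff whose false positive rate is still acceptable, or the highest cutoff that keeps the hit rate near 70%. Which row to pick remains a judgment about the relative cost of missed events versus false alarms.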

Zachary · 08-26-2015

I think I need the SAS Code node to accomplish this, but I am not sure. I am running a Decision Tree on a data set of about 10,000 records. The dependent variable is nominal (dichotomous), and I expect about 90% of the records to be predicted as 0 and the other 10% as 1. I then want to add a second Decision Tree to the project. That one will use a different, continuous dependent variable, and it will be fit only on the cases that the first Decision Tree predicts to be 0 (again, around 9,000 cases). It sounds kind of basic, but kind of not. Either way, can anyone suggest the best way to do this in Enterprise Miner? Thank you. Much appreciated.

FOLLOW-UP: Actually, the TwoStage node seems like it will suit my needs well. However, I am modeling two different dependent (response) variables, one categorical and one continuous, so perhaps it is not usable.
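Outside of the TwoStage node, the subsetting step can be sketched in Base SAS: score the data with the first tree, keep only the records it classifies as 0, and fit the continuous-target tree on that subset. SCORED1, P_FLAG1, and the macro variable CUTOFF are hypothetical names, and 0.5 is only a placeholder for whatever cutoff is chosen for the first tree.

```sas
/* Hypothetical names: SCORED1 = scored output of the first tree, */
/* P_FLAG1 = its predicted probability that FLAG = 1.             */
%let cutoff = 0.5;  /* placeholder: use the cutoff chosen for tree 1 */

/* Keep only records the first tree classifies as 0; these feed   */
/* the second, continuous-target tree.                             */
data stage2;
  set scored1;
  if P_FLAG1 < &cutoff;
run;
```

STAGE2 can then serve as the input data for the second Decision Tree, with the continuous variable as its target. Records with a missing P_FLAG1 would also pass the subsetting IF, because SAS treats a missing value as smaller than any number, so it may be worth checking for them first.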

Zachary · 08-26-2015

That is perfect! Thank you so much!

Zachary · 08-25-2015

Thanks. Feel free to send along the code to accomplish this, if you wish. I have never even remotely tried to pull something like that off.

Zachary · 08-25-2015

Thank you very much, Dan. Unfortunately, my data is a bit on the side of being "super skewed," so I use the following code to graph it after breaking the full distribution into 100 bins. The graph looks much healthier. Any suggestions on how to add markers that denote particular breaks in the distribution?

    /* assign each record to one of 100 rank groups (0-99) */
    proc rank data = CLAIMS_DATA out = CLAIMS_DATA groups = 100;
      var TRENDED_INCURRED;
      ranks B_TRENDED_INCURRED;
    run;

    /* capture the mean of each bin from the PROC MEANS Summary table */
    ods output Summary = STATS;
    proc means data = CLAIMS_DATA mean;
      var TRENDED_INCURRED;
      class B_TRENDED_INCURRED;
    run;
    ods output close;

    /* plot the bin means as a line */
    proc sgplot data = STATS;
      series x = B_TRENDED_INCURRED y = TRENDED_INCURRED_Mean;
    run;
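One way to add markers to this plot is the REFLINE statement in PROC SGPLOT, which draws labeled horizontal or vertical reference lines. This is a minimal sketch, assuming the STATS data set produced by the PROC MEANS step in this post. The values 20,000 and 100,000 are illustrative only (borrowed from the RESERVE question) and should be replaced with the breaks of interest. The MARKERS option on SERIES also draws a symbol at each of the 100 bin means.

```sas
/* Assumes STATS from the PROC MEANS step above.                  */
/* The 20,000 and 100,000 reference values are illustrative only. */
proc sgplot data=STATS;
  /* one point per percentile bin, joined by a line */
  series x=B_TRENDED_INCURRED y=TRENDED_INCURRED_Mean / markers;
  /* dashed horizontal lines where the bin mean reaches each value */
  refline 20000 100000 / axis=y lineattrs=(pattern=dash)
                         label=("20,000" "100,000");
  xaxis label="Percentile bin (0 = lowest)";
  yaxis label="Mean TRENDED_INCURRED";
run;
```

To mark particular bins instead (for example the 50th or 90th percentile group), the same REFLINE statement with AXIS=X and bin numbers as values draws vertical lines.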

Zachary · 08-25-2015

I have been searching for a long time for code to do this, but I have not found it yet. I have n = 20,000 cases, already sorted by RESERVE from low to high. Basically, I would like to see how all 20,000 records are distributed, with either the count or the percentage on the y-axis and no other variable: sort of a very big histogram turned into a line graph. Then, to add a little complexity, I would like to mark a few points on the graph. My RESERVE variable currently ranges from 100 to about 5 million, and it would be nice to show where 20,000 or 100,000 falls in the distribution. Thank you very much in advance.
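A cumulative-percentage curve is one way to get a "histogram as a line" that uses every record; note that this swaps in an empirical cumulative distribution rather than binned counts. A minimal sketch, assuming the records are in a data set named HAVE (a hypothetical name) already sorted by RESERVE, as described in the post. REFLINE marks 20,000 and 100,000 on the x-axis, and a log axis spreads out a range of 100 to 5 million.

```sas
/* Hypothetical input: WORK.HAVE, already sorted by RESERVE. */
data reserve_cdf;
  set have nobs=total;
  /* percent of all records up to and including this row */
  cum_pct = 100 * _n_ / total;
run;

proc sgplot data=reserve_cdf;
  series x=RESERVE y=cum_pct;
  /* dashed vertical markers at the RESERVE values of interest */
  refline 20000 100000 / axis=x lineattrs=(pattern=dash)
                         label=("20,000" "100,000");
  /* RESERVE spans about 100 to 5 million, so use a log axis */
  xaxis type=log label="RESERVE";
  yaxis label="Cumulative percent of records";
run;
```

The height at which each dashed line crosses the curve is roughly the percentage of records with RESERVE at or below that value.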

Zachary · 08-20-2015

What is the proper and cleanest way to rename an Enterprise Miner project? I will admit that I have gotten sloppy in the past renaming things in Windows and paid the price later. The reason I ask is simply that I titled my newest project "Test1-" not thinking it was going anywhere, but now I wish to make it the main project for my office. Thank you very much.

As a follow-up: I tried to rename it within SAS Management Console, specifically under the Application Management directory and, within that, Projects. I renamed it from Test1 to Test2. Then I went into Enterprise Miner and clicked on Recent Projects. It did not list Test2, but it did list Test1. When I try to open it, it says it cannot find the files and asks whether I want to remove the directory. Of course I said no. Then I tried opening a new project. There it found Test2, but when I tried to open it I got a similar message saying the files were not found. So I am sort of back to square one. Any thoughts or suggestions are welcome.

Zachary · 08-14-2015

I am using the Save Data node in Enterprise Miner. Both the P_DV and V_DV variables are created. P_DV is the predicted value of the dependent variable, and V_DV is the validation value of the dependent variable, although I am not sure why the two would differ on a record-by-record basis. But what is the R_DV variable? Thank you.

Zachary · 08-14-2015

Thanks a bunch. I will look into whether my comments were more the result of panic. Actually, thank you for the link. I now have numerous options for making my data meaningful! Thank you so much!

Zachary · 08-14-2015

Thank you very much, Chris. I love the Dr. AnnMaria blog as well and will be visiting it occasionally; she is very funny. I THINK I have figured out how to use Edit Using SAS Code effectively in the Edit Variables window. Below are some suggestions:
(1) I do not think the window likes comments, whether written as *; or /* */. Nor do I think it wants blank lines.
(2) This might be a place in SAS where it is important to capitalize key words. If I set my Level to Nominal, it actually needs to be NOMINAL, even though it does not show in all caps in the window.
(3) I need to experiment more with this one, but I once thought it was important to have a run; after a set of code. I am not so sure anymore.
They are not much, but following those steps seems to help the process move along in an automated way. Anything else I might have missed?

Zachary · 08-12-2015

I prepare my data in Enterprise Guide and then save it to a directory to be read into Enterprise Miner later. There are a lot of variables for which I would like to set various parameters. Below are a few examples:

    * if Role = "TIMEID" then Level = "INTERVAL";
    Role = "INPUT";
    if Name = "TRENDED_INCURRED" then Role = "TARGET";
    * Making all of the Text Flag fields Nominal;
    TEXT_ID = index(Name, "Flag");
    if TEXT_ID = 1 then Level = "Nominal";
    * Dropping all of the Date variables;
    if Role = "TIMEID" then Drop = "Y";
    * Changing the following vars to Nominal - some ORDINAL sprinkled in as well for HAZARDC0DE;
    if Name = MAXTOTALCODE then Level = Nominal;
    if Name = DOCTORS then Level = Nominal;

You can see that I commented out some of my statements as well; I do not know whether Edit Using SAS Code likes this. If I recall correctly, Enterprise Miner can be a little quirky and sometimes does not like blank lines or things like that. While I am in the window editing the SAS code, everything looks fine. But when I click OK and re-open my SAS data file in Miner, it shows a lot of odd changes, or many settings that are just left blank. Is anything wrong with my version of Miner?

But my big question is: can I set the variable Levels, Roles, and Drop statuses in Enterprise Guide before the data is read into Enterprise Miner? Thank you very much in advance.
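A hedged rewrite of the statements above is sketched below. It applies the observations from the 08-14-2015 post (no comments, no blank lines, uppercase values) and assumes the metadata columns NAME, ROLE, LEVEL, and DROP that the original code already uses. Three other changes are deliberate. First, the date check runs before ROLE is reset to INPUT; in the original, the unconditional Role = "INPUT" means Role can never equal "TIMEID" afterwards, so the date variables are never dropped. Second, MAXTOTALCODE and DOCTORS are quoted; unquoted, they are read as variable names rather than text, which may account for some of the blank or odd columns. Third, the helper TEXT_ID is folded into the IF condition so that it does not become an extra column. Matching on upcase(NAME) and "FLAG" is an assumption about how the flag variables are named. The block itself carries no comments because the window reportedly rejects them.

```sas
if ROLE = "TIMEID" then DROP = "Y";
else ROLE = "INPUT";
if NAME = "TRENDED_INCURRED" then ROLE = "TARGET";
if index(upcase(NAME), "FLAG") = 1 then LEVEL = "NOMINAL";
if NAME in ("MAXTOTALCODE", "DOCTORS") then LEVEL = "NOMINAL";
```

Because each statement only touches rows matching its condition, the order matters only for the first two lines, where dates must be identified before every other role is set to INPUT.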

Date Last Visited: 09-15-2015 07:13 PM

Number vars to consider in split search - What is the default for the ...

Re: Enterprise Miner Cutoff Node & Interpretation

Re: Enterprise Miner Cutoff Node & Interpretation

Re: Enterprise Miner Cutoff Node & Interpretation

Enterprise Miner Cutoff Node & Interpretation

Subset Decision Trees Within Enterprise Miner

Re: How to Properly Rename an Enterprise Miner Project

Re: Turn Histogram into a Line Graph - With Markers

Re: Turn Histogram into a Line Graph - With Markers

Turn Histogram into a Line Graph - With Markers

Re: Turn Histogram into a Line Graph - With Markers

Re: Word list search

Re: After SAS Code Node Variables Disappear- Rather, Never Appear

Re: Best Way to Finalize a Model Using 100% Data After 80/20 Training/...

Re: Is SAS Deployment the Correct Forum for Co-Worker Not Able to Conn...

Enterprise Miner Cutoff Node & Interpretation

Re: Turn Histogram into a Line Graph - With Markers

Re: Edit Using SAS Code Seems to Not Work

Looking for Precision Up to Maybe 8 Decimal Places in proc rank

EM Decision Trees - Stratification or Not - Validation or Test?

How to Properly Rename an Enterprise Miner Project

Save Data Node in Enterprise Miner - "R_" Column

Re: Edit Using SAS Code Seems to Not Work

Re: Edit Using SAS Code Seems to Not Work

Edit Using SAS Code Seems to Not Work