01-29-2018 07:47 AM
I put this under 'General' because I don't think Enterprise Miner can do this without a SAS Code node:
I have an HP GLM node running on one dataset, which gives me one set of parameter estimates. I then use the residuals from that output as the dependent variable in a second HP GLM node on a different dataset. I need to combine the parameter estimates from both models and then use the result to score a new or existing dataset. So I need:
1. How to combine the PEs from the two models into one (not an ensemble model)
2. Use that to score additional data
I'd prefer to do this within the capacity of EM but I realize that probably isn't possible.
I'm doing this to validate an existing process from a consultant so I'm not looking for different ways to do the entire modeling process (although I realize there are probably several).
03-15-2018 03:07 PM
One point that is not clear is how you plan to map each residual of one data set to the observations in a different data set so that you can use the residuals as the dependent variable in the different data set running an additional HPGLM node.
Another point is what calculation you have in mind when you say "combine the parameter estimates".
One interpretation of what you might mean is this:
- Connect an HP GLM node to your flow. Run the HP GLM node.
- Add a Metadata node after the HP GLM node. Click the Train property.
In the Variables window that opens, modify the Hide and New Role columns
for each of the variables that you want to change.
Change the New Role of your existing target to Rejected.
Change the R_name variable (the residual variable) New Role to Target.
- Connect an HP GLM node after the Metadata node.
- Run the HP GLM node. It uses the previous residuals as the target.
- After the HP GLM node, connect a SAS Code node. (It does not have to be
connected, but if it is connected, then you can run the whole flow from this node.)
Select the Code Editor property.
Enter code like this:
proc print data=&EM_LIB..hpglm_paramests;
run;
proc print data=&EM_LIB..hpglm2_paramests;
run;
Close the window. Run the node. View the results.
The general name of the HP GLM node data set that contains the parameter estimates is HPGLMn_PARAMESTS, where n, if present, refers to the nth HP GLM node. Look at the Node ID in the property panel of each HP GLM node to verify the name (HPGLM, HPGLM2, and so on). The point is that in that SAS Code node, you can reference those data sets to perform your combination calculation, and create a data set that contains your calculated parameter estimates. It is not clear how you want to apply those parameter estimates to yet another data set.
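To make the "combination calculation" idea a little more concrete, here is a rough sketch of one possible interpretation, not necessarily what your consultant did. It assumes both models use an identity link (so the second model's prediction of the residual simply adds to the first model's prediction), that all inputs are continuous, and that the parameter-estimate data sets have columns named Parameter and Estimate. Those column names, the data set names pe1, pe2, and new, and the variables x1 and x2 are all hypothetical stand-ins; the real HPGLMn_PARAMESTS tables may be laid out differently (for example, with separate rows for class levels), so check them with PROC CONTENTS first.

```sas
/* Hypothetical stand-ins for the two parameter-estimate data sets.   */
/* The columns Parameter and Estimate are assumptions; verify the     */
/* real column names before adapting this.                            */
data pe1;
  length Parameter $32;
  Parameter = 'Intercept'; Estimate = 1.0; output;
  Parameter = 'x1';        Estimate = 2.0; output;
run;

data pe2;
  length Parameter $32;
  Parameter = 'Intercept'; Estimate = 0.5;  output;
  Parameter = 'x2';        Estimate = -3.0; output;
run;

/* Stack both sets, then sum the estimates that share a parameter name. */
data pe_stacked;
  set pe1 pe2;
run;

proc sql;
  create table pe_combined as
  select Parameter, sum(Estimate) as Estimate
  from pe_stacked
  group by Parameter;
quit;

/* Turn the estimates into one row with one column per coefficient: */
/* b_Intercept, b_x1, b_x2.                                          */
proc transpose data=pe_combined out=pe_wide(drop=_name_) prefix=b_;
  id Parameter;
  var Estimate;
run;

/* Hypothetical data to score. */
data new;
  x1 = 1; x2 = 1; output;
  x1 = 2; x2 = 0; output;
run;

/* Attach the single coefficient row to every observation and score. */
data scored;
  if _n_ = 1 then set pe_wide;
  set new;
  P_combined = b_Intercept + b_x1*x1 + b_x2*x2;
run;

proc print data=scored;
run;
```

In an actual flow, pe1 and pe2 would be replaced by &EM_LIB..hpglm_paramests and &EM_LIB..hpglm2_paramests, and the scoring expression would list the real inputs. If either model uses a non-identity link or has class inputs, simply summing coefficients like this would not be appropriate.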
If the above is not what you intend, then what to advise depends on what additional information you can provide.
Have a great day!