11-08-2012 08:47 PM
I have been working on a VA data analysis project that aims to predict the workload needed for a future inpatient based on his/her demographic, DRG, and health care characteristics. The response variable is PCRVU (Primary-Care relative value unit for all primary care visits during the year), which is continuous, and we have a cross-sectional data set pulled so far from 6 different VA facilities. There are a number of independent variables, which can be grouped as 1. health related (such as Inpatient Days (LOS), CanScore (severity of the patient's illness), assigned provider, etc.), 2. patient demographic attributes (such as zip code, gender, insurance status, etc.), and 3. war-related columns (such as radiation status, agent orange status, etc.). Previous attempts used SAS E-Miner for OLS regression and CART, neither of which yielded a reasonable R-Square. I'm thinking of using GLMM or GAM procedures but am not sure how to approach the problem. Any helpful/professional comments would be appreciated.
11-19-2012 01:51 PM
11-08-2012 08:54 PM
Since you have EM, and only have to show support for the model (not necessarily defend its logic), I'd use the suggested method: build the model on one sample, test it on another sample, and try all possible approaches (including neural nets).
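The build-on-one-sample, test-on-another idea can be sketched outside EM as follows. This is a minimal illustration in Python with NumPy, not Enterprise Miner code; the data are synthetic and the 70/30 split ratio is an assumption chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (hypothetical): one predictor and a noisy response.
n = 500
x = rng.uniform(0, 10, n)
y = 3.0 + 2.0 * x + rng.normal(0, 1.0, n)

# Randomly assign 70% of the rows to training and the rest to validation.
idx = rng.permutation(n)
n_train = int(0.7 * n)
train, valid = idx[:n_train], idx[n_train:]

# Fit OLS on the training rows only.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)


def r_squared(y_true, y_pred):
    """1 - SSE/SST, computed on whichever rows are passed in."""
    sse = np.sum((y_true - y_pred) ** 2)
    sst = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - sse / sst


# Compare fit on the rows used for estimation with fit on held-out rows.
r2_train = r_squared(y[train], X[train] @ beta)
r2_valid = r_squared(y[valid], X[valid] @ beta)
print(f"train R^2 = {r2_train:.3f}, validation R^2 = {r2_valid:.3f}")
```

A validation R-Square far below the training R-Square would suggest overfitting, which matters more as flexible approaches such as neural nets are added to the comparison.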
11-16-2012 10:57 AM
You might consider using the ADAPTIVEREG procedure (it is new in SAS/STAT 12.1). It fits adaptive regression splines which can be useful if the relationship between the response variable and the covariates is more complex than a simple linear effect but you don't know exactly what that relationship is.
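To give a feel for what adaptive regression splines do, here is a heavily simplified sketch of a single knot-search step using hinge functions. It is written in Python with NumPy, it is not the ADAPTIVEREG algorithm itself (which adds terms iteratively, handles many variables and interactions, and prunes the model afterwards), and the data and candidate grid are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
x = rng.uniform(0, 10, n)
# Hypothetical response whose slope changes at x = 4.
y = 1.0 + 0.5 * x + 2.0 * np.maximum(0.0, x - 4.0) + rng.normal(0, 0.3, n)


def sse_for_knot(c):
    """Residual sum of squares for an intercept plus a mirrored hinge pair at c."""
    X = np.column_stack([
        np.ones(n),
        np.maximum(0.0, x - c),   # active to the right of the knot
        np.maximum(0.0, c - x),   # active to the left of the knot
    ])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


# Try candidate knots at quantiles of x and keep the one with the smallest SSE.
candidates = np.quantile(x, np.linspace(0.05, 0.95, 37))
best_knot = min(candidates, key=sse_for_knot)
print(f"selected knot: {best_knot:.2f}")
```

The point of the sketch is only that the knot location is chosen from the data rather than fixed in advance, which is what makes the method useful when the shape of the relationship is unknown.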
11-16-2012 11:17 AM
I've used EM with some transformations and got an R-Square of 39%. I've never worked with NN, so I can't argue for or against it. Thanks!
I've just found the MARS package in R and I agree with you, but unfortunately my SAS platform has not been updated to 12.1. It's a well-designed and widely applicable technique, though.
11-16-2012 11:27 AM
Probably my first choice would be OLS using a spline basis. You can use the EFFECT statement in many SAS regression procedures to build a spline basis, and then regress onto that basis to get nonlinear effects. There is an example on p. of my 2010 SAS Global Forum paper: http://support.sas.com/resources/papers/proceedings10/329-2010.pdf The example uses PROC GLMSELECT and the OUTDESIGN= option to generate the spline basis, and PROC REG to analyze the results. (Although the example uses LASSO for variable selection, you can omit that step and just get the spline basis for the variables of interest.)
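The two-step idea (generate spline basis columns, then run ordinary least squares on them) can be sketched as follows. This is Python with NumPy rather than PROC GLMSELECT and PROC REG, the truncated power basis and quantile knots are assumptions made for the illustration, and the `los` and `pcrvu` arrays are synthetic stand-ins for the real variables.

```python
import numpy as np


def truncated_power_basis(x, knots, degree=3):
    """Columns x, x^2, ..., x^degree plus (x - k)_+^degree for each knot."""
    x = np.asarray(x, dtype=float)
    poly = [x ** d for d in range(1, degree + 1)]
    trunc = [np.maximum(0.0, x - k) ** degree for k in knots]
    return np.column_stack(poly + trunc)


rng = np.random.default_rng(2)
n = 300
los = rng.uniform(0, 30, n)  # hypothetical stand-in for inpatient days
pcrvu = 5.0 + 3.0 * np.sin(los / 5.0) + rng.normal(0, 0.5, n)  # hypothetical

# Step 1: build the spline basis (the role OUTDESIGN= plays in the post above).
knots = np.quantile(los, [0.25, 0.5, 0.75])
B = truncated_power_basis(los, knots)

# Step 2: ordinary least squares on intercept + basis columns.
X = np.column_stack([np.ones(n), B])
beta, *_ = np.linalg.lstsq(X, pcrvu, rcond=None)
fitted = X @ beta
rmse = float(np.sqrt(np.mean((pcrvu - fitted) ** 2)))
print(f"basis columns: {B.shape[1]}, RMSE: {rmse:.3f}")
```

Because the model is still linear in the basis columns, all the usual OLS output (coefficients, residuals, fit statistics) remains available for the nonlinear fit.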
11-17-2012 09:56 AM
That's a good point. One question: after introducing splines into the model, is there a direct way to calculate the R-Square for the final model in PROC REG?
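Since the spline columns enter the regression as ordinary regressors, R-Square is defined exactly as for any other OLS fit, and PROC REG reports R-Square and Adj R-Sq for whatever columns appear in the MODEL statement. For reference, the calculation can be sketched as below, in Python with NumPy; the function name and the small example vectors are hypothetical.

```python
import numpy as np


def r2_and_adjusted(y, fitted, n_params):
    """Return (R^2, adjusted R^2).

    n_params counts every estimated coefficient, including the intercept
    and each spline basis column.
    """
    y = np.asarray(y, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    n = y.size
    sse = np.sum((y - fitted) ** 2)        # residual sum of squares
    sst = np.sum((y - y.mean()) ** 2)      # total sum of squares about the mean
    r2 = 1.0 - sse / sst
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - n_params)
    return r2, adj


# Tiny worked example: SSE = 1, SST = 8.75, so R^2 = 1 - 1/8.75.
r2, adj = r2_and_adjusted([1, 2, 3, 5], [1, 2, 3, 4], n_params=2)
print(f"R^2 = {r2:.4f}, adjusted R^2 = {adj:.4f}")
```

Adjusted R-Square is worth watching here because each spline adds several columns, and plain R-Square never decreases as columns are added.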
11-19-2012 11:36 AM
Thanks. Wondering whether the output design matrix of GLMSELECT is singular or non-singular.
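Whether a spline design matrix is full rank depends on the basis and on whether an intercept column is also included, so a numerical rank check is one way to find out for a given design. The sketch below, in Python with NumPy, uses a simple hat-function (degree-1 B-spline) basis as an assumed example; it does not reproduce what GLMSELECT writes to OUTDESIGN=, and the knots are hypothetical.

```python
import numpy as np


def hat_basis(x, knots):
    """Degree-1 B-spline (hat) functions, one column per knot.

    On the interval [knots[0], knots[-1]] the columns sum to 1 in every row.
    """
    eye = np.eye(len(knots))
    return np.column_stack(
        [np.interp(x, knots, eye[j]) for j in range(len(knots))]
    )


x = np.linspace(0.0, 10.0, 200)
knots = np.array([0.0, 2.5, 5.0, 7.5, 10.0])
B = hat_basis(x, knots)

# The basis alone is full column rank ...
rank_basis = np.linalg.matrix_rank(B)

# ... but adding an intercept creates an exact linear dependency,
# because the intercept column equals the sum of the basis columns.
with_intercept = np.column_stack([np.ones_like(x), B])
rank_with_intercept = np.linalg.matrix_rank(with_intercept)

print(f"basis: {B.shape[1]} columns, rank {rank_basis}")
print(f"with intercept: {with_intercept.shape[1]} columns, rank {rank_with_intercept}")
```

In this example the design with an intercept has six columns but rank five, so one coefficient is not uniquely determined; the same kind of check can be run on an exported design matrix before interpreting individual spline coefficients.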