issac
Fluorite | Level 6

Hi folks!

I have been working on a VA data analysis project that aims to predict the workload required for a future inpatient based on his/her demographic, DRG, and health care characteristics. The response variable is PCRVU (primary-care relative value unit for all primary care visits during the year), which is continuous, and we have a cross-sectional data set pulled so far from 6 different VA facilities. There are a number of independent variables, which can be grouped as: 1. health-related (such as inpatient days (LOS), CanScore (severity of the patient's illness), assigned provider, etc.); 2. patient demographic attributes (such as zip code, gender, insurance status, etc.); and 3. war-related columns (such as radiation status, Agent Orange status, etc.). Previous attempts used SAS E-Miner for OLS regression and CART, neither of which yielded a reasonable R-Square. I'm thinking of using GLMM or GAM procedures but am not quite sure how to approach the problem. Any helpful/professional comments would be appreciated.

Thanks!

Issac

art297
Opal | Level 21

Since you have EM, and only have to show support for the model (not necessarily defend its logic), I'd use the suggested method of developing a sample to build the model on and testing it on another sample, and use all possible approaches (including neural nets).
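
For readers without EM, the build-on-one-sample, test-on-another idea can be sketched in SAS/STAT code as well. This is only an illustration, not the EM workflow described above: the data set name WORK.VA is hypothetical, the predictors are limited to two of the variables named in the original post, and the 30% holdout fraction and seed are arbitrary choices.

```sas
/* Sketch only: WORK.VA is a hypothetical data set name.             */
/* PCRVU, LOS, and CanScore are variable names from the question.    */
proc glmselect data=work.va seed=12345;
   /* Randomly hold out about 30% of the rows for validation */
   partition fraction(validate=0.3);
   /* Fit on the training rows; choose the step with the smallest    */
   /* validation error                                               */
   model PCRVU = LOS CanScore / selection=stepwise(choose=validate);
run;
```

The output reports the average squared error separately for the training and validation roles, which gives a rough sense of how well the model holds up on data it was not fit to.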

AllenMcDowell
SAS Employee

You might consider using the ADAPTIVEREG procedure (it is new in SAS/STAT 12.1). It fits adaptive regression splines which can be useful if the relationship between the response variable and the covariates is more complex than a simple linear effect but you don't know exactly what that relationship is.
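
A minimal call might look like the following sketch. It assumes SAS/STAT 12.1 or later and a hypothetical data set WORK.VA; only two of the continuous predictors from the question are shown, and all tuning options are left at their defaults.

```sas
/* Sketch only: WORK.VA is a hypothetical data set name.             */
/* PCRVU, LOS, and CanScore are variable names from the question.    */
proc adaptivereg data=work.va;
   /* The procedure chooses knots and basis functions for each       */
   /* predictor on its own                                           */
   model PCRVU = LOS CanScore;
run;
```

Categorical predictors such as gender would normally be listed in a CLASS statement before the MODEL statement.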

issac
Fluorite | Level 6

@ Arthur

I've used EM with some transformations and got an R-Square of 39%. I've never worked with neural nets, so I can't argue for or against them. Thanks!

@Allen

I have just found the MARS package in R and I agree with you, but unfortunately my SAS platform has not been updated to 12.1. It's a really well-designed technique with wide applicability, though.

Rick_SAS
SAS Super FREQ

Probably my first choice would be OLS using a spline basis. You can use the EFFECT statement in many SAS regression procedures to build a spline basis, and then regress onto that basis to get nonlinear effects. There is an example in my 2010 SAS Global Forum paper: http://support.sas.com/resources/papers/proceedings10/329-2010.pdf The example uses PROC GLMSELECT and the OUTDESIGN= option to generate the spline basis, and PROC REG to analyze the results. (Although the example uses LASSO for variable selection, you can omit that step and just get the spline basis for the variables of interest.)
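
The two-step workflow described above might look roughly like this. It is a sketch under stated assumptions, not the code from the paper: WORK.VA and WORK.SplineBasis are hypothetical data set names, the natural cubic basis with 5 percentile knots is an arbitrary illustrative choice, and the sketch relies on GLMSELECT storing the generated design-column names in the macro variable _GLSMOD when OUTDESIGN= is used.

```sas
/* Step 1: generate the spline basis columns (no variable selection). */
/* WORK.VA and WORK.SplineBasis are hypothetical data set names.      */
proc glmselect data=work.va outdesign(addinputvars)=work.SplineBasis;
   /* Natural cubic splines with 5 knots at percentiles (assumed)     */
   effect splLOS = spline(LOS / naturalcubic basis=tpf(noint)
                                knotmethod=percentiles(5));
   effect splCan = spline(CanScore / naturalcubic basis=tpf(noint)
                                     knotmethod=percentiles(5));
   model PCRVU = splLOS splCan / selection=none;
run;

/* Step 2: ordinary least squares on the generated basis columns.     */
/* &_GLSMOD is assumed to hold the design-column names from Step 1.   */
proc reg data=work.SplineBasis;
   model PCRVU = &_GLSMOD;
run;
quit;
```

With SELECTION=NONE, every spline column is kept, so PROC REG fits the full spline model and reports the usual fit statistics.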

issac
Fluorite | Level 6

@Rick    

That's a good point. One question: after introducing splines into the model, is there a direct way to calculate the R-Square for the final model in PROC REG?

Rick_SAS
SAS Super FREQ

PROC REG will compute the R-Square statistic just as it does for any set of dummy variables.
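
For reference, the statistic in question is the standard one for a model with an intercept. Here \hat{y}_i denotes the fitted value from the regression on whatever columns are supplied (spline basis or dummy variables) and \bar{y} is the sample mean of the response:

```latex
R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}, \qquad
\mathrm{SSE} = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2, \qquad
\mathrm{SST} = \sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2
```

Nothing in the formula depends on how the regressor columns were constructed, which is why no special handling is needed for splines.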

issac
Fluorite | Level 6

@ Rick

Thanks. I'm wondering whether the design matrix output by GLMSELECT is singular or non-singular.
