pradeepad
Calcite | Level 5

I have a large dataset with more than 50,000 rows. My outcome variable is a nominal variable with 4 outcomes, of which 3 are ordinal and the 4th is independent of the other 3. My predictor list contains more than 100 variables, some categorical and some continuous. The categorical predictors are not ordinal.

I researched PROC GENMOD, which can fit a multinomial regression but doesn't have a variable selection method. I also looked up PROC LOGISTIC, which can model an ordinal outcome variable and has a variable selection option.

I would like to know which procedure to use and with what options. Please help! Thanks.

6 REPLIES
PGStats
Opal | Level 21

"My outcome variable is a nominal variable with 4 outcomes, of which 3 are ordinal and the 4th is independent of the other 3." - Please explain further.

PG

PG
pradeepad
Calcite | Level 5

PG,

The outcome variable is a product type, of which 3 products are ordinal in terms of risk ratings. The 4th is a different type of product. For this discussion we can consider two scenarios.

1. All are different

2. All are ordinal

Based on the solutions to both scenarios, I can then further analyze and customize the procedure to my needs. Hope this helps.

PGStats
Opal | Level 21

You mean something like possible outcomes are ( 1, 2, 3, A )?

Interesting. This seems like the type of analysis that marketing experts do.

Hope someone can help you.

CART modelling (a data mining tool) would seem like a good place to start.

PG

PG
Rick_SAS
SAS Super FREQ

Outcomes that are {1,2,3,A}?  That is a new concept for me.

Someone who actually understands your model (not me!) might have a better way to do it, but to get the ball rolling, would it be possible to run this model in two stages?

1) Subset the data to WHERE(Y^=A). For this subset, use an ordinal regression model.

2) For the full data, create a new indicator variable isA = (y=A).  Build a logistic regression model for this indicator.

For scoring new data, you would use the logistic model to predict whether the response is 'A' or 'Not A.'  If it is 'Not A', then use the ordinal model to predict whether the response is 1, 2, or 3.
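The two stages above could be sketched in SAS roughly as follows. This is only a minimal sketch under assumptions that are not in the original posts: a hypothetical data set work.products, a character response y with values '1', '2', '3', 'A', continuous predictors x1-x100, and categorical predictors c1-c5. All of these names are placeholders.

```sas
/* Hypothetical names throughout: work.products, y, x1-x100, c1-c5 */

/* Add the indicator for outcome A (1 if y is 'A', otherwise 0) */
data work.stage;
   set work.products;
   isA = (y = 'A');
run;

/* Stage 1: binary logistic model on the full data, A versus not A */
proc logistic data=work.stage;
   class c1-c5 / param=ref;
   model isA(event='1') = c1-c5 x1-x100;
run;

/* Stage 2: ordinal model on the subset where the outcome is not A.   */
/* With more than two response levels, PROC LOGISTIC fits a           */
/* cumulative logit (proportional odds) model by default.             */
proc logistic data=work.stage;
   where y ne 'A';
   class c1-c5 / param=ref;
   model y = c1-c5 x1-x100;
run;
```

If predicted probabilities are wanted rather than a hard classification, the two fits can be combined by the usual conditional rule: P(y = k) = P(not A) * P(y = k | not A) for k in 1, 2, 3.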

I am interested in hearing from statistical gurus like PG whether it has any merit or whether this approach should be avoided. As I said, I basically "made it up" on the spot.

Rick

pradeepad
Calcite | Level 5

Hi Rick,

Thanks for your reply.

"1) Subset the data to WHERE(Y^=A). For this subset, use an ordinal regression model."

Based on my dataset which procedure do you recommend and with what options? I want to have variable selection as well.

Thanks.

Rick_SAS
SAS Super FREQ

As far as I know, PROC LOGISTIC is the only procedure that does both. See the doc for an example of ordinal regression: http://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#statug_logistic_sec...
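As a rough illustration of the options involved, here is a hedged sketch for the two scenarios mentioned earlier in the thread (all levels ordered, or all levels unordered). The data set work.products, the response y, and the predictor names x1-x100 and c1-c5 are hypothetical placeholders, and the 0.05 entry and stay levels are arbitrary example values, not recommendations.

```sas
/* Hypothetical names: work.products, y, x1-x100, c1-c5 */

/* All levels treated as ordered: cumulative logit with stepwise selection */
proc logistic data=work.products;
   class c1-c5 / param=ref;
   model y = c1-c5 x1-x100
         / link=clogit selection=stepwise slentry=0.05 slstay=0.05;
run;

/* All levels treated as unordered: generalized logit, 'A' as reference */
proc logistic data=work.products;
   class c1-c5 / param=ref;
   model y(ref='A') = c1-c5 x1-x100 / link=glogit;
run;
```

SELECTION= is deliberately left off the second call: whether it can be combined with LINK=GLOGIT may depend on the SAS/STAT release, so that is worth checking in the PROC LOGISTIC documentation before relying on it. For the ordinal fit, the output includes a score test of the proportional odds assumption, which is worth a look before trusting the cumulative logit model.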
