
Advice on data partition node: When to stratify or not

Contributor
Posts: 41


Hello,

I am currently working on an assignment for college, where I need to create a predictive data mining model to determine sickness in patients.

I've been working with regression so far (I will also try decision trees and neural networks), and I just noticed something. When I added the Data Partition node, I set the sampling method to the default: simple random.

My dataset has several variables, such as age (interval), sex (binary) and others. I was wondering whether I should switch the Data Partition node to stratified sampling and stratify on one or more of those variables (e.g. sex, perhaps also age).

I realise that this might be a case of "it is up to you", but I would really like to get some advice from the community. Why would I choose simple random over stratified sampling, or the other way round, and how can I judge which method to use?

Any thoughts around this?

Thanks in advance for the help.

Regards,

P.

Super User
Posts: 17,905

Re: Advice on data partition node: When to stratify or not

You generally stratify when you believe subgroups of the population are significantly different from each other. For example, if you were looking at physical health characteristics, males and females could be very different, so you could stratify on that variable. One way to choose variables to stratify on is to run your regression and see which variables are significant; those are the ones that might be worth stratifying on.
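To make the difference concrete, here is a minimal sketch in plain Python (not Enterprise Miner code, and not how the Data Partition node is implemented). The patient data, the 70/30 split and the function names are all made up for illustration. A simple random split shuffles everything together, so the share of each sex can drift between partitions; a stratified split partitions each group separately, so both partitions keep the same share. A stratification variable has to be categorical, so an interval variable such as age would need to be binned into groups first.

```python
import random
from collections import defaultdict


def simple_random_split(rows, train_frac, seed):
    """Shuffle all rows together and cut once; group shares may drift."""
    rng = random.Random(seed)
    shuffled = rows[:]  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def stratified_split(rows, key, train_frac, seed):
    """Split each stratum separately so both parts keep its share."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[row[key]].append(row)
    train, valid = [], []
    for value in sorted(strata):  # fixed order keeps the result reproducible
        members = strata[value]
        rng.shuffle(members)
        cut = round(len(members) * train_frac)
        train.extend(members[:cut])
        valid.extend(members[cut:])
    return train, valid


def share(rows, key, value):
    """Fraction of rows whose `key` equals `value`."""
    return sum(1 for r in rows if r[key] == value) / len(rows)


# Hypothetical data: 100 patients, 30 female and 70 male.
patients = [{"id": i, "sex": "F" if i < 30 else "M"} for i in range(100)]

train_s, valid_s = stratified_split(patients, "sex", 0.7, seed=1)
train_r, valid_r = simple_random_split(patients, 0.7, seed=1)

print("stratified   :", share(train_s, "sex", "F"), share(valid_s, "sex", "F"))
print("simple random:", share(train_r, "sex", "F"), share(valid_r, "sex", "F"))
```

With this made-up data the stratified split puts 21 of the 70 training rows and 9 of the 30 validation rows in the female group, so both partitions sit at 30%. The simple random shares depend on the seed and will usually land near 30% without matching it exactly; that gap matters more when a group is small.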

My 2 cents

Contributor
Posts: 41

Re: Advice on data partition node: When to stratify or not

Hi Reeza,

Thank you very much for your feedback. These are good points which I am certainly taking on board.

In this assignment I am performing only the first cycle of data mining. But this is valuable information for my recommendations.

Regards,

P.
