pmdci

Hello,

I am currently working on a college assignment in which I need to build a predictive data mining model to identify sickness in patients.

So far I have been working with regression (I will also do decision trees and neural networks), and I just noticed something. When I added the Data Partition node, I left the sampling method at its default: simple random.

My dataset has several variables, such as age (interval), sex (binary) and others. I was wondering whether I should switch the Data Partition node to stratified sampling and stratify on one or more of those variables (e.g. sex, and perhaps age).
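To make the difference between the two sampling methods concrete, here is a minimal Python sketch of the idea. It is an illustration only, not what the Data Partition node does internally; the 70/30 split, the age cut points and the patient records are all hypothetical. Simple random sampling shuffles the whole dataset once, while stratified sampling draws the same fraction from each stratum separately. Because age is an interval variable, it is grouped into bands first so that it can form strata together with sex.

```python
import random
from collections import defaultdict


def age_band(age):
    """Group an interval age into coarse bands so it can serve as a stratum key.
    The cut points (40, 65) are illustrative assumptions."""
    if age < 40:
        return "<40"
    if age < 65:
        return "40-64"
    return "65+"


def simple_random_partition(records, train_frac=0.7, seed=42):
    """Shuffle all records once and cut them into (train, validation)."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    return shuffled[:n_train], shuffled[n_train:]


def stratified_partition(records, key, train_frac=0.7, seed=42):
    """Split records into (train, validation), taking train_frac of each stratum separately."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)
    train, valid = [], []
    for members in strata.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        n_train = round(len(shuffled) * train_frac)
        train.extend(shuffled[:n_train])
        valid.extend(shuffled[n_train:])
    return train, valid


def male_share(records):
    """Fraction of records with sex == "M"."""
    return sum(1 for r in records if r["sex"] == "M") / len(records)


def sex_and_age_band(patient):
    """Stratum key combining the binary sex variable with the banded age."""
    return (patient["sex"], age_band(patient["age"]))


if __name__ == "__main__":
    # Hypothetical patient records; in the assignment these would come from the course dataset.
    gen = random.Random(0)
    patients = [{"id": i, "sex": gen.choice("MF"), "age": gen.randint(18, 90)} for i in range(1000)]
    methods = {
        "simple random": simple_random_partition(patients),
        "stratified": stratified_partition(patients, key=sex_and_age_band),
    }
    for name, (train, valid) in methods.items():
        print(f"{name}: male share train={male_share(train):.3f} valid={male_share(valid):.3f}")
```

With stratification, each sex/age-band combination appears in the training and validation partitions in (up to rounding) the same proportion as in the full data; simple random sampling only achieves that on average.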

I realise this may be a case of "it's up to you", but I would really like some advice from the community. Why would I choose simple random over stratified sampling, or vice versa, and how should I judge which method to use?

Any thoughts on this?

Thanks in advance for the help.

Regards,

P.

Reeza

You generally stratify when you believe subgroups of the population differ substantially. For example, if you were looking at physical health characteristics, males and females could be very different, so you could stratify on sex. One way to choose variables to stratify on is to run your regression and see which variables are significant; those are the variables that might be worth stratifying on.
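As a simpler stand-in for checking significance in the regression (a different technique, not the regression itself), a single binary candidate such as sex can be screened against a binary outcome such as sick/not sick with a Pearson chi-square test on a 2x2 table. This is a minimal Python sketch; the counts in the usage lines are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table

                sick   not sick
        male      a        b
        female    c        d

    Returns (statistic, p_value) for 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 df the statistic is the square of a standard normal,
    # so P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical counts: 30 of 100 males and 60 of 100 females are sick.
    stat, p = chi_square_2x2(30, 70, 60, 40)
    print(f"chi-square = {stat:.2f}, p = {p:.2g}")
```

A small p-value suggests that the sickness rate differs between the two groups, which makes that variable a reasonable stratification candidate. Significance also depends on sample size, though, so it is a screening aid rather than a rule.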

My 2 cents

pmdci

Hi Reeza,

Thank you very much for your feedback. These are good points which I am certainly taking on board.

In this assignment I am only performing the first cycle of data mining, but this is valuable input for my recommendations.

Regards,

P.
