pmdci
Fluorite | Level 6

Hello,

I am currently working on a college assignment in which I need to build a predictive data mining model to predict sickness in patients.

So far I've been working with regression (I will also do decision trees and neural networks), and I just noticed something. When I added the data partition node, I left the sampling method at its default: simple random.

My dataset has several variables, such as age (interval), sex (binary) and others. I was wondering whether I should change the data partition node to stratified sampling and stratify on one or more of those variables (e.g., sex, and perhaps also age).
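To make the difference concrete, here is a minimal Python sketch of the two sampling methods. It is not what the SAS data partition node does internally. Simple random sampling shuffles every row as one pool and cuts once. Stratified sampling groups the rows by a stratum key, such as sex, and applies the same training fraction inside each group, so each partition keeps the original mix. The function names, the 20-year `age_band` cut points and the toy patient records are all hypothetical.

```python
import random
from collections import defaultdict


def simple_random_split(rows, train_frac, seed=0):
    """Shuffle all rows as one pool and cut once (simple random sampling)."""
    rng = random.Random(seed)
    pool = list(rows)
    rng.shuffle(pool)
    cut = round(len(pool) * train_frac)
    return pool[:cut], pool[cut:]


def stratified_split(rows, train_frac, key, seed=0):
    """Split each stratum (rows sharing key(row)) with the same fraction."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)
    train, valid = [], []
    for label in sorted(strata):  # fixed order keeps the split reproducible
        members = strata[label]
        rng.shuffle(members)
        cut = round(len(members) * train_frac)
        train.extend(members[:cut])
        valid.extend(members[cut:])
    return train, valid


def age_band(age):
    """Hypothetical banding that turns the interval variable age into strata."""
    return min(age // 20, 3)  # 0: under 20, 1: 20-39, 2: 40-59, 3: 60+


def share(rows, sex):
    """Fraction of rows in a partition with the given sex."""
    return sum(r["sex"] == sex for r in rows) / len(rows)


# Made-up toy data: 70 female and 30 male patients with ages 20-79.
patients = [{"id": i, "sex": "F" if i < 70 else "M", "age": 20 + i % 60}
            for i in range(100)]
srs_train, srs_valid = simple_random_split(patients, 0.7)
st_train, st_valid = stratified_split(patients, 0.7, key=lambda r: r["sex"])
print("simple random, female share:", share(srs_train, "F"), share(srs_valid, "F"))
print("stratified,    female share:", share(st_train, "F"), share(st_valid, "F"))

# Interval variables such as age need binning first; strata can be combined:
both_train, both_valid = stratified_split(
    patients, 0.7, key=lambda r: (r["sex"], age_band(r["age"])))
```

With the stratified split, both partitions keep the 70% female share. With simple random sampling the share can drift by chance, and the drift matters more when the dataset is small or a group is rare.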

I realise this might be an "it is up to you" case, but I would really like some advice from the community. Why would I choose simple random or stratified sampling, how could I judge which method to use, and so forth?

Any thoughts around this?

Thanks in advance for the help.

Regards,

P.

Reeza
Super User

You generally stratify when you believe subgroups of the population differ significantly. For example, if you were looking at physical health characteristics, males and females could be very different, so you could stratify on that variable. One way to choose variables to stratify on is to run your regression and see which variables are significant; those variables might be worth stratifying on.
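The suggestion above is to check which variables come out significant in the regression. A lighter screening step for a single binary candidate, such as sex, is a Pearson chi-square test of independence against a binary target. This is a swapped-in technique, not the regression check described above. The sketch below compares the sickness rate in each group and computes the 2x2 chi-square statistic without continuity correction. The column names `sex` and `sick` (coded 0/1) and the toy records are hypothetical. The cutoff 3.841 is the 5% critical value for 1 degree of freedom.

```python
def rates_by_group(rows, group_key, target_key):
    """Share of rows with target == 1 within each group."""
    totals, positives = {}, {}
    for r in rows:
        g = r[group_key]
        totals[g] = totals.get(g, 0) + 1
        positives[g] = positives.get(g, 0) + r[target_key]
    return {g: positives[g] / totals[g] for g in totals}


def chi_square_2x2(rows, group_key, target_key):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table."""
    groups = sorted({r[group_key] for r in rows})
    if len(groups) != 2:
        raise ValueError("expected exactly two groups, e.g. F and M")
    counts = {(g, t): 0 for g in groups for t in (0, 1)}
    for r in rows:
        counts[(r[group_key], r[target_key])] += 1  # target must be 0 or 1
    n = len(rows)
    col_totals = {t: counts[(groups[0], t)] + counts[(groups[1], t)] for t in (0, 1)}
    if 0 in col_totals.values():
        raise ValueError("both target classes must be present")
    stat = 0.0
    for g in groups:
        row_total = counts[(g, 0)] + counts[(g, 1)]
        for t in (0, 1):
            expected = row_total * col_totals[t] / n
            stat += (counts[(g, t)] - expected) ** 2 / expected
    return stat


CHI2_CRIT_1DF_05 = 3.841  # 5% critical value, 1 degree of freedom

# Made-up toy data: 10 of 50 women and 30 of 50 men are sick.
patients = ([{"sex": "F", "sick": int(i < 10)} for i in range(50)]
            + [{"sex": "M", "sick": int(i < 30)} for i in range(50)])
print(rates_by_group(patients, "sex", "sick"))
stat = chi_square_2x2(patients, "sex", "sick")
print(round(stat, 2), "stratify candidate" if stat > CHI2_CRIT_1DF_05 else "no strong evidence")
```

A large statistic means the target rate differs between the groups. That is the kind of difference that makes a lopsided split between partitions worth avoiding, so the variable becomes a reasonable candidate to stratify on.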

My 2 cents

pmdci
Fluorite | Level 6

Hi Reeza,

Thank you very much for your feedback. These are good points which I am certainly taking on board.

In this assignment I am performing only the first cycle of data mining, but this is valuable information for my recommendations.

Regards,

P.
