SAS Data Science

Building models with SAS Enterprise Miner, SAS Factory Miner, SAS Viya (Machine Learning), SAS Visual Text Analytics, with point-and-click interfaces or programming
pmdci
Fluorite | Level 6

Hello,

I am currently working on an assignment for college, where I need to create a predictive data mining model to determine sickness in patients.

I've been working with regression so far (I will also try decision trees and neural networks), and I just noticed something: when I added the data partition node, I left the sampling method at the default, simple random.

My dataset has several variables, such as age (interval), sex (binary) and others. I was wondering whether I should change my data partition node to stratified and use one or more of those variables (e.g. sex, perhaps also age).

I realise that this might be a case of "it is up to you", but I would really like to get some advice from the community. Why would I choose simple random or stratified, how can I judge which method to use, and so forth?

Any thoughts around this?

Thanks in advance for the help.

Regards,

P.

2 REPLIES
Reeza
Super User

You generally stratify when you believe subgroups of the population differ in ways that matter for the outcome. For example, if you were looking at physical health characteristics, males and females could be very different, so you could stratify on sex. One way to find variables to stratify on is to run your regression and see which variables are significant; those might be worth stratifying on.

My 2 cents
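
To make the difference between the two partitioning methods concrete, here is a minimal sketch in plain Python (not SAS Enterprise Miner code). The patient data, the 70/30 split, the seed, and the function names are all assumptions made up for illustration. It shows that a stratified split keeps the share of each sex the same in every partition, while a simple random split only matches it approximately.

```python
import random
from collections import defaultdict


def simple_random_split(rows, train_frac, seed):
    """Shuffle all rows together and cut once."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def stratified_split(rows, key, train_frac, seed):
    """Shuffle and cut within each level of `key`, then pool the pieces."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[row[key]].append(row)
    train, valid = [], []
    for level in sorted(strata):
        members = strata[level]
        rng.shuffle(members)
        cut = round(len(members) * train_frac)
        train.extend(members[:cut])
        valid.extend(members[cut:])
    return train, valid


def share(rows, key, level):
    """Fraction of rows whose `key` equals `level`."""
    return sum(1 for r in rows if r[key] == level) / len(rows)


# Hypothetical data: 200 patients, 40 male (20%) and 160 female.
patients = [{"id": i, "sex": "M" if i < 40 else "F"} for i in range(200)]

srs_train, srs_valid = simple_random_split(patients, 0.7, seed=1)
str_train, str_valid = stratified_split(patients, "sex", 0.7, seed=1)

for name, part in [("simple/train", srs_train), ("simple/valid", srs_valid),
                   ("strat/train", str_train), ("strat/valid", str_valid)]:
    print(f"{name:13s} n={len(part):3d} male share={share(part, 'sex', 'M'):.3f}")
```

With a large, balanced dataset the two methods tend to give similar partitions. The gap matters most when a subgroup is small, because a simple random split can leave it under-represented in the validation data by chance.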

pmdci
Fluorite | Level 6

Hi Reeza,

Thank you very much for your feedback. These are good points which I am certainly taking on board.

In this assignment I am performing only the first cycle of data mining. But this is valuable information for my recommendations.

Regards,

P.

