mikepark
Obsidian | Level 7

I have some data that I split into training, validation, and test sets. 

I am going to perform logistic regression on these datasets. 

However, I want to make sure my outcome variable or target has a 6:1 ratio between the non-target and target outcome.

Any advice on how to do this? 

 

3 REPLIES
sbxkoenk
SAS Super FREQ

Hello,

 

Before answering this, I want to make sure you do not have Enterprise Miner (SAS 9.4) or Model Studio (Viya 3.5 VDMML or Viya 4 VML).

 

Enterprise Miner and Model Studio both have built-in "functions" to achieve this!

 

Thanks,

Koen

ballardw
Super User

Is the target outcome a continuous value or categorical?

 

PROC SURVEYSELECT uses the STRATA statement to identify the groups for selection, with each distinct value of the STRATA variable defining one stratum. The data must be sorted by the STRATA variable.

You then set the selection criteria that fix the sample size of each stratum.

 

This example assumes that your data set is named Dummy, that the strata variable is named Stratvar, and that Dummy is sorted by Stratvar. It also assumes that the first value of Stratvar in sort order is the group that should supply the 6 part of the ratio, that the second value is the group that should supply the 1 part, and that there are at least 600 records in the first stratum and 100 in the second.

 

Proc surveyselect data=dummy out=selected 
   sampsize=(600 100) ;
   strata stratvar;
run;

If your outcome takes more than two values, you will likely need a DATA step to add a strata variable based on those values, and then sort by it.
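A minimal sketch of that DATA step follows. It is only an illustration: it borrows sashelp.heart, and it assumes (hypothetically) that DeathCause = 'Cancer' is the target and every other value, including missing, is the non-target. The names Dummy and Stratvar follow the example above.

```sas
/* Sketch only: collapse a multi-level outcome into a two-level strata variable. */
/* Assumption: DeathCause = 'Cancer' is the target; all other values are non-target. */
data dummy;
   set sashelp.heart;
   /* The comparison returns 1 when true and 0 when false, */
   /* so 0 = non-target (sorts first) and 1 = target (sorts second). */
   stratvar = (DeathCause = 'Cancer');
run;

/* PROC SURVEYSELECT expects the input sorted by the STRATA variable */
proc sort data=dummy;
   by stratvar;
run;
```

Coding the non-target as 0 keeps it first in sort order, which matches the order of the values in SAMPSIZE=(600 100).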

If you want to have something that automagically provides values of the Sampsize parameters you will need to add some macro programming to count the strata variable values and then calculate the count size based on your ratio.
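One possible way to do that counting is sketched below. It is not the only approach, and several things in it are assumptions for illustration: the data come from sashelp.heart, Stratvar is coded 0 for the non-target (Alive) and 1 for the target (Dead), and the macro variable names (ratio, n0, n1, n_target, n_nontarget) and the seed are made up. The target count is capped so that enough non-target records remain to reach the ratio.

```sas
/* Assumption: build Dummy from sashelp.heart, 0 = non-target, 1 = target */
data dummy;
   set sashelp.heart;
   stratvar = (status = 'Dead');
run;

proc sort data=dummy;
   by stratvar;
run;

/* Count the records in each stratum and store the counts in macro variables */
proc sql noprint;
   select sum(stratvar = 0), sum(stratvar = 1)
      into :n0 trimmed, :n1 trimmed
   from dummy;
quit;

%let ratio = 6;
/* Largest target count that still leaves ratio-times as many non-targets */
%let n_target = %sysfunc(min(&n1, %sysevalf(&n0 / &ratio, floor)));
%let n_nontarget = %eval(&ratio * &n_target);

/* Sample sizes are listed in the sort order of Stratvar: non-target first */
proc surveyselect data=dummy out=selected seed=12345
   sampsize=(&n_nontarget &n_target);
   strata stratvar;
run;

/* Confirm the ratio in the selected sample */
proc freq data=selected;
   table stratvar;
run;
```

Changing the value of ratio gives a different non-target to target balance without touching the rest of the code.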

 

If you used SAMPRATE instead, it would select a percentage of the records from each stratum. That could achieve your ratio, but working out the rate at which each level should be sampled is even more complicated than working out the counts.

Ksharp
Super User
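/* Check the original distribution of the outcome */
proc freq data=sashelp.heart;
table status;
run;

/* Split the data into training (60%), validation (20%), and test (20%) sets */
data training validation test;
set sashelp.heart;
call streaminit(123); /* fixed seed so the split is reproducible */
_id=rand('table',0.6,0.2,0.2); /* returns 1, 2, or 3 with these probabilities */
if _id=1 then output training;
else if _id=2 then output validation;
else if _id=3 then output test;
run;

/* Compare the outcome distribution across the three sets */
proc freq data=training;
table status;
run;
proc freq data=validation;
table status;
run;
proc freq data=test;
table status;
run;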




