GuyTreepwood
Obsidian | Level 7

Hello,

 

I am working with a dataset that has a severe class imbalance, and I am applying SMOTE (synthetic minority oversampling technique) to create synthetic versions of the rare event I am trying to predict. Part of this process involves standardizing the current training dataset (mean 0, standard deviation 1) before creating the SMOTE target event rows. I am using the following code snippet to standardize the variables in this training set (the macro variable &inputvar contains the list of numeric input variables I will use for modeling).

 

proc standard data=Original_Training_data mean=0 std=1 out=Standardized_Training_data (keep=Target ID &inputvar);
var &inputvar;
run;

 

After creating the synthetic target event rows, the values of the input variables are on the standardized scale. Is there an easy way to convert these columns back to the original input scales of the training dataset above (on which the SMOTE-created synthetic rows are based)? I found that PROC STDIZE has an option called UNSTDIZE that looks like it can do this, but I am not quite sure how to apply it. The documentation for this procedure is here:

 

SAS Help Center: PROC STDIZE Statement

 

Would I have to use PROC STDIZE to do the initial standardization, and then use this output to de-standardize the SMOTE variables? 

1 ACCEPTED SOLUTION

Accepted Solutions
data_null__
Jade | Level 19

This may be helpful with regard to syntax. In your case, you would use the "synthetic target event rows" as DATA= in the second STDIZE call.

 

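/* Standardize all numeric variables and save the location and scale statistics */
proc stdize data=sashelp.class out=std outstat=stats;
run;
proc print data=stats;
run;
/* Unstandardize using the saved statistics */
proc stdize data=std unstd method=in(stats) out=class2;
run;
proc print data=class2;
run;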
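
Applied to the datasets named in the question, the same pattern might look like the sketch below. It assumes the initial standardization is done with PROC STDIZE (METHOD=STD) instead of PROC STANDARD so that the statistics can be saved with OUTSTAT=. The names Training_stats, Synthetic_rows_std and Synthetic_rows_orig are hypothetical placeholders, and the SMOTE step itself is not shown.

```sas
/* Step 1: standardize the training inputs (mean 0, std 1) and save   */
/* the location and scale of each variable in Training_stats          */
/* (Training_stats is a hypothetical dataset name).                   */
proc stdize data=Original_Training_data method=std
            out=Standardized_Training_data (keep=Target ID &inputvar)
            outstat=Training_stats;
   var &inputvar;
run;

/* Step 2: run SMOTE on Standardized_Training_data (not shown).       */
/* Assume it writes the synthetic rows to Synthetic_rows_std,         */
/* a hypothetical dataset name.                                       */

/* Step 3: convert the synthetic rows back to the original scales     */
/* using the statistics saved from the training data.                 */
proc stdize data=Synthetic_rows_std unstd method=in(Training_stats)
            out=Synthetic_rows_orig;
   var &inputvar;
run;
```

Comparing the means and standard deviations of Synthetic_rows_orig with those of the original minority-class rows (for example with PROC MEANS) is a quick sanity check that the back-transformation used the intended statistics.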


PGStats
Opal | Level 21

Inspired by @data_null__ ... adjusting the females' values to match the males' means and standard deviations:

 

/* Standardized training data */
proc stdize data=sashelp.class(where=(sex = "M")) out=stdMale outstat=statsMale;
   var height weight;
run;
proc print data=statsMale;
run;

/* Standardized test data */
proc stdize data=sashelp.class(where=(sex = "F")) out=stdFemale;
   var height weight;
run;

/* Adjusted test data matching training data */
proc stdize data=stdFemale unstd method=in(statsMale) out=classFemale;
   var height weight;
run;

/* Pair each original female row with its adjusted values, row by row */
data both;
   set sashelp.class(where=(sex = "F"));
   set classFemale(rename=(weight=adjWeight height=adjHeight) keep=weight height);
run;

proc print data=both;
run;
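
For reference, the adjustment above can be written out as a formula. This is a sketch that assumes the default METHOD=STD, so location is the sample mean and scale is the sample standard deviation; the symbols $\bar{x}_F, s_F$ (female statistics) and $\bar{x}_M, s_M$ (male statistics) are notation introduced here, not taken from the posts.

```latex
% Step 1: standardize each female value with the female statistics
z = \frac{x - \bar{x}_F}{s_F}

% Step 2: unstandardize with the male statistics (UNSTD with METHOD=IN(statsMale))
x_{\text{adj}} = z \, s_M + \bar{x}_M
              = \bar{x}_M + s_M \, \frac{x - \bar{x}_F}{s_F}
```

In the original question the same statistics are used in both steps, so the two operations cancel for the training rows and the synthetic rows land on the original training scale.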


 

PG
