I have been a SAS Enterprise Miner user since 2000, but we now have the SAS Viya 3.5 platform with VDMML.
In our company we have to deal with rare events, which SAS Model Studio can handle. But sometimes we need more events, and we want to use the SMOTE method.
In SAS Enterprise Miner this is easy to implement, especially if you use the prior functionality.
Now I would like to implement the same method in SAS Model Studio.
This is what I was thinking:
Create a new project.
Import the dataset without rare events.
After the Data node in the pipeline, use Feature Extraction (with PCA) followed by a SAS Code node (see the program below),
or use a SAS Code node on all the numeric variables.
But can I change the prior (to correct the event rate)?
Or should I apply the SMOTE method before SAS Model Studio?
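On the prior question, one option I could imagine is correcting the predicted probabilities myself after scoring. This is only a minimal sketch: the table work.scored, the column p_event, and the two event rates in the macro variables are hypothetical placeholders, not names or figures from our project.

```sas
/* Assumed event rates: replace with the real figures */
%let true_rate   = 0.02;  /* event rate in the original population (assumption)   */
%let sample_rate = 0.20;  /* event rate in the training data after SMOTE (assumption) */

data work.scored_adjusted;
   set work.scored;                 /* hypothetical scored table */
   /* p_event: hypothetical name of the predicted event probability */
   _num = p_event * (&true_rate / &sample_rate);
   _den = _num + (1 - p_event) * ((1 - &true_rate) / (1 - &sample_rate));
   p_event_adj = _num / _den;       /* probability rescaled to the original event rate */
   drop _num _den;
run;
```

The adjustment only rescales the probabilities, so the ranking of the cases stays the same; it matters when a cutoff or an expected value is computed from the probability.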
Below is what I was using in SAS Enterprise Miner. I have made the program a little more SAS Model Studio proof (it is not ready yet). I will probably have to replace PROC MODECLUS if I want to execute it in CAS. Until now we have not been using really big data.
%let THRESHOLD = 5;
options casdatalimit=50G;
ods _all_ close;
proc modeclus data = &dm_data.(where=(%dm_dec_target = '1'))
     cluster = _SEGMENT_
     dk = 10 /* get 10 nearest neighbors */
     neighbor
     out = _modeclus_out_;
  var %dm_interval_input; /* Specify the input numeric variables here */
  id &id; /* Specify the unique id variable of each observation */
  ods output Neighbor = _nTest_; /* output the nearest neighbors' id along with their distance */
run;
ods listing;
/* List each pair of a rare case and its 10 nearest neighbors */
data _neighbor_id_ (keep = center_id nbor index);
  set _ntest_ (where = (distance < &THRESHOLD)); /* Optional: set a distance threshold */
  retain center_id '000000000000';
  index = _N_; /* prepare id for the new cases */
  if not missing(id) then center_ID = put(id, $8.);
  else center_id = center_id;
run;
/* transpose the raw data */
proc transpose data=&EM_IMPORT_DATA.(where=(%dm_dec_target = '1')) out = _raw1_ prefix = ID_;
  var %dm_interval_input;
  id &id;
run;
/* Generating random cases with look up table */
data _NULL_;
  set _neighbor_id_ end=eof;
  if not eof then do;
    call execute ( 'DATA _raw1_ ; ');
    call execute ( 'set _raw1_ ;' ) ;
    call execute ("_R_=rand('uniform');");
    call execute ( 'new_'||trim(left((index)))|| ' = (id_'||trim(left((nbor)))||' - id_'||trim(left((center_id)))||') * _R_ ');
    call execute ( ' + ID_'||trim(left((center_id)))||' ;');
    call execute ('run;');
  end;
  if eof then do;
    call execute ('data _raw2_ (keep=_NAME_ new:); set _raw1_; run; ');
  end;
run;
/* transpose the new cases back to the original layout */
proc transpose data = _raw2_ out = _new_cases_;
  id _NAME_;
  var _NUMERIC_;
run;
proc sql;
  create table &EM_EXPORT_TRAIN as
  select * from &dm_data
  union
  select *, 1 as %dm_dec_target from _new_cases_(rename = (_NAME_ = &id));
quit;
/* remove the temporary tables */
proc datasets lib = work;
  delete _:;
run;
quit;
/* compare target level counts and ratios before and after sampling */
proc sql;
  select %dm_dec_target label = "TARGET_LEVEL",
         (select count(*) from &EM_IMPORT_DATA t1
           where t1.%dm_dec_target = t3.%dm_dec_target)
           as input_count label = "INPUT_COUNT",
         (select count(*) / (select count(*) from &EM_IMPORT_DATA) from &EM_IMPORT_DATA t1
           where t1.%dm_dec_target = t3.%dm_dec_target)
           as input_ratio format percent15.2 label = "INPUT_RATIO",
         count(*) as sampled_count label = "SAMPLED_COUNT",
         count(*) / (select count(*) from &EM_EXPORT_TRAIN)
           as sampled_ratio format percent15.2 label = "SAMPLED_RATIO",
         count(*) / (select count(*) from &EM_IMPORT_DATA t2
           where t2.%dm_dec_target = t3.%dm_dec_target)
           as ratio_of_change format percent15.2 label = "PERCENT CHANGE"
  from &EM_EXPORT_TRAIN t3
  group by %dm_dec_target;
quit;
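For reference, the generation step in the program above (the CALL EXECUTE data step) amounts to the interpolation below. The symbols are my own notation, not names from the program: x_c is a rare case, x_n is one of its nearest neighbors, and j indexes the input variables. Because _R_ is drawn once per row of the transposed table, each input variable gets its own random weight:

```latex
x^{\text{new}}_{j} = x_{c,j} + r_j \,\bigl(x_{n,j} - x_{c,j}\bigr),
\qquad r_j \sim \mathrm{Uniform}(0, 1)
```

With a separate weight per variable the new case lies inside the box spanned by the two cases. Using one weight per new case would instead place it on the line segment between them.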