Hansdewit
Obsidian | Level 7

I have been a SAS Enterprise Miner user since 2000, but we now have the SAS Viya 3.5 platform with VDMML.

 

In our company we have to deal with rare events, which SAS Model Studio can handle. But sometimes we need more events, and we want to use the SMOTE method.

In SAS Enterprise Miner it is easy to implement, especially if you use the prior functionality.

 

Now I would like to implement the same method in SAS Model Studio as I did in SAS Enterprise Miner.

I was thinking:

  1. Create a new project
    1. Import the dataset without rare events
  2. After the data node in the pipeline
    1. Use feature extraction (PCA) and a SAS Code node (see program below),
    2. or use a SAS Code node on all the numeric variables.
  3. But can I change the prior (the correct event rate)?

Or should I apply the SMOTE method before SAS Model Studio?
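
For context, the prior adjustment asked about here is commonly written as a rescaling of the predicted probability. The following is only a sketch in assumed notation (none of these symbols come from the SAS documentation): $\pi_1$ is the true event rate, $\rho_1$ is the event rate in the oversampled training data, and $p^{*}$ is the event probability predicted by a model trained on the oversampled data.

```latex
\[
p = \frac{p^{*}\,\dfrac{\pi_1}{\rho_1}}
         {p^{*}\,\dfrac{\pi_1}{\rho_1} + (1 - p^{*})\,\dfrac{1 - \pi_1}{1 - \rho_1}}
\]
```

With hypothetical values $\pi_1 = 0.02$, $\rho_1 = 0.20$ and $p^{*} = 0.5$, this gives $p = 0.05 / 0.6625 \approx 0.075$.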

 

Below is what I was using in SAS Enterprise Miner. I have made the program a little more SAS Model Studio proof (not ready yet). I probably have to change PROC MODECLUS if I want to execute it in CAS. Until now we have not been using really big data.
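
The interpolation step in the program below follows the usual SMOTE rule, written here in assumed notation: $x_c$ is a rare case (center_id), $x_{nb}$ is one of its nearest neighbors (nbor), and $r_j$ is a uniform random number drawn separately for each input variable $j$, since _R_ is evaluated once per row of the transposed table.

```latex
\[
x^{\text{new}}_{j} = x_{c,j} + r_j \,\bigl(x_{nb,j} - x_{c,j}\bigr),
\qquad r_j \sim \mathrm{Uniform}(0, 1)
\]
```

Each synthetic value therefore lies between the two original values of that variable.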

 

%let THRESHOLD = 5;

options casdatalimit=50G;

ods _all_ close;
proc modeclus
data = &dm_data.(where=(%dm_dec_target = '1'))
cluster = _SEGMENT_
dk = 10 /*get 10 nearest neighbors */
neighbor out = _modeclus_out_;
var %dm_interval_input; /*Specify the input numeric variables here*/
id &id; /*Specify the unique id variable of each observation*/
ods output Neighbor = _nTest_; /*output the nearest neighbors' id along with their distance*/
run;
ods listing;

/* List each pair of a rare case and its 10 nearest neighbors*/
data _neighbor_id_ (keep = center_id nbor index);
set _ntest_ ( where= (distance < &THRESHOLD ) ); /*Optional: set a distance threshold*/
retain center_id '000000000000'; /*carried forward until the next non-missing id*/
index = _N_ ; /*prepare id for the new cases*/
if not missing(id) then center_ID = put(id, $8.);
run;

/* transpose the raw data*/
proc transpose
data=&dm_data.(where=(%dm_dec_target = '1')) /*same rare cases as passed to proc modeclus*/
out = _raw1_
prefix = ID_;
var %dm_interval_input;
id &id;
run;

/*Generating random cases with look up table*/
data _NULL_;
set _neighbor_id_ end=eof;
/* queue one DATA step per (center, neighbor) pair, including the last pair */
call execute ( 'DATA _raw1_ ; ');
call execute ( 'set _raw1_ ;' ) ;
call execute ("_R_=rand('uniform');");
call execute ( 'new_'||trim(left((index)))|| ' = (id_'||trim(left((nbor)))||' - id_'||trim(left((center_id)))||') * _R_ ');
call execute ( ' + ID_'||trim(left((center_id)))||' ;');
call execute ('run;');

/* after the last pair, keep only the synthetic columns */
if eof then do;
call execute ('data _raw2_ (keep=_NAME_ new:); set _raw1_; run; ');
end;
run;

/* transpose the new cases back to the original layout*/
proc transpose
data = _raw2_
out = _new_cases_;
id _NAME_;
var _NUMERIC_;
run;

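proc sql;
create table &EM_EXPORT_TRAIN as
select * from &dm_data
/* align columns by name instead of by position */
outer union corr
/* character '1' matches the target = '1' filter used above */
select *, '1' as %dm_dec_target from _new_cases_(rename = (_NAME_ = &id))
;quit;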


proc datasets lib = work nolist;
delete _:; /*remove all temporary tables whose names start with an underscore*/
run;
quit;


proc sql;
select %dm_dec_target label = "TARGET_LEVEL",
(select count(*) from &EM_IMPORT_DATA t1 where t1.%dm_dec_target = t3.%dm_dec_target) as input_count label = "INPUT_COUNT",
(select count(*) / (select count(*) from &EM_IMPORT_DATA) from &EM_IMPORT_DATA t1 where t1.%dm_dec_target = t3.%dm_dec_target)
as input_ratio format percent15.2 label = "INPUT_RATIO",
count(*) as sampled_count label = "SAMPLED_COUNT",
count(*) / (select count(*) from &EM_EXPORT_TRAIN)
as sampled_ratio format percent15.2 label = "SAMPLED_RATIO",
count(*) / (select count(*) from &EM_IMPORT_DATA t2 where t2.%dm_dec_target = t3.%dm_dec_target)
as ratio_of_change format percent15.2 label = "PERCENT CHANGE"
from &EM_EXPORT_TRAIN t3
group by %dm_dec_target
;quit;

2 REPLIES
Niranjan_18
SAS Employee

Hi,

 

You don't have to replace PROC MODECLUS; you can use it as is. As you mentioned, you need to change how you refer to the input and output data sets and variables within pipelines, using macro references such as &dm_data and %dm_interval_input.

 

After the final step, you need to use the %dmcas_register and %dmcas_metachange macros for CAS registration and the metadata update.

 

Thanks,

Niranjan 

Hansdewit
Obsidian | Level 7

There is not much information available for the macro &dm_data_targetInfo. Should I change this?