viva0521
Fluorite | Level 6

Hi there,

I am a SAS newbie working on some simple linear regression in SAS Enterprise Miner. How can I output the ASE for the training, validation, and test partitions separately?

My process flow looks like this:

File import ----> data partition (training, validation, test) ----> code node.

The code in the code node is:

ods trace on;

proc glmselect DATA=&EM_IMPORT_DATA;
   effect MyPoly = polynomial(A B C/degree=4);
   model Y = MyPoly;
run;

ods trace off;

Accepted Solutions
M_Maldonado
Barite | Level 11

OK, it took some googling, but I got this working.

You have a couple of options.

1) Pass the train, validation, and test sets using the macro variables so that a Model Comparison node can pick up the partition and calculate the stats.

2) Take advantage of the procedure's own syntax, here the PARTITION statement of proc glmselect. I think this is what you were trying to do:

[Image: code your own glmselect flow.png]

1. Add a data set. In my example I used German Credit from F1->Generate Sample Data Sources

2. Add a Partition node

3. Add a SAS code node with the code below. Replace the target (response), here amount, and the inputs (effects), here duration, checking, and savings, with your own.

/* Stack the three partitions and record which one each row came from */
data mydata;
   length _partition $6;
   set &EM_IMPORT_DATA(in=a) &EM_IMPORT_VALIDATE(in=b) &EM_IMPORT_TEST(in=c);
   if a then _partition="_Train";
   else if b then _partition="_Valid";
   else if c then _partition="_Test";
run;

/* The PARTITION statement maps each _partition value to a role */
proc glmselect DATA=mydata;
   effect MyPoly = polynomial(duration checking savings/degree=4);
   model amount = MyPoly;
   partition rolevar=_partition(TEST='_Test' TRAIN='_Train' VALIDATE='_Valid');
run;

4. Run.

The output results will give you the ASE of training/validation/testing.
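If the ASE values are needed as a data set rather than only in the printed results, one possible route is ODS OUTPUT. This is a minimal sketch, assuming the combined data set mydata built in step 3 and assuming the fit statistics table is named FitStatistics (running the step with ods trace on, as in the question, shows the actual table names in your release). The data set name work.fitstats is made up for illustration.

```sas
/* Ask ODS to save the fit statistics table (assumed name: FitStatistics) */
/* into a data set. work.fitstats is a hypothetical name.                 */
ods output FitStatistics=work.fitstats;

proc glmselect data=mydata;
   effect MyPoly = polynomial(duration checking savings / degree=4);
   model amount = MyPoly;
   partition rolevar=_partition(test='_Test' train='_Train' validate='_Valid');
run;

/* Inspect the captured table; the ASE rows should appear among the statistics */
proc print data=work.fitstats noobs;
run;
```

Keeping the statistics in a data set makes it easier to compare several polynomial degrees in later steps.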

[Image: sas code glmselect output.png]

This model isn't fabulous for this data set but hopefully this approach will give you good results on yours!

Good luck!

-m

Good reference: SAS/STAT(R) User's Guide, PROC GLMSELECT, PARTITION statement

PS: If you try the other approach I described, you can easily use a Model Comparison node to compare against HPGLM or any of the other model nodes in Enterprise Miner. I highly recommend giving it a try!
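For option 1, a rough sketch of what passing the partitions through the macro variables could look like is given here. It assumes the SCORE statement of proc glmselect is used to write predictions into the export data sets, and that the names P_amount and R_amount are what downstream nodes expect. Both are assumptions, not something confirmed in this thread, and the Model Comparison node may need additional node settings that are not shown.

```sas
/* Sketch only: fit on the training partition, then score each partition */
/* into the matching Enterprise Miner export data set.                   */
proc glmselect data=&EM_IMPORT_DATA
               valdata=&EM_IMPORT_VALIDATE
               testdata=&EM_IMPORT_TEST;
   effect MyPoly = polynomial(duration checking savings / degree=4);
   model amount = MyPoly;

   /* P_amount and R_amount are assumed names for prediction and residual */
   score data=&EM_IMPORT_DATA     out=&EM_EXPORT_TRAIN
         predicted=P_amount residual=R_amount;
   score data=&EM_IMPORT_VALIDATE out=&EM_EXPORT_VALIDATE
         predicted=P_amount residual=R_amount;
   score data=&EM_IMPORT_TEST     out=&EM_EXPORT_TEST
         predicted=P_amount residual=R_amount;
run;
```

Here VALDATA= and TESTDATA= take the place of the PARTITION statement, since the three partitions arrive as separate data sets.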


3 REPLIES
M_Maldonado
Barite | Level 11

Hi,

You are doing some advanced stuff!

If you have a recent Enterprise Miner version, the easiest route is to build your model with the HPGLM node and then add a Model Comparison node.

To code your own proc in a SAS Code node, you need to use some macro variables so that the Model Comparison node catches your partitions correctly. You are on the right track! In addition to &em_import_data we need the corresponding &em_import_validate, &em_export_validate, &em_export_train, etc.

Try HPGLM node while someone posts a workaround to use proc glmselect on a SAS Code node.

I hope it helps!

-Miguel

viva0521
Fluorite | Level 6

Hi,

Thanks for your reply. I don't use the HPGLM node because it limits the polynomial degree to 3, and I need a slightly higher degree than that.

Indeed, I need these variables: &em_import_validate, &em_export_validate, and &em_export_train. However, I don't know what my equation looks like before I run my model. For instance, I fit a nonlinear model as follows and calculated the ASE:

...

model Yt_1 = Y / (a + b * Y)

...

data EM_IMPORT_VALIDATE_est;
   set &EM_IMPORT_VALIDATE. ;
   _res2 = (Y1- (Y / (aa + bb *Y) ) )**2;
run;

proc means data=EM_IMPORT_VALIDATE_est noprint;
   var _res2;
   output out=&EM_EXPORT_VALIDATE(drop=_:) n=validate_n sum=validate_sse;
run;

validate_ase=validate_sse/(validate_n-2);

This way I can calculate the ASE for the validation partition. My problems are:

1. If I don't know my prediction equation ahead of time, how can I write code to calculate the ASE?

2. Without code to calculate the ASE per partition, I can still calculate the overall ASE for the entire data set from the fitted regression equation. But how can I tell which portion of the data set was used for training and which for validation?

Thanks.
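One way to approach both questions without writing the equation by hand is to let the procedure produce the residuals and then average their squares within each partition. The sketch below assumes the combined data set mydata with its _partition column from the accepted solution, and it takes ASE to mean the sum of squared errors divided by the number of observations, with no degrees-of-freedom correction. The names work.scored, work.sq_err, p_amount, r_amount, and sq_err are made up for illustration.

```sas
/* Fit the model and keep predictions and residuals for every row */
proc glmselect data=mydata;
   effect MyPoly = polynomial(duration checking savings / degree=4);
   model amount = MyPoly;
   partition rolevar=_partition(test='_Test' train='_Train' validate='_Valid');
   output out=work.scored predicted=p_amount residual=r_amount;
run;

/* Square each residual; rows with a missing response give a missing value */
data work.sq_err;
   set work.scored;
   sq_err = r_amount**2;
run;

/* The mean squared residual within each _partition value is that partition's ASE */
proc means data=work.sq_err n mean;
   class _partition;
   var sq_err;
run;
```

Because _partition travels with every row into work.scored, the same data set also shows which observations were used for training, validation, and testing.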
