How to output ASE for training, validation and tes...

06-22-2015 09:34 PM

Hi there,

I am a SAS newbie working on some simple linear regression in SAS Enterprise Miner. How can I output the ASE for the training, validation, and test partitions separately?

My process flow looks like this:

File Import ----> Data Partition (training, validation, test) ----> SAS Code node.

The code in the SAS Code node is:

ods trace on;

proc glmselect DATA=&EM_IMPORT_DATA;

effect MyPoly = polynomial(A B C/degree=4);

model Y = MyPoly;

run;

ods trace off;

Accepted Solutions

Posted in reply to viva0521

06-23-2015 04:21 PM

OK, it took some googling, but I got this working.

You have a couple of options.

1) Pass the training, validation, and test sets along using the macro variables so that a Model Comparison node can pick up the partitions and calculate the statistics.

2) Take advantage of the PROC GLMSELECT syntax itself. I think this is what you were trying to do:

1. Add a data set. In my example I used German Credit from F1->Generate Sample Data Sources.

2. Add a Data Partition node.

3. Add a SAS Code node with the code below. Replace amount with your own target (response), and duration, checking, and savings with your own inputs (effects).

data mydata;

set &EM_IMPORT_DATA(in=a) &EM_IMPORT_VALIDATE(in=b) &EM_IMPORT_TEST(in=c);

if a then _partition="_Train";

else if b then _partition="_Valid";

else if c then _partition="_Test";

run;

proc glmselect DATA=mydata;

effect MyPoly = polynomial(duration checking savings/degree=4);

model amount = MyPoly;

partition rolevar=_partition(TEST='_Test' TRAIN='_Train' VALIDATE='_Valid');

run;

4. Run.

The results output will report the ASE for the training, validation, and test partitions.
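If you want those numbers in a data set rather than only in the listing, one option is ODS OUTPUT. This is a minimal sketch, assuming mydata and _partition were built as in step 3 and that the fit statistics table is named FitStatistics (ods trace on, as in the original question, prints the table names to the log so you can check). The data set name work.fitstats is just a placeholder.

```sas
/* Sketch: capture the fit statistics of the selected model in a data set. */
/* Assumes mydata and _partition exist as built in step 3.                 */
/* work.fitstats is a hypothetical name; choose your own.                  */
ods output FitStatistics=work.fitstats;

proc glmselect data=mydata;
   effect MyPoly = polynomial(duration checking savings / degree=4);
   model amount = MyPoly;
   partition rolevar=_partition(test='_Test' train='_Train' validate='_Valid');
run;

/* Print the captured table; look for the ASE (Train), ASE (Validate), */
/* and ASE (Test) rows.                                                */
proc print data=work.fitstats noobs;
run;
```

Run proc contents on work.fitstats before filtering it, since the column names in the captured table are not guaranteed to match what you expect.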

This model isn't fabulous for this data set but hopefully this approach will give you good results on yours!

Good luck!

-m

Good reference: SAS/STAT(R) User's Guide, PROC GLMSELECT, PARTITION statement

PS If you try the other approach I described, you can easily use a Model Comparison node to compare with HPGLM or any of the other model nodes in Enterprise Miner. I highly recommend giving it a try!

All Replies

Posted in reply to viva0521

06-23-2015 01:23 PM

Hi,

You are doing some advanced stuff!

If you have a recent Enterprise Miner version, the easiest approach is to fit your model with the HPGLM node and then add a Model Comparison node.

To run your own procedure in a SAS Code node, you need to use some macro variables so that the Model Comparison node picks up your partitions correctly. You are on the right track! In addition to &em_import_data, we need the corresponding &em_import_validate, &em_export_validate, &em_export_train, etc.

Try the HPGLM node while someone posts a workaround for using PROC GLMSELECT in a SAS Code node.

I hope this helps!

-Miguel

Posted in reply to M_Maldonado

06-23-2015 03:20 PM

Hi,

Thanks for your reply. I don't use the HPGLM node because it limits the polynomial degree to 3, and I need a slightly higher degree than that.

Indeed, I need these variables: &em_import_validate, &em_export_validate, &em_export_train. However, I don't know what my equation looks like before I run my model. For instance, I fit a nonlinear model as follows and then calculated the ASE:

...

model Yt_1 = Y / (a + b * Y)

...

data EM_IMPORT_VALIDATE_est;

set &EM_IMPORT_VALIDATE. ;

_res2 = (Y1- (Y / (aa + bb *Y) ) )**2;

run;

proc means data=EM_IMPORT_VALIDATE_est noprint;

var _res2;

output out=&EM_EXPORT_VALIDATE(drop=_type_ _freq_) n=validate_n sum=validate_sse;

run;

validate_ase=validate_sse/(validate_n-2);

This way, I can calculate the ASE for the validation portion. My problems are:

1. If I don't know my prediction equation ahead of time, how can I write code to calculate the ASE?

2. Without code that calculates the ASE, I can still calculate the overall ASE for the entire data set by using the fitted regression equation. But how can I figure out which portion of the data set was used for training and which portion was used for validation?

Thanks.
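One way to approach both questions is to let the procedure score the rows itself, so the fitted equation never has to be typed in by hand, and to tag each row with the partition it came from. The sketch below assumes the Y, A, B, C variables from the original question and the three import macro variables set by the Data Partition node. The names work.combined, work.scored, work.ase_by_part, p_Y, r_Y, and sq_err are placeholders. It also assumes the OUTPUT statement produces residuals for the validation and test rows as well as the training rows, which is worth confirming on your own data. ASE is taken here as SSE divided by the number of observations, with no degrees-of-freedom correction.

```sas
/* Tag every row with the partition it was imported from. */
data work.combined;
   set &EM_IMPORT_DATA(in=a) &EM_IMPORT_VALIDATE(in=b) &EM_IMPORT_TEST(in=c);
   if a then _partition = "_Train";
   else if b then _partition = "_Valid";
   else if c then _partition = "_Test";
run;

/* Fit on the training rows and write predictions and residuals. */
proc glmselect data=work.combined;
   effect MyPoly = polynomial(A B C / degree=4);
   model Y = MyPoly;
   partition rolevar=_partition(test='_Test' train='_Train' validate='_Valid');
   output out=work.scored predicted=p_Y residual=r_Y;
run;

/* Square the residuals so they can be averaged. */
data work.scored;
   set work.scored;
   sq_err = r_Y**2;
run;

/* One row per partition: count, SSE, and ASE = SSE / n. */
proc means data=work.scored noprint nway;
   class _partition;
   var sq_err;
   output out=work.ase_by_part(drop=_type_ _freq_) n=n_obs sum=sse mean=ase;
run;

proc print data=work.ase_by_part noobs;
run;
```

Because _partition travels with each row into work.scored, the same data set also shows which observations were used for training and which for validation or testing.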
