Splitting, Training and Test


01-27-2017 11:33 AM

Hello, I am a SAS novice and learning SAS on my own. Here is what I'm trying to do:

1. Import my data from Excel and split all data (500 values) into 300 values for "Training" and 200 values for "Testing" - **done**

2. Run a proc univariate and a proc reg for "Training" to obtain the ^y regression equation - **done**

3. Run a proc reg for "Testing" to obtain the **predicted ^y** - **done**

4. Use the "Training" ^y regression equation with its betas to calculate the "Testing" **real ^y** with the 200 values for Testing

5. Compare MSE from the **predicted ^y** and **real ^y**

I have done steps 1 to 3, but I don't know how to do steps 4 and 5, especially step 4.

This is my code at the moment:

**STEP 1**

proc import datafile='C:\Data.xls' dbms=xls out=Data replace;

run;

/* rows 1 to 300 become the training set */

proc surveyselect data=Data (firstobs=1 obs=300) n=300 out=DataTrain outall method=seq;

run;

/* rows 301 to 506 become the test set */

proc surveyselect data=Data (firstobs=301 obs=506) n=206 out=DataTest outall method=seq;

run;

**STEP 2**

proc univariate data=DataTrain plot;

run;

/* transform the response before fitting */

data Mod_DataTrain; set DataTrain;

LY=-log(y);

run;

proc reg data=Mod_DataTrain;

model LY = v1 v2 v3 / tol vif collin;

plot r.*p.;

run;

**STEP 3**

/* same transformation as in STEP 2, applied to the test rows */

data Mod_DataTest; set DataTest;

PredictedY=-log(y);

run;

proc reg data=Mod_DataTest;

model PredictedY = v1 v2 v3 / tol vif collin;

plot r.*p.;

run;

If someone could help me, I would appreciate it.

Regards,


Posted in reply to Fer87

01-27-2017 01:11 PM

It sounds like you want to use the equation from one set of data on another. This is often referred to as scoring.

Using PROC REG, you want to create a parameter data set with the OUTEST= option.

PROC SCORE, run with the TYPE=PARMS option, will then use that data set with your data to obtain the ^y, which you then summarize.

Or you can look at the parameters and write an equation in a data step to do the scoring and create the ^y.
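A minimal sketch of the PROC SCORE route, assuming the data set and variable names from the original post (Mod_DataTrain, DataTest, y, v1, v2, v3). The names TrainEst, TestLY, TestScored, TestErr, SqErr and the model label LYhat are made up for illustration. The test response is transformed the same way as the training response so the two can be compared:

```sas
/* Fit on the training data and save the coefficients.
   The label LYhat (hypothetical) becomes the name of the score variable. */
proc reg data=Mod_DataTrain outest=TrainEst noprint;
   LYhat: model LY = v1 v2 v3;
run;
quit;

/* Apply the same transformation to the test rows (TestLY is a hypothetical name) */
data TestLY;
   set DataTest;
   LY = -log(y);
run;

/* Score the test rows with the training coefficients.
   Only the regressors are listed in VAR, so LYhat holds predicted values. */
proc score data=TestLY score=TrainEst out=TestScored type=parms;
   var v1 v2 v3;
run;

/* Squared error per test row */
data TestErr;
   set TestScored;
   SqErr = (LY - LYhat)**2;
run;

/* The mean of SqErr is the test-set MSE */
proc means data=TestErr n mean;
   var SqErr;
run;
```

The OUTEST= data set should also carry the training root MSE in the `_RMSE_` variable, so squaring it gives a training figure to set beside the test value. Keep in mind that it is computed with the error degrees of freedom as the divisor rather than the number of rows.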


Posted in reply to Fer87

01-27-2017 04:41 PM

Hi,

Also, does proc surveyselect split the data randomly?


Posted in reply to stat_sas

01-27-2017 05:46 PM

stat_sas wrote:

Hi,

Also, does proc surveyselect split the data randomly?

Yes.

If you use the right options, you can get a selected/not-selected flag in the output data set, and you could perhaps see the random selection there.
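For example, here is a sketch of a random split that uses that flag, assuming the imported data set is named Data as in the original post. METHOD=SRS, the seed value, and the name DataSplit are illustrative choices, not something taken from the thread:

```sas
/* Randomly pick 300 rows; OUTALL keeps every row and adds Selected (1 or 0) */
proc surveyselect data=Data out=DataSplit outall
                  method=srs n=300 seed=12345;
run;

/* Route the rows by the flag: selected rows train, the rest test */
data DataTrain DataTest;
   set DataSplit;
   if Selected = 1 then output DataTrain;
   else output DataTest;
run;
```

Fixing the seed makes the split reproducible from run to run.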


Posted in reply to ballardw

01-28-2017 03:01 PM

Thanks @ballardw for the explanation. I mean that the way proc surveyselect has been used here to split the data does not generate the training/validation data sets randomly.