01-27-2017 11:33 AM
Hello, I am a SAS novice and learning SAS on my own. Here is what I'm trying to do:
1. Import my data from Excel and split all the data (500 values) into 300 values for "Training" and 200 values for "Testing" - done
2. Run proc univariate and proc reg on "Training" to obtain the ^y regression equation - done
3. Run proc reg on "Testing" to obtain the predicted ^y - done
4. Use the "Training" ^y regression equation and its betas to calculate the real ^y for the 200 "Testing" values
5. Compare the MSE of the predicted ^y and the real ^y
I have steps 1 to 3 done, but I don't know how to do steps 4 and 5, especially step 4.
This is my code at the moment:
dbms=xls out=Data replace;
proc surveyselect data=Data (firstobs=1 obs=300) n=300
out=DataTrain outall method=seq; run;
proc surveyselect data=Data (firstobs=301 obs=506) n=206
out=DataTest outall method=seq; run;
proc univariate data=DataTrain plot;
run;
data Mod_DataTrain; set DataTrain; run;
proc reg data=Mod_DataTrain;
model LY = v1 v2 v3 / tol vif collin;
run; quit;
data Mod_DataTest; set DataTest; run;
proc reg data=Mod_DataTest;
model PredictedY = v1 v2 v3 / tol vif collin;
run; quit;
If someone could help me, I would appreciate it.
01-27-2017 01:11 PM
It sounds like you want to use the equation estimated on one set of data to predict values in another. This is often referred to as scoring.
Using proc reg, you want to create an OUTEST= parameter data set. Proc Score with the TYPE=PARMS option will then apply that data set to your test data to obtain the ^y, which you can then summarize.
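A minimal sketch of that approach, assuming the response LY is present in both DataTrain and DataTest and that v1-v3 are the predictors from the question (the data set names RegEst, Scored and ScoredErr and the model label LYhat are illustrative). As far as I know, PROC SCORE names the new predicted-value variable after the MODEL statement label, which is why a label is used here:

```sas
/* Fit on the training rows only and save the coefficients.
   The label LYhat: becomes the _MODEL_ value in RegEst. */
proc reg data=DataTrain outest=RegEst noprint;
  LYhat: model LY = v1 v2 v3;
run;
quit;

/* Apply the training coefficients to the test rows.
   TYPE=PARMS tells PROC SCORE to use the _TYPE_='PARMS' row of RegEst;
   the predicted values should appear in a variable named LYhat. */
proc score data=DataTest score=RegEst type=parms out=Scored;
  var v1 v2 v3;
run;

/* Squared prediction error for each test row */
data ScoredErr;
  set Scored;
  SqErr = (LY - LYhat)**2;
run;

/* The mean of SqErr is the test-set MSE */
proc means data=ScoredErr n mean;
  var SqErr;
run;
```

The mean of SqErr is the sum of squared errors divided by n. The Mean Square Error and Root MSE that proc reg prints for the training fit divide by the error degrees of freedom (n minus the number of parameters) instead, so keep that in mind when comparing the two.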
Or you can look at the parameters and write an equation in a data step to do the scoring and create the ^y.
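For the data step route, a sketch under the same assumptions (LY and v1-v3 present in DataTest; RegCoef and ScoredByHand are hypothetical names). It reads the one-row OUTEST= data set once and applies the coefficients to every test row:

```sas
/* Save the training coefficients (one row with _TYPE_='PARMS'). */
proc reg data=DataTrain outest=RegCoef noprint;
  model LY = v1 v2 v3;
run;
quit;

/* Apply the training equation by hand. The coefficient row is read once;
   values read with SET are retained, so b0-b3 stay set for every test row.
   RENAME= keeps the coefficients from overwriting v1-v3 of DataTest
   (KEEP= is applied before RENAME=, so it uses the original names). */
data ScoredByHand;
  if _n_ = 1 then set RegCoef(keep=Intercept v1 v2 v3
                              rename=(Intercept=b0 v1=b1 v2=b2 v3=b3));
  set DataTest;
  LYhat = b0 + b1*v1 + b2*v2 + b3*v3;  /* training equation on test data */
  SqErr = (LY - LYhat)**2;             /* squared prediction error */
  drop b0-b3;
run;

/* Test-set MSE = mean of the squared errors */
proc means data=ScoredByHand n mean;
  var SqErr;
run;
```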
01-27-2017 05:46 PM
Also, does proc surveyselect split the data randomly?
If you use the right options, you can get a selected/not-selected flag, and you could perhaps check that.
01-28-2017 03:01 PM
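Thanks @ballardw for the explanation. What I meant is that the way proc surveyselect is used here does not produce random training/validation data sets: because the FIRSTOBS=/OBS= options decide which rows go into each set, the first 300 rows always become the training data and the remaining rows the test data.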
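One way to get a random split with a selected/not-selected flag, as a sketch: METHOD=SRS draws a simple random sample of 300 rows from the whole Data set, OUTALL keeps every row and adds the Selected variable (1 = in the sample, 0 = not), and SEED= makes the split reproducible. The seed value and the DataSplit name are arbitrary choices.

```sas
/* Simple random sample of 300 rows; OUTALL keeps all rows plus a Selected flag */
proc surveyselect data=Data out=DataSplit outall
                  method=srs n=300 seed=20170128;
run;

/* Route sampled rows to training and the rest to testing */
data DataTrain DataTest;
  set DataSplit;
  if Selected = 1 then output DataTrain;
  else output DataTest;
  drop Selected;
run;
```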