Fluorite | Level 6

## SAS splitting

I wrote the code below to solve this problem, but it doesn't work. Can anyone help?

Consider the gasoline mileage data. Split the data into estimation and prediction sets.
a. Evaluate the statistical properties of these data sets.
b. Fit a model involving x1 and x6 to the estimation data. Do the coefficients and fitted values from this model seem reasonable?
c. Use this model to predict the observations in the prediction data set. What is your evaluation of this model's predictive performance?

Title1 'problem 2';
data math;
input x1 x2 x3 x4 x5 x6 x7 x8 x9 y;
cards;

350 170 275 8.5 2.56 199.6 72.9 3860 1 17
250 105 185 8.25 2.73 196.7 72.2 3510 1 20
351 143 255 8 3 199.9 74 3890 1 18.25
231 110 175 8 2.56 179.3 65.4 3020 1 22.12
262 110 200 8.5 2.56 179.3 65.4 3180 1 21.47
89.7 70 81 8.2 3.9 155.7 64 1905 0 34.7
96.9 75 83 9 4.3 165.2 65 2320 0 30.4
350 155 250 8.5 3.08 195.4 74.4 3885 1 16.5
85.3 80 83 8.5 3.89 160.6 62.2 2009 0 36.5
171 109 146 8.2 3.22 170.4 66.9 2655 0 21.5
258 110 195 8 3.08 171.5 77 3375 1 19.7
140 83 109 8.4 3.4 168.8 69.4 2700 0 20.3
302 129 220 8 3 199.9 74 3890 1 17.8
500 190 360 8.5 2.73 224.1 79.8 5290 1 14.39
440 215 330 8.2 2.71 231 79.7 5185 1 14.89
350 155 250 8.5 3.08 196.7 72.2 3910 1 17.8
318 145 255 8.5 2.45 197.6 71 3660 1 16.41
231 110 175 8 2.56 179.3 65.4 3050 1 23.54
360 180 290 8.4 2.45 214.2 76.3 4250 1 21.47
96.9 75 83 9 4.3 165.2 61.8 2275 0 31.9
460 223 366 8 3 228 79.8 5430 1 13.27
133.6 96 120 8.4 3.91 171.5 63.4 2535 0 23.9
318 140 255 8.5 2.71 215.3 76.3 4370 1 19.73
351 148 243 8 3.25 215.5 78.5 4540 1 13.9
351 148 243 8 3.26 216.1 78.5 4715 1 13.27
360 195 295 8.25 3.15 209.3 77.4 4215 1 13.77
360 165 255 8.5 2.73 185.2 69 3660 1 16.5
;
run;

proc surveyselect data=math outall out=split
samprate=0.7 seed=90284098 method=SRS;
run;

proc freq data=split;
table selected;
run;

data estimation prediction;
set split;
if selected=1 then output estimation;
else output prediction;
drop selected;
run;

proc reg data=estimation;
model y=x1 x6;
run;

Diamond | Level 26

## Re: SAS splitting

Since this seems like a homework assignment, I will outline the method, but I'm not going to write the code for you.

1. Every observation goes in one data set, with a flag like the one you have called SELECTED
2. Create a new Y variable called Y2 that is identical to Y, except when SELECTED=0 then Y2 is missing
3. Fit the model to Y2 and obtain predicted values
4. Compare the predicted values to Y (not Y2) and compute residuals and other summary statistics for training and validation.
--
Paige Miller
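
For readers following along, the four steps outlined above might look roughly like the sketch below. It is not code from the reply itself. It assumes the SPLIT data set created by PROC SURVEYSELECT with OUTALL in the original post, where SELECTED=1 marks the estimation rows. The names SPLIT2, SCORED, YHAT, RESID, SQ_ERR and ABS_ERR are made up for illustration.

```sas
/* Steps 1-2: keep every observation in one data set and copy Y to Y2,
   setting Y2 to missing for the prediction rows (SELECTED = 0).
   SPLIT2 and Y2 are illustrative names. */
data split2;
  set split;
  y2 = y;
  if selected = 0 then y2 = .;
run;

/* Step 3: rows with a missing response are left out of the fit, but
   PROC REG still writes a predicted value for them to the OUTPUT data
   set as long as x1 and x6 are not missing. */
proc reg data=split2;
  model y2 = x1 x6;
  output out=scored p=yhat;
run;
quit;

/* Step 4: compare the predictions to the original Y, not Y2. */
data scored;
  set scored;
  resid   = y - yhat;
  sq_err  = resid**2;
  abs_err = abs(resid);
run;

/* Summaries by group: SELECTED = 1 is estimation, 0 is prediction. */
proc means data=scored n mean;
  class selected;
  var resid sq_err abs_err;
run;
```

If the mean squared error for SELECTED=0 is much larger than for SELECTED=1, that would suggest the model predicts new observations less well than it fits the estimation data.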
Super User

## Re: SAS splitting

"Doesn't work" is awfully vague.

Are there errors in the log? Post the code and log in a code box opened with the {i} icon to maintain formatting of error messages.