Out of sample predictions with PROC GLM


02-18-2014 10:04 AM

Hi there!

I'm fairly new to SAS and I'm trying to run some regressions using proc glm in Enterprise Guide.

I want to run a basic OLS linear regression. The reason I'm using proc glm instead of proc reg is so that I can use class variables. I read that proc reg does not support this.

Say I have a sample with 2000 observations, and I want to estimate coefficients for all the independent variables. So far I'm all good with the following lines of code:

------------------

proc glm data=WORK.INPUT PLOTS=ALL;

where group=1 AND NB=0;

class C D;

model X = A B C D / solution;

output out=WORK.TEST p=yhat r=resid;

run;

-------------

Now I have another dataset with an additional 20,000 observations. They all include the independent variables A - D, but lack the dependent variable X.

How do I predict X in this dataset, using the coefficients from the above stated regression?

Accepted Solutions

Solution

02-19-2014 10:08 AM

PG's example is known as the "missing response trick"; see "The missing value trick for scoring a regression model" on The DO Loop.
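As a rough illustration of the missing response trick applied to the question's model, here is a minimal, untested sketch. It assumes the new observations live in a data set called WORK.ADDITIONAL (the name used in PG's reply); WORK.COMBINED, WORK.PRED and the flag variable toScore are hypothetical names introduced only for this example.

```sas
/* Stack the estimation sample and the rows to be scored.            */
/* WORK.ADDITIONAL, WORK.COMBINED and toScore are assumed names.     */
data WORK.COMBINED;
   set WORK.INPUT(in=inFit where=(group=1 and NB=0))
       WORK.ADDITIONAL;
   toScore = not inFit;        /* 1 for rows that only need a prediction */
   if toScore then X = .;      /* force a missing response for those rows */
run;

/* Rows with a missing response are left out of the fit, but they    */
/* still receive a predicted value in the OUTPUT data set.           */
proc glm data=WORK.COMBINED;
   class C D;
   model X = A B C D / solution;
   output out=WORK.PRED(where=(toScore)) p=yhat;
run;
quit;
```

The WHERE= option on the output data set keeps only the scored rows, so WORK.PRED should contain one predicted value per observation from WORK.ADDITIONAL.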

For other ways to score a data set, see "Techniques for scoring a regression model in SAS" on The DO Loop.

For your example, I'd use the STORE statement followed by the PLM procedure.
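A minimal sketch of what the STORE statement followed by PROC PLM could look like for the model in the question. It is untested here and rests on assumptions: WORK.GLMMODEL is a hypothetical item-store name, WORK.ADDITIONAL is the assumed name of the data set with the 20,000 new observations, and WORK.SCORED is a hypothetical output name.

```sas
/* Fit the model on the estimation sample and save it to an item store. */
/* WORK.GLMMODEL is a hypothetical name chosen for this example.        */
proc glm data=WORK.INPUT;
   where group=1 and NB=0;
   class C D;
   model X = A B C D / solution;
   store WORK.GLMMODEL;
run;
quit;

/* Restore the stored model and score the new observations.             */
/* WORK.ADDITIONAL and WORK.SCORED are assumed data set names.          */
proc plm restore=WORK.GLMMODEL;
   score data=WORK.ADDITIONAL out=WORK.SCORED predicted=yhat;
run;
```

With this approach the model is fitted once, and the item store can be reused to score further data sets without rerunning the regression or appending the new rows to the estimation sample.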

All Replies


02-18-2014 11:03 AM

One way is to append your additional observations to your input dataset and give them a frequency of zero. That way, even if they included dependent values, the additional observations would be excluded from the regression.

**data FULL / view=FULL;**

**set INPUT (in=inInput) ADDITIONAL;**

**where group=1 AND NB=0;**

**freq = inInput;**

**run;**

**proc glm data=FULL PLOTS=ALL;**

**class C D;**

**freq freq;**

**model X= A B C D/ solution;**

**output out=TEST(where=(not freq)) p=yhat r=resid;**

**run;**

(Untested)

PG

03-07-2014 06:55 AM

Thank you both!

The STORE statement followed by the PLM procedure is exactly what I was looking for. I found your blog post very useful, Rick - thanks again!