## Predicted value and residual value for each SIC-Year

Solved
Regular Contributor
Posts: 162

# Predicted value and residual value for each SIC-Year

Good days to all;

I wish to generate the predicted values and residual values by running an Ordinary Least Square (OLS). I used the following program:

Proc reg data=GEOG.USA;

Model TA=_ASSET CHANGE_SALES PPE_AT/p r;

Id FNA AEM;

output out=PRED_USA p = predicted_y r=residual_x;

Run;

Quit;

However, I would like to provide an additonal rules on the above regressions:

I have the other variables such as SIC (Industry Classification Code) and fyear (financial year). I wish to estimate the above predicted values and residual values based on the above cross-sectional model for each SIC code for each year (ie. SIC-fyear).

How should I modify the above program to incorporate the additonal criteria?

Thanks for any helps.

Regards,

mspak

Accepted Solutions
Solution
‎03-19-2012 03:37 AM
Super User
Posts: 9,766

## Predicted value and residual value for each SIC-Year

It looks like you need a BY statement in proc reg.

by SIC   fyear ;

Ksharp

All Replies
Solution
‎03-19-2012 03:37 AM
Super User
Posts: 9,766

## Predicted value and residual value for each SIC-Year

It looks like you need a BY statement in proc reg.

by SIC   fyear ;

Ksharp

Regular Contributor
Posts: 162

## Predicted value and residual value for each SIC-Year

Thanks Ksharp,

It took a longer time to process the the program than the previous one

Regards,

mspak

New Contributor
Posts: 2

## Re: Predicted value and residual value for each SIC-Year

Hi Xia Keshan, (Or anyone else reading this!)

I have a very similar question with one caveat -- each cross-sectional regression needs a minimum of 20 observations beginning at the 2-digit SIC level. For 2-digit industry groups with less than 20 observations, I need to form industry groups at the 1-digit SIC level.  Can this be accomplished directly in the proc reg statement or does it need to be separately identified?

My thoughts are to create a table as such:

proc sql;

create table mywork.comp_yi2 as

select

fyear,

sich

substrn(sich,1,2) as sich_two,

(count(*) GE 20) as has_twenty

from mywork.compf

group by fyear, substrn(SIC,1,2)

having (has_twenty=1);

quit;

And then running a pair of proc reg statements e.g.

For 2-digit groups having 20 observations:

proc reg noprint data=&in_ds outest=oreg_ds1 edf;

where  substrn(SIC,1,2) in mywork.comp_yi2

model IV=DVs;

by fyear; substrn(SIC,1,2)

run;

For The 1 digit SIC level:

proc reg noprint data=&in_ds outest=oreg_ds2 edf;

where  not substrn(SIC,1,2) in mywork.comp_yi2

model IV=DVs;

by fyear; substrn(SIC,1,1)

run;

And then mashing the two together:

proc datasets lib=work;

append base=oreg_ds1 data=oreg_ds2;

run;

Also -- will having the by clause mean the outtest data sets will aggregate all the cross sectional regressions, or do I need to write an iterative macro to accomplish this?

Best,

Prometheus

Super User
Posts: 9,766

## Re: Predicted value and residual value for each SIC-Year

1)for beginning at the 2-digit SIC level .

proc reg noprint data=&in_ds outest=oreg_ds1 edf;

model IV=DVs;

by fyear SIC ;

format SIC \$2,

run;

2)for beginning at the 1-digit SIC level .

proc reg noprint data=&in_ds outest=oreg_ds1 edf;

model IV=DVs;

by fyear SIC ;

format SIC \$1,

run;

" For 2-digit industry groups with less than 20 observations, I need to form industry groups at the 1-digit SIC level."

I don't think it is easy to done . you need another data step to get it . E.X.

data have;

do i=1 to 9;

sic='abcdse';output;

end;

do i=1 to 8;

sic='axabd';output;

end;

do i=1 to 24;

sic='oifdfe';output;

end;

do i=1 to 32;

sic='ogier';output;

end;

run;

proc sql;

create table want(drop=n _sic) as

select *,case when n lt 20 then substr(sic,1,1) else _sic end as flag length=2

from (

select *,substr(sic,1,2) as _sic length=2,count(*) as n

from have

group by calculated _sic);

quit;

Xia Keshan

Frequent Learner
Posts: 1

## Re: Predicted value and residual value for each SIC-Year

Hey!  I had the same task as mspak.  When I used the solution in the post, I was expecting to get an output that would just show residuals for each year within each industry.  Instead, SAS calculated a residual for each observation.  Is this correct?

For the ultimate variable I need to calculate, I need to calculate the standard deviation of my sample firms' residual compared to the corresponding industry residual by year based on all the firms in Compustat.  Thus, I need an output from my Compustat data that a residual per each year per each industry.  Based on the SAS output this post's solution returned, I have averaged the residuals per industry per year and will use this mean residual to calculate the standard deviation I need.  Is this correct, or does anyone have a better solution?   Thanks in advance!

proc reg data = Reg_Variables

noprint;

by industry fyear;

model chWC_TA=lagCFO_TA CFO_TA leadCFO_TA chREV_TA chPPE_TA/r;

output out=Residuals r=residual_x;

Run;

quit

proc sql; create table Mean

as select Industry, fyear, mean(residual_x) as ind_residual

from Residuals

group by Industry, fyear;

quit;

proc sort noduplicates;

by Industry fyear;

where ind_residual ne 0

run;

☑ This topic is solved.