BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mspak
Quartz | Level 8

Good days to all;

I wish to generate the predicted values and residual values by running an Ordinary Least Square (OLS). I used the following program:

Proc reg data=GEOG.USA;

Model TA=_ASSET CHANGE_SALES PPE_AT/p r;

Id FNA AEM;

output out=PRED_USA p = predicted_y r=residual_x;

Run;

Quit;

However, I would like to provide an additonal rules on the above regressions:

I have the other variables such as SIC (Industry Classification Code) and fyear (financial year). I wish to estimate the above predicted values and residual values based on the above cross-sectional model for each SIC code for each year (ie. SIC-fyear).

How should I modify the above program to incorporate the additonal criteria?

Thanks for any helps.

Regards,

mspak

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

It looks like you need a BY statement in proc reg.

by SIC   fyear ;

Ksharp

View solution in original post

5 REPLIES 5
Ksharp
Super User

It looks like you need a BY statement in proc reg.

by SIC   fyear ;

Ksharp

mspak
Quartz | Level 8

Thanks Ksharp,

It took a longer time to process the the program than the previous one Smiley Happy

Regards,

mspak

Prometheus
Calcite | Level 5

Hi Xia Keshan, (Or anyone else reading this!)

I have a very similar question with one caveat -- each cross-sectional regression needs a minimum of 20 observations beginning at the 2-digit SIC level. For 2-digit industry groups with less than 20 observations, I need to form industry groups at the 1-digit SIC level.  Can this be accomplished directly in the proc reg statement or does it need to be separately identified?

My thoughts are to create a table as such:

proc sql;

create table mywork.comp_yi2 as

   select

    fyear,

  sich

    substrn(sich,1,2) as sich_two,

  (count(*) GE 20) as has_twenty 

   from mywork.compf

   group by fyear, substrn(SIC,1,2)

   having (has_twenty=1);

quit;

And then running a pair of proc reg statements e.g.

For 2-digit groups having 20 observations:

proc reg noprint data=&in_ds outest=oreg_ds1 edf;

  where  substrn(SIC,1,2) in mywork.comp_yi2

  model IV=DVs;

  by fyear; substrn(SIC,1,2)

run;

For The 1 digit SIC level:

proc reg noprint data=&in_ds outest=oreg_ds2 edf;

  where  not substrn(SIC,1,2) in mywork.comp_yi2

  model IV=DVs;

  by fyear; substrn(SIC,1,1)

run;

And then mashing the two together:

proc datasets lib=work;

  append base=oreg_ds1 data=oreg_ds2;

run;

Also -- will having the by clause mean the outtest data sets will aggregate all the cross sectional regressions, or do I need to write an iterative macro to accomplish this?

Best,

Prometheus

Ksharp
Super User

1)for beginning at the 2-digit SIC level .


proc reg noprint data=&in_ds outest=oreg_ds1 edf;

  model IV=DVs;

  by fyear SIC ;

format SIC $2,

run;

2)for beginning at the 1-digit SIC level .

proc reg noprint data=&in_ds outest=oreg_ds1 edf;

  model IV=DVs;

  by fyear SIC ;

format SIC $1,

run;

" For 2-digit industry groups with less than 20 observations, I need to form industry groups at the 1-digit SIC level."

I don't think it is easy to done . you need another data step to get it . E.X.

data have;

do i=1 to 9;

  sic='abcdse';output;

end;

do i=1 to 8;

  sic='axabd';output;

end;

do i=1 to 24;

  sic='oifdfe';output;

end;

do i=1 to 32;

  sic='ogier';output;

end;

run;

proc sql;

create table want(drop=n _sic) as

select *,case when n lt 20 then substr(sic,1,1) else _sic end as flag length=2

  from (

  select *,substr(sic,1,2) as _sic length=2,count(*) as n

   from have

    group by calculated _sic);

quit;

Xia Keshan

HannahEH
Calcite | Level 5

Hey!  I had the same task as mspak.  When I used the solution in the post, I was expecting to get an output that would just show residuals for each year within each industry.  Instead, SAS calculated a residual for each observation.  Is this correct?  

For the ultimate variable I need to calculate, I need to calculate the standard deviation of my sample firms' residual compared to the corresponding industry residual by year based on all the firms in Compustat.  Thus, I need an output from my Compustat data that a residual per each year per each industry.  Based on the SAS output this post's solution returned, I have averaged the residuals per industry per year and will use this mean residual to calculate the standard deviation I need.  Is this correct, or does anyone have a better solution?   Thanks in advance! 

 

proc reg data = Reg_Variables 

noprint;

by industry fyear;

model chWC_TA=lagCFO_TA CFO_TA leadCFO_TA chREV_TA chPPE_TA/r;

output out=Residuals r=residual_x;

Run;

quit

 

proc sql; create table Mean

  as select Industry, fyear, mean(residual_x) as ind_residual

  from Residuals

  group by Industry, fyear;

quit;

 

proc sort noduplicates;

by Industry fyear;

where ind_residual ne 0

run;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 2520 views
  • 0 likes
  • 4 in conversation