Good days to all;
I wish to generate the predicted values and residual values by running an Ordinary Least Square (OLS). I used the following program:
Proc reg data=GEOG.USA;
Model TA=_ASSET CHANGE_SALES PPE_AT/p r;
Id FNA AEM;
output out=PRED_USA p = predicted_y r=residual_x;
Run;
Quit;
However, I would like to provide an additonal rules on the above regressions:
I have the other variables such as SIC (Industry Classification Code) and fyear (financial year). I wish to estimate the above predicted values and residual values based on the above cross-sectional model for each SIC code for each year (ie. SIC-fyear).
How should I modify the above program to incorporate the additonal criteria?
Thanks for any helps.
Regards,
mspak
It looks like you need a BY statement in proc reg.
by SIC fyear ;
Ksharp
It looks like you need a BY statement in proc reg.
by SIC fyear ;
Ksharp
Thanks Ksharp,
It took a longer time to process the the program than the previous one
Regards,
mspak
Hi Xia Keshan, (Or anyone else reading this!)
I have a very similar question with one caveat -- each cross-sectional regression needs a minimum of 20 observations beginning at the 2-digit SIC level. For 2-digit industry groups with less than 20 observations, I need to form industry groups at the 1-digit SIC level. Can this be accomplished directly in the proc reg statement or does it need to be separately identified?
My thoughts are to create a table as such:
proc sql;
create table mywork.comp_yi2 as
select
fyear,
sich
substrn(sich,1,2) as sich_two,
(count(*) GE 20) as has_twenty
from mywork.compf
group by fyear, substrn(SIC,1,2)
having (has_twenty=1);
quit;
And then running a pair of proc reg statements e.g.
For 2-digit groups having 20 observations:
proc reg noprint data=&in_ds outest=oreg_ds1 edf;
where substrn(SIC,1,2) in mywork.comp_yi2
model IV=DVs;
by fyear; substrn(SIC,1,2)
run;
For The 1 digit SIC level:
proc reg noprint data=&in_ds outest=oreg_ds2 edf;
where not substrn(SIC,1,2) in mywork.comp_yi2
model IV=DVs;
by fyear; substrn(SIC,1,1)
run;
And then mashing the two together:
proc datasets lib=work;
append base=oreg_ds1 data=oreg_ds2;
run;
Also -- will having the by clause mean the outtest data sets will aggregate all the cross sectional regressions, or do I need to write an iterative macro to accomplish this?
Best,
Prometheus
1)for beginning at the 2-digit SIC level .
proc reg noprint data=&in_ds outest=oreg_ds1 edf;
model IV=DVs;
by fyear SIC ;
format SIC $2,
run;
2)for beginning at the 1-digit SIC level .
proc reg noprint data=&in_ds outest=oreg_ds1 edf;
model IV=DVs;
by fyear SIC ;
format SIC $1,
run;
" For 2-digit industry groups with less than 20 observations, I need to form industry groups at the 1-digit SIC level."
I don't think it is easy to done . you need another data step to get it . E.X.
data have;
do i=1 to 9;
sic='abcdse';output;
end;
do i=1 to 8;
sic='axabd';output;
end;
do i=1 to 24;
sic='oifdfe';output;
end;
do i=1 to 32;
sic='ogier';output;
end;
run;
proc sql;
create table want(drop=n _sic) as
select *,case when n lt 20 then substr(sic,1,1) else _sic end as flag length=2
from (
select *,substr(sic,1,2) as _sic length=2,count(*) as n
from have
group by calculated _sic);
quit;
Xia Keshan
Hey! I had the same task as mspak. When I used the solution in the post, I was expecting to get an output that would just show residuals for each year within each industry. Instead, SAS calculated a residual for each observation. Is this correct?
For the ultimate variable I need to calculate, I need to calculate the standard deviation of my sample firms' residual compared to the corresponding industry residual by year based on all the firms in Compustat. Thus, I need an output from my Compustat data that a residual per each year per each industry. Based on the SAS output this post's solution returned, I have averaged the residuals per industry per year and will use this mean residual to calculate the standard deviation I need. Is this correct, or does anyone have a better solution? Thanks in advance!
proc reg data = Reg_Variables
noprint;
by industry fyear;
model chWC_TA=lagCFO_TA CFO_TA leadCFO_TA chREV_TA chPPE_TA/r;
output out=Residuals r=residual_x;
Run;
quit;
proc sql; create table Mean
as select Industry, fyear, mean(residual_x) as ind_residual
from Residuals
group by Industry, fyear;
quit;
proc sort noduplicates;
by Industry fyear;
where ind_residual ne 0;
run;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.