Hi,
When I want to compute industry mean of other firms after excluding the firm-year, I find the following code (variable name: delta, the current dataset: c15, new dataset: ind1) from this community.
However, it is not run well. Further, I don't understand how this code computes the industry mean from the industry sum.
proc sql;
create table ind1 as
select *,(((select sum(delta) from c15 where cyear=a.cyear and sic=a.sic )-a.delta)/((select count(*) from c15 where cyear=a.cyear and sic=a.sic )-1)) as mean_delta
from c15 as a;
quit;
Thank you very much in advance for your advice
Using the code @Reeza posted with your sample data and things appear to work. Does that answer your question?
data c15;
input firmid cyear delta sic;
cards;
10884 1993 0.0168 100
11993 1993 0.01202 1000
4205 1993 0.03925 1040
5560 1993 0.01296 1040
5686 1993 0.05542 1040
7881 1993 0.02544 1040
11985 1993 0.01574 1040
3153 1993 0.01304 1044
10174 1993 0.00609 1044
1678 1993 0.03234 1311
6819 1993 0.05358 1311
7017 1993 0.02803 1311
7276 1993 0.00745 1311
7620 1993 0.03125 1311
7912 1993 0.02168 1311
8068 1993 0.02743 1311
8468 1993 0.02525 1311
8549 1993 0.05271 1311
8627 1993 0.03238 1311
8655 1993 0.02876 1311
9406 1993 0.03188 1311
11038 1993 0.03298 1311
11558 1993 0.0175 1311
11923 1993 0.08634 1311
14878 1993 0.03137 1311
14934 1993 0.14321 1311
15291 1993 0.08339 1311
23047 1993 0.44653 1311
;
run;
/* option 1 */
proc sql;
create table want1 as
select *,
( sum(delta) - a.delta ) / ( n(delta) - 1 ) as mean_minusFirmYear
from c15 as a
group by cyear, sic
order by firmid, cyear, delta, sic
;
quit;
/* option 2 */
proc sql;
create table want2 as
select *,
(((select sum(delta) from c15 where cyear=a.cyear and sic=a.sic )-a.delta)
/
((select count(*) from c15 where cyear=a.cyear and sic=a.sic )-1))
as mean_minusFirmYear
from c15 as a
order by firmid, cyear, delta, sic
;
quit;
/* test if want1 and want2 contain the same data */
proc compare data=want1 compare=want2;
run;
You calculate the average by SUM(X's) / N(X's).
To calculate the Average, without Xi then you can do:
Sum(X's) - Xi / N - 1
select *,
(((select sum(delta) from c15 where cyear=a.cyear and sic=a.sic )-a.delta) <- Sum(X's) - Xi
/
((select count(*) from c15 where cyear=a.cyear and sic=a.sic )-1)) <- N - 1
as mean_delta
from c15 as a;
This could probably be simplified to, but I didn't test it.
select *,
( sum(delta) - a.delta ) / ( n(delta) - 1 ) as mean_minusFirmYear
from mean_delta
group by year, sic;
quit;
If this is beyond your current programming skills it may be worth breaking it out into several data steps or SQL steps if that would help.
If you want us to post tested code then please always provide representative sample data in the form of a tested SAS data step generating such data.
Hi,
The following is the sample data.
data have;
input firmid cyear delta sic;
cards;
10884 | 1993 | 0.0168 | 100 |
11993 | 1993 | 0.01202 | 1000 |
4205 | 1993 | 0.03925 | 1040 |
5560 | 1993 | 0.01296 | 1040 |
5686 | 1993 | 0.05542 | 1040 |
7881 | 1993 | 0.02544 | 1040 |
11985 | 1993 | 0.01574 | 1040 |
3153 | 1993 | 0.01304 | 1044 |
10174 | 1993 | 0.00609 | 1044 |
1678 | 1993 | 0.03234 | 1311 |
6819 | 1993 | 0.05358 | 1311 |
7017 | 1993 | 0.02803 | 1311 |
7276 | 1993 | 0.00745 | 1311 |
7620 | 1993 | 0.03125 | 1311 |
7912 | 1993 | 0.02168 | 1311 |
8068 | 1993 | 0.02743 | 1311 |
8468 | 1993 | 0.02525 | 1311 |
8549 | 1993 | 0.05271 | 1311 |
8627 | 1993 | 0.03238 | 1311 |
8655 | 1993 | 0.02876 | 1311 |
9406 | 1993 | 0.03188 | 1311 |
11038 | 1993 | 0.03298 | 1311 |
11558 | 1993 | 0.0175 | 1311 |
11923 | 1993 | 0.08634 | 1311 |
14878 | 1993 | 0.03137 | 1311 |
14934 | 1993 | 0.14321 | 1311 |
15291 | 1993 | 0.08339 | 1311 |
23047 | 1993 | 0.44653 | 1311 |
I tried your suggested code, but it is hard to complete the programming. Also breaking down multiple data steps is not easy.
proc sql;
create table want as
select *,( sum(delta) - a.delta ) / ( n(delta) - 1 ) as indmean,
group by cyear sic,
from have as a;
quit;
Thank you very much in advance for your help
Using the code @Reeza posted with your sample data and things appear to work. Does that answer your question?
data c15;
input firmid cyear delta sic;
cards;
10884 1993 0.0168 100
11993 1993 0.01202 1000
4205 1993 0.03925 1040
5560 1993 0.01296 1040
5686 1993 0.05542 1040
7881 1993 0.02544 1040
11985 1993 0.01574 1040
3153 1993 0.01304 1044
10174 1993 0.00609 1044
1678 1993 0.03234 1311
6819 1993 0.05358 1311
7017 1993 0.02803 1311
7276 1993 0.00745 1311
7620 1993 0.03125 1311
7912 1993 0.02168 1311
8068 1993 0.02743 1311
8468 1993 0.02525 1311
8549 1993 0.05271 1311
8627 1993 0.03238 1311
8655 1993 0.02876 1311
9406 1993 0.03188 1311
11038 1993 0.03298 1311
11558 1993 0.0175 1311
11923 1993 0.08634 1311
14878 1993 0.03137 1311
14934 1993 0.14321 1311
15291 1993 0.08339 1311
23047 1993 0.44653 1311
;
run;
/* option 1 */
proc sql;
create table want1 as
select *,
( sum(delta) - a.delta ) / ( n(delta) - 1 ) as mean_minusFirmYear
from c15 as a
group by cyear, sic
order by firmid, cyear, delta, sic
;
quit;
/* option 2 */
proc sql;
create table want2 as
select *,
(((select sum(delta) from c15 where cyear=a.cyear and sic=a.sic )-a.delta)
/
((select count(*) from c15 where cyear=a.cyear and sic=a.sic )-1))
as mean_minusFirmYear
from c15 as a
order by firmid, cyear, delta, sic
;
quit;
/* test if want1 and want2 contain the same data */
proc compare data=want1 compare=want2;
run;
I’m not sure what you mean by complete the program. That is all you need for the calculation.
If you’re not sure of the steps, do it manually first and keep track of how you’re calculating it.
1. Calculate SUM and N for each SIC, YEAR => Proc Means
2. Merge it in with main data, by SIC and YEAR => PROC SQL, Data step
3. Do calculation identified in first post, subtract observation value and divide by N- 1 => Proc sql or data step, can be done in same step as #2
SQL does it in one step but the basic approach is quite straightforward.
@joon1 wrote:
Hi,
The following is the sample data.
data have;
input firmid cyear delta sic;cards;
10884 1993 0.0168 100 11993 1993 0.01202 1000 4205 1993 0.03925 1040 5560 1993 0.01296 1040 5686 1993 0.05542 1040 7881 1993 0.02544 1040 11985 1993 0.01574 1040 3153 1993 0.01304 1044 10174 1993 0.00609 1044 1678 1993 0.03234 1311 6819 1993 0.05358 1311 7017 1993 0.02803 1311 7276 1993 0.00745 1311 7620 1993 0.03125 1311 7912 1993 0.02168 1311 8068 1993 0.02743 1311 8468 1993 0.02525 1311 8549 1993 0.05271 1311 8627 1993 0.03238 1311 8655 1993 0.02876 1311 9406 1993 0.03188 1311 11038 1993 0.03298 1311 11558 1993 0.0175 1311 11923 1993 0.08634 1311 14878 1993 0.03137 1311 14934 1993 0.14321 1311 15291 1993 0.08339 1311 23047 1993 0.44653 1311
I tried your suggested code, but it is hard to complete the programming. Also breaking down multiple data steps is not easy.
proc sql;
create table want as
select *,( sum(delta) - a.delta ) / ( n(delta) - 1 ) as indmean,
group by cyear sic,
from have as a;
quit;
Thank you very much in advance for your help
Thank you so much
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.