I'm a bit unsure if the title describes the problem correctly, but I have the following data set. data have;
input year firm size indep depen;
cards;
1 1 4907.98 514.14 515.08
1 2 5045.94 509.85 509.03
1 3 4819.58 505.49 506.34
1 4 4967.02 482.39 482.02
1 5 5094.18 506.08 505.62
1 6 5040.42 501.66 499.79
1 7 4987.62 491.48 489.79
2 1 5086.52 512.95 510.39
2 2 4895.88 503.15 503.28
2 3 5002.13 494.76 495.36
3 1 5078.44 506.06 508.51
3 2 4980.23 520.75 521.21
3 3 4948.08 484.90 485.79
3 4 5050.82 497.25 497.74
3 5 4964.70 498.54 497.97
4 1 4952.31 517.95 517.92
4 2 5027.91 501.90 502.15
4 3 4979.13 512.67 512.07
4 4 5005.56 499.99 501.16
4 5 4905.29 517.62 517.56
4 6 4848.28 479.76 481.32
5 1 1969.43 501.68 502.21
5 2 1931.39 497.89 497.67
5 3 9935.65 505.15 504.44
5 4 9995.49 510.52 511.26
5 5 9008.28 525.02 524.49
6 1 9981.95 518.30 517.78
6 2 15.49 506.13 506.81
7 1 5095.48 517.15 516.85
;
run; Where YEAR, FIRM, SIZE, INDEP, and DEPEN are the time index, the individual index, the market capitalization, the independent variable, and the dependent variable, correspondingly. Instead of using all the observations to do the regression, I need to pick only the first top 90% in terms of SIZE each year, so I'm currently making the subset as follows. proc sort;
by year descending size;
run;
data have;
set have;
by year;
if first.year then accumulate=.;
accumulate+size;
run;
proc sql;
create table have as
select *,
0.9*max(accumulate) as hurdle,
accumulate<=calculated hurdle as pick
from have
group by year
order by year,size desc;
quit;
proc reg;
where pick=1;
model depen=indep;
run; In other words, I (1) downward sort by SIZE each year, (2) ACCUMULATE by SIZE each year, (3) set the 90% of the sum as HURDLE each year, (4) PICK if ACCUMULATE up to the observation each year doesn't exceed HURDLE, and (5) regress DEPEN on INDEP using the observations where PICK=1. Here's the outcome. 1. Is there any other approach simpler and more experienced than this? I can understand these processes, but still wonder whether SAS provides some matching PROCs for the subsets. 2. I want to also include the observation that "touches" the HURDLE as (a) this method picks no observation in Year 6 since the first observation already exceeds the HURDLE, and (b) picks no observation in Year 7 since there's only one observation. According to the result, I want to pick (i) the last observations in Years 1, 2, 3, and 4 as they touch the HURDLEs, (ii) the second last observation in Year 5 due to the same reason, (iii) the first observation in Year 6 that alone exceeds the HURDLE, and (iv) the only one observation in Year 7. I much appreciate any comment from your experience.
... View more