Dear All,
The dataset I am using has daily data for about 500 firms each year over a period of 15 years. I want to remove firms with a market capitalization (MKTCAP) of less than $5 million. To do this, I computed MKTCAP for each firm at the end of each year. Firms with MKTCAP < 5 are removed each year.
PART 1 of the code attached below, subsets the daily data to obtain only year-end observations.
data RET1;
set RET;
by year gvkey date;
if last.gvkey;
run;
GVKEY refers to the firm's code and RET is the dataset with daily data. YEAR is the year variable created in SAS.
I then used PROC SQL to retain firms with MKTCAP>5 as follows:
proc sql;
create table WANT as
select * from RET1
where mktcap>5
group by year
order by gvkey, year;
quit;
While this did the trick of keeping only those firms with MKTCAP>5, I am however left with a dataset by GVKEY and YEAR.
I am not sure how to merge WANT (with yearly obs) with the original dataset RET (with daily obs) to obtain the daily dataset with only MKTCAP>5
Would appreciate your help and advice please. Thanks.
So when we look on the example data for GVKEY = 272762, you want to delete daily data for 2005(09DEC05 mktcap=1.62) and 2006(29DEC06 mktcap=1.8) but keep daily data for 2007(18DEC07 mktcap=28.45) and 2008(05DEC08 mktcap=11.38), right?
In that case the sql query should be:
proc sql;
create table WANT as
select * from RET1
where mktcap>5
group by year, GVKEY
order by gvkey, year;
quit;
Btw. check code for `WANT_DAILY_v2` it should already do what you want.
Bart
Could you share some test data?
Dear Yabwon.
Attached is the test data RET for three firms, one of which, has the first few years where MKTCAP<5. The other two firms are large MKTCAP firms where MKTCAP>5 for all years. Thanks.
Hi,
Do I understand correct that you want to get something like:
proc sql;
create table WANT_DAILY as
select RET.* from RET
join
WANT
on RET.gvkey= WANT.gvkey
and RET.year = WANT.year
order by year gvkey date;
quit;
or am I missing something?
If I'm right, alternative approach could be:
data WANT_DAILY_v2;
do _N_=1 by 1 until(last.gvkey);
set RET;
by year gvkey date;
if last.gvkey then
do;
trigger = (mktcap>5);
end;
end;
do _N_=1 to _N_;
set RET;
if trigger then output;
end;
run;
Bart
So you want to remove given company if it has at leas one case with mktcap<5 ?
Bart
Sorry, my message is probably not clear. Let me clarify.
I have around daily data on about 500 firms over a period of 15 years. What I need to is to form a final dataset where the firms included each year have a market cap greater than $5m.
Let's say that there are 500 firms in 2015 and applying a simple rule, I find that 50 of them have a mktcap that is less than $5m. This means that, for 2015, the dataset would have daily data on only the 400 firms with a mktcap greater than $5m. The simple selection rule is to base the mktcap of the firm on its year-end value (last value of the year) and only select those with mktcap>$5m.
By that token, a firm excluded in the earlier years when its mktcap < $5m, may be included in later years when its mktcap exceeds $5m.
So, the final dataset should only contain daily data on all firms with mktcap greater than $5m.
I hope this is clearer. Thanks.
So when we look on the example data for GVKEY = 272762, you want to delete daily data for 2005(09DEC05 mktcap=1.62) and 2006(29DEC06 mktcap=1.8) but keep daily data for 2007(18DEC07 mktcap=28.45) and 2008(05DEC08 mktcap=11.38), right?
In that case the sql query should be:
proc sql;
create table WANT as
select * from RET1
where mktcap>5
group by year, GVKEY
order by gvkey, year;
quit;
Btw. check code for `WANT_DAILY_v2` it should already do what you want.
Bart
You have DAILY data, but want to exclude a firm for the entire year if the end-of-year market cap is <5, yes? If you have daily data, then you have about 200 records (i.e.for trading dates) for each firm-year, with corresponding daily MKTCAP.
Your daily data is sorted by YEAR GVKEY DATE. So you need to get to the last record of each year for each gvkey, check it's market cap value, then re-read the same daily data, either outputting or not the roughly 200 daily records for that firm/year.
Something like this should work.
data want (drop=keep_dummy);
do until (last.gvkey=1);
set ret ;
by year gvkey date;
end;
keep_dummy=(mktcap>5);
do until (last.gvkey);
set ret;
by year gvkey date;
if keep_dummy=1 then output;
end;
run;
Note that if a gvkey only has, say, the first 9 months of data for a given year, then the mktcap criterion will depend on the end-of-September status, not end-of-December.
I probably misunderstood something, but isn't this what you want?
data RET1;
set RET;
by year gvkey date;
if last.gvkey and MKTCAP > 5;
run;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.