Hi Everyone,
I want to delete non-trading days where more than 50% stocks have zero returns. For example 60% of the total stocks (3 out of 5) have zero returns for 2010-01-07. Therefore, I want to delete 2010-01-07 from the sample. I have millions of observations, so I would appreciate an efficient code. Please find the SAS data below.
Thanks a lot for your help.
Best,
Cheema
data have;
infile cards expandtabs truncover;
input stock date : yymmdd10. ret ;
format date yymmdd10.;
cards;
1 2010-01-07 0
1 2010-01-26 -0.02544
1 2010-01-29 0.03384
2 2010-01-07 0
2 2010-01-26 -0.06219
2 2010-01-29 0.01989
3 2010-01-07 0.02
3 2010-01-26 -0.04
3 2010-01-29 0.023
4 2010-01-07 0
4 2010-01-26 -0.02
4 2010-01-29 0.012
5 2010-01-07 0.01
5 2010-01-26 0.02
5 2010-01-29 -0.012
run;
If a stock can only be mentioned once on a given day, then the query can be simpler:
proc sql;
create table trading2 as
select *
from have
group by date
having
0.5 * count(*) >= sum(ret=0)
order by stock, date;
quit;
Test this for efficiency:
proc sql;
create table trading as
select *
from have
group by date
having
0.5 * count(distinct stock) >=
count(distinct case when ret = 0 then stock else . end)
order by stock, date;
quit;
having an index on date should help a lot.
If a stock can only be mentioned once on a given day, then the query can be simpler:
proc sql;
create table trading2 as
select *
from have
group by date
having
0.5 * count(*) >= sum(ret=0)
order by stock, date;
quit;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.