Hi Everyone,
I want to delete non-trading days where more than 50% stocks have zero returns. For example 60% of the total stocks (3 out of 5) have zero returns for 2010-01-07. Therefore, I want to delete 2010-01-07 from the sample. I have millions of observations, so I would appreciate an efficient code. Please find the SAS data below.
Thanks a lot for your help.
Best,
Cheema
data have;
infile cards expandtabs truncover;
input stock date : yymmdd10. ret ;
format date yymmdd10.;
cards;
1 2010-01-07 0
1 2010-01-26 -0.02544
1 2010-01-29 0.03384
2 2010-01-07 0
2 2010-01-26 -0.06219
2 2010-01-29 0.01989
3 2010-01-07 0.02
3 2010-01-26 -0.04
3 2010-01-29 0.023
4 2010-01-07 0
4 2010-01-26 -0.02
4 2010-01-29 0.012
5 2010-01-07 0.01
5 2010-01-26 0.02
5 2010-01-29 -0.012
run;
If a stock can only be mentioned once on a given day, then the query can be simpler:
proc sql;
create table trading2 as
select *
from have
group by date
having
0.5 * count(*) >= sum(ret=0)
order by stock, date;
quit;
Test this for efficiency:
proc sql;
create table trading as
select *
from have
group by date
having
0.5 * count(distinct stock) >=
count(distinct case when ret = 0 then stock else . end)
order by stock, date;
quit;
having an index on date should help a lot.
If a stock can only be mentioned once on a given day, then the query can be simpler:
proc sql;
create table trading2 as
select *
from have
group by date
having
0.5 * count(*) >= sum(ret=0)
order by stock, date;
quit;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.