Hi Everyone,
I want to delete non-trading days where more than 50% stocks have zero returns. For example 60% of the total stocks (3 out of 5) have zero returns for 2010-01-07. Therefore, I want to delete 2010-01-07 from the sample. I have millions of observations, so I would appreciate an efficient code. Please find the SAS data below.
Thanks a lot for your help.
Best,
Cheema
data have;
infile cards expandtabs truncover;
input stock date : yymmdd10. ret ;
format date yymmdd10.;
cards;
1 2010-01-07 0
1 2010-01-26 -0.02544
1 2010-01-29 0.03384
2 2010-01-07 0
2 2010-01-26 -0.06219
2 2010-01-29 0.01989
3 2010-01-07 0.02
3 2010-01-26 -0.04
3 2010-01-29 0.023
4 2010-01-07 0
4 2010-01-26 -0.02
4 2010-01-29 0.012
5 2010-01-07 0.01
5 2010-01-26 0.02
5 2010-01-29 -0.012
run;
If a stock can only be mentioned once on a given day, then the query can be simpler:
proc sql;
create table trading2 as
select *
from have
group by date
having
0.5 * count(*) >= sum(ret=0)
order by stock, date;
quit;
Test this for efficiency:
proc sql;
create table trading as
select *
from have
group by date
having
0.5 * count(distinct stock) >=
count(distinct case when ret = 0 then stock else . end)
order by stock, date;
quit;
having an index on date should help a lot.
If a stock can only be mentioned once on a given day, then the query can be simpler:
proc sql;
create table trading2 as
select *
from have
group by date
having
0.5 * count(*) >= sum(ret=0)
order by stock, date;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.