Hi Everyone,
I want to delete non-trading days where more than 50% stocks have zero returns. For example 60% of the total stocks (3 out of 5) have zero returns for 2010-01-07. Therefore, I want to delete 2010-01-07 from the sample. I have millions of observations, so I would appreciate an efficient code. Please find the SAS data below.
Thanks a lot for your help.
Best,
Cheema
data have;
infile cards expandtabs truncover;
input stock date : yymmdd10. ret ;
format date yymmdd10.;
cards;
1 2010-01-07 0
1 2010-01-26 -0.02544
1 2010-01-29 0.03384
2 2010-01-07 0
2 2010-01-26 -0.06219
2 2010-01-29 0.01989
3 2010-01-07 0.02
3 2010-01-26 -0.04
3 2010-01-29 0.023
4 2010-01-07 0
4 2010-01-26 -0.02
4 2010-01-29 0.012
5 2010-01-07 0.01
5 2010-01-26 0.02
5 2010-01-29 -0.012
run;
If a stock can only be mentioned once on a given day, then the query can be simpler:
proc sql;
create table trading2 as
select *
from have
group by date
having
0.5 * count(*) >= sum(ret=0)
order by stock, date;
quit;
Test this for efficiency:
proc sql;
create table trading as
select *
from have
group by date
having
0.5 * count(distinct stock) >=
count(distinct case when ret = 0 then stock else . end)
order by stock, date;
quit;
having an index on date should help a lot.
If a stock can only be mentioned once on a given day, then the query can be simpler:
proc sql;
create table trading2 as
select *
from have
group by date
having
0.5 * count(*) >= sum(ret=0)
order by stock, date;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.