Solved
Contributor
Posts: 73

[ Edited ]

# Hi everyone,

I want to drop stocks in a month with less than five non-zero returns. For example stock 1 has 7 daily returns in March 2018 but only three non-zero returns. Therefore, I want to delete daily returns of stock 1 for March 2018. I have millions of observations, so I would appreciate an efficient code. Please find the SAS data below.

Thanks a lot for your help.

Best,

Cheema

``````data have;
infile cards expandtabs truncover;
input stock date : yymmdd10. ret ;
format date yymmdd10.;
cards;
1	2018-03-01	0
1	2018-03-02	0.1
1	2018-03-03	0
1	2018-03-04	0.03
1	2018-03-05	0
1	2018-03-06	0.02
1	2018-03-07	0
2	2018-03-01	0.02
2	2018-03-02	0.03
2	2018-03-03	0.04
2	2018-03-04	0.05
2	2018-03-05	0.06
2	2018-03-06	0.07
2	2018-03-07	0.08
2	2018-03-08	0
1	2018-04-01	0.02
1	2018-04-02	0.03
1	2018-04-03	0.04
1	2018-04-04	0.05
1	2018-04-05	0.06
1	2018-04-06	0.07
1	2018-04-07	0.08
2	2018-04-01	0.02
2	2018-04-02	0.03
2	2018-04-03	0.04
2	2018-04-04	0.05
2	2018-04-05	0.06
2	2018-04-06	0.07
2	2018-04-07	0.08
2	2018-04-08	0
run;``````

Accepted Solutions
Solution
3 weeks ago
Posts: 4,695

## Re: Drop stock in a month with less than five non-zero returns

[ Edited ]

@cheema

I believe below code should perform reasonably well. I had to set the threshold count to 4 (and not 5) to return some rows using the sample data you've posted.

``````data have;
infile cards expandtabs truncover;
input stock date : yymmdd10. ret ;
format date yymmdd10.;
cards;
1	2018-03-01	0
1	2018-03-02	0.1
1	2018-03-03	0
1	2018-03-04	0.03
1	2018-03-05	0
1	2018-03-06	0.02
1	2018-03-07	0
2	2018-03-01	0.02
2	2018-03-02	0.03
2	2018-03-03	0.04
2	2018-03-04	0.05
2	2018-03-05	0.06
2	2018-03-06	0.07
2	2018-03-07	0.08
2	2018-03-08	0
1	2018-04-01	0.02
1	2018-04-02	0.03
1	2018-04-03	0.04
1	2018-04-04	0.05
1	2018-04-05	0.06
1	2018-04-06	0.07
1	2018-04-07	0.08
2	2018-04-01	0.02
2	2018-04-02	0.03
2	2018-04-03	0.04
2	2018-04-04	0.05
2	2018-04-05	0.06
2	2018-04-06	0.07
2	2018-04-07	0.08
2	2018-04-08	0
;
run;

%let treshold_cnt=4;
data want(drop=_:);

if _n_=1 then
do;
length _cnt 3;
_cnt=1;
dcl hash h1(suminc: '_cnt', multidata:'n');
h1.defineKey('stock','date');
h1.defineDone();
do i=1 to nobs;
set have nobs=nobs point=i;
date=intnx('month',date,0,'b');
if ret=0 then h1.ref();
end;
end;

set have;
h1.sum(key:stock, key:intnx('month',date,0,'b'), sum:_zero_ret_cnt);
if _zero_ret_cnt<&treshold_cnt then delete;
run;

``````

All Replies
Solution
3 weeks ago
Posts: 4,695

## Re: Drop stock in a month with less than five non-zero returns

[ Edited ]

@cheema

I believe below code should perform reasonably well. I had to set the threshold count to 4 (and not 5) to return some rows using the sample data you've posted.

``````data have;
infile cards expandtabs truncover;
input stock date : yymmdd10. ret ;
format date yymmdd10.;
cards;
1	2018-03-01	0
1	2018-03-02	0.1
1	2018-03-03	0
1	2018-03-04	0.03
1	2018-03-05	0
1	2018-03-06	0.02
1	2018-03-07	0
2	2018-03-01	0.02
2	2018-03-02	0.03
2	2018-03-03	0.04
2	2018-03-04	0.05
2	2018-03-05	0.06
2	2018-03-06	0.07
2	2018-03-07	0.08
2	2018-03-08	0
1	2018-04-01	0.02
1	2018-04-02	0.03
1	2018-04-03	0.04
1	2018-04-04	0.05
1	2018-04-05	0.06
1	2018-04-06	0.07
1	2018-04-07	0.08
2	2018-04-01	0.02
2	2018-04-02	0.03
2	2018-04-03	0.04
2	2018-04-04	0.05
2	2018-04-05	0.06
2	2018-04-06	0.07
2	2018-04-07	0.08
2	2018-04-08	0
;
run;

%let treshold_cnt=4;
data want(drop=_:);

if _n_=1 then
do;
length _cnt 3;
_cnt=1;
dcl hash h1(suminc: '_cnt', multidata:'n');
h1.defineKey('stock','date');
h1.defineDone();
do i=1 to nobs;
set have nobs=nobs point=i;
date=intnx('month',date,0,'b');
if ret=0 then h1.ref();
end;
end;

set have;
h1.sum(key:stock, key:intnx('month',date,0,'b'), sum:_zero_ret_cnt);
if _zero_ret_cnt<&treshold_cnt then delete;
run;

``````
Contributor
Posts: 73

## Re: Drop stock in a month with less than five non-zero returns

Thank you very much, it works very well.
Super User
Posts: 10,698

## Re: Drop stock in a month with less than five non-zero returns

``````data have;
infile cards expandtabs truncover;
input stock date : yymmdd10. ret ;
month=month(date);
format date yymmdd10.;
cards;
1	2018-03-01	0
1	2018-03-02	0.1
1	2018-03-03	0
1	2018-03-04	0.03
1	2018-03-05	0
1	2018-03-06	0.02
1	2018-03-07	0
2	2018-03-01	0.02
2	2018-03-02	0.03
2	2018-03-03	0.04
2	2018-03-04	0.05
2	2018-03-05	0.06
2	2018-03-06	0.07
2	2018-03-07	0.08
2	2018-03-08	0
1	2018-04-01	0.02
1	2018-04-02	0.03
1	2018-04-03	0.04
1	2018-04-04	0.05
1	2018-04-05	0.06
1	2018-04-06	0.07
1	2018-04-07	0.08
2	2018-04-01	0.02
2	2018-04-02	0.03
2	2018-04-03	0.04
2	2018-04-04	0.05
2	2018-04-05	0.06
2	2018-04-06	0.07
2	2018-04-07	0.08
2	2018-04-08	0
;
run;
proc sql;
create table want as
select *
from have
group by stock,month
having sum(ret ne 0)>=5 ;
quit;``````
Contributor
Posts: 73

## Re: Drop stock in a month with less than five non-zero returns

Thanks you very much for ksharp . Its even short code and more efficient.
Posts: 4,695

## Re: Drop stock in a month with less than five non-zero returns

@cheema

You've asked for "millions of observations, so I would appreciate an efficient code".

I've made the assumption your source data is a SAS table. If so then the data step hash approach will outperform the SQL. Should your source data be in a database then the SQL is the option of choice.

Contributor
Posts: 73

## Re: Drop stock in a month with less than five non-zero returns

Dear Patrick,

Thank you for pointing out this one. Yes, my source data is a SAS table, so I I will use data setp hash approach them. Thanks again to both of you, its really helpful
Posts: 4,695

## Re: Drop stock in a month with less than five non-zero returns

@cheema

If the SQL performs still good enough then I'd go for the SQL as besides of runtimes it's also always about code maintenance - and the SQL is certainly easier to understand and maintain.

Contributor
Posts: 73

## Re: Drop stock in a month with less than five non-zero returns

Yes, SQL is quite easier to understand and make changes. I will check the run-times of both SQL and hash, I guess hash will outperform since my data-set is 100 millions observations.
☑ This topic is solved.