🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Pyrite | Level 9

# Hi everyone,

I want to drop stocks in a month with less than five non-zero returns. For example stock 1 has 7 daily returns in March 2018 but only three non-zero returns. Therefore, I want to delete daily returns of stock 1 for March 2018. I have millions of observations, so I would appreciate an efficient code. Please find the SAS data below.

Thanks a lot for your help.

Best,

Cheema

``````data have;
infile cards expandtabs truncover;
input stock date : yymmdd10. ret ;
format date yymmdd10.;
cards;
1	2018-03-01	0
1	2018-03-02	0.1
1	2018-03-03	0
1	2018-03-04	0.03
1	2018-03-05	0
1	2018-03-06	0.02
1	2018-03-07	0
2	2018-03-01	0.02
2	2018-03-02	0.03
2	2018-03-03	0.04
2	2018-03-04	0.05
2	2018-03-05	0.06
2	2018-03-06	0.07
2	2018-03-07	0.08
2	2018-03-08	0
1	2018-04-01	0.02
1	2018-04-02	0.03
1	2018-04-03	0.04
1	2018-04-04	0.05
1	2018-04-05	0.06
1	2018-04-06	0.07
1	2018-04-07	0.08
2	2018-04-01	0.02
2	2018-04-02	0.03
2	2018-04-03	0.04
2	2018-04-04	0.05
2	2018-04-05	0.06
2	2018-04-06	0.07
2	2018-04-07	0.08
2	2018-04-08	0
run;``````
1 ACCEPTED SOLUTION

Accepted Solutions
Opal | Level 21

## Re: Drop stock in a month with less than five non-zero returns

@MAC1430

I believe below code should perform reasonably well. I had to set the threshold count to 4 (and not 5) to return some rows using the sample data you've posted.

``````data have;
infile cards expandtabs truncover;
input stock date : yymmdd10. ret ;
format date yymmdd10.;
cards;
1	2018-03-01	0
1	2018-03-02	0.1
1	2018-03-03	0
1	2018-03-04	0.03
1	2018-03-05	0
1	2018-03-06	0.02
1	2018-03-07	0
2	2018-03-01	0.02
2	2018-03-02	0.03
2	2018-03-03	0.04
2	2018-03-04	0.05
2	2018-03-05	0.06
2	2018-03-06	0.07
2	2018-03-07	0.08
2	2018-03-08	0
1	2018-04-01	0.02
1	2018-04-02	0.03
1	2018-04-03	0.04
1	2018-04-04	0.05
1	2018-04-05	0.06
1	2018-04-06	0.07
1	2018-04-07	0.08
2	2018-04-01	0.02
2	2018-04-02	0.03
2	2018-04-03	0.04
2	2018-04-04	0.05
2	2018-04-05	0.06
2	2018-04-06	0.07
2	2018-04-07	0.08
2	2018-04-08	0
;
run;

%let treshold_cnt=4;
data want(drop=_:);

if _n_=1 then
do;
length _cnt 3;
_cnt=1;
dcl hash h1(suminc: '_cnt', multidata:'n');
h1.defineKey('stock','date');
h1.defineDone();
do i=1 to nobs;
set have nobs=nobs point=i;
date=intnx('month',date,0,'b');
if ret=0 then h1.ref();
end;
end;

set have;
h1.sum(key:stock, key:intnx('month',date,0,'b'), sum:_zero_ret_cnt);
if _zero_ret_cnt<&treshold_cnt then delete;
run;

``````
8 REPLIES 8
Opal | Level 21

## Re: Drop stock in a month with less than five non-zero returns

@MAC1430

I believe below code should perform reasonably well. I had to set the threshold count to 4 (and not 5) to return some rows using the sample data you've posted.

``````data have;
infile cards expandtabs truncover;
input stock date : yymmdd10. ret ;
format date yymmdd10.;
cards;
1	2018-03-01	0
1	2018-03-02	0.1
1	2018-03-03	0
1	2018-03-04	0.03
1	2018-03-05	0
1	2018-03-06	0.02
1	2018-03-07	0
2	2018-03-01	0.02
2	2018-03-02	0.03
2	2018-03-03	0.04
2	2018-03-04	0.05
2	2018-03-05	0.06
2	2018-03-06	0.07
2	2018-03-07	0.08
2	2018-03-08	0
1	2018-04-01	0.02
1	2018-04-02	0.03
1	2018-04-03	0.04
1	2018-04-04	0.05
1	2018-04-05	0.06
1	2018-04-06	0.07
1	2018-04-07	0.08
2	2018-04-01	0.02
2	2018-04-02	0.03
2	2018-04-03	0.04
2	2018-04-04	0.05
2	2018-04-05	0.06
2	2018-04-06	0.07
2	2018-04-07	0.08
2	2018-04-08	0
;
run;

%let treshold_cnt=4;
data want(drop=_:);

if _n_=1 then
do;
length _cnt 3;
_cnt=1;
dcl hash h1(suminc: '_cnt', multidata:'n');
h1.defineKey('stock','date');
h1.defineDone();
do i=1 to nobs;
set have nobs=nobs point=i;
date=intnx('month',date,0,'b');
if ret=0 then h1.ref();
end;
end;

set have;
h1.sum(key:stock, key:intnx('month',date,0,'b'), sum:_zero_ret_cnt);
if _zero_ret_cnt<&treshold_cnt then delete;
run;

``````
Pyrite | Level 9

## Re: Drop stock in a month with less than five non-zero returns

Thank you very much, it works very well.
Super User

## Re: Drop stock in a month with less than five non-zero returns

``````data have;
infile cards expandtabs truncover;
input stock date : yymmdd10. ret ;
month=month(date);
format date yymmdd10.;
cards;
1	2018-03-01	0
1	2018-03-02	0.1
1	2018-03-03	0
1	2018-03-04	0.03
1	2018-03-05	0
1	2018-03-06	0.02
1	2018-03-07	0
2	2018-03-01	0.02
2	2018-03-02	0.03
2	2018-03-03	0.04
2	2018-03-04	0.05
2	2018-03-05	0.06
2	2018-03-06	0.07
2	2018-03-07	0.08
2	2018-03-08	0
1	2018-04-01	0.02
1	2018-04-02	0.03
1	2018-04-03	0.04
1	2018-04-04	0.05
1	2018-04-05	0.06
1	2018-04-06	0.07
1	2018-04-07	0.08
2	2018-04-01	0.02
2	2018-04-02	0.03
2	2018-04-03	0.04
2	2018-04-04	0.05
2	2018-04-05	0.06
2	2018-04-06	0.07
2	2018-04-07	0.08
2	2018-04-08	0
;
run;
proc sql;
create table want as
select *
from have
group by stock,month
having sum(ret ne 0)>=5 ;
quit;``````
Pyrite | Level 9

## Re: Drop stock in a month with less than five non-zero returns

Thanks you very much for ksharp :). Its even short code and more efficient.
Opal | Level 21

## Re: Drop stock in a month with less than five non-zero returns

@MAC1430

You've asked for "millions of observations, so I would appreciate an efficient code".

I've made the assumption your source data is a SAS table. If so then the data step hash approach will outperform the SQL. Should your source data be in a database then the SQL is the option of choice.

Pyrite | Level 9

## Re: Drop stock in a month with less than five non-zero returns

Dear Patrick,

Thank you for pointing out this one. Yes, my source data is a SAS table, so I I will use data setp hash approach them. Thanks again to both of you, its really helpful 🙂
Opal | Level 21

## Re: Drop stock in a month with less than five non-zero returns

@MAC1430

If the SQL performs still good enough then I'd go for the SQL as besides of runtimes it's also always about code maintenance - and the SQL is certainly easier to understand and maintain.

Pyrite | Level 9

## Re: Drop stock in a month with less than five non-zero returns

Yes, SQL is quite easier to understand and make changes. I will check the run-times of both SQL and hash, I guess hash will outperform since my data-set is 100 millions observations.
Discussion stats
• 8 replies
• 1335 views
• 4 likes
• 3 in conversation