DATA Step, Macro, Functions and more

Drop stock in a month with less than five non-zero returns

Accepted Solution Solved
Reply
Contributor
Posts: 73
Accepted Solution

Drop stock in a month with less than five non-zero returns

[ Edited ]
 

 Hi everyone,

 

I want to drop stocks in a month with less than five non-zero returns. For example stock 1 has 7 daily returns in March 2018 but only three non-zero returns. Therefore, I want to delete daily returns of stock 1 for March 2018. I have millions of observations, so I would appreciate an efficient code. Please find the SAS data below.

 

Thanks a lot for your help.

 

Best,

 

Cheema

data have;
infile cards expandtabs truncover;
input stock date : yymmdd10. ret ;
format date yymmdd10.;
cards;
1	2018-03-01	0
1	2018-03-02	0.1
1	2018-03-03	0
1	2018-03-04	0.03
1	2018-03-05	0
1	2018-03-06	0.02
1	2018-03-07	0
2	2018-03-01	0.02
2	2018-03-02	0.03
2	2018-03-03	0.04
2	2018-03-04	0.05
2	2018-03-05	0.06
2	2018-03-06	0.07
2	2018-03-07	0.08
2	2018-03-08	0
1	2018-04-01	0.02
1	2018-04-02	0.03
1	2018-04-03	0.04
1	2018-04-04	0.05
1	2018-04-05	0.06
1	2018-04-06	0.07
1	2018-04-07	0.08
2	2018-04-01	0.02
2	2018-04-02	0.03
2	2018-04-03	0.04
2	2018-04-04	0.05
2	2018-04-05	0.06
2	2018-04-06	0.07
2	2018-04-07	0.08
2	2018-04-08	0
run;

Accepted Solutions
Solution
3 weeks ago
Respected Advisor
Posts: 4,695

Re: Drop stock in a month with less than five non-zero returns

[ Edited ]

@cheema

I believe below code should perform reasonably well. I had to set the threshold count to 4 (and not 5) to return some rows using the sample data you've posted.

data have;
infile cards expandtabs truncover;
input stock date : yymmdd10. ret ;
format date yymmdd10.;
cards;
1	2018-03-01	0
1	2018-03-02	0.1
1	2018-03-03	0
1	2018-03-04	0.03
1	2018-03-05	0
1	2018-03-06	0.02
1	2018-03-07	0
2	2018-03-01	0.02
2	2018-03-02	0.03
2	2018-03-03	0.04
2	2018-03-04	0.05
2	2018-03-05	0.06
2	2018-03-06	0.07
2	2018-03-07	0.08
2	2018-03-08	0
1	2018-04-01	0.02
1	2018-04-02	0.03
1	2018-04-03	0.04
1	2018-04-04	0.05
1	2018-04-05	0.06
1	2018-04-06	0.07
1	2018-04-07	0.08
2	2018-04-01	0.02
2	2018-04-02	0.03
2	2018-04-03	0.04
2	2018-04-04	0.05
2	2018-04-05	0.06
2	2018-04-06	0.07
2	2018-04-07	0.08
2	2018-04-08	0
;
run;

%let treshold_cnt=4;
data want(drop=_:);

  if _n_=1 then
    do;
      length _cnt 3;
      _cnt=1;
      dcl hash h1(suminc: '_cnt', multidata:'n');
      h1.defineKey('stock','date');
      h1.defineDone();
      do i=1 to nobs;
        set have nobs=nobs point=i;
        date=intnx('month',date,0,'b');
        if ret=0 then h1.ref();
      end;
    end;

    set have;
    h1.sum(key:stock, key:intnx('month',date,0,'b'), sum:_zero_ret_cnt);
    if _zero_ret_cnt<&treshold_cnt then delete;
run;

View solution in original post


All Replies
Solution
3 weeks ago
Respected Advisor
Posts: 4,695

Re: Drop stock in a month with less than five non-zero returns

[ Edited ]

@cheema

I believe below code should perform reasonably well. I had to set the threshold count to 4 (and not 5) to return some rows using the sample data you've posted.

data have;
infile cards expandtabs truncover;
input stock date : yymmdd10. ret ;
format date yymmdd10.;
cards;
1	2018-03-01	0
1	2018-03-02	0.1
1	2018-03-03	0
1	2018-03-04	0.03
1	2018-03-05	0
1	2018-03-06	0.02
1	2018-03-07	0
2	2018-03-01	0.02
2	2018-03-02	0.03
2	2018-03-03	0.04
2	2018-03-04	0.05
2	2018-03-05	0.06
2	2018-03-06	0.07
2	2018-03-07	0.08
2	2018-03-08	0
1	2018-04-01	0.02
1	2018-04-02	0.03
1	2018-04-03	0.04
1	2018-04-04	0.05
1	2018-04-05	0.06
1	2018-04-06	0.07
1	2018-04-07	0.08
2	2018-04-01	0.02
2	2018-04-02	0.03
2	2018-04-03	0.04
2	2018-04-04	0.05
2	2018-04-05	0.06
2	2018-04-06	0.07
2	2018-04-07	0.08
2	2018-04-08	0
;
run;

%let treshold_cnt=4;
data want(drop=_:);

  if _n_=1 then
    do;
      length _cnt 3;
      _cnt=1;
      dcl hash h1(suminc: '_cnt', multidata:'n');
      h1.defineKey('stock','date');
      h1.defineDone();
      do i=1 to nobs;
        set have nobs=nobs point=i;
        date=intnx('month',date,0,'b');
        if ret=0 then h1.ref();
      end;
    end;

    set have;
    h1.sum(key:stock, key:intnx('month',date,0,'b'), sum:_zero_ret_cnt);
    if _zero_ret_cnt<&treshold_cnt then delete;
run;

Contributor
Posts: 73

Re: Drop stock in a month with less than five non-zero returns

Thank you very much, it works very well.
Super User
Posts: 10,698

Re: Drop stock in a month with less than five non-zero returns

data have;
infile cards expandtabs truncover;
input stock date : yymmdd10. ret ;
month=month(date);
format date yymmdd10.;
cards;
1	2018-03-01	0
1	2018-03-02	0.1
1	2018-03-03	0
1	2018-03-04	0.03
1	2018-03-05	0
1	2018-03-06	0.02
1	2018-03-07	0
2	2018-03-01	0.02
2	2018-03-02	0.03
2	2018-03-03	0.04
2	2018-03-04	0.05
2	2018-03-05	0.06
2	2018-03-06	0.07
2	2018-03-07	0.08
2	2018-03-08	0
1	2018-04-01	0.02
1	2018-04-02	0.03
1	2018-04-03	0.04
1	2018-04-04	0.05
1	2018-04-05	0.06
1	2018-04-06	0.07
1	2018-04-07	0.08
2	2018-04-01	0.02
2	2018-04-02	0.03
2	2018-04-03	0.04
2	2018-04-04	0.05
2	2018-04-05	0.06
2	2018-04-06	0.07
2	2018-04-07	0.08
2	2018-04-08	0
;
run;
proc sql;
create table want as
 select *
  from have
   group by stock,month
    having sum(ret ne 0)>=5 ;
quit;
Contributor
Posts: 73

Re: Drop stock in a month with less than five non-zero returns

Thanks you very much for ksharp Smiley Happy. Its even short code and more efficient.
Respected Advisor
Posts: 4,695

Re: Drop stock in a month with less than five non-zero returns

@cheema

You've asked for "millions of observations, so I would appreciate an efficient code".

 

I've made the assumption your source data is a SAS table. If so then the data step hash approach will outperform the SQL. Should your source data be in a database then the SQL is the option of choice.

Contributor
Posts: 73

Re: Drop stock in a month with less than five non-zero returns

Dear Patrick,

Thank you for pointing out this one. Yes, my source data is a SAS table, so I I will use data setp hash approach them. Thanks again to both of you, its really helpful Smiley Happy
Respected Advisor
Posts: 4,695

Re: Drop stock in a month with less than five non-zero returns

@cheema

If the SQL performs still good enough then I'd go for the SQL as besides of runtimes it's also always about code maintenance - and the SQL is certainly easier to understand and maintain.

Contributor
Posts: 73

Re: Drop stock in a month with less than five non-zero returns

Yes, SQL is quite easier to understand and make changes. I will check the run-times of both SQL and hash, I guess hash will outperform since my data-set is 100 millions observations.
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 132 views
  • 4 likes
  • 3 in conversation