Hi everyone,

MAC1430 · Posted 05-31-2018 01:31 AM

Hi everyone,

I want to drop stocks in a month with less than five non-zero returns. For example stock 1 has 7 daily returns in March 2018 but only three non-zero returns. Therefore, I want to delete daily returns of stock 1 for March 2018. I have millions of observations, so I would appreciate an efficient code. Please find the SAS data below.

Thanks a lot for your help.

Best,

Cheema

data have;
infile cards expandtabs truncover;
input stock date : yymmdd10. ret ;
format date yymmdd10.;
cards;
1	2018-03-01	0
1	2018-03-02	0.1
1	2018-03-03	0
1	2018-03-04	0.03
1	2018-03-05	0
1	2018-03-06	0.02
1	2018-03-07	0
2	2018-03-01	0.02
2	2018-03-02	0.03
2	2018-03-03	0.04
2	2018-03-04	0.05
2	2018-03-05	0.06
2	2018-03-06	0.07
2	2018-03-07	0.08
2	2018-03-08	0
1	2018-04-01	0.02
1	2018-04-02	0.03
1	2018-04-03	0.04
1	2018-04-04	0.05
1	2018-04-05	0.06
1	2018-04-06	0.07
1	2018-04-07	0.08
2	2018-04-01	0.02
2	2018-04-02	0.03
2	2018-04-03	0.04
2	2018-04-04	0.05
2	2018-04-05	0.06
2	2018-04-06	0.07
2	2018-04-07	0.08
2	2018-04-08	0
run;

Patrick · Posted 05-31-2018 03:21 AM

@MAC1430

I believe below code should perform reasonably well. I had to set the threshold count to 4 (and not 5) to return some rows using the sample data you've posted.

data have;
infile cards expandtabs truncover;
input stock date : yymmdd10. ret ;
format date yymmdd10.;
cards;
1	2018-03-01	0
1	2018-03-02	0.1
1	2018-03-03	0
1	2018-03-04	0.03
1	2018-03-05	0
1	2018-03-06	0.02
1	2018-03-07	0
2	2018-03-01	0.02
2	2018-03-02	0.03
2	2018-03-03	0.04
2	2018-03-04	0.05
2	2018-03-05	0.06
2	2018-03-06	0.07
2	2018-03-07	0.08
2	2018-03-08	0
1	2018-04-01	0.02
1	2018-04-02	0.03
1	2018-04-03	0.04
1	2018-04-04	0.05
1	2018-04-05	0.06
1	2018-04-06	0.07
1	2018-04-07	0.08
2	2018-04-01	0.02
2	2018-04-02	0.03
2	2018-04-03	0.04
2	2018-04-04	0.05
2	2018-04-05	0.06
2	2018-04-06	0.07
2	2018-04-07	0.08
2	2018-04-08	0
;
run;

%let treshold_cnt=4;
data want(drop=_:);

  if _n_=1 then
    do;
      length _cnt 3;
      _cnt=1;
      dcl hash h1(suminc: '_cnt', multidata:'n');
      h1.defineKey('stock','date');
      h1.defineDone();
      do i=1 to nobs;
        set have nobs=nobs point=i;
        date=intnx('month',date,0,'b');
        if ret=0 then h1.ref();
      end;
    end;

    set have;
    h1.sum(key:stock, key:intnx('month',date,0,'b'), sum:_zero_ret_cnt);
    if _zero_ret_cnt<&treshold_cnt then delete;
run;

View solution in original post

Patrick · Posted 05-31-2018 03:21 AM

@MAC1430

I believe below code should perform reasonably well. I had to set the threshold count to 4 (and not 5) to return some rows using the sample data you've posted.

data have;
infile cards expandtabs truncover;
input stock date : yymmdd10. ret ;
format date yymmdd10.;
cards;
1	2018-03-01	0
1	2018-03-02	0.1
1	2018-03-03	0
1	2018-03-04	0.03
1	2018-03-05	0
1	2018-03-06	0.02
1	2018-03-07	0
2	2018-03-01	0.02
2	2018-03-02	0.03
2	2018-03-03	0.04
2	2018-03-04	0.05
2	2018-03-05	0.06
2	2018-03-06	0.07
2	2018-03-07	0.08
2	2018-03-08	0
1	2018-04-01	0.02
1	2018-04-02	0.03
1	2018-04-03	0.04
1	2018-04-04	0.05
1	2018-04-05	0.06
1	2018-04-06	0.07
1	2018-04-07	0.08
2	2018-04-01	0.02
2	2018-04-02	0.03
2	2018-04-03	0.04
2	2018-04-04	0.05
2	2018-04-05	0.06
2	2018-04-06	0.07
2	2018-04-07	0.08
2	2018-04-08	0
;
run;

%let treshold_cnt=4;
data want(drop=_:);

  if _n_=1 then
    do;
      length _cnt 3;
      _cnt=1;
      dcl hash h1(suminc: '_cnt', multidata:'n');
      h1.defineKey('stock','date');
      h1.defineDone();
      do i=1 to nobs;
        set have nobs=nobs point=i;
        date=intnx('month',date,0,'b');
        if ret=0 then h1.ref();
      end;
    end;

    set have;
    h1.sum(key:stock, key:intnx('month',date,0,'b'), sum:_zero_ret_cnt);
    if _zero_ret_cnt<&treshold_cnt then delete;
run;

MAC1430 · Posted 05-31-2018 04:05 AM

Thank you very much, it works very well.

Ksharp · Posted 05-31-2018 09:17 AM

data have;
infile cards expandtabs truncover;
input stock date : yymmdd10. ret ;
month=month(date);
format date yymmdd10.;
cards;
1	2018-03-01	0
1	2018-03-02	0.1
1	2018-03-03	0
1	2018-03-04	0.03
1	2018-03-05	0
1	2018-03-06	0.02
1	2018-03-07	0
2	2018-03-01	0.02
2	2018-03-02	0.03
2	2018-03-03	0.04
2	2018-03-04	0.05
2	2018-03-05	0.06
2	2018-03-06	0.07
2	2018-03-07	0.08
2	2018-03-08	0
1	2018-04-01	0.02
1	2018-04-02	0.03
1	2018-04-03	0.04
1	2018-04-04	0.05
1	2018-04-05	0.06
1	2018-04-06	0.07
1	2018-04-07	0.08
2	2018-04-01	0.02
2	2018-04-02	0.03
2	2018-04-03	0.04
2	2018-04-04	0.05
2	2018-04-05	0.06
2	2018-04-06	0.07
2	2018-04-07	0.08
2	2018-04-08	0
;
run;
proc sql;
create table want as
 select *
  from have
   group by stock,month
    having sum(ret ne 0)>=5 ;
quit;

MAC1430 · Posted 05-31-2018 08:21 PM

Thanks you very much for ksharp :). Its even short code and more efficient.

Patrick · Posted 05-31-2018 08:25 PM

@MAC1430

You've asked for "millions of observations, so I would appreciate an efficient code".

I've made the assumption your source data is a SAS table. If so then the data step hash approach will outperform the SQL. Should your source data be in a database then the SQL is the option of choice.

MAC1430 · Posted 05-31-2018 08:33 PM

Dear Patrick,

Thank you for pointing out this one. Yes, my source data is a SAS table, so I I will use data setp hash approach them. Thanks again to both of you, its really helpful 🙂

Patrick · Posted 05-31-2018 08:41 PM

@MAC1430

If the SQL performs still good enough then I'd go for the SQL as besides of runtimes it's also always about code maintenance - and the SQL is certainly easier to understand and maintain.

MAC1430 · Posted 05-31-2018 08:44 PM

Yes, SQL is quite easier to understand and make changes. I will check the run-times of both SQL and hash, I guess hash will outperform since my data-set is 100 millions observations.

Drop stock in a month with less than five non-zero returns

Hi everyone,

Re: Drop stock in a month with less than five non-zero returns

Re: Drop stock in a month with less than five non-zero returns

Re: Drop stock in a month with less than five non-zero returns

Re: Drop stock in a month with less than five non-zero returns

Re: Drop stock in a month with less than five non-zero returns

Re: Drop stock in a month with less than five non-zero returns

Re: Drop stock in a month with less than five non-zero returns

Re: Drop stock in a month with less than five non-zero returns

Re: Drop stock in a month with less than five non-zero returns

Hi everyone,

SAS Innovate 2025: Register Today!

SAS Training: Just a Click Away