BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
MAC1430
Pyrite | Level 9
 

 Hi everyone,

 

I want to drop stocks in a month with less than five non-zero returns. For example stock 1 has 7 daily returns in March 2018 but only three non-zero returns. Therefore, I want to delete daily returns of stock 1 for March 2018. I have millions of observations, so I would appreciate an efficient code. Please find the SAS data below.

 

Thanks a lot for your help.

 

Best,

 

Cheema

data have;
infile cards expandtabs truncover;
input stock date : yymmdd10. ret ;
format date yymmdd10.;
cards;
1	2018-03-01	0
1	2018-03-02	0.1
1	2018-03-03	0
1	2018-03-04	0.03
1	2018-03-05	0
1	2018-03-06	0.02
1	2018-03-07	0
2	2018-03-01	0.02
2	2018-03-02	0.03
2	2018-03-03	0.04
2	2018-03-04	0.05
2	2018-03-05	0.06
2	2018-03-06	0.07
2	2018-03-07	0.08
2	2018-03-08	0
1	2018-04-01	0.02
1	2018-04-02	0.03
1	2018-04-03	0.04
1	2018-04-04	0.05
1	2018-04-05	0.06
1	2018-04-06	0.07
1	2018-04-07	0.08
2	2018-04-01	0.02
2	2018-04-02	0.03
2	2018-04-03	0.04
2	2018-04-04	0.05
2	2018-04-05	0.06
2	2018-04-06	0.07
2	2018-04-07	0.08
2	2018-04-08	0
run;
1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

@MAC1430

I believe below code should perform reasonably well. I had to set the threshold count to 4 (and not 5) to return some rows using the sample data you've posted.

data have;
infile cards expandtabs truncover;
input stock date : yymmdd10. ret ;
format date yymmdd10.;
cards;
1	2018-03-01	0
1	2018-03-02	0.1
1	2018-03-03	0
1	2018-03-04	0.03
1	2018-03-05	0
1	2018-03-06	0.02
1	2018-03-07	0
2	2018-03-01	0.02
2	2018-03-02	0.03
2	2018-03-03	0.04
2	2018-03-04	0.05
2	2018-03-05	0.06
2	2018-03-06	0.07
2	2018-03-07	0.08
2	2018-03-08	0
1	2018-04-01	0.02
1	2018-04-02	0.03
1	2018-04-03	0.04
1	2018-04-04	0.05
1	2018-04-05	0.06
1	2018-04-06	0.07
1	2018-04-07	0.08
2	2018-04-01	0.02
2	2018-04-02	0.03
2	2018-04-03	0.04
2	2018-04-04	0.05
2	2018-04-05	0.06
2	2018-04-06	0.07
2	2018-04-07	0.08
2	2018-04-08	0
;
run;

%let treshold_cnt=4;
data want(drop=_:);

  if _n_=1 then
    do;
      length _cnt 3;
      _cnt=1;
      dcl hash h1(suminc: '_cnt', multidata:'n');
      h1.defineKey('stock','date');
      h1.defineDone();
      do i=1 to nobs;
        set have nobs=nobs point=i;
        date=intnx('month',date,0,'b');
        if ret=0 then h1.ref();
      end;
    end;

    set have;
    h1.sum(key:stock, key:intnx('month',date,0,'b'), sum:_zero_ret_cnt);
    if _zero_ret_cnt<&treshold_cnt then delete;
run;

View solution in original post

8 REPLIES 8
Patrick
Opal | Level 21

@MAC1430

I believe below code should perform reasonably well. I had to set the threshold count to 4 (and not 5) to return some rows using the sample data you've posted.

data have;
infile cards expandtabs truncover;
input stock date : yymmdd10. ret ;
format date yymmdd10.;
cards;
1	2018-03-01	0
1	2018-03-02	0.1
1	2018-03-03	0
1	2018-03-04	0.03
1	2018-03-05	0
1	2018-03-06	0.02
1	2018-03-07	0
2	2018-03-01	0.02
2	2018-03-02	0.03
2	2018-03-03	0.04
2	2018-03-04	0.05
2	2018-03-05	0.06
2	2018-03-06	0.07
2	2018-03-07	0.08
2	2018-03-08	0
1	2018-04-01	0.02
1	2018-04-02	0.03
1	2018-04-03	0.04
1	2018-04-04	0.05
1	2018-04-05	0.06
1	2018-04-06	0.07
1	2018-04-07	0.08
2	2018-04-01	0.02
2	2018-04-02	0.03
2	2018-04-03	0.04
2	2018-04-04	0.05
2	2018-04-05	0.06
2	2018-04-06	0.07
2	2018-04-07	0.08
2	2018-04-08	0
;
run;

%let treshold_cnt=4;
data want(drop=_:);

  if _n_=1 then
    do;
      length _cnt 3;
      _cnt=1;
      dcl hash h1(suminc: '_cnt', multidata:'n');
      h1.defineKey('stock','date');
      h1.defineDone();
      do i=1 to nobs;
        set have nobs=nobs point=i;
        date=intnx('month',date,0,'b');
        if ret=0 then h1.ref();
      end;
    end;

    set have;
    h1.sum(key:stock, key:intnx('month',date,0,'b'), sum:_zero_ret_cnt);
    if _zero_ret_cnt<&treshold_cnt then delete;
run;

MAC1430
Pyrite | Level 9
Thank you very much, it works very well.
Ksharp
Super User
data have;
infile cards expandtabs truncover;
input stock date : yymmdd10. ret ;
month=month(date);
format date yymmdd10.;
cards;
1	2018-03-01	0
1	2018-03-02	0.1
1	2018-03-03	0
1	2018-03-04	0.03
1	2018-03-05	0
1	2018-03-06	0.02
1	2018-03-07	0
2	2018-03-01	0.02
2	2018-03-02	0.03
2	2018-03-03	0.04
2	2018-03-04	0.05
2	2018-03-05	0.06
2	2018-03-06	0.07
2	2018-03-07	0.08
2	2018-03-08	0
1	2018-04-01	0.02
1	2018-04-02	0.03
1	2018-04-03	0.04
1	2018-04-04	0.05
1	2018-04-05	0.06
1	2018-04-06	0.07
1	2018-04-07	0.08
2	2018-04-01	0.02
2	2018-04-02	0.03
2	2018-04-03	0.04
2	2018-04-04	0.05
2	2018-04-05	0.06
2	2018-04-06	0.07
2	2018-04-07	0.08
2	2018-04-08	0
;
run;
proc sql;
create table want as
 select *
  from have
   group by stock,month
    having sum(ret ne 0)>=5 ;
quit;
MAC1430
Pyrite | Level 9
Thanks you very much for ksharp :). Its even short code and more efficient.
Patrick
Opal | Level 21

@MAC1430

You've asked for "millions of observations, so I would appreciate an efficient code".

 

I've made the assumption your source data is a SAS table. If so then the data step hash approach will outperform the SQL. Should your source data be in a database then the SQL is the option of choice.

MAC1430
Pyrite | Level 9
Dear Patrick,

Thank you for pointing out this one. Yes, my source data is a SAS table, so I I will use data setp hash approach them. Thanks again to both of you, its really helpful 🙂
Patrick
Opal | Level 21

@MAC1430

If the SQL performs still good enough then I'd go for the SQL as besides of runtimes it's also always about code maintenance - and the SQL is certainly easier to understand and maintain.

MAC1430
Pyrite | Level 9
Yes, SQL is quite easier to understand and make changes. I will check the run-times of both SQL and hash, I guess hash will outperform since my data-set is 100 millions observations.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1061 views
  • 4 likes
  • 3 in conversation