BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
bssiders
Calcite | Level 5

this is driving me crazy!  ok, I have account_id, date and income.  The request is to pull any instance where the account_id appears 10+ consecutive days.   If there is a break of more than 5 days in between any date per account, then it needs to start over looking for 10+ consecutive days as well as assigning some type of indicator noting that it's the 2nd, 3rd, etc break per account number.   Some accounts can appear for a full year with, for example, 13 separate instances where they show 10+ consecutive days.  It could appear 12 consecutive days one time, then 18 days another, then 10 another, then 29....etc.   Each of those times needs to have the account number concatenated with the number of consecutive days it appeared.  for example, Account = 1234abc.    there would be 1234abc-12, 1234abc-18, 1234abc-10, 1234abc-29...etc for each line within each group.  Also, any days that fall outside the consecutive days criteria need to be removed.   I will consider payment if you can come up with something!!  feel free to email azsiders@yahoo.com. Subject SAS Puzzle.

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

@bssiders wrote:

[....] I decided to take another approach by using multiple data steps to create additional variables [...]


I'd recommend that you try and understand the proposed code instead of going for a sub-optimal solution. Except for custom intervals all the other logic is quite commonly used and though very worthwhile understanding as it will help you with future problems. Feel free to ask for bits that you don't understand.

As a first step: Just copy/paste the code into chatGPT. This AI gets the explanations of SAS code often right. Also lookup in the SAS docu functions that are new to you. And then as a next step ask here for explanation of the bits you still can't fully grasp.

 

Below a variant of code already proposed. I tried to keep it as simple as I could.

 

1. Create sample have data using what you've provided with some of the invalid dates fixed (like: 221MAY2012).

Spoiler
/** create sample have data **/
data have;
  infile datalines truncover dsd dlm=' ';
  input account_id $ date : date9.  income desired_select_flg;
  format date date9.;
datalines;
ac245 01MAR2012 22659
ac245 02MAR2012 22659
ac245 05MAR2012 22659
ac245 06MAR2012 22659
ac245 07MAR2012 22659
ac245 08MAR2012 22659
ac245 09MAR2012 22659
ac245 12MAR2012 22659
ac245 13MAR2012 22659
ac245 15MAR2012 22659 1
ac245 16MAR2012 22659 1
ac245 19MAR2012 22659 1
ac245 20MAR2012 22659 1
ac245 21MAR2012 22659 1
ac245 22MAR2012 22659 1
ac245 23MAR2012 19475 1
ac245 26MAR2012 22659 1
ac245 27MAR2012 6281 1
ac245 28MAR2012 22659 1
ac245 29MAR2012 22659 1
ac245 30MAR2012 22659 1
ac245 03APR2012 22659
ac245 05APR2012 22659
ac245 06APR2012 22659
ac245 09APR2012 22659
ac245 10APR2012 22659
ac245 11APR2012 22659
ac245 12APR2012 22659
ac245 13APR2012 22659
ac245 16APR2012 22659
ac245 17APR2012 22659
ac245 21MAY2012 12420 1
ac245 22MAY2012 12420 1
ac245 23MAY2012 12420 1
ac245 24MAY2012 12420 1
ac245 25MAY2012 12420 1
ac245 29MAY2012 12420 1
ac245 30MAY2012 12420 1
ac245 31MAY2012 12420 1
ac245 01JUN2012 12420 1
ac245 04JUN2012 12420 1
ac245 05JUN2012 12420 1
ac245 06JUN2012 12420 1
tr892 26JAN2022 1087
tr892 27JAN2022 1087
tr892 28JAN2022 1087
tr892 31JAN2022 1087
tr892 03FEB2022 1087
tr892 04FEB2022 1087
tr892 07FEB2022 1087
tr892 10FEB2022 1087
tr892 11FEB2022 1087
tr892 14FEB2022 1087
tr892 15FEB2022 1087
tr892 17FEB2022 1087
tr892 18FEB2022 1087
tr892 22FEB2022 1087
tr892 22FEB2022 1087
tr892 23FEB2022 1087
tr892 24FEB2022 1087
tr892 25FEB2022 1087
tr892 28FEB2022 1284
tr892 03MAR2022 1284
tr892 04MAR2022 1284
tr892 07MAR2022 1284
tr892 09MAR2022 1284 1
tr892 10MAR2022 1284 1
tr892 11MAR2022 1284 1
tr892 14MAR2022 1284 1
tr892 15MAR2022 1284 1
tr892 16MAR2022 1284 1
tr892 17MAR2022 1284 1
tr892 18MAR2022 1284 1
tr892 21MAR2022 1284 1
tr892 22MAR2022 1284 1
tr892 23MAR2022 1284 1
tr892 24MAR2022 1284 1
tr892 28MAR2022 1284 1
;

 

2. Create a company workday table. For use as source for a SAS Custom Interval this table needs to contain a variable with name BEGIN that stores a SAS date value with one row per date that is a company working day.

Ideally such data already exists somewhere in your company and you can just source the data from there and don't need below logic.

Spoiler
/** create source table for use as custom interval for company workdays **/

/** create dataset with company workdays: **/
/** - weekdays and not a holiday          **/

/* create table with holidays */
data work.comp_holidays;
  input date :date9.;
  format date date9.;
  datalines;
28MAY2012
25DEC2012
02JAN2013
25DEC2021
02JAN2022
;

/* create workday calendar: Mon-Fri without holidays */
data work.comp_work_days;
  if _n_=1 then
    do;
      dcl hash h1(dataset:'work.comp_holidays');
      h1.defineKey('date');
      h1.defineDone();
    end;
  format date date9.;
  do date='01jan2012'd to today();
    /* write to target if date is a weekday and not in holiday list */
    if weekday(date) in (2,3,4,5,6) and h1.check() ne 0 then output;
  end;
  /* rename variable date to name required for use as input for a custom interval */
  rename date=begin;
run;

 

3. Once you've got source data and a workday table suitable to create a SAS custom interval logic as below should work.

/* use work.comp_work_days as data source for custom interval comp_work_calendar    */
/* - table with dates used requires a variable named begin that stores the dates    */
/*   that are working days                                                          */
/* - for use with SAS intnx() and intck() calendar functions                        */
options intervalds=(comp_work_calendar=work.comp_work_days);


/**  **/
data want(drop=_: desired_select_flg);
  set have;
  by account_id date;
  format _lag_date date9.;
  _lag_date=lag(date);

  if first.account_id then 
    do;
      _count=1;
      group=0;
    end;
  else
    do;
      if intnx('comp_work_calendar',_lag_date,1)=date then _count+1;
      else 
        do;
          _count=1;
        end;
    end;

    if _count=10 then
      do;
        group+1;
        /* If the _count reached 10 we need to select the data but this is something we only                */
        /* find out when we process the 10th observation (based on the SET statement)                       */
        /* What we do know is the current observation number which is stored in the automatic variable _n_  */
        /* so all we need to do is read and output the last 10 observations from source based on the        */
        /* obs number of the current obs. This is what below do loop does: It accesses the source data      */
        /* a 2nd time to read and output 10 obs relative to the current obs.                                */
        do _obs=_n_-9 to _n_;
          set have point=_obs;
          output;
        end;
      end;
    else
    if _count>10 then output;
run;

proc print data=want;
run;

 

 

What above logic doesn't address because I couldn't understand the requirement is:
"If there is a break of more than 5 days in between any date per account, then it needs to start over looking for 10+ consecutive days as well as assigning some type of indicator noting that it's the 2nd, 3rd, etc break per account number. "

The logic also populates a variable group which let's you identify different sets of selected dates.

 

 

 

 

View solution in original post

13 REPLIES 13
mkeintz
PROC Star

As presented, this is NOT a "tricky puzzle".  It is rather easily solved using well-known features of the DATA step.

 

In particular, it uses

  1. the BY statement in concert with a SET statement
  2. two DO groups, each with a SET and/or MERGE statement inside the group (often called a double DOW for reasons I won't get into), and
  3. the FIRSTOBS= option in a MERGE statement to read both current and next dates in consecutive observations in a data set.

Because you have not provided sample data in the form of a working DATA step, the code below is untested.  It assumes data are sorted by ID/DATE:

 

(Editted note: it also assumes that no ID/DATE has more than one record, although that could be trivially addressed).

 

data want (drop=_:);

  do _ndays=1 by 1 until (last.id  or _nxt_date>date+1 );
    set have (keep=id);
    by id;
    merge have  
          have (firstobs=2 keep=date rename=(date=_nxt_date);
  end;
  length newvar $13;
  newvar=catx('-',id,ndays);

  do _n_=1 to _ndays;
    set have;
    if _ndays>=10 then output;
  end;

run;

There is a not-so-obvious component in this code - namely the use of both a SET and a MERGE inside the same DO group.  The SET is used in combination with the BY statement to facilitate the "last.id" test.  The MERGE is synchronized with the SET to read in all the vars from the current obs, and a single var from the next obs.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
bssiders
Calcite | Level 5

This is great and I really appreciate you taking the time to respond! Sorry for delay in replying as this was my first ever post and I missed an email.  Anyway....

 

I forgot one minor detail, smh.  I need to account for holidays and weekends.  In other words, I neeUsing a holiday table with company dates, I need to somehow pull that in.

 

So an account needs to appear 10+ business days in a row.  Once it skips a business day, the count needs to start over. Each iteration of 10+ consec biz days per account needs to be output.   

 

 

 

Reeza
Super User
You can build a custom calendar in SAS and then you use INTNX with the custom calendar. If you want better code, post a fully worked small example with fake data.
mkeintz
PROC Star

So an account needs to appear 10+ business days in a row.  Once it skips a business day, the count needs to start over. Each iteration of 10+ consec biz days per account needs to be output.   

Questions:

  1. Do any dates in the dataset occur on weekends or other non-business days?
  2. Does any date have more than one record?

 


 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
bssiders
Calcite | Level 5

Thank you.  Below is an example of the data.  The impact font lines are is what needs to be in the output dataset. They are in 10+ consecutive business day order.  For example, the first 9 lines are in order then there is Mar 14th missing, which is not a holiday or weekend. But then continues on Mar15 -Mar30 without a biz day break, which qualifies it for output. The account continues on Apr03-Apr17, however skips Apr04 which is not a holiday or weekend, therefor even know there are ten days, the skip of the 2nd prevents it from qualifying. It then starts back up on May21-Jun06 without a skip, so this section would be part of the output.  For account tr892, the only section would be from Mar09-24; the other lines would not apply due to breaks not involving holiday or weekends.  An account will not duplicate on the same day, however the income can fluctuate and does not apply but is needed in the output. There may be other variables needed as well, including income, but kept it short for learning purpose.  Thanks again for any assistance you may be able to provide.

 

data have;

   input account_id $ date : date9.  income ;

   format date date9.;

datalines;

ac245 01MAR2012 22659

ac245 02MAR2012 22659

ac245 05MAR2012 22659

ac245 06MAR2012 22659

ac245 07MAR2012 22659

ac245 08MAR2012 22659

ac245 09MAR2012 22659

ac245 12MAR2012 22659

ac245 13MAR2012 22659

ac245 15MAR2012 22659

ac245 16MAR2012 22659

ac245 19MAR2012 22659

ac245 20MAR2012 22659

ac245 21MAR2012 22659

ac245 22MAR2012 22659

ac245 23MAR2012 19475

ac245 26MAR2012 22659

ac245 27MAR2012 6281

ac245 28MAR2012 22659

ac245 29MAR2012 22659

ac245 30MAR2012 22659

ac245 03APR2012 22659

ac245 05APR2012 22659

ac245 06APR2012 22659

ac245 09APR2012 22659

ac245 10APR2012 22659

ac245 11APR2012 22659

ac245 12APR2012 22659

ac245 13APR2012 22659

ac245 16APR2012 22659

ac245 17APR2012 22659

ac245 21MAY2012 12420

ac245 221MAY2012 12420

ac245 231MAY2012 12420

ac245 241MAY2012 12420

ac245 251MAY2012 12420

ac245 291MAY2012 12420

ac245 30MAY2012 12420

ac245 31MAY2012 12420

ac245 01JUN2012 12420

ac245 04JUN2012 12420

ac245 05JUN2012 12420

ac245 06JUN2012 12420

tr892 26JAN2022 1087

tr892 27JAN2022 1087

tr892 28JAN2022 1087

tr892 31JAN2022 1087

tr892 03FEB2022 1087

tr892 04FEB2022 1087

tr892 07FEB2022 1087

tr892 10FEB2022 1087

tr892 11FEB2022 1087

tr892 14FEB2022 1087

tr892 15FEB2022 1087

tr892 17FEB2022 1087

tr892 18FEB2022 1087

tr892 22FEB2022 1087

tr892 22FEB2022 1087

tr892 23FEB2022 1087

tr892 24FEB2022 1087

tr892 25FEB2022 1087

tr892 28FEB2022 1284

tr892 03MAR2022 1284

tr892 04MAR2022 1284

tr892 07MAR2022 1284

tr892 09MAR2022 1284

tr892 10MAR2022 1284

tr892 11MAR2022 1284

tr892 14MAR2022 1284

tr892 15MAR2022 1284

tr892 16MAR2022 1284

tr892 17MAR2022 1284

tr892 18MAR2022 1284

tr892 21MAR2022 1284

tr892 22MAR2022 1284

tr892 23MAR2022 1284

tr892 24MAR2022 1284

tr892 28MAR2022 1284

;

mkeintz
PROC Star

@bssiders wrote:

I forgot one minor detail, smh.  I need to account for holidays and weekends.  In other words, I neeUsing a holiday table with company dates, I need to somehow pull that in.

 

So an account needs to appear 10+ business days in a row.  Once it skips a business day, the count needs to start over. Each iteration of 10+ consec biz days per account needs to be output.   


 

In my original code, instead of 

  do _ndays=1 by 1 until (last.id  or _nxt_date> date+1);

I could change the _nxtdate>date+1 condition and equivalently use

  do _ndays=1 by 1 until (last.id  or _nxt_date> intnx('day',date,1);

to get exactly the same result.

 

But the real benefit of the intnx function in this case is that I can choose intervals other than ordinary calendar days (i.e. other than 'day').  Instead I can use a user-specified calendar, (let's call it "company_calendar") which can be set skip weekends and holidays (both general and company-specific holidays).  Once that is done, as @Reeza  suggests, it's pretty easy.

 

Let's say you make a calendar dataset work.mycomp_calendar .  It has two date variables BEGIN and END (more about that later).

 

Then you could just do this:

 

options  intervalds=(company_calendar=work.mycomp_calendar);   

data want (drop=_:);

  do _ndays=1 by 1 until (last.id  or _nxt_date> intnx('company_calendar',date,+1));
    set have (keep=id);
    by id;
    merge have  
          have (firstobs=2 keep=date rename=(date=_nxt_date));
  end;
  length newvar $13;
  newvar=catx('-',id,_ndays);

  do _n_=1 to _ndays;
    set have;
    if _ndays>=10 then output;
  end;

run;

Edit:  After getting actual sample data, I fixed syntax in the above code.

 

The OPTIONS statement tells sas to use the dataset work.mycomp_calendar for a calendar interval you choose to name "company_calendar".  

 

As to the dataset work.mycomp_calendar, make a dataset with two variables (BEGIN and END) to cover the entire range of your data  (see the Example 2: Custom Intervals section intnx function documentation to see what the dataset should look like.

 

Here's a trial company calendar dataset (your holidays may differ).  It uses 2020 and 2021 as the calendar range:

 

data mycomp_calendar (keep=begin end);
  format begin end weekdate22. ;

  array hd_dummies{%sysevalf("01jan2020"d-7):%sysevalf("31dec2021"d+7)};
  do year=year(lbound(hd_dummies)) to year(hbound(hd_dummies));
    length hname $15;
    do hname='NEWYEAR','MLK','USPRESIDENTS','MEMORIAL','USINDEPENDENCE','LABOR','THANKSGIVING','CHRISTMAS';
      d=holiday(hname,year);
      if lbound(hd_dummies) <= d <= hbound(hd_dummies) then hd_dummies{d}=1;
    end;
  end;

  do begin=intnx('weekday','01jan2020'd,0) by 0 until (end>='31dec2021'd);
    do nxtbeg=intnx('weekday',begin,1) by 0 while(hd_dummies(nxtbeg)=1);
      nxtbeg=intnx('weekday',nxtbeg,1);
    end;
    end=nxtbeg-1;
    output;
    begin=nxtbeg;  **Ready for next loop iteration **;
  end;
run;

Notice the holiday dummies array hd_dummies I extend the date range by 7 days at each end of the array.  This assures that the INTNX function used to build the company calendar does not go beyond the array range (i.e. when calculating END for the last interval and BEGIN for the first interval).

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
bssiders
Calcite | Level 5

Again, I really appreciate you taking the time!  However, I still am not getting the output I am looking for.  And your code is WAY above my head so I highly doubt it is you!!  Very impressive to say the least, btw.

 

With that said, I decided to take another approach by using multiple data steps to create additional variables that ultimately created a column 'day_int' with 0/1 values.  0 being the start date then value of 1 for each date thereafter accounting for holiday/weekend and if there was a skip of more than 1 it started over at 0. I then added:

 

if day_int = 0 then x=1 ; else x+1;

 

Which provided a the variable X, counting each line....ex, 1-17, 1-4, 1-59, 1-223, 1-8, 1-9, etc....

 

From there I added :

 

if x>9 then output;

 

Which worked (kind of) it outputted all those that exceeded 10+ days, but only lines 10-17, 10-59, 10-223, etc.... it did not include 1-4, 1-8, or 1-9 which was good,  but for the 10+,  I need lines 1-17, 1-59, 1-223 vs starting from 10.  Any suggestions on doing that? 

 

Thanks again!!

 

Patrick
Opal | Level 21

@bssiders wrote:

[....] I decided to take another approach by using multiple data steps to create additional variables [...]


I'd recommend that you try and understand the proposed code instead of going for a sub-optimal solution. Except for custom intervals all the other logic is quite commonly used and though very worthwhile understanding as it will help you with future problems. Feel free to ask for bits that you don't understand.

As a first step: Just copy/paste the code into chatGPT. This AI gets the explanations of SAS code often right. Also lookup in the SAS docu functions that are new to you. And then as a next step ask here for explanation of the bits you still can't fully grasp.

 

Below a variant of code already proposed. I tried to keep it as simple as I could.

 

1. Create sample have data using what you've provided with some of the invalid dates fixed (like: 221MAY2012).

Spoiler
/** create sample have data **/
data have;
  infile datalines truncover dsd dlm=' ';
  input account_id $ date : date9.  income desired_select_flg;
  format date date9.;
datalines;
ac245 01MAR2012 22659
ac245 02MAR2012 22659
ac245 05MAR2012 22659
ac245 06MAR2012 22659
ac245 07MAR2012 22659
ac245 08MAR2012 22659
ac245 09MAR2012 22659
ac245 12MAR2012 22659
ac245 13MAR2012 22659
ac245 15MAR2012 22659 1
ac245 16MAR2012 22659 1
ac245 19MAR2012 22659 1
ac245 20MAR2012 22659 1
ac245 21MAR2012 22659 1
ac245 22MAR2012 22659 1
ac245 23MAR2012 19475 1
ac245 26MAR2012 22659 1
ac245 27MAR2012 6281 1
ac245 28MAR2012 22659 1
ac245 29MAR2012 22659 1
ac245 30MAR2012 22659 1
ac245 03APR2012 22659
ac245 05APR2012 22659
ac245 06APR2012 22659
ac245 09APR2012 22659
ac245 10APR2012 22659
ac245 11APR2012 22659
ac245 12APR2012 22659
ac245 13APR2012 22659
ac245 16APR2012 22659
ac245 17APR2012 22659
ac245 21MAY2012 12420 1
ac245 22MAY2012 12420 1
ac245 23MAY2012 12420 1
ac245 24MAY2012 12420 1
ac245 25MAY2012 12420 1
ac245 29MAY2012 12420 1
ac245 30MAY2012 12420 1
ac245 31MAY2012 12420 1
ac245 01JUN2012 12420 1
ac245 04JUN2012 12420 1
ac245 05JUN2012 12420 1
ac245 06JUN2012 12420 1
tr892 26JAN2022 1087
tr892 27JAN2022 1087
tr892 28JAN2022 1087
tr892 31JAN2022 1087
tr892 03FEB2022 1087
tr892 04FEB2022 1087
tr892 07FEB2022 1087
tr892 10FEB2022 1087
tr892 11FEB2022 1087
tr892 14FEB2022 1087
tr892 15FEB2022 1087
tr892 17FEB2022 1087
tr892 18FEB2022 1087
tr892 22FEB2022 1087
tr892 22FEB2022 1087
tr892 23FEB2022 1087
tr892 24FEB2022 1087
tr892 25FEB2022 1087
tr892 28FEB2022 1284
tr892 03MAR2022 1284
tr892 04MAR2022 1284
tr892 07MAR2022 1284
tr892 09MAR2022 1284 1
tr892 10MAR2022 1284 1
tr892 11MAR2022 1284 1
tr892 14MAR2022 1284 1
tr892 15MAR2022 1284 1
tr892 16MAR2022 1284 1
tr892 17MAR2022 1284 1
tr892 18MAR2022 1284 1
tr892 21MAR2022 1284 1
tr892 22MAR2022 1284 1
tr892 23MAR2022 1284 1
tr892 24MAR2022 1284 1
tr892 28MAR2022 1284 1
;

 

2. Create a company workday table. For use as source for a SAS Custom Interval this table needs to contain a variable with name BEGIN that stores a SAS date value with one row per date that is a company working day.

Ideally such data already exists somewhere in your company and you can just source the data from there and don't need below logic.

Spoiler
/** create source table for use as custom interval for company workdays **/

/** create dataset with company workdays: **/
/** - weekdays and not a holiday          **/

/* create table with holidays */
data work.comp_holidays;
  input date :date9.;
  format date date9.;
  datalines;
28MAY2012
25DEC2012
02JAN2013
25DEC2021
02JAN2022
;

/* create workday calendar: Mon-Fri without holidays */
data work.comp_work_days;
  if _n_=1 then
    do;
      dcl hash h1(dataset:'work.comp_holidays');
      h1.defineKey('date');
      h1.defineDone();
    end;
  format date date9.;
  do date='01jan2012'd to today();
    /* write to target if date is a weekday and not in holiday list */
    if weekday(date) in (2,3,4,5,6) and h1.check() ne 0 then output;
  end;
  /* rename variable date to name required for use as input for a custom interval */
  rename date=begin;
run;

 

3. Once you've got source data and a workday table suitable to create a SAS custom interval logic as below should work.

/* use work.comp_work_days as data source for custom interval comp_work_calendar    */
/* - table with dates used requires a variable named begin that stores the dates    */
/*   that are working days                                                          */
/* - for use with SAS intnx() and intck() calendar functions                        */
options intervalds=(comp_work_calendar=work.comp_work_days);


/**  **/
data want(drop=_: desired_select_flg);
  set have;
  by account_id date;
  format _lag_date date9.;
  _lag_date=lag(date);

  if first.account_id then 
    do;
      _count=1;
      group=0;
    end;
  else
    do;
      if intnx('comp_work_calendar',_lag_date,1)=date then _count+1;
      else 
        do;
          _count=1;
        end;
    end;

    if _count=10 then
      do;
        group+1;
        /* If the _count reached 10 we need to select the data but this is something we only                */
        /* find out when we process the 10th observation (based on the SET statement)                       */
        /* What we do know is the current observation number which is stored in the automatic variable _n_  */
        /* so all we need to do is read and output the last 10 observations from source based on the        */
        /* obs number of the current obs. This is what below do loop does: It accesses the source data      */
        /* a 2nd time to read and output 10 obs relative to the current obs.                                */
        do _obs=_n_-9 to _n_;
          set have point=_obs;
          output;
        end;
      end;
    else
    if _count>10 then output;
run;

proc print data=want;
run;

 

 

What above logic doesn't address because I couldn't understand the requirement is:
"If there is a break of more than 5 days in between any date per account, then it needs to start over looking for 10+ consecutive days as well as assigning some type of indicator noting that it's the 2nd, 3rd, etc break per account number. "

The logic also populates a variable group which let's you identify different sets of selected dates.

 

 

 

 

bssiders
Calcite | Level 5

I added your code and I didn't get any errors!  however, I also didn't get any obs.  I am not sure which is worse. lol.    and I apologize for the 5 day break confusion. disregard that.  I just need to know if there is break in consecutive biz days.  With no detail in the log as to why zero obs came back, I am not sure what went wrong. 

 

 

bssiders
Calcite | Level 5

Patrick - ok, I took out a ; and it worked!! and again, can't thank you enough!!   

ballardw
Super User

You really should show a small example of the final desired result (and perhaps a WHY you want a summary for "each line"), better would be to show enough input data to generate the shown output. Best is to provide the data as a working data step.

 

Here's an example of how to provide an account and date in a data step:

data have;
    input accountid $ date :date9.;
    format date date9.;
datalines;
abc   01JAN2021
abc   02JAN2021
abc   03JAN2021
abc   04JAN2021
abc   05JAN2021
abc   06JAN2021
abc   12JAN2021
abc   13JAN2021
abc   14JAN2021
abc   15JAN2021
abc   16JAN2021
abc   17JAN2021
abc   18JAN2021
abc   19JAN2021
abc   21JAN2021
abc   31JAN2021
abc   01FEB2021
abc   02FEB2021
;

You should provide some durations of less than 10, some longer than 10 and show what the expected results would look like when done.

 

Getting the basic counts may be pretty easy IF you have actual date values. You do not at all address how a consecutive period that crosses a calendar (or fiscal) year boundary might need to be treated.

 

Personally I find an instruction like "Each of those times needs to have the account number concatenated with the number of consecutive days it appeared." which places TWO different values into ONE variable to be an incredibly poor choice. Separate variables are much better.

bssiders
Calcite | Level 5

Thank you.  Below is an example of the data.  The impact font lines are is what needs to be in the output dataset. They are in 10+ consecutive business day order.  For example, the first 9 lines are in order then there is Mar 14th missing, which is not a holiday or weekend. But then continues on Mar15 -Mar30 without a biz day break, which qualifies it for output. The account continues on Apr03-Apr17, however skips Apr04 which is not a holiday or weekend, therefor even know there are ten days, the skip of the 2nd prevents it from qualifying. It then starts back up on May17-Jun06 without a skip, so this section would be part of the output.  For account tr892, the only section would be from Mar09-24; the other lines would not apply due to breaks not involving holiday or weekends.  An account will not duplicate on the same day, however the income can fluctuate and does not apply but is needed in the output. There may be other variables needed as well, including income, but kept it short for learning purpose.  Thanks again for any assistance you may be able to provide.

 

data have;

   input account_id $ date : date9.  income ;

   format date date9.;

datalines;

ac245 01MAR2012 22659

ac245 02MAR2012 22659

ac245 05MAR2012 22659

ac245 06MAR2012 22659

ac245 07MAR2012 22659

ac245 08MAR2012 22659

ac245 09MAR2012 22659

ac245 12MAR2012 22659

ac245 13MAR2012 22659

ac245 15MAR2012 22659

ac245 16MAR2012 22659

ac245 19MAR2012 22659

ac245 20MAR2012 22659

ac245 21MAR2012 22659

ac245 22MAR2012 22659

ac245 23MAR2012 19475

ac245 26MAR2012 22659

ac245 27MAR2012 6281

ac245 28MAR2012 22659

ac245 29MAR2012 22659

ac245 30MAR2012 22659

ac245 03APR2012 22659

ac245 05APR2012 22659

ac245 06APR2012 22659

ac245 09APR2012 22659

ac245 10APR2012 22659

ac245 11APR2012 22659

ac245 12APR2012 22659

ac245 13APR2012 22659

ac245 16APR2012 22659

ac245 17APR2012 22659

ac245 21MAY2012 12420

ac245 221MAY2012 12420

ac245 231MAY2012 12420

ac245 241MAY2012 12420

ac245 251MAY2012 12420

ac245 291MAY2012 12420

ac245 30MAY2012 12420

ac245 31MAY2012 12420

ac245 01JUN2012 12420

ac245 04JUN2012 12420

ac245 05JUN2012 12420

ac245 06JUN2012 12420

tr892 26JAN2022 1087

tr892 27JAN2022 1087

tr892 28JAN2022 1087

tr892 31JAN2022 1087

tr892 03FEB2022 1087

tr892 04FEB2022 1087

tr892 07FEB2022 1087

tr892 10FEB2022 1087

tr892 11FEB2022 1087

tr892 14FEB2022 1087

tr892 15FEB2022 1087

tr892 17FEB2022 1087

tr892 18FEB2022 1087

tr892 22FEB2022 1087

tr892 22FEB2022 1087

tr892 23FEB2022 1087

tr892 24FEB2022 1087

tr892 25FEB2022 1087

tr892 28FEB2022 1284

tr892 03MAR2022 1284

tr892 04MAR2022 1284

tr892 07MAR2022 1284

tr892 09MAR2022 1284

tr892 10MAR2022 1284

tr892 11MAR2022 1284

tr892 14MAR2022 1284

tr892 15MAR2022 1284

tr892 16MAR2022 1284

tr892 17MAR2022 1284

tr892 18MAR2022 1284

tr892 21MAR2022 1284

tr892 22MAR2022 1284

tr892 23MAR2022 1284

tr892 24MAR2022 1284

tr892 28MAR2022 1284

;

 

 

bssiders
Calcite | Level 5

correction, where it readys May17-Jun06,  s/b May21-Jun06

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 13 replies
  • 2440 views
  • 8 likes
  • 5 in conversation