DATA Step, Macro, Functions and more

Creating dates based on different start and stop intervals

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 114
Accepted Solution

Creating dates based on different start and stop intervals

Dear SAS users,

 

Do you have suggestions on how I can create different start and stop dates based on noncontinuous data points?  For example, for each observation I want to create variables mon_start1, mon_end1, mon_start2, mon_end2, mon_start3, mon_end3, etc.  I can create mon_start1 and mon_end1, but cannot figure out how to calculate the remaining variables (mon_start2, mon_end2, mon_start3, mon_end3, etc.) 

 

Here's a sample dataset for months Jan 2015 - Dec 2016 along with the beginning sas code where I created variables mon_start1 and mon_end1.   For example, for ID=AA, the start/end variables should be as follows:

mon_start1=1, mon_end1=6,

mon_start2=10, mon_end2=18,

mon_start3=22,  mon_end3=24

 

data have;

input id $ dob mmddyy10. total_mon mon1-mon24;

format dob monyy7.;

datalines;

AA 01/11/2015 18 1 1 1 1 1 1 . . . 1 1 1 1 1 1 1 1 1 . . . 1 1 1

BB 01/29/2015 24 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

CC 08/23/2015 13 . . . . . . . 1 1 1 1 1 1 1 1 1 . . 1 1 1 1 . .

DD 02/11/2015 20 . 1 1 1 . . . 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

EE 03/12/2016 8 . . . . . . . . . . . . . . 1 1 1 1 . 1 1 1 1 .

;

 

data have2;

set have;

start=1;

 

Mon_Start1=(intck('month','01Jan2015'd,dob))+1;

 

array month mon:;

do m=Mon_Start1-start+1 to dim(month) while(month[m]);

end;

 

if m>=24 then Mon_end1=total_mon;

else Mon_End1=m-1;

run;

 

Thanks!


Accepted Solutions
Solution
‎01-08-2018 11:13 AM
Super User
Super User
Posts: 9,853

Re: Creating dates based on different start and stop intervals

Posted in reply to sophia_SAS

Like any question where you have many columns, it is better to drop the "Excel thinking" and normalise your data, i.e. have data going down in observations.  Now the below code, loops over the array and creates a normalised data structure with start/stop times, if the data was normalised in the first place it would be even simpler than this.  Remember your output report files can be transposed before output and so don't need to look like the data you work with.

data have;
  input id $ dob mmddyy10. total_mon mon1-mon24;
  format dob monyy7.;
datalines;
AA 01/11/2015 18 1 1 1 1 1 1 . . . 1 1 1 1 1 1 1 1 1 . . . 1 1 1
BB 01/29/2015 24 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
CC 08/23/2015 13 . . . . . . . 1 1 1 1 1 1 1 1 1 . . 1 1 1 1 . .
DD 02/11/2015 20 . 1 1 1 . . . 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
EE 03/12/2016 8 . . . . . . . . . . . . . . 1 1 1 1 . 1 1 1 1 .
;
run;

data want (keep=id dob start_month end_month);
  set have;
  array mon{24};
  length start_month end_month $20;
  do i=1 to 24;
    if mon{i}=1 and start_month="" then start_month=vname(mon{i});
    else if (mon{i}=. and end_month="" and start_month ne "") then do;
      end_month=vname(mon{i-1});
      output;
      end_month="";
      start_month="";
    end;
    else if i=24 and end_month="" and start_month ne "" then do;
      end_month=vname(mon{i});
      output;
    end;
 end;
run;

View solution in original post


All Replies
Solution
‎01-08-2018 11:13 AM
Super User
Super User
Posts: 9,853

Re: Creating dates based on different start and stop intervals

Posted in reply to sophia_SAS

Like any question where you have many columns, it is better to drop the "Excel thinking" and normalise your data, i.e. have data going down in observations.  Now the below code, loops over the array and creates a normalised data structure with start/stop times, if the data was normalised in the first place it would be even simpler than this.  Remember your output report files can be transposed before output and so don't need to look like the data you work with.

data have;
  input id $ dob mmddyy10. total_mon mon1-mon24;
  format dob monyy7.;
datalines;
AA 01/11/2015 18 1 1 1 1 1 1 . . . 1 1 1 1 1 1 1 1 1 . . . 1 1 1
BB 01/29/2015 24 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
CC 08/23/2015 13 . . . . . . . 1 1 1 1 1 1 1 1 1 . . 1 1 1 1 . .
DD 02/11/2015 20 . 1 1 1 . . . 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
EE 03/12/2016 8 . . . . . . . . . . . . . . 1 1 1 1 . 1 1 1 1 .
;
run;

data want (keep=id dob start_month end_month);
  set have;
  array mon{24};
  length start_month end_month $20;
  do i=1 to 24;
    if mon{i}=1 and start_month="" then start_month=vname(mon{i});
    else if (mon{i}=. and end_month="" and start_month ne "") then do;
      end_month=vname(mon{i-1});
      output;
      end_month="";
      start_month="";
    end;
    else if i=24 and end_month="" and start_month ne "" then do;
      end_month=vname(mon{i});
      output;
    end;
 end;
run;
Frequent Contributor
Posts: 114

Re: Creating dates based on different start and stop intervals

This is a beautiful code and does even more than I had hoped for with my initial question! And yes, you are correct, I need to reframe how I think of my data - i.e. drop the Excel thinking.'

 

Thank you for putting it together so quickly. 

 

Super User
Posts: 8,218

Re: Creating dates based on different start and stop intervals

Posted in reply to sophia_SAS

I wouldn't be so quick in deciding that @RW9's code does everything you want to accomplish. I added two example records where the code fails to work.

 

Of course, I'm not suggesting that my code is perfect either, but it doesn't require restructuring your data:

 

data have;
  input id $ dob mmddyy10. total_mon mon1-mon24;
  format dob monyy7.;
  datalines;
AA 01/11/2015 18 1 1 1 1 1 1 . . . 1 1 1 1 1 1 1 1 1 . . . 1 1 1
BB 01/29/2015 24 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
CC 08/23/2015 13 . . . . . . . 1 1 1 1 1 1 1 1 1 . . 1 1 1 1 . .
DD 02/11/2015 20 . 1 1 1 . . . 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
EE 03/12/2016 8 . . . . . . . . . . . . . . 1 1 1 1 . 1 1 1 1 .
FF 03/12/2016 8 . . . . . . . . . . . . . . . . . . . . . . . 1
GG 03/12/2016 8 . . . . . . . . . . . . . . . . . . . . . . . .
;

data want (drop=i start_month period);
  set have;
  array month{24} mon1-mon24;
  array mon_start(3);
  array mon_end(3);
  period=1;
  call missing(start_month);
  do i=1 to 24;
    if month{i}=1 then do;
      if missing(start_month) then do;
        start_month=i;
        mon_start{period}=i;
      end;
      if i eq 24 then mon_end{period}=i;
    end;
    else if missing(month{i}) and missing(mon_end{period}) and not missing(start_month) then do;
      mon_end{period}=i-1;
      period+1;
      call missing(start_month);
    end;
 end;
run;

Art, CEO, AnalystFinder.com

 

Frequent Contributor
Posts: 114

Re: Creating dates based on different start and stop intervals

Hi Art297,

Thanks for responding to my inquiry! You are right in that the earlier code did not catch your two additional IDs - i.e. FF and GG. In my dataset while I never have any GG instances, I do have instances similar to FF (i.e. only 1 data point in the last month). For these examples, I added in a dummy month variable (month25).



Thank you for carefully considering this question and your excellent code. I'm now trying to figure out which to use given my current data structure and plan forward.


Super User
Posts: 8,218

Re: Creating dates based on different start and stop intervals

Posted in reply to sophia_SAS

@Ksharp's code works well too and provides the result in a long form like the one that @RW9 proposed.

 

Art, CEO, AnalystFinder.com

 

Super User
Posts: 10,850

Re: Creating dates based on different start and stop intervals

Posted in reply to sophia_SAS
options missing='x';
data have;
  input id $ dob mmddyy10. total_mon mon1-mon24;
  format dob monyy7.;
  datalines;
AA 01/11/2015 18 1 1 1 1 1 1 . . . 1 1 1 1 1 1 1 1 1 . . . 1 1 1
BB 01/29/2015 24 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
CC 08/23/2015 13 . . . . . . . 1 1 1 1 1 1 1 1 1 . . 1 1 1 1 . .
DD 02/11/2015 20 . 1 1 1 . . . 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
EE 03/12/2016 8 . . . . . . . . . . . . . . 1 1 1 1 . 1 1 1 1 .
FF 03/12/2016 8 . . . . . . . . . . . . . . . . . . . . . . . 1
GG 03/12/2016 8 . . . . . . . . . . . . . . . . . . . . . . . .
;

data want;
 set have;
 temp=cats(of mon1-mon24);
 pid=prxparse('/1+/');
 s=1;e=length(temp);
 call prxnext(pid,s,e,temp,p,l);
 do while(p>0);
  start=p;end=p+l-1; output;
  call prxnext(pid,s,e,temp,p,l);
 end;
drop temp pid s e p l;
run;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 167 views
  • 0 likes
  • 4 in conversation