Solved: Create a Row for Every Date within a Date Range

djbateman · Posted 06-08-2022 10:45 AM

I have created a process that makes a row for every date in a date range. It is very tedious and slow when working with large data. Can someone help me simply and speed up this process?

As a very small scale example, suppose I have this dataset:

proc sql;
	create table have (SUBJID char(7), STDT num format=date9., ENDT num format=date9.);
	insert into have (subjid, stdt, endt)
		values ('101-001','01JAN2022'd,'07JAN2022'd)
		values ('101-002','01FEB2022'd,'28FEB2022'd)
		values ('101-003','01JAN2022'd,'14JAN2022'd)
		values ('101-004','01JAN2022'd,'03JAN2022'd);
quit;

I want to get a record for every date within the start and end dates. So, if the subject spans 3 days, then I want to turn it into 3 records where each record contains a date within that range. This is what I am doing to get what I want:

data have1;
	set have;
	REPS=endt-stdt+1;
run;



%macro replicate;
	proc sql noprint;
        select count(*) into :nrec trimmed from have1;
		select subjid, reps into :subj1-:subj&nrec., :rep1-:rep&nrec. from have1;
	quit;

	data want0;
		set %do i=1 %to &nrec.;
				%do j=1 %to &&rep&i.;
					have1 (where=(subjid="&&subj&i."))
				%end;
			%end;;
	run;
%mend replicate;

%replicate;



data want;
	set want0;
	by subjid;
	retain DATE;
	if first.subjid then date=stdt;
		else date=date+1;
	format date date9.;
run;
proc sort data=want;
	by subjid date;
run;

This process works, but it isn't very effieicent for large date ranges. I have some ranges that are up to 7000 days, so I am looping A LOT. Is there a quicker/cleaner way to do this same method that won't take so much time?

Astounding · Posted 06-08-2022 10:54 AM

If you have one observation per SUBJID (not clear from your post), why add all the complications? Simply:

data want;
   set have;
   do date = stdt to endt;
      output;
   end;
   drop stdt enddt;
run;

If this doesn't do the trick exactly, it should at least give you a starting point to consider.

View solution in original post

Astounding · Posted 06-08-2022 10:54 AM

If you have one observation per SUBJID (not clear from your post), why add all the complications? Simply:

data want;
   set have;
   do date = stdt to endt;
      output;
   end;
   drop stdt enddt;
run;

If this doesn't do the trick exactly, it should at least give you a starting point to consider.

djbateman · Posted 06-08-2022 10:58 AM

I actually have more than just SUBJID as BY variables in my real process because I'm working with ConMed data. There is also a sequence ID, so a subject can have more than one record (and the date ranges could overlap). I just figured I could take whatever solution someone else has and add in the sequence variable as well. I'll play around with your solution (much simpler than what I have) and see if I can use it with my sequence variable added.

Tom · Posted 06-08-2022 11:11 AM

@djbateman wrote:

I actually have more than just SUBJID as BY variables in my real process because I'm working with ConMed data. There is also a sequence ID, so a subject can have more than one record (and the date ranges could overlap). I just figured I could take whatever solution someone else has and add in the sequence variable as well. I'll play around with your solution (much simpler than what I have) and see if I can use it with my sequence variable added.

Sounds like the goal is NOT to generate one observation per date.

Instead it looks like you want to collapse potentially overlapping periods into a new set of non-overlapping periods.

That is a question that has been asked and answered before. Search around.

Tom · Posted 06-08-2022 11:08 AM

It is trivial given your example data.

Spoiler

data have;
  input subjid :$7. stdt :date. endt :date.;
  format stdt endt date9.;
cards;
101-001 01JAN2022 07JAN2022
101-002 01FEB2022 28FEB2022
101-003 01JAN2022 14JAN2022
101-004 01JAN2022 03JAN2022
;

data have; input subjid :$7. stdt :date. endt :date.; format stdt endt date9.; cards; 101-001 01JAN2022 07JAN2022 101-002 01FEB2022 28FEB2022 101-003 01JAN2022 14JAN2022 101-004 01JAN2022 03JAN2022 ;

Just a simple DO loop.

data want;
  set have;
  do date=stdt to endt ;
    output;
  end;
  format date date9.;
run;

If your real problem is more complex (multiple observations per ID, overlapping date ranges, etc) then perhaps you just want to use an array to mark which dates are covered.

%let mindt=%sysevalf('01JAN2000'd);
%let maxdt=%sysevalf('31DEC2025'd);

data want ;
   set have;
   by subjid ;
   array _d [&mindt:&maxdt] _temporary_ ;
   if first.subjid then call missing(of _d(*));
   do date=max(stdt,&mindt) to min(endt,&maxdt);
      _d[date]=1;
   end;
   if last.subjid then do date=&mindt to &maxdt ;
      if _d[date] then output;
   end;
   keep subjid date ;
   format date date9.;
run;

Create a Row for Every Date within a Date Range

Re: Create a Row for Every Date within a Date Range

Re: Create a Row for Every Date within a Date Range

Re: Create a Row for Every Date within a Date Range

Re: Create a Row for Every Date within a Date Range

Re: Create a Row for Every Date within a Date Range

Create a Row for Every Date within a Date Range

Re: Create a Row for Every Date within a Date Range

Re: Create a Row for Every Date within a Date Range

Re: Create a Row for Every Date within a Date Range

Re: Create a Row for Every Date within a Date Range

Re: Create a Row for Every Date within a Date Range

SAS Innovate 2025: Call for Content

Click image to register for webinar

Classroom Training Available!