BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
djbateman
Lapis Lazuli | Level 10

I have created a process that makes a row for every date in a date range.  It is very tedious and slow when working with large data.  Can someone help me simply and speed up this process?

 

As a very small scale example, suppose I have this dataset:

 

proc sql;
	create table have (SUBJID char(7), STDT num format=date9., ENDT num format=date9.);
	insert into have (subjid, stdt, endt)
		values ('101-001','01JAN2022'd,'07JAN2022'd)
		values ('101-002','01FEB2022'd,'28FEB2022'd)
		values ('101-003','01JAN2022'd,'14JAN2022'd)
		values ('101-004','01JAN2022'd,'03JAN2022'd);
quit;

I want to get a record for every date within the start and end dates.  So, if the subject spans 3 days, then I want to turn it into 3 records where each record contains a date within that range.  This is what I am doing to get what I want:

 

data have1;
	set have;
	REPS=endt-stdt+1;
run;



%macro replicate;
	proc sql noprint;
        select count(*) into :nrec trimmed from have1;
		select subjid, reps into :subj1-:subj&nrec., :rep1-:rep&nrec. from have1;
	quit;

	data want0;
		set %do i=1 %to &nrec.;
				%do j=1 %to &&rep&i.;
					have1 (where=(subjid="&&subj&i."))
				%end;
			%end;;
	run;
%mend replicate;

%replicate;



data want;
	set want0;
	by subjid;
	retain DATE;
	if first.subjid then date=stdt;
		else date=date+1;
	format date date9.;
run;
proc sort data=want;
	by subjid date;
run;

 

This process works, but it isn't very effieicent for large date ranges.  I have some ranges that are up to 7000 days, so I am looping A LOT.  Is there a quicker/cleaner way to do this same method that won't take so much time?

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

If you have one observation per SUBJID (not clear from your post), why add all the complications?  Simply:

data want;
   set have;
   do date = stdt to endt;
      output;
   end;
   drop stdt enddt;
run;

If this doesn't do the trick exactly, it should at least give you a starting point to consider.

View solution in original post

4 REPLIES 4
Astounding
PROC Star

If you have one observation per SUBJID (not clear from your post), why add all the complications?  Simply:

data want;
   set have;
   do date = stdt to endt;
      output;
   end;
   drop stdt enddt;
run;

If this doesn't do the trick exactly, it should at least give you a starting point to consider.

djbateman
Lapis Lazuli | Level 10

I actually have more than just SUBJID as BY variables in my real process because I'm working with ConMed data.  There is also a sequence ID, so a subject can have more than one record (and the date ranges could overlap).  I just figured I could take whatever solution someone else has and add in the sequence variable as well.  I'll play around with your solution (much simpler than what I have) and see if I can use it with my sequence variable added.

 

Tom
Super User Tom
Super User

@djbateman wrote:

I actually have more than just SUBJID as BY variables in my real process because I'm working with ConMed data.  There is also a sequence ID, so a subject can have more than one record (and the date ranges could overlap).  I just figured I could take whatever solution someone else has and add in the sequence variable as well.  I'll play around with your solution (much simpler than what I have) and see if I can use it with my sequence variable added.

 


Sounds like the goal is NOT to generate one observation per date.

Instead it looks like you want to collapse potentially overlapping periods into a new set of non-overlapping periods.

That is a question that has been asked and answered before. Search around.

Tom
Super User Tom
Super User

It is trivial given your example data.

Spoiler
data have;
  input subjid :$7. stdt :date. endt :date.;
  format stdt endt date9.;
cards;
101-001 01JAN2022 07JAN2022
101-002 01FEB2022 28FEB2022
101-003 01JAN2022 14JAN2022
101-004 01JAN2022 03JAN2022
;

Just a simple DO loop.

data want;
  set have;
  do date=stdt to endt ;
    output;
  end;
  format date date9.;
run;

If your real problem is more complex (multiple observations per ID, overlapping date ranges, etc) then perhaps you just want to use an array to mark which dates are covered.

%let mindt=%sysevalf('01JAN2000'd);
%let maxdt=%sysevalf('31DEC2025'd);

data want ;
   set have;
   by subjid ;
   array _d [&mindt:&maxdt] _temporary_ ;
   if first.subjid then call missing(of _d(*));
   do date=max(stdt,&mindt) to min(endt,&maxdt);
      _d[date]=1;
   end;
   if last.subjid then do date=&mindt to &maxdt ;
      if _d[date] then output;
   end;
   keep subjid date ;
   format date date9.;
run;

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 5679 views
  • 0 likes
  • 3 in conversation