I currently have a set of 16 start dates and 16 end dates, per observation. For the sake of this example let's pretend that it is 3 repeating variables, a start time when a pay period began and an end time, when a pay period ended:
What I am interested in doing is collapsing end dates and the subsequent start date when they are less than 30 days apart:
start_1: Jan 1 end_1: Jan 15 start_2: Jan 31 end_2: Feb 16 start_3: May 1 end_3: May 12
This would become:
new_start_1: Jan 1 new_end_1: Feb 16 new_start_2: May 1 new_end_2: May 12
But, simultaneously, there could be another observation that looks like this, where all of the distances are > or equal to 30:
start_1: Jan 1 end_1: Jan 15 start_2: Feb 31 end_2: Mar 16 start_3: May 1 end_3: May 12
This would essentially stay the same, but to maintain uniformity would need to be transferred to the new variable set:
new_start_1: Jan 1 new_end_1: Jan 15 new_start_2: Feb 31 new_end_2: Mar 16 new_start_3: May 1 3 new_end_3: May 12
Some additional points:
- This data was a created flatfile, so the 16th set of start and end variables is for the person that had the most pay periods. Some individuals have only start_1 and end_1 filled and the remaining are missing.
- All of these dates ascend chronologically.
What do you recommend as my most efficient way to do this?
You should start by transposing into a long format, so you have individual records with only one start and end. Then it's only a sequential scan and using the lag() function to compare with the previous record. Keeping data in a long format removes all requirements for knowing the maximum number of periods, and it frees all the space used by missing values.
I even guess that your data was in a long format somewhere along the way.
This is the code that I would use:
data want;
set have (rename=(start=_start end=_end)); /* rename start and end to get them out of the way */
by id;
retain start; /* keep new start variable across observations */
format start end date9.;
oldend = lag(_end); /* do this before everything else, so lag() is never called conditionally */
if first.id
then start = _start; /* initialize */
else do;
if _start - oldend ge 30
then do;
end = oldend; /* take end from previous record */
output;
start = _start; /* initialize again */
end;
end;
if last.id
then do; /* cleanup at end of id group */
end = _end;
output;
end;
drop _start _end oldend; /* get rid of old and intermediate variables */
run;
Edit: changed the comparison to ge instead of just >
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.