I currently have a set of 16 start dates and 16 end dates, per observation. For the sake of this example let's pretend that it is 3 repeating variables, a start time when a pay period began and an end time, when a pay period ended:
What I am interested in doing is collapsing end dates and the subsequent start date when they are less than 30 days apart:
start_1: Jan 1 end_1: Jan 15 start_2: Jan 31 end_2: Feb 16 start_3: May 1 end_3: May 12
This would become:
new_start_1: Jan 1 new_end_1: Feb 16 new_start_2: May 1 new_end_2: May 12
But, simultaneously, there could be another observation that looks like this, where all of the distances are > or equal to 30:
start_1: Jan 1 end_1: Jan 15 start_2: Feb 31 end_2: Mar 16 start_3: May 1 end_3: May 12
This would essentially stay the same, but to maintain uniformity would need to be transferred to the new variable set:
new_start_1: Jan 1 new_end_1: Jan 15 new_start_2: Feb 31 new_end_2: Mar 16 new_start_3: May 1 3 new_end_3: May 12
Some additional points:
- This data was a created flatfile, so the 16th set of start and end variables is for the person that had the most pay periods. Some individuals have only start_1 and end_1 filled and the remaining are missing.
- All of these dates ascend chronologically.
What do you recommend as my most efficient way to do this?
You should start by transposing into a long format, so you have individual records with only one start and end. Then it's only a sequential scan and using the lag() function to compare with the previous record. Keeping data in a long format removes all requirements for knowing the maximum number of periods, and it frees all the space used by missing values.
I even guess that your data was in a long format somewhere along the way.
This is the code that I would use:
data want;
set have (rename=(start=_start end=_end)); /* rename start and end to get them out of the way */
by id;
retain start; /* keep new start variable across observations */
format start end date9.;
oldend = lag(_end); /* do this before everything else, so lag() is never called conditionally */
if first.id
then start = _start; /* initialize */
else do;
if _start - oldend ge 30
then do;
end = oldend; /* take end from previous record */
output;
start = _start; /* initialize again */
end;
end;
if last.id
then do; /* cleanup at end of id group */
end = _end;
output;
end;
drop _start _end oldend; /* get rid of old and intermediate variables */
run;
Edit: changed the comparison to ge instead of just >
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.