BookmarkSubscribeRSS Feed
djohn051
Fluorite | Level 6

I currently have a set of 16 start dates and 16 end dates, per observation. For the sake of this example let's pretend that it is 3 repeating variables, a start time when a pay period began and an end time, when a pay period ended:

  • start_1 end_1 start_2 end_2 start_3 end_3

What I am interested in doing is collapsing end dates and the subsequent start date when they are less than 30 days apart:

start_1: Jan 1 end_1: Jan 15 start_2: Jan 31 end_2: Feb 16 start_3: May 1 end_3: May 12

 

This would become:

new_start_1: Jan 1 new_end_1: Feb 16 new_start_2: May 1 new_end_2: May 12

 

But, simultaneously, there could be another observation that looks like this, where all of the distances are > or equal to 30:

start_1: Jan 1 end_1: Jan 15 start_2: Feb 31 end_2: Mar 16 start_3: May 1 end_3: May 12

 

This would essentially stay the same, but to maintain uniformity would need to be transferred to the new variable set:

new_start_1: Jan 1  new_end_1: Jan 15  new_start_2: Feb 31  new_end_2: Mar 16 new_start_3: May 1new_end_3: May 12

 

Some additional points:

- This data was a created flatfile, so the 16th set of start and end variables is for the person that had the most pay periods. Some individuals have only start_1 and end_1 filled and the remaining are missing.

- All of these dates ascend chronologically. 

 

What do you recommend as my most efficient way to do this?

4 REPLIES 4
Kurt_Bremser
Super User

You should start by transposing into a long format, so you have individual records with only one start and end. Then it's only a sequential scan and using the lag() function to compare with the previous record. Keeping data in a long format removes all requirements for knowing the maximum number of periods, and it frees all the space used by missing values.

 

I even guess that your data was in a long format somewhere along the way.

djohn051
Fluorite | Level 6
That sounds much more feasible. Could you help me with writing the formula for that? Essentially my data would now be three columns: (id) (start) (end) - what I would want to do, within each id, is check if the (observation n+1 start date) - (end date) is <30 or GE 30.

If it is GE 30 then (start date) (end date) line is kept and the process is rerun for (observation n+1 start date) by subtracting (observation n+2 start date) - (observation n+1 end date).

If it is <30, then (observation start date) and (observation end date n+1) replaces (observation end date). Of course it would need to check each subsequent (start date n+#) - (end date) until it does find one GE 30 and then all data in between would be removed and, for example, your new line might be (observation start date) with the end date that came from (observation end date n+5).

How would you do this with the lag function?

Thank you!
Kurt_Bremser
Super User

This is the code that I would use:

data want;
set have (rename=(start=_start end=_end)); /* rename start and end to get them out of the way */
by id;
retain start; /* keep new start variable across observations */
format start end date9.;
oldend = lag(_end); /* do this before everything else, so lag() is never called conditionally */
if first.id
then start = _start; /* initialize */
else do;
  if _start - oldend ge 30
  then do;
    end = oldend; /* take end from previous record */
    output;
    start = _start; /* initialize again */
  end;
end;
if last.id
then do; /* cleanup at end of id group */
  end = _end;
  output;
end;
drop _start _end oldend; /* get rid of old and intermediate variables */
run;

Edit: changed the comparison to ge instead of just >

djohn051
Fluorite | Level 6
Thank you very much. I will give this a try!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 745 views
  • 0 likes
  • 2 in conversation