BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
AllSoEasy
Obsidian | Level 7

I'm wondering how, if I have a dataset of data that is in two contiguous sets of dates, i.e.:

date

1/2004

2/2004

3/2004

4/2004

5/2004

6/2004

7/2004

8/2004

9/2004

10/2004

11/2004

12/2004

1/2008

2/2008

3/2008

4/2008

5/2008

6/2008

7/2008

8/2008

9/2008

10/2008

11/2008

12/2008

And then would want separate datasets, one starting at 1/2004 up to the first discontiguos date (i.e. in the above simple example, until 1/2008) and then a dataset from 1/2008 onward. I'm having some trouble coding an algorithm for this as it must be very general, the starting date for each set of contiguous dates could be anything, there could be any amount of data from 100 months of data to 12, and each set could be any number of months apart, I just want to find the distinction as soon as their are two consecutive dates that are more than a month apart.

I deeply appreciate any suggestions.

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

You need to initialize the variables for each "group" then, ie if first.group then set the lag date to missing and the count back to 1.

Everything else should stay the same:

data want;

set have;

by group date;

retain continuous_group 1;

prev_date=lag(date);

if first.group then do;

    prev_date=.;

    continuous_group=1;

end;

format prev_date date9.;

diff=intck('month', prev_date, date);

if intck('month',  prev_date, date)>1 then continuous_group+1;

run;

View solution in original post

6 REPLIES 6
Reeza
Super User

Use a lag function to check the current date against previous date and create a grouping variable.

Are you separating into SAS datasets or a text file?

Haikuo
Onyx | Level 15

Like Reeza mentioned, retain or lag is the key to your problem. Here is a possible approach using lag() + Hash, giving you a dynamic output;

data have;

input date ANYDTDTE7.;

format date date9.;

cards;

1/2004

2/2004

3/2004

4/2004

5/2004

6/2004

7/2004

8/2004

9/2004

10/2004

11/2004

12/2004

1/2008

2/2008

3/2008

4/2008

5/2008

6/2008

7/2008

8/2008

9/2008

10/2008

11/2008

12/2008

10/2009

11/2009

12/2009

;

data _null_;

  if _n_=1 then do;

  declare hash h(ordered:'a');

  h.definekey('date');

  h.definedata('date');

  h.definedone();

end;

set have end=last;

   if intck('month',lag(date),date) >1 then do;

     n+1;

  rc=h.output(dataset:cats('split',n));

  rc=h.clear();

  end;

  if last then do; rc=h.replace();rc=h.output(dataset:cats('split',n+1)); end;

rc=h.replace();

run;

Please note, if you don't want the group of single obs, you can easily tweak the code by adding a simple condition.

Haikuo

AllSoEasy
Obsidian | Level 7

Reeza & Hai.kuo - thank you both very much for your suggestions. However there is one complication that I forgot to mention and was not clear at all in the very simplistic example I gave -- my data has a grouping variable, i.e. is essentially longitudinal data -- so the sets of dates are repeated accross multiple groups, i.e.

Group        Date

---------------------------

group1      1/2004

group1      2/2004

...

group1      12/2008

group2      1/2004

group2      2/2004

...

group2      12/2008

...etc

And so that's why I believe Hai.kuo's code didn't full work as I got many different datasets with various date ranges.

- Also, Reeza, I am trying to work with lag, I understand how to determine when I first encounter the dis-contiguous month with lag(), but I don't know how to set the flag consistently across all dates after just the first dis-contiguous month, across all grouping scenarios.Also I am writing to temporary SAS datasets as these datasets will immediately be used for further analysis in the SAS script, and are no longer needed when the SAS script finishes executing.

Reeza
Super User

You need to initialize the variables for each "group" then, ie if first.group then set the lag date to missing and the count back to 1.

Everything else should stay the same:

data want;

set have;

by group date;

retain continuous_group 1;

prev_date=lag(date);

if first.group then do;

    prev_date=.;

    continuous_group=1;

end;

format prev_date date9.;

diff=intck('month', prev_date, date);

if intck('month',  prev_date, date)>1 then continuous_group+1;

run;

Tom
Super User Tom
Super User

Assuming you have actual dates instead of just strings that look like mm/yyyy.

Assign a group variable than changes when there is a gap in dates. Restart when a new id group is started.

data want ;

  set have ;

  by id date ;

  lag_date = lag(date);

   if first.id then group=1 ;

  else if intnx('month',lag_date,1) ^= intnx('month',date,0) then group+1;

run;

You probably do NOT need to make three datasets (how do you know it will be three and not 100?).  Just use the new GROUP variable in you later code.

proc means data=want ;

  where group=1;

run;

AllSoEasy
Obsidian | Level 7

Thank you both! both answers work perfectly. I understand now, thank you so much!!

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1056 views
  • 6 likes
  • 4 in conversation