DATA Step, Macro, Functions and more

splitting dataset into multiple datasets of only contiguous data

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 105
Accepted Solution

splitting dataset into multiple datasets of only contiguous data

I'm wondering how, if I have a dataset of data that is in two contiguous sets of dates, i.e.:

date

1/2004

2/2004

3/2004

4/2004

5/2004

6/2004

7/2004

8/2004

9/2004

10/2004

11/2004

12/2004

1/2008

2/2008

3/2008

4/2008

5/2008

6/2008

7/2008

8/2008

9/2008

10/2008

11/2008

12/2008

And then would want separate datasets, one starting at 1/2004 up to the first discontiguos date (i.e. in the above simple example, until 1/2008) and then a dataset from 1/2008 onward. I'm having some trouble coding an algorithm for this as it must be very general, the starting date for each set of contiguous dates could be anything, there could be any amount of data from 100 months of data to 12, and each set could be any number of months apart, I just want to find the distinction as soon as their are two consecutive dates that are more than a month apart.

I deeply appreciate any suggestions.


Accepted Solutions
Solution
‎03-13-2014 12:01 PM
Super User
Posts: 17,819

Re: splitting dataset into multiple datasets of only contiguous data

You need to initialize the variables for each "group" then, ie if first.group then set the lag date to missing and the count back to 1.

Everything else should stay the same:

data want;

set have;

by group date;

retain continuous_group 1;

prev_date=lag(date);

if first.group then do;

    prev_date=.;

    continuous_group=1;

end;

format prev_date date9.;

diff=intck('month', prev_date, date);

if intck('month',  prev_date, date)>1 then continuous_group+1;

run;

View solution in original post


All Replies
Super User
Posts: 17,819

Re: splitting dataset into multiple datasets of only contiguous data

Use a lag function to check the current date against previous date and create a grouping variable.

Are you separating into SAS datasets or a text file?

Respected Advisor
Posts: 3,124

Re: splitting dataset into multiple datasets of only contiguous data

Like Reeza mentioned, retain or lag is the key to your problem. Here is a possible approach using lag() + Hash, giving you a dynamic output;

data have;

input date ANYDTDTE7.;

format date date9.;

cards;

1/2004

2/2004

3/2004

4/2004

5/2004

6/2004

7/2004

8/2004

9/2004

10/2004

11/2004

12/2004

1/2008

2/2008

3/2008

4/2008

5/2008

6/2008

7/2008

8/2008

9/2008

10/2008

11/2008

12/2008

10/2009

11/2009

12/2009

;

data _null_;

  if _n_=1 then do;

  declare hash h(ordered:'a');

  h.definekey('date');

  h.definedata('date');

  h.definedone();

end;

set have end=last;

   if intck('month',lag(date),date) >1 then do;

     n+1;

  rc=h.output(dataset:cats('split',n));

  rc=h.clear();

  end;

  if last then do; rc=h.replace();rc=h.output(dataset:cats('split',n+1)); end;

rc=h.replace();

run;

Please note, if you don't want the group of single obs, you can easily tweak the code by adding a simple condition.

Haikuo

Frequent Contributor
Posts: 105

Re: splitting dataset into multiple datasets of only contiguous data

Reeza & Hai.kuo - thank you both very much for your suggestions. However there is one complication that I forgot to mention and was not clear at all in the very simplistic example I gave -- my data has a grouping variable, i.e. is essentially longitudinal data -- so the sets of dates are repeated accross multiple groups, i.e.

Group        Date

---------------------------

group1      1/2004

group1      2/2004

...

group1      12/2008

group2      1/2004

group2      2/2004

...

group2      12/2008

...etc

And so that's why I believe Hai.kuo's code didn't full work as I got many different datasets with various date ranges.

- Also, Reeza, I am trying to work with lag, I understand how to determine when I first encounter the dis-contiguous month with lag(), but I don't know how to set the flag consistently across all dates after just the first dis-contiguous month, across all grouping scenarios.Also I am writing to temporary SAS datasets as these datasets will immediately be used for further analysis in the SAS script, and are no longer needed when the SAS script finishes executing.

Solution
‎03-13-2014 12:01 PM
Super User
Posts: 17,819

Re: splitting dataset into multiple datasets of only contiguous data

You need to initialize the variables for each "group" then, ie if first.group then set the lag date to missing and the count back to 1.

Everything else should stay the same:

data want;

set have;

by group date;

retain continuous_group 1;

prev_date=lag(date);

if first.group then do;

    prev_date=.;

    continuous_group=1;

end;

format prev_date date9.;

diff=intck('month', prev_date, date);

if intck('month',  prev_date, date)>1 then continuous_group+1;

run;

Super User
Super User
Posts: 6,499

Re: splitting dataset into multiple datasets of only contiguous data

Assuming you have actual dates instead of just strings that look like mm/yyyy.

Assign a group variable than changes when there is a gap in dates. Restart when a new id group is started.

data want ;

  set have ;

  by id date ;

  lag_date = lag(date);

   if first.id then group=1 ;

  else if intnx('month',lag_date,1) ^= intnx('month',date,0) then group+1;

run;

You probably do NOT need to make three datasets (how do you know it will be three and not 100?).  Just use the new GROUP variable in you later code.

proc means data=want ;

  where group=1;

run;

Frequent Contributor
Posts: 105

Re: splitting dataset into multiple datasets of only contiguous data

Thank you both! both answers work perfectly. I understand now, thank you so much!!

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 385 views
  • 6 likes
  • 4 in conversation