I'm wondering how, if I have a dataset of data that is in two contiguous sets of dates, i.e.:
date
1/2004
2/2004
3/2004
4/2004
5/2004
6/2004
7/2004
8/2004
9/2004
10/2004
11/2004
12/2004
1/2008
2/2008
3/2008
4/2008
5/2008
6/2008
7/2008
8/2008
9/2008
10/2008
11/2008
12/2008
And then would want separate datasets, one starting at 1/2004 up to the first discontiguos date (i.e. in the above simple example, until 1/2008) and then a dataset from 1/2008 onward. I'm having some trouble coding an algorithm for this as it must be very general, the starting date for each set of contiguous dates could be anything, there could be any amount of data from 100 months of data to 12, and each set could be any number of months apart, I just want to find the distinction as soon as their are two consecutive dates that are more than a month apart.
I deeply appreciate any suggestions.
You need to initialize the variables for each "group" then, ie if first.group then set the lag date to missing and the count back to 1.
Everything else should stay the same:
data want;
set have;
by group date;
retain continuous_group 1;
prev_date=lag(date);
if first.group then do;
prev_date=.;
continuous_group=1;
end;
format prev_date date9.;
diff=intck('month', prev_date, date);
if intck('month', prev_date, date)>1 then continuous_group+1;
run;
Use a lag function to check the current date against previous date and create a grouping variable.
Are you separating into SAS datasets or a text file?
Like Reeza mentioned, retain or lag is the key to your problem. Here is a possible approach using lag() + Hash, giving you a dynamic output;
data have;
input date ANYDTDTE7.;
format date date9.;
cards;
1/2004
2/2004
3/2004
4/2004
5/2004
6/2004
7/2004
8/2004
9/2004
10/2004
11/2004
12/2004
1/2008
2/2008
3/2008
4/2008
5/2008
6/2008
7/2008
8/2008
9/2008
10/2008
11/2008
12/2008
10/2009
11/2009
12/2009
;
data _null_;
if _n_=1 then do;
declare hash h(ordered:'a');
h.definekey('date');
h.definedata('date');
h.definedone();
end;
set have end=last;
if intck('month',lag(date),date) >1 then do;
n+1;
rc=h.output(dataset:cats('split',n));
rc=h.clear();
end;
if last then do; rc=h.replace();rc=h.output(dataset:cats('split',n+1)); end;
rc=h.replace();
run;
Please note, if you don't want the group of single obs, you can easily tweak the code by adding a simple condition.
Haikuo
Reeza & Hai.kuo - thank you both very much for your suggestions. However there is one complication that I forgot to mention and was not clear at all in the very simplistic example I gave -- my data has a grouping variable, i.e. is essentially longitudinal data -- so the sets of dates are repeated accross multiple groups, i.e.
Group Date
---------------------------
group1 1/2004
group1 2/2004
...
group1 12/2008
group2 1/2004
group2 2/2004
...
group2 12/2008
...etc
And so that's why I believe Hai.kuo's code didn't full work as I got many different datasets with various date ranges.
- Also, Reeza, I am trying to work with lag, I understand how to determine when I first encounter the dis-contiguous month with lag(), but I don't know how to set the flag consistently across all dates after just the first dis-contiguous month, across all grouping scenarios.Also I am writing to temporary SAS datasets as these datasets will immediately be used for further analysis in the SAS script, and are no longer needed when the SAS script finishes executing.
You need to initialize the variables for each "group" then, ie if first.group then set the lag date to missing and the count back to 1.
Everything else should stay the same:
data want;
set have;
by group date;
retain continuous_group 1;
prev_date=lag(date);
if first.group then do;
prev_date=.;
continuous_group=1;
end;
format prev_date date9.;
diff=intck('month', prev_date, date);
if intck('month', prev_date, date)>1 then continuous_group+1;
run;
Assuming you have actual dates instead of just strings that look like mm/yyyy.
Assign a group variable than changes when there is a gap in dates. Restart when a new id group is started.
data want ;
set have ;
by id date ;
lag_date = lag(date);
if first.id then group=1 ;
else if intnx('month',lag_date,1) ^= intnx('month',date,0) then group+1;
run;
You probably do NOT need to make three datasets (how do you know it will be three and not 100?). Just use the new GROUP variable in you later code.
proc means data=want ;
where group=1;
run;
Thank you both! both answers work perfectly. I understand now, thank you so much!!
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.