Hi all,
been thinking about this the whole week. your thoughts or help is much appreciated.
If i have a table with the following records,
ID Start Date End Date
1 1-July-2011 5-july-2011
1 6-july-2011 10-july-2011
1 11-july-2011 16-july-2011
2 1-july-2011 15-july-2011
2 17-july-2011 20-july-2011
How best should i code SAS to return only 3 rows with min and max date for the 2 ids above under the condition that only where dates are connected. I am expecting the following results.
1 1-july-2011 16-july-2011
2 1-july-2011 15-july-2011
2 17-july-2011 20-july-2011
Please help!,
Stuck!
It's probably too late at night for me to be looking at your problem, and I've probably made it far too complex, and I may not have included all possible conditions, but the following appears to do what you want.
I'd REALLY suggest testing it with your real data and looking over the results quite carefully:
data have;
informat Start_Date End_Date anydtdte21.;
format Start_Date End_Date date9.;
input id Start_Date End_Date;
cards;
1 1-Jul-2011 5-jul-2011
1 6-jul-2011 10-jul-2011
1 11-jul-2011 16-jul-2011
2 1-jul-2011 15-jul-2011
2 17-jul-2011 20-jul-2011
;
data want (keep=id Start_Date End_Date);
set have (rename=(Start_Date=S_Date
End_Date=E_Date));
format Start_Date End_Date date9.;
by id;
retain Start_Date End_Date;
last_E_Date=lag(E_Date);
if first.id then do;
Start_Date=S_Date;
End_Date=E_Date;
end;
else if last_E_Date+1 eq S_Date then do;
End_Date=E_Date;
if last.id then output;
end;
else if last_E_Date+1 ne S_date then do;
output;
Start_Date=S_Date;
End_Date=E_Date;
if last.id then output;
end;
run;
It's probably too late at night for me to be looking at your problem, and I've probably made it far too complex, and I may not have included all possible conditions, but the following appears to do what you want.
I'd REALLY suggest testing it with your real data and looking over the results quite carefully:
data have;
informat Start_Date End_Date anydtdte21.;
format Start_Date End_Date date9.;
input id Start_Date End_Date;
cards;
1 1-Jul-2011 5-jul-2011
1 6-jul-2011 10-jul-2011
1 11-jul-2011 16-jul-2011
2 1-jul-2011 15-jul-2011
2 17-jul-2011 20-jul-2011
;
data want (keep=id Start_Date End_Date);
set have (rename=(Start_Date=S_Date
End_Date=E_Date));
format Start_Date End_Date date9.;
by id;
retain Start_Date End_Date;
last_E_Date=lag(E_Date);
if first.id then do;
Start_Date=S_Date;
End_Date=E_Date;
end;
else if last_E_Date+1 eq S_Date then do;
End_Date=E_Date;
if last.id then output;
end;
else if last_E_Date+1 ne S_date then do;
output;
Start_Date=S_Date;
End_Date=E_Date;
if last.id then output;
end;
run;
Thanks Art!,
that would work if i have only 1 broken date series (i.e. dates 1-5 and 7-10). Would that still work if i multiple broken date series? (i.e. 1-5, 7-10, 11-16, 18-20, 21-25).
I suspect I may need to throw in a do-loop counter in there... but where?
Art may already be asleep, anyone else has any ideas?
thanks heaps!,
Almost there...
It worked for all of the cases in your new example, but you could have tested that yourself.
Something you might want to watch for is missing or overlapping date ranges. I often use similarly structured data to your example and this is something I have to keep an eye out for.
Also Art's code would required your input data be correctly sorted.
I usually like to take a data intensive approach. If you expand the data to one obs per day you can easily handle overlaps, by discarding duplicate days. You can also easily adjust for a GAP size other than 1 day.
data have;
input ID (Start End) (:date.);
format S: E: date11.;
cards;
1 1-jul-2011 5-jul-2011
1 6-jul-2011 10-jul-2011
1 11-jul-2011 16-jul-2011
2 1-jul-2011 15-jul-2011
2 17-jul-2011 20-jul-2011
;;;;
run;
proc print;
run;
data expandV(keep=id date) / view=expandV;
set have;
do date = start to end;
output;
end;
format date date11.;
run;
proc sort data=expandV out=expand nodupkey;
by id date;
run;
data expandV(keep=id date run) / view=expandV;
set expand;
by id date;
dif = dif(date);
if first.id then do;
dif=1;
run=1;
end;
if dif gt 1 then run + 1;
run;
proc summary nway data=expandV;
by id run;
output out=flat(drop=_type_) min(date)=Start max(date)=End;
run;
proc print;
run;
data test;
input id startdt enddt;
informat startdt enddt date9.;
format startdt enddt date9.;
datalines;
1 01Jul2011 05jul2011
1 06jul2011 10jul2011
1 11jul2011 16jul2011
2 01jul2011 15jul2011
2 017jul2011 20jul2011
;
run;
data test1;
set test;
by id;
flag = (startdt - lag(enddt)=1);
run;
proc sql;
create table test3 as
select id, min(startdt) as startdt format date9., max(enddt) as enddt format date9.
from test1
group by id
having count(id)-sum(flag)=1
union
select id, startdt, enddt from test1
where id not in (select distinct id from test1 group by id having count(id)-sum(flag)=1);
quit;
How about:
data have; input ID (Start End) (:date12.); format S: E: date9.; cards; 1 1-jul-2011 5-jul-2011 1 6-jul-2011 10-jul-2011 1 11-jul-2011 16-jul-2011 2 1-jul-2011 15-jul-2011 2 17-jul-2011 20-jul-2011 ;;;; run; data want; set have; if (id eq lag(id) and start-lag(end) gt 1 ) or (id ne lag(id)) then flag+1; run; data want(keep=id _start end); set want; by flag; retain _start; if first.flag then _start=start; if last.flag then output; format _start date9.; run;
Ksharp
Sorry Find a problem in my origin code.
消息编辑者为:xia keshan
消息编辑者为:xia keshan still a problem
Thank you all,
appreciate your responses. Will try these codes when I return back to my desk!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.