## Combine periods across observations

Solved
Regular Contributor
Posts: 161

# Combine periods across observations

Dear All:

I'm wondering how I can combine a few date range into one single date range if the variable of interest is the same across periods.  Basically, the following is my HAVE table:

doctorperiod_startperiod_endhospital
A19972000X
A20002003X
A20032009X
B19852001Y
B20012009Y
B20092013Z
C19821990K
C19902001M

After combining redundant periods, here is my WANT table:

doctorperiod_startperiod_endhospital
A19972009X
B19852009Y
B20092013Z
C19821990K
C19902001M

How could I best do this, is there a coalesce-like function across observations?  Thank you!

Accepted Solutions
Solution
‎07-25-2013 03:17 AM
SAS Super FREQ
Posts: 825

## Re: Combine periods across observations

Have a look at the following code:

*
* create test data
*;

data test;
infile cards dlm=",";

input
doctor :
\$8.
period_start :
8.
period_end :
8.
hospital :
\$8.
;
cards;
a,1997,2000,x
a,2000,2003,x
a,2003,2009,x
b,1985,2001,y
b,2001,2009,y
b,2009,2013,z
c,1982,1990,k
c,1990,2001,m
;

*
* using SAS Proc
*;

proc summary data=test nway;

class doctor hospital;

output out=want(drop=_ min(period_start)=period_start max(period_end)=period_end;
run;

*
* using SQL
*;

proc sql;

create table want_sql as

select
doctor
, min(period_start)
as period_start
, max(period_end)
as period_end
, hospital

from
test

group by
doctor
, hospital
;
quit;

All Replies
Solution
‎07-25-2013 03:17 AM
SAS Super FREQ
Posts: 825

## Re: Combine periods across observations

Have a look at the following code:

*
* create test data
*;

data test;
infile cards dlm=",";

input
doctor :
\$8.
period_start :
8.
period_end :
8.
hospital :
\$8.
;
cards;
a,1997,2000,x
a,2000,2003,x
a,2003,2009,x
b,1985,2001,y
b,2001,2009,y
b,2009,2013,z
c,1982,1990,k
c,1990,2001,m
;

*
* using SAS Proc
*;

proc summary data=test nway;

class doctor hospital;

output out=want(drop=_ min(period_start)=period_start max(period_end)=period_end;
run;

*
* using SQL
*;

proc sql;

create table want_sql as

select
doctor
, min(period_start)
as period_start
, max(period_end)
as period_end
, hospital

from
test

group by
doctor
, hospital
;
quit;
Super Contributor
Posts: 312

## Re: Combine periods across observations

You should get a successful result with the following.  The retain statement is very powerful in that it causes the value of an observation to be held across multiple observations and when combined with a by statement we can control what values we want retained and when they can be released.

DATA HAVE;

INFILE DATALINES DLM=",";

RETAIN DOCTOR HOSPITAL;

INPUT DOCTOR \$ PERIOD_START \$ PERIOD_END \$ HOSPITAL \$;

DATALINES;

A,1997,2000,X

A,2000,2003,X

A,2003,2009,X

B,1985,2001,Y

B,2001,2009,Y

B,2009,2013,Z

C,1982,1990,K

C,1990,2001,M

;

RUN;

PROC SORT DATA=HAVE;

BY DOCTOR HOSPITAL;

RUN;

DATA WANT;

SET HAVE;

BY DOCTOR HOSPITAL;

RETAIN _PERIOD_START _PERIOD_END;

IF FIRST.HOSPITAL THEN DO;

_PERIOD_START = PERIOD_START;

_PERIOD_END = PERIOD_END;

END;

IF LAST.HOSPITAL;

IF _PERIOD_START < PERIOD_START THEN PERIOD_START = _PERIOD_START;

IF _PERIOD_END > PERIOD_END THEN PERIOD_END = _PERIOD_END;

DROP _PERIOD_END _PERIOD_START;

RUN;

Regular Contributor
Posts: 161

## Re: Combine periods across observations

Thank you so much for your illustration!  Very educational

Frequent Contributor
Posts: 88

## Re: Combine periods across observations

Good answers already but try this if your dates are not continuous for a doctor at a hospital (I've tweaked the second start date) ...

data a;

input doctor \$

period_start

period_end

hospital \$;

cards;

A 1997 2000 X

A 2001 2003 X

A 2003 2009 X

B 1985 2001 Y

B 2001 2009 Y

B 2009 2013 Z

C 1982 1990 K

C 1990 2001 M

;

run;

proc sort data=a out=b;

by doctor hospital period_start;

run;

data wanted (keep=doctor hospital real_start real_end);

set b;

by doctor hospital;

retain start last_end;

if first.hospital then

do;

start = period_start;

last_end = period_end;

end;

if period_start > last_end then

do;

real_start = start;

real_end = last_end;

output;

start = period_start;

last_end = period_end;

end;

if last.hospital  then

do;

real_start = start;

real_end = period_end;

output;

end;

else

last_end = period_end;

run;

Regular Contributor
Posts: 161