Obsidian | Level 7

## Select first and last date from continuous series of adresses by group

I have a list of the countries where people lived. If they changed adres within a country this appears on several lines, however, I would like to make it one line. Consider the following example:

``````data sample;
input ID \$ Country \$ Startdate :date9. Enddate :date9. Time;
format Startdate Enddate date9.;
datalines;
A France 05NOV2006 03OCT2012 6.1
A France 04OCT2012 05SEP2015 3.0
A France 06SEP2015 01JUN2016 0.8
A US 02JUN2016 18SEP2019 3.4
B France 17DEC2006 09MAY2007 0.4
B France 10MAY2007 01FEB2014 6.9
B Germany 02FEB2014 02FEB2015 1.0
B Germany 03FEB2015 02JUL2017 2.5
B France 03JUL2017 05APR2018 0.8
B US 06APR2018 18SEP2019 1.5
;run;``````

I want the result to create one line per block of continuous living in one country and the sum of the time spent in that country:

``````data result;
input ID \$ Country \$ Startdate :date9. Enddate :date9. Time;
format Startdate Enddate date9.;
datalines;
A France 05NOV2006 01JUN2016 9.9
A US 02JUN2016 18SEP2019 3.4
B France 17DEC2006 01FEB2014 7.3
B Germany 02FEB2014 02JUL2017 3.5
B France 03JUL2017 05APR2018 0.8
B US 06APR2018 18SEP2019 1.5
;run;``````

This is my basic attempt, but I can't figure out how to only get the startdate from the first line, and the enddate from the last line, and how to make sure I only combine continuous periods in one country, not when there was another country in between. Possibly something with "if first.startdate..." in a data step?

``````proc sql;
create table combined as
select distinct ID, Country, Startdate, Enddate, sum(Time)
from sample
group by ID, Country;
quit;``````
1 ACCEPTED SOLUTION

Accepted Solutions

## Re: Select first and last date from continuous series of adresses by group

Problems like yours can be solved with a single data-step using first/last:

``````data work.want;
set work.sample;
by ID Country notsorted;

retain sumTime firstStartDate;
drop sumTime firstStartDate;

if  first.Country then do;
sumTime = 0;
firstStartDate = StartDate;
end;

sumTime = sumTime + Time;

if last.Country then do;
StartDate = firstStartDate;
Time = sumTime;
output;
end;
run;``````

EDIT: Just read that non-continuous periods have to be treated in a special way, but you don't have them in the data you posted, so they aren't incorporated in my "solution", too.

3 REPLIES 3
Obsidian | Level 7

## Re: Select first and last date from continuous series of adresses by group

Just figured I can use min and max to get the right startdate, just need to figure out how to distinguish non-continuous periods:

``````proc sql;
create table combined as
select distinct ID, Country, min(Startdate) as Startdate, max(Enddate) as Enddate, sum(Time) as time
from sample
group by ID, Country;
quit;``````

## Re: Select first and last date from continuous series of adresses by group

Problems like yours can be solved with a single data-step using first/last:

``````data work.want;
set work.sample;
by ID Country notsorted;

retain sumTime firstStartDate;
drop sumTime firstStartDate;

if  first.Country then do;
sumTime = 0;
firstStartDate = StartDate;
end;

sumTime = sumTime + Time;

if last.Country then do;
StartDate = firstStartDate;
Time = sumTime;
output;
end;
run;``````

EDIT: Just read that non-continuous periods have to be treated in a special way, but you don't have them in the data you posted, so they aren't incorporated in my "solution", too.

Obsidian | Level 7

## Re: Select first and last date from continuous series of adresses by group

By non-continuous I meant like person B who stays in France twice but with a move in between. Your solution accounts for this so works fine for my purpose. I can imagine there might be a situation where there is some missing data between the same country, but this does not occur in my current dataset. When I run into this I might post again 😉
Discussion stats
• 3 replies
• 682 views
• 1 like
• 2 in conversation