BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SarahDew
Obsidian | Level 7

I have a list of the countries where people lived. If they changed adres within a country this appears on several lines, however, I would like to make it one line. Consider the following example:

data sample;
input ID $ Country $ Startdate :date9. Enddate :date9. Time;
format Startdate Enddate date9.;
datalines;
A France 05NOV2006 03OCT2012 6.1
A France 04OCT2012 05SEP2015 3.0
A France 06SEP2015 01JUN2016 0.8
A US 02JUN2016 18SEP2019 3.4
B France 17DEC2006 09MAY2007 0.4
B France 10MAY2007 01FEB2014 6.9
B Germany 02FEB2014 02FEB2015 1.0
B Germany 03FEB2015 02JUL2017 2.5
B France 03JUL2017 05APR2018 0.8
B US 06APR2018 18SEP2019 1.5
;run;

I want the result to create one line per block of continuous living in one country and the sum of the time spent in that country:

data result;
input ID $ Country $ Startdate :date9. Enddate :date9. Time;
format Startdate Enddate date9.;
datalines;
A France 05NOV2006 01JUN2016 9.9
A US 02JUN2016 18SEP2019 3.4
B France 17DEC2006 01FEB2014 7.3
B Germany 02FEB2014 02JUL2017 3.5
B France 03JUL2017 05APR2018 0.8
B US 06APR2018 18SEP2019 1.5
;run;

This is my basic attempt, but I can't figure out how to only get the startdate from the first line, and the enddate from the last line, and how to make sure I only combine continuous periods in one country, not when there was another country in between. Possibly something with "if first.startdate..." in a data step?

proc sql;
create table combined as
select distinct ID, Country, Startdate, Enddate, sum(Time)
from sample
group by ID, Country;
quit;
1 ACCEPTED SOLUTION

Accepted Solutions
andreas_lds
Jade | Level 19

Problems like yours can be solved with a single data-step using first/last:

data work.want;
   set work.sample;
   by ID Country notsorted;

   retain sumTime firstStartDate;
   drop sumTime firstStartDate;

   if  first.Country then do;
      sumTime = 0;
      firstStartDate = StartDate;
   end;

   sumTime = sumTime + Time;

   if last.Country then do;
      StartDate = firstStartDate;
      Time = sumTime;
      output;
   end;
run;

 EDIT: Just read that non-continuous periods have to be treated in a special way, but you don't have them in the data you posted, so they aren't incorporated in my "solution", too.

View solution in original post

3 REPLIES 3
SarahDew
Obsidian | Level 7

Just figured I can use min and max to get the right startdate, just need to figure out how to distinguish non-continuous periods:

proc sql;
create table combined as
select distinct ID, Country, min(Startdate) as Startdate, max(Enddate) as Enddate, sum(Time) as time
from sample
group by ID, Country;
quit;
andreas_lds
Jade | Level 19

Problems like yours can be solved with a single data-step using first/last:

data work.want;
   set work.sample;
   by ID Country notsorted;

   retain sumTime firstStartDate;
   drop sumTime firstStartDate;

   if  first.Country then do;
      sumTime = 0;
      firstStartDate = StartDate;
   end;

   sumTime = sumTime + Time;

   if last.Country then do;
      StartDate = firstStartDate;
      Time = sumTime;
      output;
   end;
run;

 EDIT: Just read that non-continuous periods have to be treated in a special way, but you don't have them in the data you posted, so they aren't incorporated in my "solution", too.

SarahDew
Obsidian | Level 7
By non-continuous I meant like person B who stays in France twice but with a move in between. Your solution accounts for this so works fine for my purpose. I can imagine there might be a situation where there is some missing data between the same country, but this does not occur in my current dataset. When I run into this I might post again 😉

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1324 views
  • 1 like
  • 2 in conversation