BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SarahDew
Obsidian | Level 7

I have a list of the countries where people lived. If they changed adres within a country this appears on several lines, however, I would like to make it one line. Consider the following example:

data sample;
input ID $ Country $ Startdate :date9. Enddate :date9. Time;
format Startdate Enddate date9.;
datalines;
A France 05NOV2006 03OCT2012 6.1
A France 04OCT2012 05SEP2015 3.0
A France 06SEP2015 01JUN2016 0.8
A US 02JUN2016 18SEP2019 3.4
B France 17DEC2006 09MAY2007 0.4
B France 10MAY2007 01FEB2014 6.9
B Germany 02FEB2014 02FEB2015 1.0
B Germany 03FEB2015 02JUL2017 2.5
B France 03JUL2017 05APR2018 0.8
B US 06APR2018 18SEP2019 1.5
;run;

I want the result to create one line per block of continuous living in one country and the sum of the time spent in that country:

data result;
input ID $ Country $ Startdate :date9. Enddate :date9. Time;
format Startdate Enddate date9.;
datalines;
A France 05NOV2006 01JUN2016 9.9
A US 02JUN2016 18SEP2019 3.4
B France 17DEC2006 01FEB2014 7.3
B Germany 02FEB2014 02JUL2017 3.5
B France 03JUL2017 05APR2018 0.8
B US 06APR2018 18SEP2019 1.5
;run;

This is my basic attempt, but I can't figure out how to only get the startdate from the first line, and the enddate from the last line, and how to make sure I only combine continuous periods in one country, not when there was another country in between. Possibly something with "if first.startdate..." in a data step?

proc sql;
create table combined as
select distinct ID, Country, Startdate, Enddate, sum(Time)
from sample
group by ID, Country;
quit;
1 ACCEPTED SOLUTION

Accepted Solutions
andreas_lds
Jade | Level 19

Problems like yours can be solved with a single data-step using first/last:

data work.want;
   set work.sample;
   by ID Country notsorted;

   retain sumTime firstStartDate;
   drop sumTime firstStartDate;

   if  first.Country then do;
      sumTime = 0;
      firstStartDate = StartDate;
   end;

   sumTime = sumTime + Time;

   if last.Country then do;
      StartDate = firstStartDate;
      Time = sumTime;
      output;
   end;
run;

 EDIT: Just read that non-continuous periods have to be treated in a special way, but you don't have them in the data you posted, so they aren't incorporated in my "solution", too.

View solution in original post

3 REPLIES 3
SarahDew
Obsidian | Level 7

Just figured I can use min and max to get the right startdate, just need to figure out how to distinguish non-continuous periods:

proc sql;
create table combined as
select distinct ID, Country, min(Startdate) as Startdate, max(Enddate) as Enddate, sum(Time) as time
from sample
group by ID, Country;
quit;
andreas_lds
Jade | Level 19

Problems like yours can be solved with a single data-step using first/last:

data work.want;
   set work.sample;
   by ID Country notsorted;

   retain sumTime firstStartDate;
   drop sumTime firstStartDate;

   if  first.Country then do;
      sumTime = 0;
      firstStartDate = StartDate;
   end;

   sumTime = sumTime + Time;

   if last.Country then do;
      StartDate = firstStartDate;
      Time = sumTime;
      output;
   end;
run;

 EDIT: Just read that non-continuous periods have to be treated in a special way, but you don't have them in the data you posted, so they aren't incorporated in my "solution", too.

SarahDew
Obsidian | Level 7
By non-continuous I meant like person B who stays in France twice but with a move in between. Your solution accounts for this so works fine for my purpose. I can imagine there might be a situation where there is some missing data between the same country, but this does not occur in my current dataset. When I run into this I might post again 😉

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 655 views
  • 1 like
  • 2 in conversation