BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
cbagdon-cox
Obsidian | Level 7

Trying to merge rows in which an individual was at the same location but has multiple date ranges resulting in multiple rows. Want to have one row per location with the first start date and last end date. Having a blank end date means they are still at that location and that would need to be kept.

 

I don't even know where to start with this merge. Open to DATA and PROC SQL steps. This data has approximately 500 rows currently for approximately 60 individuals. The reason the data populates like this is because there are separate bed assignments for each row, that level of data is not needed for this output, just location and the dates.

 

What my data currently looks like:

FULLNAMESTARTENDLOCATION
SMITH, JOHN14Sep201015Sep2010CEDAR
SMITH, JOHN15Sep201028Sep2010CEDAR
SMITH, JOHN28Sep201027Oct2010CEDAR
SMITH, JOHN28Oct201020Apr2011OAK
SMITH, JOHN20Apr201102May2011OAK
SMITH, JOHN02May201117May2011OAK
SMITH, JOHN17May201118Jul2011PINE-A
SMITH, JOHN18Jul201103Oct2011PINE-A
SMITH, JOHN03Oct201113Dec2011PINE-A
SMITH, JOHN13Dec201131Jan2012OAK
SMITH, JOHN31Jan201208Feb2012OAK
SMITH, JOHN08Feb201208Aug2012OAK
SMITH, JOHN08Aug201217Dec2012OAK
MILLER, SALLY21Aug201526Aug2015PINE-B
MILLER, SALLY26Aug201522Sep2015CEDAR
MILLER, SALLY22Sep201514Oct2015CEDAR
MILLER, SALLY14Oct201522Nov2015CEDAR
MILLER, SALLY22Nov201504Dec2015CEDAR
MILLER, SALLY04Dec201510Dec2015CEDAR
MILLER, SALLY10Dec201510Mar2016OAK
MILLER, SALLY10Mar201629Dec2017OAK
MILLER, SALLY29Dec201710May2018OAK
MILLER, SALLY10May201801Nov2018OAK
MILLER, SALLY01Nov2018 OAK
JOHNSON, ROBERT16Apr201717Apr2017MAPLE-A
JOHNSON, ROBERT17Apr201708May2017MAPLE-A
JOHNSON, ROBERT08May201722May2017MAPLE-B
JOHNSON, ROBERT22May201731May2017WILLOW
JOHNSON, ROBERT31May201708Jun2017WILLOW
JOHNSON, ROBERT08Jun201702Aug2017WILLOW
JOHNSON, ROBERT02Aug201707Aug2017MAPLE-C
JOHNSON, ROBERT07Aug201711Sep2017WILLOW
JOHNSON, ROBERT11Sep201703Apr2018WILLOW

 

What I want my data to look like:

FULLNAMESTARTENDLOCATION
SMITH, JOHN14Sep201027Oct2010CEDAR
SMITH, JOHN28Oct201017May2011OAK
SMITH, JOHN17May201113Dec2011PINE-A
SMITH, JOHN13Dec201117Dec2012OAK
MILLER, SALLY21Aug201526Aug2015PINE-B
MILLER, SALLY26Aug201510Dec2015CEDAR
MILLER, SALLY10Dec2015 OAK
JOHNSON, ROBERT16Apr201708May2017MAPLE-A
JOHNSON, ROBERT08May201722May2017MAPLE-B
JOHNSON, ROBERT22May201702Aug2017WILLOW
JOHNSON, ROBERT02Aug201707Aug2017MAPLE-C
1 ACCEPTED SOLUTION

Accepted Solutions
cbagdon-cox
Obsidian | Level 7

Here was the code that ended up working for this. Essentially had to make a new ID for each stay and then sort by min start date and max end date based on those IDs. Also a step to account for the missing value. Thanks for the assistance!

 

data want1;
set have;
retain ConsecID 0;
ConsecID= ifn(lag(fullname)^=fullname or lag(location)^=location, ConsecID+1, ConsecID);
run;

PROC SQL;
CREATE TABLE want2 AS
SELECT DISTINCT 
  FULLNAME
, min(start) as STARTDATE format=date9.
, case when (max(coalesce(end, '01JAN2099'd)))='01JAN2099'd 
	then . 
	else (max(end)) 
	end as NEWEND format=date9.
, location
FROM WORK.want1 AS A
	group by consecID
	order by fullname
;quit;

 

View solution in original post

8 REPLIES 8
novinosrin
Tourmaline | Level 20
proc sql;
create table want as
select fullname. min(start) as start format=date9.,max(end) as end format=date9.,
location
from your_table
group by location,fullname;
quit;
cbagdon-cox
Obsidian | Level 7

Almost worked but not quite. Two issues:

1. Doesn't account for blank end dates.

2. It merges multiple stays at a location into one with the first stays start date and the last stays ends date. I need to know the start and end date for each stay at a location.

For example, John Smith stayed at Oak on two occasions, I need the start and end date for each of those stays as opposed to the start date of the first stay and the end date of the last stay.

ballardw
Super User

PLEASE provide data in the form of a data step. I have made a short bit from your post but I really don't like having to spend a lot time manually fixing posted data.

 

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.

 

See if this matches your expectation:

data have;
infile datalines dlm='|';
informat fullname $25. start end date9. location $10.;
format start end date9.;
input FULLNAME	START	END	LOCATION;
datalines;
SMITH, JOHN|14Sep2010|15Sep2010|CEDAR
SMITH, JOHN|15Sep2010|28Sep2010|CEDAR
SMITH, JOHN|28Sep2010|27Oct2010|CEDAR
SMITH, JOHN|28Oct2010|20Apr2011|OAK
SMITH, JOHN|20Apr2011|02May2011|OAK
SMITH, JOHN|02May2011|17May2011|OAK
SMITH, JOHN|17May2011|18Jul2011|PINE-A
SMITH, JOHN|18Jul2011|03Oct2011|PINE-A
SMITH, JOHN|03Oct2011|13Dec2011|PINE-A
SMITH, JOHN|13Dec2011|31Jan2012|OAK
SMITH, JOHN|31Jan2012|08Feb2012|OAK
SMITH, JOHN|08Feb2012|08Aug2012|OAK
SMITH, JOHN|08Aug2012|17Dec2012|OAK
;

proc sort data=have;
  by fullname location start;
run;

data want;
   set have;
   by fullname location;
   lend= lag(end);
   retain locstart;
   if first.location then locstart=start;
   else if start ne lend then do;
   /* break in sequence*/
      output;
      locstart = start;
   end;
   else if last.location then output;
   drop lend locstart;
run;

Because of the requirements you state you may need to sort the output Want data set differently afterwards.

LAG lets you keep a value from the last record.

Retain keeps a specified variable value across records until reset.

The BY statement creates automatic variables First. and Last to indicate if the current record is the first or last of a group (can be both if there is only one) so you can conditionally reset values or do execute other statements such as the output to the data set.

 

cbagdon-cox
Obsidian | Level 7

I apologize for the data display. I am working with protected data and therefore cannot provide the actual data I am working with, the names and locations have been changed.

 

While the code ran, the output still had issues. In some cases stays at locations were omitted completely and in others there were still multiple rows for a single stay at a location. This also does not account for the blank end dates that need to be kept.

ballardw
Super User

@cbagdon-cox wrote:

I apologize for the data display. I am working with protected data and therefore cannot provide the actual data I am working with, the names and locations have been changed.

 

While the code ran, the output still had issues. In some cases stays at locations were omitted completely and in others there were still multiple rows for a single stay at a location. This also does not account for the blank end dates that need to be kept.


 

Code is just as good as example data and clearly stated requirements.

 

You now have an example of how to provide data. Provide some that matches your data better.

 

Since you did not provide any missing ends there was nothing to test.

An appropriate something like this may be needed:

if missing (end) then do;

    output;

    call missing(locstart);

   /* and maybe other stuff in here*/

end;

 

But without an example this a guess just like any other condition that was not provided in example input and output.

 

 

cbagdon-cox
Obsidian | Level 7
Please see original post and example (though incorrectly formatted) data. Missing end dates are mentioned in the question and shown in the example.

I have no code as I didn't know where to start with this task.

First time poster just trying to get some help, not a scolding.
novinosrin
Tourmaline | Level 20

Hi @cbagdon-cox  Please try this-

data have;
input FULLNAME & $30.	(START	END) (:date9.)	LOCATION $;
format START	END date9.;
cards;
SMITH, JOHN	  14Sep2010	15Sep2010	CEDAR
SMITH, JOHN	  15Sep2010	28Sep2010	CEDAR
SMITH, JOHN	  28Sep2010	27Oct2010	CEDAR
SMITH, JOHN	  28Oct2010	20Apr2011	OAK
SMITH, JOHN	  20Apr2011	02May2011	OAK
SMITH, JOHN	  02May2011	17May2011	OAK
SMITH, JOHN	  17May2011	18Jul2011	PINE-A
SMITH, JOHN	  18Jul2011	03Oct2011	PINE-A
SMITH, JOHN	  03Oct2011	13Dec2011	PINE-A
SMITH, JOHN	  13Dec2011	31Jan2012	OAK
SMITH, JOHN	  31Jan2012	08Feb2012	OAK
SMITH, JOHN	  08Feb2012	08Aug2012	OAK
SMITH, JOHN	  08Aug2012	17Dec2012	OAK
MILLER, SALLY	21Aug2015	26Aug2015	PINE-B
MILLER, SALLY	26Aug2015	22Sep2015	CEDAR
MILLER, SALLY	22Sep2015	14Oct2015	CEDAR
MILLER, SALLY	14Oct2015	22Nov2015	CEDAR
MILLER, SALLY	22Nov2015	04Dec2015	CEDAR
MILLER, SALLY	04Dec2015	10Dec2015	CEDAR
MILLER, SALLY	10Dec2015	10Mar2016	OAK
MILLER, SALLY	10Mar2016	29Dec2017	OAK
MILLER, SALLY	29Dec2017	10May2018	OAK
MILLER, SALLY	10May2018	01Nov2018	OAK
MILLER, SALLY	01Nov2018	 .	OAK
JOHNSON, ROBERT	 16Apr2017	17Apr2017	MAPLE-A
JOHNSON, ROBERT	 17Apr2017	08May2017	MAPLE-A
JOHNSON, ROBERT	 08May2017	22May2017	MAPLE-B
JOHNSON, ROBERT	 22May2017	31May2017	WILLOW
JOHNSON, ROBERT	 31May2017	08Jun2017	WILLOW
JOHNSON, ROBERT	 08Jun2017	02Aug2017	WILLOW
JOHNSON, ROBERT	 02Aug2017	07Aug2017	MAPLE-C
JOHNSON, ROBERT	 07Aug2017	11Sep2017	WILLOW
JOHNSON, ROBERT	  11Sep2017	03Apr2018	WILLOW
;

data  want;
 do  until(last.location);
  set have;
  by fullname location notsorted;
  if first.location then _n_=start;
 end;
 start=_n_;
run;
cbagdon-cox
Obsidian | Level 7

Here was the code that ended up working for this. Essentially had to make a new ID for each stay and then sort by min start date and max end date based on those IDs. Also a step to account for the missing value. Thanks for the assistance!

 

data want1;
set have;
retain ConsecID 0;
ConsecID= ifn(lag(fullname)^=fullname or lag(location)^=location, ConsecID+1, ConsecID);
run;

PROC SQL;
CREATE TABLE want2 AS
SELECT DISTINCT 
  FULLNAME
, min(start) as STARTDATE format=date9.
, case when (max(coalesce(end, '01JAN2099'd)))='01JAN2099'd 
	then . 
	else (max(end)) 
	end as NEWEND format=date9.
, location
FROM WORK.want1 AS A
	group by consecID
	order by fullname
;quit;

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 8 replies
  • 1856 views
  • 0 likes
  • 3 in conversation