BookmarkSubscribeRSS Feed
NewUsrStat
Lapis Lazuli | Level 10

suppose to have the following:

 

data have1;
input ID :$20. Start :date9. End :date9. Role :$20.;
format start end yymmdd10.;
cards;
0001 01JAN2015 30APR2015 HospitalA
0001 01MAY2015 31MAY2015
0001 01JUN2015 31DEC2015 HospitalC
0002 06FEB2018 08FEB2018
0002 09FEB2018 31AUG2018 HospitalC
0002 01SEP2018 31DEC2019
0003 01JAN2019 30SEP2019
0003 01OCT2019 31DEC2020
 ;


data have2;
input ID :$20. Start :date9. End :date9. Role :$20.;
format start end yymmdd10.;
cards;
0001 01JAN2015 30APR2015
0001 01MAY2015 31MAY2015
0001 01JUN2015 31DEC2015
0002 06FEB2018 08FEB2018 HospitalA
0002 09FEB2018 31AUG2018
0002 01SEP2018 31DEC2019 HospitalC
0003 01JAN2019 30SEP2019 HospitalD
0003 01OCT2019 31DEC2020
 ;

 

data have3;
input ID :$20. Start :date9. End :date9. Role :$20.;
format start end yymmdd10.;
cards;
0001 01JAN2015 30APR2015
0001 01MAY2015 31MAY2015 HospitalA
0001 01JUN2015 31DEC2015
0002 06FEB2018 08FEB2018
0002 09FEB2018 31AUG2018
0002 01SEP2018 31DEC2019
0003 01JAN2019 30SEP2019
0003 01OCT2019 31DEC2020 HospitalD
;


Is there a way to merge the 3 files only for the column "Role" to have the following?
All the files are equal regarding the other columns.


Thank you in advance

Desired output should be:

data out;
input ID :$20. Start :date9. End :date9. Role :$20.;
format start end yymmdd10.;
cards;
0001 01JAN2015 30APR2015 HospitalA
0001 01MAY2015 31MAY2015 HospitalA
0001 01JUN2015 31DEC2015 HospitalC
0002 06FEB2018 08FEB2018 HospitalA
0002 09FEB2018 31AUG2018 HospitalC
0002 01SEP2018 31DEC2019 HospitalC
0003 01JAN2019 30SEP2019 HospitalD
0003 01OCT2019 31DEC2020 HospitalD
 ;

2 REPLIES 2
ballardw
Super User

First thing, you want to test your data step code before providing it. Yours as shown generates lost card and warnings like this:

NOTE: LOST CARD.
RULE:      ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+--
207        ;
ID=0003 Start=2019-10-01 End=2020-12-31 Role=  _ERROR_=1 _N_=5
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.

because you do not have values for Role on every record the code you show reads the start of the following line as the role for many of the records which is not what you want.

 

Also, it is strongly recommended that you post code into a text box opened on the forum with the </> icon above the message windows because the forum message windows reformat text. When I copied your code to test there were some characters not visible on the forum that generate errors. Text pasted into the text box does not get reformatted and doesn't have the occasional odd html stuff inserted.

 

data have1;
infile datalines  truncover;
input ID :$20. Start :date9. End :date9. Role :$20.;
format start end yymmdd10.;
cards;
0001 01JAN2015 30APR2015 HospitalA
0001 01MAY2015 31MAY2015
0001 01JUN2015 31DEC2015 HospitalC
0002 06FEB2018 08FEB2018
0002 09FEB2018 31AUG2018 HospitalC
0002 01SEP2018 31DEC2019
0003 01JAN2019 30SEP2019
0003 01OCT2019 31DEC2020
;

data have2;
infile datalines  truncover;
input ID :$20. Start :date9. End :date9. Role :$20.;
format start end yymmdd10.;
cards;
0001 01JAN2015 30APR2015
0001 01MAY2015 31MAY2015
0001 01JUN2015 31DEC2015
0002 06FEB2018 08FEB2018 HospitalA
0002 09FEB2018 31AUG2018
0002 01SEP2018 31DEC2019 HospitalC
0003 01JAN2019 30SEP2019 HospitalD
0003 01OCT2019 31DEC2020
;

data have3;
infile datalines  truncover;
input ID :$20. Start :date9. End :date9. Role :$20.;
format start end yymmdd10.;
cards;
0001 01JAN2015 30APR2015
0001 01MAY2015 31MAY2015 HospitalA
0001 01JUN2015 31DEC2015
0002 06FEB2018 08FEB2018
0002 09FEB2018 31AUG2018
0002 01SEP2018 31DEC2019
0003 01JAN2019 30SEP2019
0003 01OCT2019 31DEC2020 HospitalD
;

data want;
   merge 
      have1 (rename=(role=role1))
      have2 (rename=(role=role2))
      have3 (rename=(role=role3))
   ;
   by id start end;
   role = coalescec(role1,role2,role3);
   drop role1 role2 role3;
run;

Either provide . for missing values, or if the missing value is at the end of data lines you can use the infile options Missover or Truncover to not read past the end of the line.

 

The data step Want would require sorting by ID, Start and End for all three sets if not already so.

Renaming the variables to different names using the dataset option means that all three values are available with new names in the data step. The Coalescec, and equivalent Coalesce for number values, assigns the first non-missing value in the parameter list from left to right as the result.

 

Warning: if you have repeats of Id, Start, End in the data sets the result may not be as desired and if there are repeats of the same Id, Start, End in multiple sets it almost certainly will not be the desired result and you may need a different approach.

Tom
Super User Tom
Super User

It looks like all you did was keep the observations with values for ROLE.

data all;
  set have1-have3;
  by id start end;
  if not missing(role);
run;

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 561 views
  • 1 like
  • 3 in conversation