BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sandyzman1
Obsidian | Level 7

Hi,

 

I have two datasets with patid and numerous other variables. I want to combine both datasets vertically, but with no repeat patid. For example:

 

data dataset1;                     

   input patID year x y $ ;      

   datalines;        

1 2011 34 Yes

2 2012 32 Yes

3 2015 56 No

4 2018 57 No

5 2019 78 Yes

6 2020 66 No

;              

run;

 

data dataset2;                     

   input patID year x y $ ;      

   datalines;        

1 2011 43 Yes

2 2014 32 No

3 2015 56 No

4 2018 56 Yes

5 2021 90 Yes

6 2020 66 No

;              

run;

 

Dataset 1:

patid

Year

X

Z

1

2011

34

Yes

2

2012

32

Yes

3

2015

56

No

4

2018

57

No

5

2019

78

Yes

6

2020

66

No

 

Dataset 2:

patid

Year

X

Y

1

2011

43

Yes

2

2014

32

No

3

2015

56

No

4

2018

56

Yes

5

2021

90

Yes

6

2020

66

No

 

Want:

patid

Year

X

Y

1

2011

34

Yes

2

2012

32

Yes

2

2014

32

No

3

2015

56

No

4

2018

57

No

5

2019

78

Yes

5

2021

90

Yes

6

2020

66

 

 

Here, I combine dataset 1 and dataset 2 by Id and Year:

- if all variables match then output.

- If all variables didn't match (except Id and Year) then keep dataset 1 observations.

- if Id or Year don't match then create a new row in the new dataset.

 

Thanks,

sandyzman1

 

1 ACCEPTED SOLUTION

Accepted Solutions
Rick_SAS
SAS Super FREQ

I think you want to merge the data, sort the results, and then use the FIRST.patId and FIRST.Year auto variables to control which observations are retained:

 

data A;
length Z $3;
input patid Year X Z;
datalines;
1 2011 34 Yes
2 2012 32 Yes
3 2015 56 No
4 2018 57 No
5 2019 78 Yes
6 2020 66 No
;
 
data B;
length Z $3;
input patid Year X Z;
datalines;
1 2011 43 Yes
2 2014 32 No
3 2015 56 No
4 2018 56 Yes
5 2021 90 Yes
6 2020 66 No
;

data Mrg;
set A(in=in1) B;
if in1 then DS = 1;
else DS = 2;
run;

proc sort data=Mrg;
by patid Year DS;
run;

data Want;
set Mrg;
by patId year;
if First.patID OR First.Year;
run;

proc print data=Want; 
var patId Year X Z;
run;

View solution in original post

4 REPLIES 4
PaigeMiller
Diamond | Level 26

I'm confused. Twice you say "no repeated ID" and yet your WANT data set has repeats of PATID 2 and PATID 5

--
Paige Miller
Rick_SAS
SAS Super FREQ

I think you want to merge the data, sort the results, and then use the FIRST.patId and FIRST.Year auto variables to control which observations are retained:

 

data A;
length Z $3;
input patid Year X Z;
datalines;
1 2011 34 Yes
2 2012 32 Yes
3 2015 56 No
4 2018 57 No
5 2019 78 Yes
6 2020 66 No
;
 
data B;
length Z $3;
input patid Year X Z;
datalines;
1 2011 43 Yes
2 2014 32 No
3 2015 56 No
4 2018 56 Yes
5 2021 90 Yes
6 2020 66 No
;

data Mrg;
set A(in=in1) B;
if in1 then DS = 1;
else DS = 2;
run;

proc sort data=Mrg;
by patid Year DS;
run;

data Want;
set Mrg;
by patId year;
if First.patID OR First.Year;
run;

proc print data=Want; 
var patId Year X Z;
run;
Tom
Super User Tom
Super User

@Rick_SAS 

The IF statement does not make sense.

by patId year;
if First.patID OR First.Year;

If the FIRST.PATID flag is true then by definition the FIRST.YEAR flag is also true.  The first observations for a patient is by definition the first observation for the first year within that patient.  Otherwise it is not the first observation for the patient.

 

So you just want.

if First.Year;

Note the whole MRG data step is not needed.  SAS will load the observations from the datasets in the order they are listed on the SET statement.  Just include the BY statement so it does it BY the key variables.

data Want;
  set A B;
  by patId year;
  if First.Year;
run;

 

Tom
Super User Tom
Super User

Sounds like you want to INTERLEAVE the datasets and keep the first one per PATID, YEAR group.

data want;
  set dataset1 dataset2 ;
  by patid year;
  if first.year;
run;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1036 views
  • 5 likes
  • 4 in conversation