Hi,
I have two datasets with patid and numerous other variables. I want to combine both datasets vertically, but with no repeat patid. For example:
data dataset1;
input patID year x y $ ;
datalines;
1 2011 34 Yes
2 2012 32 Yes
3 2015 56 No
4 2018 57 No
5 2019 78 Yes
6 2020 66 No
;
run;
data dataset2;
input patID year x y $ ;
datalines;
1 2011 43 Yes
2 2014 32 No
3 2015 56 No
4 2018 56 Yes
5 2021 90 Yes
6 2020 66 No
;
run;
Dataset 1:
patid |
Year |
X |
Z |
1 |
2011 |
34 |
Yes |
2 |
2012 |
32 |
Yes |
3 |
2015 |
56 |
No |
4 |
2018 |
57 |
No |
5 |
2019 |
78 |
Yes |
6 |
2020 |
66 |
No |
Dataset 2:
patid |
Year |
X |
Y |
1 |
2011 |
43 |
Yes |
2 |
2014 |
32 |
No |
3 |
2015 |
56 |
No |
4 |
2018 |
56 |
Yes |
5 |
2021 |
90 |
Yes |
6 |
2020 |
66 |
No |
Want:
patid |
Year |
X |
Y |
1 |
2011 |
34 |
Yes |
2 |
2012 |
32 |
Yes |
2 |
2014 |
32 |
No |
3 |
2015 |
56 |
No |
4 |
2018 |
57 |
No |
5 |
2019 |
78 |
Yes |
5 |
2021 |
90 |
Yes |
6 |
2020 |
66 |
Here, I combine dataset 1 and dataset 2 by Id and Year:
- if all variables match then output.
- If all variables didn't match (except Id and Year) then keep dataset 1 observations.
- if Id or Year don't match then create a new row in the new dataset.
Thanks,
sandyzman1
I think you want to merge the data, sort the results, and then use the FIRST.patId and FIRST.Year auto variables to control which observations are retained:
data A;
length Z $3;
input patid Year X Z;
datalines;
1 2011 34 Yes
2 2012 32 Yes
3 2015 56 No
4 2018 57 No
5 2019 78 Yes
6 2020 66 No
;
data B;
length Z $3;
input patid Year X Z;
datalines;
1 2011 43 Yes
2 2014 32 No
3 2015 56 No
4 2018 56 Yes
5 2021 90 Yes
6 2020 66 No
;
data Mrg;
set A(in=in1) B;
if in1 then DS = 1;
else DS = 2;
run;
proc sort data=Mrg;
by patid Year DS;
run;
data Want;
set Mrg;
by patId year;
if First.patID OR First.Year;
run;
proc print data=Want;
var patId Year X Z;
run;
I'm confused. Twice you say "no repeated ID" and yet your WANT data set has repeats of PATID 2 and PATID 5
I think you want to merge the data, sort the results, and then use the FIRST.patId and FIRST.Year auto variables to control which observations are retained:
data A;
length Z $3;
input patid Year X Z;
datalines;
1 2011 34 Yes
2 2012 32 Yes
3 2015 56 No
4 2018 57 No
5 2019 78 Yes
6 2020 66 No
;
data B;
length Z $3;
input patid Year X Z;
datalines;
1 2011 43 Yes
2 2014 32 No
3 2015 56 No
4 2018 56 Yes
5 2021 90 Yes
6 2020 66 No
;
data Mrg;
set A(in=in1) B;
if in1 then DS = 1;
else DS = 2;
run;
proc sort data=Mrg;
by patid Year DS;
run;
data Want;
set Mrg;
by patId year;
if First.patID OR First.Year;
run;
proc print data=Want;
var patId Year X Z;
run;
The IF statement does not make sense.
by patId year;
if First.patID OR First.Year;
If the FIRST.PATID flag is true then by definition the FIRST.YEAR flag is also true. The first observations for a patient is by definition the first observation for the first year within that patient. Otherwise it is not the first observation for the patient.
So you just want.
if First.Year;
Note the whole MRG data step is not needed. SAS will load the observations from the datasets in the order they are listed on the SET statement. Just include the BY statement so it does it BY the key variables.
data Want;
set A B;
by patId year;
if First.Year;
run;
Sounds like you want to INTERLEAVE the datasets and keep the first one per PATID, YEAR group.
data want;
set dataset1 dataset2 ;
by patid year;
if first.year;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.