BookmarkSubscribeRSS Feed
raivester
Quartz | Level 8

I was wondering if anyone knows how I can merge two data sets conditionally while also retaining all observations, regardless of whether they merged. I have two data sets, each is unique by year and individ_id_num. Each data set contains multiple years of data (2000-2020), but I only want to merge for the year=2018. I seem to have achieved this with the code below, but my final data set only contains observations where year=2018. I want to retains all observations (in1) regardless of whether it merged to an observation in the merging data set. Any ideas?

 

data want;
    merge have1(where=(year=2018) in=in1) 
               have2(where=(year=2018) in=in2);
    by indiv_id_num;

    if in1;
    
    if in1 and in2 then merge_18=1;
    else merge_18 = 0;
run;
7 REPLIES 7
PaigeMiller
Diamond | Level 26

What is wrong with the code you show?

 
--
Paige Miller
raivester
Quartz | Level 8
It only retains the 2018 observations in the final data set. I want to retain all years (2000-2020), even though only the 2018 observations have been merged to the second data set.
PaigeMiller
Diamond | Level 26

The modification to the code should be obvious.

 
 
--
Paige Miller
raivester
Quartz | Level 8
Do you mean remove if _a; ? This does not solve my problem.
PaigeMiller
Diamond | Level 26

The code you showed uses this fragment of code:

 

have1(where=(year=2018))

If you want to change the code to use all years, you modify the above code to ...

--
Paige Miller
Reeza
Super User
Does the second data set have only 2018 data or other years as well? Do you have a year variable in your data set?
Can you merge by individ_id_num and Year instead?

data want;
merge have1( in=in1)
have2(where=(year=2018) in=in2);
by indiv_id_num;

if in1;

if in1 and in2 then merge_18=1;
else merge_18 = 0;
run;


or


data want;
merge have1( in=in1)
have2(where=(year=2018) in=in2);
by indiv_id_num year;

if in1;

if in1 and in2 then merge_18=1;
else merge_18 = 0;
run;
Kurt_Bremser
Super User

The where= dataset options prevent any years other than 2018 from processing in your data step, so you need to remove those.

You will need to merge by indiv_id_num and year, and set your final variable only when year = 2018.

 

To provide you with (tested) code, we need examples for have1 and have2, and the expected outcome from those examples. Provide those examples as data steps with datalines, so we can easily create your datasets in our environments.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1148 views
  • 0 likes
  • 4 in conversation