BookmarkSubscribeRSS Feed
raivester
Quartz | Level 8

I was wondering if anyone knows how I can merge two data sets conditionally while also retaining all observations, regardless of whether they merged. I have two data sets, each is unique by year and individ_id_num. Each data set contains multiple years of data (2000-2020), but I only want to merge for the year=2018. I seem to have achieved this with the code below, but my final data set only contains observations where year=2018. I want to retains all observations (in1) regardless of whether it merged to an observation in the merging data set. Any ideas?

 

data want;
    merge have1(where=(year=2018) in=in1) 
               have2(where=(year=2018) in=in2);
    by indiv_id_num;

    if in1;
    
    if in1 and in2 then merge_18=1;
    else merge_18 = 0;
run;
7 REPLIES 7
PaigeMiller
Diamond | Level 26

What is wrong with the code you show?

 
--
Paige Miller
raivester
Quartz | Level 8
It only retains the 2018 observations in the final data set. I want to retain all years (2000-2020), even though only the 2018 observations have been merged to the second data set.
PaigeMiller
Diamond | Level 26

The modification to the code should be obvious.

 
 
--
Paige Miller
raivester
Quartz | Level 8
Do you mean remove if _a; ? This does not solve my problem.
PaigeMiller
Diamond | Level 26

The code you showed uses this fragment of code:

 

have1(where=(year=2018))

If you want to change the code to use all years, you modify the above code to ...

--
Paige Miller
Reeza
Super User
Does the second data set have only 2018 data or other years as well? Do you have a year variable in your data set?
Can you merge by individ_id_num and Year instead?

data want;
merge have1( in=in1)
have2(where=(year=2018) in=in2);
by indiv_id_num;

if in1;

if in1 and in2 then merge_18=1;
else merge_18 = 0;
run;


or


data want;
merge have1( in=in1)
have2(where=(year=2018) in=in2);
by indiv_id_num year;

if in1;

if in1 and in2 then merge_18=1;
else merge_18 = 0;
run;
Kurt_Bremser
Super User

The where= dataset options prevent any years other than 2018 from processing in your data step, so you need to remove those.

You will need to merge by indiv_id_num and year, and set your final variable only when year = 2018.

 

To provide you with (tested) code, we need examples for have1 and have2, and the expected outcome from those examples. Provide those examples as data steps with datalines, so we can easily create your datasets in our environments.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 777 views
  • 0 likes
  • 4 in conversation