Conditional merge, retain all observations

raivester · Posted 07-06-2020 06:00 PM

I was wondering if anyone knows how I can merge two data sets conditionally while also retaining all observations, regardless of whether they merged. I have two data sets, each unique by year and indiv_id_num. Each data set contains multiple years of data (2000-2020), but I only want to merge for year=2018. The code below does the 2018 merge, but my final data set only contains observations where year=2018. I want to retain all observations (in1) regardless of whether they merged to an observation in the second data set. Any ideas?

data want;
    merge have1(where=(year=2018) in=in1) 
               have2(where=(year=2018) in=in2);
    by indiv_id_num;

    if in1;
    
    if in1 and in2 then merge_18=1;
    else merge_18 = 0;
run;

PaigeMiller · Posted 07-06-2020 06:18 PM

What is wrong with the code you show?

--
Paige Miller

raivester · Posted 07-06-2020 06:37 PM

It only retains the 2018 observations in the final data set. I want to retain all years (2000-2020), even though only the 2018 observations have been merged to the second data set.

PaigeMiller · Posted 07-06-2020 06:44 PM

The modification to the code should be obvious.

raivester · Posted 07-06-2020 06:58 PM

Do you mean removing if in1; ? That does not solve my problem.

PaigeMiller · Posted 07-07-2020 06:31 AM

The code you showed uses this fragment of code:

have1(where=(year=2018))

If you want to change the code to use all years, you modify the above code to ...

Reeza · Posted 07-06-2020 06:47 PM

Does the second data set have only 2018 data or other years as well? Do you have a year variable in your data set?
Can you merge by indiv_id_num and year instead?

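data want;
    /* Merge by ID only: have2's single 2018 record per ID is matched  */
    /* to every have1 row for that ID, so merge_18 = 1 on all of them. */
    /* Caution: YEAR is in both data sets, so on the first row of each */
    /* matched ID, have2's YEAR (2018) overwrites have1's YEAR.        */
    merge have1(in=in1)
          have2(where=(year=2018) in=in2);
    by indiv_id_num;

    if in1;

    if in1 and in2 then merge_18=1;
    else merge_18 = 0;
run;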

or

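data want;
    /* Merge by ID and YEAR (both inputs sorted by these variables):  */
    /* only 2018 rows of have1 can find a partner in have2; all other */
    /* years pass through unmatched with merge_18 = 0.                */
    merge have1(in=in1)
          have2(where=(year=2018) in=in2);
    by indiv_id_num year;

    if in1;    /* keep every have1 observation */

    if in1 and in2 then merge_18=1;
    else merge_18 = 0;
run;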
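For comparison, here is a sketch of the same "keep every have1 row, match only in 2018" logic written as a PROC SQL left join. This is a swapped-in technique, not what either poster used, and the example data and the variables x and score are hypothetical. The key point is that the year = 2018 condition sits in the ON clause; putting it in a WHERE clause would filter the joined result and drop the other years again.

```sas
/* Hypothetical example data (NOT from the thread); X and SCORE are made up. */
data have1;
    input indiv_id_num year x;
    datalines;
1 2017 10
1 2018 11
1 2019 12
2 2018 20
;
run;

data have2;
    input indiv_id_num year score;
    datalines;
1 2018 100
1 2019 101
;
run;

proc sql;
    create table want_sql as
    select a.*,
           b.score,
           /* 1 when a 2018 row of have1 found its partner in have2 */
           case when b.indiv_id_num is not null then 1 else 0 end as merge_18
    from have1 as a
         left join have2 as b
           on  a.indiv_id_num = b.indiv_id_num
           and a.year = b.year
           /* Condition in ON, not WHERE: restricts matching to 2018 */
           /* without removing the other years of have1.             */
           and a.year = 2018
    order by a.indiv_id_num, a.year;
quit;
```

With these inputs, want_sql keeps all four have1 rows; only indiv_id_num=1, year=2018 gets merge_18=1 and score=100. The 2019 record in have2 is not attached, because the ON clause only allows 2018 matches.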

Kurt_Bremser · Posted 07-07-2020 05:25 AM

The where= data set options prevent observations from any year other than 2018 from ever being read into your data step, so you need to remove them.

You will need to merge by indiv_id_num and year, and set your flag variable (merge_18) only when year = 2018.

To provide you with (tested) code, we need example data for have1 and have2, and the expected outcome from those examples. Provide those examples as data steps with datalines, so we can easily create your data sets in our environments.
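A minimal sketch of the approach described above (no where= options, merge by indiv_id_num and year, set merge_18 only for year = 2018), using made-up example data rather than the poster's. The variables x and score and all values are hypothetical. The final call missing step is an optional assumption: if have2 also holds other years, it blanks have2 values picked up by non-2018 matches so only the 2018 merge contributes data.

```sas
/* Hypothetical example data: NOT the poster's data.                   */
/* have1 holds every year per ID; have2 adds a made-up variable SCORE. */
data have1;
    input indiv_id_num year x;
    datalines;
1 2017 10
1 2018 11
1 2019 12
2 2018 20
2 2019 21
3 2017 30
;
run;

data have2;
    input indiv_id_num year score;
    datalines;
1 2018 100
1 2019 101
3 2018 300
;
run;

/* MERGE with BY requires both inputs sorted by the BY variables. */
proc sort data=have1; by indiv_id_num year; run;
proc sort data=have2; by indiv_id_num year; run;

data want;
    /* No where= options: every year of have1 reaches the data step. */
    merge have1(in=in1)
          have2(in=in2);
    by indiv_id_num year;

    /* Keep all have1 observations, matched or not. */
    if in1;

    /* 1 only for 2018 rows that matched have2, otherwise 0. */
    merge_18 = (year = 2018 and in2);

    /* Optional (assumption): discard have2 values from matches in */
    /* other years, so only the 2018 merge contributes data.       */
    if year ne 2018 then call missing(score);
run;
```

With these inputs, want keeps all six have1 rows. Only indiv_id_num=1, year=2018 gets merge_18=1 and score=100; ID 2's 2018 row stays with merge_18=0, and ID 3's 2018 record in have2 has no partner in have1, so if in1; drops it.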
