DATA Step, Macro, Functions and more

Merge create new variable based on values of variables from each dataset

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 123
Accepted Solution

Merge create new variable based on values of variables from each dataset

Let's say I have this merge and the datasets have1 and have2 each have two variables: ID and Country.

 

data want;
  merge have1(in=h1) have2;
  by ID;
  if h1;
run;

I want to create a new variable based on the value of the variable Country in each dataset - a condition of whether Country from have1 = Country from have 2. In SQL, I would do this simply by using a case-when-end-as, but how would I do it in data step merge? I could rename the variables say to Country1 and Country2 then say if Country1=Country2, assign this value to new variable. But I don't wnat to keep these renamed variables - I want the original variable to stay.


Accepted Solutions
Solution
‎11-04-2016 01:56 PM
Trusted Advisor
Posts: 1,372

Re: Merge create new variable based on values of variables from each dataset

You can't have two variables of same name in a dataset, so anyhow you must rename COUNTRY of one of the two datasets.

Your suggestion is right:

 

data want;

    merge have1(in=in1)

               have2(in=in2 rename=(country=country2))

     ; by ID;

       if in1 and not in2 then new_var =  <any value-1 or DELETE; >; else

       if in2 and not in1 then new_var = <any value-2  or DELETE; >; else

       /* ID common on both datasets */

       if country = country2 then new_var = <any value-3>;

                                        else new_var = <any value-4>;

run;

View solution in original post


All Replies
Super User
Posts: 10,490

Re: Merge create new variable based on values of variables from each dataset

If you have a variable in both sets unless you rename at least one of the variables then the value will be from only one data set.

If you don't want to keep a specific variable that is what the DROP statement is for. Rename the county in set have2 using a dataset option rename statement and the drop that one would likely be simplest.

 

You will likely want to have an In option for the second set as well so that you know if one or both sets contribute to the current observation.

Solution
‎11-04-2016 01:56 PM
Trusted Advisor
Posts: 1,372

Re: Merge create new variable based on values of variables from each dataset

You can't have two variables of same name in a dataset, so anyhow you must rename COUNTRY of one of the two datasets.

Your suggestion is right:

 

data want;

    merge have1(in=in1)

               have2(in=in2 rename=(country=country2))

     ; by ID;

       if in1 and not in2 then new_var =  <any value-1 or DELETE; >; else

       if in2 and not in1 then new_var = <any value-2  or DELETE; >; else

       /* ID common on both datasets */

       if country = country2 then new_var = <any value-3>;

                                        else new_var = <any value-4>;

run;

Super User
Posts: 5,081

Re: Merge create new variable based on values of variables from each dataset

You've set up an impossible goal:

 

  • There should only be one COUNTRY variable
  • The COUNTRY values from the two data sets might be different

So if they are different, what would you like COUNTRY to contain?  The value from HAVE1 or the value from HAVE2?

 

For some applications (not clear if this applies to yours or not), the solution is to sort and merge by both variables (BY ID COUNTRY).  Then the in= variables verify whether you have a match or a mismatch.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 201 views
  • 2 likes
  • 4 in conversation