BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.

Let's say I have this merge and the datasets have1 and have2 each have two variables: ID and Country.

 

data want;
  merge have1(in=h1) have2;
  by ID;
  if h1;
run;

I want to create a new variable based on the value of the variable Country in each dataset - a condition of whether Country from have1 = Country from have 2. In SQL, I would do this simply by using a case-when-end-as, but how would I do it in data step merge? I could rename the variables say to Country1 and Country2 then say if Country1=Country2, assign this value to new variable. But I don't wnat to keep these renamed variables - I want the original variable to stay.

1 ACCEPTED SOLUTION

Accepted Solutions
Shmuel
Garnet | Level 18

You can't have two variables of same name in a dataset, so anyhow you must rename COUNTRY of one of the two datasets.

Your suggestion is right:

 

data want;

    merge have1(in=in1)

               have2(in=in2 rename=(country=country2))

     ; by ID;

       if in1 and not in2 then new_var =  <any value-1 or DELETE; >; else

       if in2 and not in1 then new_var = <any value-2  or DELETE; >; else

       /* ID common on both datasets */

       if country = country2 then new_var = <any value-3>;

                                        else new_var = <any value-4>;

run;

View solution in original post

3 REPLIES 3
ballardw
Super User

If you have a variable in both sets unless you rename at least one of the variables then the value will be from only one data set.

If you don't want to keep a specific variable that is what the DROP statement is for. Rename the county in set have2 using a dataset option rename statement and the drop that one would likely be simplest.

 

You will likely want to have an In option for the second set as well so that you know if one or both sets contribute to the current observation.

Shmuel
Garnet | Level 18

You can't have two variables of same name in a dataset, so anyhow you must rename COUNTRY of one of the two datasets.

Your suggestion is right:

 

data want;

    merge have1(in=in1)

               have2(in=in2 rename=(country=country2))

     ; by ID;

       if in1 and not in2 then new_var =  <any value-1 or DELETE; >; else

       if in2 and not in1 then new_var = <any value-2  or DELETE; >; else

       /* ID common on both datasets */

       if country = country2 then new_var = <any value-3>;

                                        else new_var = <any value-4>;

run;

Astounding
PROC Star

You've set up an impossible goal:

 

  • There should only be one COUNTRY variable
  • The COUNTRY values from the two data sets might be different

So if they are different, what would you like COUNTRY to contain?  The value from HAVE1 or the value from HAVE2?

 

For some applications (not clear if this applies to yours or not), the solution is to sort and merge by both variables (BY ID COUNTRY).  Then the in= variables verify whether you have a match or a mismatch.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 2625 views
  • 2 likes
  • 4 in conversation