BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.

Let's say I have this merge and the datasets have1 and have2 each have two variables: ID and Country.

 

data want;
  merge have1(in=h1) have2;
  by ID;
  if h1;
run;

I want to create a new variable based on the value of the variable Country in each dataset - a condition of whether Country from have1 = Country from have 2. In SQL, I would do this simply by using a case-when-end-as, but how would I do it in data step merge? I could rename the variables say to Country1 and Country2 then say if Country1=Country2, assign this value to new variable. But I don't wnat to keep these renamed variables - I want the original variable to stay.

1 ACCEPTED SOLUTION

Accepted Solutions
Shmuel
Garnet | Level 18

You can't have two variables of same name in a dataset, so anyhow you must rename COUNTRY of one of the two datasets.

Your suggestion is right:

 

data want;

    merge have1(in=in1)

               have2(in=in2 rename=(country=country2))

     ; by ID;

       if in1 and not in2 then new_var =  <any value-1 or DELETE; >; else

       if in2 and not in1 then new_var = <any value-2  or DELETE; >; else

       /* ID common on both datasets */

       if country = country2 then new_var = <any value-3>;

                                        else new_var = <any value-4>;

run;

View solution in original post

3 REPLIES 3
ballardw
Super User

If you have a variable in both sets unless you rename at least one of the variables then the value will be from only one data set.

If you don't want to keep a specific variable that is what the DROP statement is for. Rename the county in set have2 using a dataset option rename statement and the drop that one would likely be simplest.

 

You will likely want to have an In option for the second set as well so that you know if one or both sets contribute to the current observation.

Shmuel
Garnet | Level 18

You can't have two variables of same name in a dataset, so anyhow you must rename COUNTRY of one of the two datasets.

Your suggestion is right:

 

data want;

    merge have1(in=in1)

               have2(in=in2 rename=(country=country2))

     ; by ID;

       if in1 and not in2 then new_var =  <any value-1 or DELETE; >; else

       if in2 and not in1 then new_var = <any value-2  or DELETE; >; else

       /* ID common on both datasets */

       if country = country2 then new_var = <any value-3>;

                                        else new_var = <any value-4>;

run;

Astounding
PROC Star

You've set up an impossible goal:

 

  • There should only be one COUNTRY variable
  • The COUNTRY values from the two data sets might be different

So if they are different, what would you like COUNTRY to contain?  The value from HAVE1 or the value from HAVE2?

 

For some applications (not clear if this applies to yours or not), the solution is to sort and merge by both variables (BY ID COUNTRY).  Then the in= variables verify whether you have a match or a mismatch.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 2646 views
  • 2 likes
  • 4 in conversation