proc sql merging

Accepted Solution Solved
Reply
Valued Guide
Posts: 854
Accepted Solution

proc sql merging

It might be because of the hour and that I'm quickly losing interest in the topic, or maybe this is a legit question.  I am combining two datasets with proc sql, through the variables being added I have a.region and b.region, should region be kept from both original datasets or will a.region overide?  This is a left join by the way.

I am finishing a project with three seperate pieces of code coming together to form a complete project, I keep running into ways of writing that I would not do.  I want the variables that are from both original datasets to remain but that doesn't seem to be happening.  I'm trying to do this without rewriting everything but I'm not sure that will be an option.


Accepted Solutions
Solution
‎08-14-2014 02:20 AM
Esteemed Advisor
Posts: 6,646

Re: proc sql merging

Unless you use the ... as ... clause in the select, SAS/PROC SQL will complain about the variable already being there. In that case, the variable that appears first in the select statement will be the one in the output dataset, further ones with the same name will be discarded.

If you want both values, do something like:

select

  ....,

  a.region as a_region,

  b.region as b_region,

  .....

from ......

;

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers

View solution in original post


All Replies
Solution
‎08-14-2014 02:20 AM
Esteemed Advisor
Posts: 6,646

Re: proc sql merging

Unless you use the ... as ... clause in the select, SAS/PROC SQL will complain about the variable already being there. In that case, the variable that appears first in the select statement will be the one in the output dataset, further ones with the same name will be discarded.

If you want both values, do something like:

select

  ....,

  a.region as a_region,

  b.region as b_region,

  .....

from ......

;

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Valued Guide
Posts: 854

Re: proc sql merging

Thanks Kurt, that helps.  I have changed region to regiona, my intent is to add one dataset at the end with rename=(regiona=region).  There are several variables that I have done this with, it is my thought that this will consolidate all of those variables for the final dataset.  Do you agree?

I'm at a client site and cannot test this right now is why I ask.

Thanks,

Esteemed Advisor
Posts: 6,646

Re: proc sql merging

As soon as you need to merge datasets where a certain variable is present in more than one source dataset, you need to make a conscious decision how to handle the data contained in the various instances of the variable.

In your case, you need to decide which value (region_a or region_b) you want in region. A simple rename will once again make the data step (or sql) see multiple instances of the same variable, with interesting/unpredictable results.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Valued Guide
Posts: 854

Re: proc sql merging

This dataset will go to tableau for reporting purposes, the request is that all of these variables will be uniform between all products (medigap,HMO) so reports can be run and subsetted across products.  Does that make sense?  Is there an easy solution for that?

Valued Guide
Posts: 854

Re: proc sql merging

There are several pieces of code that are coming together, that's where I come in.  I'm finalizing the code and want everything coming from different sources to have the same variables.  I hope that helps make it clear.

Valued Guide
Posts: 854

Re: proc sql merging

So I'm now processing the last datastep like this:

if regiona <> '' then region = regiona;

drop regiona;

do you agree with that logic?

Esteemed Advisor
Posts: 6,646

Re: proc sql merging

Yes, something like this makes sense.

You could also take a look at the coalesce() function in sql to select the first non-missing value of a list of variables/values.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 353 views
  • 0 likes
  • 2 in conversation