Hello,
I am working with two very large datasets and would like to use PROC SQL to merge a variable (AdmDiag10) from a dataset (new.AdmDiag10_merge) onto another dataset to avoid unnecessary sort procedures. The new.AdmDiag10_merge dataset has only five variables, the four need to merge the records and the new variable (AdmDiag10). Since I am more comfortable using data steps, I am including below the program I would use (if I wanted to use a data step) to help communicate what I am trying to achieve with a left join.
/*Data step I would use to achieve desired merge*/
data medical_1217;
merge medical_1217 (in=a)
new.admdiag10_merge;
by orsid recordid log file_type;
if a = 1;
run;
/*SQL Procedure I have developed to attempt a similar join*/
proc sql;
create table test as
select a.*, b.*
from medical_1217 a
left join new.admdiag10_merge b
on a.orsid = b.orsid
and a.recordid = b.recordid
and a.log = b.log
and a.file_type = b.file_type;
quit;
When I run the PROC SQL program, I get warning messages for each variable used to join on. Can someone please explain what is going on here?
Thanks,
Ted
Your warning is coming from this bit of code:
proc sql; create table test as select a.*, b.*
The * requests ALL the variables. So since both alias A and B have at least 4 variables in common you have requested duplicates for the variables. The JOIN has nothing to do with the warning. Observe:
1730 proc sql ; 1731 create table work.junk as 1732 select name , name 1733 from sashelp.class 1734 ; WARNING: Variable Name already exists on file WORK.JUNK. NOTE: Table WORK.JUNK created, with 19 rows and 1 columns.
It is the mention of the same variable name.
If you do not want the warning then only list the variables from B that you want to add to A, or vice versa. Just mention a variable name once.
Unfortunately data steps don't do well with multiple records with the same BY variables in both sets when using Merge and won't allow multiple records in the base set if using UPDATE. So SQL is often the appropriate tool.
Of course SQL can be extremely verbose due to the syntax requirements.
Your warning is coming from this bit of code:
proc sql; create table test as select a.*, b.*
The * requests ALL the variables. So since both alias A and B have at least 4 variables in common you have requested duplicates for the variables. The JOIN has nothing to do with the warning. Observe:
1730 proc sql ; 1731 create table work.junk as 1732 select name , name 1733 from sashelp.class 1734 ; WARNING: Variable Name already exists on file WORK.JUNK. NOTE: Table WORK.JUNK created, with 19 rows and 1 columns.
It is the mention of the same variable name.
If you do not want the warning then only list the variables from B that you want to add to A, or vice versa. Just mention a variable name once.
Unfortunately data steps don't do well with multiple records with the same BY variables in both sets when using Merge and won't allow multiple records in the base set if using UPDATE. So SQL is often the appropriate tool.
Of course SQL can be extremely verbose due to the syntax requirements.
@LEINAARE wrote:
Does that mean that I do not need to use a SELECT statement for alias A?
No.
As a minimum you likely want a select like:
select a.*, b.othervar, b.somethingelse, b.thatvar
which explicitly lists the variables from B that you want that are not already in A
You will find SQL users that never use the * for selection because it indicates that you don't really know what you really want or need.
select variables in right dataset to avoid notes in log
PROC SQL;
Create table exxb as
Select a.* ,b.exstdtc,extpt,medgroup as medex from xba4 as a left join exxx as b
On a.usubjid = b.usubjid and a.visit=b.visit;
Quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.