Solved: Problem merging in Proc SQL

OzanKirtac · Posted 09-12-2021 05:37 PM

Hi

I am trying to merge these two files but interestingly they dont merge. Do you have any idea why they dont merge?

proc sql; create table perm.Mergedtemp21 as select
a.*, b.conm, b.ipodate, b.cik,b.roa, b.LnAT, b.turn, b.DTE,b.OCFA, b.TACC,b.MVE,b.BTM
from perm.Mergedtemp20 as a left join perm.compustatnew4 as b
on a.gvkey2=b.gvkey
and a.year eq b.year
and a.calendarquarter eq b.calendarquarter
order by permno, mergerdateannounced;
quit;

Thank you for responses.

Tom · Posted 09-13-2021 03:00 PM

You cannot have two variables with the same name. If really want both A.CONM and B.CONM then rename one of them:

select a.*
     , b.CONM as b_conm
     , b.ipodate as b_ipodate
...

PROC SQL code will keep the first instance it sees and ignore the second one, so the first dataset "wins".

DATA STEP merge will read both, but store them into the same variable, so the second one read will "wins".

View solution in original post

tarheel13 · Posted 09-12-2021 06:02 PM

What does the log say? Also, it's much better to put a more descriptive title than just 'Code' and a lot of people will not want to open your attachments so you should just post your datasets as data lines and use Insert SAS Code button to post code. I do not see mergedtemp20 dataset. Please tell us your desired output.

ballardw · Posted 09-13-2021 12:20 AM

The first place to look is your data set names.

You provide an example named Mergedtemp21 the code references Mergedtemp20.

Show the log. Run your code, copy the log including the code and all messages or notes from running your Proc SQL and then on the forum open a text box by clicking on the </> icon that appears above the message window. Paste the text.

Amazingly, SAS provides lots of information in the log about why things may not work as expected.

OzanKirtac · Posted 09-13-2021 02:56 PM

Hi thank you for replies. The code was correct. I downsized my datasets and posted here. The issue is not about the file name. I copied the code I was using thats why the old file name was in the earlier post. The log shows the following:

1162 proc sql;
1162! create table perm.Mergedtemp22 as select
1163 a.*, b.conm, b.ipodate, b.cik,b.roa, b.LnAT, b.turn, b.DTE,b.OCFA, b.TACC,b.MVE,b.BTM
1164 from perm.Mergedtemp21 as a left join perm.compustatnew4 as b
1165 on a.gvkey2=b.gvkey
1166 and a.year eq b.year
1167 and a.calendarquarter eq b.calendarquarter
1168 order by permno, mergerdateannounced;
WARNING: Variable CONM already exists on file PERM.MERGEDTEMP22.
WARNING: Variable IPODATE already exists on file PERM.MERGEDTEMP22.
WARNING: Variable CIK already exists on file PERM.MERGEDTEMP22.
WARNING: Variable ROA already exists on file PERM.MERGEDTEMP22.
WARNING: Variable LnAT already exists on file PERM.MERGEDTEMP22.
WARNING: Variable TURN already exists on file PERM.MERGEDTEMP22.
WARNING: Variable DTE already exists on file PERM.MERGEDTEMP22.
WARNING: Variable OCFA already exists on file PERM.MERGEDTEMP22.
WARNING: Variable TACC already exists on file PERM.MERGEDTEMP22.
WARNING: Variable MVE already exists on file PERM.MERGEDTEMP22.
WARNING: Variable BTM already exists on file PERM.MERGEDTEMP22.
NOTE: Table PERM.MERGEDTEMP22 created, with 1 rows and 23 columns.

1169 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 2.73 seconds
cpu time 0.12 seconds

I get this log output when I run:

proc sql; create table perm.Mergedtemp22 as select
a.*, b.conm, b.ipodate, b.cik,b.roa, b.LnAT, b.turn, b.DTE,b.OCFA, b.TACC,b.MVE,b.BTM
from perm.Mergedtemp21 as a left join perm.compustatnew4 as b
on a.gvkey2=b.gvkey
and a.year eq b.year
and a.calendarquarter eq b.calendarquarter
order by permno, mergerdateannounced;
quit;

Tom · Posted 09-13-2021 03:00 PM

You cannot have two variables with the same name. If really want both A.CONM and B.CONM then rename one of them:

select a.*
     , b.CONM as b_conm
     , b.ipodate as b_ipodate
...

PROC SQL code will keep the first instance it sees and ignore the second one, so the first dataset "wins".

DATA STEP merge will read both, but store them into the same variable, so the second one read will "wins".

Patrick · Posted 09-14-2021 07:15 PM

You can't have a target table with two columns that have the same name but this is exactly what you are requesting by having the same variable name twice in your Select statement (once implicit via the start notation). That what causes the warning and would create an error in other SQL flavors.

One lazy way to write an explicit list with not that much typing:

1. add the FEEDBACK option to PROC SQL

2. copy/paste from the SAS log the SQL with fully expanded Select clause into Notepad++

3. get/run the Notepad++ plugin "poor man's T-SQL formatter"

4. Remove the "duplicate" columns from the Select Clause

Below the syntax I've got after step 3 from above. Rest up to you.

SELECT A.PERMNO
	,A.MergerDateAnnounced
	,A.RET02
	,A.ABRET02
	,A.N
	,A.COMNAM
	,A.CUSIP
	,A.gvkey
	,A.TICKER
	,A.calendarquarter
	,A.gvkey2
	,A.CONM
	,A.IPODATE
	,A.CIK
	,A.ROA
	,A.LnAT
	,A.TURN
	,A.DTE
	,A.OCFA
	,A.TACC
	,A.MVE
	,A.BTM
	,A.year
	,B.CONM
	,B.IPODATE
	,B.CIK
	,B.ROA
	,B.LnAT
	,B.TURN
	,B.DTE
	,B.OCFA
	,B.TACC
	,B.MVE
	,B.BTM
FROM PERM.MERGEDTEMP21 A
LEFT OUTER JOIN PERM.COMPUSTATNEW4 B ON (A.gvkey2 = B.GVKEY)
	AND (A.year = B.year)
	AND (A.calendarquarter = B.calendarquarter)
ORDER BY A.PERMNO ASC
	,A.MergerDateAnnounced ASC;

PGStats · Posted 09-14-2021 10:35 PM

You can make SQL keep either version of a column that appears twice in your datasets, while taking care of missing values

select ...

coalesce (a.conm, b.conm) as conm,

will keep a.comn but fill it with b.conm if a.conm is missing, but

coalesce (b.conm, a.conm) as conm,

will keep b.comn but fill it with a.conm if b.conm is missing.

PG

Problem merging in Proc SQL

Re: Problem merging in Proc SQL

Re: Code

Re: Code

Re: Problem merging in Proc SQL

Re: Problem merging in Proc SQL

Re: Problem merging in Proc SQL

Re: Problem merging in Proc SQL

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away