DATA Step, Macro, Functions and more

How to merge 2 datasets that are the same

Reply
Occasional Contributor
Posts: 9

How to merge 2 datasets that are the same

I've tried regular merging and this proc sql but my observations end up getting deleted. The datasets have the same columns except for 1 but I want it all to be in one data set. Thanks!

 

/**proc sql;
create table compustat_1 as
select *
from transform a, transform2 b
where a.gvkey=b.gvkey and
a.datadate=b.datadate and a.fyear=b.fyear and a.cusip=b.cusip and a.conm=b.conm
and a.fyr=b.fyr and a.CIKold=b.CIKold and a.cik=b.cik and a.ni=b.ni and a.revt=b.revt and
a.SICCODE=b.SICCODE and a.count=b.count and a.log_count=b.log_count and a.Size=b.Size and a.ROA-b.ROA;
quit;
run;
**//

Super User
Posts: 19,861

Re: How to merge 2 datasets that are the same

1. Is there any errors in the log - post the full log.

2. Check your types/formats on the variables, does it match for all your key variables

3. You're doing a cross join and then filtering. Try specifying the join type instead (full/left/right) and using an ON clause instead of a WHERE. 

4. Show your data step version and the log, I find that better and showing errors. 

Occasional Contributor
Posts: 9

Re: How to merge 2 datasets that are the same

***on**** didnt work either

 

193 proc sql;
194 create table compustat_1 as
195 select *
196 from transform a, transform2 b
197 on a.gvkey=b.gvkey and
--
22
76
ERROR 22-322: Syntax error, expecting one of the following: ;, ',', ANSIMISS, CROSS, EXCEPT, FULL,
GROUP, HAVING, INNER, INTERSECT, JOIN, LEFT, NATURAL, NOMISS, ORDER, OUTER, RIGHT,
UNION, WHERE.

ERROR 76-322: Syntax error, statement will be ignored.

198 a.datadate=b.datadate and a.fyear=b.fyear and a.cusip=b.cusip and a.conm=b.conm
199 and a.fyr=b.fyr and a.CIKold=b.CIKold and a.cik=b.cik and a.ni=b.ni and a.revt=b.revt and
200 a.SICCODE=b.SICCODE and a.count=b.count and a.log_count=b.log_count and a.Size=b.Size and
200! a.ROA-b.ROA;
201 quit;

Respected Advisor
Posts: 4,931

Re: How to merge 2 datasets that are the same

Try

 

create table compustat_1 as
select *
from 
	transform natural join transform2;
quit;
PG
Occasional Contributor
Posts: 9

Re: How to merge 2 datasets that are the same

That didn't work.

 

188 proc sql;
189 create table compustat_1 as
190 select *
191 from transform natural join transform2;
NOTE: Table WORK.COMPUSTAT_1 created, with 0 rows and 18 columns.

 

 

Respected Advisor
Posts: 4,931

Re: How to merge 2 datasets that are the same

This likely means that at least one of the variables that the datasets have in common doesn't have the same value in both datasets. You could work around that problem by renaming the variable in one of the datasets.

 

proc sql;
create table compustat_1 as
select *
from transform natural join transform2(rename=z=z2);

that way the natural join will not try to match variables z from both datasets.

 

PG
Super Contributor
Posts: 345

Re: How to merge 2 datasets that are the same

@dlazer1 this is the right time to example input data and what you expect as result.
"but I want it all to be in one data set" sounds like you need to rename the variables in you dataset to have them all next to each other..
Super User
Posts: 19,861

Re: How to merge 2 datasets that are the same

This part is also likely incorrect:

 

a.ROA-b.ROA

 

Super User
Super User
Posts: 7,076

Re: How to merge 2 datasets that are the same

Are you sure you want to merge and not concatenate the two datasets?

data compustat_1;
 set transform transform2;
run;

What is the variable that is different?  Is it only one variable in one dataset? Or one variable in each dataset?  If in both does it have different names in both datasets?  If it does have the same name in both datasets then which value do you want to appear in the result?

 

If you are merging do the only want observations that are in both? Or observations that are either? 

 

What are your actual key variables that uniquely identify the observations in each file?  What is the purpose of the other variables?

Or do you have multiple observations per key value?  If so is it multiple observations in both input datasets? Or only in one? 

Ask a Question
Discussion stats
  • 8 replies
  • 128 views
  • 0 likes
  • 5 in conversation