topic Re: Merge 3 datasets in New SAS User

Merge 3 datasets

Sandeep77 — Wed, 19 Apr 2023 13:57:53 GMT

Hi all,

I am trying to merge three datasets. I have tried two different ways and they both are giving me incorrect numbers (because each file has around 1 million records, so total should be around 3 millions). One of the way is removing duplicate reference_number and giving me 1.7 million records and other is giving me 4.3 million records. Can you please suggest the correct way to merge these datasets?

/*Merge multiple datasets with reference_number as common value*/
proc sql;
create table Equifax_files as
select * from work.elnz_lowell_11oct
union all
select * from work.elnz_lowell_10feb
union all
select * from work.elnz_lowell_12sep
order by reference_number;
quit;
/* Method 2 */
Data Equifax_files ;
merge work.elnz_lowell_12sep work.elnz_lowell_11oct work.elnz_lowell_10feb;
by reference_number;
run;

Re: Merge 3 datasets

svh — Wed, 19 Apr 2023 14:03:21 GMT

Are you trying to "stack" (or concatenate) the data sets? If so, in Method 2, the "MERGE" statement can be replaced with a "SET" statement and you can remove the "BY" statement completely.

Currently, in Method 2, you are going to perform a join in which observations with common values for "reference_number" will be combined (which is why you are seeing fewer than 3 million-ish observations in your result).

Re: Merge 3 datasets

yabwon — Wed, 19 Apr 2023 14:16:00 GMT

I would go with @svh idea, i.e.:

data Equifax_files ;
set work.elnz_lowell_12sep work.elnz_lowell_11oct work.elnz_lowell_10feb;
run;

Sone reading about this: https://documentation.sas.com/doc/en/lrcon/9.4/n1tgk0uanvisvon1r26lc036k0w7.htm#n0mvuijqtjdsybn1h4t0q3sea7uh

Bart

Re: Merge 3 datasets

ballardw — Wed, 19 Apr 2023 14:58:44 GMT

In SAS discussions MERGE is a "side by side" type of data combining where common named variables will behave quite differently than SQL similar operations. Also a data step merge can have very unexpected results when BY variables have repeated values in two or more data sets and will usually show a warning about such in the LOG. Was there such a warning in the LOG for the data step merge. So when discussing combining data in SAS do not use "merge" unless you mean that side-by-side row combination.

The "union all" is a vertical or stack, which would be an Append procedure operation or SET in the data step.