New SAS User

Sandeep77 · Posted 04-19-2023 09:57 AM

Hi all,

I am trying to merge three datasets. I have tried two different ways and they both are giving me incorrect numbers (because each file has around 1 million records, so total should be around 3 millions). One of the way is removing duplicate reference_number and giving me 1.7 million records and other is giving me 4.3 million records. Can you please suggest the correct way to merge these datasets?

/*Merge multiple datasets with reference_number as common value*/
proc sql;
create table Equifax_files as
select * from work.elnz_lowell_11oct
union all
select * from work.elnz_lowell_10feb
union all
select * from work.elnz_lowell_12sep
order by reference_number;
quit;
/* Method 2 */
Data Equifax_files ;
merge work.elnz_lowell_12sep work.elnz_lowell_11oct work.elnz_lowell_10feb;
by reference_number;
run;

yabwon · Posted 04-19-2023 10:16 AM

I would go with @svh idea, i.e.:

data Equifax_files ;
set work.elnz_lowell_12sep work.elnz_lowell_11oct work.elnz_lowell_10feb;
run;

Sone reading about this: https://documentation.sas.com/doc/en/lrcon/9.4/n1tgk0uanvisvon1r26lc036k0w7.htm#n0mvuijqtjdsybn1h4t0...

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

View solution in original post

svh · Posted 04-19-2023 10:03 AM

Are you trying to "stack" (or concatenate) the data sets? If so, in Method 2, the "MERGE" statement can be replaced with a "SET" statement and you can remove the "BY" statement completely.

Currently, in Method 2, you are going to perform a join in which observations with common values for "reference_number" will be combined (which is why you are seeing fewer than 3 million-ish observations in your result).

yabwon · Posted 04-19-2023 10:16 AM

I would go with @svh idea, i.e.:

data Equifax_files ;
set work.elnz_lowell_12sep work.elnz_lowell_11oct work.elnz_lowell_10feb;
run;

Sone reading about this: https://documentation.sas.com/doc/en/lrcon/9.4/n1tgk0uanvisvon1r26lc036k0w7.htm#n0mvuijqtjdsybn1h4t0...

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

ballardw · Posted 04-19-2023 10:58 AM

In SAS discussions MERGE is a "side by side" type of data combining where common named variables will behave quite differently than SQL similar operations. Also a data step merge can have very unexpected results when BY variables have repeated values in two or more data sets and will usually show a warning about such in the LOG. Was there such a warning in the LOG for the data step merge. So when discussing combining data in SAS do not use "merge" unless you mean that side-by-side row combination.

The "union all" is a vertical or stack, which would be an Append procedure operation or SET in the data step.

New SAS User

Merge 3 datasets

Re: Merge 3 datasets

Re: Merge 3 datasets

Re: Merge 3 datasets

Re: Merge 3 datasets

How to merge/set two dataset

Enterprise Guide_데이터 Merge

Merging datasets

Merging datasets with different variable names.

Merging multiple datasets, but only matching information

Follow Us

What is...

New SAS User

Our biggest data and AI event of the year.

Follow Us

What is...