BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Sandeep77
Lapis Lazuli | Level 10

Hi all,

I am trying to merge three datasets. I have tried two different ways and they both are giving me incorrect numbers (because each file has around 1 million records, so total should be around 3 millions). One of the way is removing duplicate reference_number and giving me 1.7 million records and other is giving me 4.3 million records. Can you please suggest the correct way to merge these datasets?

/*Merge multiple datasets with reference_number as common value*/
proc sql;
create table Equifax_files as
select * from work.elnz_lowell_11oct
union all
select * from work.elnz_lowell_10feb
union all
select * from work.elnz_lowell_12sep
order by reference_number;
quit;
/* Method 2 */
Data Equifax_files ;
merge work.elnz_lowell_12sep work.elnz_lowell_11oct work.elnz_lowell_10feb;
by reference_number;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
yabwon
Amethyst | Level 16

I would go with @svh idea, i.e.:

data Equifax_files ;
set work.elnz_lowell_12sep work.elnz_lowell_11oct work.elnz_lowell_10feb;
run;

Sone reading about this: https://documentation.sas.com/doc/en/lrcon/9.4/n1tgk0uanvisvon1r26lc036k0w7.htm#n0mvuijqtjdsybn1h4t0...

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



View solution in original post

3 REPLIES 3
svh
Lapis Lazuli | Level 10 svh
Lapis Lazuli | Level 10
Are you trying to "stack" (or concatenate) the data sets? If so, in Method 2, the "MERGE" statement can be replaced with a "SET" statement and you can remove the "BY" statement completely.

Currently, in Method 2, you are going to perform a join in which observations with common values for "reference_number" will be combined (which is why you are seeing fewer than 3 million-ish observations in your result).
yabwon
Amethyst | Level 16

I would go with @svh idea, i.e.:

data Equifax_files ;
set work.elnz_lowell_12sep work.elnz_lowell_11oct work.elnz_lowell_10feb;
run;

Sone reading about this: https://documentation.sas.com/doc/en/lrcon/9.4/n1tgk0uanvisvon1r26lc036k0w7.htm#n0mvuijqtjdsybn1h4t0...

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



ballardw
Super User

In SAS discussions MERGE is a "side by side" type of data combining where common named variables will behave quite differently than SQL similar operations. Also a data step merge can have very unexpected results when BY variables have repeated values in two or more data sets and will usually show a warning about such in the LOG. Was there such a warning in the LOG for the data step merge. So when discussing combining data in SAS do not use "merge" unless you mean that side-by-side row combination.

 

The "union all" is a vertical or stack, which would be an Append procedure operation or SET in the data step.