BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Sandeep77
Lapis Lazuli | Level 10

Hi all,

I am trying to merge three datasets. I have tried two different ways and they both are giving me incorrect numbers (because each file has around 1 million records, so total should be around 3 millions). One of the way is removing duplicate reference_number and giving me 1.7 million records and other is giving me 4.3 million records. Can you please suggest the correct way to merge these datasets?

/*Merge multiple datasets with reference_number as common value*/
proc sql;
create table Equifax_files as
select * from work.elnz_lowell_11oct
union all
select * from work.elnz_lowell_10feb
union all
select * from work.elnz_lowell_12sep
order by reference_number;
quit;
/* Method 2 */
Data Equifax_files ;
merge work.elnz_lowell_12sep work.elnz_lowell_11oct work.elnz_lowell_10feb;
by reference_number;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
yabwon
Onyx | Level 15

I would go with @svh idea, i.e.:

data Equifax_files ;
set work.elnz_lowell_12sep work.elnz_lowell_11oct work.elnz_lowell_10feb;
run;

Sone reading about this: https://documentation.sas.com/doc/en/lrcon/9.4/n1tgk0uanvisvon1r26lc036k0w7.htm#n0mvuijqtjdsybn1h4t0...

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



View solution in original post

3 REPLIES 3
svh
Lapis Lazuli | Level 10 svh
Lapis Lazuli | Level 10
Are you trying to "stack" (or concatenate) the data sets? If so, in Method 2, the "MERGE" statement can be replaced with a "SET" statement and you can remove the "BY" statement completely.

Currently, in Method 2, you are going to perform a join in which observations with common values for "reference_number" will be combined (which is why you are seeing fewer than 3 million-ish observations in your result).
yabwon
Onyx | Level 15

I would go with @svh idea, i.e.:

data Equifax_files ;
set work.elnz_lowell_12sep work.elnz_lowell_11oct work.elnz_lowell_10feb;
run;

Sone reading about this: https://documentation.sas.com/doc/en/lrcon/9.4/n1tgk0uanvisvon1r26lc036k0w7.htm#n0mvuijqtjdsybn1h4t0...

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



ballardw
Super User

In SAS discussions MERGE is a "side by side" type of data combining where common named variables will behave quite differently than SQL similar operations. Also a data step merge can have very unexpected results when BY variables have repeated values in two or more data sets and will usually show a warning about such in the LOG. Was there such a warning in the LOG for the data step merge. So when discussing combining data in SAS do not use "merge" unless you mean that side-by-side row combination.

 

The "union all" is a vertical or stack, which would be an Append procedure operation or SET in the data step.

 

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 692 views
  • 2 likes
  • 4 in conversation