- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi all,
I am trying to merge three datasets. I have tried two different ways and they both are giving me incorrect numbers (because each file has around 1 million records, so total should be around 3 millions). One of the way is removing duplicate reference_number and giving me 1.7 million records and other is giving me 4.3 million records. Can you please suggest the correct way to merge these datasets?
/*Merge multiple datasets with reference_number as common value*/
proc sql;
create table Equifax_files as
select * from work.elnz_lowell_11oct
union all
select * from work.elnz_lowell_10feb
union all
select * from work.elnz_lowell_12sep
order by reference_number;
quit;
/* Method 2 */
Data Equifax_files ;
merge work.elnz_lowell_12sep work.elnz_lowell_11oct work.elnz_lowell_10feb;
by reference_number;
run;
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I would go with @svh idea, i.e.:
data Equifax_files ;
set work.elnz_lowell_12sep work.elnz_lowell_11oct work.elnz_lowell_10feb;
run;
Sone reading about this: https://documentation.sas.com/doc/en/lrcon/9.4/n1tgk0uanvisvon1r26lc036k0w7.htm#n0mvuijqtjdsybn1h4t0...
Bart
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug
"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings
SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Currently, in Method 2, you are going to perform a join in which observations with common values for "reference_number" will be combined (which is why you are seeing fewer than 3 million-ish observations in your result).
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I would go with @svh idea, i.e.:
data Equifax_files ;
set work.elnz_lowell_12sep work.elnz_lowell_11oct work.elnz_lowell_10feb;
run;
Sone reading about this: https://documentation.sas.com/doc/en/lrcon/9.4/n1tgk0uanvisvon1r26lc036k0w7.htm#n0mvuijqtjdsybn1h4t0...
Bart
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug
"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings
SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
In SAS discussions MERGE is a "side by side" type of data combining where common named variables will behave quite differently than SQL similar operations. Also a data step merge can have very unexpected results when BY variables have repeated values in two or more data sets and will usually show a warning about such in the LOG. Was there such a warning in the LOG for the data step merge. So when discussing combining data in SAS do not use "merge" unless you mean that side-by-side row combination.
The "union all" is a vertical or stack, which would be an Append procedure operation or SET in the data step.