BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
GS2
Obsidian | Level 7 GS2
Obsidian | Level 7

Using SAS 9.4

 

I am attempting to merge a dataset that has multiple lines with distinct information for each person, so the lines cannot get mixed up. WHhat would be the best way to merge these data? I have tried a SAS merge:

 

DATA WORK.MERGE_021319;
MERGE raw.data1 raw.data2;
BY LAST_NAME FIRST_NAME ID;
RUN;

 

This did not match the data well at all, only 50 of 2000 matched completely. Most of the data was all from data1 or all from data 2

 

so I tried an SQL merge:

 

proc sql;
create table work.merge as
select a.*, b.*
from raw.data1 as a
left join raw.data2 as b
on a.id=b.id;
quit;

 

This matched better but did not keep the distinct information per line, one entry was used across all lines.

 

Is their a better way to merge than my methods or am I not writing correct code? Thanks for any help! 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

@GS2 wrote:

Using SAS 9.4

 

I am attempting to merge a dataset that has multiple lines with distinct information for each person, so the lines cannot get mixed up. WHhat would be the best way to merge these data? I have tried a SAS merge:

 

DATA WORK.MERGE_021319;
MERGE raw.data1 raw.data2;
BY LAST_NAME FIRST_NAME ID;
RUN;

 

This did not match the data well at all, only 50 of 2000 matched completely. Most of the data was all from data1 or all from data 2

 

so I tried an SQL merge:

 

proc sql;
create table work.merge as
select a.*, b.*
from raw.data1 as a
left join raw.data2 as b
on a.id=b.id;
quit;

 

This matched better but did not keep the distinct information per line, one entry was used across all lines.

 

Is their a better way to merge than my methods or am I not writing correct code? Thanks for any help! 


Which data set has the distinct information?

And you might need to look closere if the ID value is duplicated in data1. Each record from Data1 would be matched with each match in data2. So if there are multiple values of the ID in data1 you would get duplicated data for the bits from data1 for each match in data 2.

 

An example of multiple values of the Join on variable in both sets:

proc sql;
   create table example as
   select a.*, b.*
   from (select sex,  height
         from sashelp.class ) as a
         left join
         (select sex,  weight
          from sashelp.class) as b
         on a.sex=b.sex
   ;
quit;

Note that each value of weight is paired with height, creating 9 copies of the weight data for Sex=F  and 10 for each Sex=M.

 

Without specific data and indications of what you think should be in the result that isn't it is kind of hard determine exactly what happens.

Can you provide small example versions of data1 and data2 with sensitive values replaced by random letters or numbers as appropriate? You need not include all of your "distinct information" variables just enough to demonstrate the issue.

If the ID value is duplicated in both sets then make sure that your example data has at least one duplicate for each to replicate your data behavior.

 

As far as your first merge, Names and plain text are notorious for poor data entry and comparisons have to be exact. If "Dave" is supposed to match "David" or "David" match "DAVID" then you need to do more work on the data before using a data step Merge By.

 

I might suggest retrying your data step merge using only the ID variable as the SQL did. Though if there are multiple values of ID in both sets then data step merge is likely not the approach you want.

View solution in original post

1 REPLY 1
ballardw
Super User

@GS2 wrote:

Using SAS 9.4

 

I am attempting to merge a dataset that has multiple lines with distinct information for each person, so the lines cannot get mixed up. WHhat would be the best way to merge these data? I have tried a SAS merge:

 

DATA WORK.MERGE_021319;
MERGE raw.data1 raw.data2;
BY LAST_NAME FIRST_NAME ID;
RUN;

 

This did not match the data well at all, only 50 of 2000 matched completely. Most of the data was all from data1 or all from data 2

 

so I tried an SQL merge:

 

proc sql;
create table work.merge as
select a.*, b.*
from raw.data1 as a
left join raw.data2 as b
on a.id=b.id;
quit;

 

This matched better but did not keep the distinct information per line, one entry was used across all lines.

 

Is their a better way to merge than my methods or am I not writing correct code? Thanks for any help! 


Which data set has the distinct information?

And you might need to look closere if the ID value is duplicated in data1. Each record from Data1 would be matched with each match in data2. So if there are multiple values of the ID in data1 you would get duplicated data for the bits from data1 for each match in data 2.

 

An example of multiple values of the Join on variable in both sets:

proc sql;
   create table example as
   select a.*, b.*
   from (select sex,  height
         from sashelp.class ) as a
         left join
         (select sex,  weight
          from sashelp.class) as b
         on a.sex=b.sex
   ;
quit;

Note that each value of weight is paired with height, creating 9 copies of the weight data for Sex=F  and 10 for each Sex=M.

 

Without specific data and indications of what you think should be in the result that isn't it is kind of hard determine exactly what happens.

Can you provide small example versions of data1 and data2 with sensitive values replaced by random letters or numbers as appropriate? You need not include all of your "distinct information" variables just enough to demonstrate the issue.

If the ID value is duplicated in both sets then make sure that your example data has at least one duplicate for each to replicate your data behavior.

 

As far as your first merge, Names and plain text are notorious for poor data entry and comparisons have to be exact. If "Dave" is supposed to match "David" or "David" match "DAVID" then you need to do more work on the data before using a data step Merge By.

 

I might suggest retrying your data step merge using only the ID variable as the SQL did. Though if there are multiple values of ID in both sets then data step merge is likely not the approach you want.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 1 reply
  • 474 views
  • 0 likes
  • 2 in conversation