Merge is adding records when it should be a one to one match on ID

Reply
Occasional Contributor
Posts: 19

Merge is adding records when it should be a one to one match on ID

Hello,

 

Merge is adding records when it should be a one to one match on ID.  Need corrective action.  The merge started with 4796 records.  After the merge it has 7194 records

 

 

data osteodata;
merge subj enroll allcli00 allcli11 outcome ;
by ID;
run;

 

8896
8897 data osteodata;
8898 merge subj enroll allcli00 allcli11 outcome ;
8899 by ID;
8900 run;
 
NOTE: There were 4796 observations read from the data set WORK.SUBJ.
NOTE: There were 4796 observations read from the data set WORK.ENROLL.
NOTE: There were 4796 observations read from the data set WORK.ALLCLI00.
NOTE: There were 4796 observations read from the data set WORK.ALLCLI11.
NOTE: There were 4796 observations read from the data set WORK.OUTCOME.
NOTE: The data set WORK.OSTEODATA has 7194 observations and 2793 variables.
NOTE: DATA statement used (Total process time):
real time 2.08 seconds
cpu time 0.29 seconds
Grand Advisor
Posts: 17,338

Re: Merge is adding records when it should be a one to one match on ID

You must have duplicate IDs in files. Verify the uniqueness of each file. 

Here's a query that can help. 

 

Proc sql;

select id

from TABLE

group by ID

having count(ID)>1;

quit;

Occasional Contributor
Posts: 19

Re: Merge is adding records when it should be a one to one match on ID

I like the approach.  But how do i know which record to delete?  My thoughts are the record/row that is just a stem (ID) and no leave data.

Grand Advisor
Posts: 17,338

Re: Merge is adding records when it should be a one to one match on ID


fpascal wrote:

 But how do i know which record to delete?  My thoughts are the record/row that is just a stem (ID) and no leave data.


Verify the issue first, is it duplicate ID or mismatch or both? 

Then I would recommend a SQL join instead since you have a bit more control and TEST it thoroughly to make sure it gives you what you want. 

 

Do you want an inner join - where you only keep records that have a value in all tables, or do you want a left/right join where you keep all ID's from a specific table, or do you want all of the values, which it seems you don't. 

Occasional Contributor
Posts: 19

Re: Merge is adding records when it should be a one to one match on ID

There should be an ID in each table for each subject enrolled in the longintudal study.  We see that during the read of all the files.  see below

 

NOTE: There were 4796 observations read from the data set WORK.SUBJ.
NOTE: There were 4796 observations read from the data set WORK.ENROLL.
NOTE: There were 4796 observations read from the data set WORK.ALLCLI00.
NOTE: There were 4796 observations read from the data set WORK.ALLCLI11.
NOTE: There were 4796 observations read from the data set WORK.OUTCOME.

 

 

enrollees = when they became part of the corhort

 

subj = more data about the enrollee

 

allcli00 = baseline collection year 1

 

allcli11 = last collection of clinicals

 

outcome = summary results over 9 years

Grand Advisor
Posts: 17,338

Re: Merge is adding records when it should be a one to one match on ID


fpascal wrote:

There should be an ID in each table for each subject enrolled in the longintudal study.  We see that during the read of all the files.  see below

 

NOTE: There were 4796 observations read from the data set WORK.SUBJ.
NOTE: There were 4796 observations read from the data set WORK.ENROLL.
NOTE: There were 4796 observations read from the data set WORK.ALLCLI00.
NOTE: There were 4796 observations read from the data set WORK.ALLCLI11.
NOTE: There were 4796 observations read from the data set WORK.OUTCOME.

 


No. 

 

We do not know that the 4,796 ID's in SUBJ are the same as the 4,796 ID's in ENROLL. There could be different ID's. 

Or you could have multiple ID's of one value in one table and multiple ID's of a different value in a different table. 

All we know is that the number of records are the same. 

 

You need to verify why you're getting the extra records. I posted code that allows you to check for multiple ID's, did it show that any table had multiple ID's?

 

 

Occasional Contributor
Posts: 19

Re: Merge is adding records when it should be a one to one match on ID

Excellent point. The light bulb just came on Smiley Happy

I am keeping the join and work with the data it produced
Respected Advisor
Posts: 4,974

Re: Merge is adding records when it should be a one to one match on ID

Even if each data source has a single observation per ID, it is possible that there are mismatches.  Some IDs might appear in one source but not the others, increasing the total number of IDs across all sources.

Respected Advisor
Posts: 4,606

Re: Merge is adding records when it should be a one to one match on ID

You could also do

 

data osteodata_multiple;
merge subj enroll allcli00 allcli11 outcome ;
by ID;
if not (first.id and last.id);
run;
PG
Occasional Contributor
Posts: 19

Re: Merge is adding records when it should be a one to one match on ID

i ended up do the below.  does your solution work cleaner?  I will test your solution now

 

 

 

data osteodata ;
merge subj(in=a) enroll(in=b) allcli00(in=c) allcli11(in=d) outcome(in=e);
by ID;
if a and b and c and d and e;
run;

Occasional Contributor
Posts: 19

Re: Merge is adding records when it should be a one to one match on ID

i ended up do the below.  does your solution work cleaner?  I will test your solution now

 

data osteodata ;
merge subj(in=a) enroll(in=b) allcli00(in=c) allcli11(in=d) outcome(in=e);
by ID;
if a and b and c and d and e;
run;

Grand Advisor
Posts: 17,338

Re: Merge is adding records when it should be a one to one match on ID


fpascal wrote:

i ended up do the below.  does your solution work cleaner?  I will test your solution now

 

data osteodata ;
merge subj(in=a) enroll(in=b) allcli00(in=c) allcli11(in=d) outcome(in=e);
by ID;
if a and b and c and d and e;
run;


This is an inner join - all ID's must be in all tables. 

Occasional Contributor
Posts: 19

Re: Merge is adding records when it should be a one to one match on ID

please help with the join

Grand Advisor
Posts: 17,338

Re: Merge is adding records when it should be a one to one match on ID


fpascal wrote:

please help with the join


I was just trying to explain, your code is correct Smiley Happy

Respected Advisor
Posts: 4,974

Re: Merge is adding records when it should be a one to one match on ID

I would recommend doing it this way:

 

data osteodata ;
merge subj(in=a) enroll(in=b) allcli00(in=c) allcli11(in=d) outcome(in=e);
by ID;
matching_pattern = put(a, 1.) || put(b, 1.) || put(c, 1.) || put(d, 1.) || put(e, 1.);
run;

 

proc freq data=osteodata;

tables matching_pattern;

run;

 

This will give you a table showing the pattern of matches and mismatches, and may lead you down a path of fixing some of the IDs.

Ask a Question
Discussion stats
  • 15 replies
  • 866 views
  • 5 likes
  • 5 in conversation