DATA Step, Macro, Functions and more

Merge is skipping few numeric ids that match in two datasets

Reply
New Contributor
Posts: 4

Merge is skipping few numeric ids that match in two datasets

Hello,

 

I am trying to merge two datasets based on a numeric Id and at the same time creating a match flag for them. I need all records from x1 data whereas only those records that match from x2. The code I am using is shown below.

 

data matches nonmatches;
merge x1(in=a) x2(in=b);
by id;
if a and b then id_flag = 'Y';
if a then output matches;
else output nonmatches;
run;

 

While I try to look up the resulting nonmatches again in the datasets, I see a few ids that match but have not been merged or flagged. Could someone please tell me where I am going wrong or any other way to do this.

 

Thank you in advance.

Super User
Posts: 11,343

Re: Merge is skipping few numeric ids that match in two datasets

Example data?

 

Do both of the data sets have multiple records with the same value of Id?  If so you'll only get matches for the set with the lowest number of duplicates.

 

It will help to show a small set of data from BOTH sets and what you expect for a result. You need not include all of the other variables than Id but 1 or 2 will help show which record goes with which.

New Contributor
Posts: 4

Re: Merge is skipping few numeric ids that match in two datasets

Thank you for the response.

Here is how the data looks like. It is a many to one merge. X1 data has a single entry for each id whereas X2 can have multiple entries with same id.

 

X1

Id

Name

Address

12

ABC

23 N W Ave

23

XYZ

1022 Rue De

34

PRAX

1 Mclean Street

45

AFHT

509 Rochester Ave

 

X2

 

Id

ManufactureId

Status

12

290498

Pending

12

290498

 

23

48

Active

23

48

Pending

45

2455

 

Super User
Super User
Posts: 7,039

Re: Merge is skipping few numeric ids that match in two datasets

Posting HTML tables of data is better than photographs of data.  But posting actual data step code that creates the dataset is much better for sharing with others.

data x1 ;
  length id 8 name $20 address $50 ;
  infile cards dsd dlm='|' truncover;
  input id name address;
cards;
12|ABC|23 N W Ave
23|XYZ|1022 Rue De
34|PRAX|1 Mclean Street
45|AFHT|509 Rochester Ave
;
 
data x2 ;
  length id 8 manufactureid 8 status $10 ;
  infile cards dsd dlm='|' truncover;
  input id manufactureid status;
cards;
12|290498|Pending
12|290498|.
23|48|Active
23|48|Pending
45|2455|.
;

Here is what SAS will do if you merge by ID.  I saved the IN= variable to permanent variables so we can see their status in the saved dataset.

data want;
  merge x1(in=a) x2(in=b);
  by id;
  in_x1=a;
  in_x2=b;
run;

Which rows do you want to output to which of your two tables?

Obs    id    name    address              manufactureid    status     in_x1    in_x2

 1     12    ABC     23 N W Ave               290498       Pending      1        1
 2     12    ABC     23 N W Ave               290498                    1        1
 3     23    XYZ     1022 Rue De                  48       Active       1        1
 4     23    XYZ     1022 Rue De                  48       Pending      1        1
 5     34    PRAX    1 Mclean Street               .                    1        0
 6     45    AFHT    509 Rochester Ave          2455                    1        1

 

PROC Star
Posts: 7,467

Re: Merge is skipping few numeric ids that match in two datasets

Can you supply some example data where you are noticing the problem you described. A solution will be dependent upon whether this is a one to one, one to many, or many to many merge.

 

Art, CEO, AnalystFinder.com

 

New Contributor
Posts: 4

Re: Merge is skipping few numeric ids that match in two datasets

Thank you for the response.

Here is how the data looks like. It is a many to one merge. X1 data has a single entry for each id whereas X2 can have multiple entries with same id.

X1

Id

Name

Address

12

ABC

23 N W Ave

23

XYZ

1022 Rue De

34

PRAX

1 Mclean Street

45

AFHT

509 Rochester Ave

 

X2

Id

ManufactureId

Status

12

290498

Pending

12

290498

 

23

48

Active

23

48

Pending

45

2455

 

PROC Star
Posts: 7,467

Re: Merge is skipping few numeric ids that match in two datasets

and what do you expect the two output datasets to look like? Your code appears to be functioning correctly.

 

Art, CEO, AnalystFinder.com

 

New Contributor
Posts: 4

Re: Merge is skipping few numeric ids that match in two datasets

The output should be X1 with X2 data merged columns. If there are duplicates due to 'Status' column in X2, i can deal with it later.

Output

Id

Name

Address

ManufactureId

Status

12

ABC

23 N W Ave

290498

Pending

23

XYZ

1022 Rue De

48

Active

34

PRAX

1 Mclean Street

 

 

45

AFHT

509 Rochester Ave

2455

 

Super User
Super User
Posts: 7,039

Re: Merge is skipping few numeric ids that match in two datasets

Why did you pick STATUS=ACTIVE for ID=23?  It also has a record where STATUS=PENDING.  

How should the program know which record to use?

PROC Star
Posts: 7,467

Re: Merge is skipping few numeric ids that match in two datasets

and, in addition to @Tom's question, what do you expect/want to see in the nonmatches dataset?

 

Art, CEO, AnalystFinder.com

 

Ask a Question
Discussion stats
  • 9 replies
  • 123 views
  • 0 likes
  • 4 in conversation