BookmarkSubscribeRSS Feed
shantiv
Calcite | Level 5

Hello,

 

I am trying to merge two datasets based on a numeric Id and at the same time creating a match flag for them. I need all records from x1 data whereas only those records that match from x2. The code I am using is shown below.

 

data matches nonmatches;
merge x1(in=a) x2(in=b);
by id;
if a and b then id_flag = 'Y';
if a then output matches;
else output nonmatches;
run;

 

While I try to look up the resulting nonmatches again in the datasets, I see a few ids that match but have not been merged or flagged. Could someone please tell me where I am going wrong or any other way to do this.

 

Thank you in advance.

9 REPLIES 9
ballardw
Super User

Example data?

 

Do both of the data sets have multiple records with the same value of Id?  If so you'll only get matches for the set with the lowest number of duplicates.

 

It will help to show a small set of data from BOTH sets and what you expect for a result. You need not include all of the other variables than Id but 1 or 2 will help show which record goes with which.

shantiv
Calcite | Level 5

Thank you for the response.

Here is how the data looks like. It is a many to one merge. X1 data has a single entry for each id whereas X2 can have multiple entries with same id.

 

X1

Id

Name

Address

12

ABC

23 N W Ave

23

XYZ

1022 Rue De

34

PRAX

1 Mclean Street

45

AFHT

509 Rochester Ave

 

X2

 

Id

ManufactureId

Status

12

290498

Pending

12

290498

 

23

48

Active

23

48

Pending

45

2455

 

Tom
Super User Tom
Super User

Posting HTML tables of data is better than photographs of data.  But posting actual data step code that creates the dataset is much better for sharing with others.

data x1 ;
  length id 8 name $20 address $50 ;
  infile cards dsd dlm='|' truncover;
  input id name address;
cards;
12|ABC|23 N W Ave
23|XYZ|1022 Rue De
34|PRAX|1 Mclean Street
45|AFHT|509 Rochester Ave
;
 
data x2 ;
  length id 8 manufactureid 8 status $10 ;
  infile cards dsd dlm='|' truncover;
  input id manufactureid status;
cards;
12|290498|Pending
12|290498|.
23|48|Active
23|48|Pending
45|2455|.
;

Here is what SAS will do if you merge by ID.  I saved the IN= variable to permanent variables so we can see their status in the saved dataset.

data want;
  merge x1(in=a) x2(in=b);
  by id;
  in_x1=a;
  in_x2=b;
run;

Which rows do you want to output to which of your two tables?

Obs    id    name    address              manufactureid    status     in_x1    in_x2

 1     12    ABC     23 N W Ave               290498       Pending      1        1
 2     12    ABC     23 N W Ave               290498                    1        1
 3     23    XYZ     1022 Rue De                  48       Active       1        1
 4     23    XYZ     1022 Rue De                  48       Pending      1        1
 5     34    PRAX    1 Mclean Street               .                    1        0
 6     45    AFHT    509 Rochester Ave          2455                    1        1

 

art297
Opal | Level 21

Can you supply some example data where you are noticing the problem you described. A solution will be dependent upon whether this is a one to one, one to many, or many to many merge.

 

Art, CEO, AnalystFinder.com

 

shantiv
Calcite | Level 5

Thank you for the response.

Here is how the data looks like. It is a many to one merge. X1 data has a single entry for each id whereas X2 can have multiple entries with same id.

X1

Id

Name

Address

12

ABC

23 N W Ave

23

XYZ

1022 Rue De

34

PRAX

1 Mclean Street

45

AFHT

509 Rochester Ave

 

X2

Id

ManufactureId

Status

12

290498

Pending

12

290498

 

23

48

Active

23

48

Pending

45

2455

 

art297
Opal | Level 21

and what do you expect the two output datasets to look like? Your code appears to be functioning correctly.

 

Art, CEO, AnalystFinder.com

 

shantiv
Calcite | Level 5

The output should be X1 with X2 data merged columns. If there are duplicates due to 'Status' column in X2, i can deal with it later.

Output

Id

Name

Address

ManufactureId

Status

12

ABC

23 N W Ave

290498

Pending

23

XYZ

1022 Rue De

48

Active

34

PRAX

1 Mclean Street

 

 

45

AFHT

509 Rochester Ave

2455

 

Tom
Super User Tom
Super User

Why did you pick STATUS=ACTIVE for ID=23?  It also has a record where STATUS=PENDING.  

How should the program know which record to use?

art297
Opal | Level 21

and, in addition to @Tom's question, what do you expect/want to see in the nonmatches dataset?

 

Art, CEO, AnalystFinder.com

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 993 views
  • 0 likes
  • 4 in conversation