Help using Base SAS procedures

Merge (conditional?) with markers?

Reply
Contributor
Posts: 35

Merge (conditional?) with markers?

I normally use proc sql for merging tables.  What I have is 2 tables that I want to merge together.  Both tables have 2 IDs (call them ID1 and ID2).  Because of time differences, etc. neither ID is 100% accurate at matching.

What I'd like is to do a left merge merging Table 2 onto Table 1 using ID1 and create a new variable equal to 1 if a match was found using ID1.  After that, I'd like to do a second left merge of Table 2 onto Table 1 using ID2, but only if a match wasn't found using ID1 (and create a second new variable equal to 1 if a match was found using ID2).  Although not necessary, it would also be nice if there were a way to not actually do a merge, but have a third new variable equal to 1 if a match was found using ID2 but wasn't performed because a match had already been found using ID1.

Here's an example:

Table 1:

ID1ID2Data1
1ab80
2.100
.b55
5x10

Table 2:

ID1ID2Data2
.a12
1ab14
4b15
5.22

WANTED RESULT TABLE:

ID1ID2Data1Data2Merged on ID1Merged on ID2Match with ID2 but not merged
1ab8014101
2.100.000
.b5515010
5x1022100
Trusted Advisor
Posts: 3,214

Re: Merge (conditional?) with markers?

There are a lot of differences between a SQL and SAS datastep approach.

Using a SAS datastep Merge SAS(R) 9.3 Statements: Reference and the IN= dataset option SAS(R) 9.3 Data Set Options: Reference is given you quite a lot of requested logic.

What you are really wanting with matched/unmatched observations or possible using a table-lookup on the additional identifier is not clear.

As you are possible in a data-cleansing stage it could you are also not knowing all details. By that you need to investigate.

---->-- ja karman --<-----
Super User
Posts: 7,795

Re: Merge (conditional?) with markers?

This gets you your required result:

data table1;

input ID1 ID2:$ Data1;

cards;

1 ab 80

2 . 100

. b 55

5 x 10

;

run;

data table2;

input ID1 ID2:$ Data2;

cards;

. a 12

1 ab 14

4 b 15

5 . 22

;

run;

proc sort data=table1;

by ID1;

run;

proc sort data=table2;

by ID1;

run;

data

  merge1 (keep=ID1 ID2 data1 data2 var1 var3)

  rest1 (keep=ID1 ID2 data1)

  rest2 (keep=ID1 ID21 data2 rename=(ID21=ID2))

;

merge

  table1 (in=a)

  table2 (

    in=b

    rename=(ID2=ID21)

  )

;

by ID1;

if a and b and ID1 ne .

then do;

  var1 = 1;

  if ID2 = ID21

  then var3 = 1;

  else var3 = 0;

  output merge1;

end;

else do;

  var1 = 0;

  if a

  then output rest1;

  if b

  then do;

    ID2 = ID21;

    output rest2;

  end;

end;

run;

proc sort data=rest1;

by ID2;

run;

proc sort data=rest2;

by ID2;

run;

data

  merge2 (drop=ID11)

  rest3 (drop=ID11)

;

merge

  rest1 (in=a)

  rest2 (in=b rename=(ID1=ID11))

;

by ID2;

if a and b

then do;

  var2 = 1;

  output merge2;

end;

else if ID1 ne .

then do;

  if a then output rest3;

  else do;

    ID1 = ID11;

    output rest3;

  end;

end;

run;

data result;

format ID1 ID2 data1 data2 var1 var2 var3; *puts the vars in order;

set merge1 merge2 rest3;

if var1 = . then var1 = 0;

if var2 = . then var2 = 0;

if var3 = . then var3 = 0;

run;

var1 = matched by ID1

var2 = matched by ID2;

var3 = ID2 would have matched, but ID1 took preference

var2 =

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 5,509

Re: Merge (conditional?) with markers?

While you could probably achieve this with hashing, my inclination is to use formats instead.  This solution will fail if Table 2 contains duplicate values for either ID1 or ID2, but then you would expect that to be a problem case.  This code assumes that ID1 and ID2 are both character, although that issue can be circumvented if necessary.

First, create two formats from Table 2:

data table2_revised;

  set table2 end=done;

  length label $ 9;

  label=data2;

  fmtname='$id1fmt';

  start=id1;

  output;

  start=id2;

  output;

  if done;

  hlo='O';

  label='Not Found';

  fmtname='$id1fmt';

  output;

  fmtname='$id2fmt';

  output;

run;

proc sort data=table2_revised;

  by fmtname;

run;

proc format cntlin=table2_revised;

run;

This gives you 2 formats to work with, $id1fmt and $id2fmt, which translate ID1 and ID2 respectively into the matching value for DATA2.  There might be a little more complication if DATA 2 is numeric ... it might need to be converted to character along these lines:

label = put(data2, best9.);

At any rate, once you have the formats, the hard part is over.  You can use the formats along these lines:

data want;

  set table1;

  match_from_id1 = put(id1, $id1fmt.);

  match_from_id2 = put(id2, $id2fmt.);

run;

Then you can play with the two new variables in any way you see fit.

Good luck.

Ask a Question
Discussion stats
  • 3 replies
  • 324 views
  • 0 likes
  • 4 in conversation