BookmarkSubscribeRSS Feed
Prudhvi_007
Calcite | Level 5

Hi Everyone. I m currently using SAS 9.4 version . I have to merge two tables A and B based on SUBJID and LABID. The SUBJID is of 10 characters and LABID is of 12 characters . My requirement is first I have to check if I can match SUBJID and first 12 digits of LABID. 

SUBJIDLABID
EQ98765599987653456986
MN65473829634528754100
OP32419877362567375709

 

SUBJIDLABID
EQ98765599987653456986
MN65473829634528754187
OP32419877362567375709

 

The subjects highlighted in green are exact match where as the subject highlighted in red are not exact match because only first 10 characters are matching in LABID and 11th ,12th characters are different in LABID , so they wont come in output. Is it possible to merge the data like if 12 digits of LABID are matching then output the data , if 12 digits of LABID are not matching then check if 10 digits of LABID are matching then output the data. 

5 REPLIES 5
s_lassen
Meteorite | Level 14

The message the @Ksharp referred to appears to be deleted, but my guess is that it contains a suggestion that you present test data as data step code, like this:

data have1;
length SUBJID LABID $20;
input subjid labid;
cards;
EQ98765599 987653456986
MN65473829 634528754100
OP32419877 1362567375709
;run;

Data have2;
length SUBJID LABID $20;
input subjid labid;
cards;
EQ98765599 987653456986
MN65473829 634528754187
OP32419877 1362567375709
;run;

I think the easiest way to make the match is to use SQL:

proc sql;
  create table want as select 
    have1.*,
    have2.labid as labid2, 
    case
      when have1.labid=have2.labid then 1
      else 0
    end as match
  from have1 join have2 on have1.subjid=have2.subjid and substr(have1.labid,1,10)=substr(have2.labid,1,10)
  ;

This will merge all the data on the criteria you mentioned. The variable MATCH will contain a 1 if the two LABIDs are identical, and otherwise a 0. If there are multiple occurences of each SUBJID and LABID in your data, you may want to add additional criteria, as you will otherwise get a cartesian product (all combinations of possible matches from the two tables).

Sajid01
Meteorite | Level 14

Hello @Prudhvi_007 
it is indeed possible to have an output of matching and non-matching observations. One can use either the SAS dataset merge or Proc SQL whichever one is comfortable with.

 

mkeintz
PROC Star

 

These questions are not answered by your sample data, but the answers would significantly impact the appropriate responses:

 

  1. Do you expect any many-to-many matches of SUBJID/12-digit-LABID?  If yes, then do you want to keep implied cartesian crossing of all SUBJID/12-digit matches?
  2. If you do not expect the above conditino, then are your 12-digit LABID's constructed in such a way that any of them share the first 10 digits?  If so, and assuming a given row is matched on SUBJID, but no 12-digit-LABID match, what do you want to do if there is a set of many-to-many matches on SUBJID/10-digit-LABID?

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Tom
Super User Tom
Super User

The clearest way is to make a new variable that has just the first 10 characters of the LABID.  Let's call it LABSUBJ.

data fix1;
  set one;
  labsubj = substr(labid,1,10);
run;

data fix2;
  set two;
  labsubj = substr(labid,1,10);
run;

data want;
  merge fix1 fix2(rename=(labid=labid2));
  by subjid labsubj;
  exact = labid = labid2;
run;

proc print;
run;

Result

Obs      SUBJID         LABID         labsubj         labid2       exact

 1     EQ98765599    987653456986    9876534569    987653456986      1
 2     MN65473829    634528754100    6345287541    634528754187      0
 3     OP32419877    362567375709    3625673757    362567375709      1

If there are multiple observations of for the same values of SUBJID and LABSUBJ in both datasets then you probably need to do a many to many join which will require SQL code instead.

proc sql ;
create table both as 
  select a.*
       , coalesce(substr(a.labid,1,10),substr(b.labid,1,10)) as LABSUBJ
       , b.labid as labid2
       , a.labid=b.labid as exact
  from one a 
  full join two b
  on a.subjid = b.subjid
  and substr(a.labid,1,10) eqt b.labid
  order by 1,2,3,4
;
quit;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1082 views
  • 2 likes
  • 6 in conversation