Hi Everyone. I m currently using SAS 9.4 version . I have to merge two tables A and B based on SUBJID and LABID. The SUBJID is of 10 characters and LABID is of 12 characters . My requirement is first I have to check if I can match SUBJID and first 12 digits of LABID.
SUBJID | LABID |
EQ98765599 | 987653456986 |
MN65473829 | 634528754100 |
OP32419877 | 362567375709 |
SUBJID | LABID |
EQ98765599 | 987653456986 |
MN65473829 | 634528754187 |
OP32419877 | 362567375709 |
The subjects highlighted in green are exact match where as the subject highlighted in red are not exact match because only first 10 characters are matching in LABID and 11th ,12th characters are different in LABID , so they wont come in output. Is it possible to merge the data like if 12 digits of LABID are matching then output the data , if 12 digits of LABID are not matching then check if 10 digits of LABID are matching then output the data.
The message the @Ksharp referred to appears to be deleted, but my guess is that it contains a suggestion that you present test data as data step code, like this:
data have1;
length SUBJID LABID $20;
input subjid labid;
cards;
EQ98765599 987653456986
MN65473829 634528754100
OP32419877 1362567375709
;run;
Data have2;
length SUBJID LABID $20;
input subjid labid;
cards;
EQ98765599 987653456986
MN65473829 634528754187
OP32419877 1362567375709
;run;
I think the easiest way to make the match is to use SQL:
proc sql;
create table want as select
have1.*,
have2.labid as labid2,
case
when have1.labid=have2.labid then 1
else 0
end as match
from have1 join have2 on have1.subjid=have2.subjid and substr(have1.labid,1,10)=substr(have2.labid,1,10)
;
This will merge all the data on the criteria you mentioned. The variable MATCH will contain a 1 if the two LABIDs are identical, and otherwise a 0. If there are multiple occurences of each SUBJID and LABID in your data, you may want to add additional criteria, as you will otherwise get a cartesian product (all combinations of possible matches from the two tables).
Hello @Prudhvi_007
it is indeed possible to have an output of matching and non-matching observations. One can use either the SAS dataset merge or Proc SQL whichever one is comfortable with.
These questions are not answered by your sample data, but the answers would significantly impact the appropriate responses:
The clearest way is to make a new variable that has just the first 10 characters of the LABID. Let's call it LABSUBJ.
data fix1;
set one;
labsubj = substr(labid,1,10);
run;
data fix2;
set two;
labsubj = substr(labid,1,10);
run;
data want;
merge fix1 fix2(rename=(labid=labid2));
by subjid labsubj;
exact = labid = labid2;
run;
proc print;
run;
Result
Obs SUBJID LABID labsubj labid2 exact 1 EQ98765599 987653456986 9876534569 987653456986 1 2 MN65473829 634528754100 6345287541 634528754187 0 3 OP32419877 362567375709 3625673757 362567375709 1
If there are multiple observations of for the same values of SUBJID and LABSUBJ in both datasets then you probably need to do a many to many join which will require SQL code instead.
proc sql ;
create table both as
select a.*
, coalesce(substr(a.labid,1,10),substr(b.labid,1,10)) as LABSUBJ
, b.labid as labid2
, a.labid=b.labid as exact
from one a
full join two b
on a.subjid = b.subjid
and substr(a.labid,1,10) eqt b.labid
order by 1,2,3,4
;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.