Hi,
I think the new prxmatch should work here, but I don't know how to use it.
Dataset 1 has a variable with these types of mutations: M200V/I, M200M/A/N/K, M200V/K, M200M/K, M200I/K, K65K, K65K/A/B, K65J, M200Z
Dataset 2 has a list of mutations that we want to compare against:
M200V/I/K, K65A/B/C, C140J
When I compare the variable in dataset 1 to the list in dataset 2, every mutation in dataset 1 should find a match in dataset 2 except K65K, K65J, and M200Z.
(The code has to match, M200 or K65; after that, we just need any of the letters after the digits to match.)
Thank you in advance.
Could you please try the code below?

/* Sample data: extract the code (letter + digits, e.g. M200) from each mutation */
data dataset1;
  input mut :$100.;
  code=prxchange('s/(\w\d+)(\w.*)/$1/',-1,compress(mut));
cards;
M200V/I
M200M/A/N/K
M200V/K
M200M/K
M200I/K
K65K
K65K/A/B
K65J
M200Z
;

/* Reference list: extract the code the same way */
data dataset2;
  input mut2 :$100.;
  code=prxchange('s/(\w\d+)(\w.*)/$1/',-1,compress(mut2));
cards;
M200V/I/K
K65A/B/C
C140J
;

/* One row per code in the reference list */
proc sort data=dataset2 nodupkey;
  by code;
run;

proc sort data=dataset1;
  by code mut;
run;

/* Keep mutations whose code appears in both datasets */
data want;
  merge dataset1(in=a) dataset2(in=b);
  by code;
  if a and b;
run;
Thanks Jag, your code matches the two datasets by code, but how do you compare the letters after the code?
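Could you please try the one below?

/* Sample data: extract the code (letter + digits, e.g. M200) from each mutation */
data dataset1;
  input mut :$100.;
  code=prxchange('s/(\w\d+)(\w.*)/$1/',-1,compress(mut));
cards;
M200V/I
M200M/A/N/K
M200V/K
M200M/K
M200I/K
K65K
K65K/A/B
K65J
M200Z
;

/* One row per letter. The first '/'-separated piece still carries the code */
/* (e.g. M200V), so the regex strips it; later pieces (e.g. I) do not match */
/* the pattern and are returned unchanged.                                  */
data dataset12;
  set dataset1;
  do i = 1 to countw(compress(mut),'/');
    letters=prxchange('s/(\w\d+)(\w.*)/$2/',-1,scan(compress(mut),i,'/'));
    output;
  end;
  drop i;
run;

/* Reference list: extract the code the same way */
data dataset2;
  input mut2 :$100.;
  code=prxchange('s/(\w\d+)(\w.*)/$1/',-1,compress(mut2));
cards;
M200V/I/K
K65A/B/C
C140J
;

/* Reference list, also split into one row per letter */
data dataset22;
  set dataset2;
  do i = 1 to countw(compress(mut2),'/');
    letters=prxchange('s/(\w\d+)(\w.*)/$2/',-1,scan(compress(mut2),i,'/'));
    output;
  end;
  drop i;
run;

/* One row per code/letter pair in the reference list */
proc sort data=dataset22 nodupkey;
  by code letters;
run;

proc sort data=dataset12;
  by code letters;
run;

/* Keep only code/letter pairs that exist in both datasets */
data want;
  merge dataset12(in=a) dataset22(in=b);
  by code letters;
  if a and b;
run;

/* Collapse back to one row per matching mutation */
proc sort data=want nodupkey;
  by mut;
run;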
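
If you prefer SQL, the last three steps (the two sorts before the merge, the merge itself, and the final nodupkey sort) could probably be collapsed into one join. This is only a sketch, not a tested replacement: it assumes dataset12 and dataset22 have already been created by the steps above, and the output name want_sql is just a placeholder.

```sas
/* Sketch only: assumes dataset12 and dataset22 from the steps above exist. */
/* want_sql is a placeholder name, not part of the original code.           */
proc sql;
  create table want_sql as
  select distinct a.mut            /* one row per matching mutation */
  from dataset12 as a
       inner join dataset22 as b
       on  a.code    = b.code      /* same code, e.g. M200 or K65   */
       and a.letters = b.letters   /* and at least one shared letter */
  order by a.mut;
quit;
```

With the sample data this should leave out K65K, K65J and M200Z, because none of their code/letter pairs appear in dataset22.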