Saspert1947
Calcite | Level 5

Hi,

I think PRXMATCH should work for this, but I don't know how to set it up.

Dataset 1 has a variable containing mutations like these: M200V/I, M200M/A/N/K, M200V/K, M200M/K, M200I/K, K65K, K65K/A/B, K65J, M200Z

Dataset 2 has a list of mutations that we want to compare against:

M200V/I/K, K65A/B/C, C140J

When I compare the variable in dataset 1 to the list in dataset 2, every mutation in dataset 1 should find a match in dataset 2 except K65K, K65J, and M200Z.

(The code, such as M200 or K65, has to match first. After that, it is enough for any one of the letters after the digits to match.)

Thank you in advance.

Accepted Solutions
Jagadishkatam
Amethyst | Level 16

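Could you please try the code below?

/* Dataset 1: CODE is the part before the letters, e.g. M200 or K65 */
data dataset1;
input mut :$100.;
code=prxchange('s/(\w\d+)(\w.*)/$1/',-1,compress(mut));
cards;
M200V/I
M200M/A/N/K
M200V/K
M200M/K
M200I/K
K65K
K65K/A/B
K65J
M200Z
;
run;

/* One row per letter: M200V/I becomes (M200, V) and (M200, I).        */
/* The first piece still carries the code, so $2 strips it off; later  */
/* pieces do not match the pattern and are returned unchanged.         */
data dataset12;
set dataset1;
do i = 1 to countw(compress(mut),'/');
letters=prxchange('s/(\w\d+)(\w.*)/$2/',-1,scan(compress(mut),i,'/'));
output;
end;
drop i;
run;

/* Dataset 2: the reference list, prepared the same way */
data dataset2;
input mut2 :$100.;
code=prxchange('s/(\w\d+)(\w.*)/$1/',-1,compress(mut2));
cards;
M200V/I/K
K65A/B/C
C140J
;
run;

data dataset22;
set dataset2;
do i = 1 to countw(compress(mut2),'/');
letters=prxchange('s/(\w\d+)(\w.*)/$2/',-1,scan(compress(mut2),i,'/'));
output;
end;
drop i;
run;

/* The reference side must be unique per code/letter pair for the merge */
proc sort data=dataset22 nodupkey;
by code letters;
run;

proc sort data=dataset12;
by code letters;
run;

/* Keep only the code/letter pairs that appear in both datasets */
data want;
merge dataset12(in=a) dataset22(in=b);
by code letters;
if a and b;
run;

/* Collapse back to one row per matched mutation */
proc sort data=want nodupkey;
by mut;
run;

Thanks,
Jag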
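
Since the question mentions PRXMATCH, here is a minimal sketch of how a single pattern could express the same rule. It is only an illustration under stated assumptions, not part of the code above: the pattern is written by hand from the three dataset 2 entries (M200V/I/K, K65A/B/C, C140J), and want_prx is a hypothetical dataset name.

```sas
/* Sketch only: hand-built pattern, hypothetical output name want_prx */
data want_prx;
  input mut :$100.;
  /* Pattern reads as: the code, then any number of "letter/" pairs,   */
  /* then one of the listed letters, then another "/" or end of value. */
  /* strip() removes trailing blanks so that $ can match at the end.   */
  if prxmatch('/^(M200([A-Z]\/)*[VIK]|K65([A-Z]\/)*[ABC]|C140([A-Z]\/)*J)(\/|$)/',
              strip(mut)) > 0;
  datalines;
M200V/I
M200M/A/N/K
M200V/K
M200M/K
M200I/K
K65K
K65K/A/B
K65J
M200Z
;
run;
```

With the sample values, this should keep every row except K65K, K65J, and M200Z. For a longer or changing list in dataset 2, the merge approach is probably easier to maintain than a hand-written pattern.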

Jagadishkatam
Amethyst | Level 16

Could you please try the code below?

/* CODE is the part of each mutation before the letters, e.g. M200 or K65 */
data dataset1;
input mut :$100.;
code=prxchange('s/(\w\d+)(\w.*)/$1/',-1,compress(mut));
cards;
M200V/I
M200M/A/N/K
M200V/K
M200M/K
M200I/K
K65K
K65K/A/B
K65J
M200Z
;
run;

data dataset2;
input mut2 :$100.;
code=prxchange('s/(\w\d+)(\w.*)/$1/',-1,compress(mut2));
cards;
M200V/I/K
K65A/B/C
C140J
;
run;

proc sort data=dataset2 nodupkey;
by code;
run;

proc sort data=dataset1;
by code mut;
run;

/* Match on CODE only; the letters are not compared in this step */
data want;
merge dataset1(in=a) dataset2(in=b);
by code;
if a and b;
run;

Thanks,
Jag

Saspert1947
Calcite | Level 5

Thanks, Jag. Your code matches the two datasets by code, but how do I compare the letters after the code?
