Saspert1947
Calcite | Level 5

Hi,

I think the new prxmatch function should work for this, but I don't know how to use it.

Dataset 1 has a variable that contains mutations of these types: M200V/I, M200M/A/N/K, M200V/K, M200M/K, M200I/K, K65K, K65K/A/B, K65J, M200Z

Dataset 2 has a list of mutations that we want to compare against:

M200V/I/K, K65A/B/C, C140J

When I compare the variable in dataset 1 to the list in dataset 2, every mutation in dataset 1 should find a match in dataset 2 except K65K, K65J, and M200Z.

(The position, M200 or K65, has to match. Beyond that, it is enough for any one of the letters after the digits to match.)

Thank you in advance.
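To make the rule concrete, here is a minimal sketch that checks three of the dataset 1 values against the single dataset 2 entry M200V/I/K. The pattern is written by hand for that one entry, and it assumes each variant is a single uppercase letter separated by '/'. The variable name hit is only illustrative.

```sas
/* Sketch only: the pattern is hand-written for the M200V/I/K entry. */
data _null_;
  length mut $100;
  do mut = 'M200V/I', 'M200M/A/N/K', 'M200Z';
    /* ^M200     fixes the position;
       [A-Z\/]*  skips over earlier letters and slashes;
       [VIK]     requires at least one of the listed letters. */
    hit = prxmatch('/^M200[A-Z\/]*[VIK]/', strip(mut)) > 0;
    put mut= hit=;
  end;
run;
```

Under those assumptions, the log should show hit=1 for the first two values and hit=0 for M200Z.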

1 ACCEPTED SOLUTION
Jagadishkatam
Amethyst | Level 16

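Could you please try the code below?

/* Dataset 1: observed mutations. code is the position prefix */
/* (letter plus digits), e.g. M200 from M200V/I.              */
data dataset1;
  input mut :$100.;
  code=prxchange('s/(\w\d+)(\w.*)/$1/',-1,compress(mut));
cards;
M200V/I
M200M/A/N/K
M200V/K
M200M/K
M200I/K
K65K
K65K/A/B
K65J
M200Z
;

/* One row per letter. The first '/' token still carries the prefix, */
/* so the regex strips it; later tokens pass through unchanged.      */
data dataset12;
  set dataset1;
  do i = 1 to countw(compress(mut),'/');
    letters=prxchange('s/(\w\d+)(\w.*)/$2/',-1,scan(compress(mut),i,'/'));
    output;
  end;
  drop i;
run;

/* Dataset 2: reference mutations, split the same way. */
data dataset2;
  input mut2 :$100.;
  code=prxchange('s/(\w\d+)(\w.*)/$1/',-1,compress(mut2));
cards;
M200V/I/K
K65A/B/C
C140J
;

data dataset22;
  set dataset2;
  do i = 1 to countw(compress(mut2),'/');
    letters=prxchange('s/(\w\d+)(\w.*)/$2/',-1,scan(compress(mut2),i,'/'));
    output;
  end;
  drop i;
run;

/* One reference row per position/letter pair. */
proc sort data=dataset22 nodupkey;
  by code letters;
run;

proc sort data=dataset12;
  by code letters;
run;

/* Keep dataset 1 letters that also appear in dataset 2 at the same position. */
data want;
  merge dataset12(in=a) dataset22(in=b);
  by code letters;
  if a and b;
run;

/* Collapse back to one row per matched mutation. */
proc sort data=want nodupkey;
  by mut;
run;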
Thanks,
Jag
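
Since the question asked about prxmatch, the same comparison could also be sketched as a PROC SQL join that builds one pattern per dataset 2 row. This is an untested alternative, not part of the solution above. It assumes dataset1 and dataset2 exist as created there (with mut, mut2, and code) and that every variant is a single uppercase letter. The table name want_prx is made up for illustration.

```sas
/* Sketch: build a pattern such as /^M200[A-Z\/]*[VIK]/ from each dataset2 row. */
proc sql;
  create table want_prx as
  select distinct a.mut
  from dataset1 as a, dataset2 as b
  where prxmatch(
          cats('/^', b.code, '[A-Z\/]*[',
               /* letters after the position prefix, slashes removed */
               compress(substr(b.mut2, length(b.code) + 1), '/'),
               ']/'),
          strip(a.mut)) > 0;
quit;
```

Because the pattern differs for every row pair, it is recompiled on each call, so the sort and merge approach is likely to scale better on large tables.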


3 REPLIES
Jagadishkatam
Amethyst | Level 16

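Could you please try the code below?

/* Dataset 1: observed mutations. code is the position prefix (e.g. M200). */
data dataset1;
  input mut :$100.;
  code=prxchange('s/(\w\d+)(\w.*)/$1/',-1,compress(mut));
cards;
M200V/I
M200M/A/N/K
M200V/K
M200M/K
M200I/K
K65K
K65K/A/B
K65J
M200Z
;

/* Dataset 2: reference mutations, keyed by the same position prefix. */
data dataset2;
  input mut2 :$100.;
  code=prxchange('s/(\w\d+)(\w.*)/$1/',-1,compress(mut2));
cards;
M200V/I/K
K65A/B/C
C140J
;

/* One reference row per position. */
proc sort data=dataset2 nodupkey;
  by code;
run;

proc sort data=dataset1;
  by code mut;
run;

/* Keep dataset 1 rows whose position also appears in dataset 2. */
data want;
  merge dataset1(in=a) dataset2(in=b);
  by code;
  if a and b;
run;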
Thanks,
Jag
Saspert1947
Calcite | Level 5

Thanks, Jag. Your code matches the two datasets by code, but how do you compare the letters after the code?
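
For a single pair of values, one way to compare the letters can be sketched as follows: strip the position prefix from both strings, drop the slashes, and test whether the two letter sets share any character. This assumes single-letter variants. The names letters1, letters2, and shared are only illustrative.

```sas
/* Sketch: do two mutation strings share at least one letter after the digits? */
data _null_;
  mut  = 'M200M/A/N/K';
  mut2 = 'M200V/I/K';
  /* $2 keeps everything after the leading letter and digits; compress drops '/'. */
  letters1 = compress(prxchange('s/^(\w\d+)(\w.*)$/$2/', 1, strip(mut)),  '/');
  letters2 = compress(prxchange('s/^(\w\d+)(\w.*)$/$2/', 1, strip(mut2)), '/');
  /* findc returns the position of the first character of letters1 found in letters2. */
  shared = findc(strip(letters1), strip(letters2)) > 0;
  put letters1= letters2= shared=;
run;
```

Here the two strings share only K, which is enough to count as a match under the rule in the question.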

