one index function question

Contributor
Posts: 66
X="GTTCACTAGCAACCTCAAACAGACACCATGGTGCACCTG"

a="-CCTCA"
b="A-CTCA"
c="AC-TCA"
d="ACC-CA"
e="ACCT-A"
f="ACCTC-"

Here "-" stands for any single letter, such as "G", "T" or "A".

I consider a, b, c, d, e and f all to match the pattern "ACCTCA".

How can I write an index function call that finds the position of a (or b, c, d, e or f) in X?

 

Thanks.



All Replies
Super Contributor
Posts: 251

Re: one index function question

I'm no great Perl maven, but using prxmatch will give you what you're after. Here's how I did it:

 

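data _null_;
  infile cards;
  /* the sequence to search; retain sets it once and keeps it for every row */
  retain x "GTTCACTAGCAACCTCAAACAGACACCATGGTGCACCTG";
  length pattern $ 9;
  input pattern;
  /* wrap the pattern in /.../ and get the position of the first match (0 if none) */
  match = prxmatch('/' || strip(pattern) || '/', x);
  put match=;
  cards;
.CCTCA
A.CTCA
AC.TCA
ACC.CA
ACCT.A
ACCTC.
;
run;

In Perl regular expressions, the wildcard for a single character is '.', not '-'.

When this code is run, the pattern is found at position 12 every time. If the pattern were not found, prxmatch would return 0, just as index does.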

Contributor
Posts: 66

Re: one index function question

Thanks a lot. I never used prxmatch function. I will try it.

Respected Advisor
Posts: 3,894

Re: one index function question

find() or index() don't allow for wildcard searches.

 

You could either use a Regular Expression or a Like operator in a where clause.

 

Do you only need the first position where one of your patterns matches? And do ALL patterns have to match, or only one of them?
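
As a rough illustration of the Like option, here is a minimal sketch. The data set names seqs and hits are made up for this example. In a where clause, "_" matches exactly one character and "%" matches any run of characters, so '%A_CTCA%' asks whether the string contains A, any one letter, then CTCA. It only answers yes or no and does not give you a position.

```sas
/* hypothetical sample data: one sequence per row */
data seqs;
  input string :$60.;
  datalines;
GTTCACTAGCAACCTCAAACAGACACCATGGTGCACCTG
GTTCACTAGCACACTCAAACAGACACCATGGTGCACCTG
;
run;

/* keep only the rows whose sequence contains A, one arbitrary letter, then CTCA */
data hits;
  set seqs;
  where string like '%A_CTCA%';
run;
```

To cover all six variants you would have to combine six like conditions with or, which is why a regular expression is probably the tidier choice here.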

Super User
Posts: 17,840

Re: one index function question

Also, look into COMPGED, COMPLEV and similar fuzzy-matching functions.

I'm not sure they'll give you enough control, though.
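
For what it's worth, here is an untested sketch of how COMPLEV might be applied, under my own assumptions: slide a window of the pattern's length along X and take the first window whose Levenshtein distance from "ACCTCA" is at most 1. The variable names pos and i are just for illustration. Since the window and the pattern have the same length, a distance of 1 should correspond to a single substituted letter.

```sas
data _null_;
  x = "GTTCACTAGCAACCTCAAACAGACACCATGGTGCACCTG";
  pattern = "ACCTCA";
  pos = 0;
  /* check each window of length(pattern); stop at the first close match */
  do i = 1 to length(x) - length(pattern) + 1 until (pos > 0);
    if complev(substr(x, i, length(pattern)), pattern) <= 1 then pos = i;
  end;
  /* pos stays 0 when no window is within one edit of the pattern */
  put pos=;
run;
```

This calls COMPLEV once per window, so for long sequences or many rows a regular expression is likely to be simpler and faster.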

Contributor
Posts: 66

Re: one index function question

X="GTTCACTAGCAACCTCAAACAGACACCATGGTGCACCTG"
pattern="ACCTCA" or pattern="A-CTCA"

Actually, what I want to check is whether X contains the pattern.
I found that I cannot test this directly as a logical condition,
so I thought I could use the index() function:

if the result is > 0, X contains the pattern;

if the result is 0, X does not contain the pattern.

If someone can suggest a better method, that would be great.

Solution
‎10-03-2016 09:12 AM
Respected Advisor
Posts: 3,894

Re: one index function question

You say "pattern", I think RegEx. Below is a variation of what has already been proposed.

data sample;
  input string :$60.;
  /* '.' matches any single character; 'o' compiles the regex once, 'i' ignores case */
  match_pos = prxmatch('/(.CCTCA|A.CTCA|AC.TCA|ACC.CA|ACCT.A|ACCTC.)/oi', string);
  /* 1 if any of the six variants was found, 0 otherwise */
  match_flg = match_pos > 0;
  datalines;
GTTCACTAGCAACCTCAAACAGACACCATGGTGCACCTG
GTTCACTAGCACACTCAAACAGACACCATGGTGCACCTG
;
run;
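
If the base pattern changes often, typing out every variant by hand gets tedious. The following is only a sketch of one way the alternation could be generated instead; the variable names base, variant and regex are my own, and it assumes the base pattern contains no regex metacharacters.

```sas
data _null_;
  length regex $ 200 variant $ 20;
  base = "ACCTCA";
  x = "GTTCACTAGCAACCTCAAACAGACACCATGGTGCACCTG";
  regex = '';
  /* replace each position in turn with '.', the single-character wildcard */
  do i = 1 to length(base);
    variant = base;
    substr(variant, i, 1) = '.';
    regex = catx('|', regex, variant);
  end;
  /* wrap the alternatives as /(...)/i, case-insensitive like the code above */
  regex = cats('/(', regex, ')/i');
  match_pos = prxmatch(strip(regex), x);
  put regex= match_pos=;
run;
```

Because the regex is built at run time, the 'o' option is left out here. If you use this on many rows, it may be worth building the regex once and compiling it with prxparse.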
Contributor
Posts: 66

Re: one index function question

Thanks a lot. It is very helpful.

☑ This topic is SOLVED.
