Solved: Re: finding multiple positions of the same phrase in a long string

aiannone · Posted 01-22-2021 09:28 AM

Hello! I have been stuck on this issue for a long time and would be super grateful if anyone could help!

I have a variable that contains a long string of test results: "blood-neg, blood-neg, blood-pos, serum-neg, blood-pos, serum-neg, blood-pos, blood-neg, blood-pos"

In this string, I am specifically looking for the position of the phrase "serum-neg, blood-pos".

I have been using this code to find the position:

result=findw(newvar_test, 'serum-neg, blood-pos', ' ' , 'E');

In the example above, the result I get is 4, which is correct! However, I would also like to get a result of 6, since that is the position of the next occurrence of this phrase.

No matter what I do, I am unable to figure this out. It is also tricky because the dataset contains thousands of observations and all of them have different test result variable lengths and orders.

Is there a way that I can get SAS to return all positions of the same phrase among this large string of test results?

Thanks!

FreelanceReinh · Posted 01-22-2021 11:59 AM

@aiannone wrote:

(...) I have a variable that contains a long string of test results: "blood-neg, blood-neg, blood-pos, serum-neg, blood-pos, serum-neg, blood-pos, blood-neg, blood-pos"

In this string, I am specifically looking for the position of the phrase "serum-neg, blood-pos".

I have been using this code to find the position:

result=findw(newvar_test, 'serum-neg, blood-pos', ' ' , 'E');

In the example above, the result I get is 4, which is correct! However, I would also like to get a result of 6, since that is the position of the next occurrence of this phrase. (...)

Hello @aiannone,

I think even to get result=4 you would need to add the comma to the list of word delimiters:

result=findw(newvar_test, 'serum-neg, blood-pos', ' ,' , 'E');

But I would rather switch from FINDW to CALL PRXNEXT because that CALL routine gives you more flexibility in defining the search phrase. For example, there might be blanks or other white-space characters around the comma between "serum-neg" and "blood-pos" -- which you likely want to ignore -- or perhaps no blanks. The Perl regular expression used below is also case-insensitive (cf. the "i" modifier of FINDW).

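/* Create sample data */

data have;
  do id=1 to 2;
    newvar_test="blood-neg, blood-neg, blood-pos, Serum-NEG,blood-Pos, serum-neg ,   blood-pos, blood-neg, blood-pos";
    output;
  end;
run;

/* Find positions of phrase */

data want(drop=_:);
  set have;
  /* Compile the regex once; the sum statement retains _rid across observations */
  if _n_=1 then _rid+prxparse('/serum-neg\s*,\s*blood-pos/i');
  _s=1; /* search start position; CALL PRXNEXT advances it past each match */
  call prxnext(_rid, _s, -1, newvar_test, _p, _l);
  do while(_p); /* _p=0 means no further match */
    /* Item number = number of commas before the match + 1.
       SUBSTRN, unlike SUBSTR, accepts a length of 0 (match at position 1). */
    result=countc(substrn(newvar_test,1,_p-1),',')+1;
    output; /* one observation per match */
    call prxnext(_rid, _s, -1, newvar_test, _p, _l);
  end;
run;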

Ideally, however, you would receive those blood test data in the form

Obs    ID            dt              test

 1      1    20JAN2021:07:00:00    blood-neg
 2      1    21JAN2021:07:00:00    blood-neg
 3      1    22JAN2021:07:00:00    blood-pos
 ...   ...   ...                   ...

so that you wouldn't need to rely on counting commas in long strings etc. to determine sequence numbers of measurements.
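
If the data cannot be delivered in that form, the string could be split into one observation per result first. The following is only a sketch under assumptions: it reuses dataset HAVE from above, assumes ID identifies exactly one string, and the names LONG, WANT_LONG, SEQ and the length $20 are made up for illustration. There is no DT variable because the string carries no timestamps.

```sas
/* One observation per test result */
data long(drop=newvar_test);
  set have;
  length test $20;                     /* assumed maximum length of one result */
  do seq=1 to countw(newvar_test, ',');
    /* SCAN splits at commas only; STRIP and LOWCASE normalize blanks and case */
    test=lowcase(strip(scan(newvar_test, seq, ',')));
    output;
  end;
run;

/* Look-ahead merge: pair each observation with the next one and keep
   every "serum-neg" that is directly followed by "blood-pos" in the same ID */
data want_long;
  merge long
        long(firstobs=2 keep=id test rename=(id=_next_id test=_next_test));
  if test='serum-neg' and _next_id=id and _next_test='blood-pos';
  result=seq;
  drop _:;
run;
```

With the sample data, this should give RESULT values 4 and 6 for each ID, matching the CALL PRXNEXT approach, but without counting commas at character positions.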

data_null__ · Posted 01-22-2021 11:07 AM

The FINDW documentation describes the modifier you are using as follows:

e or E

counts the words that are scanned until the specified word is found, instead of determining the character position of the specified word in the string. Fragments of a word are not counted.
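
A small sketch to illustrate the difference (the string and the variable names S, CHARPOS and WORDPOS are made up for this example): without the modifier, FINDW returns a character position; with "E" it returns a word count.

```sas
data _null_;
  s='blood-neg, blood-pos, serum-neg';
  /* Blank and comma are both treated as word delimiters */
  charpos=findw(s, 'serum-neg', ' ,');       /* character position of the word */
  wordpos=findw(s, 'serum-neg', ' ,', 'E');  /* number of the word in the string */
  put charpos= wordpos=;                     /* expected: charpos=23 wordpos=3 */
run;
```

Either way, FINDW reports only one occurrence per call, which is why repeated matches call for a loop such as the one in the CALL PRXNEXT solution.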

aiannone · Posted 01-25-2021 02:36 PM

@FreelanceReinh THANK YOU SO MUCH! That worked! I am VERY appreciative!
