
finding multiple positions of the same phrase in a long string

aiannone — Fri, 22 Jan 2021 14:28:13 GMT

Hello! I have been stuck on this issue for a long time and would be super grateful if anyone could help!

I have a variable that contains a long string of test results: "blood-neg, blood-neg, blood-pos, serum-neg, blood-pos, serum-neg, blood-pos, blood-neg, blood-pos"

In this string, I am specifically looking to find the position of the phrase "serum-neg, blood-pos".

I have been using this code to find the position:

result=findw(newvar_test, 'serum-neg, blood-pos', ' ' , 'E');

In the example above, the result I get is 4, which is correct! However, I would also like to get a result of 6, since that is the position of the next occurrence of this phrase.

No matter what I do, I am unable to figure this out. It is also tricky because the dataset contains thousands of observations, and the test result strings differ in length and order from one observation to the next.

Is there a way that I can get SAS to return all positions of the same phrase among this large string of test results?

Thanks!

Re: finding multiple positions of the same phrase in a long string

data_null__ — Fri, 22 Jan 2021 16:07:52 GMT

e or E: counts the words that are scanned until the specified word is found, instead of determining the character position of the specified word in the string. Fragments of a word are not counted.
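To make the difference concrete, here is a minimal sketch (not from the original posts) that calls FINDW on a short made-up string, once without and once with the E modifier. The variable names s, char_pos and word_no are only illustrative, and both a comma and a blank are assumed as word delimiters.

```sas
data _null_;
  length s $60;
  s = 'blood-neg, blood-pos, serum-neg';

  /* No modifier: FINDW returns the character position of the word */
  char_pos = findw(s, 'serum-neg', ', ');

  /* E modifier: FINDW returns the number of the word instead */
  word_no  = findw(s, 'serum-neg', ', ', 'E');

  /* Write both values to the log for comparison */
  put char_pos= word_no=;
run;
```

The second call is the kind that yields a word number such as the 4 in the original question. Either way, FINDW reports only the first match, which is why it cannot return the later occurrences by itself.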

Re: finding multiple positions of the same phrase in a long string

FreelanceReinh — Fri, 22 Jan 2021 16:59:00 GMT

@aiannone wrote:

(...) I have a variable that contains a long string of test results: "blood-neg, blood-neg, blood-pos, serum-neg, blood-pos, serum-neg, blood-pos, blood-neg, blood-pos"

In this string, I am specifically looking to find the position of the phrase "serum-neg, blood-pos".

I have been using this code to find the position:

result=findw(newvar_test, 'serum-neg, blood-pos', ' ' , 'E');

In the example above, the result I get is 4, which is correct! However, I would also like to get a result of 6, since that is the position of the next occurrence of this phrase. (...)

Hello @aiannone,

I think even to get result=4 you would need to add the comma to the list of word delimiters:

result=findw(newvar_test, 'serum-neg, blood-pos', ' ,' , 'E');

But I would rather switch from FINDW to CALL PRXNEXT because that CALL routine gives you more flexibility in defining the search phrase. For example, there might be blanks or other white-space characters around the comma between "serum-neg" and "blood-pos" -- which you likely want to ignore -- or perhaps no blanks. The Perl regular expression used below is also case-insensitive (cf. the "i" modifier of FINDW).

/* Create sample data */

data have;
do id=1 to 2;
  newvar_test="blood-neg, blood-neg, blood-pos, Serum-NEG,blood-Pos, serum-neg ,   blood-pos, blood-neg, blood-pos";
  output;
end;
run;

/* Find positions of phrase */

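data want(drop=_:);
set have;
/* Compile the regex once; the sum statement keeps _rid across observations */
if _n_=1 then _rid+prxparse('/serum-neg\s*,\s*blood-pos/i');
_s=1; /* start searching at the first character */
call prxnext(_rid, _s, -1, newvar_test, _p, _l); /* -1: search to the end of the string */
do while(_p); /* _p=0 once no further match is found */
  /* Sequence number of the match = number of commas before it, plus 1. */
  /* SUBSTRN (unlike SUBSTR) accepts length 0, i.e. a match at position 1. */
  result=countc(substrn(newvar_test,1,_p-1),',')+1;
  output; /* one observation per match */
  call prxnext(_rid, _s, -1, newvar_test, _p, _l);
end;
run;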
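
The step above writes one observation per match. If a single observation per input row is preferred, the same loop can collect the positions in a character variable instead. The following is only a sketch of that variation: the dataset name want_list, the variable positions and its length of 200 are assumptions for illustration, not part of the original answer.

```sas
data want_list(drop=_:);
  set have;
  length positions $200;   /* assumed long enough for all matches */
  if _n_=1 then _rid+prxparse('/serum-neg\s*,\s*blood-pos/i');
  _s=1;
  call prxnext(_rid, _s, -1, newvar_test, _p, _l);
  do while(_p);
    /* Append the sequence number of this match to the list */
    positions=catx(',', positions,
                   countc(substrn(newvar_test,1,_p-1),',')+1);
    call prxnext(_rid, _s, -1, newvar_test, _p, _l);
  end;
run;
```

For the sample data in HAVE, positions should come out as 4,6 in each observation. Rows without any match keep a blank positions value rather than being dropped, which may or may not be what you want.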

Ideally, however, you would receive those blood test data in the form

Obs    ID            dt              test

 1      1    20JAN2021:07:00:00    blood-neg
 2      1    21JAN2021:07:00:00    blood-neg
 3      1    22JAN2021:07:00:00    blood-pos
 ...   ...   ...                   ...

so that you wouldn't need to rely on counting commas in long strings etc. to determine sequence numbers of measurements.
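
If the data cannot be delivered in that form, the strings can be split into one observation per test result first. The sketch below assumes that HAVE is sorted by a unique id, that a comma always separates two results, and that the order within the string stands in for the missing datetime. The names long, want2, seq and _prev are made up for this illustration.

```sas
/* Split each string into one observation per test result */
data long(keep=id seq test);
  set have;
  length test $20;
  do seq=1 to countw(newvar_test, ',');
    /* Normalize case and remove blanks around each result */
    test=lowcase(strip(scan(newvar_test, seq, ',')));
    output;
  end;
run;

/* Keep each "serum-neg" that is directly followed by "blood-pos" */
data want2(keep=id result);
  set long;
  by id;
  _prev=lag(test);   /* previous result; LAG is called on every row */
  if not first.id and _prev='serum-neg' and test='blood-pos' then do;
    result=seq-1;    /* sequence number of the "serum-neg" */
    output;
  end;
run;
```

With the long layout, the sequence number is an ordinary variable, so no commas need to be counted, and the check on first.id keeps a match from spanning two different ids.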

Re: finding multiple positions of the same phrase in a long string

aiannone — Mon, 25 Jan 2021 19:36:45 GMT

@FreelanceReinh THANK YOU SO MUCH! That worked! I am VERY appreciative!