aiannone
Calcite | Level 5

Hello! I have been stuck on this issue for a long time and would be super grateful if anyone could help!

 

I have a variable that contains a long string of test results: "blood-neg, blood-neg, blood-pos, serum-neg, blood-pos, serum-neg, blood-pos, blood-neg, blood-pos"

 

In this string, I am specifically looking to find the position of the phrase "serum-neg, blood-pos".

I have been using this code to find the position: 

result=findw(newvar_test, 'serum-neg, blood-pos', ' ' , 'E'); 

 

In the example above, the result I get is 4, which is correct! However, I would also like to get a result of 6, since this is the position of the next occurrence of this phrase.

 

No matter what I do, I am unable to figure this out. It is also tricky because the dataset contains thousands of observations and all of them have different test result variable lengths and orders. 

 

Is there a way that I can get SAS to return all positions of the same phrase among this large string of test results?

 

Thanks!

1 ACCEPTED SOLUTION
FreelanceReinh
Jade | Level 19

@aiannone wrote:

(...) I have a variable that contains a long string of test results: "blood-neg, blood-neg, blood-pos, serum-neg, blood-pos, serum-neg, blood-pos, blood-neg, blood-pos"

 

In this string, I am specifically looking to find the position of the phrase "serum-neg, blood-pos".

I have been using this code to find the position: 

result=findw(newvar_test, 'serum-neg, blood-pos', ' ' , 'E'); 

 

In the example above, the result I get is 4, which is correct! However, I would also like to get a result of 6, since this is the position of the next occurrence of this phrase. (...)


Hello @aiannone,

 

I think even to get result=4 you would need to add the comma to the list of word delimiters:

result=findw(newvar_test, 'serum-neg, blood-pos', ' ,' , 'E');

 

But I would rather switch from FINDW to CALL PRXNEXT because that CALL routine gives you more flexibility in defining the search phrase. For example, there might be blanks or other white-space characters around the comma between "serum-neg" and "blood-pos" -- which you likely want to ignore -- or perhaps no blanks. The Perl regular expression used below is also case-insensitive (cf. the "i" modifier of FINDW).

/* Create sample data */

data have;
do id=1 to 2;
  newvar_test="blood-neg, blood-neg, blood-pos, Serum-NEG,blood-Pos, serum-neg ,   blood-pos, blood-neg, blood-pos";
  output;
end;
run;

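/* Find positions of phrase */

data want(drop=_:);
  set have;
  /* Compile the regex once; the sum statement retains _rid across observations */
  if _n_=1 then _rid+prxparse('/serum-neg\s*,\s*blood-pos/i');
  _s=1; /* start searching at the first character */
  call prxnext(_rid, _s, -1, newvar_test, _p, _l); /* -1: search to the end of the string */
  do while(_p); /* _p=0 when there are no more matches */
    /* Item number = commas before the match + 1; SUBSTRN allows a zero-length
       prefix when the match starts at position 1 */
    result=countc(substrn(newvar_test,1,_p-1),',')+1;
    output; /* one row per occurrence */
    call prxnext(_rid, _s, -1, newvar_test, _p, _l);
  end;
run;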
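
If you would rather keep one row per observation, the same PRXNEXT loop can collect the item numbers into a single character variable instead of writing one row per match. This is only a sketch: the names want_wide, positions and n_matches and the length $200 are my assumptions, not anything from your data. For the sample data above it should give positions="4 6" for both IDs.

```sas
/* Sketch: one row per observation, all item numbers in one variable */
data want_wide(drop=_:);
  set have;
  length positions $200;   /* assumed to be long enough for all matches */
  if _n_=1 then _rid+prxparse('/serum-neg\s*,\s*blood-pos/i');
  _s=1;
  call prxnext(_rid, _s, -1, newvar_test, _p, _l);
  do while(_p);
    /* item number = commas before the match + 1 */
    _item=countc(substrn(newvar_test,1,_p-1),',')+1;
    /* CATX appends the number, separated by a blank */
    positions=catx(' ', positions, _item);
    call prxnext(_rid, _s, -1, newvar_test, _p, _l);
  end;
  n_matches=countw(positions, ' ');  /* 0 if the phrase never occurs */
run;
```

SUBSTRN is used rather than SUBSTR so that a match starting at position 1 yields an empty prefix and hence item number 1.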

Ideally, however, you would receive those blood test data in the form

Obs    ID            dt              test

 1      1    20JAN2021:07:00:00    blood-neg
 2      1    21JAN2021:07:00:00    blood-neg
 3      1    22JAN2021:07:00:00    blood-pos
 ...   ...   ...                   ...

so that you wouldn't need to rely on counting commas in long strings etc. to determine sequence numbers of measurements.
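
If the data only ever arrive as the long string, one option is to reshape them into that form first. The following is a minimal sketch, assuming the sample dataset have from above, comma-separated items with no empty entries, and one row per id sorted by id. The names long, seq and want2 are made up for illustration, and there is no dt variable because the string does not contain dates.

```sas
/* Sketch: one row per test result */
data long(keep=id seq test);
  set have;
  length test $20;
  do seq=1 to countw(newvar_test, ',');
    /* SCAN picks the seq-th comma-separated item; STRIP and LOWCASE normalize it */
    test=lowcase(strip(scan(newvar_test, seq, ',')));
    output;
  end;
run;

/* Sketch: find "serum-neg" immediately followed by "blood-pos" within an ID */
data want2(keep=id result);
  set long;
  by id;
  _prev=lag(test);          /* LAG is called on every row, so the queue stays in sync */
  if not first.id and _prev='serum-neg' and test='blood-pos' then do;
    result=seq-1;           /* item number of the "serum-neg" entry */
    output;
  end;
run;
```

With the sample data this should return result=4 and result=6 for each ID, i.e. the same item numbers as the PRXNEXT approach, but without counting commas.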


3 REPLIES
data_null__
Jade | Level 19

From the FINDW documentation, the E (or e) modifier:

counts the words that are scanned until the specified word is found, instead of determining the character position of the specified word in the string. Fragments of a word are not counted.
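
A small sketch of the difference, using a made-up three-item string (the variable names s, char_pos and word_no are only for illustration):

```sas
data _null_;
  s='blood-neg, blood-pos, serum-neg';
  /* Without E: character position where the word starts */
  char_pos=findw(s, 'blood-pos', ' ,');
  /* With E: number of the word, counting words delimited by blanks or commas */
  word_no=findw(s, 'blood-pos', ' ,', 'E');
  put char_pos= word_no=;   /* should write char_pos=12 word_no=2 to the log */
run;
```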


aiannone
Calcite | Level 5

@FreelanceReinh THANK YOU SO MUCH! That worked! I am VERY appreciative!
