SAS Programming

DATA Step, Macro, Functions and more
aiannone

Hello! I have been stuck on this issue for a long time and would be super grateful if anyone could help!

 

I have a variable that contains a long string of test results: "blood-neg, blood-neg, blood-pos, serum-neg, blood-pos, serum-neg, blood-pos, blood-neg, blood-pos"

 

In this string, I am specifically looking for the position of the phrase "serum-neg, blood-pos".

I have been using this code to find the position: 

result=findw(newvar_test, 'serum-neg, blood-pos', ' ' , 'E'); 

 

In the example above, the result I get is 4, which is correct! However, I would also like to get a result that says 6, since this is the position of the next occurrence of this phrase.

 

No matter what I do, I am unable to figure this out. It is also tricky because the dataset contains thousands of observations and all of them have different test result variable lengths and orders. 

 

Is there a way that I can get SAS to return all positions of the same phrase among this large string of test results?

 

Thanks!

Accepted Solution
FreelanceReinh

@aiannone wrote:

(...) I have a variable that contains a long string of test results: "blood-neg, blood-neg, blood-pos, serum-neg, blood-pos, serum-neg, blood-pos, blood-neg, blood-pos"

 

In this string, I am specifically looking for the position of the phrase "serum-neg, blood-pos".

I have been using this code to find the position: 

result=findw(newvar_test, 'serum-neg, blood-pos', ' ' , 'E'); 

 

In the example above, the result I get is 4, which is correct! However, I would also like to get a result that says 6, since this is the position of the next occurrence of this phrase. (...)


Hello @aiannone,

 

I think even to get result=4 you would need to add the comma to the list of word delimiters:

result=findw(newvar_test, 'serum-neg, blood-pos', ' ,' , 'E');

 

But I would rather switch from FINDW to CALL PRXNEXT because that CALL routine gives you more flexibility in defining the search phrase. For example, there might be blanks or other white-space characters around the comma between "serum-neg" and "blood-pos" (which you likely want to ignore), or perhaps no blanks at all. The Perl regular expression used below is also case-insensitive (cf. the "i" modifier of FINDW).

/* Create sample data */

data have;
do id=1 to 2;
  newvar_test="blood-neg, blood-neg, blood-pos, Serum-NEG,blood-Pos, serum-neg ,   blood-pos, blood-neg, blood-pos";
  output;
end;
run;

/* Find positions of phrase */

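data want(drop=_:);
set have;
/* compile the regex once; the sum statement retains _RID across observations */
if _n_=1 then _rid+prxparse('/serum-neg\s*,\s*blood-pos/i');
_s=1;                      /* start the search at column 1 */
call prxnext(_rid, _s, -1, newvar_test, _p, _l);
do while(_p);              /* _P is 0 when there is no further match */
  /* word number = commas before the match + 1;                         */
  /* SUBSTRN (unlike SUBSTR) accepts length 0 for a match at column 1   */
  result=countc(substrn(newvar_test,1,_p-1),',')+1;
  output;                  /* one observation per match */
  call prxnext(_rid, _s, -1, newvar_test, _p, _l);
end;
run;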
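If you would rather keep one row per ID, with all positions collected in a single variable (and IDs without any match kept with a blank value), the same PRXNEXT loop can append each word number with CATX instead of writing one observation per match. This is only a sketch, not part of the original answer: the data set names SAMPLE and WANT_WIDE, the variable POSITIONS and its length of $200 are illustrative assumptions. SUBSTRN is used so that a match at column 1 (ID 2 below) is counted as word 1.

```sas
/* Illustrative input (SAMPLE is a hypothetical name):              */
/* ID 1 has two matches, ID 2 matches at column 1, ID 3 has none.    */
data sample;
  id=1;
  newvar_test="blood-neg, blood-neg, blood-pos, Serum-NEG,blood-Pos, serum-neg ,   blood-pos, blood-neg, blood-pos";
  output;
  id=2;
  newvar_test="serum-neg, blood-pos, blood-neg";
  output;
  id=3;
  newvar_test="blood-neg, blood-pos";
  output;
run;

/* One row per ID; POSITIONS holds all word numbers, e.g. "4 6". */
data want_wide(drop=_:);
  set sample;
  length positions $200;            /* assumed to be long enough */
  retain _rid;
  if _n_=1 then _rid=prxparse('/serum-neg\s*,\s*blood-pos/i');
  _s=1;                             /* start the search at column 1 */
  call prxnext(_rid, _s, -1, newvar_test, _p, _l);
  do while(_p > 0);
    /* commas before the match + 1 = word number of "serum-neg" */
    positions=catx(' ', positions, countc(substrn(newvar_test, 1, _p-1), ',')+1);
    call prxnext(_rid, _s, -1, newvar_test, _p, _l);
  end;
run;
```

Because POSITIONS is not retained, it starts out blank for every observation, so CATX only ever collects the matches of the current string.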

Ideally, however, you would receive those blood test data in the form

Obs    ID    dt                    test
  1     1    20JAN2021:07:00:00    blood-neg
  2     1    21JAN2021:07:00:00    blood-neg
  3     1    22JAN2021:07:00:00    blood-pos
...   ...    ...                   ...

so that you wouldn't need to rely on counting commas in long strings etc. to determine sequence numbers of measurements.
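If the data only arrive as strings, a first step toward that layout is to split each string into one row per result with COUNTW and SCAN. Finding "serum-neg" immediately followed by "blood-pos" then becomes a comparison of adjacent rows. This is a sketch under assumptions: the data set names SAMPLE_Q, LONG and PAIRS are made up, individual results are assumed to contain no commas, the input must be sorted by ID for the BY statement, and a date/time variable like DT cannot be recovered because it is not in the string.

```sas
/* Illustrative input: the string from the original question. */
data sample_q;
  id=1;
  newvar_test="blood-neg, blood-neg, blood-pos, serum-neg, blood-pos, serum-neg, blood-pos, blood-neg, blood-pos";
run;

/* One row per test result; SEQ is the sequence number within the string. */
data long(keep=id seq test);
  set sample_q;
  length test $20;                    /* assumed maximum result length */
  do seq=1 to countw(newvar_test, ',');
    /* comma is the only delimiter; STRIP and LOWCASE normalize blanks and case */
    test=lowcase(strip(scan(newvar_test, seq, ',')));
    output;
  end;
run;

/* A match is a "blood-pos" row whose previous row (same ID) is "serum-neg". */
data pairs(keep=id result);
  set long;
  by id;
  prev_test=lag(test);                /* LAG is executed on every row */
  if first.id then prev_test=' ';     /* do not look back into the previous ID */
  if prev_test='serum-neg' and test='blood-pos' then do;
    result=seq-1;                     /* sequence number of the "serum-neg" row */
    output;
  end;
run;
```

LAG is deliberately called unconditionally; calling it inside the IF would return values from the last time the IF was true rather than from the previous row.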


Replies
data_null__

From the FINDW documentation, modifier e or E:

counts the words that are scanned until the specified word is found, instead of determining the character position of the specified word in the string. Fragments of a word are not counted.
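A small comparison shows what the E modifier changes: without it, FINDW returns the character position where the word starts; with it, the number of the word. This is a sketch with a made-up string and a single search word; the comma is included in the delimiter list so that "blood-neg," is not read as one word.

```sas
data _null_;
  s="blood-neg, blood-neg, blood-pos, serum-neg, blood-pos";
  pos_char=findw(s, 'serum-neg', ' ,');        /* character position of the word */
  pos_word=findw(s, 'serum-neg', ' ,', 'e');   /* number of the word */
  put pos_char= pos_word=;
run;
```

Here "serum-neg" is the 4th word and starts in column 34 (each of the three preceding results takes 11 characters including ", ").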

aiannone

@FreelanceReinh THANK YOU SO MUCH! That worked! I am VERY appreciative!
