Hello! I have been stuck on this issue for a long time and would be super grateful if anyone could help!
I have a variable that contains a long string of test results: "blood-neg, blood-neg, blood-pos, serum-neg, blood-pos, serum-neg, blood-pos, blood-neg, blood-pos"
In this string, I am specifically looking for the position of the phrase "serum-neg, blood-pos".
I have been using this code to find the position:
result=findw(newvar_test, 'serum-neg, blood-pos', ' ' , 'E');
In the example above, the result I get is 4, which is correct! However, I would also like to get a result of 6, since that is the position of the next occurrence of this phrase.
No matter what I do, I am unable to figure this out. It is also tricky because the dataset contains thousands of observations and all of them have different test result variable lengths and orders.
Is there a way that I can get SAS to return all positions of the same phrase among this large string of test results?
Thanks!
@aiannone wrote:
(...) I have a variable that contains a long string of test results: "blood-neg, blood-neg, blood-pos, serum-neg, blood-pos, serum-neg, blood-pos, blood-neg, blood-pos"
In this string, I am specifically looking for the position of the phrase "serum-neg, blood-pos".
I have been using this code to find the position:
result=findw(newvar_test, 'serum-neg, blood-pos', ' ' , 'E');
In the example above, the result I get is 4, which is correct! However, I would also like to get a result of 6, since that is the position of the next occurrence of this phrase. (...)
Hello @aiannone,
I think even to get result=4 you would need to add the comma to the list of word delimiters:
result=findw(newvar_test, 'serum-neg, blood-pos', ' ,' , 'E');
But I would rather switch from FINDW to CALL PRXNEXT because that CALL routine gives you more flexibility in defining the search phrase. For example, there might be blanks or other white-space characters around the comma between "serum-neg" and "blood-pos" -- which you likely want to ignore -- or perhaps no blanks. The Perl regular expression used below is also case-insensitive (cf. the "i" modifier of FINDW).
/* Create sample data */
data have;
  do id=1 to 2;
    newvar_test="blood-neg, blood-neg, blood-pos, Serum-NEG,blood-Pos, serum-neg , blood-pos, blood-neg, blood-pos";
    output;
  end;
run;

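/* Find positions of phrase */
data want(drop=_:);
  set have;
  /* Compile the regex once; the sum statement retains _rid across observations */
  if _n_=1 then _rid+prxparse('/serum-neg\s*,\s*blood-pos/i');
  _s=1; /* start each search at the first character */
  call prxnext(_rid, _s, -1, newvar_test, _p, _l);
  do while(_p); /* _p is 0 once no further match is found */
    /* Item number = number of commas before the match + 1.
       SUBSTRN (unlike SUBSTR) accepts a zero length, which occurs
       when the match starts at position 1 */
    result=countc(substrn(newvar_test,1,_p-1),',')+1;
    output; /* one observation per occurrence */
    call prxnext(_rid, _s, -1, newvar_test, _p, _l);
  end;
run;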
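If one observation per ID is preferred over one observation per occurrence, the same loop could collect the item numbers in a single character variable instead. This is only a minimal sketch under assumptions: the data set name WANT_WIDE, the variable name POSITIONS and its length of 200 are illustrative choices, not part of the original question.

```sas
/* Sketch: collect all item numbers of the phrase in one variable per observation */
data want_wide(drop=_:);
  set have;
  length positions $200; /* assumed to be long enough for the list */
  if _n_=1 then _rid+prxparse('/serum-neg\s*,\s*blood-pos/i');
  _s=1;
  call prxnext(_rid, _s, -1, newvar_test, _p, _l);
  do while(_p);
    /* Append the item number of this occurrence, separated by a blank */
    positions=catx(' ', positions, countc(substrn(newvar_test,1,_p-1),',')+1);
    call prxnext(_rid, _s, -1, newvar_test, _p, _l);
  end;
run;
```

POSITIONS is not retained, so it starts out blank for every observation. With the sample data in HAVE it should contain the two item numbers 4 and 6 for each ID.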
Ideally, however, you would receive those blood test data in the form
Obs  ID  dt                  test
  1   1  20JAN2021:07:00:00  blood-neg
  2   1  21JAN2021:07:00:00  blood-neg
  3   1  22JAN2021:07:00:00  blood-pos
...  ... ...                 ...
so that you wouldn't need to rely on counting commas in long strings etc. to determine sequence numbers of measurements.
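If the data cannot be delivered in that form, the long string could be split into one observation per test result first. The following is a rough sketch, assuming the sample data set HAVE with the variables ID and NEWVAR_TEST, results separated by commas only, and HAVE being sorted by ID. The names LONG, SEQ, TEST and WANT_LONG and the length of 20 are made up for illustration, and there is no datetime variable because the string does not contain one.

```sas
/* Sketch: one observation per test result */
data long(keep=id seq test);
  set have;
  length test $20; /* assumed maximum length of a single result */
  do seq=1 to countw(newvar_test, ',');
    /* Extract the seq-th item, remove surrounding blanks, normalize case */
    test=lowcase(strip(scan(newvar_test, seq, ',')));
    output;
  end;
run;

/* Find "serum-neg" immediately followed by "blood-pos" within each ID */
data want_long(keep=id result);
  set long;
  by id;
  _prev=lag(test); /* LAG must be executed for every observation */
  if not first.id and _prev='serum-neg' and test='blood-pos' then do;
    result=seq-1; /* sequence number of the "serum-neg" item */
    output;
  end;
run;
```

In this layout the phrase search becomes a comparison of two consecutive observations, so no commas need to be counted.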
For reference, the FINDW documentation describes the "E" modifier as follows: it counts the words that are scanned until the specified word is found, instead of determining the character position of the specified word in the string. Fragments of a word are not counted.
@FreelanceReinh THANK YOU SO MUCH! That worked! I am VERY appreciative!