Hello! I have been stuck on this issue for a long time and would be super grateful if anyone could help!
I have a variable that contains a long string of test results: "blood-neg, blood-neg, blood-pos, serum-neg, blood-pos, serum-neg, blood-pos, blood-neg, blood-pos"
In this string, I am specifically looking for the position of the phrase "serum-neg, blood-pos".
I have been using this code to find the position:
result=findw(newvar_test, 'serum-neg, blood-pos', ' ' , 'E');
In the example above, the result I get is 4, which is correct! However, I would also like to get a result of 6, since that is the position of the next occurrence of this phrase.
No matter what I do, I am unable to figure this out. It is also tricky because the dataset contains thousands of observations and all of them have different test result variable lengths and orders.
Is there a way that I can get SAS to return all positions of the same phrase among this large string of test results?
Thanks!
Accepted Solutions
@aiannone wrote:
(...) I have a variable that contains a long string of test results: "blood-neg, blood-neg, blood-pos, serum-neg, blood-pos, serum-neg, blood-pos, blood-neg, blood-pos"
In this string, I am specifically looking for the position of the phrase "serum-neg, blood-pos".
I have been using this code to find the position:
result=findw(newvar_test, 'serum-neg, blood-pos', ' ' , 'E');
In the example above, the result I get is 4, which is correct! However, I would also like to get a result of 6, since that is the position of the next occurrence of this phrase. (...)
Hello @aiannone,
I think even to get result=4 you would need to add the comma to the list of word delimiters:
result=findw(newvar_test, 'serum-neg, blood-pos', ' ,' , 'E');
But I would rather switch from FINDW to CALL PRXNEXT because that CALL routine gives you more flexibility in defining the search phrase. For example, there might be blanks or other white-space characters around the comma between "serum-neg" and "blood-pos" -- which you likely want to ignore -- or perhaps no blanks. The Perl regular expression used below is also case-insensitive (cf. the "i" modifier of FINDW).
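/* Create sample data */
data have;
  do id=1 to 2;
    newvar_test="blood-neg, blood-neg, blood-pos, Serum-NEG,blood-Pos, serum-neg , blood-pos, blood-neg, blood-pos";
    output;
  end;
run;

/* Find positions of phrase (one output row per occurrence) */
data want(drop=_:);
  set have;
  /* Compile the regex once; the sum statement retains _rid across observations */
  if _n_=1 then _rid+prxparse('/serum-neg\s*,\s*blood-pos/i');
  _s=1; /* restart the search at the first character of each string */
  call prxnext(_rid, _s, -1, newvar_test, _p, _l);
  do while(_p); /* _p is 0 once no further match is found */
    /* Item number = number of commas before the match + 1.
       SUBSTRN (unlike SUBSTR) accepts a zero length, i.e. a match at position 1. */
    result=countc(substrn(newvar_test,1,_p-1),',')+1;
    output;
    call prxnext(_rid, _s, -1, newvar_test, _p, _l);
  end;
run;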
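If one row per observation is preferred over one row per occurrence, the same CALL PRXNEXT loop can collect the item numbers into a single character variable. This is only a sketch of that variation: the dataset name want_wide, the variable name positions, and its length of 200 are assumptions chosen for illustration, and it reuses the have dataset created above.

```sas
/* Variation: keep one row per observation and list all positions in one variable */
data want_wide(drop=_:);
  set have;
  length positions $200;  /* assumed to be long enough for the list of item numbers */
  if _n_=1 then _rid+prxparse('/serum-neg\s*,\s*blood-pos/i');
  _s=1;
  call prxnext(_rid, _s, -1, newvar_test, _p, _l);
  do while(_p>0);
    /* Append the item number of this match, separated by a blank */
    positions=catx(' ', positions, countc(substrn(newvar_test,1,_p-1),',')+1);
    call prxnext(_rid, _s, -1, newvar_test, _p, _l);
  end;
run;
```

Because positions is created by assignment rather than read from have, it is reset to missing at the start of each observation, so the list does not carry over from one row to the next.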
Ideally, however, you would receive those blood test data in the form
Obs  ID  dt                  test
1    1   20JAN2021:07:00:00  blood-neg
2    1   21JAN2021:07:00:00  blood-neg
3    1   22JAN2021:07:00:00  blood-pos
...  ... ...                 ...
so that you wouldn't need to rely on counting commas in long strings etc. to determine sequence numbers of measurements.
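If the data only arrive as long strings, a layout similar to the one above can be approximated by splitting each string into one row per test. The following is a minimal sketch under some assumptions: the names long, want_long, seq and test are hypothetical, seq (the item number within the string) stands in for the timestamp that the string does not contain, and have is assumed to be sorted by id.

```sas
/* Step 1: one row per test result; seq stands in for the missing timestamp */
data long(keep=id seq test);
  set have;
  length test $20;
  do seq=1 to countw(newvar_test, ',');
    test=lowcase(strip(scan(newvar_test, seq, ',')));
    output;
  end;
run;

/* Step 2: flag each serum-neg that is immediately followed by blood-pos */
data want_long(keep=id result);
  set long;
  by id;
  _prev=lag(test);  /* LAG must run on every row to keep its queue in sync */
  if not first.id and _prev='serum-neg' and test='blood-pos' then do;
    result=seq-1;   /* item number of the serum-neg result */
    output;
  end;
run;
```

With the data in this shape, the sequence number is an ordinary variable, so the search no longer depends on counting commas.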
e or E
counts the words that are scanned until the specified word is found, instead of determining the character position of the specified word in the string. Fragments of a word are not counted.
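To make the quoted description concrete, here is a small sketch that calls FINDW on the same string with and without the E modifier. The string and the variable names char_pos and word_pos are made up for illustration.

```sas
data _null_;
  s='blood-neg, blood-pos, serum-neg';
  char_pos=findw(s, 'serum-neg', ' ,');       /* character position where the word starts */
  word_pos=findw(s, 'serum-neg', ' ,', 'E');  /* number of words scanned until the word is found */
  put char_pos= word_pos=;
run;
```

The first call reports where the word begins as a character offset, the second reports which word it is in the sequence.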
@FreelanceReinh THANK YOU SO MUCH! That worked! I am VERY appreciative!