Hello,
I am trying to create a flag variable for certain values in a string variable. I tried the “findw” function, but it is not specific enough to capture the same values shown below without an endless number of entries. Any idea how I can write brief code that would capture the highlighted cases without having to include the accompanying semicolons or question marks?
If findw(dose, "") or findw(dose, "") then dose_3_months = 1;
? other frequency:once every 3 month;other route:inject?
? other frequency:q3months ;?
? other frequency:q3months;?
200 units?every 3 months?head, neck, and shoulders
every 3 months?
every 3 months?given into/under the skin
every 3 months?head/neck injections
I'm thinking that the goal has not been clearly explained. Is the goal to find which text strings contain
every 3 months
but not those that contain
q3months
??
Or do you want strings that contain either of those two strings? If so, please provide strings that don't contain either so we can test our code.
Or do you want something else?
Thanks for your quick response! I want one flag variable that captures all these cases without having to copy them all into the findw statement or a similar one.
It is still not clear to me. When you say "all these cases", are there just the two, the purple and the light blue? Or are there more that would be considered a match? If so, explain what would be considered a match, and provide a data set with a large variety of cases that match.
We also need in the data set some strings that are not considered a match (which was requested earlier, and I request it again).
Thanks for your help. All the examples I provided are a match: every three months or q3months. The challenge is that the values appear as shown in the example I shared, so I am trying to shorten the code that captures these cases.
Ok, I asked twice now for examples that don't match.
every three months or q3months.
Is it really "every three months" or "every 3 months"?
It is not at all clear what you tried. FINDW() should work fine for your example strings.
data have;
input string $80.;
cards4;
? other frequency:once every 3 month;other route:inject?
? other frequency:q3months ;?
? other frequency:q3months;?
200 units?every 3 months?head, neck, and shoulders
every 3 months?
every 3 months?given into/under the skin
every 3 months?head/neck injections
;;;;
data want;
set have;
dose_3_months = findw(string,'every 3 month',' ;:?')
or findw(string,'every 3 months',' ;:?')
or findw(string,'q3months',' ;:?')
;
run;
proc print;
run;
Thanks! This seems to be the right way; just had an error:
findw(dose_vbm, (string,'every 3 month',' ;:?') or findw(string,'every 3 months',' ;
-
22
2223! :?') or findw(string,'q3months',' ;:?') then dose_vbm_freq = 'months' ;else
ERROR 22-322: Syntax error, expecting one of the following: (, ), [, {.
Is the name of your variable DOSE_VBM or is it called STRING like the variable in my example?
Whichever name it uses, include the variable only once in each FINDW() call. And you probably want to search the same variable every time; otherwise I doubt the logic will work right.
if findw(dose_vbm, 'every 3 month',' ;:?')
or findw(dose_vbm,'every 3 months',' ;:?')
or findw(dose_vbm,'q3months',' ;:?')
then dose_vbm_freq = 'months' ;
else dose_vbm_freq ='??????';
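If the list of phrases grows, the OR chain could be replaced by a loop over a temporary array. This is only a minimal sketch, assuming the HAVE data set and the variable STRING from the example above; the array name PHRASES, the loop variable I, and the 'i' (ignore case) modifier are illustrative additions, not part of the code above.

```sas
data want;
  set have;
  /* Hypothetical phrase list; extend it as new variants turn up */
  array phrases[3] $20 _temporary_
    ('every 3 month', 'every 3 months', 'q3months');
  dose_3_months = 0;
  do i = 1 to dim(phrases);
    /* strip() removes the padding blanks from the $20 array element;
       'i' asks FINDW to ignore case */
    if findw(string, strip(phrases[i]), ' ;:?', 'i') then dose_3_months = 1;
  end;
  drop i;
run;
```

Adding another variant then means adding one array element (and raising the dimension) rather than another FINDW() call.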
Hi @ama220 ,
I would recommend looking into regular expressions for text-parsing scenarios like this.
I used @Tom's data step to create the have data set.
data have;
input string $80.;
cards4;
? other frequency:once every 3 month;other route:inject?
? other frequency:q3months ;?
? other frequency:q3months;?
200 units?every 3 months?head, neck, and shoulders
every 3 months?
every 3 months?given into/under the skin
every 3 months?head/neck injections
;;;;
run;
data want;
set work.have;
dose_x_months = ifn(prxmatch('/(every |q)?\d\s?months*/', string),1,0); /* This will search for various monthly frequencies rather than just 3 */
run;
Here are a couple of useful links related to RegEx
Hope this helps
Thanks. Unfortunately, I am still missing some "month" instances in the character variable in question because my data is messy. The only thing that seems to work is a data step with the INDEX function. Any idea what would be a better function to use? I have tried many things to no avail, and since I have other similar character variables to code, I need a better way.
if index(dose_vbm,'month') > 0;
data have;
input string $80.;
cards4;
? other frequency:once every 3 month;other route:inject?
? other frequency:q3months ;?
? other frequency:q3months;?
200 units?every 3 months?head, neck, and shoulders
every 3 months?
every 3 months?given into/under the skin
every 3 months?head/neck injections
;;;;
data want;
set have;
pid=prxparse('/every\s+\d+\s+month(s)?|q\d+month(s)?/io');
if prxmatch(pid,string) then do;
call prxsubstr(pid,string,p,l);
want=substr(string,p,l);
end;
drop pid p l;
run;
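The same kind of pattern can also produce a plain 1/0 flag and be checked against strings that should not match, which was asked for earlier in the thread. This is a sketch under stated assumptions: the TEST rows are made up for illustration (they are not from the original data), and the pattern is narrowed to 3-month frequencies only.

```sas
/* Hypothetical test rows: the first three are meant to match,
   the last two are not */
data test;
  infile datalines truncover;
  input string $80.;
  datalines4;
other frequency:q3months;
200 units every 3 months
Every  3 Month
every 3 weeks
once daily
;;;;

data flagged;
  set test;
  /* prxmatch returns the position of the match, or 0 if there is none;
     comparing with 0 turns that into a 1/0 flag.
     \b keeps the match on word boundaries, /i ignores case,
     \s+ and \s* tolerate extra or missing blanks */
  dose_3_months =
    prxmatch('/\b(every\s+3\s+months?|q\s*3\s*months?)\b/i', string) > 0;
run;

proc print data=flagged;
run;
```

For other frequencies, replacing the literal 3 with \d+ would widen the pattern, at the cost of also flagging intervals such as every 6 months.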