ama220
Obsidian | Level 7

 

Hello,

I am trying to create a flag variable for certain values in a string variable. I tried the FINDW function, but it is not specific enough to capture the same value in its different forms, as shown below, without an endless number of entries. Any idea how I can write brief code that would capture the highlighted cases without having to include the accompanying semicolons or question marks?

 

If findw(dose, "") or findw(dose, "")  then dose_3_months = 1;

 

? other frequency:once every 3 month;other route:inject?

? other frequency:q3months ;?

? other frequency:q3months;?

200 units?every 3 months?head, neck, and shoulders

every 3 months?

every 3 months?given into/under the skin

every 3 months?head/neck injections

 

[Screenshot of the example values with the matching text highlighted]

 

12 REPLIES
PaigeMiller
Diamond | Level 26

I'm thinking that the goal has not been clearly explained. Is the goal to find which text strings contain

 

every 3 months

 

but not those that contain

 

q3months

 

??

 

Or do you want strings that contain either of those two strings? If so, please provide strings that don't contain either so we can test our code.

 

Or do you want something else?

--
Paige Miller
ama220
Obsidian | Level 7

Thanks for your quick response! I want one flag variable that captures all these cases without having to copy them all into the FINDW call or a similar one.

PaigeMiller
Diamond | Level 26

@ama220 wrote:

Thanks for your quick response! I want one flag variable that captures all these cases without having to copy them all into the FINDW call or a similar one.


It is still not clear to me. When you say "all these cases", are there just the two, the purple and the light blue? Or are there more that would be considered a match? If so, explain what would be considered a match, and provide a data set with a large variety of cases that match.

 

We also need in the data set some strings that are not considered a match (which was requested earlier, and I request it again).

--
Paige Miller
ama220
Obsidian | Level 7

Thanks for your help. All the examples I provided are a match: every three months or q3months. The challenge is that the values appear as shown in the example I shared, so I am trying to shorten the code that captures these cases.

PaigeMiller
Diamond | Level 26

Ok, I asked twice now for examples that don't match.

 


every three months or q3months.

Is it really "every three months" or "every 3 months"??

--
Paige Miller
Tom
Super User

It is not at all clear what you tried. FINDW() should work fine for your example strings.

data have;
  input string $80.;
cards4;
? other frequency:once every 3 month;other route:inject?
? other frequency:q3months ;?
? other frequency:q3months;?
200 units?every 3 months?head, neck, and shoulders
every 3 months?
every 3 months?given into/under the skin
every 3 months?head/neck injections
;;;;

data want;
  set have;
  dose_3_months = findw(string,'every 3 month',' ;:?')
               or findw(string,'every 3 months',' ;:?')
               or findw(string,'q3months',' ;:?')
  ;
run;

proc print;
run;
ama220
Obsidian | Level 7

Thanks! This seems to be the right way; just had an error:

 

findw(dose_vbm, (string,'every 3 month',' ;:?') or findw(string,'every 3 months',' ;
-
22
2223! :?') or findw(string,'q3months',' ;:?') then dose_vbm_freq = 'months' ;else
ERROR 22-322: Syntax error, expecting one of the following: (, ), [, {.

Tom
Super User

Is the name of your variable DOSE_VBM or is it called STRING like the variable in my example?

Whichever name it uses, include the variable only once in each of the FINDW() calls. And you probably want to search the same variable every time; otherwise I doubt the logic will work right.

if findw(dose_vbm, 'every 3 month',' ;:?')
 or findw(dose_vbm,'every 3 months',' ;:?')
 or findw(dose_vbm,'q3months',' ;:?') 
then dose_vbm_freq = 'months' ;
else dose_vbm_freq  ='??????';
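
To check calls like the ones above against values that should not be flagged, a small test step can help. This is only a sketch: the data set names TEST and CHECK, the last three data rows, and the 'i' (ignore case) modifier in the fourth FINDW argument are assumptions added for illustration and are not part of the original data.

```sas
data test;
  infile datalines truncover;
  input dose_vbm $80.;
datalines4;
every 3 months?head/neck injections
? other frequency:q3months;?
Every 3 Months?
every 6 months?
once daily;other route:inject?
;;;;

data check;
  set test;
  length dose_vbm_freq $6;
  /* Each FINDW call takes: string, word, delimiter characters, modifiers */
  if findw(dose_vbm,'every 3 month',' ;:?','i')
   or findw(dose_vbm,'every 3 months',' ;:?','i')
   or findw(dose_vbm,'q3months',' ;:?','i')
  then dose_vbm_freq = 'months';
  else dose_vbm_freq = '??????';
run;

proc print data=check;
run;
```

The hypothetical rows for a 6-month and a daily frequency are there so the printed output shows whether values that should not match are left unflagged.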

 

 

 

AhmedAl_Attar
Rhodochrosite | Level 12

Hi @ama220 ,

I would recommend looking into regular expressions for such text-parsing scenarios.

I used @Tom's data step to create the have data set.

data have;
  input string $80.;
cards4;
? other frequency:once every 3 month;other route:inject?
? other frequency:q3months ;?
? other frequency:q3months;?
200 units?every 3 months?head, neck, and shoulders
every 3 months?
every 3 months?given into/under the skin
every 3 months?head/neck injections
;;;;
run;

data want;
	set work.have;
	dose_x_months = ifn(prxmatch('/(every |q)?\d\s?months*/', string),1,0); /* This will search for various monthly frequencies rather than just 3 */
run;

Here are a couple of useful links related to RegEx

Hope this helps
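
If only the 3-month frequency should be flagged, as in the original question, the pattern can be narrowed. The following is a minimal sketch, not a definitive solution: the mixed-case row and the two non-matching rows are made up for illustration, and the pattern assumes that the only variations are letter case and the blanks around the digit.

```sas
data have;
  infile datalines truncover;
  input string $80.;
datalines4;
every 3 months?head/neck injections
? other frequency:Q3MONTHS ;?
once every 6 months
twice daily
;;;;

data want;
  set have;
  /* \b anchors on word boundaries, \s* tolerates missing or repeated
     blanks, s? makes the plural optional, and the trailing i ignores case */
  dose_3_months = (prxmatch('/\b(every\s*|q\s*)3\s*months?\b/i', string) > 0);
run;

proc print data=want;
run;
```

Because the digit 3 must directly follow "every" or "q" (apart from blanks) and be directly followed by "month", values such as "every 6 months" are not expected to be flagged.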

ama220
Obsidian | Level 7

Thanks. Unfortunately, I am still missing some instances of "month" in the character variable in question because my data is messy. The only thing that seems to work is a data step with the INDEX function. Any idea what would be a better function to use? I have tried many things to no avail, and since I have other similar character variables to code, I need a better way.

 

if index(dose_vbm,'month') > 0;
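
Note that an IF statement without THEN, like the one above, is a subsetting IF: it keeps only the rows where the condition is true instead of creating a flag. A minimal sketch of a flag-based alternative follows, assuming the misses come from differences in letter case. The sample rows and the variable name month_flag are hypothetical.

```sas
data have;
  infile datalines truncover;
  input dose_vbm $80.;
datalines4;
every 3 months?given into/under the skin
? other frequency:Q3MONTHS ;?
200 units?EVERY 3 MONTH
once daily
;;;;

data want;
  set have;
  /* FIND returns the position of the substring, or 0 if it is absent;
     the 'i' modifier makes the search ignore case */
  month_flag = (find(dose_vbm, 'month', 'i') > 0);
run;

proc print data=want;
run;
```

This keeps every row and stores 1 or 0 in month_flag, so the unflagged rows stay available for inspection.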

Reeza
Super User
Show a better example of your data where records are missed if you want further help with the code.

As an FYI - ChatGPT will help you build regular expressions to find data. One of its more useful features IMO.
Ksharp
Super User
data have;
  input string $80.;
cards4;
? other frequency:once every 3 month;other route:inject?
? other frequency:q3months ;?
? other frequency:q3months;?
200 units?every 3 months?head, neck, and shoulders
every 3 months?
every 3 months?given into/under the skin
every 3 months?head/neck injections
;;;;

data want;
 set have;
 pid=prxparse('/every\s+\d+\s+month(s)?|q\d+month(s)?/io');
 if prxmatch(pid,string) then do;
  call prxsubstr(pid,string,p,l);
  want=substr(string,p,l);
 end;
drop pid p l;
run;
