BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
rmacarthur
Quartz | Level 8

Hi SAS Friends, 

Need to remove variable junk text from a long list of varying strings within a single text variable.

Here is a sample of the text variable with the junk text (HAVE),

What needs to be removed (REMOVE),

and what needs to be kept (WANT).

/***********************************************************/

data Have;
infile datalines truncover;
input Drug_Phrases $100.;
datalines;
2 days Ivermectin
2 mg moxidectin
20% deet insect repellent
3 days Ivermectin
3 drug dose - IDA with second dose of ivermectin
4 mg moxidectin
40 mg Atorvastatin/day for 120 days P.O.
400 μg/kg Ivermectin + 400 mg Albendazole
8 mg moxidectin
ABBV-4083
ALBENDAZOLE 400 Mg ORAL TABLET [ZENTEL]
;
run;

data Remove;
infile datalines truncover;
input Remove $100.;
datalines;
2 days
2 mg
20%
insect repellent
3 days
3 drug dose - IDA with second dose of
4 mg
40 mg
/day for 120 days P.O.
400 μg/kg
+ 400 mg
8 mg
400 Mg ORAL TABLET [ZENTEL]
;
run;

data Want;
infile datalines truncover;
input Drug_Names $100.;
datalines;
Ivermectin
moxidectin
deet
Ivermectin
ivermectin
moxidectin
Atorvastatin
Ivermectin Albendazole
moxidectin
ABBV-4083
ALBENDAZOLE
;
run;

/***************************************************/

I think, given the long list of strings involved, it might be easier to create a "REMOVE" file, listing the text to be removed, then somehow use that file to extract those strings from the HAVE text file.  But beyond that don't know where to start. 

Appreciate any suggestions, 

Thank you !

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

This seems to work for most of your provided example:

You may need to modify the "remove" values so that each "phrase" is a single value. If you phrase text looks like

"blah blah Drugname other text" then the Remove values should be two lines like

blah blah

other text

as the TRANWRD function used below replace exact matches and cannot infer you may have a value in the middle to work around.

 

data _null_;
  set remove end=lastone;
  if _n_=1 then call execute('data want; set have;');
  call execute ('drug_phrases= tranwrd(drug_phrases,'||quote(strip(remove))||',"*");');
  if lastone then call execute('drug_phrases=strip(compress(drug_phrases,"*"));run;');
run;

Call Execute creates statements that are placed in an execution buffer that runs after the data step calling it completes. This creates a data step. The _n_=1 is executed only one time to write the boiler plate to start the data step.

The main bit repeated for each value in remove is to write a TRANWRD call replacing the text in the remove with an *. The Strip is needed because otherwise the || concatenate operator will pad the length of the value in remove with spaces and not actually match the value in the Drug_phrases variable. Quote is so the value of removed is valid for the syntax.

When the last record is read from Remove, the Set statement option sets a variable to indicate this, then all the * are removed with the Compress function and the Strip makes sure the remaining text is left justified (no leading spaces).

 

You might consider modifying the code to actually write a Program file using the File statement and PUT instead of Call execute so you have a reference of the code created and then execute that program file.

View solution in original post

2 REPLIES 2
ballardw
Super User

This seems to work for most of your provided example:

You may need to modify the "remove" values so that each "phrase" is a single value. If you phrase text looks like

"blah blah Drugname other text" then the Remove values should be two lines like

blah blah

other text

as the TRANWRD function used below replace exact matches and cannot infer you may have a value in the middle to work around.

 

data _null_;
  set remove end=lastone;
  if _n_=1 then call execute('data want; set have;');
  call execute ('drug_phrases= tranwrd(drug_phrases,'||quote(strip(remove))||',"*");');
  if lastone then call execute('drug_phrases=strip(compress(drug_phrases,"*"));run;');
run;

Call Execute creates statements that are placed in an execution buffer that runs after the data step calling it completes. This creates a data step. The _n_=1 is executed only one time to write the boiler plate to start the data step.

The main bit repeated for each value in remove is to write a TRANWRD call replacing the text in the remove with an *. The Strip is needed because otherwise the || concatenate operator will pad the length of the value in remove with spaces and not actually match the value in the Drug_phrases variable. Quote is so the value of removed is valid for the syntax.

When the last record is read from Remove, the Set statement option sets a variable to indicate this, then all the * are removed with the Compress function and the Strip makes sure the remaining text is left justified (no leading spaces).

 

You might consider modifying the code to actually write a Program file using the File statement and PUT instead of Call execute so you have a reference of the code created and then execute that program file.

rmacarthur
Quartz | Level 8

Hi ballardw, 

Thank you, this works great !

It will take some time to understand all the parts, but the output is correct

A very cool solution.

Thank you !

SAS INNOVATE 2024

Innovate_SAS_Blue.png

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Get the $99 certification deal.jpg

 

 

Back in the Classroom!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 165 views
  • 1 like
  • 2 in conversation