BookmarkSubscribeRSS Feed
jahanm
Calcite | Level 5

Hi,

Let's say my data set is set up like this:

 

study_id    structure_id

1                 BLADDER

2                 .

3                  BOWEL

4                  BOWEL_SMALL

5                  GTV

6                  GTV_small

 

I want to delete records where structure_id contains "BOWEL" or "GTV".

 

I tried using this code:

data dvh_edited;
set dvh;
found=indexc(structure_id, "BOWEL", "GTV");
if found ^=0 then delete;
run;

But with that code it is looking for any letter in "BOWEL" or "GTV" so it's tagging many records.  I'm not that familiar with PROC SQL so I'd prefer to do this in a data step if it's fairly simple.  Thanks!

 

8 REPLIES 8
PaigeMiller
Diamond | Level 26

Use the FIND function

 

if find(structure_id, "BOWEL", "i") > 0 or find(structure_id,"GTV","i") > 0 then delete;
--
Paige Miller
novinosrin
Tourmaline | Level 20

data have;
input study_id    structure_id  : $13.;
cards;
1                 BLADDER
2                 .
3                  BOWEL
4                  BOWEL_SMALL
5                  GTV
6                  GTV_small
;

data want;
set have;
found=findw(strip(structure_id), "BOWEL",'_') or findw(strip(structure_id), "GTV",'_');
if found ^=0 then delete;
run;
Astounding
PROC Star

It's easy if you are willing to code each word separately:

 

if findw(structure_id, 'GTV', , '_ ,' , 'I') then delete;

if findw(structure_id, 'BOWEL', , '_ ,' , 'I') then delete;

 

Assuming I got all the pieces of this in the right places:

 

The third parameter is starting position so can default to beginning of the string.

 

The fourth parameter is a list of word delimiters.

 

The fifth parameter asks that case be ignored, so these are both to be deleted:  GTV and gtv

 

Note that this is probably more what you want, rather than INDEX or FIND.  You probably don't want to delete observations that contain "HGTV", or "EMBOWEL".  

 

 

Hmmm......  3 responses in the first 10 minutes.  Might be time for me to shut down the computer and go to lunch.

PaigeMiller
Diamond | Level 26

FINDW finds words, if the input text string is GTVsmall without the underscore, FINDW will not find it. I don't see a problem using FIND instead of FINDW.

--
Paige Miller
Astounding
PROC Star

@PaigeMiller ,

 

I think the answer is to know something about your data.  You can construct problem strings for most simple solutions (which includes mine, just for the record):

 

WatchingTV

Embowelling

Bowel/Intestine

 

 

 

PaigeMiller
Diamond | Level 26

Good point.

--
Paige Miller
jahanm
Calcite | Level 5

Unfortunately the structure_id variable is human-entered freetext so I actually do run into the problem you stated.  There are some variables "GTVp" that I also want to eliminate, and the findw code earlier doesn't seem to filter variants like that. 

Patrick
Opal | Level 21

 


@jahanm wrote:

Unfortunately the structure_id variable is human-entered freetext so I actually do run into the problem you stated.  There are some variables "GTVp" that I also want to eliminate, and the findw code earlier doesn't seem to filter variants like that. 



Regular expressions allow you to search for text patterns - you just need to be able to formulate the rules for these patterns.

Below RegEx will match any string with bowel or gtv in it. But using a RegEx would also allow you to search for a pattern like a word consisting of GTV plus max. one additional letter.

data have;
  input study_id structure_id : $13.;
  cards;
1 BLADDER
2 .
3 BOWEL
4 BOWEL_SMALL
5 GTV
6 GTV_small
;

data want;
  set have;
  if prxmatch('/bowel|gtv/oi',structure_id)>0 then delete;
run;

sas-innovate-white.png

Missed SAS Innovate in Orlando?

Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.

 

Register now

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 8 replies
  • 8873 views
  • 0 likes
  • 5 in conversation