BochengJing
Fluorite | Level 6

Hello SAS experts, 

I have a question about identifying multiple positions of the same word in a string. For example, I have a string containing all the medical chart notes, and I want to find the position of every occurrence of the word insulin in the note.

The note looks like this:

"Insulin glargine increased to 85 units twice a day. This may not be the right level to control your blood sugar, and so you will be called Saturday 9/7/13 to see how your blood sugar levels are doing. You will likely only need to continue with this higher dose of insulin while you are taking steroids.

INSULIN GLARGINE INJ

Directions: inject 85units under the skin twice a day do not mix with other

insulins, discard open vials after 28 days

INSULIN,ASPART 100UN/ML NOVO FLEXPEN 3ML

Directions: Inject sliding scale under the skin before meals and at bedtime (for diabetes) , use this scale to determine insulin dose.for blood sugar "

The keyword insulin appears several times, with varying upper/lower case (Insulin, insulin, INSULIN). Is there a way to output the position of each occurrence, such as insulin_pos1=50, insulin_pos2=100, insulin_pos3=150, etc.?

Thank you all.

Astounding
PROC Star

A few notes ...

You are probably better off storing the positions as multiple observations (one row per occurrence) rather than as multiple variables. For example:

Obsno  Position
1      50
1      100
2      1
2      150

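If you do end up needing the wide layout from the question (INSULIN_POS1, INSULIN_POS2, ...), one option is to keep the long layout while processing and reshape it with PROC TRANSPOSE only at the end. A minimal sketch, assuming a hypothetical data set named POSITIONS holding the OBSNO and POSITION values from the table (the values are just the illustrative ones above):

```sas
/* Hypothetical data set holding the long layout shown in the table above */
data positions;
   input obsno position;
   datalines;
1 50
1 100
2 1
2 150
;
run;

/* Reshape to one row per note: INSULIN_POS1, INSULIN_POS2, ...          */
/* BY processing requires POSITIONS to be sorted by OBSNO (it is here)   */
proc transpose data=positions out=wide (drop=_name_) prefix=insulin_pos;
   by obsno;
   var position;
run;
```

Notes with fewer matches simply get missing values in the higher-numbered INSULIN_POS columns, which is one reason the long layout is easier to work with.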
 

Use the FIND function. FINDW looks like it could help, but because it matches whole words, you would need to use it twice (once for "insulin" and again for "insulins"). FIND matches any substring, so a single search for "insulin" finds both "insulin" and "insulins".

FIND also has options to ignore upper vs. lower case (the 'i' modifier) and to begin the search at a given position, so each search can start just after the position found by the previous one.
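A small illustration of those two options, using a short made-up string rather than the real note (the string, the data set name FIND_DEMO and the variable names are all hypothetical). The optional third argument is the modifier string and the fourth is the start position:

```sas
data find_demo;
   /* Hypothetical, shortened note used only for illustration */
   note = 'Insulin glargine increased, continue this dose of insulin. INSULIN,ASPART';

   /* 'i' ignores case, so the capitalized "Insulin" at position 1 matches */
   pos1 = find(note, 'insulin', 'i');

   /* Start one character past the first hit to get the next occurrence */
   pos2 = find(note, 'insulin', 'i', pos1 + 1);

   /* Without 'i' the search is case-sensitive: only lowercase "insulin" matches */
   pos_case = find(note, 'insulin');

   put pos1= pos2= pos_case=;   /* log: pos1=1 pos2=51 pos_case=51 */
run;
```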

BochengJing
Fluorite | Level 6

The medical note is very long: it is one integration note stored in a single cell of the table. How do I convert it into multiple observations?

Astounding
PROC Star

Here's the idea of how you can start out.

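data want;
   set have;
   found = 0;               /* so the first search starts at position 1 */
   original_recno = _n_;    /* which note each position came from */
   do i = 1 to 500 until (found = 0);
      /* 'i' ignores case; each search starts just past the previous hit */
      found = find(chart_notes, 'insulin', 'i', found + 1);
      if found > 0 then output;
   end;
run;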

This gives you one observation each time "insulin" is found (upper or lower case doesn't matter). FOUND is the position of that occurrence within the longer string.

The resulting data set has a few flaws.

If "insulin" is not found at all, the note produces no observation, so it drops out of the data set entirely.

It assumes "insulin" will never be found more than 500 times in a single medical note (the upper bound of the DO loop).

The data set may be large, because the full medical note is repeated on every observation where "insulin" is found.

Despite these potential flaws, this gives you something to work with, and you can then consider what the best final form of the data would be.
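One possible next step, sketched under the same assumptions as the code above (an input data set HAVE with a character variable CHART_NOTES). It addresses the three flaws: a note with no match still gets one row with a missing position, the DO WHILE loop has no fixed upper limit (it stops when FIND returns 0), and KEEP= drops the note text from the output. The small HAVE step and the variable name INSULIN_POS are made up for illustration; skip the first step when running against your own HAVE.

```sas
/* Hypothetical input: two short notes standing in for the real chart notes */
data have;
   length chart_notes $ 200;
   chart_notes = 'Insulin glargine increased, continue this dose of insulin.';
   output;
   chart_notes = 'No medication changes today.';
   output;
run;

/* One row per occurrence; only the record number and position are kept */
data want (keep=original_recno insulin_pos);
   set have;
   original_recno = _n_;
   insulin_pos = find(chart_notes, 'insulin', 'i');

   /* No match: write a single row with a missing position */
   if insulin_pos = 0 then do;
      insulin_pos = .;
      output;
   end;

   /* Each FIND starts one character past the previous hit, so positions */
   /* strictly increase and the loop ends when FIND returns 0            */
   /* (a missing INSULIN_POS is not > 0, so the loop is skipped then)    */
   do while (insulin_pos > 0);
      output;
      insulin_pos = find(chart_notes, 'insulin', 'i', insulin_pos + 1);
   end;
run;
```

For the two made-up notes, WANT has three rows: positions 1 and 51 for the first note, and a missing position for the second.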
