Hello SAS community.
I am trying to extract a text segment based on a keyword from long, unstandardized text strings. I need the text string to include 30 characters prior to the key word and 50 characters after the key word. I am using the following code to do this. The starting position of the extracted string appears to be accurate, but the end point varies dramatically. I also get many warning notes in the log. I've tried any number of parenthetical configurations around the FIND arguments to no avail. Does anyone have any advice?
[jobtext is the variable that I am searching through. Experience is the key word that I'm looking for.]
extract= substr(jobtext,FIND(jobtext,"experience",'i')-20, FIND(jobtext,"experience",'i')+30);
Thank you,
Brandi
From your description my guess is that you substring outside the variable boundaries. (Supplying any error /warning messages helps).
For the length parameter try to use a combination of length() and min() function.
It sounds like you are likely getting values for your extract variable but possibly not exactly what you want, correct?
What types of warnings are you getting?
Is there any pattern that actually indicates the end of the characters after the word that you want that might help?
It may be helpful to post some example data that demonstrates the issues.
Yes, I'm getting something, but not getting exactly what I want.
The error message is "Invalid second argument to function SUBSTR." This only happens when I add the '-20' after the FIND argument.
I am not setting the parameters outside of my variable. The extracted portion cuts off before the end of the variable (they are verrrrry long text strings).
I am thinking that the issue has to do with my adding '-20' and '+30' to the FIND argument but I don't know of another way to capture the text around my keyword. Unfortunately there are no patterns that I can use to set an end value.
How would I use length and min within the FIND function?
Thanks for all of the feedback. I appreciate the assistance.
I'm not sure if I can/should post my raw data here. I'm searching through job postings, so the text is long and free-form. Here's an example of one of my extracts:
"specialty. previous experience in geriatric primary care, or icu/er rn is preferred. knowledge of medicare reimbursement and coding, etc., and electronic medical records used for documentation. excellent oral and written communication skills, experienced working in a collaborative care team environment and functioning autonomously with minimal supervision. ability to provide exceptional customer service to patients about medical home team: medical home team is a healthcare management company with a unique medical practice model; aligning office-based physicians with specia"
Thanks again!
FIND(jobtext,"experience",'i')-20
If FIND() is < 20 then argument becomes negative.
Use ifn to return the appropriate value instead.
From your description my guess is that you substring outside the variable boundaries. (Supplying any error /warning messages helps).
For the length parameter try to use a combination of length() and min() function.
Oh, you mean just set the length of the extract variable to what I want. Of course. Thanks. That works. I'd still like to know why the original syntax is incorrect or if there's a better way to extract text around a keyword. The error messages are arising on cases where the argument is valid (i.e.I'm not asking SAS to return a value before the beginning/after the end).
Q 1. "why the original syntax is incorrect",
A 1. Please RTM of SUBSTR(). The third element is length of the extract, not the position of ending point.
Q 2. " if there's a better way",
A 2. Not sure about that, but there are definitely alternatives, for one, using PRX functions:
extract=prxchange('s/.+(.{20}experience.{30}).+/$1/io', -1, jobtext);
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.