Hi,
How can I extract part of a character variable including the delimiter ("_")? For example I want to extract "MyGroup_" from the value "MyGroup_152".
data Want;
String = 'MyGroup_152';
SubString = substr(String, 1, find(String,'_'));
put _all_;
run;
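One thing to watch for: FIND returns 0 when the value has no underscore, and SUBSTR then complains in the log about an invalid length argument. A minimal sketch of a guarded version, assuming a made-up second test value ('NoDelimiter') and a helper variable pos that are not part of the original question:

```sas
data Want;
   length String SubString $40;
   /* 'NoDelimiter' is a hypothetical value used only to exercise the guard */
   do String = 'MyGroup_152', 'NoDelimiter';
      pos = find(String, '_');
      /* Only call SUBSTR when an underscore was actually found */
      if pos > 0 then SubString = substr(String, 1, pos);
      else SubString = ' ';
      output;
   end;
   drop pos;
run;
```

The second row should simply come out with a blank SubString instead of raising a note in the log.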
Thanks very much. It worked. I don't need the put _all_; statement.
PUT _ALL_ is just to see the results in the log 🙂.
This will find all occurrences:
data _null_;
x='MyGroup_152 YourGroup_1230 pref_OurGroup_232';
start=1;
stop=length(x);
pattern_id=prxparse('/[a-zA-Z]+_/i');
call prxnext(pattern_id, start, stop, x, position, length);
do while(position>0);
str=substr(x, position, length);
call prxnext(pattern_id, start, stop, x, position, length);
put str=;
end;
run;
How can I extract a keyword from a list of words in a variable? For example, if the variable x = 'MyGroup_152 YourGroup_1230 pref_OurGroup_232', I want to extract whichever of "MyGroup", "Your" and "Our" appears first and save it in a different variable.
Hi @bayzid
Is this what you are looking for?
data want(KEEP=x my our your);
x='YourGroup_1230 pref_OurGroup_232 MyGroup_152';
array words {3} 3 my your our;
pattern_id=prxparse('/(My|Your|Our)/i');
start=1;
stop=length(x);
call prxnext(pattern_id, start, stop, x, position, length);
do while(position>0);
str=substr(x, position, length);
call prxnext(pattern_id, start, stop, x, position, length);
put str=;
found=0;
do v=1 to dim(words) until(found);
if ( lowcase(str) = lowcase(vname(words[v])) ) then
do;
found=1;
words[v]=1;
end;
end;
end;
run;
I have the variable x in my dataset and want to create the variable extract as below.
This should do it:
data want(KEEP=x extract);
x='YourGroup_1230 pref_OurGroup_232 MyGroup_152';
array words {3} 3 my your our;
pattern_id=prxparse('/(My|Your|Our)/i');
start=1;
stop=length(x);
call prxnext(pattern_id, start, stop, x, position, length);
if (position>0)then
extract=substr(x, position, length);
run;
Thanks. My situation is a little more complicated. The keywords I want to extract can contain spaces, must not be part of a longer word, and can be followed by a dot or a comma.
We can only go by the sample of data you provided in your post.
In any case, we have provided you with multiple ways to search for and extract the words you are looking for. Now it's your turn to read the docs and extend or customize any of these solutions to fit your needs. Otherwise you'll never learn and expand your skill set.
Hope this helps
No problem. I have sorted it out. I got rid of the ARRAY statement because it was returning an error message for a large number of words.
data ht (keep=ResidentId Fac_id Diagnosis caps ht Hypertension htc);
set ehr;
caps = compbl(tranwrd(caps, ",", " ,"));
caps = compbl(tranwrd(caps, ".", " ."));
caps = compbl(tranwrd(caps, ";", " ;"));
caps=' '||caps;
pattern_id=prxparse('/( HTN| HT| HYPTENSION| HIGH BP| HIGH BLOOD P| HBP| HYPERTENSION)/i');
start=1;
stop=length(caps);
call prxnext(pattern_id, start, stop, caps, position, length);
if (position>0)then
htc=substr(caps, position, length);
ht=(htc ne "");
run;
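As a possible alternative to padding the text with spaces, the word-boundary anchor \b can keep a keyword from matching inside a longer word, and it also treats a following dot or comma as a boundary. A minimal sketch, assuming a made-up sample string and a shortened keyword list (neither comes from the real ehr data):

```sas
data _null_;
   /* Hypothetical sample text: HT occurs inside LIGHT and must not match */
   caps = 'LIGHT ACTIVITY, HX OF HTN.';
   /* \b requires a word boundary on both sides of the keyword */
   pattern_id = prxparse('/\b(HTN|HT|HIGH BP|HBP|HYPERTENSION)\b/i');
   call prxsubstr(pattern_id, caps, position, length);
   if position > 0 then htc = substr(caps, position, length);
   ht = (position > 0);
   put htc= ht=;
run;
```

Note that a trailing \b would stop a deliberately truncated keyword such as HIGH BLOOD P from matching HIGH BLOOD PRESSURE, so for prefixes like that the closing \b would have to be left off.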
Some of the keywords contain a hyphen or a forward slash, which returns an error message.
pattern_id=prxparse('/(DEMENTIA|ALZH|DEMENTIA|DEMENTIA - ALZH|DEMETIA|DEMENI|DEMENTAI|H/O DEMENETIA)/i');
ERROR: Invalid characters "DEMENETIA)/i" after end delimiter "/" of regular expression
"/(DEMENTIA|ALZH|DEMENTIA|DEMENTIA - ALZH|DEMETIA|DEMENI|DEMENTAI|H/O DEMENETIA)/i".
ERROR: The regular expression passed to the function PRXPARSE contains a syntax error.
NOTE: Argument 1 to function PRXPARSE('/(DEMENTIA|A'[12 of 81 characters shown]) at line 24703
column 13 is invalid.
NOTE: Argument 1 to the function PRXNEXT is missing.
ERROR: Argument 1 to the function PRXNEXT must be a positive integer returned by PRXPARSE for
a valid pattern.
Is there any way to get around this problem?
You can use a backslash to "escape" the next character. So if your pattern contains a character that is special to RegEx, just prefix it with a backslash.
So in your example it is the / that is causing trouble.
So fix it like this:
"/(DEMENTIA|ALZH|DEMENTIA|DEMENTIA - ALZH|DEMETIA|DEMENI|DEMENTAI|H\/O DEMENETIA)/i"
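To see the escaped pattern at work, here is a minimal sketch, assuming a made-up sample value and a shortened keyword list. The hyphen needs no escaping outside a character class. Also, since Perl alternation tries the alternatives from left to right, the longer DEMENTIA - ALZH is listed before DEMENTIA here so that the shorter word does not win first:

```sas
data _null_;
   /* Hypothetical sample text, not taken from the real data */
   text = 'PT HAS H/O DEMENETIA';
   /* \/ is a literal slash, so it is not read as the closing delimiter */
   pattern_id = prxparse('/(DEMENTIA - ALZH|DEMENTIA|ALZH|H\/O DEMENETIA)/i');
   call prxsubstr(pattern_id, text, position, length);
   if position > 0 then found = substr(text, position, length);
   put found=;
run;
```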
Thanks. That worked.