bayzid
Obsidian | Level 7

Hi,
How can I extract part of a character variable including the delimiter ("_")? For example I want to extract "MyGroup_" from the value "MyGroup_152".

Accepted Solutions
SASKiwi
PROC Star
data Want;
  String = 'MyGroup_152';
  SubString = substr(String, 1, find(String,'_'));
  put _all_;
run;
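
A minimal sketch of the same approach with a guard for values that contain no underscore. FIND returns 0 when the delimiter is absent, and a zero length is not a valid third argument for SUBSTR, so the call is skipped in that case. The second sample value and the Pos variable are made up for illustration:

```sas
data Want;
  length String SubString $20;
  input String $;
  /* FIND returns the position of the first underscore, or 0 if there is none */
  Pos = find(String, '_');
  /* Only call SUBSTR when the delimiter exists; otherwise SubString stays blank */
  if Pos > 0 then SubString = substr(String, 1, Pos);
  put String= SubString=;
  datalines;
MyGroup_152
NoDelimiter
;
run;
```

For the first row this keeps everything up to and including the underscore; for the second row SubString is left blank.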

bayzid
Obsidian | Level 7

Thanks very much. It worked. I don't need the put _all_; statement.
SASKiwi
PROC Star

PUT _ALL_ is just to see the results in the log 🙂.

AhmedAl_Attar
Ammonite | Level 13

@bayzid 

This will find all occurrences:

data _null_;
	x='MyGroup_152 YourGroup_1230 pref_OurGroup_232';
	start=1;
	stop=length(x);
	pattern_id=prxparse('/[a-zA-Z]+_/i');
	call prxnext(pattern_id, start, stop, x, position, length);
	do while(position>0);
		str=substr(x, position, length);
		call prxnext(pattern_id, start, stop, x, position, length);
		put str=;
	end;
run;
bayzid
Obsidian | Level 7

How can I extract a keyword from a list of words in a variable? For example, if the variable x = 'MyGroup_152 YourGroup_1230 pref_OurGroup_232', I want to extract whichever of "MyGroup", "Your" and "Our" appears first and save it into a different variable.

 

AhmedAl_Attar
Ammonite | Level 13

Hi @bayzid 

Is this what you are looking for?

data want(KEEP=x my our your);
	x='YourGroup_1230 pref_OurGroup_232 MyGroup_152';
	array words {3} 3 my your our;
	pattern_id=prxparse('/(My|Your|Our)/i');
	start=1;
	stop=length(x);
	call prxnext(pattern_id, start, stop, x, position, length);
	do while(position>0);
		str=substr(x, position, length);
		call prxnext(pattern_id, start, stop, x, position, length);
		put str=;
		found=0;
		do v=1 to dim(words) until(found);
			if ( lowcase(str) = lowcase(vname(words[v])) ) then
			do;
				found=1;
				words[v]=1;
			end;
		end;
	end;
run;
BayzidurRahman
Obsidian | Level 7

I have a variable x in my dataset and want to create a variable extract as shown below.

[Attached image: BayzidurRahman_0-1684011860784.png]

 

AhmedAl_Attar
Ammonite | Level 13

This should do it:

data want(KEEP=x extract);
	x='YourGroup_1230 pref_OurGroup_232 MyGroup_152';
	/* Compile the pattern; the i modifier makes the match case-insensitive */
	pattern_id=prxparse('/(My|Your|Our)/i');
	start=1;
	stop=length(x);
	/* A single PRXNEXT call returns the first match only */
	call prxnext(pattern_id, start, stop, x, position, length);
	if (position>0) then
		extract=substr(x, position, length);
run;
BayzidurRahman
Obsidian | Level 7

Thanks. My situation is a little more complicated. The keywords I want to extract can contain spaces, must not be part of a bigger word, and can end with a dot or comma.

[Attached image: BayzidurRahman_0-1684019551382.png]

 

AhmedAl_Attar
Ammonite | Level 13

@BayzidurRahman 

We can only go by the sample of data you provided in your post.

In any case, we have provided you with multiple ways to search for and extract the words you are looking for. Now it's your turn to read the docs and extend or customize any of these solutions to fit your needs. Otherwise you'll never learn and expand your skill set.

 

Hope this helps

BayzidurRahman
Obsidian | Level 7

No problem. I have sorted it out. I got rid of the ARRAY statement because it was returning an error message for a large number of words.

data ht (keep=ResidentId Fac_id Diagnosis caps ht Hypertension htc);
	set ehr;
	caps = compbl(tranwrd(caps, ",", " ,"));
	caps = compbl(tranwrd(caps, ".", " ."));
	caps = compbl(tranwrd(caps, ";", " ;"));
	caps='     '||caps;
	pattern_id=prxparse('/( HTN| HT| HYPTENSION| HIGH BP| HIGH BLOOD P| HBP| HYPERTENSION)/i');
	start=1;
	stop=length(caps);
	call prxnext(pattern_id, start, stop, caps, position, length);
	if (position>0) then
		htc=substr(caps, position, length);
	ht=(htc ne "");
run;
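
The leading-space trick above could also be written with the \b word-boundary anchor, which leaves caps unmodified. This is only a sketch under assumptions: the three sample rows and the names pos and len are made up for illustration, and the keyword list is copied from the step above. Only HT gets a trailing \b here, so that HIGH BLOOD P can still match the start of a longer phrase.

```sas
data _null_;
  infile datalines truncover;
  length caps $80 htc $30;
  input caps $char80.;
  /* \b matches at a word boundary, so HT is not found inside WEIGHT */
  pattern_id = prxparse('/\b(HYPERTENSION|HYPTENSION|HIGH BLOOD P|HIGH BP|HTN|HBP|HT\b)/i');
  /* PRXSUBSTR returns the position and length of the first match, or 0 if none */
  call prxsubstr(pattern_id, caps, pos, len);
  if pos > 0 then htc = substr(caps, pos, len);
  ht = (pos > 0);
  put caps= htc= ht=;
  datalines;
HX OF HTN, ON MEDS
WEIGHT LOSS
HIGH BLOOD PRESSURE.
;
run;
```

Because the boundary is part of the pattern, the commas, dots and semicolons do not need to be padded with spaces first.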
bayzid
Obsidian | Level 7

Some of the keywords contain a hyphen or a forward slash, which causes an error message.

pattern_id=prxparse('/(DEMENTIA|ALZH|DEMENTIA|DEMENTIA - ALZH|DEMETIA|DEMENI|DEMENTAI|H/O DEMENETIA)/i');


ERROR: Invalid characters "DEMENETIA)/i" after end delimiter "/" of regular expression
       "/(DEMENTIA|ALZH|DEMENTIA|DEMENTIA - ALZH|DEMETIA|DEMENI|DEMENTAI|H/O DEMENETIA)/i".
ERROR: The regular expression passed to the function PRXPARSE contains a syntax error.
NOTE: Argument 1 to function PRXPARSE('/(DEMENTIA|A'[12 of 81 characters shown]) at line 24703
      column 13 is invalid.
NOTE: Argument 1 to the function PRXNEXT is missing.
ERROR: Argument 1 to the function PRXNEXT must be a positive integer returned by PRXPARSE for
       a valid pattern.

Is there any way to get around this problem?

Tom
Super User

You can use a backslash to "escape" the next character. So if your pattern contains a character that happens to be a special character in RegEx, just prefix it with a backslash.

So in your example it is the / that is causing trouble.

So fix it like this:

"/(DEMENTIA|ALZH|DEMENTIA|DEMENTIA - ALZH|DEMETIA|DEMENI|DEMENTAI|H\/O DEMENETIA)/i"
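
A short sketch of the escaped pattern in use, assuming a shortened keyword list and made-up sample rows. It also lists the longer alternative DEMENTIA - ALZH before DEMENTIA, because Perl-style alternation tries the alternatives from left to right and would otherwise stop at the shorter keyword:

```sas
data _null_;
  infile datalines truncover;
  length caps $60 keyword $30;
  input caps $char60.;
  /* The / inside H/O is escaped as \/ so it is not read as the closing delimiter */
  pattern_id = prxparse('/(DEMENTIA - ALZH|H\/O DEMENETIA|DEMENTIA|ALZH)/i');
  /* PRXMATCH returns the match position (0 if none); PRXPOSN returns capture buffer 1 */
  if prxmatch(pattern_id, caps) then keyword = prxposn(pattern_id, 1, caps);
  put caps= keyword=;
  datalines;
PT HAS H/O DEMENETIA, STABLE
DEMENTIA - ALZH.
NO RELEVANT HISTORY
;
run;
```

The hyphen needs no escaping here, since it is only special inside a character class such as [a-z].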
bayzid
Obsidian | Level 7

Thanks. That worked.
