BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Christen
Fluorite | Level 6

Hi all! 
Pleas help me with my problem..i want select text from brackets, but i don't khow how to do it.
For example i have 3 values in one variable - var
What i have:
var:  1   I'm falling. (In all the good times I find) myself
        2   Longin (for change) and in the bad times (I fear myself) 

        3   Tell me something, boy (aren't you (tired) tryin' to fill that void?) 

What i want:

        1     In all the good times I find

        2     I fear myself

        3     aren't you (tired) tryin' to fill that void?

 

i wtite something like this, but for second value this dont work..
data New;
set Old;
str_pos = find(var, "(");
end_pos = find(var, ")", -length(var));
len = end_pos - str_pos;
want = substrn(var, str_pos+1, len-1);
run;

for second value my code prodused this: for change) and in the bad times (I fear myself  

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
jimbarbour
Meteorite | Level 14

Why don't you try the following program?

 

Jim

 

DATA	Lyrics;
	INFILE	Datalines	MISSOVER;
	LENGTH	Text		$1026;
	INPUT	Text	&	$;
DATALINES;
I'm falling. (In all the good times I find) myself
Longin (for change) and in the bad times (I fear myself) 
Tell me something, boy (aren't you (tired) tryin' to fill that void?) 
;
RUN;

DATA	Extracted_Text;
	DROP	_:;
	SET	Lyrics;

	_Left_Paren_Cnt						=	COUNTC(Text, '(');

	IF	_Left_Paren_Cnt					>	2					THEN
		DO;
			PUTLOG	'WARNING:  Encountered more left parentheses than expected.  Using whole record.';
		END;
	ELSE
	IF	_Left_Paren_Cnt					=	2					THEN
		DO;
			_First_Right_Paren			=	INDEX(Text, ')');
			_Next_Right_Paren			=	INDEX(SUBSTR(Text, _First_Right_Paren + 1), ')') + _First_Right_Paren;
			_First_Left_Paren			=	INDEX(Text, '(');
			_Next_Left_Paren			=	INDEX(SUBSTR(Text, _First_Left_Paren + 1), '(') + _First_Left_Paren;
			IF	_Next_Left_Paren		<	_First_Right_Paren	THEN
				DO;
					Text				=	SUBSTR(Text, (_First_Left_Paren + 1), (_Next_Right_Paren - _First_Left_Paren - 1));
				END;
			ELSE
				DO;
					Text				=	SUBSTR(Text, (_Next_Left_Paren + 1), (_Next_Right_Paren - _Next_Left_Paren - 1));
				END;
		END;
	ELSE
	IF	_Left_Paren_Cnt					=	1					THEN
		DO;
			Text						=	SUBSTR(Text, INDEX(Text, '(') + 1, INDEX(Text, ')') - INDEX(Text, '(') - 1);
		END;
	ELSE
		DO;
			PUTLOG	'WARNING:  No left parentheses are in the text.  Using whole record.';
		END;
RUN;

View solution in original post

13 REPLIES 13
jimbarbour
Meteorite | Level 14

If there are two parenthetical expressions in one variable, do you consistently want the second one?

 

Jim

Christen
Fluorite | Level 6

Hi Jim! Yes, i need the second one
jimbarbour
Meteorite | Level 14

Why don't you try the following program?

 

Jim

 

DATA	Lyrics;
	INFILE	Datalines	MISSOVER;
	LENGTH	Text		$1026;
	INPUT	Text	&	$;
DATALINES;
I'm falling. (In all the good times I find) myself
Longin (for change) and in the bad times (I fear myself) 
Tell me something, boy (aren't you (tired) tryin' to fill that void?) 
;
RUN;

DATA	Extracted_Text;
	DROP	_:;
	SET	Lyrics;

	_Left_Paren_Cnt						=	COUNTC(Text, '(');

	IF	_Left_Paren_Cnt					>	2					THEN
		DO;
			PUTLOG	'WARNING:  Encountered more left parentheses than expected.  Using whole record.';
		END;
	ELSE
	IF	_Left_Paren_Cnt					=	2					THEN
		DO;
			_First_Right_Paren			=	INDEX(Text, ')');
			_Next_Right_Paren			=	INDEX(SUBSTR(Text, _First_Right_Paren + 1), ')') + _First_Right_Paren;
			_First_Left_Paren			=	INDEX(Text, '(');
			_Next_Left_Paren			=	INDEX(SUBSTR(Text, _First_Left_Paren + 1), '(') + _First_Left_Paren;
			IF	_Next_Left_Paren		<	_First_Right_Paren	THEN
				DO;
					Text				=	SUBSTR(Text, (_First_Left_Paren + 1), (_Next_Right_Paren - _First_Left_Paren - 1));
				END;
			ELSE
				DO;
					Text				=	SUBSTR(Text, (_Next_Left_Paren + 1), (_Next_Right_Paren - _Next_Left_Paren - 1));
				END;
		END;
	ELSE
	IF	_Left_Paren_Cnt					=	1					THEN
		DO;
			Text						=	SUBSTR(Text, INDEX(Text, '(') + 1, INDEX(Text, ')') - INDEX(Text, '(') - 1);
		END;
	ELSE
		DO;
			PUTLOG	'WARNING:  No left parentheses are in the text.  Using whole record.';
		END;
RUN;
Christen
Fluorite | Level 6
Thank you very much, Jim!!!
It is work. But, I saw that the data have errors, for example, an unclosed parenthesis, which looks like this: Longin (for change) and in the bad times (I (fear myself)

Сould you please help me to account for this problem?
In this case i need to get someting like that: (I (fear myself) )
jimbarbour
Meteorite | Level 14

OK, that should not be too hard.  I have some things to do right now, but let me look at this in a little be later.

 

Jim

Christen
Fluorite | Level 6
Ok, thank you! )
jimbarbour
Meteorite | Level 14

Hi, @Christen,

 

Lot's of things to do today, but I finally got a chance to look at the code.  Please try the below revision.

 

@ChrisNZ has a good point that sometimes it's good to clean up the data instead of spending too much time writing overly complex code.  However, do please try the program below and let me know how it goes.

 

Jim

 

%LET	Max_Paren							=	3;
%LET	db									=	*;

DATA	Extracted_Text;
	DROP	_:;
	SET	Lyrics;
	ARRAY	_L_Paren	[*]		_L_Paren1	-	_L_Paren&Max_Paren;
	ARRAY	_R_Paren	[*]		_R_Paren1	-	_R_Paren&Max_Paren;

	LINK	Get_Parens;

	IF	_L_Paren_Cnt						>	1							THEN
		DO;
			IF	_L_Paren[_L_Paren_Cnt]		<	_R_Paren[_L_Paren_Cnt - 1]	THEN
				DO;
					Text					=	SUBSTR(Text, _L_Paren[_L_Paren_Cnt - 1] + 1, _R_Paren[_L_Paren_Cnt] - _L_Paren[_L_Paren_Cnt - 1] - 1);
&db					PUTLOG	"&Nte1  A. "  _L_Paren_Cnt=  _L_Paren[_L_Paren_Cnt - 1]=  _R_Paren[_L_Paren_Cnt]=  Text=;
				END;
			ELSE
				DO;
					Text					=	SUBSTR(Text, _L_Paren[_L_Paren_Cnt] + 1, _R_Paren[_L_Paren_Cnt] - _L_Paren[_L_Paren_Cnt] - 1);
&db					PUTLOG	"&Nte1  B. "  _L_Paren_Cnt=  _L_Paren[_L_Paren_Cnt]=  _R_Paren[_L_Paren_Cnt]=  Text=;
				END;		
		END;
	ELSE
		IF	_L_Paren_Cnt					=	1							THEN
			DO;
				Text						=	SUBSTR(Text, _L_Paren[1] + 1, _R_Paren[1] - _L_Paren[1] - 1);
&db				PUTLOG	"&Nte1  C. "  _L_Paren_Cnt=  _L_Paren[1]=  _R_Paren[1]=  Text=;
			END;
	ELSE
		DO;
			PUTLOG	'WARNING:  No left parentheses are in the text.  Using whole record.';
		END;

	******;
	RETURN;
	******;

	**********;
	Get_Parens:
	**********;
		_Temp								=	LENGTHN(Text);
&db		PUTLOG	"&Nte2  ";
&db		PUTLOG	"&Nte1  "  _N_=  _Temp=  ' (Text length)';

		_L_Paren_Cnt						=	COUNTC(Text, '(');
		_p									=	0;
		DO	_i								=	1	TO	_L_Paren_Cnt	
			UNTIL						(_p	=	0); 
			_p								=	FIND(Text, '(', (_p + 1));
			_L_Paren[_i]					=	_p;
		END;

		_R_Paren_Cnt						=	COUNTC(Text, ')');
		_p									=	0;
		DO	_i								=	1	TO	_R_Paren_Cnt	
			UNTIL						(_p	=	0); 
			_p								=	FIND(Text, ')', (_p + 1));
			_R_Paren[_i]					=	_p;
		END;

		IF	_L_Paren_Cnt					=	_R_Paren_Cnt	+	1	THEN
			DO;
				_R_Paren[_L_Paren_Cnt]		=	LENGTHN(Text)	+	1;
&db				PUTLOG	"&Nte1  D. "  _R_Paren[_L_Paren_Cnt]=  ' (Text length)';
			END;
	******;
	RETURN;
	******;
RUN;
Christen
Fluorite | Level 6
Thank you! It works=)
jimbarbour
Meteorite | Level 14

OK, great. 👍  Perhaps we can mark this topic as solved, then?

 

Jim

ChrisNZ
Tourmaline | Level 20

One way:

data HAVE;
  input STR :& $80.;
cards;
I'm falling. (In all the good times I find) myself
Longin (for change) and in the bad times (I fear myself) 
Tell me something, boy (aren't you (tired) tryin' to fill that void?) 
run;

data WANT;
  set HAVE;
  STR1=prxchange('s/ ( \( [^(]* ) \( ( [^)]* ) \) ( [^)]* \) ) /\1^\2#\3/x',-1,STR);
  STR2=prxchange('s/ .* \( ( .* ) \) [^)(]* \Z/\1/x',1,STR1);
  STR3=translate(STR2,'()','^#');
run;

STR STR1 STR2 STR3
I'm falling. (In all the good times I find) myself I'm falling. (In all the good times I find) myself In all the good times I find In all the good times I find
Longin (for change) and in the bad times (I fear myself) Longin (for change) and in the bad times (I fear myself) I fear myself I fear myself
Tell me something, boy (aren't you (tired) tryin' to fill that void?) Tell me something, boy (aren't you ^tired# tryin' to fill that void?) aren't you ^tired# tryin' to fill that void? aren't you (tired) tryin' to fill that void?

 

Christen
Fluorite | Level 6
Thank you a lot!!!
This code cuts down the program significantly. Could you tell me please where I can read about regular expressions in sas?
Is it possible to get someting like this: (I (fear myself) ) if i have string with issue: Longin (for change) and in the bad times (I (fear myself) ?
ChrisNZ
Tourmaline | Level 20

SAS uses Perl regular expressions. There are countless tutorials on the web, that's how I learnt.
Rather than write crazy complicated code to cater for weird data, it's best to clean the data before processing. 

Christen
Fluorite | Level 6
Thank you)

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1987 views
  • 7 likes
  • 3 in conversation