Hi all!
Pleas help me with my problem..i want select text from brackets, but i don't khow how to do it.
For example i have 3 values in one variable - var
What i have:
var: 1 I'm falling. (In all the good times I find) myself
2 Longin (for change) and in the bad times (I fear myself)
3 Tell me something, boy (aren't you (tired) tryin' to fill that void?)
What i want:
1 In all the good times I find
2 I fear myself
3 aren't you (tired) tryin' to fill that void?
i wtite something like this, but for second value this dont work..
data New;
set Old;
str_pos = find(var, "(");
end_pos = find(var, ")", -length(var));
len = end_pos - str_pos;
want = substrn(var, str_pos+1, len-1);
run;
for second value my code prodused this: for change) and in the bad times (I fear myself
Thank you!
Why don't you try the following program?
Jim
DATA Lyrics;
INFILE Datalines MISSOVER;
LENGTH Text $1026;
INPUT Text & $;
DATALINES;
I'm falling. (In all the good times I find) myself
Longin (for change) and in the bad times (I fear myself)
Tell me something, boy (aren't you (tired) tryin' to fill that void?)
;
RUN;
DATA Extracted_Text;
DROP _:;
SET Lyrics;
_Left_Paren_Cnt = COUNTC(Text, '(');
IF _Left_Paren_Cnt > 2 THEN
DO;
PUTLOG 'WARNING: Encountered more left parentheses than expected. Using whole record.';
END;
ELSE
IF _Left_Paren_Cnt = 2 THEN
DO;
_First_Right_Paren = INDEX(Text, ')');
_Next_Right_Paren = INDEX(SUBSTR(Text, _First_Right_Paren + 1), ')') + _First_Right_Paren;
_First_Left_Paren = INDEX(Text, '(');
_Next_Left_Paren = INDEX(SUBSTR(Text, _First_Left_Paren + 1), '(') + _First_Left_Paren;
IF _Next_Left_Paren < _First_Right_Paren THEN
DO;
Text = SUBSTR(Text, (_First_Left_Paren + 1), (_Next_Right_Paren - _First_Left_Paren - 1));
END;
ELSE
DO;
Text = SUBSTR(Text, (_Next_Left_Paren + 1), (_Next_Right_Paren - _Next_Left_Paren - 1));
END;
END;
ELSE
IF _Left_Paren_Cnt = 1 THEN
DO;
Text = SUBSTR(Text, INDEX(Text, '(') + 1, INDEX(Text, ')') - INDEX(Text, '(') - 1);
END;
ELSE
DO;
PUTLOG 'WARNING: No left parentheses are in the text. Using whole record.';
END;
RUN;
If there are two parenthetical expressions in one variable, do you consistently want the second one?
Jim
Why don't you try the following program?
Jim
DATA Lyrics;
INFILE Datalines MISSOVER;
LENGTH Text $1026;
INPUT Text & $;
DATALINES;
I'm falling. (In all the good times I find) myself
Longin (for change) and in the bad times (I fear myself)
Tell me something, boy (aren't you (tired) tryin' to fill that void?)
;
RUN;
DATA Extracted_Text;
DROP _:;
SET Lyrics;
_Left_Paren_Cnt = COUNTC(Text, '(');
IF _Left_Paren_Cnt > 2 THEN
DO;
PUTLOG 'WARNING: Encountered more left parentheses than expected. Using whole record.';
END;
ELSE
IF _Left_Paren_Cnt = 2 THEN
DO;
_First_Right_Paren = INDEX(Text, ')');
_Next_Right_Paren = INDEX(SUBSTR(Text, _First_Right_Paren + 1), ')') + _First_Right_Paren;
_First_Left_Paren = INDEX(Text, '(');
_Next_Left_Paren = INDEX(SUBSTR(Text, _First_Left_Paren + 1), '(') + _First_Left_Paren;
IF _Next_Left_Paren < _First_Right_Paren THEN
DO;
Text = SUBSTR(Text, (_First_Left_Paren + 1), (_Next_Right_Paren - _First_Left_Paren - 1));
END;
ELSE
DO;
Text = SUBSTR(Text, (_Next_Left_Paren + 1), (_Next_Right_Paren - _Next_Left_Paren - 1));
END;
END;
ELSE
IF _Left_Paren_Cnt = 1 THEN
DO;
Text = SUBSTR(Text, INDEX(Text, '(') + 1, INDEX(Text, ')') - INDEX(Text, '(') - 1);
END;
ELSE
DO;
PUTLOG 'WARNING: No left parentheses are in the text. Using whole record.';
END;
RUN;
OK, that should not be too hard. I have some things to do right now, but let me look at this in a little be later.
Jim
Hi, @Christen,
Lot's of things to do today, but I finally got a chance to look at the code. Please try the below revision.
@ChrisNZ has a good point that sometimes it's good to clean up the data instead of spending too much time writing overly complex code. However, do please try the program below and let me know how it goes.
Jim
%LET Max_Paren = 3;
%LET db = *;
DATA Extracted_Text;
DROP _:;
SET Lyrics;
ARRAY _L_Paren [*] _L_Paren1 - _L_Paren&Max_Paren;
ARRAY _R_Paren [*] _R_Paren1 - _R_Paren&Max_Paren;
LINK Get_Parens;
IF _L_Paren_Cnt > 1 THEN
DO;
IF _L_Paren[_L_Paren_Cnt] < _R_Paren[_L_Paren_Cnt - 1] THEN
DO;
Text = SUBSTR(Text, _L_Paren[_L_Paren_Cnt - 1] + 1, _R_Paren[_L_Paren_Cnt] - _L_Paren[_L_Paren_Cnt - 1] - 1);
&db PUTLOG "&Nte1 A. " _L_Paren_Cnt= _L_Paren[_L_Paren_Cnt - 1]= _R_Paren[_L_Paren_Cnt]= Text=;
END;
ELSE
DO;
Text = SUBSTR(Text, _L_Paren[_L_Paren_Cnt] + 1, _R_Paren[_L_Paren_Cnt] - _L_Paren[_L_Paren_Cnt] - 1);
&db PUTLOG "&Nte1 B. " _L_Paren_Cnt= _L_Paren[_L_Paren_Cnt]= _R_Paren[_L_Paren_Cnt]= Text=;
END;
END;
ELSE
IF _L_Paren_Cnt = 1 THEN
DO;
Text = SUBSTR(Text, _L_Paren[1] + 1, _R_Paren[1] - _L_Paren[1] - 1);
&db PUTLOG "&Nte1 C. " _L_Paren_Cnt= _L_Paren[1]= _R_Paren[1]= Text=;
END;
ELSE
DO;
PUTLOG 'WARNING: No left parentheses are in the text. Using whole record.';
END;
******;
RETURN;
******;
**********;
Get_Parens:
**********;
_Temp = LENGTHN(Text);
&db PUTLOG "&Nte2 ";
&db PUTLOG "&Nte1 " _N_= _Temp= ' (Text length)';
_L_Paren_Cnt = COUNTC(Text, '(');
_p = 0;
DO _i = 1 TO _L_Paren_Cnt
UNTIL (_p = 0);
_p = FIND(Text, '(', (_p + 1));
_L_Paren[_i] = _p;
END;
_R_Paren_Cnt = COUNTC(Text, ')');
_p = 0;
DO _i = 1 TO _R_Paren_Cnt
UNTIL (_p = 0);
_p = FIND(Text, ')', (_p + 1));
_R_Paren[_i] = _p;
END;
IF _L_Paren_Cnt = _R_Paren_Cnt + 1 THEN
DO;
_R_Paren[_L_Paren_Cnt] = LENGTHN(Text) + 1;
&db PUTLOG "&Nte1 D. " _R_Paren[_L_Paren_Cnt]= ' (Text length)';
END;
******;
RETURN;
******;
RUN;
OK, great. 👍 Perhaps we can mark this topic as solved, then?
Jim
One way:
data HAVE;
input STR :& $80.;
cards;
I'm falling. (In all the good times I find) myself
Longin (for change) and in the bad times (I fear myself)
Tell me something, boy (aren't you (tired) tryin' to fill that void?)
run;
data WANT;
set HAVE;
STR1=prxchange('s/ ( \( [^(]* ) \( ( [^)]* ) \) ( [^)]* \) ) /\1^\2#\3/x',-1,STR);
STR2=prxchange('s/ .* \( ( .* ) \) [^)(]* \Z/\1/x',1,STR1);
STR3=translate(STR2,'()','^#');
run;
STR | STR1 | STR2 | STR3 |
---|---|---|---|
I'm falling. (In all the good times I find) myself | I'm falling. (In all the good times I find) myself | In all the good times I find | In all the good times I find |
Longin (for change) and in the bad times (I fear myself) | Longin (for change) and in the bad times (I fear myself) | I fear myself | I fear myself |
Tell me something, boy (aren't you (tired) tryin' to fill that void?) | Tell me something, boy (aren't you ^tired# tryin' to fill that void?) | aren't you ^tired# tryin' to fill that void? | aren't you (tired) tryin' to fill that void? |
SAS uses Perl regular expressions. There are countless tutorials on the web, that's how I learnt.
Rather than write crazy complicated code to cater for weird data, it's best to clean the data before processing.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.