Hi All
data have ;
text="The subject experienced [AESER] adverse event of [AESEV] intensity, reported as '[VBA]' ([LLT]).";
output;
text="[Samsung - the first major smartphone maker to release a foldable smartphone]";
output ;
text="School districts across the [COUNTRY] are on the cusp of integrating [new technology in K-12 classrooms] by such as [SOCIAL_MEDIA] use as early as preschool.exploring unexpected [CURRICULUM] such as Facebook/Twitter use as early as preschool.";
output;
run ;
I want to split the entire text into multiple variables at the square brackets ([ ]). Each bracketed piece, including its opening and closing brackets, should go into its own variable, as shown below. I tried a DO loop with an array, but something is missing in the loop. Can anyone suggest how to perform this split?
VAR1 | VAR2 | VAR3 | VAR4 | VAR5 | VAR6 | VAR7 | VAR8 |
The subject experienced | [AESER] | adverse event of | [AESEV] | intensity, reported as | [VBA]' | ([LLT]) | |
[Samsung - the first major smartphone maker to release a foldable smartphone] | | | | | | | |
School districts across the | [COUNTRY] | are on the cusp of integrating | [new technology in K-12 classrooms] | by such as | [SOCIAL_MEDIA] | use as early as preschool.exploring unexpected | such as Facebook/Twitter use as early as preschool. |
Some of your requirements are moderately odd: why does [VBA]' include the ' as part of the value, and why does ([LLT]) include both parentheses?
Perhaps if you describe how the result will actually be used, we can make some additional suggestions.
This may help you get started, but your rules for when to include the [] as part of the value need a lot more explanation.
Note that your example data set cuts off the third example row: you did not define a length for TEXT, so the first assignment sets the length of the variable.
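The truncation described above can be reproduced with a minimal sketch like the following. The data set name `demo` and its values are made up for illustration and are not part of the original post.

```sas
/* Hypothetical illustration: without a LENGTH statement, the first
   assignment fixes the length of TEXT (here $5). */
data demo;
text="short";
output;
text="much longer value"; /* stored truncated to 5 characters */
output;
run;

/* Declaring the length first avoids the truncation. */
data demo_fixed;
length text $ 50;
text="short";
output;
text="much longer value";
output;
run;
```

Running PROC CONTENTS on both data sets should show the difference in the length of TEXT.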
data have ;
length text $ 1000;
text="The subject experienced [AESER] adverse event of [AESEV] intensity, reported as '[VBA]' ([LLT]).";
output;
text="[Samsung - the first major smartphone maker to release a foldable smartphone]";
output ;
text="School districts across the [COUNTRY] are on the cusp of integrating [new technology in K-12 classrooms] by such as [SOCIAL_MEDIA] use as early as preschool.exploring unexpected [CURRICULUM] such as Facebook/Twitter use as early as preschool.";
output;
run;
data need (keep= row phrase);
set have;
row = _n_;
length phrase $ 100;
do i= 1 to countw(text,'[]');
phrase = scan(text,i,'[]');
output;
end;
run;
proc transpose data=need out=trans prefix=var ;
by row;
var phrase;
run;
Thank you @ballardw @noling @andreas_lds for your quick responses. The inconsistencies in the output I displayed were typos on my part. The concept is to split the text at the [] (square brackets), and all three solutions work fine.
The @ballardw solution is intuitive, but I have one question. Because the code uses the SCAN function, phrase = scan(text,i,'[]');, the output excludes the square brackets. The phrase (text) should include the square brackets, as shown below.
Row | Phrase | Required Output |
1 | The subject experienced | The subject experienced |
1 | AESER | [AESER] |
1 | adverse event of | adverse event of |
1 | AESEV | [AESEV] |
1 | intensity, reported as ' | intensity, reported as ' |
1 | VBA | [VBA] |
1 | ' ( | ' ( |
1 | LLT | [LLT] |
1 | ). | ). |
2 | Samsung - the first major smartphone maker to release a foldable smartphone | [Samsung - the first major smartphone maker to release a foldable smartphone] |
The solutions provide a good starting point.
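One way the SCAN-based step might be adapted to keep the brackets is sketched below. It is not from any of the posted answers. It assumes the brackets are well formed and not nested, that the `have` data set from the thread exists, and it uses CALL SCAN to get each word's position and length so the surrounding characters can be checked. The variable names `pos` and `len` and the phrase length of $ 200 are illustrative choices.

```sas
/* Sketch only: assumes well-formed, non-nested square brackets. */
data need (keep= row phrase);
set have;
row = _n_;
length phrase $ 200;
do i = 1 to countw(text,'[]');
/* CALL SCAN returns the start position and length of the i-th word */
call scan(text, i, pos, len, '[]');
/* If the word sits between [ and ], widen the substring to include them */
if char(text, pos-1) = '[' and char(text, pos+len) = ']' then
phrase = substr(text, pos-1, len+2);
else phrase = substr(text, pos, len);
output;
end;
run;
```

The PROC TRANSPOSE step from the earlier answer can then be applied to `need` unchanged. Pieces outside the brackets keep their leading blanks, as they do with the original SCAN call; wrapping them in STRIP would remove those.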
I can sort of see a potential use for the single "word" brackets. But without a real-world description of why that last entire phrase is in [] when everything else is a single "word", I am not going to spend any time trying to parse an incomplete problem description. The earlier examples, with ([LLT]) and the ' being included with the brackets, did not help.
data have ;
length text $1000;
text="The subject experienced [AESER] adverse event of [AESEV] intensity, reported as '[VBA]' ([LLT]).";
output;
text="[Samsung - the first major smartphone maker to release a foldable smartphone]";
output ;
text="School districts across the [COUNTRY] are on the cusp of integrating [new technology in K-12 classrooms] by such as [SOCIAL_MEDIA] use as early as preschool.exploring unexpected [CURRICULUM] such as Facebook/Twitter use as early as preschool.";
output;
run ;
data want;
set have;
length output_text $1000 char $1;
array vars{100} $1000 var1-var100;
output_text="";
j=0; *number of output var;
k=0; *char for output string;
do i = 1 to 1000; *loop across original text;
k + 1;
char=substr(text,i,1);
substr(output_text,k,1)=strip(char);
if char = "[" and output_text ne "[" then do;
j+1;
vars{j}=substr(output_text,1,length(output_text)-1);
output_text=strip("[");
k=1;
end;
if char = "]" then do;
j+1;
k=0;
vars{j}=strip(output_text);
output_text="";
end;
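end;
if not missing(output_text) then do; *store any text left after the last bracket;
j+1;
vars{j}=strip(output_text);
end;
run;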
Ballardw's code is probably more intuitive.
Some of the values seem to be inconsistent:
Here is another suggestion to solve the issue:
data work.intermediate;
set have;
length
id 8
part $ 400
_start _stop _pos _len 8
;
drop _: text;
id = _n_;
rx = prxparse('/(.*?)(\(?\[.+?\]\)?)/');
_start = 1;
_stop = length(text);
put _stop=;
call prxnext(rx, _start, _stop, text, _pos, _len);
do while (_pos > 0);
do i = 1 to 3;
part = prxposn(rx, i, text);
if not missing(part) then output;
end;
call prxnext(rx, _start, _stop, text, _pos, _len);
end;
put _start= _stop= _len= _pos=;
if _start < _stop then do;
part = substr(text, _start);
output;
end;
run;
proc transpose data=work.intermediate out=work.want(drop=id _name_) prefix=var;
by id;
var part;
run;