data ex_5;
  length parts $1000;
  set text;
  do i=1 to length(text)-200;
    parts=substr(text, i, 200);
    last_word=scan(parts, countw(parts));
    k=lengthn(last_word);
    if find(text, last_word) then do;
      temp=200;
    end;
    else do;
      temp=200 - length(last_word);
    end;
    i+(temp-1);
    output;
    n+1;
  end;
run;
data final;
  set ex_5;
  len=length(parts) - length(last_word);
  if find(text, last_word) then parts1=parts;
  else parts1=substr(parts, 1, len);
  drop parts i len temp;
run;
proc transpose data=final out=final_final;
  id last_word;
  var parts1;
run;
I tried this code, but it has some issues, such as the find
function not always working.
Please do not double-post.
Here is a quick example of splitting along word boundaries:
data have;
text = "Very long text with a lot of words, which should be split along word boundaries into chunks of a given size";
run;
%let chunk = 50; /* length of individual pieces */
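data want;
  set have;
  length part $&chunk.;
  /* keep going until the remaining text is used up */
  do while (lengthn(text) > 0);
    /* append words until the next one would no longer fit */
    do until (
      length(part) + length(scan(text,1," ")) + 1 > &chunk.
      or lengthn(text) = 0
    );
      part = catx(" ",part,scan(text,1," "));
      /* remove the word just copied from the front of text */
      text = substr(text,indexc(text," ") + 1);
    end;
    output;
    part = "";
  end;
  drop text;
run;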
I would make a copy of the original string, then, starting at character 201, search backwards for a blank. Copy the characters before that blank from the copy to the subtext variable (length $200), shift the copy leftwards past the blank, and repeat until the copy is blank:
data want (drop=_: i);
  set have;
  length subtext $200;
  _copy=original;
  do part=1 by 1 while (_copy^=' ');
    do i=201 by -1 while (char(_copy,i)^=' ');
    end;
    subtext=substr(_copy,1,i-1);
    _copy=substr(_copy,i+1);
    output;
  end;
run;
Note that this program assumes there is always a blank somewhere from character 2 through character 201 after each modification of the variable _COPY.
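The assumption about a blank in positions 2 through 201 can be relaxed. The following is a minimal, untested sketch rather than the program above: it assumes the same data set `have` with a character variable `original`, left-aligns the copy with LEFT, stops the backward search at position 2, and falls back to a hard cut after 200 characters when no blank is found. SUBSTRN is swapped in for SUBSTR so that an out-of-range position or length gives an empty or shortened result instead of an invalid-argument note.

```sas
data want (drop=_: i);
  set have;                                  /* assumes a character variable ORIGINAL */
  length subtext $200;
  _copy = left(original);                    /* drop leading blanks */
  do part = 1 by 1 while (_copy ^= ' ');
    /* search backwards from 201, but never below position 2 */
    do i = 201 to 2 by -1 while (char(_copy, i) ^= ' ');
    end;
    if i >= 2 then do;                       /* blank found at position i */
      subtext = substrn(_copy, 1, i - 1);
      _copy   = left(substrn(_copy, i + 1));
    end;
    else do;                                 /* no blank found: hard cut */
      subtext = substrn(_copy, 1, 200);
      _copy   = left(substrn(_copy, 201));
    end;
    output;
  end;
run;
```

With this variant, a single word longer than 200 characters should be split mid-word instead of driving the index I down to 0.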
Using a regular expression:
data have;
text = "Very long text with a lot of words, which should be split along word boundaries into chunks of a given size";
run;
%let chunk = 50; /* length of individual pieces */
data want;
  set have;
  length part $ &chunk.;
  retain rx;
  drop rx;
  if _n_ = 1 then do;
    /* Don't add a space after the comma in the following statement! */
    rx = prxparse("/(.{0,&chunk.}\b)/");
  end;
  begin = 1;
  end = -1;
  pos = 0;
  len = 0;
  call prxnext(rx, begin, end, trim(text), pos, len);
  do while (pos > 0 and len > 0);
    part = substr(text, pos, len);
    output;
    call prxnext(rx, begin, end, trim(text), pos, len);
  end;
run;