data ex_5;
  length parts $1000;
  set text;
  do i=1 to length(text)-200;
    parts=substr(text, i, 200);
    last_word=scan(parts, countw(parts));
    k=lengthn(last_word);
    if find(text, last_word) then do;
      temp=200;
    end;
    else do;
      temp=200 - length(last_word);
    end;
    i+(temp-1);
    output;
    n+1;
  end;
run;

data final;
  set ex_5;
  len=length(parts) - length(last_word);
  if find(text, last_word) then parts1=parts;
  else parts1=substr(parts, 1, len);
  drop parts i len temp;
run;

proc transpose data=final out=final_final;
  id last_word;
  var parts1;
run;
I tried this code, but it has some issues, such as the find
function not always working.
Please do not double-post.
Here is a quick example for splitting along word boundaries:
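data have;
  text = "Very long text with a lot of words, which should be split along word boundaries into chunks of a given size";
run;

%let chunk = 50; /* length of individual pieces */

data want;
  set have;
  length part $&chunk.;
  text = left(text); /* the first word must start in column 1 */
  do while (lengthn(text) > 0);
    /* Append words until the next one would overflow the chunk or the text is used up */
    do until (
      length(part) + length(scan(text,1," ")) + 1 > &chunk.
      or lengthn(text) = 0
    );
      part = catx(" ",part,scan(text,1," "));
      /* Drop the word just copied. SUBSTRN tolerates a start position */
      /* past the end of the string; LEFT removes any repeated blanks. */
      text = left(substrn(text,lengthn(scan(text,1," ")) + 1));
    end;
    output;
    part = "";
  end;
  drop text;
run;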
I would make a copy of the original string, then, starting at character 201, search backwards for a blank. Copy the text before that blank from the copy to the subtext variable (length $200), shift the copy leftwards by the identified number of characters, and repeat until the copy is blank:
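data want (drop=_: i);
  set have;
  length subtext $200;
  _copy=original;  /* working copy of the long source string */
  do part=1 by 1 while (_copy^=' ');
    /* Step back from character 201 until a blank is found */
    do i=201 by -1 while (char(_copy,i)^=' ');
    end;
    subtext=substr(_copy,1,i-1);  /* everything before that blank */
    _copy=substr(_copy,i+1);      /* shift the remainder to the left */
    output;
  end;
run;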
Note that this program assumes there is always a blank somewhere between characters 2 and 201 of _COPY, after each modification of that variable.
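If that assumption might not hold (for example, when the text contains a token longer than 200 characters), the backward search can be bounded and given a fallback. The following is only a minimal sketch, not part of the answer above. It assumes the source variable is called text, sizes _copy at $32767 so that SUBSTR never starts past the end of the string, and simply hard-splits after 200 characters when no blank is found. The HAVE data set is made up for illustration.

```sas
/* Hypothetical test data: a run of short words followed by */
/* one 300-character token that contains no blanks          */
data have;
  length text $1000;
  text = repeat("word ", 100) || repeat("x", 299);
run;

data want (drop=_: i);
  set have;
  length subtext $200 _copy $32767;
  _copy = left(text);
  do part = 1 by 1 while (_copy ne ' ');
    if lengthn(_copy) <= 200 then do;
      /* The remainder fits into one chunk */
      i = lengthn(_copy) + 1;
    end;
    else do;
      /* Search backwards from character 201, but stop at character 2 */
      do i = 201 to 2 by -1 while (char(_copy, i) ne ' ');
      end;
      /* Loop ran out without finding a blank: hard split after 200 characters */
      if i < 2 then i = 201;
    end;
    subtext = substr(_copy, 1, i - 1);
    /* Start at position i so no character is lost on a hard split; */
    /* LEFT removes the blank the chunk was split at                */
    _copy = left(substr(_copy, i));
    output;
  end;
run;
```

The design choice here is to prefer a clean word boundary and fall back to a plain cut only when there is no boundary in the window. Whether cutting a long token is acceptable depends on the data.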
Using a regular expression:
data have;
  text = "Very long text with a lot of words, which should be split along word boundaries into chunks of a given size";
run;

%let chunk = 50; /* length of individual pieces */

data want;
  set have;
  length part $ &chunk.;
  retain rx;
  drop rx;
  if _n_ = 1 then do;
    /* Compile the pattern once: up to chunk characters, ending at a word boundary */
    /* Don't add a space after the comma in the following statement! */
    rx = prxparse("/(.{0,&chunk.}\b)/");
  end;
  /* Search the whole string: start at 1; a stop value of -1 means "to the end" */
  begin = 1;
  end = -1;
  pos = 0;
  len = 0;
  call prxnext(rx, begin, end, trim(text), pos, len);
  /* Each successful match is one chunk; PRXNEXT moves begin past it */
  do while (pos > 0 and len > 0);
    part = substr(text, pos, len);
    output;
    call prxnext(rx, begin, end, trim(text), pos, len);
  end;
run;