aaaaaaaaaaaaa10
Calcite | Level 5

data ex_5;
	length parts $1000;
	set text;

	do i=1 to length(text)-200;
		parts=substr(text, i, 200);
		last_word=scan(parts, countw(parts));
		k=lengthn(last_word);

		if find(text, last_word) then
			do;
				temp=200;
			end;
		else
			do;
				temp=200 - length(last_word);
			end;
		i+(temp-1);
		output;
		n+1;
	end;
run;

data final;
	set ex_5;
	len=length(parts) - length(last_word);

	if find(text, last_word) then
		parts1=parts;
	else
		parts1=substr(parts, 1, len);
	drop parts i len temp;
run;

proc transpose data=final out=final_final;
	id last_word;
	var parts1;
run;

I tried this code, but it has some issues, such as the find function not always working.
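A likely cause of the FIND problem: LAST_WORD is created by SCAN as a fixed-length character variable, so it is stored with trailing blanks. Without a modifier, FIND treats those blanks as part of the search string, so the search only succeeds where the word happens to be followed by enough blanks, typically at the very end of TEXT. The small demo below illustrates this; the data set FIND_DEMO and the variables POS_PLAIN, POS_T and POS_STRIP are hypothetical names used only for the illustration.

```sas
/* Hypothetical demo data: TEXT and LAST_WORD are fixed-length variables,  */
/* so both are padded with trailing blanks, like LAST_WORD created by SCAN. */
data find_demo;
  length text $60 last_word $20;
  text = "alpha beta gamma delta";
  do last_word = "beta", "delta";
    /* no modifier: the trailing blanks of LAST_WORD are searched for too */
    pos_plain = find(text, last_word);
    /* 't' modifier: trailing blanks of both arguments are trimmed first */
    pos_t = find(text, last_word, 't');
    /* trimming only the search string has the same effect here */
    pos_strip = find(text, strip(last_word));
    output;
  end;
run;

proc print data=find_demo noobs;
run;
```

For "beta", POS_PLAIN should be 0 while POS_T and POS_STRIP are 7; for "delta", the last word of TEXT, all three are 18, because there the padded search string still matches. Note that once the blanks are trimmed, FIND(text, last_word) is always true, since LAST_WORD was cut out of TEXT in the first place, so it cannot tell whether a 200-character chunk split a word. Checking whether the character right after the chunk is a blank can.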

aaaaaaaaaaaaa10
Calcite | Level 5

I tried this code, but it has some issues, such as the find function not always working.

data ex_5;
	length parts $200;
	set text;

	do i=1 to length(text)-200;
		parts=substr(text, i, 200);
		last_word=scan(parts, countw(parts));
		k=lengthn(last_word);

		if find(text, last_word) then
			do;
				temp=200;
			end;
		else
			do;
				temp=200 - length(last_word);
			end;
		i+(temp-1);
		output;
		n+1;
	end;
run;

data final;
	set ex_5;
	len=length(parts) - length(last_word);

	if find(text, last_word) then
		parts1=parts;
	else
		parts1=substr(parts, 1, len);
	drop parts i len temp;
run;

proc transpose data=final out=final_final;
	id last_word;
	var parts1;
run;
Kurt_Bremser
Super User

Please do not double-post.

Here is a quick example of splitting along word boundaries:

data have;
text = "Very long text with a lot of words, which should be split along word boundaries into chunks of a given size";
run;

%let chunk = 50; /* length of individual pieces */

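data want;
  set have;
  length part $&chunk.;
  /* keep taking words off the front of TEXT until nothing is left */
  do while (lengthn(text) > 0);
    /* append words to PART until the next word would no longer fit */
    do until (
      length(part) + length(scan(text,1," ")) + 1 > &chunk.
      or lengthn(text) = 0
    );
      part = catx(" ",part,scan(text,1," "));
      /* drop the first word; LEFT removes leading blanks left by consecutive spaces */
      text = left(substr(text,indexc(text," ") + 1));
    end;
    output;
    part = "";
  end;
  drop text;
run;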
mkeintz
PROC Star

I would make a copy of the original string and then, starting at character 201, search backwards for a blank. Copy the identified length from the copy to the SUBTEXT variable (length $200), shift the copy leftwards by the identified number of characters, and repeat until the copy is blank:

data want (drop=_: i);
  set have;
  length subtext $200;

  _copy=original;
  do part=1 by 1 while (_copy^=' ');
    do i=201 by -1 while (char(_copy,i)^=' ');
    end;
    subtext=substr(_copy,1,i-1);
    _copy=substr(_copy,i+1);
    output;
  end;
run;

Note that this program assumes there is always a blank somewhere in characters 2 through 201 of _COPY, after each modification of _COPY.
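If that assumption might not hold (for example, a URL or other token longer than 200 characters), the loop needs a fallback. The following is a minimal sketch of one possible variant, not part of the original answer: it stops the backward search at position 2 and, if no blank is found, simply cuts the token at 200 characters (an assumed policy). It also swaps SUBSTR for SUBSTRN, which tolerates positions past the end of the string. The HAVE data set and the TOKEN variable are hypothetical test data.

```sas
/* Hypothetical test data: a 250-character token with no blanks, then words */
data have;
  length original $500 token $250;
  token = repeat('x', 249);
  original = catx(' ', token, 'followed by a few ordinary words');
  drop token;
run;

data want (drop=_: i);
  set have;
  length subtext $200;

  _copy = original;
  do part = 1 by 1 while (_copy ^= ' ');
    /* search backwards from 201 for a blank, but never below position 2 */
    do i = 201 to 2 by -1 while (char(_copy, i) ^= ' ');
    end;
    if i < 2 then do;
      /* no blank in positions 2-201: hard split at 200 characters */
      subtext = substrn(_copy, 1, 200);
      _copy   = substrn(_copy, 201);
    end;
    else do;
      /* blank found at position i: keep everything before it */
      subtext = substrn(_copy, 1, i - 1);
      _copy   = substrn(_copy, i + 1);
    end;
    output;
  end;
run;
```

With this test row, the first observation should hold the first 200 x's (the hard split) and the second the rest of the token plus the ordinary words.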

 

 

andreas_lds
Jade | Level 19

Using a regular expression:

data have;
    text = "Very long text with a lot of words, which should be split along word boundaries into chunks of a given size";
run;

%let chunk = 50; /* length of individual pieces */

data want;
    set have;
    
    length part $ &chunk.;
    retain rx;
    drop rx;
    
    if _n_ = 1 then do;
        /* Don't add a space after the comma in the following statement! */
        rx = prxparse("/(.{0,&chunk.}\b)/");
    end;
    
    begin = 1;
    end = -1;
    pos = 0;
    len = 0;
    
    call prxnext(rx, begin, end, trim(text), pos, len);
    
    do while (pos > 0 and len> 0);
        part = substr(text, pos, len);
        output;
        call prxnext(rx, begin, end, trim(text), pos, len);
    end;
run;
