aaaaaaaaaaaaa10
Calcite | Level 5

data ex_5;
	length parts $1000;
	set text;

	do i=1 to length(text)-200;
		parts=substr(text, i, 200);
		last_word=scan(parts, countw(parts));
		k=lengthn(last_word);

		if find(text, last_word) then
			do;
				temp=200;
			end;
		else
			do;
				temp=200 - length(last_word);
			end;
		i+(temp-1);
		output;
		n+1;
	end;
run;

data final;
	set ex_5;
	len=length(parts) - length(last_word);

	if find(text, last_word) then
		parts1=parts;
	else
		parts1=substr(parts, 1, len);
	drop parts i len temp;
run;

proc transpose data=final out=final_final;
	id last_word;
	var parts1;
run;

I tried this code, but it has some issues, such as the find function not always working.

4 REPLIES
aaaaaaaaaaaaa10
Calcite | Level 5

I tried this code, but it has some issues, such as the find function not always working.

data ex_5;
	length parts $200;
	set text;

	do i=1 to length(text)-200;
		parts=substr(text, i, 200);
		last_word=scan(parts, countw(parts));
		k=lengthn(last_word);

		if find(text, last_word) then
			do;
				temp=200;
			end;
		else
			do;
				temp=200 - length(last_word);
			end;
		i+(temp-1);
		output;
		n+1;
	end;
run;

data final;
	set ex_5;
	len=length(parts) - length(last_word);

	if find(text, last_word) then
		parts1=parts;
	else
		parts1=substr(parts, 1, len);
	drop parts i len temp;
run;

proc transpose data=final out=final_final;
	id last_word;
	var parts1;
run;
Kurt_Bremser
Super User

Please do not double-post.

 

Here is a quick example for splitting along word boundaries:

data have;
text = "Very long text with a lot of words, which should be split along word boundaries into chunks of a given size";
run;

%let chunk = 50; /* length of individual pieces */

data want;
set have;
length part $&chunk.;
do while (lengthn(text) > 0);
  do until (
    length(part) + length(scan(text,1," ")) + 1 > &chunk.
    or lengthn(text) = 0
  );
    part = catx(" ",part,scan(text,1," "));
    text = substr(text,indexc(text," ") + 1);
  end;
  output;
  part = "";
end;
drop text;
run;
mkeintz
PROC Star

I would make a copy of the original string, then, starting at character 201, search backwards for a blank. Copy the characters before that blank from the copy to the subtext variable (length $200), shift the copy leftwards past the blank, and repeat until the copy is blank:

 


data want (drop=_: i);
  set have;
  length subtext $200;

  _copy=original;
  do part=1 by 1 while (_copy^=' ');
    do i=201 by -1 while (char(_copy,i)^=' ');
    end;
    subtext=substr(_copy,1,i-1);
    _copy=substr(_copy,i+1);
    output;
  end;
run;

Note this program assumes there is always a blank somewhere from character 2 through character 201, through each modification of variable _COPY.
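If that assumption might not hold (for example, a very long token with no blanks), the backward search can be guarded so it stops at character 1 and falls back to a hard cut after 200 characters. The following is an untested sketch, not a definitive implementation. It keeps the dataset HAVE and the variable ORIGINAL from the program above. Two choices here are my own: SUBSTRN replaces SUBSTR so that out-of-range positions return an empty string instead of raising an invalid-argument note, and LEFT is applied to _COPY so that each piece starts at a non-blank character.

```sas
data want (drop=_: i);
  set have;
  length subtext $200;

  _copy = left(original);
  do part = 1 by 1 while (_copy ^= ' ');
    /* Search backwards from character 201 for a blank, but stop at character 1 */
    do i = 201 by -1 while (i > 1 and char(_copy, i) ^= ' ');
    end;

    if i > 1 then do;
      /* Blank found at position i: cut just before it */
      subtext = substrn(_copy, 1, i - 1);
      _copy   = left(substrn(_copy, i + 1));
    end;
    else do;
      /* No blank in characters 2-201: hard cut after 200 characters */
      subtext = substrn(_copy, 1, 200);
      _copy   = left(substrn(_copy, 201));
    end;
    output;
  end;
run;
```

Each pass removes at least one character from _COPY, so the loop ends even when the text contains no blanks at all.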

 

 

andreas_lds
Jade | Level 19

Using a regular expression:

 

data have;
    text = "Very long text with a lot of words, which should be split along word boundaries into chunks of a given size";
run;

%let chunk = 50; /* length of individual pieces */

data want;
    set have;
    
    length part $ &chunk.;
    retain rx;
    drop rx;
    
    if _n_ = 1 then do;
        /* Don't add a space after the comma in the following statement! */
        rx = prxparse("/(.{0,&chunk.}\b)/");
    end;
    
    begin = 1;
    end = -1;
    pos = 0;
    len = 0;
    
    call prxnext(rx, begin, end, trim(text), pos, len);
    
    do while (pos > 0 and len> 0);
        part = substr(text, pos, len);
        output;
        call prxnext(rx, begin, end, trim(text), pos, len);
    end;
run;
