aaaaaaaaaaaaa10
Calcite | Level 5

data ex_5;
  length parts $1000;
  set text;

  do i=1 to length(text)-200;
    parts=substr(text, i, 200);
    last_word=scan(parts, countw(parts));
    k=lengthn(last_word);

    if find(text, last_word) then do;
      temp=200;
    end;
    else do;
      temp=200 - length(last_word);
    end;
    i+(temp-1);
    output;
    n+1;
  end;
run;

data final;
  set ex_5;
  len=length(parts) - length(last_word);

  if find(text, last_word) then parts1=parts;
  else parts1=substr(parts, 1, len);
  drop parts i len temp;
run;

proc transpose data=final out=final_final;
  id last_word;
  var parts1;
run;

I tried this code, but it has some issues, such as the find function not always working.
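
One possible cause of the FIND behaviour, offered as an assumption rather than a confirmed diagnosis: a character variable filled by SCAN is padded with trailing blanks, and FIND searches for those blanks as part of the substring unless they are trimmed. The sketch below uses made-up values (the variable names word, pos_padded, pos_trimmed and pos_tmod are hypothetical) to compare the untrimmed call with STRIP and with the 't' modifier of FIND:

```sas
data _null_;
  length text $50 word $20;
  text = "split along word boundaries";

  /* word holds "along" followed by 15 trailing blanks */
  word = scan(text, 2);

  /* searches for "along" plus its padding; in text, "along" is followed by " word", so no match is expected */
  pos_padded  = find(text, word);

  /* searches for "along" only */
  pos_trimmed = find(text, strip(word));

  /* the 't' modifier trims trailing blanks from both arguments */
  pos_tmod    = find(text, word, 't');

  put pos_padded= pos_trimmed= pos_tmod=;
run;
```

If the padded search happens to hit the last word of text, the trailing blanks of text can satisfy the padding, which would explain a FIND that succeeds only some of the time.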

aaaaaaaaaaaaa10
Calcite | Level 5

I tried this code, but it has some issues, such as the find function not always working.

data ex_5;
	length parts $200;
	set text;

	do i=1 to length(text)-200;
		parts=substr(text, i, 200);
		last_word=scan(parts, countw(parts));
		k=lengthn(last_word);

		if find(text, last_word) then
			do;
				temp=200;
			end;
		else
			do;
				temp=200 - length(last_word);
			end;
		i+(temp-1);
		output;
		n+1;
	end;
run;

data final;
	set ex_5;
	len=length(parts) - length(last_word);

	if find(text, last_word) then
		parts1=parts;
	else
		parts1=substr(parts, 1, len);
	drop parts i len temp;
run;

proc transpose data=final out=final_final;
	id last_word;
	var parts1;
run;
Kurt_Bremser
Super User

Please do not double-post.

 

Here is a quick example of splitting along word boundaries:

data have;
text = "Very long text with a lot of words, which should be split along word boundaries into chunks of a given size";
run;

%let chunk = 50; /* length of individual pieces */

data want;
set have;
length part $&chunk.;
do while (lengthn(text) > 0);
  do until (
    length(part) + length(scan(text,1," ")) + 1 > &chunk.
    or lengthn(text) = 0
  );
    part = catx(" ",part,scan(text,1," "));
    text = substr(text,indexc(text," ") + 1);
  end;
  output;
  part = "";
end;
drop text;
run;
mkeintz
PROC Star

I would make a copy of the original string, then, starting at character 201, search backwards for a blank. Copy everything before that blank from the copy to the subtext variable (length $200), shift the copy leftwards past the blank, and repeat until the copy is blank:

 


data want (drop=_: i);
  set have;
  length subtext $200;

  _copy=original;
  do part=1 by 1 while (_copy^=' ');
    do i=201 by -1 while (char(_copy,i)^=' ');
    end;
    subtext=substr(_copy,1,i-1);
    _copy=substr(_copy,i+1);
    output;
  end;
run;

Note that this program assumes there is always a blank somewhere from character 2 through character 201 of variable _COPY, after each modification of that variable.
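
If that assumption cannot be guaranteed, the same backward search can be guarded. The following is only a sketch of one way to do it, not the program above: the sample data set, the bounded loop (201 to 2), the hard cut at 200 characters when no blank is found, and the use of LEFT to drop leading blanks are all assumptions added for illustration.

```sas
/* Hypothetical sample data: one ordinary text and one value without any blank */
data have;
  length original $600;
  original = repeat("word ", 99); output;  /* 100 copies of "word " */
  original = repeat("x", 249);    output;  /* 250 characters, no blank */
run;

data want (drop=_: i);
  set have;
  length subtext $200;

  _copy = left(original);
  do part = 1 by 1 while (_copy ^= ' ');
    if lengthn(_copy) <= 200 then do;
      /* the remainder fits into one piece */
      subtext = _copy;
      _copy = ' ';
    end;
    else do;
      /* search backwards from character 201, but stop at character 2 */
      do i = 201 to 2 by -1 while (char(_copy, i) ^= ' ');
      end;

      if i < 2 then do;
        /* assumption: no blank found, so cut after 200 characters */
        subtext = substr(_copy, 1, 200);
        _copy = substr(_copy, 201);
      end;
      else do;
        /* blank found at position i: keep everything before it */
        subtext = substr(_copy, 1, i - 1);
        _copy = left(substr(_copy, i + 1));
      end;
    end;
    output;
  end;
run;
```

Whether a word longer than 200 characters should be cut, skipped, or reported is a design choice; the hard cut here simply keeps the loop from running past the start of the string.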

 

 

andreas_lds
Jade | Level 19

Using a regular expression:

 

data have;
    text = "Very long text with a lot of words, which should be split along word boundaries into chunks of a given size";
run;

%let chunk = 50; /* length of individual pieces */

data want;
    set have;
    
    length part $ &chunk.;
    retain rx;
    drop rx;
    
    if _n_ = 1 then do;
        /* Don't add a space after the comma in the following statement! */
        rx = prxparse("/(.{0,&chunk.}\b)/");
    end;
    
    begin = 1;
    end = -1;
    pos = 0;
    len = 0;
    
    call prxnext(rx, begin, end, trim(text), pos, len);
    
    do while (pos > 0 and len> 0);
        part = substr(text, pos, len);
        output;
        call prxnext(rx, begin, end, trim(text), pos, len);
    end;
run;
