Hi
I am trying to split a string,
comment='Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum' .
using a simple code line,
if length(comment) >0 then a=strip(substr(comment,1,200));
if length(comment) >200 then b=strip(substr(comment,201,200));
so on, on a basic code, can we upgrade this so it doesn't allow word chopping, for example, the word 'tempor' gets pushed to a new var or be kept in the same var if the length of 200 permits otherwise the entire word gets pushed to new var from first letter without chopping the word as 'tem' 'por' to two different variables.
Please advise.
Thanks
Try like that:
data have;
comment='Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.';
run;
proc print;
run;
%macro cutMeIntoParts(
have
,variable
,want=want
,L=100
,keep=part
);
data &want.;
set &have.;
length part $ &L.;
do i=1 by 1;
word = scan(&variable., i, " ");
if word = " " then
do;
output;
leave;
end;
/*
put word= / part =;
*/
if length(catx(" ",part, word))>&L. then
do;
output;
part = word;
end;
else
part = catx(" ",part, word);
end;
keep &keep.;
run;
%mend cutMeIntoParts;
%cutMeIntoParts(have,comment,L=50)
proc print data=want(keep=part);
run;
Bart
One more (just for fun) approach with "funcy filename" and "fast _infile_ read" inspired by:
Data:
data have;
length comment $ 32767;
comment='Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.';
output;
comment="Split a variable to multiple variables.
Hi, I am trying to split a string, comment='....'. using a simple code line.
So on, on a basic code, can we upgrade this so it doesn't allow word chopping, for example, the word 'tempor' gets pushed to a new var or be kept in the same var if the length of 200 permits otherwise the entire word gets pushed to new var from first letter without chopping the word as 'tem' 'por' to two different variables. Please advise. Thanks";
output;
run;
proc print;
run;
Code:
%macro cutMeIntoParts2(
have
,variable
,want=want
,L = 100
,WN = 200 /* max. expected number of words */
,keep=part
);
filename f "!SASROOT/*.cfg";
data &want.;
set &have.;
infile f truncover dlm = " ";
input @1 @;
_infile_ = &variable.;
input @1 (m1-m&WN.) (: $128.) @@;
mmX=" ";
length part $ &L.;
array mm $ m1--m&WN. mmX;
do over mm;
L = lengthn(mm);
if P + L > &L.-1 then
do;
output;
part = " ";
end;
if L = 0 then
do;
output;
leave;
end;
part = catx(" ", part, mm);
P = lengthn(part);
end;
keep &keep.;
run;
filename f clear;
%mend cutMeIntoParts2;
%cutMeIntoParts2(have,comment,L=50)
proc print data=want(keep=part);
run;
Bart
Based on the sample code found in the docu for the CALL PRXNEXT Routine below should work.
%let l_source=1000;
%let l_part =200;
%let n_parts =%eval(&l_source/&l_part +1);
data demo(drop=_:);
length text $&l_source;
array text_parts_ {&n_parts} $&l_part;
_prxid = prxparse("/\b(.{1,%eval(&l_part-1)}\b)/");
text='Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum';
_start = 1;
_stop = length(text);
/* Use PRXNEXT to find the first instance of the pattern, */
/* then use DO WHILE to find all further instances. */
/* PRXNEXT changes the _start parameter so that searching */
/* begins again after the last match. */
call prxnext(_prxid, _start, _stop, text, _pos, _len);
do while (_pos > 0);
_i=sum(_i,1);
text_parts_[_i] = substr(text, _pos, _len);
call prxnext(_prxid, _start, _stop, text, _pos, _len);
end;
run;
proc print data=demo;
var text_parts_:;
run;
Just for having some fun.
data have;
id+1;
length comment $ 20000;
comment='Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum' ;
run;
data temp;
set have;
length token temp want $ 400 want2 $ 200;
if length(comment)>200 then do;
do i=1 to countw(comment,' ');
token=scan(comment,i,' ');
temp=want;
want=catx(' ',want,token);
if length(want)>200 then do;
want=temp;want2=want;output;want=token;
end;
end;
if not missing(want) then do;want2=want; output;end;
end;
else do;want2=comment; output;end;
keep id want2;
run;
proc transpose data=temp out=want;
by id;
var want2;
run;
Two of the responses you received build each split word-by-word, from left to right. The other, using regular expressions, I suspect, counts columns under the hood from the left up to the split length, making sure to not truncate the rightmost word.
The code below takes a different tack. It starts each split by looking at the rightmost blank candidate column and stepping backward until a blank is located. Then the entire resulting split is copied, and the next leftmost column and rightmost blank candidate are established, and the same approach is used for the next split. I use the term "rightmost blank candidate" because for a split of, say 100 bytes length, the rightmost valid blank candidate in the source would be byte number 101, just after the maximum split length. It's only a candidate, however, until a word separator is encountered.
The thought here is that there is likely a lot fewer comparisons taking place searching for a blank in this way, than there is in assembling each split from the left. Especially for wide splits.
data have;
comment='Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.';
run;
%let L_source=1000;
%let L_split=100;
%let nsplits=%eval( (&L_source + &L_split-1)/&L_split );
data want (drop=_: i);
set have ;
array split {&nsplits} $&L_split;
_left_col=1;
do i=1 to &nsplits until (substr(comment,_right_col)=' ' or _left_col>length(comment));
do _right_col=min(length(comment),_left_col+&L_split+1) by -1 until (char(comment,_right_col)=' ' or _right_col=length(comment));
end;
split{i}=substr(comment,_left_col,_right_col+1-_left_col);
_left_col=min(_right_col+1,length(comment)+1);
end;
run;
Notes on the code:
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.