Hi @Rakesh93
Here is a slightly different way of tackling the problem, where it is not necessary to know the number of new variables. The idea is:
Length is set to 20 for test purposes and should be changed in 2 places.
data have;
length ID 8.;
infile cards truncover;
input str $char80.;
ID = _N_;
cards;
New Year
Copenhagen is the Capital of Denmark
Jensen is a´defunct British manufacturer of sports cars
Another line
;
run;
data tmp (drop=str i word); set have;
length nystr $20;
str = compbl(str);
do i = 1 to countw(str,' ');
word = scan(str,i,' ');
if length(nystr) + length(word) + 1 > 20 then do;
output;
nystr = word;
end;
else nystr = catx(' ',nystr,scan(str,i,' '));
end;
if nystr ne '' then output;
run;
proc transpose data=tmp out=want (drop=_name_);
by ID;
var _character_;
run;
Output:
Break your problem into steps.
First you need to determine the length of the longest observation in TEST variable.
Once you have that, you can then scan across each observation (by 200) checking if the character sitting in the 200n+1 position is a space (if not, find the closest space in an earlier position). I’d recommend looking into array processing for this step.
Hi @Rakesh93,
Here is an illustration for you. You can alter the lengths to your need.
data aa;
input list & $36.;
datalines;
Pack my box with twelve dozen balls
This is another sentence to split
;
run;
data _null_;
length str1 $20 str2 $20 s $1;
set aa;
do i = 20 by - 1 until(s = ' ');
s = char(list,i);
end;
str1 = trim(substr(list, 1, i-1));
put '---->' str1=;
str2 = strip(substr(list, i+1));
put '---->' str2 =;
run;
The TRIM( ) and STRIP( ) are used to take care of more than one SPACE character between words.
Hi @Rakesh93
Here is a slightly different way of tackling the problem, where it is not necessary to know the number of new variables. The idea is:
Length is set to 20 for test purposes and should be changed in 2 places.
data have;
length ID 8.;
infile cards truncover;
input str $char80.;
ID = _N_;
cards;
New Year
Copenhagen is the Capital of Denmark
Jensen is a´defunct British manufacturer of sports cars
Another line
;
run;
data tmp (drop=str i word); set have;
length nystr $20;
str = compbl(str);
do i = 1 to countw(str,' ');
word = scan(str,i,' ');
if length(nystr) + length(word) + 1 > 20 then do;
output;
nystr = word;
end;
else nystr = catx(' ',nystr,scan(str,i,' '));
end;
if nystr ne '' then output;
run;
proc transpose data=tmp out=want (drop=_name_);
by ID;
var _character_;
run;
Output:
If you don't mind guessing the number of needed columns beforehand, the algorithm is simple:
data have ;
input txt $80. ;
cards ;
The time has come, the Walrus said, to talk of many things:
Of shoes and ships and sealing-wax, of cabbages and kings
And why the sea is boiling hot and whether pigs have wings
;
run ;
%let w = 20 ; * change to 200 in your case ;
data want (drop = _:) ;
set have ;
array c $ &w col1-col5 ;
_i_ = 1 ;
do _x = 1 to countw (txt, "") ;
c = catx (" ", c, scan (txt, _x, "")) ;
if length (c) <= &w and length (c) + length (scan (txt, _x + 1)) > &w or _x = countw (txt, "") then _i_ + 1 ;
end ;
run ;
If you want it to be truly dynamic, run the same exact algorithm twice: (1) to determine the actual dimension of array C and (2) to produce the final output, e.g.:
data _null_ ; set have end = z ; length c $ &w ; do _x = 1 to countw (txt, "") ; c = catx (" ", c, scan (txt, _x, "")) ; if length (c) <= &w and length (c) + length (scan (txt, _x + 1)) > &w or _x = countw (txt, "") then do ; _i_ = sum (_i_, 1) ; call missing (c) ; end ; end ; last._i_ = max (last._i_, _i_) ; if z then call symputx ("d", last._i_) ; run ; data want (drop = _:) ; set have ; array c $ &w col1-col&d ; _i_ = 1 ; do _x = 1 to countw (txt, "") ; c = catx (" ", c, scan (txt, _x, "")) ; if length (c) <= &w and length (c) + length (scan (txt, _x + 1)) > &w or _x = countw (txt, "") then _i_ + 1 ; end ; run ;
You can massage it if you wish to encapsulate the repetitive code.
Kind regards
Paul D.
@hashman has given a solution. I improve my previous program to do the job dynamically. You have now a choice to choose.
As before text line is split into 20 characters with word-boundary. The split sub-list is saved as a big list each separated by a pipe character (|). The maximum number of such splits are passed as a macro variable for the next data step.
In that step, sub-lists are scanned as variables for output.
data have;
input txt $80.;
datalines;
I love hash tables. You will, too, after you see what they can do.
This paper will start with the classic 2 table join.
Nothing new, you say? But this version does it in one step without
sorting. And, it's the fastest method around.
Next, you will see the use of hash tables to join multiple
datasets on different keys in one pass, again with no
sorting. How about unlimited recursive chained lookups
(like when you need to update an ID through as many
incarnations as it may have had)? Hash tables solve that one,
too. And before you catch your breath, you'll see
how to create n datasets from one (or more), where n is
determined (at runtime) by the number of values
encountered in a variable. No muss, no fuss. One step
(well, OK, this one does require a prior sort) and you're
done. We’ll wind up with an outer join, just to show that hash
tables can do that, too. Take a look at what they can
deliver, and you'll be counting the ways you love hash tables
before the year is out.
;
run;
The program:
%let w = 20;
data temp;
set have end = eof;
length w $ &w;
length str $80 strout $80;
str = compbl(txt); /* remove more than one SPACE */
split = 0;
retain maxsplit;
do until(str=' ');
/* Check the Word boundary ends at the W-th position.
Confirm it based on the next position */
if char(str,&w) ne ' ' and char(str, &w+1)=' ' then i=&w+1;
else
do i = &w by - 1 until(s = ' ');
s = char(str,i);
end;
w = substr(str,1,i-1);
call catx('|', strout, w);
split + 1;
str = substr(str, i+1);
end;
maxsplit = max(maxsplit, split);
if eof then call symputx('maxsplit', maxsplit);
keep strout;
run;
data want;
set temp;
array s[&maxsplit] $ &w;
do i = 1 to &maxsplit;
s[i] = scan(strout,i,'|');
end;
drop i strout;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Lock in the best rate now before the price increases on April 1.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.