Dear Reader,
I am not getting a correct output.
My requirement is col1 = hema; and col2 = ashu pallu and so on.
Space and latest semicolon should be consider while splitting the data.
**********************************************************
data abc;
old = "hema;ashu pallu;hema;ashu pallu;hema;
ashu pallu;hema;";output;
old = "usa;swati;sajju;priyanka;priyanka";output;
old = "hema;m?;ashu pallu;hema;ashu pallu;hema;
ashu palluhema;";output;
old = "My prog";output;
run;
%let w = 10 ;
data want (drop = _:) ;
set abc ;
array c $ &w col col1-col9 ;
/*if old^="" and length(strip(old)) le 10 then col=old; */
_i_ = 1 ;
do _x = 1 to countw (old, "; ") ;
c = catx ("; ", c, scan (old, _x, "; ")) ;
if length (c) <= &w and length (c) + length (scan (old, _x + 1)) > &w or _x = countw (old, "; ") then _i_ + 1 ;
end ;
run ;
************************************************************
Thank you in advance for your help. Take care.
Regards
Priyanka
Perhaps I am not understanding the issue but don't you just want to use SCAN()? If you want to know how many items there are use COUNTW().
First let's make some test data:
data have;
input old $80.;
cards4;
hema;ashu pallu;hema;ashu pallu;hema;ashu pallu;hema
usa;swati;sajju;priyanka;priyanka
hema;m?;ashu pallu;hema;ashu pallu;hema;ashu palluhema
My prog
;;;;
Now let's run a program to create a lot of new variables to hold the individual words (tokens). The data step needs to know how many we are going to create. And how long to make each one. We can make it more flexible by setting those numbers in macro variables.
%let max=9;
%let width=10;
Now just read in the existing data and use an ARRAY and a DO loop.
data want ;
set have ;
array new [&max] $&width;
do i=1 to countw(old,';');
new[i] = scan(old,i,';');
end;
drop i;
run;
Obs new1 new2 new3 new4 new5 new6 new7 new8 new9 1 hema ashu pallu hema ashu pallu hema ashu pallu hema 2 usa swati sajju priyanka priyanka 3 hema m? ashu pallu hema ashu pallu hema ashu pallu 4 My prog
If you want to treat space as a delimiter also then include it in the list of delimiters. You will need more than 9 variables.
%let width=10;
%let max=12;
data want ;
set have ;
array new [&max] $&width;
do i=1 to countw(old,'; ');
new[i] = scan(old,i,'; ');
end;
drop i;
run;
Obs new1 new2 new3 new4 new5 new6 new7 new8 new9 new10 new11 new12 1 hema ashu pallu hema ashu pallu hema ashu pallu hema 2 usa swati sajju priyanka priyanka 3 hema m? ashu pallu hema ashu pallu hema ashu palluhema 4 My prog
Priyanka, I think you may have better results with COUNTC just looking for the semi-colon than looking for the combination of a semi-colon and a blank. I think the end of line will be a problem otherwise.
Here's how I would code the solution. I will place the results with the Old string broken into string tokens below the code.
Regards,
Jim
data abc;
DROP _:;
RETAIN _Max_Tokens 0;
old = "hema;ashu pallu;hema;ashu pallu;hema;ashu pallu;hema;";
_Token_Cnt = COUNTC(Old, ';');
IF _Token_Cnt > _Max_Tokens THEN
_Max_Tokens = _Token_Cnt;
output;
old = "usa;swati;sajju;priyanka;priyanka";
_Token_Cnt = COUNTC(Old, ';');
IF _Token_Cnt > _Max_Tokens THEN
_Max_Tokens = _Token_Cnt;
output;
old = "hema;m?;ashu pallu;hema;ashu pallu;hema;ashu palluhema;";
_Token_Cnt = COUNTC(Old, ';');
IF _Token_Cnt > _Max_Tokens THEN
_Max_Tokens = _Token_Cnt;
output;
old = "My prog";
_Token_Cnt = COUNTC(Old, ';');
IF _Token_Cnt > _Max_Tokens THEN
_Max_Tokens = _Token_Cnt;
output;
CALL SYMPUTX('Max_Tokens', _Max_Tokens, 'G');
RUN;
%PUT NOTE: &=Max_Tokens;
DATA Parsed_Data;
DROP _:;
SET abc;
ARRAY Tokens [*] $32 Token1 - Token&Max_Tokens;
_i = 0;
DO _i = 1 TO &Max_Tokens;
Tokens[_i] = SCAN(Old, _i, ';');
END;
RUN;
Results:
Here are the results that I have from a program that I have written:
I believe these results are correct based on a $10 restriction. I inserted an extra record (record 4) as follows:
Old = "hema;m?;ashu PROG;hema;ashu pallu;hema;ashu palluhema;";
I inserted this record because I wanted a 5 byte token ("ashu ") and another 5 byte token ("PROG;") so that I could test that the space would be preserved in the output. It was preserved.
Code is below.
Jim
DATA abc;
DROP _:;
LENGTH Old $256.;
RETAIN _Max_Tokens 0;
Old = "hema;ashu pallu;hema;ashu pallu;hema;ashu pallu;hema;";
LINK Count_Tokens;
Old = "usa;swati;sajju;priyanka;priyanka";
LINK Count_Tokens;
Old = "hema;m?;ashu pallu;hema;ashu pallu;hema;ashu palluhema;";
LINK Count_Tokens;
Old = "hema;m?;ashu PROG;hema;ashu pallu;hema;ashu palluhema;";
LINK Count_Tokens;
Old = "My prog";
LINK Count_Tokens;
CALL SYMPUTX('Max_Tokens', _Max_Tokens, 'G');
******;
RETURN;
******;
************;
Count_Tokens:
************;
IF SUBSTR(Old, LENGTHN(Old), 1) ^= ';' THEN
Old = CATS(Old, ';');
_Token_Cnt = COUNTC(TRANSTRN(STRIP(Old), ' ', ';'), ';');
IF _Token_Cnt > _Max_Tokens THEN
_Max_Tokens = _Token_Cnt;
OLD = TRANSTRN(STRIP(Old), ' ', '~');
OUTPUT;
******;
RETURN;
******;
RUN;
%PUT NOTE: &=Max_Tokens;
DATA Parsed_Data;
DROP _:;
SET abc;
ARRAY Tokens [*] $10 New1 - New&Max_Tokens;
_Token_Length = 10;
_Token_Nbr = 1;
_Total_Length = LENGTHN(Old);
_Current_Position = 1;
_Current_Length = 0;
_End_of_Record = 0;
_Prev_Delimiter = 0;
_Next_Delimiter = 0;
LINK Get_Next_Delimiter;
DO WHILE (_Current_Position < _Total_Length);
Tokens[_Token_Nbr] = SUBSTR(Old, _Current_Position, (_Next_Delimiter - _Prev_Delimiter));
_Current_Length = LENGTH(Tokens[_Token_Nbr]);
_Current_Position = _Current_Position + (_Next_Delimiter - _Prev_Delimiter);
_Prev_Delimiter = _Next_Delimiter;
LINK Get_Next_Delimiter;
IF NOT _End_Of_Record AND
_Next_Delimiter - _Prev_Delimiter <= _Token_Length - _Current_Length THEN
DO;
Tokens[_Token_Nbr] = CAT(STRIP(Tokens[_Token_Nbr]), SUBSTR(Old, _Current_Position, (_Next_Delimiter - _Prev_Delimiter)));
_Current_Position = (_Current_Position + _Next_Delimiter - _Prev_Delimiter);
_Prev_Delimiter = _Next_Delimiter;
LINK Get_Next_Delimiter;
END;
_Token_Nbr + 1;
END;
DO _j = 1 TO DIM(Tokens);
Tokens[_j] = TRANSTRN(Tokens[_j], '~', ' ');
END;
******;
RETURN;
******;
******************;
Get_Next_Delimiter:
******************;
IF _Current_Position <= _Total_Length THEN
_Next_Delimiter = 0;
ELSE
_End_of_Record = 1;
DO _i = _Current_Position TO _Total_Length;
_Current_Byte = SUBSTR(Old, _i, 1);
IF SUBSTR(Old, _i, 1) = '~' THEN
DO;
_Next_Delimiter = _i;
_i = _Total_Length;
END;
ELSE
IF SUBSTR(Old, _i, 1) = ';' THEN
DO;
_Next_Delimiter = _i;
_i = _Total_Length;
END;
END;
******;
RETURN;
******;
RUN;
Perhaps I am not understanding the issue but don't you just want to use SCAN()? If you want to know how many items there are use COUNTW().
First let's make some test data:
data have;
input old $80.;
cards4;
hema;ashu pallu;hema;ashu pallu;hema;ashu pallu;hema
usa;swati;sajju;priyanka;priyanka
hema;m?;ashu pallu;hema;ashu pallu;hema;ashu palluhema
My prog
;;;;
Now let's run a program to create a lot of new variables to hold the individual words (tokens). The data step needs to know how many we are going to create. And how long to make each one. We can make it more flexible by setting those numbers in macro variables.
%let max=9;
%let width=10;
Now just read in the existing data and use an ARRAY and a DO loop.
data want ;
set have ;
array new [&max] $&width;
do i=1 to countw(old,';');
new[i] = scan(old,i,';');
end;
drop i;
run;
Obs new1 new2 new3 new4 new5 new6 new7 new8 new9 1 hema ashu pallu hema ashu pallu hema ashu pallu hema 2 usa swati sajju priyanka priyanka 3 hema m? ashu pallu hema ashu pallu hema ashu pallu 4 My prog
If you want to treat space as a delimiter also then include it in the list of delimiters. You will need more than 9 variables.
%let width=10;
%let max=12;
data want ;
set have ;
array new [&max] $&width;
do i=1 to countw(old,'; ');
new[i] = scan(old,i,'; ');
end;
drop i;
run;
Obs new1 new2 new3 new4 new5 new6 new7 new8 new9 new10 new11 new12 1 hema ashu pallu hema ashu pallu hema ashu pallu hema 2 usa swati sajju priyanka priyanka 3 hema m? ashu pallu hema ashu pallu hema ashu palluhema 4 My prog
@pdhokriya wrote:
Thank you for the solution, But my requirement is 10 length with or without space and colon. e.g. at 8th and 9th position data is present after semicolon or space then it should go in next colon (in case length = 10)
obs1 should be new1 = hema; and new2=ashu pallu
obs2 should be new1= usa;swati; and new2=sujju;
obs3 should be new1=hema;m?; and new2=ashu pallu
If you want to break the string in 10 character pieces just use SUBSTR().
new[i] = substr(old,1+(i-1)*10);
If don't want that then explain the rules.
If you use FINDC() you can search backwards from the end of the string to find last ';'.
Looks like you also want to remove semi-colons that follow the 10 character substring.
data tall ;
set have ;
length new $%eval(&width+1) ;
copy=old;
do col=1 by 1 until(copy=' ');
new=copy;
loc = findc(new,';',-10);
if loc=0 then loc=&width;
copy = substrn(copy,loc+1+(substrn(copy,loc+1,1)=';'));
new = substrn(new,1,loc);
if col=1 or new ne ' ' then output;
end;
run;
proc transpose data=tall out=want (drop=_name_);
by row old;
var new ;
run;
Results:
Obs row old COL1 1 1 hema;ashu pallu;hema;ashu pallu;hema;ashu pallu;hema; hema; 2 2 usa;swati;sajju;priyanka;priyanka usa;swati; 3 3 hema;m?;ashu pallu;hema;ashu pallu;hema;ashu palluhema; hema;m?; 4 4 My prog My prog Obs COL2 COL3 COL4 COL5 COL6 COL7 1 ashu pallu hema; ashu pallu hema; ashu pallu hema; 2 sajju; priyanka; priyanka 3 ashu pallu hema; ashu pallu hema; ashu pallu hema; 4
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.