Col1 | Col2 |
1 | 1,5,4,3,2,6 |
2 | 21,3,2,1,5,15,17,3 |
3 | 1,2,3 |
4 | 4,3,2 |
Dear Friends,
I have dataset like above, I would like to convert it like below. Can someone suggest me the easy way. Thanks in advance.
Col1 | Col2 |
1 | 1,2,3,4,5,6 |
2 | 1,2,3,3,5,15,17,21 |
3 | 1,2,3 |
4 | 2,3,4 |
Or use the "RESOLVE trick" (just for fun):
data want(drop=_ i);
set have;
_=col2;
col2=' ';
do i=1 to countw(_,',');
col2=catx(',',col2,input(resolve(cats('%sysfunc(ordinal(',i,',',_,'))')),32.));
end;
run;
Normalize the data first.
data tall;
set have;
length index value 8;
do index=1 to max(1,countw(col2,','));
value=input(scan(col2,index,','),32.);
output;
end;
run;
Then normal sorting works (add NODUPKEY option to PROC SORT statement to eliminate duplicates).
proc sort data=tall ;
by col1 value;
run;
If for some reason you need to recreate that comma delimited list run another data step.
data want;
do until (last.col1);
set tall;
by col1;
length new_col2 $200;
new_col2=catx(',',new_col2,value);
end;
drop index value ;
run;
You could try combing "infile trick" and missing=" " option:
data have;
input Col1 Col2 :$ 200.;
cards;
1 1,5,4,3,2,6
2 21,3,2,1,5,15,17,3
3 1,2,3
4 4,3,2
;
run;
proc print data = have;
run;
/* create a fake temp file */
filename f temp;
data _null_;
file f;
put;
run;
options missing=" "; /* for CATX() to replace missing dot (.) with a space (" ") */
%let WN=20; /* max number of possible values on a list */
data want;
set have;
infile f truncover dlm=",";
input @1 @;
_infile_ = Col2;
input @1 (m1-m&WN.) (:) @@;
array mm m1-m&WN.;
put mm[*]=;
call sortn(of m:);
Col2 = catx(",",of m:);
keep Col1 Col2;
run;
filename f clear;
options missing=.;
proc print data = want;
run;
Bart
Or use the "RESOLVE trick" (just for fun):
data want(drop=_ i);
set have;
_=col2;
col2=' ';
do i=1 to countw(_,',');
col2=catx(',',col2,input(resolve(cats('%sysfunc(ordinal(',i,',',_,'))')),32.));
end;
run;
If you know the range of number you could use an array to collect and catx() regenerate the string. Make sure to set the MISSING option properly.
data have;
input col1 col2 :$50. ;
cards;
1 1,5,4,3,2,6
2 21,3,2,1,5,15,17,3
3 1,2,3
4 4,3,2
5 1,2,junk,3
;
options missing=' ';
data want;
set have;
array x[100] ;
do index=1 to countw(col2,',');
value=input(scan(col2,index,','),??32.);
if value in (1:100) then x[value]=value;
else put 'WARNING: Invalid number at ' index= col1= col2=;
end;
col2=catx(',',of x:);
drop index value x:;
run;
options missing='.';
Log
11 options missing=' '; 12 data want; 13 set have; 14 array x[100] ; 15 do index=1 to countw(col2,','); 16 value=input(scan(col2,index,','),??32.); 17 if value in (1:100) then x[value]=value; 18 else put 'WARNING: Invalid number at ' index= col1= col2=; 19 end; 20 col2=catx(',',of x:); 21 drop index value x:; 22 run; WARNING: Invalid number at index=3 col1=5 col2=1,2,junk,3 NOTE: There were 5 observations read from the data set WORK.HAVE. NOTE: The data set WORK.WANT has 5 observations and 2 variables. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds 23 options missing='.';
Results
You can transfer the components of the cell into a hash object with the "ordered" attribute. Then iterate through them and concatenate each of them:
data have;
input col1 col2 :$50. ;
cards;
1 1,5,4,3,2,6,3.5
2 21,3,2,1,5,15,17,3
3 1,2,3
4 4,3,2
5 1,2,junk,3
;
data want (drop=i _:);
set have;
if _n_=1 then do;
declare hash h (ordered:'A',multidata:'Y');
h.definekey('_x');
h.definedata('_x');
h.definedone();
declare hiter hi ('h');
end;
do i=1 to countw(col2,',');
_x=input(scan(col2,i,','),best12.);
if _x^=. then h.add();
end;
length new_col $50;
do while (hi.next()=0);
new_col=catx(',',new_col,_x);
end;
h.clear();
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.