Hi there!
I'm having a problem with concatenating a series of rows into a single row based on a group within a SAS dataset.
Basically a simplified version of what I want is to be able to is start with something like this:
data inTable;
input letter $ number $;
datalines;
a 1
a 2
a 3
b 4
b 5
b 6
c 7
d 8
e 9
e 10
f 11
;
run;
and end up with something like this:
Letter | Numbers |
---|---|
a | 1,2,3 |
b | 4,5,6 |
c | |
d | 7,8 |
e | 9, 10 |
f | 11 |
The delimeter doesn't matter as long as I can see all the numbers that pertain to a group. In this case I don't know how many 'numbers' would be applied to each letter (assume max 5) and I'd have 100K+ 'letters'.
Thanks in advance for your help!!
Try this:
data inTable;
input letter $ number $;
datalines;
a 1
a 2
a 3
b 4
b 5
b 6
c 7
d 8
e 9
e 10
f 11
;
data want;
length cat $20.;
do until (last.letter);
set intable;
by letter notsorted;
cat=catx(',',cat,number);
end;
drop number;
run;
proc print;run;
Haikuo
Try this:
data inTable;
input letter $ number $;
datalines;
a 1
a 2
a 3
b 4
b 5
b 6
c 7
d 8
e 9
e 10
f 11
;
data want;
length cat $20.;
do until (last.letter);
set intable;
by letter notsorted;
cat=catx(',',cat,number);
end;
drop number;
run;
proc print;run;
Haikuo
Love this solution and have found it extremely usefully. Is there a way to exand this so that the "numbers" columns is grouped using two columns?
I tried this but it didn't work
data want;
length cat $20.;
do until (last.letter1 and last.letter2);
set intable;
by letter2 letter1 notsorted;
cat=catx(',',cat,number);
end;
drop number;
run;
For the do until line of code try the following instead. This will loop through different letter1 within each letter2.
do until (last.letter1);
or by hash:
data inTable;
input letter $ number $;
datalines;
a 1
a 2
a 3
b 4
b 5
b 6
c 7
d 8
e 9
e 10
f 11
;
data _null_;
length letter $8 cat $20;
if _n_=1 then do;
declare hash h(ordered:'a');
h.definekey('letter');
h.definedata('letter','cat');
h.definedone();
end;
set intable end=last;
if h.find() ne 0 then do;
cat=number;
h.add();
end;
else do;
cat=catx(',',cat,number);
h.replace();
end;
if last then h.output(dataset:'want');
run;
proc print data=want;run;
Obs letter cat
1 a 1,2,3
2 b 4,5,6
3 c 7
4 d 8
5 e 9,10
6 f 11
Linlin
Or:
data want ;
length cat $200 ;
retain cat ;
set intable ;
by letter notsorted ;
if first.letter then cat = cats(number) ;
else cat = catx(',',cat,number) ;
if last.letter then output ;
run ;
data inTable; input letter $ number $; datalines; a 1 a 2 a 3 b 4 b 5 b 6 c 7 d 8 e 9 e 10 f 11 ; run; data want(drop=number); set inTable ; by letter; length l $ 4000; retain l; l=catx(',',l,number); if last.letter then do; output;call missing(l);end; run;
Ksharp
Or, use the SAS tools which match the needs, in this case PROC TRANSPOSE and the CATX function.
proc transpose data=intable out=tempwide ;
by letter ;
var number ;
run ;
data want(keep = letter cat) ;
set tempwide ;
length cat $20 ;
cat = catx( ',' , of col: ) ;
run ;
Good afternoon!
I thought this was really cool solution--thank you for posting it. I was curious about doing something similar, but rather than populating the transposed cell with the series of values (i.e., 1,2,3, etc.), I would want to populate the cell with the sum of values (i.e., 6). Could I modify this proc transpose syntax to accomplish this?
I appreciate your learned insight.
Cheers!
P
Dear Phil,
I am really sorry I did not get your question, could you please share your question with examples?
Regards,
SP.
Sure thing. Basically, I want to be able to turn this:
ID LOS
a 10
a 20
a 30
b 20
b 20
b 20
c 20
c 10
c 50
into something like this:
ID Total LOS
a 60
b 60
c 80
or, even something like this:
ID Mean LOS
a 20
b 20
c 26.67
Does that make sense? I appreciate the follow up question.
Cheers!
P
Dear Phil,
Here is the solution for your question, Hope it will helps you.
data inds;
input id $ los;
cards;
a 10
a 20
a 30
b 20
b 20
b 20
c 20
c 10
c 50
;
proc sort data = inds;
by id;
run;
/* Sum by Group */
data outds1 (keep = id sumvar);
set inds;
by id;
if first.id then sumvar = 0;
sumvar + los;
if last.id;
run;
/* Mean by Group */
data outds2 (keep = id meanvar);
set inds;
by id;
if first.id then do;
sumvar = 0;
grpcnt = 0;
end;
sumvar + los;
grpcnt + 1;
meanvar = round(sumvar/grpcnt, .01);
if last.id;
run;
Regards,
SP Choudary.
Here is another example from SP Choudary.
data inds;
input grp $ invar $;
cards;
a 1
a 2
a 3
b 4
b 5
b 6
c 7
d .
e 8
e 9
f 10
g .
g .
;
data outds (drop = invar);
set inds;
by grp;
length outvar $200;
retain outvar;
if first.grp then call missing(outvar);
outvar = catx(', ', outvar, invar);
if last.grp;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.