Hi there!
I'm having a problem with concatenating a series of rows into a single row based on a group within a SAS dataset.
Basically a simplified version of what I want is to be able to is start with something like this:
data inTable;
input letter $ number $;
datalines;
a 1
a 2
a 3
b 4
b 5
b 6
c 7
d 8
e 9
e 10
f 11
;
run;
and end up with something like this:
Letter | Numbers |
---|---|
a | 1,2,3 |
b | 4,5,6 |
c | |
d | 7,8 |
e | 9, 10 |
f | 11 |
The delimeter doesn't matter as long as I can see all the numbers that pertain to a group. In this case I don't know how many 'numbers' would be applied to each letter (assume max 5) and I'd have 100K+ 'letters'.
Thanks in advance for your help!!
Try this:
data inTable;
input letter $ number $;
datalines;
a 1
a 2
a 3
b 4
b 5
b 6
c 7
d 8
e 9
e 10
f 11
;
data want;
length cat $20.;
do until (last.letter);
set intable;
by letter notsorted;
cat=catx(',',cat,number);
end;
drop number;
run;
proc print;run;
Haikuo
Try this:
data inTable;
input letter $ number $;
datalines;
a 1
a 2
a 3
b 4
b 5
b 6
c 7
d 8
e 9
e 10
f 11
;
data want;
length cat $20.;
do until (last.letter);
set intable;
by letter notsorted;
cat=catx(',',cat,number);
end;
drop number;
run;
proc print;run;
Haikuo
Love this solution and have found it extremely usefully. Is there a way to exand this so that the "numbers" columns is grouped using two columns?
I tried this but it didn't work
data want;
length cat $20.;
do until (last.letter1 and last.letter2);
set intable;
by letter2 letter1 notsorted;
cat=catx(',',cat,number);
end;
drop number;
run;
For the do until line of code try the following instead. This will loop through different letter1 within each letter2.
do until (last.letter1);
or by hash:
data inTable;
input letter $ number $;
datalines;
a 1
a 2
a 3
b 4
b 5
b 6
c 7
d 8
e 9
e 10
f 11
;
data _null_;
length letter $8 cat $20;
if _n_=1 then do;
declare hash h(ordered:'a');
h.definekey('letter');
h.definedata('letter','cat');
h.definedone();
end;
set intable end=last;
if h.find() ne 0 then do;
cat=number;
h.add();
end;
else do;
cat=catx(',',cat,number);
h.replace();
end;
if last then h.output(dataset:'want');
run;
proc print data=want;run;
Obs letter cat
1 a 1,2,3
2 b 4,5,6
3 c 7
4 d 8
5 e 9,10
6 f 11
Linlin
Or:
data want ;
length cat $200 ;
retain cat ;
set intable ;
by letter notsorted ;
if first.letter then cat = cats(number) ;
else cat = catx(',',cat,number) ;
if last.letter then output ;
run ;
data inTable; input letter $ number $; datalines; a 1 a 2 a 3 b 4 b 5 b 6 c 7 d 8 e 9 e 10 f 11 ; run; data want(drop=number); set inTable ; by letter; length l $ 4000; retain l; l=catx(',',l,number); if last.letter then do; output;call missing(l);end; run;
Ksharp
Or, use the SAS tools which match the needs, in this case PROC TRANSPOSE and the CATX function.
proc transpose data=intable out=tempwide ;
by letter ;
var number ;
run ;
data want(keep = letter cat) ;
set tempwide ;
length cat $20 ;
cat = catx( ',' , of col: ) ;
run ;
Good afternoon!
I thought this was really cool solution--thank you for posting it. I was curious about doing something similar, but rather than populating the transposed cell with the series of values (i.e., 1,2,3, etc.), I would want to populate the cell with the sum of values (i.e., 6). Could I modify this proc transpose syntax to accomplish this?
I appreciate your learned insight.
Cheers!
P
Dear Phil,
I am really sorry I did not get your question, could you please share your question with examples?
Regards,
SP.
Sure thing. Basically, I want to be able to turn this:
ID LOS
a 10
a 20
a 30
b 20
b 20
b 20
c 20
c 10
c 50
into something like this:
ID Total LOS
a 60
b 60
c 80
or, even something like this:
ID Mean LOS
a 20
b 20
c 26.67
Does that make sense? I appreciate the follow up question.
Cheers!
P
Dear Phil,
Here is the solution for your question, Hope it will helps you.
data inds;
input id $ los;
cards;
a 10
a 20
a 30
b 20
b 20
b 20
c 20
c 10
c 50
;
proc sort data = inds;
by id;
run;
/* Sum by Group */
data outds1 (keep = id sumvar);
set inds;
by id;
if first.id then sumvar = 0;
sumvar + los;
if last.id;
run;
/* Mean by Group */
data outds2 (keep = id meanvar);
set inds;
by id;
if first.id then do;
sumvar = 0;
grpcnt = 0;
end;
sumvar + los;
grpcnt + 1;
meanvar = round(sumvar/grpcnt, .01);
if last.id;
run;
Regards,
SP Choudary.
Here is another example from SP Choudary.
data inds;
input grp $ invar $;
cards;
a 1
a 2
a 3
b 4
b 5
b 6
c 7
d .
e 8
e 9
f 10
g .
g .
;
data outds (drop = invar);
set inds;
by grp;
length outvar $200;
retain outvar;
if first.grp then call missing(outvar);
outvar = catx(', ', outvar, invar);
if last.grp;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.