Hello Dear Colleagues!
Could you please tell me if it is possible to transpose, concat(delimiter=any) and filter data in SAS Proc SQL?
Or such operations could be done only in datastep?
data have; input name_i name_d; datalines; A a A b A a
B c
B c
B c
C d
C a
C b
D d
;
data want; input name_i name_d; datalines; A a, b B c
C d, a, b
D c
;
Thanks!
Other approach, assuming `name_i` is sorted:
data have;
input name_i $ name_d $;
datalines;
A a
A b
A a
B c
B c
B c
C d
C a
C b
D d
;
run;
data want(drop=__:);
declare hash U();
U.defineKey("name_i");
U.defineKey("__tmp");
U.defineDone();
declare hash H(multidata:"Y");
H.defineKey("name_i");
H.defineData("__tmp");
H.defineDone();
do until(last.name_i);
set have(rename=(name_d=__tmp));
by name_i;
__key=name_i;
if U.check() then do; H.add(key:__key, data: __tmp); U.add(); end;
end;
H.reset_dup();
length name_d $ 200; retain name_d;
name_d = "";
do while(h.do_over(key:__key) = 0);
name_d = catx(", ", name_d, __tmp);
end;
H.delete();
U.delete();
run;
proc print;
run;
All the best
Bart
Hi,
try this:
data have;
input name_i $ name_d $; n=_N_;
datalines;
A a
A b
A a
B c
B c
B c
C d
C a
C b
D d
;
run;
proc sort data = have;
by name_i name_d n;
run;
/* if you need to keep the order form the original dataset uncomment below */
/*
data have;
set have;
by name_i name_d;
if first.name_d;
run;
proc sort data = have;
by name_i n;
run;
*/
data want(drop=__:);
set have(rename=(name_d=__tmp) drop = n);
by name_i __tmp;
length name_d $ 200; retain name_d;
if first.name_i then name_d = "";
if first.__tmp then name_d = catx(", ", name_d, __tmp);
if last.name_i;
run;
proc print;
run;
All the best
Bart
A lot of other SQL dialects have functions for aggregating strings across rows (e.g. STRING_AGG in MS SQL), but SAS missed out on that one. I am not sure why, it should not be that hard to expand the CAT family functions in the same way that e.g. SUM and MIN are expanded in SAS SQL.
You can do it in SQL, but the code becomes rather tedious:
proc sql;
create table want as select
all.name_i,
catx(', ',a.name_d,b.name_d,c.name_d,d.name_d) as name_d
from
(select distinct name_i from have) all left join
(select distinct name_d from have where name_d='a') a
on all.name_i=a.name_i left join
(select distinct name_d from have where name_d='b') b
on all.name_i=b.name_i left join
(select distinct name_d from have where name_d='c') c
on all.name_i=c.name_i left join
(select distinct name_d from have where name_d='c') d
on all.name_i=d.name_i
;
You could write a macro to do something like that with any NAME_D values, but I would recommend against it. A datastep is much faster and easier to understand:
proc sort data=have;
by name_i name_d;
run;
data want;
do until last.name_i;
set have;
by name_i name_d;
if first.name_d then
new_d=catx(', ',new_d,name_d);
end;
drop name_d;
rename new_d=name_d;
run;
Other approach, assuming `name_i` is sorted:
data have;
input name_i $ name_d $;
datalines;
A a
A b
A a
B c
B c
B c
C d
C a
C b
D d
;
run;
data want(drop=__:);
declare hash U();
U.defineKey("name_i");
U.defineKey("__tmp");
U.defineDone();
declare hash H(multidata:"Y");
H.defineKey("name_i");
H.defineData("__tmp");
H.defineDone();
do until(last.name_i);
set have(rename=(name_d=__tmp));
by name_i;
__key=name_i;
if U.check() then do; H.add(key:__key, data: __tmp); U.add(); end;
end;
H.reset_dup();
length name_d $ 200; retain name_d;
name_d = "";
do while(h.do_over(key:__key) = 0);
name_d = catx(", ", name_d, __tmp);
end;
H.delete();
U.delete();
run;
proc print;
run;
All the best
Bart
Thank you very much, but unfortunately your code is not working properly. Maybe that is my fault, but I am getting errors while compiling it and cannot understand what I am doing wrong. I'm sorry..
Thank you very much, i've resend your code to my colleguaes(people, which asked me about this problem), they transmitted big thanks to you.
Glad I could help 🙂
Bart
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.