Hi All,
I have the following have data where I only care about codes a, b, and c. I have an identifier variable for each of the codes that are 1 or 0 for whether that data row has the specific code. However, I'd like that identifier variable for each code to be 1 for every row of the ID if that ID had the code at some point. I've included the want data below. Thanks!
data have;
input id $ code $ code_a code_b code_c ;
datalines;
1 a 1 0 0
1 b 0 1 0
1 c 0 0 1
2 b 0 1 0
2 f 0 0 0
3 a 1 0 0
3 c 0 0 1
;
run;
data want;
input id $ code $ code_a code_b code_c ;
datalines;
1 a 1 1 1
1 b 1 1 1
1 c 1 1 1
2 b 0 1 0
2 f 0 1 0
3 a 1 0 1
3 c 1 0 1
;
run;
data have;
input id $ code $ code_a code_b code_c ;
datalines;
1 a 1 0 0
1 b 0 1 0
1 c 0 0 1
2 b 0 1 0
2 f 0 0 0
3 a 1 0 0
3 c 0 0 1
;
data want(drop=a b c);
do until (last.id);
set have;
by id;
a = max(code_a, a);
b = max(code_b, b);
c = max(code_c, c);
end;
do until (last.id);
set have;
by id;
code_a = a;
code_b = b;
code_c = c;
output;
end;
run;
proc summary data=have;
class id;
var code_:;
output
out=want (drop=_type_ _freq_)
max()=
;
run;
data have;
input id $ code $ code_a code_b code_c ;
datalines;
1 a 1 0 0
1 b 0 1 0
1 c 0 0 1
2 b 0 1 0
2 f 0 0 0
3 a 1 0 0
3 c 0 0 1
;
data want(drop=a b c);
do until (last.id);
set have;
by id;
a = max(code_a, a);
b = max(code_b, b);
c = max(code_c, c);
end;
do until (last.id);
set have;
by id;
code_a = a;
code_b = b;
code_c = c;
output;
end;
run;
Here is a program that takes advantage of how the sas data step populates a PDV (program data vector, think of it as a list of variables.
The strategy below does the following:
data have;
input id $ code $ code_a code_b code_c ;
datalines;
1 a 1 0 0
1 b 0 1 0
1 c 0 0 1
2 b 0 1 0
2 f 0 0 0
3 a 1 0 0
3 c 0 0 1
;
run;
data want;
merge have
have (keep=id code_a where=(code_a=1))
have (keep=id code_b where=(code_b=1))
have (keep=id code_c where=(code_c=1)) ;
by id;
if first.id then do until (last.id);
set have (drop=code_a code_b code_c);
by id;
output;
end;
run;
Ordinarily I would suggest using the proc summary solutioni offered by @Kurt_Bremser , but it produces one record per id, which you would have to re-merge with have if you want the original number of records.
And this program may look a little strange compared to the well-known double do offered by draycut. I'd suggest sticking with @PeterClemmensen's suggestion. But if you need to scale to situations with multiple variables for each code (e.g. code_a1 code_a2 code_a3 code_a4 code_a5, code_b1 code_b2 code_b3 code_b4 code_b5, code_c1 code_c2 code_c3 code_c4 code_c5 ), then the code above might be easier to maintain.
@mkeintz very nice.
Hi @PeterBr
data have;
input id $ code $ code_a code_b code_c ;
datalines;
1 a 1 0 0
1 b 0 1 0
1 c 0 0 1
2 b 0 1 0
2 f 0 0 0
3 a 1 0 0
3 c 0 0 1
;
run;
data want;
do _iorc_=1 by 1 until(last.id);
set have;
by id;
array t code_a code_b code_c ;
array u(3) _temporary_;
do _n_=1 to dim(t);
if t(_n_) then u(_n_)=t(_n_);
end;
end;
do _iorc_=1 to _iorc_;
set have;
do _n_=1 to dim(t);
t(_n_)=^^u(_n_);
end;
output;
end;
call missing(of u(*));
run;
data have;
input id $ code $ code_a code_b code_c ;
datalines;
1 a 1 0 0
1 b 0 1 0
1 c 0 0 1
2 b 0 1 0
2 f 0 0 0
3 a 1 0 0
3 c 0 0 1
;
run;
proc sql;
create table want as
select id,code,
max(code_a) as code_a,
max(code_b) as code_b,
max(code_c) as code_c
from have
group by id;
quit;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.