Hi All,
I have the following have data where I only care about codes a, b, and c. I have an identifier variable for each of the codes that are 1 or 0 for whether that data row has the specific code. However, I'd like that identifier variable for each code to be 1 for every row of the ID if that ID had the code at some point. I've included the want data below. Thanks!
data have;
input id $ code $ code_a code_b code_c ;
datalines;
1 a 1 0 0
1 b 0 1 0
1 c 0 0 1
2 b 0 1 0
2 f 0 0 0
3 a 1 0 0
3 c 0 0 1
;
run;
data want;
input id $ code $ code_a code_b code_c ;
datalines;
1 a 1 1 1
1 b 1 1 1
1 c 1 1 1
2 b 0 1 0
2 f 0 1 0
3 a 1 0 1
3 c 1 0 1
;
run;
data have;
input id $ code $ code_a code_b code_c ;
datalines;
1 a 1 0 0
1 b 0 1 0
1 c 0 0 1
2 b 0 1 0
2 f 0 0 0
3 a 1 0 0
3 c 0 0 1
;
data want(drop=a b c);
do until (last.id);
set have;
by id;
a = max(code_a, a);
b = max(code_b, b);
c = max(code_c, c);
end;
do until (last.id);
set have;
by id;
code_a = a;
code_b = b;
code_c = c;
output;
end;
run;
proc summary data=have;
class id;
var code_:;
output
out=want (drop=_type_ _freq_)
max()=
;
run;
data have;
input id $ code $ code_a code_b code_c ;
datalines;
1 a 1 0 0
1 b 0 1 0
1 c 0 0 1
2 b 0 1 0
2 f 0 0 0
3 a 1 0 0
3 c 0 0 1
;
data want(drop=a b c);
do until (last.id);
set have;
by id;
a = max(code_a, a);
b = max(code_b, b);
c = max(code_c, c);
end;
do until (last.id);
set have;
by id;
code_a = a;
code_b = b;
code_c = c;
output;
end;
run;
Here is a program that takes advantage of how the sas data step populates a PDV (program data vector, think of it as a list of variables.
The strategy below does the following:
data have;
input id $ code $ code_a code_b code_c ;
datalines;
1 a 1 0 0
1 b 0 1 0
1 c 0 0 1
2 b 0 1 0
2 f 0 0 0
3 a 1 0 0
3 c 0 0 1
;
run;
data want;
merge have
have (keep=id code_a where=(code_a=1))
have (keep=id code_b where=(code_b=1))
have (keep=id code_c where=(code_c=1)) ;
by id;
if first.id then do until (last.id);
set have (drop=code_a code_b code_c);
by id;
output;
end;
run;
Ordinarily I would suggest using the proc summary solutioni offered by @Kurt_Bremser , but it produces one record per id, which you would have to re-merge with have if you want the original number of records.
And this program may look a little strange compared to the well-known double do offered by draycut. I'd suggest sticking with @PeterClemmensen's suggestion. But if you need to scale to situations with multiple variables for each code (e.g. code_a1 code_a2 code_a3 code_a4 code_a5, code_b1 code_b2 code_b3 code_b4 code_b5, code_c1 code_c2 code_c3 code_c4 code_c5 ), then the code above might be easier to maintain.
@mkeintz very nice.
Hi @PeterBr
data have;
input id $ code $ code_a code_b code_c ;
datalines;
1 a 1 0 0
1 b 0 1 0
1 c 0 0 1
2 b 0 1 0
2 f 0 0 0
3 a 1 0 0
3 c 0 0 1
;
run;
data want;
do _iorc_=1 by 1 until(last.id);
set have;
by id;
array t code_a code_b code_c ;
array u(3) _temporary_;
do _n_=1 to dim(t);
if t(_n_) then u(_n_)=t(_n_);
end;
end;
do _iorc_=1 to _iorc_;
set have;
do _n_=1 to dim(t);
t(_n_)=^^u(_n_);
end;
output;
end;
call missing(of u(*));
run;
data have;
input id $ code $ code_a code_b code_c ;
datalines;
1 a 1 0 0
1 b 0 1 0
1 c 0 0 1
2 b 0 1 0
2 f 0 0 0
3 a 1 0 0
3 c 0 0 1
;
run;
proc sql;
create table want as
select id,code,
max(code_a) as code_a,
max(code_b) as code_b,
max(code_c) as code_c
from have
group by id;
quit;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.