Hi All,
I have the following have data where I only care about codes a, b, and c. I have an identifier variable for each of the codes that are 1 or 0 for whether that data row has the specific code. However, I'd like that identifier variable for each code to be 1 for every row of the ID if that ID had the code at some point. I've included the want data below. Thanks!
data have;
input id $ code $ code_a code_b code_c ;
datalines;
1 a 1 0 0
1 b 0 1 0
1 c 0 0 1
2 b 0 1 0
2 f 0 0 0
3 a 1 0 0
3 c 0 0 1
;
run;
data want;
input id $ code $ code_a code_b code_c ;
datalines;
1 a 1 1 1
1 b 1 1 1
1 c 1 1 1
2 b 0 1 0
2 f 0 1 0
3 a 1 0 1
3 c 1 0 1
;
run;
data have;
input id $ code $ code_a code_b code_c ;
datalines;
1 a 1 0 0
1 b 0 1 0
1 c 0 0 1
2 b 0 1 0
2 f 0 0 0
3 a 1 0 0
3 c 0 0 1
;
data want(drop=a b c);
do until (last.id);
set have;
by id;
a = max(code_a, a);
b = max(code_b, b);
c = max(code_c, c);
end;
do until (last.id);
set have;
by id;
code_a = a;
code_b = b;
code_c = c;
output;
end;
run;
proc summary data=have;
class id;
var code_:;
output
out=want (drop=_type_ _freq_)
max()=
;
run;
data have;
input id $ code $ code_a code_b code_c ;
datalines;
1 a 1 0 0
1 b 0 1 0
1 c 0 0 1
2 b 0 1 0
2 f 0 0 0
3 a 1 0 0
3 c 0 0 1
;
data want(drop=a b c);
do until (last.id);
set have;
by id;
a = max(code_a, a);
b = max(code_b, b);
c = max(code_c, c);
end;
do until (last.id);
set have;
by id;
code_a = a;
code_b = b;
code_c = c;
output;
end;
run;
Here is a program that takes advantage of how the sas data step populates a PDV (program data vector, think of it as a list of variables.
The strategy below does the following:
data have;
input id $ code $ code_a code_b code_c ;
datalines;
1 a 1 0 0
1 b 0 1 0
1 c 0 0 1
2 b 0 1 0
2 f 0 0 0
3 a 1 0 0
3 c 0 0 1
;
run;
data want;
merge have
have (keep=id code_a where=(code_a=1))
have (keep=id code_b where=(code_b=1))
have (keep=id code_c where=(code_c=1)) ;
by id;
if first.id then do until (last.id);
set have (drop=code_a code_b code_c);
by id;
output;
end;
run;
Ordinarily I would suggest using the proc summary solutioni offered by @Kurt_Bremser , but it produces one record per id, which you would have to re-merge with have if you want the original number of records.
And this program may look a little strange compared to the well-known double do offered by draycut. I'd suggest sticking with @PeterClemmensen's suggestion. But if you need to scale to situations with multiple variables for each code (e.g. code_a1 code_a2 code_a3 code_a4 code_a5, code_b1 code_b2 code_b3 code_b4 code_b5, code_c1 code_c2 code_c3 code_c4 code_c5 ), then the code above might be easier to maintain.
@mkeintz very nice.
Hi @PeterBr
data have;
input id $ code $ code_a code_b code_c ;
datalines;
1 a 1 0 0
1 b 0 1 0
1 c 0 0 1
2 b 0 1 0
2 f 0 0 0
3 a 1 0 0
3 c 0 0 1
;
run;
data want;
do _iorc_=1 by 1 until(last.id);
set have;
by id;
array t code_a code_b code_c ;
array u(3) _temporary_;
do _n_=1 to dim(t);
if t(_n_) then u(_n_)=t(_n_);
end;
end;
do _iorc_=1 to _iorc_;
set have;
do _n_=1 to dim(t);
t(_n_)=^^u(_n_);
end;
output;
end;
call missing(of u(*));
run;
data have;
input id $ code $ code_a code_b code_c ;
datalines;
1 a 1 0 0
1 b 0 1 0
1 c 0 0 1
2 b 0 1 0
2 f 0 0 0
3 a 1 0 0
3 c 0 0 1
;
run;
proc sql;
create table want as
select id,code,
max(code_a) as code_a,
max(code_b) as code_b,
max(code_c) as code_c
from have
group by id;
quit;
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Select SAS Training centers are offering in-person courses. View upcoming courses for: