Hello all,
I am trying to create sets of dummy variables, as shown by my example below. I'd prefer not to hard code this process and was thinking an array and/or do loop might be the best option but not exactly sure how to go about this. I just want the dummy variables to flag if the diagnosis occurred, not to count the number of a specific diagnosis per ID. I have multiple other variables that I will be creating dummy variables, hence why I thought an array or do loop would help. Any help is appreciated!
Dataset:
ID | Sex | Diagnosis |
01 |
1 | 4 |
01 | 1 | 4 |
01 | 1 | 2 |
02 | 1 | 3 |
02 | 1 | 3 |
03 | 2 | 2 |
03 | 2 | 2 |
03 | 2 | 1 |
04 | 2 | 1 |
Want:
ID | Sex | Diag_1 | Diag_2 | Diag_3 | Diag_4 |
01 | 1 | 0 | 1 | 0 | 1 |
02 | 1 | 0 | 0 | 1 | 0 |
03 | 2 | 1 | 1 | 0 | 0 |
04 | 2 | 1 | 0 | 0 | 0 |
data have;
input id $ sex $ diag $;
dummy=1;
datalines;
01 1 4
01 1 4
01 1 2
02 1 3
02 1 3
03 2 2
03 2 2
03 2 1
04 2 1
;
run;
proc glmselect data=have outdesign(addinputvars)=temp noprint;
class diag;
model dummy=diag/selection=none noint;
run;
proc summary data=temp nway;
class id sex;
var diag_:;
output out=want(drop=_freq_ _type_) max=;
run;
Regarding frequencies, definitely no dummy coding required for that:
proc freq data=have;
table diag / out=diag_counts;
run;
proc print data=diag_counts;
run;
To create dummy variables automatically see this post as well as the linked posts below. One of those options will work for you.
data have;
input id $ sex $ diag $;
datalines;
01 1 4
01 1 4
01 1 2
02 1 3
02 1 3
03 2 2
03 2 2
03 2 1
04 2 1
;
run;
data have2;
set have;
diag_1=diag=1;
diag_2=diag=2;
diag_3=diag=3;
diag_4=diag=4;
run;
proc sql;
select id,sex,max(diag_1) as diag_1,
max(diag_2) as diag_2,
max(diag_3) as diag_3,
max(diag_4) as diag_4
from have2
group by id,sex;
quit;
data have;
input id $ sex $ diag $;
dummy=1;
datalines;
01 1 4
01 1 4
01 1 2
02 1 3
02 1 3
03 2 2
03 2 2
03 2 1
04 2 1
;
run;
proc glmselect data=have outdesign(addinputvars)=temp noprint;
class diag;
model dummy=diag/selection=none noint;
run;
proc summary data=temp nway;
class id sex;
var diag_:;
output out=want(drop=_freq_ _type_) max=;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.