BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
monsterpie
Obsidian | Level 7

Hello all,

 

I am trying to create sets of dummy variables, as shown by my example below. I'd prefer not to hard code this process and was thinking an array and/or do loop might be the best option but not exactly sure how to go about this. I just want the dummy variables to flag if the diagnosis occurred, not to count the number of a specific diagnosis per ID. I have multiple other variables that I will be creating dummy variables, hence why I thought an array or do loop would help. Any help is appreciated!

 

Dataset:

ID Sex Diagnosis

01

1 4
01 1 4
01 1 2
02 1 3
02 1 3
03 2 2
03 2 2
03 2 1
04 2 1

 

Want:

ID Sex Diag_1 Diag_2 Diag_3 Diag_4
01 1 0 1 0 1
02 1 0 0 1 0
03 2 1 1 0 0
04 2 1 0 0 0
1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data have;
input id $ sex $ diag $;
dummy=1;
datalines;
01 1 4
01 1 4
01 1 2
02 1 3
02 1 3
03 2 2
03 2 2
03 2 1
04 2 1
;
run;

proc glmselect data=have outdesign(addinputvars)=temp noprint;
class diag;
model dummy=diag/selection=none noint;
run;
proc summary data=temp nway;
class id sex;
var diag_:;
output out=want(drop=_freq_ _type_) max=;
run;

View solution in original post

7 REPLIES 7
Reeza
Super User
Why are you dummy coding? If you're planning to use PROC LOGISTIC or PHREG the CLASS statement will dummy code the variables for you.
monsterpie
Obsidian | Level 7
I want to be able to my analyze my data by diagnosis type by ID and the only way I could think to do that since my data is in the long format is to create dummy variables.
Reeza
Super User
Couldn't you use the Diagnosis variable as BY/CLASS variable in proc freq/summary/means as well?

What exactly do you mean by 'analyze my data by diagnosis type'?

monsterpie
Obsidian | Level 7
I'd like to be able to get the frequencies of how many IDs had Diagnosis code=1, how many had Diagnosis code=2 etc, etc etc. I also wanted to add dummy vars for each diagnosis because in the future I may be analyzing my dataset using different software and thought it might be more useful to already have these dummy variables in the data.
Reeza
Super User

Regarding frequencies, definitely no dummy coding required for that:

 

proc freq data=have;
table diag / out=diag_counts;
run;

proc print data=diag_counts;
run;

To create dummy variables automatically see this post as well as the linked posts below. One of those options will work for you.

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-dummy-variables-Categorical-Var...

tarheel13
Rhodochrosite | Level 12
data have;
input id $ sex $ diag $;
datalines;
01 1 4
01 1 4
01 1 2
02 1 3
02 1 3
03 2 2
03 2 2
03 2 1
04 2 1
;
run;

data have2;
	set have;
	diag_1=diag=1;
	diag_2=diag=2;
	diag_3=diag=3;
	diag_4=diag=4;
run;

proc sql;
	select id,sex,max(diag_1) as diag_1,
	max(diag_2) as diag_2,
	max(diag_3) as diag_3,
	max(diag_4) as diag_4
		from have2
		group by id,sex;
quit;
Ksharp
Super User
data have;
input id $ sex $ diag $;
dummy=1;
datalines;
01 1 4
01 1 4
01 1 2
02 1 3
02 1 3
03 2 2
03 2 2
03 2 1
04 2 1
;
run;

proc glmselect data=have outdesign(addinputvars)=temp noprint;
class diag;
model dummy=diag/selection=none noint;
run;
proc summary data=temp nway;
class id sex;
var diag_:;
output out=want(drop=_freq_ _type_) max=;
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 1521 views
  • 2 likes
  • 4 in conversation