DATA Step, Macro, Functions and more

how to create a dummy variable based on the number of appearance of a certain value ?

Reply
Occasional Contributor
Posts: 18

how to create a dummy variable based on the number of appearance of a certain value ?

Hello!
I wanna ask how to create a dummy variable based on the number of appearance of a certain value ?
For example, I have dataset:

 

ID . diagnosis hospital
2022 .   1 .         A
2022 .   0           B
2023 .   1           A
2023 .   1 .         D

 

Say a certain disease Y is classified positive if it's been diagnosed positive at lease twice. In this class, ID 2023 would be identified having the disease Y. I want to assign another variable called Y as dummy variable:

 

1 if count (diagnosis = 1) >= 2 by each ID

0 otherwise

 

How to do it with proc sql? Thanks.

SAS Super FREQ
Posts: 496

Re: how to create a dummy variable based on the number of appearance of a certain value ?

[ Edited ]
Posted in reply to Viveme789

I don't know about SQL, but it is easy with MEANS and a DATA step.

data x(drop=miss);
input ID miss diagnosis hospital $;
datalines;
2022 .   1           A
2022 .   0           B
2023 .   1           A
2023 .   1          D
;
proc means; class id; var diagnosis; output out=b sum=sum; run;
proc print; run;

data all(drop=_: sum);
   merge x b(where=(sum >= 2 and n(id)) in=f); by id;
   y = f;
   run;
proc print; run;
Occasional Contributor
Posts: 18

Re: how to create a dummy variable based on the number of appearance of a certain value ?

Posted in reply to WarrenKuhfeld
Thanks for the effort! But I still have to do it with proc sql required by the professor.
Super User
Posts: 22,874

Re: how to create a dummy variable based on the number of appearance of a certain value ?

[ Edited ]
Posted in reply to Viveme789

 

 

 

Select *, Sum(diagnosis) as num_diag, case when calculated num_diag>1 then 1 else 0 end as indicator

from have

group by ID;

 

This may or may not be the approach your instructor was expecting. I would highly suggest reviewing your lecture notes and seeing if a specific example wasn’t provided. 

 


Viveme789 wrote:

Hello!
I wanna ask how to create a dummy variable based on the number of appearance of a certain value ?
For example, I have dataset:

 

ID . diagnosis hospital
2022 .   1 .         A
2022 .   0           B
2023 .   1           A
2023 .   1 .         D

 

Say a certain disease Y is classified positive if it's been diagnosed positive at lease twice. In this class, ID 2023 would be identified having the disease Y. I want to assign another variable called Y as dummy variable:

 

1 if count (diagnosis = 1) >= 2 by each ID

0 otherwise

 

How to do it with proc sql? Thanks.


 

Occasional Contributor
Posts: 18

Re: how to create a dummy variable based on the number of appearance of a certain value ?

Thank you! I have a question regarding the statement:
Sum(diagnosis) as num_diag
is it summing the number of diagnosis for each patient, which is actually the number of rows per patient, or summing (diagnosis = 1)?
And there's no specific example in the lecture notes:/
Thank you.
Regular Contributor
Posts: 249

Re: how to create a dummy variable based on the number of appearance of a certain value ?

Posted in reply to Viveme789
It is an addition of the diagnosis value for each patient not the number of rows for each patient. The latter is done by count(variable).
Super User
Posts: 22,874

Re: how to create a dummy variable based on the number of appearance of a certain value ?

Posted in reply to Viveme789

Viveme789 wrote:
Thank you! I have a question regarding the statement:
Sum(diagnosis) as num_diag
is it summing the number of diagnosis for each patient, which is actually the number of rows per patient, or summing (diagnosis = 1)?
And there's no specific example in the lecture notes:/
Thank you.

If the values are 0/1 its the same thing.....

Super User
Posts: 10,623

Re: how to create a dummy variable based on the number of appearance of a certain value ?

Posted in reply to Viveme789
data x(drop=miss);
input ID miss diagnosis hospital $;
datalines;
2022 .   1           A
2022 .   0           B
2023 .   1           A
2023 .   1          D
;

proc sql;
select *,sum(diagnosis=1)>=2 as flag 
 from x
  group by id;
quit;


Ask a Question
Discussion stats
  • 7 replies
  • 229 views
  • 0 likes
  • 5 in conversation