BookmarkSubscribeRSS Feed
Fluorite | Level 6

## how to create a dummy variable based on the number of appearance of a certain value ?

Hello!
I wanna ask how to create a dummy variable based on the number of appearance of a certain value ?
For example, I have dataset:

ID . diagnosis hospital
2022 .   1 .         A
2022 .   0           B
2023 .   1           A
2023 .   1 .         D

Say a certain disease Y is classified positive if it's been diagnosed positive at lease twice. In this class, ID 2023 would be identified having the disease Y. I want to assign another variable called Y as dummy variable:

1 if count (diagnosis = 1) >= 2 by each ID

0 otherwise

How to do it with proc sql? Thanks.

7 REPLIES 7
Rhodochrosite | Level 12

## Re: how to create a dummy variable based on the number of appearance of a certain value ?

I don't know about SQL, but it is easy with MEANS and a DATA step.

``````data x(drop=miss);
input ID miss diagnosis hospital \$;
datalines;
2022 .   1           A
2022 .   0           B
2023 .   1           A
2023 .   1          D
;
proc means; class id; var diagnosis; output out=b sum=sum; run;
proc print; run;

data all(drop=_: sum);
merge x b(where=(sum >= 2 and n(id)) in=f); by id;
y = f;
run;
proc print; run;``````
Fluorite | Level 6

## Re: how to create a dummy variable based on the number of appearance of a certain value ?

Thanks for the effort! But I still have to do it with proc sql required by the professor.
Super User

## Re: how to create a dummy variable based on the number of appearance of a certain value ?

Select *, Sum(diagnosis) as num_diag, case when calculated num_diag>1 then 1 else 0 end as indicator

from have

group by ID;

This may or may not be the approach your instructor was expecting. I would highly suggest reviewing your lecture notes and seeing if a specific example wasn’t provided.

@Viveme789 wrote:

Hello!
I wanna ask how to create a dummy variable based on the number of appearance of a certain value ?
For example, I have dataset:

ID . diagnosis hospital
2022 .   1 .         A
2022 .   0           B
2023 .   1           A
2023 .   1 .         D

Say a certain disease Y is classified positive if it's been diagnosed positive at lease twice. In this class, ID 2023 would be identified having the disease Y. I want to assign another variable called Y as dummy variable:

1 if count (diagnosis = 1) >= 2 by each ID

0 otherwise

How to do it with proc sql? Thanks.

Fluorite | Level 6

## Re: how to create a dummy variable based on the number of appearance of a certain value ?

Thank you! I have a question regarding the statement:
Sum(diagnosis) as num_diag
is it summing the number of diagnosis for each patient, which is actually the number of rows per patient, or summing (diagnosis = 1)?
And there's no specific example in the lecture notes:/
Thank you.
Barite | Level 11

## Re: how to create a dummy variable based on the number of appearance of a certain value ?

It is an addition of the diagnosis value for each patient not the number of rows for each patient. The latter is done by count(variable).
Super User

## Re: how to create a dummy variable based on the number of appearance of a certain value ?

@Viveme789 wrote:
Thank you! I have a question regarding the statement:
Sum(diagnosis) as num_diag
is it summing the number of diagnosis for each patient, which is actually the number of rows per patient, or summing (diagnosis = 1)?
And there's no specific example in the lecture notes:/
Thank you.

If the values are 0/1 its the same thing.....

Super User

## Re: how to create a dummy variable based on the number of appearance of a certain value ?

```data x(drop=miss);
input ID miss diagnosis hospital \$;
datalines;
2022 .   1           A
2022 .   0           B
2023 .   1           A
2023 .   1          D
;

proc sql;
select *,sum(diagnosis=1)>=2 as flag
from x
group by id;
quit;

```
Discussion stats
• 7 replies
• 1195 views
• 0 likes
• 5 in conversation