BookmarkSubscribeRSS Feed
Viveme789
Fluorite | Level 6

Hello!
I wanna ask how to create a dummy variable based on the number of appearance of a certain value ?
For example, I have dataset:

 

ID . diagnosis hospital
2022 .   1 .         A
2022 .   0           B
2023 .   1           A
2023 .   1 .         D

 

Say a certain disease Y is classified positive if it's been diagnosed positive at lease twice. In this class, ID 2023 would be identified having the disease Y. I want to assign another variable called Y as dummy variable:

 

1 if count (diagnosis = 1) >= 2 by each ID

0 otherwise

 

How to do it with proc sql? Thanks.

7 REPLIES 7
WarrenKuhfeld
Ammonite | Level 13

I don't know about SQL, but it is easy with MEANS and a DATA step.

data x(drop=miss);
input ID miss diagnosis hospital $;
datalines;
2022 .   1           A
2022 .   0           B
2023 .   1           A
2023 .   1          D
;
proc means; class id; var diagnosis; output out=b sum=sum; run;
proc print; run;

data all(drop=_: sum);
   merge x b(where=(sum >= 2 and n(id)) in=f); by id;
   y = f;
   run;
proc print; run;
Viveme789
Fluorite | Level 6
Thanks for the effort! But I still have to do it with proc sql required by the professor.
Reeza
Super User

 

 

 

Select *, Sum(diagnosis) as num_diag, case when calculated num_diag>1 then 1 else 0 end as indicator

from have

group by ID;

 

This may or may not be the approach your instructor was expecting. I would highly suggest reviewing your lecture notes and seeing if a specific example wasn’t provided. 

 


@Viveme789 wrote:

Hello!
I wanna ask how to create a dummy variable based on the number of appearance of a certain value ?
For example, I have dataset:

 

ID . diagnosis hospital
2022 .   1 .         A
2022 .   0           B
2023 .   1           A
2023 .   1 .         D

 

Say a certain disease Y is classified positive if it's been diagnosed positive at lease twice. In this class, ID 2023 would be identified having the disease Y. I want to assign another variable called Y as dummy variable:

 

1 if count (diagnosis = 1) >= 2 by each ID

0 otherwise

 

How to do it with proc sql? Thanks.


 

Viveme789
Fluorite | Level 6
Thank you! I have a question regarding the statement:
Sum(diagnosis) as num_diag
is it summing the number of diagnosis for each patient, which is actually the number of rows per patient, or summing (diagnosis = 1)?
And there's no specific example in the lecture notes:/
Thank you.
Miracle
Barite | Level 11
It is an addition of the diagnosis value for each patient not the number of rows for each patient. The latter is done by count(variable).
Reeza
Super User

@Viveme789 wrote:
Thank you! I have a question regarding the statement:
Sum(diagnosis) as num_diag
is it summing the number of diagnosis for each patient, which is actually the number of rows per patient, or summing (diagnosis = 1)?
And there's no specific example in the lecture notes:/
Thank you.

If the values are 0/1 its the same thing.....

Ksharp
Super User
data x(drop=miss);
input ID miss diagnosis hospital $;
datalines;
2022 .   1           A
2022 .   0           B
2023 .   1           A
2023 .   1          D
;

proc sql;
select *,sum(diagnosis=1)>=2 as flag 
 from x
  group by id;
quit;


How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 2482 views
  • 0 likes
  • 5 in conversation