BookmarkSubscribeRSS Feed
Viveme789
Fluorite | Level 6

Hello!
I wanna ask how to create a dummy variable based on the number of appearance of a certain value ?
For example, I have dataset:

 

ID . diagnosis hospital
2022 .   1 .         A
2022 .   0           B
2023 .   1           A
2023 .   1 .         D

 

Say a certain disease Y is classified positive if it's been diagnosed positive at lease twice. In this class, ID 2023 would be identified having the disease Y. I want to assign another variable called Y as dummy variable:

 

1 if count (diagnosis = 1) >= 2 by each ID

0 otherwise

 

How to do it with proc sql? Thanks.

7 REPLIES 7
WarrenKuhfeld
Ammonite | Level 13

I don't know about SQL, but it is easy with MEANS and a DATA step.

data x(drop=miss);
input ID miss diagnosis hospital $;
datalines;
2022 .   1           A
2022 .   0           B
2023 .   1           A
2023 .   1          D
;
proc means; class id; var diagnosis; output out=b sum=sum; run;
proc print; run;

data all(drop=_: sum);
   merge x b(where=(sum >= 2 and n(id)) in=f); by id;
   y = f;
   run;
proc print; run;
Viveme789
Fluorite | Level 6
Thanks for the effort! But I still have to do it with proc sql required by the professor.
Reeza
Super User

 

 

 

Select *, Sum(diagnosis) as num_diag, case when calculated num_diag>1 then 1 else 0 end as indicator

from have

group by ID;

 

This may or may not be the approach your instructor was expecting. I would highly suggest reviewing your lecture notes and seeing if a specific example wasn’t provided. 

 


@Viveme789 wrote:

Hello!
I wanna ask how to create a dummy variable based on the number of appearance of a certain value ?
For example, I have dataset:

 

ID . diagnosis hospital
2022 .   1 .         A
2022 .   0           B
2023 .   1           A
2023 .   1 .         D

 

Say a certain disease Y is classified positive if it's been diagnosed positive at lease twice. In this class, ID 2023 would be identified having the disease Y. I want to assign another variable called Y as dummy variable:

 

1 if count (diagnosis = 1) >= 2 by each ID

0 otherwise

 

How to do it with proc sql? Thanks.


 

Viveme789
Fluorite | Level 6
Thank you! I have a question regarding the statement:
Sum(diagnosis) as num_diag
is it summing the number of diagnosis for each patient, which is actually the number of rows per patient, or summing (diagnosis = 1)?
And there's no specific example in the lecture notes:/
Thank you.
Miracle
Barite | Level 11
It is an addition of the diagnosis value for each patient not the number of rows for each patient. The latter is done by count(variable).
Reeza
Super User

@Viveme789 wrote:
Thank you! I have a question regarding the statement:
Sum(diagnosis) as num_diag
is it summing the number of diagnosis for each patient, which is actually the number of rows per patient, or summing (diagnosis = 1)?
And there's no specific example in the lecture notes:/
Thank you.

If the values are 0/1 its the same thing.....

Ksharp
Super User
data x(drop=miss);
input ID miss diagnosis hospital $;
datalines;
2022 .   1           A
2022 .   0           B
2023 .   1           A
2023 .   1          D
;

proc sql;
select *,sum(diagnosis=1)>=2 as flag 
 from x
  group by id;
quit;


hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 2019 views
  • 0 likes
  • 5 in conversation