BookmarkSubscribeRSS Feed
Viveme789
Fluorite | Level 6

Hello!
I wanna ask how to create a dummy variable based on the number of appearance of a certain value ?
For example, I have dataset:

 

ID . diagnosis hospital
2022 .   1 .         A
2022 .   0           B
2023 .   1           A
2023 .   1 .         D

 

Say a certain disease Y is classified positive if it's been diagnosed positive at lease twice. In this class, ID 2023 would be identified having the disease Y. I want to assign another variable called Y as dummy variable:

 

1 if count (diagnosis = 1) >= 2 by each ID

0 otherwise

 

How to do it with proc sql? Thanks.

7 REPLIES 7
WarrenKuhfeld
Rhodochrosite | Level 12

I don't know about SQL, but it is easy with MEANS and a DATA step.

data x(drop=miss);
input ID miss diagnosis hospital $;
datalines;
2022 .   1           A
2022 .   0           B
2023 .   1           A
2023 .   1          D
;
proc means; class id; var diagnosis; output out=b sum=sum; run;
proc print; run;

data all(drop=_: sum);
   merge x b(where=(sum >= 2 and n(id)) in=f); by id;
   y = f;
   run;
proc print; run;
Viveme789
Fluorite | Level 6
Thanks for the effort! But I still have to do it with proc sql required by the professor.
Reeza
Super User

 

 

 

Select *, Sum(diagnosis) as num_diag, case when calculated num_diag>1 then 1 else 0 end as indicator

from have

group by ID;

 

This may or may not be the approach your instructor was expecting. I would highly suggest reviewing your lecture notes and seeing if a specific example wasn’t provided. 

 


@Viveme789 wrote:

Hello!
I wanna ask how to create a dummy variable based on the number of appearance of a certain value ?
For example, I have dataset:

 

ID . diagnosis hospital
2022 .   1 .         A
2022 .   0           B
2023 .   1           A
2023 .   1 .         D

 

Say a certain disease Y is classified positive if it's been diagnosed positive at lease twice. In this class, ID 2023 would be identified having the disease Y. I want to assign another variable called Y as dummy variable:

 

1 if count (diagnosis = 1) >= 2 by each ID

0 otherwise

 

How to do it with proc sql? Thanks.


 

Viveme789
Fluorite | Level 6
Thank you! I have a question regarding the statement:
Sum(diagnosis) as num_diag
is it summing the number of diagnosis for each patient, which is actually the number of rows per patient, or summing (diagnosis = 1)?
And there's no specific example in the lecture notes:/
Thank you.
Miracle
Barite | Level 11
It is an addition of the diagnosis value for each patient not the number of rows for each patient. The latter is done by count(variable).
Reeza
Super User

@Viveme789 wrote:
Thank you! I have a question regarding the statement:
Sum(diagnosis) as num_diag
is it summing the number of diagnosis for each patient, which is actually the number of rows per patient, or summing (diagnosis = 1)?
And there's no specific example in the lecture notes:/
Thank you.

If the values are 0/1 its the same thing.....

Ksharp
Super User
data x(drop=miss);
input ID miss diagnosis hospital $;
datalines;
2022 .   1           A
2022 .   0           B
2023 .   1           A
2023 .   1          D
;

proc sql;
select *,sum(diagnosis=1)>=2 as flag 
 from x
  group by id;
quit;


Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1186 views
  • 0 likes
  • 5 in conversation