Hello!
I wanna ask how to create a dummy variable based on the number of appearance of a certain value ?
For example, I have dataset:
ID . diagnosis hospital
2022 . 1 . A
2022 . 0 B
2023 . 1 A
2023 . 1 . D
Say a certain disease Y is classified positive if it's been diagnosed positive at lease twice. In this class, ID 2023 would be identified having the disease Y. I want to assign another variable called Y as dummy variable:
1 if count (diagnosis = 1) >= 2 by each ID
0 otherwise
How to do it with proc sql? Thanks.
I don't know about SQL, but it is easy with MEANS and a DATA step.
data x(drop=miss);
input ID miss diagnosis hospital $;
datalines;
2022 . 1 A
2022 . 0 B
2023 . 1 A
2023 . 1 D
;
proc means; class id; var diagnosis; output out=b sum=sum; run;
proc print; run;
data all(drop=_: sum);
merge x b(where=(sum >= 2 and n(id)) in=f); by id;
y = f;
run;
proc print; run;
Select *, Sum(diagnosis) as num_diag, case when calculated num_diag>1 then 1 else 0 end as indicator
from have
group by ID;
This may or may not be the approach your instructor was expecting. I would highly suggest reviewing your lecture notes and seeing if a specific example wasn’t provided.
@Viveme789 wrote:
Hello!
I wanna ask how to create a dummy variable based on the number of appearance of a certain value ?
For example, I have dataset:
ID . diagnosis hospital
2022 . 1 . A
2022 . 0 B
2023 . 1 A
2023 . 1 . D
Say a certain disease Y is classified positive if it's been diagnosed positive at lease twice. In this class, ID 2023 would be identified having the disease Y. I want to assign another variable called Y as dummy variable:
1 if count (diagnosis = 1) >= 2 by each ID
0 otherwise
How to do it with proc sql? Thanks.
@Viveme789 wrote:
Thank you! I have a question regarding the statement:
Sum(diagnosis) as num_diag
is it summing the number of diagnosis for each patient, which is actually the number of rows per patient, or summing (diagnosis = 1)?
And there's no specific example in the lecture notes:/
Thank you.
If the values are 0/1 its the same thing.....
data x(drop=miss); input ID miss diagnosis hospital $; datalines; 2022 . 1 A 2022 . 0 B 2023 . 1 A 2023 . 1 D ; proc sql; select *,sum(diagnosis=1)>=2 as flag from x group by id; quit;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.