Hello!
I wanna ask how to create a dummy variable based on the number of appearance of a certain value ?
For example, I have dataset:
ID . diagnosis hospital
2022 . 1 . A
2022 . 0 B
2023 . 1 A
2023 . 1 . D
Say a certain disease Y is classified positive if it's been diagnosed positive at lease twice. In this class, ID 2023 would be identified having the disease Y. I want to assign another variable called Y as dummy variable:
1 if count (diagnosis = 1) >= 2 by each ID
0 otherwise
How to do it with proc sql? Thanks.
I don't know about SQL, but it is easy with MEANS and a DATA step.
data x(drop=miss);
input ID miss diagnosis hospital $;
datalines;
2022 . 1 A
2022 . 0 B
2023 . 1 A
2023 . 1 D
;
proc means; class id; var diagnosis; output out=b sum=sum; run;
proc print; run;
data all(drop=_: sum);
merge x b(where=(sum >= 2 and n(id)) in=f); by id;
y = f;
run;
proc print; run;
Select *, Sum(diagnosis) as num_diag, case when calculated num_diag>1 then 1 else 0 end as indicator
from have
group by ID;
This may or may not be the approach your instructor was expecting. I would highly suggest reviewing your lecture notes and seeing if a specific example wasn’t provided.
@Viveme789 wrote:
Hello!
I wanna ask how to create a dummy variable based on the number of appearance of a certain value ?
For example, I have dataset:
ID . diagnosis hospital
2022 . 1 . A
2022 . 0 B
2023 . 1 A
2023 . 1 . D
Say a certain disease Y is classified positive if it's been diagnosed positive at lease twice. In this class, ID 2023 would be identified having the disease Y. I want to assign another variable called Y as dummy variable:
1 if count (diagnosis = 1) >= 2 by each ID
0 otherwise
How to do it with proc sql? Thanks.
@Viveme789 wrote:
Thank you! I have a question regarding the statement:
Sum(diagnosis) as num_diag
is it summing the number of diagnosis for each patient, which is actually the number of rows per patient, or summing (diagnosis = 1)?
And there's no specific example in the lecture notes:/
Thank you.
If the values are 0/1 its the same thing.....
data x(drop=miss); input ID miss diagnosis hospital $; datalines; 2022 . 1 A 2022 . 0 B 2023 . 1 A 2023 . 1 D ; proc sql; select *,sum(diagnosis=1)>=2 as flag from x group by id; quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.