Hi folks,
I'm trying to create a table shown in image which requires distinct counts of patients by class variables (enrolled and cancer and sites). Simple proc freq with class would do if my data was a row per patient. However, my data is repeated and proc sql using distinct functon shown in the code didn't cross-tabulate for me.
I appreciate your time for help. Thanks in advance.
My data has:
enrolled: constant variable for each subjects. takes 0,1. if someone enrolled then enrolled (1) or not enrolled (0) across all rows for each patient
cancer: a dummy variable (0,1). One patient can take both 0 or 1 for different rows.
sites: distinct categorical variable for each patient. one patient has only one site.
/*failed codes*/ proc freq data=have nlevels;
class sites; tables a*b/norow nocol nopercent nocum; run; proc sql; SELECT count(distinct ID) as N_PATIENTS, CANCER, ENROLLED, SITES from HAVE GROUP BY ENROLLED, CANCER, SITES; quit; /*mock data*/
DATA HAVE;
INPUT ID CANCER enrolled SITES $;
CARDS;
1 0 1 Lung
1 1 1 Lung
2 0 1 Liver
2 0 1 Liver
2 1 1 Liver
3 0 1 Lung
3 1 1 Lung
4 0 1 Cervix
4 0 1 Cervix
4 1 1 Cervix
5 0 0 Anus
6 0 1 Thyroid
6 1 1 Thyroid
7 0 1 CNS
7 1 1 CNS
8 0 1 Prostate
8 1 1 Prostate
8 1 1 Prostate
9 0 1 Breast
9 1 1 Breast
10 0 1 Breast
10 1 1 Breast
11 0 0 Thyroid
11 0 0 Thyroid
11 0 0 Thyroid
12 0 1 Kidney
12 1 1 Kidney
13 0 1 CNS
13 0 1 CNS
13 1 1 CNS
14 0 1 Prostate
14 1 1 Prostate
;
Thank @Cynthia_sas
proc format;
value Cancer_Fmt
0 = 'Cancer=0'
1 = 'Cancer=1' ;
value Enroll_Fmt
0 = 'Enrolled=0'
1 = 'Enrolled=1' ;
picture ColPCnt other='009.99%';
run;
proc tabulate data=HAVE;
class sites cancer enrolled;
table sites all, all*(n='#' colpctn='Col%'*f=ColPCnt.) CANCER='' * (all*(n='#' colpctn='Col%'*f=ColPCnt.) enrolled='' * (n='#' colpctn='Col%'*f=ColPCnt.)) ;
format cancer Cancer_Fmt. enrolled Enroll_Fmt.;
Run;
output :
It's a double proc freq for distinct counts.
Do it the first time with the variable you want to count included and then a second time without. Except it doesn't seem like you're actually counting distinct based on the WANT data set. Can you explain why there is no entries/counts for Cancer=0 and Enrolled=1 when you have those in the data? Are you using the assumption that if you have cancer/enrolled that needs to be kept at all? If so, take the max for each cancer/enrolled combo then do the proc freq.
data have;
input ID CANCER ENROLLED;
cards;
1 1 1
1 1 1
1 1 1
1 1 1
1 0 1
1 0 1
1 0 1
1 0 1
2 1 0
2 1 0
2 1 0
2 1 0
2 0 0
2 0 0
2 0 0
2 0 0
;
*distinct counts
proc freq data=have noprint;
table cancer*enrolled*id /out = temp;run;
proc freq data=temp;
table cancer*enrolled / out=want;
run;
*what you likely actually want;
proc sql;
create table temp2 as
select id, max(cancer) as cancer, max(enrolled) as enrolled
from have
group by ID;
quit;
proc freq data=temp2;
table cancer*enrolled;
run;
I was just staring at this forum where you offered similar appraoch. Let me look at your suggestion to my post here. I'll get back to you asap. Thanks for responding me on weekend!
https://communities.sas.com/t5/SAS-Programming/PROC-FREQ-with-nodupkey/td-p/389788
Reeza thanks for a pointer. I have all combinations of cancer*enrolled. Attached image is the desired table.
DATA HAVE;
INPUT ID CANCER enrolled SITES $;
CARDS;
1 0 1 Lung
1 1 1 Lung
2 0 1 Liver
2 0 1 Liver
2 1 1 Liver
3 0 1 Lung
3 1 1 Lung
4 0 1 Cervix
4 0 1 Cervix
4 1 1 Cervix
5 0 0 Anus
6 0 1 Thyroid
6 1 1 Thyroid
7 0 1 CNS
7 1 1 CNS
8 0 1 Prostate
8 1 1 Prostate
8 1 1 Prostate
9 0 1 Breast
9 1 1 Breast
10 0 1 Breast
10 1 1 Breast
11 0 0 Thyroid
11 0 0 Thyroid
11 0 0 Thyroid
12 0 1 Kidney
12 1 1 Kidney
13 0 1 CNS
13 0 1 CNS
13 1 1 CNS
14 0 1 Prostate
14 1 1 Prostate
;
One of the options I posted should work then. Do they not? If not, how so?
Thank @Cynthia_sas
proc format;
value Cancer_Fmt
0 = 'Cancer=0'
1 = 'Cancer=1' ;
value Enroll_Fmt
0 = 'Enrolled=0'
1 = 'Enrolled=1' ;
picture ColPCnt other='009.99%';
run;
proc tabulate data=HAVE;
class sites cancer enrolled;
table sites all, all*(n='#' colpctn='Col%'*f=ColPCnt.) CANCER='' * (all*(n='#' colpctn='Col%'*f=ColPCnt.) enrolled='' * (n='#' colpctn='Col%'*f=ColPCnt.)) ;
format cancer Cancer_Fmt. enrolled Enroll_Fmt.;
Run;
output :
Applied neatly to my actual data. May I ask one more question please? Do you know how to control the order of columns appear in the table by any chance?
read up on proc tabulate order=formatted.
class sites cancer enrolled / order=formatted;
OR
class sites ;
class cancer / order=formatted;
class enrolled / order=formatted;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.