Dear Everyone,
I would like to discuss three items with you.
1) 'count' vs. 'array'. I would appreciate your comments on this topic.
In this example from an HIV cohort, I was looking at how many comorbidities subjects had developed after their initial treatment.
I compared the results from 'count' and 'array' and got the same results. A colleague of mine thought that 'count' may not be a proper method for this analysis. Indeed, I like 'array' because the method is simple.
Please see my code and SAS output below:
**count**
if dxyear ge arv_startyr then HTN_howmany=count(dxcode, "HTN");
if dxyear ge arv_startyr then DYSLIPID_howmany=count(dxcode, "DYSLIPID");
if dxyear ge arv_startyr then KIDNEY_UN_howmany=count(dxcode, "KIDNEY_UN");
if HTN_howmany=1 or DYSLIPID_howmany=1 or KIDNEY_UN_howmany=1 then comorbid_sta=1;
else comorbid_sta=0;
format comorbid_sta dx_staf.;
**array**
array array_comorbid[3] $20 ('HTN', 'DYSLIPID', 'KIDNEY_UN');
comorbid_number=0;
do I=1 to dim(array_comorbid);
if dxyear ge arv_startyr and dxcode=array_comorbid(I) then comorbid_number=comorbid_number+1;
end;
format comorbid_number dx_staf.;
The FREQ Procedure

                                          Cumulative    Cumulative
comorbid_sta      Frequency    Percent     Frequency       Percent
------------------------------------------------------------------
dx_prior_art           6929      99.91          6929         99.91
dx_post_art               6       0.09          6935        100.00

                                          Cumulative    Cumulative
comorbid_number   Frequency    Percent     Frequency       Percent
------------------------------------------------------------------
dx_prior_art           6929      99.91          6929         99.91
dx_post_art               6       0.09          6935        100.00
2) In the 'count' method, when I use 'and' instead of 'or', I do not get any results. I tried this because I wondered whether a subject may have more than one comorbidity.
if HTN_howmany=1 and DYSLIPID_howmany=1 and KIDNEY_UN_howmany=1 then comorbid_sta=1;
else comorbid_sta=0;
3) Also, I wonder if someone can share a simple method to replace 'count' or 'array', so that I can count over 20 AIDS-defining illnesses that are recorded under different names in string variables, such as 'CANDIDA', 'CMV', 'CRYPTOCO', 'CRYPTSP', ...
Thanks, everyone, in advance for your time and for sharing your knowledge and skills in SAS.
Phan S.
Just a couple of comments.
Given your small number of subjects/records it won't make much difference, but in both cases you repeat the same dxyear test several times when one statement would do. E.g., with your count code, instead of:
if dxyear ge arv_startyr then HTN_howmany=count(dxcode, "HTN");
if dxyear ge arv_startyr then DYSLIPID_howmany=count(dxcode, "DYSLIPID");
if dxyear ge arv_startyr then KIDNEY_UN_howmany=count(dxcode, "KIDNEY_UN");
you could have used:
if dxyear ge arv_startyr then do;
HTN_howmany=count(dxcode, "HTN");
DYSLIPID_howmany=count(dxcode, "DYSLIPID");
KIDNEY_UN_howmany=count(dxcode, "KIDNEY_UN");
end;
However, since from your array code you are really interested in only checking whether the entry contains those values, and you're not interested in the individual values, the whole thing could be reduced to something like:
data want;
set have;
comorbid_sta=ifn(dxyear ge arv_startyr and dxcode in ('HTN', 'DYSLIPID', 'KIDNEY_UN'),1,0);
run;
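On item 3, the same `in` test could in principle be extended to a longer list, but with 20 or more codes it may be easier to keep the list in one place. A minimal sketch, assuming each record holds a single code in dxcode; the format name $adi_fmt, the data set want_adi, and the variable adi_sta are made up for illustration, and only the four codes named in the question are listed:

```sas
/* Hypothetical lookup format: maps each AIDS-defining illness code to 'ADI'. */
/* Only the codes quoted in the question are shown; add the rest to the list. */
proc format;
  value $adi_fmt
    'CANDIDA', 'CMV', 'CRYPTOCO', 'CRYPTSP' = 'ADI'
    other = 'OTHER';
run;

data want_adi;
  set have;
  /* The logical expression evaluates to 1 when both conditions hold, else 0. */
  adi_sta = (dxyear ge arv_startyr and put(dxcode, $adi_fmt.) = 'ADI');
run;
```

The benefit of the format is that the code list is maintained in one spot and can be reused in other steps; a plain `in (...)` list does the same job if the list is only needed once.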
Dear Sir,
Thanks for your useful comments. Your code is perfect! I would have saved time and energy if I had used your code from the beginning.
At the same time, since you raised this point ('you're not interested in the individual values'), I wonder if you can recommend syntax that would let me deal with it.
Sincerely,
Phan S
Not sure what syntax you are referring to. I was commenting on the fact that you created three how-many fields in the non-array code that you didn't create in the array code.
Do you actually want those fields calculated?
Dear Sir,
That is correct. I want to calculate separately the number of subjects who had HTN, DYSLIPID, and KIDNEY_UN, and the total number of comorbidities in the cohort before and after the initial treatment.
Thanks,
Phan S.
Why not just take care of the separate counts in proc freq? e.g.,
data want;
set have;
comorbid_sta=ifn(dxyear ge arv_startyr and dxcode in ('HTN', 'DYSLIPID', 'KIDNEY_UN'),1,0);
run;
proc freq data=want;
tables comorbid_sta;
table comorbid_sta*dxcode;
run;
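One caveat, offered as a sketch rather than a tested answer: if the data has one row per diagnosis, the PROC FREQ step counts records, not subjects. That would also explain why the 'and' version in item 2 returns nothing, since a single row can hold only one dxcode and so can never match all three at once. To count subjects, and to see who has more than one comorbidity, the rows could be rolled up first. This assumes a subject identifier exists; the name subject_id and the any_* and comorbid_total variables are hypothetical:

```sas
/* Assumption: one diagnosis per row; subject_id is a hypothetical ID variable. */
proc sort data=have out=have_sorted;
  by subject_id;
run;

data per_subject;
  set have_sorted;
  by subject_id;
  retain any_htn any_dyslipid any_kidney;
  /* Reset the flags at the first record of each subject. */
  if first.subject_id then do;
    any_htn=0; any_dyslipid=0; any_kidney=0;
  end;
  /* Flag each comorbidity diagnosed in or after the treatment start year. */
  if dxyear ge arv_startyr then do;
    if dxcode='HTN' then any_htn=1;
    else if dxcode='DYSLIPID' then any_dyslipid=1;
    else if dxcode='KIDNEY_UN' then any_kidney=1;
  end;
  /* Write one row per subject with the number of distinct comorbidities. */
  if last.subject_id then do;
    comorbid_total=sum(any_htn, any_dyslipid, any_kidney);
    output;
  end;
  keep subject_id any_htn any_dyslipid any_kidney comorbid_total;
run;

proc freq data=per_subject;
  tables any_htn any_dyslipid any_kidney comorbid_total;
run;
```

With one row per subject, a test such as any_htn=1 and any_dyslipid=1 becomes meaningful, and comorbid_total shows how many subjects developed one, two, or all three conditions after starting treatment.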
Dear Sir,
Certainly, I did use these procedures. Thanks for your clarification.
Before closing, I would like to thank you again for sharing your thoughts and support.
Sincerely,
Phan S.