Dealing with string variables.

Contributor
Posts: 43
Dear Everyone,

I would like to discuss three items with you.

1) 'count' vs. 'array'. I would appreciate your comments on this comparison.

In this example from an HIV cohort, I was looking at how many comorbidities subjects had developed after their initial treatment.

I compared the results from 'count' and 'array' and they were the same. A colleague of mine thought that 'count' may not be a proper method for this analysis. Indeed, I prefer the array approach because it is simple.

Please see my code and SAS output below:

**count**

if dxyear ge arv_startyr then HTN_howmany=count(dxcode, "HTN");

if dxyear ge arv_startyr then DYSLIPID_howmany=count(dxcode, "DYSLIPID");

if dxyear ge arv_startyr then KIDNEY_UN_howmany=count(dxcode, "KIDNEY_UN");

if HTN_howmany=1 or DYSLIPID_howmany=1 or KIDNEY_UN_howmany=1 then comorbid_sta=1;

else comorbid_sta=0;

format comorbid_sta dx_staf.;

**array**

array array_comorbid[3] $20 ('HTN', 'DYSLIPID', 'KIDNEY_UN');

comorbid_number=0;

do I=1 to dim(array_comorbid);

if dxyear ge arv_startyr and dxcode=array_comorbid(I) then comorbid_number=comorbid_number+1;

end;

format comorbid_number dx_staf.;

  

                         The FREQ Procedure

                                         Cumulative    Cumulative
    comorbid_sta    Frequency    Percent  Frequency       Percent
    --------------------------------------------------------------
    dx_prior_art         6929      99.91       6929         99.91
    dx_post_art             6       0.09       6935        100.00

                                         Cumulative    Cumulative
    comorbid_number Frequency    Percent  Frequency       Percent
    --------------------------------------------------------------
    dx_prior_art         6929      99.91       6929         99.91
    dx_post_art             6       0.09       6935        100.00

2) In the 'count' method, when I use 'and' instead of 'or', I do not get the expected results. I tried this because a subject may have more than one comorbidity.

if HTN_howmany=1 and DYSLIPID_howmany=1 and KIDNEY_UN_howmany=1 then comorbid_sta=1;

else comorbid_sta=0;

3) Also, I wonder if someone can share a simple method to replace 'count' or 'array', so that I can count over 20 AIDS-defining illnesses that are recorded under different names in string variables, such as 'CANDIDA', 'CMV', 'CRYPTOCO', 'CRYPTSP', ...

Thanks in advance to everyone for taking the time to share your knowledge and skills in SAS.

Phan S.


Accepted Solutions
Solution
‎10-10-2012 04:17 PM
PROC Star
Posts: 7,363

Re: Dealing with string variables.

Just a couple of comments.

Given your small number of subjects/records it won't make a lot of difference, but in both cases you are repeating the same comparison of dxyear against arv_startyr, and those repeats could be reduced to a single IF statement. For example, with your count code, instead of:

if dxyear ge arv_startyr then HTN_howmany=count(dxcode, "HTN");

if dxyear ge arv_startyr then DYSLIPID_howmany=count(dxcode, "DYSLIPID");

if dxyear ge arv_startyr then KIDNEY_UN_howmany=count(dxcode, "KIDNEY_UN");


you could have used:

if dxyear ge arv_startyr then do;

  HTN_howmany=count(dxcode, "HTN");

  DYSLIPID_howmany=count(dxcode, "DYSLIPID");

  KIDNEY_UN_howmany=count(dxcode, "KIDNEY_UN");

end;
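A side note on COUNT itself, which may be what the colleague mentioned in the question was worried about: COUNT returns the number of times a substring occurs inside a string, so it also matches codes that merely contain the text. The sketch below uses a made-up code, 'HTN_PULM', purely for illustration; it is not taken from the poster's data.

```sas
/* Minimal sketch: COUNT does substring matching, "=" does exact matching. */
/* 'HTN_PULM' is a hypothetical diagnosis code used only for illustration. */
data _null_;
  length dxcode $20;
  dxcode = 'HTN_PULM';
  n_sub   = count(dxcode, 'HTN');   /* occurrences of 'HTN' inside dxcode */
  n_exact = (dxcode = 'HTN');       /* 1 only when dxcode is exactly 'HTN' */
  put n_sub= n_exact=;              /* writes n_sub=1 n_exact=0 to the log */
run;
```

If every dxcode value holds exactly one code and no code is a substring of another, the two approaches agree, which is consistent with the identical PROC FREQ results shown in the question.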


However, judging from your array code, you are really only interested in checking whether the entry matches one of those values, and you're not interested in the individual values, so the whole thing could be reduced to something like:


data want;

  set have;

  comorbid_sta=ifn(dxyear ge arv_startyr and dxcode in ('HTN', 'DYSLIPID', 'KIDNEY_UN'),1,0);

run;
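For the longer list of AIDS-defining illnesses raised in item 3 of the question, one option is to keep the code lists in a user-defined format rather than in the DATA step logic. The following is only a sketch: the format name $dxgrp, the group labels, and the variables dxgroup and post_art are assumptions made for illustration, and only the four codes named in the question are listed; the remaining codes would need to be added by hand.

```sas
/* Sketch: map diagnosis codes to groups with a character format.       */
/* $dxgrp, dxgroup and post_art are hypothetical names.                  */
proc format;
  value $dxgrp
    'HTN', 'DYSLIPID', 'KIDNEY_UN'           = 'COMORBID'
    'CANDIDA', 'CMV', 'CRYPTOCO', 'CRYPTSP'  = 'ADI'      /* extend this list */
    other                                    = 'OTHER';
run;

data want_grouped;
  set have;
  length dxgroup $8;
  dxgroup  = put(dxcode, $dxgrp.);      /* group label for this record      */
  post_art = (dxyear ge arv_startyr);   /* 1 = diagnosed at or after start  */
run;

proc freq data=want_grouped;
  tables dxgroup*post_art;
run;
```

Keeping the lists in PROC FORMAT means a new code can be added in one place without touching the DATA step.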

Contributor
Posts: 43

Re: Dealing with string variables.

Dear Sir,

Thanks for your useful comments. Your code is perfect! I would have saved time and energy if I had used your code from the beginning.

At the same time, since you raised the point that 'you're not interested in the individual values', I wonder if you can recommend syntax for dealing with that.

Sincerely,

Phan S

PROC Star
Posts: 7,363

Re: Dealing with string variables.

Not sure what syntax you are referring to.  I was commenting on the fact that you created three how-many fields in the non-array code that you didn't create in the array code.

Do you actually want those fields calculated?

Contributor
Posts: 43

Re: Dealing with string variables.

Dear Sir,

That is correct. I want to calculate separately the number of subjects who had each of 'HTN', 'DYSLIPID', and 'KIDNEY_UN', as well as the total number of comorbidities in the cohort before and after the initial treatment.

Thanks,

Phan S.

PROC Star
Posts: 7,363

Re: Dealing with string variables.

Why not just take care of the separate counts in proc freq?  e.g.,

data want;

  set have;

  comorbid_sta=ifn(dxyear ge arv_startyr and dxcode in ('HTN', 'DYSLIPID', 'KIDNEY_UN'),1,0);

run;

proc freq data=want;

  tables comorbid_sta;

  tables comorbid_sta*dxcode;

run;
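If the input has one row per diagnosis rather than one row per subject (an assumption; the thread does not say), the PROC FREQ above counts records, not subjects. That would also explain why the 'and' version in item 2 of the question never fires: a single row holds only one dxcode, so the three flags cannot all be 1 on the same row. A possible sketch for subject-level counts follows, where subject_id is a hypothetical identifier variable and the output names are made up for illustration.

```sas
/* Sketch: collapse diagnosis rows to one row per subject.               */
/* Assumes HAVE has one row per diagnosis; subject_id is hypothetical.   */
proc sql;
  create table per_subject as
  select subject_id,
         max(dxyear ge arv_startyr and dxcode = 'HTN')       as htn_post,
         max(dxyear ge arv_startyr and dxcode = 'DYSLIPID')  as dyslipid_post,
         max(dxyear ge arv_startyr and dxcode = 'KIDNEY_UN') as kidney_un_post
  from have
  group by subject_id;
quit;

data per_subject;
  set per_subject;
  /* number of distinct comorbidities first seen at or after treatment start */
  comorbid_n = htn_post + dyslipid_post + kidney_un_post;
run;

proc freq data=per_subject;
  tables htn_post dyslipid_post kidney_un_post comorbid_n;
run;
```

Each comparison evaluates to 1 or 0, so MAX within a subject acts as an "any row matched" flag. Pre-treatment flags could be built the same way by reversing the dxyear comparison.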

Contributor
Posts: 43

Re: Dealing with string variables.

Dear Sir,

Certainly, I did use these procedures. Thanks for your clarification.

Before closing, I would like to thank you again for sharing your thoughts and support.

Sincerely,

Phan S.

☑ This topic is SOLVED.
