BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Banke
Pyrite | Level 9
data thcics.question8;
    input $ admitting_diag princ_diag_code oth_diag_code1  oth_diag_code2 ;
datalines;

v350       25001                  4567
25001     v4001                  v4001
v3560     5670      418
2502       v4000
v4000     25032
e560       4352     v3201
518         5670
25029     A1054                6530 
;
run;
data thcics.question8;
set thcics.thcicsamp;
ID = _N_;
DIABETES = 0;
CHF = 0;
keep ID DIABETES CHF discharge thcic_id length_of_stay provider_name Admitting_Diag  Princ_Diag_Code  Oth_Diag_Code2-Oth_Diag_Code10;
array A (*)$5 Admitting_Diag  Princ_Diag_Code Oth_Diag_Code2-Oth_Diag_Code10;
do i = 1 to dim(A);
diagnosis = A(i);
IF diagnosis = '39891' OR diagnosis = '40201' OR diagnosis = '40211' OR                                                                            
    diagnosis = '40291' OR diagnosis = '40401' OR diagnosis = '40403' OR                                                                            
    diagnosis = '40411' OR diagnosis = '40413' OR diagnosis = '40491' OR                                                                            
    diagnosis = '40493' OR SUBSTR(diagnosis,1,4)='4251' OR                                                                                     
    SUBSTR(diagnosis,1,4)='4254' OR SUBSTR(diagnosis,1,4)='4255' OR                                                                            
    SUBSTR(diagnosis,1,4)='4259' OR SUBSTR(diagnosis,1,4)='4280' OR                                                                            
    SUBSTR(diagnosis,1,4)='4281' OR SUBSTR(diagnosis,1,4)='4289'                                                                               
  then CHF=1; 
else CHF = 0;
 label chf = 'chf';
IF substr (diagnosis,1,3) IN: ('250') then diabetes = 1;
else diabetes = 0;
label diabetes = 'diabetes';
          
end;
end;
run;

 Hello everyone,

please I have a challenge, I want to code some observations in 10 of my variables ( diagnosis variables containing disease types in codes) to chronic heart failure (chf) and diabetes and then get the number of patients with chf and diabetes. my log shows no error but i oberved that all my chf and diabetes variable was 0 even if the variables had the code for them. my variables from oth_diag_code1-10 was also blank. I will appreciate if you can please help me fish out what is wrong with my syntax. Thanks a lot, as always

ID THCIC_ID admitting_diag princ_diag_code oth_diag_code_1 oth_diag_code_2 oth_diag_code_3 oth_diag_code_10 diabetes chf
1 1001 v0291 25010       25021 1 1
2 1001 40291 40291 39891   4255   0 1
3 1002 39891 40493   25053     1 1
4 1003 25001 25002         1 1

 

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

i oberved that all my chf and diabetes variable was 0 even if the variables had the code for them. my variables from oth_diag_code1-10 was also blank. I will appreciate if you can please help me fish out what is wrong with my syntax.

 

This is because the last step in the IF-THEN-ELSE construct else chf=0 and else diabetes=0 the way you wrote it sets variables CHF and DIABETES to zero. By removing those lines, you should get what you want. Also please note the proper indentation of your code makes everything easier to understand and debug and to see if you have the proper matching DO with END (which you don't have)

 

data thcics.question8;
    set thcics.thcicsamp;
    ID = _N_;
    DIABETES = 0;
    CHF = 0;
    keep ID DIABETES CHF discharge thcic_id length_of_stay 
        provider_name Admitting_Diag Princ_Diag_Code Oth_Diag_Code2-Oth_Diag_Code10;
    array A (*)$5 Admitting_Diag  Princ_Diag_Code Oth_Diag_Code2-Oth_Diag_Code10;
    do i = 1 to dim(A);
        IF a(i)= '39891' OR a(i)= '40201' OR a(i)= '40211' OR                                                                            
            a(i)= '40291' OR a(i)= '40401' OR a(i)= '40403' OR                                                                            
            a(i)= '40411' OR a(i)= '40413' OR a(i)= '40491' OR                                                                            
            a(i)= '40493' OR substr(a(i),1,4)='4251' OR                                                                                     
            SUBSTR(a(i),1,4)='4254' OR SUBSTR(a(i),1,4)='4255' OR                                                                            
            SUBSTR(a(i),1,4)='4259' OR SUBSTR(a(i),1,4)='4280' OR                                                                            
            SUBSTR(a(i),1,4)='4281' OR SUBSTR(a(i),1,4)='4289'                                                                               
            then CHF=1; 
        IF substr (a(i),1,3) IN: ('250') then diabetes = 1;      
    end;
end;  /* THERE IS NO DO FOR THIS END, THIS SHOULD CAUSE AN ERROR */
run;

I also question the purpose and logic behind these statements:

 

label diabetes = 'diabetes';
label chf='chf';

Usually (in fact ALWAYS), labels are meant to make the output readable. There's really no reason to assign a label of 'chf' to chf, this enhances nothing. You might want to use proper capitalization in your label, that would make things more readable. Or if CHF is an acronym (I don't really know what it means, it may be that in your industry everyone knows what CHF is and so no need to spell out the words of the acronym, but I will give an example anyway)

 

label diabetes='Diabetes'; /* Note proper capitalization */
label chf='Chicago Hostile Ferengi'; /* OPTIONAL: Spell out the acronym */

 

--
Paige Miller

View solution in original post

10 REPLIES 10
PaigeMiller
Diamond | Level 26

i oberved that all my chf and diabetes variable was 0 even if the variables had the code for them. my variables from oth_diag_code1-10 was also blank. I will appreciate if you can please help me fish out what is wrong with my syntax.

 

This is because the last step in the IF-THEN-ELSE construct else chf=0 and else diabetes=0 the way you wrote it sets variables CHF and DIABETES to zero. By removing those lines, you should get what you want. Also please note the proper indentation of your code makes everything easier to understand and debug and to see if you have the proper matching DO with END (which you don't have)

 

data thcics.question8;
    set thcics.thcicsamp;
    ID = _N_;
    DIABETES = 0;
    CHF = 0;
    keep ID DIABETES CHF discharge thcic_id length_of_stay 
        provider_name Admitting_Diag Princ_Diag_Code Oth_Diag_Code2-Oth_Diag_Code10;
    array A (*)$5 Admitting_Diag  Princ_Diag_Code Oth_Diag_Code2-Oth_Diag_Code10;
    do i = 1 to dim(A);
        IF a(i)= '39891' OR a(i)= '40201' OR a(i)= '40211' OR                                                                            
            a(i)= '40291' OR a(i)= '40401' OR a(i)= '40403' OR                                                                            
            a(i)= '40411' OR a(i)= '40413' OR a(i)= '40491' OR                                                                            
            a(i)= '40493' OR substr(a(i),1,4)='4251' OR                                                                                     
            SUBSTR(a(i),1,4)='4254' OR SUBSTR(a(i),1,4)='4255' OR                                                                            
            SUBSTR(a(i),1,4)='4259' OR SUBSTR(a(i),1,4)='4280' OR                                                                            
            SUBSTR(a(i),1,4)='4281' OR SUBSTR(a(i),1,4)='4289'                                                                               
            then CHF=1; 
        IF substr (a(i),1,3) IN: ('250') then diabetes = 1;      
    end;
end;  /* THERE IS NO DO FOR THIS END, THIS SHOULD CAUSE AN ERROR */
run;

I also question the purpose and logic behind these statements:

 

label diabetes = 'diabetes';
label chf='chf';

Usually (in fact ALWAYS), labels are meant to make the output readable. There's really no reason to assign a label of 'chf' to chf, this enhances nothing. You might want to use proper capitalization in your label, that would make things more readable. Or if CHF is an acronym (I don't really know what it means, it may be that in your industry everyone knows what CHF is and so no need to spell out the words of the acronym, but I will give an example anyway)

 

label diabetes='Diabetes'; /* Note proper capitalization */
label chf='Chicago Hostile Ferengi'; /* OPTIONAL: Spell out the acronym */

 

--
Paige Miller
Tom
Super User Tom
Super User

@PaigeMiller Be glad that you have not had reason to learn about (or worse experience) congestive heart failure.

Banke
Pyrite | Level 9

Thank you so much @PaigeMiller and @Tom,  I understand it better now. I had even intialized a value of zero to the variables. Thank you for the tips on labelling and identation. The second "run;" is an error,  sorry about that. 

Banke
Pyrite | Level 9

Hi @PaigeMiller and everyone, please, is it possible to use array to get the top most frequent diagnosis (i.e the top 5 frequent observations in my array elements)? thanks 

PaigeMiller
Diamond | Level 26
  1. Use PROC TRANSPOSE to place all of the diagnosis codes into a single column.
  2. Use PROC FREQ on this new single column to obtain the counts for each diagnosis code.
  3. Sort the counts so the top 5 are available.
--
Paige Miller
Banke
Pyrite | Level 9

brilliant! thanks a lot. It worked

data thcics.question4;
    set thcics.thcicsamp;
         ID = _N_;
   keep  ID thcic_id length_of_stay provider_name Admitting_Diag Princ_Diag_Code Oth_Diag_Code_1-Oth_Diag_Code_8;
run;

proc transpose data = thcics.question4 out = thcics.question4a name = alldiag;
by id;
var Admitting_Diag Princ_Diag_Code Oth_Diag_Code_1-Oth_Diag_Code_8;
run;

proc freq data = thcics.question4a;
tables col1 / out = thcics.freq4a;
run;

proc sort data = thcics.freq4a;
by descending count;
run;
proc print data = thcics.freq4b (firstobs=2 obs=6);
run;
Tom
Super User Tom
Super User

The way you structured your loop is only testing the last element in the array.

So this code:

x=0;
do index=1 to dim(array);
   if array[index]='xxx' then x=1;
   else x=0;
end;

Is the same as:

if array[dim(array)]='xxx' then x=1;
else x=0;

So remove the ELSE statement(s).

  do i = 1 to dim(A);
    if a[i] in: ('39891' '40201' '40211' '40291' '40401' '40403' 
                '40411' '40413' '40491' '40493' 
                '4251' '4254' '4255' '4259' '4280' '4281' '4289'
                )  
       then CHF=1
    ; 
    if a[i] in: ('250') then diabetes = 1;
  end;
Banke
Pyrite | Level 9
thanks so much
ballardw
Super User

For what its worth, you can replace this code:

IF a(i)= '39891' OR a(i)= '40201' OR a(i)= '40211' OR                                                                            
   a(i)= '40291' OR a(i)= '40401' OR a(i)= '40403' OR                                                                            
   a(i)= '40411' OR a(i)= '40413' OR a(i)= '40491' OR                                                                            
   a(i)= '40493' OR substr(a(i),1,4)='4251' OR                                                                                     
   SUBSTR(a(i),1,4)='4254' OR SUBSTR(a(i),1,4)='4255' OR                                                                            
   SUBSTR(a(i),1,4)='4259' OR SUBSTR(a(i),1,4)='4280' OR                                                                            
   SUBSTR(a(i),1,4)='4281' OR SUBSTR(a(i),1,4)='4289'                                                                               
   then CHF=1; 

With:

if a(i) in ('39891' '40201' '40211' '40291' '40401' '40403' '40411' '40413' '40491' '40493')
   or substr(a(i),1,4) in ('4251' '4254' '4255' '4259' '4280' '4281' '4289')
   then CHF=1;

The IN operator will compare a variable or single expression to the list of values in the () and return a true/false result if the value matches any of them.

May save some time typing code.

 

Banke
Pyrite | Level 9
Thank you so much, definitely saves more time and easier to read

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 1827 views
  • 4 likes
  • 4 in conversation