BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Banke
Pyrite | Level 9

Hello everyone, 

 

I ran the attached code and although it worked, the distribution of my DM  = 0  AND DM  =1 (6,807,243 AND 629,015) is different from what my colleagues are getting and our codes are very similar. I have attached a picture of one of my colleagues. I don't

DATA WORK.MED;
SET HUMANA.Deid_2014_2016_v1_MED;
KEEP PATIENT_ID ADMISSION_DIAGNOSIS_CODE MEDICAL_DIAGNOSIS_CODE_2 MEDICAL_DIAGNOSIS_CODE_3
MEDICAL_DIAGNOSIS_CODE_4 MEDICAL_DIAGNOSIS_CODE_5 MEDICAL_DIAGNOSIS_CODE_6 
MEDICAL_DIAGNOSIS_CODE_7 MEDICAL_DIAGNOSIS_CODE_8 MEDICAL_DIAGNOSIS_CODE_9
MEDICAL_PRIMARY_DIAGNOSIS_CODE;
RUN;

/***B. IDENTIFYING PATIENTS WITH TYPE 2 DIABETES MELLITUS BY SUBSTRACTING THOSE WITH TYPE 1 DIABTES FROM THOSE WI
 SUBSTRING FUNCTION IS USED TO EXTRACT THE FIRST THREE OBSERVATIONS OF "250" 
AND THE LAST 2 DIGITS CORESSPONDING TO THE DIAGNOSIS FOR ICD9 TYPE I DIABETES,
FOR ICD10, E11 IS THE FIRST 3 DIGITS FOR ALL T2D DIAGNOSIS****/
/**INITIATE DIAGOSIS VARIABLE “T2D” AT 0, PATIENTS WITH T2D WILL HAVE A SCORE OF 1**/


DATA DIABETES; /**804,378,068**/
SET MED; 
*T2D = "0"; /**a new variable for type II diabetes**/
ARRAY DIAG (*) $ ADMISSION_DIAGNOSIS_CODE MEDICAL_PRIMARY_DIAGNOSIS_CODE 
MEDICAL_DIAGNOSIS_CODE_2 - MEDICAL_DIAGNOSIS_CODE_9;
DM = "0";
DO I = 1 TO DIM (DIAG);
IF SUBSTR (DIAG(I),1,3) IN:('250') AND SUBSTR(DIAG(I),4,2) 
NOT IN: ('03', '11', '13', '21', '23', '31', '33', '41', '43', '51', '53', '61', '63', 
'71', '73', '81', '83', '91', '93') THEN DM = "1";
IF SUBSTR (DIAG(I),1,3) IN: ('E11')THEN DM = "1";  
IF DM = "" THEN DM = "0";
LABEL DM =' Type II Diabetes ';   
END;
RUN;
/**C. CREATING A DATASET FOR DISTINCT OBSERVATIONS WITH DM AND NO DM **/
PROC SORT DATA = DIABETES /***7,436,258***/
OUT = T2D; 
BY PATIENT_ID;
RUN;

/**CHECK FOR UNIQUE PATIENTS WITH T2D**/
PROC FREQ DATA = T2D;
TABLES  DM  /**DM, 1 = 629,015 ; 0 = 6,807,243**/
RUN; 

CODE 1.jpgCODE 2.jpg know where I'm missing it. I will appreciate your help please. I cant share the file because it is copyright

 

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

You don't need to colon modifier because you the length of your substring and the length of the strings in the value list are the same. The colon modifier won't hurt in your case but it's just not necessary (you don't need a contains that would also match sub-strings).

 

It's a bit guessing on my side but if I understand right then you want DM set to one if for any of the variable in your array the expression is TRUE. 

If that's the case then leave the loop as soon as you you're expression becomes TRUE. 

Patrick_0-1663454854142.png

 

In the code you've posted consider the case where in the very last iteration of the loop below condition becomes TRUE.

 

 

IF DM = "" THEN DM = "0";

 

 

Here you set DM to 0 no matter if in any of the earlier iterations of the loops DM had been set to 1.

 

In the code I've posted above DM gets initialized to 0 before the loop starts and within the loop only ever gets set to 1. Because of this you wouldn't have to leave the loop once DM becomes 1 as it will never get reset to 0. But it's not necessary to continue looping once DM is 1 and that's why it's better for performance to just stop looping.

 

And here the code I've posted above as screen shot

/**INITIATE DIAGOSIS VARIABLE “T2D” AT 0, PATIENTS WITH T2D WILL HAVE A SCORE OF 1**/
DATA DIABETES; /**804,378,068**/
  SET MED;

  *T2D = "0"; /**a new variable for type II diabetes**/
  ARRAY DIAG (*) $  ADMISSION_DIAGNOSIS_CODE MEDICAL_PRIMARY_DIAGNOSIS_CODE 
                    MEDICAL_DIAGNOSIS_CODE_2 - MEDICAL_DIAGNOSIS_CODE_9;
  DM = "0";

  DO I = 1 TO DIM (DIAG);
    IF SUBSTR(DIAG(I),1,3) IN ('250','E11') 
       OR 
       SUBSTR(DIAG(I),4,2) NOT IN 
        ('03', '11', '13', '21', '23', '31', '33', '41', '43', '51', '53', '61', '63','71', '73', '81', '83', '91', '93') 
      THEN 
      do;
        DM = "1";
        leave;
      end;
  END;
  LABEL DM =' Type II Diabetes ';

RUN;

 

And last but not least: In your code with the two IFs without an ELSE DM gets set to 1 if either of the conditions becomes True. I've converted this to single IF statement with an OR operator. If both conditions need to be True then replace the OR with an AND.

Patrick_1-1663454998445.png

 

View solution in original post

4 REPLIES 4
sbxkoenk
SAS Super FREQ

Hello,

 

The colon ( : ) for truncated comparison can be very powerful

, but are you sure you want to use it?

 

PharmaSUG 2012 - Paper TA05
Using SAS Colon Effectively
Sudhir Singh, EMD Serono, Rockland, MA

https://www.pharmasug.org/proceedings/2012/TA/PharmaSUG-2012-TA05.pdf

 

I think you want to drop it!

 

Cheers,

Koen

Banke
Pyrite | Level 9

Thank you Koen,

I did not use the semicolon initially, I used it because one of my colleagues who got a similar number used it. I got the same number whether I used or not

Banke
Pyrite | Level 9
I MEANT COLON(:)
Patrick
Opal | Level 21

You don't need to colon modifier because you the length of your substring and the length of the strings in the value list are the same. The colon modifier won't hurt in your case but it's just not necessary (you don't need a contains that would also match sub-strings).

 

It's a bit guessing on my side but if I understand right then you want DM set to one if for any of the variable in your array the expression is TRUE. 

If that's the case then leave the loop as soon as you you're expression becomes TRUE. 

Patrick_0-1663454854142.png

 

In the code you've posted consider the case where in the very last iteration of the loop below condition becomes TRUE.

 

 

IF DM = "" THEN DM = "0";

 

 

Here you set DM to 0 no matter if in any of the earlier iterations of the loops DM had been set to 1.

 

In the code I've posted above DM gets initialized to 0 before the loop starts and within the loop only ever gets set to 1. Because of this you wouldn't have to leave the loop once DM becomes 1 as it will never get reset to 0. But it's not necessary to continue looping once DM is 1 and that's why it's better for performance to just stop looping.

 

And here the code I've posted above as screen shot

/**INITIATE DIAGOSIS VARIABLE “T2D” AT 0, PATIENTS WITH T2D WILL HAVE A SCORE OF 1**/
DATA DIABETES; /**804,378,068**/
  SET MED;

  *T2D = "0"; /**a new variable for type II diabetes**/
  ARRAY DIAG (*) $  ADMISSION_DIAGNOSIS_CODE MEDICAL_PRIMARY_DIAGNOSIS_CODE 
                    MEDICAL_DIAGNOSIS_CODE_2 - MEDICAL_DIAGNOSIS_CODE_9;
  DM = "0";

  DO I = 1 TO DIM (DIAG);
    IF SUBSTR(DIAG(I),1,3) IN ('250','E11') 
       OR 
       SUBSTR(DIAG(I),4,2) NOT IN 
        ('03', '11', '13', '21', '23', '31', '33', '41', '43', '51', '53', '61', '63','71', '73', '81', '83', '91', '93') 
      THEN 
      do;
        DM = "1";
        leave;
      end;
  END;
  LABEL DM =' Type II Diabetes ';

RUN;

 

And last but not least: In your code with the two IFs without an ELSE DM gets set to 1 if either of the conditions becomes True. I've converted this to single IF statement with an OR operator. If both conditions need to be True then replace the OR with an AND.

Patrick_1-1663454998445.png

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 429 views
  • 0 likes
  • 3 in conversation