Hi everyone,
I need to identify patients with type 2 diabetes from a claims medical condition file.
I am using the following code, but it gives me unexpected numbers.
Please let me know: am I using all the ICD-9 and ICD-10 diabetes codes correctly in this code?
DATA Array;
SET MED;
ARRAY cc (*) MEDICAL_PRIMARY_DIAGNOSIS_CODE MEDICAL_DIAGNOSIS_CODE_2-MEDICAL_DIAGNOSIS_CODE_9;
DIABETESc =0;
DO i=1 TO 10;
IF cc (i) in ('E11.9' 'E11.8' 'E11.9' 'E11.69' 'E11.51' 'E11.21' 'E11.620''E11.621') or ('E11.00' <= cc(i) <= 'E11.65') then
DIABETESc=1;
If DIABETESc=. then
DIABETESc=0;
END;
RUN;
PROC SORT DATA= array2;
BY Patient_ID DESCENDING DIABETESc;
RUN;
DATA ARRAY3;
SET array2;
BY Patient_ID;
IF FIRST.Patient_ID;
RUN;
/*REMOVE THE MISSING VALUE FROM THE DATA*/
DATA ARRAY4;
SET ARRAY3;
IF DIABETESc=. THEN DELETE;
RUN;
/*COUNT ONLY diabetes PATIENTS*/
DATA ARRAY5;
SET ARRAY4;
IF DIABETESc=1;
RUN;
/*NUMBERS OF DIABETES PATIENTS*/
PROC FREQ DATA= ARRAY5;
TABLES DIABETESc;
RUN;
You need to give actual examples of the "unexpected numbers". That most likely means values of the variables in the CC array that produce the unexpected result, together with the result you expected for each example.
Do you get too many or too few?
You may have a typo, since the code E11.9 appears twice in this line:
IF cc (i) in ('E11.9' 'E11.8' 'E11.9' 'E11.69' 'E11.51' 'E11.21' 'E11.620''E11.621') or ('E11.00' <= cc(i) <= 'E11.65') then
Less-than and greater-than comparisons often do not work as expected with character values. For example, your 'E11.620' and 'E11.621' already fall in the range between E11.00 and E11.65, so listing them separately adds nothing. It also means that any other codes in that range that you do not want, such as E11.623, are included as well.
You may want to run this little bit of code and look at the log to see whether it points to codes in your data that you did not intend to include.
data example;
   input cc $;
   /* write a note to the log for every code that falls inside the range */
   if ('E11.00' <= cc <= 'E11.65') then put CC= " is in the range";
datalines;
E11.00
E11.001
E11.009
E11.0003
E11.620
E11.621
E11.622
E11.623
;
run;
Your code does not capture "E11" or "E11.0", for starters. If you have any of those codes, that would be one reason for the count to be low.
Character values are compared left to right. When E11.0 is compared to E11.00, the two are identical up to the first 0. SAS then pads the shorter value with trailing blanks, and a blank sorts before any digit, so E11.0 is considered "less than" E11.00 and falls outside your range. The same thing happens for E11, only sooner.
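A minimal sketch of that comparison, in case it helps to see it in the log. The variable names a and b are made up for the illustration and are not from your data.

```sas
data _null_;
   length a b $8;
   a = 'E11.0';
   b = 'E11.00';
   /* the shorter value is blank-padded, and a blank sorts before '0' */
   if a < b then put a= 'sorts before ' b=;
   if 'E11' < 'E11.00' then put 'E11 also sorts before E11.00';
   /* so neither value satisfies the lower bound of the range test */
   if not ('E11.00' <= a <= 'E11.65') then put a= 'is outside the range';
run;
```

All three messages should appear in the log, which is why those codes are missed by the range test.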
Something else to consider is whether any of your variables have leading spaces. " E11.00" is not equal to "E11.00", and because a space compares as less than the E, the value also falls below the start of your range.
Also check whether the first character in your data is upper or lower case. "e11" is not going to match your criteria, and neither will any value that has more than 2 digits with no decimal point (for example, E119).
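One way to check for both problems is to compare each raw value with a cleaned copy. This is only a sketch on made-up values; raw, clean and changed are hypothetical names, and you would apply the same UPCASE and STRIP calls to the elements of your own array.

```sas
data check;
   /* $char10. keeps leading blanks so they can be detected */
   input raw $char10.;
   length clean $10;
   /* remove leading/trailing blanks and force upper case */
   clean = upcase(strip(raw));
   /* 1 when the stored value differs from its cleaned form */
   changed = (raw ne clean);
datalines;
E11.00
 E11.00
e11.9
;
run;

proc print data=check;
run;
```

Rows with changed=1 are values that would not match your code list as stored.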
When debugging a program, all good programmers always ... ALWAYS ... examine the log. Since we don't have your data, and we don't have your log, here is a step you can take.
Find some observations that you expect to have DIABETESc=1 but that actually have DIABETESc=0. You won't need many; 2 or 3 are probably enough. If the answer doesn't jump out at you from that exercise, then post those observations here (removing any identifiers such as the patient ID).
Note that the program you posted won't run. The array has 9 elements in it, but you are looping i=1 to 10. So I'm not sure what additional differences there are between what you posted and what you ran.
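To avoid that mismatch, you can let SAS supply the upper bound with DIM() instead of typing it. The sketch below uses made-up data and hypothetical variables (patient_id, dx1-dx3, diabetes) and only two of your codes, so treat it as an illustration of the loop rather than a complete code list.

```sas
data flagged;
   input patient_id dx1 :$8. dx2 :$8. dx3 :$8.;
   array cc (*) dx1-dx3;
   diabetes = 0;
   /* dim(cc) always equals the number of array elements */
   do i = 1 to dim(cc);
      if cc(i) in ('E11.9' 'E11.8') then diabetes = 1;
   end;
   drop i;
datalines;
1 E11.9 I10 .
2 I10 J45.909 .
3 Z00.00 E11.8 I10
;
run;
```

With this pattern the loop stays correct if you later add or remove diagnosis variables from the array.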