Hi,
I am trying to label (as PTSD) and then look at how often the ICD codes 'F43.10' and 'F43.11' are used for seven diagnosis variables (diagnosis_code_1 - diagnosis_code_7). I was able to do this with array statement for all the other codes except these two -- as now when I run proc freq for my newly created PTSD variable, it comes up as empty (all missing) even though I know that the above codes exist in the data. Can you please help to troubleshoot? I cannot figure out why it is working for all the other variables except these? I have bolded the array that does not seem to be working correctly judging by the 0 frequency when I run proc freq of PTSD
data WORK.testunder1;
set work.testunder1;
ARRAY stress_array(7) $ diagnosis_code_1-diagnosis_code_7;
DO i= 1 to 7;
IF substr(stress_array(i),1,5) in ('F4310', 'F4312') then PTSD=1;
end;
ARRAY anxiety_array(7) $ diagnosis_code_1-diagnosis_code_7;
DO i= 1 to 7;
IF substr(anxiety_array(i),1,3) in ('F41', 'F40') or substr(anxiety_array(i),1,5) in ('F4322') or substr(anxiety_array(i),1,4) in ('F064') then anxiety=1;
end;
ARRAY depression_array(7) $ diagnosis_code_1-diagnosis_code_7;
DO i= 1 to 7;
IF substr(depression_array(i),1,3) in ('F32', 'F33') or substr(depression_array(i),1,5) in ('F4321', 'F4323') or substr(depression_array(i),1,4)
in ('F341')
then depression=1;
end;
ARRAY bipolar_array(7) $ diagnosis_code_1-diagnosis_code_7;
DO i= 1 to 7;
IF substr(bipolar_array(i),1,3) in ('F31') then bipolar=1;
end;
run;
Thank you!
You didn't provide any sample data, which would help a lot. Nevertheless, your paragraph of explanation does not match your SAS code.
In your paragraph of explanation, you are looking for code that are text strings that contain the dot character (as well as other characters). In your SAS code, the dot is missing.
Also note, by using the colon operator, you can simplify the code and make it more readable. Here's one section that you have now:
ARRAY depression_array(7) $ diagnosis_code_1-diagnosis_code_7;
DO i= 1 to 7;
IF substr(depression_array(i),1,3) in ('F32', 'F33') or substr(depression_array(i),1,5) in ('F4321', 'F4323') or substr(depression_array(i),1,4)
in ('F341')
then depression=1;
end;
Compare that with:
ARRAY depression_array(7) $ diagnosis_code_1-diagnosis_code_7;
DO i= 1 to 7;
IF depression_array(i) in : ('F32', 'F33', 'F4321', 'F4323', 'F341')
then depression=1;
end;
Note that even this set of comparisons may be failing to find some of the observations. The F4323 and F4321 may need to become F43.23 and F43.21 instead.
I strongly suggest not removing any dots from ICD coding. The dots tell you the difference between (hypothetical) F4.310 and F43.10.
I don't know if the F series have the single digit codes but they do exist in other places and removing the dot can seriously effect.
Another way to look for specific values out of a large number is to use a custom format that assigns a key word like "Valid" "Match" or similar for the ones of interest and "Invalid" "No Match" for the remainder. This is very easy with the OTHER keyword proc format:
Proc format ;
value $icd_pst
'F43.10', 'F43.11' = 'PTSD'
other = 'No PTSD'
;
run;
Then in your code you test using
if put(variable,&icd_pst.) = 'Match' then <do what you want when found>;
This can be extended to search for large numbers of blocks of values if the data is nice, meaning ranges are clearly defined. Since the parts after the . in ICD coding are all digits you can use ranges such as
'F40.0000' - 'F40.9999' = 'Anxiety'
'F41.0000' - 'F41.9999' = 'Anxiety'
There reasons involving character comparison rules not to attempt using 'F40.0000' - 'F41.9999' because 'F403.0' is likely to fall in the range.
Thanks for your help!
I am trying out what you wrote below for when diagnosis_code_1 = F43.10 or F43.11, but still getting some errors, which I presume is do to my poor syntax in the second block(?). Also, how would I alter this code to capture any time the ICD10 codes F43.10 or F43.11 appear for any of the diagnostic variables (diagnosis_code_1-diagnosis_code_7)?
Proc format;
value $icd_pst
'F43.10', 'F43.11' = 'PTSD'
other='No PTSD';
run;
data test;
set WORK.testunder1;
If put(diagnosis_code_1,&icd_pst.) = 'Match' then PTSD=1;
run;
Thank you so much!
@jessho wrote:
Thanks for your help!
I am trying out what you wrote below for when diagnosis_code_1 = F43.10 or F43.11, but still getting some errors, which I presume is do to my poor syntax in the second block(?). Also, how would I alter this code to capture any time the ICD10 codes F43.10 or F43.11 appear for any of the diagnostic variables (diagnosis_code_1-diagnosis_code_7)?
Proc format;
value $icd_pst
'F43.10', 'F43.11' = 'PTSD'
other='No PTSD';
run;
data test;
set WORK.testunder1;
If put(diagnosis_code_1,&icd_pst.) = 'PTSD' then PTSD=1;
run;
Thank you so much!
Wrong character, $ for formats, & for macro variable.
And make sure the value compared is the value used in the Format.
You can even modify the code to be
PTSD = (Put(diagnosis_code_1,$icd_pst.) = 'PTSD');
Which would assign 0 when diagnosis_code_1 is anything except the two values, including blank or missing. SAS logic comparison will return numeric 1 for 'True' and 0 for 'False'.
Thank you! That fixed it for when I am labeling the diagnosis_code_1. If I wanted to modify the code to capture all the PTSD codes listed in all seven diagnosis codes (diagnosis_code_1 - diagnosis_code_7), how would you amend the code? I tried to - versus or statements with the other diagnosis code variables but continue to get errors. Thank you!
Proc format;
value $icd_pst
'F43.10', 'F43.11' = 'PTSD'
other='No PTSD';
run;
data test;
set WORK.testunder1;
If Put(diagnosis_code_1,$icd_pst.) = 'PTSD' then PTSD=1; run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.