I know how to label each diagnosis_code variable one at a time. Is there a quick way to do this in one step for all my diagnosis code variables (diagnosis_code1, diagnosis_code2, etc.)?
In other words, if I wanted to modify the code below to capture all the PTSD diagnosis codes in any of the seven diagnosis code variables (diagnosis_code_1 - diagnosis_code_7), how would you amend the code? I tried using - ranges and OR statements with the other diagnosis code variables but continue to get errors. Thank you!
Proc format;
value $icd_pst
'F43.10', 'F43.11' = 'PTSD'
other='No PTSD';
run;
data test;
set WORK.testunder1;
If Put(diagnosis_code_1,$icd_pst.) = 'PTSD' then PTSD=1;
run;
Could you try:
data test;
set WORK.testunder1;
If strip(Put(diagnosis_code_1,$icd_pst.)) = 'PTSD' then PTSD=1;
run;
Use arrays to process many variables in the same way:
data test;
set WORK.testunder1;
array diags {7} diagnosis_code_1 - diagnosis_code_7;
do k = 1 to dim(diags) until (ptsd=1);
If Put(diags{k}, $icd_pst.) = 'PTSD' then PTSD=1;
end;
run;
If you want a variable that indicates whether at least one of a group of variables has a given value, here is one way:
Proc format library=work;
value $icd_pst
'F43.10', 'F43.11' = 'PTSD'
other='No PTSD';
run;

data example;
infile datalines truncover;
informat d1 - d7 $8.;
input d1 -d7;
array dx d1-d7;
array temp{7} $ 10 _temporary_ ;
call missing(of temp(*));
do i= 1 to dim(dx);
temp[i]= put(dx[i],$icd_pst.);
end;
PSTD = ( whichc('PTSD', of temp[*])>0 );
drop i;
datalines;
F43.2 F42.1 F15.4
F43.2 F42.1 F15.4 F43.10
F41.4 F43.2 F43.11
;
run;
The array temp holds seven temporary character elements, one per diagnosis variable, for the formatted value of each diagnosis code. Because it is declared _temporary_, no extra variables are written to the output data set. (I was too lazy to type such long variable names for the example, so I just used D1 to D7.)
CALL MISSING resets the array to blank values at the start of each record; otherwise _temporary_ arrays retain their values across records. The DO loop then populates the array with the formatted values.
The WHICHC function searches for the value of its first argument, in this case the literal 'PTSD', among the arguments that follow. The "of temp[*]" indicates that all of the elements of the array temp are to be searched. The function returns the position of the first match (often useful in itself), or 0 if there is no match. In this case the result is just compared with 0: the flag (named PSTD in the code) is set to 1 when at least one match was found and to 0 otherwise.
To search for the other codes from your other post, you would repeat this block of code
call missing(of temp(*));
do i= 1 to dim(dx);
temp[i]= put(dx[i],$icd_pst.);
end;
PSTD = ( whichc('PTSD', of temp[*])>0 );
drop i;
for each other code group, replacing 1) the format $icd_pst, 2) the variable PSTD with the other flag variable, and 3) the value 'PTSD' with the other formatted value.
If you have a largish number of these codes to search for you could build separate arrays to handle the 1,2, and 3 elements above and wrap the repeated code in a do loop that uses the size of the arrays holding those three things and replace them with array references.
An additional change would be to replace put(dx[i],$icd_pst.) with PUTC(dx[i],formatarray[j]). PUTC allows a variable to hold the name of the format, whereas a simple PUT requires the format name as literal text.
You could use temporary arrays to hold the format names and the search strings, but the array of flag variables could not be temporary, because the flags have to be written to the output data set. This is left as an exercise for the interested reader.
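For anyone who wants a starting point for that exercise, here is a minimal sketch of the PUTC approach. It is only one possible layout, not tested against your data. The second format $icd_adj, its codes, the flag variable ADJ, the array names, and the sample rows are all made up for illustration; substitute your own code groups.

```sas
proc format library=work;
  value $icd_pst 'F43.10', 'F43.11' = 'PTSD' other = 'No PTSD';
  /* hypothetical second code group, for illustration only */
  value $icd_adj 'F43.20', 'F43.21' = 'ADJ' other = 'No ADJ';
run;

data example2;
  infile datalines truncover;
  informat d1-d7 $8.;
  input d1-d7;
  array dx {*} d1-d7;
  /* format names and search strings can live in temporary arrays */
  array fmts    {2} $ 10 _temporary_ ('$icd_pst.', '$icd_adj.');
  array targets {2} $ 10 _temporary_ ('PTSD', 'ADJ');
  /* the flags are real variables so they are kept in the output */
  array flags {2} ptsd adj;
  do j = 1 to dim(flags);
    flags[j] = 0;
    do i = 1 to dim(dx);
      /* PUTC takes the format name as a character value */
      if putc(dx[i], fmts[j]) = targets[j] then flags[j] = 1;
    end;
  end;
  drop i j;
datalines;
F43.2 F42.1 F15.4
F43.10 F43.20
F41.4 F43.21 F43.11
;
run;
```

Adding another code group then means adding one format, one entry in each temporary array, and one flag variable, with no change to the loop itself.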
Awesome -- thank you.
When I run the code above, I get this output:
diagnosis_code1 diagnosis_code2 diagnosis_code3 diagnosis_code4 diagnosis_code5 diagnosis_code6 diagnosis_code7 PSTD
F43.10 F43.12 F43.13 F43.9 F43.0 F43.8 1
However, I cannot seem to count the number of times the diagnosis of PTSD was made during each visit (PTSD response for variable diagnosis_code1, diagnosis_code2, etc) for each patient in the dataset from this. Proc freq does not recognize PTSD? How would I do that?
Also, just to get clarity on exactly what $icd_pst pulls up so that I understand the logic, could you explain further?
Thank you!
If you are using this code, it only sets the value to 1, so there is no "count":
data test;
set WORK.testunder1;
array diags {7} diagnosis_code_1 - diagnosis_code_7;
do k = 1 to dim(diags) until (ptsd=1);
If Put(diags{k}, $icd_pst.) = 'PTSD' then PTSD=1;
end;
run;
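Likely not the best approach, but this counts the matches. Note that the UNTIL condition has to go, otherwise the loop stops at the first match:
data test;
set WORK.testunder1;
array diags {7} diagnosis_code_1 - diagnosis_code_7;
do k = 1 to dim(diags);
Ptsd= sum(ptsd,Put(diags{k}, $icd_pst.) = 'PTSD');
end;
run;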
Personally I would transpose the data to a long format with a diagnosis id and separate value and use a single format to count all the diagnosis code groups at one time similar to:
proc format library=work;
value $example
'A','B'='Group1'
'C','D','E' = 'Group 2'
other = 'Everything else'
;
run;

data example;
input pid vid did dv $;
label
pid='Patient'
vid='Visit number'
dv= 'Diagnosis group'
;
datalines;
1 1 1 A
1 1 2 B
1 1 3 F
1 2 1 F
1 2 2 F
1 2 3 C
1 2 4 B
2 1 1 A
2 1 2 B
2 2 3 F
2 2 1 F
2 3 2 F
2 3 3 C
2 3 4 B
;
run;

proc tabulate data=example;
class pid vid dv;
format dv $example.;
table pid*vid*dv,
n='Count'
;
run;
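To connect the long-format idea back to the wide diagnosis_code_1 - diagnosis_code_7 layout, a rough sketch of the reshaping step might look like the following. The data set WIDE, the key variables patient_id and visit_id, and the sample rows are assumptions made up for the example; your data set will have its own identifiers.

```sas
proc format library=work;
  value $icd_pst 'F43.10', 'F43.11' = 'PTSD' other = 'No PTSD';
run;

/* hypothetical wide data: one row per patient visit */
data wide;
  infile datalines truncover;
  input patient_id visit_id
        (diagnosis_code_1-diagnosis_code_7) (:$8.);
datalines;
1 1 F43.10 F41.1
1 2 F43.11 F43.10 F32.9
2 1 F41.1
;
run;

/* one output row per non-blank diagnosis code */
data long;
  set wide;
  array diags {*} diagnosis_code_1-diagnosis_code_7;
  length dv $8;
  do did = 1 to dim(diags);
    if not missing(diags[did]) then do;
      dv = diags[did];
      output;
    end;
  end;
  keep patient_id visit_id did dv;
run;

/* the format groups the raw codes when they are counted */
proc freq data=long;
  tables patient_id*visit_id*dv / list;
  format dv $icd_pst.;
run;
```

Because the format is applied only at reporting time, the same long data set can be counted against a different code group just by swapping the format in the FORMAT statement.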