I'm am trying to categorize multiple diagnostic fields (diag1-diag12) across thousands of individuals (id, by row) according to 3 categories (A=always, S=Sometimes, N=Never). The final dataset should have a diagnostic category (dcat1-12) corresponding to each diagnostic field (diag1-12).
I've tried to adapt test scoring code found in "Get it together: Combining data with SAS MERGE, UPDATE, and SET" by Mel Widawski with some progress, but not the end result that I'm needing.
Here's an abbreviated example of the data I'm working with:
Have 1: Main Data
data have1 (label='Main');
input id:1. diag1: $7. diag2:$7. diag3:$7. diag4:$7.;
datalines;
1 S61131S S320 S32000B S32000B
2 S320 S320 S320 S320
3 S32000B S9702 S320 S61131S
4 S9702 . . .
;
run;
Have 2: Scoring Data
data have2 (label='scoring');
input icd:$7. category:$1.;
datalines;
S61131S S
S320 N
S32000B A
S9702 N
;
run;
Want: Scored Data
data want (label='scored');
input id:1. diag1: $7. diag2:$7. diag3:$7. diag4:$7. dcat1:$1. dcat2:$1. dcat3:$1. dcat4:$1.;
datalines;
1 S61131S S320 S32000B S32000B S N A A
2 S320 S320 S320 S320 N N N N
3 S32000B S9702 S320 S61131S A N N S
4 S9702 . . . N . . .
;
run;
I'd appreciate any guidance!
1. Create a format from Have 2
data diag_fmt;
set have2;
fmtname = 'diag_fmt';
type='C';
start = icd;
label=category;
run;
proc format cntlin=diag_fmt;
run;
2. Apply the format to Have 1.
data want;
set have1;
array diag(*) diag1-diag4;
array dcat(*) $1. dcat1-dcat4;
do i=1 to dim(diag);
dcat(i) = put(diag1, $diag_fmt.);
end;
run;
I did not test this code, but it should work with some minor fixes necessary if the log generates any errors.
1. Create a format from Have 2
data diag_fmt;
set have2;
fmtname = 'diag_fmt';
type='C';
start = icd;
label=category;
run;
proc format cntlin=diag_fmt;
run;
2. Apply the format to Have 1.
data want;
set have1;
array diag(*) diag1-diag4;
array dcat(*) $1. dcat1-dcat4;
do i=1 to dim(diag);
dcat(i) = put(diag1, $diag_fmt.);
end;
run;
I did not test this code, but it should work with some minor fixes necessary if the log generates any errors.
Some references to understand the code:
Formats:
https://support.sas.com/resources/papers/proceedings/proceedings/sugi30/001-30.pdf
Here's a tutorial on using Arrays in SAS
https://stats.idre.ucla.edu/sas/seminars/sas-arrays/
You have2 data set is very close to what would be needed to create a custom format assuming none of the ICD values are repeated.
Example of modifying the data set as needed. Variable names expected by proc format must be used.
data have2 ; input start :$7. label :$1.; fmtname='Dcat'; type='C'; datalines; S61131S S S320 N S32000B A S9702 N ; proc format cntlin=have2; run;
Then you can use put(icdcodevalue, dcat.) to create the code.
Arrays make this easier for doing the same thing to multiple variables.
data want (label='Main'); set have1; array dcat (4) $ 1; array diag diag1-diag4; do i=1 to dim(diag); dcat[i]= put(diag[i],dcat.); end; drop i; run;
Note use of the code box opened with the forum's </> icon. The message windows will reformat text and may make code pasted into the windows not run because of changed white space or sometimes inserted html tags and such.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.