A quick question
I am tying to join 2 tables using:
The first table is ICD10s:
data ICD10;
input Group:$ L1:$ L2:$ L3:$ ;
datalines;
1 F70 E16 G12
2 F71 E17 .
3 F72 E18 H4
4 F73 E19 .
5 F74 E76 .
run;
The second table has patient names and their ICD10 code
What I am trying to achieve is to get the group number from the table above based on the matched ICD codes
Say that a patient has an ICD code of E17, then I need the combined table to contain the patient's name and group "2" extracted from the table above
If the patient has ICD of G12 then the column should have the group number 1
How could this be achieved please? The problem here is that I dont know beforehand how many rows and columns the ICD table above will contain, so it needs to be a "generic" code...
Apologies if I am not making sense here!
We really have to back up here. How did you know that your INPUT statement should input L3 but shouldn't try to input L4?
If you are certain the ICD10 data set will contain only variables named GROUP, plus L1, L2, L3, ..., Ln, it's an easy change to the existing code.
First, the ARRAY statement becomes:
array icds {*} L:;
Secondly, the DO loop becomes:
do k=1 to dim(icds);
It almost makes sense to me 😉
Please post an example og´f the other table and what you want the result to look like.
Transpose your icd10 dataset, so you can easily join:
data ICD10;
input Group:$ L1:$ L2:$ L3:$ ;
datalines;
1 F70 E16 G12
2 F71 E17 .
3 F72 E18 H4
4 F73 E19 .
5 F74 E76 .
;
run;
proc transpose
data=icd10
out=icd_trans (
drop=_name_
rename=(col1=icd10)
where=(icd10 ne ' ')
)
;
by group;
var l:;
run;
proc sort data=icd_trans;
by icd10;
run;
data want;
merge
icd_trans (in=a)
patients (in=b)
;
by icd10;
run;
(assuming that patients has the required variables and is sorted accordingly)
I would recommend creating a format from your ICD10 data set. That way, you won't need to sort your patient data. In fact, patient data often contains multiple ICD10 codes per observation and wouldn't be suitable for sorting without jumping through major hoops. At any rate:
data reshape;
set icd10 end=done;
retain fmtname '$group';
array icds {3} L1-L3;
label = group;
do k=1 to 3;
if icds{k} > ' ' then do;
start = icds{k};
output;
end;
end;
if done;
label = ' ';
hlo='O';
output;
run;
You do need to know the maximum number of codes that can appear in one group, since that is hard-coded here at 3. That piece can be automated but it takes a little work to do that. This gives you a data set that PROC FORMAT can consume, creating a format named $group:
proc format cntlin=reshape;
run;
All that is left to do is apply that format:
data want;
set have;
group = put(ICD10, $group.);
run;
Thank you @Astounding
I have to say I am very impressed by the solution you suggested using proc format. I like the simplicity yet lateral thinking approach here.
Could you please elaborate on how to automate the maximun number of ICD columns (in the example i gave there were 3 columns and you created and array: array icds {3} L1-L3;) is there a way to expand this code so that it could account for any number of columns in the ICD10 table I had at the beginning of the example?
@ammarhm wrote:
Thank you @Astounding
is there a way to expand this code so that it could account for any number of columns in the ICD10 table I had at the beginning of the example?
See my proc transpose example. It takes care of an arbitrary number of columns, as long as they start with an L. It's always better to work with a long vs. a wide dataset layout.
We really have to back up here. How did you know that your INPUT statement should input L3 but shouldn't try to input L4?
If you are certain the ICD10 data set will contain only variables named GROUP, plus L1, L2, L3, ..., Ln, it's an easy change to the existing code.
First, the ARRAY statement becomes:
array icds {*} L:;
Secondly, the DO loop becomes:
do k=1 to dim(icds);
⏰
Today is the last day to save with the early bird rate! Register today for just $695 - $100 off the standard rate.
Plus, pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.