BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
eap
Obsidian | Level 7 eap
Obsidian | Level 7

I have a large dataset of patients with multiple diagnosis codes each.  I want to create a new variable 'Final_Dx' that contains the most relevant diagnosis based on an ordered preference. Thank you!


Hierarchical Dx Order
1) Cancer
2) MI
3) Diabetes
4) Obesity
5) COPD

 

CaseID

Dx1

Dx2

Dx3

FINALDX(Desired)

123

Cancer

Obesity

 

Cancer

456

COPD

Obesity

 

Obesity

789

Diabetes

MI

COPD

MI

112

Other

COPD

 

COPD

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
proc format;
invalue fmt(upcase)
CANCER=1
MI=2
DIABETES=3
OBESITY=4
COPD=5
;
run;
data have;
input caseid Dx1 $ Dx2 $ Dx3 $ ;
cards;
123 Cancer Obesity .
456 COPD Obesity .
789 Diabetes MI COPD
112 Other COPD .
;
run;

data want;
 set have;
 length FINALDX $ 40;
 min=999999;
 array x{*} Dx:;
 do i=1 to dim(x);
  n=input(x{i},?? fmt.);
  if not missing(n) and n<min then do;min=n;FINALDX=x{i};end;
 end;
 drop i n min;
run;

View solution in original post

8 REPLIES 8
ballardw
Super User

Please provide example data in the form of data step code. The stuff you posted doesn't even like to paste into an editor to right a data step.

 

Untested because of no data:

data want; 
set have;
array dx(*) dx1-dx3;
array f (5) $ 20 _temporary_ ('COPD','Obesity','Diabetes','MI','Cancer');
if cmiss(of dx(*)) < dim(dx) then do;
pos = max(whichc(dx1,of f(*)),whichc(dx2,of f(*)),whichc(dx3,of f(*)));
if pos then Finaldiagnosis = f[pos];
end;
drop pos;
run;

The WhichC, and numeric equivalent WhichN, return the number position in a list of values. The array F has the values with the most important (assuming 1 meant such) as the last position because the Whichc function returns a 0 if not found and using a Min function to attempt to return 1 wouldn't work with a lot of extra coding. The MAX function however has no problem so returns the largest value when found and if not zero can be used as the index into the temporary array holding the list desired diagnosis values. The CMISS is so the searchis only even attempted if at least one DX value is found.

mkeintz
PROC Star

In the absence of a working data step for sample data, here is untested code. 

 

data want;
  set have; 
  array _preferences {5} $8 _temporary_ ("Cancer","MI","Diabetes","Obesity","COPD");
  do i=1 to 5 until ( whichc(_preferences{i}, of dx:));
  end;
  if i<6 then finaldx=_preferences{i};
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
eap
Obsidian | Level 7 eap
Obsidian | Level 7

I figured it out. Here is the test data and syntax. 

data have;
input caseid Dx1 $ Dx2 $ Dx3 $ ;
cards;
123 Cancer Obesity .
456 COPD Obesity .
789 Diabetes MI COPD
112 Other COPD .
;
run;

data want;
set have;
array dx (3) $20 Dx1 - Dx3;
do i = 1 to dim(dx);
if dx(i)= 'Cancer' then Cancer=1;
else if Cancer=. then Cancer=0;

if dx(i)= 'Obesity' then Obesity=1;
else if Obesity=. then Obesity=0;

if dx(i) = 'COPD' then COPD=1;
else if COPD=. then COPD=0;

if dx(i) = 'Diabetes' then Diabetes=1;
else if Diabetes=. then Diabetes=0;

if dx(i) ='MI' then MI=1;
else if MI=. then MI=0;

if dx(i) ='Other' then Other=1;
else if Other=. then Other=0;
end;


if Cancer=1 then FinalDx=1;
else if MI =1 then FinalDx=2;
else if Diabetes=1 then FinalDx=3;
else if Obesity=1 then FinalDx=4;
else if COPD=1 then FinalDx=5;
else FinalDx=6;

drop i;
format FinalDx commonhier.;
Run;

ballardw
Super User

I will say "Good for you" for getting a solution.

 

I will add "I don't want to see the solution for 15 diagnosis fields and 10 in the priority list".

eap
Obsidian | Level 7 eap
Obsidian | Level 7

I agree - my solution worked but is a bit clunky. I love improving my SAS skills via this community!

Ksharp
Super User
proc format;
invalue fmt(upcase)
CANCER=1
MI=2
DIABETES=3
OBESITY=4
COPD=5
;
run;
data have;
input caseid Dx1 $ Dx2 $ Dx3 $ ;
cards;
123 Cancer Obesity .
456 COPD Obesity .
789 Diabetes MI COPD
112 Other COPD .
;
run;

data want;
 set have;
 length FINALDX $ 40;
 min=999999;
 array x{*} Dx:;
 do i=1 to dim(x);
  n=input(x{i},?? fmt.);
  if not missing(n) and n<min then do;min=n;FINALDX=x{i};end;
 end;
 drop i n min;
run;
PGStats
Opal | Level 21

If you want to extend to any number of diagnoses, here is a way to do so with SQL :

 

data have;
input caseId (Dx1 - Dx3) ($) ;
datalines;
123 Cancer Obesity .
456 COPD Obesity .
789 Diabetes MI COPD
112 Other COPD .
;

data order;
input diag $ order; 
datalines;
CANCER 1
MI 2
DIABETES 3
OBESITY 4
COPD 5
;

data haveLong;
set have;
array d $ dx:;
do IdDx = 1 to dim(d);
    if not missing (d{IdDx}) then do;
        Diag = d{IdDx};
        output;
        end;
    end;
keep caseId IdDx Diag;
run;

proc sql;
create table want as
select 
    have.*, final_Dx
from
    have left join
   (select
        caseId,
        a.Diag as final_Dx
    from 
        haveLong as a inner join
        order as b on upcase(a.diag) = b.diag
    group by caseId
    having order = min(order) ) as c
        on have.caseId = c.caseId;
quit;

PGStats_0-1649873247303.png

 

PG
eap
Obsidian | Level 7 eap
Obsidian | Level 7

Thanks PGStats! This works perfectly too but I was only able to accept one solution.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 732 views
  • 5 likes
  • 5 in conversation