Solved: Re: Select value from a list based on ordered preference

eap · Posted 04-12-2022 11:02 AM

I have a large dataset of patients with multiple diagnosis codes each. I want to create a new variable 'Final_Dx' that contains the most relevant diagnosis based on an ordered preference. Thank you!

Hierarchical Dx Order
1) Cancer
2) MI
3) Diabetes
4) Obesity
5) COPD

CaseID	Dx1	Dx2	Dx3	FINALDX(Desired)
123	Cancer	Obesity		Cancer
456	COPD	Obesity		Obesity
789	Diabetes	MI	COPD	MI
112	Other	COPD		COPD

Ksharp · Posted 04-13-2022 07:50 AM

proc format;
invalue fmt(upcase)
CANCER=1
MI=2
DIABETES=3
OBESITY=4
COPD=5
;
run;
data have;
input caseid Dx1 $ Dx2 $ Dx3 $ ;
cards;
123 Cancer Obesity .
456 COPD Obesity .
789 Diabetes MI COPD
112 Other COPD .
;
run;

data want;
 set have;
 length FINALDX $ 40;
 min=999999;
 array x{*} Dx:;
 do i=1 to dim(x);
  n=input(x{i},?? fmt.);
  if not missing(n) and n<min then do;min=n;FINALDX=x{i};end;
 end;
 drop i n min;
run;

View solution in original post

ballardw · Posted 04-12-2022 11:38 AM

Please provide example data in the form of data step code. The stuff you posted doesn't even like to paste into an editor to right a data step.

Untested because of no data:

data want; 
   set have;
   array dx(*) dx1-dx3;
   array f (5) $ 20 _temporary_ ('COPD','Obesity','Diabetes','MI','Cancer');
   if cmiss(of dx(*)) < dim(dx) then do;
     pos = max(whichc(dx1,of f(*)),whichc(dx2,of f(*)),whichc(dx3,of f(*)));
     if pos then Finaldiagnosis = f[pos];
  end;
  drop pos;
run;

The WhichC, and numeric equivalent WhichN, return the number position in a list of values. The array F has the values with the most important (assuming 1 meant such) as the last position because the Whichc function returns a 0 if not found and using a Min function to attempt to return 1 wouldn't work with a lot of extra coding. The MAX function however has no problem so returns the largest value when found and if not zero can be used as the index into the temporary array holding the list desired diagnosis values. The CMISS is so the searchis only even attempted if at least one DX value is found.

mkeintz · Posted 04-12-2022 11:45 AM

In the absence of a working data step for sample data, here is untested code.

data want;
  set have; 
  array _preferences {5} $8 _temporary_ ("Cancer","MI","Diabetes","Obesity","COPD");
  do i=1 to 5 until ( whichc(_preferences{i}, of dx:));
  end;
  if i<6 then finaldx=_preferences{i};
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

eap · Posted 04-12-2022 03:45 PM

I figured it out. Here is the test data and syntax.

data have;
input caseid Dx1 $ Dx2 $ Dx3 $ ;
cards;
123 Cancer Obesity .
456 COPD Obesity .
789 Diabetes MI COPD
112 Other COPD .
;
run;

data want;
set have;
array dx (3) $20 Dx1 - Dx3;
do i = 1 to dim(dx);
if dx(i)= 'Cancer' then Cancer=1;
else if Cancer=. then Cancer=0;

if dx(i)= 'Obesity' then Obesity=1;
else if Obesity=. then Obesity=0;

if dx(i) = 'COPD' then COPD=1;
else if COPD=. then COPD=0;

if dx(i) = 'Diabetes' then Diabetes=1;
else if Diabetes=. then Diabetes=0;

if dx(i) ='MI' then MI=1;
else if MI=. then MI=0;

if dx(i) ='Other' then Other=1;
else if Other=. then Other=0;
end;

if Cancer=1 then FinalDx=1;
else if MI =1 then FinalDx=2;
else if Diabetes=1 then FinalDx=3;
else if Obesity=1 then FinalDx=4;
else if COPD=1 then FinalDx=5;
else FinalDx=6;

drop i;
format FinalDx commonhier.;
Run;

ballardw · Posted 04-12-2022 05:39 PM

I will say "Good for you" for getting a solution.

I will add "I don't want to see the solution for 15 diagnosis fields and 10 in the priority list".

eap · Posted 04-13-2022 06:34 PM

I agree - my solution worked but is a bit clunky. I love improving my SAS skills via this community!

Ksharp · Posted 04-13-2022 07:50 AM

proc format;
invalue fmt(upcase)
CANCER=1
MI=2
DIABETES=3
OBESITY=4
COPD=5
;
run;
data have;
input caseid Dx1 $ Dx2 $ Dx3 $ ;
cards;
123 Cancer Obesity .
456 COPD Obesity .
789 Diabetes MI COPD
112 Other COPD .
;
run;

data want;
 set have;
 length FINALDX $ 40;
 min=999999;
 array x{*} Dx:;
 do i=1 to dim(x);
  n=input(x{i},?? fmt.);
  if not missing(n) and n<min then do;min=n;FINALDX=x{i};end;
 end;
 drop i n min;
run;

PGStats · Posted 04-13-2022 02:07 PM

If you want to extend to any number of diagnoses, here is a way to do so with SQL :

data have;
input caseId (Dx1 - Dx3) ($) ;
datalines;
123 Cancer Obesity .
456 COPD Obesity .
789 Diabetes MI COPD
112 Other COPD .
;

data order;
input diag $ order; 
datalines;
CANCER 1
MI 2
DIABETES 3
OBESITY 4
COPD 5
;

data haveLong;
set have;
array d $ dx:;
do IdDx = 1 to dim(d);
    if not missing (d{IdDx}) then do;
        Diag = d{IdDx};
        output;
        end;
    end;
keep caseId IdDx Diag;
run;

proc sql;
create table want as
select 
    have.*, final_Dx
from
    have left join
   (select
        caseId,
        a.Diag as final_Dx
    from 
        haveLong as a inner join
        order as b on upcase(a.diag) = b.diag
    group by caseId
    having order = min(order) ) as c
        on have.caseId = c.caseId;
quit;

PG

eap · Posted 04-13-2022 06:32 PM

Thanks PGStats! This works perfectly too but I was only able to accept one solution.

Registration is open

SAS Training: Just a Click Away