Hi ,
I need to create RANK variable in the dataset shown below (for each USUBJID and AE).
The input variables are USUBJID, ASTDT, AENDT, AE, LLT.
dataset sorted by USUBJID , AE , ASTDT and AENDT.
if the AENDT and LLT of the prior AE is not the ASTDT and LLT of the next AE, then we need to create RANK as +1.
If the ASTDT and LLT of the next AE is the AENDT and LLT of the previous AE record then RANK remains the same. (that is why we have RANK= 1 for obs. no. 2, as the 06MAR2020 is the AENDT and Lipa is the LLT for obs. no. 1)
If the ASTDT and LLT of the next AE is the AENDT and LLT of the any previous AE record then RANK remains the same.
so in case of obs. no. 6. ASTDT =21MAY2020 and LLT = Lipa is the AENDT and LLT of 4th obs. so the RANK value will be the whatever value is from 4th obs. in this case obs. no . 4 RANK=3 so it will be same for obs. no. 6.
could you please help me program this?
I am also attaching the datalines for this dataset for reference.
data test;
informat astdt date9. aendt date9.;
input @1 usubjid $3.
      @5 ae $7.
      @13 llt $7.
      @21 astdt date9.
      @33 aendt date9.
      @45 Rank 1.
;
format astdt aendt date9.;
datalines;
120	enzymes	Lipa    21FEB2020   06MAR2020   1
120	enzymes	Lipa    06MAR2020   13MAR2020   1
120	enzymes	Serum   06MAR2020   13MAR2020   2
120	enzymes	Lipa    09APR2020   21MAY2020   3
120	enzymes	Serum   09APR2020   03JUN2020   4
120	enzymes	Lipa    21MAY2020   01JUL2020   3
120	enzymes	Serum   03JUN2020   01JUL2020   4
120	enzymes	Lipa    01JUL2020   28JUL2020   3
120	enzymes	Serum   01JUL2020   18NOV2020   4
120	enzymes	Lipa    28JUL2020   26AUG2020   3
120	enzymes	Lipa    26AUG2020   21OCT2020   3
120	enzymes	Lipa    18NOV2020   15DEC2020   6
120	enzymes	Amylase 18NOV2020   15DEC2020   5
120	enzymes	Serum   15DEC2020   .           7
;
run;data test;
informat usubjid $3. llt $5. ae $7. astdt date9. aendt date9.;
input usubjid $
      ae $
      llt $
      astdt 
      aendt
;
format astdt aendt date9.;
datalines;
120   enzymes  Lipa    21FEB2020   06MAR2020   
120   enzymes  Lipa    06MAR2020   13MAR2020   
120   enzymes  Serum   06MAR2020   13MAR2020   
120   enzymes  Lipa    09APR2020   21MAY2020   
120   enzymes  Serum   09APR2020   03JUN2020   
120   enzymes  Lipa    21MAY2020   01JUL2020   
120   enzymes  Serum   03JUN2020   01JUL2020   
120   enzymes  Lipa    01JUL2020   28JUL2020   
120   enzymes  Serum   01JUL2020   18NOV2020   
120   enzymes  Lipa    28JUL2020   26AUG2020   
120   enzymes  Lipa    26AUG2020   21OCT2020   
120   enzymes  Lipa    18NOV2020   15DEC2020   
120   enzymes  Amylase 18NOV2020   15DEC2020   
120   enzymes  Serum   15DEC2020   .           
;
proc print;
run;
data want;
set test;
by usubjid ae;
length rank 8 maxrank 8;
retain rank;
if _n_ = 1
then do;
  declare hash end ();
  end.definekey("aendt","llt");
  end.definedata("rank");
  end.definedone();
end;
if first.usubjid then rank = 1;
else do;
  if first.rank
  then do;
    rc = end.clear();
    maxrank = 1;
    rank = 1;
    rc = end.add();
  end;
  else do;
    if end.find(key:astdt,key:llt) ne 0
    then do;
      maxrank + 1;
      rank = maxrank;
      rc = end.add();
    end;
    else do;
      if end.check() ne 0
      then rc = end.add();
    end;
  end;
end;
drop rc maxrank;
proc print;
run;
Adapting @Kurt_Bremser 's code. shouldn't the last 3 rows be 5,6,7 for rank and not 6,5,7?
you could possibly try the lag function to store the previous end date and previous LLT in new variables.
Looks like you can use my hash code from your other thread, with only usubjid and ae as group, and llt as an additional key in the hash. Give it a try.
data test;
informat usubjid $3. llt $5. ae $7. astdt date9. aendt date9.;
input usubjid $
      ae $
      llt $
      astdt 
      aendt
;
format astdt aendt date9.;
datalines;
120   enzymes  Lipa    21FEB2020   06MAR2020   
120   enzymes  Lipa    06MAR2020   13MAR2020   
120   enzymes  Serum   06MAR2020   13MAR2020   
120   enzymes  Lipa    09APR2020   21MAY2020   
120   enzymes  Serum   09APR2020   03JUN2020   
120   enzymes  Lipa    21MAY2020   01JUL2020   
120   enzymes  Serum   03JUN2020   01JUL2020   
120   enzymes  Lipa    01JUL2020   28JUL2020   
120   enzymes  Serum   01JUL2020   18NOV2020   
120   enzymes  Lipa    28JUL2020   26AUG2020   
120   enzymes  Lipa    26AUG2020   21OCT2020   
120   enzymes  Lipa    18NOV2020   15DEC2020   
120   enzymes  Amylase 18NOV2020   15DEC2020   
120   enzymes  Serum   15DEC2020   .           
;
proc print;
run;
data want;
set test;
by usubjid ae;
length rank 8 maxrank 8;
retain rank;
if _n_ = 1
then do;
  declare hash end ();
  end.definekey("aendt","llt");
  end.definedata("rank");
  end.definedone();
end;
if first.usubjid then rank = 1;
else do;
  if first.rank
  then do;
    rc = end.clear();
    maxrank = 1;
    rank = 1;
    rc = end.add();
  end;
  else do;
    if end.find(key:astdt,key:llt) ne 0
    then do;
      maxrank + 1;
      rank = maxrank;
      rc = end.add();
    end;
    else do;
      if end.check() ne 0
      then rc = end.add();
    end;
  end;
end;
drop rc maxrank;
proc print;
run;
Adapting @Kurt_Bremser 's code. shouldn't the last 3 rows be 5,6,7 for rank and not 6,5,7?
Hi @tarheel13 ,
Yes, last 3 row should be as 5,6,7. I just need to sort it by LLT.
Thank you and @Kurt_Bremser for the solution.
Abhinay
Hi @tarheel13
I am getting NOTE in sas log as....
NOTE: Variable first.rank is uninitialized.
do you suggest anything to avoid this as I don't want any uninitialized NOTE in my log.
do you have solution for below scenario...
My supervisor has another bizarre request with RANK value in this data
This is how we develop RANK value up till now. but my client need highlighted value for obs. 7 and 8 as 5 and not 1
like this...
Reason: He says...while deriving RANK value for obs. 7, he wanted to check latest of LLT = urea value, which is obs. no. 4 and AENDT of obs. no. 4 is not equal to ASTDT of obs. no. 7, so we need to keep adding RANK +1.
currently while deriving RANK value for obs. 7, we are considering obs. no. 3.
can we modify our existing code for this requirement.
sorry to trouble you with this again.
I am also attaching the datalines for this.
data test;
 
input @1 usubjid $3.
      @5 ae $8.
      @14 llt $8.
      @23 astdt date9.
      @33 aendt date9.
;
format astdt aendt date9.;
datalines;
410	amino    alan     19JUL2022	08AUG2022
410	amino    aspar    01AUG2022	08AUG2022
410	Toxicity urea     26JUL2022	18AUG2022
410	Toxicity urea     02AUG2022	15AUG2022
410	Toxicity prot     08AUG2022	21SEP2022
410	Toxicity creat    11AUG2022	22AUG2022
410	Toxicity urea     18AUG2022	18AUG2022
410	Toxicity urea     18AUG2022	02SEP2022
410	Toxicity minuria  29AUG2022	21SEP2022
410	Toxicity urea dec 15SEP2022	21SEP2022
720	Toxicity estim    27DEC2022	17JAN2023
720	Toxicity urea     03JAN2023	17JAN2023
720	Toxicity creat    10JAN2023	24JAN2023
720	Toxicity estim    17JAN2023	28FEB2023
720	Toxicity urea     31JAN2023	14FEB2023
720	Toxicity creat    31JAN2023	14FEB2023
720	Toxicity urea     21FEB2023	28FEB2023
720	Toxicity estim    28FEB2023	21MAR2023 
;
 
run;try this. no uninitialized in log now.
data want;
   set test;
   by usubjid ae;
   length rank 8 maxrank 8;
   retain rank;
   if _n_=1 then do;
      declare hash end ();
      end.definekey("aendt", "llt");
      end.definedata("rank");
      end.definedone();
   end;
   if first.usubjid then do;
      rc=end.clear();
      maxrank=1;
      rank=1;
      rc=end.add();
   end;
   else do;
      if end.find(key:astdt, key:llt) ne 0 then
         do;
            maxrank + 1;
            rank=maxrank;
            rc=end.add();
         end;
      else do;
            if end.check() ne 0 then
               rc=end.add();
         end;
   end;
drop rc maxrank;
proc print;
run;Thanks @tarheel13
But this code not working if AE changes for same USUBJID.
could you please suggest.
I have added the 2 more obs. in the data.
RANK value for last 2 obs. should be 1 and 2 as AE changes.
data test;
informat astdt date9. aendt date9.;
input @1 usubjid $3.
      @5 ae $7.
      @13 llt $7.
      @21 astdt date9.
      @33 aendt date9.
;
format astdt aendt date9.;
datalines;
120	enzymes	Lipa    21FEB2020   06MAR2020
120	enzymes	Lipa    06MAR2020   13MAR2020
120	enzymes	Serum   06MAR2020   13MAR2020
120	enzymes	Lipa    09APR2020   21MAY2020
120	enzymes	Serum   09APR2020   03JUN2020
120	enzymes	Lipa    21MAY2020   01JUL2020
120	enzymes	Serum   03JUN2020   01JUL2020
120	enzymes	Lipa    01JUL2020   28JUL2020
120	enzymes	Serum   01JUL2020   18NOV2020
120	enzymes	Lipa    28JUL2020   26AUG2020
120	enzymes	Lipa    26AUG2020   21OCT2020
120	enzymes	Lipa    18NOV2020   15DEC2020
120	enzymes	Amylase 18NOV2020   15DEC2020
120	enzymes	Serum   15DEC2020   .    
120 trans   vax     09APR2020   10APR2020
120 trans   vax     12APR2020   13APR2020
;
run;it seems you want it within ae too, so then just change to if first.ae as such.
data want;
   set test;
   by usubjid ae;
   length rank 8 maxrank 8;
   retain rank;
   if _n_=1 then do;
      declare hash end ();
      end.definekey("aendt", "llt");
      end.definedata("rank");
      end.definedone();
   end;
   if first.ae then do;
      rc=end.clear();
      maxrank=1;
      rank=1;
      rc=end.add();
   end;
   else do;
      if end.find(key:astdt, key:llt) ne 0 then
         do;
            maxrank + 1;
            rank=maxrank;
            rc=end.add();
         end;
      else do;
            if end.check() ne 0 then
               rc=end.add();
         end;
   end;
drop rc maxrank;
proc print;
run;If you allow data to be sorted by usubjid, llt, astdt. it would be much easier.
proc sort data=test;
  by usubjid ae llt astdt;
run;
data want;
  set test;
  by usubjid ae llt astdt;
  lag_aendt=lag(aendt);
  if first.usubjid then new_rank=1;
  else do;
    if first.llt then new_rank+1;
    else if astdt^=lag_aendt then new_rank+1;
  end;
run;
this does not match the desired output OP posted though
it seems like you are changing the requirements. you were only considering LLT and AE start and end dates previously. so what is the requirement now?
Hi @tarheel13
I am very sorry for any confusion.
I have mention in the first thread as....
I need to create RANK variable in the dataset shown below (for each USUBJID and AE).
So the RANK variable start from 1 if either USUBJID or AE changes.
again sorry for the confusion.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
