Hi ,
I need to create RANK variable in the dataset shown below (for each USUBJID and AE).
The input variables are USUBJID, ASTDT, AENDT, AE, LLT.
dataset sorted by USUBJID , AE , ASTDT and AENDT.
if the AENDT and LLT of the prior AE is not the ASTDT and LLT of the next AE, then we need to create RANK as +1.
If the ASTDT and LLT of the next AE is the AENDT and LLT of the previous AE record then RANK remains the same. (that is why we have RANK= 1 for obs. no. 2, as the 06MAR2020 is the AENDT and Lipa is the LLT for obs. no. 1)
If the ASTDT and LLT of the next AE is the AENDT and LLT of the any previous AE record then RANK remains the same.
so in case of obs. no. 6. ASTDT =21MAY2020 and LLT = Lipa is the AENDT and LLT of 4th obs. so the RANK value will be the whatever value is from 4th obs. in this case obs. no . 4 RANK=3 so it will be same for obs. no. 6.
could you please help me program this?
I am also attaching the datalines for this dataset for reference.
data test;
informat astdt date9. aendt date9.;
input @1 usubjid $3.
@5 ae $7.
@13 llt $7.
@21 astdt date9.
@33 aendt date9.
@45 Rank 1.
;
format astdt aendt date9.;
datalines;
120 enzymes Lipa 21FEB2020 06MAR2020 1
120 enzymes Lipa 06MAR2020 13MAR2020 1
120 enzymes Serum 06MAR2020 13MAR2020 2
120 enzymes Lipa 09APR2020 21MAY2020 3
120 enzymes Serum 09APR2020 03JUN2020 4
120 enzymes Lipa 21MAY2020 01JUL2020 3
120 enzymes Serum 03JUN2020 01JUL2020 4
120 enzymes Lipa 01JUL2020 28JUL2020 3
120 enzymes Serum 01JUL2020 18NOV2020 4
120 enzymes Lipa 28JUL2020 26AUG2020 3
120 enzymes Lipa 26AUG2020 21OCT2020 3
120 enzymes Lipa 18NOV2020 15DEC2020 6
120 enzymes Amylase 18NOV2020 15DEC2020 5
120 enzymes Serum 15DEC2020 . 7
;
run;
data test; informat usubjid $3. llt $5. ae $7. astdt date9. aendt date9.; input usubjid $ ae $ llt $ astdt aendt ; format astdt aendt date9.; datalines; 120 enzymes Lipa 21FEB2020 06MAR2020 120 enzymes Lipa 06MAR2020 13MAR2020 120 enzymes Serum 06MAR2020 13MAR2020 120 enzymes Lipa 09APR2020 21MAY2020 120 enzymes Serum 09APR2020 03JUN2020 120 enzymes Lipa 21MAY2020 01JUL2020 120 enzymes Serum 03JUN2020 01JUL2020 120 enzymes Lipa 01JUL2020 28JUL2020 120 enzymes Serum 01JUL2020 18NOV2020 120 enzymes Lipa 28JUL2020 26AUG2020 120 enzymes Lipa 26AUG2020 21OCT2020 120 enzymes Lipa 18NOV2020 15DEC2020 120 enzymes Amylase 18NOV2020 15DEC2020 120 enzymes Serum 15DEC2020 . ; proc print; run; data want; set test; by usubjid ae; length rank 8 maxrank 8; retain rank; if _n_ = 1 then do; declare hash end (); end.definekey("aendt","llt"); end.definedata("rank"); end.definedone(); end; if first.usubjid then rank = 1; else do; if first.rank then do; rc = end.clear(); maxrank = 1; rank = 1; rc = end.add(); end; else do; if end.find(key:astdt,key:llt) ne 0 then do; maxrank + 1; rank = maxrank; rc = end.add(); end; else do; if end.check() ne 0 then rc = end.add(); end; end; end; drop rc maxrank; proc print; run;
Adapting @Kurt_Bremser 's code. shouldn't the last 3 rows be 5,6,7 for rank and not 6,5,7?
you could possibly try the lag function to store the previous end date and previous LLT in new variables.
Looks like you can use my hash code from your other thread, with only usubjid and ae as group, and llt as an additional key in the hash. Give it a try.
data test; informat usubjid $3. llt $5. ae $7. astdt date9. aendt date9.; input usubjid $ ae $ llt $ astdt aendt ; format astdt aendt date9.; datalines; 120 enzymes Lipa 21FEB2020 06MAR2020 120 enzymes Lipa 06MAR2020 13MAR2020 120 enzymes Serum 06MAR2020 13MAR2020 120 enzymes Lipa 09APR2020 21MAY2020 120 enzymes Serum 09APR2020 03JUN2020 120 enzymes Lipa 21MAY2020 01JUL2020 120 enzymes Serum 03JUN2020 01JUL2020 120 enzymes Lipa 01JUL2020 28JUL2020 120 enzymes Serum 01JUL2020 18NOV2020 120 enzymes Lipa 28JUL2020 26AUG2020 120 enzymes Lipa 26AUG2020 21OCT2020 120 enzymes Lipa 18NOV2020 15DEC2020 120 enzymes Amylase 18NOV2020 15DEC2020 120 enzymes Serum 15DEC2020 . ; proc print; run; data want; set test; by usubjid ae; length rank 8 maxrank 8; retain rank; if _n_ = 1 then do; declare hash end (); end.definekey("aendt","llt"); end.definedata("rank"); end.definedone(); end; if first.usubjid then rank = 1; else do; if first.rank then do; rc = end.clear(); maxrank = 1; rank = 1; rc = end.add(); end; else do; if end.find(key:astdt,key:llt) ne 0 then do; maxrank + 1; rank = maxrank; rc = end.add(); end; else do; if end.check() ne 0 then rc = end.add(); end; end; end; drop rc maxrank; proc print; run;
Adapting @Kurt_Bremser 's code. shouldn't the last 3 rows be 5,6,7 for rank and not 6,5,7?
Hi @tarheel13 ,
Yes, last 3 row should be as 5,6,7. I just need to sort it by LLT.
Thank you and @Kurt_Bremser for the solution.
Abhinay
Hi @tarheel13
I am getting NOTE in sas log as....
NOTE: Variable first.rank is uninitialized.
do you suggest anything to avoid this as I don't want any uninitialized NOTE in my log.
do you have solution for below scenario...
My supervisor has another bizarre request with RANK value in this data
This is how we develop RANK value up till now. but my client need highlighted value for obs. 7 and 8 as 5 and not 1
like this...
Reason: He says...while deriving RANK value for obs. 7, he wanted to check latest of LLT = urea value, which is obs. no. 4 and AENDT of obs. no. 4 is not equal to ASTDT of obs. no. 7, so we need to keep adding RANK +1.
currently while deriving RANK value for obs. 7, we are considering obs. no. 3.
can we modify our existing code for this requirement.
sorry to trouble you with this again.
I am also attaching the datalines for this.
data test;
input @1 usubjid $3.
@5 ae $8.
@14 llt $8.
@23 astdt date9.
@33 aendt date9.
;
format astdt aendt date9.;
datalines;
410 amino alan 19JUL2022 08AUG2022
410 amino aspar 01AUG2022 08AUG2022
410 Toxicity urea 26JUL2022 18AUG2022
410 Toxicity urea 02AUG2022 15AUG2022
410 Toxicity prot 08AUG2022 21SEP2022
410 Toxicity creat 11AUG2022 22AUG2022
410 Toxicity urea 18AUG2022 18AUG2022
410 Toxicity urea 18AUG2022 02SEP2022
410 Toxicity minuria 29AUG2022 21SEP2022
410 Toxicity urea dec 15SEP2022 21SEP2022
720 Toxicity estim 27DEC2022 17JAN2023
720 Toxicity urea 03JAN2023 17JAN2023
720 Toxicity creat 10JAN2023 24JAN2023
720 Toxicity estim 17JAN2023 28FEB2023
720 Toxicity urea 31JAN2023 14FEB2023
720 Toxicity creat 31JAN2023 14FEB2023
720 Toxicity urea 21FEB2023 28FEB2023
720 Toxicity estim 28FEB2023 21MAR2023
;
run;
try this. no uninitialized in log now.
data want;
set test;
by usubjid ae;
length rank 8 maxrank 8;
retain rank;
if _n_=1 then do;
declare hash end ();
end.definekey("aendt", "llt");
end.definedata("rank");
end.definedone();
end;
if first.usubjid then do;
rc=end.clear();
maxrank=1;
rank=1;
rc=end.add();
end;
else do;
if end.find(key:astdt, key:llt) ne 0 then
do;
maxrank + 1;
rank=maxrank;
rc=end.add();
end;
else do;
if end.check() ne 0 then
rc=end.add();
end;
end;
drop rc maxrank;
proc print;
run;
Thanks @tarheel13
But this code not working if AE changes for same USUBJID.
could you please suggest.
I have added the 2 more obs. in the data.
RANK value for last 2 obs. should be 1 and 2 as AE changes.
data test;
informat astdt date9. aendt date9.;
input @1 usubjid $3.
@5 ae $7.
@13 llt $7.
@21 astdt date9.
@33 aendt date9.
;
format astdt aendt date9.;
datalines;
120 enzymes Lipa 21FEB2020 06MAR2020
120 enzymes Lipa 06MAR2020 13MAR2020
120 enzymes Serum 06MAR2020 13MAR2020
120 enzymes Lipa 09APR2020 21MAY2020
120 enzymes Serum 09APR2020 03JUN2020
120 enzymes Lipa 21MAY2020 01JUL2020
120 enzymes Serum 03JUN2020 01JUL2020
120 enzymes Lipa 01JUL2020 28JUL2020
120 enzymes Serum 01JUL2020 18NOV2020
120 enzymes Lipa 28JUL2020 26AUG2020
120 enzymes Lipa 26AUG2020 21OCT2020
120 enzymes Lipa 18NOV2020 15DEC2020
120 enzymes Amylase 18NOV2020 15DEC2020
120 enzymes Serum 15DEC2020 .
120 trans vax 09APR2020 10APR2020
120 trans vax 12APR2020 13APR2020
;
run;
it seems you want it within ae too, so then just change to if first.ae as such.
data want;
set test;
by usubjid ae;
length rank 8 maxrank 8;
retain rank;
if _n_=1 then do;
declare hash end ();
end.definekey("aendt", "llt");
end.definedata("rank");
end.definedone();
end;
if first.ae then do;
rc=end.clear();
maxrank=1;
rank=1;
rc=end.add();
end;
else do;
if end.find(key:astdt, key:llt) ne 0 then
do;
maxrank + 1;
rank=maxrank;
rc=end.add();
end;
else do;
if end.check() ne 0 then
rc=end.add();
end;
end;
drop rc maxrank;
proc print;
run;
If you allow data to be sorted by usubjid, llt, astdt. it would be much easier.
proc sort data=test;
by usubjid ae llt astdt;
run;
data want;
set test;
by usubjid ae llt astdt;
lag_aendt=lag(aendt);
if first.usubjid then new_rank=1;
else do;
if first.llt then new_rank+1;
else if astdt^=lag_aendt then new_rank+1;
end;
run;
this does not match the desired output OP posted though
it seems like you are changing the requirements. you were only considering LLT and AE start and end dates previously. so what is the requirement now?
Hi @tarheel13
I am very sorry for any confusion.
I have mention in the first thread as....
I need to create RANK variable in the dataset shown below (for each USUBJID and AE).
So the RANK variable start from 1 if either USUBJID or AE changes.
again sorry for the confusion.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.