Hi
I need to create AJOINID variable as show in the xlsx file (test.xlsx), for each usubjid , aellt
if there is unique USUBJID and AELLT then AJOINID + 1
if there are multiple AELLT for unique USUBJID then AJOINID will repeat if AEENDTC same as AESTDTC for this
and ASTDT will be retain with the first observation
the final data should be sorted by USUBJID ASTDT AENDT AELLT and AJOINID will be derive accordingly.
could you please give me the sas code for this
By now you should know better than to attach a spreadsheet file. ALWAYS post example data as a working DATA step in a code box.
If you need Excel help, there are Microsoft-oriented forums for this.
Hi Kurt,
apology.
here I am attaching the code.
could you please help me with the solution
data test;
infile datalines dsd dlm=',';
length USUBJID $20 AELLT $50 AESTDTC $10 AEENDTC $19 ;
input USUBJID $ AJOINID AELLT $ AESTDTC $ ASTDT date9. AEENDTC $ AENDT date9.;
format astdt aendt date9.;
datalines;
101,1,BBL,2024-12-23,23DEC24,2025-01-07,07JAN25
101,2,Hyperbilirubinemia,2025-01-14,14JAN25,2025-01-29,29JAN25
101,3,API,2025-01-22,22JAN25,2025-02-06,06FEB25
101,4,Anme,2025-01-22,22JAN25,2025-01-26,26JAN25
101,5,Hypoalbuminemia,2025-01-22,22JAN25,2025-01-30,30JAN25
101,6,Leuk,2025-01-22,22JAN25,2025-01-26,26JAN25
101,6,Leuk,2025-01-26,22JAN25,2025-01-26,26JAN25
101,7,Lymphocytopenia,2025-01-22,22JAN25,2025-01-26,26JAN25
101,8,Neut,2025-01-22,22JAN25,2025-01-22,22JAN25
101,9,Thromb,2025-01-22,22JAN25,2025-01-25,30JAN25
101,9,Thromb,2025-01-25,22JAN25,2025-01-26,30JAN25
101,9,Thromb,2025-01-26,22JAN25,2025-01-27,30JAN25
101,9,Thromb,2025-01-27,22JAN25,2025-01-29,30JAN25
101,9,Thromb,2025-01-29,22JAN25,2025-01-30,30JAN25
101,9,Thromb,2025-01-30,22JAN25,2025-01-30,30JAN25
101,10,Febrile Neut,2025-01-23,23JAN25,2025-01-24,24JAN25
101,11,Neut,2025-01-25,25JAN25,2025-01-26,26JAN25
101,12,Pancytopenia,2025-01-25,25JAN25,2025-02-03,03FEB25
101,13,Anme,2025-01-27,27JAN25,2025-01-28,28JAN25
101,14,Leuk,2025-01-27,27JAN25,2025-01-27,27JAN25
101,15,Lymphocytopenia,2025-01-27,27JAN25,2025-01-27,27JAN25
101,16,Neut,2025-01-27,27JAN25,2025-01-28,28JAN25
101,17,Leuk,2025-01-28,28JAN25,2025-02-06,06FEB25
101,18,Anme,2025-01-29,29JAN25,2025-01-29,29JAN25
101,19,Neut,2025-01-29,29JAN25,2025-02-06,06FEB25
101,20,Anme,2025-01-30,30JAN25,2025-01-30,30JAN25
101,21,Anme,2025-01-31,31JAN25,2025-03-05,05MAR25
101,22,Leuk,2025-02-07,07FEB25,2025-02-26,26FEB25
101,23,Neut,2025-02-07,07FEB25,2025-02-13,13FEB25
101,24,Neut,2025-02-14,14FEB25,2025-02-27,27FEB25
101,25,Arthralgia,2025-02-18,18FEB25,2025-05-10,10MAY25
101,26,Asthenia,2025-02-18,18FEB25,2025-05-10,10MAY25
101,27,API,2025-02-28,28FEB25,2025-03-13,13MAR25
101,28,Leuk,2025-02-28,28FEB25,2025-03-05,05MAR25
101,29,Neut,2025-02-28,28FEB25,2025-03-05,05MAR25
101,30,Thromb,2025-02-28,28FEB25,2025-03-13,13MAR25
101,31,Anme,2025-03-06,06MAR25,2025-03-13,13MAR25
101,32,Leuk,2025-03-06,06MAR25,2025-05-15,15MAY25
101,33,Neut,2025-03-06,06MAR25,2025-03-13,13MAR25
101,34,Anme,2025-03-14,14MAR25,2025-03-27,27MAR25
101,35,Neut,2025-03-14,14MAR25,2025-04-25,25APR25
101,36,API,2025-03-28,28MAR25,2025-04-10,10APR25
101,37,Anme,2025-03-28,28MAR25,2025-04-10,10APR25
101,38,Thromb,2025-03-28,28MAR25,2025-04-10,10APR25
101,39,Anme,2025-04-11,11APR25,2025-04-25,25APR25
101,40,Anme,2025-04-26,26APR25,2025-05-15,15MAY25
101,41,Neut,2025-04-26,26APR25,2025-05-15,15MAY25
101,42,Thromb,2025-04-26,26APR25,2025-05-08,08MAY25
101,43,Anme,2025-05-16,16MAY25, ,.
101,44,Leuk,2025-05-16,16MAY25, ,.
102,1,Vom,2025-03-26,26MAR25,2025-04-15,15APR25
102,2,Vomiting,2025-04-01,01APR25,2025-04-01,01APR25
102,3,Vom,2025-04-02,02APR25,2025-04-04,04APR25
102,4,Asthenia,2025-04-10,10APR25, ,.
;
run;
Your example dataset you posted later already has AJOINID on it.
Do you want to make a new variable? What do you want to name it? If you want it to be named AJOINID then what do you want to do with the original variable?
please consider other data as input except AJOIND variable, I wanted to present (newly create) AJOINID as given then code(dataset)
What is the actual goal here? Is it to make this counter variable? Or is that just some intermediate step towards your true objective?
I can recreate your variable.
data want;
set test(rename=(AJOINID=EXPECT));
by USUBJID AELLT notsorted;
if first.usubjid then AJOINID=0;
if first.AELLT or (astdt > lag(aendt) > .Z ) then AJOINID+1;
run;
proc compare;
var expect;
with ajoinid;
run;
But somehow I doubt that is actually what you want to do. It will have trouble with real data. If you have two or more AELLT term values that have overlapping time periods how can you put them into an order that would make that code notice the overlapping time periods for one or more of the individual AELLT terms?
And also why did you NOT consider these two records as overlapping?
Regarding Overlapping, we need to consider Overlapping AE if pervious AENDT is equal to ASTDT
Shouldn't you sort by LLT and then start/stop dates if the goal is to collapse multiple records for the same LLT with overlapping date ranges?
You can always later re-sort the data by start date for your presentation.
Ok, I got this one but at one point I am stuck
I am using below code to get AJOINID values.
data test;
SET test;
by usubjid astdt aellt;
if first.aellt=1 then ajoinid+1;
if first.usubjid=1 then ajoinid=1;
run;
but as you see, at highlighted part, I am getting wrong values.
I only want to repeat number when for same USUBJID , same AELLT if previous AENDT same as current ASTDTC
on the observation no. 19 then previous AENDT is missing and current ASTDTC is 27MAR2025, so I want to continue with AJOINID = 10 and 11 for last observation.
Please help me with this
I am attaching the code for input dataset .
data test;
infile datalines dsd dlm=',' truncover;
length USUBJID $20 AELLT $50 ASTDT $9 AENDT $9 ASTDTC $9;
input USUBJID $ AELLT $ ASTDT $ AENDT $ AJOINID ASTDTC $;
datalines;
101,Fat,17APR2025,27APR2025,1,17APR2025
101,Fat,17APR2025,.,1,27APR2025
101,Nausea,17APR2025,29APR2025,2,17APR2025
101,Anorexia,23APR2025,29APR2025,3,23APR2025
101,Arthralgia,27APR2025,14MAY2025,4,27APR2025
101,Arthralgia,27APR2025,15MAY2025,4,14MAY2025
101,Arthralgia,27APR2025,.,4,15MAY2025
101,Hemorrhoids,05MAY2025,.,5,05MAY2025
101,Vomiting,27MAY2025,27MAY2025,6,27MAY2025
201,Genital bleeding,18JAN2025,30JAN2025,1,18JAN2025
201,Nausea,23JAN2025,.,2,23JAN2025
201,Pelvic pain,23JAN2025,.,3,23JAN2025
201,Genital bleeding,08FEB2025,13FEB2025,4,08FEB2025
201,Asthenia,27FEB2025,.,5,27FEB2025
201,Vaginal dryness,27FEB2025,.,6,27FEB2025
201,Genital bleeding,01MAR2025,01MAR2025,7,.
201,Diarrhea,27MAR2025,.,8,27MAR2025
201,Rash,27MAR2025,.,9,27MAR2025
201,Rash,27MAR2025,.,9,27MAR2025
201,Hyponatremia,08MAY2025,.,10,08MAY2025
;
run;
data test (drop = astdtx aendtx astdtcx ajoinid);
retain usubjid aellt astdtc aendt astdt ajoinid ;
set test (rename=(astdt=astdtx aendt=aendtx astdtc =astdtcx ));
astdt = input(strip(astdtx),??date9.);
aendt = input(strip(aendtx),??date9.);
astdtc = input(strip(astdtcx),??date9.);
format astdt aendt astdtc date9.;
run;
Not sure why you are making the variable character instead of numeric. Strings in DDMONYYYY style do not sort in chronological order.
Also it will help to make a variable that set the end date for ongoing AE's to some far future date so that you can sort them into the right place.
Like this:
data test;
infile datalines dsd dlm=',' truncover;
length USUBJID $20 AELLT $50 ASTDT 8 AENDT 8 EXAMPLE 8 ASTDTC $9;
input USUBJID AELLT ASTDT :date. AENDT :date. EXAMPLE ASTDTC ;
fake_end = coalesce(aendt,'31DEC3999'd);
format astdt aendt fake_end date9.;
datalines;
101,Fat,17APR2025,27APR2025,1,17APR2025
101,Fat,17APR2025,.,1,27APR2025
101,Nausea,17APR2025,29APR2025,2,17APR2025
101,Anorexia,23APR2025,29APR2025,3,23APR2025
101,Arthralgia,27APR2025,14MAY2025,4,27APR2025
101,Arthralgia,27APR2025,15MAY2025,4,14MAY2025
101,Arthralgia,27APR2025,.,4,15MAY2025
101,Hemorrhoids,05MAY2025,.,5,05MAY2025
101,Vomiting,27MAY2025,27MAY2025,6,27MAY2025
201,Genital bleeding,18JAN2025,30JAN2025,1,18JAN2025
201,Nausea,23JAN2025,.,2,23JAN2025
201,Pelvic pain,23JAN2025,.,3,23JAN2025
201,Genital bleeding,08FEB2025,13FEB2025,4,08FEB2025
201,Asthenia,27FEB2025,.,5,27FEB2025
201,Vaginal dryness,27FEB2025,.,6,27FEB2025
201,Genital bleeding,01MAR2025,01MAR2025,7,.
201,Diarrhea,27MAR2025,.,8,27MAR2025
201,Rash,27MAR2025,.,9,27MAR2025
201,Rash,27MAR2025,.,9,27MAR2025
201,Hyponatremia,08MAY2025,.,10,08MAY2025
;
Now you can sort the data by subject, term, start, and end so you can detect the overlapping records.
proc sort data=test;
by usubjid aellt astdt fake_end ;
run;
data want ;
set test ;
by usubjid aellt ;
if first.usubjid then AJOINID=0;
if first.aellt or (astdt > lag(fake_end)) then AJOINID+1;
run;
Result
f U E a A S X A k J U A A A A S e O B E S E M T _ I O J L T N P D e N b I L D D L T n I s D T T T E C d D 1 101 Anorexia 23APR2025 29APR2025 3 23APR2025 29APR2025 1 2 101 Arthralgia 27APR2025 14MAY2025 4 27APR2025 14MAY2025 2 3 101 Arthralgia 27APR2025 15MAY2025 4 14MAY2025 15MAY2025 2 4 101 Arthralgia 27APR2025 . 4 15MAY2025 31DEC3999 2 5 101 Fat 17APR2025 27APR2025 1 17APR2025 27APR2025 3 6 101 Fat 17APR2025 . 1 27APR2025 31DEC3999 3 7 101 Hemorrhoids 05MAY2025 . 5 05MAY2025 31DEC3999 4 8 101 Nausea 17APR2025 29APR2025 2 17APR2025 29APR2025 5 9 101 Vomiting 27MAY2025 27MAY2025 6 27MAY2025 27MAY2025 6 10 201 Asthenia 27FEB2025 . 5 27FEB2025 31DEC3999 1 11 201 Diarrhea 27MAR2025 . 8 27MAR2025 31DEC3999 2 12 201 Genital bleeding 18JAN2025 30JAN2025 1 18JAN2025 30JAN2025 3 13 201 Genital bleeding 08FEB2025 13FEB2025 4 08FEB2025 13FEB2025 4 14 201 Genital bleeding 01MAR2025 01MAR2025 7 01MAR2025 5 15 201 Hyponatremia 08MAY2025 . 10 08MAY2025 31DEC3999 6 16 201 Nausea 23JAN2025 . 2 23JAN2025 31DEC3999 7 17 201 Pelvic pain 23JAN2025 . 3 23JAN2025 31DEC3999 8 18 201 Rash 27MAR2025 . 9 27MAR2025 31DEC3999 9 19 201 Rash 27MAR2025 . 9 27MAR2025 31DEC3999 9 20 201 Vaginal dryness 27FEB2025 . 6 27FEB2025 31DEC3999 10
If you need it to look more like your spreadsheet then just sort again.
proc sort data=want;
by usubjid astdt fake_end ;
run;
Result
f U E a A S X A k J U A A A A S e O B E S E M T _ I O J L T N P D e N b I L D D L T n I s D T T T E C d D 1 101 Fat 17APR2025 27APR2025 1 17APR2025 27APR2025 3 2 101 Nausea 17APR2025 29APR2025 2 17APR2025 29APR2025 5 3 101 Fat 17APR2025 . 1 27APR2025 31DEC3999 3 4 101 Anorexia 23APR2025 29APR2025 3 23APR2025 29APR2025 1 5 101 Arthralgia 27APR2025 14MAY2025 4 27APR2025 14MAY2025 2 6 101 Arthralgia 27APR2025 15MAY2025 4 14MAY2025 15MAY2025 2 7 101 Arthralgia 27APR2025 . 4 15MAY2025 31DEC3999 2 8 101 Hemorrhoids 05MAY2025 . 5 05MAY2025 31DEC3999 4 9 101 Vomiting 27MAY2025 27MAY2025 6 27MAY2025 27MAY2025 6 10 201 Genital bleeding 18JAN2025 30JAN2025 1 18JAN2025 30JAN2025 3 11 201 Nausea 23JAN2025 . 2 23JAN2025 31DEC3999 7 12 201 Pelvic pain 23JAN2025 . 3 23JAN2025 31DEC3999 8 13 201 Genital bleeding 08FEB2025 13FEB2025 4 08FEB2025 13FEB2025 4 14 201 Asthenia 27FEB2025 . 5 27FEB2025 31DEC3999 1 15 201 Vaginal dryness 27FEB2025 . 6 27FEB2025 31DEC3999 10 16 201 Genital bleeding 01MAR2025 01MAR2025 7 01MAR2025 5 17 201 Diarrhea 27MAR2025 . 8 27MAR2025 31DEC3999 2 18 201 Rash 27MAR2025 . 9 27MAR2025 31DEC3999 9 19 201 Rash 27MAR2025 . 9 27MAR2025 31DEC3999 9 20 201 Hyponatremia 08MAY2025 . 10 08MAY2025 31DEC3999 6
The order of the value of AJOINID are not the same as in your hand calculated EXAMPLE, but the grouping is.
proc freq ;
tables usubjid*example*ajoinid / list ;
run;
USUBJID EXAMPLE AJOINID Frequency ---------------------------------------- 101 1 3 2 101 2 5 1 101 3 1 1 101 4 2 3 101 5 4 1 101 6 6 1 201 1 3 1 201 2 7 1 201 3 8 1 201 4 4 1 201 5 1 1 201 6 10 1 201 7 5 1 201 8 2 1 201 9 9 2 201 10 6 1
I would require AJOINID = 10 for 19th observation , as ASTDTC is not equal to previous AENDT
Huh?
You have two identical observations
SUBJECT AE_TERM START_DATE END_DATE 201 Rash 27MAR2025 . 201 Rash 27MAR2025 .
and want to treat them as two different events? Why?
Note that you do not show a separate ONGOING flag in your example data, but generally if the END date is missing that means that the AE was still present when the observation was recorded. If the AE spanned only a single day the END_DATE would be set to the same date as the START_DATE.
Yes, I have that query too, but my analysis plan doc. want to treat this as separate in AJOINID as 1st obs. (pervious one) has missing end date, which is not equal to the start date (2nd observation)
I am not able to do it programmatically this 😞
someone please help me
It is not a good idea to always just do whatever someone says. If what they want seems illogical it is your role to make sure they have a good reason for asking for it.
But if you want to add the rule that having the missing AENDT on the previous observation triggers a new grouping then that is not very hard to do.
data want ;
set test ;
by usubjid aellt ;
if first.usubjid then AJOINID=0;
if first.aellt or (astdt > lag(fake_end)) or missing(lag(aendt)) then AJOINID+1;
run;
proc print;
where aellt='Rash';
run;
Obs USUBJID AELLT ASTDT AENDT EXAMPLE ASTDTC fake_end AJOINID 18 201 Rash 27MAR2025 . 9 27MAR2025 31DEC3999 9 19 201 Rash 27MAR2025 . 9 27MAR2025 31DEC3999 10
And since SAS treats missing numeric values as less than any actual number the rule reduces to just:
if first.aellt or (astdt > lag(aendt)) then AJOINID+1;
Note: You still probably want the FAKE_END variable to put the observations into a reasonable order.
Thanks Tom for the reply.
but, by USUBJID AELLT will not achieve the below sorting order.
I want AJOINID to assign as per USUBJID ASTDT AELLT
that is why I am finding it very hard to do it.
do you have any idea how to achieve this one ? (again, thanks for all your reply 🙂 )
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.