BookmarkSubscribeRSS Feed
abhinayingole
Obsidian | Level 7

Hi

I need to create AJOINID variable as show in the xlsx file (test.xlsx), for each usubjid , aellt
if there is unique USUBJID and AELLT then AJOINID + 1
if there are multiple AELLT for unique USUBJID then AJOINID will repeat if AEENDTC same as AESTDTC for this
and ASTDT will be retain with the first observation

 

the final data should be sorted by USUBJID ASTDT AENDT AELLT and AJOINID will be derive accordingly.

 

could you please give me the sas code for this

15 REPLIES 15
Kurt_Bremser
Super User

By now you should know better than to attach a spreadsheet file. ALWAYS post example data as a working DATA step in a code box.

 

If you need Excel help, there are Microsoft-oriented forums for this.

abhinayingole
Obsidian | Level 7

Hi Kurt,

apology.

here I am attaching the code.

could you please help me with the solution



data test;
    infile datalines dsd dlm=',';
    length USUBJID $20 AELLT $50 AESTDTC $10   AEENDTC $19  ;
    input USUBJID $ AJOINID AELLT $ AESTDTC $ ASTDT date9. AEENDTC $ AENDT date9.;
        format  astdt aendt date9.;
    datalines;
101,1,BBL,2024-12-23,23DEC24,2025-01-07,07JAN25
101,2,Hyperbilirubinemia,2025-01-14,14JAN25,2025-01-29,29JAN25
101,3,API,2025-01-22,22JAN25,2025-02-06,06FEB25
101,4,Anme,2025-01-22,22JAN25,2025-01-26,26JAN25
101,5,Hypoalbuminemia,2025-01-22,22JAN25,2025-01-30,30JAN25
101,6,Leuk,2025-01-22,22JAN25,2025-01-26,26JAN25
101,6,Leuk,2025-01-26,22JAN25,2025-01-26,26JAN25
101,7,Lymphocytopenia,2025-01-22,22JAN25,2025-01-26,26JAN25
101,8,Neut,2025-01-22,22JAN25,2025-01-22,22JAN25
101,9,Thromb,2025-01-22,22JAN25,2025-01-25,30JAN25
101,9,Thromb,2025-01-25,22JAN25,2025-01-26,30JAN25
101,9,Thromb,2025-01-26,22JAN25,2025-01-27,30JAN25
101,9,Thromb,2025-01-27,22JAN25,2025-01-29,30JAN25
101,9,Thromb,2025-01-29,22JAN25,2025-01-30,30JAN25
101,9,Thromb,2025-01-30,22JAN25,2025-01-30,30JAN25
101,10,Febrile Neut,2025-01-23,23JAN25,2025-01-24,24JAN25
101,11,Neut,2025-01-25,25JAN25,2025-01-26,26JAN25
101,12,Pancytopenia,2025-01-25,25JAN25,2025-02-03,03FEB25
101,13,Anme,2025-01-27,27JAN25,2025-01-28,28JAN25
101,14,Leuk,2025-01-27,27JAN25,2025-01-27,27JAN25
101,15,Lymphocytopenia,2025-01-27,27JAN25,2025-01-27,27JAN25
101,16,Neut,2025-01-27,27JAN25,2025-01-28,28JAN25
101,17,Leuk,2025-01-28,28JAN25,2025-02-06,06FEB25
101,18,Anme,2025-01-29,29JAN25,2025-01-29,29JAN25
101,19,Neut,2025-01-29,29JAN25,2025-02-06,06FEB25
101,20,Anme,2025-01-30,30JAN25,2025-01-30,30JAN25
101,21,Anme,2025-01-31,31JAN25,2025-03-05,05MAR25
101,22,Leuk,2025-02-07,07FEB25,2025-02-26,26FEB25
101,23,Neut,2025-02-07,07FEB25,2025-02-13,13FEB25
101,24,Neut,2025-02-14,14FEB25,2025-02-27,27FEB25
101,25,Arthralgia,2025-02-18,18FEB25,2025-05-10,10MAY25
101,26,Asthenia,2025-02-18,18FEB25,2025-05-10,10MAY25
101,27,API,2025-02-28,28FEB25,2025-03-13,13MAR25
101,28,Leuk,2025-02-28,28FEB25,2025-03-05,05MAR25
101,29,Neut,2025-02-28,28FEB25,2025-03-05,05MAR25
101,30,Thromb,2025-02-28,28FEB25,2025-03-13,13MAR25
101,31,Anme,2025-03-06,06MAR25,2025-03-13,13MAR25
101,32,Leuk,2025-03-06,06MAR25,2025-05-15,15MAY25
101,33,Neut,2025-03-06,06MAR25,2025-03-13,13MAR25
101,34,Anme,2025-03-14,14MAR25,2025-03-27,27MAR25
101,35,Neut,2025-03-14,14MAR25,2025-04-25,25APR25
101,36,API,2025-03-28,28MAR25,2025-04-10,10APR25
101,37,Anme,2025-03-28,28MAR25,2025-04-10,10APR25
101,38,Thromb,2025-03-28,28MAR25,2025-04-10,10APR25
101,39,Anme,2025-04-11,11APR25,2025-04-25,25APR25
101,40,Anme,2025-04-26,26APR25,2025-05-15,15MAY25
101,41,Neut,2025-04-26,26APR25,2025-05-15,15MAY25
101,42,Thromb,2025-04-26,26APR25,2025-05-08,08MAY25
101,43,Anme,2025-05-16,16MAY25, ,.
101,44,Leuk,2025-05-16,16MAY25, ,.
102,1,Vom,2025-03-26,26MAR25,2025-04-15,15APR25
102,2,Vomiting,2025-04-01,01APR25,2025-04-01,01APR25
102,3,Vom,2025-04-02,02APR25,2025-04-04,04APR25
102,4,Asthenia,2025-04-10,10APR25, ,.
;
run;
Tom
Super User Tom
Super User

Your example dataset you posted later already has AJOINID on it.

Do you want to make a new variable?  What do you want to name it?  If you want it to be named AJOINID then what do you want to do with the original variable?

abhinayingole
Obsidian | Level 7

please consider other data as input except AJOIND variable, I wanted to present (newly create) AJOINID as given then code(dataset)

Tom
Super User Tom
Super User

What is the actual goal here?  Is it to make this counter variable?  Or is that just some intermediate step towards your true objective?

 

I can recreate your variable.

data want;
  set test(rename=(AJOINID=EXPECT));
  by USUBJID AELLT notsorted;
  if first.usubjid then AJOINID=0;
  if first.AELLT or (astdt > lag(aendt) > .Z ) then AJOINID+1;
run;

proc compare;
 var expect;
 with ajoinid;
run;

But somehow I doubt that is actually what you want to do.  It will have trouble with real data.  If you have two or more AELLT term values that have overlapping time periods how can you put them into an order that would make that code notice the overlapping time periods for one or more of the individual AELLT terms?

 

And also why did you NOT consider these two records as overlapping?

Tom_0-1751305820652.png

 

 

abhinayingole
Obsidian | Level 7

Regarding Overlapping, we need to consider Overlapping AE if pervious AENDT is equal to ASTDT

 

Tom
Super User Tom
Super User

Shouldn't you sort by LLT and then start/stop dates if the goal is to collapse multiple records for the same LLT with overlapping date ranges?

 

You can always later re-sort the data by start date for your presentation.

abhinayingole
Obsidian | Level 7

Ok, I got this one but at one point I am stuck

I am using below code to get AJOINID values.

 

data test;
SET test;
by usubjid astdt aellt;

if first.aellt=1 then ajoinid+1;
if first.usubjid=1 then ajoinid=1;

run;

 

but as you see, at highlighted part, I am getting wrong values.

I only want to repeat number when for same USUBJID , same AELLT if previous AENDT same as current ASTDTC  

on the observation no. 19 then previous AENDT is missing and current ASTDTC is 27MAR2025, so I want to continue with AJOINID = 10 and 11 for last observation.

 

Please help me with this 

I am attaching the code for input dataset .

 

abhinayingole_0-1751386469822.png

 


data test;
    infile datalines dsd dlm=',' truncover;
    length USUBJID $20 AELLT $50 ASTDT $9 AENDT $9 ASTDTC $9;
    input USUBJID $ AELLT $ ASTDT $ AENDT $ AJOINID ASTDTC $;
    datalines;
101,Fat,17APR2025,27APR2025,1,17APR2025
101,Fat,17APR2025,.,1,27APR2025
101,Nausea,17APR2025,29APR2025,2,17APR2025
101,Anorexia,23APR2025,29APR2025,3,23APR2025
101,Arthralgia,27APR2025,14MAY2025,4,27APR2025
101,Arthralgia,27APR2025,15MAY2025,4,14MAY2025
101,Arthralgia,27APR2025,.,4,15MAY2025
101,Hemorrhoids,05MAY2025,.,5,05MAY2025
101,Vomiting,27MAY2025,27MAY2025,6,27MAY2025
201,Genital bleeding,18JAN2025,30JAN2025,1,18JAN2025
201,Nausea,23JAN2025,.,2,23JAN2025
201,Pelvic pain,23JAN2025,.,3,23JAN2025
201,Genital bleeding,08FEB2025,13FEB2025,4,08FEB2025
201,Asthenia,27FEB2025,.,5,27FEB2025
201,Vaginal dryness,27FEB2025,.,6,27FEB2025
201,Genital bleeding,01MAR2025,01MAR2025,7,.
201,Diarrhea,27MAR2025,.,8,27MAR2025
201,Rash,27MAR2025,.,9,27MAR2025
201,Rash,27MAR2025,.,9,27MAR2025
201,Hyponatremia,08MAY2025,.,10,08MAY2025
;
run;


data test (drop = astdtx aendtx astdtcx ajoinid);
  retain usubjid aellt astdtc aendt astdt ajoinid ;
  set test (rename=(astdt=astdtx  aendt=aendtx astdtc =astdtcx  ));
  astdt = input(strip(astdtx),??date9.);
  aendt = input(strip(aendtx),??date9.);
  astdtc = input(strip(astdtcx),??date9.);
  
  format astdt aendt astdtc date9.;
run;

 

Tom
Super User Tom
Super User

Not sure why you are making the variable character instead of numeric.  Strings in DDMONYYYY style do not sort in chronological order.

 

Also it will help to make a variable that set the end date for ongoing AE's to some far future date so that you can sort them into the right place.

 

Like this:

Spoiler
data test;
  infile datalines dsd dlm=',' truncover;
  length USUBJID $20 AELLT $50 ASTDT 8 AENDT 8 EXAMPLE 8 ASTDTC $9;
  input USUBJID AELLT ASTDT :date. AENDT :date. EXAMPLE ASTDTC ;
  fake_end = coalesce(aendt,'31DEC3999'd);
  format astdt aendt fake_end date9.;
datalines;
101,Fat,17APR2025,27APR2025,1,17APR2025
101,Fat,17APR2025,.,1,27APR2025
101,Nausea,17APR2025,29APR2025,2,17APR2025
101,Anorexia,23APR2025,29APR2025,3,23APR2025
101,Arthralgia,27APR2025,14MAY2025,4,27APR2025
101,Arthralgia,27APR2025,15MAY2025,4,14MAY2025
101,Arthralgia,27APR2025,.,4,15MAY2025
101,Hemorrhoids,05MAY2025,.,5,05MAY2025
101,Vomiting,27MAY2025,27MAY2025,6,27MAY2025
201,Genital bleeding,18JAN2025,30JAN2025,1,18JAN2025
201,Nausea,23JAN2025,.,2,23JAN2025
201,Pelvic pain,23JAN2025,.,3,23JAN2025
201,Genital bleeding,08FEB2025,13FEB2025,4,08FEB2025
201,Asthenia,27FEB2025,.,5,27FEB2025
201,Vaginal dryness,27FEB2025,.,6,27FEB2025
201,Genital bleeding,01MAR2025,01MAR2025,7,.
201,Diarrhea,27MAR2025,.,8,27MAR2025
201,Rash,27MAR2025,.,9,27MAR2025
201,Rash,27MAR2025,.,9,27MAR2025
201,Hyponatremia,08MAY2025,.,10,08MAY2025
;

Now you can sort the data by subject, term, start, and end so you can detect the overlapping records.

proc sort data=test;
  by usubjid aellt astdt fake_end ;
run;

data want ;
  set test ;
  by usubjid aellt ;
  if first.usubjid then AJOINID=0;
  if first.aellt or (astdt > lag(fake_end)) then AJOINID+1;
run;

Result

                                                                    f
     U                                            E                 a       A
     S                                            X      A          k       J
     U          A              A          A       A      S          e       O
     B          E              S          E       M      T          _       I
 O   J          L              T          N       P      D          e       N
 b   I          L              D          D       L      T          n       I
 s   D          T              T          T       E      C          d       D

 1  101  Anorexia          23APR2025  29APR2025   3  23APR2025  29APR2025   1
 2  101  Arthralgia        27APR2025  14MAY2025   4  27APR2025  14MAY2025   2
 3  101  Arthralgia        27APR2025  15MAY2025   4  14MAY2025  15MAY2025   2
 4  101  Arthralgia        27APR2025          .   4  15MAY2025  31DEC3999   2
 5  101  Fat               17APR2025  27APR2025   1  17APR2025  27APR2025   3
 6  101  Fat               17APR2025          .   1  27APR2025  31DEC3999   3
 7  101  Hemorrhoids       05MAY2025          .   5  05MAY2025  31DEC3999   4
 8  101  Nausea            17APR2025  29APR2025   2  17APR2025  29APR2025   5
 9  101  Vomiting          27MAY2025  27MAY2025   6  27MAY2025  27MAY2025   6
10  201  Asthenia          27FEB2025          .   5  27FEB2025  31DEC3999   1
11  201  Diarrhea          27MAR2025          .   8  27MAR2025  31DEC3999   2
12  201  Genital bleeding  18JAN2025  30JAN2025   1  18JAN2025  30JAN2025   3
13  201  Genital bleeding  08FEB2025  13FEB2025   4  08FEB2025  13FEB2025   4
14  201  Genital bleeding  01MAR2025  01MAR2025   7             01MAR2025   5
15  201  Hyponatremia      08MAY2025          .  10  08MAY2025  31DEC3999   6
16  201  Nausea            23JAN2025          .   2  23JAN2025  31DEC3999   7
17  201  Pelvic pain       23JAN2025          .   3  23JAN2025  31DEC3999   8
18  201  Rash              27MAR2025          .   9  27MAR2025  31DEC3999   9
19  201  Rash              27MAR2025          .   9  27MAR2025  31DEC3999   9
20  201  Vaginal dryness   27FEB2025          .   6  27FEB2025  31DEC3999  10

If you need it to look more like your spreadsheet then just sort again.

proc sort data=want;
  by usubjid astdt fake_end ;
run;

Result

                                                                    f
     U                                            E                 a       A
     S                                            X      A          k       J
     U          A              A          A       A      S          e       O
     B          E              S          E       M      T          _       I
 O   J          L              T          N       P      D          e       N
 b   I          L              D          D       L      T          n       I
 s   D          T              T          T       E      C          d       D

 1  101  Fat               17APR2025  27APR2025   1  17APR2025  27APR2025   3
 2  101  Nausea            17APR2025  29APR2025   2  17APR2025  29APR2025   5
 3  101  Fat               17APR2025          .   1  27APR2025  31DEC3999   3
 4  101  Anorexia          23APR2025  29APR2025   3  23APR2025  29APR2025   1
 5  101  Arthralgia        27APR2025  14MAY2025   4  27APR2025  14MAY2025   2
 6  101  Arthralgia        27APR2025  15MAY2025   4  14MAY2025  15MAY2025   2
 7  101  Arthralgia        27APR2025          .   4  15MAY2025  31DEC3999   2
 8  101  Hemorrhoids       05MAY2025          .   5  05MAY2025  31DEC3999   4
 9  101  Vomiting          27MAY2025  27MAY2025   6  27MAY2025  27MAY2025   6
10  201  Genital bleeding  18JAN2025  30JAN2025   1  18JAN2025  30JAN2025   3
11  201  Nausea            23JAN2025          .   2  23JAN2025  31DEC3999   7
12  201  Pelvic pain       23JAN2025          .   3  23JAN2025  31DEC3999   8
13  201  Genital bleeding  08FEB2025  13FEB2025   4  08FEB2025  13FEB2025   4
14  201  Asthenia          27FEB2025          .   5  27FEB2025  31DEC3999   1
15  201  Vaginal dryness   27FEB2025          .   6  27FEB2025  31DEC3999  10
16  201  Genital bleeding  01MAR2025  01MAR2025   7             01MAR2025   5
17  201  Diarrhea          27MAR2025          .   8  27MAR2025  31DEC3999   2
18  201  Rash              27MAR2025          .   9  27MAR2025  31DEC3999   9
19  201  Rash              27MAR2025          .   9  27MAR2025  31DEC3999   9
20  201  Hyponatremia      08MAY2025          .  10  08MAY2025  31DEC3999   6

The order of the value of AJOINID are not the same as in your hand calculated EXAMPLE, but the grouping is.

proc freq ;
 tables usubjid*example*ajoinid / list ;
run;
USUBJID   EXAMPLE   AJOINID   Frequency
----------------------------------------
101             1         3          2
101             2         5          1
101             3         1          1
101             4         2          3
101             5         4          1
101             6         6          1
201             1         3          1
201             2         7          1
201             3         8          1
201             4         4          1
201             5         1          1
201             6        10          1
201             7         5          1
201             8         2          1
201             9         9          2
201            10         6          1

 

 

 

 

abhinayingole
Obsidian | Level 7

I would require AJOINID = 10 for 19th observation , as ASTDTC is not equal to previous AENDT

 

abhinayingole_0-1751448127184.png

 

Tom
Super User Tom
Super User

Huh?

You have two identical observations

SUBJECT AE_TERM START_DATE END_DATE
 201    Rash    27MAR2025         .  
 201    Rash    27MAR2025         .  

and want to treat them as two different events? Why?

 

Note that you do not show a separate ONGOING flag in your example data, but generally if the END date is missing that means that the AE was still present when the observation was recorded.  If the AE spanned only a single day the END_DATE would be set to the same date as the START_DATE.

abhinayingole
Obsidian | Level 7

Yes, I have that query too, but my analysis plan doc. want to treat this as separate in AJOINID as 1st obs. (pervious one) has missing end date, which is not equal to the start date (2nd observation)

I am not able to do it programmatically this 😞

someone please help me

 

abhinayingole_1-1751464524650.png

 

Tom
Super User Tom
Super User

It is not a good idea to always just do whatever someone says.  If what they want seems illogical it is your role to make sure they have a good reason for asking for it.

 

But if you want to add the rule that having the missing AENDT on the previous observation triggers a new grouping then that is not very hard to do.

 

data want ;
  set test ;
  by usubjid aellt ;
  if first.usubjid then AJOINID=0;
  if first.aellt or (astdt > lag(fake_end)) or missing(lag(aendt)) then AJOINID+1;
run;

proc print;
 where aellt='Rash';
run;
  Obs USUBJID AELLT     ASTDT     AENDT EXAMPLE  ASTDTC    fake_end AJOINID

   18   201   Rash  27MAR2025         .    9    27MAR2025 31DEC3999     9
   19   201   Rash  27MAR2025         .    9    27MAR2025 31DEC3999    10
 

And since SAS treats missing numeric values as less than any actual number the rule reduces to just:

  if first.aellt or (astdt > lag(aendt)) then AJOINID+1;

Note: You still probably want the FAKE_END variable to put the observations into a reasonable order.

abhinayingole
Obsidian | Level 7

Thanks Tom for the reply.

but, by USUBJID AELLT will not achieve the below sorting order.

I want AJOINID to assign as per USUBJID ASTDT AELLT

that is why I am finding it very hard to do it.

do you have any idea how to achieve this one ? (again, thanks for all your reply 🙂 )

 

abhinayingole_0-1751520550278.png

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 15 replies
  • 1478 views
  • 1 like
  • 3 in conversation