Hi ,
I have to create join variable in below dataset (as shown in the dataset), for each USUBJID and AE.
To check the set of observation with overlapping AEs.
For eg. if you see 4th observation start date is 9APR2020, which did not fall under any of the previous AE start and end date, that is why join variable value is 2 (2nd set of observation with continuous AE)
How to do it programmatically.
I have also paste the datalines for the test dataset.
data test; informat astdt date9. aendt date9.; input @1 usubjid $3. @5 astdt date9. @15 aendt date9. @25 ae $15.; format astdt aendt date9.; datalines; 301 21FEB2020 06MAR2020 Anemia 301 06MAR2020 13MAR2020 Anemia 301 06MAR2020 13MAR2020 Anemia 301 09APR2020 21MAY2020 Anemia 301 09APR2020 03JUN2020 Anemia 301 21MAY2020 01JUL2020 Anemia 301 03JUN2020 01JUL2020 Anemia 301 01JUL2020 28JUL2020 Anemia 301 01JUL2020 18NOV2020 Anemia 301 28JUL2020 26AUG2020 Anemia 301 26AUG2020 21OCT2020 Anemia 301 18NOV2020 15DEC2020 Anemia 301 18NOV2020 15DEC2020 Anemia 301 15DEC2020 . Anemia run;
data test;
informat astdt date9. aendt date9.;
input @1 usubjid $3.
@5 astdt date9.
@15 aendt date9.
@25 ae $15.;
format astdt aendt date9.;
datalines;
301 21FEB2020 06MAR2020 Anemia
301 06MAR2020 13MAR2020 Anemia
301 06MAR2020 13MAR2020 Anemia
301 09APR2020 21MAY2020 Anemia
301 09APR2020 03JUN2020 Anemia
301 21MAY2020 01JUL2020 Anemia
301 03JUN2020 01JUL2020 Anemia
301 01JUL2020 28JUL2020 Anemia
301 01JUL2020 18NOV2020 Anemia
301 28JUL2020 26AUG2020 Anemia
301 26AUG2020 21OCT2020 Anemia
301 18NOV2020 15DEC2020 Anemia
301 18NOV2020 15DEC2020 Anemia
301 15DEC2020 . Anemia
run;
data want;
set test;
by usubjid ae;
if first.ae then join = 1;
if not first.ae and astdt gt lag(aendt) then join + 1;
run;
Untested, posted from my tablet.
Note that the use of a SUM Statement causes an implied RETAIN.
Thanks for the reply, but lag function will not work here as observation no. 12 still consider as overlapping AE , as observation no. 9 is 18NOV2020
So we need a different approach, by remembering a maximum end date.
data want;
set test;
by usubjid ae;
retain end;
if first.ae
then do;
join = 1;
end = aendt;
end;
else do;
if astdt gt end
then do;
join + 1;
end = aendt;
end;
else end = max(end,aendt);
end;
drop end;
run;
With this derivation, join value is 3 for observation no. 4 , we want it as 2, as the 2nd set of AE overlapping starts from observation no. 4.
Hi Josvander,
I don't have code for this, however I wanted JOIN variable in below manner.
I hope you wanted code for below dataset , right ?
Please correct me if I mistaken.
@abhinayingole wrote:
With this derivation, join value is 3 for observation no. 4 , we want it as 2, as the 2nd set of AE overlapping starts from observation no. 4.
This cannot be the result of my code. In my code, join is only incremented, never decremented, so it can't get a 2 after a 3.
Please post the complete log from the code that gave you this result.
Hi Kurt,
Yes , you are right. apology.
there must be some mistake at my end, but it is working fine now.
Hi Kurt,
Thanks for the solution. it worked.
However, I have one more scenario.
with your derivation, we got JOIN value 2 for 7th observation.
we need the JOIN value as 1 for this observation. as we don't know the end date of pervious observation (or missing end date for any of the previous records), so we consider this as cutaneous overlap AE.
The last observation JOIN value 2 is correct, though it has missing end date, but no other previous record has missing end dates.
do you have any solution for this.
I am pasting datalines for this test dataset below for refrence.
data test1;
informat astdt date9. aendt date9.;
input @1 usubjid $3.
@5 astdt date9.
@15 aendt date9.
@25 ae $15.;
format astdt aendt date9.;
datalines;
101 10JUN2016 06JUL2016 vomit
101 06JUL2016 08JUL2016 vomit
101 06JUL2016 08JUL2016 vomit
101 08JUL2016 22JUL2016 vomit
101 22JUL2016 28JUL2016 vomit
101 28JUL2016 . vomit
101 16SEP2016 . vomit
202 27DEC2022 17JAN2023 vomit
202 03JAN2023 17JAN2023 vomit
202 10JAN2023 24JAN2023 vomit
202 17JAN2023 28FEB2023 vomit
202 31JAN2023 14FEB2023 vomit
202 31JAN2023 14FEB2023 vomit
202 21FEB2023 28FEB2023 vomit
202 28FEB2023 21MAR2023 vomit
202 11APR2023 . vomit
;
run;
If you are going to post a data step that reads inline data (datalines) with fixed column input you MUST paste the code into either a code or text box opened with the "running man" or </> icons above the main message window.
The forum software will reformat text pasted into the main message window.
So your data step has invalid data for the last observation because spaces have been removed and the format for reading the Aendt variable then reads part of Anemia in the columns for the date.
Hi , as requested , I am pasting code at </> , please let me know if it is still invalid
data test; informat astdt date9. aendt date9.; input @1 usubjid $3. @5 astdt date9. @15 aendt date9. @25 ae $15.; format astdt aendt date9.; datalines; 301 21FEB2020 06MAR2020 Anemia 301 06MAR2020 13MAR2020 Anemia 301 06MAR2020 13MAR2020 Anemia 301 09APR2020 21MAY2020 Anemia 301 09APR2020 03JUN2020 Anemia 301 21MAY2020 01JUL2020 Anemia 301 03JUN2020 01JUL2020 Anemia 301 01JUL2020 28JUL2020 Anemia 301 01JUL2020 18NOV2020 Anemia 301 28JUL2020 26AUG2020 Anemia 301 26AUG2020 21OCT2020 Anemia 301 18NOV2020 15DEC2020 Anemia 301 18NOV2020 15DEC2020 Anemia 301 15DEC2020 . Anemia run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.