Hi ,
I have to create join variable and astdt variable (as shown in the dataset), JOIN variable for each USUBJID and AE.
ASTDT is date9. Format for AESTDTC variable.
To check the set of observation with overlapping AEs.
For eg. if you see 5th observation Neutropenia, this observation fall in between previous observation of Neutropenia (ie. aestdtc = 2025-01-27 is in between previous observation (Neutropenia aestdtc = 2025-01-06 aendt=07FEB2025)
Which is why it is considered as same event with JOIN = 4 for both 4th and 5th observation.
And the ASTDT is same as previous one.
Could you please help me do programmatically.
I have also paste the datalines for the test dataset (input dataset).
data test;
infile datalines dlm='09'x dsd truncover;
informat aendt date9.;
input usubjid :$3. aestdtc :$10. aendt :date9. ae :$200.;
format aendt date9.;
datalines;
101 2024-12-18 28JAN25 Appetite lost
101 2024-12-19 26DEC24 Constipation
101 2024-12-30 10FEB25 Thrombopenia
101 2025-01-06 07FEB25 Neutropenia
101 2025-01-27 07FEB25 Neutropenia
101 2025-01-13 23MAR25 Anemia
101 2025-02-10 23MAR25 Anemia
101 2025-01-25 10FEB25 Dizziness
101 2025-01-25 10FEB25 Dyspnea
101 2025-01-25 . Fatigue
101 2025-01-27 31JAN25 Nausea
101 2025-01-27 31JAN25 WBC decreased
101 2025-02-03 15FEB25 Rhinitis
101 2025-03-01 . Epigastralgia
101 2025-03-01 03MAR25 Nausea
101 2025-03-01 03MAR25 Vomiting
101 2025-03-24 . Anemia
101 2025-03-24 . Thrombopenia
;
run;
In order to make this work, you cannot have aestdtc as a "character date" and aendt as a numeric date, since SAS does not know how to compare character strings to numbers. Making aestdtc numeric should solve the problem. So this code will create a numeric variable named aestdc, and since both aendt and aestdtc are numeric dates, they can be compared.
data test1;
set test(rename=(aestdc=old_aestdc));
aestdc=input(old_aestdc,yymmdd10.); /* the new aestdc in a numeric date */
run;
Now that both are dates, the comparisons should work. This code seems to do what you want.
data want;
set test1;
prev_ae=lag(ae);
prev_aendt=lag(aendt);
prev_aestdtc=lag(aestdtc);
format prev_aendt prev_aestdtc date9.;
/* Compare dates of adjacent observations, if they don't overlap then add 1 to JOIN */
if not (aendt<=prev_aendt and aestdtc>=prev_aestdtc) or ae^=prev_ae then join+1;
drop prev_:;
run;
This will not work as I require result per USUBJID and I want AESTDTC as character in final dataset (as in input itself)
data want;
set test1;
by usubjid;
prev_ae=lag(ae);
prev_aendt=lag(aendt);
prev_aestdtc=lag(aestdtc);
format prev_aendt prev_aestdtc date9.;
/* Compare dates of adjacent observations, if they don't overlap then add 1 to JOIN */
if first.usubjid then join=0;
if not (aendt<=prev_aendt and aestdtc>=prev_aestdtc) or ae^=prev_ae then join+1;
drop prev_:;
run;
If you need AESTDTC as character, simply modify my code to so you don't rename AESTDTC (so it leaves AESTDTC unchanged as character) and create a new variable for the numeric SAS date .
still it is not working .... i am getting below result in this case
I can't read your mind. Please tell us, in words, what is wrong. Also, please show us the code you used that doesn't produce the right results.
Hi Kurt
actually I wanted to present AESTDTC in ISO8601 format in my dataset.
sorry for any confusion.
@abhinayingole wrote:
Hi Kurt
actually I wanted to present AESTDTC in ISO8601 format in my dataset.
sorry for any confusion.
Then you want to make the date numeric and use the proper ISO 8601 format (which SAS has built-in), see https://documentation.sas.com/doc/en/pgmmvacdc/9.4/allprodslang/syntaxByCategory-format.htm#p0aa41u6...
You do not want character dates.
In a DATA step, use
by usubjid;
At first.usubjid, set join to 1.
If not first.usubjid, and either ae is not equal to lag(ae) or aestdtc is greater than lag(aendt), increment join by 1.
If I understand your request correctly, then:
data test;
infile datalines truncover;
input usubjid $3. aestdtc :$10. @17 aendt date7. ae & :$200.;
format aendt date9.;
datalines;
101 2024-12-18 28JAN25 Appetite lost
101 2024-12-19 26DEC24 Constipation
101 2024-12-30 10FEB25 Thrombopenia
101 2025-01-06 07FEB25 Neutropenia
101 2025-01-27 07FEB25 Neutropenia
101 2025-01-13 23MAR25 Anemia
101 2025-02-10 23MAR25 Anemia
101 2025-01-25 10FEB25 Dizziness
101 2025-01-25 10FEB25 Dyspnea
101 2025-01-25 . Fatigue
101 2025-01-27 31JAN25 Nausea
101 2025-01-27 31JAN25 WBC decreased
101 2025-02-03 15FEB25 Rhinitis
101 2025-03-01 . Epigastralgia
101 2025-03-01 03MAR25 Nausea
101 2025-03-01 03MAR25 Vomiting
101 2025-03-24 . Anemia
101 2025-03-24 . Thrombopenia
run;
DATA WANT;
SET test;
by usubjid ae notsorted;
astdt=input(aestdtc,yymmdd10.);
format astdt date9.;
if astdt>lag(aendt) or first.ae=1 then join+1;
if first.usubjid=1 then join=1;
run;
Do you have instances of a given AE qualifying to be assigned the same JOIN value, but occurring in non-consecutive observations? If so, this code would need to be modified.
Note the BY statement allows detection of whenever a new AE description occurs (first.ae=1).
Also please provide your sample data in a working data step. For example, your informat of date9. for aendt should have been date7.
Thanks Mkeintz,
with this code the JOIN values are fine but not ASTDT
for eg. for observation no. 5 I want ASTDT as 06JAN2025 as this is overlapping AE
could you please help me with this ?
ohhh, no. this code is not working .
I think it requires proper sorting . if the datalines as in the below code (I just shuffle few lines as compare to previous code).
this solution is not working.
data test;
infile datalines dlm='09'x dsd truncover;
informat aendt date9.;
input usubjid :$3. aestdtc :$10. aendt :date9. ae :$200.;
format aendt date9.;
datalines;
101 2024-12-19 26DEC24 Constipation
101 2024-12-30 10FEB25 Thrombopenia
101 2025-01-27 07FEB25 Neutropenia
101 2025-01-13 23MAR25 Anemia
101 2025-01-25 10FEB25 Dizziness
101 2025-01-25 10FEB25 Dyspnea
101 2024-12-18 28JAN25 Appetite lost
101 2025-01-25 . Fatigue
101 2025-02-10 23MAR25 Anemia
101 2025-01-27 31JAN25 Nausea
101 2025-01-27 31JAN25 WBC decreased
101 2025-02-03 15FEB25 Rhinitis
101 2025-03-01 . Epigastralgia
101 2025-03-01 03MAR25 Nausea
101 2025-03-01 03MAR25 Vomiting
101 2025-01-06 07FEB25 Neutropenia
101 2025-03-24 . Anemia
101 2025-03-24 . Thrombopenia
;
run;
@mkeintz wrote:
...
Also please provide your sample data in a working data step. For example, your informat of date9. for aendt should have been date7.
Actually since the original posted data step is using LIST MODE input the width on the informat specification does not matter. You seem to have converted the step to using FORMATTED MODE to read the numeric date variable, which is not necessary when missing values are represented by periods in the card images.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.