Hi all,
I will provide few more examples of my data, because it is more complex. The dot means missing data, it also means the treatment is "ongoing"
If I have this :
patientid | name | date_start | date_end | comment |
1 | drug A | 23.06.2015 | . | here the dot disappear, regarding the following line |
1 | drug A | 23.06.2015 | 25.09.2015 | |
1 | drug A | 25.09.2015 | . | here the dot disappear, regarding the following line |
1 | drug A | 25.09.2015 | 12.11.2015 | |
1 | drug A | 13.11.2015 | 22.12.2015 | |
2 | drug B | 12.11.2015 | 25.11.2015 | |
2 | drug B | 26.11.2015 | . | here the dot disappear, regarding the following line |
2 | drug B | 26.11.2015 | 22.06.2016 | |
2 | drug B | 12.11.2016 | 25.11.2016 | |
3 | drug C | 12.11.2015 | 25.11.2015 | |
3 | drug C | 26.11.2015 | . | here we don't know what happened from 26.11.2015 to 12.11.2016, so we need to keep the dot. |
3 | drug C | 12.11.2016 | 25.11.2016 | |
4 | drug D | 12.11.2015 | 25.11.2015 | |
4 | drug D | 26.11.2015 | . | here the dot disappear, regarding the following line |
4 | drug D | 26.11.2015 | 22.06.2016 | |
4 | drug D | 12.11.2016 | 25.11.2016 |
I would like in my final table to have this:
patientid | name | date_start | date_end |
1 | drug A | 23.06.2015 | 22.12.2015 |
2 | drug B | 12.11.2015 | 22.06.2016 |
2 | drug B | 12.11.2016 | 25.11.2016 |
3 | drug C | 12.11.2015 | . |
3 | drug C | 12.11.2016 | 25.11.2016 |
4 | drug D | 12.11.2015 | 22.06.2016 |
4 | drug D | 12.11.2016 | 25.11.2016 |
Note: when we compare the previous end date with the following start date, we can say that drug intake is continuous if those two dates are same, or if the following start date is = previous end date+1.
Also, note that I never have a dot in start date, its always in end date.
It looks so complicated to me, I have a feeling this thing is impossible to resolve...
Thank you very much +++
Kind regards.
Your initial comment about treatment of missing date values (apparently you never have missing values for DATE_START, only for DATE_END, is that right?) referred to introducing some sort of max date. I initially took this to mean some sort of study-wide maximum date, but I don't see any "max date" rule being applied here.
I do see instances of tied, consecutive, START_DATE values, in which the first record has a missing value for DATE_END and the last has a valid value for DATE_END. So, if you are saying that the only instances in which a missing DATE_END is to be replaced by a valid value is when there are consecutive records with the same DATE_START, then you have a fairly straightforward problem.
Namely when you encounter a missing DATE_END value in the case of consecutive identical DATE_START records, you can assign DATE_END=_NEXT_START-1, which just supports continuing the loop to the next record, where the real DATE_END becomes available. Here's my suggestion with a single new statement:
data have;
infile cards expandtabs truncover;
input patientid name & $10. (date_start date_end) (:ddmmyy10.);
format date_start date_end ddmmyy10.;
datalines;
1 drug A 23.06.2015 .
1 drug A 23.06.2015 25.09.2015
1 drug A 25.09.2015 .
1 drug A 25.09.2015 12.11.2015
1 drug A 13.11.2015 22.12.2015
2 drug B 12.11.2015 25.11.2015
2 drug B 26.11.2015 .
2 drug B 26.11.2015 22.06.2016
2 drug B 12.11.2016 25.11.2016
3 drug C 12.11.2015 25.11.2015
3 drug C 26.11.2015 .
3 drug C 12.11.2016 25.11.2016
4 drug D 12.11.2015 25.11.2015
4 drug D 26.11.2015 .
4 drug D 26.11.2015 22.06.2016
4 drug D 12.11.2016 25.11.2016
run;
data want (drop=_:);
do until ((date_end+1 < _next_start) or last.name=1);
set have;
by patientid name date_start;
merge have have (firstobs=2 keep=date_start rename=(date_start=_next_start));
_min_start=min(_min_start,date_start);
if date_end=. and last.date_start=0 then date_end=_next_start-1;
end;
date_start=_min_start;
run;
The "if date_end=. and last.date_start=0" condition is a test for missing date_end, but the record can not be the last one for a given date_start. And the BY statement has been changed to
by patientid name date_start.
Dear all,
Thank you for providing me with your various solutions. I am sorry for my late answer.
@mkeintz, I used your last program as the solution. But first I cleaned my data :
-I did "proc sort" on a rows that have the same start date but different end date,
-I deleted all drugs that have missing start date and end date.
Then I run your program, and after I checked all data fusions on all occurrences, and all looks to work perfectly.
Thank you so much.
Have a nice day.
Aljosa
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.