BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Aljosa
Fluorite | Level 6

Hi all,

I will provide few more examples of my data, because it is more complex. The dot means missing data, it also means the treatment is "ongoing"

If I have this :

patientidnamedate_startdate_endcomment
1drug A23.06.2015.here the dot disappear, regarding the following line
1drug A23.06.201525.09.2015 
1drug A25.09.2015.here the dot disappear, regarding the following line
1drug A25.09.201512.11.2015 
1drug A13.11.201522.12.2015 
2drug B12.11.201525.11.2015 
2drug B26.11.2015.here the dot disappear, regarding the following line
2drug B26.11.201522.06.2016 
2drug B12.11.201625.11.2016 
3drug C12.11.201525.11.2015 
3drug C26.11.2015.here we don't know what happened from 26.11.2015 to 12.11.2016, so we need to keep the dot.
3drug C12.11.201625.11.2016 
4drug D12.11.201525.11.2015 
4drug D26.11.2015.here the dot disappear, regarding the following line
4drug D26.11.201522.06.2016 
4drug D12.11.201625.11.2016 

 

I would like in my final table to have this:

patientidnamedate_startdate_end
1drug A23.06.201522.12.2015
2drug B12.11.201522.06.2016
2drug B12.11.201625.11.2016
3drug C12.11.2015.
3drug C12.11.201625.11.2016
4drug D12.11.201522.06.2016
4drug D12.11.201625.11.2016

 

Note: when we compare the previous end date with the following start date, we can say that drug intake is continuous if those two dates are same, or if the following start date is = previous end date+1.

Also, note that I never have a dot in start date, its always in end date.

 

It looks so complicated to me, I have a feeling this thing is impossible to resolve...

Thank you very much +++

Kind regards.

mkeintz
PROC Star

Your initial comment about treatment of missing date values (apparently you never have missing values for DATE_START, only for DATE_END, is that right?) referred to introducing some sort of max date.  I initially took this to mean some sort of study-wide maximum date, but I don't see any "max date" rule being applied here.

 

I do see instances of tied, consecutive, START_DATE values, in which the first record has a missing value for DATE_END and the last has a valid value for DATE_END.  So, if you are saying that the only instances in which a missing DATE_END is to be replaced by a valid value is when there are consecutive records with the same DATE_START, then you have a fairly straightforward problem. 

 

Namely when you encounter a missing DATE_END value in the case of consecutive identical DATE_START records, you can assign DATE_END=_NEXT_START-1, which just supports continuing the loop to the next record, where the real DATE_END becomes available.  Here's my suggestion with a single new statement:

 

data have;
  infile cards expandtabs truncover;
  input patientid	name & $10.	(date_start	date_end) (:ddmmyy10.);
  format date_start	date_end ddmmyy10.;
datalines;
1	drug A	23.06.2015	.
1	drug A	23.06.2015	25.09.2015
1	drug A	25.09.2015	.
1	drug A	25.09.2015	12.11.2015
1	drug A	13.11.2015	22.12.2015
2	drug B	12.11.2015	25.11.2015
2	drug B	26.11.2015	.
2	drug B	26.11.2015	22.06.2016
2	drug B	12.11.2016	25.11.2016
3	drug C	12.11.2015	25.11.2015
3	drug C	26.11.2015	.
3	drug C	12.11.2016	25.11.2016
4	drug D	12.11.2015	25.11.2015
4	drug D	26.11.2015	.
4	drug D	26.11.2015	22.06.2016
4	drug D	12.11.2016	25.11.2016
run;

data want (drop=_:);
  do until ((date_end+1 < _next_start) or last.name=1);
    set have;
    by patientid name  date_start;
    merge have have (firstobs=2 keep=date_start rename=(date_start=_next_start));
    _min_start=min(_min_start,date_start);
    if date_end=. and last.date_start=0 then date_end=_next_start-1;
  end;
  date_start=_min_start;
run;

The "if date_end=. and last.date_start=0" condition is a test for missing date_end, but the record can not be the last one for a given date_start.  And the BY statement has been changed to
  by patientid name date_start.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Aljosa
Fluorite | Level 6

Dear all,

Thank you for providing me with your various solutions. I am sorry for my late answer.

@mkeintz, I used your last program as the solution. But first I cleaned my data :

-I did "proc sort" on a rows that have the same start date but different end date,

-I deleted all drugs that have missing start date and end date.

Then I run your program, and after I checked all data fusions on all occurrences, and all looks to work perfectly.Smiley Very Happy

Thank you so much.

Have a nice day.

Aljosa

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 17 replies
  • 1959 views
  • 1 like
  • 6 in conversation