BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Fluorite | Level 6

Hi all,

I will provide few more examples of my data, because it is more complex. The dot means missing data, it also means the treatment is "ongoing"

If I have this :

1drug the dot disappear, regarding the following line
1drug A23.06.201525.09.2015 
1drug the dot disappear, regarding the following line
1drug A25.09.201512.11.2015 
1drug A13.11.201522.12.2015 
2drug B12.11.201525.11.2015 
2drug the dot disappear, regarding the following line
2drug B26.11.201522.06.2016 
2drug B12.11.201625.11.2016 
3drug C12.11.201525.11.2015 
3drug we don't know what happened from 26.11.2015 to 12.11.2016, so we need to keep the dot.
3drug C12.11.201625.11.2016 
4drug D12.11.201525.11.2015 
4drug the dot disappear, regarding the following line
4drug D26.11.201522.06.2016 
4drug D12.11.201625.11.2016 


I would like in my final table to have this:

1drug A23.06.201522.12.2015
2drug B12.11.201522.06.2016
2drug B12.11.201625.11.2016
3drug C12.11.2015.
3drug C12.11.201625.11.2016
4drug D12.11.201522.06.2016
4drug D12.11.201625.11.2016


Note: when we compare the previous end date with the following start date, we can say that drug intake is continuous if those two dates are same, or if the following start date is = previous end date+1.

Also, note that I never have a dot in start date, its always in end date.


It looks so complicated to me, I have a feeling this thing is impossible to resolve...

Thank you very much +++

Kind regards.


Your initial comment about treatment of missing date values (apparently you never have missing values for DATE_START, only for DATE_END, is that right?) referred to introducing some sort of max date.  I initially took this to mean some sort of study-wide maximum date, but I don't see any "max date" rule being applied here.


I do see instances of tied, consecutive, START_DATE values, in which the first record has a missing value for DATE_END and the last has a valid value for DATE_END.  So, if you are saying that the only instances in which a missing DATE_END is to be replaced by a valid value is when there are consecutive records with the same DATE_START, then you have a fairly straightforward problem. 


Namely when you encounter a missing DATE_END value in the case of consecutive identical DATE_START records, you can assign DATE_END=_NEXT_START-1, which just supports continuing the loop to the next record, where the real DATE_END becomes available.  Here's my suggestion with a single new statement:


data have;
  infile cards expandtabs truncover;
  input patientid	name & $10.	(date_start	date_end) (:ddmmyy10.);
  format date_start	date_end ddmmyy10.;
1	drug A	23.06.2015	.
1	drug A	23.06.2015	25.09.2015
1	drug A	25.09.2015	.
1	drug A	25.09.2015	12.11.2015
1	drug A	13.11.2015	22.12.2015
2	drug B	12.11.2015	25.11.2015
2	drug B	26.11.2015	.
2	drug B	26.11.2015	22.06.2016
2	drug B	12.11.2016	25.11.2016
3	drug C	12.11.2015	25.11.2015
3	drug C	26.11.2015	.
3	drug C	12.11.2016	25.11.2016
4	drug D	12.11.2015	25.11.2015
4	drug D	26.11.2015	.
4	drug D	26.11.2015	22.06.2016
4	drug D	12.11.2016	25.11.2016

data want (drop=_:);
  do until ((date_end+1 < _next_start) or;
    set have;
    by patientid name  date_start;
    merge have have (firstobs=2 keep=date_start rename=(date_start=_next_start));
    if date_end=. and last.date_start=0 then date_end=_next_start-1;

The "if date_end=. and last.date_start=0" condition is a test for missing date_end, but the record can not be the last one for a given date_start.  And the BY statement has been changed to
  by patientid name date_start.

The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

Fluorite | Level 6

Dear all,

Thank you for providing me with your various solutions. I am sorry for my late answer.

@mkeintz, I used your last program as the solution. But first I cleaned my data :

-I did "proc sort" on a rows that have the same start date but different end date,

-I deleted all drugs that have missing start date and end date.

Then I run your program, and after I checked all data fusions on all occurrences, and all looks to work perfectly.Smiley Very Happy

Thank you so much.

Have a nice day.


Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 17 replies
  • 1 like
  • 6 in conversation