I have a data set like below.
PTID new_stdt new_end
P3747440 02Nov2011 29Dec2011
P3747440 03Nov2011 02Dec2011
P3747440 01Dec2011 30Dec2011
P3747440 30Dec2011 31Dec2011
I want to line up the dates .So that my new data looks like below.
PTID new_stdt new_end
P3747440 2-Nov-11 2-Nov-11
P3747440 3-Nov-11 30-Nov-11
P3747440 1-Dec-11 1-Dec-11
P3747440 2-Dec-11 29-Dec-11
P3747440 30-Dec-11 30-Dec-11
P3747440 31-Dec-11 31-Dec-11
How can I do this ?Any leads will be appreciated.
Regards,
Aninda
What is the logic to go from have to want, I see no logical reason why row 1 would have 2nov11 and the second row would have 30nov11.
Hello.
Thanks for your reply.My objective is to make the dates consecutive with no gaps.
Regards,
Aninda
Am in a meeting so can't look too much, but I think your want data may be wrong. This code should get you near:
data have; informat new_stdt new_end date9.; input ptid $ new_stdt new_end; format new_stdt new_end date9.; datalines; P3747440 02Nov2011 29Dec2011 P3747440 03Nov2011 02Dec2011 P3747440 01Dec2011 30Dec2011 P3747440 30Dec2011 31Dec2011 ; run; data inter; set have (keep=ptid new_stdt) have (keep=ptid new_end rename=(new_end=new_stdt)); run; proc sort data=inter; by ptid new_stdt; run; data want; set inter (rename=(new_stdt=stdt)); format new_stdt new_end date9.; by ptid; retain lstd; if first.ptid then lstd=stdt; else do; new_stdt=lstd; new_end=stdt-1; output; end; lstd=stdt; run;
Is the rule like that:
a) time ranges must not overlap
b) time ranges need to be split along calender months
?
Like @RW9, I think your expected output is wrong. I can't understand why your expected new data doesn't have 7 rows and include a row with new_stdt & new_end=29-Dec-11
PTID new_stdt new_end
P3747440 2-Nov-11 2-Nov-11
P3747440 3-Nov-11 30-Nov-11
P3747440 1-Dec-11 1-Dec-11
P3747440 2-Dec-11 28-Dec-11
P3747440 29-Dec-11 29-Dec-11
P3747440 30-Dec-11 30-Dec-11
P3747440 31-Dec-11 31-Dec-11
I think basically you are taking all your uniques dates regardless of whether it's a start or end date, and then using those to create a new start_date. The corresponding end date is then equal to the next start date -1. Except for the last date, where it looks like you want new_end=new_stdt
In my attempt to code this I've reversed the date order because i'm comparing to the next date. It starts with @RW9's code as far as the INTER data table.
data have;
informat new_stdt new_end date9.;
input ptid $ new_stdt new_end;
format new_stdt new_end date9.;
datalines;
P3747440 02Nov2011 29Dec2011
P3747440 03Nov2011 02Dec2011
P3747440 01Dec2011 30Dec2011
P3747440 30Dec2011 31Dec2011
;
run;
data inter;
set have (keep=ptid new_stdt) have (keep=ptid new_end rename=(new_end=new_stdt));
run;
proc sort data=inter nodup;
by ptid descending new_stdt;
run;
data want (keep= ptid new_stdt new_end);
format new_end date9.;
set inter;
retain lagdate ;
by ptid;
diff=lagdate-new_stdt;
if lagdate-new_stdt in (.,1) then new_end=new_stdt;else new_end=lagdate-1;
output;
if last.ptid then lagdate=. ;else lagdate=new_stdt ;
run;
proc sort data=want; by ptid new_stdt;run;
Like @RW9 , I think your expected data may be wrong. I don't see why it doesn't look like this.
PTID new_stdt new_end
P3747440 2-Nov-11 2-Nov-11
P3747440 3-Nov-11 30-Nov-11
P3747440 1-Dec-11 1-Dec-11
P3747440 2-Dec-11 28-Dec-11
P3747440 29-Dec-11 29-Dec-11
P3747440 30-Dec-11 30-Dec-11
P3747440 31-Dec-11 31-Dec-11
I think you are taking every available date and creating a period between each date & the day before the next date.
Then on the last date, that is the start date and the end date.
@RW9code does this , but needs to include NODUP when sorting data table 'INTER' .
Also add a line right at the end of the 'DATA WANT' step
...
if last.ptid then do;
new_end=stdt;
output;
end;
run;
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.