I have a dataset that looks like this:
ID Date Status 1 1/1/2010 A 1 2/14/2010 B 1 3/1/2010 . 2 1/11/2010 A 2 1/20/2010 .
I need to a dataset where observations are daily increments of the 2 dates, and I want to keep the same 'status' until the next date.
Essentially, I would like my final dataset to somehow look like this:
ID Date Status 1 1/1/2010 A 1 1/2/2010 A 1 1/3/2010 A 1 ... A 1 2/13/2010 A 1 2/14/2010 B 1 2/15/2010 B 1 ... B 1 3/1/2010 B 2 1/11/2010 A 2 1/12/2010 A 2 ... A 3 1/20/2010 A
This seems to be a simple enough problem, but I can't seem to find the most efficient way to do this.
Thanks for your help!
I think that the following does what you want to accomplish:
data have; informat date mmddyy10.; format date mmddyy10.; input ID Date Status $; cards; 1 1/1/2010 A 1 2/14/2010 B 1 3/1/2010 . 2 1/11/2010 A 2 1/20/2010 . ; data want (drop=n_:); merge have have (firstobs=2 rename=(ID=n_ID date=n_date status=n_status)); if not missing(status) then output; if ID eq n_ID then do; if missing(n_status) then n_end=n_date; else n_end=n_date-1; do date=date+1 to n_end; output; end; end; run;
Art, CEO, AnalystFinder.com
I think that the following does what you want to accomplish:
data have; informat date mmddyy10.; format date mmddyy10.; input ID Date Status $; cards; 1 1/1/2010 A 1 2/14/2010 B 1 3/1/2010 . 2 1/11/2010 A 2 1/20/2010 . ; data want (drop=n_:); merge have have (firstobs=2 rename=(ID=n_ID date=n_date status=n_status)); if not missing(status) then output; if ID eq n_ID then do; if missing(n_status) then n_end=n_date; else n_end=n_date-1; do date=date+1 to n_end; output; end; end; run;
Art, CEO, AnalystFinder.com
I suggest you compare @s_lassen's results with those from the code I suggested. His leaves off the last record, while mine includes it.
Art, CEO, AnalystFinder.com
Some thoughts on a solution.
Putting the dates in macro variables then subtracting them for each interval may work.
Your dataset is call 'have'
Data _null_;
set have end=no_more;
call symputx('Date'||left(_n_),date);
If no_more then call symputx('num_records',(_n_));
run;
now you dates in macro variables date1 date2 etc..
Run a PROC FREQ on the variable ID to get the size of a do loop inside a datastep to make new records. Loop over all unique IDs
proc freq data=have;
var ID / out=distinctID;
run;
now you you have a dataset distinctID
data _null_;
set distinctID end=no_more;
call symputx ('ID'||left(_n_),ID);
call symputx('CounID'||left(_n_),count);
if no_more then call symputx('num_distinctIDs',(_n_));
run;
data have;
informat date mmddyy10.;
format date mmddyy10.;
input ID Date Status $;
cards;
1 1/1/2010 A
1 2/14/2010 B
1 3/1/2010 .
2 1/11/2010 A
2 1/20/2010 .
;
run;
data temp;
set have;
by id;
length _status $ 40;
retain _status;
if first.id then call missing(_status);
if not missing(status) then _status=status;
drop status;
run;
data want;
merge temp temp(keep=id date rename=(id=_id date=_date) firstobs=2);
output;
if id=_id then do;
do i=date+1 to _date-1;
date=i;output;
end;
end;
drop _id _date i;
run;
Hi Fettah,
An alternative solution to this might be one that does not use the DATA step MERGE.
The code below might be useful.
Data Have;
Input ID
Date MMDDYY10.
@15 Status $1.
;
Format Date Date9.;
DATALINES;
1 1/1/2010 A
1 2/14/2010 B
1 3/1/2010 .
2 1/11/2010 A
2 1/20/2010 .
;
Run;
**************************************************************;
*Make sure your rows are in the correct order. ;
**************************************************************;
Proc Sort Data=Have;
By ID
Date
;
Run;
**************************************************************;
*Now get the dataset that we want....... ;
**************************************************************;
Data Want;
********************************************************************************************;
*Use a NULL Informat statement to force the order of our columns just to make reading easier;
********************************************************************************************;
Informat ID
Date
Status
;
******************************************************************************;
*Set these two variables to hold their values between passes of the DATA step.;
******************************************************************************;
Retain prevDate .
prevStatus ""
;
*******************************************************************************;
*Go and get our dataset and RENAME our two variables that we want to manipulate;
*******************************************************************************;
Set Have(Rename=(Date=xDate Status=xStatus));
By ID
xDate
;
********************************************************************************;
*If this is anything BUT the first row for a group of rows sharing the same ID: ;
*1) Trap the Date value held over from the previous row as our StartDate. ;
*2.1) If this is the final row in our group of rows with same ID, trap its Date ;
* value as our StopDate. ;
*2.2) If this is NOT the final row in our group of rows with the same ID, ;
* subtract 1 day from its Date value and use this as our StopDate. ;
********************************************************************************;
If NOT First.ID Then
Do;
StartDate=prevDate;
If Last.ID Then
Do;
StopDate=xDate;
End;
Else
Do;
StopDate=xDate-1;
End;
********************************************************************************;
*3) Loop over StartDate to StopDate and Output one row per pass of the loop. ;
* We will use the value of Status held over from the previous row. ;
********************************************************************************;
Status=prevStatus;
Do Date=StartDate To StopDate;
Output;
End;
End;
********************************************************************************************;
*Set aside the Date and Status values from this row ready for the next pass of the DATA step;
********************************************************************************************;
prevDate=xDate;
prevStatus=xStatus;
**************************************************************;
*Tidy up the dataset....... ;
**************************************************************;
Format prevDate Date xDate Date9.;
Drop prevDate
prevStatus
StartDate
StopDate
xDate
xStatus
;
Run;
I hope that this helps more than it hinders.
Cheers,
Downunder Dave.
Assuming your date is sorted by ID and date:
data want;
set have end=done;
by ID;
if not done then
set have(keep=date rename=(date=next_date) firstobs=2);
if not missing(status);
output;
if not last.ID then do date=date+1 to next_date-1;
output;
end;
drop next_date;
run;
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.