I have a dataset that looks like this:
ID Date Status 1 1/1/2010 A 1 2/14/2010 B 1 3/1/2010 . 2 1/11/2010 A 2 1/20/2010 .
I need to a dataset where observations are daily increments of the 2 dates, and I want to keep the same 'status' until the next date.
Essentially, I would like my final dataset to somehow look like this:
ID Date Status 1 1/1/2010 A 1 1/2/2010 A 1 1/3/2010 A 1 ... A 1 2/13/2010 A 1 2/14/2010 B 1 2/15/2010 B 1 ... B 1 3/1/2010 B 2 1/11/2010 A 2 1/12/2010 A 2 ... A 3 1/20/2010 A
This seems to be a simple enough problem, but I can't seem to find the most efficient way to do this.
Thanks for your help!
I think that the following does what you want to accomplish:
data have; informat date mmddyy10.; format date mmddyy10.; input ID Date Status $; cards; 1 1/1/2010 A 1 2/14/2010 B 1 3/1/2010 . 2 1/11/2010 A 2 1/20/2010 . ; data want (drop=n_:); merge have have (firstobs=2 rename=(ID=n_ID date=n_date status=n_status)); if not missing(status) then output; if ID eq n_ID then do; if missing(n_status) then n_end=n_date; else n_end=n_date-1; do date=date+1 to n_end; output; end; end; run;
Art, CEO, AnalystFinder.com
I think that the following does what you want to accomplish:
data have; informat date mmddyy10.; format date mmddyy10.; input ID Date Status $; cards; 1 1/1/2010 A 1 2/14/2010 B 1 3/1/2010 . 2 1/11/2010 A 2 1/20/2010 . ; data want (drop=n_:); merge have have (firstobs=2 rename=(ID=n_ID date=n_date status=n_status)); if not missing(status) then output; if ID eq n_ID then do; if missing(n_status) then n_end=n_date; else n_end=n_date-1; do date=date+1 to n_end; output; end; end; run;
Art, CEO, AnalystFinder.com
I suggest you compare @s_lassen's results with those from the code I suggested. His leaves off the last record, while mine includes it.
Art, CEO, AnalystFinder.com
Some thoughts on a solution.
Putting the dates in macro variables then subtracting them for each interval may work.
Your dataset is call 'have'
Data _null_;
set have end=no_more;
call symputx('Date'||left(_n_),date);
If no_more then call symputx('num_records',(_n_));
run;
now you dates in macro variables date1 date2 etc..
Run a PROC FREQ on the variable ID to get the size of a do loop inside a datastep to make new records. Loop over all unique IDs
proc freq data=have;
var ID / out=distinctID;
run;
now you you have a dataset distinctID
data _null_;
set distinctID end=no_more;
call symputx ('ID'||left(_n_),ID);
call symputx('CounID'||left(_n_),count);
if no_more then call symputx('num_distinctIDs',(_n_));
run;
data have;
informat date mmddyy10.;
format date mmddyy10.;
input ID Date Status $;
cards;
1 1/1/2010 A
1 2/14/2010 B
1 3/1/2010 .
2 1/11/2010 A
2 1/20/2010 .
;
run;
data temp;
set have;
by id;
length _status $ 40;
retain _status;
if first.id then call missing(_status);
if not missing(status) then _status=status;
drop status;
run;
data want;
merge temp temp(keep=id date rename=(id=_id date=_date) firstobs=2);
output;
if id=_id then do;
do i=date+1 to _date-1;
date=i;output;
end;
end;
drop _id _date i;
run;
Hi Fettah,
An alternative solution to this might be one that does not use the DATA step MERGE.
The code below might be useful.
Data Have;
Input ID
Date MMDDYY10.
@15 Status $1.
;
Format Date Date9.;
DATALINES;
1 1/1/2010 A
1 2/14/2010 B
1 3/1/2010 .
2 1/11/2010 A
2 1/20/2010 .
;
Run;
**************************************************************;
*Make sure your rows are in the correct order. ;
**************************************************************;
Proc Sort Data=Have;
By ID
Date
;
Run;
**************************************************************;
*Now get the dataset that we want....... ;
**************************************************************;
Data Want;
********************************************************************************************;
*Use a NULL Informat statement to force the order of our columns just to make reading easier;
********************************************************************************************;
Informat ID
Date
Status
;
******************************************************************************;
*Set these two variables to hold their values between passes of the DATA step.;
******************************************************************************;
Retain prevDate .
prevStatus ""
;
*******************************************************************************;
*Go and get our dataset and RENAME our two variables that we want to manipulate;
*******************************************************************************;
Set Have(Rename=(Date=xDate Status=xStatus));
By ID
xDate
;
********************************************************************************;
*If this is anything BUT the first row for a group of rows sharing the same ID: ;
*1) Trap the Date value held over from the previous row as our StartDate. ;
*2.1) If this is the final row in our group of rows with same ID, trap its Date ;
* value as our StopDate. ;
*2.2) If this is NOT the final row in our group of rows with the same ID, ;
* subtract 1 day from its Date value and use this as our StopDate. ;
********************************************************************************;
If NOT First.ID Then
Do;
StartDate=prevDate;
If Last.ID Then
Do;
StopDate=xDate;
End;
Else
Do;
StopDate=xDate-1;
End;
********************************************************************************;
*3) Loop over StartDate to StopDate and Output one row per pass of the loop. ;
* We will use the value of Status held over from the previous row. ;
********************************************************************************;
Status=prevStatus;
Do Date=StartDate To StopDate;
Output;
End;
End;
********************************************************************************************;
*Set aside the Date and Status values from this row ready for the next pass of the DATA step;
********************************************************************************************;
prevDate=xDate;
prevStatus=xStatus;
**************************************************************;
*Tidy up the dataset....... ;
**************************************************************;
Format prevDate Date xDate Date9.;
Drop prevDate
prevStatus
StartDate
StopDate
xDate
xStatus
;
Run;
I hope that this helps more than it hinders.
Cheers,
Downunder Dave.
Assuming your date is sorted by ID and date:
data want;
set have end=done;
by ID;
if not done then
set have(keep=date rename=(date=next_date) firstobs=2);
if not missing(status);
output;
if not last.ID then do date=date+1 to next_date-1;
output;
end;
drop next_date;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.