BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Fettah
Fluorite | Level 6

I have a dataset that looks like this:

ID Date       Status
1  1/1/2010   A
1  2/14/2010  B
1  3/1/2010   .
2  1/11/2010  A
2  1/20/2010  .

I need to a dataset where observations are daily increments of the 2 dates, and I want to keep the same 'status' until the next date.

Essentially, I would like my final dataset to somehow look like this:

ID Date      Status
1  1/1/2010  A
1  1/2/2010  A
1  1/3/2010  A
1  ...       A
1  2/13/2010 A
1  2/14/2010 B
1  2/15/2010 B
1  ...       B
1  3/1/2010  B
2  1/11/2010 A
2  1/12/2010 A
2  ...       A
3  1/20/2010 A

This seems to be a simple enough problem, but I can't seem to find the most efficient way to do this. 

 

Thanks for your help!

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

I think that the following does what you want to accomplish:

 

data have;
  informat date mmddyy10.;
  format date mmddyy10.;
  input ID Date       Status $;
  cards;
1  1/1/2010   A
1  2/14/2010  B
1  3/1/2010   .
2  1/11/2010  A
2  1/20/2010  .
;

data want (drop=n_:);
  merge have have (firstobs=2 rename=(ID=n_ID date=n_date status=n_status));
  if not missing(status) then output;
  if ID eq n_ID then do;
    if missing(n_status) then n_end=n_date;
    else n_end=n_date-1;
    do date=date+1 to n_end;
      output;
    end;
  end;
run;

Art, CEO, AnalystFinder.com

 

View solution in original post

9 REPLIES 9
art297
Opal | Level 21

I think that the following does what you want to accomplish:

 

data have;
  informat date mmddyy10.;
  format date mmddyy10.;
  input ID Date       Status $;
  cards;
1  1/1/2010   A
1  2/14/2010  B
1  3/1/2010   .
2  1/11/2010  A
2  1/20/2010  .
;

data want (drop=n_:);
  merge have have (firstobs=2 rename=(ID=n_ID date=n_date status=n_status));
  if not missing(status) then output;
  if ID eq n_ID then do;
    if missing(n_status) then n_end=n_date;
    else n_end=n_date-1;
    do date=date+1 to n_end;
      output;
    end;
  end;
run;

Art, CEO, AnalystFinder.com

 

Fettah
Fluorite | Level 6
Thank you, Art. I wish I could choose multiple replies as the "solution" because this one also works like a charm.
art297
Opal | Level 21

I suggest you compare @s_lassen's results with those from the code I suggested. His leaves off the last record, while mine includes it.

 

Art, CEO, AnalystFinder.com

 

Fettah
Fluorite | Level 6
You're right Art. After applying the codes yours is the more accurate solution. Thanks for pointing it out.
ptimusk
Obsidian | Level 7

Some thoughts on a solution.

 

Putting the dates in macro variables then subtracting them for each interval may work.

 

Your dataset is call 'have'

 

Data _null_;

set have end=no_more;

call symputx('Date'||left(_n_),date);

If no_more then call symputx('num_records',(_n_));

run;

 

now you dates in macro variables date1 date2 etc..

 

Run a PROC FREQ on the variable ID to get the size of a do loop inside a datastep to make new records. Loop over all unique IDs 

 

proc freq data=have;

var ID / out=distinctID;

run;

 

now you you have a dataset distinctID

 

data _null_;

set distinctID end=no_more;

call symputx ('ID'||left(_n_),ID);

call symputx('CounID'||left(_n_),count);

if no_more then call symputx('num_distinctIDs',(_n_));

run;

 

 

Ksharp
Super User
data have;
  informat date mmddyy10.;
  format date mmddyy10.;
  input ID Date       Status $;
  cards;
1  1/1/2010   A
1  2/14/2010  B
1  3/1/2010   .
2  1/11/2010  A
2  1/20/2010  .
;
run;

data temp;
 set have;
 by id;
 length _status $ 40;
 retain _status;
 if first.id then call missing(_status);
 if not missing(status) then _status=status;
 drop status;
run;

data want;
 merge temp temp(keep=id date rename=(id=_id date=_date) firstobs=2);
 output;
 if id=_id then do;
   do i=date+1 to _date-1;
    date=i;output;
   end;
 end;
drop _id _date i;
run;

DaveShea
Lapis Lazuli | Level 10

Hi Fettah,

 

An alternative solution to this might be one that does not use the DATA step MERGE.

 

The code below might be useful.

Data Have;
Input ID
      Date  MMDDYY10.
  @15 Status $1.
      ;
Format  Date Date9.;
DATALINES;
1  1/1/2010   A
1  2/14/2010  B
1  3/1/2010   .
2  1/11/2010  A
2  1/20/2010  .
;
Run;

**************************************************************;
*Make sure your rows are in the correct order.                ;
**************************************************************;
Proc Sort Data=Have;
 By ID
    Date
    ;
Run;

**************************************************************;
*Now get the dataset that we want.......                      ;
**************************************************************;
Data Want;
 ********************************************************************************************;
 *Use a NULL Informat statement to force the order of our columns just to make reading easier;
 ********************************************************************************************;
 Informat ID
          Date
          Status
          ;
 ******************************************************************************;
 *Set these two variables to hold their values between passes of the DATA step.;
 ******************************************************************************;
 Retain prevDate   .
        prevStatus ""
        ;
 *******************************************************************************;
 *Go and get our dataset and RENAME our two variables that we want to manipulate;
 *******************************************************************************;
 Set Have(Rename=(Date=xDate Status=xStatus));
  By ID
     xDate
     ;

********************************************************************************;
*If this is anything BUT the first row for a group of rows sharing the same ID: ;
*1) Trap the Date value held over from the previous row as our StartDate.       ;
*2.1) If this is the final row in our group of rows with same ID, trap its Date ;
*     value as our StopDate.                                                    ;
*2.2) If this is NOT the final row in our group of rows with the same ID,       ;
*     subtract 1 day from its Date value and use this as our StopDate.          ;
********************************************************************************;
If NOT First.ID Then
    Do;
        StartDate=prevDate;
        If Last.ID Then
            Do;
                StopDate=xDate;
            End;
            Else
            Do;
                StopDate=xDate-1;
            End;
        ********************************************************************************;
        *3) Loop over StartDate to StopDate and Output one row per pass of the loop.    ;
        *   We will use the value of Status held over from the previous row.            ;
        ********************************************************************************;
        Status=prevStatus;
        Do Date=StartDate To StopDate;
            Output;
        End;

    End;

********************************************************************************************;
*Set aside the Date and Status values from this row ready for the next pass of the DATA step;
********************************************************************************************;
prevDate=xDate;
prevStatus=xStatus;


**************************************************************;
*Tidy up the dataset.......                                   ;
**************************************************************;
Format  prevDate Date xDate Date9.;

Drop prevDate
     prevStatus
     StartDate
     StopDate
     xDate
     xStatus
     ;

Run;

I hope that this helps more than it hinders.

 

Cheers,

 

Downunder Dave.

s_lassen
Meteorite | Level 14

Assuming your date is sorted by ID and date:

data want;
  set have end=done;
  by ID;
  if not done then
    set have(keep=date rename=(date=next_date) firstobs=2);
  if not missing(status);
  output;
  if not last.ID then do date=date+1 to next_date-1;
    output;
    end;
  drop next_date;
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 3459 views
  • 4 likes
  • 6 in conversation