BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Fettah
Fluorite | Level 6

I have a dataset that looks like this:

ID Date       Status
1  1/1/2010   A
1  2/14/2010  B
1  3/1/2010   .
2  1/11/2010  A
2  1/20/2010  .

I need to a dataset where observations are daily increments of the 2 dates, and I want to keep the same 'status' until the next date.

Essentially, I would like my final dataset to somehow look like this:

ID Date      Status
1  1/1/2010  A
1  1/2/2010  A
1  1/3/2010  A
1  ...       A
1  2/13/2010 A
1  2/14/2010 B
1  2/15/2010 B
1  ...       B
1  3/1/2010  B
2  1/11/2010 A
2  1/12/2010 A
2  ...       A
3  1/20/2010 A

This seems to be a simple enough problem, but I can't seem to find the most efficient way to do this. 

 

Thanks for your help!

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

I think that the following does what you want to accomplish:

 

data have;
  informat date mmddyy10.;
  format date mmddyy10.;
  input ID Date       Status $;
  cards;
1  1/1/2010   A
1  2/14/2010  B
1  3/1/2010   .
2  1/11/2010  A
2  1/20/2010  .
;

data want (drop=n_:);
  merge have have (firstobs=2 rename=(ID=n_ID date=n_date status=n_status));
  if not missing(status) then output;
  if ID eq n_ID then do;
    if missing(n_status) then n_end=n_date;
    else n_end=n_date-1;
    do date=date+1 to n_end;
      output;
    end;
  end;
run;

Art, CEO, AnalystFinder.com

 

View solution in original post

9 REPLIES 9
art297
Opal | Level 21

I think that the following does what you want to accomplish:

 

data have;
  informat date mmddyy10.;
  format date mmddyy10.;
  input ID Date       Status $;
  cards;
1  1/1/2010   A
1  2/14/2010  B
1  3/1/2010   .
2  1/11/2010  A
2  1/20/2010  .
;

data want (drop=n_:);
  merge have have (firstobs=2 rename=(ID=n_ID date=n_date status=n_status));
  if not missing(status) then output;
  if ID eq n_ID then do;
    if missing(n_status) then n_end=n_date;
    else n_end=n_date-1;
    do date=date+1 to n_end;
      output;
    end;
  end;
run;

Art, CEO, AnalystFinder.com

 

Fettah
Fluorite | Level 6
Thank you, Art. I wish I could choose multiple replies as the "solution" because this one also works like a charm.
art297
Opal | Level 21

I suggest you compare @s_lassen's results with those from the code I suggested. His leaves off the last record, while mine includes it.

 

Art, CEO, AnalystFinder.com

 

Fettah
Fluorite | Level 6
You're right Art. After applying the codes yours is the more accurate solution. Thanks for pointing it out.
ptimusk
Obsidian | Level 7

Some thoughts on a solution.

 

Putting the dates in macro variables then subtracting them for each interval may work.

 

Your dataset is call 'have'

 

Data _null_;

set have end=no_more;

call symputx('Date'||left(_n_),date);

If no_more then call symputx('num_records',(_n_));

run;

 

now you dates in macro variables date1 date2 etc..

 

Run a PROC FREQ on the variable ID to get the size of a do loop inside a datastep to make new records. Loop over all unique IDs 

 

proc freq data=have;

var ID / out=distinctID;

run;

 

now you you have a dataset distinctID

 

data _null_;

set distinctID end=no_more;

call symputx ('ID'||left(_n_),ID);

call symputx('CounID'||left(_n_),count);

if no_more then call symputx('num_distinctIDs',(_n_));

run;

 

 

Ksharp
Super User
data have;
  informat date mmddyy10.;
  format date mmddyy10.;
  input ID Date       Status $;
  cards;
1  1/1/2010   A
1  2/14/2010  B
1  3/1/2010   .
2  1/11/2010  A
2  1/20/2010  .
;
run;

data temp;
 set have;
 by id;
 length _status $ 40;
 retain _status;
 if first.id then call missing(_status);
 if not missing(status) then _status=status;
 drop status;
run;

data want;
 merge temp temp(keep=id date rename=(id=_id date=_date) firstobs=2);
 output;
 if id=_id then do;
   do i=date+1 to _date-1;
    date=i;output;
   end;
 end;
drop _id _date i;
run;

DaveShea
Lapis Lazuli | Level 10

Hi Fettah,

 

An alternative solution to this might be one that does not use the DATA step MERGE.

 

The code below might be useful.

Data Have;
Input ID
      Date  MMDDYY10.
  @15 Status $1.
      ;
Format  Date Date9.;
DATALINES;
1  1/1/2010   A
1  2/14/2010  B
1  3/1/2010   .
2  1/11/2010  A
2  1/20/2010  .
;
Run;

**************************************************************;
*Make sure your rows are in the correct order.                ;
**************************************************************;
Proc Sort Data=Have;
 By ID
    Date
    ;
Run;

**************************************************************;
*Now get the dataset that we want.......                      ;
**************************************************************;
Data Want;
 ********************************************************************************************;
 *Use a NULL Informat statement to force the order of our columns just to make reading easier;
 ********************************************************************************************;
 Informat ID
          Date
          Status
          ;
 ******************************************************************************;
 *Set these two variables to hold their values between passes of the DATA step.;
 ******************************************************************************;
 Retain prevDate   .
        prevStatus ""
        ;
 *******************************************************************************;
 *Go and get our dataset and RENAME our two variables that we want to manipulate;
 *******************************************************************************;
 Set Have(Rename=(Date=xDate Status=xStatus));
  By ID
     xDate
     ;

********************************************************************************;
*If this is anything BUT the first row for a group of rows sharing the same ID: ;
*1) Trap the Date value held over from the previous row as our StartDate.       ;
*2.1) If this is the final row in our group of rows with same ID, trap its Date ;
*     value as our StopDate.                                                    ;
*2.2) If this is NOT the final row in our group of rows with the same ID,       ;
*     subtract 1 day from its Date value and use this as our StopDate.          ;
********************************************************************************;
If NOT First.ID Then
    Do;
        StartDate=prevDate;
        If Last.ID Then
            Do;
                StopDate=xDate;
            End;
            Else
            Do;
                StopDate=xDate-1;
            End;
        ********************************************************************************;
        *3) Loop over StartDate to StopDate and Output one row per pass of the loop.    ;
        *   We will use the value of Status held over from the previous row.            ;
        ********************************************************************************;
        Status=prevStatus;
        Do Date=StartDate To StopDate;
            Output;
        End;

    End;

********************************************************************************************;
*Set aside the Date and Status values from this row ready for the next pass of the DATA step;
********************************************************************************************;
prevDate=xDate;
prevStatus=xStatus;


**************************************************************;
*Tidy up the dataset.......                                   ;
**************************************************************;
Format  prevDate Date xDate Date9.;

Drop prevDate
     prevStatus
     StartDate
     StopDate
     xDate
     xStatus
     ;

Run;

I hope that this helps more than it hinders.

 

Cheers,

 

Downunder Dave.

s_lassen
Meteorite | Level 14

Assuming your date is sorted by ID and date:

data want;
  set have end=done;
  by ID;
  if not done then
    set have(keep=date rename=(date=next_date) firstobs=2);
  if not missing(status);
  output;
  if not last.ID then do date=date+1 to next_date-1;
    output;
    end;
  drop next_date;
run;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 3623 views
  • 4 likes
  • 6 in conversation