Ronein
Meteorite | Level 14

Hello

I have a data set with multiple rows per customer ID, where each row relates to a specific date.

I would like to perform two tasks:

Task 1:

Add rows for the dates that are missing.

Task 2:

For each customer, fill the added rows (every variable except the date) with the values from the previous non-missing row.

[Image: Ronein_0-1703503604124.png (screenshot of the desired output)]

data tbl1;
format date ddmmyy10.;
input ID date :date9. Y;
cards;
1 01DEC2023 10
1 04DEC2023 20
1 05DEC2023 30
2 02DEC2023 40
2 04DEC2023 50
2 05DEC2023 60
;
Run;
 
proc sql  noprint;
select  min(date) as FROMDate  into :FROMDate
from tbl1
;
quit;

proc sql  noprint;
select  max(date) as TillDate  into :TillDate
from tbl1
;
quit;


data dates(Keep=date) ;
date=&FROMDate.;
end_date=&TillDate.;
format date  end_date date9.;
do while (date<=end_date);
output;
date=intnx('day', date, 1, 's');
end;
format date ddmmyy10.;
run;

 

 

 

1 ACCEPTED SOLUTION

PaigeMiller
Diamond | Level 26

First, I don't see a reason to use macro variables or SQL here. Neither is a good fit when you have to work row by row to get the desired answer. The solution is to write a DATA step that checks for missing dates and, where it finds a gap, loops through the missing dates.

 

Here is my solution:

 

data tbl1;
format date ddmmyy10.;
input ID date :date9. Y;
cards;
1 01DEC2023 10
1 04DEC2023 20
1 05DEC2023 30
2 02DEC2023 40
2 04DEC2023 50
2 05DEC2023 60
;
data want;
    set tbl1;
    previd=lag(id);
    prevdate=lag(date);
    prevy=lag(y);
    /* Check for missing dates */
    if prevdate^=(date-1) and id=previd then do;
        /* Loop over missing dates */
        do iter=1 to (date-prevdate);
            newdate=prevdate+iter;
            if newdate<date then newy=prevy;
            else newy=y;
            output;
        end;
    end;
    else do; /* No missing dates found */
        newdate=date;
        newy=y;
        output;
    end;
    format newdate date9.;
    drop prev: iter;
run;
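
A follow-up sketch, not part of the original answer: the step above keeps the original date and y columns next to newdate and newy. If only the gap-filled series is wanted, one option is to drop the originals and rename the new columns back. The data set name want_clean is made up for illustration.

```sas
/* Hypothetical clean-up step: keep only the gap-filled columns. */
data want_clean;
    /* DROP= is applied before RENAME=, so the original date and y are
       removed first and newdate/newy can then take over their names. */
    set want(drop=date y rename=(newdate=date newy=y));
run;

proc print data=want_clean;
run;
```

With the sample data this should list nine rows: five for ID 1 and four for ID 2.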
--
Paige Miller


6 REPLIES
sbxkoenk
SAS Super FREQ

If you have SAS/ETS in your license key, you can use PROC EXPAND:

data tbl1;
format date ddmmyy10.;
input ID date :date9. Y;
cards;
1 01DEC2023 10
1 04DEC2023 20
1 05DEC2023 30
2 02DEC2023 40
2 04DEC2023 50
2 05DEC2023 60
;
Run;

proc expand data=tbl1
            out=tbl2
            to=day method=step
            plots=(input output);
  by ID;
  id date;
run;
/* end of program */
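
One assumption worth stating: PROC EXPAND with a BY statement expects the input to be sorted by the BY variable, and by the ID (date) variable within each BY group. The sample data already is. For data that is not, a sort beforehand along these lines should be enough:

```sas
/* Sort by customer, then by date, before calling PROC EXPAND. */
proc sort data=tbl1;
  by ID date;
run;
```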

Koen

Tom
Super User

Just build up a skeleton or empty dataset with just the ID and DATE values for all of the days you want.  (No need to get fancy to increment dates by days. Simple arithmetic will work since dates are stored as number of days.)

data empty;
  set have;
  by id;
  if first.id then start=date;
  retain start;
  if last.id;
  end=date;
  do date=start to end;
    output;
  end;
  keep id date;
run;
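
A quick way to see the point about date arithmetic: a SAS date is a day count, so adding 1 gives the same value as INTNX with a 'day' interval. A minimal sketch (the variable names are only illustrative):

```sas
data _null_;
  d = '01DEC2023'd;
  by_arithmetic = d + 1;               /* plain addition */
  by_intnx = intnx('day', d, 1);       /* interval function, same result */
  same = (by_arithmetic = by_intnx);   /* 1 when the two agree */
  put by_arithmetic= date9. by_intnx= date9. same=;
run;
```

Both values print as 02DEC2023 in the log, and same=1.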

Then you can combine them and use some simple last observation carried forward logic to replace the missing values.

data want;
  merge empty have;
  by id date;
  if not first.id then y=coalesce(y,oldy);
  oldy=y;
  retain oldy;
  drop oldy;
run;

Result:

OBS    ID          date     Y

 1      1    2023-12-01    10
 2      1    2023-12-02    10
 3      1    2023-12-03    10
 4      1    2023-12-04    20
 5      1    2023-12-05    30
 6      2    2023-12-02    40
 7      2    2023-12-03    40
 8      2    2023-12-04    50
 9      2    2023-12-05    60

PS  Don't display dates in either DMY or MDY order.  It took me a couple of seconds to figure out why you would want to insert dates in the middle of the months of Feb and March.
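
To illustrate the PS: the same date value can be shown in an unambiguous year-month-day order just by changing the format. The listing above looks like YYMMDD10. output; that is an assumption, since the post does not name the format.

```sas
data _null_;
  d = '01DEC2023'd;
  put d= ddmmyy10.;  /* d=01/12/2023, easy to misread as 12 January */
  put d= yymmdd10.;  /* d=2023-12-01 */
  put d= date9.;     /* d=01DEC2023 */
run;
```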

Patrick
Opal | Level 21

If licensed, I'd use PROC EXPAND as already proposed. Otherwise, here is another code variant that creates the additional rows.

data have;
  format date date9.;
  input ID date :date9. Y;
  cards;
1 01DEC2023 10
1 04DEC2023 20
1 05DEC2023 30
2 02DEC2023 40
2 04DEC2023 50
2 05DEC2023 60
;

data want;
  set have;
  by id date;
  output;

  if not last.id then
    do;
      /* read ahead: If next date not date+1 create additional rows */
      i=_n_+1;
      set have(keep=date rename=(date=_date)) point=i;
      do i= 1 to (_date-date-1);
        date+1;
        output;
      end;
    end;
  drop _date;
run;

proc print data=want;
run;


Ksharp
Super User

data tbl1;
format date ddmmyy10.;
input ID date :date9. Y;
cards;
1 01DEC2023 10
1 04DEC2023 20
1 05DEC2023 30
2 02DEC2023 40
2 04DEC2023 50
2 05DEC2023 60
;
Run;

data want;
  merge tbl1 tbl1(firstobs=2 keep=id date rename=(id=_id date=_date));
  output;
  if id=_id then do;
    do date=date+1 to _date-1;
      output;
    end;
  end;
  drop _id _date;
run;
mkeintz
PROC Star

It looks to me like your code (unlike your sample desired output) intends to carry Y values forward through every missing date up to the global maximum &TILLDATE. Your generation of a global &FROMDATE also implies you may want to create observations with missing values of Y from &FROMDATE up to the day before each ID's first observed date. Otherwise there is no need for &FROMDATE.

 

If you do NOT need to create dates back to &FROMDATE, then:


proc sql noprint;
  select max(date) into :max_date from tbl1;
quit;

data want (drop=_:);
  set tbl1 (keep=id);
  by id;
  merge tbl1
        tbl1 (firstobs=2 keep=date rename=(date=_nxt_date));

  if last.id then _nxt_date=&max_date+1;
  do date=date to _nxt_date-1;
    output;
  end;
run;
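
The look-ahead in the step above comes from merging tbl1 with itself, offset by one row via FIRSTOBS=2. A minimal sketch of what that merge produces on its own (the data set name lookahead is made up for illustration):

```sas
/* Hypothetical demo data set: each row paired with the next row's date. */
data lookahead;
  merge tbl1
        tbl1 (firstobs=2 keep=date rename=(date=_nxt_date));
  format _nxt_date ddmmyy10.;
run;

proc print data=lookahead;
run;
```

On the last row of ID 1, _nxt_date holds the first date of ID 2, and on the very last row it is missing. That is why the step above overrides _nxt_date whenever last.id is true.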

But if you do, then you need an extra conditional step for first.id:

 

proc sql noprint;
  select min(date), max(date) into :min_date,:max_date
  from tbl1;
quit;

data want  (drop=_:);
  set tbl1 (keep=id date rename=(date=_date));
  by id;
  if first.id=1 and _date>&min_date then do date= &min_date to _date-1;
    output;
  end;

  merge tbl1
        tbl1 (firstobs=2 keep=date rename=(date=_nxt_date));
  if last.id=1 then _nxt_date=&max_date+1;

  do date=date to _nxt_date-1;
    output;
  end;
  call missing(of _all_);
run;

 

