Solved: Fill in missing dates

newsas007 · Posted 11-20-2020 10:31 AM

Hi all,

The dataset below has some missing dates that i need to fill in. I need LOT_start and LOT_stop dates filled in cells that are empty specific to the LOT and ID. Also, when there is no start or end dates, then it has to be left missing. For example: ID#2 for LOT = fourth there is no LOT_Start date, so therefore should be left missing.

Data Have:

ID	LOT_Start	LOT_End	LOT
1	7/8/2016		Second
1			Second
1			Second
1			Second
1			Second
1			Second
1			Second
1		10/9/2016	Second
2	7/22/2016	8/21/2016	First
2		8/21/2016	First
2	8/22/2016	11/11/2016	First
2	9/10/2016	11/11/2016	First
2	9/10/2016	11/11/2016	First
2	11/12/2016	3/18/2017	Second
2		3/18/2017	Second
2		3/18/2017	Second
2		3/18/2017	Second
2	3/19/2017		Third
2	3/19/2017		Third
2	3/19/2017	1/23/2018	Third
2		4/2/2018	Fourth

Data Want:

ID	LOT_Start	LOT_End	LOT
1	7/8/2016	10/9/2016	Second
1	7/8/2016	10/9/2016	Second
1	7/8/2016	10/9/2016	Second
1	7/8/2016	10/9/2016	Second
1	7/8/2016	10/9/2016	Second
1	7/8/2016	10/9/2016	Second
1	7/8/2016	10/9/2016	Second
1	7/8/2016	10/9/2016	Second
2	7/22/2016	8/21/2016	First
2	7/23/2016	8/21/2016	First
2	8/22/2016	11/11/2016	First
2	9/10/2016	11/11/2016	First
2	9/10/2016	11/11/2016	First
2	11/12/2016	3/18/2017	Second
2	11/12/2016	3/18/2017	Second
2	11/12/2016	3/18/2017	Second
2	11/15/2016	3/18/2017	Second
2	3/19/2017	1/23/2018	Third
2	3/19/2017	1/23/2018	Third
2	3/19/2017	1/23/2018	Third
2		4/2/2018	Fourth

Kurt_Bremser · Posted 11-20-2020 11:04 AM

Try this:

data have;
infile datalines dlm="," dsd truncover;
input ID $ LOT_Start :mmddyy10. LOT_End :mmddyy10. LOT $;
format lot_start lot_end yymmdd10.;
datalines;
1,7/8/2016,,Second 
1,,,Second 
1,,,Second 
1,,,Second 
1,,,Second 
1,,,Second 
1,,,Second 
1,,10/9/2016,Second 
2,7/22/2016,8/21/2016,First 
2,,8/21/2016,First 
2,8/22/2016,11/11/2016,First 
2,9/10/2016,11/11/2016,First 
2,9/10/2016,11/11/2016,First 
2,11/12/2016,3/18/2017,Second 
2,,3/18/2017,Second 
2,,3/18/2017,Second 
2,,3/18/2017,Second 
2,3/19/2017,,Third 
2,3/19/2017,,Third 
2,3/19/2017,1/23/2018,Third 
2,,4/2/2018,Fourth
;

data first;
set have;
by id lot notsorted;
n = _N_;
retain ls;
if first.lot then ls = .;
if lot_start ne .
then ls = lot_start;
else lot_start = ls;
drop ls;
run;

proc sort data=first;
by descending n;
run;

data second;
set first;
by descending id lot notsorted;
retain le;
if first.lot then le = .;
if lot_end ne .
then le = lot_end;
else lot_end = le;
drop le;
run;

proc sort
  data=second
  out=want (drop=n)
;
by n;
run;

And note how I presented the data in a data step with datalines; please do so yourself in the future, to make it easier for us to help you.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

Kurt_Bremser · Posted 11-20-2020 11:04 AM

Try this:

data have;
infile datalines dlm="," dsd truncover;
input ID $ LOT_Start :mmddyy10. LOT_End :mmddyy10. LOT $;
format lot_start lot_end yymmdd10.;
datalines;
1,7/8/2016,,Second 
1,,,Second 
1,,,Second 
1,,,Second 
1,,,Second 
1,,,Second 
1,,,Second 
1,,10/9/2016,Second 
2,7/22/2016,8/21/2016,First 
2,,8/21/2016,First 
2,8/22/2016,11/11/2016,First 
2,9/10/2016,11/11/2016,First 
2,9/10/2016,11/11/2016,First 
2,11/12/2016,3/18/2017,Second 
2,,3/18/2017,Second 
2,,3/18/2017,Second 
2,,3/18/2017,Second 
2,3/19/2017,,Third 
2,3/19/2017,,Third 
2,3/19/2017,1/23/2018,Third 
2,,4/2/2018,Fourth
;

data first;
set have;
by id lot notsorted;
n = _N_;
retain ls;
if first.lot then ls = .;
if lot_start ne .
then ls = lot_start;
else lot_start = ls;
drop ls;
run;

proc sort data=first;
by descending n;
run;

data second;
set first;
by descending id lot notsorted;
retain le;
if first.lot then le = .;
if lot_end ne .
then le = lot_end;
else lot_end = le;
drop le;
run;

proc sort
  data=second
  out=want (drop=n)
;
by n;
run;

And note how I presented the data in a data step with datalines; please do so yourself in the future, to make it easier for us to help you.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

newsas007 · Posted 11-23-2020 12:48 PM

I noticed a small error in this code. Say if i change the dataset a bit where LOT_END for ID#2 in LOT = 'Third' is listed differently the code gives a empty cell. Please see below:

data have;
infile datalines dlm="," dsd truncover;
input ID $ LOT_Start :mmddyy10. LOT_End :mmddyy10. LOT $;
format lot_start lot_end yymmdd10.;
datalines;
1,7/8/2016,,Second
1,,,Second
1,,,Second
1,,,Second
1,,,Second
1,,,Second
1,,,Second
1,,10/9/2016,Second
2,7/22/2016,8/21/2016,First
2,,8/21/2016,First
2,8/22/2016,11/11/2016,First
2,9/10/2016,11/11/2016,First
2,9/10/2016,11/11/2016,First
2,11/12/2016,3/18/2017,Second
2,,3/18/2017,Second
2,,3/18/2017,Second
2,,3/18/2017,Second
2,3/19/2017,,Third
2,3/19/2017,1/23/2018,Third
2,3/19/2017,,Third
2,,4/2/2018,Fourth
;

data first;
set have;
by id lot notsorted;
n = _N_;
retain ls;
if first.lot then ls = .;
if lot_start ne .
then ls = lot_start;
else lot_start = ls;
drop ls;
run;

proc sort data=first;
by descending n;
run;

data second;
set first;
by descending id lot notsorted;
retain le;
if first.lot then le = .;
if lot_end ne .
then le = lot_end;
else lot_end = le;
drop le;
run;

proc sort
data=second
out=want (drop=n)
;
by n;
run;

Kurt_Bremser · Posted 11-23-2020 12:56 PM

Then you need to add a "carry forward" in the first data step:

data first;
set have;
by id lot notsorted;
n = _N_;
retain ls le;
if first.lot
then do;
  ls = .;
  le = .;
end;
if lot_start ne .
then ls = lot_start;
else lot_start = ls;
if lot_end ne .
then le = lot_end;
else lot_end = le;
drop ls le;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

mkeintz · Posted 11-20-2020 11:22 AM

Why do some of your missing LOT_Start values get set to consecutive dates, while other are set to constants?

For ID=1,LOT=Second, all your missing LOT_Start values are set to 7/8/2016.

But for ID=2, LOT=Second, the missing LOT_Start values are set to 11/13/2016, 11/14/2016, and 11/15/2016.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

ballardw · Posted 11-20-2020 12:02 PM

@mkeintz wrote:

Why do some of your missing LOT_Start values get set to consecutive dates, while other are set to constants?

For ID=1,LOT=Second, all your missing LOT_Start values are set to 7/8/2016.

But for ID=2, LOT=Second, the missing LOT_Start values are set to 11/13/2016, 11/14/2016, and 11/15/2016.

Any bets the increment behavior is from using Excel and dragging a cell without paying attention to results?

I can't remember how many times I have had to fix data related to people using Excel for data entry and doing that for "repeated" values. One data file like that added over 130 school districts to our school system each with exactly one student.

newsas007 · Posted 11-20-2020 01:04 PM

Yes, it is from excel dragging..I am usually careful but it happened again. So sorry.

novinosrin · Posted 11-20-2020 11:45 AM

Hi @newsas007 For what it's worth, here is my stab at it-


data have;
 input ID	(LOT_Start	LOT_End	) (:mmddyy10.)LOT $;
 format lot_: mmddyy10.;
 cards;
1	7/8/2016	.	Second 
1	.	.	Second 
1	.	.	Second 
1	.	.	Second 
1	.	.	Second 
1	.	.	Second 
1	.	.	Second 
1	.	10/9/2016	Second 
2	7/22/2016	8/21/2016	First 
2	.	8/21/2016	First 
2	8/22/2016	11/11/2016	First 
2	9/10/2016	11/11/2016	First 
2	9/10/2016	11/11/2016	First 
2	11/12/2016	3/18/2017	Second 
2	.	3/18/2017	Second 
2	.	3/18/2017	Second 
2	.	3/18/2017	Second 
2	3/19/2017	.	Third 
2	3/19/2017	.	Third 
2	3/19/2017	1/23/2018	Third 
2	.	4/2/2018	Fourth
;

proc sql;
 create table temp as 
 select id,lot, min(lot_start)as l_s,min(lot_end) as l_e
 from have
 group by id,lot;
quit;

data want;
 if _n_=1 then do;
  if 0 then set have temp;
  dcl hash h(dataset:'temp');
  h.definekey('id','lot');
  h.definedata('l_s','l_e');
  h.definedone();
 end;
 do until(last.lot);
  set have;
  by id lot notsorted;
  if first.lot then h.find();
  if lot_start then _n3=lot_start;
  else do;
   _n3=ifn(_n1,sum(_n3,1),_n3);
   lot_start=_n3;
  end;
  if lot_end then _n4=lot_end;
  else  do;
   _n4=ifn(_n2,sum(_n4,1),_n4);
   lot_end=_n4;
  end;
  if l_s=lot_start then _n1=1;
  if l_e=lot_end then _n2=1;
  if n(lot_start,_n1)=0 then lot_start=l_s;
  if n(lot_end,_n2)=0 then lot_end=l_e;
  output;
 end;
 keep id lot_start lot_end lot;
run;

proc print noobs;run;

ID	LOT_Start	LOT_End	LOT
1	07/08/2016	10/09/2016	Second
1	07/09/2016	10/09/2016	Second
1	07/10/2016	10/09/2016	Second
1	07/11/2016	10/09/2016	Second
1	07/12/2016	10/09/2016	Second
1	07/13/2016	10/09/2016	Second
1	07/14/2016	10/09/2016	Second
1	07/15/2016	10/09/2016	Second
2	07/22/2016	08/21/2016	First
2	07/23/2016	08/21/2016	First
2	08/22/2016	11/11/2016	First
2	09/10/2016	11/11/2016	First
2	09/10/2016	11/11/2016	First
2	11/12/2016	03/18/2017	Second
2	11/13/2016	03/18/2017	Second
2	11/14/2016	03/18/2017	Second
2	11/15/2016	03/18/2017	Second
2	03/19/2017	01/23/2018	Third
2	03/19/2017	01/23/2018	Third
2	03/19/2017	01/23/2018	Third
2	.	04/02/2018	Fourth

novinosrin · Posted 11-20-2020 01:15 PM

Or just



data want;
 do _n_=1 by 1 until(last.lot);
  set have;
  by id lot notsorted;
  if lot_start then  l_s=min(l_s,lot_start);
  if lot_end then l_e=min(l_e,lot_end);
 end;
 do _n_=1 to _n_;
  set have;
  if lot_start then _n3=lot_start;
  else do;
   _n3=ifn(_n1,sum(_n3,1),_n3);
   lot_start=_n3;
  end;
  if lot_end then _n4=lot_end;
  else  do;
   _n4=ifn(_n2,sum(_n4,1),_n4);
   lot_end=_n4;
  end;
  if l_s=lot_start then _n1=1;
  if l_e=lot_end then _n2=1;
  if n(lot_start,_n1)=0 then lot_start=l_s;
  if n(lot_end,_n2)=0 then lot_end=l_e;
  output;
 end;
 keep id lot_start lot_end lot;
run;

newsas007 · Posted 11-20-2020 05:55 PM

This is great! there was a subtle change in the data though. The blank dates in the data don't need to increment by a day.
Also, is there anyway to PROC SQL to accomplish this? Thanks

mkeintz · Posted 11-20-2020 10:39 PM

I know a solution has already been accepted, but I offer this as an example of avoiding sorting of data set files:

data have;
infile datalines dlm="," dsd truncover;
input ID $ LOT_Start :mmddyy10. LOT_End :mmddyy10. LOT $;
format lot_start lot_end yymmdd10.;
datalines;
1,7/8/2016,,Second 
1,,,Second 
1,,,Second 
1,,,Second 
1,,,Second 
1,,,Second 
1,,,Second 
1,,10/9/2016,Second 
2,7/22/2016,8/21/2016,First 
2,,8/21/2016,First 
2,8/22/2016,11/11/2016,First 
2,9/10/2016,11/11/2016,First 
2,9/10/2016,11/11/2016,First 
2,11/12/2016,3/18/2017,Second 
2,,3/18/2017,Second 
2,,3/18/2017,Second 
2,,3/18/2017,Second 
2,3/19/2017,,Third 
2,3/19/2017,,Third 
2,3/19/2017,1/23/2018,Third 
2,,4/2/2018,Fourth
;

data want (drop=_:);

  array _start{0:30} _temporary_;
  array _end{0:30} _temporary_;

  call missing(of _start{*},of _end{*});

  do _n=1 by 1 until (last.lot);  /* Carry forward lot_start*/
    set have;
	by id lot notsorted;
	_start{_n}=coalesce(lot_start,_start{_n-1});
	if lot_end^=. then _end{_n}=lot_end;
  end;

  do _n=_n to 1 by -1;;   /* Carry backward lot_end*/
    if _end{_n}=. then _end{_n}=_end{_n+1};
  end;

  do _n=1 by 1 until (last.lot);  /* Reread and retrieve carried values*/
    set have;
	by id lot notsorted;
	lot_start=coalesce(lot_start,_start{_n});
	lot_end=coalesce(lot_end,_end{_n});
    output;
  end;
run;

Editted note: The array lower bounds are set to 0 (to prevent error messages from the coalesce(lot_start,_start{_n-1}) expression). And set the upper bounds to some number you know will accommodate the largest ID/LOT group.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Fill in missing dates

Re: Fill in missing dates

Re: Fill in missing dates

Re: Fill in missing dates

Re: Fill in missing dates

Re: Fill in missing dates

Re: Fill in missing dates

Re: Fill in missing dates

Re: Fill in missing dates

Re: Fill in missing dates

Re: Fill in missing dates

Re: Fill in missing dates

Registration is open

SAS Training: Just a Click Away