BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
newsas007
Quartz | Level 8

Hi all,

The dataset below has some missing dates that i need to fill in. I need LOT_start and LOT_stop dates filled in cells that are empty specific to the LOT and ID. Also, when there is no start or end dates, then it has to be left missing. For example: ID#2 for LOT = fourth there is no LOT_Start date, so therefore should be left missing.

 

Data Have:

ID LOT_Start LOT_End LOT
1 7/8/2016   Second 
1     Second 
1     Second 
1     Second 
1     Second 
1     Second 
1     Second 
1   10/9/2016 Second 
2 7/22/2016 8/21/2016 First 
2   8/21/2016 First 
2 8/22/2016 11/11/2016 First 
2 9/10/2016 11/11/2016 First 
2 9/10/2016 11/11/2016 First 
2 11/12/2016 3/18/2017 Second 
2   3/18/2017 Second 
2   3/18/2017 Second 
2   3/18/2017 Second 
2 3/19/2017   Third 
2 3/19/2017   Third 
2 3/19/2017 1/23/2018 Third 
2   4/2/2018 Fourth

 

Data Want:

ID LOT_Start LOT_End LOT
1 7/8/2016 10/9/2016 Second 
1 7/8/2016 10/9/2016 Second 
1 7/8/2016 10/9/2016 Second 
1 7/8/2016 10/9/2016 Second 
1 7/8/2016 10/9/2016 Second 
1 7/8/2016 10/9/2016 Second 
1 7/8/2016 10/9/2016 Second 
1 7/8/2016 10/9/2016 Second 
2 7/22/2016 8/21/2016 First 
2 7/23/2016 8/21/2016 First 
2 8/22/2016 11/11/2016 First 
2 9/10/2016 11/11/2016 First 
2 9/10/2016 11/11/2016 First 
2 11/12/2016 3/18/2017 Second 
2 11/12/2016 3/18/2017 Second 
2 11/12/2016 3/18/2017 Second 
2 11/15/2016 3/18/2017 Second 
2 3/19/2017 1/23/2018 Third 
2 3/19/2017 1/23/2018 Third 
2 3/19/2017 1/23/2018 Third 
2   4/2/2018 Fourth
1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Try this:

data have;
infile datalines dlm="," dsd truncover;
input ID $ LOT_Start :mmddyy10. LOT_End :mmddyy10. LOT $;
format lot_start lot_end yymmdd10.;
datalines;
1,7/8/2016,,Second 
1,,,Second 
1,,,Second 
1,,,Second 
1,,,Second 
1,,,Second 
1,,,Second 
1,,10/9/2016,Second 
2,7/22/2016,8/21/2016,First 
2,,8/21/2016,First 
2,8/22/2016,11/11/2016,First 
2,9/10/2016,11/11/2016,First 
2,9/10/2016,11/11/2016,First 
2,11/12/2016,3/18/2017,Second 
2,,3/18/2017,Second 
2,,3/18/2017,Second 
2,,3/18/2017,Second 
2,3/19/2017,,Third 
2,3/19/2017,,Third 
2,3/19/2017,1/23/2018,Third 
2,,4/2/2018,Fourth
;

data first;
set have;
by id lot notsorted;
n = _N_;
retain ls;
if first.lot then ls = .;
if lot_start ne .
then ls = lot_start;
else lot_start = ls;
drop ls;
run;

proc sort data=first;
by descending n;
run;

data second;
set first;
by descending id lot notsorted;
retain le;
if first.lot then le = .;
if lot_end ne .
then le = lot_end;
else lot_end = le;
drop le;
run;

proc sort
  data=second
  out=want (drop=n)
;
by n;
run;

And note how I presented the data in a data step with datalines; please do so yourself in the future, to make it easier for us to help you.

View solution in original post

10 REPLIES 10
Kurt_Bremser
Super User

Try this:

data have;
infile datalines dlm="," dsd truncover;
input ID $ LOT_Start :mmddyy10. LOT_End :mmddyy10. LOT $;
format lot_start lot_end yymmdd10.;
datalines;
1,7/8/2016,,Second 
1,,,Second 
1,,,Second 
1,,,Second 
1,,,Second 
1,,,Second 
1,,,Second 
1,,10/9/2016,Second 
2,7/22/2016,8/21/2016,First 
2,,8/21/2016,First 
2,8/22/2016,11/11/2016,First 
2,9/10/2016,11/11/2016,First 
2,9/10/2016,11/11/2016,First 
2,11/12/2016,3/18/2017,Second 
2,,3/18/2017,Second 
2,,3/18/2017,Second 
2,,3/18/2017,Second 
2,3/19/2017,,Third 
2,3/19/2017,,Third 
2,3/19/2017,1/23/2018,Third 
2,,4/2/2018,Fourth
;

data first;
set have;
by id lot notsorted;
n = _N_;
retain ls;
if first.lot then ls = .;
if lot_start ne .
then ls = lot_start;
else lot_start = ls;
drop ls;
run;

proc sort data=first;
by descending n;
run;

data second;
set first;
by descending id lot notsorted;
retain le;
if first.lot then le = .;
if lot_end ne .
then le = lot_end;
else lot_end = le;
drop le;
run;

proc sort
  data=second
  out=want (drop=n)
;
by n;
run;

And note how I presented the data in a data step with datalines; please do so yourself in the future, to make it easier for us to help you.

newsas007
Quartz | Level 8
I noticed a small error in this code. Say if i change the dataset a bit where LOT_END for ID#2 in LOT = 'Third' is listed differently the code gives a empty cell. Please see below:

data have;
infile datalines dlm="," dsd truncover;
input ID $ LOT_Start :mmddyy10. LOT_End :mmddyy10. LOT $;
format lot_start lot_end yymmdd10.;
datalines;
1,7/8/2016,,Second
1,,,Second
1,,,Second
1,,,Second
1,,,Second
1,,,Second
1,,,Second
1,,10/9/2016,Second
2,7/22/2016,8/21/2016,First
2,,8/21/2016,First
2,8/22/2016,11/11/2016,First
2,9/10/2016,11/11/2016,First
2,9/10/2016,11/11/2016,First
2,11/12/2016,3/18/2017,Second
2,,3/18/2017,Second
2,,3/18/2017,Second
2,,3/18/2017,Second
2,3/19/2017,,Third
2,3/19/2017,1/23/2018,Third
2,3/19/2017,,Third
2,,4/2/2018,Fourth
;

data first;
set have;
by id lot notsorted;
n = _N_;
retain ls;
if first.lot then ls = .;
if lot_start ne .
then ls = lot_start;
else lot_start = ls;
drop ls;
run;

proc sort data=first;
by descending n;
run;

data second;
set first;
by descending id lot notsorted;
retain le;
if first.lot then le = .;
if lot_end ne .
then le = lot_end;
else lot_end = le;
drop le;
run;

proc sort
data=second
out=want (drop=n)
;
by n;
run;

Kurt_Bremser
Super User

Then you need to add a "carry forward" in the first data step:

data first;
set have;
by id lot notsorted;
n = _N_;
retain ls le;
if first.lot
then do;
  ls = .;
  le = .;
end;
if lot_start ne .
then ls = lot_start;
else lot_start = ls;
if lot_end ne .
then le = lot_end;
else lot_end = le;
drop ls le;
run;
mkeintz
PROC Star

Why do some of your missing LOT_Start values get set to consecutive dates, while other are set to constants?

 

For ID=1,LOT=Second, all your missing LOT_Start values are set to 7/8/2016.

But for ID=2, LOT=Second, the missing LOT_Start values are set to 11/13/2016, 11/14/2016, and 11/15/2016.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ballardw
Super User

@mkeintz wrote:

Why do some of your missing LOT_Start values get set to consecutive dates, while other are set to constants?

 

For ID=1,LOT=Second, all your missing LOT_Start values are set to 7/8/2016.

But for ID=2, LOT=Second, the missing LOT_Start values are set to 11/13/2016, 11/14/2016, and 11/15/2016.


Any bets the increment behavior is from using Excel and dragging a cell without paying attention to results?

 

I can't remember how many times I have had to fix data related to people using Excel for data entry and doing that for "repeated" values. One data file like that added over 130 school districts to our school system each with exactly one student.

newsas007
Quartz | Level 8
Yes, it is from excel dragging..I am usually careful but it happened again. So sorry.
novinosrin
Tourmaline | Level 20

Hi @newsas007  For what it's worth, here is my stab at it-

 


data have;
 input ID	(LOT_Start	LOT_End	) (:mmddyy10.)LOT $;
 format lot_: mmddyy10.;
 cards;
1	7/8/2016	.	Second 
1	.	.	Second 
1	.	.	Second 
1	.	.	Second 
1	.	.	Second 
1	.	.	Second 
1	.	.	Second 
1	.	10/9/2016	Second 
2	7/22/2016	8/21/2016	First 
2	.	8/21/2016	First 
2	8/22/2016	11/11/2016	First 
2	9/10/2016	11/11/2016	First 
2	9/10/2016	11/11/2016	First 
2	11/12/2016	3/18/2017	Second 
2	.	3/18/2017	Second 
2	.	3/18/2017	Second 
2	.	3/18/2017	Second 
2	3/19/2017	.	Third 
2	3/19/2017	.	Third 
2	3/19/2017	1/23/2018	Third 
2	.	4/2/2018	Fourth
;

proc sql;
 create table temp as 
 select id,lot, min(lot_start)as l_s,min(lot_end) as l_e
 from have
 group by id,lot;
quit;

data want;
 if _n_=1 then do;
  if 0 then set have temp;
  dcl hash h(dataset:'temp');
  h.definekey('id','lot');
  h.definedata('l_s','l_e');
  h.definedone();
 end;
 do until(last.lot);
  set have;
  by id lot notsorted;
  if first.lot then h.find();
  if lot_start then _n3=lot_start;
  else do;
   _n3=ifn(_n1,sum(_n3,1),_n3);
   lot_start=_n3;
  end;
  if lot_end then _n4=lot_end;
  else  do;
   _n4=ifn(_n2,sum(_n4,1),_n4);
   lot_end=_n4;
  end;
  if l_s=lot_start then _n1=1;
  if l_e=lot_end then _n2=1;
  if n(lot_start,_n1)=0 then lot_start=l_s;
  if n(lot_end,_n2)=0 then lot_end=l_e;
  output;
 end;
 keep id lot_start lot_end lot;
run;

proc print noobs;run;
 
ID LOT_Start LOT_End LOT
1 07/08/2016 10/09/2016 Second 
1 07/09/2016 10/09/2016 Second 
1 07/10/2016 10/09/2016 Second 
1 07/11/2016 10/09/2016 Second 
1 07/12/2016 10/09/2016 Second 
1 07/13/2016 10/09/2016 Second 
1 07/14/2016 10/09/2016 Second 
1 07/15/2016 10/09/2016 Second 
2 07/22/2016 08/21/2016 First 
2 07/23/2016 08/21/2016 First 
2 08/22/2016 11/11/2016 First 
2 09/10/2016 11/11/2016 First 
2 09/10/2016 11/11/2016 First 
2 11/12/2016 03/18/2017 Second 
2 11/13/2016 03/18/2017 Second 
2 11/14/2016 03/18/2017 Second 
2 11/15/2016 03/18/2017 Second 
2 03/19/2017 01/23/2018 Third 
2 03/19/2017 01/23/2018 Third 
2 03/19/2017 01/23/2018 Third 
2 . 04/02/2018 Fourth
novinosrin
Tourmaline | Level 20

Or just



data want;
 do _n_=1 by 1 until(last.lot);
  set have;
  by id lot notsorted;
  if lot_start then  l_s=min(l_s,lot_start);
  if lot_end then l_e=min(l_e,lot_end);
 end;
 do _n_=1 to _n_;
  set have;
  if lot_start then _n3=lot_start;
  else do;
   _n3=ifn(_n1,sum(_n3,1),_n3);
   lot_start=_n3;
  end;
  if lot_end then _n4=lot_end;
  else  do;
   _n4=ifn(_n2,sum(_n4,1),_n4);
   lot_end=_n4;
  end;
  if l_s=lot_start then _n1=1;
  if l_e=lot_end then _n2=1;
  if n(lot_start,_n1)=0 then lot_start=l_s;
  if n(lot_end,_n2)=0 then lot_end=l_e;
  output;
 end;
 keep id lot_start lot_end lot;
run;
newsas007
Quartz | Level 8
This is great! there was a subtle change in the data though. The blank dates in the data don't need to increment by a day.
Also, is there anyway to PROC SQL to accomplish this? Thanks
mkeintz
PROC Star

I know a solution has already been accepted, but I offer this as an example of avoiding sorting of data set files:

 

data have;
infile datalines dlm="," dsd truncover;
input ID $ LOT_Start :mmddyy10. LOT_End :mmddyy10. LOT $;
format lot_start lot_end yymmdd10.;
datalines;
1,7/8/2016,,Second 
1,,,Second 
1,,,Second 
1,,,Second 
1,,,Second 
1,,,Second 
1,,,Second 
1,,10/9/2016,Second 
2,7/22/2016,8/21/2016,First 
2,,8/21/2016,First 
2,8/22/2016,11/11/2016,First 
2,9/10/2016,11/11/2016,First 
2,9/10/2016,11/11/2016,First 
2,11/12/2016,3/18/2017,Second 
2,,3/18/2017,Second 
2,,3/18/2017,Second 
2,,3/18/2017,Second 
2,3/19/2017,,Third 
2,3/19/2017,,Third 
2,3/19/2017,1/23/2018,Third 
2,,4/2/2018,Fourth
;

data want (drop=_:);

  array _start{0:30} _temporary_;
  array _end{0:30} _temporary_;

  call missing(of _start{*},of _end{*});

  do _n=1 by 1 until (last.lot);  /* Carry forward lot_start*/
    set have;
	by id lot notsorted;
	_start{_n}=coalesce(lot_start,_start{_n-1});
	if lot_end^=. then _end{_n}=lot_end;
  end;

  do _n=_n to 1 by -1;;   /* Carry backward lot_end*/
    if _end{_n}=. then _end{_n}=_end{_n+1};
  end;

  do _n=1 by 1 until (last.lot);  /* Reread and retrieve carried values*/
    set have;
	by id lot notsorted;
	lot_start=coalesce(lot_start,_start{_n});
	lot_end=coalesce(lot_end,_end{_n});
    output;
  end;
run;

 Editted note:  The array lower bounds are set to 0 (to prevent error messages from the coalesce(lot_start,_start{_n-1}) expression).  And set the upper bounds to some number you know will accommodate the largest ID/LOT group.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1621 views
  • 4 likes
  • 5 in conversation