Hi all,
The dataset below has some missing dates that i need to fill in. I need LOT_start and LOT_stop dates filled in cells that are empty specific to the LOT and ID. Also, when there is no start or end dates, then it has to be left missing. For example: ID#2 for LOT = fourth there is no LOT_Start date, so therefore should be left missing.
Data Have:
ID | LOT_Start | LOT_End | LOT |
1 | 7/8/2016 | Second | |
1 | Second | ||
1 | Second | ||
1 | Second | ||
1 | Second | ||
1 | Second | ||
1 | Second | ||
1 | 10/9/2016 | Second | |
2 | 7/22/2016 | 8/21/2016 | First |
2 | 8/21/2016 | First | |
2 | 8/22/2016 | 11/11/2016 | First |
2 | 9/10/2016 | 11/11/2016 | First |
2 | 9/10/2016 | 11/11/2016 | First |
2 | 11/12/2016 | 3/18/2017 | Second |
2 | 3/18/2017 | Second | |
2 | 3/18/2017 | Second | |
2 | 3/18/2017 | Second | |
2 | 3/19/2017 | Third | |
2 | 3/19/2017 | Third | |
2 | 3/19/2017 | 1/23/2018 | Third |
2 | 4/2/2018 | Fourth |
Data Want:
ID | LOT_Start | LOT_End | LOT |
1 | 7/8/2016 | 10/9/2016 | Second |
1 | 7/8/2016 | 10/9/2016 | Second |
1 | 7/8/2016 | 10/9/2016 | Second |
1 | 7/8/2016 | 10/9/2016 | Second |
1 | 7/8/2016 | 10/9/2016 | Second |
1 | 7/8/2016 | 10/9/2016 | Second |
1 | 7/8/2016 | 10/9/2016 | Second |
1 | 7/8/2016 | 10/9/2016 | Second |
2 | 7/22/2016 | 8/21/2016 | First |
2 | 7/23/2016 | 8/21/2016 | First |
2 | 8/22/2016 | 11/11/2016 | First |
2 | 9/10/2016 | 11/11/2016 | First |
2 | 9/10/2016 | 11/11/2016 | First |
2 | 11/12/2016 | 3/18/2017 | Second |
2 | 11/12/2016 | 3/18/2017 | Second |
2 | 11/12/2016 | 3/18/2017 | Second |
2 | 11/15/2016 | 3/18/2017 | Second |
2 | 3/19/2017 | 1/23/2018 | Third |
2 | 3/19/2017 | 1/23/2018 | Third |
2 | 3/19/2017 | 1/23/2018 | Third |
2 | 4/2/2018 | Fourth |
Try this:
data have;
infile datalines dlm="," dsd truncover;
input ID $ LOT_Start :mmddyy10. LOT_End :mmddyy10. LOT $;
format lot_start lot_end yymmdd10.;
datalines;
1,7/8/2016,,Second
1,,,Second
1,,,Second
1,,,Second
1,,,Second
1,,,Second
1,,,Second
1,,10/9/2016,Second
2,7/22/2016,8/21/2016,First
2,,8/21/2016,First
2,8/22/2016,11/11/2016,First
2,9/10/2016,11/11/2016,First
2,9/10/2016,11/11/2016,First
2,11/12/2016,3/18/2017,Second
2,,3/18/2017,Second
2,,3/18/2017,Second
2,,3/18/2017,Second
2,3/19/2017,,Third
2,3/19/2017,,Third
2,3/19/2017,1/23/2018,Third
2,,4/2/2018,Fourth
;
data first;
set have;
by id lot notsorted;
n = _N_;
retain ls;
if first.lot then ls = .;
if lot_start ne .
then ls = lot_start;
else lot_start = ls;
drop ls;
run;
proc sort data=first;
by descending n;
run;
data second;
set first;
by descending id lot notsorted;
retain le;
if first.lot then le = .;
if lot_end ne .
then le = lot_end;
else lot_end = le;
drop le;
run;
proc sort
data=second
out=want (drop=n)
;
by n;
run;
And note how I presented the data in a data step with datalines; please do so yourself in the future, to make it easier for us to help you.
Try this:
data have;
infile datalines dlm="," dsd truncover;
input ID $ LOT_Start :mmddyy10. LOT_End :mmddyy10. LOT $;
format lot_start lot_end yymmdd10.;
datalines;
1,7/8/2016,,Second
1,,,Second
1,,,Second
1,,,Second
1,,,Second
1,,,Second
1,,,Second
1,,10/9/2016,Second
2,7/22/2016,8/21/2016,First
2,,8/21/2016,First
2,8/22/2016,11/11/2016,First
2,9/10/2016,11/11/2016,First
2,9/10/2016,11/11/2016,First
2,11/12/2016,3/18/2017,Second
2,,3/18/2017,Second
2,,3/18/2017,Second
2,,3/18/2017,Second
2,3/19/2017,,Third
2,3/19/2017,,Third
2,3/19/2017,1/23/2018,Third
2,,4/2/2018,Fourth
;
data first;
set have;
by id lot notsorted;
n = _N_;
retain ls;
if first.lot then ls = .;
if lot_start ne .
then ls = lot_start;
else lot_start = ls;
drop ls;
run;
proc sort data=first;
by descending n;
run;
data second;
set first;
by descending id lot notsorted;
retain le;
if first.lot then le = .;
if lot_end ne .
then le = lot_end;
else lot_end = le;
drop le;
run;
proc sort
data=second
out=want (drop=n)
;
by n;
run;
And note how I presented the data in a data step with datalines; please do so yourself in the future, to make it easier for us to help you.
Then you need to add a "carry forward" in the first data step:
data first;
set have;
by id lot notsorted;
n = _N_;
retain ls le;
if first.lot
then do;
ls = .;
le = .;
end;
if lot_start ne .
then ls = lot_start;
else lot_start = ls;
if lot_end ne .
then le = lot_end;
else lot_end = le;
drop ls le;
run;
Why do some of your missing LOT_Start values get set to consecutive dates, while other are set to constants?
For ID=1,LOT=Second, all your missing LOT_Start values are set to 7/8/2016.
But for ID=2, LOT=Second, the missing LOT_Start values are set to 11/13/2016, 11/14/2016, and 11/15/2016.
@mkeintz wrote:
Why do some of your missing LOT_Start values get set to consecutive dates, while other are set to constants?
For ID=1,LOT=Second, all your missing LOT_Start values are set to 7/8/2016.
But for ID=2, LOT=Second, the missing LOT_Start values are set to 11/13/2016, 11/14/2016, and 11/15/2016.
Any bets the increment behavior is from using Excel and dragging a cell without paying attention to results?
I can't remember how many times I have had to fix data related to people using Excel for data entry and doing that for "repeated" values. One data file like that added over 130 school districts to our school system each with exactly one student.
Hi @newsas007 For what it's worth, here is my stab at it-
data have;
input ID (LOT_Start LOT_End ) (:mmddyy10.)LOT $;
format lot_: mmddyy10.;
cards;
1 7/8/2016 . Second
1 . . Second
1 . . Second
1 . . Second
1 . . Second
1 . . Second
1 . . Second
1 . 10/9/2016 Second
2 7/22/2016 8/21/2016 First
2 . 8/21/2016 First
2 8/22/2016 11/11/2016 First
2 9/10/2016 11/11/2016 First
2 9/10/2016 11/11/2016 First
2 11/12/2016 3/18/2017 Second
2 . 3/18/2017 Second
2 . 3/18/2017 Second
2 . 3/18/2017 Second
2 3/19/2017 . Third
2 3/19/2017 . Third
2 3/19/2017 1/23/2018 Third
2 . 4/2/2018 Fourth
;
proc sql;
create table temp as
select id,lot, min(lot_start)as l_s,min(lot_end) as l_e
from have
group by id,lot;
quit;
data want;
if _n_=1 then do;
if 0 then set have temp;
dcl hash h(dataset:'temp');
h.definekey('id','lot');
h.definedata('l_s','l_e');
h.definedone();
end;
do until(last.lot);
set have;
by id lot notsorted;
if first.lot then h.find();
if lot_start then _n3=lot_start;
else do;
_n3=ifn(_n1,sum(_n3,1),_n3);
lot_start=_n3;
end;
if lot_end then _n4=lot_end;
else do;
_n4=ifn(_n2,sum(_n4,1),_n4);
lot_end=_n4;
end;
if l_s=lot_start then _n1=1;
if l_e=lot_end then _n2=1;
if n(lot_start,_n1)=0 then lot_start=l_s;
if n(lot_end,_n2)=0 then lot_end=l_e;
output;
end;
keep id lot_start lot_end lot;
run;
proc print noobs;run;
ID | LOT_Start | LOT_End | LOT |
---|---|---|---|
1 | 07/08/2016 | 10/09/2016 | Second |
1 | 07/09/2016 | 10/09/2016 | Second |
1 | 07/10/2016 | 10/09/2016 | Second |
1 | 07/11/2016 | 10/09/2016 | Second |
1 | 07/12/2016 | 10/09/2016 | Second |
1 | 07/13/2016 | 10/09/2016 | Second |
1 | 07/14/2016 | 10/09/2016 | Second |
1 | 07/15/2016 | 10/09/2016 | Second |
2 | 07/22/2016 | 08/21/2016 | First |
2 | 07/23/2016 | 08/21/2016 | First |
2 | 08/22/2016 | 11/11/2016 | First |
2 | 09/10/2016 | 11/11/2016 | First |
2 | 09/10/2016 | 11/11/2016 | First |
2 | 11/12/2016 | 03/18/2017 | Second |
2 | 11/13/2016 | 03/18/2017 | Second |
2 | 11/14/2016 | 03/18/2017 | Second |
2 | 11/15/2016 | 03/18/2017 | Second |
2 | 03/19/2017 | 01/23/2018 | Third |
2 | 03/19/2017 | 01/23/2018 | Third |
2 | 03/19/2017 | 01/23/2018 | Third |
2 | . | 04/02/2018 | Fourth |
Or just
data want;
do _n_=1 by 1 until(last.lot);
set have;
by id lot notsorted;
if lot_start then l_s=min(l_s,lot_start);
if lot_end then l_e=min(l_e,lot_end);
end;
do _n_=1 to _n_;
set have;
if lot_start then _n3=lot_start;
else do;
_n3=ifn(_n1,sum(_n3,1),_n3);
lot_start=_n3;
end;
if lot_end then _n4=lot_end;
else do;
_n4=ifn(_n2,sum(_n4,1),_n4);
lot_end=_n4;
end;
if l_s=lot_start then _n1=1;
if l_e=lot_end then _n2=1;
if n(lot_start,_n1)=0 then lot_start=l_s;
if n(lot_end,_n2)=0 then lot_end=l_e;
output;
end;
keep id lot_start lot_end lot;
run;
I know a solution has already been accepted, but I offer this as an example of avoiding sorting of data set files:
data have;
infile datalines dlm="," dsd truncover;
input ID $ LOT_Start :mmddyy10. LOT_End :mmddyy10. LOT $;
format lot_start lot_end yymmdd10.;
datalines;
1,7/8/2016,,Second
1,,,Second
1,,,Second
1,,,Second
1,,,Second
1,,,Second
1,,,Second
1,,10/9/2016,Second
2,7/22/2016,8/21/2016,First
2,,8/21/2016,First
2,8/22/2016,11/11/2016,First
2,9/10/2016,11/11/2016,First
2,9/10/2016,11/11/2016,First
2,11/12/2016,3/18/2017,Second
2,,3/18/2017,Second
2,,3/18/2017,Second
2,,3/18/2017,Second
2,3/19/2017,,Third
2,3/19/2017,,Third
2,3/19/2017,1/23/2018,Third
2,,4/2/2018,Fourth
;
data want (drop=_:);
array _start{0:30} _temporary_;
array _end{0:30} _temporary_;
call missing(of _start{*},of _end{*});
do _n=1 by 1 until (last.lot); /* Carry forward lot_start*/
set have;
by id lot notsorted;
_start{_n}=coalesce(lot_start,_start{_n-1});
if lot_end^=. then _end{_n}=lot_end;
end;
do _n=_n to 1 by -1;; /* Carry backward lot_end*/
if _end{_n}=. then _end{_n}=_end{_n+1};
end;
do _n=1 by 1 until (last.lot); /* Reread and retrieve carried values*/
set have;
by id lot notsorted;
lot_start=coalesce(lot_start,_start{_n});
lot_end=coalesce(lot_end,_end{_n});
output;
end;
run;
Editted note: The array lower bounds are set to 0 (to prevent error messages from the coalesce(lot_start,_start{_n-1}) expression). And set the upper bounds to some number you know will accommodate the largest ID/LOT group.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.