DATA Step, Macro, Functions and more

Fill missing with the next known values across columns

Accepted Solution Solved
Reply
Contributor
Posts: 44
Accepted Solution

Fill missing with the next known values across columns

Hi, 

My current data looks like this, hour0-hour23 mean the 24 hours across the day. This data tends to record the time of work. Some people started work during daytime and some started work at night. So the start hour is 6a for some and 6p for others. 

data have;
   input id hour0-hour23;
   cards;
1 . . . . . . 1 1 . . 1 1 1 1 . 1 1 1 . . . . . .
2 . . . . . . . . 1 1 1 1 1 . . . 1 1 1 . 1 . . .
3 1 1 . . 1 1 . . . . . . . . . . . . 1 1 1 . . .
4 1 1 1 1 . 1 . . . . . . . . . . 1 1 1 1 . 1 1 1
;

I want to fill the missing values between the columns while people at work, but leave the column as missing at night if people started work in the morning and leave the column as missing during the day if people started work at night. The ideal want data looks like this:

data want;
input id hour0-hour23;
cards;
1 . . . . . . 1 1 1 1 1 1 1 1 1 1 1 1 . . . . . .
2 . . . . . . . . 1 1 1 1 1 1 1 1 1 1 1 1 1 . . .
3 1 1 1 1 1 1 . . . . . . . . . . . . 1 1 1 1 1 1
4 1 1 1 1 1 1 . . . . . . . . . . 1 1 1 1 1 1 1 1
;

I am not sure how to do that. Any idea? Thanks!!


Accepted Solutions
Solution
2 weeks ago
PROC Star
Posts: 1,833

Re: Fill missing with the next known values across columns

@panda No need for words like sorry in a knowledge sharing forum. Please feel free to post any number of questions/answers etc.

OK, try the modified based on your latest description--

 

data have;
   input id hour0-hour23;
   cards;
1 . . . . . . 1 1 . . 1 1 1 1 . 1 1 1 . . . . . .
2 . . . . . . . . 1 1 1 1 1 . . . 1 1 1 . 1 . . .
3 1 1 . . 1 1 . 1 . . . . . . . . . . 1 1 1 . . .
4 1 1 1 1 . 1 . . . . . . . . . . 1 1 1 1 . 1 1 1
;

data want;
set have;
array t(*)hour0-hour23;
array t1(*) hour23-hour0;
k=whichn(1,of t(*));
k1=dim(t)-whichn(1,of t1(*));
if whichn(1,of t(*))<7 then do;
do _n_=12 by -1 to 1; /*check before 12*/
if not missing(t(_n_)) then k2=1;
else if missing(t(_n_)) and k2 then t(_n_)= coalesce(of t(*));
end;
k2=0;
do _n_= 14 to dim(t); /*check after 12*/
if not missing(t(_n_)) then k2=1;
else if missing(t(_n_)) and k2 then t(_n_)= coalesce(of t(*));
end;
end;
else if 7<=whichn(1,of t(*))<=18 then do;
do _n_=k to k1;
t(_n_)=coalesce(of t(*));
end;
end;
drop k:;
run;

View solution in original post


All Replies
PROC Star
Posts: 1,833

Re: Fill missing with the next known values across columns

[ Edited ]
data have;
   input id hour0-hour23;
   cards;
1 . . . . . . 1 1 . . 1 1 1 1 . 1 1 1 . . . . . .
2 . . . . . . . . 1 1 1 1 1 . . . 1 1 1 . 1 . . .
3 1 1 . . 1 1 . . . . . . . . . . . . 1 1 1 . . .
4 1 1 1 1 . 1 . . . . . . . . . . 1 1 1 1 . 1 1 1
;
data want;
set have;
array t(*)hour0-hour23;
array t1(*) hour23-hour0;
k=whichn(1,of t(*));
k1=dim(t)-whichn(1,of t1(*));
if whichn(1,of t(*))<7 then do;
do _n_=1 to dim(t);
if (1<=_n_<=6) or (19<=_n_<=dim(t)) and missing(t(_n_)) then t(_n_)=coalesce(of t(*));
end;
end;
else if 7<=whichn(1,of t(*))<=18 then do;
do _n_=k to k1;
t(_n_)=coalesce(of t(*));
end;
end;
drop k:;
run;
Contributor
Posts: 44

Re: Fill missing with the next known values across columns

Posted in reply to novinosrin

Thanks @novinosrin! Your code works perfectly. Could you explain what does _n_ mean? 

 

Especially I can't understand the logic of this line of code:

if (1<=_n_<=6) or (19<=_n_<=dim(t)) and missing(t(_n_)) then t(_n_)=coalesce(of t(*));

Thanks!!

PROC Star
Posts: 1,833

Re: Fill missing with the next known values across columns

@pandaHang on there is some fix needed which i failed to test effectively. I am gonna fix it properly to make it perfect. I sincerely apologise for some lack of attention to detail. Bear with me. I will be right back with a super clean code . Right now, I am very hungry at 2:20pm. Will message back after lunch

PROC Star
Posts: 1,833

Re: Fill missing with the next known values across columns

Ok, I think the above should work as I believe I am thinking something that may not be warranted. However, I will stay open for any changes required should you notice any erroneous results. So please feel free to reach out the very moment you come across any issue

It's a very interesting question

Contributor
Posts: 44

Re: Fill missing with the next known values across columns

Posted in reply to novinosrin
Thanks! The code worked perfectly for me now. Does _n_ mean the count of column which has the first non-missing value?
PROC Star
Posts: 1,833

Re: Fill missing with the next known values across columns

_n_  here is an index variable I am using to loop through the array. _n_ happens to be an automatic variable that is not written to the dataset. So typical lazy blokes like me who use that a lot as I am always lazy to type an extra statement. 

Contributor
Posts: 44

Re: Fill missing with the next known values across columns

Posted in reply to novinosrin
That makes sense. Sorry I am really not familiar with the whichn function. Could you elaborate on how you decide the cut off time (number): 7, 6, 19, 18? Many thanks!
PROC Star
Posts: 1,833

Re: Fill missing with the next known values across columns

[ Edited ]

This description should give an easy intuition-

 

 

 timehrarray_unit
am1201
am112
am223
am334
am445
am556
am667
am778
am889
am9910
am101011
am111112
pm121213
pm11314
pm21415
pm31516
pm41617
pm51718
pm61819
pm71920
pm82021
pm92122
pm102223
pm112324

 

I will definitely come up with some notes to explain the code and logic a bit later. I am working on my assignment now, so little busy

Contributor
Posts: 44

Re: Fill missing with the next known values across columns

Posted in reply to novinosrin
Got it, thanks! This is very helpful! Smiley Happy
Contributor
Posts: 44

Re: Fill missing with the next known values across columns

Posted in reply to novinosrin

Sorry but I just realized the current code still does not work like what I want for people who work at night. 

do _n_=1 to dim(t);
if (1<=_n_<=6) or (19<=_n_<=dim(t)) and missing(t(_n_)) then t(_n_)=coalesce(of t(*));
end;
end;

For this part, it seems that you assume the end time of work is 5a, and start time of work is 6p. In reality the time of start and end varies between people. It would be perfect if the code can fill the blanks from 0a to the last non-missing 1 in the morning, and fill the blanks from first nonmissing 1 from afternoon to 0a. I tried this but did not work well:

do _n_=1 to k;
if missing(t(_n_)) then t(_n_)=coalesce(of t(*));
end;
do _n_= k1+1 to dim(t);
if missing(t(_n_)) then t(_n_)=coalesce(of t(*));
end;
end;

Any suggestions?  Thanks!!

PROC Star
Posts: 1,833

Re: Fill missing with the next known values across columns

My apprehensions has just come true lol. Will be back with the fix., How urgent it is? is it ok, if i give right after my class lectures?

Contributor
Posts: 44

Re: Fill missing with the next known values across columns

Posted in reply to novinosrin
Thanks for the prompt reply! I can definitely wait till you finish your class! Sorry to interrupt your own work for this... Smiley Happy
PROC Star
Posts: 1,833

Re: Fill missing with the next known values across columns

[ Edited ]

@panda Plz try and let me know

 

data have;
   input id hour0-hour23;
   cards;
1 . . . . . . 1 1 . . 1 1 1 1 . 1 1 1 . . . . . .
2 . . . . . . . . 1 1 1 1 1 . . . 1 1 1 . 1 . . .
3 1 1 . . 1 1 . . . . . . . . . . . . 1 1 1 . . .
4 1 1 1 1 . 1 . . . . . . . . . . 1 1 1 1 . 1 1 1
;

data want;
set have;
array t(*)hour0-hour23;
array t1(*) hour23-hour0;
array t2(*) hour0-hour5;
array t3(*) hour18-hour23;
k=whichn(1,of t(*));
k1=dim(t)-whichn(1,of t1(*));
k2=whichn(1,of t2(*));
if whichn(1,of t(*))<7 then do;
do _n_=k2 to dim(t2);
if missing(t2(_n_)) then t2(_n_)= coalesce(of t2(*));
end;
do _n_= dim(t3) by -1 to 1;
if missing( t3(_n_)) then t3(_n_)= coalesce(of t3(*));
end;
end;
else if 7<=whichn(1,of t(*))<=18 then do;
do _n_=k to k1;
t(_n_)=coalesce(of t(*));
end;
end;
drop k:;
run;

 

Also plz test by replacing

do _n_= dim(t3) by -1 to 1;

in the above with

 

do _n_= dim(t3) by -1 to whichn(1,of t3(*)); 

 

 

Contributor
Posts: 44

Re: Fill missing with the next known values across columns

Posted in reply to novinosrin

Sorry to bug you again, but instead of assuming hour5 to be the last hour of non-missing value in the morning, and hour18 to be the first hour of non-missing value after 12. Is there a way to detect the position of last non-missing value before 12pm and first non-missing value after 12pm? 

 

If my data is like this: 

data have;
   input id hour0-hour23;
   cards;
1 . . . . . . 1 1 . . 1 1 1 1 . 1 1 1 . . . . . .
2 . . . . . . . . 1 1 1 1 1 . . . 1 1 1 . 1 . . .
3 1 1 . . 1 1 . 1 . . . . . . . . . . 1 1 1 . . .
4 1 1 1 1 . 1 . . . . . . . . . . 1 1 1 1 . 1 1 1
;

Your solution will leave a blank on hour 6 for id 3. 

 

Thanks again!!

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 16 replies
  • 205 views
  • 1 like
  • 2 in conversation