Hi,
My current data looks like this, hour0-hour23 mean the 24 hours across the day. This data tends to record the time of work. Some people started work during daytime and some started work at night. So the start hour is 6a for some and 6p for others.
data have;
input id hour0-hour23;
cards;
1 . . . . . . 1 1 . . 1 1 1 1 . 1 1 1 . . . . . .
2 . . . . . . . . 1 1 1 1 1 . . . 1 1 1 . 1 . . .
3 1 1 . . 1 1 . . . . . . . . . . . . 1 1 1 . . .
4 1 1 1 1 . 1 . . . . . . . . . . 1 1 1 1 . 1 1 1
;
I want to fill the missing values between the columns while people at work, but leave the column as missing at night if people started work in the morning and leave the column as missing during the day if people started work at night. The ideal want data looks like this:
data want;
input id hour0-hour23;
cards;
1 . . . . . . 1 1 1 1 1 1 1 1 1 1 1 1 . . . . . .
2 . . . . . . . . 1 1 1 1 1 1 1 1 1 1 1 1 1 . . .
3 1 1 1 1 1 1 . . . . . . . . . . . . 1 1 1 1 1 1
4 1 1 1 1 1 1 . . . . . . . . . . 1 1 1 1 1 1 1 1
;
I am not sure how to do that. Any idea? Thanks!!
@panda No need for words like sorry in a knowledge sharing forum. Please feel free to post any number of questions/answers etc.
OK, try the modified based on your latest description--
data have;
input id hour0-hour23;
cards;
1 . . . . . . 1 1 . . 1 1 1 1 . 1 1 1 . . . . . .
2 . . . . . . . . 1 1 1 1 1 . . . 1 1 1 . 1 . . .
3 1 1 . . 1 1 . 1 . . . . . . . . . . 1 1 1 . . .
4 1 1 1 1 . 1 . . . . . . . . . . 1 1 1 1 . 1 1 1
;
data want;
set have;
array t(*)hour0-hour23;
array t1(*) hour23-hour0;
k=whichn(1,of t(*));
k1=dim(t)-whichn(1,of t1(*));
if whichn(1,of t(*))<7 then do;
do _n_=12 by -1 to 1; /*check before 12*/
if not missing(t(_n_)) then k2=1;
else if missing(t(_n_)) and k2 then t(_n_)= coalesce(of t(*));
end;
k2=0;
do _n_= 14 to dim(t); /*check after 12*/
if not missing(t(_n_)) then k2=1;
else if missing(t(_n_)) and k2 then t(_n_)= coalesce(of t(*));
end;
end;
else if 7<=whichn(1,of t(*))<=18 then do;
do _n_=k to k1;
t(_n_)=coalesce(of t(*));
end;
end;
drop k:;
run;
data have;
input id hour0-hour23;
cards;
1 . . . . . . 1 1 . . 1 1 1 1 . 1 1 1 . . . . . .
2 . . . . . . . . 1 1 1 1 1 . . . 1 1 1 . 1 . . .
3 1 1 . . 1 1 . . . . . . . . . . . . 1 1 1 . . .
4 1 1 1 1 . 1 . . . . . . . . . . 1 1 1 1 . 1 1 1
;
data want;
set have;
array t(*)hour0-hour23;
array t1(*) hour23-hour0;
k=whichn(1,of t(*));
k1=dim(t)-whichn(1,of t1(*));
if whichn(1,of t(*))<7 then do;
do _n_=1 to dim(t);
if (1<=_n_<=6) or (19<=_n_<=dim(t)) and missing(t(_n_)) then t(_n_)=coalesce(of t(*));
end;
end;
else if 7<=whichn(1,of t(*))<=18 then do;
do _n_=k to k1;
t(_n_)=coalesce(of t(*));
end;
end;
drop k:;
run;
Thanks @novinosrin! Your code works perfectly. Could you explain what does _n_ mean?
Especially I can't understand the logic of this line of code:
if (1<=_n_<=6) or (19<=_n_<=dim(t)) and missing(t(_n_)) then t(_n_)=coalesce(of t(*));
Thanks!!
@pandaHang on there is some fix needed which i failed to test effectively. I am gonna fix it properly to make it perfect. I sincerely apologise for some lack of attention to detail. Bear with me. I will be right back with a super clean code . Right now, I am very hungry at 2:20pm. Will message back after lunch
Ok, I think the above should work as I believe I am thinking something that may not be warranted. However, I will stay open for any changes required should you notice any erroneous results. So please feel free to reach out the very moment you come across any issue.
It's a very interesting question
_n_ here is an index variable I am using to loop through the array. _n_ happens to be an automatic variable that is not written to the dataset. So typical lazy blokes like me who use that a lot as I am always lazy to type an extra statement.
This description should give an easy intuition-
time | hr | array_unit | |
am | 12 | 0 | 1 |
am | 1 | 1 | 2 |
am | 2 | 2 | 3 |
am | 3 | 3 | 4 |
am | 4 | 4 | 5 |
am | 5 | 5 | 6 |
am | 6 | 6 | 7 |
am | 7 | 7 | 8 |
am | 8 | 8 | 9 |
am | 9 | 9 | 10 |
am | 10 | 10 | 11 |
am | 11 | 11 | 12 |
pm | 12 | 12 | 13 |
pm | 1 | 13 | 14 |
pm | 2 | 14 | 15 |
pm | 3 | 15 | 16 |
pm | 4 | 16 | 17 |
pm | 5 | 17 | 18 |
pm | 6 | 18 | 19 |
pm | 7 | 19 | 20 |
pm | 8 | 20 | 21 |
pm | 9 | 21 | 22 |
pm | 10 | 22 | 23 |
pm | 11 | 23 | 24 |
I will definitely come up with some notes to explain the code and logic a bit later. I am working on my assignment now, so little busy
Sorry but I just realized the current code still does not work like what I want for people who work at night.
do _n_=1 to dim(t); if (1<=_n_<=6) or (19<=_n_<=dim(t)) and missing(t(_n_)) then t(_n_)=coalesce(of t(*)); end; end;
For this part, it seems that you assume the end time of work is 5a, and start time of work is 6p. In reality the time of start and end varies between people. It would be perfect if the code can fill the blanks from 0a to the last non-missing 1 in the morning, and fill the blanks from first nonmissing 1 from afternoon to 0a. I tried this but did not work well:
do _n_=1 to k; if missing(t(_n_)) then t(_n_)=coalesce(of t(*)); end; do _n_= k1+1 to dim(t); if missing(t(_n_)) then t(_n_)=coalesce(of t(*)); end; end;
Any suggestions? Thanks!!
My apprehensions has just come true lol. Will be back with the fix., How urgent it is? is it ok, if i give right after my class lectures?
@panda Plz try and let me know
data have;
input id hour0-hour23;
cards;
1 . . . . . . 1 1 . . 1 1 1 1 . 1 1 1 . . . . . .
2 . . . . . . . . 1 1 1 1 1 . . . 1 1 1 . 1 . . .
3 1 1 . . 1 1 . . . . . . . . . . . . 1 1 1 . . .
4 1 1 1 1 . 1 . . . . . . . . . . 1 1 1 1 . 1 1 1
;
data want;
set have;
array t(*)hour0-hour23;
array t1(*) hour23-hour0;
array t2(*) hour0-hour5;
array t3(*) hour18-hour23;
k=whichn(1,of t(*));
k1=dim(t)-whichn(1,of t1(*));
k2=whichn(1,of t2(*));
if whichn(1,of t(*))<7 then do;
do _n_=k2 to dim(t2);
if missing(t2(_n_)) then t2(_n_)= coalesce(of t2(*));
end;
do _n_= dim(t3) by -1 to 1;
if missing( t3(_n_)) then t3(_n_)= coalesce(of t3(*));
end;
end;
else if 7<=whichn(1,of t(*))<=18 then do;
do _n_=k to k1;
t(_n_)=coalesce(of t(*));
end;
end;
drop k:;
run;
Also plz test by replacing
do _n_= dim(t3) by -1 to 1;
in the above with
do _n_= dim(t3) by -1 to whichn(1,of t3(*));
Sorry to bug you again, but instead of assuming hour5 to be the last hour of non-missing value in the morning, and hour18 to be the first hour of non-missing value after 12. Is there a way to detect the position of last non-missing value before 12pm and first non-missing value after 12pm?
If my data is like this:
data have;
input id hour0-hour23;
cards;
1 . . . . . . 1 1 . . 1 1 1 1 . 1 1 1 . . . . . .
2 . . . . . . . . 1 1 1 1 1 . . . 1 1 1 . 1 . . .
3 1 1 . . 1 1 . 1 . . . . . . . . . . 1 1 1 . . .
4 1 1 1 1 . 1 . . . . . . . . . . 1 1 1 1 . 1 1 1
;
Your solution will leave a blank on hour 6 for id 3.
Thanks again!!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.