BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
panda
Quartz | Level 8

Hi, 

My current data looks like this, hour0-hour23 mean the 24 hours across the day. This data tends to record the time of work. Some people started work during daytime and some started work at night. So the start hour is 6a for some and 6p for others. 

data have;
   input id hour0-hour23;
   cards;
1 . . . . . . 1 1 . . 1 1 1 1 . 1 1 1 . . . . . .
2 . . . . . . . . 1 1 1 1 1 . . . 1 1 1 . 1 . . .
3 1 1 . . 1 1 . . . . . . . . . . . . 1 1 1 . . .
4 1 1 1 1 . 1 . . . . . . . . . . 1 1 1 1 . 1 1 1
;

I want to fill the missing values between the columns while people at work, but leave the column as missing at night if people started work in the morning and leave the column as missing during the day if people started work at night. The ideal want data looks like this:

data want;
input id hour0-hour23;
cards;
1 . . . . . . 1 1 1 1 1 1 1 1 1 1 1 1 . . . . . .
2 . . . . . . . . 1 1 1 1 1 1 1 1 1 1 1 1 1 . . .
3 1 1 1 1 1 1 . . . . . . . . . . . . 1 1 1 1 1 1
4 1 1 1 1 1 1 . . . . . . . . . . 1 1 1 1 1 1 1 1
;

I am not sure how to do that. Any idea? Thanks!!

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

@panda No need for words like sorry in a knowledge sharing forum. Please feel free to post any number of questions/answers etc.

OK, try the modified based on your latest description--

 

data have;
   input id hour0-hour23;
   cards;
1 . . . . . . 1 1 . . 1 1 1 1 . 1 1 1 . . . . . .
2 . . . . . . . . 1 1 1 1 1 . . . 1 1 1 . 1 . . .
3 1 1 . . 1 1 . 1 . . . . . . . . . . 1 1 1 . . .
4 1 1 1 1 . 1 . . . . . . . . . . 1 1 1 1 . 1 1 1
;

data want;
set have;
array t(*)hour0-hour23;
array t1(*) hour23-hour0;
k=whichn(1,of t(*));
k1=dim(t)-whichn(1,of t1(*));
if whichn(1,of t(*))<7 then do;
do _n_=12 by -1 to 1; /*check before 12*/
if not missing(t(_n_)) then k2=1;
else if missing(t(_n_)) and k2 then t(_n_)= coalesce(of t(*));
end;
k2=0;
do _n_= 14 to dim(t); /*check after 12*/
if not missing(t(_n_)) then k2=1;
else if missing(t(_n_)) and k2 then t(_n_)= coalesce(of t(*));
end;
end;
else if 7<=whichn(1,of t(*))<=18 then do;
do _n_=k to k1;
t(_n_)=coalesce(of t(*));
end;
end;
drop k:;
run;

View solution in original post

16 REPLIES 16
novinosrin
Tourmaline | Level 20
data have;
   input id hour0-hour23;
   cards;
1 . . . . . . 1 1 . . 1 1 1 1 . 1 1 1 . . . . . .
2 . . . . . . . . 1 1 1 1 1 . . . 1 1 1 . 1 . . .
3 1 1 . . 1 1 . . . . . . . . . . . . 1 1 1 . . .
4 1 1 1 1 . 1 . . . . . . . . . . 1 1 1 1 . 1 1 1
;
data want;
set have;
array t(*)hour0-hour23;
array t1(*) hour23-hour0;
k=whichn(1,of t(*));
k1=dim(t)-whichn(1,of t1(*));
if whichn(1,of t(*))<7 then do;
do _n_=1 to dim(t);
if (1<=_n_<=6) or (19<=_n_<=dim(t)) and missing(t(_n_)) then t(_n_)=coalesce(of t(*));
end;
end;
else if 7<=whichn(1,of t(*))<=18 then do;
do _n_=k to k1;
t(_n_)=coalesce(of t(*));
end;
end;
drop k:;
run;
panda
Quartz | Level 8

Thanks @novinosrin! Your code works perfectly. Could you explain what does _n_ mean? 

 

Especially I can't understand the logic of this line of code:

if (1<=_n_<=6) or (19<=_n_<=dim(t)) and missing(t(_n_)) then t(_n_)=coalesce(of t(*));

Thanks!!

novinosrin
Tourmaline | Level 20

@pandaHang on there is some fix needed which i failed to test effectively. I am gonna fix it properly to make it perfect. I sincerely apologise for some lack of attention to detail. Bear with me. I will be right back with a super clean code . Right now, I am very hungry at 2:20pm. Will message back after lunch

novinosrin
Tourmaline | Level 20

Ok, I think the above should work as I believe I am thinking something that may not be warranted. However, I will stay open for any changes required should you notice any erroneous results. So please feel free to reach out the very moment you come across any issue

It's a very interesting question

panda
Quartz | Level 8
Thanks! The code worked perfectly for me now. Does _n_ mean the count of column which has the first non-missing value?
novinosrin
Tourmaline | Level 20

_n_  here is an index variable I am using to loop through the array. _n_ happens to be an automatic variable that is not written to the dataset. So typical lazy blokes like me who use that a lot as I am always lazy to type an extra statement. 

panda
Quartz | Level 8
That makes sense. Sorry I am really not familiar with the whichn function. Could you elaborate on how you decide the cut off time (number): 7, 6, 19, 18? Many thanks!
novinosrin
Tourmaline | Level 20

This description should give an easy intuition-

 

 

 timehrarray_unit
am1201
am112
am223
am334
am445
am556
am667
am778
am889
am9910
am101011
am111112
pm121213
pm11314
pm21415
pm31516
pm41617
pm51718
pm61819
pm71920
pm82021
pm92122
pm102223
pm112324

 

I will definitely come up with some notes to explain the code and logic a bit later. I am working on my assignment now, so little busy

panda
Quartz | Level 8
Got it, thanks! This is very helpful! 🙂
panda
Quartz | Level 8

Sorry but I just realized the current code still does not work like what I want for people who work at night. 

do _n_=1 to dim(t);
if (1<=_n_<=6) or (19<=_n_<=dim(t)) and missing(t(_n_)) then t(_n_)=coalesce(of t(*));
end;
end;

For this part, it seems that you assume the end time of work is 5a, and start time of work is 6p. In reality the time of start and end varies between people. It would be perfect if the code can fill the blanks from 0a to the last non-missing 1 in the morning, and fill the blanks from first nonmissing 1 from afternoon to 0a. I tried this but did not work well:

do _n_=1 to k;
if missing(t(_n_)) then t(_n_)=coalesce(of t(*));
end;
do _n_= k1+1 to dim(t);
if missing(t(_n_)) then t(_n_)=coalesce(of t(*));
end;
end;

Any suggestions?  Thanks!!

novinosrin
Tourmaline | Level 20

My apprehensions has just come true lol. Will be back with the fix., How urgent it is? is it ok, if i give right after my class lectures?

panda
Quartz | Level 8
Thanks for the prompt reply! I can definitely wait till you finish your class! Sorry to interrupt your own work for this... 🙂
novinosrin
Tourmaline | Level 20

@panda Plz try and let me know

 

data have;
   input id hour0-hour23;
   cards;
1 . . . . . . 1 1 . . 1 1 1 1 . 1 1 1 . . . . . .
2 . . . . . . . . 1 1 1 1 1 . . . 1 1 1 . 1 . . .
3 1 1 . . 1 1 . . . . . . . . . . . . 1 1 1 . . .
4 1 1 1 1 . 1 . . . . . . . . . . 1 1 1 1 . 1 1 1
;

data want;
set have;
array t(*)hour0-hour23;
array t1(*) hour23-hour0;
array t2(*) hour0-hour5;
array t3(*) hour18-hour23;
k=whichn(1,of t(*));
k1=dim(t)-whichn(1,of t1(*));
k2=whichn(1,of t2(*));
if whichn(1,of t(*))<7 then do;
do _n_=k2 to dim(t2);
if missing(t2(_n_)) then t2(_n_)= coalesce(of t2(*));
end;
do _n_= dim(t3) by -1 to 1;
if missing( t3(_n_)) then t3(_n_)= coalesce(of t3(*));
end;
end;
else if 7<=whichn(1,of t(*))<=18 then do;
do _n_=k to k1;
t(_n_)=coalesce(of t(*));
end;
end;
drop k:;
run;

 

Also plz test by replacing

do _n_= dim(t3) by -1 to 1;

in the above with

 

do _n_= dim(t3) by -1 to whichn(1,of t3(*)); 

 

 

panda
Quartz | Level 8

Sorry to bug you again, but instead of assuming hour5 to be the last hour of non-missing value in the morning, and hour18 to be the first hour of non-missing value after 12. Is there a way to detect the position of last non-missing value before 12pm and first non-missing value after 12pm? 

 

If my data is like this: 

data have;
   input id hour0-hour23;
   cards;
1 . . . . . . 1 1 . . 1 1 1 1 . 1 1 1 . . . . . .
2 . . . . . . . . 1 1 1 1 1 . . . 1 1 1 . 1 . . .
3 1 1 . . 1 1 . 1 . . . . . . . . . . 1 1 1 . . .
4 1 1 1 1 . 1 . . . . . . . . . . 1 1 1 1 . 1 1 1
;

Your solution will leave a blank on hour 6 for id 3. 

 

Thanks again!!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 16 replies
  • 1743 views
  • 1 like
  • 2 in conversation