Hello Community,
I am attempting to solve this coding issue. I have the dataset below. The status hour is set based on time in my dataset and then I need to count 24 hours from there to complete an analysis on just those set of hours once they go on status. I am attempting to count 24 hours from the point they go on status. The easiest way is for me to go to the source excel and add the '1' status for the subsequent 24 hours but it is not efficient as I will have many IDs I would have to do this for. So far my attempts count the spots that do not have a status. Can I please have help either adding 1 to the 23 rows below when status appears or just counting 24 rows with the new variable 'Status hour'.
Here is a few things I have tried from other threads to this point but it is not what I am looking for.
Please ask if I can clarify anything and thank you in advance for any and all help.
data Want;
set Have;
by ID;
if Status=1
then Status_Hour= 1;
else Status_Hour + 1;
run;
data Want;
set Have;
by ID Status notsorted;
if first.Status then Status_Hour= 0;
Status_Hour + 1;
run;
data Want;
set Have;
by ID;
if Status=1 then Status_Hour=1;
if first.ID_ then Status_Hour+1;
run;
Have:
ID | Date | Hour | Status | Status Hour |
223257 | 26-Jul-23 | 1 | ||
223257 | 26-Jul-23 | 2 | ||
223257 | 26-Jul-23 | 3 | ||
223257 | 26-Jul-23 | 4 | ||
223257 | 26-Jul-23 | 5 | ||
223257 | 26-Jul-23 | 6 | ||
223257 | 26-Jul-23 | 7 | ||
223257 | 26-Jul-23 | 8 | 1 | |
223257 | 26-Jul-23 | 9 | ||
223257 | 26-Jul-23 | 10 | ||
223257 | 26-Jul-23 | 11 | ||
223257 | 26-Jul-23 | 12 | ||
223257 | 26-Jul-23 | 13 | ||
223257 | 26-Jul-23 | 14 | ||
232706 | 26-Jul-23 | 1 | ||
232706 | 26-Jul-23 | 2 | ||
232706 | 26-Jul-23 | 3 | ||
232706 | 26-Jul-23 | 4 | ||
232706 | 26-Jul-23 | 5 | ||
232706 | 26-Jul-23 | 6 | ||
232706 | 26-Jul-23 | 7 | ||
232706 | 26-Jul-23 | 8 | ||
232706 | 26-Jul-23 | 9 | ||
232706 | 26-Jul-23 | 10 | ||
232706 | 26-Jul-23 | 11 | ||
232706 | 26-Jul-23 | 12 | 1 | |
232706 | 26-Jul-23 | 13 | ||
232706 | 26-Jul-23 | 14 | ||
232706 | 26-Jul-23 | 15 | ||
232706 | 26-Jul-23 | 16 |
Want:
ID | Date | Hour | Status | Status Hour |
223257 | 26-Jul-23 | 1 | ||
223257 | 26-Jul-23 | 2 | ||
223257 | 26-Jul-23 | 3 | ||
223257 | 26-Jul-23 | 4 | ||
223257 | 26-Jul-23 | 5 | ||
223257 | 26-Jul-23 | 6 | ||
223257 | 26-Jul-23 | 7 | ||
223257 | 26-Jul-23 | 8 | 1 | 1 |
223257 | 26-Jul-23 | 9 | 2 | |
223257 | 26-Jul-23 | 10 | 3 | |
223257 | 26-Jul-23 | 11 | 4 | |
223257 | 26-Jul-23 | 12 | 5 | |
223257 | 26-Jul-23 | 13 | 6 | |
223257 | 26-Jul-23 | 14 | 7 | |
232706 | 26-Jul-23 | 1 | ||
232706 | 26-Jul-23 | 2 | ||
232706 | 26-Jul-23 | 3 | ||
232706 | 26-Jul-23 | 4 | ||
232706 | 26-Jul-23 | 5 | ||
232706 | 26-Jul-23 | 6 | ||
232706 | 26-Jul-23 | 7 | ||
232706 | 26-Jul-23 | 8 | ||
232706 | 26-Jul-23 | 9 | ||
232706 | 26-Jul-23 | 10 | ||
232706 | 26-Jul-23 | 11 | ||
232706 | 26-Jul-23 | 12 | 1 | 1 |
232706 | 26-Jul-23 | 13 | 2 | |
232706 | 26-Jul-23 | 14 | 3 | |
232706 | 26-Jul-23 | 15 | 4 | |
232706 | 26-Jul-23 | 16 | 5 |
Under the assumption you have no missing hours in your data and that I understood your requirement correctly below should work.
data want;
set have;
by id date hour;
if first.id or status_hour=24 then status_hour=0;
else if status=1 then status_hour=1;
else if status_hour>0 then status_hour+1;
run;
Are those "hour" values of 1, 2, 3 actually used in your analysis or other process or are you just showing them as example counters to show the rows selected?
You also do not show a full set of 24. There are a couple ways "count 24 hours" might be interpreted and it helps to have an actual example.
Please provide example data in the form of working data step code pasted into a text box such as:
data have; infile datalines missover; input ID Date :date10. Hour Status; format date date9.; datalines; 223257 26-Jul-23 1 223257 26-Jul-23 2 223257 26-Jul-23 3 223257 26-Jul-23 4 223257 26-Jul-23 5 223257 26-Jul-23 6 223257 26-Jul-23 7 223257 26-Jul-23 8 1 223257 26-Jul-23 9 223257 26-Jul-23 10 223257 26-Jul-23 11 223257 26-Jul-23 12 223257 26-Jul-23 13 223257 26-Jul-23 14 232706 26-Jul-23 1 232706 26-Jul-23 2 232706 26-Jul-23 3 232706 26-Jul-23 4 232706 26-Jul-23 5 232706 26-Jul-23 6 232706 26-Jul-23 7 232706 26-Jul-23 8 232706 26-Jul-23 9 232706 26-Jul-23 10 232706 26-Jul-23 11 232706 26-Jul-23 12 1 232706 26-Jul-23 13 232706 26-Jul-23 14 232706 26-Jul-23 15 232706 26-Jul-23 16 ;
Is that similar enough to your actual data to serve as a Have example?
I felt that the post was really long and I shorten it. The data set per ID goes for days several hundreds of rows. The hour being named could be of use but not needed as I can get it there if instead of counting the hours this was solved simply by making the status repeat to the 23 rows below. Whatever is easiest honestly.
Here is a dataset with all 24 hours included.
ID | Date | Hour | Status | Status Hour |
223257 | 26-Jul-23 | 1 | ||
223257 | 26-Jul-23 | 2 | ||
223257 | 26-Jul-23 | 3 | ||
223257 | 26-Jul-23 | 4 | ||
223257 | 26-Jul-23 | 5 | ||
223257 | 26-Jul-23 | 6 | ||
223257 | 26-Jul-23 | 7 | ||
223257 | 26-Jul-23 | 8 | 1 | 1 |
223257 | 26-Jul-23 | 9 | 2 | |
223257 | 26-Jul-23 | 10 | 3 | |
223257 | 26-Jul-23 | 11 | 4 | |
223257 | 26-Jul-23 | 12 | 5 | |
223257 | 26-Jul-23 | 13 | 6 | |
223257 | 26-Jul-23 | 14 | 7 | |
223257 | 26-Jul-23 | 15 | 8 | |
223257 | 26-Jul-23 | 16 | 9 | |
223257 | 26-Jul-23 | 17 | 10 | |
223257 | 26-Jul-23 | 18 | 11 | |
223257 | 26-Jul-23 | 19 | 12 | |
223257 | 26-Jul-23 | 20 | 13 | |
223257 | 26-Jul-23 | 21 | 14 | |
223257 | 26-Jul-23 | 22 | 15 | |
223257 | 26-Jul-23 | 23 | 16 | |
223257 | 27-Jul-23 | 1 | 17 | |
223257 | 27-Jul-23 | 2 | 18 | |
223257 | 27-Jul-23 | 3 | 19 | |
223257 | 27-Jul-23 | 4 | 20 | |
223257 | 27-Jul-23 | 5 | 21 | |
223257 | 27-Jul-23 | 6 | 22 | |
223257 | 27-Jul-23 | 7 | 23 | |
223257 | 27-Jul-23 | 8 | 24 | |
223257 | 27-Jul-23 | 9 | ||
223257 | 27-Jul-23 | 10 | ||
223257 | … | … | ||
232706 | 26-Jul-23 | 1 | ||
232706 | 26-Jul-23 | 2 | ||
232706 | 26-Jul-23 | 3 | ||
232706 | 26-Jul-23 | 4 | ||
232706 | 26-Jul-23 | 5 | ||
232706 | 26-Jul-23 | 6 | ||
232706 | 26-Jul-23 | 7 | ||
232706 | 26-Jul-23 | 8 | ||
232706 | 26-Jul-23 | 9 | ||
232706 | 26-Jul-23 | 10 | ||
232706 | 26-Jul-23 | 11 | ||
232706 | 26-Jul-23 | 12 | 1 | 1 |
232706 | 26-Jul-23 | 13 | 2 | |
232706 | 26-Jul-23 | 14 | 3 | |
232706 | 26-Jul-23 | 15 | 4 | |
232706 | 26-Jul-23 | 16 | 5 | |
232706 | 26-Jul-23 | 17 | 6 | |
232706 | 26-Jul-23 | 18 | 7 | |
232706 | 26-Jul-23 | 19 | 8 | |
232706 | 26-Jul-23 | 20 | 9 | |
232706 | 26-Jul-23 | 21 | 10 | |
232706 | 26-Jul-23 | 22 | 11 | |
232706 | 26-Jul-23 | 23 | 12 | |
232706 | 26-Jul-23 | 24 | 13 | |
232706 | 27-Jul-23 | 1 | 14 | |
232706 | 27-Jul-23 | 2 | 15 | |
232706 | 27-Jul-23 | 3 | 16 | |
232706 | 27-Jul-23 | 4 | 17 | |
232706 | 27-Jul-23 | 5 | 18 | |
232706 | 27-Jul-23 | 6 | 19 | |
232706 | 27-Jul-23 | 7 | 20 | |
232706 | 27-Jul-23 | 8 | 21 | |
232706 | 27-Jul-23 | 9 | 22 | |
232706 | 27-Jul-23 | 10 | 23 | |
232706 | 27-Jul-23 | 11 | 24 | |
232706 | 27-Jul-23 | 12 | ||
232706 | 27-Jul-23 | 13 | ||
232706 | 27-Jul-23 | 14 |
Sorry I am putting it into a dataset now one moment.
Here it is.
data Testing;
input ID Date Hour Status;
format date date9.;
datalines;
223257 26-Jul-23 1
223257 26-Jul-23 2
223257 26-Jul-23 3
223257 26-Jul-23 4
223257 26-Jul-23 5
223257 26-Jul-23 6
223257 26-Jul-23 7
223257 26-Jul-23 8 1
223257 26-Jul-23 9
223257 26-Jul-23 10
223257 26-Jul-23 11
223257 26-Jul-23 12
223257 26-Jul-23 13
223257 26-Jul-23 14
223257 26-Jul-23 15
223257 26-Jul-23 16
223257 26-Jul-23 17
223257 26-Jul-23 18
223257 26-Jul-23 19
223257 26-Jul-23 20
223257 26-Jul-23 21
223257 26-Jul-23 22
223257 26-Jul-23 23
223257 27-Jul-23 1
223257 27-Jul-23 2
223257 27-Jul-23 3
223257 27-Jul-23 4
223257 27-Jul-23 5
223257 27-Jul-23 6
223257 27-Jul-23 7
223257 27-Jul-23 8
223257 27-Jul-23 9
223257 27-Jul-23 10
232706 26-Jul-23 1
232706 26-Jul-23 2
232706 26-Jul-23 3
232706 26-Jul-23 4
232706 26-Jul-23 5
232706 26-Jul-23 6
232706 26-Jul-23 7
232706 26-Jul-23 8
232706 26-Jul-23 9
232706 26-Jul-23 10
232706 26-Jul-23 11
232706 26-Jul-23 12 1
232706 26-Jul-23 13
232706 26-Jul-23 14
232706 26-Jul-23 15
232706 26-Jul-23 16
232706 26-Jul-23 17
232706 26-Jul-23 18
232706 26-Jul-23 19
232706 26-Jul-23 20
232706 26-Jul-23 21
232706 26-Jul-23 22
232706 26-Jul-23 23
232706 26-Jul-23 24
232706 27-Jul-23 1
232706 27-Jul-23 2
232706 27-Jul-23 3
232706 27-Jul-23 4
232706 27-Jul-23 5
232706 27-Jul-23 6
232706 27-Jul-23 7
232706 27-Jul-23 8
232706 27-Jul-23 9
232706 27-Jul-23 10
232706 27-Jul-23 11
232706 27-Jul-23 12
232706 27-Jul-23 13
232706 27-Jul-23 14
;
run;
Is there some reason you are excluding midnight from your data? I would expect to see one of
26-Jul-23 24 OR 27-Jul-23 0 as a midnight value for ID 223257?
The below code doesn't exactly match your "want" because you don't have a midnight for 223257 and do for the 232706 id. So the definition of "24 hours" differs a bit between the two. If there is data problem with midnight consistently I might look into that. When midnight is not present your actual time interval between start and end is 25 hours. So something needs to be considered about either the problem requirement or the data condition when starting.
You do want to test your data step code before showing it. You didn't apply an informat for date so that throws invalid data messages and without the Infile datalines missover; (or truncover) only every other line of the data is actually read sort of because of the missing value for STATUS on most of your lines. The option Missover or Truncover will stop reading when the input line runs out. Without it the ID value is read as STATUS for most of the lines and the reading pointer advances at that point.
Quite often a process that uses date and time is best done by creating a DATETIME value and appropriate functions with those values because then SAS will take care of those pesky change of date behaviors.
My take on your problem (note the corrected data step code).
data Testing; infile datalines missover; input ID Date :date10. Hour Status; format date date9.; datalines; 223257 26-Jul-23 1 223257 26-Jul-23 2 223257 26-Jul-23 3 223257 26-Jul-23 4 223257 26-Jul-23 5 223257 26-Jul-23 6 223257 26-Jul-23 7 223257 26-Jul-23 8 1 223257 26-Jul-23 9 223257 26-Jul-23 10 223257 26-Jul-23 11 223257 26-Jul-23 12 223257 26-Jul-23 13 223257 26-Jul-23 14 223257 26-Jul-23 15 223257 26-Jul-23 16 223257 26-Jul-23 17 223257 26-Jul-23 18 223257 26-Jul-23 19 223257 26-Jul-23 20 223257 26-Jul-23 21 223257 26-Jul-23 22 223257 26-Jul-23 23 223257 27-Jul-23 1 223257 27-Jul-23 2 223257 27-Jul-23 3 223257 27-Jul-23 4 223257 27-Jul-23 5 223257 27-Jul-23 6 223257 27-Jul-23 7 223257 27-Jul-23 8 223257 27-Jul-23 9 223257 27-Jul-23 10 232706 26-Jul-23 1 232706 26-Jul-23 2 232706 26-Jul-23 3 232706 26-Jul-23 4 232706 26-Jul-23 5 232706 26-Jul-23 6 232706 26-Jul-23 7 232706 26-Jul-23 8 232706 26-Jul-23 9 232706 26-Jul-23 10 232706 26-Jul-23 11 232706 26-Jul-23 12 1 232706 26-Jul-23 13 232706 26-Jul-23 14 232706 26-Jul-23 15 232706 26-Jul-23 16 232706 26-Jul-23 17 232706 26-Jul-23 18 232706 26-Jul-23 19 232706 26-Jul-23 20 232706 26-Jul-23 21 232706 26-Jul-23 22 232706 26-Jul-23 23 232706 26-Jul-23 24 232706 27-Jul-23 1 232706 27-Jul-23 2 232706 27-Jul-23 3 232706 27-Jul-23 4 232706 27-Jul-23 5 232706 27-Jul-23 6 232706 27-Jul-23 7 232706 27-Jul-23 8 232706 27-Jul-23 9 232706 27-Jul-23 10 232706 27-Jul-23 11 232706 27-Jul-23 12 232706 27-Jul-23 13 232706 27-Jul-23 14 ; run; proc sql; create table want as select b.id, b.date, b.hour ,b.datetime2 format datetime18. from (select id, date,hour,status, dhms(date,hour,0,0) as datetime from testing where status=1) as a left join (select id, date,hour,status, dhms(date,hour,0,0) as datetime2 from testing) as b on a.id=b.id and a.datetime le b.datetime2 le intnx('hour',a.datetime,24,'b') order by b.id, b.date, b.hour ; quit;
The Input statement wants some way to know that DATE is to be read as date value using an informat such as Date10. The :date10. on the input statement tells SAS to read the next value and attempt to use that format. You could also use an INFORMAT statement to assign a format to the variable .
Proc SQL is a way to match records. Select lists variables or calculated values. The From provides instructions of where data comes from. Where places constraints on what is selected.
Note that placing a select /from /where inside paraentheses as a sub query provides one group of values, in this case those that only have Status=1 and refers to the whole set with the alias A. Values from that group are addressed with a.<variable name>.
Note that I used the DHMS function to combine the date and hour into a single datetime value.
The Left JOIN says take every record from A and join on the second alias (all of the records in testing plus an added datetime variable) where the ID is the same and the second datetime variable is between the Status=1 datetime and , using the INTNX function to advance that time, up to + 24 hours.
Under the assumption you have no missing hours in your data and that I understood your requirement correctly below should work.
data want;
set have;
by id date hour;
if first.id or status_hour=24 then status_hour=0;
else if status=1 then status_hour=1;
else if status_hour>0 then status_hour+1;
run;
Thank you very much this works perfectly. I applied to my dataset and Status_hour = 0 for the hours before status = 1 and 0 after 24 hours are counted. Then it does it for every ID. Thank you so very much.
Actually: For this to also work as desired for cases where status=1 and hour=1 I believe you need to remove the first ELSE.
I ran it both ways and there is no difference in the result. Both programs identify the hour that status =1 appears then applies a new variable Status_Hour as 1 and continues counting down 24 rows. Works perfectly either way.
Thank you for your time.
If you don't remove the first ELSE then the code won't work properly if status=1 for the very first row of a new ID.
Please use below code to always get a correct result.
data want;
set have;
by id date hour;
if first.id or status_hour=24 then status_hour=0;
if status=1 then status_hour=1;
else if status_hour>0 then status_hour+1;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.