Hi all,
Below is a very small part of my big messy data
data temp;
input ID Action $ TS_SER TS Verify Notecount;
informat TS_SER datetime14. TS datetime14.;
format TS_SER datetime14. TS datetime14.;
datalines;
252670 Sign 13DEC17:11:05 13DEC17:11:14 . .
252670 Perform 13DEC17:11:05 13DEC17:11:14 . .
252670 Verify 13DEC17:11:05 13DEC17:11:14 1 1
252670 Sign 13DEC17:15:27 13DEC17:15:41 . .
252670 Perform 13DEC17:15:27 13DEC17:15:41 . .
252670 Verify 13DEC17:15:27 13DEC17:15:41 1 2
252670 Sign 13DEC17:16:28 13DEC17:16:30 . .
252670 Perform 13DEC17:16:28 13DEC17:16:30 . .
252670 Verify 13DEC17:16:28 13DEC17:16:30 1 3
252670 Sign 13DEC17:16:29 13DEC17:16:30 . .
252670 Modify 13DEC17:16:29 13DEC17:16:30 . .
252670 Perform 14DEC17:07:32 14DEC17:07:34 . .
252670 Verify 14DEC17:07:32 14DEC17:07:34 1 4
252670 Sign 14DEC17:07:32 14DEC17:07:34 . .
252670 Perform 14DEC17:09:15 14DEC17:09:15 . .
252670 Verify 14DEC17:09:15 14DEC17:09:15 1 5
252670 Sign 14DEC17:09:15 14DEC17:09:15 . .
252670 Perform 14DEC17:14:18 14DEC17:16:29 . .
252670 Verify 14DEC17:14:18 14DEC17:16:29 1 6
252670 Sign 14DEC17:14:18 14DEC17:14:18 . .
252670 Sign 14DEC17:14:18 14DEC17:16:29 . .
252670 Sign 14DEC17:16:13 14DEC17:16:17 . .
252670 Perform 14DEC17:16:13 14DEC17:16:17 . .
252670 Verify 14DEC17:16:13 14DEC17:16:17 1 7
252670 Modify 14DEC17:16:29 14DEC17:16:29 . .
252670 Perform 14DEC17:16:29 14DEC17:16:29 . .
252670 Sign 14DEC17:16:29 14DEC17:16:29 . .
252670 Perform 14DEC17:19:12 14DEC17:19:15 . .
252670 Verify 14DEC17:19:12 14DEC17:19:15 1 8
252670 Sign 14DEC17:19:12 14DEC17:19:15 . .
252670 Perform 15DEC17:15:27 15DEC17:15:27 . .
252670 Verify 15DEC17:15:27 15DEC17:15:27 1 9
252670 Sign 15DEC17:15:27 15DEC17:15:27 . .
252670 Sign 16DEC17:10:38 16DEC17:10:38 . .
252670 Perform 16DEC17:10:38 16DEC17:10:38 . .
252670 Verify 16DEC17:10:38 16DEC17:10:38 1 10
318672 Sign 13OCT17:20:32 14OCT17:10:11 . .
318672 Modify 13OCT17:20:36 14OCT17:10:11 . .
318672 Modify 13OCT17:20:37 14OCT17:10:11 . .
318672 Modify 13OCT17:20:40 14OCT17:10:11 . .
318672 Modify 13OCT17:20:44 14OCT17:10:11 . .
318672 Modify 13OCT17:20:46 14OCT17:10:11 . .
318672 Modify 13OCT17:23:03 14OCT17:10:11 . .
318672 Sign 13OCT17:23:03 14OCT17:10:11 . .
318672 Sign 13OCT17:23:04 13OCT17:23:04 . .
318672 Verify 14OCT17:10:11 14OCT17:10:11 1 1
318672 Sign 14OCT17:10:11 14OCT17:10:11 . .
318672 Sign 14OCT17:10:26 14OCT17:10:52 . .
318672 Perform 14OCT17:10:26 14OCT17:14:09 . .
318672 Modify 14OCT17:10:52 14OCT17:14:09 . .
318672 Sign 14OCT17:10:54 14OCT17:10:54 . .
318672 Sign 14OCT17:10:54 14OCT17:14:09 . .
318672 Modify 14OCT17:10:54 14OCT17:14:09 . .
318672 Verify 14OCT17:10:54 14OCT17:14:09 1 2
;
run;
I want to retain value of notecount so my clean data would look like this (as below). I need to retain variable notecount based on TS_SER and TS.
data temp1;
input ID Action $ TS_SER TS Verify Notecount;
informat TS_SER datetime14. TS datetime14.;
format TS_SER datetime14. TS datetime14.;
datalines;
252670 Sign 13DEC17:11:05 13DEC17:11:14 . 1
252670 Perform 13DEC17:11:05 13DEC17:11:14 . 1
252670 Verify 13DEC17:11:05 13DEC17:11:14 1 1
252670 Sign 13DEC17:15:27 13DEC17:15:41 . 2
252670 Perform 13DEC17:15:27 13DEC17:15:41 . 2
252670 Verify 13DEC17:15:27 13DEC17:15:41 1 2
252670 Sign 13DEC17:16:28 13DEC17:16:30 . 3
252670 Perform 13DEC17:16:28 13DEC17:16:30 . 3
252670 Verify 13DEC17:16:28 13DEC17:16:30 1 3
252670 Sign 13DEC17:16:29 13DEC17:16:30 . 3
252670 Modify 13DEC17:16:29 13DEC17:16:30 . 3
252670 Perform 14DEC17:07:32 14DEC17:07:34 . 4
252670 Verify 14DEC17:07:32 14DEC17:07:34 1 4
252670 Sign 14DEC17:07:32 14DEC17:07:34 . 4
252670 Perform 14DEC17:09:15 14DEC17:09:15 . 5
252670 Verify 14DEC17:09:15 14DEC17:09:15 1 5
252670 Sign 14DEC17:09:15 14DEC17:09:15 . 5
252670 Perform 14DEC17:14:18 14DEC17:16:29 . 6
252670 Verify 14DEC17:14:18 14DEC17:16:29 1 6
252670 Sign 14DEC17:14:18 14DEC17:14:18 . 6
252670 Sign 14DEC17:14:18 14DEC17:16:29 . 6
252670 Modify 14DEC17:16:29 14DEC17:16:29 . 6
252670 Perform 14DEC17:16:29 14DEC17:16:29 . 6
252670 Sign 14DEC17:16:29 14DEC17:16:29 . 6
252670 Sign 14DEC17:16:13 14DEC17:16:17 . 7
252670 Perform 14DEC17:16:13 14DEC17:16:17 . 7
252670 Verify 14DEC17:16:13 14DEC17:16:17 1 7
252670 Perform 14DEC17:19:12 14DEC17:19:15 . 8
252670 Verify 14DEC17:19:12 14DEC17:19:15 1 8
252670 Sign 14DEC17:19:12 14DEC17:19:15 . 8
252670 Perform 15DEC17:15:27 15DEC17:15:27 . 9
252670 Verify 15DEC17:15:27 15DEC17:15:27 1 9
252670 Sign 15DEC17:15:27 15DEC17:15:27 . 9
252670 Sign 16DEC17:10:38 16DEC17:10:38 . 10
252670 Perform 16DEC17:10:38 16DEC17:10:38 . 10
252670 Verify 16DEC17:10:38 16DEC17:10:38 1 10
318672 Sign 13OCT17:20:32 14OCT17:10:11 . 1
318672 Modify 13OCT17:20:36 14OCT17:10:11 . 1
318672 Modify 13OCT17:20:37 14OCT17:10:11 . 1
318672 Modify 13OCT17:20:40 14OCT17:10:11 . 1
318672 Modify 13OCT17:20:44 14OCT17:10:11 . 1
318672 Modify 13OCT17:20:46 14OCT17:10:11 . 1
318672 Modify 13OCT17:23:03 14OCT17:10:11 . 1
318672 Sign 13OCT17:23:03 14OCT17:10:11 . 1
318672 Sign 13OCT17:23:04 13OCT17:23:04 . 1
318672 Verify 14OCT17:10:11 14OCT17:10:11 1 1
318672 Sign 14OCT17:10:11 14OCT17:10:11 . 1
318672 Sign 14OCT17:10:26 14OCT17:10:52 . 2
318672 Perform 14OCT17:10:26 14OCT17:14:09 . 2
318672 Modify 14OCT17:10:52 14OCT17:14:09 . 2
318672 Sign 14OCT17:10:54 14OCT17:10:54 . 2
318672 Sign 14OCT17:10:54 14OCT17:14:09 . 2
318672 Modify 14OCT17:10:54 14OCT17:14:09 . 2
318672 Verify 14OCT17:10:54 14OCT17:14:09 1 2
;
run;
proc print data=work.temp1;
run;
Thank you all.
Here could give you a start.
data temp;
input ID Action $ TS_SER TS Verify Notecount;
informat TS_SER datetime14. TS datetime14.;
format TS_SER datetime14. TS datetime14.;
datalines;
252670 Sign 13DEC17:11:05 13DEC17:11:14 . .
252670 Perform 13DEC17:11:05 13DEC17:11:14 . .
252670 Verify 13DEC17:11:05 13DEC17:11:14 1 1
252670 Sign 13DEC17:15:27 13DEC17:15:41 . .
252670 Perform 13DEC17:15:27 13DEC17:15:41 . .
252670 Verify 13DEC17:15:27 13DEC17:15:41 1 2
252670 Sign 13DEC17:16:28 13DEC17:16:30 . .
252670 Perform 13DEC17:16:28 13DEC17:16:30 . .
252670 Verify 13DEC17:16:28 13DEC17:16:30 1 3
252670 Sign 13DEC17:16:29 13DEC17:16:30 . .
252670 Modify 13DEC17:16:29 13DEC17:16:30 . .
252670 Perform 14DEC17:07:32 14DEC17:07:34 . .
252670 Verify 14DEC17:07:32 14DEC17:07:34 1 4
252670 Sign 14DEC17:07:32 14DEC17:07:34 . .
252670 Perform 14DEC17:09:15 14DEC17:09:15 . .
252670 Verify 14DEC17:09:15 14DEC17:09:15 1 5
252670 Sign 14DEC17:09:15 14DEC17:09:15 . .
252670 Perform 14DEC17:14:18 14DEC17:16:29 . .
252670 Verify 14DEC17:14:18 14DEC17:16:29 1 6
252670 Sign 14DEC17:14:18 14DEC17:14:18 . .
252670 Sign 14DEC17:14:18 14DEC17:16:29 . .
252670 Sign 14DEC17:16:13 14DEC17:16:17 . .
252670 Perform 14DEC17:16:13 14DEC17:16:17 . .
252670 Verify 14DEC17:16:13 14DEC17:16:17 1 7
252670 Modify 14DEC17:16:29 14DEC17:16:29 . .
252670 Perform 14DEC17:16:29 14DEC17:16:29 . .
252670 Sign 14DEC17:16:29 14DEC17:16:29 . .
252670 Perform 14DEC17:19:12 14DEC17:19:15 . .
252670 Verify 14DEC17:19:12 14DEC17:19:15 1 8
252670 Sign 14DEC17:19:12 14DEC17:19:15 . .
252670 Perform 15DEC17:15:27 15DEC17:15:27 . .
252670 Verify 15DEC17:15:27 15DEC17:15:27 1 9
252670 Sign 15DEC17:15:27 15DEC17:15:27 . .
252670 Sign 16DEC17:10:38 16DEC17:10:38 . .
252670 Perform 16DEC17:10:38 16DEC17:10:38 . .
252670 Verify 16DEC17:10:38 16DEC17:10:38 1 10
318672 Sign 13OCT17:20:32 14OCT17:10:11 . .
318672 Modify 13OCT17:20:36 14OCT17:10:11 . .
318672 Modify 13OCT17:20:37 14OCT17:10:11 . .
318672 Modify 13OCT17:20:40 14OCT17:10:11 . .
318672 Modify 13OCT17:20:44 14OCT17:10:11 . .
318672 Modify 13OCT17:20:46 14OCT17:10:11 . .
318672 Modify 13OCT17:23:03 14OCT17:10:11 . .
318672 Sign 13OCT17:23:03 14OCT17:10:11 . .
318672 Sign 13OCT17:23:04 13OCT17:23:04 . .
318672 Verify 14OCT17:10:11 14OCT17:10:11 1 1
318672 Sign 14OCT17:10:11 14OCT17:10:11 . .
318672 Sign 14OCT17:10:26 14OCT17:10:52 . .
318672 Perform 14OCT17:10:26 14OCT17:14:09 . .
318672 Modify 14OCT17:10:52 14OCT17:14:09 . .
318672 Sign 14OCT17:10:54 14OCT17:10:54 . .
318672 Sign 14OCT17:10:54 14OCT17:14:09 . .
318672 Modify 14OCT17:10:54 14OCT17:14:09 . .
318672 Verify 14OCT17:10:54 14OCT17:14:09 1 2
;
run;
proc sort data=temp;
by id TS;
run;
data want;
merge temp
temp(keep=id TS_SER TS Notecount rename=(Notecount=want)
where=(want is not missing));
by id TS;
run;
proc print;run;
Thank you very much for your help. I am not getting desired results but I can try to work around. Please let me know if you find any other solution.
Where from do you get the "2" for ID=318672 and TS=14OCT17:10:52?
Other than that, try
proc sql;
create table want as
select
ID, Action, TS_SER, TS, Verify,
max(notecount) as notecount
from temp
group by id, ts
order by id, ts_ser
;
quit;
it would be 2 because if you look at the Ser_Ts when it got verified.14OCT17:10:54, both timestamps are very close. Your codes are very helpful but not giving me desired results. I will try to work around through your codes
Define "close". When do timestamps have to be considered as one, and when are they separate?
Yes, you are absolutely right. First we need to define "Close". Honestly I am also confuse. Actually this is someone else projects who left and now I am responsible to continue working on it. It is an ongoing project and my supervisor wants me to continue working on it the same way previous person was doing. She used STATA for analysis, and I have no background knowledge of STATA. I got most of her work but only this part of coding is difficult to understand. Below are STATA codes, it might help you. What I am understanding that data is sorted at id, ts_ser, and ts and then notecount variable has retained based on if ((ts_ser=lag(ts_ser) or ((ts_ser-ts_ser) le 2 min))) or (ts=lag(ts) or ((ts-ts) le 2 min))). In codes below, I think 2000= 2minutes. I am hoping I am making some sense to you.
Thanks
set more off
local i=1
while `i'<7 {
while inlist(`i',1,3,5) {
local j=1
sort id ts_ser ts
while `j'<100 {
replace notecount=notecount[_n+1] if notecount==. & notecount[_n+1]!=. & id==id[_n+1] & ///
(ts_ser==ts_ser[_n+1] | abs(ts_ser-ts_ser[_n+1])<=2000 | ts==ts[_n+1] | abs(ts-ts[_n+1])<=2000)
replace notecount=notecount[_n-1] if notecount==. & notecount[_n-1]!=. & id==id[_n-1] & ///
(ts_ser==ts_ser[_n-1] | abs(ts_ser-ts_ser[_n-1])<=2000 | ts==ts[_n-1] | abs(ts-ts[_n-1])<=2000)
local j=`j'+1
}
local i=`i'+1
}
while inlist(`i',2,4,6) {
local j=1
sort id ts ts_ser
while `j'<100 {
replace notecount=notecount[_n+1] if notecount==. & notecount[_n+1]!=. & id==id[_n+1] & ///
(ts_ser==ts_ser[_n+1] | abs(ts_ser-ts_ser[_n+1])<=2000 | ts==ts[_n+1] | abs(ts-ts[_n+1])<=2000)
replace notecount=notecount[_n-1] if notecount==. & notecount[_n-1]!=. & id==id[_n-1] & ///
(ts_ser==ts_ser[_n-1] | abs(ts_ser-ts_ser[_n-1])<=2000 | ts==ts[_n-1] | abs(ts-ts[_n-1])<=2000)
local j=`j'+1
}
local i=`i'+1
}
}
How about adding a cleaning step that aligns the TS timestamps:
data temp;
input
ID
Action $
TS_SER
TS
Verify
Notecount
;
informat
TS_SER
TS
datetime14.
;
format
TS_SER
TS
datetime14.
;
datalines;
252670 Sign 13DEC17:11:05 13DEC17:11:14 . .
252670 Perform 13DEC17:11:05 13DEC17:11:14 . .
252670 Verify 13DEC17:11:05 13DEC17:11:14 1 1
252670 Sign 13DEC17:15:27 13DEC17:15:41 . .
252670 Perform 13DEC17:15:27 13DEC17:15:41 . .
252670 Verify 13DEC17:15:27 13DEC17:15:41 1 2
252670 Sign 13DEC17:16:28 13DEC17:16:30 . .
252670 Perform 13DEC17:16:28 13DEC17:16:30 . .
252670 Verify 13DEC17:16:28 13DEC17:16:30 1 3
252670 Sign 13DEC17:16:29 13DEC17:16:30 . .
252670 Modify 13DEC17:16:29 13DEC17:16:30 . .
252670 Perform 14DEC17:07:32 14DEC17:07:34 . .
252670 Verify 14DEC17:07:32 14DEC17:07:34 1 4
252670 Sign 14DEC17:07:32 14DEC17:07:34 . .
252670 Perform 14DEC17:09:15 14DEC17:09:15 . .
252670 Verify 14DEC17:09:15 14DEC17:09:15 1 5
252670 Sign 14DEC17:09:15 14DEC17:09:15 . .
252670 Perform 14DEC17:14:18 14DEC17:16:29 . .
252670 Verify 14DEC17:14:18 14DEC17:16:29 1 6
252670 Sign 14DEC17:14:18 14DEC17:14:18 . .
252670 Sign 14DEC17:14:18 14DEC17:16:29 . .
252670 Sign 14DEC17:16:13 14DEC17:16:17 . .
252670 Perform 14DEC17:16:13 14DEC17:16:17 . .
252670 Verify 14DEC17:16:13 14DEC17:16:17 1 7
252670 Modify 14DEC17:16:29 14DEC17:16:29 . .
252670 Perform 14DEC17:16:29 14DEC17:16:29 . .
252670 Sign 14DEC17:16:29 14DEC17:16:29 . .
252670 Perform 14DEC17:19:12 14DEC17:19:15 . .
252670 Verify 14DEC17:19:12 14DEC17:19:15 1 8
252670 Sign 14DEC17:19:12 14DEC17:19:15 . .
252670 Perform 15DEC17:15:27 15DEC17:15:27 . .
252670 Verify 15DEC17:15:27 15DEC17:15:27 1 9
252670 Sign 15DEC17:15:27 15DEC17:15:27 . .
252670 Sign 16DEC17:10:38 16DEC17:10:38 . .
252670 Perform 16DEC17:10:38 16DEC17:10:38 . .
252670 Verify 16DEC17:10:38 16DEC17:10:38 1 10
318672 Sign 13OCT17:20:32 14OCT17:10:11 . .
318672 Modify 13OCT17:20:36 14OCT17:10:11 . .
318672 Modify 13OCT17:20:37 14OCT17:10:11 . .
318672 Modify 13OCT17:20:40 14OCT17:10:11 . .
318672 Modify 13OCT17:20:44 14OCT17:10:11 . .
318672 Modify 13OCT17:20:46 14OCT17:10:11 . .
318672 Modify 13OCT17:23:03 14OCT17:10:11 . .
318672 Sign 13OCT17:23:03 14OCT17:10:11 . .
318672 Sign 13OCT17:23:04 13OCT17:23:04 . .
318672 Verify 14OCT17:10:11 14OCT17:10:11 1 1
318672 Sign 14OCT17:10:11 14OCT17:10:11 . .
318672 Sign 14OCT17:10:26 14OCT17:10:52 . .
318672 Perform 14OCT17:10:26 14OCT17:14:09 . .
318672 Modify 14OCT17:10:52 14OCT17:14:09 . .
318672 Sign 14OCT17:10:54 14OCT17:10:54 . .
318672 Sign 14OCT17:10:54 14OCT17:14:09 . .
318672 Modify 14OCT17:10:54 14OCT17:14:09 . .
318672 Verify 14OCT17:10:54 14OCT17:14:09 1 2
;
run;
proc sort data=temp;
by id ts;
run;
data have; /* "clean" timestamps */
set temp;
by id ts;
retain current_time;
if first.id then current_time = ts;
if ts - current_time < 120 /* 2 minutes */
then ts = current_time;
else current_time = ts;
drop current_time;
run;
proc sql;
create table want as
select
ID, Action, TS_SER, TS, Verify,
max(notecount) as notecount
from have
group by id, ts
order by id, ts_ser
;
quit;
Thank you very much for your suggestion. I am still not getting desired output after cleaning step. I am working on better understanding the data. Sometime I feel like I know what I should I do and after a second I am confuse. I am working on other project for now because it was not urgent. I will update you once I start working on it again.
Thanks again, you are very helpful.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.