DATA Step, Macro, Functions and more

How to retain a value if we have a big messy data

Reply
Contributor
Posts: 41

How to retain a value if we have a big messy data

[ Edited ]

Hi all,

 

Below is a very small part of my big messy data

 

data temp;
 input ID Action $ TS_SER TS Verify Notecount;
 informat TS_SER datetime14. TS datetime14.;
 format TS_SER  datetime14. TS datetime14.;
 datalines;

252670 Sign 13DEC17:11:05 13DEC17:11:14 . .
252670 Perform 13DEC17:11:05 13DEC17:11:14 . .
252670 Verify 13DEC17:11:05 13DEC17:11:14  1 1
252670 Sign 13DEC17:15:27 13DEC17:15:41 . .
252670 Perform 13DEC17:15:27 13DEC17:15:41 . .
252670 Verify 13DEC17:15:27 13DEC17:15:41 1 2
252670 Sign 13DEC17:16:28 13DEC17:16:30 . .
252670 Perform 13DEC17:16:28 13DEC17:16:30 . .
252670 Verify 13DEC17:16:28 13DEC17:16:30 1 3
252670 Sign 13DEC17:16:29 13DEC17:16:30 . .
252670 Modify 13DEC17:16:29 13DEC17:16:30 . .
252670 Perform 14DEC17:07:32 14DEC17:07:34 . .
252670 Verify 14DEC17:07:32 14DEC17:07:34 1 4
252670 Sign 14DEC17:07:32 14DEC17:07:34 . .
252670 Perform 14DEC17:09:15 14DEC17:09:15 . .
252670 Verify 14DEC17:09:15 14DEC17:09:15 1 5
252670 Sign 14DEC17:09:15 14DEC17:09:15 . .
252670 Perform 14DEC17:14:18 14DEC17:16:29 . .
252670 Verify 14DEC17:14:18 14DEC17:16:29 1 6
252670 Sign 14DEC17:14:18 14DEC17:14:18 . .
252670 Sign 14DEC17:14:18 14DEC17:16:29 . .
252670 Sign 14DEC17:16:13 14DEC17:16:17 . .
252670 Perform 14DEC17:16:13 14DEC17:16:17 . .
252670 Verify 14DEC17:16:13 14DEC17:16:17 1 7
252670 Modify 14DEC17:16:29 14DEC17:16:29 . .
252670 Perform 14DEC17:16:29 14DEC17:16:29 . .
252670 Sign 14DEC17:16:29 14DEC17:16:29 . .
252670 Perform 14DEC17:19:12 14DEC17:19:15 . .
252670 Verify 14DEC17:19:12 14DEC17:19:15 1 8
252670 Sign 14DEC17:19:12 14DEC17:19:15 . .
252670 Perform 15DEC17:15:27 15DEC17:15:27 . .
252670 Verify 15DEC17:15:27 15DEC17:15:27 1 9
252670 Sign 15DEC17:15:27 15DEC17:15:27 . .
252670 Sign 16DEC17:10:38 16DEC17:10:38 . .
252670 Perform 16DEC17:10:38 16DEC17:10:38 . .
252670 Verify 16DEC17:10:38 16DEC17:10:38 1 10
318672 Sign 13OCT17:20:32 14OCT17:10:11 . .
318672 Modify 13OCT17:20:36 14OCT17:10:11  . .
318672 Modify 13OCT17:20:37 14OCT17:10:11 . .
318672 Modify 13OCT17:20:40 14OCT17:10:11 . .
318672 Modify 13OCT17:20:44 14OCT17:10:11 . .
318672 Modify 13OCT17:20:46 14OCT17:10:11 . .
318672 Modify 13OCT17:23:03 14OCT17:10:11 . .
318672 Sign 13OCT17:23:03 14OCT17:10:11 . .
318672 Sign 13OCT17:23:04 13OCT17:23:04 . .
318672 Verify 14OCT17:10:11 14OCT17:10:11 1 1
318672 Sign 14OCT17:10:11 14OCT17:10:11 . .
318672 Sign 14OCT17:10:26 14OCT17:10:52 . .
318672 Perform 14OCT17:10:26 14OCT17:14:09 . .
318672 Modify 14OCT17:10:52 14OCT17:14:09 . .
318672 Sign 14OCT17:10:54 14OCT17:10:54 . .
318672 Sign 14OCT17:10:54 14OCT17:14:09 . .
318672 Modify 14OCT17:10:54 14OCT17:14:09 . .
318672 Verify 14OCT17:10:54 14OCT17:14:09 1 2
;
run;

 

I want to retain value of notecount so my clean data would look like this (as below). I need to retain variable notecount  based on TS_SER and TS.


data temp1;
 input ID Action $ TS_SER TS Verify Notecount;
  informat TS_SER datetime14. TS datetime14.;
 format TS_SER datetime14. TS datetime14.;
 datalines;

252670 Sign 13DEC17:11:05 13DEC17:11:14 . 1
252670 Perform 13DEC17:11:05 13DEC17:11:14 . 1
252670 Verify 13DEC17:11:05 13DEC17:11:14  1 1
252670 Sign 13DEC17:15:27 13DEC17:15:41 . 2
252670 Perform 13DEC17:15:27 13DEC17:15:41 . 2
252670 Verify 13DEC17:15:27 13DEC17:15:41 1 2
252670 Sign 13DEC17:16:28 13DEC17:16:30 . 3
252670 Perform 13DEC17:16:28 13DEC17:16:30 . 3
252670 Verify 13DEC17:16:28 13DEC17:16:30 1 3
252670 Sign 13DEC17:16:29 13DEC17:16:30 . 3
252670 Modify 13DEC17:16:29 13DEC17:16:30 . 3
252670 Perform 14DEC17:07:32 14DEC17:07:34 . 4
252670 Verify 14DEC17:07:32 14DEC17:07:34 1 4
252670 Sign 14DEC17:07:32 14DEC17:07:34 . 4
252670 Perform 14DEC17:09:15 14DEC17:09:15 . 5
252670 Verify 14DEC17:09:15 14DEC17:09:15 1 5
252670 Sign 14DEC17:09:15 14DEC17:09:15 . 5
252670 Perform 14DEC17:14:18 14DEC17:16:29 . 6
252670 Verify 14DEC17:14:18 14DEC17:16:29 1 6
252670 Sign 14DEC17:14:18 14DEC17:14:18 . 6
252670 Sign 14DEC17:14:18 14DEC17:16:29 . 6
252670 Modify 14DEC17:16:29 14DEC17:16:29 . 6
252670 Perform 14DEC17:16:29 14DEC17:16:29 . 6
252670 Sign 14DEC17:16:29 14DEC17:16:29 . 6
252670 Sign 14DEC17:16:13 14DEC17:16:17 . 7
252670 Perform 14DEC17:16:13 14DEC17:16:17 . 7
252670 Verify 14DEC17:16:13 14DEC17:16:17 1 7
252670 Perform 14DEC17:19:12 14DEC17:19:15 . 8
252670 Verify 14DEC17:19:12 14DEC17:19:15 1 8
252670 Sign 14DEC17:19:12 14DEC17:19:15 . 8
252670 Perform 15DEC17:15:27 15DEC17:15:27 . 9
252670 Verify 15DEC17:15:27 15DEC17:15:27 1 9
252670 Sign 15DEC17:15:27 15DEC17:15:27 . 9
252670 Sign 16DEC17:10:38 16DEC17:10:38 . 10
252670 Perform 16DEC17:10:38 16DEC17:10:38 . 10
252670 Verify 16DEC17:10:38 16DEC17:10:38 1 10
318672 Sign 13OCT17:20:32 14OCT17:10:11 . 1
318672 Modify 13OCT17:20:36 14OCT17:10:11  . 1
318672 Modify 13OCT17:20:37 14OCT17:10:11 . 1
318672 Modify 13OCT17:20:40 14OCT17:10:11 . 1
318672 Modify 13OCT17:20:44 14OCT17:10:11 . 1
318672 Modify 13OCT17:20:46 14OCT17:10:11 . 1
318672 Modify 13OCT17:23:03 14OCT17:10:11 . 1
318672 Sign 13OCT17:23:03 14OCT17:10:11 . 1
318672 Sign 13OCT17:23:04 13OCT17:23:04 . 1
318672 Verify 14OCT17:10:11 14OCT17:10:11 1 1
318672 Sign 14OCT17:10:11 14OCT17:10:11 . 1
318672 Sign 14OCT17:10:26 14OCT17:10:52 . 2
318672 Perform 14OCT17:10:26 14OCT17:14:09 . 2
318672 Modify 14OCT17:10:52 14OCT17:14:09 . 2
318672 Sign 14OCT17:10:54 14OCT17:10:54 . 2
318672 Sign 14OCT17:10:54 14OCT17:14:09 . 2
318672 Modify 14OCT17:10:54 14OCT17:14:09 . 2
318672 Verify 14OCT17:10:54 14OCT17:14:09 1 2
;
run;

proc print data=work.temp1;
run;

 

 

Thank you all.

 

 

 

Super User
Posts: 10,691

Re: How to retain a value if we have a big messy data

Here could give you a start.

 


data temp;
 input ID Action $ TS_SER TS Verify Notecount;
 informat TS_SER datetime14. TS datetime14.;
 format TS_SER  datetime14. TS datetime14.;
 datalines;
252670 Sign 13DEC17:11:05 13DEC17:11:14 . .
252670 Perform 13DEC17:11:05 13DEC17:11:14 . .
252670 Verify 13DEC17:11:05 13DEC17:11:14  1 1
252670 Sign 13DEC17:15:27 13DEC17:15:41 . .
252670 Perform 13DEC17:15:27 13DEC17:15:41 . .
252670 Verify 13DEC17:15:27 13DEC17:15:41 1 2
252670 Sign 13DEC17:16:28 13DEC17:16:30 . .
252670 Perform 13DEC17:16:28 13DEC17:16:30 . .
252670 Verify 13DEC17:16:28 13DEC17:16:30 1 3
252670 Sign 13DEC17:16:29 13DEC17:16:30 . .
252670 Modify 13DEC17:16:29 13DEC17:16:30 . .
252670 Perform 14DEC17:07:32 14DEC17:07:34 . .
252670 Verify 14DEC17:07:32 14DEC17:07:34 1 4
252670 Sign 14DEC17:07:32 14DEC17:07:34 . .
252670 Perform 14DEC17:09:15 14DEC17:09:15 . .
252670 Verify 14DEC17:09:15 14DEC17:09:15 1 5
252670 Sign 14DEC17:09:15 14DEC17:09:15 . .
252670 Perform 14DEC17:14:18 14DEC17:16:29 . .
252670 Verify 14DEC17:14:18 14DEC17:16:29 1 6
252670 Sign 14DEC17:14:18 14DEC17:14:18 . .
252670 Sign 14DEC17:14:18 14DEC17:16:29 . .
252670 Sign 14DEC17:16:13 14DEC17:16:17 . .
252670 Perform 14DEC17:16:13 14DEC17:16:17 . .
252670 Verify 14DEC17:16:13 14DEC17:16:17 1 7
252670 Modify 14DEC17:16:29 14DEC17:16:29 . .
252670 Perform 14DEC17:16:29 14DEC17:16:29 . .
252670 Sign 14DEC17:16:29 14DEC17:16:29 . .
252670 Perform 14DEC17:19:12 14DEC17:19:15 . .
252670 Verify 14DEC17:19:12 14DEC17:19:15 1 8
252670 Sign 14DEC17:19:12 14DEC17:19:15 . .
252670 Perform 15DEC17:15:27 15DEC17:15:27 . .
252670 Verify 15DEC17:15:27 15DEC17:15:27 1 9
252670 Sign 15DEC17:15:27 15DEC17:15:27 . .
252670 Sign 16DEC17:10:38 16DEC17:10:38 . .
252670 Perform 16DEC17:10:38 16DEC17:10:38 . .
252670 Verify 16DEC17:10:38 16DEC17:10:38 1 10
318672 Sign 13OCT17:20:32 14OCT17:10:11 . .
318672 Modify 13OCT17:20:36 14OCT17:10:11  . .
318672 Modify 13OCT17:20:37 14OCT17:10:11 . .
318672 Modify 13OCT17:20:40 14OCT17:10:11 . .
318672 Modify 13OCT17:20:44 14OCT17:10:11 . .
318672 Modify 13OCT17:20:46 14OCT17:10:11 . .
318672 Modify 13OCT17:23:03 14OCT17:10:11 . .
318672 Sign 13OCT17:23:03 14OCT17:10:11 . .
318672 Sign 13OCT17:23:04 13OCT17:23:04 . .
318672 Verify 14OCT17:10:11 14OCT17:10:11 1 1
318672 Sign 14OCT17:10:11 14OCT17:10:11 . .
318672 Sign 14OCT17:10:26 14OCT17:10:52 . .
318672 Perform 14OCT17:10:26 14OCT17:14:09 . .
318672 Modify 14OCT17:10:52 14OCT17:14:09 . .
318672 Sign 14OCT17:10:54 14OCT17:10:54 . .
318672 Sign 14OCT17:10:54 14OCT17:14:09 . .
318672 Modify 14OCT17:10:54 14OCT17:14:09 . .
318672 Verify 14OCT17:10:54 14OCT17:14:09 1 2
;
run;
proc sort data=temp;
by  id  TS;
run;
data want;
 merge temp 
 temp(keep=id TS_SER TS Notecount rename=(Notecount=want)
 where=(want is not missing));
 by id  TS;
run;
proc print;run;
Contributor
Posts: 41

Re: How to retain a value if we have a big messy data

Thank you very much for your help. I am not getting desired results but I can try to work around. Please let me know if you find any other solution.

 

Super User
Posts: 9,922

Re: How to retain a value if we have a big messy data

[ Edited ]

Where from do you get the "2" for ID=318672 and TS=14OCT17:10:52?

 

Other than that, try

proc sql;
create table want as
select
  ID, Action, TS_SER, TS, Verify,
  max(notecount) as notecount
from temp
group by id, ts
order by id, ts_ser
;
quit;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Contributor
Posts: 41

Re: How to retain a value if we have a big messy data

Posted in reply to KurtBremser

it would be 2 because if you look at the Ser_Ts when it got verified.14OCT17:10:54, both timestamps are very close. Your codes are very helpful but not giving me desired results. I will try to work around through your codes

Super User
Posts: 9,922

Re: How to retain a value if we have a big messy data

Define "close". When do timestamps have to be considered as one, and when are they separate?

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Contributor
Posts: 41

Re: How to retain a value if we have a big messy data

[ Edited ]
Posted in reply to KurtBremser

Yes, you are absolutely right. First we need to define "Close". Honestly I am also confuse. Actually this is someone else projects who left and now I am responsible to continue working on it. It is an ongoing project and my supervisor wants me to continue working on it the same way previous person was doing. She used STATA for analysis, and I have no background knowledge of STATA. I got most of her work but only this part of coding is difficult to understand. Below are STATA codes, it might help you. What I am understanding that data is sorted at id, ts_ser, and ts and then notecount variable has retained based on if ((ts_ser=lag(ts_ser) or ((ts_ser-ts_ser) le  2 min))) or (ts=lag(ts) or ((ts-ts) le 2 min))). In codes below, I think 2000= 2minutes. I am hoping I am making some sense to you.

Thanks

 

 

 

set more off
 local i=1
 while `i'<7 {
  while inlist(`i',1,3,5) {
   local j=1
   sort id ts_ser ts
   while `j'<100 {
    replace notecount=notecount[_n+1] if notecount==. & notecount[_n+1]!=. & id==id[_n+1] & ///
     (ts_ser==ts_ser[_n+1] | abs(ts_ser-ts_ser[_n+1])<=2000 | ts==ts[_n+1] | abs(ts-ts[_n+1])<=2000)
    replace notecount=notecount[_n-1] if notecount==. & notecount[_n-1]!=. & id==id[_n-1] & ///
     (ts_ser==ts_ser[_n-1] | abs(ts_ser-ts_ser[_n-1])<=2000 | ts==ts[_n-1] | abs(ts-ts[_n-1])<=2000)
    local j=`j'+1
    }
   local i=`i'+1
   }
  while inlist(`i',2,4,6) {
   local j=1
   sort id ts ts_ser
   while `j'<100 {
    replace notecount=notecount[_n+1] if notecount==. & notecount[_n+1]!=. & id==id[_n+1] & ///
     (ts_ser==ts_ser[_n+1] | abs(ts_ser-ts_ser[_n+1])<=2000 | ts==ts[_n+1] | abs(ts-ts[_n+1])<=2000)
    replace notecount=notecount[_n-1] if notecount==. & notecount[_n-1]!=. & id==id[_n-1] & ///
     (ts_ser==ts_ser[_n-1] | abs(ts_ser-ts_ser[_n-1])<=2000 | ts==ts[_n-1] | abs(ts-ts[_n-1])<=2000)
    local j=`j'+1
    }
   local i=`i'+1
   }
  }

 

Super User
Posts: 9,922

Re: How to retain a value if we have a big messy data

How about adding a cleaning step that aligns the TS timestamps:

data temp;
input
  ID
  Action $
  TS_SER
  TS
  Verify
  Notecount
;
informat
  TS_SER
  TS
    datetime14.
;
format
  TS_SER
  TS
    datetime14.
;
datalines;
252670 Sign 13DEC17:11:05 13DEC17:11:14 . .
252670 Perform 13DEC17:11:05 13DEC17:11:14 . .
252670 Verify 13DEC17:11:05 13DEC17:11:14  1 1
252670 Sign 13DEC17:15:27 13DEC17:15:41 . .
252670 Perform 13DEC17:15:27 13DEC17:15:41 . .
252670 Verify 13DEC17:15:27 13DEC17:15:41 1 2
252670 Sign 13DEC17:16:28 13DEC17:16:30 . .
252670 Perform 13DEC17:16:28 13DEC17:16:30 . .
252670 Verify 13DEC17:16:28 13DEC17:16:30 1 3
252670 Sign 13DEC17:16:29 13DEC17:16:30 . .
252670 Modify 13DEC17:16:29 13DEC17:16:30 . .
252670 Perform 14DEC17:07:32 14DEC17:07:34 . .
252670 Verify 14DEC17:07:32 14DEC17:07:34 1 4
252670 Sign 14DEC17:07:32 14DEC17:07:34 . .
252670 Perform 14DEC17:09:15 14DEC17:09:15 . .
252670 Verify 14DEC17:09:15 14DEC17:09:15 1 5
252670 Sign 14DEC17:09:15 14DEC17:09:15 . .
252670 Perform 14DEC17:14:18 14DEC17:16:29 . .
252670 Verify 14DEC17:14:18 14DEC17:16:29 1 6
252670 Sign 14DEC17:14:18 14DEC17:14:18 . .
252670 Sign 14DEC17:14:18 14DEC17:16:29 . .
252670 Sign 14DEC17:16:13 14DEC17:16:17 . .
252670 Perform 14DEC17:16:13 14DEC17:16:17 . .
252670 Verify 14DEC17:16:13 14DEC17:16:17 1 7
252670 Modify 14DEC17:16:29 14DEC17:16:29 . .
252670 Perform 14DEC17:16:29 14DEC17:16:29 . .
252670 Sign 14DEC17:16:29 14DEC17:16:29 . .
252670 Perform 14DEC17:19:12 14DEC17:19:15 . .
252670 Verify 14DEC17:19:12 14DEC17:19:15 1 8
252670 Sign 14DEC17:19:12 14DEC17:19:15 . .
252670 Perform 15DEC17:15:27 15DEC17:15:27 . .
252670 Verify 15DEC17:15:27 15DEC17:15:27 1 9
252670 Sign 15DEC17:15:27 15DEC17:15:27 . .
252670 Sign 16DEC17:10:38 16DEC17:10:38 . .
252670 Perform 16DEC17:10:38 16DEC17:10:38 . .
252670 Verify 16DEC17:10:38 16DEC17:10:38 1 10
318672 Sign 13OCT17:20:32 14OCT17:10:11 . .
318672 Modify 13OCT17:20:36 14OCT17:10:11  . .
318672 Modify 13OCT17:20:37 14OCT17:10:11 . .
318672 Modify 13OCT17:20:40 14OCT17:10:11 . .
318672 Modify 13OCT17:20:44 14OCT17:10:11 . .
318672 Modify 13OCT17:20:46 14OCT17:10:11 . .
318672 Modify 13OCT17:23:03 14OCT17:10:11 . .
318672 Sign 13OCT17:23:03 14OCT17:10:11 . .
318672 Sign 13OCT17:23:04 13OCT17:23:04 . .
318672 Verify 14OCT17:10:11 14OCT17:10:11 1 1
318672 Sign 14OCT17:10:11 14OCT17:10:11 . .
318672 Sign 14OCT17:10:26 14OCT17:10:52 . .
318672 Perform 14OCT17:10:26 14OCT17:14:09 . .
318672 Modify 14OCT17:10:52 14OCT17:14:09 . .
318672 Sign 14OCT17:10:54 14OCT17:10:54 . .
318672 Sign 14OCT17:10:54 14OCT17:14:09 . .
318672 Modify 14OCT17:10:54 14OCT17:14:09 . .
318672 Verify 14OCT17:10:54 14OCT17:14:09 1 2
;
run;

proc sort data=temp;
by id ts;
run;

data have; /* "clean" timestamps */
set temp;
by id ts;
retain current_time;
if first.id then current_time = ts;
if ts - current_time < 120 /* 2 minutes */
then ts = current_time;
else current_time = ts;
drop current_time;
run;

proc sql;
create table want as
select
  ID, Action, TS_SER, TS, Verify,
  max(notecount) as notecount
from have
group by id, ts
order by id, ts_ser
;
quit;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Contributor
Posts: 41

Re: How to retain a value if we have a big messy data

Posted in reply to KurtBremser

Thank you very much for your suggestion. I am still not getting desired output after cleaning step. I am working on better understanding the data. Sometime I feel like I know what I should I do and after a second I am confuse. I am working on other project for now because it was not urgent. I will update you once I start working on it again.

Thanks again, you are very helpful.

Ask a Question
Discussion stats
  • 8 replies
  • 141 views
  • 1 like
  • 3 in conversation