Solved: Re: how do i group interval by time and do calculation based on ID and...

ihlayyel · Posted 11-09-2017 07:04 AM

TIME	X	Y	class	ID
16:00:06	103.785	71.45633	1	A
16:05:06	109.095	92.82449	2	A2
16:08:06	105.981	67.84456	1	A6
16:11:05	103.461	71.42452	3	F4
16:15:05	103.7674	71.47382	1	C55
16:17:26	107.4762	73.3445	3	E4
16:18:06	105.2773	87.58118	2	O8
16:19:21	103.7858	71.44293	2	H77
16:33:18	103.7201	71.37974	3	XX
16:40:11	107.3432	88.27521	3	X8
17:43:06	100.8496	67.38097	2	D
17:45:06	110.1006	66.26799	2	A
17:50:12	105.8542	87.77079	2	D4
17:55:09	96.14285	61.99789	2	F4
18:02:08	103.7829	71.48327	2	RR3
18:09:33	103.9275	71.49321	2	U7
18:25:12	104.7226	90.43502	3	S
18:30:05	109.6942	66.22699	3	S33
18:33:04	109.7128	97.24288	3	T17
18:40:44	56.12425	71.13522	3	G22
18:44:02	93.29675	63.89221	3	II1
18:50:13	109.7023	71.47756	1	S9

1- Above is a sample of a data frame which has thousands of records.

2- How can group it by the time (20 min) and class number. Then calculate the Euclidean distance between x, y using :

dist = sqrt( (x2 - x1)**2 + (y2 - y1)**2 )

3- An iteration of every 20 min for all data, no value duplicates in the single iteration. I have tried to implement this sample. but still not working.

The expected results for the 20min in two iteration: 

ID  CLASS  distnace 
A,A6    1   4.22695
A,C55   1   0.024806
A6, C55 1   4.251038
A2,O8   2   6.485861
A2,H77  2   22.030843
O8,H77  2   16.207033
F4, E4  3   4.4506
A,A6    1   17.279585

Ksharp · Posted 11-09-2017 07:46 AM

data have;
infile cards expandtabs;
input TIME $ 	X	Y	class	ID $;
cards;
16:00:06	103.785	71.45633	1	A
16:05:06	109.095	92.82449	2	A2
16:08:06	105.981	67.84456	1	A6
16:11:05	103.461	71.42452	3	F4
16:15:05	103.7674	71.47382	1	C55
16:17:26	107.4762	73.3445	3	E4
16:18:06	105.2773	87.58118	2	O8
16:19:21	103.7858	71.44293	2	H77
16:33:18	103.7201	71.37974	3	XX
16:40:11	107.3432	88.27521	3	X8
17:43:06	100.8496	67.38097	2	D
17:45:06	110.1006	66.26799	2	A
17:50:12	105.8542	87.77079	2	D4
17:55:09	96.14285	61.99789	2	F4
18:02:08	103.7829	71.48327	2	RR3
18:09:33	103.9275	71.49321	2	U7
18:25:12	104.7226	90.43502	3	S
18:30:05	109.6942	66.22699	3	S33
18:33:04	109.7128	97.24288	3	T17
18:40:44	56.12425	71.13522	3	G22
18:44:02	93.29675	63.89221	3	II1
18:50:13	109.7023	71.47756	1	S9
;
run;
proc sql;
create table want as
 select  catx(',',a.id,b.id) as id,Euclid(a.x-b.x,a.y-b.y) as distance
  from have as a,have as b
   where a.class=b.class and a.id ne b.id ;
quit;

View solution in original post

Kurt_Bremser · Posted 11-09-2017 07:25 AM

Since SAS time (and datetime) values are counts of seconds, you can normalize them to 20-minute frames by doing

tim20 = int(time / 1200) * 1200;
format tim20 time8.;

and then use that in your by statement(s).

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

PeterClemmensen · Posted 11-09-2017 07:36 AM

You can create groups based on intervals like this

data have;
input TIME:time8. X Y class ID$;
format time time8.;
datalines;
16:00:06 103.785 71.45633 1 A 
16:05:06 109.095 92.82449 2 A2 
16:08:06 105.981 67.84456 1 A6 
16:11:05 103.461 71.42452 3 F4 
16:15:05 103.7674 71.47382 1 C55 
16:17:26 107.4762 73.3445 3 E4 
16:18:06 105.2773 87.58118 2 O8 
16:19:21 103.7858 71.44293 2 H77 
16:33:18 103.7201 71.37974 3 XX 
16:40:11 107.3432 88.27521 3 X8 
17:43:06 100.8496 67.38097 2 D 
17:45:06 110.1006 66.26799 2 A 
17:50:12 105.8542 87.77079 2 D4 
17:55:09 96.14285 61.99789 2 F4 
18:02:08 103.7829 71.48327 2 RR3 
18:09:33 103.9275 71.49321 2 U7 
18:25:12 104.7226 90.43502 3 S 
18:30:05 109.6942 66.22699 3 S33 
18:33:04 109.7128 97.24288 3 T17 
18:40:44 56.12425 71.13522 3 G22 
18:44:02 93.29675 63.89221 3 II1 
18:50:13 109.7023 71.47756 1 S9 
;

data grouped;
	set have;
	interval= intck('minute20', '16:00:00't, time)+1;
run;

Regarding your desired output.. I don't understand why the Euclidian distance between A and A6 for Class 1 appears twice?

The DATA to DATA Step Macro
Blog: SASnrd

ihlayyel · Posted 11-09-2017 07:42 AM

the output is for 2 iteration.kinldy look at the time itration 1 = 16:00:06 to 16:19:21 iteration2 = 16:33:18 to 16:40:11

Ksharp · Posted 11-09-2017 07:46 AM

data have;
infile cards expandtabs;
input TIME $ 	X	Y	class	ID $;
cards;
16:00:06	103.785	71.45633	1	A
16:05:06	109.095	92.82449	2	A2
16:08:06	105.981	67.84456	1	A6
16:11:05	103.461	71.42452	3	F4
16:15:05	103.7674	71.47382	1	C55
16:17:26	107.4762	73.3445	3	E4
16:18:06	105.2773	87.58118	2	O8
16:19:21	103.7858	71.44293	2	H77
16:33:18	103.7201	71.37974	3	XX
16:40:11	107.3432	88.27521	3	X8
17:43:06	100.8496	67.38097	2	D
17:45:06	110.1006	66.26799	2	A
17:50:12	105.8542	87.77079	2	D4
17:55:09	96.14285	61.99789	2	F4
18:02:08	103.7829	71.48327	2	RR3
18:09:33	103.9275	71.49321	2	U7
18:25:12	104.7226	90.43502	3	S
18:30:05	109.6942	66.22699	3	S33
18:33:04	109.7128	97.24288	3	T17
18:40:44	56.12425	71.13522	3	G22
18:44:02	93.29675	63.89221	3	II1
18:50:13	109.7023	71.47756	1	S9
;
run;
proc sql;
create table want as
 select  catx(',',a.id,b.id) as id,Euclid(a.x-b.x,a.y-b.y) as distance
  from have as a,have as b
   where a.class=b.class and a.id ne b.id ;
quit;

ihlayyel · Posted 11-09-2017 07:54 AM

That is nice, thank you so much. but have you consider the time interval?

Ksharp · Posted 11-09-2017 08:14 AM

OK. No problem.

data have;
infile cards expandtabs;
input TIME : time8.	X	Y	class	ID $;
retain _time ;
format time _time time8.;
if _n_=1 then _time=time;
if time > _time+20*60 then do;group+1;_time=_time+20*60;end;
cards;
16:00:06	103.785	71.45633	1	A
16:05:06	109.095	92.82449	2	A2
16:08:06	105.981	67.84456	1	A6
16:11:05	103.461	71.42452	3	F4
16:15:05	103.7674	71.47382	1	C55
16:17:26	107.4762	73.3445	3	E4
16:18:06	105.2773	87.58118	2	O8
16:19:21	103.7858	71.44293	2	H77
16:33:18	103.7201	71.37974	3	XX
16:40:11	107.3432	88.27521	3	X8
17:43:06	100.8496	67.38097	2	D
17:45:06	110.1006	66.26799	2	A
17:50:12	105.8542	87.77079	2	D4
17:55:09	96.14285	61.99789	2	F4
18:02:08	103.7829	71.48327	2	RR3
18:09:33	103.9275	71.49321	2	U7
18:25:12	104.7226	90.43502	3	S
18:30:05	109.6942	66.22699	3	S33
18:33:04	109.7128	97.24288	3	T17
18:40:44	56.12425	71.13522	3	G22
18:44:02	93.29675	63.89221	3	II1
18:50:13	109.7023	71.47756	1	S9
;
run;
proc sql;
create table want as
 select  catx(',',a.id,b.id) as id,a.class as class,Euclid(a.x-b.x,a.y-b.y) as distance
  from have as a,have as b
   where a.class=b.class and a.group=b.group and a.id ne b.id ;
quit;

ihlayyel · Posted 11-09-2017 08:28 AM

That is awesome. but iam facing a small issue here. iam reading data from csv file. so the code will be like that ?

data work.people;
if _n_=1 then _time=time;
if time > _time+20*60 then do;group+1;_time=_time+20*60;
end;
run;

proc sql;
create table want as
select catx(',',a.id,b.id) as id,a.class as class,Euclid(a.x-b.x,a.y-b.y) as distance
from have as a,have as b
where a.class=b.class and a.group=b.group and a.id ne b.id ;
quit;

Ksharp · Posted 11-09-2017 08:43 AM

proc import datafile='c:\temp\x.csv' out=have dbms=csv replace;
guessingrows=32767;
run;

The code above could import csv file . and check if TIME variable is numeric variable and have TIME. format .

Ksharp · Posted 11-09-2017 08:45 AM

Or use data step.

data have;
infile 'c:\temp\x.csv' truncover dsd;
input TIME : time8.	X	Y	class	ID : $40.;

ihlayyel · Posted 11-09-2017 08:55 AM

I think you didn't get my point. I have import the data into sas. so when i call the data using this. it returns empty cell.

data work.people;
if _n_=1 then _time=time;
if time > _time+20*60 then do;group+1;_time=_time+20*60;
end;
run;

Ksharp · Posted 11-09-2017 09:00 AM

OK. just use SET statement to replace INFILE . And remove INPUT.

Make sure time is numeric variable.

data have;
set have;

retain _time ;
format time _time time8.;
if _n_=1 then _time=time;
if time > _time+20*60 then do;group+1;_time=_time+20*60;end;
run;

Note: fixed

ihlayyel · Posted 11-09-2017 09:12 AM

i have implemented that: I got this error
The time format is same as the sample I provided. Do i need to change it to other format?
ERROR: No DATALINES or INFILE statement.

Ksharp · Posted 11-09-2017 09:17 AM

That is weird. Your table name is PEOPLE ? and it is in WORK libarary ?

data have;
set people  ;

retain _time ;
format time _time time8.;
if _n_=1 then _time=time;
if time > _time+20*60 then do;group+1;_time=_time+20*60;end;
run;

proc sql;
.............

ihlayyel · Posted 11-09-2017 10:00 AM

Thank you so much. the code works perfectly.
when i run the code

this what I get.

78 where a.floor=b.floor and a.uid ne b.uid ;
ERROR: Insufficient space in file WORK.WANT.DATA.
ERROR: File WORK.WANT.DATA is damaged. I/O processing did not complete.

NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
79 quit;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):

the total number of my data is 1048576 . Do you have any suggestion for that?
I really appreciate your help .
Thank you

how do i group interval by time and do calculation based on ID and class

Re: how do i group interval by time and do calculation based on ID and class

Re: how do i group interval by time and do calculation based on ID and class

Re: how do i group interval by time and do calculation based on ID and class

Re: how do i group interval by time and do calculation based on ID and class

Re: how do i group interval by time and do calculation based on ID and class

Re: how do i group interval by time and do calculation based on ID and class

Re: how do i group interval by time and do calculation based on ID and class

Re: how do i group interval by time and do calculation based on ID and class

Re: how do i group interval by time and do calculation based on ID and class

Re: how do i group interval by time and do calculation based on ID and class

Re: how do i group interval by time and do calculation based on ID and class

Re: how do i group interval by time and do calculation based on ID and class

Re: how do i group interval by time and do calculation based on ID and class

Re: how do i group interval by time and do calculation based on ID and class

Re: how do i group interval by time and do calculation based on ID and class

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away