TIME | X | Y | class | ID |
16:00:06 | 103.785 | 71.45633 | 1 | A |
16:05:06 | 109.095 | 92.82449 | 2 | A2 |
16:08:06 | 105.981 | 67.84456 | 1 | A6 |
16:11:05 | 103.461 | 71.42452 | 3 | F4 |
16:15:05 | 103.7674 | 71.47382 | 1 | C55 |
16:17:26 | 107.4762 | 73.3445 | 3 | E4 |
16:18:06 | 105.2773 | 87.58118 | 2 | O8 |
16:19:21 | 103.7858 | 71.44293 | 2 | H77 |
16:33:18 | 103.7201 | 71.37974 | 3 | XX |
16:40:11 | 107.3432 | 88.27521 | 3 | X8 |
17:43:06 | 100.8496 | 67.38097 | 2 | D |
17:45:06 | 110.1006 | 66.26799 | 2 | A |
17:50:12 | 105.8542 | 87.77079 | 2 | D4 |
17:55:09 | 96.14285 | 61.99789 | 2 | F4 |
18:02:08 | 103.7829 | 71.48327 | 2 | RR3 |
18:09:33 | 103.9275 | 71.49321 | 2 | U7 |
18:25:12 | 104.7226 | 90.43502 | 3 | S |
18:30:05 | 109.6942 | 66.22699 | 3 | S33 |
18:33:04 | 109.7128 | 97.24288 | 3 | T17 |
18:40:44 | 56.12425 | 71.13522 | 3 | G22 |
18:44:02 | 93.29675 | 63.89221 | 3 | II1 |
18:50:13 | 109.7023 | 71.47756 | 1 | S9 |
1- Above is a sample of a data frame which has thousands of records.
2- How can group it by the time (20 min) and class number. Then calculate the Euclidean distance between x, y using :
dist = sqrt( (x2 - x1)**2 + (y2 - y1)**2 )
3- An iteration of every 20 min for all data, no value duplicates in the single iteration. I have tried to implement this sample. but still not working.
The expected results for the 20min in two iteration:
ID CLASS distnace
A,A6 1 4.22695
A,C55 1 0.024806
A6, C55 1 4.251038
A2,O8 2 6.485861
A2,H77 2 22.030843
O8,H77 2 16.207033
F4, E4 3 4.4506
A,A6 1 17.279585
data have;
infile cards expandtabs;
input TIME $ X Y class ID $;
cards;
16:00:06 103.785 71.45633 1 A
16:05:06 109.095 92.82449 2 A2
16:08:06 105.981 67.84456 1 A6
16:11:05 103.461 71.42452 3 F4
16:15:05 103.7674 71.47382 1 C55
16:17:26 107.4762 73.3445 3 E4
16:18:06 105.2773 87.58118 2 O8
16:19:21 103.7858 71.44293 2 H77
16:33:18 103.7201 71.37974 3 XX
16:40:11 107.3432 88.27521 3 X8
17:43:06 100.8496 67.38097 2 D
17:45:06 110.1006 66.26799 2 A
17:50:12 105.8542 87.77079 2 D4
17:55:09 96.14285 61.99789 2 F4
18:02:08 103.7829 71.48327 2 RR3
18:09:33 103.9275 71.49321 2 U7
18:25:12 104.7226 90.43502 3 S
18:30:05 109.6942 66.22699 3 S33
18:33:04 109.7128 97.24288 3 T17
18:40:44 56.12425 71.13522 3 G22
18:44:02 93.29675 63.89221 3 II1
18:50:13 109.7023 71.47756 1 S9
;
run;
proc sql;
create table want as
select catx(',',a.id,b.id) as id,Euclid(a.x-b.x,a.y-b.y) as distance
from have as a,have as b
where a.class=b.class and a.id ne b.id ;
quit;
Since SAS time (and datetime) values are counts of seconds, you can normalize them to 20-minute frames by doing
tim20 = int(time / 1200) * 1200;
format tim20 time8.;
and then use that in your by statement(s).
You can create groups based on intervals like this
data have;
input TIME:time8. X Y class ID$;
format time time8.;
datalines;
16:00:06 103.785 71.45633 1 A
16:05:06 109.095 92.82449 2 A2
16:08:06 105.981 67.84456 1 A6
16:11:05 103.461 71.42452 3 F4
16:15:05 103.7674 71.47382 1 C55
16:17:26 107.4762 73.3445 3 E4
16:18:06 105.2773 87.58118 2 O8
16:19:21 103.7858 71.44293 2 H77
16:33:18 103.7201 71.37974 3 XX
16:40:11 107.3432 88.27521 3 X8
17:43:06 100.8496 67.38097 2 D
17:45:06 110.1006 66.26799 2 A
17:50:12 105.8542 87.77079 2 D4
17:55:09 96.14285 61.99789 2 F4
18:02:08 103.7829 71.48327 2 RR3
18:09:33 103.9275 71.49321 2 U7
18:25:12 104.7226 90.43502 3 S
18:30:05 109.6942 66.22699 3 S33
18:33:04 109.7128 97.24288 3 T17
18:40:44 56.12425 71.13522 3 G22
18:44:02 93.29675 63.89221 3 II1
18:50:13 109.7023 71.47756 1 S9
;
data grouped;
set have;
interval= intck('minute20', '16:00:00't, time)+1;
run;
Regarding your desired output.. I don't understand why the Euclidian distance between A and A6 for Class 1 appears twice?
data have;
infile cards expandtabs;
input TIME $ X Y class ID $;
cards;
16:00:06 103.785 71.45633 1 A
16:05:06 109.095 92.82449 2 A2
16:08:06 105.981 67.84456 1 A6
16:11:05 103.461 71.42452 3 F4
16:15:05 103.7674 71.47382 1 C55
16:17:26 107.4762 73.3445 3 E4
16:18:06 105.2773 87.58118 2 O8
16:19:21 103.7858 71.44293 2 H77
16:33:18 103.7201 71.37974 3 XX
16:40:11 107.3432 88.27521 3 X8
17:43:06 100.8496 67.38097 2 D
17:45:06 110.1006 66.26799 2 A
17:50:12 105.8542 87.77079 2 D4
17:55:09 96.14285 61.99789 2 F4
18:02:08 103.7829 71.48327 2 RR3
18:09:33 103.9275 71.49321 2 U7
18:25:12 104.7226 90.43502 3 S
18:30:05 109.6942 66.22699 3 S33
18:33:04 109.7128 97.24288 3 T17
18:40:44 56.12425 71.13522 3 G22
18:44:02 93.29675 63.89221 3 II1
18:50:13 109.7023 71.47756 1 S9
;
run;
proc sql;
create table want as
select catx(',',a.id,b.id) as id,Euclid(a.x-b.x,a.y-b.y) as distance
from have as a,have as b
where a.class=b.class and a.id ne b.id ;
quit;
OK. No problem.
data have;
infile cards expandtabs;
input TIME : time8. X Y class ID $;
retain _time ;
format time _time time8.;
if _n_=1 then _time=time;
if time > _time+20*60 then do;group+1;_time=_time+20*60;end;
cards;
16:00:06 103.785 71.45633 1 A
16:05:06 109.095 92.82449 2 A2
16:08:06 105.981 67.84456 1 A6
16:11:05 103.461 71.42452 3 F4
16:15:05 103.7674 71.47382 1 C55
16:17:26 107.4762 73.3445 3 E4
16:18:06 105.2773 87.58118 2 O8
16:19:21 103.7858 71.44293 2 H77
16:33:18 103.7201 71.37974 3 XX
16:40:11 107.3432 88.27521 3 X8
17:43:06 100.8496 67.38097 2 D
17:45:06 110.1006 66.26799 2 A
17:50:12 105.8542 87.77079 2 D4
17:55:09 96.14285 61.99789 2 F4
18:02:08 103.7829 71.48327 2 RR3
18:09:33 103.9275 71.49321 2 U7
18:25:12 104.7226 90.43502 3 S
18:30:05 109.6942 66.22699 3 S33
18:33:04 109.7128 97.24288 3 T17
18:40:44 56.12425 71.13522 3 G22
18:44:02 93.29675 63.89221 3 II1
18:50:13 109.7023 71.47756 1 S9
;
run;
proc sql;
create table want as
select catx(',',a.id,b.id) as id,a.class as class,Euclid(a.x-b.x,a.y-b.y) as distance
from have as a,have as b
where a.class=b.class and a.group=b.group and a.id ne b.id ;
quit;
proc import datafile='c:\temp\x.csv' out=have dbms=csv replace;
guessingrows=32767;
run;
The code above could import csv file . and check if TIME variable is numeric variable and have TIME. format .
Or use data step.
data have;
infile 'c:\temp\x.csv' truncover dsd;
input TIME : time8. X Y class ID : $40.;
OK. just use SET statement to replace INFILE . And remove INPUT.
Make sure time is numeric variable.
data have;
set have;
retain _time ;
format time _time time8.;
if _n_=1 then _time=time;
if time > _time+20*60 then do;group+1;_time=_time+20*60;end;
run;
Note: fixed
That is weird. Your table name is PEOPLE ? and it is in WORK libarary ?
data have;
set people ;
retain _time ;
format time _time time8.;
if _n_=1 then _time=time;
if time > _time+20*60 then do;group+1;_time=_time+20*60;end;
run;
proc sql;
.............
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.