TIME | X | Y | class | ID |
16:00:06 | 103.785 | 71.45633 | 1 | A |
16:05:06 | 109.095 | 92.82449 | 2 | A2 |
16:08:06 | 105.981 | 67.84456 | 1 | A6 |
16:11:05 | 103.461 | 71.42452 | 3 | F4 |
16:15:05 | 103.7674 | 71.47382 | 1 | C55 |
16:17:26 | 107.4762 | 73.3445 | 3 | E4 |
16:18:06 | 105.2773 | 87.58118 | 2 | O8 |
16:19:21 | 103.7858 | 71.44293 | 2 | H77 |
16:33:18 | 103.7201 | 71.37974 | 3 | XX |
16:40:11 | 107.3432 | 88.27521 | 3 | X8 |
17:43:06 | 100.8496 | 67.38097 | 2 | D |
17:45:06 | 110.1006 | 66.26799 | 2 | A |
17:50:12 | 105.8542 | 87.77079 | 2 | D4 |
17:55:09 | 96.14285 | 61.99789 | 2 | F4 |
18:02:08 | 103.7829 | 71.48327 | 2 | RR3 |
18:09:33 | 103.9275 | 71.49321 | 2 | U7 |
18:25:12 | 104.7226 | 90.43502 | 3 | S |
18:30:05 | 109.6942 | 66.22699 | 3 | S33 |
18:33:04 | 109.7128 | 97.24288 | 3 | T17 |
18:40:44 | 56.12425 | 71.13522 | 3 | G22 |
18:44:02 | 93.29675 | 63.89221 | 3 | II1 |
18:50:13 | 109.7023 | 71.47756 | 1 | S9 |
1- Above is a sample of a data frame which has thousands of records.
2- How can group it by the time (20 min) and class number. Then calculate the Euclidean distance between x, y using :
dist = sqrt( (x2 - x1)**2 + (y2 - y1)**2 )
3- An iteration of every 20 min for all data, no value duplicates in the single iteration. I have tried to implement this sample. but still not working.
The expected results for the 20min in two iteration:
ID CLASS distnace
A,A6 1 4.22695
A,C55 1 0.024806
A6, C55 1 4.251038
A2,O8 2 6.485861
A2,H77 2 22.030843
O8,H77 2 16.207033
F4, E4 3 4.4506
A,A6 1 17.279585
data have;
infile cards expandtabs;
input TIME $ X Y class ID $;
cards;
16:00:06 103.785 71.45633 1 A
16:05:06 109.095 92.82449 2 A2
16:08:06 105.981 67.84456 1 A6
16:11:05 103.461 71.42452 3 F4
16:15:05 103.7674 71.47382 1 C55
16:17:26 107.4762 73.3445 3 E4
16:18:06 105.2773 87.58118 2 O8
16:19:21 103.7858 71.44293 2 H77
16:33:18 103.7201 71.37974 3 XX
16:40:11 107.3432 88.27521 3 X8
17:43:06 100.8496 67.38097 2 D
17:45:06 110.1006 66.26799 2 A
17:50:12 105.8542 87.77079 2 D4
17:55:09 96.14285 61.99789 2 F4
18:02:08 103.7829 71.48327 2 RR3
18:09:33 103.9275 71.49321 2 U7
18:25:12 104.7226 90.43502 3 S
18:30:05 109.6942 66.22699 3 S33
18:33:04 109.7128 97.24288 3 T17
18:40:44 56.12425 71.13522 3 G22
18:44:02 93.29675 63.89221 3 II1
18:50:13 109.7023 71.47756 1 S9
;
run;
proc sql;
create table want as
select catx(',',a.id,b.id) as id,Euclid(a.x-b.x,a.y-b.y) as distance
from have as a,have as b
where a.class=b.class and a.id ne b.id ;
quit;
Since SAS time (and datetime) values are counts of seconds, you can normalize them to 20-minute frames by doing
tim20 = int(time / 1200) * 1200;
format tim20 time8.;
and then use that in your by statement(s).
You can create groups based on intervals like this
data have;
input TIME:time8. X Y class ID$;
format time time8.;
datalines;
16:00:06 103.785 71.45633 1 A
16:05:06 109.095 92.82449 2 A2
16:08:06 105.981 67.84456 1 A6
16:11:05 103.461 71.42452 3 F4
16:15:05 103.7674 71.47382 1 C55
16:17:26 107.4762 73.3445 3 E4
16:18:06 105.2773 87.58118 2 O8
16:19:21 103.7858 71.44293 2 H77
16:33:18 103.7201 71.37974 3 XX
16:40:11 107.3432 88.27521 3 X8
17:43:06 100.8496 67.38097 2 D
17:45:06 110.1006 66.26799 2 A
17:50:12 105.8542 87.77079 2 D4
17:55:09 96.14285 61.99789 2 F4
18:02:08 103.7829 71.48327 2 RR3
18:09:33 103.9275 71.49321 2 U7
18:25:12 104.7226 90.43502 3 S
18:30:05 109.6942 66.22699 3 S33
18:33:04 109.7128 97.24288 3 T17
18:40:44 56.12425 71.13522 3 G22
18:44:02 93.29675 63.89221 3 II1
18:50:13 109.7023 71.47756 1 S9
;
data grouped;
set have;
interval= intck('minute20', '16:00:00't, time)+1;
run;
Regarding your desired output.. I don't understand why the Euclidian distance between A and A6 for Class 1 appears twice?
data have;
infile cards expandtabs;
input TIME $ X Y class ID $;
cards;
16:00:06 103.785 71.45633 1 A
16:05:06 109.095 92.82449 2 A2
16:08:06 105.981 67.84456 1 A6
16:11:05 103.461 71.42452 3 F4
16:15:05 103.7674 71.47382 1 C55
16:17:26 107.4762 73.3445 3 E4
16:18:06 105.2773 87.58118 2 O8
16:19:21 103.7858 71.44293 2 H77
16:33:18 103.7201 71.37974 3 XX
16:40:11 107.3432 88.27521 3 X8
17:43:06 100.8496 67.38097 2 D
17:45:06 110.1006 66.26799 2 A
17:50:12 105.8542 87.77079 2 D4
17:55:09 96.14285 61.99789 2 F4
18:02:08 103.7829 71.48327 2 RR3
18:09:33 103.9275 71.49321 2 U7
18:25:12 104.7226 90.43502 3 S
18:30:05 109.6942 66.22699 3 S33
18:33:04 109.7128 97.24288 3 T17
18:40:44 56.12425 71.13522 3 G22
18:44:02 93.29675 63.89221 3 II1
18:50:13 109.7023 71.47756 1 S9
;
run;
proc sql;
create table want as
select catx(',',a.id,b.id) as id,Euclid(a.x-b.x,a.y-b.y) as distance
from have as a,have as b
where a.class=b.class and a.id ne b.id ;
quit;
OK. No problem.
data have;
infile cards expandtabs;
input TIME : time8. X Y class ID $;
retain _time ;
format time _time time8.;
if _n_=1 then _time=time;
if time > _time+20*60 then do;group+1;_time=_time+20*60;end;
cards;
16:00:06 103.785 71.45633 1 A
16:05:06 109.095 92.82449 2 A2
16:08:06 105.981 67.84456 1 A6
16:11:05 103.461 71.42452 3 F4
16:15:05 103.7674 71.47382 1 C55
16:17:26 107.4762 73.3445 3 E4
16:18:06 105.2773 87.58118 2 O8
16:19:21 103.7858 71.44293 2 H77
16:33:18 103.7201 71.37974 3 XX
16:40:11 107.3432 88.27521 3 X8
17:43:06 100.8496 67.38097 2 D
17:45:06 110.1006 66.26799 2 A
17:50:12 105.8542 87.77079 2 D4
17:55:09 96.14285 61.99789 2 F4
18:02:08 103.7829 71.48327 2 RR3
18:09:33 103.9275 71.49321 2 U7
18:25:12 104.7226 90.43502 3 S
18:30:05 109.6942 66.22699 3 S33
18:33:04 109.7128 97.24288 3 T17
18:40:44 56.12425 71.13522 3 G22
18:44:02 93.29675 63.89221 3 II1
18:50:13 109.7023 71.47756 1 S9
;
run;
proc sql;
create table want as
select catx(',',a.id,b.id) as id,a.class as class,Euclid(a.x-b.x,a.y-b.y) as distance
from have as a,have as b
where a.class=b.class and a.group=b.group and a.id ne b.id ;
quit;
proc import datafile='c:\temp\x.csv' out=have dbms=csv replace;
guessingrows=32767;
run;
The code above could import csv file . and check if TIME variable is numeric variable and have TIME. format .
Or use data step.
data have;
infile 'c:\temp\x.csv' truncover dsd;
input TIME : time8. X Y class ID : $40.;
OK. just use SET statement to replace INFILE . And remove INPUT.
Make sure time is numeric variable.
data have;
set have;
retain _time ;
format time _time time8.;
if _n_=1 then _time=time;
if time > _time+20*60 then do;group+1;_time=_time+20*60;end;
run;
Note: fixed
That is weird. Your table name is PEOPLE ? and it is in WORK libarary ?
data have;
set people ;
retain _time ;
format time _time time8.;
if _n_=1 then _time=time;
if time > _time+20*60 then do;group+1;_time=_time+20*60;end;
run;
proc sql;
.............
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.