BookmarkSubscribeRSS Feed
Mikkel_madsen
Obsidian | Level 7

Hi everyone,

 

I have a rather complicated task which I would love your input on. It consists of several tasks/ issues. Overall, the data look similar to this (beware: these data are fake and is only an example):

 

ID Event_time Event_east Event_north Poi_east Poi_north Notify_time Pos_east Pos_north Pos_time1 Pos_east1 Pos_north1 Pos_time2 Pos_east2 Pos_north2 Pos_time3 Pos_east3 Pos_north3
1 11Feb2021:18:08:02 12.134963 55.461058 12.451788 55.421875 2021-02-11T18:09:55.000+01:00 12.124598 55.457985 2021-02-11T18:10:55.000+01:00 12.561334 55.876498 2021-02-11T18:11:55.000+01:00 12.457935 55.215469 2021-02-11T18:10:57.000+01:00 12.457896 55.897569
2 12Feb2021:16:12:00 12.457965 55.589512 12.021369 55.021369 2021-02-12T16:12:55.000+01:00 12.146026 55.012459 2021-02-12T16:12:56.000+01:00 12.012459 55.124598 2021-02-12T16:12:57.000+01:00 12.215498 55.124023 2021-02-12T16:12:58.000+01:00 12.459065 55.213026

 

I have several GPS- and timestamps (up to e.g. Pos_time60 and Pos_east60 etc). The different positions (pos_eastand pos_northn may represent a route of a biker). 

 

I want to:

 

1) Calculate the true-line distance from pos_east and pos_north (e.g. a biker's starting position) to event_east and event_north (the event). As I have several GPS stamps from 'the biker' (up to >60) I believe it could be done by straight-line calculation from pos_east to pos_east1 to pos_east2  to pos_eastn.... to event_east (the event). However, some of the last GPS-stamps from 'the biker' may be relatively close to the event GPS position. Therefore, the first position of 'the biker' within 25 meters from event_east and event_north should be used as the final position (if it makes any sense). 

 

I do not know how to incorporate all this in the geodist function. 

 

Thanks!

 

All best

/MM

10 REPLIES 10
ballardw
Super User

Please provide an example of what you want to calculate. With 60 points it is not clear exactly what you are calculating. Do you want the distance between successive pairs of points, i.e. 60 additional values? One value of the cumulative distance as you go to point to point?

 

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.

If you have data that is too sensitive to share either exclude the variable or change values so that they aren't sensitive.

 

I suspect that you be looking at a number of arrays such as this:

data example;
   set have;
   array pe (*) Pos_east: ;
   array pn (*) Pos_north: ;
   array d  (60);
   do i= 1 to ( dim(pe) -1);
      d[i] = geodist(pn[i],pe[i],pn[i+1],pe[i+1]);
   end;
   cumd = sum(of d(*));
run;

which calculates the distance between successive points, if I understand your data, and places each distance in the array D. The totals all those values.

You can get the index position in the arrays of "last" assigned values by using the N function: lastindex = n(of pe(*));

and use that index with the array to the pair of values for comparison with that "first position", not really well defined.

 

BTW including "the biker" in your description does not provide any information as there is no variable in the data with that designation. If you mean by ID, then use that since it is in the data.

Reeza
Super User
data example;
   set have;
   array pe (*) Pos_east: ;
   array pn (*) Pos_north: ;
    array d  (60); 


   do i= 1 to ( dim(pe) -1);
      d[i] = geodist(pn[i],pe[i],pn[i+1],pe[i+1]);

      *distance to event;
        d2event = geodist(pn[i], pe[i], event_north, event_east);
       *exit out of array loop once closes distance to event is found; 
       if d2event < 25 then leave;
   end;
   cumd = sum(of d(*));
run;
Reeza
Super User
Will you always have the same number of locations/time stamps for each user? I suspect not....

I would recommend flipping the data into a long format, with location and event location included for each row.
Assuming BY group processing:
For each record calculate two distances, one from prior location (use LAG()) and one to the event destination.
Once you are within x distance of the event, you can stop the calculations.
Then for total distance, sum the distance amounts.
If you really want the wide data format, remerge this with the original data set now or refactor into a DoW loop.

Or use the array solution from @ballardw, add another array for distance to event and once you hit that 25 you know where to stop the summation.

Mikkel_madsen
Obsidian | Level 7

Hi @Reeza and @ballardw

 

Thanks for your time. I tried my best to explain it, I am sorry if it was unclear. 

It was just an example with the baker, to make it easier to explain and understand. The data is highly sensitive. I tried to make a data-step as requested, however, it would be several hundred values I had to re-arrange. Further, to test it in the right way you would need reel GPS-stamps on a given road. 

 

Every observation can have several (more than 2 and up to around 60) GPS-stamps from pos_east and pos_north to event_east and event_north. I labelled the travelling positions wrong, they are called "pos_east_1, pos_east_2 and so on). If I understand it correctly, the method @Reeza describes in code does calculate the distance from pos_east to pos_east_1 and from pos_east_1 to pos_east_2 and finally sum the different distances. The first position within 25 meters from the event_east and event_north should be used as the last (I believe it is also done in the code). Unfortunately, the code does not seem to work. Maybe it is because of the wrong labelling? 

 

Further, most observations goes by poi_east and poi_north on its way to event_east and event_north. Can this be taken in to account? So those observations without and poi-coordinates are done as first described, but those with poi-coordinates will first have the distance from pos_east and pos_north to poi-coordinates calculated and then again from poi-coordinates to event-coordinates?

 

I know I must sound like a giant newbie which is also what I am. I am very grateful for your spare full time and a huge help! Sorry for my incompetence. 

Reeza
Super User
Please make some fake data that represents this situation in better detail. Doesn't have to be real, doesn't have to have 60 points but it needs to be representative.
Two or three IDs with 5 to 10 data points is sufficient. Try to make some of the issues you've identified in your situation above included.
ballardw
Super User

Doesn't work is awful vague.

Are there errors in the log?: Post the code and log in a code box opened with the "<>" to maintain formatting of error messages.

No output? Post any log in a code box.

Unexpected output? Provide input data in the form of data step code pasted into a code box, the actual results and the expected results. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the "</>" icon or attached as text to show exactly what you have and that we can test code against.

 

Mikkel_madsen
Obsidian | Level 7

Hi @ballardw and @Reeza 

 

Thanks for both of your answers. I have tried to make a datastep as requested:

 

data WORK.FAKE;
  infile datalines dsd truncover;

input IncidentId:BEST12. IncidentTime:DATETIME18. AssignmentTime:DATETIME18. ConfirmedTime:DATETIME18. AcceptedTime:DATETIME18. incident_easting:$9. incident_northing:$9. _alias:$6. selected_time:$21. pos_easting:$9. pos_northing:$9. poi_id:$10. poi_eastin
g:$9. poi_northing:$9. notified_time:$29. closed_time:$29. pos_time_1:$29. pos_easting_1:$9. pos_northing_1:$9. pos_time_2:$29. pos_easting_2:$9. pos_northing_2:$9. pos_time_3:$29. pos_easting_3:$9. pos_northing_3:$9. pos_time_4:$29. pos_easting_4:$9. pos_
northing_4:$9. pos_time_5:$29. pos_easting_5:$9. pos_northing_5:$9. pos_time_6:$29. pos_easting_6:$9. pos_northing_6:$9. pos_time_7:$29. pos_easting_7:$10. pos_northing_7:$9.;
  format IncidentId BEST12. IncidentTime DATETIME18. AssignmentTime DATETIME18. ConfirmedTime DATETIME18. AcceptedTime DATETIME18.;
datalines;
1064378 12FEB18:15:43:22 12FEB18:15:59:53 12FEB18:15:59:40 12FEB18:15:59:53 12.280186 55.626429 451245 2018-02-12 15:46:56.0 12.266320 55.610988       2018-02-12T15:46:56.000+01:00   2018-02-12T12:34:20.000+01:00 12.267114 55.611986
2018-02-12T12:34:20.000+01:00 12.269278 55.614381 2018-02-12T15:59:54.000+01:00 12.270823 55616077 2018-02-12T15:59:54.000+01:00 12.272192 55.617449 2018-02-12T15:59:54.000+01:00 12.274400 55.620118 2018-02-12T15:59:54.000+01:00 12.276432 55.622264
2018-02-12T16:01:26.000+01:00 12.278331 55.624359
1064378 12FEB18:15:43:22 12FEB18:15:47:19 12FEB18:15:47:17 12FEB18:15:47:20 12.280186 55.626429 745985 2018-02-12 15:46:56.0 12.318220 55.623781       2018-02-12T15:46:56.000+01:00 2018-02-12T15:58:16.000+01:00 2018-02-12T09:42:44.000+01:00 12.311948
55.625928 2018-02-12T15:47:16.000+01:00 12.304616 55.628223 2018-02-12T15:47:16.000+01:00 12.296400 55.630943 2018-02-12T15:47:21.000+01:00 12.285620 55.632489 2018-02-12T15:47:52.000+01:00 12.278861 55.624907 2018-02-12T15:48:19.000+01:00 12.278508
55.624558      
1064378 12FEB18:15:43:22 12FEB18:15:47:20 12FEB18:15:47:14 12FEB18:15:47:20 12.280186 55.626429 785489 2018-02-12 15:46:56.0 12.265386 55.633611 1000009418 12.262428 55.627251 2018-02-12T15:46:56.000+01:00 2018-02-12T15:55:16.000+01:00
2018-02-12T09:46:25.000+01:00 12.265431 55.630718 2018-02-12T15:47:13.000+01:00 12.261898 55.627226 2018-02-12T15:47:13.000+01:00 12.257615 55.622935 2018-02-12T15:47:21.000+01:00 12.267510 55.622737 2018-02-12T15:47:58.000+01:00 12.278199 55.624434
2018-02-12T15:48:15.000+01:00 12.278773 55.624833      
1064378 12FEB18:15:43:22 12FEB18:15:48:30 12FEB18:15:48:17 12FEB18:15:48:30 12.280186 55.626429 547845 2018-02-12 15:46:56.0 12.314328 55.616049       2018-02-12T15:46:57.000+01:00 2018-02-12T16:04:31.000+01:00 2018-02-12T15:30:30.000+01:00 12.320076
55.622383 2018-02-12T15:47:14.000+01:00 12.308370 55.623458 2018-02-12T15:48:17.000+01:00 12.290789 55.628274 2018-02-12T15:48:17.000+01:00 12.278729 55.624733                  
6                                                                                                                        The SAS System                                                                                         12:54 Friday, September 17, 2021

6326548 13FEB18:08:14:26 13FEB18:08:26:36 13FEB18:08:26:22 13FEB18:08:26:36 12.298998 55.929033 741014 2018-02-13 08:15:51.0 12.321744 55.927888       2018-02-13T08:15:53.000+01:00 2018-02-13T08:27:52.000+01:00 2018-02-12T18:37:57.000+01:00 12.325474
55.925710 2018-02-12T18:37:57.000+01:00 12.326471 55.920177 2018-02-13T08:26:37.000+01:00 12.320957 55.916881 2018-02-13T08:26:38.000+01:00 12.318909 55.919059 2018-02-13T08:27:09.000+01:00 12.312553 55.923355 2018-02-13T08:45:13.000+01:00 12.306773
55.927387 2018-02-13T08:45:13.000+01:00 12.299365 55.928975
;;;;

 I hope it is correct? I have replaced all values with fake values. To ensure, that the calculations is correct, I have replaced all GPS stamps with real GPS stamps on real roads (got the GPS stamps from Google Maps). As mentioned, there can be up to around 63 GPS observations. However, in this fake dataset, I did only make up a maximum of 7 positions for each observation. 

 

I tried the code:

 

data mydata;
   set mydata;
   array pe (*) Pos_easting: ;
   array pn (*) Pos_northing: ;
    array d  (63); 


   do i= 1 to ( dim(pe) -1);
      d[i] = geodist(pn[i],pe[i],pn[i+1],pe[i+1]);

      *distance to event;
        d2event = geodist(pn[i], pe[i], incident_northing, incident_easting);
       *exit out of array loop once closes distance to event is found; 
       if d2event < 25 then leave;
   end;
   cumd = sum(of d(*));
run;

It runs without any errors. But my output data does not seem correct. I get several new variables (d1-d60) but most with missing values (value = .) except d1. However, d1 is often 0 which cumd also is (because of all the missing or the fact that d1=0, I guess). 

 

I hope that it is more clear with the above information. Thanks once again! 

 

/MM

Reeza
Super User

When I run it, I get a few warnings that may give you a clue to what's wrong:

 

 
 NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
       103:22   103:28   103:34   103:42   106:27   106:34   106:41   106:60   
 NOTE: Missing values were generated as a result of performing an operation on missing values.
       Each place is given by: (Number of times) at (Line):(Column).
       16 at 103:14   16 at 106:19   16 at 110:11   

Big hint - you cannot do math on character columns.


EDIT your input code doesn't work at all for some reason. Did you manually modify the generated file?

ballardw
Super User

Northing and easting values provided to GEODIST must be numeric.

 

To make any use of the "time" values you will want those as actual SAS times, read with the E8601dt. format for those values with T in the middle.

 

You really need to describe 1) why you think the values calculated are wrong, give the actual easting and northing data pair you think are wrong and what you think would be correct.

 

For purposes of working with the GEODIST, you could drop all of the non easting and northing variables to simplify things.

PGStats
Opal | Level 21

First, you need to fix your data and read it properly. For date and time values, your data (the first observation, for example) seems to contain fields that should be read with the following informats :

 

DATETIME16. for "12FEB18:15:43:22"

YYMMDD10. for "2018-02-12"

TIME10.1 for "15:46:56.0"

E8601DZ29. for "2018-02-12T15:46:56.000+01:00"

 

Furthermore, all easting and northing values should be read as simple numbers, i.e. without informats

 

(untested)

PG

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 923 views
  • 0 likes
  • 4 in conversation