eh51
Calcite | Level 5

I have a dataset of operating data measured on three variables (e.g. speed, temperature, weight). I would like to identify which observations lie outside a boundary. The boundary is defined by values of the same three variables; it is made up of about 200 observations in another table and can be made contiguous, though operationally only the max values are of consideration.

 

The only analogy I can find to this is GPS-related, such as 'proc ginside'. My need is to compare numeric variables, but more than 2, and I don't know if my application would match a mapping procedure.

 

Is this just a matter of testing an operating point against each point in the boundary table to see if it falls outside the limits? If so, should I construct a macro or loop?

Is there an existing method that streamlines this? 

 

example operating data from field

unit   date                        temperature   speed      weight
g      01JAN2021:00:00:00.000000   12            1,231.40   248.93
g      01JAN2021:00:10:00.000000   12.5          1,480.82   250.74
g      01JAN2021:00:20:00.000000   15            1,134.39   282.13
g      01JAN2021:00:30:00.000000   22            1,090.64   275.02



example limits table
point   Temperature   Speed_limit    Weight_limit
1       30            0              601.5029396
2       30            50             601.5029334
3       30            50.00000055    765.8339735
4       30            100.0000011    765.8339736
5       30            150.0000017    765.8339737
6       30            200.0000023    765.8339737
7       30            250.0000028    765.8339738
8       30            300.0000034    765.8339739
9       30            350.000004     765.8339739
10      30            400.0000047    765.833974
16 REPLIES
ballardw
Super User

Can you show what you expect for the result?

 

This is likely a fairly straightforward Proc SQL call, since that is one way to combine and compare all records in one data set with another (a Cartesian join). One example that selects records where ALL three limits are exceeded is shown below. Note that you didn't clearly specify whether you want records where all three values exceed their limits, two of the three, or any one of the three, so you may need additional logic involving OR instead of AND in the comparisons.

 

proc sql;
   create table compared as 
   select a.*, b.*
   from operatingdata as a
        ,limitdata as b
   where a.temperature > b.temperature
      and a.speed > b.speed_limit
      and a.weight > b.weight_limit
   ;
quit;

The output of the above would have all of the values involved, so you could process the data further as needed.

 

I suspect somewhere you are going to get to a "but I only wanted the lowest set of limits that were exceeded". If so, you should state that, but the output set above could be sorted and processed to get that.
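If that does turn out to be the need, one sketch, assuming UNIT and DATE identify an operating record in COMPARED and that a smaller POINT value means a lower limit set:

/* keep only the lowest exceeded limit per operating record */
proc sort data=compared;
   by unit date point;
run;

data lowest_exceeded;
   set compared;
   by unit date;
   if first.date;   /* first (lowest) POINT within each operating record */
run;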

 

 

eh51
Calcite | Level 5

Thanks @ballardw, I think a simple compare on all three variables would show some points as out of bounds when they are in fact in bounds. 

A math method I have found that comes close is 'ray casting'. In the case of 2 dimensions: take any operating point's y value and extend a ray to infinity. If the ray intersects the limits boundary an odd number of times, the point is inside. If it intersects an even number of times, or 0 times, it is outside the boundary.

 

So, some type of ray casting in SAS for more than 2 dimensions. I might be able to remove a dimension, though.
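For reference, here is a bare-bones DATA step sketch of the 2-D ray-casting test. The square in px/py and the test points are made-up values, not my real limits:

/* 2-D ray casting: count how many polygon edges a horizontal ray
   from the point crosses; an odd count means the point is inside.
   px/py hold a made-up square, not the real limit vertices. */
data pip;
   array px[4] _temporary_ (0 0 15 15);
   array py[4] _temporary_ (0 15 15 0);
   input x y;
   crossings = 0;
   j = dim(px);                      /* previous vertex index */
   do i = 1 to dim(px);
      if (py[i] > y) ne (py[j] > y) then do;
         xint = px[i] + (y - py[i]) * (px[j] - px[i]) / (py[j] - py[i]);
         if x < xint then crossings + 1;
      end;
      j = i;
   end;
   inside = mod(crossings, 2);       /* 1 = inside, 0 = outside */
   drop i j xint;
datalines;
6 6
20 3
;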

ballardw
Super User

@eh51 wrote:

Thanks @ballardw, I think a simple compare on all three variables would show some points as out of bounds when they are in fact in bounds. 

A math method I have found that comes close is 'ray casting'. In the case of 2 dimensions: take any operating point's y value and extend a ray to infinity. If the ray intersects the limits boundary an odd number of times, the point is inside. If it intersects an even number of times, or 0 times, it is outside the boundary.

 

So, some type of ray casting in SAS for more than 2 dimensions. I might be able to remove a dimension, though.


I think you may be understating your problem by a lot. If the points in your limits are vertices of solution polygons or areas, then you need to say so. If so, you may be looking at SAS/IML or SAS/OR, depending on the interpretation of your complete data.

 

For a reasonable two-dimensional solution set, such as just speed and weight, you could construct solution polygons and then Proc Ginside would work. That may well be an implementation of "ray casting"; I have no idea of the internal coding.

 

But this is three dimensions and I'm not sure of a 3d approach that is quite equivalent. Something related to solid modelling perhaps.

eh51
Calcite | Level 5

I am really working from scratch here. 

My limits can be vertices of a polygon, and I believe I can further assume they are always convex.

 

I am looking at SAS/IML right now and might try to implement the equation (y - y0)(x1 - x0) - (x - x0)(y1 - y0).

If the result is less than 0, the point is to the right of the segment; if greater than 0, it is to the left; and if 0, it is on the line segment. To be inside, the point has to be to the right of every segment or to the left of every segment. Any mix means it is outside the polygon.
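As a first pass, here is a small SAS/IML sketch of that sign test for a single convex polygon. The vertex matrix P and the test point pt are made-up values for illustration only:

proc iml;
/* sign test for one convex polygon: the point is inside when it
   falls on the same side of every edge (vertices listed in order) */
P  = {0 0, 0 15, 15 15, 15 0};    /* polygon vertices, in order */
pt = {6 6};                       /* test point (x, y)          */

n = nrow(P);
signs = j(n, 1, 0);
do i = 1 to n;
   k = 1 + mod(i, n);             /* next vertex, wrapping around */
   x0 = P[i,1];  y0 = P[i,2];
   x1 = P[k,1];  y1 = P[k,2];
   signs[i] = sign( (pt[2]-y0)*(x1-x0) - (pt[1]-x0)*(y1-y0) );
end;

inside = all(signs >= 0) | all(signs <= 0);   /* 1 = inside or on an edge */
print inside;
quit;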

 

I might give Ginside a try though, given your input, assuming the map can be anything.

In either case, I will have to determine which of 10 different speed and weight polygons to use based on the operating temperature. There is some error in this, but I can make it conservative.

 

Reeza
Super User

Ignoring the fact you have three variables for the moment, see this example. I'd try and get it working for one first and then figure out how to scale it. 

https://communities.sas.com/t5/Graphics-Programming/Create-a-polygon-and-test-whether-point-s-lie-wi...

 

 

eh51
Calcite | Level 5

Looking further at Ginside, I don't see a way to switch between different polygons based on the operating point. Ginside wants adjoining polygons, but my polygons will enclose each other since the third variable expands the polygon that should be used. 

For this reason, I am creating a set of arrays that define the polygon, and using the third variable from the operating data to choose which polygon to compare. I will determine if the op point lies always to one side of all of the polygon segments. 

Reeza
Super User

You can use arrays here most likely but it depends a bit on how your limits are structured. GINSIDE is likely not the right approach unless you're dealing with spatial limits.

 

Can you post a small reproducible example and someone can help you with the code?

 

And here's a tutorial on using Arrays in SAS
https://stats.idre.ucla.edu/sas/seminars/sas-arrays/

Ksharp
Super User
You could calculate the Mahalanobis distance between the center and the rows of x with the MAHALANOBIS function in SAS/IML. @Rick_SAS knows more and might give you a better method.
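A rough sketch of that idea, assuming SAS/IML 12.1 or later for the MAHALANOBIS function and an arbitrary cutoff of 3; note this flags unusual points by distance rather than testing the polygon exactly:

proc iml;
/* distance-based screen: how far is each operating point from the
   centroid of the limit vertices, in Mahalanobis units?           */
use limits;   read all var {speed weight} into B;   close limits;
use op_base;  read all var {speed weight} into X;   close op_base;

center = mean(B);                    /* centroid of the limit points */
S      = cov(B);                     /* their covariance matrix      */
d      = mahalanobis(X, center, S);  /* distance of each op point    */

flag = (d > 3);                      /* 1 = unusually far; cutoff is arbitrary */
create mahal_flags var {d flag};  append;  close mahal_flags;
quit;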
eh51
Calcite | Level 5

I will try to address all of the comments from my Array thread, which are essentially asking me to 'start over'.  🙂

 

The data set I have is a set of field measurements. Operating data.  

Obs   Group   local_dttm                  Speed      weight    Temperature   Cat
1     8784    04JAN2021:17:20:00.000000   2,329.65   -426.06   3.35          0.97
2     8784    04JAN2021:17:30:00.000000   2,330.98   -480.73   3.74          0.97
3     8784    04JAN2021:17:40:00.000000   2,329.38   -447.44   3.88          0.97
4     8784    04JAN2021:17:50:00.000000   2,327.91   -365.98   3.74          0.98
5     8784    04JAN2021:18:00:00.000000   2,329.30   -225.05   3.62          0.99
6     8784    04JAN2021:18:10:00.000000   2,329.98   -234.77   3.37          0.99
7     8784    04JAN2021:21:10:00.000000   2,331.31   -87.29    2.01          0.99
8     8784    04JAN2021:21:20:00.000000   2,329.99   -77.5     1.81          0.99
9     8784    04JAN2021:21:30:00.000000   2,330.17   12.78     1.83          1
10    8784    04JAN2021:21:40:00.000000   2,330.07   16.9      1.68          1
11    8784    04JAN2021:21:50:00.000000   2,330.94   2.36      1.94          1
12    8784    04JAN2021:22:00:00.000000   2,329.89   -33.22    2.03          0.99
13    8784    04JAN2021:22:10:00.000000   2,329.93   -142.8    2             0.98
14    8784    04JAN2021:22:20:00.000000   2,330.20   -141.09   1.95          0.98
15    8784    04JAN2021:22:30:00.000000   2,329.50   -129.25   1.78          0.98
16    8784    04JAN2021:22:40:00.000000   2,330.17   -133.56   1.69          0.98
17    8784    04JAN2021:22:50:00.000000   2,329.39   -152.02   1.56          0.98
18    8784    04JAN2021:23:00:00.000000   2,330.09   -182.97   1.52          0.98
19    8784    04JAN2021:23:10:00.000000   2,329.59   -169.54   1.29          0.98
20    8784    04JAN2021:23:20:00.000000   2,330.77   -221.52   1.31          0.98
21    8784    04JAN2021:23:30:00.000000   2,329.89   -236.03   1.61          0.98
22    8784    04JAN2021:23:40:00.000000   2,329.93   -243.14   1.7           0.97
23    8784    04JAN2021:23:50:00.000000   2,329.21   -234.27   1.6           0.97
24    8784    05JAN2021:00:00:00.000000   2,328.60   -229.13   .             0.97
25    8784    05JAN2021:00:10:00.000000   2,330.11   -188.98   1.49          0.98
26    8784    05JAN2021:00:20:00.000000   2,328.01   -216.42   1.34          0.98
27    8784    05JAN2021:00:30:00.000000   2,308.17   -213.41   1.23          0.98
28    8784    05JAN2021:00:40:00.000000   2,327.50   -212.38   1.39          .
 

My desired data set would show if the operating point is over a limit (yes/no could be inside/outside or 1/0, etc.)

Obs   Group   local_dttm                  Speed      weight    Temperature   Cat    Exceeds_limits
1     8784    04JAN2021:17:20:00.000000   2,329.65   -426.06   3.35          0.97   Yes
2     8784    04JAN2021:17:30:00.000000   2,330.98   -480.73   3.74          0.97   No
3     8784    04JAN2021:17:40:00.000000   2,329.38   -447.44   3.88          0.97   No
4     8784    04JAN2021:17:50:00.000000   2,327.91   -365.98   3.74          0.98   No
5     8784    04JAN2021:18:00:00.000000   2,329.30   -225.05   3.62          0.99   No
6     8784    04JAN2021:18:10:00.000000   2,329.98   -234.77   3.37          0.99   No
7     8784    04JAN2021:21:10:00.000000   2,331.31   -87.29    2.01          0.99   No
8     8784    04JAN2021:21:20:00.000000   2,329.99   -77.5     1.81          0.99   No
9     8784    04JAN2021:21:30:00.000000   2,330.17   12.78     1.83          1      No
10    8784    04JAN2021:21:40:00.000000   2,330.07   16.9      1.68          1      Yes
11    8784    04JAN2021:21:50:00.000000   2,330.94   2.36      1.94          1      No
12    8784    04JAN2021:22:00:00.000000   2,329.89   -33.22    2.03          0.99   No
13    8784    04JAN2021:22:10:00.000000   2,329.93   -142.8    2             0.98   No
14    8784    04JAN2021:22:20:00.000000   2,330.20   -141.09   1.95          0.98   No
15    8784    04JAN2021:22:30:00.000000   2,329.50   -129.25   1.78          0.98   No
16    8784    04JAN2021:22:40:00.000000   2,330.17   -133.56   1.69          0.98   No
17    8784    04JAN2021:22:50:00.000000   2,329.39   -152.02   1.56          0.98   Yes
18    8784    04JAN2021:23:00:00.000000   2,330.09   -182.97   1.52          0.98   No
19    8784    04JAN2021:23:10:00.000000   2,329.59   -169.54   1.29          0.98   No
20    8784    04JAN2021:23:20:00.000000   2,330.77   -221.52   1.31          0.98   No
21    8784    04JAN2021:23:30:00.000000   2,329.89   -236.03   1.61          0.98   Yes
22    8784    04JAN2021:23:40:00.000000   2,329.93   -243.14   1.7           0.97   No
23    8784    04JAN2021:23:50:00.000000   2,329.21   -234.27   1.6           0.97   No
24    8784    05JAN2021:00:00:00.000000   2,328.60   -229.13   .             0.97   No
25    8784    05JAN2021:00:10:00.000000   2,330.11   -188.98   1.49          0.98   No
26    8784    05JAN2021:00:20:00.000000   2,328.01   -216.42   1.34          0.98   No
27    8784    05JAN2021:00:30:00.000000   2,308.17   -213.41   1.23          0.98   No
28    8784    05JAN2021:00:40:00.000000   2,327.50   -212.38   1.39          .      Yes

 

The final report by group, which is simple once the above is available. 

Group   Percent exceeds
8784    1.00%
5694    1.50%
5289    3.00%
4535    0.10%
2489    8.00%
3527    4.00%

 

The limits data set defines convex polygons. I cannot use a simple "is speed greater than the speed limit". I need to use something like Ginside, but my polygons are not adjacent. They lie on top of each other. A lower temperature provides a larger polygon with the same center and some identical vertices.

 

I can reduce my problem to two dimensions (two variables). The other variables (actually two: cat and temperature) I would use to choose the correct set of limits (speed/weight).

 

Here is a look at the limits data set, from which 10 different sets of polygon vertices are obtained.

Each cat/temperature combination represents one array.

cat    Temperature   speed   weight
0.9    30            0       602
0.9    30            50      602
0.9    30            50      1128
0.9    30            1953    1128
0.95   30            2161    395
0.95   30            2162    339
0.95   30            2162    282
0.95   30            2163    226
0.95   30            2164    169
0.95   30            2164    113
0.9    35            1978    -564
0.9    35            1974    -621
0.9    35            1968    -677
0.95   35            2037    1016
0.95   35            2050    959
0.95   35            2062    903
0.95   35            2071    846
0.95   35            2079    790
0.95   35            2085    734
0.95   35            2090    677
0.95   35            2092    621

 

 

My thought was to step through each operating data set observation and:

1. Determine which set of limits to use (the rows where the operating cat and temperature fall within a cat and temperature band in the limits data set).

2. Make the calculation for that observation.

3. Write the result to a new variable in the operating data set.

 

This is what I have. Any recommendations on how to accomplish the above are appreciated.

data op_results;
set op_base;
length exceedance $7;

/* original idea was to load a set of arrays that I can step through
   later in this data step, but I am not seeing a path */
/* ... create the set of applicable_array (plural) here ... */

/* initialize some variables */
negatives = 0;
positives = 0;
zeroes = 0;

/* step through the limits and determine whether the operating point is to
   the right or left SIDE of the line drawn by each vertex pair in the
   limits data set/array */
do j = 2 to dim(applicable_array);      /* NUMROW is an IML function; DIM works in a data step */
   side = sign(
	(speed - applicable_array[j-1,1])*(applicable_array[j,2]-applicable_array[j-1,2]) -
	(weight - applicable_array[j-1,2])*(applicable_array[j,1]-applicable_array[j-1,1])
	       );
   if side = -1 then negatives + 1;     /* sum statements; ++ is not SAS syntax */
   else if side = 1 then positives + 1;
   else zeroes + 1;
end;

if negatives = 0 or positives = 0 then exceedance = 'inside';  /* all on one side; inside/outside could be yes/no or 1/0, etc. */
else exceedance = 'outside';

run;
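One path I am considering for the "create set of applicable_array" step is a secondary SET with POINT= inside the same DATA step. This is a rough, untested sketch; it assumes the LIMITS data set stores each polygon's vertices in order, that no polygon has more than 50 vertices (the vx/vy arrays), and that only the cat bands 0.90/0.95 and temperatures 30/35 shown above exist:

data op_results;
   set op_base;
   length exceeds_limits $3;
   array vx[50] _temporary_;            /* up to 50 vertices per polygon */
   array vy[50] _temporary_;

   /* pick the applicable (conservative) limit set; only the bands   */
   /* present in the sample limits above are handled here            */
   lim_temp = ifn(temperature <= 30, 30, 35);
   lim_cat  = ifn(cat <= 0.90, 0.90, 0.95);

   /* load that polygon's vertices from the LIMITS data set */
   nv = 0;
   do p = 1 to nlim;
      set limits(rename=(cat=l_cat temperature=l_temp
                         speed=l_speed weight=l_weight))
          point=p nobs=nlim;
      if l_cat = lim_cat and l_temp = lim_temp then do;
         nv + 1;
         vx[nv] = l_speed;
         vy[nv] = l_weight;
      end;
   end;

   /* convex-polygon sign test: inside when every edge puts the point */
   /* on the same side                                                */
   pos = 0;  neg = 0;
   do j = 1 to nv;
      k = 1 + mod(j, nv);               /* next vertex, wrapping around */
      side = sign( (weight - vy[j])*(vx[k] - vx[j])
                 - (speed  - vx[j])*(vy[k] - vy[j]) );
      if side > 0 then pos + 1;
      else if side < 0 then neg + 1;
   end;

   if nv = 0 then exceeds_limits = ' ';                   /* no matching polygon */
   else exceeds_limits = ifc(pos = 0 or neg = 0, 'No', 'Yes');
   drop l_: p j k nv side pos neg lim_temp lim_cat;
run;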
Reeza
Super User

The limits data set defines convex polygons. I cannot use a simple "is speed greater than the speed limit". I need to use something like Ginside, but my polygons are not adjacent. They lie on top of each other. A lower temperature provides a larger polygon with the same center and some identical vertices.

How do you know which polygon goes with which point?

eh51
Calcite | Level 5

The values of 'cat' and 'temperature' in the operations data set will fall between the limit levels. I would use the more conservative limit. This is the code I planned to use to choose the applicable limit array.

I replaced the code with --use...-- to give the idea. 

 

/* determine which boundary to use */
if temperature <= 30 then do;
   if cat <= 0.90 then --use limits 30090-- ;
   else if (0.90 < cat <= 0.95) then --use limits 30095-- ;
   else if (0.95 < cat <= 1)    then --use limits 30100-- ;
   else if (1 < cat <= 1.05)    then --use limits 30105-- ;
   else --use limits 30110-- ;
end;
else do;
   if cat <= 0.90 then --use limits 35090-- ;
   else if (0.90 < cat <= 0.95) then --use limits 35095-- ;
   else if (0.95 < cat <= 1)    then --use limits 35100-- ;
   else if (1 < cat <= 1.05)    then --use limits 35105-- ;
   else --use limits 35110-- ;
end;
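A SELECT statement might be a tidier way to write that cascade. POLYID here is just a made-up key naming each limit array (e.g. 30090 = temperature band 30, cat band 0.90):

select;
   when (temperature <= 30 and cat <= 0.90)   polyid = 30090;
   when (temperature <= 30 and cat <= 0.95)   polyid = 30095;
   when (temperature <= 30 and cat <= 1)      polyid = 30100;
   when (temperature <= 30 and cat <= 1.05)   polyid = 30105;
   when (temperature <= 30)                   polyid = 30110;
   when (cat <= 0.90)                         polyid = 35090;
   when (cat <= 0.95)                         polyid = 35095;
   when (cat <= 1)                            polyid = 35100;
   when (cat <= 1.05)                         polyid = 35105;
   otherwise                                  polyid = 35110;
end;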

 

Reeza
Super User
I may have missed it, but did you share an example of what the polygon data set looks like? I don't see any x/y points in any of the data.
I would try GINSIDE, as it may default to the closest boundary, which aligns with your requirements. If you wanted the outer boundaries, I suspect that may be harder.

eh51
Calcite | Level 5
Yes. In the limits table above, speed and weight would be the x and y.
ballardw
Super User

IF your polygons are convex and possibly concentric, then use Ginside with concentric polygons and Temperature as the ID variable. Use the other two variables as the map coordinates. Ginside would return the ID value of the innermost polygon that a point is inside, provided the data is ordered correctly. At least it does for a very small example.

 

/* create concentric polygon set
   with the ID of the innermost smallest
   first (Not sure if the order is critical)
note the repeat of first vertex as last to close
the polygon
*/
data mapset;
   input x y idvar;
datalines;
5 5 1
5 10 1
10 10 1
10 5  1
5 5 1
0 0 2
0 15 2
15 15 2
15 0 2
0 0 2
;

/* this shows appearance of the polygons*/
pattern value=empty;
proc gmap map=mapset data=mapset;
id idvar;
choro idvar /coutline=black;
run;
quit;

/* create some points to test ginside*/
data testvalues;
  input x y;
datalines;
6 6
6 13
1 1
;

proc ginside map=mapset data=testvalues
     out=resultset
     includeborder
  ;
  id idvar;
run;

The "trick" may be getting the map data set created as I cannot tell if you have the values in proper graphing order which is not a trivial exercise.

Here is an example that closes each polygon to make sure the entire boundary is as expected. If your data is in a good order, this can be accomplished with FIRST. and LAST. processing: save the polygon's starting coordinates at First.temperature and then add them back as the closing point at Last.temperature from the boundaries set.

This is one way:

data limits;
  input cat	Temperature speed weight;
datalines;
0.9	30	0	602
0.9	30	50	602
0.9	30	50	1128
0.9	30	1953	1128
0.95	30	2161	395
0.95	30	2162	339
0.95	30	2162	282
0.95	30	2163	226
0.95	30	2164	169
0.95	30	2164	113
0.9	35	1978	-564
0.9	35	1974	-621
0.9	35	1968	-677
0.95	35	2037	1016
0.95	35	2050	959
0.95	35	2062	903
0.95	35	2071	846
0.95	35	2079	790
0.95	35	2085	734
0.95	35	2090	677
0.95	35	2092	621
;

data mapset;
  /*map set requires specific variables*/
  set limits (rename=(temperature=id speed=x weight=y));
  /* if your id variable is grouped but not actually sorted*/
  by id notsorted;
  retain tempx tempy;
  if first.id then do;
     tempx=x;
     tempy=y;
  end;
  output;
  if last.id then do;
     x=tempx;
     y=tempy;
     output;
  end;
  drop tempx tempy;
run;

No clue how overlapping non-concentric polygons would behave. I think the first ID value found would be the result, but I am not interested enough to create such data and test.

 

Note that, depending on the GINSIDE options chosen, you can restrict the output data set to only points inside, only points outside, or all of the data. If a point is outside all polygons, the ID variable should be missing.
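If it helps, here is a sketch of getting from the GINSIDE output to the percent-exceeds report you described. It assumes the OUT= data set is named OP_SCORED, still carries GROUP on each record, and has a missing ID when a point is outside every polygon:

/* flag each operating point, then average the 0/1 flag by group */
data op_scored2;
   set op_scored;
   exceeds = missing(id);      /* 1 = outside every polygon */
run;

proc means data=op_scored2 noprint nway;
   class group;
   var exceeds;
   output out=report mean=percent_exceeds;   /* proportion; apply the PERCENT. format for the report */
run;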
