Hi there, I have around 15 million observations, where each observation is a 1 metre length of road with the variables being various attributes. I want to create an aggregate data set, so that each observation is a longer length (with a upper limit of 1000 metres, but no lower limit). I would like this length to be determined by having the same attributes (homogeneous lengths of road). Where anyone of the attributes changes, a new length is created. Now, each observation has a start point (metres along a road) and an end point (e.g. 1 to 2) and the next contiguous length will have the previous end point as the start point and so on (think of a spatial file with lines). There will be breaks in the lengths, the observations will not always be contiguous so this is another break point I want to use. And on top of that I want to use the first start point as the new start point and take the end point of the last record in that length and use that as the end point in the aggregate data set. The start and end points also need to be in order when aggregating is occurring so that you don't end up with a start point that exists after your end point. Ideally I would also calculate the new length by summing the length_m variable of the observations aggregated into a length, or simply by subtracting the new end point from the new start point. I get the logic of what I need to do I think, but I'm struggling to turn this into code. Note that I'm ok with the fact that in this way the roads all exist in parallel (i.e. it doesn't take into account the real intersecting nature of a road network). I am using Base SAS 9.4. The following is a small example of the data that I have. It has 46 metres of road, with 46 observations/rows: data have;
input road $ carriage $ start end attribute1 $ length_m;
length road $ 2 carriage $ 2 start 3 end 3 attribute1 $ 4 length_m 4;
datalines;
P1 L 0 1 50PD 1
P1 L 1 2 50PD 1
P1 L 2 3 50PD 1
P1 L 7 8 50PD 1
P1 L 8 9 50PD 1
P1 L 3 4 60PD 1
P1 L 4 5 60PD 1
P1 L 6 7 60PD 1
P1 L 11 12 70AS 1
P1 L 9 10 70PD 1
P1 L 10 11 70PD 1
P1 R 0 1 50PD 1
P1 R 1 2 50PD 1
P1 R 2 3 50PD 1
P1 R 7 8 50PD 1
P1 R 8 9 50PD 1
P1 R 9 10 50PD 1
P1 R 3 4 60PD 1
P1 R 4 5 60PD 1
P1 R 6 7 60PD 1
P1 R 10 11 70PD 1
P1 R 11 12 70PD 1
P1 S 5 6 60PD 1
P2 L 0 1 50PD 1
P2 L 1 2 50PD 1
P2 L 2 3 50PD 1
P2 L 7 8 50PD 1
P2 L 8 9 50PD 1
P2 L 3 4 60PD 1
P2 L 4 5 60PD 1
P2 L 6 7 60PD 1
P2 L 11 12 70AS 1
P2 L 9 10 70PD 1
P2 L 10 11 70PD 1
P2 R 0 1 50PD 1
P2 R 1 2 50PD 1
P2 R 2 3 50PD 1
P2 R 7 8 50PD 1
P2 R 8 9 50PD 1
P2 R 9 10 50PD 1
P2 R 3 4 60PD 1
P2 R 4 5 60PD 1
P2 R 6 7 60PD 1
P2 R 10 11 70PD 1
P2 R 11 12 70PD 1
P2 S 5 6 60PD 1
run; The resulting table I'm hoping to get is as follows. It has 46 metres of road, broken up into 24 observations (unique, homogenous lengths of road of varying length): data want;
input road $ carriage $ start end attribute1 $ length_m;
length road $ 2 carriage $ 2 start 3 end 3 attribute1 $ 4 length_m 4;
datalines;
P1 L 0 3 50PD 3
P1 L 7 9 50PD 2
P1 L 3 5 60PD 2
P1 L 6 7 60PD 1
P1 L 11 12 70AS 1
P1 L 9 11 70PD 2
P1 R 0 3 50PD 3
P1 R 7 10 50PD 3
P1 R 3 5 60PD 2
P1 R 6 7 60PD 1
P1 R 10 12 70PD 2
P1 S 5 6 60PD 1
P2 L 0 3 50PD 3
P2 L 7 9 50PD 2
P2 L 3 5 60PD 2
P2 L 6 7 60PD 1
P2 L 11 12 70AS 1
P2 L 9 11 70PD 2
P2 R 0 3 50PD 3
P2 R 7 10 50PD 3
P2 R 3 5 60PD 2
P2 R 6 7 60PD 1
P2 R 10 12 70PD 2
P2 S 5 6 60PD 1
run; Can anyone help with an efficient and effective solution? Thanks for your time!
... View more