I know that I've been asking a lot of questions. I have been pushing my SAS envelope quite a bit lately.
I have an interesting dataset of GPS data tracking a machine as it travels on a field. It moves from the field to headrows at each end. We have tens of thousands of points taken from GPS and GIS. I am basically trying to quantify the time spent on each leg (up a row, turn around in a headrow, down another row, turn around in a headrow, up another row), and in the future do some other calculations.
My current problem
Sometimes the machine doesn't cleanly re-enter the field and has to back out, reset, and restart. My approach has been to assign leg numbers, incrementing whenever the machine crosses the boundary into another zone. However, I wind up with tiny leg segments that should be lumped in with the leg where the machine first exited the field. I anticipated this problem to a degree: using a small artificial dataset, I used PROC MEANS to find these tiny legs, reassigned their leg numbers using the LAG function, merged the result back, and recalculated the legs. It worked great.
Real life is not as simple. First off, I get multiple tiny leg segments (sometimes close to a dozen, sometimes just a couple).
So: can anyone think of a way I might automate this? I am not generally opposed to doing it by hand, but I have at least 10 of these datasets to go through, and I'd like to automate it if I can.
This table is a subset of the output generated by my PROC MEANS, minus the New_PLT column, which is the column I'd like to find a way to generate automatically. You can see that the machine proceeds from headrow "HR1" into the field, into "HR2", and re-enters the field. The common attribute of the legs I want to lump in with the previous type is a series of legs (PLeg) with low _FREQ_, usually in a repeating pattern: HR1 FLD HR1 FLD ... or HR2 FLD HR2 FLD ...
Any ideas? I had some good ones, but I have a headache now.
Obs | PLeg | PLT | _TYPE_ | _FREQ_ | speed | New_PLT
51 | 51 | HR1 | 0 | 44 | 2.69545 | HR1
52 | 52 | FLD | 0 | 295 | 2.03322 | FLD
53 | 53 | HR2 | 0 | 145 | 0.68828 | HR2
54 | 54 | FLD | 0 | 216 | 2.54676 | FLD
55 | 55 | HR1 | 0 | 43 | 3.08837 | HR1
56 | 56 | FLD | 0 | 301 | 2.04751 | FLD
57 | 57 | HR2 | 0 | 119 | 0.73277 | HR2
58 | 58 | FLD | 0 | 21 | 0.0619 | HR2
59 | 59 | HR2 | 0 | 3 | 0 | HR2
60 | 60 | FLD | 0 | 1 | 0.1 | HR2
61 | 61 | HR2 | 0 | 2 | 0.15 | HR2
62 | 62 | FLD | 0 | 5 | 0.12 | HR2
63 | 63 | HR2 | 0 | 43 | 0.1 | HR2
64 | 64 | FLD | 0 | 211 | 2.61374 | FLD
65 | 65 | HR1 | 0 | 35 | 3.22857 | HR1
66 | 66 | FLD | 0 | 317 | 1.83533 | FLD
67 | 67 | HR2 | 0 | 331 | 0.37583 | HR2
68 | 68 | FLD | 0 | 243 | 2.27037 | FLD
69 | 69 | HR1 | 0 | 86 | 2.48023 | HR1
70 | 70 | FLD | 0 | 22 | 0.76364 | FLD
71 | 71 | HR1 | 0 | 43 | 2.3814 | HR1
72 | 72 | FLD | 0 | 255 | 2.07961 | FLD
73 | 73 | HR2 | 0 | 232 | 0.80819 | HR2
74 | 74 | FLD | 0 | 208 | 2.52837 | FLD
75 | 75 | HR1 | 0 | 2 | 0.5 | HR1
76 | 76 | FLD | 0 | 14 | 0.70714 | HR1
77 | 77 | HR1 | 0 | 37 | 3.94865 | HR1
78 | 78 | FLD | 0 | 171 | 2.84971 | FLD
79 | 79 | HR2 | 0 | 299 | 0.64482 | HR2
80 | 80 | FLD | 0 | 320 | 1.83125 | FLD
81 | 81 | HR1 | 0 | 35 | 4.18286 | HR1
82 | 82 | FLD | 0 | 193 | 2.51244 | FLD
83 | 83 | HR2 | 0 | 778 | 0.31427 | HR2
84 | 84 | FLD | 0 | 310 | 1.80097 | FLD
119 | 119 | HR1 | 0 | 39 | 3.76154 | HR1
120 | 120 | FLD | 0 | 213 | 2.11549 | FLD
121 | 121 | HR2 | 0 | 2 | 3.45 | HR2
122 | 122 | FLD | 0 | 2 | 2.7 | HR2
123 | 123 | HR2 | 0 | 57 | 3.38246 | HR2
124 | 124 | FLD | 0 | 570 | 1.02649 | FLD
125 | 125 | HR1 | 0 | 428 | 0.39556 | HR1
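To state the rule I'm after more precisely: a FLD leg with a small _FREQ_ that is sandwiched between two legs of the same headrow should be relabeled to that headrow. A rough sketch of that rule (in Python rather than SAS, purely for illustration; the _FREQ_ cutoff of 25 is a made-up number that happens to fit this subset):

```python
# Hypothetical sketch of the relabeling rule (Python, not SAS, for
# illustration only). legs is a list of (PLT, _FREQ_) pairs in PLeg
# order; freq_cutoff=25 is an invented threshold for this subset.
def relabel(legs, freq_cutoff=25):
    new_plt = [plt for plt, _ in legs]
    for i in range(1, len(legs) - 1):
        plt, freq = legs[i]
        prev_plt = new_plt[i - 1]
        next_plt = legs[i + 1][0]
        # a short FLD leg sandwiched between the same headrow on both sides
        if plt == "FLD" and freq <= freq_cutoff and prev_plt == next_plt != "FLD":
            new_plt[i] = prev_plt
    return new_plt
```

Run over legs 57-64 above, this relabels the short FLD legs (58, 60, 62) to HR2 and leaves the big FLD legs (like 64, with _FREQ_ 211) alone.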
You're right. It is an interesting problem. And I understand the headache.
Here's an approach. Since you're working with summary data sets, it looks like you can process them multiple times without having efficiency come into play.
For the summary data set, take a subset by removing the FLD records. That lets you determine the starting and ending legs for each consecutive headrow segment:
data subset;
   set original (keep=Pleg PLT) end=done;
   where PLT ne 'FLD';
   by PLT notsorted;
   /* with FLD removed, churned headrow legs collapse into one BY group */
   if first.PLT then start = Pleg;
   retain start;
   if last.PLT;
   end = Pleg;
   fmtname = 'hdr';
   label = PLT;
   output;
   /* catch-all record: any leg outside a headrow range maps to FLD */
   if done then do;
      hlo = 'O';
      label = 'FLD';
      output;
   end;
   keep start end fmtname label hlo;
run;

proc format cntlin=subset;
run;
In a way, this mimics how you would do it if you were looking at the data on paper. Use the format to assign new_PLT:
data add_new_PLT;
   set original;
   new_PLT = put(Pleg, hdr.);
run;
The code is untested, so you may have to tweak it. But the approach should work.
Good luck.
I will try that.
I have been mulling over another idea that I'm pretty sure will work, but it is far less elegant looking, more of a square peg, round hole, big robot hammer approach.
I will try yours in the next day or two and let you know.
That's a really neat solution! Like it a lot.
I probably don't understand what you are trying to do, but wouldn't the following accomplish what you want?
data want (drop=hold:);
   set have;
   retain hold_plt;
   if _n_ eq 1 or speed gt 0.15 then hold_PLT = PLT;
   new_plt = hold_plt;
run;
Art,
It looks like you understand the problem, but I think your solution relies on the speed values too much. For example, look at observations 75 through 77 where your logic would switch back and forth instead of grouping all three. The same issue would exist for observations 121 through 123.
Just for the record, iiibbb, shouldn't observation 70 have new_PLT=HR1?
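To make the switching concrete, here is that speed rule rendered in Python (an illustration only, not the SAS itself): for obs 75-77 every speed exceeds 0.15, so hold_PLT updates on every row and the FLD leg never joins the HR1 group.

```python
# Python rendering (for illustration) of the speed-threshold rule:
# hold_PLT updates whenever speed > 0.15, and new_PLT copies it.
def speed_rule(rows):
    """rows: list of (PLT, speed) pairs in PLeg order; returns new_PLT per row."""
    new_plt = []
    hold = None
    for i, (plt, speed) in enumerate(rows):
        if i == 0 or speed > 0.15:
            hold = plt
        new_plt.append(hold)
    return new_plt

# Obs 75-77: all speeds exceed 0.15, so the FLD leg is NOT regrouped.
print(speed_rule([("HR1", 0.5), ("FLD", 0.70714), ("HR1", 3.94865)]))
# -> ['HR1', 'FLD', 'HR1']  (desired: all 'HR1')
```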
122 was the only one I questioned but, yes, my code relied entirely on speed.
I figured out something that worked....
It is not elegant. Get rid of the DROP statement toward the end and you can see the progression.
data a;
   input Obs PLeg PLT $ _TYPE_ _FREQ_ speed Desired_PLT $;
   cards;
51 51 HR1 0 44 2.69545 HR1
52 52 FLD 0 295 2.03322 FLD
53 53 HR2 0 145 0.68828 HR2
54 54 FLD 0 216 2.54676 FLD
55 55 HR1 0 43 3.08837 HR1
56 56 FLD 0 301 2.04751 FLD
57 57 HR2 0 119 0.73277 HR2
58 58 FLD 0 21 0.0619 HR2
59 59 HR2 0 3 0 HR2
60 60 FLD 0 1 0.1 HR2
61 61 HR2 0 2 0.15 HR2
62 62 FLD 0 5 0.12 HR2
63 63 HR2 0 43 0.1 HR2
64 64 FLD 0 211 2.61374 FLD
65 65 HR1 0 35 3.22857 HR1
66 66 FLD 0 317 1.83533 FLD
67 67 HR2 0 331 0.37583 HR2
68 68 FLD 0 243 2.27037 FLD
69 69 HR1 0 86 2.48023 HR1
70 70 FLD 0 22 0.76364 HR1
71 71 HR1 0 43 2.3814 HR1
72 72 FLD 0 255 2.07961 FLD
73 73 HR2 0 232 0.80819 HR2
74 74 FLD 0 208 2.52837 FLD
75 75 HR1 0 2 0.5 HR1
76 76 FLD 0 14 0.70714 HR1
77 77 HR1 0 37 3.94865 HR1
78 78 FLD 0 171 2.84971 FLD
79 79 HR2 0 299 0.64482 HR2
80 80 FLD 0 320 1.83125 FLD
81 81 HR1 0 35 4.18286 HR1
82 82 FLD 0 193 2.51244 FLD
83 83 HR2 0 778 0.31427 HR2
84 84 FLD 0 310 1.80097 FLD
119 119 HR1 0 39 3.76154 HR1
120 120 FLD 0 213 2.11549 FLD
121 121 HR2 0 2 3.45 HR2
122 122 FLD 0 2 2.7 HR2
123 123 HR2 0 57 3.38246 HR2
124 124 FLD 0 570 1.02649 FLD
125 125 HR1 0 428 0.39556 HR1
;;
run;
data b;
   set a;
   /* encode the plot type: HR1=-1, FLD=0, HR2=+1 */
   if PLT="FLD" then a=0;
   else if PLT="HR1" then a=-1;
   else if PLT="HR2" then a=1;
   b = lag2(a);          /* code from two legs back */
   sumab = abs(a + b);   /* 2 when this leg and the one two back are the same headrow */
   c = lag1(sumab);
   /* carry the "same headrow on both sides" flag through a leg where sumab is 0 */
   if sumab = 0 then d = c;
   else d = sumab;
run;
proc sort;
   by descending PLeg;
run;

data c;
   drop sumab a b c d;
   set b;
   /* after the descending sort, lag1() looks one leg AHEAD in PLeg order */
   e = lag1(d);
   LagPLT = lag1(PLT);
   /* a FLD leg followed by a matching headrow pair takes that headrow's PLT */
   if e=2 and PLT="FLD" then NewPLT=LagPLT;
   else NewPLT=PLT;
run;

proc sort;
   by PLeg;
run;

proc print;
run;
Obs | Obs | PLeg | PLT | _TYPE_ | _FREQ_ | Speed | Desired_PLT | e | LagPLT | NewPLT
1 | 51 | 51 | HR1 | 0 | 44 | 2.69545 | HR1 | . | FLD | HR1
2 | 52 | 52 | FLD | 0 | 295 | 2.03322 | FLD | . | HR2 | FLD
3 | 53 | 53 | HR2 | 0 | 145 | 0.68828 | HR2 | 0 | FLD | HR2
4 | 54 | 54 | FLD | 0 | 216 | 2.54676 | FLD | 0 | HR1 | FLD
5 | 55 | 55 | HR1 | 0 | 43 | 3.08837 | HR1 | 0 | FLD | HR1
6 | 56 | 56 | FLD | 0 | 301 | 2.04751 | FLD | 0 | HR2 | FLD
7 | 57 | 57 | HR2 | 0 | 119 | 0.73277 | HR2 | 0 | FLD | HR2
8 | 58 | 58 | FLD | 0 | 21 | 0.0619 | HR2 | 2 | HR2 | HR2
9 | 59 | 59 | HR2 | 0 | 3 | 0 | HR2 | 2 | FLD | HR2
10 | 60 | 60 | FLD | 0 | 1 | 0.1 | HR2 | 2 | HR2 | HR2
11 | 61 | 61 | HR2 | 0 | 2 | 0.15 | HR2 | 2 | FLD | HR2
12 | 62 | 62 | FLD | 0 | 5 | 0.12 | HR2 | 2 | HR2 | HR2
13 | 63 | 63 | HR2 | 0 | 43 | 0.1 | HR2 | 2 | FLD | HR2
14 | 64 | 64 | FLD | 0 | 211 | 2.61374 | FLD | 0 | HR1 | FLD
15 | 65 | 65 | HR1 | 0 | 35 | 3.22857 | HR1 | 0 | FLD | HR1
16 | 66 | 66 | FLD | 0 | 317 | 1.83533 | FLD | 0 | HR2 | FLD
17 | 67 | 67 | HR2 | 0 | 331 | 0.37583 | HR2 | 0 | FLD | HR2
18 | 68 | 68 | FLD | 0 | 243 | 2.27037 | FLD | 0 | HR1 | FLD
19 | 69 | 69 | HR1 | 0 | 86 | 2.48023 | HR1 | 0 | FLD | HR1
20 | 70 | 70 | FLD | 0 | 22 | 0.76364 | HR1 | 2 | HR1 | HR1
21 | 71 | 71 | HR1 | 0 | 43 | 2.3814 | HR1 | 2 | FLD | HR1
22 | 72 | 72 | FLD | 0 | 255 | 2.07961 | FLD | 0 | HR2 | FLD
23 | 73 | 73 | HR2 | 0 | 232 | 0.80819 | HR2 | 0 | FLD | HR2
24 | 74 | 74 | FLD | 0 | 208 | 2.52837 | FLD | 0 | HR1 | FLD
25 | 75 | 75 | HR1 | 0 | 2 | 0.5 | HR1 | 0 | FLD | HR1
26 | 76 | 76 | FLD | 0 | 14 | 0.70714 | HR1 | 2 | HR1 | HR1
27 | 77 | 77 | HR1 | 0 | 37 | 3.94865 | HR1 | 2 | FLD | HR1
28 | 78 | 78 | FLD | 0 | 171 | 2.84971 | FLD | 0 | HR2 | FLD
29 | 79 | 79 | HR2 | 0 | 299 | 0.64482 | HR2 | 0 | FLD | HR2
30 | 80 | 80 | FLD | 0 | 320 | 1.83125 | FLD | 0 | HR1 | FLD
31 | 81 | 81 | HR1 | 0 | 35 | 4.18286 | HR1 | 0 | FLD | HR1
32 | 82 | 82 | FLD | 0 | 193 | 2.51244 | FLD | 0 | HR2 | FLD
33 | 83 | 83 | HR2 | 0 | 778 | 0.31427 | HR2 | 0 | FLD | HR2
34 | 84 | 84 | FLD | 0 | 310 | 1.80097 | FLD | 0 | HR1 | FLD
35 | 119 | 119 | HR1 | 0 | 39 | 3.76154 | HR1 | 0 | FLD | HR1
36 | 120 | 120 | FLD | 0 | 213 | 2.11549 | FLD | 0 | HR2 | FLD
37 | 121 | 121 | HR2 | 0 | 2 | 3.45 | HR2 | 0 | FLD | HR2
38 | 122 | 122 | FLD | 0 | 2 | 2.7 | HR2 | 2 | HR2 | HR2
39 | 123 | 123 | HR2 | 0 | 57 | 3.38246 | HR2 | 2 | FLD | HR2
40 | 124 | 124 | FLD | 0 | 570 | 1.02649 | FLD | 0 | HR1 | FLD
41 | 125 | 125 | HR1 | 0 | 428 | 0.39556 | HR1 | . |  | HR1
Art I think I owe you a beer by now.
Thanks everyone.
iiibbb,
It looks like you're comfortable with a DATA step approach. Here's an intuitive one (or should I say, intuitive to me):
data boundaries;
   set original (keep=Pleg PLT);
   by PLT notsorted;
   where PLT ne 'FLD';
   /* flag the first and last leg of each consecutive headrow group */
   first_PLT = first.PLT;
   last_PLT = last.PLT;
   keep Pleg first_PLT last_PLT;
run;

data final;
   merge original boundaries;
   by Pleg;
   if first_PLT then status = 'Header Group Begins';
   if last_PLT then status = 'Header Group Ends';
   retain status new_PLT;
   /* FLD legs inside an open headrow group keep the retained headrow value */
   if PLT =: 'HR' then new_PLT = PLT;
   else if status ne 'Header Group Begins' then new_PLT = 'FLD';
run;
Good luck.
I wish it were intuitive, but it's not.
I'm willing to learn though.
Try adding some proc prints. Examine each of the data sets, especially the new variables first_PLT, last_PLT, and status.
Good luck.