Help using Base SAS procedures

An interesting problem...

Reply
Contributor
Posts: 41

An interesting problem...

I know that I've been asking a lot of questions.  I have been pushing my SAS envelope quite a bit lately.

I have an interesting dataset of GPS data tracking a machine as it travels on a field.  It moves from field to headrows on each end.  We have 10's of thousands of points taken from GPS and GIS.  I am basically trying to quantify the time spent on each leg (up a row, turn around in a headrow, down another row, turn around in a headrow, up another row), and in the future do some other calculations.

My current problem

Sometimes the machine doesn't cleanly re-enter the field and has to back out, reset, and restart.  My approach has been to assign leg numbers to these and when they cross the boundary into another zone.  However, I wind up with tiny leg segments that should be lumped in with the point where the machine first exited the field.  I anticipated this problem to a degree using a small artificial dataset, used proc means to find out these tiny legs, then reassign their leg using the lag statement.  Merged it back and recalculated the legs... worked great.

Real life is not as simple.  First off, I get multiple tiny leg segments (sometimes close to a dozen, sometimes just a couple). 

So-- is there a way anyone can think of that I might automate this.  I am not generally opposed to doing it by hand, but I have at least 10 of these datasets to go through, and if I can automate it I'd like to.

This table represents a subset of data that is generated by my proc means, minus the New_PLT column, which is the column I'd like to find a way to generate automatically.  You can see that the machine proceeds from a headrow "HR1" into the field, into
"HR2", reenters the field.  The attribute that is common is that when I want to lump them in with previous types, there are a series of "legs" (PLeg) with low _FREQ_, and it primarily there is a repeating pattern HR1 FIELD HR1 FIELD repeat... or HR2 FIELD HR2 FIELD repeat...

Any ideas?  I had some good ones, but I have a headache now.

ObsPLegPLT_TYPE__FREQ_speed
New_PLT
5151HR10442.69545
HR1
5252FLD02952.03322
FLD
5353HR201450.68828
HR2
5454FLD02162.54676
FLD
5555HR10433.08837
HR1
5656FLD03012.04751
FLD
5757HR201190.73277
HR2
5858FLD0210.0619
HR2
5959HR2030
HR2
6060FLD010.1
HR2
6161HR2020.15
HR2
6262FLD050.12
HR2
6363HR20430.1
HR2
6464FLD02112.61374
FLD
6565HR10353.22857
HR1
6666FLD03171.83533
FLD
6767HR203310.37583
HR2
6868FLD02432.27037
FLD
6969HR10862.48023
HR1
7070FLD0220.76364
FLD
7171HR10432.3814
HR1
7272FLD02552.07961
FLD
7373HR202320.80819
HR2
7474FLD02082.52837
FLD
7575HR1020.5
HR1
7676FLD0140.70714
HR1
7777HR10373.94865
HR1
7878FLD01712.84971
FLD
7979HR202990.64482
HR2
8080FLD03201.83125
FLD
8181HR10354.18286
HR1
8282FLD01932.51244
FLD
8383HR207780.31427
HR2
8484FLD03101.80097
FLD
119119HR10393.76154
HR1
120120FLD02132.11549
FLD
121121HR2023.45
HR2
122122FLD022.7
HR2
123123HR20573.38246
HR2
124124FLD05701.02649
FLD
125125HR104280.39556
HR1
Super User
Posts: 5,080

An interesting problem...

You're right.  It is an interesting problem.  And I understand the headache.

Here's an approach.  Since you're working with summary data sets, it looks like you can process them multiple times without having efficiency come into play.

For the summary data set, take a subset by removing the FLD records.  That lets you determine the starting and ending legs for each combination HDR segment: 

data subset;

   set original (keep=Pleg PLT) end=done;

   where PLT ne 'FLD';

   by PLT notsorted;

   if first.PLT then start=Pleg;

   retain start;

   if last.PLT;

   end = Pleg;

   fmtname = 'hdr';

   label = PLT;

   output;

   if done then do;

      hlo='O';

      label='FLD';

      output;

   end;

   keep start end fmtname label hlo;

run;

proc format cntlin=subset;

run;

In a way, this mimics how you would do it if you were looking at the data on paper.  Use the format to assign new_PLT:

data add_new_PLT;

   set original;

   new_PLT = put(Pleg, hdr.);

run;

  

The code is untested, so you may have to tweak it.  But the approach should work.

Good luck.

Contributor
Posts: 41

An interesting problem...

I will try that.

I have been mulling over another idea that I'm pretty sure will work, but it is far less elegant looking, more of a square peg, round hole, big robot hammer approach. 

I will try yours in the next day or two and let you know.

Respected Advisor
Posts: 3,887

An interesting problem...

That's a really neat solution! Like it a lot.

PROC Star
Posts: 7,360

An interesting problem...

I probably don't understand what you are trying to do, but wouldn't the following accomplish what you want?

data want (drop=holdSmiley Happy;

  set have;

  retain hold_plt;

  if _n_ eq 1 or speed gt 0.15 then hold_PLT=PLT;

  new_plt=hold_plt;

run;

Super User
Posts: 5,080

An interesting problem...

Art,

It looks like you understand the problem, but I think your solution relies on the speed values too much.  For example, look at observations 75 through 77 where your logic would switch back and forth instead of grouping all three.  The same issue would exist for observations 121 through 123.

Just for the record, iiibbb, shouldn't observation 70 have new_PLT=HR1?

PROC Star
Posts: 7,360

An interesting problem...

122 was the only one I questioned but, yes, my code relied totally on speed.

Contributor
Posts: 41

Re: An interesting problem...

I figured out something that worked....

It is not Elegant.  Get rid of the drop statement toward the end and you can see the progression.

data a;

input Obs    PLeg    PLT $    _TYPE_    _FREQ_    speed    Desired_PLT $;

cards;

51    51    HR1    0    44    2.69545    HR1

52    52    FLD    0    295    2.03322        FLD

53    53    HR2    0    145    0.68828        HR2

54    54    FLD    0    216    2.54676        FLD

55    55    HR1    0    43    3.08837        HR1

56    56    FLD    0    301    2.04751        FLD

57    57    HR2    0    119    0.73277        HR2

58    58    FLD    0    21    0.0619        HR2

59    59    HR2    0    3    0        HR2

60    60    FLD    0    1    0.1        HR2

61    61    HR2    0    2    0.15    HR2

62    62    FLD    0    5    0.12    HR2

63    63    HR2    0    43    0.1        HR2

64    64    FLD    0    211    2.61374        FLD

65    65    HR1    0    35    3.22857        HR1

66    66    FLD    0    317    1.83533        FLD

67    67    HR2    0    331    0.37583        HR2

68    68    FLD    0    243    2.27037        FLD

69    69    HR1    0    86    2.48023        HR1

70    70    FLD    0    22    0.76364        HR1

71    71    HR1    0    43    2.3814        HR1

72    72    FLD    0    255    2.07961        FLD

73    73    HR2    0    232    0.80819        HR2

74    74    FLD    0    208    2.52837    FLD

75    75    HR1    0    2    0.5    HR1

76    76    FLD    0    14    0.70714    HR1

77    77    HR1    0    37    3.94865    HR1

78    78    FLD    0    171    2.84971    FLD

79    79    HR2    0    299    0.64482    HR2

80    80    FLD    0    320    1.83125    FLD

81    81    HR1    0    35    4.18286    HR1

82    82    FLD    0    193    2.51244    FLD

83    83    HR2    0    778    0.31427    HR2

84    84    FLD    0    310    1.80097    FLD

119    119    HR1    0    39    3.76154    HR1

120    120    FLD    0    213    2.11549    FLD

121    121    HR2    0    2    3.45    HR2

122    122    FLD    0    2    2.7    HR2

123    123    HR2    0    57    3.38246    HR2

124    124    FLD    0    570    1.02649    FLD

125    125    HR1    0    428    0.39556    HR1

;;

run;

data b;

    set a;

    if PLT="FLD" then a=0;

    if PLT="HR1" then a=-1;

    if PLT="HR2" then a=1;

    b = lag2(a);

    sumab= abs(a+b);

    c = lag1(sumab);

    if sumab = 0 then d = c; else d = sumab;

run;

proc sort;

    by descending PLeg;

run;

data c;

    drop sumab a b c d;

    set b;

    e = lag1(d);

    LagPLT=lag1(PLT);

    if e=2 and PLT="FLD" then NewPLT=LagPLT; else NewPLT=PLT;

run;

proc sort;

    by PLeg;

proc print;

run;

Contributor
Posts: 41

Re: An interesting problem...


ObsPlegPLT_TYPE__FREQ_SpeedDesired_PLTelagPLTNewPLT
15151HR10442.69545HR1.FLDHR1
25252FLD02952.03322FLD.HR2FLD
35353HR201450.68828HR20FLDHR2
45454FLD02162.54676FLD0HR1FLD
55555HR10433.08837HR10FLDHR1
65656FLD03012.04751FLD0HR2FLD
75757HR201190.73277HR20FLDHR2
85858FLD0210.0619HR22HR2HR2
95959HR2030HR22FLDHR2
106060FLD010.1HR22HR2HR2
116161HR2020.15HR22FLDHR2
126262FLD050.12HR22HR2HR2
136363HR20430.1HR22FLDHR2
146464FLD02112.61374FLD0HR1FLD
156565HR10353.22857HR10FLDHR1
166666FLD03171.83533FLD0HR2FLD
176767HR203310.37583HR20FLDHR2
186868FLD02432.27037FLD0HR1FLD
196969HR10862.48023HR10FLDHR1
207070FLD0220.76364HR12HR1HR1
217171HR10432.3814HR12FLDHR1
227272FLD02552.07961FLD0HR2FLD
237373HR202320.80819HR20FLDHR2
247474FLD02082.52837FLD0HR1FLD
257575HR1020.5HR10FLDHR1
267676FLD0140.70714HR12HR1HR1
277777HR10373.94865HR12FLDHR1
287878FLD01712.84971FLD0HR2FLD
297979HR202990.64482HR20FLDHR2
308080FLD03201.83125FLD0HR1FLD
318181HR10354.18286HR10FLDHR1
328282FLD01932.51244FLD0HR2FLD
338383HR207780.31427HR20FLDHR2
348484FLD03101.80097FLD0HR1FLD
35119119HR10393.76154HR10FLDHR1
36120120FLD02132.11549FLD0HR2FLD
37121121HR2023.45HR20FLDHR2
38122122FLD022.7HR22HR2HR2
39123123HR20573.38246HR22FLDHR2
40124124FLD05701.02649FLD0HR1FLD
41125125HR104280.39556HR1.
HR1
Contributor
Posts: 41

Re: An interesting problem...

Art I think I owe you a beer by now.

Thanks everyone.

Super User
Posts: 5,080

Re: An interesting problem...

iiibbb,

It looks like you're comfortable with a DATA step approach.  Here's an intuitive one (or should I say, intuitive to me):

data boundaries;

   set original  (keep=Pleg PLT);

   by PLT notsorted;

   where PLT ne 'FLD';

   first_PLT = first.plt;

   last_PLT = last.plt;

   keep Pleg first_PLT last_PLT;

run;

data final;

   merge original boundaries;

   by Pleg;

   if first_PLT then status = 'Header Group Begins';

   if last_PLT then status = 'Header Group Ends';

   retain status new_PLT;

   if PLT =: 'HR' then new_PLT = PLT;

   else if status ne 'Header Group Begins' then new_PLT='FLD';

run;

Good luck.

Contributor
Posts: 41

Re: An interesting problem...

I wish it were intutive, but it's not. 

I'm willing to learn though.

Super User
Posts: 5,080

Re: An interesting problem...

Try adding some proc prints.  Examine each of the data sets, especially the new variables first_PLT, last_PLT, and status.

Good luck.

Ask a Question
Discussion stats
  • 12 replies
  • 217 views
  • 4 likes
  • 4 in conversation