I have a time dataset. I have 4 columns in this dataset that identify the zone a machine occupies as it moves from one place to another.
I want to create a field that increases by 1 whenever the data being collected changes from one state to another -- essentially I want to assign leg numbers, with a new leg beginning anytime it changes from one zone to another.
I thought I could do this using the LAG function with some if/then statements, but I'm apparently doing something wrong.
Now I wonder if the RETAIN might be what I need to use.
Any help would be appreciated.
I may not understand what you are trying to do but, using a slight extension of Mike's test data, does the following do what you want?:
data x;
input leg_type @@;
datalines;
66 66 66 77 77 77 77 66 66 888 888 77 77 77 9999 9999
;
run;
data want;
set x;
by leg_type notsorted;
leg + first.leg_type;
run;
hi ... so is this supposed to start at 1, then 2, etc.
data x;
input zone @@;
datalines;
66 66 66 77 77 77 77 888 888 9999 9999
;
run;
data x;
set x;
by zone;
leg + first.zone
run;
zone leg
66 1
66 1
66 1
77 2
77 2
77 2
77 2
888 3
888 3
9999 4
9999 4
That almost works except sometimes the machine re-enters a zone, but on a new leg.
And yes, the start leg is leg 1.
the practice dataset looks like this
UTC_Seconds | Landing | HL1 | Field | HL2 | Speed | Dummy | |
10001 | 1 | 0 | 0 | 0 | 1 | 10000001 | |
10002 | 1 | 0 | 0 | 0 | 1 | 10000002 | |
10003 | 1 | 0 | 0 | 0 | 1 | 10000003 | |
10004 | 1 | 0 | 0 | 0 | 1 | 10000004 | |
10005 | 1 | 0 | 0 | 0 | 1 | 10000005 | |
10006 | 1 | 0 | 0 | 0 | 1 | 10000006 | |
10007 | 1 | 0 | 0 | 0 | 1 | 10000007 | |
10008 | 1 | 0 | 0 | 0 | 2 | 10000008 | |
10009 | 1 | 0 | 0 | 0 | 2 | 10000009 |
But in part of the code I do assign a single variable to describe location
data a;
set timestd.practice;
run;
/*segmentation and control points*/
data b;
set a;
if landing = 1 then Leg_Type = 'LANDING';
if HL1 = 1 then Leg_Type = 'HL1';
if Field = 1 then Leg_Type = 'FIELD';
if HL2 = 1 then Leg_Type = 'HL2';
if HL2 + Field + HL2 + Landing = 0 then Leg_Type = 'TRANS';
run;
data c;
set b;
/* these statements want me to sort the data by leg_type */
by leg_type;
leg + first.legtype;
proc print;
run;
I'm still tinkering with it, but It appears that if I use the line
if Leg_Type = Lag1(Leg_Type) then CP=0; else CP=1;
I can create a dataset that looks the same as if I had used a first. variable in the datastep processing... based on this website's information,
http://www.pauldickman.com/teaching/sas/set_by.php
but I don't have the groups.
How can I reverse this to create the groups?
You don't have to reverse it so much as change what you want to do when the condition is met. I think you are asking how one does something like:
if Leg_Type ne Lag1(Leg_Type) then group+1;
That's what I thought would work. But that yields the same result as
if Leg_Type = Lag1(Leg_Type) then CP=0; else CP=1;
Which is still useful to me because it identifies my control points (the first point of every leg).
However, with
if Leg_Type ne Lag1(Leg_Type) then leg+1;
I get 0's except when I change type, I get a single 1, and then it goes back to zeros.
I may not understand what you are trying to do but, using a slight extension of Mike's test data, does the following do what you want?:
data x;
input leg_type @@;
datalines;
66 66 66 77 77 77 77 66 66 888 888 77 77 77 9999 9999
;
run;
data want;
set x;
by leg_type notsorted;
leg + first.leg_type;
run;
Thank you both!!!
FYI
I just learned statements can not be used (reliably) in conditional statements.
Always a good lesson to learn but, in the code you had posted, you didn't have any conditional use of a lag statement. If you did in the actual code you ran then, yes, you could easily run into trouble.
However, its not because the statement can't be used reliably but, more so, that they will be doing things other than what you had intended. But quite reliably!
I kind-of disagree.
--------------------------------
if AAA ne Lag1(AAA) then BBB=BBB+1; else BBB=BBB;
--------------------------------
LagAAA = Lag1(AAA);
if AAA ne LagAAA then BBB=BBB+1; else BBB=BBB;
Logically, these should be the same. But based on this experience, they do not yield the same result in certain conditional statements.
But thank you for solving my problem. And now I know how to do it with the lag statement too.
Would you post some data and code that shows a situation where you get different results using the two methods?
The following example appears to produce the exact same result:
data test;
input aaa bbb;
cards;
1 1
1 2
2 3
2 4
1 5
3 6
3 7
3 8
4 9
;
data test1;
set test;
if AAA ne Lag1(AAA) then BBB=BBB+1;
else BBB=BBB;
run;
data test2;
set test;
LagAAA = Lag1(AAA);
if AAA ne LagAAA then BBB=BBB+1;
else BBB=BBB;
run;
Art,
Notice the words that iiibbb used "result in certain conditional statements."
I suspect you are not being told the whole story. Most of us know what happens when LAG is executed conditionally.
DN, That is why I posted an earlier response to iiibbb, namely: Always a good lesson to learn but, in the code you had posted, you didn't have any conditional use of a lag statement. If you did in the actual code you ran then, yes, you could easily run into trouble.
Yes, I agree that using lag conditionally requires that one knows how the cue works, but I'm afraid that iiibbb will throw the baby out with the bath water for not understanding what is meant by conditional use.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.