Calcite | Level 5

## Lag function

Hi,

I am working on data which have several values and I have to compare them. I am trying to use Lag function but unable to get through. Below are the context of data:

 MN TN EII ND 238552 40006248 1 1 316969 40006248 0 0 631788 40006248 1 0 660798 50098722 1 0 660007 50098722 0 0 661359 50098722 1 1

TN is the unique value and scenarios are:

1) we have to select TN with EII and ND as 1 and 1 only if TN with same number is followed by either EII and ND as 0 and 0 or EII and ND as 1 and 0.

2) No Duplicate entries for TN.

11 REPLIES 11
PROC Star

## Re: Lag function

The trick when using LAG is to make sure it executes on every observation.  For example, your DATA step might contain:

prior_ND = lag(ND);

prior_EII = lag(EII);

Then refer to prior_ND and prior_EII in your programming logic.

If that doesn't fix things for you, try posting the log from your DATA step so we can see what you are attempting and why it doesn't work.

You also might need to describe the problem further:

• Does "follow" mean on the very next observation, or on any subsequent observation?
• Could the observation with ND=1 and EII=1 be the second (or later) observation for a TN?
• How many observations might there be (maximum) for a TN?
Calcite | Level 5

## Re: Lag function

I have sorted the data for better understanding.

• Does "follow" mean on the very next observation, or on any subsequent observation?                      subsequent
• Could the observation with ND=1 and EII=1 be the second (or later) observation for a TN?               Any place for observation
• How many observations might there be (maximum) for a TN?                                                              2 (ND/EII=0/0 or ND/EII=1/0)

I am new in to SAS so it would be really helpful if you would provide the full data step.

Thanks you for prompt response.

PROC Star

## Re: Lag function

You say there is a maximum of 2 observations per TN in your reply, but the sample data you posted contains 3 observations for a TN. At a minimum, you will need to post the output that you expect to see, based on the original data that you post.

If you really don't know enough SAS to begin writing this program, will the answer really be useful to you?  Even if it works, how can you tell if it would work for a different set of data?

If the ND=1 and EII=1 observation is the second one for that TN, should it be compared to all other observations for that TN (including the first one)?

Calcite | Level 5

## Re: Lag function

My apologies as MN is the field with unique value, it should be compared with rows having same MN. In data set it can be 2 or 3. I am in learning phase, so trying to grasp how different type of commands work in SAS.

Quartz | Level 8

## Re: Lag function

"TN is the unique value"-- Are you sure? Your example has multiple TN transactions with TN having duplicate keys.  And what is your expected output. Either your writing isn't clear or my comprehension is not.

@Sunnykapoor wrote:

Hi,

I am working on data which have several values and I have to compare them. I am trying to use Lag function but unable to get through. Below are the context of data:

 MN TN EII ND 238552 40006248 1 1 316969 40006248 0 0 631788 40006248 1 0 660798 50098722 1 0 660007 50098722 0 0 661359 50098722 1 1

TN is the unique value and scenarios are:

1) we have to select TN with EII and ND as 1 and 1 only if TN with same number is followed by either EII and ND as 0 and 0 or EII and ND as 1 and 0.

2) No Duplicate entries for TN.

Calcite | Level 5

## Re: Lag function

My apologies MN is the unique value. expected outcome for example is :

 MN TN EII ND 238552 40006248 1 1
Tourmaline | Level 20

## Re: Lag function

``````
data have;
input MN \$	TN\$ 	EII	ND ;
cards;
238552	40006248	1	1
316969	40006248	0	0
631788	40006248	1	0
660798	50098722	1	0
660007	50098722	0	0
661359	50098722	1	1
;

data want;
array t(999) _temporary_;
call missing(of t(*));
do _n=1 by 1 until(last.tn);
set have;
by tn ;
_k=sum(EII,nd);
if _k=2 then _n1=_n;
else if _k=0 and _n-1=_n1 then t(_n)=_n1;
end;
do _n_=1 by 1 until(last.tn);
set have;
by tn;
if _n_ in t then output;
end;
drop _:;
run;

``````
Tourmaline | Level 20

## Re: Lag function

A small correction to the previous

Instead of else if _k=0 and _n-1=_n1 then t(_n)=_n1;

changed to

else if _k in (0,1) and _n-1=_n1 then t(_n)=_n1;

``````data have;
input MN \$	TN\$ 	EII	ND ;
cards;
238552	40006248	1	1
316969	40006248	0	0
631788	40006248	1	0
660798	50098722	1	0
660007	50098722	0	0
661359	50098722	1	1
;

data want;
array t(999) _temporary_;
call missing(of t(*));
do _n=1 by 1 until(last.tn);
set have;
by tn ;
_k=sum(EII,nd);
if _k=2 then _n1=_n;
else if _k in (0,1) and _n-1=_n1 then t(_n)=_n1;
end;
do _n_=1 by 1 until(last.tn);
set have;
by tn;
if _n_ in t then output;
end;
drop _:;
run;``````

Calcite | Level 5

## Re: Lag function

It is showing no result.

Tourmaline | Level 20

## Re: Lag function

Here is my test:

LOG:

127 ;
128
129
130 data want;
131 array t(999) _temporary_;
132 call missing(of t(*));
133 do _n=1 by 1 until(last.tn);
134 set have;
135 by tn ;
136 _k=sum(EII,nd);
137 if _k=2 then _n1=_n;
138 else if _k in (0,1) and _n-1=_n1 then t(_n)=_n1;
139 end;
140 do _n_=1 by 1 until(last.tn);
141 set have;
142 by tn;
143 if _n_ in t then output;
144 end;
145 drop _:;
146 run;

NOTE: There were 6 observations read from the data set WORK.HAVE.
NOTE: There were 6 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 1 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.01 seconds

PRINT RESULTS:

SAS Output

 The SAS System

MN TN EII ND
238552 40006248 1 1
PROC Star

## Re: Lag function

Having the expected outcome is a big help.  Here's what I recommend:

proc sort data=have;

by TN;

run;

data want;

do until (last.tn);

set have;

by tn;

if EII=0 and ND=0 then both_zero='Y';

else if EII=1 and ND=0 then eii_only = 'Y';

end;

do until (last.tn);

set have;

by tn;

if EII = 1 and ND=1 and (eii_only='Y' or both_zero='Y') then output;

end;

drop both_zero eii_only;

run;

Discussion stats
• 11 replies
• 1156 views
• 0 likes
• 4 in conversation