BookmarkSubscribeRSS Feed
Sunnykapoor
Calcite | Level 5

Hi,

 

I am working on data which have several values and I have to compare them. I am trying to use Lag function but unable to get through. Below are the context of data:

 

MNTNEIIND
2385524000624811
3169694000624800
6317884000624810
6607985009872210
6600075009872200
6613595009872211

 

TN is the unique value and scenarios are:

1) we have to select TN with EII and ND as 1 and 1 only if TN with same number is followed by either EII and ND as 0 and 0 or EII and ND as 1 and 0.

2) No Duplicate entries for TN.

Please suggest

11 REPLIES 11
Astounding
PROC Star

The trick when using LAG is to make sure it executes on every observation.  For example, your DATA step might contain:

 

prior_ND = lag(ND);

prior_EII = lag(EII);

 

Then refer to prior_ND and prior_EII in your programming logic.

 

If that doesn't fix things for you, try posting the log from your DATA step so we can see what you are attempting and why it doesn't work.

 

You also might need to describe the problem further:

 

  • Does "follow" mean on the very next observation, or on any subsequent observation?
  • Could the observation with ND=1 and EII=1 be the second (or later) observation for a TN?
  • How many observations might there be (maximum) for a TN?
Sunnykapoor
Calcite | Level 5

I have sorted the data for better understanding.

  • Does "follow" mean on the very next observation, or on any subsequent observation?                      subsequent
  • Could the observation with ND=1 and EII=1 be the second (or later) observation for a TN?               Any place for observation
  • How many observations might there be (maximum) for a TN?                                                              2 (ND/EII=0/0 or ND/EII=1/0)

I am new in to SAS so it would be really helpful if you would provide the full data step.

 

Thanks you for prompt response.

Astounding
PROC Star

You say there is a maximum of 2 observations per TN in your reply, but the sample data you posted contains 3 observations for a TN. At a minimum, you will need to post the output that you expect to see, based on the original data that you post.

 

If you really don't know enough SAS to begin writing this program, will the answer really be useful to you?  Even if it works, how can you tell if it would work for a different set of data?

 

If the ND=1 and EII=1 observation is the second one for that TN, should it be compared to all other observations for that TN (including the first one)?

Sunnykapoor
Calcite | Level 5

My apologies as MN is the field with unique value, it should be compared with rows having same MN. In data set it can be 2 or 3. I am in learning phase, so trying to grasp how different type of commands work in SAS.

Allaluiah
Quartz | Level 8

"TN is the unique value"-- Are you sure? Your example has multiple TN transactions with TN having duplicate keys.  And what is your expected output. Either your writing isn't clear or my comprehension is not.

 

 


@Sunnykapoor wrote:

Hi,

 

I am working on data which have several values and I have to compare them. I am trying to use Lag function but unable to get through. Below are the context of data:

 

MN TN EII ND
238552 40006248 1 1
316969 40006248 0 0
631788 40006248 1 0
660798 50098722 1 0
660007 50098722 0 0
661359 50098722 1 1

 

TN is the unique value and scenarios are:

1) we have to select TN with EII and ND as 1 and 1 only if TN with same number is followed by either EII and ND as 0 and 0 or EII and ND as 1 and 0.

2) No Duplicate entries for TN.

Please suggest


 

Sunnykapoor
Calcite | Level 5

My apologies MN is the unique value. expected outcome for example is :

MNTNEIIND
2385524000624811
novinosrin
Tourmaline | Level 20

Assuming I understand your requirement:

 


data have;
input MN $	TN$ 	EII	ND ;
cards;
238552	40006248	1	1
316969	40006248	0	0
631788	40006248	1	0
660798	50098722	1	0
660007	50098722	0	0
661359	50098722	1	1
;

data want;
array t(999) _temporary_; 
call missing(of t(*));
do _n=1 by 1 until(last.tn);
set have;
by tn ;
_k=sum(EII,nd);
if _k=2 then _n1=_n;
else if _k=0 and _n-1=_n1 then t(_n)=_n1;
end;
do _n_=1 by 1 until(last.tn);
set have;
by tn;
if _n_ in t then output;
end;
drop _:;
run;


novinosrin
Tourmaline | Level 20

A small correction to the previous

Instead of else if _k=0 and _n-1=_n1 then t(_n)=_n1;

changed to

else if _k in (0,1) and _n-1=_n1 then t(_n)=_n1;

 

 

 

data have;
input MN $	TN$ 	EII	ND ;
cards;
238552	40006248	1	1
316969	40006248	0	0
631788	40006248	1	0
660798	50098722	1	0
660007	50098722	0	0
661359	50098722	1	1
;

data want;
array t(999) _temporary_; 
call missing(of t(*));
do _n=1 by 1 until(last.tn);
set have;
by tn ;
_k=sum(EII,nd);
if _k=2 then _n1=_n;
else if _k in (0,1) and _n-1=_n1 then t(_n)=_n1;
end;
do _n_=1 by 1 until(last.tn);
set have;
by tn;
if _n_ in t then output;
end;
drop _:;
run;

 

Sunnykapoor
Calcite | Level 5

It is showing no result.

novinosrin
Tourmaline | Level 20

Here is my test:

 

LOG:

127 ;
128
129
130 data want;
131 array t(999) _temporary_;
132 call missing(of t(*));
133 do _n=1 by 1 until(last.tn);
134 set have;
135 by tn ;
136 _k=sum(EII,nd);
137 if _k=2 then _n1=_n;
138 else if _k in (0,1) and _n-1=_n1 then t(_n)=_n1;
139 end;
140 do _n_=1 by 1 until(last.tn);
141 set have;
142 by tn;
143 if _n_ in t then output;
144 end;
145 drop _:;
146 run;

NOTE: There were 6 observations read from the data set WORK.HAVE.
NOTE: There were 6 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 1 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.01 seconds

 

PRINT RESULTS:

 

SAS Output

The SAS System

MN TN EII ND
238552 40006248 1 1
Astounding
PROC Star

Having the expected outcome is a big help.  Here's what I recommend:

 

proc sort data=have;

by TN;

run;

 

data want;

do until (last.tn);

   set have;

   by tn;

   if EII=0 and ND=0 then both_zero='Y';

   else if EII=1 and ND=0 then eii_only = 'Y';

end;

do until (last.tn);

   set have;

   by tn;

   if EII = 1 and ND=1 and (eii_only='Y' or both_zero='Y') then output;

end;

drop both_zero eii_only;

run;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 11 replies
  • 1906 views
  • 0 likes
  • 4 in conversation