Hi,
I am working on data which have several values and I have to compare them. I am trying to use Lag function but unable to get through. Below are the context of data:
MN | TN | EII | ND |
238552 | 40006248 | 1 | 1 |
316969 | 40006248 | 0 | 0 |
631788 | 40006248 | 1 | 0 |
660798 | 50098722 | 1 | 0 |
660007 | 50098722 | 0 | 0 |
661359 | 50098722 | 1 | 1 |
TN is the unique value and scenarios are:
1) we have to select TN with EII and ND as 1 and 1 only if TN with same number is followed by either EII and ND as 0 and 0 or EII and ND as 1 and 0.
2) No Duplicate entries for TN.
Please suggest
The trick when using LAG is to make sure it executes on every observation. For example, your DATA step might contain:
prior_ND = lag(ND);
prior_EII = lag(EII);
Then refer to prior_ND and prior_EII in your programming logic.
If that doesn't fix things for you, try posting the log from your DATA step so we can see what you are attempting and why it doesn't work.
You also might need to describe the problem further:
I have sorted the data for better understanding.
I am new in to SAS so it would be really helpful if you would provide the full data step.
Thanks you for prompt response.
You say there is a maximum of 2 observations per TN in your reply, but the sample data you posted contains 3 observations for a TN. At a minimum, you will need to post the output that you expect to see, based on the original data that you post.
If you really don't know enough SAS to begin writing this program, will the answer really be useful to you? Even if it works, how can you tell if it would work for a different set of data?
If the ND=1 and EII=1 observation is the second one for that TN, should it be compared to all other observations for that TN (including the first one)?
My apologies as MN is the field with unique value, it should be compared with rows having same MN. In data set it can be 2 or 3. I am in learning phase, so trying to grasp how different type of commands work in SAS.
"TN is the unique value"-- Are you sure? Your example has multiple TN transactions with TN having duplicate keys. And what is your expected output. Either your writing isn't clear or my comprehension is not.
@Sunnykapoor wrote:
Hi,
I am working on data which have several values and I have to compare them. I am trying to use Lag function but unable to get through. Below are the context of data:
MN TN EII ND 238552 40006248 1 1 316969 40006248 0 0 631788 40006248 1 0 660798 50098722 1 0 660007 50098722 0 0 661359 50098722 1 1
TN is the unique value and scenarios are:
1) we have to select TN with EII and ND as 1 and 1 only if TN with same number is followed by either EII and ND as 0 and 0 or EII and ND as 1 and 0.
2) No Duplicate entries for TN.
Please suggest
My apologies MN is the unique value. expected outcome for example is :
MN | TN | EII | ND |
238552 | 40006248 | 1 | 1 |
Assuming I understand your requirement:
data have;
input MN $ TN$ EII ND ;
cards;
238552 40006248 1 1
316969 40006248 0 0
631788 40006248 1 0
660798 50098722 1 0
660007 50098722 0 0
661359 50098722 1 1
;
data want;
array t(999) _temporary_;
call missing(of t(*));
do _n=1 by 1 until(last.tn);
set have;
by tn ;
_k=sum(EII,nd);
if _k=2 then _n1=_n;
else if _k=0 and _n-1=_n1 then t(_n)=_n1;
end;
do _n_=1 by 1 until(last.tn);
set have;
by tn;
if _n_ in t then output;
end;
drop _:;
run;
A small correction to the previous
Instead of else if _k=0 and _n-1=_n1 then t(_n)=_n1;
changed to
else if _k in (0,1) and _n-1=_n1 then t(_n)=_n1;
data have;
input MN $ TN$ EII ND ;
cards;
238552 40006248 1 1
316969 40006248 0 0
631788 40006248 1 0
660798 50098722 1 0
660007 50098722 0 0
661359 50098722 1 1
;
data want;
array t(999) _temporary_;
call missing(of t(*));
do _n=1 by 1 until(last.tn);
set have;
by tn ;
_k=sum(EII,nd);
if _k=2 then _n1=_n;
else if _k in (0,1) and _n-1=_n1 then t(_n)=_n1;
end;
do _n_=1 by 1 until(last.tn);
set have;
by tn;
if _n_ in t then output;
end;
drop _:;
run;
It is showing no result.
Here is my test:
LOG:
127 ;
128
129
130 data want;
131 array t(999) _temporary_;
132 call missing(of t(*));
133 do _n=1 by 1 until(last.tn);
134 set have;
135 by tn ;
136 _k=sum(EII,nd);
137 if _k=2 then _n1=_n;
138 else if _k in (0,1) and _n-1=_n1 then t(_n)=_n1;
139 end;
140 do _n_=1 by 1 until(last.tn);
141 set have;
142 by tn;
143 if _n_ in t then output;
144 end;
145 drop _:;
146 run;
NOTE: There were 6 observations read from the data set WORK.HAVE.
NOTE: There were 6 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 1 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.01 seconds
PRINT RESULTS:
SAS Output
The SAS System |
MN | TN | EII | ND |
---|---|---|---|
238552 | 40006248 | 1 | 1 |
Having the expected outcome is a big help. Here's what I recommend:
proc sort data=have;
by TN;
run;
data want;
do until (last.tn);
set have;
by tn;
if EII=0 and ND=0 then both_zero='Y';
else if EII=1 and ND=0 then eii_only = 'Y';
end;
do until (last.tn);
set have;
by tn;
if EII = 1 and ND=1 and (eii_only='Y' or both_zero='Y') then output;
end;
drop both_zero eii_only;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.