I have a dataset in the following format with millions of observation.
data test;
infile datalines;
input id_acc acc_no status info_date;
datalines;
111 9586 aaa 30jun2007
111 9586 aaa 31jul2007
111 9586 d1a 31aug2007
111 9999 lep 30sep2007
111 9586 lep 30sep2007
111 9999 lep 31Oct2007
111 9586 lep 31Oct2007
111 9999 lep 30Nov2007
111 9586 lep 30Nov2007
112 9587 d1a 30jun2007
112 9587 aaa 31jul2007
112 9587 d2a 31aug2007
112 9587 d1d 30sep2007
112 9991 lep 31Oct2007
112 9587 lep 31Oct2007
112 9991 lep 30Nov2007
112 9587 lep 30Nov2007
112 9991 lep 31Dec2007
112 9587 lep 31Dec2007
I want the output as mentioned below.
Only i want to keep the first two info_date when the card goes into lep and rest only one dates of lep.
output:
111 9586 aaa 30jun2007
111 9586 aaa 31jul2007
111 9586 d1a 31aug2007
111 9999 lep 30sep2007
111 9586 lep 30sep2007
111 9999 lep 31Oct2007
111 9999 lep 30Nov2007
112 9587 d1a 30jun2007
112 9587 aaa 31jul2007
112 9587 d2a 31aug2007
112 9587 d1d 30sep2007
112 9991 lep 31Oct2007
112 9587 lep 31Oct2007
112 9991 lep 30Nov2007
112 9991 lep 31Dec2007
Your output does not match your description. You keep much more dates than the two "lep" dates.
Hi It is the same output:
111 9586 aaa 30jun2007
111 9586 aaa 31jul2007
111 9586 d1a 31aug2007
111 9999 lep 30sep2007
111 9586 lep 30sep2007
111 9999 lep 31Oct2007
111 9586 lep 31Oct2007-delete this row
111 9999 lep 30Nov2007
111 9586 lep 30Nov2007-delete this row
112 9587 d1a 30jun2007
112 9587 aaa 31jul2007
112 9587 d2a 31aug2007
112 9587 d1d 30sep2007
112 9991 lep 31Oct2007
112 9587 lep 31Oct2007
112 9991 lep 30Nov2007
112 9587 lep 30Nov2007-delete this row
112 9991 lep 31Dec2007
112 9587 lep 31Dec2007-delete this row...
So i want to delete these particular row and keep first occurance of dates where it goes into lep two time and rest only one time.
output:
111 9586 aaa 30jun2007
111 9586 aaa 31jul2007
111 9586 d1a 31aug2007
111 9999 lep 30sep2007
111 9586 lep 30sep2007
111 9999 lep 31Oct2007
111 9999 lep 30Nov2007
112 9587 d1a 30jun2007
112 9587 aaa 31jul2007
112 9587 d2a 31aug2007
112 9587 d1d 30sep2007
112 9991 lep 31Oct2007
112 9587 lep 31Oct2007
112 9991 lep 30Nov2007
112 9991 lep 31Dec2007
So, for every id_acc, you want to eliminate multiple dates after the first two "lep" states have been encountered?
I also guess that once an id_acc has entered "lep" state, it does not revert back?
Assuming I understand what you are asking for .
data have; input a b c $ d : $20.; cards; 111 9586 aaa 30jun2007 111 9586 aaa 31jul2007 111 9586 d1a 31aug2007 111 9999 lep 30sep2007 111 9586 lep 30sep2007 111 9999 lep 31Oct2007 111 9586 lep 31Oct2007 111 9999 lep 30Nov2007 111 9586 lep 30Nov2007 112 9587 d1a 30jun2007 112 9587 aaa 31jul2007 112 9587 d2a 31aug2007 112 9587 d1d 30sep2007 112 9991 lep 31Oct2007 112 9587 lep 31Oct2007 112 9991 lep 30Nov2007 112 9587 lep 30Nov2007 112 9991 lep 31Dec2007 112 9587 lep 31Dec2007 ; run; data want; set have; by a d notsorted; if first.a then lep=0; if first.d and c='lep' then lep+1; if first.d then n=0; n+1; if lep ne 1 and n ne 1 then delete; run;
Xia Keshan
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.