Hi SAS Forum,
I have this dataset (readily readable).
data disaster;
informat date date9.;
input country Eco_zone $ 5-7 date Disaster $22-29;
format date date9.;
datalines;
10 111 30SEP2010 Break
10 111 31OCT2010 Pandemic
10 111 30NOV2010 Break
10 111 30JUN2011 Pandemic
10 111 01JAN2012 Tonardo
40 333 30NOV2010 Break
40 333 31DEC2010 Break
40 333 30NOV2010 Flood
100 111 30APR2011 Tonardo
100 111 31MAY2011 Flood
100 111 31JUL2011 Break
100 111 31AUG2011 Pandemic
80 111 31DEC2011 Flood
40 666 03JAN2012 Break
40 999 03JAN2012 Pandemic
90 111 31DEC2011 Break
70 111 31OCT2010 Break
70 111 30NOV2010 Break
70 111 30JUN2011 Break
;
run;
Below are the varaibles that consititute a unique record.
country Eco_zone date
Q:
I need to identify follwing records:
1. First “Break” incident in a given country
2. First consecutive “Break” followed by “Pandemic” pair
Answer:
Blue highlighted shueld be the correct records.
1 | 30SEP2010 | 10 | 111 | Break |
2 | 31OCT2010 | 10 | 111 | Pandemic |
3 | 30NOV2010 | 10 | 111 | Break |
4 | 30JUN2011 | 10 | 111 | Pandemic |
5 | 01JAN2012 | 10 | 111 | Tonardo |
6 | 30NOV2010 | 40 | 333 | Break |
7 | 30NOV2010 | 40 | 333 | Flood |
8 | 31DEC2010 | 40 | 333 | Break |
9 | 03JAN2012 | 40 | 666 | Break |
10 | 03JAN2012 | 40 | 999 | Pandemic |
11 | 31OCT2010 | 70 | 111 | Break |
12 | 30NOV2010 | 70 | 111 | Break |
13 | 30JUN2011 | 70 | 111 | Break |
14 | 31DEC2011 | 80 | 111 | Flood |
15 | 31DEC2011 | 90 | 111 | Break |
16 | 30APR2011 | 100 | 111 | Tonardo |
17 | 31MAY2011 | 100 | 111 | Flood |
18 | 31JUL2011 | 100 | 111 | Break |
19 | 31AUG2011 | 100 | 111 | Pandemic |
My following (poor) code identify only one record correctly.
Could you help me to identify other qualified records too.
proc sort data=disaster out=one;
by country Eco_zone date;
run;
proc print;run;
data two;
set one;
by country Eco_zone date;
if (first.country and last.country) and disaster = "Break" then want_flag =1;
run;
Thank a lot.
Mirisa
"I have this dataset (readily readable)." You have no idea how much I appreciate this! Sometimes I spend more time to get the data in than working on the question itself.
data disaster;
informat date date9.;
input country Eco_zone $ 5-7 date Disaster $22-29;
format date date9.;
datalines;
10 111 30SEP2010 Break
10 111 31OCT2010 Pandemic
10 111 30NOV2010 Break
10 111 30JUN2011 Pandemic
10 111 01JAN2012 Tonardo
40 333 30NOV2010 Break
40 333 31DEC2010 Break
40 333 30NOV2010 Flood
100 111 30APR2011 Tonardo
100 111 31MAY2011 Flood
100 111 31JUL2011 Break
100 111 31AUG2011 Pandemic
80 111 31DEC2011 Flood
40 666 03JAN2012 Break
40 999 03JAN2012 Pandemic
90 111 31DEC2011 Break
70 111 31OCT2010 Break
70 111 30NOV2010 Break
70 111 30JUN2011 Break
;
run;
proc sort data=disaster out=have;
by country Eco_zone date;
run;
data want;
set have nobs= nobs;
by country;
if _n_<nobs then
set have(firstobs=2 keep= disaster rename=disaster=_dis);
if first.country then
call missing(_f, _f1);
if last.country then
call missing(_dis);
if disaster='Break' and _dis='Pandemic' then
_f+1;
if disaster='Break' then
_f1+1;
if (_f1=1 and disaster='Break') or (_f=1 and (disaster='Break' and _dis='Pandemic' or disaster='Pandemic' and lag(disaster)='Break')) then
flag=1;
drop _:;
run;
Regards,
Haikuo
updated after reading more of your question.
Hi Hai.kuo,
Great! Thank you very much.
You may have no idea how much I appreciate this, I have been trying this several days during my commute to and from the workplace.
Thanks again and I have a lot to study in this code.
Regards
Mirisa
P.S. I tried several times to click the "green" button but it did not appear in my computer.
data disaster; informat date date9.; input country Eco_zone $ 5-7 date Disaster $22-29; format date date9.; datalines; 10 111 30SEP2010 Break 10 111 31OCT2010 Pandemic 10 111 30NOV2010 Break 10 111 30JUN2011 Pandemic 10 111 01JAN2012 Tonardo 40 333 30NOV2010 Break 40 333 31DEC2010 Break 40 333 30NOV2010 Flood 100 111 30APR2011 Tonardo 100 111 31MAY2011 Flood 100 111 31JUL2011 Break 100 111 31AUG2011 Pandemic 80 111 31DEC2011 Flood 40 666 03JAN2012 Break 40 999 03JAN2012 Pandemic 90 111 31DEC2011 Break 70 111 31OCT2010 Break 70 111 30NOV2010 Break 70 111 30JUN2011 Break ; run; proc sort data=disaster ; by country Eco_zone date;run; data want(drop=foundP foundB); set disaster; by country Eco_zone; retain foundB foundP 0; if first.Eco_zone then do; foundB=0 ;foundP=0; end; if (not foundB and disaster='Break') then do;output;foundB=1;end; if (not foundP and disaster='Pandemic' and lag(disaster)='Break' and lag(Eco_zone)=Eco_zone and lag(country)=country ) /* have two group variables*/ then do; output;foundP=1; end; run;
Xia Keshan
消息编辑者为:xia keshan
Hi Ksharp,
Your code worked very well.
Thank you very much.
I am experimenting to understand how do these magical logics do the job.
Thanks again!
Mirisa
One more thing need to remind you . if data like :
10 111 30SEP2010 Break
10 111 31OCT2010 Break
10 111 30NOV2010 Pandemic
10 111 30JUN2011 Pandemic
10 111 01JAN2012 Tonardo
my code will get
10 111 30SEP2010 Break
10 111 31OCT2010 Break
10 111 30NOV2010 Pandemic
10 111 30JUN2011 Pandemic
10 111 01JAN2012 Tonardo
Thanks Ksharp for this additional note.
Regards
Mirisa
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.