Hello,
If this is my starting data:
data have;
infile datalines dsd dlm=',';
input person $ status date yymmdd10.;
format date yymmdd10.;
datalines;
a,.,2019-02-01
a,0,2019-03-01
a,1,2019-04-01
a,0,2019-05-01
a,1,2019-06-14
a,0,2019-12-31
b,.,2019-01-31
b,0,2019-03-01
b,1,2019-05-03
b,1,2019-05-04
c,0,2019-01-30
c,0,2019-01-31
c,1,2019-07-06
c,1,2019-07-07
d,1,2019-03-06
e,0,2019-06-01
f,.,2019-02-01
f,0,2019-03-01
f,1,2019-04-01
f,1,2019-05-01
f,0,2019-06-14
f,1,2019-11-30
f,1,2019-12-31
;
run;
How do I get the following results?
data want;
infile datalines dsd dlm=',';
input person $ status date yymmdd10.;
format date yymmdd10.;
datalines;
b,1,2019-05-03
c,1,2019-07-06
d,1,2019-03-06
f,1,2019-11-30
;
run;
Where the members returned have a last status of 1 and the date returned is the first date where status = 1 and there is no later status of 0 or null.
Person 'a' doesn't get returned because their last status is not 1.
Person 'b' gets returned with a date of 2019-05-03. Their last status = 1 and that is the earliest date of status 1 with no following statuses <> 1
Person 'c' gets returned with a date of 2019-07-06. Their last status = 1 and that is the earliest date of status 1 with no following statuses <> 1
Person 'd' gets returned with a date of 2019-06-06. Their only status = 1
Person 'e' doesn't gets returned because their only status <> = 1
Person 'f' gets returned with a date of 2019-11-30. Their last status = 1 and that is the earliest date of status 1 with no following statuses <> 1. In this example we wouldn't return the date of 2019-04-01 because there is a subsequent event with a status <> 1 on 2019-06-14.
How can I code to get the desired output?
Ah after all some fun question amid the corona thing. Thank you
data have;
infile datalines dsd dlm=',';
input person $ status date yymmdd10.;
format date yymmdd10.;
datalines;
a,.,2019-02-01
a,0,2019-03-01
a,1,2019-04-01
a,0,2019-05-01
a,1,2019-06-14
a,0,2019-12-31
b,.,2019-01-31
b,0,2019-03-01
b,1,2019-05-03
b,1,2019-05-04
c,0,2019-01-30
c,0,2019-01-31
c,1,2019-07-06
c,1,2019-07-07
d,1,2019-03-06
e,0,2019-06-01
f,.,2019-02-01
f,0,2019-03-01
f,1,2019-04-01
f,1,2019-05-01
f,0,2019-06-14
f,1,2019-11-30
f,1,2019-12-31
;
run;
data want;
if _n_=1 then do;
if 0 then set have;
dcl hash H () ;
h.definekey ("person") ;
h.definedata ('person','status','date') ;
h.definedone () ;
end;
do until(last.person);
set have;
by person status notsorted;
if first.status and status then h.replace();
end;
if status=1 then do;
h.find();
output;
end;
h.clear();
run;
Though not for prod use, my colleagues at work challenged me to attempt the same in SQL, and for fun
data have;
infile datalines dsd dlm=',';
input person $ status date yymmdd10.;
format date yymmdd10.;
datalines;
a,.,2019-02-01
a,0,2019-03-01
a,1,2019-04-01
a,0,2019-05-01
a,1,2019-06-14
a,0,2019-12-31
b,.,2019-01-31
b,0,2019-03-01
b,1,2019-05-03
b,1,2019-05-04
c,0,2019-01-30
c,0,2019-01-31
c,1,2019-07-06
c,1,2019-07-07
d,1,2019-03-06
e,0,2019-06-01
f,.,2019-02-01
f,0,2019-03-01
f,1,2019-04-01
f,1,2019-05-01
f,0,2019-06-14
f,1,2019-11-30
f,1,2019-12-31
;
run;
proc sql;
create table want as
select a.*
from have(where=(status=1)) a ,
(select *,count(distinct status)=1 as _n_ from have group by person having date=max(date) and status=1) b
where a.person=b.person and a.date<b.date or a.person=b.person and _n_
group by a.person
having max(date)=date;
quit;
Well Our bank lives and relies on SQL, so here no matter whatever stuff contradicts SQL, they overrule in favor of SQL. God bless!
My version:
data want;
do until(last.person);
set have; by person;
if status ne 1 then call missing(d);
else if missing(d) then d = date;
end;
if status = 1 then output;
drop date; format d yymmdd10.; rename d=date;
run;
That's where the data step and the by statement really come into it's own:
data want;
set have (rename=(date=_date));
by person status notsorted;
retain date;
format date yymmdd10.;
if first.status and status = 1 then date = _date;
if last.person and status = 1;
drop _date;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.