Hi
I am trying to get the non missing min date record if first.subjid date is missing: Can gurus help.
Thanks
Data have
subjid AETERM AESTDT
487 | DIARRHEA | |
487 | DIARRHEA | 15AUG2012 |
487 | DIARRHEA | 26AUG2012 |
496 | DIARRHEA | 02NOV2010 |
516 | DIARRHEA | |
516 | DIARRHEA | 19APR2012 |
517 | DIARRHEA | 15MAR2013 |
552 | DIARRHEA | 15APR2011 |
612 | DIARRHEA | 15DEC2012 |
642 | DIARRHEA | 04JUL2011 |
642 | DIARRHEA | 07JAN2013 |
643 | DIARRHEA | 02MAR2011 |
645 | DIARRHEA | |
645 | DIARRHEA | |
645 | DIARRHEA | 17JUN2011 |
721 | DIARRHEA | 24NOV2011 |
724 | DIARRHEA | 15DEC2012 |
DATA NEEDED:
SUBJID AETERM AESTDT
487 | DIARRHEA | 15-Aug-12 |
496 | DIARRHEA | 2-Nov-10 |
516 | DIARRHEA | 19-Apr-12 |
517 | DIARRHEA | 15-Mar-13 |
552 | DIARRHEA | 15-Apr-11 |
612 | DIARRHEA | 15-Dec-12 |
642 | DIARRHEA | 4-Jul-11 |
643 | DIARRHEA | 2-Mar-11 |
645 | DIARRHEA | 17-Jun-11 |
721 | DIARRHEA | 24-Nov-11 |
724 | DIARRHEA | 15-Dec-12 |
I think the simplest way is to use proc summary. I have 3 proposals:
1.) proc summary data = have nway; class subjid; var aestdt; output out = want1 min =; run; * AETERM lost, 1 record per patient;
2.) proc summary data = have nway; class subjid; id AETERM; var aestdt; output out = want2 min =; run; * maximum AETERM (latest in alphabet)kept, 1 record per patient;
3.) proc summary data = have nway; class subjid AETERM; var aestdt; output out = want3 min =; run; * 1 record per patient and AETERM;
you may want to remove additional variables that proc summary creates, then use ... out = want1 (drop = _:) ... "_:" stands for all variables beginning with "_" which here is _type_ and _freq_
data test;
input subjid AETERM $ AESTDT date9.;
datalines;
subjid AETERM AESTDT
487 DIARRHEA
487 DIARRHEA 15AUG2012
487 DIARRHEA 26AUG2012
496 DIARRHEA 02NOV2010
516 DIARRHEA
516 DIARRHEA 19APR2012
517 DIARRHEA 15MAR2013
552 DIARRHEA 15APR2011
612 DIARRHEA 15DEC2012
642 DIARRHEA 04JUL2011
642 DIARRHEA 07JAN2013
643 DIARRHEA 02MAR2011
645 DIARRHEA
645 DIARRHEA
645 DIARRHEA 17JUN2011
721 DIARRHEA 24NOV2011
724 DIARRHEA 15DEC2012
;
run;
proc sql;
create table test1 as
select *
from test
where aestdt<>.;
quit;
This should work.
Delete all obs that have a missing date. Then:
proc sort data=a; by subjid aestdt; run;
data b;
set a;
by subjid;
if first.subjid;
run;
This will give you the first non-missing date per subject.
Thanks blakpebble;
i need to keep ( first.subjid and last.subjid) even if aestdt is is missing (ex: plz see subjid: 517 below)
subjid AETERM AESTDT
487 | DIARRHEA | |
487 | DIARRHEA | 15AUG2012 |
487 | DIARRHEA | 26AUG2012 |
496 | DIARRHEA | 02NOV2010 |
516 | DIARRHEA | |
516 | DIARRHEA | 19APR2012 |
517 | DIARRHEA | |
552 | DIARRHEA | 15APR2011 |
612 | DIARRHEA | 15DEC2012 |
642 | DIARRHEA | 04JUL2011 |
642 | DIARRHEA | 07JAN2013 |
643 | DIARRHEA | 02MAR2011 |
645 | DIARRHEA | |
645 | DIARRHEA | |
645 | DIARRHEA | 17JUN2011 |
721 | DIARRHEA | 24NOV2011 |
724 | DIARRHEA | 15DEC2012 |
DATA NEEDED:
SUBJID AETERM AESTDT
487 | DIARRHEA | 15-Aug-12 |
496 | DIARRHEA | 2-Nov-10 |
516 | DIARRHEA | 19-Apr-12 |
517 | DIARRHEA | |
552 | DIARRHEA | 15-Apr-11 |
612 | DIARRHEA | 15-Dec-12 |
642 | DIARRHEA | 4-Jul-11 |
643 | DIARRHEA | 2-Mar-11 |
645 | DIARRHEA | 17-Jun-11 |
721 | DIARRHEA | 24-Nov-11 |
724 | DIARRHEA | 15-Dec-12 |
proc sort data=test; by subjid aestdt; run;
data test1;
retain t;
set test;
by subjid;
if t=1 and aestdt ^="" then do; output; t=0; end;
else
if first.subjid and last.subjid then output;
else
if first.subjid and aestdt ^="" then output;
else t=1;
format aestdt date9.;
run;
try it
It worked partially, but I do see multiple records per subect as shown in the snapshot
I think It should to have do . please send raw data file and desire output
I think the simplest way is to use proc summary. I have 3 proposals:
1.) proc summary data = have nway; class subjid; var aestdt; output out = want1 min =; run; * AETERM lost, 1 record per patient;
2.) proc summary data = have nway; class subjid; id AETERM; var aestdt; output out = want2 min =; run; * maximum AETERM (latest in alphabet)kept, 1 record per patient;
3.) proc summary data = have nway; class subjid AETERM; var aestdt; output out = want3 min =; run; * 1 record per patient and AETERM;
you may want to remove additional variables that proc summary creates, then use ... out = want1 (drop = _:) ... "_:" stands for all variables beginning with "_" which here is _type_ and _freq_
Hi HWSteinberg,
Great, it worked! Thanks for your time....
Best
This is too good
Please try the below code
proc sort data=have;
by subjid AETERM aestdt;
run;
data want;
set have;
by subjid AETERM aestdt;
if first.subjid=0 and last.subjid=0 and aestdt=. then delete;
run;
This will delete the records which have aestdt=. and it is not first.subjid and last.subjid.
Thanks,
Jag
Hi Jag,
this will delete also a patient completely which has more than 1 record but all with missing aestdt.
using a flag variable t default 0
if t=1 and date is not missing then output and set t=0(default)
else if have single record then output
else if first record and date have non missing value then output
other wise set t=1
those records would be in output which are first nonmissing value of date
there are two case
1. A single record then output same
2. if more same dup record
nonmissing of date then output otherwise goto next record
goto next record repeated until not found .
proc sql;
create table x as
select subjid, aeterm, min(aestdt) as mindate format=date9. from test
group by subjid
having aestdt = min(aestdt);
quit;
proc print;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.