Hi All,
i Have a dataset with some dates and events occurred and sample collected. But few have missing. If any record is missing sample date or event date available i need to consider that records
data have;
pid reason eventdt samdt
100 xxxy 23aug2011 14agu2011
100 xxxy 19dec2011 .
110 yyyy 31jul2010 .
120 zzzzz 14jun2010
120 zzzzz 27oct2010 28oct2010
120 zzzzz 02jan2011 17feb2011
data want;
pid reason eventdt samdt flag
100 xxxy 23aug2011 14agu2011 1
100 xxxy 19dec2011 14ug2011 1
110 yyyy 31jul2010 . 0
120 zzzzz 14jun2010 28oct2010 1
120 zzzzz 27oct2010 28oct2010 1
120 zzzzz 02jan2011 17feb2011 1
I need to create a flag variable and sampdt is carry forward from previous available samdt
Thanks
Sam
data want;
set have;
if samdt=. then flag=0
else flag=1;
run;
Hi Sam,
Please try the below code
data have;
infile datalines missover;
input pid reason $ eventdt : date9. samdt : date9.;
format eventdt samdt date9.;
datalines;
100 xxxy 23aug2011 14aug2011
100 xxxy 19dec2011
110 yyyy 31jul2010
120 zzzzz 14jun2010
120 zzzzz 27oct2010 28oct2010
120 zzzzz 02jan2011 17feb2011
;
run;
proc sort data=have;
by pid;
run;
data want(keep=pid reason samdt flag);
set have(where=(samdt ne .));
retain new;
by pid reason;
flag=1;
if first.reason then output want;
run;
data want_;
merge have(in=a) want(in=b rename=(samdt=new_date));
by pid reason;
if a;
if flag=. then flag=0;
run;
Obs | pid | reason | eventdt | samdt | new_date | flag |
1 | 100 | xxxy | 23AUG2011 | 14AUG2011 | 14AUG2011 | 1 | |
2 | 100 | xxxy | 19DEC2011 | . | 14AUG2011 | 1 | |
3 | 110 | yyyy | 31JUL2010 | . | . | 0 | |
4 | 120 | zzzzz | 14JUN2010 | . | 28OCT2010 | 1 | |
5 | 120 | zzzzz | 27OCT2010 | 28OCT2010 | 28OCT2010 | 1 | |
6 | 120 | zzzzz | 02JAN2011 | 17FEB2011 | 28OCT2010 | 1 |
Thanks,
Jagadish
Hi,
Please use this below optimized code :-
data have;
infile datalines missover;
input pid reason $ eventdt : date9. samdt : date9.;
format eventdt samdt date9.;
datalines;
100 xxxy 23aug2011 14aug2011
100 xxxy 19dec2011
110 yyyy 31jul2010
120 zzzzz 14jun2010
120 zzzzz 27oct2010 28oct2010
120 zzzzz 02jan2011 17feb2011
;
run;
data want;
set have;
format samdt_old date9.;
retain samdt_old;
if samdt ne . then do;
samdt_old=samdt;
flag=1;
end;
else flag=0;
run;
proc print data=want;
run;
Output of the above code:-
==============================================
Obs pid reason eventdt samdt samdt_old flag
1 100 xxxy 23AUG2011 14AUG2011 14AUG2011 1
2 100 xxxy 19DEC2011 . 14AUG2011 0
3 110 yyyy 31JUL2010 . 14AUG2011 0
4 120 zzzzz 14JUN2010 . 14AUG2011 0
5 120 zzzzz 27OCT2010 28OCT2010 28OCT2010 1
6 120 zzzzz 02JAN2011 17FEB2011 17FEB2011 1
===============================================
/Daman
Hi All,
Thanks For the Help!!!. But it not solved my issue
Jagadish: You retain the new varaible but you never used it?
Daman: You o/p is not what i am looking for. You are retaining the last available date to all the Pids. Any way thanks for the help
Thanks
Sam
Hi Sam,
Thank you for the correction, yes i was suppose to remove it but i left it.
Please don't consider the retain statement in my code. Apart from that the code will give the output you desire.
Thanks,
Jagadish
if your data is sorted as you need it..
data want(drop=SamDt_Prev);
set have;
retain SamDt_Prev;
flag=0;
if samdt is missing then do;
samdt = SamDt_Prev;
flag=1;
end;
SamDt_Prev=samdt;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.